Next-generation sequencing technology is widely used in various -omics applications to understand basic biology, identify useful drug targets, and inform clinical translation. The huge amounts of data being generated pose substantial challenges to conventional statistics and have fostered the development of many new analytical methods and computational tools.
This course is designed to cover statistical models that are useful for analyzing modern genomic and medical datasets. Tentative topics include single-cell genomics, genetic association studies, genetic risk prediction, drug target identification, and electronic medical records.
There are no existing textbooks on these topics, so the material will be developed from very recent publications. To follow the course, you need to be familiar with 1) PhD-level statistics (mostly categorical data analysis, Bayesian statistics, linear models, classification, and unsupervised learning), 2) R, 3) shell scripting tools (such as bash, awk, and sed), and 4) a scripting language such as Python or Perl. Necessary background will be introduced to help you understand the statistics used in omics and precision medicine studies.
We plan to cover the following topics in our course, and course slides will be posted after each class.
1. Precision Medicine
1.1. Overview of statistical genetics: from association to biology and clinical insights (slide)
1.2. Fine mapping and identification of causal variants; data integration; co-localization; allelic heterogeneity (slide)
1.3. Heritability, genetic correlation, functional enrichment (slide)
1.4. Disease subtypes (slide)
1.5. Causal inference; drug target discovery and validation; Mendelian randomization (slide)
1.6. Genetic predictions (slide)
1.7. Analysis of EMR-based biobanks; PheWAS approach (slide)
1.8. Imaging genetics
2. Omics (mostly transcriptomics)
2.1. Microbiome QTLs (slide by Scott Eckert)
2.2. Quantifying expression levels; differential expression analysis
2.3. eQTL analysis; allele-specific expression; multi-tissue analysis
2.4. Transcriptome-wide (or, more generally, omics-wide) association analysis
2.5. Genomics at the single-cell level: removing technical artifacts; quantifying expression levels; eQTLs; X chromosome biology