This course provides a comprehensive introduction to whole-genome sequencing (WGS) data analysis, from raw sequencing reads to validated genetic findings. Learners will gain practical experience with data quality control, alignment, variant calling, annotation, and interpretation. The course also covers complex variant detection, experimental validation methods, and applications in Parkinson’s disease genetics.

Modules

This module introduces whole-genome sequencing (WGS) data analysis, following the workflow from raw sequencing reads to variant identification and interpretation. It covers sequencing basics, data quality control, alignment to a reference genome, variant calling, and joint analysis across samples, emphasizing why consistent pipelines matter when samples are analyzed together. The module also introduces VCF interpretation and demonstrates how to annotate variants with tools like VEP and manually validate them in IGV to avoid false positives.
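As a small illustration of VCF interpretation, the sketch below parses one tab-separated VCF data line and reads the genotype. The line itself is made up, the helper name `parse_vcf_line` is hypothetical, and the code assumes a single-sample, biallelic record with `GT` and `AD` fields in the FORMAT column, as many callers emit. It is a teaching sketch, not a replacement for a proper VCF library.

```python
# Hypothetical single-sample VCF data line (tab-separated); all values are made up.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VCF_LINE = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT:AD:DP\t0/1:14,16:30"


def parse_vcf_line(line):
    """Parse one biallelic, single-sample VCF data line into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, _info, fmt, sample = fields[:10]

    # FORMAT keys and sample values are colon-separated and align by position.
    call = dict(zip(fmt.split(":"), sample.split(":")))

    # GT uses "/" (unphased) or "|" (phased); "." marks a missing allele.
    alleles = call["GT"].replace("|", "/").split("/")
    if "." in alleles:
        zygosity = "missing"
    elif all(a == "0" for a in alleles):
        zygosity = "hom_ref"
    elif len(set(alleles)) == 1:
        zygosity = "hom_alt"
    else:
        zygosity = "het"

    # AD holds per-allele read depths: reference first, then the alternate allele.
    ref_depth, alt_depth = (int(x) for x in call["AD"].split(",")[:2])
    total = ref_depth + alt_depth
    alt_fraction = alt_depth / total if total else None

    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filter": filt,
        "zygosity": zygosity,
        "alt_fraction": alt_fraction,
    }


record = parse_vcf_line(VCF_LINE)
print(record)
```

A heterozygous call whose alternate-allele fraction is far from 0.5 is the kind of record worth inspecting visually before trusting it.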

This module focuses on detecting complex genetic variation in whole-genome sequencing data that standard pipelines often miss, specifically GBA1 variants and short tandem repeat (STR) expansions. It introduces Gauchian as a targeted tool for identifying pathogenic and risk variants in GBA1, a gene that is difficult to analyze because of its nearby pseudogene and structural complexity. The module also demonstrates how ExpansionHunter estimates STR repeat lengths from short-read data to identify repeat expansions associated with neurological diseases. Because short-read sequencing has limitations for complex regions and large repeats, the module emphasizes the need for follow-up validation using methods such as long-range PCR, repeat-primed PCR, long-read sequencing, and other molecular techniques.

This module focuses on variant prioritization and interpretation, guiding learners through filtering millions of variants down to credible candidates and assessing their pathogenicity. It covers key principles such as variant frequency, family segregation, and inheritance patterns (including de novo and compound heterozygous variants), along with essential resources: population databases, ACMG criteria, in silico prediction methods, functional studies, and collaborative platforms like GeneMatcher. Together, these approaches help determine whether a genetic variant is likely disease-causing, while highlighting the challenges, limitations, and need for careful validation in Parkinson’s genetics.
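The frequency and inheritance filters described above can be sketched on a toy parent-child trio. Everything here is an assumption made for illustration: the record layout, the gene names, the 0.001 frequency cutoff, and the helper names are hypothetical and do not come from any particular tool. The compound heterozygous check infers phase only from parental origin, which is a simplification.

```python
from collections import defaultdict

# Hypothetical trio records; field names and values are illustrative only.
variants = [
    {"id": "v1", "gene": "GENE_A", "pop_af": 0.20,   "child": "0/1", "mother": "0/1", "father": "0/0"},
    {"id": "v2", "gene": "GENE_B", "pop_af": 0.0001, "child": "0/1", "mother": "0/0", "father": "0/0"},
    {"id": "v3", "gene": "GENE_C", "pop_af": 0.0005, "child": "0/1", "mother": "0/1", "father": "0/0"},
    {"id": "v4", "gene": "GENE_C", "pop_af": 0.0002, "child": "0/1", "mother": "0/0", "father": "0/1"},
]


def alleles(gt):
    """Split a genotype string such as '0/1' or '0|1' into its alleles."""
    return gt.replace("|", "/").split("/")


def carries_alt(gt):
    return "1" in alleles(gt)


def is_hom_ref(gt):
    # A missing call ('./.') is deliberately NOT treated as homozygous reference.
    return alleles(gt) == ["0", "0"]


def is_rare(v, max_af=0.001):
    """Keep variants at or below an assumed population allele-frequency cutoff."""
    return v["pop_af"] <= max_af


def is_de_novo(v):
    """Child carries the variant; both parents are confidently homozygous reference."""
    return carries_alt(v["child"]) and is_hom_ref(v["mother"]) and is_hom_ref(v["father"])


def compound_het_genes(candidates):
    """Genes with one child variant inherited only maternally and one only paternally."""
    by_gene = defaultdict(lambda: {"maternal": [], "paternal": []})
    for v in candidates:
        if not carries_alt(v["child"]):
            continue
        if carries_alt(v["mother"]) and is_hom_ref(v["father"]):
            by_gene[v["gene"]]["maternal"].append(v["id"])
        elif carries_alt(v["father"]) and is_hom_ref(v["mother"]):
            by_gene[v["gene"]]["paternal"].append(v["id"])
    return {
        gene: sorted(origin["maternal"] + origin["paternal"])
        for gene, origin in by_gene.items()
        if origin["maternal"] and origin["paternal"]
    }


rare = [v for v in variants if is_rare(v)]
de_novo_ids = [v["id"] for v in rare if is_de_novo(v)]
comp_het = compound_het_genes(rare)

print("rare:", [v["id"] for v in rare])
print("de novo:", de_novo_ids)
print("compound het:", comp_het)
```

Candidates that pass filters like these still need the evidence review described above before they are treated as disease-causing.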

This module explains how to confirm genetic variants detected by sequencing, focusing on single-nucleotide variants, small indels, and copy-number changes. It shows how PCR and Sanger sequencing are used to validate single-nucleotide variants and indels, including how band patterns and sequencing traces differ with zygosity and indel size. Special attention is given to the GBA1 gene, where high similarity with its pseudogene requires nested PCR and careful interpretation to distinguish true variants from recombinant alleles. The module also covers copy-number variant validation, emphasizing MLPA as the preferred method, with qPCR and digital PCR as alternatives. Overall, it highlights how proper experimental design and variant-specific approaches help avoid false positives and confirm key genetic findings.

This module provides an overview of how to analyze rare genetic variants in Parkinson’s disease. It explains why rare variants are important, reviews essential steps like quality control and variant annotation, and introduces prediction tools such as CADD and LOFTEE. The module covers key statistical approaches, including burden tests and methods like REGENIE and SKAT, which help detect associations when variants are too rare to analyze individually. It also highlights challenges, such as limited power, data quality needs, and population stratification, and emphasizes validating findings and exploring biological relevance.
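To make the idea of a burden test concrete, the sketch below collapses all qualifying rare variants in one gene into a single carrier status per person and compares carrier counts between cases and controls with a one-sided Fisher's exact test. This is a deliberately simple collapsing test written with the standard library; it is not how REGENIE or SKAT work, it does not adjust for covariates or population stratification, and the genotype data and function names are hypothetical.

```python
from math import comb


def is_carrier(alt_counts):
    """True if a person carries at least one qualifying rare allele in the gene."""
    return any(count > 0 for count in alt_counts)


def burden_fisher_one_sided(case_genos, control_genos):
    """Collapsing burden test: are carriers enriched among cases?

    Each input is a list of per-person lists of alternate-allele counts,
    one entry per qualifying variant in the gene.
    Returns (case carriers, control carriers, one-sided p-value).
    """
    case_carriers = sum(is_carrier(g) for g in case_genos)
    control_carriers = sum(is_carrier(g) for g in control_genos)
    n_cases = len(case_genos)
    total = n_cases + len(control_genos)
    carriers = case_carriers + control_carriers

    # Hypergeometric tail: probability of seeing at least this many carriers
    # among the cases if carrier status were unrelated to disease status.
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return case_carriers, control_carriers, tail / comb(total, n_cases)


# Hypothetical genotypes for two rare variants in one gene (made-up data).
cases = [[1, 0], [0, 1], [0, 2]]
controls = [[0, 0], [0, 0], [0, 0]]

a, c, p_value = burden_fisher_one_sided(cases, controls)
print(f"case carriers={a}, control carriers={c}, one-sided p={p_value:.3f}")
```

With only three cases and three controls, even a complete separation of carriers gives a modest p-value, which illustrates the limited power the module highlights.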