In this course, participants will learn the ins and outs of Terra, including what Terra is used for (and what it is not used for). Participants will also learn about other available resources, get an introduction to notebooks and the Google Cloud Platform, and learn how to query data and access VCF and related files.
Modules
Terra is a cloud-based platform that allows researchers to access data, perform analyses, and easily collaborate with other researchers and share workflows.
In the first lesson you will:
- Find out what’s to come in this course
- Understand what Terra is and what it’s used for
The Accelerating Medicines Partnership for Parkinson’s (AMP-PD) provides a portal with de-identified data from Parkinson’s patients and healthy control subjects for use by researchers. The aim of AMP-PD is to accelerate clinical trials by identifying and validating diagnostic, prognostic, and progression biomarkers.
In this lesson you will:
- Find out how to access AMP-PD datasets
- Learn which resources are available from AMP-PD and Terra to assist your learning
Notebooks are files that combine code with embedded comments and documentation. Terra has integrated Jupyter Notebooks to provide the infrastructure for interactive analyses.
In this lesson you will:
- Understand the structure of a notebook
- Learn what you can do with a notebook
- See examples of Terra Notebooks
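To make the structure concrete, here is a minimal sketch of a notebook in the Jupyter file format (nbformat version 4). Jupyter notebooks, including those in Terra, are saved as `.ipynb` files. Each file is a JSON document whose `cells` list mixes markdown cells for documentation with code cells for executable code. The cell contents and the `summarize` helper are hypothetical and included only for illustration.

```python
import json

# A minimal notebook in the Jupyter file format (nbformat 4).
# The cell contents are hypothetical.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        # Documentation lives in markdown cells.
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example analysis\n", "Explains what the code cell below does."],
        },
        # Code cells hold code plus any outputs produced when they run.
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello from a code cell')"],
        },
    ],
}


def summarize(nb):
    """Count the cells in a notebook by cell type (hypothetical helper)."""
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# Round-trip through JSON text, the form in which an .ipynb file is saved.
text = json.dumps(notebook, indent=1)
print(summarize(json.loads(text)))
```

Saving `text` to a file with an `.ipynb` extension should give a notebook that Jupyter can open, with one documentation cell and one code cell.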
The Google Cloud Platform is a suite of cloud computing services offering data storage, data analytics, and machine learning. The raw data files and clinical data files for AMP-PD are stored on the platform.
Google BigQuery is a data warehouse that lets you use SQL to query just the data you need. A number of the available AMP-PD datasets are stored in BigQuery in a queryable format.
In this lesson you will:
- Learn more about Google Cloud Platform and Google BigQuery
- Find out how AMP-PD data is stored in the cloud and how to access it
- See demos of the platform in use
Now that you are familiar with the Google Cloud Platform and BigQuery, you can move on to getting data from the BigQuery tables into your notebook using SQL, a language for managing and retrieving data.
In this lesson you will:
- Learn how to use SQL queries to query AMP-PD clinical data
- Learn how to use SQL queries to query AMP-PD variant data
- Practice manipulating data in a notebook and writing the results to your workspace bucket
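The course runs its queries in BigQuery, but the core SQL pattern is the same everywhere: select only the columns you need, filter rows with `WHERE`, and aggregate with `GROUP BY`. The sketch below swaps in Python's built-in `sqlite3` with an in-memory database so that it runs anywhere. The `clinical` table, its columns, and its rows are hypothetical and do not reflect the real AMP-PD schema. In a Terra notebook, the query text would instead be sent to BigQuery, for example through the `google-cloud-bigquery` client library, and BigQuery's SQL dialect differs from SQLite's in some details.

```python
import csv
import io
import sqlite3

# Hypothetical clinical table standing in for a BigQuery table.
# Table name, columns, and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical ("
    "participant_id TEXT PRIMARY KEY, diagnosis TEXT, age_at_baseline INTEGER)"
)
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?, ?)",
    [
        ("P001", "PD", 64),
        ("P002", "Control", 58),
        ("P003", "PD", 71),
        ("P004", "Control", 66),
        ("P005", "PD", 59),
    ],
)

# Select only the needed columns, filter rows, and aggregate in SQL.
query = """
    SELECT diagnosis, COUNT(*) AS n, AVG(age_at_baseline) AS mean_age
    FROM clinical
    WHERE age_at_baseline >= ?
    GROUP BY diagnosis
    ORDER BY diagnosis
"""
rows = conn.execute(query, (60,)).fetchall()

# Write the result as CSV text, ready to save as a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["diagnosis", "n", "mean_age"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In Terra, the CSV text could be saved to a local file and then copied to the workspace bucket, for example with `gsutil cp`.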
Some files in the cloud can’t be queried through BigQuery using SQL. These are the raw data files, which are stored in Google Cloud Storage buckets and accessed from the command line using gsutil, a Python application.
In this lesson you will:
- Learn how to find the available genetic files
- Learn how to view the file locations in a notebook
- Understand how to use gsutil to access data
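Here is a minimal sketch of finding genetic files from Python. It assumes that `gsutil` is installed and authenticated, as it typically is in a Terra notebook (where `!gsutil ls gs://...` also works directly in a cell). The `list_bucket` function wraps `gsutil ls`. The `vcf_files` function filters a listing for bgzipped VCFs (`.vcf.gz`) and pairs each one with a tabix index (`.vcf.gz.tbi`) if one is listed. The helper names, the file-naming convention, and the `example-bucket` paths are hypothetical and are not the real AMP-PD locations.

```python
import subprocess


def list_bucket(uri):
    """Return the URIs printed by `gsutil ls <uri>`, one per line.

    Assumes gsutil is installed and authenticated.
    """
    result = subprocess.run(
        ["gsutil", "ls", uri], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def split_gs_uri(uri):
    """Split gs://bucket/path/to/object into (bucket, object_path)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri}")
    bucket, _, path = uri[len("gs://"):].partition("/")
    return bucket, path


def vcf_files(uris):
    """Pair each bgzipped VCF in a listing with its .tbi index (or None)."""
    listed = set(uris)
    pairs = []
    for uri in sorted(listed):
        if uri.endswith(".vcf.gz"):
            index = uri + ".tbi"
            pairs.append((uri, index if index in listed else None))
    return pairs


# Hypothetical listing, as list_bucket("gs://example-bucket/vcf/") might return it.
listing = [
    "gs://example-bucket/vcf/chr1.vcf.gz",
    "gs://example-bucket/vcf/chr1.vcf.gz.tbi",
    "gs://example-bucket/vcf/chr2.vcf.gz",
    "gs://example-bucket/vcf/README.txt",
]
for vcf, index in vcf_files(listing):
    print(split_gs_uri(vcf), "index:", index)
```

To copy a file to the notebook's local disk, you would run `gsutil cp <gs-uri> .` in the same way.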
Storing and accessing data in the cloud costs money, but once you understand the pricing structure and follow some basic principles, you can keep your costs down.
In this lesson you will:
- Understand the pricing structure for running notebooks and queries
- Pick up tips for keeping your costs low
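The arithmetic behind two common cost-saving tips can be sketched in a few lines. Under BigQuery's on-demand pricing, a query is billed by the bytes it processes. Because BigQuery stores tables by column, a query reads only the columns it references, so selecting just the columns you need scans less data than `SELECT *`. Notebook compute is billed for the time the environment is running, so stopping idle environments saves money. The column sizes, the per-TiB price, and the hourly rate below are placeholders, not real AMP-PD or Google Cloud figures. Check the current Google Cloud and Terra pricing before estimating real costs.

```python
import math

TIB = 2**40  # bytes in one tebibyte

# Hypothetical on-disk sizes (bytes) of the columns in one table.
COLUMN_BYTES = {
    "participant_id": 2_000_000,
    "diagnosis": 3_000_000,
    "age_at_baseline": 1_000_000,
    "free_text_notes": 900_000_000,
}


def bytes_scanned(columns):
    """Bytes read by a query that references only `columns`."""
    return sum(COLUMN_BYTES[c] for c in columns)


def query_cost(columns, price_per_tib):
    """On-demand query cost: bytes processed, in TiB, times the price per TiB."""
    return bytes_scanned(columns) / TIB * price_per_tib


def runtime_cost(hourly_rate, hours):
    """Compute cost for a notebook environment left running for `hours`."""
    return hourly_rate * hours


PRICE_PER_TIB = 5.0  # placeholder price, not a quote
select_star = query_cost(list(COLUMN_BYTES), PRICE_PER_TIB)
select_needed = query_cost(["participant_id", "diagnosis"], PRICE_PER_TIB)
print(f"SELECT *: ${select_star:.6f}  vs  selected columns: ${select_needed:.6f}")
print(f"Idle runtime for 8 h at a placeholder $0.50/h: ${runtime_cost(0.50, 8):.2f}")
```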
In this lesson, you will apply everything you have learned to run an analysis on Terra.
You will use AMP-PD data and PLINK, an open-source whole-genome association analysis tool, to perform a case/control analysis using Fisher’s exact test, which determines whether there is a non-random association between two categorical variables. For each variant, the test checks whether the distribution of major and minor alleles differs significantly between participants with a Parkinson’s Disease diagnosis and participants with no diagnosis.
In this lesson you will:
- Learn which software you’ll need and how to download it
- Run an analysis from start to finish in a Terra notebook
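To see what the allelic case/control test computes, the sketch below builds the 2×2 table of minor and major allele counts for one variant and applies a two-sided Fisher’s exact test in plain Python. It is not PLINK’s code. It assumes that genotypes are coded as minor-allele counts (0, 1, or 2) with no missing calls, and the genotype lists are hypothetical. The p-value is the sum of the hypergeometric probabilities of every table with the same row and column totals that is no more likely than the observed table, which is the usual two-sided definition.

```python
from math import comb


def allele_table(case_genotypes, control_genotypes):
    """Build [[case_minor, case_major], [control_minor, control_major]].

    Each genotype is the participant's minor-allele count (0, 1, or 2),
    so every participant contributes two alleles.
    """
    def counts(genotypes):
        if any(g not in (0, 1, 2) for g in genotypes):
            raise ValueError("genotypes must be 0, 1, or 2")
        minor = sum(genotypes)
        return [minor, 2 * len(genotypes) - minor]

    return [counts(case_genotypes), counts(control_genotypes)]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    possible = range(max(0, col1 - row2), min(row1, col1) + 1)
    # A small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in possible if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical genotypes for one variant.
cases = [2, 1, 1, 2, 1, 0, 2, 1]
controls = [0, 0, 1, 0, 0, 1, 0, 0]
table = allele_table(cases, controls)
print(table, fisher_exact_two_sided(table))
```

Each participant contributes two alleles, so the table's total is twice the number of participants. This is why the test is run on allele counts rather than on participant counts.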