Using Terra to Access Data and Perform Analysis

This course covers the ins and outs of Terra, including what Terra is used for (and what it’s not used for). Participants will also learn about other available resources, get an introduction to notebooks and the Google Cloud Platform, and learn how to query data and access VCF and related files.

Modules

Terra is a platform based in the cloud that allows researchers to access data, perform analyses, and easily collaborate and share workflows with other researchers.

In the first lesson you will:

Find out what’s to come in this course
Understand what Terra is and what it’s used for

Accelerating Medicines Partnership for Parkinson’s (AMP-PD) provides a portal with de-identified data from Parkinson’s patients and healthy control subjects for use by researchers. The aim of AMP-PD is to accelerate trials through diagnostic, prognostic, and progression biomarkers.

In this lesson you will:

Find out how to access AMP-PD datasets
Learn which resources are available from AMP-PD and Terra to assist your learning

Notebooks are files containing code alongside embedded comments and documentation. Terra has integrated Jupyter Notebooks to provide the infrastructure for interactive analyses.

In this lesson you will:

Understand the structure of a notebook
Learn what you can do with a notebook
See examples of Terra Notebooks

The Google Cloud Platform is a suite of cloud computing services offering data storage, data analytics, and machine learning. The raw data files and clinical data files for AMP-PD are stored on the platform.

Google BigQuery is a data warehouse that lets you query the data you want from the main databases. A number of the available AMP-PD databases are stored in queryable format in Google BigQuery.

In this lesson you will:

Learn more about Google Cloud Platform and Google BigQuery
Find out how AMP-PD data is stored in the cloud and how to access it
See demos of the platform in use

Now that you have familiarized yourself with the Google Cloud Platform and BigQuery, you can move on to getting data from the BigQuery tables into your notebook using SQL, a language used for managing and retrieving data.

In this lesson you will:

Learn how to use SQL queries to query AMP-PD clinical data
Learn how to use SQL queries to query AMP-PD variant data
Practice manipulating data in a notebook and writing results to the bucket
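
As a rough illustration of what querying from a notebook can look like, the sketch below builds a simple SQL statement and shows one way to run it with the `google-cloud-bigquery` client library. The table name, column names, and the helper functions `build_clinical_query` and `run_query` are hypothetical placeholders, not actual AMP-PD identifiers; the lesson itself supplies the real table names.

```python
def build_clinical_query(table, columns, limit=10):
    """Return a SQL string selecting `columns` from a fully qualified table."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list}\nFROM `{table}`\nLIMIT {int(limit)}"


def run_query(sql, billing_project):
    """Run `sql` in BigQuery and return a pandas DataFrame.

    Needs the google-cloud-bigquery package and valid credentials, so the
    import is done lazily and this function is not called in this sketch.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return client.query(sql).to_dataframe()


# Hypothetical table and column names, for illustration only.
sql = build_clinical_query(
    "my-project.my_dataset.demographics",
    ["participant_id", "sex"],
)
print(sql)
```

In a notebook with credentials configured, `run_query(sql, "my-billing-project")` would return the result as a DataFrame that can then be manipulated and written to the workspace bucket.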

Some files available on the cloud aren’t queryable through BigQuery using SQL. These are the raw data files, which are stored in Google Cloud Storage buckets and accessed from the command line using gsutil, a Python application.

In this lesson you will:

Learn how to find the available genetic files
Learn how to view the file locations in a notebook
Understand how to use gsutil to access data
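
The gsutil workflow described above can be sketched from Python as follows. This is a minimal example that assumes gsutil is installed and authenticated; the bucket paths and the helpers `gsutil_command` and `run_gsutil` are hypothetical names chosen for illustration, and the real file locations come from the lesson.

```python
import subprocess


def gsutil_command(action, *paths, billing_project=None):
    """Build a gsutil argument list such as ['gsutil', 'ls', 'gs://...']."""
    cmd = ["gsutil"]
    if billing_project:
        # -u names the project to bill when a bucket is requester-pays.
        cmd += ["-u", billing_project]
    return cmd + [action, *paths]


def run_gsutil(action, *paths, billing_project=None):
    """Run gsutil and return its standard output as text.

    Not called in this sketch because it needs gsutil and credentials.
    """
    cmd = gsutil_command(action, *paths, billing_project=billing_project)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical bucket and file names, for illustration only.
list_cmd = gsutil_command("ls", "gs://example-bucket/vcf/")
copy_cmd = gsutil_command("cp", "gs://example-bucket/vcf/chr1.vcf.gz", ".")
print(" ".join(list_cmd))
print(" ".join(copy_cmd))
```

In a notebook cell the same commands are often run directly with a leading `!`, for example `!gsutil ls gs://example-bucket/vcf/`.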

It costs money to store and access data on the cloud. But once you understand the pricing structure and follow some basic principles, it’s possible to keep your costs down.

In this lesson you will:

Understand the pricing structure for running notebooks and queries
Gain tips for keeping your costs low
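
One common way to keep query costs in check is to estimate how much data a query will scan before running it, since BigQuery on-demand queries are billed by bytes processed. The sketch below assumes that billing model; the rate passed as `usd_per_tib` is a placeholder to be replaced with the current published price, and the helper names are hypothetical.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed, usd_per_tib):
    """Estimate on-demand query cost from bytes scanned and a per-TiB rate."""
    return bytes_processed / TIB * usd_per_tib


def dry_run_bytes(sql, billing_project):
    """Ask BigQuery how many bytes `sql` would process, without running it.

    Needs google-cloud-bigquery and credentials, so it is not called here.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


# Placeholder rate for illustration; check the current price list.
example_cost = estimate_query_cost(bytes_processed=2 ** 39, usd_per_tib=6.0)
print(f"Estimated cost: ${example_cost:.2f}")
```

Selecting only the columns you need, rather than using `SELECT *`, reduces the bytes processed and therefore the estimate.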

In this lesson, you will use all the information learned to run an analysis on Terra.

You will use AMP-PD data and PLINK, an open-source whole-genome association analysis tool, to perform a case/control analysis using Fisher’s exact test, which is used to determine whether there is a non-random association between two categorical variables. For a given variant, this tests whether the distribution of the major and minor alleles is significantly different between participants with a Parkinson’s Disease diagnosis and participants with no diagnosis.

In this lesson you will:

Learn which software you’ll need and how to download it
Run an analysis from start to finish in a Terra notebook
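
To make the statistical idea behind the case/control analysis concrete, here is a minimal standalone sketch of a two-sided Fisher’s exact test on a 2x2 table of allele counts. It is an illustration of the test itself, not PLINK’s implementation, and the counts used are made up for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases and controls; columns minor and major allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Made-up allele counts: cases (minor=3, major=1), controls (minor=1, major=3).
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p-value: {p:.4f}")
```

A small p-value would suggest that allele frequencies differ between the two groups; with counts this small, the test has little power to detect a difference.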

Modules

What is Terra?

Available Resources from AMP-PD and Terra

Introduction to Notebooks

Introduction to the Google Cloud Platform

Querying Data in a Terra Notebook

Accessing VCF and Related Files

Pricing

Demo: Analysis in a Terra Notebook