{"id":50011925,"date":"2022-05-20T22:50:33","date_gmt":"2022-05-21T02:50:33","guid":{"rendered":"https:\/\/gp2stg.wpenginepowered.com\/the-components-of-gp2s-second-data-release\/"},"modified":"2024-11-04T15:04:24","modified_gmt":"2024-11-04T20:04:24","slug":"the-components-of-gp2s-second-data-release","status":"publish","type":"post","link":"https:\/\/gp2.org\/ar\/the-components-of-gp2s-second-data-release\/","title":{"rendered":"The Components of GP2\u2019s Second Data Release"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">In April 2022, GP2 announced the second data release on the Terra platform in collaboration with AMP<\/span><span style=\"font-weight: 400;\">\u00ae<\/span><span style=\"font-weight: 400;\"> PD. This release contains data from both complex and monogenic GP2 networks. The complex disease data now consists of a total of 8,644 genotyped participants (5,249 PD, 3,395 non-PD). New to this release are Movement Disorders Genotypes and Phenotypes &#8211; Queen Square Brain Bank (MDGAP-QSBB), a United Kingdom brain bank, and SYNAPS Study &#8211; Kazakhstan (SYNAPS-KZ), a PD cohort from Kazakhstan. Additional CORIELL samples have also been added in this release. The monogenic disease data consists of 235 whole genome sequenced (WGS) participants with PD from the <a href=\"https:\/\/www.parkinson.org\/PDGENEration\" target=\"_blank\" rel=\"noopener\">PDGENEration cohort<\/a><\/span><span style=\"font-weight: 400;\">, who were selected based on <a href=\"https:\/\/monogenic.gp2.org\/samplePrioritization.html\" target=\"_blank\" rel=\"noopener\">set criteria<\/a><\/span><span style=\"font-weight: 400;\">\u00a0for having a suspected monogenic cause of PD from the Monogenic disease network.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The key difference between genotyping microarray and WGS data is the number of genetic markers that are detected during the genotyping process. 
WGS provides a comprehensive view of the genome by potentially interrogating all 3.2 billion base-pairs in the human genome, while genotyping with a targeted microarray (such as GP2\u2019s custom NeuroBooster array) interrogates a smaller set of up to 1.9 million region-tagging markers per sample. WGS is better suited for analyses investigating rare genetic variation, while genotypes imputed to a well-matched reference panel are a scalable and efficient solution for studying more common genetic variation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Genetically-determined ancestry of complex disease GP2 participants is broken into nine ancestry groups; the table below details the genetically-determined ancestry of complex disease participants in GP2 release 2 that have passed quality control and been imputed.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-50009507\" src=\"https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-1024x720.jpg\" alt=\"\" width=\"700\" height=\"492\" srcset=\"https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-1024x720.jpg 1024w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-300x211.jpg 300w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-768x540.jpg 768w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-1536x1080.jpg 1536w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-2048x1440.jpg 2048w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-256x180.jpg 256w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-43-512x360.jpg 512w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Genetically-determined ancestry of monogenic disease GP2 participants is broken into four main ancestry groups at this time, as this is the first available data 
from the monogenic network. As more data is processed, more diverse samples will become available. The table below details the genetically-determined ancestry of monogenic disease participants in GP2 release 2 by age at onset and family history.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-50009472\" src=\"https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-1024x426.jpg\" alt=\"\" width=\"700\" height=\"291\" srcset=\"https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-1024x426.jpg 1024w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-300x125.jpg 300w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-768x320.jpg 768w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-1536x639.jpg 1536w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-2048x853.jpg 2048w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-256x107.jpg 256w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-512x213.jpg 512w, https:\/\/gp2.org\/wp-content\/uploads\/2022\/05\/MicrosoftTeams-image-42-1280x533.jpg 1280w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Future data releases will continue to grow the diversity of participants available. You can check out <a href=\"https:\/\/gp2.org\/cohort-dashboard\/\" rel=\"\">our dashboard<\/a> to see our progress<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In addition to including the first WGS data, we have included an additional new data type: probabilistic copy number variant (CNV) calls for all genotyped samples passing quality control (gene-level plus 250kb flanking regions). CNV refers to variation in the number of times a certain stretch of DNA is repeated. 
This variation may have come about through deletions, insertions, or other events and can potentially provide more information about how structural variation affects disease risk. The pipeline used to produce the probabilistic CNV calls can be found on the <a href=\"https:\/\/github.com\/GP2code\" target=\"_blank\" rel=\"noopener\">GP2 GitHub<\/a><\/span><span style=\"font-weight: 400;\">. This is currently a work in progress and will improve as we include more data and make adjustments to the pipeline. Consider this CNV data \u201chypothesis generating\u201d. Usage notes will be included in the blog post covering the first stable release scheduled for next quarter.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This release contains WGS data from the monogenic disease network in addition to the NeuroBooster array genotyped complex disease data. More information on the structure of the complex disease genotype and clinical data is detailed in the blog post \u2018<a href=\"https:\/\/gp2.org\/the-components-of-gp2-first-data-release\/\" rel=\"\">The Components of GP2\u2019s First Data Release<\/a>\u2019<\/span><span style=\"font-weight: 400;\">\u00a0as well as in the README, which is updated at each release and is available on the official GP2 Terra workspaces. The monogenic PD WGS data is also detailed in the same README.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We are excited to make this beta release available to the PD research community and there is much more to come in the near future!<\/span><\/p>\n<p>This blog was jointly authored by Hampton Leonard, Mike Nalls, Dan Vitale, Yeajin Song, Kristin Levine, Mary Makarious, Zih-Hua Fang and Peter Heutink. 
Please visit GP2\u2019s <strong><a href=\"https:\/\/gp2.org\/working-groups\/complex-disease-data-analysis-working-group\/\" rel=\"\">Complex Disease \u2013 Data Analysis Working Group<\/a><\/strong> and <a href=\"https:\/\/gp2.org\/working-groups\/monogenic-data-analysis-working-group\/\" rel=\"\"><strong>Monogenic &#8211; Data Analysis Working Group<\/strong><\/a> to learn more about their background.<\/p>\n<p>Check out our other <strong><a href=\"https:\/\/gp2.org\/blog\/?search=components%20of%20GP2%27s%20data%20release\">data releases<\/a><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In April 2022, GP2 announced the second data release on the Terra platform in collaboration with AMP\u00ae PD. This release contains data from both complex and monogenic GP2 networks. The complex disease data now consists of a total of 8,644 genotyped participants (5,249 PD, 3,395 non-PD).<\/p>\n","protected":false},"author":16,"featured_media":50024745,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-50011925","post","type-post","status-publish","format-standard","hentry","post-type-2388","topic-----ar","topic-research-collaboration-2","topic---ar"],"acf":[],"_links":{"self":[{"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/posts\/50011925","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/comments?post=50011925"}],"version-history":[{"count":0,"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/posts\/50011925\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/media\/50024745"
}],"wp:attachment":[{"href":"https:\/\/gp2.org\/ar\/wp-json\/wp\/v2\/media?parent=50011925"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}