{"id":638,"date":"2022-10-27T18:40:48","date_gmt":"2022-10-27T18:40:48","guid":{"rendered":"https:\/\/terrabioappdev.wpenginepowered.com\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/"},"modified":"2023-12-27T04:55:46","modified_gmt":"2023-12-27T04:55:46","slug":"scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store","status":"publish","type":"post","link":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/","title":{"rendered":"Scaling variant discovery to a million genomes with the Genomic Variant Store"},"content":{"rendered":"<p><i><span style=\"font-weight: 400;\">Kylee Degatano is a Senior Product Manager in the Data Sciences Platform at the Broad Institute. In this guest blog post, she introduces the Genomic Variant Store, a highly scalable solution for genomic analysis based on Google BigQuery, designed to scale joint variant discovery to a million whole genome samples. Researchers interested in trying out this approach are invited to join the <\/span><\/i><a href=\"http:\/\/broad.io\/variantstore\"><i><span style=\"font-weight: 400;\">GVS Early Access program<\/span><\/i><\/a><i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/i><\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Researchers around the world are generating petabytes of genomics data to understand human biology,\u00a0 identify the causes of diseases and develop new treatments. The analyses involved, such as association studies, linkage analysis, and exploration of Mendelian inheritance all require high confidence in the quality of the genomic variant calls they rely on.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One of the most successful approaches of the past decade for generating such high quality variant calls involves jointly analyzing the genomes of many different people across a population. 
This &#8220;joint calling&#8221; approach increases statistical power for differentiating true variants from artifacts, which makes it possible to identify extremely rare variants with confidence. In the GATK Best Practices for germline short variant discovery, it is implemented as a two-step process: first we identify potential variants individually per sample, then we evaluate the evidence found for each genomic site across all samples to produce &#8220;joint calls&#8221;,<\/span><i><span style=\"font-weight: 400;\"> i.e. <\/span><\/i><span style=\"font-weight: 400;\">a multi-sample variant callset. We can then apply additional filtering to further refine the callset for downstream analysis.\u00a0\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-1368\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gatk-bp-detail-2.png\" alt=\"\" width=\"1000\" height=\"320\" \/><\/p>\n<p><i><span style=\"font-weight: 400;\">Diagram illustrating the overall data generation and analysis process involved in joint variant discovery.<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">As an example of how this approach empowers discovery, my colleague Laura Gauthier<\/span>\u00a0<span style=\"font-weight: 400;\">recently wrote a <\/span><a href=\"https:\/\/terra.bio\/schizophrenia-advances-demonstrate-value-of-joint-calling-methods\/\"><span style=\"font-weight: 400;\">blog post<\/span><\/a><span style=\"font-weight: 400;\">\u00a0commenting on a <\/span><a href=\"https:\/\/www.nature.com\/articles\/s41586-022-04556-w\"><span style=\"font-weight: 400;\">recent study<\/span><\/a><span style=\"font-weight: 400;\"> that used a joint callset of 75,000 exome samples to implicate ten new genes in the development of schizophrenia. 
Crucially, she explained, the study focused on ultra-rare variants, never seen before and only observed in a single individual in that entire callset, that affected the same small set of genes. The joint calling methodology was essential to the success of that analysis because it enabled the authors to accurately detect ultra-rare variants with confidence, and to do so for enough individuals to accumulate enough statistical power to implicate the corresponding genes. Laura concluded her post with this prediction<\/span><span style=\"font-weight: 400;\">:\u00a0<\/span><\/p>\n<blockquote><p><strong>&#8220;Future breakthroughs in common disease research will continue to come about through the hard work of recruiting large numbers of affected participants; accurately detecting the few, tiny, rare mutations that make them different; and combining as many ultra-rare variants as we can until we can point the finger at genes harboring too many mutations across people who suffer from disease.&#8221;<\/strong><\/p><\/blockquote>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">It is with that same vision in mind that our engineering team has been working hard to scale up our joint calling capabilities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Every jump in the scale of the studies we have supported so far has required that we re-engineer various parts of our analysis pipelines to address new barriers related to cost efficiency, computing power, runtime\u2026 or the maximum amount of data you can store in a single file on a standard hard drive. 
(That was a fun one to hit.)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Today, I&#8217;m excited to share the solution we developed to get to the next order of magnitude, and to make this kind of scaling accessible to a wider range of researchers and institutions.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Aiming for a million<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">A few years ago, the NIH asked our team to figure out a way to run joint calling on one million human genomes for the <\/span><a href=\"https:\/\/allofus.nih.gov\/\"><span style=\"font-weight: 400;\">All of Us Research Program<\/span><\/a><span style=\"font-weight: 400;\">. For context, at the time, performing joint calling for &#8220;just&#8221; 15,000 whole genomes was a costly months-long endeavor for a full team of engineers equipped with cutting-edge tools. We knew that to scale to a million genomes, we would need to revisit the engineering design behind the GATK Joint Calling pipeline (again). We also foresaw that many downstream analysis tools would not be able to handle the resulting callset, which would contain trillions of variants. Researchers would need to be able to subset the variant calls to their samples and genomic regions of interest.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Oh, and we needed to ensure our solution to these problems would be cost- and time-efficient, and scalable to both ends of the spectrum: it should allow us to scale down to 10 samples as easily as scaling up to a million.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So we took joint calling back to the drawing board. 
We started by designing a core schema for the variant data based on data access patterns for key use cases like training a filtering model, searching to identify samples with a specific variant or variation in a specific location, and extracting data \u2014for all samples or for subsets\u2014 into <\/span><a href=\"https:\/\/gatk.broadinstitute.org\/hc\/en-us\/articles\/360035531692-VCF-Variant-Call-Format\"><span style=\"font-weight: 400;\">Variant Call Format<\/span><\/a><span style=\"font-weight: 400;\"> (VCF). We contemplated which information in the variant files was necessary for joint calling and downstream analysis, and stripped out anything extraneous. We tested many different foundational technologies for processing and storing data, including Spark, Dataflow, and custom developed infrastructure. We ultimately chose Google BigQuery because it is easy to operate, can handle huge data sizes, is cheap to store data and cheap to query, and has excellent security features.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The result of this three-year development effort: a highly scalable and cost-effective variant storage and processing solution we call the Genomic Variant Store.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Introducing Joint Calling with the Genomic Variant Store<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The Genomic Variant Store (GVS) uses <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/introduction\"><span style=\"font-weight: 400;\">Google BigQuery<\/span><\/a><span style=\"font-weight: 400;\"> to store variant data according to the core data model. It is designed to function as a persistent data warehouse to which we can add new data over time, simply by importing gVCF files produced by the single-sample calling step of the variant discovery process outlined above. This automatically combines the per-sample variant data across samples and genomic coordinates. 
We can then query the GVS by sample and\/or by genomic coordinates to generate subset callsets of interest in either VCF or <\/span><a href=\"https:\/\/hail.is\/docs\/0.2\/vds\/index.html\"><span style=\"font-weight: 400;\">Hail VDS<\/span><\/a><span style=\"font-weight: 400;\"> file format.\u00a0<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">If you are familiar with the details of the current GATK Best Practices workflow implementation, the GVS data ingestion step is analogous to combining gVCFs into a GenomicsDB data store. However, GVS is much more scalable than GenomicsDB due to its use of Google BigQuery.\u00a0<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">The GVS also includes a built-in variant filtering model (GATK VQSR) that determines which variant calls will be considered true variants, as opposed to artifacts. The filtering model is applied when a subset of variant data is extracted to file, and it can be re-trained and improved as new data is added.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-large wp-image-1477\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/pipeline-gatk-scale-gvs-2-1024x381.png\" alt=\"\" width=\"800\" height=\"298\" \/><\/p>\n<p><i><span style=\"font-weight: 400;\">Overview of the operations supported by the Genomic Variant Store\u00a0<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">For convenience, we have developed utility workflows written in the <\/span><a href=\"https:\/\/openwdl.org\/\"><span style=\"font-weight: 400;\">Workflow Description Language<\/span><\/a><span style=\"font-weight: 400;\"> (WDL) to perform the import, model training, subsetting and variant search operations. 
These can be used individually to grow, curate and analyze a persistent variant data store.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We have tested this on a wide range of callset sizes; there is no minimum number of samples, and we anticipate that it will scale to the planned one million genomes for the All of Us Research Program. In fact, we have already used the GVS in production to produce a joint callset from 250,000 human whole genomes for the AoU program, which we believe is the largest joint-called human whole genome callset in the world so far.\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Get early access to a self-contained GVS Joint Calling Pipeline<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Due to the engineering challenges involved in operating at this scale, there are only a handful of genome centers around the world that are currently able to create joint callsets from tens of thousands of human whole genomes, let alone hundreds of thousands. Yet it is very likely that progress in human genetics could be substantially accelerated if this capability were made more widely accessible.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As we continue to improve the GVS \u2014 increasing scalability, adding support for new data types, improving the internal algorithms \u2014 we are looking for feedback from external groups to ensure the GVS will work well for a wide audience. 
To that end, we are starting an early access program for researchers who are interested in trying out the GVS approach for making joint callsets of up to 10,000 human whole genomes.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We are currently targeting that project size because most population genetics projects today involve less extreme cohorts than the All of Us Research Program; we&#8217;re seeing many projects with whole genome sample numbers in the low thousands.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To enable those projects to use GVS for scalable joint calling without having to manage complex infrastructure and run multiple separate operations, we developed a &#8220;one and done&#8221; workflow called the GVS Joint Calling pipeline that wraps all the necessary steps. This single self-contained workflow takes in a set of per-sample gVCFs, trains and applies the GATK VQSR filtering model, and extracts the variants into a VCF file containing the complete joint callset.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We plan to make this pipeline publicly available in a Terra workspace, pre-configured in such a way that anyone can run it out of the box with minimal effort. 
In our current tests, the GVS Joint Calling pipeline set up in Terra can produce a joint callset from up to 10,000 human whole genomes in less than half a day, at a cost of $0.06 USD per genome.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-large wp-image-1480\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs-pipeline-terra-1024x488.png\" alt=\"\" width=\"800\" height=\"381\" \/><\/p>\n<p><em>Screenshot of the Terra workflow configuration panel for the GVS Joint Calling Pipeline<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">We invite you to apply to join the early access program by filling out a <\/span><a href=\"http:\/\/broad.io\/variantstore\"><span style=\"font-weight: 400;\">short form<\/span><\/a> <span style=\"font-weight: 400;\">that will help us assess if your callset would be a good fit for the initial release. If you are selected, we will help you get started with the GVS <\/span><span style=\"font-weight: 400;\">Joint Calling pipeline in Terra, <\/span><span style=\"font-weight: 400;\">and we will be available to assist you if you run into any problems.\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Even if the Genomic Variant Store doesn\u2019t meet your needs right now, please feel free to use the form to tell us more about your work and what features you would be interested in seeing in a future version. 
We look forward to hearing from you!\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Join the early access program to try out the Broad Institute&#8217;s new Genomic Variant Store, a highly scalable variant storage and processing solution powered by Google BigQuery on Terra.<\/p>\n","protected":false},"author":8,"featured_media":641,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[18,24,25,106,13,37,119,32],"tags":[],"class_list":["post-638","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-data-management","category-data-model","category-genomics","category-guest-author","category-medical-and-population-genetics","category-most-recent","category-workflows"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Scaling variant discovery to a million genomes with the Genomic Variant Store - Terra<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scaling variant discovery to a million genomes with the Genomic Variant Store - Terra\" \/>\n<meta property=\"og:description\" content=\"Join the early access program to try out the Broad Institute&#039;s new Genomic Variant Store, a highly scalable variant storage and processing solution powered by Google BigQuery on Terra.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\" 
\/>\n<meta property=\"og:site_name\" content=\"Terra\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-27T18:40:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-27T04:55:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"627\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kylee Degatano\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kylee Degatano\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\"},\"author\":{\"name\":\"Kylee Degatano\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/a13dae5fd4d9205cf67a9cc46bd50d2e\"},\"headline\":\"Scaling variant discovery to a million genomes with the Genomic Variant 
Store\",\"datePublished\":\"2022-10-27T18:40:48+00:00\",\"dateModified\":\"2023-12-27T04:55:46+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\"},\"wordCount\":1664,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png\",\"articleSection\":[\"Data\",\"Data Management\",\"Data Model\",\"Genomics\",\"Guest Author\",\"Medical and Population Genetics\",\"Most Recent\",\"Workflows\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\",\"url\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\",\"name\":\"Scaling variant discovery to a million genomes with the Genomic Variant Store - 
Terra\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png\",\"datePublished\":\"2022-10-27T18:40:48+00:00\",\"dateModified\":\"2023-12-27T04:55:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png\",\"width\":1200,\"height\":627,\"caption\":\"gvs_OG\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/terra.bio\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scaling variant discovery to a million genomes with the Genomic Variant Store\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/terra.bio\/#website\",\"url\":\"https:\/\/terra.bio\/\",\"name\":\"Terra\",\"description\":\"Science at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/terra.bio\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/terra.bio\/#organization\",\"name\":\"Terra\",\"url\":\"https:\/\/terra.bio\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"width\":287,\"height\":318,\"caption\":\"Terra\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/a13dae5fd4d9205cf67a9cc46bd50d2e\",\"name\":\"Kylee Degatano\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e15a9f7562867da0e726195679ceab8b4d4c6c5edd997ebe4befcda5ee69d65e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e15a9f7562867da0e726195679ceab8b4d4c6c5edd997ebe4befcda5ee69d65e?s=96&d=mm&r=g\",\"caption\":\"Kylee Degatano\"},\"url\":\"https:\/\/terra.bio\/author\/kdegatano\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Scaling variant discovery to a million genomes with the Genomic Variant Store - Terra","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/","og_locale":"en_US","og_type":"article","og_title":"Scaling variant discovery to a million genomes with the Genomic Variant Store - Terra","og_description":"Join the early access program to try out the Broad Institute's new Genomic Variant Store, a highly scalable variant storage and processing solution powered by Google BigQuery on Terra.","og_url":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/","og_site_name":"Terra","article_published_time":"2022-10-27T18:40:48+00:00","article_modified_time":"2023-12-27T04:55:46+00:00","og_image":[{"width":1200,"height":627,"url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png","type":"image\/png"}],"author":"Kylee Degatano","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kylee Degatano","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#article","isPartOf":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/"},"author":{"name":"Kylee Degatano","@id":"https:\/\/terra.bio\/#\/schema\/person\/a13dae5fd4d9205cf67a9cc46bd50d2e"},"headline":"Scaling variant discovery to a million genomes with the Genomic Variant Store","datePublished":"2022-10-27T18:40:48+00:00","dateModified":"2023-12-27T04:55:46+00:00","mainEntityOfPage":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/"},"wordCount":1664,"commentCount":0,"publisher":{"@id":"https:\/\/terra.bio\/#organization"},"image":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png","articleSection":["Data","Data Management","Data Model","Genomics","Guest Author","Medical and Population Genetics","Most Recent","Workflows"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/","url":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/","name":"Scaling variant discovery to a million genomes with the Genomic Variant Store - 
Terra","isPartOf":{"@id":"https:\/\/terra.bio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage"},"image":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png","datePublished":"2022-10-27T18:40:48+00:00","dateModified":"2023-12-27T04:55:46+00:00","breadcrumb":{"@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#primaryimage","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/gvs_OG.png","width":1200,"height":627,"caption":"gvs_OG"},{"@type":"BreadcrumbList","@id":"https:\/\/terra.bio\/scaling-variant-discovery-to-a-million-genomes-with-the-genomic-variant-store\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/terra.bio\/"},{"@type":"ListItem","position":2,"name":"Scaling variant discovery to a million genomes with the Genomic Variant Store"}]},{"@type":"WebSite","@id":"https:\/\/terra.bio\/#website","url":"https:\/\/terra.bio\/","name":"Terra","description":"Science at 
Scale","publisher":{"@id":"https:\/\/terra.bio\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/terra.bio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/terra.bio\/#organization","name":"Terra","url":"https:\/\/terra.bio\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","width":287,"height":318,"caption":"Terra"},"image":{"@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/terra.bio\/#\/schema\/person\/a13dae5fd4d9205cf67a9cc46bd50d2e","name":"Kylee Degatano","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e15a9f7562867da0e726195679ceab8b4d4c6c5edd997ebe4befcda5ee69d65e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e15a9f7562867da0e726195679ceab8b4d4c6c5edd997ebe4befcda5ee69d65e?s=96&d=mm&r=g","caption":"Kylee 
Degatano"},"url":"https:\/\/terra.bio\/author\/kdegatano\/"}]}},"_links":{"self":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/638","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/comments?post=638"}],"version-history":[{"count":0,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/638\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media\/641"}],"wp:attachment":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media?parent=638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/categories?post=638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/tags?post=638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}