{"id":464,"date":"2021-11-11T15:30:20","date_gmt":"2021-11-11T15:30:20","guid":{"rendered":"https:\/\/terrabioappdev.wpenginepowered.com\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/"},"modified":"2023-12-27T04:55:07","modified_gmt":"2023-12-27T04:55:07","slug":"calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference","status":"publish","type":"post","link":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/","title":{"rendered":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference"},"content":{"rendered":"<p><i><span style=\"font-weight: 400;\"><a href=\"https:\/\/slzarate.github.io\/\">Samantha Zarate<\/a> <\/span><\/i><i><span style=\"font-weight: 400;\">is a third-year PhD student in the computer science department at Johns Hopkins University in Baltimore, MD, working in the lab of Dr. Michael Schatz. As a member of the Telomere-to-Telomere consortium, she has been working for the last year to evaluate how the T2T-CHM13 reference genome affects variant calling with short-read data. 
In this guest blog post, Samantha explains what this entails, then walks us through the computational challenges she faced in implementing this analysis and how she solved them using Terra and the AnVIL.<\/span><\/i><\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Earlier this year, the <\/span><a href=\"https:\/\/sites.google.com\/ucsc.edu\/t2tworkinggroup\"><span style=\"font-weight: 400;\">Telomere-to-Telomere (T2T) consortium<\/span><\/a><span style=\"font-weight: 400;\"> released the complete sequence of a human genome, completing the remaining 8% of the sequence left unfinished in the current human reference genome and introducing nearly 200 million bp of novel sequence <\/span><a href=\"https:\/\/paperpile.com\/c\/kE0hLK\/wcIQ\"><span style=\"font-weight: 400;\">(Nurk, Koren, Rhie, Rautiainen, <\/span><i><span style=\"font-weight: 400;\">et al.<\/span><\/i><span style=\"font-weight: 400;\">, 2021)<\/span><\/a><span style=\"font-weight: 400;\">. For context, this is about as much novel sequence as in all of chromosome 3!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Within the T2T consortium, I led an analysis that demonstrated that the new T2T-CHM13 reference genome improves read mapping and variant calling for 3,202 globally diverse samples sequenced with short reads. We found that, compared to the current standard, GRCh38, using this new reference mitigates or eliminates major sources of error that derive from incorrect assembly and certain idiosyncrasies of the samples previously used as the basis for reference construction. For example, the T2T-CHM13 reference includes corrections to collapsed segmental duplications, which are regions that previously appeared highly enriched for heterozygous paralog-specific variants in nearly all individuals due to the false pileup of reads from duplicated regions to a single location. 
This and other such corrections lead to a decrease in the number of variants erroneously called per sample when using the T2T-CHM13 reference genome. In addition, because the new reference adds nearly 200 Mbp of sequence, we discovered over 1 million additional high-quality variants across the entire collection.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Our collaborators in the T2T consortium evaluated the scientific utility of our improved variant calling results. Excitingly, they found that the use of the new reference genome led to the discovery of novel signatures of selection in the newly assembled regions of the genome, as well as improved variant analysis throughout. This included reporting up to 12 times fewer false-positive variants in clinically relevant genes that have traditionally proved difficult to sequence and analyze.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Collectively, these results represent a significant improvement in variant calling using the T2T-CHM13 reference genome, which has broad implications for clinical genetics analyses, including inherited, <\/span><i><span style=\"font-weight: 400;\">de novo,<\/span><\/i><span style=\"font-weight: 400;\"> and somatic mutations.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you&#8217;d like to learn more about our findings and the supporting evidence, you can read the <\/span><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2021.07.12.452063v1\"><span style=\"font-weight: 400;\">preprint<\/span><\/a><span style=\"font-weight: 400;\"> on bioRxiv. 
In the rest of this blog post, I want to give some behind-the-scenes insight into what it took to accomplish this analysis \u2014 the computational challenges we faced, the decision to use Terra and the AnVIL, and how it went in practice \u2014 in case it might be useful for others tackling such a large-scale project for the first time.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Designing the pipeline and mapping out scaling challenges<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">To evaluate the T2T-CHM13 reference genome across multiple populations and a large number of openly accessible samples, we turned to the 1000 Genomes Project (1KGP), which recently expanded its scope to encompass 3,202 samples representing 602 trios and 26 populations around the world.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A team at the New York Genome Center (NYGC) had previously generated variant calls from that same dataset using GRCh38 as the reference genome <\/span><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2021.02.06.430068v1.full\"><span style=\"font-weight: 400;\">(Byrska-Bishop <\/span><i><span style=\"font-weight: 400;\">et al.<\/span><\/i><span style=\"font-weight: 400;\">, 2021)<\/span><\/a><span style=\"font-weight: 400;\">, with a pipeline based on the <\/span><a href=\"https:\/\/www.nature.com\/articles\/s41467-018-06159-4\"><span style=\"font-weight: 400;\">functional equivalence pipeline standard<\/span><\/a><span style=\"font-weight: 400;\"> established by the Centers for Common Disease Genomics (CCDG) and collaborators. 
The functional equivalence standard provides guidelines for implementing genome analysis pipelines in such a way that you can compare results obtained with different pipelines with confidence that any differences in the results are due to differences in specific inputs \u2014 such as a different reference genome \u2014 rather than to technical differences in the tools and methods employed.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By that same logic, if we followed their pipeline closely, we could perform an apples-to-apples comparison between their results and those we were going to generate with the T2T-CHM13 reference, and thus avoid having to redo the work that they had already done with GRCh38. As we implemented our pipeline, we only updated those elements that substantially improved efficiency without introducing divergences, and whenever possible, we used the same flags and options as the original study.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Pipeline overview<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">Conceptually, the pipeline consists of two main operations: (1) read alignment, in which we take the raw sequencing data for each sample and align, sort, and organize each read\u2019s alignment to the reference genome, and (2) variant calling, in which we examine the aligned data to identify potential variants, first within each sample, then across the entire 1KGP collection. In practice, each of these operations consists of multiple steps of data processing and analysis, each with different computational requirements and constraints. 
Let&#8217;s take a closer look at how this plays out when you have to apply these steps to a large number of samples.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Read alignment<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">Read alignment involves aligning each read to the reference genome, plus a few additional steps to address data formatting and sorting requirements. When samples are sequenced using multiple flowcells, the data for each sample is subdivided into subsets called &#8220;read groups&#8221;, so we perform these initial steps for each read group. We then merge the aligned read group data per sample and apply a few additional steps: marking duplicate reads and compressing the alignment data into per-sample CRAM files. Finally, we generate quality control statistics that allow us to gauge the quality of the data we&#8217;re starting from.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-large wp-image-1156\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/read-alignment-1024x349.png\" alt=\"\" width=\"800\" height=\"273\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Our chosen cohort comprised genome sequencing data from 3,202 samples in the form of paired-end FASTQ files, meaning that our starting dataset consisted of 6,404 files. Applying the initial read alignment per read group would involve generating approximately 32,000 files at the &#8220;widest&#8221; point, to eventually produce 3,202 per-sample CRAM files, plus multiple quality control files for each sample. That&#8217;s a lot of files!<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Variant calling<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">Read alignment is just the beginning. 
To generate variant calls across a large cohort, we have to decompose the process into two main operations, as described in the <\/span><a href=\"https:\/\/gatk.broadinstitute.org\/hc\/en-us\/articles\/360035894711-About-the-GATK-Best-Practices\"><span style=\"font-weight: 400;\">GATK Best Practices<\/span><\/a><span style=\"font-weight: 400;\">: first, we identify variants individually for each sample, then we combine the calls from all the samples in the cohort to generate overall variant calls in VCF format.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We performed the per-sample variant calling using a special mode of the GATK HaplotypeCaller tool that generates &#8220;genomic VCFs&#8221;, or GVCFs, which contain detailed information representing the entire genome (not just potentially variant sites, as you would find in normal VCF files). For efficiency reasons, we performed this step on a per-chromosome basis, generating over 75,000 files in total.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We then combined the variant calls from all the samples in the cohort for the &#8220;joint genotyping&#8221; analysis, which involves looking at the evidence at each possible variant site across all the samples in the cohort. This analysis produced the joint callset for the whole cohort, containing all the variant sites with detailed genotyping information and variant call statistics for each sample in the cohort.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The twist is that this step has to be run on intervals smaller than a whole chromosome, so now we&#8217;re combining per-chromosome GVCF files across all the samples, but producing per-interval joint-called VCFs \u2014 about 30,000 of them in total across the 24 chromosomes (more on that in a minute). 
Fortunately, we can then concatenate these files into just 24 per-chromosome VCF files.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And, finally, there&#8217;s a filtering step called variant recalibration (VQSR) that is done per-chromosome. This involves scoring variants to identify those with sufficiently high quality to use in downstream analysis, as opposed to those below this threshold, which we assume to be artifacts. The VCF files annotated with final variant scores and filtering information (PASS or otherwise) are the final outputs of the pipeline.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-large wp-image-1155\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/variant-calling-1024x613.png\" alt=\"\" width=\"800\" height=\"479\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">As you can imagine from this brief overview, joint genotyping alone presents various scaling challenges, and our T2T-CHM13 reference added an interesting complexity. Normally, for whole genomes, the GATK development team recommends defining the joint genotyping intervals by using regions with Ns in the reference as &#8220;naturally occurring&#8221; points where you don&#8217;t have to worry about variants spanning the interval boundaries. (For exomes, you can just use the capture intervals.)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, the T2T-CHM13 reference doesn&#8217;t have any regions with Ns \u2014 that&#8217;s the whole point of a &#8220;telomere-to-telomere&#8221; reference! As a result, we developed a strategy that involves generating intervals of arbitrary length (100kb) and using 1kb padding intervals that we later trimmed off from the interval-level VCF files. 
The padding ensured that our variant calls didn&#8217;t suffer from edge effects, and we were able to verify that it didn&#8217;t make any difference to the final results.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Scaling up with Terra and the AnVIL<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Implementing an end-to-end variant discovery analysis at the scale of thousands of whole genomes is not trivial, both in terms of basic logistics and computational requirements. We originally started out using the computing cluster available to us at Johns Hopkins University, but we realized very quickly that it would take too long and require too much storage for us to be successful on the timeline we needed to keep pace with the project. Based on our early testing, we estimated it would take many months, possibly up to a year, of computation to do all the data processing on our institutional servers.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So instead, we turned to <\/span><a href=\"https:\/\/anvilproject.org\/\"><span style=\"font-weight: 400;\">AnVIL<\/span><\/a><span style=\"font-weight: 400;\"> and the Terra platform, which promised massively scalable analysis, collaborative workspaces, and a host of other features to meet our needs.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Importing data<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">At the outset, we ran into some difficulties getting the starting dataset ready. We had originally planned to start from the NYGC&#8217;s version of the 1000 Genomes Project data (CRAM files aligned to GRCh38), which is already available through Terra as part of the AnVIL project. The idea was to revert those files to an unmapped state and then start our pipeline from there. 
However, we found that those files included some processing, such as replacing ambiguous nucleotides (N&#8217;s in the reads) with other bases, which we didn&#8217;t want \u2014 we wanted our starting data to be as close to the raw data as possible to eliminate any possible biases introduced from GRCh38.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So we decided to shift gears and start from the original FASTQ files, which are available through the European Nucleotide Archive (ENA), and this is where we hit the biggest technical obstacle in the whole process. As I mentioned earlier, that dataset comprises 6,404 paired-end FASTQ files, which amount to about 100 TB of compressed sequence data \u2014 for reference, that&#8217;s about 100,000 hours of streaming movies on Netflix in standard definition. Unfortunately, though not unexpectedly, the ENA didn&#8217;t plan on having people download that much data at once, so we had to come up with a way to transfer the data that wouldn&#8217;t crash their servers. We ended up using a WDL workflow to copy batches of about 100 files at a time to a Google bucket, managing the batches manually over several days. That was not fun.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Executing workflows at scale and collaborating<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">After that somewhat rocky start, however, running the analysis itself was surprisingly smooth. We implemented the pipeline described above as a set of WDL workflows, and we used Terra&#8217;s built-in workflow execution system, called Cromwell, to run them on the cloud.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Scaling up went well, especially considering the alternative to using Terra was using our university&#8217;s HPC, which is subject to limited quotas, traffic restrictions from other users, and equipment failures in a way that Google Cloud servers aren&#8217;t. 
The push-button capabilities of Terra let us scale up easily and rapidly: after verifying the success of our WDLs on a few samples, we could move on to processing hundreds or thousands of workflows at a time. <\/span><span style=\"font-weight: 400;\">It took us about a week to process everything, and that was with Google&#8217;s default compute quotas in place (e.g., a maximum of 25,000 cores at a time), which can be raised on request.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We also really appreciated how easy it was to collaborate with others. Because we were working in a cloud environment, it was very easy to keep our collaborators informed on progress and share results with members of the consortium. If we had been using our institutional HPC, which does not allow access to external users, we would have had to copy files to multiple institutions to provide the same level of access.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">More generally, we found that the reproducibility and reusability of our analyses have increased significantly. Having implemented our workflows in WDL to run in Terra, we can now publish them on <\/span><a href=\"https:\/\/github.com\/schatzlab\/t2t-variants\"><span style=\"font-weight: 400;\">GitHub<\/span><\/a><span style=\"font-weight: 400;\">, knowing that anyone can download them and replicate our analysis on their infrastructure, as Cromwell supports all major HPC schedulers and public clouds. We have also published <\/span><a href=\"https:\/\/hub.docker.com\/repository\/docker\/szarate\/t2t_variants\"><span style=\"font-weight: 400;\">the accompanying Docker image<\/span><\/a><span style=\"font-weight: 400;\">, so the environment in which we are running our code is also reproducible. 
With all of the materials we used for this analysis available publicly in Terra and relevant repositories, any interested party has all of the tools they need to fully reproduce our analysis and extend it for their purposes.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h5><span style=\"font-weight: 400;\">Room for improvement<\/span><\/h5>\n<p><span style=\"font-weight: 400;\">One of the things we didn&#8217;t like so much was having to select inputs and launch workflows manually in the graphical web interface. That was convenient for initial testing, but we would have preferred to use a CLI environment to launch and manage workflows once we moved to full-scale execution. We only learned after completing the work that Terra has an open API and that there is a Python-based client called FISS that makes it possible to perform all the same actions through scripted commands. The FISS client is covered in the <\/span><a href=\"https:\/\/support.terra.bio\/hc\/en-us\/articles\/360042259232-Managing-data-and-automating-workflows-with-the-FISS-API\"><span style=\"font-weight: 400;\">Terra documentation<\/span><\/a><span style=\"font-weight: 400;\"> (there&#8217;s even a <\/span><a href=\"https:\/\/app.terra.bio\/#workspaces\/help-terra\/FISS%20Tutorial\"><span style=\"font-weight: 400;\">public workspace<\/span><\/a><span style=\"font-weight: 400;\"> with a couple of tutorial notebooks) but we never saw any references to it until it was pointed out to us by someone from the Terra team, so this feature needs more visibility.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We&#8217;d also love to see more functionality added around data provenance. The job history dashboard gives you a lot of details if you&#8217;re looking at workflow execution records. However, you can&#8217;t select a piece of data and see how it was generated, by what version of the pipeline, and so on. 
Being able to identify the source of a given file, notably the workflow that created it as well as the parameters used, would be very helpful for tracing back files, especially when a workflow has been updated and files need to be re-analyzed. Or as another example, when two or more files are named XXX.bam, it&#8217;s very helpful to be able to tell which one is the final version in a way other than writing down the time each respective workflow was launched.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Finally, as mentioned above, transferring the data from ENA was painful, so we&#8217;d love to see some built-in file transfer utilities to make that process more efficient. There is currently no built-in way to obtain data from a URL or FTP link; implementing a universal file fetcher would help users move data into Terra more efficiently.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Conclusions<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Scientifically, our analysis strongly supports the use of T2T-CHM13 for variant calling. We find impressive improvements in both alignment and variant calling, as CHM13 both resolves errors in GRCh38 and adds novel sequences. Our collaborators within the T2T consortium performed further analysis using the joint-genotyped chromosome-wide VCF files and also found improvements in medically relevant genes and the overall accuracy of variant calling genome-wide, thus demonstrating the utility of T2T-CHM13 in clinical analysis.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From a technical standpoint, this was our first time using Terra for large-scale analysis, and we found that the benefits of Terra outweighed the few pain points. Compared to a high-performance cluster, Terra is much more user-friendly for scaling up, reproducing workflows, and collaborating with others across institutions. 
We have noted some quality-of-life changes that would improve Terra&#8217;s usability, and we are confident that if these are implemented, Terra will become an even stronger option in the cloud genomics space.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Moving forward, we have been buoyed by the quality of the T2T-CHM13 reference, and we plan on using it for future large-scale analyses using Terra. As we demonstrate here, CHM13 is easy to use as a reference for large-scale genomic analysis, and we hope that both the clinical and research genomics communities use it to improve their own workflows.<\/span><\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><\/h4>\n<p>&nbsp;<\/p>\n<h4><span style=\"color: #008000;\">References<\/span><\/h4>\n<p><a href=\"http:\/\/paperpile.com\/b\/kE0hLK\/vjbI\"><span style=\"font-weight: 400;\">Byrska-Bishop, M. <\/span><i><span style=\"font-weight: 400;\">et al.<\/span><\/i><span style=\"font-weight: 400;\"> (2021) \u2018High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios\u2019, <\/span><i><span style=\"font-weight: 400;\">bioRxiv<\/span><\/i><span style=\"font-weight: 400;\">. doi: <\/span><\/a><a href=\"http:\/\/dx.doi.org\/10.1101\/2021.02.06.430068\"><span style=\"font-weight: 400;\">10.1101\/2021.02.06.430068<\/span><\/a><a href=\"http:\/\/paperpile.com\/b\/kE0hLK\/vjbI\"><span style=\"font-weight: 400;\">.<\/span><\/a><\/p>\n<p><a href=\"http:\/\/paperpile.com\/b\/kE0hLK\/wcIQ\"><span style=\"font-weight: 400;\">Nurk, S. <\/span><i><span style=\"font-weight: 400;\">et al.<\/span><\/i><span style=\"font-weight: 400;\"> (2021) \u2018The complete sequence of a human genome\u2019, <\/span><i><span style=\"font-weight: 400;\">bioRxiv<\/span><\/i><span style=\"font-weight: 400;\">. 
doi: <\/span><\/a><a href=\"http:\/\/dx.doi.org\/10.1101\/2021.05.26.445798\"><span style=\"font-weight: 400;\">10.1101\/2021.05.26.445798<\/span><\/a><a href=\"http:\/\/paperpile.com\/b\/kE0hLK\/wcIQ\"><span style=\"font-weight: 400;\">.<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Samantha Zarate of the Schatz Lab takes us behind the scenes of the large-scale analysis that demonstrated the benefits of the new T2T-CHM13 reference genome for variant calling.<\/p>\n","protected":false},"author":30,"featured_media":467,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[12,31,106,13,19,119,58,60,32],"tags":[167,168,41],"class_list":["post-464","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analysis","category-ecosystem","category-genomics","category-guest-author","category-most-popular","category-most-recent","category-publications","category-testimonials","category-workflows","tag-1000-genomes","tag-1kgp","tag-anvil"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Calling variants from telomere to telomere with the new T2T-CHM13 genome reference - Terra<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference - Terra\" \/>\n<meta property=\"og:description\" content=\"Samantha Zarate of the Schatz Lab takes us behind the scenes of the 
large-scale analysis that demonstrated the benefits of the new T2T-CHM13 reference genome for variant calling.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\" \/>\n<meta property=\"og:site_name\" content=\"Terra\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-11T15:30:20+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-27T04:55:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"627\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Samantha Zarate\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Samantha Zarate\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\"},\"author\":{\"name\":\"Samantha Zarate\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/2868cbbd8a3e4e43bfbea7f6d84bf08e\"},\"headline\":\"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference\",\"datePublished\":\"2021-11-11T15:30:20+00:00\",\"dateModified\":\"2023-12-27T04:55:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\"},\"wordCount\":2795,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg\",\"keywords\":[\"1000 genomes\",\"1kgp\",\"anvil\"],\"articleSection\":[\"Analysis\",\"Ecosystem\",\"Genomics\",\"Guest Author\",\"Most Popular\",\"Most Recent\",\"Publications\",\"Testimonials\",\"Workflows\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\",\"url\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\",\"name\":\"Calling 
variants from telomere to telomere with the new T2T-CHM13 genome reference - Terra\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg\",\"datePublished\":\"2021-11-11T15:30:20+00:00\",\"dateModified\":\"2023-12-27T04:55:07+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg\",\"width\":1200,\"height\":627,\"caption\":\"Telomere concept\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/terra.bio\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/terra.bio\/#website\",\"url\":\"https:\/\/terra.bio\/\",\"name\":\"Terra\",\"description\":\"Science at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/terra.bio\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/terra.bio\/#organization\",\"name\":\"Terra\",\"url\":\"https:\/\/terra.bio\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"width\":287,\"height\":318,\"caption\":\"Terra\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/2868cbbd8a3e4e43bfbea7f6d84bf08e\",\"name\":\"Samantha Zarate\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a41b467e954f353b41286328cf490edab78ab2ceab0d81a41bc81818d2fb6078?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a41b467e954f353b41286328cf490edab78ab2ceab0d81a41bc81818d2fb6078?s=96&d=mm&r=g\",\"caption\":\"Samantha Zarate\"},\"url\":\"https:\/\/terra.bio\/author\/samanthaz\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference - Terra","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/","og_locale":"en_US","og_type":"article","og_title":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference - Terra","og_description":"Samantha Zarate of the Schatz Lab takes us behind the scenes of the large-scale analysis that demonstrated the benefits of the new T2T-CHM13 reference genome for variant calling.","og_url":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/","og_site_name":"Terra","article_published_time":"2021-11-11T15:30:20+00:00","article_modified_time":"2023-12-27T04:55:07+00:00","og_image":[{"width":1200,"height":627,"url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg","type":"image\/jpeg"}],"author":"Samantha Zarate","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Samantha Zarate","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#article","isPartOf":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/"},"author":{"name":"Samantha Zarate","@id":"https:\/\/terra.bio\/#\/schema\/person\/2868cbbd8a3e4e43bfbea7f6d84bf08e"},"headline":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference","datePublished":"2021-11-11T15:30:20+00:00","dateModified":"2023-12-27T04:55:07+00:00","mainEntityOfPage":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/"},"wordCount":2795,"commentCount":0,"publisher":{"@id":"https:\/\/terra.bio\/#organization"},"image":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg","keywords":["1000 genomes","1kgp","anvil"],"articleSection":["Analysis","Ecosystem","Genomics","Guest Author","Most Popular","Most Recent","Publications","Testimonials","Workflows"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/","url":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/","name":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference - 
Terra","isPartOf":{"@id":"https:\/\/terra.bio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage"},"image":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg","datePublished":"2021-11-11T15:30:20+00:00","dateModified":"2023-12-27T04:55:07+00:00","breadcrumb":{"@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#primaryimage","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/telomere-seasons.jpg","width":1200,"height":627,"caption":"Telomere concept"},{"@type":"BreadcrumbList","@id":"https:\/\/terra.bio\/calling-variants-from-telomere-to-telomere-with-the-new-t2t-chm13-genome-reference\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/terra.bio\/"},{"@type":"ListItem","position":2,"name":"Calling variants from telomere to telomere with the new T2T-CHM13 genome reference"}]},{"@type":"WebSite","@id":"https:\/\/terra.bio\/#website","url":"https:\/\/terra.bio\/","name":"Terra","description":"Science at 
Scale","publisher":{"@id":"https:\/\/terra.bio\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/terra.bio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/terra.bio\/#organization","name":"Terra","url":"https:\/\/terra.bio\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","width":287,"height":318,"caption":"Terra"},"image":{"@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/terra.bio\/#\/schema\/person\/2868cbbd8a3e4e43bfbea7f6d84bf08e","name":"Samantha Zarate","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a41b467e954f353b41286328cf490edab78ab2ceab0d81a41bc81818d2fb6078?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a41b467e954f353b41286328cf490edab78ab2ceab0d81a41bc81818d2fb6078?s=96&d=mm&r=g","caption":"Samantha 
Zarate"},"url":"https:\/\/terra.bio\/author\/samanthaz\/"}]}},"_links":{"self":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/464","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/comments?post=464"}],"version-history":[{"count":0,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/464\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media\/467"}],"wp:attachment":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media?parent=464"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/categories?post=464"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/tags?post=464"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}