Features Archives - Terra
https://terra.bio/category/features/
Science at Scale

Update on Terra’s Strategy for a More Scalable Future
https://terra.bio/update-on-terras-strategy-for-a-more-scalable-future/
Fri, 19 Sep 2025 21:29:54 +0000

The post Update on Terra’s Strategy for a More Scalable Future appeared first on Terra.

In January 2025, we shared that we’ve been reimagining how research will be done in the next five to ten years and outlined some of our plans for future capabilities. Since then, we’ve been working closely with Manifold on a major upgrade to Terra—“Terra Powered by Manifold”—and we’re excited to share an update. 

As a reminder, we won’t make any sudden changes to how your day-to-day work happens! We’ll share our timelines and give you a heads up on any user interface and functionality changes well in advance as we roll out the next-gen upgrades.

Platform capabilities coming soon

  • AWS support: Because much of the life sciences industry runs on AWS, we have partnered closely with AWS to enable seamless data access and collaboration within an organization’s AWS cloud environments and to let users take advantage of the latest AWS services.
  • Nextflow support: Robust Nextflow pipelines featuring low-code workflow configuration, integration of community workflows, and comprehensive logging and debugging capabilities.
  • AI agents: Agents that help scientists move faster with their data and tools, starting with dataset chat and cohort-building agents, and then looking ahead to an extensible ecosystem where the community can develop and deploy their own specialized agents to address unique research challenges.

Tools & datasets coming soon

  • Popular data science tools including CellDega, Imputation Services, and PRS developed at Broad—and not easily accessible on Terra today—will be coming soon as scientific apps on the platform.
  • Controlled datasets hosted by Broad on behalf of the NIH and others, previously not accessible on AWS, will be available to a broader range of researchers who are authorized to access them.

As a recap, Manifold is focused on maintaining and upgrading the core technology platform. Broad’s Data Sciences Platform (DSP) is building the next generation of scientific tools that will be made available to everyone via the platform. As part of the partnership with Manifold, members of the DSP team operating the original Terra platform have moved to Manifold while retaining their Broad access and collaborations. This team will focus on maintaining Terra while creating the next generation platform. This strategic step enables us to accelerate the pace at which the expanded platform capabilities, tools, and datasets become available to the Terra user community and for portals powered by Terra.

Thank you for making Terra a critical tool in so many groundbreaking projects. We’re excited to continue working with you to accelerate your science!

Accelerating Discovery: Terra’s Strategy for a More Scalable Future
https://terra.bio/accelerating-discovery-terras-strategy-for-a-more-scalable-future/
Mon, 13 Jan 2025 10:00:00 +0000

The post Accelerating Discovery: Terra’s Strategy for a More Scalable Future appeared first on Terra.

It’s incredible that Terra, along with its predecessor, FireCloud, has been in development for over nine years. In that time, Terra has grown far beyond our original vision, playing a critical role in areas like public health surveillance, ‘omics’ data delivery, biobank data management, and rare disease diagnosis.

We’ve spent the last several months mapping out a strategy to evolve Terra for the future. Together with leading scientists at the Broad Institute and our collaborators, we’ve been imagining how research will be done in the next five to ten years, and how Broad’s Data Sciences Platform (DSP) can deliver on what you’ll need to drive discovery in the next decade. In this blog post, we’re excited to share our plans and give a teaser on some of our future capabilities. Let’s start by looking at some of these capabilities for the biological research use case.

Future Platform Capabilities:

  • Supporting all major clouds: Organizations store scientific data in a variety of cloud platforms, but users shouldn’t have to worry about where the data resides. We need to support multiple clouds and make it easy for users to work across them cost-effectively.
  • Support for more workflow languages: Not all users write WDL. Nextflow’s popularity has soared, for example, so we want to support it while also making it easier to support any new workflow languages that come along. 
  • Scaling data tables and workflow execution: Over the years, Cromwell, Terra’s workflow execution engine, has done an impressive job handling massive scales of data. As data continue to grow and new types of data emerge, however, we need to scale even further. This is why we’re focused on building a platform that’s not only better at handling more data, but also evolves with the changing landscape of scientific research.
  • Advancing the researcher’s lifecycle with AI agents: AI agents could assist with cohort building, data analysis, scientific literature searches and much more. To improve the speed of research, we need to include more of these capabilities natively.
  • Enhanced data management capabilities: From ingestion and harmonization to discovery and access — organizing and managing data needs to be easier than it is today. 

As we look to build these capabilities in the years to come, we want to continue to work with innovative partners who can help accelerate engineering. To help bring these capabilities to life, we’re excited to announce a strategic collaboration with Manifold. Their technical expertise and shared vision will be crucial in helping us deliver on these goals and positioning us for a scalable and innovative future.

Manifold is a technology company focused on building advanced, modern cloud infrastructure for biomedical science. Manifold started in the cancer research space, where they demonstrated the effectiveness of their research cloud infrastructure in partnerships with leading organizations such as the American Cancer Society (ACS) and Indiana University. 

We’re working with Manifold on a new platform, one that’ll incorporate Terra’s features while adding new functionality to help address the needs discussed above. Like Terra, this new platform will act as a steward— not an owner —of scientific data, ensuring users retain full ownership and control over access, while providing high levels of data security. Manifold will focus on building the core platform infrastructure while we in Broad’s DSP will develop advanced open-source analysis tools and capabilities at the cutting edge of biomedical science and then make them available to everyone via the platform. You can read more about this collaboration in a press release we shared earlier today.

And, we remain hard at work improving the Terra platform that you use today to support your science. For example, soon you’ll be able to cap the cost of a workflow—allowing you to have more control over your spend in Terra when running workflows. To read more about other features that are coming, check out our new public roadmap. The goal of the roadmap is to invite your feedback and ideas on what we’re building, while also clearly outlining our upcoming plans. You can even sign up to be an early tester of new features.

We’re excited to collaborate more with all of you — and Manifold — in 2025. Thank you for making Terra a critical tool for so many groundbreaking projects. Check out our FAQs about this collaboration and if you have any other questions, please feel free to reach us at support@terra.bio.

Happy New Year!

GPU Virtual Machine availability on Terra on Microsoft Azure
https://terra.bio/gpu-virtual-machine-availability-on-terra-on-microsoft-azure/
Thu, 13 Jul 2023 12:00:49 +0000

The post GPU Virtual Machine availability on Terra on Microsoft Azure appeared first on Terra.

Erdal Cosgun, co-author of this blog post, is the Lead Data Scientist on the Microsoft Genomics team and works on the Interactive Analysis component of Terra on Microsoft Azure.


 

Terra on Microsoft Azure cloud computing continues to evolve to meet the demand for high-performance virtual machines (VMs) capable of handling intensive workloads. We’re thrilled to announce the availability of NCSv3 (NC6s_v3 and NC12s_v3) series GPU VMs, which bring unprecedented computational power to Terra, making it ideal for a wide range of applications such as Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) for biomedical analysis.

Harnessing the power of NCSv3-series GPU VMs

NCSv3-series GPU VMs provide an exceptional level of performance by leveraging NVIDIA’s advanced GPU technology. You can use these VMs, which are powered by NVIDIA Tesla V100 GPUs, on Terra for traditional High Performance Computing (HPC) workloads such as reservoir modeling, DNA sequencing, protein analysis, Monte Carlo simulations, and others. Working in Terra lets you easily access datasets in the cloud, such as the Microsoft Genomics Data Lake, without having to compete for fixed HPC resources. In addition to GPUs, the NCv3-series VMs are also powered by Intel Xeon E5-2690 v4 (Broadwell) CPUs. [Ref]

 

Accelerate model training and inferencing

Whether training neural networks or deploying real-time inferencing solutions, these virtual machines accelerate model training and inferencing tasks on Terra. With the NCSv3-series GPUs, you can bring your ML/DL models to real-life problems faster, empowering breakthroughs in biomedical research. 

A new era of High-Performance Computing (in the cloud)

The availability of NCSv3-series GPU VMs on Terra on Microsoft Azure brings a new era of high-performance computing to the biomedical computing landscape. With their exceptional GPU capabilities and extensive memory, these virtual machines let you tackle the most demanding workloads – from AI and ML to deep learning – all within the ecosystem of Terra on Microsoft Azure.

Try it out in a Microsoft Azure Public Workspace 

To get started, try our featured workspace, which includes a sample notebook for training deep learning models. As of July 2023, running this notebook on the NC6s_v3 GPU VM in your own Terra on Microsoft Azure workspace costs about $3.06 per hour with the included data. Please use the Microsoft Azure Pricing Calculator for the latest price information.
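As a rough illustration of how the hourly rate translates into a session budget, the sketch below multiplies the July 2023 figure quoted above by a session length. The function name and the example durations are illustrative assumptions, and the rate is a point-in-time value, so treat the output as an estimate only.

```python
from decimal import Decimal

# Hourly rate quoted in this post for the NC6s_v3 VM as of July 2023 (USD).
# This is a snapshot value from the post, not a live price.
NC6S_V3_RATE_JULY_2023 = Decimal("3.06")


def estimate_session_cost(hours, hourly_rate=NC6S_V3_RATE_JULY_2023):
    """Return the estimated compute cost in USD for a session of `hours` hours."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    # Convert via str() so float inputs such as 0.5 stay exact in Decimal.
    cost = Decimal(str(hours)) * hourly_rate
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical session lengths, chosen only for illustration.
    for hours in (1, 4, 8):
        print(f"{hours} h -> ${estimate_session_cost(hours)}")
```

Pausing or deleting the cloud environment when the notebook finishes is what keeps the actual bill close to an estimate like this.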

Important Note: Be sure to check GPU quotas from the Azure portal before deploying the GPU VMs on Terra on Microsoft Azure. Please check the Terra Support Article for step-by-step instructions as well as caveats and troubleshooting about quota increase requests.

View quotas – Azure Quotas | Microsoft Learn

 

Celebrating a year of progress — and a sneak peek at what’s coming next
https://terra.bio/celebrating-a-year-of-progress-and-a-sneak-peek-at-whats-coming-next/
Thu, 15 Dec 2022 17:29:02 +0000
Highlights from Terra's development and growth in 2022, heading into the multi-cloud future of 2023.

The post Celebrating a year of progress — and a sneak peek at what’s coming next appeared first on Terra.

Kyle Vernest is Head of Product in the Data Sciences Platform at the Broad Institute. In this guest blog post, Kyle takes a look back at how Terra has grown over the past year, and gives us a preview of what to expect in the first quarter of 2023. 


 

It’s been an incredible year for Terra, with a lot of new users coming to the platform as more labs, groups, and organizations move their computational work to the cloud. We’re also thrilled to see user growth being fueled by scientific consortia such as the Human Cell Atlas, and NIH-driven programs such as AnVIL, rallying their communities around Terra as a platform for secure data sharing and collaboration. 

The Terra development teams spanning the Broad Institute, Microsoft, and Verily have worked tirelessly to continue to expand the platform’s capabilities in service of these growing communities. Highlights of the year’s releases include an improved user interface for managing cloud environments for interactive analysis, increased scalability of the workflow management system, and better tooling for uploading and organizing data in workspaces. We also rolled out numerous usability improvements, like email notifications for workflow status and better organization of the list of workspaces. Most recently, we launched the public preview of the Terra Data Repository, a new component of the Terra platform designed to provide data storage and access management capabilities tailored for the life sciences.

Yet all these upgrades are in many ways only the tip of the iceberg. Behind the scenes, an enormous amount of work has gone into laying the groundwork for a major development that will come to fruition in the first quarter of 2023: support for storing data and running analyses on Microsoft Azure. 

 

Coming soon to a cloud near you

We have been working closely with our partners at Microsoft to expand Terra to a multi-cloud offering, and we are nearing the launch of Terra on Azure early in the new year. Leading up to the launch, you may notice a new “Sign in with Microsoft” option on the Terra welcome screen (which will take you to a “Coming Soon” page until the preview phase starts).

But don’t worry if you’re planning to stick with Terra on Google; we have plenty of upgrades in store for you as well! In particular, you can look forward to taking advantage of WDL 1.1’s workflow language updates, and switching from Jupyter Notebook to JupyterLab for a more full-featured code development experience.

Whether you’re using Terra on Google or on Azure, you’ll be presented with a new version of the Terra Terms of Service, which we’ve updated to reflect the expanded functionality and new multi-cloud nature of the platform.

☁

Finally, as we close out this brief tour of the year’s achievements, we’re especially proud to celebrate the many scientific successes that Terra has already enabled. These have covered an impressive range of domains, from the Telomere-to-Telomere reference genome project to the CDC’s efforts to empower public health labs across the country to adopt genomics for biosurveillance. We look forward to many more in the coming year, featuring even greater variety — including more ‘omics data technologies beyond genomics.

 

 

Introducing Terra Data Repository public preview
https://terra.bio/introducing-terra-data-repository-public-preview/
Thu, 08 Dec 2022 18:44:51 +0000
Discover the newest component of the Terra platform, designed to provide data storage and access management capabilities tailored for the life sciences.

The post Introducing Terra Data Repository public preview appeared first on Terra.

Jonathan Lawson is a Senior Software Product Manager in the Broad Institute Data Sciences Platform, overseeing data management products including the Terra Data Repository and the Data Use Oversight System. In this guest blog post, Jonathan announces the public preview phase of the Terra Data Repository, a new component of the Terra platform designed to provide data storage and access management capabilities tailored for the life sciences.


 

Life sciences research has entered an age of extraordinary opportunity thanks to the rapid technological developments of the past decade. We are now able to generate vast amounts of molecular information, such as genomic sequencing, and we can put that molecular data in the context of phenotypes and clinical history to probe the biology of both health and disease in unprecedented detail. These capabilities are already starting to revolutionize how we approach everything from fundamental research into population genetics to diagnostics and drug development.

Yet these technological advances also bring new technical challenges. The resulting datasets are complex, combining enormous files of molecular data with structured information, such as phenotypic data, that is best stored in database form. In addition, data assets collected from human participants are subject to various constraints on how they can be shared, and with whom.

Solving these challenges calls for data storage and sharing solutions that empower data owners and custodians to make their datasets available for analysis to the research community securely, responsibly and effectively.

Today, we are excited to introduce the Terra Data Repository (TDR), a new component of the Terra platform designed to provide data storage and access management capabilities tailored for the life sciences. It is already actively being used for large collaborative projects including the Human Cell Atlas and the NHGRI’s AnVIL. 

The system supports using formal schemas to represent relationships between different data entities, and generating versioned snapshots that can be used to grant collaborators access to specific subsets of data depending on research purpose and authorizations. Data snapshots are immutable, making it possible to release continuous updates to datasets while ensuring reproducibility of analyses over time. 
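To make the idea of immutable, versioned snapshots concrete, here is a minimal conceptual sketch in Python. It is not the TDR API: the `Dataset` and `Snapshot` classes, their methods, and the sample rows are all hypothetical and exist only to show why a frozen snapshot stays reproducible while the underlying dataset keeps changing.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Snapshot:
    """A read-only, versioned view of selected rows (hypothetical model)."""
    version: int
    rows: MappingProxyType  # row id -> read-only row data


class Dataset:
    """A mutable dataset that keeps receiving updates (hypothetical model)."""

    def __init__(self):
        self._rows = {}
        self._next_version = 1

    def upsert(self, row_id, data):
        # Store a copy so callers cannot mutate dataset state from outside.
        self._rows[row_id] = dict(data)

    def create_snapshot(self, row_ids):
        # Copy the selected rows so later dataset updates cannot leak in,
        # then wrap them in read-only proxies.
        frozen = {rid: MappingProxyType(dict(self._rows[rid])) for rid in row_ids}
        snapshot = Snapshot(self._next_version, MappingProxyType(frozen))
        self._next_version += 1
        return snapshot


if __name__ == "__main__":
    dataset = Dataset()
    dataset.upsert("sample-1", {"phenotype": "case"})
    first = dataset.create_snapshot(["sample-1"])
    dataset.upsert("sample-1", {"phenotype": "case", "age": 12})
    second = dataset.create_snapshot(["sample-1"])
    print(first.version, dict(first.rows["sample-1"]))
    print(second.version, dict(second.rows["sample-1"]))
```

In this toy model, an analysis that records the snapshot version it ran against can be rerun later on exactly the same rows, and access can be granted per snapshot rather than per dataset.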

For a complete overview of features, usage instructions and detailed technical information, please visit the TDR documentation in the Terra knowledge base. 

The Terra Data Repository is available as a public preview to all registered users of Terra. Please note that the graphical user interface is still under active development, and many operations can currently only be performed through API calls. During this time, we recommend reaching out to the Terra support team to discuss whether the Terra Data Repository might be a good fit for your project’s needs.

 

Ten simple rules — #6 Version both software and data
https://terra.bio/ten-simple-rules-6-version-both-software-and-data/
Thu, 10 Nov 2022 19:39:33 +0000
Bringing workflows under version control to make large-scale data processing reproducible; inspired by “Ten simple rules for large-scale data processing” (Fungtammasan 2022).

The post Ten simple rules — #6 Version both software and data appeared first on Terra.

This blog post is part of a series based on the paper “Ten simple rules for large-scale data processing” by Arkarachai Fungtammasan et al. (PLOS Computational Biology, 2022). Each installment reviews one of the rules proposed by the authors and illustrates how it can be applied when running workflows in Terra. In this installment, we take a look at version control across a range of components including tools, dependencies, workflow scripts and data resources.


 

Version control is one of those technical concepts that’s obviously a good idea yet can be really tricky to do correctly. And as much as it has become an established practice for most computational scientists, many tend to underestimate the scope of what should be version-controlled.

(If you’ve never heard of version control or would like a refresher tailored for scientists, check out the Software Carpentry lesson on version control with Git.)

In this sixth rule, Arkarachai Fungtammasan and colleagues rightly emphasize that it’s not enough to control the version of the main software code and tools involved in analysis:

“Applying version control to all code is always recommended for reproducible research. In the context of a large-scale data analysis, we need to go beyond this initial step […].”

Indeed, the tools researchers use directly — those that feature most obviously in command lines and in scripts — typically rely on other, less visible components, or dependencies. Changes to those dependencies can affect analysis results, so it’s important to ensure specific versions are used rather than “the latest available”. 

The authors also call out workflow scripts and data resources such as genome builds as components that should be carefully version-controlled.

 

Version-controlled workflows in Terra

Terra’s workflow execution system is designed specifically to enable version control at multiple levels, with minimal effort on the part of pipeline developers and end users. 

 

Workflow scripts

“When multiple processing steps are combined into workflow (Rule 4), the workflow itself should be versioned.”

The WDL workflow scripts used in Terra are held in version-controlled repositories: either the built-in Broad Methods Repository, which supports version control through the concept of “snapshots”, or the external Dockstore repository, which offers workflow-specific versioning features backed by the industry-standard code versioning of GitHub.

 

Tools, dependencies and computing environment

“To guard against interruptions, all dependencies should be pinned to a specific version (ideally through a version control hash or equivalent) that has been thoroughly tested (Rule 5). […] Utilizing container technology is highly encouraged, if allowed in the system, to guarantee the computing environment for processing.”

Within a WDL workflow, each individual analysis task specifies a Docker container that encapsulates all software tools and dependencies involved. Workflow developers and users can specify the exact version of the container using a unique identifier that ensures absolute reproducibility of the computing environment. 
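One common form of "unique identifier" for a container is an image digest (a reference ending in `@sha256:` plus 64 hex characters), as opposed to a tag, which can be moved to point at a different image later. The sketch below is a small, hypothetical helper for auditing the image strings used in a workflow; the function names and the example image references are assumptions made for illustration and are not part of Terra or WDL.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref):
    """Return True if the image reference is pinned to an immutable digest."""
    return bool(_DIGEST_RE.search(image_ref))


def classify_image_ref(image_ref):
    """Return 'digest', 'tag', or 'untagged' for a Docker image reference."""
    if is_pinned_by_digest(image_ref):
        return "digest"
    # Look for a tag only in the last path component, so a registry port
    # such as "localhost:5000/tool" is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        return "tag"
    return "untagged"


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    examples = [
        "example.org/tools/aligner@sha256:" + "a" * 64,
        "example.org/tools/aligner:1.2.3",
        "example.org/tools/aligner",
    ]
    for ref in examples:
        print(classify_image_ref(ref), ref)
```

A check like this can flag tasks that rely on a tag or on the implicit default tag, which are the cases where "the latest available" image can silently change between runs.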

 

Data resources

“In biomedical data analysis, there are often components beyond the software and related dependencies that need to be included in the reproducibility plan. For instance, if the data processing relies on a genome build, using the most recent build and release in the pipeline will be insufficient. Instead, the processing needs to be tied to a specific build and release, much like the dependencies in the pipeline.”

Terra workspaces provide data management features that include data manifests (see “data tables”), the ability to load versioned genome reference builds, and a system of key-value pairs that can be used to specify and label custom data resources for use in workflow configurations.

 

It’s worth noting that the workflow logging system also provides features that support version control by making it possible to go back and look at configuration details for past analyses, as we touched on in Rule 2 (Document Everything).

To learn more about making effective use of version control for large-scale data processing in Terra, please read “How does pipeline versioning work?” in the Terra knowledge base.

 

 

Paper Spotlight: Germline predisposition to pediatric Ewing sarcoma
https://terra.bio/paper-spotlight-germline-predisposition-to-pediatric-ewing-sarcoma/
Thu, 03 Nov 2022 18:13:27 +0000
This pediatric oncology study used Terra to identify germline variants contributing to Ewing sarcoma pathogenesis in 1,147 individuals with pediatric sarcoma diagnoses.

The post Paper Spotlight: Germline predisposition to pediatric Ewing sarcoma appeared first on Terra.

This blog is part of our Paper Spotlight series, which features peer-reviewed research publications involving work done in Terra and highlights how the analysis methods were applied.


 

Germline predisposition to pediatric Ewing sarcoma is characterized by inherited pathogenic variants in DNA damage repair genes

By Riaz Gillani, Sabrina Y. Camp, Seunghun Han, Jill K. Jones, Hoyin Chu, Schuyler O’Brien, Erin L. Young, Lucy Hayes, Gareth Mitchell, Trent Fowler, Alexander Gusev, Junne Kamihara, Katherine A. Janeway, Joshua D. Schiffman, Brian D. Crompton, Saud H. AlDubayan and Eliezer M. Van Allen

The American Journal of Human Genetics (2022) https://doi.org/10.1016/j.ajhg.2022.04.007

Abstract: More knowledge is needed regarding germline predisposition to Ewing sarcoma to inform biological investigation and clinical practice. Here, we evaluated the enrichment of pathogenic germline variants in Ewing sarcoma relative to other pediatric sarcoma subtypes, as well as patterns of inheritance of these variants. We carried out European-focused and pan-ancestry case-control analyses to screen for enrichment of pathogenic germline variants in 141 established cancer predisposition genes in 1,147 individuals with pediatric sarcoma diagnoses (226 Ewing sarcoma, 438 osteosarcoma, 180 rhabdomyosarcoma, and 303 other sarcoma) relative to identically processed cancer-free control individuals. Findings in Ewing sarcoma were validated with an additional cohort of 430 individuals, and a subset of 301 Ewing sarcoma parent-proband trios was analyzed for inheritance patterns of identified pathogenic variants. A distinct pattern of pathogenic germline variants was seen in Ewing sarcoma relative to other sarcoma subtypes. FANCC was the only gene with an enrichment signal for heterozygous pathogenic variants in the European Ewing sarcoma discovery cohort (three individuals, OR 12.6, 95% CI 3.0–43.2, p = 0.003, FDR = 0.40). This enrichment in FANCC heterozygous pathogenic variants was again observed in the European Ewing sarcoma validation cohort (three individuals, OR 7.0, 95% CI 1.7–23.6, p = 0.014), representing a broader importance of genes involved in DNA damage repair, which were also nominally enriched in individuals with Ewing sarcoma. Pathogenic variants in DNA damage repair genes were acquired through autosomal inheritance. Our study provides new insight into germline risk factors contributing to Ewing sarcoma pathogenesis.


 

What part of the work was done in Terra?

Excerpts from the paper’s Methods section:

Raw sequencing data was downloaded to Terra (https://firecloud.terra.bio/), a collaborative cloud-computing platform utilized for genomic analyses, developed as part of the NCI Cloud Pilot program and supported by the Broad Institute.

Personal communication from first author Riaz Gillani:

We used Terra for many parts of the analysis, including harmonizing raw sequencing data, calling variants, inferring ancestry, running quality control, and counting variants. Some of the key public workflows that enabled this were DeepVariant, BAMRealigner, and various adaptations of the GATK workflows.

Links to the relevant public workflows:

https://portal.firecloud.org/?return=firecloud#methods/vanallenlab/deepvariant_wes/

https://portal.firecloud.org/?return=firecloud#methods/GPTAG/BamRealigner/

 

How did they do it?

Automated workflows 

The authors used previously described bioinformatics analysis pipelines implemented as WDL workflows and shared in the Broad Methods Repository.

Terra also supports importing workflows from Dockstore, a free and open source platform for sharing reusable and scalable analytical tools and workflows.

They ran the workflows at scale using Terra’s workflow execution service.

To try your hand at running a workflow in Terra, check out the Workflows Quickstart Guide.

Interactive analysis

The authors used Jupyter Notebooks in Terra’s interactive Cloud Environments system for the ancestry inference analysis. 

To get started with Jupyter Notebooks in Terra, check out the Notebooks Quickstart Guide.

 

 

 

Run your notebooks programmatically in the cloud
https://terra.bio/run-your-notebooks-programmatically-in-the-cloud/
Thu, 20 Oct 2022 18:20:37 +0000
Verily software engineers share tips and resources for running Jupyter notebooks programmatically without writing any code.

The post Run your notebooks programmatically in the cloud appeared first on Terra.

John Bates is a software engineer at Verily. In this guest blog post, he and his fellow software engineer Nicole Deflaux share two solutions they developed for running Jupyter notebooks programmatically in support of their own analysis work. 


 

Our team makes extensive use of Jupyter Notebooks for developing new analyses, because they enable us to iterate very quickly and collaboratively in an interactive environment. 

However, we have found certain situations where we want to run a notebook programmatically, meaning we launch the entire analysis with a single command instead of manually opening the notebook and running cells:

  • To run a notebook with a known, clean virtual machine configuration to confirm it has no unresolved dependencies on locally installed Python packages, R packages, or on local files.
  • To run a notebook with many different sets of parameters, all in parallel.
  • To execute a long-running notebook (e.g., taking hours or even days) on a machine separate from where you are working interactively.
  • To automate an analysis that was developed in a notebook without porting it to a workflow.

 

Fortunately, there is a command-line tool called Papermill that makes it possible to parameterize and execute Jupyter Notebooks programmatically. So all you really need to achieve these goals from within Terra is to devise a way to launch the Papermill command on a clean virtual machine.
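To make this concrete, here is a minimal sketch of what a Papermill invocation looks like, and how one notebook can be fanned out over several parameter sets. It only builds the command lines; it does not run them. The notebook name, output folder, and parameter names (`dataset`, `min_depth`) are made up for illustration, and the notebook is assumed to have a cell tagged `parameters` for Papermill to override.

```python
import itertools
import shlex
from pathlib import PurePosixPath


def papermill_command(input_nb, output_nb, parameters):
    """Build the argument list for one Papermill run."""
    cmd = ["papermill", str(input_nb), str(output_nb)]
    for name, value in parameters.items():
        # -p NAME VALUE overrides a variable from the notebook's parameters cell
        cmd += ["-p", name, str(value)]
    return cmd


def sweep_commands(input_nb, output_dir, grid):
    """Expand a dict of parameter lists into one command per combination."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        # Encode the parameter values in the output name so runs don't collide
        tag = "_".join(f"{n}-{v}" for n, v in params.items())
        stem = PurePosixPath(input_nb).stem
        output_nb = PurePosixPath(output_dir) / f"{stem}_{tag}.ipynb"
        commands.append(papermill_command(input_nb, output_nb, params))
    return commands


# Hypothetical notebook and parameter grid, for illustration only
commands = sweep_commands(
    "analysis.ipynb", "runs", {"dataset": ["a", "b"], "min_depth": [10, 20]}
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line is an independent command, so the runs can be handed to separate machines and executed in parallel.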

We recently developed a pair of approaches to do exactly that, using either the Workflow execution system or the Terminal in a Cloud Environment. This has been very useful for our team, so we created a public workspace that demonstrates how you too can do this with minimal effort.

☁

The Workflow approach uses a single-task WDL script, notebook_workflow.wdl, that we wrote to serve as a lightweight wrapper for the Papermill command. You can submit this WDL through Terra’s Workflows execution interface as usual, specifying as inputs the path to the notebook file you want to run programmatically, the environment container to use for testing, and any other relevant parameters. 

 

 

The output of this workflow is a copy of the original notebook, fully executed and rendered in HTML, along with any files generated by the notebook execution itself.
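The rendering step can be sketched separately. This post does not show the commands inside notebook_workflow.wdl, so the snippet below is an assumption about one common way to do it: converting an already executed notebook with `jupyter nbconvert`. The file name is hypothetical.

```python
from pathlib import PurePosixPath


def render_html_command(executed_nb):
    """Return the nbconvert argument list and the HTML path it should produce."""
    nb = PurePosixPath(executed_nb)
    # By default nbconvert writes the HTML next to the notebook, same base name
    html_path = nb.with_suffix(".html")
    cmd = ["jupyter", "nbconvert", "--to", "html", str(nb)]
    return cmd, str(html_path)


# Hypothetical executed notebook, for illustration only
cmd, html_path = render_html_command("runs/analysis_executed.ipynb")
print(" ".join(cmd))
print(html_path)
```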

☁

In contrast, the Terminal option uses dsub, a Google Cloud tool that was developed for submitting and running batch scripts in the cloud. The basic idea behind dsub is to emulate the experience of using high-performance computing job schedulers like Grid Engine and Slurm, which allow you to write a script and then submit it to a job scheduler from a shell prompt on your local machine. You can then disconnect from the shell, go about your business, then later come back and query the status of your job using a predefined command generated at submission time. 

You can use this tool in Terra by launching a Jupyter Cloud Environment (Python kernel), which includes a built-in Terminal app that you can fire up by clicking on its icon in the right-hand toolbar. Once you’ve installed dsub and its dependencies into your environment, you can run dsub commands to submit jobs to Google Cloud as if it were your local compute server. 

For the purpose of running notebooks programmatically, you need to run a dsub command that will in turn launch the desired Papermill command, with the appropriate inputs and environment configuration. This may sound complicated, but the actual command that you will run in the Terminal is short and straightforward: to keep things simple, we wrote a Python script called dsub_notebook.py that wraps all the functionality you need to configure, launch and monitor the Papermill job through dsub. All you need to do is adapt the command with your input notebook and any appropriate parameters, and run it in the Terminal of your Python Cloud Environment. 
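For readers curious about what happens under the hood, the sketch below assembles a plain dsub command that runs Papermill on a fresh virtual machine. It is not the dsub_notebook.py script and does not reproduce its options. The project, bucket, and image names are hypothetical, the provider value depends on your dsub version, and the image is assumed to have Papermill installed.

```python
import shlex


def dsub_papermill_command(project, region, logging_uri, image,
                           input_nb_uri, output_nb_uri, parameters,
                           provider="google-cls-v2"):
    """Build a dsub argument list that runs Papermill on a clean cloud VM."""
    param_args = " ".join(
        f"-p {shlex.quote(name)} {shlex.quote(str(value))}"
        for name, value in parameters.items()
    )
    # dsub copies each --input file onto the VM and exposes its local path in
    # the named environment variable; --output files are copied back to the
    # bucket after the command finishes.
    inner = (
        f'papermill "${{INPUT_NOTEBOOK}}" "${{OUTPUT_NOTEBOOK}}" {param_args}'
    ).strip()
    return [
        "dsub",
        "--provider", provider,
        "--project", project,
        "--regions", region,
        "--logging", logging_uri,
        "--image", image,
        "--input", f"INPUT_NOTEBOOK={input_nb_uri}",
        "--output", f"OUTPUT_NOTEBOOK={output_nb_uri}",
        "--command", inner,
    ]


# All names below are placeholders, not values from the demo workspace
cmd = dsub_papermill_command(
    project="my-project",
    region="us-central1",
    logging_uri="gs://my-bucket/logs",
    image="my-registry/my-notebook-image:latest",
    input_nb_uri="gs://my-bucket/notebooks/analysis.ipynb",
    output_nb_uri="gs://my-bucket/runs/analysis_executed.ipynb",
    parameters={"dataset": "a"},
)
print(shlex.join(cmd))
```

Submitting a command like this prints a job ID, which you can later pass to dsub's companion `dstat` command to check on the job.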

 

 

This produces the same outputs as the Workflows option: a copy of the original notebook, fully executed and rendered in HTML, along with any files generated by the notebook execution itself.

☁

You can find a detailed tutorial with step-by-step instructions in the public workspace that we created to demonstrate how this works in practice. The tutorial includes an example notebook parameterized with Papermill, with three choices of input datasets, as well as a setup notebook to install dsub into your environment quickly and painlessly. 

We hope you will find this resource useful and would love to hear your feedback on how we could make it even better, either in the public Terra forum or privately through the helpdesk. You can also open an issue in the terra-examples repository to report a problem or discuss a technical aspect of the scripts.

 




]]>
https://terra.bio/run-your-notebooks-programmatically-in-the-cloud/feed/ 0
Get trained on Terra with our interactive online workshops
https://terra.bio/get-trained-on-terra-with-our-interactive-online-workshops/ https://terra.bio/get-trained-on-terra-with-our-interactive-online-workshops/#respond Thu, 13 Oct 2022 18:58:25 +0000 https://terrabioappdev.wpenginepowered.com/get-trained-on-terra-with-our-interactive-online-workshops/
In response to popular demand, our User Education team has been running weekly interactive workshops to help newcomers get started with Terra. They’ve also developed some satellite workshops that use Terra to teach specific applications like whole genome analysis with DRAGEN-GATK.  These online events are open to all and are entirely free: registered participants even […]

The post Get trained on Terra with our interactive online workshops appeared first on Terra.

]]>
In response to popular demand, our User Education team has been running weekly interactive workshops to help newcomers get started with Terra. They’ve also developed some satellite workshops that use Terra to teach specific applications like whole genome analysis with DRAGEN-GATK. 

These online events are open to all and are entirely free: registered participants even receive access to a cloud billing project, so everyone can follow along and do the exercises without having to jump through any administrative hoops! 

As you work through the hands-on exercises, you’ll be able to ask questions in real time and interact directly with our experts, developers and trainers. 

Whether you’re completely new to the platform, or you’re already using it but looking to get more formal training on how to utilize its features effectively, this is a great opportunity to level up your skills in a friendly, inclusive online environment. 

 

Introduction to Terra: A scalable platform for biomedical research

Online workshop hosted by the Terra team. Register today to receive the Zoom invite!

This interactive workshop consists of two 2-hour sessions held on consecutive days covering the following topics through demos and hands-on exercises:

  • Terra architecture as it relates to cloud-based data sets, tools, and computing resources
  • How data is organized in Terra
  • How to run a WDL workflow to automate analysis
  • How to launch a Cloud Environment from your Terra workspace to run interactive analysis tools like Jupyter Notebooks, RStudio, and Galaxy

Available dates: 

  • October 19 and 20, 10:00 am – 12:00 pm ET
  • October 31 and November 1, 10:00 am – 12:00 pm ET

For more information, visit the Introduction to Terra event page.

 

DRAGEN-GATK Webinar

Online workshop hosted by the GATK team. Register today to receive the Zoom invite!

This interactive workshop consists of a 2-hour session on Whole-Genome Analysis with DRAGEN-GATK. You will learn about DRAGEN-GATK — what it is, why it exists, and how to use it. Then, you’ll get an opportunity to use DRAGEN-GATK in a controlled demo environment within Terra.

Date: October 21, 10:30 am – 12:30 pm ET

For more information, visit the DRAGEN-GATK Webinar event page.

 

We hope you will join us for one or more of these online training events! If you’d like to get notified when new events are scheduled, go to the Upcoming Events page of the Terra support website and hit the “Follow” button. 

Make sure to also check out our library of supporting materials and recordings from past events.


]]>
https://terra.bio/get-trained-on-terra-with-our-interactive-online-workshops/feed/ 0
New in Terra: Get email notifications for your workflow status
https://terra.bio/new-in-terra-get-email-notifications-for-your-workflow-status/ https://terra.bio/new-in-terra-get-email-notifications-for-your-workflow-status/#respond Tue, 13 Sep 2022 17:57:05 +0000 https://terrabioappdev.wpenginepowered.com/new-in-terra-get-email-notifications-for-your-workflow-status/
New notification system emails you when your workflow submission status changes to "Succeeded", "Aborted" or "Failed".

The post New in Terra: Get email notifications for your workflow status appeared first on Terra.

]]>
One of the great advantages of executing your analyses in the form of automated workflows is that you can launch a whole pile of work and walk away while the machines do their thing. But how do you know when to come back for the results?

That’s right, you sign up for email notifications. In response to popular demand, the SUE team has implemented a notification system that can email you when your workflow submission status changes to “Succeeded”, “Aborted” or “Failed”. 

 

 

To be clear, you get one email per submission (i.e. per group of workflow runs launched at the same time), not per individual workflow run — so you don’t have to worry about your inbox getting flooded if you’re launching a workflow on 10,000 samples to be processed in parallel.

 

Control what you receive

The workflow submission status notifications are enabled by default; you don’t need to do anything to opt into receiving them. If you don’t want to receive these notifications, you can opt out of getting them on a per-workspace basis. Just head over to your Profile page, which now includes a new Notifications tab that houses all notification-related settings, and toggle the checkbox for each workspace accordingly. 

 

 

For more details about this new feature, see the workflow submission documentation.

We hope you will find this new feature useful and would love to hear your feedback on how we could make it even better, either in the public Terra forum or privately through the helpdesk as you prefer. 

 

 


]]>
https://terra.bio/new-in-terra-get-email-notifications-for-your-workflow-status/feed/ 0