8 great resources for learning to use Terra effectively
By Robert Majovski | Thu, 18 Nov 2021


Robert Majovski leads the User Education team in the Data Sciences Platform (DSP) at the Broad Institute. His team develops tutorials, documentation and other educational materials for DSP products and services, including Terra. In this listicle-style blog post, Robert goes over key resources that can help Terra newcomers and veterans alike get the most out of the platform.


 

1 — The intro video from the Terra.bio homepage

If you’ve only recently heard of Terra, you probably have a ton of questions. The fastest way to get the gist of what Terra is for is to watch the intro video on the Terra.bio homepage (or on YouTube), which explains why biomedical data is moving to the cloud and how Terra helps researchers work productively in this new world order.

While you’re there, you may also be interested in reading about who makes Terra and what it takes to get started.

 

2 — The “Introduction to Terra” course on Leanpub

Speaking of getting started, my team recently published an introductory course on the Leanpub online learning platform. This first course introduces the Terra platform and consists of reading assignments, videos and quizzes that walk you through foundational topics. By the end of it, you should be able to:

  1. Describe Terra’s goals and its guiding principles,
  2. Name the types of analysis that you can do in Terra,
  3. Define key cloud computing components that will enable you to work in Terra,
  4. Securely access Terra with your own account,
  5. Access Terra support to learn more and get help when you need it,
  6. Articulate your next steps to getting started with Terra.


The course is completely free and available on demand; to access it, all you need to do is create a free account on leanpub.com and check out the Introduction to Terra course.

UPDATE (Apr 2022): We have released several additional courses and will continue to develop more courses to cover progressively more advanced topics, so stay tuned for updates. You can find all our courses at https://leanpub.com/universities/terra

 

3 — Plenty of useful videos on Terra’s YouTube channel

A big part of learning to use Terra effectively is simply getting used to the interface: knowing where to find what you need, and what the key steps are for accessing data and moving on to analysis. What better way to learn these things than by watching them play out on-screen?

The Terra channel on YouTube offers a great collection of videos organized into playlists, from short-form Quick Tips, TerraBites demos and Getting Started tutorials, to long-form Webinars, Workshops, and more.

 

Our video collection has grown substantially since the last time we blogged about it, and it will undoubtedly continue to grow with the platform’s capabilities. Be sure to check it out and subscribe to be notified when we post new content.

 

4 — The Terra showcase

Another great way to grok what Terra can be used for — and how to make the most of its powerful features — is to browse the public workspaces that are featured in the Terra showcase.

The showcase is a searchable catalog of workspaces that contain fully configured analyses and example data across a variety of topics and research domains, currently including medical and population genomics, single-cell transcriptomics, cancer multi-omics, and viral genomics. These workspaces have been contributed by a variety of groups, including tool developers, data generators and researchers, as a way to share their tools, data and research findings in a fully reproducible way.

You can simply browse the contents of any workspace of interest to inspect how the analyses it demonstrates are set up, or you can clone it if you’d like to run through and potentially adapt the analyses for your own purposes. 

In that spirit, the showcase also includes tutorial workspaces developed by our team that are specifically designed to teach newcomers how to use Terra features with hands-on, step-by-step instructions (currently covering Data Tables, Workflows and Notebooks).

Public workspace summaries are visible to everyone without login. Browsing the full workspace contents requires being logged in (registration is free).

 

5 — The Terra User Guide search box

Once you’re elbow-deep in data and plowing through your analysis, you’re bound to come across one-off questions about very specific topics. For these, your best bet for finding answers fast is to head over to the User Guide and use the search box to pull up the relevant documentation.

Of course, you can also browse the documentation by topic if you’re the type who enjoys reading the phone book. 

Note to younger readers: it used to be a thing to print massive books that listed everyone’s phone numbers and give copies to everyone. There was also a version that listed shops and companies, which you would use to look up plumbers and bakeries. You got a new edition every year, so you’d use the old ones as makeshift booster seats for little kids, or to make papier-mâché. Wild, eh?

 

6 — Terra support (helpdesk and community forum)  

If you get stumped, don’t hesitate to reach out to our friendly support team, either privately through the Contact Us form (find it under Support in the app’s left-side expandable menu) or in the Community Forum where others can chime in with their own tips and perspectives.

 

7 — The Terra blog and newsletter

The Terra blog covers topics ranging from feature updates and events to research spotlights and news about what’s going on in the wider ecosystem. Many of the stories we publish are contributed by guest authors who share their experiences and highlight resources for working effectively with Terra and the cloud data ecosystem.  

With a new blog post coming out every week, it’s a great way to stay abreast of recent developments, discover practical tips and tricks, and get inspired. To get notified when we publish new stories, subscribe to the blog using the form on the blog homepage. Or, if a weekly cadence is a bit much for your taste, subscribe instead to the monthly newsletter, which recaps the month’s content with handy links to read the full stories at your convenience.

 

8 — The Genomics in the Cloud book

Finally, in addition to all these great free resources, you may also be interested in a book written by two of our colleagues, Geraldine Van der Auwera and Brian O’Connor, published by O’Reilly Media in 2020. The book provides a gentle introduction to the field of human genomics as well as practical, hands-on instructions for performing genomics analysis on the cloud with GATK, WDL and Terra. Chapters 11 through 13 in particular provide a thorough introduction to the Terra platform’s key features, culminating in a case study on computational reproducibility of published research using Terra in Chapter 14. 

See the 2020-2021 Genomics in the Cloud book club playlist on YouTube for a detailed preview of the material covered in the book.

You can access the Genomics in the Cloud book through the O’Reilly online learning library (sometimes called Safari) which is available through many academic and public libraries. The book is also available for purchase in print or ebook format from all major bookstores. 

The post 8 great resources for learning to use Terra effectively  appeared first on Terra.

Workflow updates to the COVID-19 workspace: Better viral assembly and phylogenetics with NextStrain
By Robert Majovski | Thu, 16 Apr 2020


In our last blog post, we featured a public workspace containing best-practices workflows for viral genome analysis developed by Dr. Danny Park’s Viral Genomics group and used to process COVID-19 research data. We have been working with the viral genomics team to make additional improvements to the workspace, and I’m excited to tell you about a couple of major updates.

What’s new?

  • Updated WDL for reference-based viral genome assembly
  • Addition of a WDL to run Augur, a NextStrain tool for phylogenetic analysis and visualization

These additions mean that you can now go all the way from sequencing data to an interactive phylogenetic tree in the same workspace. Let’s dive into the specifics for each of these important updates.

New reference-based viral genome assembly workflow: simpler and more efficient 

The viral genomics team previously provided an “assisted de novo” viral assembly pipeline that has been refined over the past decade of use and validated on metagenomic Illumina data from diverse viral taxa, including Lassa, Ebola, Zika, Mumps, Influenza A, HIV, Rabies, Hepatitis A, and several herpes viruses (HHV 1, 2, 3, and 5). It was designed to assemble contigs, scaffold, and polish assemblies for viruses that may exhibit up to 30% nucleotide divergence from available reference databases. This approach is robust for a wide range of viral taxa, but it may be more computationally intensive and complex than necessary for viruses with very limited diversity, such as those involved in single-origin disease outbreaks.

In recent months, the scientific and public health community tackling SARS-CoV-2 genomics has increasingly favored simplified approaches for both data generation and data analysis, many of which are documented by the CDC. Simple align-to-reference approaches for consensus calling (similar to those used for non-diverse genomes, such as the human genome) make analysis more efficient and results easier to interpret. Additionally, the popularity of tiled PCR amplicon-based data generation approaches (such as ARTIC) frequently necessitates specialized filtering steps to remove primer artifacts during analysis (the iVar trimming tool from Scripps is one of the more popular tools for ARTIC+Illumina data).
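The contrast with de novo assembly is easier to see in code. Below is a toy Python sketch of align-to-reference consensus calling: reads are assumed to be already aligned and summarized as per-position base counts, and each position gets the majority base, or `N` when coverage or agreement is too low. The function name, input format and thresholds are hypothetical; this is not how assemble_refbased.wdl is implemented.

```python
from collections import Counter


def call_consensus(pileup, min_depth=10, min_fraction=0.5):
    """Toy majority-rule consensus from per-position base counts.

    pileup[i] is a Counter of the bases observed at reference position i
    (a hypothetical input format; real pipelines derive these counts from
    reads aligned to the reference, e.g. stored in a BAM file).
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # not enough coverage to make a call
            continue
        base, n = counts.most_common(1)[0]
        # Call the top base only if it is a clear majority of reads here
        consensus.append(base if n / depth > min_fraction else "N")
    return "".join(consensus)


pileup = [
    Counter(A=20),       # well covered, unanimous
    Counter(C=3),        # below min_depth
    Counter(G=12, T=8),  # 60% G: a clear majority
    Counter(T=5, A=5),   # 50/50 split: ambiguous
]
print(call_consensus(pileup))  # ANGN
```

Because every read is placed against the same reference, the work reduces to counting and thresholding per position, which is why this style of analysis is cheaper and easier to interpret than assembling contigs from scratch.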

In this update, we are adding the viral genomics team’s reference-based viral assembly workflow (assemble_refbased.wdl), which they’ve updated to reflect these best practices and which is appropriate for any Illumina data generated from SARS-CoV-2. In particular, it has an optional input parameter, a BED file describing any PCR amplicon primers used during data generation (omit it if no such primers were used). The original de novo assembly workflow is still provided in this workspace; although it is not necessary for SARS-CoV-2, it applies to a much broader range of viruses than the reference-based workflow.
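To illustrate what the optional primer BED file describes, here is a minimal Python sketch, assuming a tab-separated BED file in which each line gives the reference name, a 0-based start, an end-exclusive coordinate and a primer name. It parses those intervals and reports which primers an aligned read overlaps, the kind of positional information a primer-trimming tool such as iVar works from. The helper names and the coordinates are made up for illustration; this is not the workflow's code.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def read_primer_bed(lines):
    """Parse BED lines into Primer records, skipping blanks, comments and headers."""
    primers = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Fall back to a coordinate label if the optional name column is absent
        name = fields[3] if len(fields) > 3 else f"{fields[0]}:{fields[1]}-{fields[2]}"
        primers.append(Primer(fields[0], int(fields[1]), int(fields[2]), name))
    return primers


def overlapping_primers(primers, chrom, start, end):
    """Return primers overlapping the half-open interval [start, end) on chrom."""
    return [p for p in primers if p.chrom == chrom and p.start < end and start < p.end]


example_bed = [
    "# made-up coordinates for illustration only",
    "MN908947.3\t100\t125\texample_1_LEFT\t1\t+",
    "MN908947.3\t400\t425\texample_1_RIGHT\t1\t-",
]
primers = read_primer_bed(example_bed)
print([p.name for p in overlapping_primers(primers, "MN908947.3", 110, 300)])
# ['example_1_LEFT']
```

Using half-open intervals, as BED does, means a read starting exactly at a primer's end coordinate is correctly reported as not overlapping it.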

A workflow for phylogenetic analysis with Augur (NextStrain)

Working with the viral genomics team, DSP created a WDL that runs the Augur tool from NextStrain, which we added to the Terra COVID-19 workspace. This allows you to run a phylogenetic analysis on a set of assembled viral genomes (files that are output by the assembly workflow described above) and visualize the resulting tree. The workspace we provide includes a set of publicly available genomes imported from the NCBI SRA repository, but you can import your own data as well.

What are Augur and NextStrain?

NextStrain is a collection of open source tools that help scientists, epidemiologists, and public health officials understand pathogen spread and evolution, especially in outbreak scenarios. Augur is one of those tools, developed for tracking pathogen evolution from sequencing data. With it, epidemiologists can build trees to analyze the evolutionary relationships between viral strains isolated from COVID-19 cases, which helps map the initial emergence and sustained transmission of the virus.

The nextstrain.org portal provides analysis results produced by running Augur on all publicly available SARS-CoV-2/COVID-19 datasets. However, researchers may need to perform “community builds” on defined subsets of data, or provide previews of data before public release. These community builds let them create their own analyses and store the results on GitHub.

Running the Augur workflow in Terra: configuration notes

The workflow that we provide (augur_build_tree.wdl) is configured to run on the collection of assembled FASTA files generated by the viral genome assembly workflow described above (assemble_refbased). However, you can change the configuration to run it on any set of assembled viral genomes in FASTA format.

By default, the workflow uses a set of resources and parameters appropriate for SARS-CoV-2 genomes: the reference FASTA and GenBank (.gb) files are SARS-CoV-2-specific references, and the default auspice_config.json file reflects the metadata we have modeled in the Data tab, curated to include the metadata available from SRA (output from SRA_to_uBAM). If you plan to run the workflow on your own data, make sure to prepare your metadata file according to these directions.
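Before running the workflow on your own genomes, it can help to sanity-check that the metadata lines up with the sequences. The Python sketch below assumes a tab-separated metadata file with a `strain` column whose values match the FASTA header IDs and a `date` column in YYYY-MM-DD format; those column names follow common Nextstrain conventions but are assumptions here, so defer to the workflow's own directions. The function names are hypothetical.

```python
import csv
import io
import re

# Full dates only (YYYY-MM-DD); this simple check does not handle
# partially known dates.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def fasta_ids(fasta_text):
    """Return sequence IDs (first word after '>') from multi-FASTA text."""
    return [
        (line[1:].split() or [""])[0]
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


def check_metadata(fasta_text, metadata_tsv):
    """List problems linking FASTA records to metadata rows.

    Assumes (hypothetically) a tab-separated file with a 'strain' column
    matching FASTA IDs and a 'date' column. An empty list means no
    problems were found.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    if not reader.fieldnames or "strain" not in reader.fieldnames:
        return ["metadata has no 'strain' column"]
    by_strain = {row["strain"]: row for row in reader}
    problems = []
    for seq_id in fasta_ids(fasta_text):
        row = by_strain.get(seq_id)
        if row is None:
            problems.append(f"{seq_id}: no metadata row")
        elif not DATE_RE.match(row.get("date") or ""):
            problems.append(f"{seq_id}: date {row.get('date')!r} is not YYYY-MM-DD")
    return problems
```

For example, a FASTA file containing `>sampleA` and `>sampleB`, paired with metadata that has a row only for `sampleA`, would yield `['sampleB: no metadata row']`.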

We hope that these new resources will prove useful to you; as always, we welcome your feedback and suggestions for improving them.

The post Workflow updates to the COVID-19 workspace: Better viral assembly and phylogenetics with NextStrain appeared first on Terra.

Broad scientists release COVID-19 best-practices workflows and analysis tools in Terra
By Robert Majovski | Tue, 17 Mar 2020


Like you, we are adapting to a different way of living and working as the 2019 novel coronavirus (COVID-19) spreads, claiming many lives, sickening many more, and upending daily life around the world. We are heartened to see that the scientific community is mobilizing (in labs and remotely) to find and exploit any potential avenue for tracking, slowing, or stopping this virus.

As our scientific collaborators at the Broad and around the world knuckle down to analyze the data that is starting to stream in, the Terra team is prioritizing work to support their efforts. In collaboration with Dr. Danny Park, Group Leader for Viral Computational Genomics at the Broad Institute of MIT and Harvard, and his colleagues, we are excited to release a first set of resources for COVID-19 analysis in Terra.

Best practices for analyzing COVID-19 genomic data

The COVID-19 workspace contains best-practices workflows for processing and analyzing viral genomic data that Dr. Park has been developing and teaching to public health lab scientists for the past six years. Dr. Park and colleagues will use these workflows in Terra to process COVID-19 research data generated internally at the Broad. Because the same workflows are available to everyone in Terra, anyone can use them to analyze the publicly available data as well as any data they generate themselves.

More specifically, the COVID-19 workspace contains:

  • Raw COVID-19 sequencing data (.fastq and .BAM) available from the NCBI Sequence Read Archive (SRA), which will be regularly updated as more data becomes available
  • Workflows for genome assembly, quality control, metagenomic classification, and aggregate statistics
  • A Jupyter Notebook that produces quality control plots from the data output by the workflows

We expect that this workspace will be most useful to scientists in public health labs and departments of health who need robust, best-practices workflows for analyzing COVID-19 genomics data from their jurisdiction. We do of course encourage anyone interested to check out the workspace, run their own analyses with these tools and suggest improvements.

The COVID-19 workspace is a growing and evolving resource

Today is the release of the first version of this COVID-19 workspace, but this is only the beginning. As more data and tools become available, we will continue to develop the workspace and expand its usefulness to the community. Here are some of our priorities for the immediate future:

  • Add COVID-19 sequences generated at the Broad Institute and from the SRA as they become available to enable more robust and comprehensive analysis
  • Incorporate additional tools for phylogenetic analysis that will make use of a growing library of COVID-19 sequences
  • Include modules that will help you prepare your assembled sequences for submission to sequence repositories like GenBank, SRA, and GISAID, and integrate them more easily with community dashboards like Nextstrain
  • Solicit and incorporate feedback, contributions, tools, and data from the community

 

Update: Read about the next version of this COVID-19 workspace here.

 

Additional help for learning how to use COVID-19 resources in Terra

If you are a researcher or public health lab scientist who is coming to Terra for the first time, we recommend you start by reading the COVID-19 article in the Terra support center, which will orient you to relevant workspaces and tools, as well as resources in the support center for learning the basics of Terra. We will continue to update that article as we develop new learning materials around this workspace and any others as they are updated or released.

The post Broad scientists release COVID-19 best-practices workflows and analysis tools in Terra appeared first on Terra.
