Adrian Sharma, Author at Terra
https://terra.bio/author/asharma/

Community-maintained Notebook environments in Terra
https://terra.bio/community-maintained-notebook-environments-in-terra/
Thu, 01 Oct 2020 14:38:14 +0000

Ever since we introduced Jupyter Notebooks in Terra, we’ve sought to provide default environments pre-loaded with software packages that are likely to interest you, to minimize the amount of setup necessary to get your work going. However, we’ve found that there’s a huge amount of variation in the needs and preferences of researchers, from the selection of packages themselves to the frequency at which people want to adopt new version updates. There’s clearly not a one-size-fits-all solution to that challenge, so we’ve developed a few complementary approaches: offering community-maintained environments, keeping legacy versions available, and providing options for building and sharing your own custom environments. Let’s take a closer look at what the first two entail, with a focus on the first; we’ll discuss custom environments in a follow-up blog post.

Community-maintained environments: Hail, Bioconductor, and Pegasus

We feel strongly that our role in this context is not to be the arbiters of what tools researchers should use, but to listen to what researchers say they need, on one hand, and on the other, to empower bioinformatics tool developers to make their tools available to the research community.

As a starting point, we try to identify software toolkits that are widely used within a particular research domain and can be provided in a dedicated Notebook environment in Terra. For example, Hail is a Python-based package for scalable data analysis specialized in genomics (e.g. genome-wide association studies); Bioconductor is a large R-based collection of bioinformatics tools; and Pegasus is a tool for single-cell and single-nucleus transcriptomics that can be used as a command-line tool or as a Python package.

To ensure that the pre-built environment for each toolkit will meet all of its requirements and provide a good user experience, we engage with the project maintainers to design the environment. For example, in the case of Bioconductor, where all of the packages cumulatively amount to many gigabytes (GB) of data, the pre-built environment does not include all of the project packages, but it contains everything you need to get started and install the packages you want very quickly.

But we don’t stop there. To make sure that the pre-built environment will stay up to date with the latest project developments, we also enable the project maintainers to update the environment themselves. This is important because many of you rely on having access to the very latest algorithm improvements and bug fixes to make progress in your work. So when the Hail team, for example, develops a bug fix in response to a bug report, they can submit an updated version of their Hail environment with the bug fix to the Terra team, who can then take in the update with minimal effort. That way, updates are effectively no longer gated on the Terra team’s availability, which speeds up the process by an order of magnitude (e.g. only weeks instead of months between updates).

This collaborative approach allows us to offer pre-built environments that are likely to be useful to many of you, and it offers developers a platform for making their tools readily available to you without requiring you to do any installation or configuration.

On that note, if you or someone you know is a developer of a widely used bioinformatics software package that can be called from Jupyter Notebooks, we’d love to discuss options for making it available in Terra! Contact us at info@terra.bio with information about use cases, important datasets the tool(s) could be applied to, and the estimated size of the tool’s user base.

Availability of legacy versions

With all our enthusiasm for bleeding-edge development, we do recognize that you don’t actually always want to use the very latest version of a software package. If you’re already deep into an analysis when a major update happens, you probably won’t want to risk breaking perfectly good code with a library update that you don’t really need. Alternatively, you might be trying to reproduce some work that you or a collaborator did some time ago with an older version of the tools.

Accordingly, we are now providing a range of versions for each pre-built environment, to accommodate the need for continued access to older versions, and you can see the changelog for all versions in GitHub (e.g. Hail, Bioconductor, Pegasus). On this point, we are working on ways to further improve the presentation and range of these options, and we are open to suggestions, so don’t hesitate to let us know what you’d like to see by leaving a comment below or posting in the Feature and Documentation Requests section of the community forum.
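
If you are reproducing older work, it can help to record which package versions a notebook actually ran with, so you know which environment version to pick later. Here is a minimal sketch, assuming a Python 3.8+ kernel; the helper name `installed_versions` and the package names are examples only and are not part of Terra:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example names only; replace with the packages your analysis imports.
    for name, version in installed_versions(["hail", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Saving this output next to your results gives you a simple record to compare against the changelog when you revisit the analysis.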

Try out the community-maintained environments today

How would you like to take one of these environments for a spin? If you already have an account set up, it’ll only take a few minutes; just hop in and follow the instructions below. (If you don’t already have an account, follow the instructions for getting started first: sign up, then set up billing with free credits from GCP.)

Let’s check out the Pegasus environment, which is maintained by Bo Li’s group at Massachusetts General Hospital as part of their Cumulus framework for single-cell and single-nucleus transcriptomics. First, go to the Cumulus workspace and clone it. Then, in your clone, open the “Cloud environment” control panel (top right corner, gear icon), and expand the “Application configuration” dropdown menu to display the pre-built environments, as shown in the screenshot below.

Note that we are still working on refining how the pre-built environments are displayed and categorized within this menu, so you may see something slightly different from the screenshot depending on when you follow these instructions.

[Screenshot: the “Application configuration” dropdown menu showing the pre-built environments]

Select the Pegasus environment, then click the “Next” button at the bottom of the panel to have Terra create your new environment. This will take a couple of minutes, during which Terra talks to Google Cloud to provision a virtual machine and set it up for you using a container that holds the Pegasus software and its dependencies.

Once your cloud environment is ready (the widget will say “RUNNING”), head over to the Notebooks tab of your Cumulus workspace clone and open one of the Pegasus tutorial notebooks. You should now be able to run all the code in the notebook without any additional steps. That’s it! That’s all it takes.

See the Notebooks Quickstart video if you’re not familiar with Jupyter Notebooks in Terra, and the Li lab’s Pegasus tutorial for more specific information about how to use the analysis package itself. Note that the Pegasus video shows an older version of the Terra cloud environments interface, which did not yet support community-maintained environments. Hopefully, the comparison illustrates why we are excited about this new functionality!

The steps for selecting other pre-built environments are the same starting from the “Cloud environment” control panel, which appears in the top right corner whenever you are in an open workspace. Note that you can get a complete list of the contents of any pre-built cloud environment by clicking the “What’s installed on this environment” text that appears below the dropdown menu when you have an environment selected.

Let us know what you think about this functionality in the comments below, and don’t hesitate to reach out to the Terra support team (via the helpdesk or the forum) if you run into any trouble.

The post Community-maintained Notebook environments in Terra appeared first on Terra.

Update to Jupyter Notebook environment in Terra: Persistent Disk storage now available
https://terra.bio/update-to-jupyter-notebook-environment-in-terra-persistent-disk-storage-now-available/
Mon, 21 Sep 2020 12:44:10 +0000

This week, we released one of those changes that looks small on the face of it but is actually a really big deal. Specifically, we upgraded the cloud environment (previously called “runtime”) that we provide in Terra for running Jupyter Notebooks to support persistent disk storage.

Until now, one of the limitations of our Notebook environment was that you had to manually save any outputs you cared about to a Google bucket (or other location of your choosing). For technical reasons, the storage space associated with the notebook was not guaranteed to stick around when you weren’t actively using it, and if you made certain configuration changes to your environment, the storage space was wiped and recreated from scratch.

Going forward, you’ll have the option to use what’s called a “detachable persistent disk” to store data that you plan to use as well as the outputs of any analyses you run in your notebooks. You can think of this as a sort of virtual USB thumb drive; you plug it in when you want to do some work, then detach it when you’re done, and keep it in your pocket until next time. (Just make sure you don’t leave it there when you do your laundry.)

In practice, the plugging in and detaching will be done automatically for you. When you reconfigure or delete your environment, the system will automatically detach and reattach your persistent disk, or save it for later if you don’t create a new environment right away.

You get one persistent disk per billing project, and you’ll be able to use that same persistent disk with notebooks in any of your workspaces within that billing project. You can see — and customize — the size of your persistent disk in the environment configuration panel, as shown in the screenshot below.

[Screenshot: the persistent disk size setting in the environment configuration panel]

Note that the persistent disk is mounted to the directory /home/jupyter-user/notebooks, so please make sure that’s where you save your data and outputs! A benefit of this setup is that any software packages you install through the notebook will be saved there as well, reducing the amount of reinstallation you have to do when you change your environment configuration.
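
As an illustration, a notebook cell could route its outputs to that directory as follows. This is a minimal sketch: the mount path comes from the note above, while the helper names `persistent_path` and `save_text`, the `results` subfolder, and the file name are hypothetical.

```python
from pathlib import Path

# Directory where the persistent disk is mounted, per the note above.
PERSISTENT_ROOT = Path("/home/jupyter-user/notebooks")


def persistent_path(relative, root=PERSISTENT_ROOT):
    """Return a path under the persistent disk, creating parent folders.

    Rejects paths that would resolve outside of `root`, since anything
    written elsewhere is not kept on the persistent disk.
    """
    root = Path(root).resolve()
    target = (root / relative).resolve()
    if root != target and root not in target.parents:
        raise ValueError(f"{relative!r} is outside {root}")
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


def save_text(relative, text, root=PERSISTENT_ROOT):
    """Write `text` to a file under the persistent disk and return its path."""
    path = persistent_path(relative, root)
    path.write_text(text)
    return path
```

In a Terra notebook, `save_text("results/summary.txt", "done")` would write to /home/jupyter-user/notebooks/results/summary.txt. Passing a different `root` lets you try the helper outside Terra.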

Interested in learning more? We’ve created this step-by-step guide to walk you through how and when to use this feature when launching a notebook. Please note this feature only supports cloud environments that use the “standard VM” option and will not apply to Spark or Hail application configurations.

We’d love to hear your thoughts on this new capability in the comments below! And as always, don’t hesitate to reach out to the Terra helpdesk if you run into any issues.

The post Update to Jupyter Notebook environment in Terra : Persistent Disk storage now available appeared first on Terra.

Faster creation of Notebook environments with Google Compute Engine VMs
https://terra.bio/faster-creation-of-notebook-environments-with-google-compute-engine-vms/
Fri, 17 Jul 2020 13:16:16 +0000

Summary: We know that the 4 minutes it takes to create a cloud environment to perform a Jupyter Notebook analysis can feel like a long time. To reduce this time and save you cost, Terra has added support for using standard Google Compute Engine Virtual Machines (GCE VMs) as the underlying compute/runtime.

Terra researchers frequently use Jupyter Notebooks for genomic analyses, but may not often think about the virtual machine (VM) used for the underlying Notebook compute. However, the VM can be a very important factor in not only how fast you can create a Notebook environment but also the types of applications you can use in Terra. When Terra was originally created, we chose a Spark VM for the underlying Notebook compute so you could perform GWAS analysis with the Hail Python library. But as Terra has expanded, researchers now use Jupyter Notebooks for multiple use cases beyond Hail, and others require new tools like RStudio and Galaxy which don’t run as efficiently on Spark. That’s why we are thrilled to introduce support for GCE VMs, a first step in faster cloud environment creation and the integration of new Terra applications.

There are multiple advantages to using GCE VMs. Compared to the current Spark VM in Terra, GCE VMs are created in about half the time because they require fewer installation steps, saving you precious time for your analyses. They also reduce the costs for running and paused VMs by $0.01 per CPU. But perhaps the best part of the GCE VMs is that they provide support for detachable persistent disks, a more durable storage solution that will be required to support new applications like RStudio. Persistent disks will also allow you to keep your analysis environment setup and your output files even after you delete your VM (you can read more about them in an upcoming blog post).

Based on these benefits, we are now providing GCE VMs as a “Standard” option for your Jupyter Notebook’s underlying compute while continuing to support Hail and Spark. When you create a new cloud environment for Jupyter in Terra, you will have the choice of either a “Standard VM” or a “Spark” VM/cluster. There are two ways to make the selection:

  1. You can choose the appropriate environment from the “Runtime Configuration” window’s dropdown menu. Depending on the environment you choose, Terra will recommend which VM type to select. For example, if you choose the Default environment, Terra will automatically select “Standard VM”. If you select the Hail environment, Terra will automatically select Spark; you can then choose to create either a single “Spark master node”, or a Spark cluster with worker nodes.
  2. You can choose the VM by simply toggling the “Runtime type” dropdown; select either Standard or Spark VMs.
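
The recommendation rule in option 1 can be summarized in a few lines. This is only an illustration of the behavior described above, not Terra’s actual code; the function name and the lowercase environment keys are hypothetical, and only the Default and Hail cases are stated in this post.

```python
# Recommendations stated in this post: Default -> Standard VM, Hail -> Spark.
RECOMMENDED_RUNTIME = {
    "default": "Standard VM",
    "hail": "Spark",
}


def recommend_runtime(environment):
    """Return the recommended runtime type, or None if this post does not say."""
    return RECOMMENDED_RUNTIME.get(environment.strip().lower())
```

Either way, the “Runtime type” dropdown in option 2 lets you override the recommendation.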

 

[Screenshot: the “Runtime Configuration” window]

 

When you choose the Standard VM option, you will find that cloud environments for Jupyter are created in ~2 minutes, as opposed to the ~4 minutes required to create a Spark VM/cluster (although this time sometimes fluctuates depending on Google).

We’re currently working on providing support for detachable persistent disks as the next step to improve the analysis experience on Terra.

The post Faster creation of Notebook environments with Google Compute Engine VMs appeared first on Terra.
