Events Archives - Terra
https://terra.bio/category/events/
Science at Scale

Terra 101 workshop: Learn how to use Terra quickly and efficiently
https://terra.bio/terra-101-workshop-learn-how-to-use-terra-quickly-and-efficiently/
Thu, 05 Jan 2023 19:27:58 +0000

Register for a free hands-on virtual workshop to get started with Terra and level up your skills.

What better way to kick off a new year than to learn some new skills that deliver immediate benefits?

We’re offering a hands-on virtual workshop for those who want to learn how to use Terra quickly and efficiently. It’s open to everyone and is completely free (we’ll take care of the compute costs for the exercises), so make sure to register today.

Specifically, you’ll learn about:

  • Terra architecture as it relates to cloud-based data sets, tools, and computing resources
  • How data is organized in Terra
  • How to run a WDL workflow to automate analysis
  • How to launch a Cloud Environment from your Terra workspace to run interactive analysis tools like Jupyter Notebooks, RStudio, and Galaxy

 

Save the date(s)!

The workshop will consist of two two-hour sessions scheduled on consecutive days:

January 31st, 2023 at 10:00am – 12:00pm ET
February 1st, 2023 at 10:00am – 12:00pm ET

Be sure to register today so you don’t miss this opportunity.

We look forward to seeing you! If you have any questions, please don’t hesitate to contact us at dsp-workshops@broadinstitute.org.

 

 

What’s next for genomics? Plug into GA4GH to find out
https://terra.bio/whats-next-for-genomics-plug-into-ga4gh-to-find-out/
Fri, 30 Sep 2022 14:30:26 +0000

A short primer on what the Global Alliance for Genomics and Health is and why it matters, with links to recordings from the 2022 Plenary Meeting.

Sequencing technology keeps getting better, faster, more productive — but that’s not the only thing that shapes the big picture of where we’re headed as a field. If you want to understand where genomics is going, you should pay attention to the Global Alliance for Genomics and Health, or GA4GH.

GA4GH is an international organization composed largely of contributors from member institutions in healthcare, research, patient advocacy, and information technology, seeking to enable responsible genomic data sharing within a human rights framework.

The boring way to describe what GA4GH does is to say it develops standards and policies — ranging from technology standards like file formats and application programming interfaces (APIs), which aim to enable interoperability and broad access to data and tools among the global research community, to policy frameworks like patient consent clauses and data privacy policies. 

I prefer to think of it this way: GA4GH contributors are effectively building the scaffolding for what genomics will deliver in practice, at scale, over the next decade. 

The GA4GH standard development process involves collaboration with implementers, i.e. groups that apply the GA4GH standards in practice, typically in the context of driver projects — which have the advantage of providing real-life use cases. (As it turns out, you develop better solutions when you’re actually working on real problems.)

For technology standards, implementation means building software tools and operating services that follow the rules laid out by the relevant standards. 

For example, the CRAM and VCF file formats are widely used bioinformatics standards stewarded by GA4GH that specify how to encode sequencing reads and variant calls, respectively. There is a software library called htsjdk that implements both of these standards in the Java programming language, meaning that it includes code that is capable of reading and writing files that are encoded according to those standards. Researchers can then use genome analysis tools like GATK and Picard, which include the htsjdk library, to read and write CRAM and VCF files as part of their analysis work. Tadaa. (For fans of C, substitute htslib and samtools in the library/tool mentions.)
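
To make "implementing a standard" a little more concrete, here is a minimal sketch in Python of what reading a single VCF data line involves. It uses only the standard library and is not how htsjdk or htslib are written; the `parse_vcf_line` helper and the example record are made up for illustration, and real files are better handled with one of the libraries mentioned above.

```python
from typing import NamedTuple

# The eight fixed, tab-separated columns that the VCF specification
# requires on every data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


class VariantRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alts: list
    info: dict


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse one VCF data line (hypothetical helper, illustration only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value if value else True
    # ALT may list several comma-separated alternate alleles.
    return VariantRecord(chrom, int(pos), vid, ref, alt.split(","), info_dict)


# Made-up example record, not taken from any real dataset.
example = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=100;DB"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.alts, record.info)
```

Even this toy version has to make decisions about multiple alternate alleles and flag-style INFO entries, which is exactly the kind of detail a shared specification pins down so that every tool reads the same file the same way.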

In addition to these now-classic (if imperfect) workhorse formats, GA4GH has been driving the development of other, newer knowledge representation standards that you may not yet be aware of, but will likely transform the way many of us work. Take the Variation Representation Specification (VRS, pronounced “verse”), which among other things makes it possible to capture the complex information that underlies variant interpretation in computable form, a key feature for solving variant interpretation bottlenecks. Right now, VRS is a fairly niche product, but within a few years I expect we’ll see it being used across a variety of research and clinical diagnostic platforms. 

 

Computable standards for alleviating variant interpretation bottlenecks. From “Genomic Knowledge Standards Advancements” by Larry Babb, presented at the 10th Plenary Meeting of the GA4GH (see Day 1 recording). 

 

Pro-tip: you can check out the Python implementation of VRS on GitHub, and there’s even a Terra workspace hosting Jupyter notebooks that demonstrate how to use it in practice. 

Speaking of platforms, another major axis of GA4GH standard development is platform-level interoperability, i.e. infrastructure standards that enable platforms like Terra to talk to each other. 

I’ve written before about the big picture of interoperability for open ecosystems, the example of the AnVIL project, and how Terra uses the DRS standard for data interoperability specifically. The ultimate goal here is to make it possible for researchers to do things like combining data from separate repositories into powerful federated analyses without having to move any of it around. 

Excitingly, that dream is starting to materialize! We are now at the stage where data federation is effectively possible — for a limited set of datasets and platforms, with some clunkiness involved. As the work continues, you can expect the scope of what’s possible to include more datasets, with a smoother experience as the handoff between platforms gets ironed out. 

There is a lot more to say about the scope and impact of GA4GH work, but I’ll have to leave that for another time. 

The bottom line is, if any of this is new to you, now is a great time to start getting caught up.

The organization held its annual plenary meeting over two days last week, and the full recordings for both Day 1 and Day 2 are available on YouTube, annotated with timestamps for specific sessions and presentations (see the expanded video descriptions). You can also find links to the slide decks in the agenda. 

As a new member of the organization (I joined the Large-Scale Genomics Work Stream in June), I found the lineup of talks struck a great balance between showcasing progress made so far, outlining upcoming challenges and discussing concrete solutions. I hope you will find these resources useful too — and consider joining the effort!

 


Additional resources 

For a more comprehensive tour of the Global Alliance’s scope, vision, and outputs, read the “Perspective” paper published late last year in Cell Genomics by Heidi Rehm and colleagues:

GA4GH: International policies and standards for data sharing across genomic research and healthcare (2021) Cell Genomics, Volume 1, Issue 2, https://doi.org/10.1016/j.xgen.2021.100029 

Join us July 28 for a webinar on using the cloud to teach genomics
https://terra.bio/join-us-july-28-for-a-webinar-on-using-the-cloud-to-teach-genomics/
Thu, 23 Jun 2022 15:05:25 +0000

Learn how to use cloud-based data and tools to teach genomic concepts, in a webinar that will introduce key resources and demonstrate how to use them through concrete examples.

The migration of biomedical data and computational tools to the cloud is opening new opportunities for research-grade resources to be accessible by a wide range of audiences, including students, regardless of their institute’s computational infrastructure. Educators can now in principle leverage these exciting cloud data and tools to make basic genomics concepts and applications more tangible to students. 

In practice, using cloud-based resources for teaching purposes requires a basic understanding of how the cloud products and platforms work together, and how to make effective use of relevant resources depending on your teaching objectives.

That is why we developed a webinar that aims to support educators by demonstrating how to find and use cloud-based tools and open datasets to build hands-on exercises that would fit into an introductory genomics curriculum.

 

AnVIL in the Classroom:

Cloud-scale educational resources for modern genomics

July 28, 2022 at 1:00 PM (EDT)

 

In this webinar, we’ll introduce key resources that are publicly available as part of the NHGRI Analysis Visualization and Informatics Lab-space (AnVIL), and we’ll demonstrate how to use them through concrete examples that touch on relevant genomic concepts, such as genome sequencing and the identification and analysis of genomic variants. 

Our goal is to equip you with a versatile toolbox and a set of teaching examples that you can use as templates for your own courses about genomics, or extend to communicate other scientific concepts and use cases.

We hope you will join us for the live webinar on July 28, or catch the recording afterward if you can’t make that particular date and time. 

 


 

This webinar is organized by the American Society of Human Genetics (ASHG) as part of their e-learning program. The live event will take place on Zoom and will be available to everyone (including non-members) through a free registration on the ASHG website. A recording of the webinar will be available after the live event.

 

 

Join us at the BOSC CollaborationFest July 15-16
https://terra.bio/join-us-at-the-bosc-collaborationfest-july-15-16/
Thu, 19 May 2022 17:32:12 +0000

Are you interested in learning new bioinformatics skills and contributing to open source projects in an inclusive and collaborative environment?

Are you interested in learning new bioinformatics skills and contributing to open source projects in an inclusive and collaborative environment? 

Then you should consider joining CollaborationFest (CoFest for short): a free, two-day hybrid collaborative event that takes place right after the Bioinformatics Open Source Conference (BOSC, July 13-14). 

CoFest is an opportunity to work together with other bioinformatics enthusiasts, from complete newcomers to grizzled veterans, to develop useful resources for the community. Importantly, it’s not just about coding, so you don’t have to be a programmer to participate; CoFest projects can also focus on documentation, training materials, and discussing challenging analysis problems and use cases. So everyone is welcome to contribute a different perspective and skillset.  

Personally, I’ve attended several previous CoFests and always had a great time: I met cool people, learned new skills (one year I learned how to run Nextflow in a Jupyter Notebook) and even made some modest contributions to open source bioinformatics. 10 out of 10 would recommend.

 

Sign up for CoFest today! It’s free!

This year’s CoFest is being planned as a virtual-first hybrid event, meaning the core planning is geared toward welcoming remote participants. There will also be an in-person gathering in Madison, Wisconsin (USA) if enough people express interest. 

So sign up for CoFest today by adding your name to this spreadsheet, and let the organizers know whether you’re interested in joining remotely or in person.

Projects will be listed in this deck as people start proposing them (which typically picks up steam as we get closer to the event). If you’d like to join someone else’s project, you can add yourself under “Interested participants” on the slide for any project you find interesting, or add a “contributor” slide to make yourself available for recruitment to a team. If you’d like to propose a project, feel free to add a “project” slide with a summary of your idea and what you hope to achieve. 

If you have any questions, you can reach out to CoFest organizer Thomas Schlapp by email at tschlapp@broadinstitute.org or on the OBF-BOSC community Slack.

 

Consider also attending the Bioinformatics Open Source Conference

If you’re not already familiar with BOSC itself, check out my blog post from last year that explains why it’s the absolute best annual bioinformatics conference. 

This year it is once again being run as a “Community of Special Interest” track, or COSI, as part of the ISMB/ECCB meeting of the International Society for Computational Biology (ISCB). The ISMB 2022 meeting is a hybrid event, with the in-person component located in Madison, Wisconsin (USA), running July 10-14. BOSC itself runs July 13-14.

Several of us Terrans will be there in person — teaching a Terra workshop, giving a talk about deciphering WDL workflows, hanging out in Birds of a Feather sessions and of course, participating in CoFest!

Calls for submissions are almost all closed (late poster abstracts are due today, 19 May) but there’s still time for you to register to attend the conference, either virtually or in person. Join us!

 

Register for ASHG’s January 2022 interactive workshops today
https://terra.bio/register-for-ashg-interactive-workshops/
Tue, 07 Dec 2021 18:37:22 +0000

The ASHG January 2022 Interactive Workshops are now open for registration.

This year, the American Society of Human Genetics (ASHG) decided to hold the Interactive Workshops separately from their October 2021 annual meeting. This means that even if you weren’t able to attend the conference a few months ago, there is still time to participate in the workshops, and you don’t even have to be a member of ASHG. 

In fact, the January 2022 workshops are now open for registration!

There is exciting content lined up – you are getting not one, but TWO AnVIL-based workshops. These workshops will give you the opportunity to learn to run cutting-edge analyses using popular bioinformatics tools in a cloud-based environment that provides unprecedented access to data, tools, and computing capabilities. Check out the workshop descriptions below to get a taste of what you can expect: 

Structural variant discovery from long-read sequencing data on the cloud with Galaxy in Terra

Date/Time: January 19, 2022 at 12:00 pm (EST)

In this session, you will explore an end-to-end structural variant discovery analysis using Galaxy, a popular bioinformatics application that provides access to data analysis tools through a user-friendly graphical interface. The instructors will be using Galaxy in Terra to provide live demonstrations. By attending this workshop, you will learn how to:

  • Bring data into a project workspace in Terra
  • Combine data (your own or controlled access) with an open-access dataset
  • Launch a Galaxy instance in Terra and run a complete workflow to identify SVs
  • Visualize results and identify potentially pathogenic variants

 

Reproducible Analysis of Human Pangenome Data using the AnVIL

Date/Time: January 26, 2022 at 12:00 pm (EST)

This session will explore and demonstrate open-access data from the Human Pangenome Reference Consortium (HPRC), an NHGRI-funded initiative to create a more diverse and comprehensive reference human pangenome. The instructors will walk through how to access these data and workflows in the AnVIL. The participants of this workshop will learn how to:

  • Access and explore Human Pangenome Data hosted by AnVIL
  • Search for bioinformatics workflows in Dockstore and export them to a Terra workspace
  • Configure and launch a Docker-based WDL workflow to conduct a parallel analysis

 

Register today

This is a great opportunity to level up your skills and tackle some interesting analyses at the same time, so we hope you’ll take advantage of it. ASHG workshops are open to anyone interested in learning more about accessing and analyzing data in the cloud; programming experience is not required.

Check out the workshop information page to learn more about the sessions. Note that ASHG offers discounted prices for students and trainees.

Keep in mind that if the scheduled dates and times don’t work for you, the workshop recordings and all related materials will remain available on demand for an extended period after the initial live session, so you can choose to work through them on your own at a time that is more convenient for you.

If you can’t wait until the January sessions, you can already work through the first set of workshops that took place in September and included a workshop on the BRAIN Initiative’s NeMO resources (learn more about NeMO on the Terra blog).

Grow your analysis skills with the ASHG Interactive Workshops
https://terra.bio/grow-your-analysis-skills-with-the-ashg-interactive-workshops/
Fri, 29 Oct 2021 13:30:05 +0000

The ASHG Interactive Workshops have always been a great way to learn new analysis skills in a hands-on way. This year, the workshops are available as ticketed events separate from the conference itself.

Another great annual meeting of the American Society of Human Genetics (ASHG) concluded last week, and as always it’s been both thrilling and a little overwhelming to catch up on the latest work across the field. 

One important thing to note is that this year the conference organizers decided to hold their traditional Interactive Workshops separately from the conference itself, in two sessions: one was held before the conference, in September, and the second is scheduled for January 2022. 

Crucially, they’ve made these workshops independently ticketed events (with very low ticket prices), so unlike in previous years, you don’t have to register for the full conference to participate in the workshops. And of course, thanks to the online nature of the delivery, the materials will remain available on demand for an extended period after the initial live session, so even if you can’t make the scheduled time, you can still work through them on your own time at a later date. 

The content lined up for these workshops is very exciting, and I’m not just saying that because they include one about the BRAIN Initiative’s NeMO resources (which I just wrote about last week) in the September session, and two AnVIL-based workshops in the upcoming January session. The whole lineup is well worth your time, so please do check it out.

Regarding the AnVIL-based workshops specifically, we’ll share more details as we get closer to the events, but you can already save these dates: 

Jan 19 – Structural variant discovery from long-read sequencing data on the cloud with Galaxy in Terra

Jan 26 – Reproducible Analysis of Human Pangenome Data using the AnVIL

As a teaser, I can already tell you that the structural variation workshop will be jointly organized and delivered by the AnVIL/Terra teams at Johns Hopkins University and the Broad Institute. The instructors will use live demonstrations and do-it-yourself exercises that will guide participants through an end-to-end structural variant identification journey using Galaxy in Terra. 

And if you’d like to get a jump start on the action, check out this blog post to get started with Galaxy on Terra today!

Flocking toward computational reproducibility: Highlights from our Birds of a Feather session at BOSC 2021
https://terra.bio/flocking-toward-computational-reproducibility-highlights-from-our-birds-of-a-feather-session-at-bosc-2021/
Tue, 24 Aug 2021 19:17:09 +0000

This blog shares highlights and key takeaways from the Birds of a Feather session at BOSC 2021. The discussion theme was computational reproducibility of published papers, which drew a great group of BOSC participants and led to some lively discussion. We hope some of the ideas we exchanged will live on beyond the close of the conference.

The 2021 Bioinformatics Open Source Conference (BOSC), held virtually a few weeks ago, delivered a number of fantastic presentations, which you can now enjoy in their fully recorded glory on YouTube. Not included in those recordings, however, are the “Birds of a Feather” sessions, or “BoFs”, which are typically small, informal gatherings where participants have the opportunity to discuss topics of interest in a way that complements the main conference track. In this post, I thought I’d share highlights and key takeaways from the BoF we organized on the theme of computational reproducibility of published papers. It drew a great group of BOSC participants and led to some lively discussion, and I hope some of the ideas we exchanged will live on beyond the close of the conference. 


 

“Think about the last time you wanted to reuse or adapt an analysis method that you read about in a published paper. How did it go? What were the biggest hurdles you ran into? What solutions are you aware of that you think everyone should know about, and what do you wish was available? Let’s talk about problems and solutions, and how we might make progress toward fully executable papers.”
Conference program blurb

 

We all agree that we should build on prior work and avoid reinventing the wheel wherever possible. Ideally, we’d also like to avoid having to spend a ton of time figuring out how a published paper’s authors actually implemented the computational analyses they describe. 

Yet we’ve all been stymied by computational methods sections that are just not complete enough to make it possible to redo the work step by step. For compute-intensive analyses, hardware requirements are often undocumented. Software version numbers and exact command lines are left out. Or, the command line is in there but some optional parameters were omitted despite making an important difference to the analysis. And of course, there’s the dreaded “the analysis was implemented with custom scripts (available on request)”.  
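
As a rough illustration of how little it takes to capture those details at run time, the sketch below runs a command and writes the exact command line and some basic facts about the environment next to the results. It uses only the Python standard library; the `run_with_provenance` helper, the JSON field names, and the output filename are all made up for this example rather than an established convention.

```python
import json
import os
import platform
import shlex
import subprocess
import sys
import tempfile
from datetime import datetime, timezone


def run_with_provenance(command, log_path):
    """Run a command and save what a methods section tends to omit.

    Hypothetical helper for illustration; the field names are arbitrary.
    """
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(command, capture_output=True, text=True)
    record = {
        # The exact command line, including every optional parameter.
        "command": shlex.join(command),
        "exit_code": result.returncode,
        "started_utc": started,
        # Interpreter and operating system details of the machine used.
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    with open(log_path, "w") as handle:
        json.dump(record, handle, indent=2)
    return record


# Stand-in for a real analysis step: ask the current interpreter for its version.
log_path = os.path.join(tempfile.gettempdir(), "provenance.json")
record = run_with_provenance([sys.executable, "--version"], log_path)
print(record["command"])
```

A record like this does not replace a well-written methods section, but it does mean the optional parameters and the machine details are written down by the computer rather than reconstructed from memory months later.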

 

The reader of a published paper (pictured right) typically has an incomplete view of the full body of data, analysis method details, infrastructure configuration, and detailed results used by the author (pictured left).

 

Even granting that a lot of people are starting to pick up “good enough practices” like sharing code on GitHub, there is still typically a big gap between “I can download the code” and “I can get the code to run the same way you ran it in your study”.

It’s 2021. Why are we still dealing with these problems, and how do we get past them? 

 

Incentives, or lack thereof

The first challenge that our group discussion converged on right away was that it can take a lot of effort to help others reproduce your own work, and that there’s a systemic lack of incentives for academic researchers to put in that effort. Even with the best intentions, many (particularly early-career) researchers are under intense pressure to get more work done —more papers published— so it’s difficult for them to prioritize an activity that is not explicitly rewarded in terms of career advancement. 

Digging further into this, we found it useful to distinguish between two types of activities: there’s sharing a tool you developed to perform a particular type of data processing, then there’s documenting how you applied one or more tools in a study to solve a biological question. To be clear, it’s not just a matter of “tool developers vs. end-users” personas, because there’s a lot of overlap there — there are plenty of people who develop computational tools and apply them in their own research — and besides, many so-called “end users” typically generate their own code in the form of scripts that glue together invocations of command-line tools developed by others. That code in itself is an important, if sometimes undervalued and insufficiently shared, piece of the methodological puzzle. 

Overall our little group expressed quite a bit of optimism about how things are going on the side of tool sharing activities; there are certainly plenty of challenges there, but also some great success stories. For example, the Satija Lab was cited as a model for their level of investment in documentation and support to empower the single-cell analysis community to use their Seurat toolkit. Based on some of our own experiences, we recognized that at least sharing the tools we develop can be associated with incentives rewarding community enablement, since for some types of grants, funders do attach importance to community uptake metrics such as the number of times your software has been cited or downloaded. 

The situation felt bleaker at the other end of the spectrum, with regard to the computational reproducibility of methods in published studies. The only metric that seems to be clearly prized on the “output” side of scientific publishing is how many times your work has been cited, not whether anyone can actually reproduce the computational analysis you described (which is not the same thing as replicating/confirming findings). Intuitively, we supposed that papers presenting more easily reproducible analyses should get cited more often by other investigators who were subsequently able to build on the work, but we could not find any obvious support for this in the literature. It would be really interesting to see this studied in a systematic way, if only to be able to motivate paper authors to put in the work as a long-term investment. 

We also discussed whether there is anything we could do as a community to reward investigators who do a great job of making their work more readily reproducible (especially early-career folks). Perhaps through a new award by the Open Bioinformatics Foundation, if sponsors could be found to support such an initiative?  

 

The non-linear nature of the research process

Moving on from getting mildly depressed about how academic incentive structures are all wrong, we reflected on another challenge: the conflict between the reality of how most scientific investigation proceeds — on a winding path, with many branches and failures along the way — and the necessary exercise of rewriting history in order for the paper to present a coherent narrative.

We discussed the idea of doing more to show the messiness and the failed attempts that hide behind what is effectively a redacted sequence of events presented in the eventual paper. Greater transparency would be especially helpful for training students and newcomers to the field, and giving them a more realistic understanding of “how it all really works”. One participant mentioned that their lab shares unredacted notebooks on GitHub that track everything that happens in every project, for full transparency. 

Ultimately though, for the purpose of empowering others to build efficiently on our published work, we agreed it does make sense to compile a “cleaned up” version that explains linearly what to do to reproduce the work — in the narrow sense of recreating the same outputs based on the same inputs, with the same tools. 

What generally stands in for this in an average paper’s methods section is a text description of what was done, perhaps with some supplemental materials providing additional details about the software, links to code on GitHub, and specific command lines if we’re really lucky. Yet as we discussed earlier, that is almost always insufficient, because key details tend to get lost in the process of rewriting the history of the analysis. This is especially the case when multiple contributors were involved, and the person in charge of compiling the materials may not themselves fully understand all the intricacies of every step in the computational journey that they are reporting. 

Here’s a provocative idea: perhaps every author (first? corresponding?) should go through the exercise of redoing the computational analysis their paper describes based only on their own methods section. That should encourage them to provide step-by-step commands and executable assets rather than prose alone, and would go a long way toward ensuring that their methods section is in fact sufficient to reproduce the work in a way that is equivalent to how it was originally done. (Perhaps on a downsampled or truncated dataset if cost or runtime were to be a major obstacle.) 
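
If you go the downsampling route, the subset itself has to be reproducible, otherwise the rerun is not comparing like with like. Here is a minimal sketch in Python, assuming the records fit in memory and using a made-up list of read names in place of real data; the `downsample` helper and the seed value are illustrative only.

```python
import random


def downsample(records, fraction, seed):
    """Return a reproducible random subset, preserving input order.

    Hypothetical helper: the same records, fraction, and seed give the
    same subset when run again with the same Python version.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)  # private generator, global state untouched
    keep = max(1, round(len(records) * fraction))
    chosen = sorted(rng.sample(range(len(records)), keep))
    return [records[i] for i in chosen]


# Made-up read names standing in for a real dataset.
reads = [f"read_{i:04d}" for i in range(1000)]
subset_a = downsample(reads, fraction=0.05, seed=42)
subset_b = downsample(reads, fraction=0.05, seed=42)
print(len(subset_a), subset_a == subset_b)
```

Reporting the seed and the fraction alongside the commands is what lets a reader regenerate the same small test dataset instead of guessing at it.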

So, daydreams aside, assuming we could motivate people to do more to make their computational work reproducible, what are the technical opportunities that could grease the wheels from a practical standpoint?

 

Computational sandboxes in the cloud

We haven’t talked about actual computing infrastructure until now because a lot of the problems we face here are fundamentally human problems — misaligned incentives and gaps in accountability. Yet infrastructure has to be part of the solution because it is the remaining piece of the problem: we often struggle to reproduce computational work because of the heterogeneity of the computing environments that are available to us, and the complexity involved in installing and managing software packages. 

This suggests an important role for cloud-based infrastructure, which enables us to share software tools, code, and data in pre-configured environments where everything just works out of the box. 

 

Clouds out of box

The group brought up several cloud-based platforms like Code Ocean, Gigantum, Binder, and Google Colab, which are all popular solutions for sharing code in cloud environments that provide at least basic computing capabilities for free (typically with paid tiers that offer better performance). This allows you to do things like test-drive software tools and code solutions without installing anything yourself, and share your own tools and code with others. 

In fact, some journals are now working with Code Ocean and others to enable reviewers to actually run code described in manuscripts without having to do any installation legwork. This presents the exciting possibility that evolving peer-review standards, such as requiring code to be made available on such a platform as part of the manuscript submission, could lead to a higher quality of published software tools.

One limitation of the free version of the resources provided by these platforms is that they are backed by fairly minimal virtual machine configurations, which typically can’t handle the scale of the new generation of big-data analysis domains like genomics. They’re great for teaching and for toy examples, but they’re not what we need for full-scale work. 

The other important limitation is that their features tend to be rather generic; they’re not designed to support bioinformatics work, nor to accommodate the needs of an audience of researchers who may have an advanced understanding of complex statistical analyses, but minimal training in using cloud computing infrastructure. 

So what other options are there? We wrapped up the discussion by talking about initiatives like AnVIL that aim specifically to empower the life sciences research community to take full advantage of the power of the cloud, in combination with access to relevant datasets and tools through interfaces tailored to their audience. By supporting the development of platforms like Terra, Dockstore, Gen3, and others, infrastructure projects like AnVIL address the heterogeneity of the hardware and software environments people have access to, and provide a venue for staging full-scale analyses in a way that is shareable and works out of the box for anyone.

Yet there is a lot more to say about how Terra as a computational platform offers enormous potential for supporting and promoting computational reproducibility, particularly for published papers. In the next Terra blog, I’ll lay out some relevant ideas as well as concrete examples of efforts that are already underway to use public workspaces as “the ultimate methods supplement” to make papers effectively executable, and more besides.

The post Flocking toward computational reproducibility: Highlights from our Birds of a Feather session at BOSC 2021 appeared first on Terra.

Join us at the Bioinformatics Open Source Conference, July 29-30
https://terra.bio/join-us-at-the-bioinformatics-open-source-conference-july-29-30/
Tue, 04 May 2021 19:21:26 +0000

The Broad Institute's Data Sciences Platform is sponsoring this year's Bioinformatics Open Source Conference (BOSC 2021), taking place virtually July 29-30. Abstracts are due Thursday, May 6, 2021 (late posters June 3). BOSC is a great way to get involved in the bioinformatics community; we're proud to support this event and we hope to see many of you there virtually.

The post Join us at the Bioinformatics Open Source Conference, July 29-30 appeared first on Terra.

I’m delighted to announce that the Broad Institute’s Data Sciences Platform is sponsoring this year’s Bioinformatics Open Source Conference (BOSC 2021), taking place virtually July 29-30 (abstracts due May 6!), because we believe that fostering a vibrant, inclusive open-source bioinformatics community is an essential piece of our mission.

If you’re not familiar with BOSC, it’s a really neat community-driven conference that brings together a wide range of people in the bioinformatics space, including tool and pipeline developers, bioinformatics core staff, educators, and of course researchers in the biological sciences, many of whom routinely wear several if not all of those hats! All these folks have in common a core belief in the value of sharing methods and tools openly and reproducibly, and they care deeply about making the bioinformatics field inclusive and accessible to all.

 


Group photo from BOSC 2019 in Portland, OR

 

The conference is organized by the Open Bioinformatics Foundation, a non-profit, volunteer-run group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. Accordingly, BOSC sessions cover all the main topics you might expect like open science, open data, computational reproducibility, standards and interoperability, as applied to various disciplines of biology, from ecology to biomedicine. Some sessions are more focused on technical aspects — workflow management systems are a staple, of course — while others are dedicated to topics like education, outreach and policy that round out the conference programme.

 

BOSC in practice

Since its launch in 2000, BOSC has been part of the ISMB/ECCB conference of the International Society for Computational Biology (ISCB). In 2018 and 2020, BOSC partnered with the Galaxy Community Conference. This year BOSC has once again paired up with its original mothership and is offered as a “Community of Special Interest” track, or COSI, within ISMB/ECCB 2021.

The mothership conference schedule is a bit of a beast: two days of optional tutorials, then six days of parallel tracks (one of which is BOSC) covering a vast number of topics under the umbrella of computational biology. BOSC itself takes place during the last two days of ISMB/ECCB 2021, with about 4 hours of scheduled talks each day plus time reserved for poster sessions and interest-based gatherings known as “Birds of a Feather” (BoF). On a personal note, I really enjoy BoFs because they’re typically very informal, loosely planned (some are crowdsourced “at runtime”), and a great way to meet people in the community, exchange ideas and nucleate collaborations. 

In that same spirit, BOSC is followed by a two-day “Collaboration Fest”, or CoFest, which consists of working together on projects that will benefit the community. The projects can range from discussing specifications and updating documentation, all the way to actively hacking code, so there are plenty of ways for everyone to participate, whether you’re new to the field or a grizzled veteran.

 

Register and submit your abstract today

Overall, BOSC is a really great way to get involved in the bioinformatics community, whether you’re just getting started, looking to learn new skills, update your libraries, get help on a project, share your own work, or a combination of the above. We’re proud to support this event and we hope to see many of you there virtually.

To register, visit the ISMB/ECCB conference registration page; you will have the opportunity to select BOSC as one of the COSI tracks that you plan to attend. Registration rates are based on your country of origin and your career level, and the Open Bioinformatics Foundation provides financial assistance for BOSC presenters (of both talks and posters) who need it in order to offset the cost of ISMB/ECCB registration. The abstract submission form includes an option to request this assistance; rest assured that information will not be shared with reviewers. 

Abstracts are due this Thursday, May 6 (by 11:59pm EDT), so don’t wait if you want a shot at giving a talk! Or if that’s just too little time, you can plan to submit a poster abstract in the “late posters” category, for which the deadline is June 3. And with that, I’m off to finish mine…

Bootstrap your brain (research) by taking the BICCN Omics Workshop home
https://terra.bio/bootstrap-your-brain-biccn-omics-workshop/
Thu, 25 Feb 2021 20:30:22 +0000

We teamed up with the BRAIN Initiative Cell Census Network consortium to put together a hands-on workshop focused on helping researchers get started with key resources including the Neuroscience Multi-Omic Archive, Single Cell Portal, and Terra.

The post Bootstrap your brain (research) by taking the BICCN Omics Workshop home appeared first on Terra.

It’s an exciting time to work in cell biology research, as the technological innovations of recent years are yielding ever richer and more precise pictures of the complex molecular states within our cells, even within their very nuclei. On the flip side, this modern-day Cambrian explosion of data generation techniques and computational analysis tools also creates a level of complexity that can be difficult to navigate. Apply this to an organ as intricate and subtle as the brain, and you may find yourself quickly reaching for the paracetamol.

To ease the learning curve, we recently teamed up with the BRAIN Initiative Cell Census Network consortium to put together a hands-on workshop focused on helping researchers get started with key resources including the BRAIN Initiative’s Neuroscience Multi-Omic (NeMO) Archive, the Broad’s Single Cell Portal (SCP), and of course, Terra itself.

 

Contents of the workshop

We designed our section of the workshop (Day 2: Data Processing) as a series of hands-on exercises that walk the learner through finding data in the NeMO Archive, analyzing that data in Terra with popular analysis packages like the Optimus workflow from the Human Cell Atlas project and Seurat from the Satija lab at NYGC, then publishing the results to a study in Single Cell Portal. Our ultimate goal was that after going through the workshop, a researcher would be able to apply the skills and lessons they had learned to pursue their own work using these resources with minimal to no further assistance. 

 

Main steps in workshop: Import an example 10x dataset (FASTQs) from NeMO; Align example 10x FASTQs and produce a raw count matrix with quality metrics using the Optimus workflow; Filter, normalize, and cluster the raw count matrix with the Cumulus workflow; Explore single-cell data in a Seurat Jupyter Notebook

 

Taking advantage of Terra’s support for sharing preconfigured analyses that work out of the box, we bundled all the necessary data, code, and tools into a public workspace that anyone could easily clone and use to work through the exercises. The workspace dashboard includes basic instructions, as well as a link to download a PDF worksheet that details every action step by step (with screenshots!) to make it possible for anyone to work through it on their own.

 

Watch the video recording

But wait, there’s more. We delivered the workshop in Jan 2021 to twentyish researchers from institutions participating in the BICCN consortium; it was over Zoom (because of course that is our life now) and the entire thing was recorded, so we’re now able to also share the full video. Given the very enthusiastic feedback we received from workshop participants, we’re very much looking forward to seeing these materials benefit others in the community.

 

 

If you’d like to work through this workshop yourself, just log in to your Terra account (or start here if you don’t have one yet) and follow the instructions outlined in the BICCN Omics Workshop workspace. Don’t hesitate to reach out to the Terra Helpdesk if you have any questions or run into any difficulties, and let us know how it goes by commenting below or in the Terra Community forum.

Upcoming event: Panel on Genomic Data Sharing Policies
https://terra.bio/upcoming-event-panel-on-genomic-data-sharing-policies/
Wed, 18 Nov 2020 14:48:38 +0000

Watch the recording of a panel on Genomic Data Sharing Policies hosted by the US National Human Genome Research Institute (NHGRI) on Nov 19, 2020.

The post Upcoming event: Panel on Genomic Data Sharing Policies appeared first on Terra.

Update: You can now watch the recording of this event here.

On Thursday, Nov 19, 2020, our colleague Jonathan Lawson will participate in a panel on Genomic Data Sharing Policies hosted by the US National Human Genome Research Institute (NHGRI). Jonathan will contribute the deep experience he has gained through years of working on data access policies and systems within the context of Terra. In particular, he will touch on the concept and operation of the Data Use Oversight System (DUOS) that our organization developed for automating the application and evaluation processes involved in allowing researchers to access datasets protected by data use restrictions.

If you’re interested in learning more about this topic, you can tune in to the panel livestream at 2 PM EST, or catch the recording at your convenience if you can’t make that time. I also recommend reading Jonathan’s blog post from last year, which did a great job of introducing the key issues involved as well as outlining the technical solutions in a very accessible way.

While you’re checking out the panel, you may also be interested in the NHGRI’s video collection, which includes a variety of online events, lectures, and policy meetings that were recorded.   
