{"id":269,"date":"2021-01-15T16:53:45","date_gmt":"2021-01-15T16:53:45","guid":{"rendered":"https:\/\/terrabioappdev.wpenginepowered.com\/review-paper-getting-started-with-workflows\/"},"modified":"2023-12-27T04:54:25","modified_gmt":"2023-12-27T04:54:25","slug":"review-paper-getting-started-with-workflows","status":"publish","type":"post","link":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/","title":{"rendered":"A must-read review paper for getting started with bioinformatics workflows"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">New year, <\/span><a href=\"https:\/\/terra.bio\/exciting-new-horizon-for-terra-with-microsoft\/\"><span style=\"font-weight: 400;\">new partnership<\/span><\/a><span style=\"font-weight: 400;\">\u2026 and a new blog series focusing on highlighting papers that we think will be of interest to many of you. For this first iteration, we review a review paper (review-ception!) about workflow systems, coming out of <\/span><a href=\"http:\/\/ivory.idyll.org\/lab\/\"><span style=\"font-weight: 400;\">C. 
Titus Brown&#8217;s lab<\/span><\/a><span style=\"font-weight: 400;\"> at UC Davis and fresh off the virtual press over at GigaScience.\u00a0<\/span><\/p>\n<blockquote><p><b>Taylor Reiter, Phillip T Brooks, Luiz Irber, Shannon E K Joslin, Charles M Reid, Camille Scott, C Titus Brown, N Tessa Pierce-Ward, <\/b><span style=\"font-weight: 400;\"><strong><span style=\"color: #008000;\">Streamlining data-intensive biology with workflow systems<\/span><\/strong>, <\/span><i><span style=\"font-weight: 400;\">GigaScience<\/span><\/i><span style=\"font-weight: 400;\">, Volume 10, Issue 1, January 2021, giaa140, <\/span><a href=\"https:\/\/doi.org\/10.1093\/gigascience\/giaa140\"><span style=\"font-weight: 400;\">https:\/\/doi.org\/10.1093\/gigascience\/giaa140<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. 
We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.<\/span><\/p><\/blockquote>\n<p>Read on to learn why this paper is a must-read if you&#8217;re getting started with workflows.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><span style=\"color: #008000;\"><strong>This paper in a nutshell:<\/strong><\/span><\/p>\n<h2>Everything you need to know to get started with bioinformatics workflows<\/h2>\n<p><span style=\"font-weight: 400;\">Seriously, this review covers an impressive amount of ground.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It starts with an accessible explanation of what workflows are, and why they are such an important and rapidly growing part of biological data analysis, which I expect will be very helpful to anyone who might be new to the challenges posed by Really Large Datasets\u2122.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Then, the authors provide a clear and concise review of the main types of workflows, languages and systems that you might encounter \u2014 including WDL, Terra&#8217;s current workflow language of choice, which they identify alongside CWL as &#8220;workflow specification formats that are more geared towards scalability, making them ideal for production-level pipelines with hundreds of thousands of samples&#8221; (yep, that checks out). They also touch on software management systems, including container systems (like Docker) and package managers (like Conda), and how these systems integrate with workflow systems.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">That content alone is already solidly informative, and we&#8217;re not even at the halfway point yet.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There&#8217;s a lot more in there, starting with a set of best-practice recommendations for managing a workflow-based project. 
This includes what to document (everything), how to document it (consistently) and what tools exist for visualization, version control and collaboration. I was nodding so hard reading that section, I pulled a neck muscle.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From there, the authors move to a series of practical recommendations for actually getting started with workflows, including finding and accessing compute resources. As stated in the abstract, these are presented &#8220;in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.&#8221; I found myself agreeing vehemently once more \u2014 the &#8220;Strategies for troubleshooting&#8221; section should be required reading for every researcher who ever comes within three feet (~1 m) of a computer, regardless of their field of study.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I could go on, but frankly at this point you&#8217;d be better off just reading the review itself. It&#8217;s solidly researched and well supported, insightful, clearly written and just beautifully scoped overall \u2014 well worth your time if you&#8217;re somewhat or completely new to workflows. Or even if you&#8217;re not so new and you&#8217;re willing to consider that your habitual practices might still have some room for improvement!<\/span><\/p>\n<p><strong><em>For an introduction to running workflows on Terra, see the <a href=\"https:\/\/support.terra.bio\/hc\/en-us\/articles\/360034701991\">Workflows<\/a> documentation.<\/em><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>New year, new partnership\u2026 and a new blog series focusing on highlighting papers that we think will be of interest to many of you. For this first iteration, we review a review paper (review-ception!) fresh off the virtual press over at GigaScience, coming out of C. 
Titus Brown&#8217;s lab at UC Davis, on the topic of workflow systems.\u00a0<\/p>\n","protected":false},"author":4,"featured_media":233,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[12,42,58,32],"tags":[74],"class_list":["post-269","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analysis","category-community","category-publications","category-workflows","tag-bioinformatics"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A must-read review paper for getting started with bioinformatics workflows - Terra<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A must-read review paper for getting started with bioinformatics workflows - Terra\" \/>\n<meta property=\"og:description\" content=\"New year, new partnership\u2026 and a new blog series focusing on highlighting papers that we think will be of interest to many of you. For this first iteration, we review a review paper (review-ception!) fresh off the virtual press over at GigaScience, coming out of C. 
Titus Brown&#039;s lab at UC Davis, on the topic of workflow systems.\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\" \/>\n<meta property=\"og:site_name\" content=\"Terra\" \/>\n<meta property=\"article:published_time\" content=\"2021-01-15T16:53:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-27T04:54:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"627\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Geraldine Van der Auwera\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Geraldine Van der Auwera\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\"},\"author\":{\"name\":\"Geraldine Van der Auwera\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2\"},\"headline\":\"A must-read review paper for getting started with bioinformatics workflows\",\"datePublished\":\"2021-01-15T16:53:45+00:00\",\"dateModified\":\"2023-12-27T04:54:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\"},\"wordCount\":679,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png\",\"keywords\":[\"bioinformatics\"],\"articleSection\":[\"Analysis\",\"Community\",\"Publications\",\"Workflows\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\",\"url\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\",\"name\":\"A must-read review paper for getting started with bioinformatics workflows - 
Terra\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png\",\"datePublished\":\"2021-01-15T16:53:45+00:00\",\"dateModified\":\"2023-12-27T04:54:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png\",\"width\":1200,\"height\":627,\"caption\":\"Lady_Scientist_Molecules\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/terra.bio\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A must-read review paper for getting started with bioinformatics workflows\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/terra.bio\/#website\",\"url\":\"https:\/\/terra.bio\/\",\"name\":\"Terra\",\"description\":\"Science at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/terra.bio\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/terra.bio\/#organization\",\"name\":\"Terra\",\"url\":\"https:\/\/terra.bio\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"width\":287,\"height\":318,\"caption\":\"Terra\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2\",\"name\":\"Geraldine Van der Auwera\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g\",\"caption\":\"Geraldine Van der Auwera\"},\"sameAs\":[\"https:\/\/app.terra.bio\/\"],\"url\":\"https:\/\/terra.bio\/author\/geraldinevanterra\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"A must-read review paper for getting started with bioinformatics workflows - Terra","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/","og_locale":"en_US","og_type":"article","og_title":"A must-read review paper for getting started with bioinformatics workflows - Terra","og_description":"New year, new partnership\u2026 and a new blog series focusing on highlighting papers that we think will be of interest to many of you. For this first iteration, we review a review paper (review-ception!) fresh off the virtual press over at GigaScience, coming out of C. Titus Brown's lab at UC Davis, on the topic of workflow systems.\u00a0","og_url":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/","og_site_name":"Terra","article_published_time":"2021-01-15T16:53:45+00:00","article_modified_time":"2023-12-27T04:54:25+00:00","og_image":[{"width":1200,"height":627,"url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png","type":"image\/png"}],"author":"Geraldine Van der Auwera","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Geraldine Van der Auwera","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#article","isPartOf":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/"},"author":{"name":"Geraldine Van der Auwera","@id":"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2"},"headline":"A must-read review paper for getting started with bioinformatics workflows","datePublished":"2021-01-15T16:53:45+00:00","dateModified":"2023-12-27T04:54:25+00:00","mainEntityOfPage":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/"},"wordCount":679,"commentCount":0,"publisher":{"@id":"https:\/\/terra.bio\/#organization"},"image":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png","keywords":["bioinformatics"],"articleSection":["Analysis","Community","Publications","Workflows"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/","url":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/","name":"A must-read review paper for getting started with bioinformatics workflows - 
Terra","isPartOf":{"@id":"https:\/\/terra.bio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage"},"image":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png","datePublished":"2021-01-15T16:53:45+00:00","dateModified":"2023-12-27T04:54:25+00:00","breadcrumb":{"@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#primaryimage","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Lady_Scientist_Molecules.png","width":1200,"height":627,"caption":"Lady_Scientist_Molecules"},{"@type":"BreadcrumbList","@id":"https:\/\/terra.bio\/review-paper-getting-started-with-workflows\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/terra.bio\/"},{"@type":"ListItem","position":2,"name":"A must-read review paper for getting started with bioinformatics workflows"}]},{"@type":"WebSite","@id":"https:\/\/terra.bio\/#website","url":"https:\/\/terra.bio\/","name":"Terra","description":"Science at 
Scale","publisher":{"@id":"https:\/\/terra.bio\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/terra.bio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/terra.bio\/#organization","name":"Terra","url":"https:\/\/terra.bio\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","width":287,"height":318,"caption":"Terra"},"image":{"@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2","name":"Geraldine Van der Auwera","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g","caption":"Geraldine Van der 
Auwera"},"sameAs":["https:\/\/app.terra.bio\/"],"url":"https:\/\/terra.bio\/author\/geraldinevanterra\/"}]}},"_links":{"self":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/269","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/comments?post=269"}],"version-history":[{"count":0,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/269\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media\/233"}],"wp:attachment":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media?parent=269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/categories?post=269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/tags?post=269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}