{"id":318,"date":"2021-04-14T12:08:39","date_gmt":"2021-04-14T12:08:39","guid":{"rendered":"https:\/\/terrabioappdev.wpenginepowered.com\/deleting-intermediate-workflow-outputs\/"},"modified":"2023-12-27T04:54:39","modified_gmt":"2023-12-27T04:54:39","slug":"deleting-intermediate-workflow-outputs","status":"publish","type":"post","link":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/","title":{"rendered":"Reduce storage costs by deleting intermediate workflow outputs"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Many workflows generate intermediate files that you won&#8217;t ever use again once the pipeline has run to completion. If those files are fairly small, it&#8217;s a minor nuisance that you can probably just ignore. However, if the files are large, or if there are very many of them, you can end up incurring significant storage costs for no reason. One large-scale project we support realized that, at one point, 85% of their storage costs were due to intermediate files that no one ever looked at. Yikes!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The good news is that we recently introduced a couple of options for removing those pesky intermediate files without having to manually trawl through the execution directories where they are stored. There&#8217;s a &#8220;proactive&#8221; option, which involves checking a box in the workflow configuration before you launch it, that tells Terra &#8220;go ahead and delete intermediate files when the workflow has run successfully&#8221;. 
And for cases where you already ran the workflow without checking that box, there&#8217;s a &#8220;reactive&#8221; option, which involves running some custom functions in a notebook to delete intermediates in bulk after the fact.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The basic operations are all documented <\/span><a href=\"https:\/\/support.terra.bio\/hc\/en-us\/articles\/360039681632\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">, but personally, the thought of deleting data gives me cold sweats, so I thought it might be useful to go through a concrete example of how all this works in practice. There&#8217;s some context to why it&#8217;s set up the way it is that might not be obvious if you&#8217;re not a seasoned WDL developer. Being aware of that context may help you make more informed decisions about how to tackle intermediates in your own work, regardless of whether you write any WDLs yourself or not.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Meet our example workflow<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">I picked out a <\/span><a href=\"https:\/\/github.com\/broadinstitute\/genomics-in-the-cloud\/blob\/main\/workflows\/scatter-hc\/scatter-haplotypecaller.wdl\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">two-step workflow<\/span><\/a><span style=\"font-weight: 400;\"> to illustrate the WDL context in a way that should be accessible even if you don&#8217;t have much experience with WDL.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-883 size-large\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/scatter-hc-diagram-1024x518.png\" alt=\"\" width=\"1024\" height=\"518\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">This workflow runs a genomics tool called GATK HaplotypeCaller, which identifies variants in 
genome sequencing data (stored in a file format called <\/span><b>BAM<\/b><span style=\"font-weight: 400;\">). It&#8217;s a tool that can take a long time to run on a whole genome, so we split up the genome into smaller <\/span><b>intervals<\/b><span style=\"font-weight: 400;\"> and run the tool separately on each interval with a <\/span><b>scatter<\/b><span style=\"font-weight: 400;\"> function. In the cloud, we can run all these scatter jobs in parallel, so the overall work will be done much sooner. But \u2014of course, there&#8217;s a but\u2014 each job produces its own output file, so that leaves us with a whole lot of separate files (called <\/span><b>GVCFs<\/b><span style=\"font-weight: 400;\">) containing results for each of the genome regions. That&#8217;s a bummer because, for our next analysis, we want one single file containing results for the entire genome. To that end, we include a file merging step that combines all the results into a single GVCF file.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So in the end the workflow gives us what we want \u2014a single file with all the output data\u2014 but we still have all the per-interval files in storage, which are entirely redundant with our final output. Those are all &#8220;intermediate&#8221; files we want to get rid of.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Configuring the workflow to delete intermediate outputs\u00a0<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Let&#8217;s decide upfront that we want to delete the intermediate files once the workflow is done. 
According to the docs, all we need to do to get Terra to delete them is to check the box labeled <\/span><b>Delete intermediate outputs<\/b><span style=\"font-weight: 400;\"> in the workflow configuration, as shown below:\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-884\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/delete-intermediates.png\" alt=\"\" width=\"1648\" height=\"408\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">That seems pretty straightforward. But how does Terra know which output files are intermediates and which are the final outputs? Is it just based on whatever gets generated last? (spoiler: no)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Final outputs are defined in the WDL code<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The WDL language allows workflow authors to specify what should be considered the final output(s) of the workflow, <\/span><i><span style=\"font-weight: 400;\">i.e.<\/span><\/i><span style=\"font-weight: 400;\"> files they want to keep around once the execution is complete. 
If we look at the WDL code for the workflow from our example, this is the important bit: the <\/span><b>workflow output block<\/b><span style=\"font-weight: 400;\"> (lines 38-40)<\/span><\/p>\n<p>&nbsp;<\/p>\n<pre><span style=\"font-weight: 400;\">output <\/span><span style=\"font-weight: 400;\">{<\/span>\n<span style=\"font-weight: 400;\"> \u00a0\u00a0\u00a0File output_gvcf = MergeVCFs.merged_vcf<\/span>\n<span style=\"font-weight: 400;\">}<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">This specifies that the &#8220;<\/span><span style=\"font-weight: 400;\">merged_vcf<\/span><span style=\"font-weight: 400;\">&#8221; file produced by the &#8220;<\/span><span style=\"font-weight: 400;\">MergeVCFs<\/span><span style=\"font-weight: 400;\">&#8221; task should be considered a final output of the workflow, under the name &#8220;<\/span><span style=\"font-weight: 400;\">output_gvcf<\/span><span style=\"font-weight: 400;\">&#8220;.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Any task output not listed in there will be considered an intermediate. And just to be clear, yes, you can list multiple outputs (including outputs from different tasks) in the workflow output block. See lines 156-217 of <\/span><a href=\"https:\/\/github.com\/broadinstitute\/genomics-in-the-cloud\/blob\/main\/workflows\/mystery-2\/WholeGenomeGermlineSingleSample.wdl\"><span style=\"font-weight: 400;\">this other workflow<\/span><\/a><span style=\"font-weight: 400;\"> for an example of an impressively long list of final outputs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The nice thing about this system is that WDL authors can use it to mark as &#8220;final output&#8221; any output produced at any step of the workflow, not just the ones that are run last. 
So if you have a workflow that is composed of multiple steps (maybe even multiple branches), and there&#8217;s an output produced by one of the early steps that you care about, you can list that output in the workflow output block too.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Terra uses the WDL output definitions to determine what to keep<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">As you might have guessed already, Terra is going to use those output definitions from the WDL to determine what files to keep vs. what to delete when you enable the &#8220;Delete intermediate outputs&#8221; option.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When you import a workflow into your workspace, Terra parses the code and identifies two sets of things: the <\/span><b>workflow inputs<\/b><span style=\"font-weight: 400;\">, which it lists in the INPUTS tab of the workflow configuration page, and the <\/span><b>workflow outputs<\/b><span style=\"font-weight: 400;\">, which it lists in the OUTPUTS tab, as shown below:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-885\" src=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/outputs-tab.png\" alt=\"\" width=\"2492\" height=\"826\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Here you see the <\/span><b>output_gvcf<\/b><span style=\"font-weight: 400;\"> that we saw earlier defined in our workflow&#8217;s output block. Conversely, you <\/span><i><span style=\"font-weight: 400;\">don&#8217;t <\/span><\/i><span style=\"font-weight: 400;\">see any of the outputs from the first step of the workflow (the one that is scattered over genomic intervals) listed here. Those unlisted outputs actually get saved automatically to cloud storage at the end of each job, and they stay there indefinitely (and somewhat invisibly) unless you enabled the intermediate deletion option. 
If you did enable the deletion option, the system will perform a cleanup operation that deletes all unlisted outputs once the full workflow has run to completion.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And that&#8217;s how the &#8220;proactive&#8221; deletion option helps you save money by avoiding pointless storage costs. But we&#8217;re not done; there&#8217;s another way to clean up intermediate outputs.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Deleting intermediate outputs after the fact<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Let&#8217;s pretend we ran our workflow without the deletion option enabled, or maybe we ran it a few months ago before that option even existed. Now we have these intermediate outputs sitting around, and we&#8217;d like to get rid of them. This is a bit trickier \u2014there&#8217;s not a simple checkbox that works in all cases\u2014 but if our use case fits certain requirements, it is possible to do this without manually combing through execution directories.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The key to this is a Jupyter notebook, which you can find in <\/span><strong><a href=\"https:\/\/app.terra.bio\/#workspaces\/help-terra\/Terra-Tools\" target=\"_blank\" rel=\"noopener\">this workspace<\/a><\/strong><span style=\"font-weight: 400;\">, that contains some template code for deleting intermediate outputs independently of the workflow system. This uses a system called an API, which lets you execute certain actions programmatically, bypassing Terra&#8217;s graphical user interface. The template code in the notebook is set up to make an &#8220;API call&#8221;, i.e. 
send a correctly formatted instruction that will trigger the deletion process; you just have to edit the bits that specify which workspace you want to clean up.\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<pre><span style=\"font-weight: 400;\">args<\/span><span style=\"font-weight: 400;\">=<\/span><span style=\"font-weight: 400;\">[<\/span><span style=\"font-weight: 400;\">\"fissfc\"<\/span><span style=\"font-weight: 400;\">, <\/span><span style=\"font-weight: 400;\">\"-V\"<\/span><span style=\"font-weight: 400;\">,<\/span><span style=\"font-weight: 400;\">\"mop\"<\/span><span style=\"font-weight: 400;\">, <\/span><span style=\"font-weight: 400;\">\"-w\"<\/span><span style=\"font-weight: 400;\">, WORKSPACE_NAME, <\/span><span style=\"font-weight: 400;\">\"-p\"<\/span><span style=\"font-weight: 400;\">, WORKSPACE_NAMESPACE]<\/span>\n\n<span style=\"font-weight: 400;\">fiss_func(args)<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">However, there is a catch: this approach only works for workflow submissions that were configured to use data tables. Why? 
Well, to make it make sense, we need to take a tiny detour through how Terra updates data tables when you run workflows on their data.\u00a0<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">If you&#8217;re not familiar with Terra&#8217;s data tables system, and the related concept of Data Model, have a look at my <\/span><\/i><a href=\"https:\/\/terra.bio\/new-resources-for-unlocking-the-power-of-terras-data-tables\/\"><i><span style=\"font-weight: 400;\">introductory post<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> on this topic, which explains the basic idea and links to some relevant docs and tutorial videos.<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Terra uses the workflow output definitions to update the data table(s)\u00a0<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Remember the screenshot from earlier, showing the OUTPUTS tab in the workflow configuration? That is not something Terra shows you just as an FYI; when you&#8217;re using data tables to configure workflow inputs, output definitions play an important functional role.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once a workflow completes successfully, Terra will look up each output listed in that tab and automatically add a link to the output file&#8217;s location in the appropriate data table, based on how you set up the workflow inputs. As a result, you&#8217;ll be able to find that piece of data \u2014and run subsequent analyses on it\u2014 without having to search for it in the storage bucket.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The technical term for this functionality is &#8220;<\/span><b>binding outputs to the data model<\/b><span style=\"font-weight: 400;\">&#8220;, and it&#8217;s incredibly useful. 
The data tables system admittedly comes with a learning curve, and it&#8217;s technically optional since it is possible to bypass it and run workflows using just direct file paths as inputs, but it&#8217;s worth the effort to master data tables because they will lift mountains for you down the road.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And here&#8217;s why we brought this up: the notebook-based intermediate deletion option relies entirely on this.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Intermediate deletion after the fact relies on final outputs being listed in data tables<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Here&#8217;s the thing. We can flip the output binding logic (if it&#8217;s a final output, add it to the table) to retroactively determine, given a table, what the final outputs of a workflow were. If it&#8217;s in the table, it must have been a final output. Accordingly, the deletion instruction that the notebook uses to clean up intermediates is engineered to look up each workflow submission in a given workspace, check which of its outputs are listed in one of the data tables, and delete any other output files within the execution directory (except log files, which are spared).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Of course, that &#8220;given a table&#8221; clause is doing a lot of lifting here. If there is no table, because we ran the workflow directly on file paths, then we don&#8217;t have a straightforward way to determine which files we should care about.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span style=\"font-weight: 400;\">Try it out on some data you don&#8217;t care about first!<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">I hope this post helped you better understand these two deletion approaches. 
For your next step, I vigorously encourage you to try out both of them on data that you don&#8217;t care about, to check that the system is behaving the way you understood it, and to make sure that you have a good handle on the caveats. Once the data is gone, it is <\/span><i><span style=\"font-weight: 400;\">gone<\/span><\/i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If it helps, the example workflow I referenced earlier is available in a <\/span><a href=\"https:\/\/app.terra.bio\/#workspaces\/help-gatk\/Genomics-in-the-Cloud-v1\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\"><strong>public workspace<\/strong><\/span><\/a><span style=\"font-weight: 400;\"> that you can clone as a testbed. The workspace includes two different configurations for the same workflow; <\/span><a href=\"https:\/\/app.terra.bio\/#workspaces\/help-gatk\/Genomics-in-the-Cloud-v1\/workflows\/help-gatk\/scatter-hc.data-table\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">one for using data tables<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"https:\/\/app.terra.bio\/#workspaces\/help-gatk\/Genomics-in-the-Cloud-v1\/workflows\/help-gatk\/scatter-hc.filepaths\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">another for using direct file paths as inputs<\/span><\/a><span style=\"font-weight: 400;\">, so it&#8217;s convenient for comparing the two approaches across different use cases. Good luck and don&#8217;t hesitate to reach out to the <\/span><a href=\"https:\/\/terra.bio\/resources\/help\/\"><span style=\"font-weight: 400;\">Terra Helpdesk<\/span><\/a><span style=\"font-weight: 400;\"> if you run into any trouble.\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many workflows generate intermediate files that you won\u2019t ever use again once the pipeline has run to completion. 
You can reduce your data footprint \u2014 and your storage costs! \u2014 by getting Terra to delete them for you.<\/p>\n","protected":false},"author":4,"featured_media":322,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[55,24,25,43,32],"tags":[],"class_list":["post-318","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cost-control","category-data-management","category-data-model","category-features","category-workflows"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Reduce storage costs by deleting intermediate workflow outputs - Terra<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reduce storage costs by deleting intermediate workflow outputs - Terra\" \/>\n<meta property=\"og:description\" content=\"Many workflows generate intermediate files that you won\u2019t ever use again once the pipeline has run to completion. You can reduce your data footprint \u2014 and your storage costs! 
\u2014 by getting Terra to delete them for you.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\" \/>\n<meta property=\"og:site_name\" content=\"Terra\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-14T12:08:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-27T04:54:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"627\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Geraldine Van der Auwera\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Geraldine Van der Auwera\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\"},\"author\":{\"name\":\"Geraldine Van der Auwera\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2\"},\"headline\":\"Reduce storage costs by deleting intermediate workflow 
outputs\",\"datePublished\":\"2021-04-14T12:08:39+00:00\",\"dateModified\":\"2023-12-27T04:54:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\"},\"wordCount\":1861,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png\",\"articleSection\":[\"Cost Control\",\"Data Management\",\"Data Model\",\"Features\",\"Workflows\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\",\"url\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\",\"name\":\"Reduce storage costs by deleting intermediate workflow outputs - 
Terra\",\"isPartOf\":{\"@id\":\"https:\/\/terra.bio\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png\",\"datePublished\":\"2021-04-14T12:08:39+00:00\",\"dateModified\":\"2023-12-27T04:54:39+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png\",\"width\":1200,\"height\":627,\"caption\":\"Coral_reef 1\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/terra.bio\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reduce storage costs by deleting intermediate workflow outputs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/terra.bio\/#website\",\"url\":\"https:\/\/terra.bio\/\",\"name\":\"Terra\",\"description\":\"Science at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/terra.bio\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/terra.bio\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/terra.bio\/#organization\",\"name\":\"Terra\",\"url\":\"https:\/\/terra.bio\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"contentUrl\":\"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp\",\"width\":287,\"height\":318,\"caption\":\"Terra\"},\"image\":{\"@id\":\"https:\/\/terra.bio\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2\",\"name\":\"Geraldine Van der Auwera\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/terra.bio\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g\",\"caption\":\"Geraldine Van der Auwera\"},\"sameAs\":[\"https:\/\/app.terra.bio\/\"],\"url\":\"https:\/\/terra.bio\/author\/geraldinevanterra\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reduce storage costs by deleting intermediate workflow outputs - Terra","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/","og_locale":"en_US","og_type":"article","og_title":"Reduce storage costs by deleting intermediate workflow outputs - Terra","og_description":"Many workflows generate intermediate files that you won\u2019t ever use again once the pipeline has run to completion. You can reduce your data footprint \u2014 and your storage costs! \u2014 by getting Terra to delete them for you.","og_url":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/","og_site_name":"Terra","article_published_time":"2021-04-14T12:08:39+00:00","article_modified_time":"2023-12-27T04:54:39+00:00","og_image":[{"width":1200,"height":627,"url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png","type":"image\/png"}],"author":"Geraldine Van der Auwera","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Geraldine Van der Auwera","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#article","isPartOf":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/"},"author":{"name":"Geraldine Van der Auwera","@id":"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2"},"headline":"Reduce storage costs by deleting intermediate workflow outputs","datePublished":"2021-04-14T12:08:39+00:00","dateModified":"2023-12-27T04:54:39+00:00","mainEntityOfPage":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/"},"wordCount":1861,"commentCount":0,"publisher":{"@id":"https:\/\/terra.bio\/#organization"},"image":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png","articleSection":["Cost Control","Data Management","Data Model","Features","Workflows"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/","url":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/","name":"Reduce storage costs by deleting intermediate workflow outputs - 
Terra","isPartOf":{"@id":"https:\/\/terra.bio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage"},"image":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage"},"thumbnailUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png","datePublished":"2021-04-14T12:08:39+00:00","dateModified":"2023-12-27T04:54:39+00:00","breadcrumb":{"@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#primaryimage","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Coral_reef-1.png","width":1200,"height":627,"caption":"Coral_reef 1"},{"@type":"BreadcrumbList","@id":"https:\/\/terra.bio\/deleting-intermediate-workflow-outputs\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/terra.bio\/"},{"@type":"ListItem","position":2,"name":"Reduce storage costs by deleting intermediate workflow outputs"}]},{"@type":"WebSite","@id":"https:\/\/terra.bio\/#website","url":"https:\/\/terra.bio\/","name":"Terra","description":"Science at 
Scale","publisher":{"@id":"https:\/\/terra.bio\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/terra.bio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/terra.bio\/#organization","name":"Terra","url":"https:\/\/terra.bio\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/","url":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","contentUrl":"https:\/\/terra.bio\/wp-content\/uploads\/2023\/12\/Terra-Bio-App@2x.webp","width":287,"height":318,"caption":"Terra"},"image":{"@id":"https:\/\/terra.bio\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/terra.bio\/#\/schema\/person\/ad0522d0b331a5e08fa1733f65086ee2","name":"Geraldine Van der Auwera","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/terra.bio\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d73bdaf6740465b385e0e3b290786d8cb9d9d548eadec23364254ba06c85204b?s=96&d=mm&r=g","caption":"Geraldine Van der 
Auwera"},"sameAs":["https:\/\/app.terra.bio\/"],"url":"https:\/\/terra.bio\/author\/geraldinevanterra\/"}]}},"_links":{"self":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/318","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/comments?post=318"}],"version-history":[{"count":0,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/posts\/318\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media\/322"}],"wp:attachment":[{"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/media?parent=318"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/categories?post=318"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/terra.bio\/wp-json\/wp\/v2\/tags?post=318"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}