Capturing the 'Whole Tale' of Computational Research: Reproducibility in Computing Environments


📝 Original Info

  • Title: Capturing the ‘Whole Tale’ of Computational Research: Reproducibility in Computing Environments
  • ArXiv ID: 1610.09958
  • Date: 2016-11-01
  • Authors: Bertram Ludaescher, Kyle Chard, Niall Gaffney, Matthew B. Jones, Jaroslaw Nabrzyski, Victoria Stodden, Matthew Turk

📝 Abstract

We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications will be consumed either directly by users using the Whole Tale environment or can be integrated into existing or future domain Science Gateways.


📄 Full Content

Capturing the “Whole Tale” of Computational Research: Reproducibility in Computing Environments

Bertram Ludäscher, School of Information Sciences, University of Illinois at Urbana-Champaign; Kyle Chard, University of Chicago; Niall Gaffney, Texas Advanced Computing Center, University of Texas at Austin; Matthew B. Jones, University of California Santa Barbara; Jaroslaw Nabrzyski, University of Notre Dame; Victoria Stodden,* School of Information Sciences, University of Illinois at Urbana-Champaign; and Matthew Turk, School of Information Sciences, University of Illinois at Urbana-Champaign

*Corresponding author address: School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA; email: vcs@illinois.edu


  1. Introduction
    Computational resources and scientific services are now nearly ubiquitous in scientific investigations; however, the applications used to discover and analyze data are extremely fragmented and can be intractable, creating a large and meaningful gap between the research process and the ability to verify its findings [1]. There is frequently no way to trace findings in publications back through the originating computations and data. The Whole Tale project aims to close this gap in two ways: 1) integrate existing cyberinfrastructure that supports the entire computational process underlying discovery, making it easier for researchers to conduct computational research; and 2) capture and deliver relevant workflow and processing provenance that will be discoverable and accessible from the associated publication. Whole Tale envisions a collaborative environment where data providers, application developers, and data consumers work together to create end-to-end workflows that convert data to information using reproducible computational methods.
  2. The Whole Tale Research Environment
    Whole Tale will enable a research environment that seamlessly supports computational tools for tackling pressing research problems in a way that is scalable and reproducible but that still supports software familiar to current researchers. Our aim is to support scientific investigation at all computational scales, from HPC environments to single-user endeavors (the “long tail” of science). We will provide a research environment that captures and, at the time of publication, exposes salient details of the research via access to persistent versions of the data and code used, workflow provenance, data lineage, parameter settings, and output data. Our approach differs from, and is complementary to, that provided by some science gateways in that we rely on commodity tools rather than bespoke, domain-specific instruments. The Whole Tale environment will link to existing cyberinfrastructure, yielding a research environment instrumented with workflow and reproducibility tools that aid in capturing and storing the key scripts, function calls, parameter settings, and machine state information essential for reproducing the results.
    The cyberinfrastructure will be exposed to users through well-known applications such as Jupyter Notebooks that support commonly used data analysis languages including R and Python. Storage will be exposed to users through several interfaces including Globus, a web-based filesystem interface, FUSE modules for filesystem-level access to local and remote data repositories, the DataONE federation of data repositories, and Nextcloud, an open source cloud storage environment. By building data repository access into modules that present file-like interfaces, we further lower the barrier to access for remote data stores. The system will also incorporate Globus Auth, a unified identity management system that will allow users to leverage their own campus identity, ORCID identifier, or other existing identities. Whole Tale will also enable the deployment of Dockerfile-based environments to support extensible and customizable research workflows.
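
    As a rough illustration of the kind of record this section describes (scripts, parameter settings, and machine state captured alongside a run), the following is a minimal Python sketch. It is not Whole Tale's implementation: the function names capture_provenance and same_computation, and every field name in the record, are hypothetical choices made for this example, which uses only the Python standard library.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def capture_provenance(script_text, parameters, input_data):
    """Build a JSON-serializable record of one analysis run.

    Hypothetical helper: the field names are illustrative only and do
    not reflect any schema used by the Whole Tale project.
    """
    return {
        # When the record was created (UTC, ISO 8601).
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        # Content hashes identify the exact script and input data used.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "input_sha256": hashlib.sha256(input_data).hexdigest(),
        # Parameter settings supplied to the analysis.
        "parameters": dict(parameters),
        # A small slice of machine state; a real system would record more.
        "machine_state": {
            "python_version": platform.python_version(),
            "platform": platform.platform(),
            "executable": sys.executable,
        },
    }


def same_computation(record_a, record_b):
    """Return True if two records share script, input data, and parameters."""
    keys = ("script_sha256", "input_sha256", "parameters")
    return all(record_a[key] == record_b[key] for key in keys)


if __name__ == "__main__":
    record = capture_provenance(
        script_text="result = sum(values) * scale",
        parameters={"scale": 2.0},
        input_data=b"1,2,3",
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

    Hashing the script and input data, rather than storing file names, lets two records be compared for equivalence even when files have been moved or renamed. The machine state is deliberately left out of the comparison, since the same computation may be rerun on a different machine.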


Reference

This content is AI-processed based on open access ArXiv data.
