To ensure the long-term preservation and usability of data, it is important that data are
Adapted from: http://guides.library.cornell.edu/ecommons/formats
Cornell University offers a guide that rates common file formats by their likelihood of long-term preservation. Where possible, store your data in formats from the high-probability categories.
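A quick way to act on this advice is to scan a project directory for files whose extensions are not on your preferred list. The sketch below is a minimal illustration. The `PREFERRED_FORMATS` set is a hypothetical placeholder, not Cornell's actual classification, and the helper names (`is_preferred`, `flag_non_preferred`, `scan_directory`) are invented for this example. Fill the set from the guide before relying on the results.

```python
from pathlib import Path

# HYPOTHETICAL placeholder set for illustration only. Replace it with the
# high-probability formats listed in the Cornell guide.
PREFERRED_FORMATS = {".csv", ".txt", ".tif", ".tiff", ".xml"}


def is_preferred(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the preferred set."""
    return Path(filename).suffix.lower() in PREFERRED_FORMATS


def flag_non_preferred(filenames: list[str]) -> list[str]:
    """Return the filenames whose format is not in the preferred set, in input order."""
    return [name for name in filenames if not is_preferred(name)]


def scan_directory(root: str) -> list[str]:
    """Recursively list files under root (relative paths, sorted) that need review."""
    return sorted(
        str(path.relative_to(root))
        for path in Path(root).rglob("*")
        if path.is_file() and not is_preferred(path.name)
    )


if __name__ == "__main__":
    files = ["survey.csv", "notes.docx", "image.TIF", "model.sav"]
    print(flag_non_preferred(files))
```

Files flagged this way are candidates for conversion to an open format, or for a note in the README explaining why a less durable format was kept.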
A significant challenge for any research project that includes code, scripts, or software is ensuring that the code can be run by others who are not intimately familiar with the project. Rigorous documentation helps, but unforeseen problems can still arise from changes in software dependencies after publication and from differences in other users' operating systems and configurations. Continuously maintaining a published codebase is time-consuming, but much of that maintenance can be avoided with a little foresight before the code is published in a public repository. Listed below are a few tools for making sure software is well structured, documented, and reproducible before public release.
Cookiecutter is a command-line utility that creates new boilerplate projects from cookiecutters (project templates). A template defines a directory skeleton containing boilerplate code, plaintext documentation, and supporting files; when you run Cookiecutter, it fills in the template's placeholders (for example, the project name or author) to produce a new project. Cookiecutters are language- and domain-agnostic: they can contain templates for any plaintext file, including Markdown READMEs, scripts in any programming language, Makefiles for building a project, and requirements files for managing project dependencies. They are most useful at the beginning of a research project, where they encourage consistent documentation and a meaningful project structure.
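To show what "populating a template" means, here is a simplified, standard-library-only sketch of the idea. It is not Cookiecutter's implementation (the real tool uses Jinja2 and supports much more, such as computed defaults and choice lists); in practice you would simply run `cookiecutter path/to/template`. The sketch assumes the common layout of a `cookiecutter.json` file holding default values next to a single top-level directory whose name contains a placeholder such as `{{cookiecutter.project_slug}}`. The functions `render` and `generate_project` are hypothetical names for this illustration.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Matches placeholders such as {{cookiecutter.project_slug}} or {{ cookiecutter.author }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: dict) -> str:
    """Replace every {{cookiecutter.<key>}} placeholder with context[<key>]."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), text)


def generate_project(template_dir: str, output_dir: str,
                     overrides: Optional[dict] = None) -> Path:
    """Render a cookiecutter-style template into output_dir (simplified sketch)."""
    template = Path(template_dir)
    # Defaults come from cookiecutter.json; overrides stand in for the user's answers.
    context = json.loads((template / "cookiecutter.json").read_text())
    context.update(overrides or {})

    # Assumption: the one top-level directory with a placeholder in its name is the project root.
    root = next(p for p in template.iterdir() if p.is_dir() and PLACEHOLDER.search(p.name))
    out_root = Path(output_dir)
    project_root = out_root / render(root.name, context)
    project_root.mkdir(parents=True, exist_ok=True)

    for src in sorted(root.rglob("*")):
        # Placeholders are rendered in paths as well as in file contents.
        dest = out_root / render(str(src.relative_to(template)), context)
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(render(src.read_text(), context))
    return project_root


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        tpl = Path(tmp, "template")
        proj = tpl / "{{cookiecutter.project_slug}}"
        proj.mkdir(parents=True)
        (tpl / "cookiecutter.json").write_text(json.dumps(
            {"project_name": "My Study", "project_slug": "my_study", "author": "Jane Doe"}))
        (proj / "README.md").write_text(
            "# {{cookiecutter.project_name}}\n\nAuthor: {{ cookiecutter.author }}\n")
        out = generate_project(str(tpl), str(Path(tmp, "out")), {"author": "A. Researcher"})
        print((out / "README.md").read_text())
```

Because every new project starts from the same skeleton, READMEs, licence files, and dependency lists end up in predictable places, which is what makes the template approach valuable for a research group.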
Three common problems arise when reproducing published results that depend on accompanying scripts or software:
Docker addresses these problems by providing an encapsulated software environment in which all software requirements are declared explicitly in a simple text file. Because the environment travels with the code, it will run the same way on any system with Docker installed, largely independent of the host operating system or configuration.
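Docker's text file is called a Dockerfile. The sketch below generates a minimal Dockerfile for a Python analysis project and refuses dependencies that are not pinned to exact versions, since unpinned requirements are one way an environment silently changes after publication. The helper names (`make_dockerfile`, `write_build_files`), the `python:<version>-slim` base image choice, and the example package versions are assumptions for illustration; adapt them to your own project (an R project, for instance, would start from a different base image).

```python
import json
from pathlib import Path


def make_dockerfile(python_version: str, requirements: list[str],
                    entrypoint: list[str]) -> str:
    """Return Dockerfile text for a hypothetical Python analysis project."""
    # Require exact pins (pkg==x.y.z) so rebuilding the image later installs the same versions.
    unpinned = [req for req in requirements if "==" not in req]
    if unpinned:
        raise ValueError(f"pin exact versions for reproducibility: {unpinned}")
    return "\n".join([
        f"FROM python:{python_version}-slim",    # official Python base image (assumed choice)
        "WORKDIR /app",
        "COPY requirements.txt .",                # copy requirements first so this layer is cached
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        f"CMD {json.dumps(entrypoint)}",         # exec form, e.g. CMD ["python", "analysis.py"]
        "",
    ])


def write_build_files(project_dir: str, python_version: str,
                      requirements: list[str], entrypoint: list[str]) -> None:
    """Write requirements.txt and a Dockerfile into project_dir."""
    d = Path(project_dir)
    (d / "requirements.txt").write_text("\n".join(requirements) + "\n")
    (d / "Dockerfile").write_text(make_dockerfile(python_version, requirements, entrypoint))


if __name__ == "__main__":
    print(make_dockerfile("3.11", ["numpy==1.26.4", "pandas==2.2.2"],
                          ["python", "analysis.py"]))
```

With both files in the project directory, `docker build -t my-analysis .` builds the image and `docker run --rm my-analysis` runs the analysis inside it. Note that a tag such as `python:3.11-slim` can be updated upstream; pinning the base image by digest gives a stronger reproducibility guarantee.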
For more information on using Docker for reproducible research, please refer to Carl Boettiger's excellent paper, "An introduction to Docker for reproducible research, with examples from the R environment," as well as the hands-on tutorial "Author Carpentry: Docker for reproducible research."