Data curation is the documentation, management, and preservation of research data to produce datasets that are FAIR: Findable, Accessible, Interoperable, and Reusable. Curating research data adds value by enhancing datasets for current use as well as for future discovery and reuse.
To curate your data, follow the CURATE process:
Check that your data is well documented, with at minimum a readme, and preferably with a data dictionary, codebook, and/or rich metadata. If your research project contains code, check that the code runs and is documented well enough that someone new to the project can run it without additional instruction.
Understand both the data and the external requirements placed on it by funders, institutions, publishers, and prospective repositories. Be aware of any embargoes on data release and of any privacy or intellectual property requirements surrounding the data.
Request any missing or unclear data or metadata from the responsible party for the research project. Reach out to all project participants to identify any additional documentation, lab notebooks, or other documents that may have slipped through the cracks.
Augment the metadata to include any elements that were found to be missing during the Check phase. Ensure that a persistent identifier, such as a digital object identifier (DOI), is assigned to your data so that it is searchable and discoverable.
Transform your file formats from proprietary formats to open, platform-independent formats when possible. Transform your software so that dependencies are made explicit, preferably with a build automation tool such as Make (using a Makefile) or with a container environment.
Evaluate your data to ensure it is FAIR-compliant: Findable, Accessible, Interoperable, and Reusable.
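Parts of the Check and Transform steps can be partly automated. The following is a minimal Python sketch, not an official tool: it scans a dataset folder for a readme, looks for a data dictionary or codebook by file name, and lists files in formats that are often tied to specific software. The function name check_dataset, the file-name hints, and the extension-to-alternative mapping are all illustrative assumptions; adjust them to your discipline and to your repository's guidance.

```python
from pathlib import Path

# Illustrative assumption: extensions often tied to specific software,
# mapped to a commonly suggested open alternative. Not an authoritative list.
OPEN_ALTERNATIVES = {
    ".xlsx": ".csv",
    ".xls": ".csv",
    ".docx": ".txt or .pdf",
    ".doc": ".txt or .pdf",
    ".sav": ".csv",
}

# Illustrative assumption: words that suggest a data dictionary or codebook.
DOC_HINTS = ("dictionary", "codebook")


def check_dataset(root):
    """Report on documentation and file formats found under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    names = [p.name.lower() for p in files]
    return {
        # Check step: is there a readme at all?
        "has_readme": any(n.startswith("readme") for n in names),
        # Check step: does any file name hint at a data dictionary or codebook?
        "has_data_dictionary": any(h in n for n in names for h in DOC_HINTS),
        # Transform step: which files might be converted to an open format?
        "proprietary_files": {
            p.relative_to(root).as_posix(): OPEN_ALTERNATIVES[p.suffix.lower()]
            for p in files
            if p.suffix.lower() in OPEN_ALTERNATIVES
        },
    }
```

Calling check_dataset("path/to/your/dataset") returns a dictionary you can review by hand. A file-name scan like this only flags candidates; it cannot judge whether the documentation is actually sufficient, so it supplements a manual review rather than replacing it.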
For more information on the data curation process, follow the Data Curation Tutorial. If you have additional questions beyond the scope of this LibGuide, please contact Courtney Kearney, Scholarly Engagement Librarian - Physical Sciences and Data Management.