Research Guides at Tulane University | Howard-Tilton Memorial Library

Data Management

This guide aims to help Tulane faculty, staff, and students manage, store, and share their research data.

Data Storage

A well-planned project will include both a file naming convention and a directory structure that ease the research process and increase efficiency.
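As a minimal sketch of a directory structure, the Python snippet below creates a project folder with a few subfolders. The folder names (`data/raw`, `data/processed`, `docs`, `scripts`, `results`) and the helper `create_project` are hypothetical choices for illustration, not a Tulane requirement; adjust them to your project.

```python
from pathlib import Path
import tempfile

# Hypothetical layout for illustration; rename folders to suit your project.
SUBDIRS = (
    "data/raw",        # original, unmodified data files
    "data/processed",  # cleaned or derived data
    "docs",            # protocols, codebooks, metadata
    "scripts",         # analysis code
    "results",         # figures, tables, and other outputs
)


def create_project(root: Path, subdirs: tuple[str, ...] = SUBDIRS) -> list[Path]:
    """Create `root` and each subfolder (if missing); return the subfolder paths."""
    created = []
    for sub in subdirs:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_project(Path(tmp) / "SABOR"):
            print(p.relative_to(tmp))
```

Because `exist_ok=True` is used, running the function again on an existing project leaves the folders untouched.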

A brief yet descriptive file naming convention makes it easier to find files later and to tell quickly what they contain. The tips below also improve the cross-compatibility of files between programming languages and software.

File name tips:

  • Use names that are brief but descriptive.
  • Make sure all data producers use the same naming convention. 
  • Identify the version of the file.
  • Identify when the file was created.
  • Use three-letter file extensions (e.g. .rtf, .tif, .txt).
  • Avoid spaces and special characters (e.g. *, #, %).
  • Do not use letter case alone to distinguish different files (e.g. datasetA.txt vs. dataseta.txt).
  • Include:
    • Project or experiment name or acronym,
    • Location/spatial coordinates,
    • Researcher name/initials,
    • Date or date range of experiment, and/or
    • Type of data.
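Several of these tips can be checked automatically. The Python sketch below flags names with spaces, special characters, or an extension that is not three characters long, and groups names that differ only by letter case. The function names and the set of allowed characters (letters, digits, `_`, `-`, `.`) are assumptions made for this example; the guide itself does not define a character list.

```python
import re
from collections import defaultdict

# Assumed rule for this sketch: anything other than letters, digits,
# underscore, hyphen, or dot counts as a special character.
_SPECIAL = re.compile(r"[^A-Za-z0-9_.\-]")


def check_name(name: str) -> list[str]:
    """Return the problems found in one file name (empty list if none)."""
    problems = []
    _stem, dot, ext = name.rpartition(".")
    if not dot:
        problems.append("missing file extension")
    elif len(ext) != 3:
        problems.append(f"extension '.{ext}' is not three letters")
    if " " in name:
        problems.append("contains spaces")
    if _SPECIAL.search(name.replace(" ", "")):
        problems.append("contains special characters")
    return problems


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that are identical except for letter case."""
    groups = defaultdict(list)
    for n in names:
        groups[n.lower()].append(n)
    return [g for g in groups.values() if len(g) > 1]


names = ["SABOR_CK_04072014_S1_bb_432.csv", "bb 432#.txt",
         "datasetA.txt", "dataseta.txt"]
for n in names:
    print(n, check_name(n))
print(case_collisions(names))  # [['datasetA.txt', 'dataseta.txt']]
```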

Example of a good file name:

SABOR_CK_04072014_S1_bb_432.csv

SABOR is the project name
CK is the first and last initial of the data collector
04072014 is the date the data were collected, in DDMMYYYY format
S1 is the station number/location of the data
bb is the variable collected (backscatter)
432 is the wavelength at which the data were collected
csv is the file type (ASCII comma-separated values)

Instead of "bb 432" or "bb-432" use "bb_432".
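A small helper can assemble and split names in this convention so that every data producer builds them the same way. This is a minimal Python sketch; `build_name` and `parse_name` are hypothetical helpers, and they assume that no individual field contains an underscore.

```python
from datetime import date, datetime


def build_name(project: str, initials: str, collected: date, station: str,
               variable: str, wavelength: int, ext: str = "csv") -> str:
    """Join the fields with underscores, writing the date as DDMMYYYY."""
    fields = [project, initials, collected.strftime("%d%m%Y"),
              station, variable, str(wavelength)]
    return "_".join(fields) + "." + ext


def parse_name(name: str) -> dict:
    """Split a name produced by build_name() back into its fields."""
    stem, _, ext = name.rpartition(".")
    # Assumes exactly six underscore-separated fields.
    project, initials, date_str, station, variable, wavelength = stem.split("_")
    return {
        "project": project,
        "initials": initials,
        "collected": datetime.strptime(date_str, "%d%m%Y").date(),
        "station": station,
        "variable": variable,
        "wavelength": int(wavelength),
        "ext": ext,
    }


name = build_name("SABOR", "CK", date(2014, 7, 4), "S1", "bb", 432)
print(name)                           # SABOR_CK_04072014_S1_bb_432.csv
print(parse_name(name)["collected"])  # 2014-07-04
```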

Ideally, a project's file formats are chosen during the planning stage, with a specific data repository or archive in mind. Formats should be standard, nonproprietary, and open to ensure the long-term accessibility of the data.


Common Non-Proprietary Digital Preservation and Archiving File Types 

Text

  • XML (.xml)
  • HTML (.htm)
  • OpenDocument Format (e.g. OpenDocument Text, .odt)
  • Plain text (.txt)
  • Markdown and other human-readable markup languages deploying plain-text editing

Tabular

Media

  • Uncompressed TIFF (.tif)
  • JPEG 2000 (.jp2) and Motion JPEG 2000 (.mj2)
  • MPEG-4 (.mp4)
  • Free Lossless Audio Codec (.flac)

Geospatial

  • ESRI Shapefiles and supporting files (.shp, .shx, .dbf, .prj, .sbx, .sbn)
  • KML (.kml)
  • GML (.gml)
  • GeoTIFF (.tif, .tfw)
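To see which files in a project folder fall outside the formats listed above, a short Python script can compare each file's extension against them. This is a sketch: the extension set is transcribed from the lists above, `.md` stands in for Markdown, `.csv` is added as an assumption because this guide's example file name uses it, and `flag_files` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

# Extensions from the non-proprietary lists above; ".md" (Markdown) and
# ".csv" are assumptions added for this sketch.
PRESERVATION_FRIENDLY = {
    ".xml", ".htm", ".odt", ".txt", ".md", ".csv",   # text / tabular
    ".tif", ".mj2", ".mp4", ".flac",                 # media
    ".shp", ".shx", ".dbf", ".prj", ".sbx", ".sbn",  # shapefile components
    ".kml", ".gml", ".tfw",                          # other geospatial
}


def flag_files(folder: Path) -> list[Path]:
    """Return files under `folder` (recursively) whose extension is not listed."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() not in PRESERVATION_FRIENDLY
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["notes.txt", "survey.xlsx", "site.shp"]:
            (root / name).write_text("")
        print([p.name for p in flag_files(root)])  # ['survey.xlsx']
```

A flagged file is not necessarily a problem; it is a prompt to consider exporting a copy to one of the listed formats before deposit.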

Common Proprietary File Types

Text

  • PDF/A

Statistical

  • SPSS portable format (.por)
  • R file formats, i.e. script files (.R), data files (.Rda, .Rdata), or R Markdown files (.Rmd)
  • Stata file formats, i.e. do-files (.do) and data files (.dta)
  • SAS file formats (.sas, .xpt, etc.)

Media

  • Photoshop files (.psd)

Simple version control systems:

Google Drive:  

  • Any time you edit files created on Google Drive (Docs, Sheets, Slides), new versions are saved as you go.
  • Version information includes who was editing the file and the date and time the new version was created.
  • You can also see the changes made and revert to a previous version at any time.

Pros: The real-time editing feature means that Google Drive works well for collaborating on files with multiple people. 

Cons: You are restricted to Google's software, so this is not an option for files in formats that Google cannot read.

Tulane Box:

Any type of document can be stored and versioned with Box.

  • Any time you edit or upload a new version of a document, Box makes the updated file the current version, so you do not need to rename new versions.
  • Box keeps track of your old versions should you want to restore a previous version. 
  • You can add comments to help indicate changes between versions.
  • Box allows you to share files and track who uploaded or updated each file and when.

Pros: You can automatically sync folders on your desktop with your Box account.

Cons: Real-time editing is not available.

Advanced version control systems:

Git: a free and open-source distributed version control system. GitHub is a widely used hosting service for Git repositories.

Mercurial: a free, cross-platform, distributed revision control tool for software developers.

 

Other Resources:

UK Data Archive: provides guidance on the acquisition, curation, and archiving of data, including best practices for version control.

Documenting your data ensures that it will be understood, and therefore usable, by you and others in the future. Metadata is one way to document your data for future use.

What is metadata?

Metadata is data about your data.  

  • Answers the who, what, why, when, where, and how of your data set.
  • Often includes the purpose, time, geographic location, creator, access, variables, variable units, and terms of use of the data.
  • Must be organized using a standard accepted by your chosen repository, which allows the data set to be easily indexed and retrieved.
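One lightweight way to capture these elements while a project is under way is a plain-text metadata file saved next to the data file. The Python sketch below refuses to write that file until every field is filled in. The field names, the `_metadata.txt` suffix, the `write_metadata` helper, and the placeholder values are assumptions for illustration; when depositing data, the repository's own metadata standard takes precedence.

```python
import tempfile
from pathlib import Path

# Hypothetical field list based on the elements described above.
FIELDS = ["title", "creator", "purpose", "date_collected", "location",
          "variables", "access", "terms_of_use"]


def write_metadata(data_file: Path, record: dict[str, str]) -> Path:
    """Write `record` as 'field: value' lines beside `data_file`; return its path."""
    missing = [f for f in FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    out = data_file.with_name(data_file.stem + "_metadata.txt")
    out.write_text("".join(f"{f}: {record[f]}\n" for f in FIELDS), encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "SABOR_CK_04072014_S1_bb_432.csv"
        record = {  # angle-bracket values are placeholders to fill in
            "title": "SABOR backscatter (bb) at 432, station S1",
            "creator": "CK",
            "purpose": "<why the data were collected>",
            "date_collected": "2014-07-04",
            "location": "Station S1 <add coordinates>",
            "variables": "bb (backscatter); units: <unit>",
            "access": "<who may access the data>",
            "terms_of_use": "<license or conditions of use>",
        }
        print(write_metadata(data_file, record).read_text(encoding="utf-8"))
```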

What metadata standard should I use?

The appropriate metadata standard depends on your discipline and repository. Some examples include the following:

  • DataCite: metadata schema used when a DOI is assigned to a dataset.
  • DDI (Data Documentation Initiative): international standard for social, behavioral and economic sciences. 
  • Dublin Core: basic and widely used standard. 
  • EML (Ecological Metadata Language): ecological standard supported by the Ecological Society of America. 
  • ISO 19115 or FGDC's Content Standard for Digital Geospatial Metadata (CSDGM): standards used to describe geospatial data.
  • MIBBI (Minimum Information for Biological and Biomedical Investigations)

A more comprehensive list of metadata standards is provided by the UK's Digital Curation Centre (DCC).
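As an illustration of what a standard-based record looks like, the Python sketch below uses the standard library's `xml.etree.ElementTree` to write a simple Dublin Core record. The 15 element names and the namespace URI `http://purl.org/dc/elements/1.1/` come from the Dublin Core Metadata Element Set, version 1.1; the `<metadata>` wrapper element, the helper name `dublin_core_xml`, and the sample values are assumptions, since each repository defines its own submission format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)  # serialize with the familiar "dc:" prefix


def dublin_core_xml(fields: dict[str, str]) -> str:
    """Build a <metadata> element (hypothetical wrapper) holding dc:* children."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({
    "title": "SABOR backscatter, station S1",
    "creator": "CK",
    "date": "2014-07-04",
    "format": "text/csv",
}))
```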

Data backup is an essential component of data preservation. 

Backup tips: 

  • Maintain 3 copies of your data.
  • Store each copy in a different geographical location (e.g. office, university server, offsite server).
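Copies are only useful if they are intact. The Python sketch below copies a file into several backup folders and compares SHA-256 checksums to confirm that each copy matches the original. The destination folders (for example, a mounted university server share or a synced cloud folder) and the helper names `sha256` and `back_up` are hypothetical; the original plus two verified backups gives the three copies recommended above.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each folder; map each copy to whether its checksum matches."""
    expected = sha256(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also tries to preserve file metadata such as timestamps
        copy = Path(shutil.copy2(source, folder / source.name))
        results[copy] = sha256(copy) == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "SABOR_CK_04072014_S1_bb_432.csv"
        src.write_text("placeholder contents\n")
        report = back_up(src, [Path(tmp) / "server_copy", Path(tmp) / "offsite_copy"])
        print(all(report.values()))  # True
```

Re-running `sha256` on stored copies at intervals is one way to detect silent corruption before you need to restore.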

Tulane backup options:

Unlimited storage is available for faculty and staff through Box, a cloud-based storage system.  Students may sign up for Box to receive the standard 10 GB available free of charge to new users. 

Cloud storage resources:

  • Amazon S3: requires client software; supports server-side encryption
  • S3-based Remote Hard Drive Services such as Elephant Drive and Jungle Disk
  • Mozy (from EMC): free client software; 448-bit Blowfish encryption or an AES key
  • Carbonite: free client software; 1024-bit Blowfish encryption

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.