The goal: make your data machine-readable so that you can import them into a variety of analysis tools and data repositories in the future.
Provide column labels or a header line. |
Label each column with a short but informative name.
Follow the same conventions recommended for file names -- use only letters, numbers, or underscores. Avoid spaces and special characters. |
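As a rough illustration of the naming rule above, the following Python sketch turns a free-form label into one that uses only letters, numbers, and underscores. The function name `clean_column_name` and the sample labels are hypothetical, and lowercasing the result is an extra assumption rather than part of the recommendation.

```python
import re


def clean_column_name(label: str) -> str:
    """Turn a free-form label into letters, digits, and underscores only."""
    # Replace every run of disallowed characters (spaces, punctuation,
    # symbols) with a single underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", label.strip())
    # Drop underscores left over at either end, then lowercase.
    # Lowercasing is an assumed house style, not a requirement.
    return name.strip("_").lower()


print(clean_column_name("Body Weight (kg)"))  # body_weight_kg
print(clean_column_name("Date of Visit"))     # date_of_visit
```

Running a helper like this once over a header row is usually enough; the cleaned names should then be the ones recorded in your documentation.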
Document the definitions of codes, abbreviations, and variable names. |
Abbreviations and variable names don't mean the same thing to everyone. Creating a list that defines each variable or code ensures that all project staff are collecting the same data -- this list will also help future users understand your data.
The simplest version of this documentation would be a "ReadMe" text file that resides in the same folder as your data. The social sciences often refer to this information as a codebook, while other disciplines use the term "data dictionary." |
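One way to keep such a list in step with the data is to check that every column in a file has a definition. The sketch below assumes a data dictionary held as a plain Python mapping; the column names, definitions, and the helper `undocumented_columns` are all hypothetical.

```python
# Hypothetical data dictionary: column name -> plain-language definition.
data_dictionary = {
    "subject_id": "Unique identifier assigned to each participant",
    "height_m": "Standing height in meters",
    "weight_kg": "Body weight in kilograms",
}


def undocumented_columns(header, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    return [name for name in header if name not in dictionary]


# Hypothetical header row read from a data file.
header = ["subject_id", "height_m", "weight_kg", "visit_date"]
print(undocumented_columns(header, data_dictionary))  # ['visit_date']
```

The same definitions can simply be written out by hand in the ReadMe file; the check only helps catch columns that were added without being documented.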
Each column should contain a single type of data. |
Are your data text, numeric, categorical, or something else?
- Format dates and times according to the ISO 8601 standard.
- If you're using text data, be sure to use a standard naming convention (see examples below). |
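For dates, ISO 8601 means writing the year first, as in YYYY-MM-DD. A minimal Python sketch of converting an existing date column is shown here; it assumes the original entries were typed as month/day/year, and the helper name `to_iso_date` is hypothetical.

```python
from datetime import datetime


def to_iso_date(text: str, source_format: str = "%m/%d/%Y") -> str:
    """Parse a date written in source_format and return it as YYYY-MM-DD."""
    # The default source_format assumes month/day/year input; pass a
    # different format string if your data were entered another way.
    return datetime.strptime(text, source_format).date().isoformat()


print(to_iso_date("4/7/2014"))                   # 2014-04-07
print(datetime(2014, 4, 7, 14, 30).isoformat())  # 2014-04-07T14:30:00
```

The conversion only works if you know how the original dates were written, which is one reason to settle on a single format before data collection starts.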
Record component variables, not compound variables. |
For example: If you're measuring each subject's BMI (Body Mass Index), don't just record the BMI itself. Also record the data you used to calculate the BMI (height and weight). This gives you more options later (you could re-calculate the BMI using a different formula if desired). |
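The BMI example can be sketched in a few lines of Python. The records below are made up for illustration, and the column names `height_m` and `weight_kg` are assumptions; the point is that the derived value can always be rebuilt from the stored components.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


# Hypothetical records that store the component measurements.
subjects = [
    {"subject_id": "S01", "height_m": 1.75, "weight_kg": 70.0},
    {"subject_id": "S02", "height_m": 1.60, "weight_kg": 64.0},
]

# The compound variable is derived on demand, so a different formula
# could be swapped in later without re-collecting any data.
for row in subjects:
    row["bmi"] = round(bmi(row["weight_kg"], row["height_m"]), 1)

print([row["bmi"] for row in subjects])  # [22.9, 25.0]
```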
Agree on a standard representation for missing data. |
Does your field or your preferred analysis software have a standard notation to represent missing data? |
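If a spreadsheet already mixes several ways of marking missing values, they can be collapsed into one agreed token before analysis. The sketch below assumes the project has settled on `NA`; that choice, the list of ad hoc spellings, and the sample rows are all hypothetical, so substitute whatever your field or software expects.

```python
import csv
import io

MISSING_TOKEN = "NA"  # assumed project-wide convention
# Hypothetical ad hoc spellings found in the raw data. Treating a number
# such as -999 as "missing" is only safe if it can never be a real value.
VARIANTS = {"", "n/a", "na", "none", "null", "-", "-999"}


def standardize_missing(value: str) -> str:
    """Replace any known missing-value spelling with the agreed token."""
    return MISSING_TOKEN if value.strip().lower() in VARIANTS else value


# Made-up CSV content standing in for a real file.
raw = "subject_id,height_m,weight_kg\nS01,1.75,70.0\nS02,,n/a\nS03,-999,58.2\n"
rows = [
    [standardize_missing(cell) for cell in row]
    for row in csv.reader(io.StringIO(raw))
]

for row in rows:
    print(row)
```

Whatever token you choose, record it in your data documentation so that future users know how to interpret it.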
Avoid visual cues and ambiguous/dependent information. |
Programs like Excel allow you to make very visual spreadsheets -- but remember:
- highlighting and font colors will be lost if you need to export data to different software,
- merging cells could also hinder data export,
- notes such as "see above cell" could become meaningless if someone re-sorts the spreadsheet.
|
A fictional example of a poorly-structured spreadsheet:

The same spreadsheet, with suggested corrections:

Recommendations are based on:
Andrea Horne Denton and Sherry Lake's "Workshop on the Best Practices in Data Collection and Management," presented at the National Network of Libraries of Medicine, Middle Atlantic Region's Symposium: "Doing It Your Way: Approaches to Research Data Management for Libraries" (April 2014)