Data Management @ NAU

What version of your data should you share, and what file format should you use?

Version

Which version of the data would you personally find useful for a variety of future projects? Generally, the version of the data that you analyzed might be useful in other researchers' analyses, whereas raw instrument data might not be intelligible to anyone who hasn't used that instrument before.

File Formats

While you often need to work with proprietary file formats during a research project, consider saving a second copy of your final data in a non-proprietary format, to ensure that you and other researchers will be able to access the data in the future.

In general, file formats are more likely to be accessible in the future if they are:

  • non-proprietary
  • open, with documented standards
  • in common usage by the research community
  • encoded with a standard character encoding (e.g., ASCII, UTF-8)
  • uncompressed (space permitting)
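Here is a minimal sketch of the advice above. It assumes your final data is tabular and already loaded in Python, and it saves a second copy as a plain CSV file. That format is non-proprietary, openly described (RFC 4180), written here in UTF-8, and uncompressed. The helper name `save_open_copy` and the sample column names are hypothetical and are only there for illustration.

```python
import csv
from pathlib import Path


def save_open_copy(rows, header, path):
    """Write tabular data to an uncompressed, UTF-8 encoded CSV file.

    save_open_copy is a hypothetical helper, for illustration only.
    """
    path = Path(path)
    # newline="" lets the csv module write its own line endings,
    # as the Python csv documentation recommends.
    with path.open("w", encoding="utf-8", newline="") as f:
        # Default "excel" dialect: comma-separated, \r\n line endings.
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Hypothetical example data; replace with your own final dataset.
    header = ["site", "date", "temperature_c"]
    rows = [["Flagstaff", "2020-06-01", 21.5], ["Sedona", "2020-06-01", 30.2]]
    out = save_open_copy(rows, header, "example_data.csv")
    print(out.read_text(encoding="utf-8"))
```

If your working copy lives in a spreadsheet, you can get the same result by exporting each sheet as CSV (UTF-8) from the application. Keep the original file as well, because CSV does not retain formatting or formulas.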

See the DMPTool's data management guidance for more information about file formats.

Advantages to sharing data in a repository

The easiest way to share your data is to place it in a repository -- that way, you won't have to deal with:

Adding a license to your data. A repository will either prompt you to choose a license during data upload, or have a standard license for every dataset in the repository.
Establishing access permissions for your dataset. A repository will have a standard set of access permissions for all files (or subsets of files) in the repository.
Emails requesting data. Putting data in a repository means that data requests won't contribute to your email overload.
Preserving and migrating your files to ensure ongoing access. As Jeff Rothenberg said, "digital information lasts forever—or five years, whichever comes first" (the quote is from page 2 of his paper "Ensuring the Longevity of Digital Information").

Find the right repository for your dataset

Help other researchers discover your data by depositing it in a discipline-specific repository.

Browse the following directories of data repositories to find a repository that specializes in your research area:

Cline librarians can help you find the right repository for your data -- contact us today!

Can't find a discipline-specific repository for your data?

You may be able to deposit your dataset in OpenKnowledge@NAU if it is:

  • the finalized version of your dataset
  • a small dataset (under 1 GB), or
  • a description of a large or sensitive dataset

Sharing data without a repository

If you want to share your data but don't want to use a repository, some funding agencies allow you to post the data on your own website or to tell people that they can contact you for it.

Steps for sharing data without a repository
Licensing your dataset (establishing conditions for re-use). Add a license even if you want your data to be in the public domain -- this reassures potential users that your dataset really is open for re-use. Two organizations offer licenses for open data:

  • Creative Commons (Science): their licenses have been adopted by BioMed Central, Dryad, and many other repositories, and the CC0 license is often recommended for datasets.
  • Open Data Commons
Establishing access permissions for your dataset. If you're sharing data from your own site or upon request, you might need to construct data access permissions (particularly if you're working with sensitive data).  
Planning for and managing the long-term preservation of your dataset. File formats can become obsolete in as little as 5-10 years, when software companies go out of business or release new versions that aren't backwards-compatible. See our advice about long-term preservation issues.