Getting informed consent to share sensitive data

Sharing potentially sensitive data will be simpler if you get written consent in advance from research subjects*. Add information about data sharing to your IRB application and your informed consent document.

*What if you don't have informed consent? If your research is regulated by HIPAA, the Privacy Rule specifies five other circumstances under which researchers can use or disclose protected health information, but keep in mind that it's still best practice to get informed consent even if you'll be sharing de-identified data.

De-identifying or anonymizing data

In order to share sensitive data publicly, you must remove or obscure* all identifiers that would allow the data to be linked to an individual or small group of individuals. You have two options:

De-identify data: Remove or obscure all identifiers, but preserve a key code in a separate location that could be used to re-link the data and the identifiers.
Anonymize data: Remove or obscure all identifiers and destroy any key codes that could be used to re-link data and identifiers.
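
The difference between the two options can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the function names (deidentify, anonymize), the key_code column, and the sample field names are hypothetical, and the sketch handles direct identifiers only. It does not assess whether the remaining variables could still identify an individual or small group.

```python
import secrets


def deidentify(records, identifier_fields):
    """Return (shared_rows, key_table).

    Identifiers are removed from each row and replaced by a random key
    code. The key table maps each code back to the removed identifiers
    and is meant to be stored in a separate location from the shared rows.
    """
    shared_rows, key_table = [], {}
    for record in records:
        code = secrets.token_hex(8)
        while code in key_table:  # regenerate in the unlikely case of a collision
            code = secrets.token_hex(8)
        key_table[code] = {f: record[f] for f in identifier_fields if f in record}
        row = {k: v for k, v in record.items() if k not in identifier_fields}
        row["key_code"] = code  # hypothetical column name
        shared_rows.append(row)
    return shared_rows, key_table


def anonymize(records, identifier_fields):
    """Remove identifiers and keep no key code, so rows cannot be re-linked."""
    return [
        {k: v for k, v in record.items() if k not in identifier_fields}
        for record in records
    ]


# Made-up sample data for illustration only.
records = [
    {"name": "Participant A", "email": "a@example.org", "score": 3},
    {"name": "Participant B", "email": "b@example.org", "score": 5},
]
shared_rows, key_table = deidentify(records, ["name", "email"])
anonymous_rows = anonymize(records, ["name", "email"])
```

In the de-identified case, key_table would be stored apart from shared_rows; in the anonymized case, no such table exists at all.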

For longer definitions of de-identified information and anonymized information, please see the Glossary (Appendix E) of NIST's Guide to Protecting the Confidentiality of Personally Identifiable Information.

*In some datasets, identifiers can be obscured/masked instead of being removed. For information about how to obscure a variable to prevent identification, see the following references (arranged in order of increasing detail):

Methods for sharing sensitive data

The National Institutes of Health identify four potential methods for sharing sensitive data:

Method: Deposit in a data archive
Description/resources: For health data, look for an appropriate archive in this list of NIH-supported data-sharing repositories. For social science data, the ICPSR data archive has the added advantage that archive staff will review submitted datasets to assess disclosure risk.
When would you use this method? The best sharing option if the data can be de-identified -- see the "De-identifying or anonymizing data" box above for more information.

Method: Deposit in a data enclave
Description/resources: A data enclave is "a controlled, secure environment in which eligible researchers can perform analyses using restricted data resources." For social science data, the ICPSR operates both a physical and a virtual data enclave.
When would you use this method? The best sharing option for data that can't be de-identified without losing essential parts of the dataset.

Method: Under the auspices of the PI
Description/resources: Interested parties can contact the PI for data.
When would you use this method? You could use this method for any dataset, but keep in mind that you'll be responsible for setting up a data license / data use agreement (plus, this method generates extra email in your inbox...)

Method: Mixed mode sharing
Description/resources: Create multiple versions of the dataset that provide different levels of detail -- for each version, choose the most appropriate sharing method from the three listed above.
When would you use this method? When you want to publish a de-identified version of the dataset for public access but also want to give researchers the option to access a more detailed version of the data in a secure environment.
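
As a rough illustration of mixed mode sharing, the Python sketch below derives two versions from one dataset: a restricted version with direct identifiers removed, and a public version in which selected variables are also reduced in detail. The function names (coarsen_age, make_versions), the field names, and the choice of 10-year age bands and 3-character postal prefixes are all hypothetical. Which variables need coarsening, and by how much, depends on a disclosure risk review of your own data.

```python
def coarsen_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def make_versions(records, direct_identifiers, coarsen):
    """Return (public, restricted) versions of the same records.

    restricted: direct identifiers removed, all other detail kept.
    public: same as restricted, but each field named in `coarsen`
            is passed through its coarsening function.
    """
    restricted = [
        {k: v for k, v in r.items() if k not in direct_identifiers}
        for r in records
    ]
    public = [
        {k: (coarsen[k](v) if k in coarsen else v) for k, v in row.items()}
        for row in restricted
    ]
    return public, restricted


# Made-up sample data for illustration only.
records = [{"name": "Participant A", "age": 37, "zip": "12345", "score": 3}]
public, restricted = make_versions(
    records,
    direct_identifiers=["name"],
    coarsen={"age": coarsen_age, "zip": lambda z: z[:3]},
)
```

Here the public version could go to a data archive while the restricted version is made available through a data enclave.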


For more sharing methods, see NCBI's "Approaches to Sharing Biological and Social Data."