Research Data Management: Sharing & Archiving

Advantages of a repository

Why would you choose to deposit your data into a repository? A repository is helpful because it:

  • Provides a metadata structure for you to fill in
  • Serves as a backup vehicle for your data
  • May preserve your data for the future
  • Makes sharing your data easy
  • May lead others to cite your research more often
  • May provide computational or online analysis tools that people can use with your data
  • Publishes the data for you by giving your dataset a unique persistent identifier, such as a DOI

Selecting a data repository

There are some things to keep in mind when selecting a repository. Data in a repository should be:

  • Persistent (not likely to be modified)
  • Searchable and browsable
  • Easy to retrieve or download
  • Citable

A wide variety of institution-based and discipline-specific repositories exist for digital data. The repository itself should be: 

  • Appropriate for the type of data you generate
  • Used by an audience that is likely to make use of your data
  • Open access

If both a discipline-specific repository and an institution-based one exist for your data, then consider depositing in both locations to maximize discovery and safety of the data. 

Data repositories

Many more data repositories are available online than can be listed here. Consult re3data.org, an external resource, for an extensive list of discipline-specific repositories.

CUNY Academic Works accepts all data formats, and is dedicated to collecting and providing access to the research, scholarship, and creative and pedagogical work of the City University of New York.

FigShare allows you to share all of your data, negative results and unpublished figures.

The Dataverse Network Project (DVN) is an application for publishing, sharing, referencing, extracting, and analyzing research data. It makes it easier to share data with others and to replicate others' work. Researchers and data authors, publishers and distributors, and affiliated institutions all get credit.

Inter-university Consortium for Political and Social Research (ICPSR) – The world’s largest archive of digital social science data. ICPSR staff can guide you in preparing your data for archiving and distribution.

 

Confidentiality

It is vital to maintain the confidentiality of research subjects, both for ethical reasons and to ensure continued participation in research. At the same time, data on research subjects can be shared if proper steps are taken to maintain participant confidentiality:

Informed consent should make a provision for data sharing: When obtaining informed consent from study participants, ensure confidentiality while also enabling the option of data sharing. Even if you are not certain that you will share your research data with others, you must obtain informed consent at the outset. For an example of how to write informed consent forms to allow for data sharing, see the ICPSR Confidentiality Language for Informed Consent Agreements.

Evaluate the sensitivity of your data: Researchers should consider whether their data contains direct or indirect identifiers that could be combined with other public information to identify research participants. If so, steps should be taken to remove or mask these in public-use data files.
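
As a rough illustration of removing direct identifiers and masking indirect ones, here is a minimal Python sketch. The column names (name, email, phone, age, zip_code), the ten-year age bands, and the three-digit ZIP truncation are all hypothetical choices made for this example; they are not rules from this guide or from any regulation, and a script like this is not a substitute for a confidentiality review.

```python
# Hypothetical column names; replace them with the fields in your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def mask_record(record):
    """Return a copy of one record that is safer to include in a public-use file.

    Direct identifiers are dropped entirely. Two example indirect
    identifiers (age and ZIP code) are coarsened rather than removed.
    """
    public = {key: value for key, value in record.items()
              if key not in DIRECT_IDENTIFIERS}

    # Replace an exact age with a ten-year band (assumed band width).
    if "age" in public:
        low = (int(public["age"]) // 10) * 10
        public["age"] = f"{low}-{low + 9}"

    # Keep only the first three characters of the ZIP code (assumed rule).
    if "zip_code" in public:
        public["zip_code"] = str(public["zip_code"])[:3] + "XX"

    return public


def make_public_use(records):
    """Apply mask_record to every record in a list of dictionaries."""
    return [mask_record(record) for record in records]


if __name__ == "__main__":
    sample = [
        {"name": "A. Participant", "email": "a@example.org",
         "age": 34, "zip_code": "10016", "response": 4},
    ]
    for row in make_public_use(sample):
        print(row)
```

Which fields count as indirect identifiers depends on the study, so the lists and masking rules above would need to be adapted case by case.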

Obtain a confidentiality review: A benefit to depositing your data with some archives, such as ICPSR, is that their staff will review your data for the presence of confidential information.

Comply with CUNY regulations: Researchers concerned about confidentiality issues with their data should consult the CUNY Human Research Protections Program (HRPP).

Comply with regulations for health research: See the HIPAA Privacy Rule, Information for Researchers.

Enable restricted use of your data: Do you want to make your data available in a more restricted, limited-access manner? The ICPSR DSDR program has resources for data producers.

Citing data

When writing a paper or giving a presentation, it is important to cite not only the literature consulted but also the data files used, even if they are data files that you have produced.

Citing data is important in order to:

  • Give the data producer appropriate credit
  • Enable readers of your work to access the data, for their own use or to replicate your results

Elements of a citation include:

  • Author(s)
  • Title
  • Year of publication: The date when the dataset was published or released (rather than the collection or coverage date)
  • Publisher: The data center or repository
  • Any applicable identifier (including edition or version)
  • Availability and access: URL or other location information for the data

Examples:

Bachman, Jerald G., Lloyd D. Johnston, and Patrick M. O'Malley. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998 [Computer file]. Conducted by University of Michigan, Survey Research Center. ICPSR02751-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 2006-05-15. http://dx.doi.org/10.3886/ICPSR02751.

ASTER Global Digital Elevation Model, version 1, ASTGTM_N11E122_num.tif, ASTGTM_N11E123_num.tif, Ministry of Economy, Trade, and Industry (METI) of Japan and NASA, downloaded from https://wist.echo.nasa.gov/api/, October 27, 2009
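
The citation elements listed above can also be assembled in a consistent order with a short script. The Python sketch below is only illustrative: the DataCitation class, the format_data_citation function, the field names, and the punctuation between elements are assumptions made for this example, not a citation style required by this guide or by any repository. The sample values are taken from the ICPSR example above.

```python
from dataclasses import dataclass, field


@dataclass
class DataCitation:
    """Holds the citation elements for one dataset (hypothetical structure)."""
    authors: list = field(default_factory=list)
    title: str = ""
    year: str = ""          # year the dataset was published or released
    publisher: str = ""     # the data center or repository
    identifier: str = ""    # e.g. a version or study number
    location: str = ""      # URL or other access information


def format_data_citation(citation):
    """Join the non-empty elements in a fixed order, separated by periods."""
    pieces = [
        "; ".join(citation.authors),
        citation.year,
        citation.title,
        citation.publisher,
        citation.identifier,
        citation.location,
    ]
    # Strip trailing periods so the separator does not produce "..".
    return ". ".join(piece.rstrip(".") for piece in pieces if piece)


if __name__ == "__main__":
    example = DataCitation(
        authors=["Bachman, Jerald G.", "Johnston, Lloyd D.", "O'Malley, Patrick M."],
        title="Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998",
        year="2006",
        publisher="Inter-university Consortium for Political and Social Research",
        identifier="ICPSR02751-v1",
        location="http://dx.doi.org/10.3886/ICPSR02751",
    )
    print(format_data_citation(example))
```

Elements that are left empty are simply skipped, so the same function works for datasets that have no version number or persistent identifier.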

Related links:

ICPSR: Why and how should I cite data?

DataCite

This guide was developed by the CUNY Office of Library Services and is based on (and, in some cases, pulls from) guides created at the libraries at the CUNY Graduate Center, New York University, Massachusetts Institute of Technology, University of Massachusetts, University of Michigan, and Stanford University.