OpenScience

Lesson 4: Sharing Open Data

Overview

In this lesson, you learn about the practice of sharing your data. The discussion starts with a review of the sharing process and how to evaluate whether your data are sharable. Next, you look at ensuring your data are accessible, with a closer look at repositories and the lifecycle of data accessibility, from selecting a repository to maintaining and archiving your data. The lesson then discusses steps to make the data as reusable as possible, and concludes with a section on who will help with the data sharing process.

Learning Objectives

After completing this lesson, you should be able to:

Data Sharing Process Overview

Sharing data is a critical part of increasing the reproducibility of results. Whether it’s new data we collect ourselves or data that we process in order to do our analysis, we end up sharing some form of data. We need to think about what data we will share and how best to ensure that it will be open and usable by others.

Data sharing should typically be done through a long-term data center or repository that is responsible for ingesting, curating, and distributing/publishing your open data. You are responsible for providing the information and metadata that make your data readily discoverable, accessible, and citable. The cost of archiving and publishing data should also be considered.

So You Want to Share Your Data

Once you have decided to share your data, you will need to answer several questions to help you plan. The answers should be included in your data management plan (DMP):

   
  • What? Data formats and (where relevant) standards
  • When? When and if to share data
  • Where? The intended repositories for archived data
  • How? How the plan enables reuse of the data
  • Who? Roles and responsibilities of the team members in implementing the DMP

In this lesson, we cover steps toward sharing data. Specifically, we focus on the “when”, “where”, “how”, and “who” sections of a DMP.

Open Data Sharing Process

In general, sharing your open data requires the following steps:

  1. Make sure your data can be shared
  2. Select or identify a repository to host your data
  3. Work with your repository to follow their process and meet their requirements
  4. Make sure your data is findable and accessible through the repository and is maintained and archived
  5. Request a DOI for your data set so that it is easily citable
  6. Choose a data license

Sometimes, you may be able to work with a well-staffed repository that will handle many of these steps for you (for instance, if you are working with NASA mission data). Otherwise, it is your responsibility to follow the above steps to share your data openly.

When and If to Share Data

When to Share Data?

The decision of when to share should be discussed with everyone on the team and documented in the data management plan. Funding agencies and organizations may have specific requirements about when data must be shared, but we encourage you to consider whether it is feasible to share even earlier than your funder requires. There are different times when data could be shared:

As discussed previously in this curriculum, there are many benefits to sharing as early as possible. Early sharing can lead to new and unexpected discoveries and expand your collaboration network. Remember that even when you share data, you are still the world expert on that data! Often, people who want to work with the data will reach out to you to collaborate.

Should the Data be Shared?

Before datasets are shared, it’s important to consider any restrictions on your permission to share them and to ensure that your contributors, including sample and data donors, approve their release.

Data should be as open as possible and as closed as necessary.

Verify Your Data is Sharable

Before you decide where to share your data, you must make sure you are permitted to share it.

Specific considerations that might prevent the sharing of your data include:

In the first module of this curriculum, we listed several reasons why certain research products should not be shared. We will review some of these reasons, and go into more detail on a few that are particularly relevant to data.

Export and Security Considerations

Relevant laws and regulations that may prevent the release of data include but are not limited to:

Example: NASA Space System Protection Standard

The NASA STD 1006.1 Space System Protection Standard establishes protection requirements to ensure NASA missions are resilient to purposeful threats.

Controlled Information Considerations

Some regulations and policies that may prevent the sharing of data include but are not limited to:

Intellectual Property Considerations

Data may be subject to intellectual property, copyright, and licensing concerns. Relevant regulations and policies include patent and intellectual property laws such as the Bayh-Dole Act, which enables universities, nonprofit research institutions, and small businesses to own, patent, and commercialize inventions developed under federally funded research programs.

Example: NASA FAR Supplement 1852.227

NASA FAR Supplement 1852.227 outlines patent and data rights for government contracts.


Many research institutions have resident experts in intellectual property, copyright, and patent law. They can be a great resource if you have any questions or concerns.

Where to Share Data

Data can be shared in a variety of locations. Although sharing data via email or websites is popular, these methods are not recommended because they do not meet the requirements for findability or long-term archival support. In some fields, sharing data as supplemental material to a peer-reviewed publication is acceptable, especially for small data sets. A long-term repository that provides a permanent identifier is the best option for sharing data.

Selecting a Data Repository

If you do not already have a data repository in mind, consider the following to narrow down your options:

Find and compare the services, benefits, and limitations of the repositories you are considering. Depending on its level of funding, purpose, and user base, each repository will have its own processes and requirements for accepting and hosting your data, and will provide a different set of functionality and services.

Data with privacy concerns may have additional anonymization or approval processes or restrictions on who can access the data.

The White House has presented a good overview of desirable characteristics for data repositories.

Ensuring Accessibility

Good repositories share your open data through standard protocols, such as HTTPS or SFTP. Common ways to do this are:

Additionally, repositories can require authorization and authentication (e.g., logins with usernames/passwords) to access data. While this is allowed under FAIR principles, it may violate Open Science principles if not everyone is able to obtain a login.
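Whether a shared file can actually be retrieved over a standard protocol, without a login, can be checked with a short script. The sketch below is a minimal example assuming Python's standard `urllib.request`; the helper names `fetch_bytes` and `is_accessible` are hypothetical and not part of any repository's API, and login-protected downloads are out of scope.

```python
import urllib.request

# Hypothetical helpers for checking that a shared file can be retrieved by a
# script without a login. They are not part of any repository's API.


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the resource at `url` (e.g. an https:// link) and return its bytes."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError for unreachable hosts or missing local files.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def is_accessible(url: str, timeout: float = 30.0) -> bool:
    """Return True if the resource can be downloaded, False otherwise."""
    try:
        fetch_bytes(url, timeout)
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError; ValueError covers
        # strings that are not valid URLs at all.
        return False
    return True
```

For example, `is_accessible("https://example.org/data/sample.csv")` (a placeholder URL) returns `False` if the file is missing, the server rejects the request, or the host cannot be reached. Downloading the whole file is the simplest possible check; for very large files a lighter request may be preferable.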

Working with a Repository

Start Working with a Repository

Repository requirements can vary widely. Always review a repository’s requirements to see what actions you need to take once you’re ready to start working with them. Also note that some repositories have staff that will help with the process of sharing data, while others rely on the user to know how to share their own data.

If you use a repository that has staff to help you with the process, they may want to review and comment on your data management plan.

The repository may request that you produce a test sample of your data in order to assess:

  • That the data format you intend to use is supported.
  • That data variables are named as expected.
  • That metadata vocabulary is correct.
  • That repository-specific requirements are met.

This conformity check can identify misunderstandings early and result in a smooth final submission of your data to the repository.
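The repository performs its own conformity check, but you can catch obvious problems before sending a test sample. The sketch below assumes a CSV sample and uses made-up requirements (`REQUIRED_COLUMNS`, `REQUIRED_METADATA`, `ALLOWED_KEYWORDS` are hypothetical); substitute the formats, variable names, and metadata vocabulary your repository actually publishes.

```python
import csv
import io

# Hypothetical requirements; real repositories define their own formats,
# variable names, and metadata vocabularies.
REQUIRED_COLUMNS = {"time_utc", "latitude", "longitude", "temperature_k"}
REQUIRED_METADATA = {"title", "creator", "license", "keywords"}
ALLOWED_KEYWORDS = {"atmosphere", "temperature", "remote sensing"}


def check_sample(csv_text: str, metadata: dict) -> list:
    """Return a list of problems found in a test sample (empty if none)."""
    problems = []

    # Format: the sample is read as CSV whose first row holds the variable names.
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])

    # Variable names: every required column must be present.
    for name in sorted(REQUIRED_COLUMNS - columns):
        problems.append(f"missing variable: {name}")

    # Metadata: required fields present, keywords drawn from the vocabulary.
    for key in sorted(REQUIRED_METADATA - set(metadata)):
        problems.append(f"missing metadata field: {key}")
    for keyword in metadata.get("keywords", []):
        if keyword not in ALLOWED_KEYWORDS:
            problems.append(f"keyword not in vocabulary: {keyword}")

    return problems
```

Returning a list of messages rather than stopping at the first failure lets you fix every issue before the sample goes to the repository.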

Maintaining Data at a Repository

As you progress through your project lifecycle, use your repository's update, revision, and resubmission processes to keep the archived data products up to date.

Any new versions of the data you want to share through the repository should go through the same DMP review, compliance check, and upload procedure as your initial data set.

Archiving Data at a Repository

When your project ends, make sure you have updated and uploaded any companion documentation (discussed in the previous lesson, "Making Open Data") along with the final version of your data, even if you only produced a single version.

Make sure the repository will keep your data (or at least your metadata) online for a reasonable period of time after your project ends.

If any data issues are found after the conclusion of your project, make sure the repository will still accept data revisions, if they are needed.

How to Enable Reuse of Data

Obtaining a DOI

Individuals cannot typically request a DOI (digital object identifier) themselves but rather have to go through an authorized organization that can submit the request, such as:

Data providers should supply summary information for the DOI landing page(s) if required. Data sharers should take the data providers’ suggestions into account, comply with DOI guidelines, and create the landing page(s). If possible, reserve a DOI ahead of creating your data.
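Once a DOI is assigned, anyone can reach its landing page through the resolver at `https://doi.org/` followed by the DOI. People paste DOIs in several forms (`doi:10...`, a full URL, or the bare DOI), so a small normalizing helper can be handy in scripts and documentation. This is a sketch: the regular expression only checks the rough shape of a DOI (the `10.` prefix, a registrant code, a slash, and a suffix) and cannot tell whether a DOI is actually registered. The function names and the example DOI `10.1234/example.dataset.v1` are hypothetical.

```python
import re
import urllib.parse

# Rough syntactic check only: "10." + registrant code, "/", non-empty suffix.
# It does not confirm that the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d+(?:\.\d+)*/\S+$")

# Common ways a DOI is written; compared case-insensitively.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(text: str) -> str:
    """Strip resolver/`doi:` prefixes and return the bare DOI."""
    doi = text.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def doi_url(text: str) -> str:
    """Return the https://doi.org/ link that resolves to the landing page."""
    doi = normalize_doi(text)
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + urllib.parse.quote(doi, safe="/:;()._-")
```

For instance, `doi_url("doi:10.1234/example.dataset.v1")` yields `https://doi.org/10.1234/example.dataset.v1`, the form most suitable for citations and landing-page links.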

Ensuring Findability

Repositories handle the sharing, distribution, and curation of data. Additional services they may provide include:

Making it Easy to Cite Your Data

The goal is to make it easy to cite your data. Best practices include:

Now that your data are at a repository and have a citation statement and DOI, publicize the data set to your users and remind them to cite it in their work!
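A citation statement is easiest to keep consistent when it is generated from the same metadata you give the repository. The sketch below formats one loosely following the pattern DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. (ResourceType). Identifier); the `DatasetRecord` fields and the example DOI are hypothetical. It also builds, without sending, a DOI content-negotiation request, a resolver feature that asks `https://doi.org/` for a formatted citation via the `Accept` header.

```python
import urllib.request
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal, hypothetical metadata needed for a citation statement."""
    creators: list  # "Family, Given" strings, in author order
    year: int
    title: str
    version: str
    publisher: str  # often the repository hosting the data
    doi: str  # bare DOI such as "10.1234/example.dataset.v1"


def citation(record: DatasetRecord) -> str:
    """Format a citation loosely following DataCite's recommended pattern."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}): {record.title}. "
        f"Version {record.version}. {record.publisher}. (Dataset). "
        f"https://doi.org/{record.doi}"
    )


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )
```

Passing the request to `urllib.request.urlopen(...)` should return a formatted citation when the DOI is registered with an agency that supports content negotiation, such as DataCite or Crossref. Putting the generated statement on your project page and in your documentation gives users a ready-made citation to copy.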

Who is Responsible for Sharing Data

Sharing data openly is a team effort. An important part of planning for open data is agreeing on roles and responsibilities: who will ensure that the plan is implemented.

So what needs to be done? Documenting these roles and responsibilities in your Data Management Plan will help your team stay organized and do science faster! A well-written, detailed plan should include:

Who Will Move Data to a Repository

Once you are ready to send your data to your repository, find the repository’s recommendations for uploading data. Determine who will work with your repository to accomplish the following types of activities:

Who Will Develop the Data Documentation and Metadata

Determine who will work with your repository to inventory the transferred data, metadata, and documentation. This role might include populating any required metadata in databases to make the data findable.

You may be able to accomplish some of these tasks through a repository’s interface. However, some types of repositories may require you to interact with their administration teams. For this role, determine who will:

Who Will Help With Data Reuse

Once the repository has made your data available, someone from your team must test access to the data (its accessibility) and its distribution methods (its findability). If possible, identify who will work with your repository to optimize or modify tools for intuitive human access and to standardize machine access. This role requires someone to:

Who Will Develop Guidance on Privacy and Cultural Sensitivity of Data

Sharing data should be respectful of the communities that may be involved. This means thinking about privacy issues and cultural sensitivities. Who on your team will identify and develop guidance on:

Lesson 4: Summary

The following are the key takeaways from this lesson:

Lesson 4: Knowledge Check

Answer the following questions to test what you have learned so far.

Question 1 of 4

Data cannot be shared if it is:

Question 2 of 4

Select the option you think is correct to complete the sentence.

It is best practice to start working with a repository _____.

Question 3 of 4

Which of the following might be able to help you get a DOI for your data?

Question 4 of 4

Which of the following are roles to consider when sharing data? Select all that apply.