Manage and Store Data Effectively

October 19, 2016

Data is everywhere. It can be on your computer, a USB drive, a notebook, even a box in your basement. But what are the best ways to store your data? Why should you make your data accessible to your peers? What’s the difference between data and metadata? These questions were addressed by Richard Inouye, Liz Woolcott and Andrea Payant at October’s GrTS.

For assistance creating a Data Management Plan, or to further review data and metadata, contact librarians Betty Rozum (betty.rozum@usu.edu) or Andrea Payant (andrea.payant@usu.edu).


Preparation

Many researchers collect data from external sources or online databanks. Common data collection sites include the U.S. Geological Survey, the National Centers for Environmental Information and GenBank.

When you are collecting data, three important questions to consider are:

  1. What equipment will be used?
  2. Is there required processing?
  3. Are there quality assurance and control guidelines?

Whether or not you’re using online databases to collect data, these three questions will aid you in the gathering process.

What format or file type will you use? How large are your files? How many files do you have?

These are important questions to ask as you begin to process and store your data. Large files may require you to purchase data storage software. Some file formats are not easily accessible to the public. You may want to consider what you’ll be using your data for, and what benefit it will be to the public, before you begin to format it.
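
The questions above about format, size and file count can be answered with a short script. The following is a minimal sketch in Python, not a prescribed tool; the function name `summarize_files` and the folder you point it at are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def summarize_files(folder: Path) -> dict[str, tuple[int, int]]:
    """Return {file extension: (number of files, total size in bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for path in folder.rglob("*"):  # walk the folder and all subfolders
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            totals[ext][0] += 1                    # count the file
            totals[ext][1] += path.stat().st_size  # add its size in bytes
    return {ext: (count, size) for ext, (count, size) in totals.items()}


if __name__ == "__main__":
    # Hypothetical usage: summarize the current working directory.
    for ext, (count, size) in sorted(summarize_files(Path(".")).items()):
        print(f"{ext}: {count} file(s), {size} bytes")
```

A summary like this shows quickly whether your data is dominated by a few very large files or by a format that others may not be able to open.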

When you go back to your dataset ten years from now, what will you remember? And, when the public goes to access your data, what will they search in order to find it?

These are important questions to consider when titling your data. One tip to make a title easily recognizable and searchable is to include the full date of the study in year, month, day format. For example: 20161020.

You also want to provide as much context to the data as possible. Instead of simply listing the data as “Rivers,” consider a title like “Greater Yellowstone Rivers: 1:126,700 U.S. Forest Service Visitor Maps (1961-1983).”

The second example shows what was studied, where it was studied, when it was studied, and what scale was used in the study. Consider using these elements whenever you title data.
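
One way to apply these titling tips consistently is to build every name from the same elements. The sketch below is only an illustration: the function `dataset_name`, the underscore and hyphen separators, and the scale value in the usage example are assumptions, not a naming standard from the presentation.

```python
from datetime import date


def dataset_name(what: str, where: str, study_date: date, scale: str = "") -> str:
    """Combine when, where, what and (optionally) scale into one name."""
    parts = [study_date.strftime("%Y%m%d"), where, what]  # date as YYYYMMDD
    if scale:
        parts.append(scale)
    # Replace spaces inside each element so the name is safe to use as a file name.
    return "_".join(part.strip().replace(" ", "-") for part in parts)


# Hypothetical usage; the scale "1-24000" is a made-up value for illustration.
print(dataset_name("Rivers", "Greater Yellowstone", date(2016, 10, 20), "1-24000"))
```

Putting the date first means the names sort chronologically in a file listing.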

Technology is constantly changing. Punch cards, floppy disks and plain old paper were once industry-standard, cutting-edge technologies. Now, most data is stored with online cloud storage services, or on a USB drive or external hard drive.

These tools are useful, but generally you should store your data in at least two distinct places. USU librarians use the LOCKSS system (Lots of Copies Keep Stuff Safe).

When storing data on multiple devices, make sure each device is kept in a separate physical location. This will help prevent damage from flooding, fire, or other natural hazards.

Also, consider how your data storage will impact its use by others. Are you storing your data in a manner where it will still be relevant in 15-20 years, or will the technology be obsolete by then? Are you storing your data in an easily accessible manner? These are questions to consider with data storage.
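
The advice to keep copies in at least two places can be sketched as a small script that copies a file and confirms each copy matches the original. This is a minimal Python illustration, not the LOCKSS software itself; the function names and the destination folders are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves file timestamps
        if sha256_of(copy) != expected:
            raise IOError(f"Copy at {copy} does not match the original")
        copies.append(copy)
    return copies
```

For the copies to protect against fire or flooding, the destination folders would need to sit on devices in different physical locations, for example an external drive kept off site and a synced cloud folder.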

Data and Metadata

Metadata is data about your data. It is structured and specific, and it makes your data more easily searchable by Google and other platforms. It makes data interoperable, maintains an organizational structure and consistency, assists in the sharing of data, and allows for replication.

Metadata is used by colleagues, fellow researchers, university staff, library personnel and yourself, years down the road.

Colleagues can find your data. Researchers can use it. University personnel can publish the data sets and studies, and the library staff allow students and community members to access the information recorded.

It is important for you to collect metadata for all of these groups.

When constructing metadata, ask:

  • Why were the data created?
  • What processes were used to create the data?
  • When were the data last updated?
  • Who created the data?
  • What fields are present and what do the values of those fields mean?
  • Who do I contact about getting more information about the data?
  • How do I obtain a hard copy of the data?
  • Are there any limitations to the data?

As you build your metadata records:

  • Constantly review your records for accuracy and completeness.
  • Have a colleague review your records.
  • Don’t use jargon or acronyms within your data. These can easily be misinterpreted, misunderstood or misapplied.
  • Avoid special characters.

Check out discipline-specific metadata standards.
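
The questions above can be turned into a simple checklist for each dataset. The sketch below is an illustration in Python and does not follow any particular metadata standard: the field names in `REQUIRED_FIELDS` are assumptions chosen to mirror the questions, and every value in the sample record is a made-up placeholder.

```python
import json

# One hypothetical field per question in the list above.
REQUIRED_FIELDS = [
    "purpose",       # Why were the data created?
    "methods",       # What processes were used to create the data?
    "last_updated",  # When were the data last updated?
    "creator",       # Who created the data?
    "fields",        # What fields are present and what do their values mean?
    "contact",       # Who to contact for more information?
    "access",        # How to obtain a copy of the data?
    "limitations",   # Are there any limitations to the data?
]


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record for illustration only; none of these values are real.
record = {
    "purpose": "Example purpose statement",
    "methods": "Example description of collection and processing steps",
    "last_updated": "20161020",
    "creator": "Example Researcher",
    "fields": {"site_id": "Identifier of the sampling site"},
    "contact": "researcher@example.edu",
    "access": "Request a copy from the contact listed above",
    "limitations": "",  # left empty on purpose to show the check
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

A discipline-specific standard will define its own required elements, so a list like `REQUIRED_FIELDS` would be replaced with that standard's terms.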

Data Access and Sharing

In February 2013, the Office of Science and Technology Policy issued a memorandum stating that all digitally formatted scientific data should be stored and publicly accessible to search, retrieve and analyze.

This means all digital data within the scientific realm needs to be recorded, preserved and shared with the general public.

All agencies, including universities, that receive more than $100 million in research and development expenditures are required to make their data publicly accessible. Utah State falls under this category.

Open access to data is important because a majority of studies are publicly funded. The public deserves to know the results of these studies, to have access to the data, and to be able to analyze how and why it was collected.

Open access allows for additional analysis to be made. Studies build on one another, so it is important to know the methods and results of peers when conducting a new study.

Open access increases the impact of your data and results.

Open access to data also creates a better informed public, and community members can use your data and research to decide on public policies.

Sharing your data is of personal benefit because you are given recognition for your study, and you can become a subject matter expert in your field.

When sharing your data, you should consider whether or not your data will be immediately accessible to the public. If you desire to conduct further research on the subject at hand, you can embargo your data for a period.

You also should consider how people will access your data, and if any special software will be required to access it.

When allowing for reuse of your data, consider whether or not you will require permission to be granted. Who will want to use the data? What is the intended future use of the data?

Also consider where you will store your data long term. Will you store it in a repository? Will you deposit all of the data in the chosen repository? What metadata documents will you include?

These are key questions to ask when considering data access.