The Center for Expanded Data Annotation and Retrieval (CEDAR) will develop information technologies that make authoring complete metadata much more manageable, and that facilitate using the metadata in further research.
Imagine a library with no catalog—or a catalog with lots of missing or partial entries! Without that metadata, you couldn’t be sure what resources exist, where they might be located, or even what they are about.
Just as metadata can index a library’s books, metadata can also index experimental data sets, like the ones produced by biomedical studies. These data sets are big, and they are complicated, so it is important to describe them with metadata that is accurate and complete.
Big Data, as found in biomedical studies and other research products, offers tremendous opportunities for scientific researchers. However, we currently face several problems:
- Creating accurate—and adequate—metadata is tedious and hard.
- Data managers often can’t find the right metadata standards to use or, having found them, can’t easily apply them consistently.
- Even the best metadata standards rarely apply best practices (like controlled vocabularies) to get consistent terms and answers.
As a result:
- Experimental data that an investigator collected is often not maintained in a public repository.
- When it is, it can be challenging even to find that experimental data.
- It is even harder to find all the data from similar experiments.
- It is generally not possible to compare characteristics of experiments.
- Usually the metadata is too limited to offer clear insight into what the experimenter actually did.
- And finally, it is all but impossible for investigators to reproduce another experiment using only the public metadata about that experiment.
Our mission targets all of the problems listed above, across the entire metadata lifecycle, and supports improved discovery, comparison, reproduction, and analysis of existing science and its Big Data collections.