CEDAR aims to accelerate biomedical research by improving its metadata. CEDAR plans not only to make biomedical metadata better, but also to make creating it easier and faster. Better metadata will improve our ability to understand and replicate studies, improve discovery of relevant studies, and improve interoperability of study data across repositories and analytical systems.
CEDAR’s resulting collection of metadata, aligned by CEDAR using its study models and specifications, will also create direct opportunities for biomedical research. The collection will make it faster and easier to explore simple questions and hypotheses, and will enable users to discover studies using a common model for metadata access. Here we describe how the metadata pipeline leads to better, and newly possible, research products.
Applicable Research Products
We provide below a general research scenario that CEDAR will be able to target. Because CEDAR is not yet fully built, we cannot yet describe research that results from it; instead, we offer examples of past research products that CEDAR could have accelerated.
A research lab wants to find data sets across a large number of different repositories that relate to a particular condition. For example, if the lab is studying influenza infection, how can it find all data sets that relate to that concept in all the relevant repositories? In short, this scenario calls for finding enough quality data sets to support integrated...
Although immune system suppression therapies have improved the acceptance period of transplanted organs to some degree, many organs are still rejected over longer periods. To minimize the rejection rate, one strategy, described in a recent paper in The Journal of Experimental Medicine, studied genetic markers from different kinds of transplants, looking for...
Sepsis is a syndrome of systemic inflammation in response to infection. It kills about 750,000 people in the United States every year (1), and is also the single most expensive condition treated in the United States, costing the healthcare system more than $20 billion annually. Prompt diagnosis and treatment are essential to save lives, but there...
Today many pharmaceutical drugs have been developed, often to treat a particular disease. Because licensing a drug requires such expensive and lengthy testing, it is difficult to create a new drug and get it approved, so existing disease treatment options may be few and unsatisfactory. However, we know that many drugs can be effective for...
The Challenges
Existing challenges in this effort include dealing with the number and range of different repositories, with all their different interfaces, metadata models, and terminologies; finding data sets from repositories whose metadata is too poorly structured to allow effective search; finding data sets that have been described with terms that are not well defined, either because they are not sufficiently unique to be confidently used, or because the terms are not commonly used with the intended meaning; and weeding out data sets that have similarly expressed terms but are not in fact about the same thing.
CEDAR’s Contribution
Each of the challenges above is addressed by one or more CEDAR features or strategies. We briefly outline those CEDAR responses here; some are straightforward, and others require long-term or challenging work and community engagement. We encourage you to discuss any questions with the CEDAR team, for example by contacting us through this site.
Challenge: Multiple repositories
CEDAR Response: Be able to publish metadata records to the most common and critical repositories; Providing repository-centric features
Challenge: Differing repository interfaces
CEDAR Response: Effective interface development; Buy-in and support from repositories
Challenge: Differing repository models and terminologies
CEDAR Response: Effective mappings from CEDAR entities; Templates defined to match repository needs
Challenge: Finding data sets given poorly structured metadata
CEDAR Response: Improve rigor of metadata definition; Improve mapping of metadata content
Challenge: Finding data sets given poorly defined terms
CEDAR Response: Thorough integration with well-defined terminologies, in defining and using templates; Validation of defined metadata against required vocabularies; Mapping of poorly defined terms to more rigorous terms
Challenge: Avoiding data sets with ‘false match’ terms
CEDAR Response: Encourage use of precisely specified terms (IRIs); Identify deceptive terms (through analytics) and recommend improvements to their holders
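Two of the responses above, validating metadata against required vocabularies and mapping poorly defined terms to precisely specified ones (IRIs), can be illustrated with a small sketch. This is not CEDAR's implementation or API: the vocabulary, the synonym map, the example.org IRIs, and the function names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical controlled vocabulary: preferred label -> IRI.
# The example.org IRIs are placeholders, not real ontology terms.
VOCABULARY = {
    "influenza": "http://example.org/vocab/influenza",
    "sepsis": "http://example.org/vocab/sepsis",
}

# Hypothetical synonym map: loosely defined term -> preferred label.
SYNONYMS = {
    "flu": "influenza",
    "blood poisoning": "sepsis",
}


@dataclass
class ValidationResult:
    field_name: str
    value: str
    iri: str | None
    message: str


def resolve_term(value: str) -> str | None:
    """Map a free-text term to an IRI, or return None if it is unknown."""
    key = value.strip().lower()
    key = SYNONYMS.get(key, key)  # replace a loose term with its preferred label
    return VOCABULARY.get(key)


def validate_record(
    record: dict[str, str], controlled_fields: set[str]
) -> list[ValidationResult]:
    """Check every controlled field of a metadata record against the vocabulary."""
    results = []
    for field_name in sorted(controlled_fields):
        value = record.get(field_name, "")
        iri = resolve_term(value)
        message = "ok" if iri else "term not found in vocabulary"
        results.append(ValidationResult(field_name, value, iri, message))
    return results


if __name__ == "__main__":
    record = {"disease": "Flu", "organism": "human"}
    for result in validate_record(record, {"disease", "organism"}):
        print(result)
```

In this sketch, a record that says "Flu" resolves to the same IRI as one that says "influenza", so a search on the IRI finds both. A value with no vocabulary entry is flagged rather than silently accepted, which gives the metadata author a chance to correct it.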