CEDAR Research Products | CEDAR - Center for Expanded Data Annotation and Retrieval

CEDAR Research Products

CEDAR aims to accelerate biomedical research by improving its metadata. CEDAR plans not only to make biomedical metadata better, but also to make creating it easier and faster. Better metadata will improve our ability to understand and replicate studies, improve discovery of relevant studies, and improve interoperability of study data across repositories and analytical systems.

CEDAR’s resulting collection of metadata, which CEDAR has aligned using its study models and specifications, will also create direct opportunities for biomedical research. The collection will make it faster and easier to explore simple questions and hypotheses, and will enable users to discover studies using a common model for metadata access. Here we describe how the metadata pipeline leads to better, and newly possible, research products.

Applicable Research Products

Below we provide a general research scenario that CEDAR will be able to target. Because CEDAR is not yet fully built, we cannot yet describe research that has resulted from it; instead, we offer examples of past research products that CEDAR could have accelerated.

Last Updated: Sep 25 2020 - 11:14am

General Research User Scenario: Find Datasets Related to a Disease

A research lab wants to find data sets, across a large number of different repositories, that relate to a particular condition. For example, if the lab is studying influenza infection, how can it find all data sets that relate to that concept in all the relevant repositories? In short, this scenario calls for finding enough quality datasets to support integrated...

Identifying Organ Transplant Rejection Mechanisms and Therapies

Although immune system suppression therapies have extended the acceptance period of transplanted organs to some degree, many organs are still rejected over longer periods. To minimize the rejection rate, a recent paper in The Journal of Experimental Medicine studied genetic markers from different kinds of transplants, looking for...

Identifying Possible Sepsis Markers

Sepsis is a syndrome of systemic inflammation in response to infection. It kills about 750,000 people in the United States every year (1), and is also the single most expensive condition treated in the United States, costing the healthcare system more than $20 billion annually. Prompt diagnosis and treatment are essential to save lives, but there...

Finding “New” Drugs to Treat Old Diseases: Novel Drug Targets for Cancer

Many pharmaceutical drugs have been developed to date, often to treat one particular disease. Because licensing a drug requires such expensive and lengthy testing, it is difficult to create a new drug and get it approved, so existing disease treatment options may be few and unsatisfactory. However, we know that many drugs can be effective for...

The Challenges

Existing challenges in this effort include: dealing with the number and range of different repositories, with all their different interfaces, metadata models, and terminologies; finding data sets in repositories whose metadata is too poorly structured to allow effective search; finding data sets that have been described with terms that are not well defined, either because the terms are not sufficiently unique to be used confidently, or because they are not commonly used with the intended meaning; and weeding out data sets that are described with similar terms but are not in fact about the same thing.

CEDAR’s Contribution

Each of the challenges above is addressed by one or more CEDAR features or strategies. We briefly outline those CEDAR responses here; some are straightforward, and others require long-term or challenging work and community engagement. We encourage you to discuss any questions with the CEDAR team, for example by contacting us through this site.

Challenge	CEDAR Response
Multiple repositories	Publish metadata records to the most common and critical repositories; provide repository-centric features
Differing repository interfaces	Effective interface development; buy-in and support from repositories
Differing repository models and terminologies	Effective mappings from CEDAR entities; templates defined to match repository needs
Finding data sets given poorly structured metadata	Improve rigor of metadata definition; improve mapping of metadata content
Finding data sets given poorly defined terms	Thorough integration with well-defined terminologies, in defining and using templates; validation of defined metadata against required vocabularies; mapping of poorly defined terms to more rigorous terms
Avoiding data sets with ‘false match’ terms	Encourage use of precisely specified terms (IRIs); identify deceptive terms (through analytics) and recommend improvements to their holders
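
To make the last two rows of the table more concrete, the following is a minimal Python sketch of why precisely specified terms help. It is not CEDAR code: the vocabulary, the example.org IRIs, the term mappings, the record structure, and every function name are hypothetical and chosen only for illustration. It compares a naive text search for "influenza" with a search on a vocabulary IRI after poorly defined terms have been mapped to that vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary (IRI -> preferred label).
# The example.org IRIs are placeholders, not real ontology terms.
INFLUENZA = "http://example.org/term/influenza"
COMMON_COLD = "http://example.org/term/common-cold"
VOCABULARY = {INFLUENZA: "influenza", COMMON_COLD: "common cold"}

# Hypothetical mapping of poorly defined free-text terms to vocabulary IRIs.
TERM_MAPPINGS = {
    "flu": INFLUENZA,
    "influenza infection": INFLUENZA,
    "cold": COMMON_COLD,
}


@dataclass
class Record:
    dataset_id: str
    repository: str
    disease: str  # either a vocabulary IRI or free text


RECORDS = [
    Record("ds-1", "repo-a", INFLUENZA),
    Record("ds-2", "repo-b", "Flu"),
    # A bacterium whose name merely contains the search word.
    Record("ds-3", "repo-c", "Haemophilus influenzae"),
    Record("ds-4", "repo-b", "cold"),
]


def validate(record: Record) -> bool:
    """A record is valid only if its disease field is a vocabulary IRI."""
    return record.disease in VOCABULARY


def normalize(value: str) -> Optional[str]:
    """Return the vocabulary IRI for a value, or None if it cannot be mapped."""
    if value in VOCABULARY:
        return value
    return TERM_MAPPINGS.get(value.strip().lower())


def naive_search(records: List[Record], text: str) -> List[str]:
    """Substring match on the raw field: misses synonyms, hits false matches."""
    return [r.dataset_id for r in records if text.lower() in r.disease.lower()]


def find_by_iri(records: List[Record], iri: str) -> List[str]:
    """Match on the normalized IRI rather than on the raw text."""
    return [r.dataset_id for r in records if normalize(r.disease) == iri]


if __name__ == "__main__":
    print(naive_search(RECORDS, "influenza"))
    print(find_by_iri(RECORDS, INFLUENZA))
    print([r.dataset_id for r in RECORDS if not validate(r)])
```

In this toy data, the text search returns the bacterial data set (a false match) and misses the record labeled "Flu", whereas the IRI-based search returns only the two influenza records. The final line lists the records that would fail validation against the vocabulary and would therefore need mapping or correction.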