The CEDAR researchers are likely to identify many strategies for predicting metadata entries, or for deriving optimal value sets, under given conditions. It may not be obvious which combination of techniques is most effective, or most appealing to the end user.
From a data-mining standpoint, we will be able to assess which techniques are most effective by running tests against the data sets that CEDAR collects. Most of these tests can be performed without user intervention, simply by comparing the accuracy of CEDAR’s suggestions against the original metadata records. The frequency with which CEDAR’s top few suggestions include the originally entered value provides a strong metric of an algorithm’s overall usefulness.
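The metric described above is essentially a top-k hit rate. A minimal sketch of how it might be computed is shown below; the `suggest` callable, the `(context, original_value)` record format, and the function name are illustrative assumptions, not part of CEDAR's actual codebase:

```python
from typing import Callable, List, Sequence, Tuple

def top_k_hit_rate(
    records: Sequence[Tuple[str, str]],
    suggest: Callable[[str], List[str]],
    k: int = 3,
) -> float:
    """Fraction of records whose originally entered value appears
    among the top-k suggestions for that field's context.

    records: (field_context, original_value) pairs from existing metadata.
    suggest: hypothetical suggestion function returning a ranked value list.
    """
    hits = 0
    for field_context, original_value in records:
        # Only the top k ranked suggestions count as a "hit".
        if original_value in suggest(field_context)[:k]:
            hits += 1
    return hits / len(records)
```

Because the original records already contain the ground-truth values, this evaluation runs entirely offline, with no user in the loop.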
At a higher level, the CEDAR team will be able to run these tests repeatedly, using a different combination, weighting, and ordering of strategies on each run. By evaluating the results of each combination, the framework can identify the overall approach that yields the most accurate results.
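One simple way to search over such combinations is an exhaustive sweep of strategy weights, scoring each weighted blend by the same top-k hit rate. The sketch below is illustrative only: the rank-discounted score merge, the function name, and the record format are all assumptions rather than CEDAR's actual design:

```python
import itertools
from typing import Callable, List, Sequence, Tuple

def best_weighting(
    strategies: Sequence[Callable[[str], List[str]]],
    records: Sequence[Tuple[str, str]],
    weight_grid: Sequence[float],
    k: int = 3,
) -> Tuple[Tuple[float, ...], float]:
    """Try every weight combination from weight_grid for the given
    strategies; return (best_weights, best_top_k_hit_rate)."""
    best_weights, best_rate = None, -1.0
    for weights in itertools.product(weight_grid, repeat=len(strategies)):
        hits = 0
        for context, original in records:
            # Merge the ranked lists: each strategy adds
            # weight / (rank + 1) to the values it suggests.
            scores = {}
            for weight, strategy in zip(weights, strategies):
                for rank, value in enumerate(strategy(context)):
                    scores[value] = scores.get(value, 0.0) + weight / (rank + 1)
            top_k = sorted(scores, key=scores.get, reverse=True)[:k]
            if original in top_k:
                hits += 1
        rate = hits / len(records)
        if rate > best_rate:
            best_weights, best_rate = weights, rate
    return best_weights, best_rate
```

An exhaustive grid like this is feasible only for a handful of strategies; with more, the team would presumably switch to a smarter search, but the evaluation loop stays the same.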
Of course, users may view the results suggested by CEDAR through a different lens, and we want to understand and respond to users’ perceptions of what works best for entering real scientific metadata. We discuss this challenge on the Evaluating CEDAR page.