ARC: Improving Metadata

IMPACT's Analysis and Review of CMR is talking about metadata again, data that helps you find the data you want to find.

Are you ready for an acronym within an acronym? Allow me to introduce you to IMPACT’s ARC team. ARC stands for Analysis and Review of CMR, which in full is Analysis and Review of the Common Metadata Repository. But it’s quicker to say ARC. And we’re talking about metadata again (see an earlier post here): data that helps you find the data you want to find.

The ARC project helps make NASA’s Earth science data easier for both scientists and the general public to find, access, and use. ARC’s contribution to this effort focuses on metadata and the role it plays in connecting users to data. You can search for NASA Earth science data using the Earthdata Search website, which allows you to enter a search term (like you would in Google) as well as filter your search by a variety of parameters such as location and time. For example, you could search for “precipitation” and limit your search to the state of Alaska for the year 2014. The search returns a list of datasets meeting those criteria, provided the metadata are accurate enough.

All of the information you encounter in Earthdata Search is pulled directly from the Common Metadata Repository (CMR) database. Each of NASA’s Earth science datasets is described by a metadata record stored in the CMR database. The metadata record contains important information about the dataset including a descriptive title, when and where the data were collected, what instruments collected the data, and how the data can be accessed. This information, contained in the metadata record, is what is queried when a user searches.

The above scenario illustrates how metadata connects users to data and why maintaining these records is important. If a dataset's metadata record is missing or outdated, then that dataset may not appear in your search results even though it actually matches your criteria.
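To make that concrete, here is a minimal Python sketch of the "precipitation in Alaska in 2014" search run against two made-up metadata records. Everything in it is an assumption for illustration: the `MetadataRecord` fields, the rough Alaska bounding box, and the matching logic are hypothetical and are not the actual CMR schema or the Earthdata Search implementation. The point is only that a record with no documented spatial extent drops out of the results even when the underlying data would qualify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass
class MetadataRecord:
    """Hypothetical, simplified stand-in for a dataset metadata record."""
    title: str
    keywords: Tuple[str, ...] = ()
    bounding_box: Optional[Box] = None   # None means the extent was never documented
    start: Optional[date] = None
    end: Optional[date] = None


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two lon/lat boxes intersect (ignores antimeridian wrap-around)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(record: MetadataRecord, term: str, box: Box, start: date, end: date) -> bool:
    """Check a record against a search term, a spatial filter, and a time filter."""
    text = (record.title + " " + " ".join(record.keywords)).lower()
    if term.lower() not in text:
        return False
    # A record with no spatial extent cannot satisfy a spatial filter.
    if record.bounding_box is None or not boxes_overlap(record.bounding_box, box):
        return False
    # Likewise for the time filter.
    if record.start is None or record.end is None:
        return False
    return record.start <= end and start <= record.end


def search(records, term, box, start, end):
    """Return the titles of all records that match every filter."""
    return [r.title for r in records if matches(r, term, box, start, end)]


# Very rough, illustrative bounding box for Alaska (an assumption, not an official extent).
ALASKA_BOX: Box = (-170.0, 51.0, -130.0, 72.0)

complete = MetadataRecord(
    title="Hypothetical Alaska Precipitation Grids",
    keywords=("precipitation",),
    bounding_box=(-168.0, 54.0, -130.0, 71.5),
    start=date(2010, 1, 1),
    end=date(2016, 12, 31),
)
incomplete = MetadataRecord(
    title="Hypothetical Arctic Precipitation Stations",
    keywords=("precipitation",),
    bounding_box=None,  # spatial extent missing from the metadata
    start=date(2014, 1, 1),
    end=date(2014, 12, 31),
)

results = search(
    [complete, incomplete], "precipitation", ALASKA_BOX, date(2014, 1, 1), date(2014, 12, 31)
)
print(results)  # ['Hypothetical Alaska Precipitation Grids']
```

The second dataset covers 2014 and is about precipitation, but because its record never states where the data were collected, the location filter silently excludes it.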

Powered by metadata, improved by ARC (image: search.earthdata.nasa.gov).


This is the value of the ARC project. ARC is responsible for conducting quality assessments of the NASA metadata records in the CMR. This involves evaluating the information provided in each metadata record for correctness, completeness, and consistency. The ARC team then provides recommendations to NASA data centers (DAACs) on how to improve the metadata and follows up with re-checks after changes have been made. Higher-quality metadata records help to provide a better search experience in any platform that leverages the CMR.
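As a rough illustration of what an automated check along those three axes might look like, here is a small Python sketch. It is not ARC's actual tooling: the field names, the list of required fields, and the specific rules are all assumptions chosen for the example, and a real assessment covers far more than this.

```python
from datetime import date

# Hypothetical list of fields a record is expected to fill in.
REQUIRED_FIELDS = ("title", "start", "end", "bounding_box", "instruments", "access_url")


def assess(record: dict) -> list:
    """Return a list of human-readable findings for one metadata record."""
    findings = []

    # Completeness: every required field should be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, "", [], ()):
            findings.append(f"completeness: '{field}' is missing or empty")

    # Correctness: a bounding box must use valid longitude and latitude values.
    box = record.get("bounding_box")
    if box:
        west, south, east, north = box
        valid_lon = -180 <= west <= 180 and -180 <= east <= 180
        valid_lat = -90 <= south <= north <= 90
        if not (valid_lon and valid_lat):
            findings.append("correctness: bounding box is outside valid latitude/longitude ranges")

    # Consistency: the start of the time range should not come after its end.
    start, end = record.get("start"), record.get("end")
    if start and end and start > end:
        findings.append("consistency: start date is after end date")

    return findings


# A made-up record with two problems: no instruments listed, and reversed dates.
example_record = {
    "title": "Hypothetical Dataset",
    "start": date(2015, 1, 1),
    "end": date(2014, 1, 1),
    "bounding_box": (-168.0, 54.0, -130.0, 71.5),
    "instruments": [],
    "access_url": "https://example.com/data",
}

findings = assess(example_record)
for finding in findings:
    print(finding)
```

Checks like these are easy to script. Judging whether a title is actually descriptive, or whether an abstract is up to date, is the kind of question that still needs a human reviewer.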

Jeanné le Roux leads the ARC project, which she describes as,

"an interdisciplinary project combining aspects of Earth science, computer science, and information science. As someone with an Earth science background, it’s been interesting to learn more about these other disciplines and see how everything connects."

The ARC team identifies metadata improvements needed for better describing the vast amount of Earth science data products made publicly available by NASA. Many of the ARC recommendations are geared toward adding or improving contextual metadata information in order to help the user connect to relevant data. A dataset well documented by metadata is easier to find, easier to understand, and easier to use. Since the metadata is often the initial ‘documentation’ that a user encounters, it’s important that it be informative and up to date. The manner in which users may search for data is also continuously evolving. ARC also reports new metadata use cases to NASA and helps communicate changes made to NASA’s metadata models to all responsible parties.

Ms. le Roux summarizes the contribution her team makes to the Earth sciences:

"Most scientists are familiar with crunching numbers and trying to draw unbiased conclusions based on their data. The ARC project does similar things except metadata is our data! We write scripts to automate our quality assessments as much as possible, but a lot of the process requires manual intervention. We are constantly thinking of ways to automate and make our processes less subjective."

Metadata management and stewardship practices are an important aspect of any discipline where data is collected and disseminated, not just Earth science. The type of information documented in metadata and the corresponding metadata standards can vary widely based on the community served, but the basic principle of high-quality metadata (i.e., correct, complete, consistent) is relevant to all communities. Upholding that principle is the purpose of ARC.

See Ms. le Roux's LinkedIn profile.

More information about the ARC project can be found on the IMPACT website.
