When you look up a word in the dictionary you find not only the word’s definition, but also information about the word—its linguistic origins; how the word is used (as a noun, verb, adjective, etc.); and possibly information about when, where, or how the word was first used. This additional information that helps define, identify, or show how to use the word properly is a form of metadata. Metadata allow you to easily search for, find, and understand any form of data, from words in the dictionary to scientific measurements.
“As we generate more data, the need for metadata grows enormously,” says Barry Weiss, the systems engineer and data architect for NASA’s recently launched Soil Moisture Active Passive (SMAP) mission. “People not only need to find what they’re looking for, they also need to assess the quality and content to determine if they really want the available data.” Metadata are integral components of NASA’s Earth science data, and ensuring their content and quality is crucial.
The Earth Observing System Data and Information System (EOSDIS) provides end-to-end capabilities for managing NASA’s Earth science data from satellites, aircraft, and field observations. These data are processed, archived, and distributed by the Earth Science Data and Information System (ESDIS) Project through the discipline-specific Distributed Active Archive Centers (DAACs).
The DAACs currently hold more than 9 petabytes (PB) of Earth science data. To put this into perspective, 1 PB is equivalent to roughly 20 million four-drawer file cabinets filled with text. Metadata enable users to find specific EOSDIS data products and understand their quality. This vast collection of metadata is the foundation of the EOSDIS Earthdata Search data discovery interface (Figure 1).
NASA is in the process of applying uniform international standards to Earth science metadata with the goal of making it even easier for data users to discover, understand, and use these vast data collections. This is not a simple process, and it affects the design of metadata systems for new NASA Earth observing missions as well as existing data files and products that currently use a variety of metadata standards.
A Closer Look at Metadata and Its Uses
Metadata are used to describe a wide range of data attributes, including:
- Data quality, such as how complete the data are, where gaps in the data exist, and characteristics that might affect the reliability of the data
- Data lineage, which includes provenance or processing input as well as information that tracks data through transformations, analyses, and interpretations
- Data acquisition parameters, which may include specific parameters that affect algorithm behavior as well as information about the location of the satellite when data were acquired
Metadata also include documentation to help users better understand data, including tools that enable data interpretation and analysis. This documentation may include instrumentation details and algorithmic descriptions as well as recommended viewing software and color keys.
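To make these categories concrete, the following minimal Python sketch groups them into a single record. It is an illustration only: the class names, field names, and sample values are hypothetical and do not come from any NASA metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Quality:
    completeness_percent: float                # how complete the data are
    gaps: list = field(default_factory=list)   # (start, end) spans with no data


@dataclass
class Lineage:
    inputs: list             # provenance: input products used in processing
    processing_steps: list   # transformations applied, in order


@dataclass
class Acquisition:
    algorithm_parameters: dict   # parameters that affect algorithm behavior
    satellite_position: tuple    # (latitude_deg, longitude_deg, altitude_km)


@dataclass
class ProductMetadata:
    quality: Quality
    lineage: Lineage
    acquisition: Acquisition
    documentation: dict = field(default_factory=dict)  # e.g. viewing software, color keys


def has_gaps(meta: ProductMetadata) -> bool:
    """Tell a user whether the product reports any gaps in its data."""
    return len(meta.quality.gaps) > 0


# Invented sample values, for illustration only.
example = ProductMetadata(
    quality=Quality(completeness_percent=98.5, gaps=[("2015-04-02", "2015-04-03")]),
    lineage=Lineage(inputs=["level-1 input"], processing_steps=["calibrate", "grid"]),
    acquisition=Acquisition({"window_size": 3}, (10.0, 20.0, 685.0)),
    documentation={"recommended_viewer": "any HDF viewer"},
)
print(has_gaps(example))
```

Keeping quality, lineage, and acquisition details in separate groups mirrors the list above and lets a user inspect one aspect of a product without reading the rest.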
Until recently, different data providers developed their own standards for metadata content and representation. To take advantage of the metadata, users of these products needed to learn the standard in use. With the exponential growth of Earth science data sets, this approach has become problematic. “Currently, different organizations use very different metadata models and structures,” says Weiss. “The Earth science community has come to recognize the need for a common model and representation of metadata.”
Indeed, even NASA uses several metadata standards, many of which were developed specifically for NASA data. Two of the most common NASA standards for Earth science metadata are the Directory Interchange Format (DIF), which is used by the Global Change Master Directory (GCMD), and the ECHO Metadata Standard (which has now been replaced by the Common Metadata Repository).
While the NASA-developed metadata standards do a good job addressing data discovery, this is only one facet of metadata. “Discovering data is important,” says Ted Habermann, the Director of Earth Science at The HDF Group, an independent non-profit organization that develops and manages the Hierarchical Data Format (HDF) set of Earth science metadata conventions. “But we really need standards that go beyond discovery to cover access, use, and understanding of data.”
Re-Evaluating NASA Metadata - The MENDS Project
In 2010, Andrew Mitchell, the ESDIS Project Science Systems Development Manager, initiated the Metadata Evolution for NASA Data Systems (MENDS) Project. Members of the MENDS Project were asked to assess the metadata needs and current practices of EOSDIS datasets, with particular attention to discovery, archive management, citation, provenance and lineage, data quality, semantics, and data services. They also were asked to provide recommendations for determining the optimal path for integrating current NASA Earth science data systems using a common metadata standard. “It was my vision to have a working group that involved the DAACs, the missions, and the current systems to all get together to talk about our metadata issues,” Mitchell says. “The purpose was to look at all things metadata.”
The MENDS Project recommended that NASA Earth science metadata be based on the International Organization for Standardization (ISO) 19100 series of standards, which describe geographic data.
The base metadata model of the ISO 19100 series is represented by ISO 19115, which integrates multiple metadata standards. In addition, other standards in the ISO 19100 series, such as ISO 19139, describe how the ISO 19115 models are represented (ISO 19139 specifies an XML encoding). To avoid confusion, the term “ISO standards” will be used to mean ISO 19115 and associated standards.
Implementing ISO Standards for NASA Missions and Existing Data
The MENDS Project recommendations are codified in the ESDIS Project’s Metadata Requirements-Base Reference for NASA Earth Science Data Projects, which states that, as a NASA Earth Science Division (ESD) base requirement, science data products created using NASA satellite mission data systems will contain metadata conforming to ISO standards. In addition, the ESDIS Standards Coordination Office (ESCO) provides guidance and vision on the use of NASA standards for data format, metadata content, and required documentation for EOSDIS data.
NASA directed that the recently launched SMAP mission would use metadata based on ISO standards (Figure 2). The mission successfully developed a software architecture that incorporates ISO metadata into SMAP Earth science data products. This success demonstrated the feasibility of implementing ISO standards in other NASA Earth science missions.
One benefit to using an international metadata standard for mission data is that data users can easily search for data across international organizations. For example, if a data user searches for “atmospheric ozone” in collections held by NASA, the European Space Agency, or the Japan Aerospace Exploration Agency, their search should return similar datasets thanks to metadata based on a uniform international standard. “If [NASA] wants to share data with the world, ISO is clearly the way to do this,” says Habermann.
Adopting ISO standards for existing Earth science data held by the DAACs is more challenging since these data use metadata based on a variety of standards, including DIF and ECHO. The ESDIS Project created the Common Metadata Repository (CMR) to provide an authoritative management system for NASA’s Earth science metadata. “The CMR manages the evolution of NASA Earth science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future needs,” Mitchell says.
Using the CMR, distinct metadata formats (such as DIF and ECHO) are validated against a common set of EOSDIS core metadata requirements called the Unified Metadata Model (UMM). The UMM defines overall metadata requirements for NASA Earth science data and drives search and retrieval of metadata cataloged in the CMR. In addition, the UMM serves as a bridge between the heritage metadata standards that the DAACs have been using and the newly adopted ISO standards.
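The idea of validating distinct formats against one common set of requirements can be sketched in a few lines of Python. This is a simplified illustration, not the CMR or UMM implementation: the required fields, the per-format field mappings, and the sample records are all assumptions made for the example.

```python
# Hypothetical sketch: the field names and mappings below are illustrative
# only and are not the real DIF, ECHO, or UMM schemas.

# Required fields of the (hypothetical) common model.
COMMON_REQUIRED = ("short_name", "title", "temporal_start")

# Per-format mapping from a native field name to the common-model field name.
FIELD_MAPS = {
    "DIF": {
        "Entry_ID": "short_name",
        "Entry_Title": "title",
        "Start_Date": "temporal_start",
    },
    "ECHO": {
        "ShortName": "short_name",
        "LongName": "title",
        "BeginningDateTime": "temporal_start",
    },
}


def to_common(record, fmt):
    """Translate a native record into the common model, dropping unmapped fields."""
    mapping = FIELD_MAPS[fmt]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def missing_fields(common_record):
    """Return the required common-model fields that are absent or empty."""
    return [name for name in COMMON_REQUIRED if not common_record.get(name)]


# Invented sample records describing the same product in two formats.
dif_record = {
    "Entry_ID": "EXAMPLE_L3",
    "Entry_Title": "Example soil moisture product",
    "Start_Date": "2015-03-31",
}
echo_record = {
    "ShortName": "EXAMPLE_L3",
    "LongName": "Example soil moisture product",
}

dif_common = to_common(dif_record, "DIF")
echo_common = to_common(echo_record, "ECHO")
print(missing_fields(dif_common))   # prints []
print(missing_fields(echo_common))  # prints ['temporal_start']
```

Because both records are checked only after translation into the common model, a single set of rules covers every heritage format, which is the sense in which a common model can act as a bridge between formats.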
While this may sound simple, the complexity of metadata means that the translation from legacy metadata into ISO standards is a time-consuming process, particularly when it comes to ensuring that all metadata components are captured correctly. “We can translate 100% of DIF and 100% of ECHO into ISO without loss, but that doesn’t mean that we have all of the data quality information that people need because that [information] is not currently in DIF or ECHO,” says Habermann.
To address the concern with data quality, EOSDIS has implemented a formal metadata quality assurance process. This process begins with automated validation to see if the existing metadata meet ISO quality standards. Metadata that do not pass the automated validation are sent to EOSDIS teams for a manual review. Once the metadata are consistent and complete, they can be translated into ISO standards.
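The two-stage flow described above (automated validation first, manual review for anything that fails) might be sketched as follows. The check rules, record fields, and sample records are hypothetical stand-ins; the actual EOSDIS checks are not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class QaResult:
    passed: list = field(default_factory=list)        # record ids ready for translation
    needs_review: list = field(default_factory=list)  # (record id, problems) pairs


def automated_checks(record):
    """Return a list of problems; an empty list means the record passes.

    These two rules are illustrative stand-ins for real validation rules.
    """
    problems = []
    if not record.get("title"):
        problems.append("missing title")
    start, end = record.get("start"), record.get("end")
    # ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
    if start and end and start > end:
        problems.append("start date after end date")
    return problems


def triage(records):
    """Split records into those that pass and those queued for manual review."""
    result = QaResult()
    for record in records:
        problems = automated_checks(record)
        if problems:
            result.needs_review.append((record["id"], problems))
        else:
            result.passed.append(record["id"])
    return result


# Invented sample records.
records = [
    {"id": "A", "title": "Ozone profile", "start": "2010-01-01", "end": "2012-01-01"},
    {"id": "B", "title": "", "start": "2015-06-01", "end": "2015-01-01"},
]
result = triage(records)
print(result.passed)        # prints ['A']
print(result.needs_review)  # prints [('B', ['missing title', 'start date after end date'])]
```

Recording the specific problems alongside each failed record gives the manual reviewers a starting point instead of a bare pass/fail flag.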
The implementation of ISO metadata standards for NASA missions and existing Earth science data is an important step toward bringing more consistency to processing, understanding, and defining NASA Earth science data. This, in turn, will make it easier for users to discover, manage, and assess the quality of Earth science data, not just at NASA but at organizations around the world using the same metadata standard.
“It’s not doing the same old thing in another language,” says Habermann. “It’s doing new things that actually address real requirements. That’s the goal.”
Mitchell agrees, and notes that this process will lead to easier use, discovery, and quality assessment of NASA Earth observing data. “The capabilities brought forth by the CMR, the UMM, and the adoption of ISO standards allow for not only improved metadata quality, but also sets the stage for interoperability of data records from disparate data systems that previously were much harder to analyze together,” he says. “It hasn’t been an easy road to go down, but I think it’s making strides toward making our data much more interoperable so we can work more easily with data from other national and international agencies.”
Helpful Websites and References
Bagwell, R., et al. 2015. “NASA ISO for EOSDIS.” Available online at https://wiki.earthdata.nasa.gov/display/NASAISO/NASA+ISO+for+EOSDIS.
NASA EOSDIS. 2015. “ESDIS Standards Coordination Office (ESCO) Home.” Available online at /esdis/esco.
Weiss, Barry. 2015. “NASA Earthdata Webinar: Implementing ISO 19115 Standards.” Earthdata Webinar Series, February 19, 2015. Available online at https://www.youtube.com/watch?v=rvBe_GWgRR4.