NASA’s Earth Observing System Data and Information System (EOSDIS) provides full and open access to more than 17.5 petabytes of Earth observation data, according to Fiscal Year 2016 EOSDIS metrics. These data are managed by NASA’s Earth Science Data and Information System (ESDIS) Project. To give an idea of how much data this represents, 1 petabyte has been described as being equal to roughly 20 million file cabinets filled with text. By 2020 the cumulative EOSDIS data archive is estimated to be around 65 petabytes in size; by 2025 this archive may be more than 330 petabytes in size.
Even with this vast amount of data, EOSDIS has developed systems that allow data users from around the world to easily search the entire EOSDIS data catalog and find relevant data products in less than a second. A key component that makes this possible is the Common Metadata Repository (CMR).
Using metadata as its foundation and designed to be scalable as EOSDIS data holdings grow, the CMR brings an evolutionary—and a revolutionary—architecture to EOSDIS data that benefits both data users and data providers.
Metadata: The Foundation of the CMR
Metadata are what make data searchable and allow for the efficient management of large data collections. At their most basic, metadata are simply data that describe data, such as when and where the data were collected, the instrument used to collect the data along with the instrument settings, and how the data were processed (i.e., the data lineage or provenance).
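To make the idea concrete, here is a minimal sketch of how a metadata record can answer a "when and where" search without opening the data file itself. The `GranuleMetadata` class, its field names, and the file names are hypothetical illustrations, not the CMR or UMM schema, and the overlap test ignores bounding boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class GranuleMetadata:
    """Hypothetical, simplified metadata record describing one data file."""
    file_name: str
    instrument: str
    start: date                               # when collection began
    end: date                                 # when collection ended
    bbox: Tuple[float, float, float, float]   # (west, south, east, north), degrees
    processing_level: str                     # one piece of the data lineage


def overlaps(record, west, south, east, north, start, end):
    """Return True if the record intersects the given box and date range."""
    r_west, r_south, r_east, r_north = record.bbox
    in_space = (r_west <= east and r_east >= west
                and r_south <= north and r_north >= south)
    in_time = record.start <= end and record.end >= start
    return in_space and in_time


# Two made-up records; only the metadata are inspected during the search.
catalog = [
    GranuleMetadata("a.hdf", "MODIS", date(2016, 1, 1), date(2016, 1, 1),
                    (-10, 30, 10, 50), "L2"),
    GranuleMetadata("b.hdf", "MODIS", date(2016, 6, 1), date(2016, 6, 1),
                    (100, -10, 120, 10), "L2"),
]

hits = [r.file_name for r in catalog
        if overlaps(r, 0, 40, 5, 45, date(2016, 1, 1), date(2016, 12, 31))]
print(hits)
```

Because the search touches only these small descriptive records, its cost depends on the size of the metadata catalog rather than on the size of the underlying data files.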
NASA Earth observing data come from a variety of sources, including NASA Distributed Active Archive Centers (DAACs), international data providers, and non-NASA data providers in the U.S. (inner white circle). Metadata from these data holdings are integrated into the CMR catalog (inner green ring). The Unified Metadata Model (UMM) (gray/black ring) is an extensible metadata model that provides a ‘Rosetta stone’ or cross-walk for mapping between CMR-supported metadata standards. For detailed information about the UMM, please see the EOSDIS UMM page. These metadata records are registered, modified, discovered, and accessed through programmatic interfaces leveraging standard protocols and APIs (outer green ring). The CMR API is not only the basis for EOSDIS Earthdata Search, but can also be used by outside developers and organizations (such as the European Space Agency [ESA] or Group on Earth Observations) to create client systems to search NASA Earth science data using the CMR.
Metadata are used in all aspects of NASA’s Earth science data lifecycle, from initial measurements to the search and discovery of processed data. Individual missions use metadata in science data products when describing the instrument/sensor, operational plan, or spatial parameters. NASA EOSDIS DAACs that manage individual data collections use metadata for facilitating the preservation, access, and distribution of data and data products. Assembling metadata from disparate Earth observing data collections into a single, common repository based on interoperable metadata standards vastly improves the discovery, access, and use of these data.
The Common Metadata Repository, or CMR, is the definitive management system for EOSDIS Earth science metadata. As a single, shared, scalable metadata repository, CMR merges all current capabilities and metadata from the existing NASA Earth science metadata systems of the Global Change Master Directory (GCMD) and the EOS Clearing House (ECHO). In addition, CMR:
- Provides very fast (sub-second) searches of all EOSDIS Earth science data and similar NASA Earth science data collections (such as GCMD);
- Allows for expansion (scalability) to enable new capabilities as users’ metadata needs evolve;
- Ensures high-quality metadata through a process of continual curation and assessment, both automated and manual; and
- Provides a metadata model (the Unified Metadata Model [UMM]) that presumes continual evolution and development of advanced metadata concepts.
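The automated half of the curation described in the list above can be pictured as a rule-based check that flags suspect records for a human reviewer. The following is only a sketch under stated assumptions: the required field names, the rules, and the sample records are hypothetical and are not the actual CMR validation logic.

```python
# Hypothetical list of fields every record must carry.
REQUIRED_FIELDS = ("short_name", "title", "temporal_start", "spatial_extent")


def assess(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat absent values and blank strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing required field: {name}")
    extent = record.get("spatial_extent")
    if extent is not None:
        west, south, east, north = extent
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            problems.append("longitude out of range")
        if not (-90 <= south <= north <= 90):
            problems.append("latitude out of range or inverted")
    return problems


records = [
    {"short_name": "EXAMPLE_A", "title": "Example collection A",
     "temporal_start": "2016-01-01", "spatial_extent": (-180, -90, 180, 90)},
    {"short_name": "EXAMPLE_B", "title": " ",
     "temporal_start": "2016-01-01", "spatial_extent": (-180, -95, 180, 90)},
]

# Records with at least one problem are flagged for manual review.
flagged = {}
for record in records:
    problems = assess(record)
    if problems:
        flagged[record["short_name"]] = problems
print(flagged)
```

Clean records pass through untouched, so human attention is spent only on the records that a rule has flagged.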
Benefits of Having CMR as a Central Repository for NASA Earth Science Metadata
CMR provides numerous benefits for data users, including:
1. Evolving metadata models for evolving end user needs: Both CMR and UMM continually evolve to meet community needs. This approach helps create more versatile metadata and structures that enable more than mere search and discovery. Metadata processes also continually evolve to support community needs.
2. Increasing metadata quality: CMR improves metadata quality by implementing continual assessments to ensure that these metadata meet standards of form and content. These metadata assessments are conducted both electronically and through human oversight when flags are raised. The result of this constant curation is higher quality metadata and, by extension, more efficient searches through the EOSDIS data collection using the EOSDIS Earthdata Search and similar engines.
The CMR also facilitates a more consistent metadata representation. All metadata are evaluated against a common set of core EOSDIS metadata requirements (the UMM) based on the International Organization for Standardization (ISO) 19100 series of standards for geographic information, which includes metadata standards such as ISO 19115 and ISO 19139. This, in turn, increases the interoperability of EOSDIS data and data collections with collections held by other agencies that also are based on these international standards.
3. Designing for future growth and a big data future: CMR ingest services can easily be adjusted to handle data reprocessing demands and additional data loads without degradation in service. In fact, these services were designed from the outset to handle the more than one billion records that are expected by 2020. This allows more metadata to be discovered and used by more users and applications. This feature also allows the metadata archive to easily grow as data holdings increase while maintaining highly available, sub-second search performance.
4. Catering to developers seeking to leverage NASA’s Earth science data: As part of the historical requirement to make NASA data and the software used to create NASA data open to the public, the application programming interface (API) on which CMR is based is available, along with the UMM specifications, through the Earthdata Developer Resource. Additionally, a CMR Client Developer Forum allows users to submit questions directly to the CMR team on best practices for using the system and to request feature updates. While EOSDIS retains complete control over the metadata represented in the CMR, the CMR API allows client systems to be developed that use CMR services. The CMR API facilitates the development of custom client applications that meet the needs of a general user audience or a specific science application.
The CMR team is working towards open-sourcing all software. This will allow CMR technology to be more easily shared with others and will contribute to the further standardization of Earth science data among organizations, agencies, and other entities with similar data.
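As an illustration of the client development described in item 4, the sketch below builds a keyword search request against the CMR search API using only the Python standard library. The base URL, the `collections.json` path, and the `keyword` and `page_size` parameters reflect the public CMR search interface as commonly documented, but treat them as assumptions and confirm them against the Earthdata Developer Resource. The helper name `collection_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the public CMR search endpoint; verify it against the
# current Earthdata developer documentation before relying on it.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def collection_search_url(keyword, page_size=10):
    """Build a URL for a keyword search over collection-level metadata."""
    params = {"keyword": keyword, "page_size": page_size}
    return f"{CMR_SEARCH_ROOT}/collections.json?{urlencode(params)}"


url = collection_search_url("sea surface temperature", page_size=5)
print(url)
```

A client would then fetch this URL (for example with `urllib.request.urlopen`) and parse the JSON body. The request itself is left out here so that the example runs without network access, and the shape of the response is not assumed.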
Development of NASA Earth Science Data Collections and the Need for CMR
Prior to the development of CMR, ECHO and GCMD enabled search and discovery of data from NASA Earth observing missions. While both ECHO and GCMD relied on metadata to enable search and discovery of data and data products, they followed separate paths in their development.
ECHO was developed specifically for searching and ordering raw and processed data from NASA’s Earth Observing System (EOS) missions to enable broader use of NASA’s EOS data. GCMD was primarily developed to help users find Earth science data collections within NASA and eventually other agencies under the U.S. Global Change Research Program, and was the original source of metadata for EOSDIS. GCMD also developed systems to share data from multinational sources as part of the international Committee on Earth Observation Satellites (CEOS) International Directory Network (IDN). The IDN links Earth science data and datasets from various international organizations, including NASA, ESA, and the Japan Aerospace Exploration Agency.
Due to the similar data and data collections held by both services, the integration of ECHO and GCMD metadata into CMR puts these systems on the same coordinated and coherent path and helps merge these two vast data collections. The end result is a more streamlined search through a more unified collection of Earth science metadata.
Shifting ECHO and GCMD metadata into CMR would have been a relatively easy process had all metadata in NASA’s Earth science collection been in the same format or written to a single standard. When developing CMR, this was found not to be the case, especially for older data collections. To address this problem and bring these metadata into a more standardized format, ESDIS developed the UMM.
UMM is the common data model for all metadata in CMR. It provides mappings from each CMR-supported metadata standard (such as ECHO 10, DIF 9, and ISO 19115-1) directly to UMM, so no additional translation step between standards is needed. In addition, ISO 19100-series standards are being applied to Earth science metadata represented by UMM.
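The cross-walk idea can be sketched as one field map per source standard, each pointing directly at a common model. The field names below are simplified, hypothetical stand-ins chosen for illustration. The real ECHO 10, DIF 9, and UMM schemas are far richer, and this is not how CMR itself is implemented.

```python
# Hypothetical, simplified field maps: native field name -> common field name.
FIELD_MAPS = {
    "echo10": {"ShortName": "short_name", "DataSetId": "title"},
    "dif9": {"Entry_ID": "short_name", "Entry_Title": "title"},
}


def to_common(record, source_format):
    """Translate a record from a source format into the common model."""
    mapping = FIELD_MAPS[source_format]
    return {common: record[native]
            for native, common in mapping.items()
            if native in record}


# The same made-up collection described in two different source formats.
echo_record = {"ShortName": "EXAMPLE", "DataSetId": "Example data set"}
dif_record = {"Entry_ID": "EXAMPLE", "Entry_Title": "Example data set"}

echo_common = to_common(echo_record, "echo10")
dif_common = to_common(dif_record, "dif9")
print(echo_common == dif_common)
```

With a common model in the middle, supporting N standards takes N mappings, whereas translating every standard directly into every other would take N × (N − 1) one-way translators.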
Looking Towards the Future: What CMR Means for You
With the bulk of NASA Earth science metadata collections now unified into a single repository, data users are seeing an overall improvement in the speed of searches and the relevancy of returned results. The legacy ECHO system is now retired, and GCMD services its data searches via the CMR API. The merger of metadata from ECHO and GCMD also entails an overall quality assessment of NASA Earth science metadata and the standardization of existing metadata. The end result is higher quality metadata based on international standards that enables faster searches through the entire NASA Earth observation data catalog and the ability of this catalog to grow as needed without a loss in search efficiency.
CMR continues the evolution of NASA’s Earth science data collection metadata. Through the unification of a majority of NASA’s Earth science data collections into a single, standardized repository, data users reap the benefits of increased search speed and greater relevancy of results. Through the UMM, CMR is based on internationally recognized metadata standards, which, in turn, fosters greater interoperability with non-NASA agencies using similar standards. In addition, the CMR Client Developer Forum and efforts to open-source the software enable the developer community to actively participate in the development of CMR. For data users, CMR means more efficient use of the vast NASA Earth science data collection and the ability to use these data for a broader range of research.