Over the past quarter-century, NASA’s Global Change Master Directory (GCMD) has become an integral system facilitating Earth science and global change studies. The metadata and keyword structures of GCMD are pivotal components of NASA’s Earth science data collection. GCMD also is a cornerstone of NASA’s international collaboration, and one of NASA’s contributions to the international Committee on Earth Observation Satellites (CEOS), where it is known as the CEOS International Directory Network (IDN).
Since its inception, GCMD and NASA’s Earth Observing System Data and Information System (EOSDIS) have remained separate systems. Now, with the development of the EOSDIS Common Metadata Repository (CMR), the two are being unified by using CMR as the metadata source for both. For GCMD’s broad base of international data users, this means a more robust system and the ability to drill down even more deeply in their searches for Earth science and environmental collection-level data. “GCMD’s original purpose, and its continuing purpose, is to support the discovery of Earth science and environmental data collections,” says Dr. Stephen Wharton, the former GCMD Project Manager and Chief of NASA’s Global Change Data Center (GCDC).
To put this significant recent evolution of GCMD into perspective, it is worth reviewing the development of GCMD and the many innovations adopted and created by the GCMD team. A look at the future direction of GCMD shows how this directory will remain a premier collaborative international resource linking scientists, researchers, policy makers, and the general public with Earth science and environmental data.
GCMD was established at a fortuitous time, and filled a need for discovering Earth science and environmental data. The 1980s saw not only the development of computers with the required power and cost-effectiveness to support such a directory, but also a literal turning point in Earth’s environmental systems. Earth observing data from numerous sources indicate that during this decade “abrupt, substantial, and persistent changes in the state of natural systems” occurred, according to recent research. This, in turn, led to a growing need for researchers, scientists, and managers to discover Earth science data related to these changes. However, this was easier said than done. “I think it’s fair to say that these [data] collections were not necessarily searchable online [in the late-1980s],” says Dr. Wharton.
In 1987, NASA released the NASA Master Directory (NMD) as a source for Earth and space data described at the collection level. Although users could see which data collections were available, they had to obtain file-level data by ordering media offline, in the form of tapes or the then-new compact discs; there was no easy way to find file-level data. By the early 1990s, NASA Earth science data were separated into their own directory: GCMD. In 1994, GCMD became part of NASA’s Global Change Data Center at NASA’s Goddard Space Flight Center in Greenbelt, MD. Around this same time, EOSDIS was conceived as NASA’s premier system for archiving and disseminating Earth science data at the file level. It was natural that EOSDIS and GCMD would be managed under the same program, yet remain separate entities.
It is important to note the distinction between collection-level and file-level (or what EOSDIS refers to as “granular”) data. A data collection is a group of related data described at a level where people can understand what the data are about. A data granule, on the other hand, is an individual data file that is part of a larger collection. For example, you might have a data collection comprising 10 years of data, but you might want one day of data from one month in this 10-year collection; that single day of data is the data granule. As established, GCMD and EOSDIS served different needs (collection-level data searches vs. file-level/granular data searches). Data used to describe data are called “metadata,” and are what make data discoverable and searchable. Because the two kinds of searches required different metadata, GCMD and EOSDIS each had its own system for describing its data and enabling these searches, and the two remained separate.
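The collection/granule distinction can be pictured with a small Python sketch. The class names, fields, file names, and the one-file-per-day layout are illustrative assumptions that follow the file-level description above; they are not the actual GCMD or EOSDIS data models.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Granule:
    """One file-level unit of data (assumed here: one file per day)."""
    filename: str
    day: date


@dataclass
class Collection:
    """Collection-level description plus the granules it groups."""
    title: str
    start: date
    end: date
    granules: list[Granule] = field(default_factory=list)

    def granules_on(self, day: date) -> list[Granule]:
        """Return the granules that cover a single day."""
        return [g for g in self.granules if g.day == day]


def build_daily_collection(title: str, start: date, end: date) -> Collection:
    """Create a collection holding one hypothetical granule per day."""
    collection = Collection(title, start, end)
    current = start
    while current <= end:
        collection.granules.append(
            Granule(f"{title}_{current:%Y%m%d}.dat", current)
        )
        current += timedelta(days=1)
    return collection


# A 10-year collection, and the single day a user actually wants from it.
ten_years = build_daily_collection(
    "example_sst", date(2001, 1, 1), date(2010, 12, 31)
)
one_day = ten_years.granules_on(date(2005, 6, 15))
```

A collection-level search would match on `title` and the date range alone; a granule-level search has to look inside `granules`, which is why the two kinds of search call for different metadata.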
The recent development of the EOSDIS Common Metadata Repository (CMR) created the opportunity to finally unify the separate metadata systems used by GCMD and EOSDIS into a single system. CMR was developed by NASA’s Earth Science Data and Information System (ESDIS) Project to be the authoritative management system for all EOSDIS metadata and facilitate rapid searches through the EOSDIS archive. CMR serves as the metadata source for EOSDIS’ Earthdata Search and now also serves as the metadata source for GCMD.
GCMD staff consider having CMR as the metadata source for GCMD a win-win for data users: CMR not only speeds up GCMD searches, but also enables GCMD users to drill down even more deeply into Earth science data collections. “Prior to this, GCMD had its own backend system for serving data and information on the GCMD website,” says Alicia Aleman, GCMD Senior Science Coordinator. “Once CMR was in place, we migrated all of our content from our own servers and databases to CMR. Now we’re part of this much stronger, more robust infrastructure.”
The use of CMR as the source for GCMD metadata is only the latest evolution of the directory, and builds on many innovations developed by the GCMD team. These include the establishment of Science Keywords and data portals for easy data collection discovery; the adoption of the Directory Interchange Format (DIF) standard for exchanging information about scientific datasets; the development of docBUILDER for ensuring complete dataset metadata; and the implementation of automated quality assurance (QA) rules to ensure the highest quality metadata.
GCMD Science Keywords remain the heart of the GCMD system and an international resource used throughout the Earth science community. The keywords describe Earth science data and services consistently and comprehensively in a hierarchical format, and follow a codified governance process. The power of the keywords is in their ability to enable scientists to tag their data using a taxonomy of controlled scientific categories. This, in turn, allows those searching for data to discover datasets easily through the use of an established hierarchy. “GCMD science keywords are an authoritative source that can be integrated into search interfaces, used for metadata authoring tools, and serve as the foundation for building ontologies,” says Aleman. “The main strengths of the keywords are the breadth of their content and how they allow for precision in searching for and retrieving data.”
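As a rough illustration of how a hierarchical, controlled vocabulary supports precise searching, the Python sketch below tags datasets with keyword paths and finds every dataset at or below a chosen level. The dataset names, the specific keyword paths, and the ` > ` separator convention are assumptions made for the example, not an official keyword list or GCMD code.

```python
# Keyword paths run from the broadest level to the most specific.
# All entries here are illustrative placeholders.
DATASET_KEYWORDS = {
    "dataset_a": [
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE",
    ],
    "dataset_b": [
        "EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE",
    ],
    "dataset_c": [
        "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION",
    ],
}


def split_path(path):
    """Turn 'A > B > C' into a normalized tuple of hierarchy levels."""
    return tuple(part.strip().upper() for part in path.split(">"))


def find_datasets(prefix, tagged=DATASET_KEYWORDS):
    """Return names of datasets tagged at or below the given keyword path."""
    wanted = split_path(prefix)
    matches = []
    for name, paths in tagged.items():
        for path in paths:
            # Compare whole levels, so 'ATMOS' does not match 'ATMOSPHERE'.
            if split_path(path)[:len(wanted)] == wanted:
                matches.append(name)
                break
    return sorted(matches)


atmosphere = find_datasets("EARTH SCIENCE > ATMOSPHERE")
oceans = find_datasets("Earth Science > Oceans")
```

Matching on whole hierarchy levels rather than on substrings is what lets a search be broadened or narrowed simply by choosing a shorter or longer path.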
Since their introduction more than 20 years ago, GCMD keywords have continued to be refined and expanded in response to user needs. As part of a more formalized keyword governance process, a Keywords Community Forum was recently established. The forum gives GCMD users an even greater say in keyword development and evolution. “The forum is intended to be a place where users can come to us with questions about keywords, we can respond to them, and users can comment on our responses,” explains Dr. Wharton.
While the science keywords are a guide to finding large data collections, scientists studying specific areas looked to GCMD for a way to make their smaller, more specific data collections more accessible. “They said, ‘We have all these datasets, but we don’t want to build a system to make all these data accessible. Can you build something for us so that all these datasets are shown and we can have something out there that has our name and our institution and allows us to highlight the datasets we have?’” says Dr. Wharton.
This request evolved into the GCMD portals. Portals give organizations a focused view in which to maintain and document their data within GCMD, without having to create a separate online directory for these data. When new datasets are defined and submitted to GCMD, they are automatically recognized as being part of a specific portal without having to tag each individual dataset. GCMD portals are an easy way to put datasets on the map, and the datasets in them benefit from the quality that comes from being described with GCMD science keywords.
An important element in GCMD evolution was the adoption of the Directory Interchange Format (DIF) standard as a means for exchanging information about scientific datasets. DIF was developed in the late-1980s to provide a specific set of attributes for describing Earth science data. “How do we structure the information in [GCMD] consistently? This led to development of the DIF,” says Dr. Wharton. “The DIF was a consistent format for representing all this information.”
The DIF standard is the basis for constructing directory entries that describe a group of data, that is, metadata. Having high-quality metadata provides two important benefits for GCMD users—it increases the likelihood that researchers will find their datasets of interest and it decreases the likelihood that datasets will become undiscoverable.
The DIF is the “container” for the metadata elements that are maintained in the GCMD database. To ensure that all required dataset metadata are entered in a DIF record and the record is complete, the GCMD team developed the innovative docBUILDER tool that allows metadata authors to easily create or modify dataset descriptions. The most recent iteration of docBUILDER, docBUILDER-10, ensures that DIFs comply with CMR requirements, and allows metadata authors to validate and submit DIFs directly to CMR.
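The kind of completeness check described here can be sketched in a few lines of Python. This is only an illustration of the idea: the field names and the list of required fields are hypothetical placeholders, not the actual DIF element list or docBUILDER logic.

```python
# Hypothetical required fields; the real DIF standard defines its own set.
REQUIRED_FIELDS = ("entry_id", "entry_title", "summary", "science_keywords")


def is_blank(value):
    """Treat None, whitespace-only strings, and empty containers as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_fields(record):
    """List the required fields that are absent or blank in a record."""
    return [name for name in REQUIRED_FIELDS if is_blank(record.get(name))]


def is_complete(record):
    """A record is ready to submit only when nothing required is missing."""
    return not missing_fields(record)


draft = {
    "entry_id": "EXAMPLE_001",
    "entry_title": "Example dataset",
    "summary": "   ",           # whitespace only, so counted as missing
    "science_keywords": [],     # empty list, so counted as missing
}
problems = missing_fields(draft)
```

Reporting every missing field at once, rather than stopping at the first, lets a metadata author fix a draft in a single pass.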
The GCMD team further ensures the quality of GCMD metadata using quality assurance (QA) rules. QA recently was enhanced through automation that enables GCMD Science Coordinators to automatically make fixes and update metadata. “Quality review used to be done manually, which was a time-consuming process and things could be missed,” says Dr. Wharton. “Now we have an automated QA system with a formal set of QA rules. We can [conduct] QA [of] all of our records very quickly.”
QA is especially important when dealing with keyword releases. “When we change a science keyword, we have to update the associated metadata,” Aleman says. “We now have tools in place that can facilitate these changes very rapidly, assuring that the metadata remain in compliance with QA rules.”
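A minimal Python sketch of this workflow is shown below: one QA rule flags keywords that are no longer in the controlled vocabulary, and a bulk update rewrites the affected records. The record layout, keyword strings, and function names are assumptions for illustration and do not describe the GCMD team's actual tools.

```python
def qa_unknown_keywords(records, controlled):
    """QA rule: report (entry_id, keyword) pairs not in the vocabulary."""
    return [
        (record["entry_id"], keyword)
        for record in records
        for keyword in record["science_keywords"]
        if keyword not in controlled
    ]


def rename_keyword(records, old, new):
    """Replace `old` with `new` in place; return how many records changed."""
    changed = 0
    for record in records:
        keywords = record["science_keywords"]
        if old in keywords:
            record["science_keywords"] = [
                new if keyword == old else keyword for keyword in keywords
            ]
            changed += 1
    return changed


# Hypothetical keyword release: OLD TERM is retired in favor of NEW TERM.
OLD = "EARTH SCIENCE > EXAMPLE TOPIC > OLD TERM"
NEW = "EARTH SCIENCE > EXAMPLE TOPIC > NEW TERM"
OTHER = "EARTH SCIENCE > EXAMPLE TOPIC > OTHER TERM"
vocabulary = {NEW, OTHER}

records = [
    {"entry_id": "REC_1", "science_keywords": [OLD, OTHER]},
    {"entry_id": "REC_2", "science_keywords": [OTHER]},
]

flagged_before = qa_unknown_keywords(records, vocabulary)
updated = rename_keyword(records, OLD, NEW)
flagged_after = qa_unknown_keywords(records, vocabulary)
```

Running the same QA rule before and after the update gives a quick confirmation that every record is back in compliance with the current vocabulary.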
All these enhancements, tools, and services come together in the redesigned GCMD website. “When you go to the website, you have an option to search by GCMD keywords, either by facets or traditional ‘drill-down,’ or you can search by free text,” says Aleman. “This offers different options for different users. Because our user community is so broad, we’ve found this to be a really successful implementation of our search interface.”
Through its continuous development over more than a quarter-century, GCMD has established itself as a key resource for its global user community. The overall objective of EOSDIS is the continuous improvement of all systems, with better access to Earth science resources for the entire science community. Unifying GCMD and EOSDIS metadata systems through CMR is the most recent evolution in this continuous improvement.
Future evolution includes the development of a single metadata management tool that will incorporate all of the features of both GCMD and EOSDIS, meaning users will only have to learn one tool. Also in the works is a combined approach to QA for the metadata supplied by CMR to both GCMD and EOSDIS systems, further improving metadata quality. “In terms of capabilities, NASA has a commitment to GCMD services,” says Dr. Wharton. “Putting GCMD tools into one system that has more resources and capabilities than the GCMD improves the long-term viability of [GCMD].”
While GCMD users might not notice what’s going on in the background, the services, tools, and enhancements developed by the GCMD team along with the evolution to using CMR as the source for GCMD metadata ensure that this directory remains a powerful resource. Through the ongoing support of NASA and EOSDIS, this evolution will continue.
CEOS International Directory Network (IDN): https://idn.ceos.org/index.html
GCMD Keywords Community Forum: https://forum.earthdata.nasa.gov/app.php/tag/GCMD+Keywords?
GCMD Keyword Directory: https://wiki.earthdata.nasa.gov/display/CMR/GCMD+Keyword+Access
GCMD Website: https://gcmd.earthdata.nasa.gov