According to the National Plan for Civil Earth Observations, the U.S. federal government is the largest provider of civil Earth observation data in the world. Data users need quick, unhindered access to these vast Earth science data collections across the many federal agencies and organizations responsible for these data. Through the Big Earth Data Initiative (BEDI), federal agencies are working to make this a reality.
The overall goal of BEDI is to improve the interoperability of civil Earth observing data across U.S. federal agencies, systems, and platforms by improving the usability, discoverability, and accessibility of these data and systems along with improving data management practices. BEDI originated out of the National Strategy for Civil Earth Observations, which was released by the White House Office of Science and Technology Policy (OSTP) in 2013. The U.S. Group on Earth Observations (USGEO) is tasked with interagency coordination and oversight of BEDI. NASA plays a key role in USGEO activities coordinating BEDI, and is one of three U.S. agencies (along with NOAA and the USGS) to receive federal funding to implement BEDI objectives.
NASA’s Earth Observing System Data and Information System (EOSDIS) has primary responsibility for NASA’s Earth observing data collection, which currently contains more than 15 petabytes of Earth observing data acquired from satellite, airborne, and ground-based missions as well as socio-economic data. These data are managed by NASA’s Earth Science Data and Information System (ESDIS) Project and processed, archived, and distributed through discipline-specific Distributed Active Archive Centers (DAACs). In Fiscal Year 2015, EOSDIS delivered more than 1.42 billion data products to more than 2.6 million data users around the world.
Since NASA is not alone in collecting, managing, and distributing federally-funded civil Earth observing data, these data exist in different locations and may be in different formats, use different metadata standards, and be searchable using different data discovery applications. Through BEDI, these data will become more standardized, easier to find, and usable across more processing systems. For users of EOSDIS data and data products, this means continual improvements and refinements to EOSDIS systems and programs that make these data easier to use and open them to a broader range of user communities.
NASA BEDI Basics
NASA’s strategy for achieving BEDI objectives has four main elements:
- A focus on the pieces necessary to enable BEDI rather than end-user applications or the creation of new data products;
- A drive toward community-driven, open standards for data formats, interfaces, and protocols as the key to achieving interoperability of data products within EOSDIS and with other U.S. federal agencies;
- An effort to design and execute BEDI-related work activities so that the resulting output and products are beneficial and useful to both NASA and other U.S. federal agencies; and
- A strategy to leverage current NASA Earth observation plans and priorities to accelerate what NASA already is doing, planning to do, or wants to do.
The implementation of these elements involves not only enhancements to data held by EOSDIS DAACs, but also to EOSDIS services for searching, accessing, and using these data, such as Earthdata Search and Global Imagery Browse Services (GIBS). These enhancements will be integrated into a larger federal framework called the Common Framework for Earth Observation Data, which is a set of recommended standards and practices that all federal agencies will follow. As of the end of August 2016, EOSDIS DAACs participating in the BEDI data processing part of this effort (which includes 11 of the 12 DAACs) have completed more than 88% of required BEDI tasks, including finishing a review of 1,118 datasets that will be included in EOSDIS BEDI efforts. These datasets represent more than 9,000 unique data products.
NASA BEDI Specifics: Enhancing EOSDIS Data Usability
A key BEDI-wide strategy to enhance data usability is the organization of data collections into twelve Societal Benefit Areas, or SBA. The BEDI SBA align with nine environmental fields adopted internationally by the Global Earth Observation System of Systems (GEOSS) project conducted by the Group on Earth Observations (GEO), of which the U.S. is one of more than 100 member nations. The BEDI SBA are overarching environmental fields of interest (such as disasters, energy, climate, and agriculture) that organize Earth science data into discrete fields where they can be more easily accessed and discovered by the global user community. EOSDIS DAACs are mapping individual datasets in their collections into these SBA, including ensuring that dataset imagery are available through GIBS, when applicable.
Just as EOSDIS will continue to increase the number of images available through GIBS, data users also can expect improvements to clients that use GIBS imagery, like Worldview. Recent Worldview enhancements based on SBA include the ability to search for imagery related to specific hazards as well as by science discipline.
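To make the relationship between GIBS and its clients more concrete, the following minimal sketch shows how a client might build the address of a single imagery tile. It assumes the REST-style WMTS URL pattern commonly shown in GIBS documentation. The base URL, layer identifier, and tile matrix set name are not given in this article and should be treated as assumptions to verify against current GIBS documentation.

```python
from datetime import date

# Assumed GIBS WMTS endpoint for the geographic (EPSG:4326) projection.
GIBS_WMTS_BASE = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best"


def gibs_tile_url(layer: str, day: date, tile_matrix_set: str,
                  zoom: int, row: int, col: int, ext: str = "jpg") -> str:
    """Build a tile URL of the form
    {base}/{layer}/default/{date}/{tile_matrix_set}/{zoom}/{row}/{col}.{ext}
    """
    if min(zoom, row, col) < 0:
        raise ValueError("zoom, row, and col must be non-negative")
    return (f"{GIBS_WMTS_BASE}/{layer}/default/{day.isoformat()}/"
            f"{tile_matrix_set}/{zoom}/{row}/{col}.{ext}")


# Illustrative request: layer name and tile indices are assumptions.
url = gibs_tile_url("MODIS_Terra_CorrectedReflectance_TrueColor",
                    date(2012, 7, 9), "250m", zoom=6, row=13, col=36)
print(url)
```

A client such as Worldview would request many such tiles for the visible map area and selected date; this sketch only constructs one address and does not contact the service.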
Data usability also is improved through systems designed to deliver data rapidly and with minimal processing for use in managing time-critical events, such as wildfires and ice floe assessments. One such service is the EOSDIS Land, Atmosphere Near real-time Capability for EOS (LANCE), which provides imagery of Earth observations generally within three hours of an observation. While not intended for scientific research, LANCE products help further the BEDI objectives of making Earth science data more easily usable by a broader base of user communities, such as resource managers, policy analysts, and local governments.
NASA BEDI Specifics: Enhancing EOSDIS Data Discoverability
A critical element to accomplishing BEDI goals and objectives is ensuring that the metadata associated with federal civil Earth observation data are complete and consistent. Improving EOSDIS data discovery begins with a thorough review of the metadata associated with EOSDIS data and datasets to verify that they are complete and meet international metadata standards.
Metadata are data about data, and include (but are not limited to) data attributes such as quality, lineage, and acquisition parameters. Metadata are used in all aspects of NASA’s Earth science data lifecycle, from initial measurements to the search and discovery of processed data. Earth observing missions use metadata in science data products when describing information such as the instrument/sensor, operational plan, or geographic region sampled. DAACs use metadata for preservation, access, and distribution of data and data products.
As the EOSDIS data collection grew over the years, EOSDIS metadata came to be spread across multiple, disparate systems, each requiring different formats and different mechanisms for submitting and updating data entries. This not only reduced the value of the metadata, but also made it difficult for users to discover relevant data and datasets. To correct this problem, EOSDIS created the Common Metadata Repository (CMR).
The CMR is a single, shared, scalable metadata repository for all NASA Earth science data that merges all existing capabilities and metadata from existing NASA Earth science metadata systems, such as the Global Change Master Directory (GCMD) and the EOSDIS EOS Clearing House (ECHO). In addition, the CMR serves as the definitive management system for EOSDIS metadata, and includes metadata from EOSDIS data collections as well as from Earth science data collections outside EOSDIS (such as the GCMD). CMR metadata also are formatted to meet the International Organization for Standardization (ISO) 19100 series of standards, which applies to geographic metadata (such as ISO 19115 and 19139).
Assembling metadata from Earth observing data collections into a single repository based on international metadata standards requires a review of DAAC dataset metadata, including metadata that are part of ECHO and GCMD datasets. This review includes the use of tools and techniques to:
- Compare metadata recommendations and dialects
- Identify the structure of metadata collections
- Compare the structure of metadata collections
- Evaluate and measure metadata completeness with respect to recommendations
- Evaluate and measure metadata completeness with respect to specific organization goals
As of the end of August 2016, 95.8% of ECHO and GCMD BEDI metadata have been evaluated for completeness and accuracy; this process is still ongoing for CMR datasets and no metrics are currently available. Ensuring that EOSDIS metadata are complete, accurate, and adherent to international standards makes Earth science data easier to discover, access, and use across organizations.
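The idea of measuring metadata completeness against a recommendation, as listed above, can be sketched in a few lines. This is an illustrative sketch only: the field names in `RECOMMENDED_FIELDS` and the sample record are hypothetical and do not reproduce any actual EOSDIS, ISO, or CMR recommendation or the tools used in the review.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical recommendation: these field names are for illustration only.
RECOMMENDED_FIELDS = ("title", "abstract", "doi",
                      "temporal_extent", "spatial_extent", "platform")


def is_populated(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True


def completeness(record: Dict[str, Any],
                 recommended: Iterable[str] = RECOMMENDED_FIELDS
                 ) -> Tuple[float, List[str]]:
    """Return (fraction of recommended fields populated, missing field names)."""
    fields = list(recommended)
    missing = [name for name in fields if not is_populated(record.get(name))]
    score = (len(fields) - len(missing)) / len(fields) if fields else 1.0
    return score, missing


# Hypothetical dataset record with two gaps (empty DOI, no spatial extent).
record = {
    "title": "Example sea surface temperature dataset",
    "abstract": "Daily gridded values.",
    "doi": "",
    "temporal_extent": ["2000-01-01", "2016-08-31"],
    "spatial_extent": None,
    "platform": "Terra",
}
score, missing = completeness(record)
print(f"completeness: {score:.0%}, missing: {missing}")
```

Passing a different field list as `recommended` corresponds to evaluating the same record against a different recommendation or against organization-specific goals.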
The CMR also is the foundation for EOSDIS Earthdata Search. Earthdata Search provides access to EOSDIS services for data discovery, filtering, and visualization, and uses the CMR to conduct sub-second searches through the entire EOSDIS metadata catalog. Once BEDI is fully implemented, an EOSDIS data user will be able to use Earthdata Search to discover data across multiple agencies.
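As a rough illustration of how a program, rather than the Earthdata Search interface, might query the CMR, the sketch below builds a keyword search URL for dataset collections. The endpoint and the `keyword` and `page_size` parameters are taken from the publicly documented CMR search API as understood here, not from this article, so they should be checked against current CMR documentation before use.

```python
from urllib.parse import urlencode

# Assumed CMR collection search endpoint returning JSON.
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_collection_search_url(keyword: str, page_size: int = 10,
                                base_url: str = CMR_COLLECTIONS_URL) -> str:
    """Return a URL that searches collection metadata for a free-text keyword."""
    if not keyword.strip():
        raise ValueError("keyword must not be empty")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    query = urlencode({"keyword": keyword, "page_size": page_size})
    return f"{base_url}?{query}"


search_url = build_collection_search_url("sea ice", page_size=5)
print(search_url)
```

The resulting URL could be opened with `urllib.request.urlopen` or pasted into a browser; the sketch stops at URL construction so that it runs without network access.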
Along with ensuring that EOSDIS metadata are complete, easily searchable, and based on international standards, data discoverability is further enhanced by registering appropriate DAAC data with a digital object identifier (DOI). A DOI is a unique sequence of numbers and letters that identifies an object, such as a dataset or journal article. DOIs are assigned and regulated by the International DOI Foundation (IDF) and based on international standards (ISO 26324, Digital Object Identifier System). According to the IDF, approximately 130 million DOIs have been assigned worldwide as of June 2016.
Registering a DOI for an object, such as an EOSDIS dataset, greatly enhances the discoverability of that object. Once registered, an object’s DOI remains fixed, whereas the object’s location and other metadata may change. Referring to an online dataset by its DOI provides a more stable link than simply referring to it by its web address and makes it discoverable by anyone with an internet connection.
Researchers who acquire product data files should be able to use the DOI to find the definitive documentation from NASA’s Scientific and Technical Information archives. Adding DOIs to product metadata also enables tools for provenance tracking and allows data users to find more information about the creation of the data product. Additionally, a DOI gives appropriate credit to dataset authors. Datasets from the EOSDIS DAACs participating in the BEDI data processing part of this effort are being evaluated and submitted for DOI registration. As of the end of August 2016, this effort was more than 97% complete.
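The stability of a DOI compared with a web address comes from resolving it through a central resolver instead of linking to the object's current location. The sketch below turns a DOI written in several common forms into a resolver link. It assumes the public `https://doi.org/` resolver, and the syntax check is a simplified pattern, not the full ISO 26324 rules. The sample DOI is made up for illustration and is not a registered identifier.

```python
import re
from urllib.parse import quote

# Simplified syntax check (an assumption, not the full DOI specification):
# "10.", a numeric registrant code, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip resolver or 'doi:' prefixes and validate the basic DOI shape."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Return a resolver URL that stays valid even if the dataset moves."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


# Made-up DOI used purely as an example.
print(doi_to_url("doi:10.5067/EXAMPLE/DATASET.001"))
```

Because citations and metadata store only the DOI, the data provider can update the landing page behind it without breaking any existing reference.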
NASA BEDI Specifics: Enhancing EOSDIS Data Accessibility
The BEDI objective of improving data accessibility ensures that once users discover the data they need, they will be able to easily download and open these data files. While scientific data may be written in a wide range of data formats, a 2013 Executive Order requires that U.S. government data, including civilian Earth observing data, must be available in formats that are open and machine-readable. An open format is one that is platform independent and publicly available without restrictions that could prevent the re-use of information; machine-readable means that the data are in a form that a computer can process.
NASA launched its code directory (https://code.nasa.gov) in January 2012, and publishes open source projects through this portal. In addition, NASA uses multiple public, open source development repositories at SourceForge and GitHub to host NASA open source software releases. The NASA Open Source Agreement (NOSA) provides for public release of NASA-funded software. Since 2003, more than 60 NASA software projects have been released under NOSA. More detailed information about how NASA is addressing the 2013 Executive Order is in NASA’s Open Government Plan 2016 and at the interactive open.NASA.gov website.
Converting existing Earth science data into formats that are open and machine readable is a time-consuming process. Fortunately, open standards designed for scientific data already are available that meet BEDI objectives by enabling users to access these data across multiple platforms and systems.
One open data standard utilized by EOSDIS is the Open Source Project for a Network Data Access Protocol (OPeNDAP). OPeNDAP enables users to easily access and transport scientific data and provides simple, remote access to large collections of datasets via the internet. OPeNDAP allows large, rich, complex collections of NASA Earth science data to be quickly filtered and viewed on a user’s desktop or mobile device. EOSDIS has applied BEDI resources to helping the OPeNDAP group make improvements to their software and enhance OPeNDAP capabilities. In addition, EOSDIS DAACs have been tasked with identifying and selecting BEDI datasets that are appropriate for integration with OPeNDAP. As of the end of August 2016, 879 datasets are supported or will be supported in OPeNDAP; 98.5% of selected DAAC BEDI datasets have been fully integrated into OPeNDAP.
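The filtering that OPeNDAP performs on the server side is typically expressed as a constraint appended to the dataset URL. The sketch below builds such a URL using the DAP2 array subsetting syntax `variable[start:stride:stop]`. The server address, file name, and variable name are hypothetical, and the article itself does not specify which protocol version or response types the DAACs expose, so those details are assumptions.

```python
from typing import Sequence, Tuple

# (start, stride, stop) for one array dimension; stop is inclusive in DAP2.
Slice = Tuple[int, int, int]


def hyperslab(variable: str, slices: Sequence[Slice]) -> str:
    """Format one projection such as 'sst[0:1:0][100:2:200]'."""
    parts = []
    for start, stride, stop in slices:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid slice: {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return variable + "".join(parts)


def build_dap2_url(dataset_url: str, projections: Sequence[str],
                   response: str = "ascii") -> str:
    """Append a response suffix (e.g. 'ascii', 'dods') and a constraint."""
    return f"{dataset_url}.{response}?{','.join(projections)}"


# Hypothetical server and variable names, for illustration only.
subset = hyperslab("sst", [(0, 1, 0), (100, 2, 200), (300, 2, 400)])
dap_url = build_dap2_url("https://opendap.example.gov/data/sst_2016.nc", [subset])
print(dap_url)
```

Only the requested slice of the array is transferred, which is what allows large collections to be filtered and viewed on a desktop or mobile device. Some HTTP clients require the square brackets to be percent-encoded before sending the request.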
Another EOSDIS effort to address BEDI accessibility requirements is basing application programming interfaces (APIs) on open standards. APIs are sets of requirements that govern how applications talk to other applications, and make it possible to move information between programs. EOSDIS has numerous APIs available that allow developers to create systems to use EOSDIS data. EOSDIS is working to have standard data formats utilized for DAAC datasets and for APIs.
What will BEDI success look like? Aside from being able to use a search engine from one government agency to search the entire collection of U.S. civil Earth observing data, BEDI will mean a new framework for collecting data that will rely on standardized metadata across agencies, standardized formats for these data, and the availability of these data in products designed for all levels of data user—from expert users to first-time explorers.
NASA’s efforts to improve EOSDIS data usability, discoverability, and accessibility and to meet BEDI objectives are gradually coming together. The CMR is now the overall management system for EOSDIS metadata, and new data imagery products are being added to GIBS. OPeNDAP-based servers and protocols are being implemented at EOSDIS DAACs, and more than 97% of EOSDIS datasets at DAACs participating in BEDI data processing efforts have registered DOIs. These EOSDIS efforts, along with the efforts of the other federal agencies involved in BEDI, are creating a more coherent, collaborative collection of Earth observation data that is available to users around the world.
Learn more about BEDI and NASA EOSDIS BEDI efforts:
- Simplified Access to NASA Earth Science Data through OPeNDAP
- Metadata Recommendations, Dialects, Evaluation and Improvement
- Share Data with OPeNDAP Hyrax: New Features and Improvements
- Improving Accessibility and Use of NASA Earth Science Data
- Implementing ISO 19115 Standards: NASA Earth Science Data
- White House Office of Science and Technology Policy