Principal Investigator (PI): Chris Lenhardt, Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
(Formerly Jerry Pan, ORNL DAAC)
Digital content, including Earth Science observations and model output, is an essential part of contemporary scientific research activities. Not only is the rate of archiving for such content increasing rapidly, but there is also an increase in derived and on-demand data product creation and consumption. As a result of these trends, scientific digital content has become even more heterogeneous in format and more distributed across the Internet. In turn, this makes the content more difficult for providers to manage and preserve and for users to locate, understand, and consume.
Specifically, it is increasingly difficult to deliver relevant metadata and data processing lineage information along with the actual content, particularly when the content can be delivered in multiple ways, including the growing use of web services. Readme files, data quality information, production provenance, and other descriptive metadata are often kept separate from the content at the storage level as well as in the data search and retrieval interfaces available to a user. Critical archival metadata, such as audit trails and integrity checks, are often even more difficult for users to access, if they exist at all.
We propose to address these challenges by using and extending the capabilities of a contemporary digital object repository to work for science data and metadata delivery. Digital repository technology has been used for digital libraries with great success, and we believe it can also be applied to the more complex needs of Earth Science data management. We will demonstrate this capability in two contexts: an existing modeling and synthesis data center project for the North American Carbon Program (NACP) as the primary science context, and one of the more complex data projects at the ORNL DAAC for Biogeochemical Dynamics as a second context.
There are three high-level objectives in this project:
- Demonstrate the applicability of digital object repository technology to science data. Based on our preliminary work, we expect to couple the Fedora Repository and a Drupal-based Graphical User Interface (GUI) as key elements of a next-generation NASA Earth system science data center infrastructure, using datasets collected as part of the NACP Modeling and Synthesis Thematic Data Center (MAST-DC) as the primary science context.
- Use this implementation to enable better and more consistent access to critical metadata, including processing lineage information and administrative metadata, using the capabilities inherent in a digital repository (multiple streams for a given object and remote data streams). The enhanced metadata access ensures that science digital content becomes more transparent to the end user, with provenance and quality control information readily available.
- Demonstrate how data providers can more easily and effectively manage science data sets, associated metadata, processing lineage, and quality control/data provenance information. A consistent process, with associated user interfaces and application programming interfaces (APIs), can be used by the data provider to ingest, update, and modify a dataset for metadata changes or additional content dissemination avenues. Particularly in the context of ORNL DAAC data, this work will demonstrate potential technology migration paths for existing data operations.
In addition, the successful completion of this project will provide a foundation for the improved long-term preservation of NASA Earth Science data using open standards, which should prove useful in the ongoing development of the Earth Observing System Data and Information System (EOSDIS) and the delivery of data for current and future missions.