
HDF-EOS5 Data Model, File Format and Library

Summary

Hierarchical Data Format - Earth Observing System (HDF-EOS) is a software library designed to support NASA Earth Observing System (EOS) Earth science data. HDF is the Hierarchical Data Format developed by the National Center for Supercomputing Applications (NCSA). The specific data structures that serve as containers for science data are Grid, Point, Zonal Average and Swath. These data structures are constructed from standard HDF data objects, using EOS conventions, through the use of a software library. A key feature of HDF-EOS is a standard prescription for associating geolocation data with science data through internal structural metadata. The relationship between geolocation and science data is transparent to the end user. Services that are independent of instrument and data type, such as subsetting by geolocation, can be applied to files across a wide variety of data products through the same library interface. The library is extensible, and new data structures can be added. This document describes a proposed standard for HDF-EOS5 Grid and Swath structures, which is based on the HDF5 data model and file format provided by the HDF Group. The HDF Group was part of the NCSA until July 2006, at which time it began full operations as a non-profit 501(c)(3) company.
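As a rough illustration of why a standard association between geolocation and science data matters, the sketch below subsets values by a latitude/longitude box. It is not the HDF-EOS library interface: the function `subset_by_box` and the sample values are hypothetical, and plain Python lists stand in for fields read from a file. The point is only that once every product pairs its data with geolocation in the same way, one routine can serve many products.

```python
from typing import List, Tuple


def subset_by_box(
    lat: List[float],
    lon: List[float],
    values: List[float],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
) -> List[Tuple[float, float, float]]:
    """Return (lat, lon, value) triples whose location lies inside the box.

    Hypothetical helper for illustration; bounds are inclusive.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        (la, lo, v)
        for la, lo, v in zip(lat, lon, values)
        if lat_min <= la <= lat_max and lon_min <= lo <= lon_max
    ]


# Made-up samples standing in for geolocation fields and one data field.
lat = [10.0, 20.0, 30.0, 40.0]
lon = [-100.0, -90.0, -80.0, -70.0]
ozone = [250.0, 260.0, 270.0, 280.0]

selected = subset_by_box(lat, lon, ozone, (15.0, 35.0), (-95.0, -75.0))
print(selected)  # [(20.0, -90.0, 260.0), (30.0, -80.0, 270.0)]
```

The same call would work unchanged for any other field that shares these geolocation values, which is the kind of product-independent service the paragraph above describes.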

Status

The HDF-EOS5 Data Model, File Format and Library was approved in January 2007 as a standard recommended for use in NASA Earth Science Data Systems. Minor updates to correct typographic errors and URLs that had become unreachable were made in May 2016.

Specification Document

HDF-EOS5 Data Model, File Format and Library (v1.1)

User Resources

HDF-EOS Tools and Information Center

Putting some Spark into HDF-EOS: use cases and a tutorial about processing HDF5 files with Apache Spark.

Dataset Interoperability Recommendations for Earth Science: best practices to reduce and bridge gaps between geoscience dataset formats widely used at NASA and elsewhere, and to improve dataset compliance, discoverability, and extensibility with relevant metadata conventions.

Examples of Implementation

NASA Earth Science Community Recommendations for Use

The HDF-EOS and HDF-EOS5 data formats, software libraries and application programming interfaces (APIs) are already widely used in NASA Earth Science Data Systems. While the APIs for these two related data formats are nearly identical, HDF-EOS5 is built on the more feature-rich HDF5 format, which is also the basis for netCDF-4. Therefore, we recommend that any new HDF-EOS data sets be implemented in HDF-EOS5. The TWG bases its recommendation on an analysis of the following factors in a NASA context:

Strengths

HDF and HDF-EOS have been widely used for NASA Earth observation mission data for many years. The latest version of HDF-EOS, HDF-EOS5, is the data format for four instruments on NASA's Aura satellite. Users cite many strengths, including:

Widespread use of HDF-EOS formats for NASA Earth science data. Reviewers cite tens of terabytes of data in HDF-EOS5, with thousands of users.

HDF-EOS5 inherits the benefits of HDF5, including open-source software support, internal compression, portability, support for structural data, self-describing file metadata, enhanced performance over HDF4/HDF-EOS2, and XML support. To these, HDF-EOS5 adds full support of Earth science data types. All these factors make it a flexible data format that can be easily mapped to complex Earth science data.

Reviewers note that using the HDF-EOS API is much easier than using HDF5 directly.

The HDF-EOS library enforces adherence to a specific HDF profile. By using the HDF-EOS library, developers create files which have a specific format. Also, as the developers on Aura discovered, adherence to an even more stringent set of specifications can lead to even more conformity and allow for easier data sharing.

The HDF-EOS API allows users to easily migrate from HDF4-based files to HDF5-based files. This migration would be much more difficult without the HDF-EOS API hiding most of the HDF4 to HDF5 API changes.

HDF-EOS5 takes full advantage of the HDF5 library and file format. It can handle huge volumes of data very efficiently in current and emerging computational environments without any changes to HDF-EOS5 applications.

Source code for writing and reading data in the format is publicly available.

HDF-EOS5 data files are also readable by the HDF5 library and by tools which support HDF5. Several reviewers mentioned that they use IDL very effectively with data in HDF-EOS5.

Weaknesses

HDF-EOS5 is undeniably complex and has a significant learning curve. Users have also expressed concern about the availability of long-term support for HDF-EOS5 and related tools, but this concern is somewhat alleviated by the availability of the source code.

One challenge is that the HDF-EOS5 package actually consists of multiple libraries, including HDF-EOS5, HDF5 and the SDP Toolkit, maintained by different organizations. When a user encounters a problem or has a question, it is not always clear which organization to contact.

While HDF-EOS5 provides a valuable profile of HDF5, it still allows data to be stored in non-standard ways.

One reviewer cites problems encountered with files that have been SZIP compressed. However, this is not specific to HDF5/HDF-EOS5; it also applies to HDF/HDF-EOS with SZIP compression.

Another reviewer identified a problem with the earlier version of HDF-EOS, which occurs when a file contains two or more grids and the grids each contain identically named fields. This file structure is supported by the HDF-EOS interface, but users of tools which in turn use the basic HDF4 interface are not able to distinguish between the fields. It is not clear whether this remains a problem with HDF-EOS5.

Applicability

HDF-EOS5 is used for archive and distribution of Earth Science data. The strengths cited above, together with the availability of analysis tools, make the format suitable for data analysis as well. As a notable example of the use of HDF-EOS5 in NASA Earth Science Data Systems, the instrument teams from the Aura satellite jointly developed an HDF-EOS5 profile for their datasets, thus facilitating data sharing from four coincident instruments. Coordinated development and use of specific HDF-EOS profiles should be strongly encouraged.

Limitations

Reviewers note that development of HDF-EOS5 necessarily lags behind its parent HDF5 format. Users may be affected when a new feature is added to HDF5 which is not readily supported through the current HDF-EOS5 interface. As HDF5 continues to be actively developed, it is important that HDF-EOS5 be maintained just as actively. Further, the level of technical support available to new users of HDF-EOS5 has dropped, which may limit its adoption by new data providers. Other limitations noted by users include:

HDF-EOS5 is not supported by many third-party applications such as Interactive Data Language (IDL) and MATLAB. However, HDF-EOS5 data can be read with the HDF5 interfaces that are more frequently supported.

HDF allows parallel I/O while HDF-EOS does not.

Suggestions for enhancements to address current limitations include:

Both forward and backward compatibility are important. In particular, tools built with new releases must be capable of reading data files written with older versions. No improvement can compensate for orphaned or lost data. The HDF-EOS5 API is very similar to HDF-EOS4, but not identical. NASA should consider making the HDF-EOS5 library fully backward compatible.

A set of quality assurance (QA) tools should be developed that analyze a target dataset to verify that it is a lexically and syntactically correct HDF-EOS5 formatted dataset. The QA tools should be both distributed as open source and made available as a Web-based service.
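One small piece of such a QA tool might look like the sketch below, which checks that blocks in structural metadata text are properly nested. It assumes the text uses ODL-style `GROUP=`/`END_GROUP=` and `OBJECT=`/`END_OBJECT=` keywords; the function `check_odl_nesting` and the sample text are hypothetical, and the sample is far shorter than the metadata in a real file. A real checker would need many more rules than this one.

```python
from typing import List, Tuple


def check_odl_nesting(text: str) -> List[str]:
    """Return a list of nesting problems; an empty list means none were found.

    Hypothetical helper: it only verifies that every GROUP/OBJECT block is
    closed by a matching END_GROUP/END_OBJECT with the same name.
    """
    problems: List[str] = []
    stack: List[Tuple[str, str]] = []  # (keyword, name) of blocks still open
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if "=" not in line:
            continue  # blank lines and the final END marker
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ("GROUP", "OBJECT"):
            stack.append((key, value))
        elif key in ("END_GROUP", "END_OBJECT"):
            expected = (key[len("END_"):], value)
            if not stack:
                problems.append(f"line {lineno}: {key}={value} has no opening block")
                continue
            opened = stack.pop()
            if opened != expected:
                problems.append(
                    f"line {lineno}: {key}={value} closes {opened[0]}={opened[1]}"
                )
    for keyword, name in stack:
        problems.append(f"{keyword}={name} is never closed")
    return problems


# Made-up, heavily shortened sample of structural metadata text.
GOOD = """
GROUP=GridStructure
    GROUP=GRID_1
        GridName="ExampleGrid"
        XDim=360
        YDim=180
    END_GROUP=GRID_1
END_GROUP=GridStructure
END
"""

# Drop one closing keyword to simulate a malformed file.
BAD = GOOD.replace("END_GROUP=GRID_1", "")

print(check_odl_nesting(GOOD))
print(check_odl_nesting(BAD))
```

Returning a list of messages rather than a single pass/fail flag lets the same routine back both a command-line tool and a Web service report.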

Overall, HDF-EOS5 is a widely used data format that provides a standard way of storing and working with science data. The ESDS-RFC-008 TWG thus recommends its endorsement by the SPG as an Earth Science Data Systems Standard.
