This document defines Hierarchical Data Format 5 (HDF5)—a data model, file format, and I/O library designed for storing, exchanging, managing, and archiving complex data including scientific, engineering, and remote sensing data.

Status

The HDF5 data model, file format and library is an approved standard recommended for use in NASA Earth Science Data Systems. HDF5 1.6 was approved in January 2007.

Specification Document

ESDS-RFC-007: HDF5 Data Model, File Format and Library – HDF5 1.6

ESDS-RFC-007: Appendix A, HDF5 File Format Specification 

ESDS-RFC-007: Appendix B, HDF5 API Reference Manual

HDF5 1.6, Evidence of Implementation

User Resources

HDF Group: HDF5 Software Documentation - Primary entry point to the HDF Group's HDF5 tools and libraries.

Dataset Interoperability Recommendations for Earth Science - best practices to reduce and bridge gaps between the geoscience data formats widely used at NASA and elsewhere, and to improve dataset compliance, discoverability, and extensibility with relevant metadata conventions.

NASA Earth Science Community Recommendations for Use

Strengths

HDF and HDF-EOS data formats, software libraries, and application programming interfaces (APIs) have been widely used for NASA Earth observation mission data for many years. The latest version of HDF, HDF5, is the current or planned data format for missions including the Orbiting Carbon Observatory 2 (OCO-2) and the Joint Polar Satellite System (JPSS), totaling many tens of terabytes of data. Users cite many strengths, including:

  • Widespread planned use for NASA Earth science data;
  • Data users can read only the data that they need, not the whole file;
  • Data producers can put images, tables, multidimensional arrays, etc., into the same file;
  • Users do not need to be concerned with the platform on which the data are produced;
  • Its limited set of primary structures (i.e., groups and datasets) makes the file design simple;
  • Ample metadata can be added to the file, groups, and datasets, making the file self-describing;
  • Data files can be internally compressed using different schemes, allowing for better data storage and usage;
  • The ability to store data compactly, yet allow it to be read on any platform;
  • Source code for writing and reading data in the format is widely and publicly available;
  • Supported by many third-party applications, such as Interactive Data Language (IDL) and MATLAB;
  • Support for a rich set of data types, including composite and user-defined data types;
  • Support for extensions and profiles, including HDF-EOS5.
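
Several of the strengths listed above (groups and datasets, attached metadata, internal compression, and reading only the data needed) can be illustrated with a short sketch. It assumes the third-party h5py Python bindings and NumPy, neither of which is named on this page. The group name, dataset name, attribute values, and the function name write_and_read_subset are all hypothetical. It is one possible illustration, not a recommended file layout.

```python
import os
import tempfile

import h5py
import numpy as np


def write_and_read_subset(path):
    """Write a small HDF5 file, then read back only a slice of one dataset."""
    data = np.arange(100 * 200, dtype="float32").reshape(100, 200)

    with h5py.File(path, "w") as f:
        # Groups organize the file much like directories (hypothetical name).
        grp = f.create_group("swath")
        # Chunked, gzip-compressed dataset; compression is set per dataset.
        dset = grp.create_dataset(
            "radiance", data=data, chunks=(10, 200), compression="gzip"
        )
        # Attributes attach metadata to the file, groups, and datasets.
        f.attrs["title"] = "Example granule"
        dset.attrs["units"] = "W m-2 sr-1 um-1"

    with h5py.File(path, "r") as f:
        dset = f["swath/radiance"]
        # Slicing reads only the chunks covering the requested region,
        # not the whole dataset.
        subset = dset[10:12, 0:5]
        units = dset.attrs["units"]
        compression = dset.compression
    return subset, units, compression


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        subset, units, compression = write_and_read_subset(
            os.path.join(tmp, "example.h5")
        )
    print(subset.shape, units, compression)
```

The chunk shape here is arbitrary. In practice it would be chosen to match the expected access pattern, since compression and partial reads both operate on whole chunks.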

Weaknesses

HDF5 is undeniably complex and has a steep learning curve. However, users also applaud the quality of its documentation and help-desk support. Third-party tools with HDF5 support, such as IDL and MATLAB, also help hide that complexity from users. Users have expressed concern about the availability of long-term support for HDF5 and related tools, but this concern is somewhat alleviated by the availability of the source code.

Applicability

HDF5 is used for data archiving and distribution. The strengths cited above, together with the availability of analysis tools, make the format suitable for data analysis as well. The new netCDF 4.0 will be able to use HDF5 as the data storage layer for the netCDF API, which brings many HDF5 features to netCDF, such as user-defined types, multiple unlimited dimensions, and per-variable data compression. This merger of the two formats will further extend the HDF5 user community.

Limitations

A major limitation of HDF5 is the loss of backward compatibility with HDF4 and earlier versions. Also, unlike less complex formats, HDF5 files cannot be read directly without the HDF5 software library. Of greater concern are recent postings on mailing lists discussing the use of netCDF and HDF5 in high-performance computing applications with thousands of processors using parallel I/O. Commenters warn of the danger of file corruption during parallel I/O if a client dies at a particular time. The HDF Group is aware of this problem and is addressing it.

Overall, HDF5 is a widely used data format with a well-defined specification that provides a standard way of storing and working with science data. The ESDS-RFC-007 TWG thus recommends its endorsement by the SPG as an Earth Science Data Systems Standard.