It has never been easier or quicker to explore Earth. With a computer and a fast internet connection, researchers can virtually travel thousands of miles in a day or traverse impossibly difficult-to-reach places to learn more about our planet's features, processes, and history thanks to the more than 100 petabytes (PB) of data openly available through NASA's Earth Science Data Systems (ESDS) Program. To make retrieving, analyzing, and using these data more efficient, ESDS is adopting a new Open Geospatial Consortium (OGC) data standard for using Geographic Tagged Image File Format (GeoTIFF) imagery files in cloud environments.
GeoTIFFs are Tagged Image File Format (TIFF) raster imagery files that embed georeferencing metadata describing the image's location, along with other important information. The new Cloud Optimized GeoTIFF (COG) standard being adopted by NASA allows for efficient streaming and partial downloading of georeferenced imagery and grid coverage data on the web, and enables fast data visualization and geospatial processing workflows. COG-aware applications can stream or download only the parts of a file they need to visualize or process, rather than the whole file.
"Imagine that you want to look at just North America, but you have to download the entire globe and then crop to it; that’s a lot of extra data that you’re downloading," said Dr. Alexey Shiklomanov, an expert in cloud computing technologies for Earth science data at NASA's Goddard Space Flight Center in Greenbelt, Maryland. "With Cloud Optimized GeoTIFFs, these data are sliced up into individual tiles that can be retrieved piece by piece."
A good example is when GeoTIFFs are used with a visualization and analysis platform, such as Google Earth Engine. With a COG, users can retrieve tiles for the specific location they want to see and arrange them in layers to create the effect of zooming in and out of an image.
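The tile-by-tile retrieval described above can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the COG standard itself: the function names, the 512-pixel tile size, and the grid and window dimensions are all assumptions chosen for illustration. It works out which tiles overlap a requested pixel window and compares that count with the number of tiles in the full image.

```python
from math import ceil


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of tiles overlapping a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def total_tiles(image_width, image_height, tile_size=512):
    """Number of tiles needed to cover the whole image."""
    return ceil(image_width / tile_size) * ceil(image_height / tile_size)


# Hypothetical global grid (43200 x 21600 pixels) and a hypothetical
# 6000 x 4000 pixel window standing in for a region of interest.
needed = tiles_for_window(col_off=6000, row_off=2000, width=6000, height=4000)
print(len(needed), "of", total_tiles(43200, 21600), "tiles")
```

Under these assumed numbers, the window touches only a small fraction of the image's tiles, which is the saving Shiklomanov describes.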
COG formats have been around for several years, as multiple groups experimented with different ways to tile data and make sure that existing tools would still work with the data. The recent OGC announcement standardizing the format, and NASA's adoption of that standard, recognize the maturity and usefulness of this format.
Being Up Front About the Data
"The other big thing that the COG standard does is reorganize the internal content of the file to basically make it as easy as possible to get as much information as you can in as few requests as possible," said Shiklomanov.
The standard does this by addressing two key issues. First, when downloading from the internet, there is always latency (a time delay) for each round trip to a server to retrieve data. Second, metadata are usually tagged and kept with their associated data. For example, if imagery has multiple bands, the file for each band must be explored to see its metadata. This means it can take a long time to identify, get, sort, and select the exact data that you want from a large dataset.
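To make the round-trip cost concrete, it helps to look at how TIFF metadata are laid out. A TIFF file begins with an 8-byte header that points to a directory of tags (the image file directory, or IFD). When that directory sits at the front of the file, one request for the leading bytes is enough to decode the image and tile dimensions, and a second, ranged request can fetch a single tile. The sketch below uses only Python's standard library and a synthetic header built in memory. The function names and the tile offset and length are hypothetical, and the parser handles only single inline values, so it is an illustration and not a full TIFF reader. In a real file, tile positions come from the TileOffsets and TileByteCounts tags.

```python
import struct

# Standard TIFF tag numbers for image and tile dimensions
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 322: "TileWidth", 323: "TileLength"}


def parse_first_ifd(buf):
    """Decode inline SHORT/LONG tags (count 1) from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}[buf[:2]]  # byte-order mark
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (n_entries,) = struct.unpack_from(order + "H", buf, ifd_offset)
    tags = {}
    for i in range(n_entries):
        pos = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, count = struct.unpack_from(order + "HHI", buf, pos)
        if count == 1 and typ in (3, 4):  # 3 = SHORT, 4 = LONG
            fmt = "H" if typ == 3 else "I"
            (value,) = struct.unpack_from(order + fmt, buf, pos + 8)
            tags[TAG_NAMES.get(tag, tag)] = value
    return tags


def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + length - 1}"


# Synthetic little-endian header standing in for the first bytes of a file
entries = [(256, 4, 1, 43200), (257, 4, 1, 21600), (322, 3, 1, 512), (323, 3, 1, 512)]
demo = b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", len(entries))
for tag, typ, count, value in entries:
    demo += struct.pack("<HHI", tag, typ, count) + struct.pack("<I", value)
demo += struct.pack("<I", 0)  # offset of the next IFD (0 means none)

metadata = parse_first_ifd(demo)        # request 1: the metadata block
print(metadata)
print(range_header(1_048_576, 65_536))  # request 2: one tile (hypothetical offset and length)
```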
"Cloud optimized GeoTIFFs do things differently by presenting all the metadata up front. So, the first time you go to grab the file, you request all the metadata, which then come in one single block. In a single request you can understand the entire file," said Shiklomanov. "Then, if you want to grab imagery for a location from a particular light band, you just make one additional request and ultimately only pay that time penalty twice."
In addition to allowing users to easily read metadata and download tiles of a COG, the standard has other significant benefits:
- The COG standard supports parallel access to different parts of an image, enabling even faster access to large datasets; this allows users to easily scale up their data access and analysis workflows by simply adding more computing resources
- COGs are backward compatible, and tools that work with regular GeoTIFF files still work with COG files
- The COG ecosystem is growing rapidly with many tools, libraries, and services, including the Geospatial Data Abstraction Library (GDAL), QGIS, and ArcGIS, being compatible with the standard
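The parallel-access benefit in the list above can be sketched with Python's standard library. This is a simplified stand-in, not how any particular COG tool is implemented: a local temporary file plays the role of a remote COG, each read plays the role of one HTTP range request, and the function names and tile sizes are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read `length` bytes at `offset`; stands in for one HTTP range request."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_ranges_parallel(path, ranges, max_workers=4):
    """Fetch several independent byte ranges concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: read_range(path, *r), ranges))


# Stand-in "file" of four 1 KiB tiles, each filled with a distinct byte value
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(4):
        tmp.write(bytes([i]) * 1024)
    demo_path = tmp.name

tiles = read_ranges_parallel(demo_path, [(i * 1024, 1024) for i in range(4)])
os.remove(demo_path)
print([tile[0] for tile in tiles])
```

Because each tile occupies its own byte range, the reads do not depend on one another, so adding workers (or machines) lets more tiles be fetched at once.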
Speaking of ArcGIS, one field embracing the COG standard is geographic information systems (GIS). GeoTIFF, in general, is the preferred format for GIS users because it is a simple, easy way to visualize imagery and raster data; almost any GIS tool can ingest GeoTIFF without error.
"In the most basic sense it is almost a guaranteed 'drag and drop' process versus some of the more complex scientific data file types, which may be supported but may not be translated correctly, projected to display on a map, or have the metadata readily available because of the file structure," said Leah Schwizer, team lead for NASA's Earth Science Data Systems GIS Team (EGIST). "COGs possess efficiencies for doing retrieval, visualization, analysis, embedding, and integration with apps that work in the cloud."
Implementing the COG Standard
Implementation of the COG standard is already in full swing for NASA Earth science data.
"The COG standard aligns very well with where Earthdata is heading: the Earthdata Cloud Evolution," said Dr. Yaxing Wei, lead scientist for NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC). ORNL DAAC is one of the key data archives implementing the COG standard at NASA. "The standard will help with achieving one of the initiative's main objectives, which is efficient and scalable in-cloud data access and analysis."
For example, users can already search for data products in COG by using the data format search filter in Earthdata Search. COG was recognized as an emerging standard by the NASA Earth Science Data and Information System (ESDIS) Project Standards Coordination Office (ESCO), and COG is one of the formats supported by NASA's Earthdata GIS (EGIS) and the Visualization, Exploration, and Data Analysis (VEDA) project. Many DAACs have started adopting COG as the standard option for all new data in GeoTIFF format, as well.
Wei points out that while COG addresses many important bottlenecks and makes working with GeoTIFF files more efficient, it also has limitations to consider. For example, COG is not the best format for all data; multidimensional raster data are one such case, for which Zarr and netCDF-4/HDF5 are more suitable formats.
With these considerations in mind, the new COG standard offers great enhancements to Earth science and GIS research and analysis for many users and uses. Through the standard, users will enjoy the benefits of georeferenced data in the cloud: easier pathways to selecting precisely the data they need, efficient and flexible access, in-place cloud analysis, and downloads in a modern format that is being widely adopted yet is still backward compatible with most GIS software.