“With the model trained using archived MODIS images, we can now stand on the shoulders of our predecessors not only for technology, but also for knowledge collectively learned from the past,” said Dr. Jie Gong, science lead for the project.
SatVision-TOA is based on the transformer neural network architecture, an artificial intelligence (AI) design originally developed at Google that later became the backbone of large language models (LLMs). The SwinV2 architecture that SatVision-TOA employs gives computers the ability to learn patterns and assign meaning to them, in a fashion loosely similar to how human brains function.
When the transformer architecture was introduced, it caught the attention of Mark Carroll and the Data Science Group he leads at NASA Goddard.
“A couple of people on my team came to me and said they wanted to learn more about transformers,” said Carroll. “I said let’s use MODIS because we’re familiar with it, we have a huge amount of the data, and it’s important to Goddard.”
At the same time, Gong recognized the potential of the foundation model through her research interests in machine learning and cloud remote sensing. Gong and Carroll soon began collaborating and focused on training the foundation model on the widely used MODIS data, knowing that if they could get it to work well with MODIS, future missions could end up using the model as well.
The team trained the model on 100 million randomly selected samples drawn from 25 years of Level 1B MODIS imagery (MOD021KM v6.1) recorded by the Terra satellite. They chose the 14 spectral channels MODIS has in common with the Advanced Baseline Imager (ABI) aboard the GOES-R weather satellites to broaden the model's potential use.
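For readers who want a concrete picture of what randomly selected samples means in practice, the short Python sketch below draws fixed-size, 14-channel patches at random positions from a MODIS-like granule. It is illustrative only: the patch size, granule dimensions, and sampling scheme are assumptions, not the project's actual preprocessing pipeline.

import numpy as np

# Minimal sketch (not the project's actual pipeline): draw random fixed-size
# patches from a multi-channel, MODIS-like granule. The channel count (14)
# comes from the article; the patch size and granule shape are illustrative.
def sample_patches(granule, n_patches=8, patch_size=128, rng=None):
    """granule: array of shape (channels, height, width)."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = granule.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patches.append(granule[:, y:y + patch_size, x:x + patch_size])
    return np.stack(patches)  # (n_patches, channels, patch_size, patch_size)

# Example: a synthetic 14-channel granule roughly the size of a 1-km MODIS scene.
granule = np.random.rand(14, 2030, 1354).astype(np.float32)
print(sample_patches(granule, n_patches=4).shape)  # (4, 14, 128, 128)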
The foundation model was written and debugged by the Data Science Group at NASA Goddard over the course of six months. Then the team headed to Oak Ridge National Laboratory in Tennessee to run a full training of the model with MODIS data on the facility's Frontier supercomputer.
“When we first started training the model, we didn’t tell it what is the ground or what are clouds; we just let it learn the patterns in the 100 million all-sky images,” said Gong. “Then we started masking — hiding — random pixels in images and making the model predict what should be next to the visible portions to fill in the jigsaw puzzle.”
After SatVision-TOA made its best predictions, the team calculated the difference between the computer's filled-in image and the actual MODIS image. If the difference was large, they adjusted the model's settings to improve its accuracy. Once the model was predicting well, Gong and Carroll began teaching it what clouds, ground, water, and other features look like.
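The masking-and-correction loop Gong describes is, at its core, a masked-reconstruction training step. The short PyTorch sketch below illustrates the idea with a toy stand-in network; it is not the SatVision-TOA code, and the masking ratio, patch size, and loss shown here are illustrative assumptions.

import torch
import torch.nn as nn

# Conceptual sketch of the masked-reconstruction step described above, not the
# SatVision-TOA implementation. A small convolutional network stands in for
# the SwinV2 transformer; sizes and masking ratio are illustrative.
model = nn.Sequential(
    nn.Conv2d(14, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 14, kernel_size=3, padding=1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.rand(4, 14, 128, 128)                # a batch of 14-channel patches
mask = (torch.rand(4, 1, 128, 128) < 0.6).float()   # hide roughly 60% of the pixels

reconstruction = model(images * (1.0 - mask))       # the model sees only unmasked pixels
# "difference between the computer's filled-in image and the actual image,"
# measured only where pixels were hidden
loss = ((reconstruction - images) ** 2 * mask).sum() / mask.sum()

loss.backward()      # work out how to adjust the model's settings
optimizer.step()     # apply the adjustment to improve accuracy

Repeating this step over millions of patches is what gradually teaches the model which patterns plausibly fill a hidden region.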
The team developed two versions of SatVision-TOA, called Huge and Giant, which differ in their number of adjustable internal values, or "parameters." The most accurate, high-fidelity version is Giant, which has three billion parameters. (For a rough comparison, if each parameter were equated to a neuron, the human brain has approximately 100 billion neurons.)
With SatVision-TOA now proven capable of recognizing features in MODIS data, new collaborators are attempting to use the foundation model to characterize aerosols beneath clouds, such as dust storms transported by tropical storms, and to measure cloud properties including cloud-top height and optical depth.
For those interested in learning more about SatVision-TOA, the model architecture and model weights are available on GitHub and Hugging Face, respectively. For more information and a detailed user guide, see the white paper “SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery.”
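As a starting point, published model weights on Hugging Face can typically be fetched with the huggingface_hub library, roughly as sketched below. The repository ID and filename shown are placeholders, not the real names; consult the SatVision-TOA model card on Hugging Face for the actual ones.

from huggingface_hub import hf_hub_download

# Sketch of fetching published weights from Hugging Face. The repo_id and
# filename below are hypothetical placeholders; replace them with the values
# listed on the SatVision-TOA model card.
checkpoint = hf_hub_download(
    repo_id="nasa-org/satvision-toa",   # hypothetical repository ID
    filename="model_weights.pt",        # hypothetical filename
)
print(f"Downloaded checkpoint to {checkpoint}")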