“With the model trained using archived MODIS images, we can now stand on the shoulders of our predecessors not only for technology, but also for knowledge collectively learned from the past,” said Dr. Jie Gong, science lead for the project.
SatVision-TOA is based on the transformer neural network architecture, an artificial intelligence (AI) design originally developed at Google that later became the backbone of large language models (LLMs). The SwinV2 architecture that SatVision-TOA employs gives computers the ability to learn patterns and assign meaning to them, in a fashion loosely similar to how human brains function.
When the transformer architecture was introduced, it caught the attention of Mark Carroll and the Data Science Group he leads at NASA Goddard.
“A couple of people on my team came to me and said they wanted to learn more about transformers,” said Carroll. “I said let’s use MODIS because we’re familiar with it, we have a huge amount of the data, and it’s important to Goddard.”
At the same time, Gong recognized the potential of the foundation model through her research interests in machine learning and cloud remote sensing. Gong and Carroll soon began collaborating and focused on training the foundation model on the widely used MODIS data, knowing that if they could get it to work well with MODIS, future missions could end up using the model as well.
The team trained the model on 100 million randomly selected samples drawn from 25 years of Level 1B MODIS imagery (MOD021KM v6.1) recorded by the Terra satellite. They chose the 14 spectral channels MODIS has in common with the Advanced Baseline Imager (ABI) aboard the GOES-R weather satellites to broaden the model's potential use.
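For readers who want a concrete picture of what randomly selected samples means in practice, the short Python sketch below draws fixed-size, 14-channel patches at random positions from a MODIS-like granule. It is illustrative only: the patch size, granule dimensions, and sampling scheme are assumptions, not the project's actual preprocessing pipeline.

import numpy as np

# Minimal sketch (not the project's actual pipeline): draw random fixed-size
# patches from a multi-channel, MODIS-like granule. The channel count (14)
# comes from the article; the patch size and granule shape are illustrative.
def sample_patches(granule, n_patches=8, patch_size=128, rng=None):
    """granule: array of shape (channels, height, width)."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = granule.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patches.append(granule[:, y:y + patch_size, x:x + patch_size])
    return np.stack(patches)  # (n_patches, channels, patch_size, patch_size)

# Example: a synthetic 14-channel granule roughly the size of a 1-km MODIS scene.
granule = np.random.rand(14, 2030, 1354).astype(np.float32)
print(sample_patches(granule, n_patches=4).shape)  # (4, 14, 128, 128)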
The foundation model was written and debugged by the Data Science Group at NASA Goddard over the course of six months. Then the team headed to Oak Ridge National Laboratory in Tennessee to run a full training of the model with MODIS data on the facility's Frontier supercomputer.
“When we first started training the model, we didn’t tell it what is the ground or what are clouds; we just let it learn the patterns in the 100 million all-sky images,” said Gong. “Then we started masking — hiding — random pixels in images and making the model predict what should be next to the visible portions to fill in the jigsaw puzzle.”
After SatVision-TOA made its best predictions, the team calculated the difference between the computer's filled-in image and the actual MODIS image. If the difference was large, they adjusted the model's settings to improve its accuracy. Once the model was predicting well, Gong and Carroll began teaching it what clouds, ground, water, and other features look like.
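The masking-and-correction loop Gong describes is, at its core, a masked-reconstruction training step. The short PyTorch sketch below illustrates the idea with a toy stand-in network; it is not the SatVision-TOA code, and the masking ratio, patch size, and loss shown here are illustrative assumptions.

import torch
import torch.nn as nn

# Conceptual sketch of the masked-reconstruction step described above, not the
# SatVision-TOA implementation. A small convolutional network stands in for
# the SwinV2 transformer; sizes and masking ratio are illustrative.
model = nn.Sequential(
    nn.Conv2d(14, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 14, kernel_size=3, padding=1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.rand(4, 14, 128, 128)                # a batch of 14-channel patches
mask = (torch.rand(4, 1, 128, 128) < 0.6).float()   # hide roughly 60% of the pixels

reconstruction = model(images * (1.0 - mask))       # the model sees only unmasked pixels
# "difference between the computer's filled-in image and the actual image,"
# measured only where pixels were hidden
loss = ((reconstruction - images) ** 2 * mask).sum() / mask.sum()

loss.backward()      # work out how to adjust the model's settings
optimizer.step()     # apply the adjustment to improve accuracy

Repeating this step over millions of patches is what gradually teaches the model which patterns plausibly fill a hidden region.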
The team developed two versions of SatVision-TOA, called Huge and Giant, which differ in their number of adjustable internal values, or "parameters." The most accurate, high-fidelity version is Giant, which has three billion parameters. (For a rough comparison, if each parameter were equated to a neuron, the human brain has approximately 100 billion neurons.)
With SatVision-TOA now proven capable of recognizing features in MODIS data, new collaborators are attempting to use the foundation model to characterize aerosols beneath clouds, such as dust storms transported by tropical storms, and to measure cloud properties including cloud-top height and optical depth.
For those interested in learning more about SatVision-TOA, the model architecture and model weights are available on GitHub and Hugging Face, respectively. For more information and a detailed user guide, see the white paper “SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery.”
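As a starting point, published model weights on Hugging Face can typically be fetched with the huggingface_hub library, roughly as sketched below. The repository ID and filename shown are placeholders, not the real names; consult the SatVision-TOA model card on Hugging Face for the actual ones.

from huggingface_hub import hf_hub_download

# Sketch of fetching published weights from Hugging Face. The repo_id and
# filename below are hypothetical placeholders; replace them with the values
# listed on the SatVision-TOA model card.
checkpoint = hf_hub_download(
    repo_id="nasa-org/satvision-toa",   # hypothetical repository ID
    filename="model_weights.pt",        # hypothetical filename
)
print(f"Downloaded checkpoint to {checkpoint}")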