Finding Needles in a Satellite Imagery Haystack

A partnership between NASA's IMPACT and SpaceML led to the development of an open-source framework that simplifies the discovery of satellite imagery in NASA Worldview.

NASA embraces open science. NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT) works to enable open data for NASA tools such as Worldview, which gives users access to over 450 terabytes of satellite imagery. Open data is critical to research. Before embarking on a scientific study related to particular phenomena, such as wildfires, scientists need to collect numerous examples of these phenomena. Locating global examples requires searching through 197 million square miles of satellite imagery across more than 20 years of data. Such an effort can produce a valuable trove of data, but the act of manually searching the data is cumbersome and laborious. Making large amounts of data more discoverable and usable for specific parameter extraction is a hard problem. A question such as “Can we use new techniques, such as self-supervised learning, to tackle our data discovery problem?” has a number of hidden questions:

• Can we find a needle in a haystack?
• Can we teach a machine to search fine-grained data without labels?
• Can we get artificial intelligence (AI) to present examples to a human when it gets confused?
• Can we scale up the search from gigabytes to terabytes to petabytes?
• Can we create tools that make it simple to ingest the data?
• Can we learn to represent rare events?
• Can we teach AI to focus on the interesting parts?
• Can we search several years of data covering the entire planet in under a second?

To tackle these questions, IMPACT embraced an open science approach and partnered with the SpaceML initiative, an international AI accelerator for citizen scientists and a branch of Frontier Development Lab in partnership with NASA, the SETI Institute, and Trillium Technologies Inc. SpaceML engages early-career research engineers and connects them with mentors who are senior machine learning (ML) and software engineering experts. Current participants range from high school graduates and graduate students to industry professionals, along with contributors from non-traditional computer science academic backgrounds, including two high school teachers transitioning their careers to data science.

Anirudh Koul, the founder of SpaceML, explains the driving impetus behind this initiative:

“Each contributor is motivated by the impact they can have on the planet. And when determination finds opportunity and guidance, hard problems start to crack open. Reducing the time of manual data curation from several months to hours or even minutes opens new avenues of scientific exploration previously considered impractical. By making it available in open source as another tool in scientists’ toolbox, we hope to accelerate the process of making scientific discoveries.”

This partnership with SpaceML produced a generalizable package of machine learning operations (MLOps) components and workflows that can be used not only by Earth science tools and applications such as Worldview, but also by other teams working on datasets ranging from those of NASA’s Hubble Space Telescope to those of NASA’s Solar Dynamics Observatory. Users do not need to understand programming or even ML to benefit from MLOps. Further, the collaboration embraced the goal of advancing the underlying ML components from technology readiness level (TRL) 3, the point of sound software engineering, to TRL 9, a flight-ready and deployed solution.


Flowchart diagram showing imagery flow from GIBS archive on left to final image search output on right.
The NASA Worldview imagery search pipeline. NASA's Global Imagery Browse Services (GIBS) provides rapid access to more than 900 satellite imagery products that can be interactively explored using NASA Worldview. Green boxes indicate components created through the SpaceML initiative; these can be used individually or in combination and can easily be applied to other imagery collections. For information about individual tools such as GIBS Downloader and Swipe Data Labeler, please see the SpaceML Repo. NASA IMPACT image.


Image of a cloud at top; returned imagery of clouds similar to the top image is shown below, with a label indicating that the returned imagery can be downloaded.
Example of an image search using the current MLOps prototype. After entering an image (top), the MLOps prototype finds similar imagery that can then be downloaded. NASA IMPACT image.
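
The article does not describe how the prototype ranks results, so the following is only a minimal sketch of one common way an image similarity search can work. It assumes that each image tile has already been converted into a fixed-length embedding vector by some model (for example, one trained with self-supervised learning) and that similar tiles have similar vectors. The function names, the 128-dimensional embedding size, and the random stand-in data are all hypothetical and are not taken from the SpaceML code.

```python
import numpy as np


def normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row, cosine similarity) pairs for the k index rows closest to the query."""
    unit_index = normalize_rows(index)
    unit_query = normalize_rows(query.reshape(1, -1))[0]
    scores = unit_index @ unit_query      # one cosine similarity per indexed tile
    k = min(k, len(scores))
    best = np.argsort(-scores)[:k]        # highest similarity first
    return [(int(i), float(scores[i])) for i in best]


# Hypothetical stand-in data: 1,000 tiles, each with a 128-dimensional embedding.
# In a real pipeline these vectors would come from a trained model, not a random generator.
rng = np.random.default_rng(seed=0)
tile_embeddings = rng.normal(size=(1000, 128))

# Assumption for the demo: the query image is a slightly perturbed copy of tile 42.
query_embedding = tile_embeddings[42] + 0.05 * rng.normal(size=128)

results = top_k_similar(query_embedding, tile_embeddings, k=5)
for row, score in results:
    print(f"tile {row}: cosine similarity {score:.3f}")
```

This brute-force comparison scores every indexed tile against the query, which is workable for thousands of tiles. Searching an archive the size of the one behind Worldview in under a second would likely call for an approximate nearest-neighbor index instead.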


James Parr, the director of the Frontier Development Lab, explains the value of the Worldview image search pipeline this way:

“We’re realizing that deploying mature machine learning outcomes for one need requires the same cost and effort as building solutions for multiple use-cases. SpaceML is the expression of that idea: a toolbox for space AI applications that makes it easy to get going on your specific problem and endlessly useful.”

The result of this effort is a set of open science tools that simplify the use of NASA’s Earth science archive for machine learning. By partnering with SpaceML, IMPACT also inspires a new generation of ML engineers to apply their ingenuity to making a difference to life on Earth.

More information about IMPACT can be found on the IMPACT project website.

Article originally published 29 March 2021 on the IMPACT blog and reprinted with permission.

Additional Resources:

NASA GIBS/Worldview Similarity Search
