ESDS Program

Advancing an Open-Access Repository for Earth Observation Training Data and Machine Learning Models

Principal Investigator: Seyed Hamed Alemohammad, Open Imagery Network, Inc.

Recent advancements in machine learning (ML) techniques have enabled analysis of massive Earth observation (EO) datasets and, in so doing, have the potential to improve decision making by governments, businesses, and non-governmental organizations. However, these techniques also pose new challenges, not the least of which is the lack of diverse and open training datasets and pre-trained models. To address these issues, Radiant Earth Foundation established Radiant MLHub, an open-access geospatial training data repository where anyone can discover and download ML-ready training datasets. This essential resource allows members of the research community to share their datasets with others, so that they can not only reproduce results but also improve on them by benchmarking their models against the same dataset.

Radiant Earth will expand Radiant MLHub to address further challenges faced by users in the EO/ML community. The objectives of this project are to enable sharing and retrieval of containerized ML models on Radiant MLHub, develop a Python client to enhance usability of the Radiant MLHub API, and expand the existing training data catalog by generating the first multi-mission (Landsat 8, Sentinel-2, and Sentinel-1) global land cover training dataset as a benchmark for developing new ML models.

Radiant Earth will achieve these goals by collaborating with the EO and ML community of practice across sectors, including academia, government, commercial, and nonprofit organizations. Developing the proposed features in consultation with these expert stakeholders will ensure that resources are properly targeted. Each feature is described in detail below:

  • Expand Radiant MLHub by providing an API endpoint for sharing and discovering ML models. This element will enable users to search Radiant MLHub for trained ML models that are registered on the platform and then pull an inference-ready containerized model object, along with the scripts needed to preprocess or reorganize the training data chipset so it can be input into the model. In addition, a portal will let users who want to register a model on Radiant MLHub submit GitHub repositories containing the model and its code, along with other required documentation.
  • Develop a Python client for the Radiant MLHub API. Radiant MLHub has a REST API that users can interact with using HTTP requests. This is currently possible from Python or any other programming language; however, users need to write their own API request code within their programs. Developing an open client will increase the API’s usability and enable Radiant Earth to provide new features for accessing training datasets and models on Radiant MLHub. This element of the proposed work will focus on designing and developing a Python client that users can easily install and use for interacting with the API.
  • Generate a global land cover classification training dataset from Landsat 8, Sentinel-2, and Sentinel-1. Radiant Earth Foundation is about to release a training dataset of global land cover classes based on Sentinel-2 data, with labels generated using a hybrid approach that combines model predictions with human verification. In this work, Radiant Earth proposes to augment that dataset with input data from Landsat 8 and Sentinel-1 to create the first global multi-mission and multi-modal (multispectral and radar data) land cover classification training dataset at 10-meter spatial resolution.
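
To illustrate the Python client objective above, the following is a minimal sketch of the kind of request-building code a thin client could take off users' hands. It is not the Radiant MLHub client or its API: the class name HubClient, the base URL, the "collections" path, and the choice to pass the API key as a "key" query parameter are all hypothetical assumptions made for illustration. The sketch only constructs a request and does not contact any server.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class HubClient:
    """Hypothetical thin wrapper around a REST API (illustration only)."""

    def __init__(self, base_url, api_key):
        # Normalize the base URL so urljoin appends paths instead of replacing them.
        self.base_url = base_url.rstrip("/") + "/"
        self.api_key = api_key

    def build_request(self, path, **params):
        # Assumption: the API key travels as a "key" query parameter.
        query = dict(params, key=self.api_key)
        url = urljoin(self.base_url, path.lstrip("/")) + "?" + urlencode(query)
        return Request(url, headers={"Accept": "application/json"})


# Placeholder host and endpoint, not a real service.
client = HubClient("https://api.example.org/v1", api_key="YOUR_KEY")
req = client.build_request("collections", limit=10)
print(req.full_url)
```

Without a client of this kind, each user has to repeat the URL construction, authentication, and header handling by hand; a packaged client would centralize those details behind a few method calls.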

Last Updated: Sep 24, 2020