Dr. Dan Runfola, Associate Professor of Applied Science, William & Mary
Research Interests: Remote sensing, human migration, computer vision (training computers to analyze imagery), and climate change.
Research Highlights: Every day, scientists around the globe use Earth observation data from NASA satellites to investigate and monitor a wide array of atmospheric, marine, and terrestrial phenomena. From the top of the atmosphere to beneath Earth’s surface, the data they get from satellites circling the globe or perched in geostationary orbit provide information on everything from the planet’s energy budget and the temperature of ocean waters to the presence of soil moisture and the location of fissures below ground. But what can satellite data tell us about ourselves, the human beings who call Earth home?
The answer to that question is quite a lot, and the information satellite data provide goes far beyond the high-resolution corrected reflectance (i.e., true color) imagery from the Harmonized Landsat Sentinel-2 (HLS) project or the clever use of Nighttime Lights imagery from the Visible Infrared Imaging Radiometer Suite (VIIRS).
Among the researchers using satellite data to study humanity is Dr. Dan Runfola, associate professor of applied science at William & Mary in Williamsburg, Virginia.
“There is remarkably little information about the socioeconomic status of individuals in some of the most vulnerable regions in the world,” said Runfola. “This is due to a range of reasons, from conflict to political capacity, but is most commonly because local governments lack the capacity to implement regular surveys. While a few third-party groups have attempted to resolve this through funding in situ surveys, the coverage is exceptionally limited spatiotemporally. This challenges our ability to both conduct historic analyses of how social systems have evolved over time and accurately target contemporary efforts to improve human conditions around the world.”
For Runfola and his colleagues, the wealth of data from Earth-observing satellites provides a means of surmounting this challenge.
“My work exploits the idea that the physical spaces we create to live and work in reflect many underlying cultural, social, and economic factors. They do not manifest randomly, but are a product of human goals, desires, and economic capabilities,” he said. “Because these spaces can be observed with satellite imagery, we have found it possible to estimate a selection of socioeconomic factors using satellite imagery alone.”
Included among these socioeconomic factors is income level. While Runfola acknowledges it might be hard to believe that researchers can estimate how much people earn based on a satellite image of where they live, it’s the correlation between desire and economic potential that makes such estimates possible.
“There are some very strong linkages you can see [in satellite imagery] that I think are really self-evident, such as paved roads. If you have paved roads, that means you have the equipment to pave roads and the money to acquire that equipment. So, some of these correlations are very direct,” he said. “Others are a bit more vague, such as how big are the houses. That might be correlated with certain levels of income in certain parts of the world, but the correlations are much more context-dependent. So, there are blatant correlations and then more subtle correlations.”
Spurred by the potential of these estimates, Runfola and his colleagues are developing new models designed specifically to investigate these types of socioeconomic issues. At the same time, they are evaluating the limitations of such approaches and analyzing their methods for bias. This work ranges from evaluating the strengths and weaknesses of different satellite sensors and refining models to addressing data infrastructure and investigating new technologies such as machine learning.
“There’s a huge range of research topics we engage with, including developing large-scale data pipelines, contrasting the capability of [data] products with higher resolutions or additional bands, and innovating new deep learning architectures to train these algorithms,” he said. “We are big proponents of deep learning techniques because they enable us to capture relationships that may not be immediately evident in context-dependent ways. The tradeoff is that understanding the output of these models is harder than what we’re used to, so we develop data and tools to try to better expose how these models are working so we can have more confidence in any decisions made on the basis of the model [outputs].”
The data used by Runfola in his research are from a variety of sources, including NASA’s Socioeconomic Data and Applications Center (SEDAC), one of the Distributed Active Archive Centers (DAACs) in NASA's Earth Observing System Data and Information System (EOSDIS). SEDAC is operated by the Center for International Earth Science Information Network (CIESIN), a unit of the Earth Institute at Columbia University based at the Lamont-Doherty Earth Observatory in Palisades, New York, and synthesizes Earth science and socioeconomic data and information in ways useful to a wide range of decision-makers and other applied users. With an extensive population, sustainability, and geospatial data archive, SEDAC serves as an “information gateway” between the socioeconomic and Earth science data and information domains, and provides access to many multilateral environmental agreements.
“We use the Gridded Population of the World product [from SEDAC] to better understand the impact of projects implemented by the United Nations Food and Agriculture Organization and the Global Environment Facility, and to understand the environmental effects of economic interventions by the World Bank and other institutions,” he said. “On the satellite side, it depends on the use case. We use Landsat [data] for our census estimation work because it has an historic record; we’ve even experimented with using sensors such as MODIS to explore relationships between spatial resolution and accuracy.”
As a case in point, Runfola and a team of his William & Mary colleagues used SEDAC and Landsat data in a 2022 paper presenting a deep learning-based data fusion technique that integrates satellite and census data for estimates of human migration from Mexico to the United States.
Although this type of research is typically conducted by augmenting quantitative census data with qualitative information from interviews and other techniques, Runfola and his colleagues adopted a different approach. “[We] built a new model that integrates census and satellite data. So, even though you have two types of data—one’s a number, one’s an image—this deep learning model can take both of these in, integrate them, and then use that information to try to predict how likely it is that people will come to the southern border of the United States,” he said.
Runfola and his research team found their approach outperformed more traditional methodologies by approximately 10%, which suggests such multimodal data fusion provides a valuable pathway forward for modeling migratory processes.
“If you just take census data, which is the traditional way to do it, you can get 60% to 70% accuracy. But if you add satellite data, you get a bump in accuracy of about 10%,” Runfola said. “What that means is you’re picking up trends in the satellite data that are not reflected in the census data. We believe this is because you can see things like droughts and crop failure very clearly. You can see the water levels in reservoirs changing, you can see how slums have grown over time. All of these signals help to augment the more traditional analysis.”
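The idea Runfola describes, a single model that takes in both a number (census data) and an image (satellite data) and combines them, can be illustrated with a small "late fusion" sketch. This is a hypothetical illustration, not the architecture published in Transactions in GIS: the image branch here is a simple per-band average standing in for a trained convolutional network, and the function names (`image_features`, `fuse_predict`), weights, and bias are placeholders rather than values learned from data.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[Sequence[float]]]  # height x width x bands


def image_features(image: Image) -> List[float]:
    """Global average pooling: one mean value per spectral band.

    Hypothetical stand-in for the convolutional branch of a deep model.
    """
    height, width, bands = len(image), len(image[0]), len(image[0][0])
    sums = [0.0] * bands
    for row in image:
        for pixel in row:
            for band, value in enumerate(pixel):
                sums[band] += value
    return [total / (height * width) for total in sums]


def fuse_predict(census: Sequence[float], image: Image,
                 weights: Sequence[float], bias: float) -> float:
    """Concatenate census and image features, then apply a logistic output.

    Returns a score in (0, 1), read here as a likelihood of migration.
    The weights and bias are illustrative placeholders, not trained values.
    """
    features = list(census) + image_features(image)
    if len(features) != len(weights):
        raise ValueError("expected one weight per fused feature")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: two normalized census variables and a 2 x 2 image with 3 bands.
census_vars = [0.3, 0.7]
tile = [[[0.2, 0.4, 0.6], [0.4, 0.4, 0.2]],
        [[0.2, 0.0, 0.6], [0.0, 0.4, 0.2]]]
print(fuse_predict(census_vars, tile, weights=[0.5, -0.2, 1.0, 0.8, -0.3], bias=0.1))
```

In a real model both branches would be trained jointly, so the network can learn which satellite signals (such as drought or reservoir levels) add information that the census variables alone do not carry.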
Landsat imagery also played a role in a paper Runfola, his William & Mary colleague Dr. Anthony Stefanidis, and graduate student Heather Baier published in Remote Sensing Letters (2021). This paper presents a case study in which the researchers estimated school test scores in the Philippines (2010, 2014) and Brazil (2016) based solely on publicly available satellite imagery.
“The goal was, using just three sources of imagery for every school—Landsat imagery, Google Maps imagery, and Google Street View imagery—can we predict the test scores the schools are likely to achieve? It turns out that works pretty well,” said Runfola. “If you go down to the level of are you an above- or below-average school, you can get about 80% accuracy levels. Obviously, predicting exact [test] scores and how one school ranks relative to another school is much harder, but if you group these things, you can get pretty good answers just from the images.”
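Why grouping schools into above- and below-average makes the task easier can be shown with a little arithmetic. The sketch below is a hypothetical illustration, not the study's evaluation code: it thresholds both actual and predicted scores at the mean of the actual scores (an assumed choice of threshold) and counts how often the two labels agree. The function names and example scores are made up for illustration.

```python
from statistics import mean
from typing import List, Sequence


def above_average_labels(scores: Sequence[float], threshold: float) -> List[bool]:
    """Label each school True if its score is above the threshold."""
    return [score > threshold for score in scores]


def grouped_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of schools whose predicted above/below-average group matches reality.

    The threshold is the mean of the actual scores (an assumption for this sketch).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    threshold = mean(actual)
    true_groups = above_average_labels(actual, threshold)
    predicted_groups = above_average_labels(predicted, threshold)
    hits = sum(t == p for t, p in zip(true_groups, predicted_groups))
    return hits / len(actual)


# Hypothetical scores: every predicted value is wrong, yet most groupings are right.
actual_scores = [50, 60, 70, 80]
predicted_scores = [55, 68, 66, 90]
print(grouped_accuracy(actual_scores, predicted_scores))  # 0.75
```

None of the four predicted scores is exact, but three of the four schools land in the correct group, which mirrors Runfola's point that coarse groupings are far easier to predict than exact scores or rankings.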
The results of this study are significant, Runfola said, because the approach he and his colleagues developed helps address the lack of educational achievement data, which UNICEF and other organizations need to help them better target their aid resources.
“In cases where we have limited data on school performance, which by the way is the vast majority of the world, this allows us to estimate what works based on these pictures,” he said. “And we know that not only does it work in the Philippines, but with some fine tuning, it transferred pretty well to Brazil. It’s not to the point where you can take it off the shelf and apply it somewhere else—it has to be tuned to individual regions—but that's what we're working on next. Can we build a generalized model?”
In addition to filling gaps in critical data, Runfola is developing deep learning approaches that use satellite imagery to make labor-, resource-, and time-intensive processes, such as mapping shoreline structures (e.g., ripraps, groins, and breakwaters), more efficient.
“The traditional approach to collecting and mapping shoreline structures consists of a GPS field survey to collect coordinates and attribute information of coastal structures, and then manually delineating the structures and extracting basic feature information from remotely sensed data or other available digital images. These processes require time-intensive, in situ surveys and well-trained technicians to carry them out,” write Runfola and Zhonghui Lv, a Ph.D. student in William & Mary’s Virginia Institute of Marine Science, in a paper published in Computers and Geosciences (2023). “While the manual approach to assessing coastal shoreline inventories can generate spatially explicit and highly-resolved outputs, the decade-long process of generating such information can also inhibit contemporary accuracy . . . producing a large gap between the generated data product and the real-world shoreline structure topography.”
To remedy this, the researchers evaluated the effectiveness of deep learning approaches for mapping shoreline structures using imagery from Landsat, DigitalGlobe, and other sources. After testing a range of architectures on a dataset of more than 10,000 observations of four classes of shoreline structures, they found that one approach, a ResNet18-based Pyramid Attention Network (PAN), achieved an overall accuracy of 72%.
“We’re currently around 70% accuracy, which helps humans [delineate these structures] much quicker, but we’re not all the way at the finish line of full-on automation,” Runfola said. “[Our approach], which is integrated into an ArcGIS tool (pyShore), is now being used in the Chesapeake Bay area. It doesn’t use any of [the ArcGIS] deep learning approaches, but it still works within their interface, which is helpful.”
Such results are meaningful given the challenges of shoreline structure mapping as it is currently performed and the shoreline change that typically occurs during its years-long production process. Adopting an approach like the one identified in their paper would therefore not only make shoreline structure mapping more efficient but also produce more timely assessments.
Bringing disparate sources of data together and using them in systems that yield much-needed information or improve existing processes is the overarching aim of Runfola’s research.
“The term we use for this is multimodal data integration. So, you have data from Google Street View, you have data from satellites, you have data from censuses, you have written words—all these different sources of data,” he said. “How do we integrate them all in a meaningful way to create better estimations than we’ve been able to in the past?”
The answer to that question depends on the problem to be solved and the data available to study it, but one thing is certain: if there’s a dataset in SEDAC’s extensive archive of population, sustainability, and geospatial data that’s pertinent to the issue, the DAAC will make sure it’s available for Runfola, his colleagues, and anyone else who needs it.
Representative Data Products Used or Created:
Available through SEDAC:
- Gridded Population of the World, Version 4, Revision 11
- Global Roads Open Access Data Set (gROADS), Version 1 (1980–2010)
Other data products used:
- MODIS/Terra Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V061
- MODIS/Terra Vegetation Indices 16-Day L3 Global 250 m SIN Grid
- NASA Shuttle Radar Topography Mission Global 1 arc second V003
- VIIRS Suomi National Polar-orbiting Partnership Gap-Filled Lunar BRDF-Adjusted Nighttime Lights Daily L3 Global 500m
- Armed Conflict Location & Event Data Project (ACLED)
- European Union Joint Research Centre Global Accessibility Map and Data
- Geographically based Economic Data (G-Econ)
- HydroSHEDS Database
- Institute for Health Metrics and Evaluation Global Burden of Disease (GBD)
- Landsat Data Access (USGS)
- Global Land Analysis & Discovery Landsat Analysis Ready Data
- NOAA Precipitation Frequency Data Server (PFDS)
- Uppsala Conflict Data Program
- World Database on Protected Areas (WDPA)
Read about the Research:
Lv, Z., Nunez, K., Brewer, E., & Runfola, D. (2023). pyShore: A deep learning toolkit for shoreline structure mapping with high-resolution orthographic imagery and convolutional neural networks. Computers & Geosciences, 171: 105296. doi:10.1016/j.cageo.2022.105296
Runfola, D., Baier, H., Mills, L., Naughton-Rockwell, M., & Stefanidis, A. (2022). Deep Learning Fusion of Satellite and Social Information to Estimate Human Migratory Flows. Transactions in GIS, 26(6): 2495-2518. doi:10.1111/tgis.12953
Brewer, E., Kemper, P., Lin, J., Hennin, J., & Runfola, D. (2021). Predicting Road Quality using High Resolution Satellite Imagery: A Transfer Learning Approach. PLoS One, 16(7): e0253370. doi:10.1371/journal.pone.0253370
Runfola, D., Stefanidis, A., & Baier, H. (2021). Using Satellite Data and Deep Learning to Estimate Educational Outcomes in Data Sparse Environments. Remote Sensing Letters, 13(1): 87-97. doi:10.1080/2150704X.2021.1987575