Air pollution is a serious problem around the globe. According to data from the World Health Organization, almost all of Earth’s population (99%) breathes air that exceeds WHO guideline limits and contains high levels of pollutants, with low- and middle-income countries suffering from the highest exposures.
Although there are many toxins that can adversely affect human health, the pollutants thought to pose the biggest risk to public health include fine particulate matter (PM2.5), ozone (O3), and nitrogen dioxide (NO2). PM2.5 is especially concerning, as these small particles (designated as having a diameter of less than 2.5 micrometers) can penetrate deep into the lungs, enter the bloodstream, and travel to organs, causing damage to tissues and cells. Further, the Global Burden of Disease study, a publication of the Institute for Health Metrics and Evaluation at the University of Washington School of Medicine, reports that exposure to high levels of air pollution is a significant cause of premature death worldwide.
To assist public health, environmental, and air quality researchers in their investigations of pollution’s effects on human health, NASA’s Socioeconomic Data and Applications Center (SEDAC) created an Air Quality Data for Health-Related Applications data collection that currently consists of three data products. The datasets were developed by a team of researchers from Harvard University’s T.H. Chan School of Public Health (SPH), led by Dr. Joel Schwartz, Professor of Environmental Epidemiology. The three datasets are:
- Daily and Annual PM2.5 Concentrations for the Contiguous United States (2000–2016), offering predictions of PM2.5 concentrations in grid cells at a 1-kilometer (km) spatial and daily temporal resolution for the years 2000 to 2016. The daily predictions come from machine learning models that incorporate multiple predictors, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and other predictors. A generalized additive model that accounts for geographic differences was then used to combine those predictions into an ensemble. The annual predictions were calculated by averaging the daily predictions for each year in each grid cell.
- Daily 8-Hour Maximum and Annual O3 Concentrations for the Contiguous United States (2000–2016), containing estimates of ozone concentrations at a 1-km spatial and daily temporal resolution for the years 2000 to 2016. These predictions incorporate various predictor variables, such as O3 ground measurements from the U.S. Environmental Protection Agency (EPA) Air Quality System monitoring data, land-use variables, meteorological variables, chemical transport models, and remote sensing data, along with other data sources. The annual predictions were computed by averaging the daily 8-hour maximum predictions in each year for each grid cell.
- Daily and Annual NO2 Concentrations for the Contiguous United States (2000–2016), offering predictions of NO2 concentrations at 1-km spatial and daily temporal resolution for the years 2000 to 2016. An ensemble modeling framework, which combined estimates from three machine learning models with a generalized additive model, was used to assess NO2 levels with high accuracy. Predictor variables included NO2 column concentrations from satellites, land-use variables, meteorological variables, and predictions from two chemical transport models (GEOS-Chem and the U.S. EPA Community Multiscale Air Quality Modeling System), along with other ancillary variables. The annual predictions were calculated by averaging the daily predictions for each year in each grid cell.
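The annual products in all three datasets are described as per-cell averages of the daily predictions. The following is a minimal sketch of that averaging step in plain Python, not the authors' actual processing code. The function name `annual_mean`, the nested-list grid layout, the use of `None` for missing cells, and the sample values are all assumptions made for illustration.

```python
from statistics import fmean


def annual_mean(daily_grids):
    """Average a sequence of same-shaped daily grids cell by cell.

    daily_grids: list of 2D lists (rows x cols); None marks a missing value
    (an assumption of this sketch, not something the datasets specify).
    Returns a 2D list of means, with None where a cell has no valid days.
    """
    rows = len(daily_grids[0])
    cols = len(daily_grids[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect every valid daily value for this one grid cell.
            values = [g[r][c] for g in daily_grids if g[r][c] is not None]
            row.append(fmean(values) if values else None)
        result.append(row)
    return result


# Three hypothetical "days" on a 2 x 2 grid (invented values, in ug/m^3).
days = [
    [[10.0, 12.0], [8.0, None]],
    [[14.0, 12.0], [9.0, None]],
    [[12.0, 15.0], [None, None]],
]
annual = annual_mean(days)
print(annual)
```

A real workflow would read each day's GeoTIFF into an array and average along the time axis, but the per-cell logic is the same.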
SEDAC, which is hosted at Columbia University’s Center for International Earth Science Information Network (CIESIN), is the NASA Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Center (DAAC) responsible for archiving and distributing socioeconomic data in the EOSDIS collection. SEDAC synthesizes Earth science and socioeconomic data and serves as an “information gateway” for a wide range of decision-makers and other applied users, including those working in the disciplines of public health and epidemiology. In addition, SEDAC supports the dissemination of third-party datasets and, with the guidance of its User Working Group, has developed comprehensive submission guidelines for authors to better enable the hosting of datasets such as this important air quality triad.
“There has long been a recognition that satellite data could inform the measurement of air pollutants such as PM2.5, NO2, and O3,” said Dr. Alex de Sherbinin, SEDAC deputy manager. “PM2.5 in particular is one of the leading killers in the world. If you look at the global burden of disease, it’s among the top causes of premature death, particularly in regions like Asia where pollution levels are high owing to coal-fired power plants. Dr. Schwartz’s data are important for many users in the public health and environmental fields, and a complement to a number of global gridded products hosted by SEDAC that measure average PM2.5 and NO2 concentrations at annual time steps.”
NO2, one of a group of highly reactive gases known as nitrogen oxides (NOx), is known to cause significant respiratory conditions. NO2 primarily gets into the air from the burning of fossil fuels, including emissions from cars, trucks and buses, power plants, and off-road equipment. According to the EPA, brief exposure to high concentrations of NO2 can irritate respiratory airways and aggravate respiratory diseases, particularly asthma, resulting in coughing, wheezing, or difficulty breathing. NO2 and other NOx gases can also react with other chemicals in the air to form both particulate matter and ozone, and with water, oxygen, and other chemicals in the atmosphere to form acid rain, which can have significant impacts on ecosystems such as lakes and forests.
Ground-level O3, an important component of smog, is another pollutant responsible for health impacts in children and people with existing respiratory conditions. It is not emitted directly into the air but created by chemical reactions between NOx and volatile organic compounds in the presence of sunlight. O3 is most likely to reach unhealthy levels on hot, sunny days in urban environments. Unlike with PM2.5, short-term maximum exposure to O3 is considered more injurious to human health than average daily exposure. The people at greatest risk of harm from high concentrations of ground-level O3 include those with asthma and other respiratory ailments. Sensitive vegetation, including agricultural crops, can also be affected.
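The difference between a short-term maximum and a daily average is the reason the ozone dataset reports a daily 8-hour maximum rather than a daily mean. As a simplified illustration, the sketch below computes both from one day of hourly values. The function name `daily_max_8h_mean` and the hourly readings are hypothetical, and regulatory definitions of the metric add rules (for example, on which windows count and how much data must be present) that this sketch ignores.

```python
def daily_max_8h_mean(hourly):
    """Return the highest 8-hour running mean within one day of hourly values."""
    if len(hourly) < 8:
        raise ValueError("need at least 8 hourly values")
    # One mean per 8-hour window that fits entirely inside the day.
    window_means = [sum(hourly[i:i + 8]) / 8 for i in range(len(hourly) - 7)]
    return max(window_means)


# Invented hourly O3 readings (ppb): a quiet morning, an 8-hour afternoon
# peak, and a moderate evening.
hourly_ppb = [20] * 10 + [60] * 8 + [30] * 6

daily_mean = sum(hourly_ppb) / len(hourly_ppb)
max_8h = daily_max_8h_mean(hourly_ppb)
print(round(daily_mean, 1), max_8h)
```

In this made-up day the 24-hour mean is about 36 ppb while the 8-hour maximum is 60 ppb, so an average alone would understate the afternoon exposure.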
Yet, while the promise of using satellite data to detect concentrations of ground-level air pollutants has been well known, finding the optimal way to achieve it was a challenge.
“In the case of particulate matter, the issue became one of figuring out how to translate data on aerosol optical depth (AOD) from MODIS and other instruments,” de Sherbinin said. “Measurements of AOD basically tell you what’s the concentration in an air column, but if aerosols are up at 10,000 feet or 20,000 feet, they’re not affecting human beings from a health perspective.”
Among the first to solve this translation problem were researchers led by Dr. Randall Martin (formerly of Dalhousie University in Nova Scotia, Canada, and now at Washington University in St. Louis, Missouri), who began using total column AOD measurements as inputs into atmospheric models to determine levels of exposure at ground level, where humans are actually breathing the air and whatever pollutants it might contain.
“The main point behind these datasets is that just taking the satellite data itself off the shelf and trying to estimate health effects is not going to get you very far,” de Sherbinin said. “You need to run the data through the atmospheric models to be able to identify what’s going on at ground level where people are being exposed.”
According to Schwartz, obtaining accurate exposure data is critical to understanding the relative health effects of different particle components so they can be addressed more effectively. So he and his SPH colleagues developed a model designed to predict exposure levels for major air pollutants across the U.S.
“Previously, the only available exposure data was from Randall Martin’s team, and they provide some PM components on a 1-km grid. However, their model is based on chemical transport simulations and those do not include many metal pollutants,” said Schwartz. “Our model offers 50-meter gridded resolution in urban areas, which contain greater than 80% of the population, and includes many metals, which are important because toxicological studies show they can have major effects on human health.” This 50-meter urban data, which is not currently included in the Air Quality Data for Health-Related Applications collection, will be available from SEDAC soon.
The SPH model can also capture data on particles from traffic, whose concentrations can vary substantially within a few hundred meters, and can identify exposure hot spots. Further, because they were created with satellite observations, these model-derived datasets offer NO2, O3, and PM2.5 exposure estimates with wider spatial coverage than datasets from ground-based monitors.
“In the past, researchers have used air pollutant concentrations measured at monitoring stations, primarily located in urban areas, as population-averaged exposure measurements,” said Dr. Yaguang Wei, a Postdoctoral Research Fellow in the SPH’s Department of Environmental Health and an author of the datasets. “The model we developed predicts major air pollutant levels across the United States, in urban and non-urban areas, with much higher spatio-temporal resolution compared with monitored data.”
Given their improved resolution, the datasets of the Air Quality Data for Health-Related Applications collection are already being used by epidemiologists and those working in other health disciplines. They’re also of benefit to environmental scientists, urban planners, and those in other fields who are in need of more precise and ready-made products offering measurements of air pollutant concentrations at ground level.
To assess that usage, SEDAC manages a citations database, which is a collection of about 6,000 known publications citing SEDAC data. Although many of the citations pertain to SEDAC’s gridded population data, more than 10% cite policy-oriented environmental indicator datasets available from SEDAC that are based in part on satellite-derived data. Several of these indicators, such as the Environmental Performance Index (EPI) and the Natural Resource Management Index (NRMI), are regularly used by organizations like the Millennium Challenge Corporation to make better-informed decisions.
“Our goal is to get a lot of these satellite-derived metrics, which I would call Level 5 products because they’re processed beyond a traditional Level 4 product and convey real information about an environmental parameter,” de Sherbinin said. “These Level 5 products would be ready for policy applications. Someone could make a decision based on them because they pertain to actual parameters of direct interest to policymakers.” SEDAC’s collection of environmental indicators serves this purpose.
According to de Sherbinin, SEDAC’s PM2.5 dataset is already being used this way.
“We have also used the PM2.5 data in some of our own indicator work like the EPI, which is another dataset that we disseminate through SEDAC, that was generated with Yale University,” said de Sherbinin. “The EPI has global reach and has influenced a number of governments, such as the municipality of Seoul, to enact stricter air pollution policies.”
These datasets are also valuable to those working in the arena of environmental justice, as higher rates of exposure to air pollutants have been reported in some localities with high percentages of people of color and residents with low household incomes.
“Our recent hyperlocal data across the 3,535 U.S. urban areas at 50-meter spatial resolution would be an important resource to reveal exposure disparities, as the concentrations of some pollutants can have substantial variation over small areas,” said Wei. “Furthermore, our data provide opportunities to evaluate the health impacts of air pollution among individuals living in rural areas, who are understudied due to the lack of monitoring sites. Those people have poor access to quality healthcare and usually face greater health consequences.”
To help the user community make the most of these datasets, the data have been made available in both GeoTIFF and Reference Dataset (or RDS) formats, meaning that most common geographic information system (GIS) and statistical tools should be able to access and process them.
Users should know, however, that the datasets are large and SEDAC has not yet generated any subsetting or web-map tools. The data will be available via open services from SEDAC this fall.
“The U.S. grids are at a very high resolution and have a high number of time steps, so instead of one grid for the entire year, you’ve got daily time steps and for some urban areas up to 50-meter resolution,” de Sherbinin said. “So, these datasets are prime examples of why cloud-based processing could really be a benefit to the science community. In fact, as we migrate more SEDAC data to the cloud, we’ll make getting these into the cloud a high priority.”
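A back-of-the-envelope calculation gives a rough sense of the scale de Sherbinin describes. The sketch below compares the raw size of the daily 1-km grids for 2000 to 2016 with a single annual grid per year. The cell count (about 8 million, roughly one cell per square kilometer of the contiguous United States) and the 4-byte value size are round-number assumptions, not figures published by SEDAC, and real files are typically compressed.

```python
import calendar

CELLS_PER_GRID = 8_000_000  # assumption: ~1 cell per km^2 of the contiguous U.S.
BYTES_PER_VALUE = 4         # assumption: one 32-bit float per cell
YEARS = range(2000, 2017)   # 2000 through 2016, inclusive

# Count daily time steps, including leap days.
days = sum(366 if calendar.isleap(y) else 365 for y in YEARS)

daily_bytes = CELLS_PER_GRID * days * BYTES_PER_VALUE
annual_bytes = CELLS_PER_GRID * len(YEARS) * BYTES_PER_VALUE

print(f"{days} daily grids: about {daily_bytes / 1e9:.0f} GB uncompressed")
print(f"{len(YEARS)} annual grids: about {annual_bytes / 1e9:.2f} GB uncompressed")
```

Under these assumptions, the daily series for one pollutant is several hundred times larger than the annual series, before any 50-meter urban grids are considered.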
The datasets of SEDAC’s Air Quality Data for Health-Related Applications collection also provide a good illustration of how the data obtained by NASA satellites can be used to aid decision-makers working in the disciplines of public health and environmental equity.
“NASA satellite data have tremendous potential for targeting public health interventions and environmental justice remediation, but they do need additional processing to be useful to the professionals working at the front lines of these issues,” de Sherbinin said. “These three datasets have had that additional processing, so that’s why we feel they are of use to the large community of users interested in these issues.”