Section 5. Geospatial Data

Geospatial data is data linked to a specific geographic area or location. Because location acts as a common key, data from multiple sources, such as demographic and environmental data, can be connected and analyzed together.
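To make "connecting data through location" concrete, here is a minimal Python sketch. It assumes two hypothetical area-level datasets that share an area identifier. In practice this identifier is often a geographic code such as a 5-digit county FIPS code. The function name `join_by_area`, the area IDs, and all values are made up for illustration. The sketch also reports areas that fail to match, because silently dropped areas are often the rural or under-monitored places discussed later in this section.

```python
from typing import Dict, List, Tuple

Table = Dict[str, dict]  # area ID -> attributes for that area

def join_by_area(left: Table, right: Table) -> Tuple[Table, List[str]]:
    """Inner-join two area-keyed tables and list left-side areas with no match."""
    joined: Table = {}
    unmatched: List[str] = []
    for area_id, left_row in left.items():
        right_row = right.get(area_id)
        if right_row is None:
            unmatched.append(area_id)  # would otherwise silently vanish from the joined data
            continue
        joined[area_id] = {**left_row, **right_row}  # merge attributes from both sources
    return joined, sorted(unmatched)

# Made-up values for illustration only (hypothetical area IDs, not real places).
demographics = {
    "area_1": {"population": 12000, "median_age": 41.0},
    "area_2": {"population": 3500, "median_age": 47.5},
    "area_3": {"population": 800, "median_age": 52.0},
}
air_quality = {
    "area_1": {"pm25_annual_mean": 9.1},
    "area_2": {"pm25_annual_mean": 7.4},
    # area_3 has no monitor, so it is absent from this source
}

combined, unmatched = join_by_area(demographics, air_quality)
for area_id, row in sorted(combined.items()):
    print(area_id, row)
print("No match in second source:", unmatched)
```

Reporting the unmatched list, rather than discarding it, makes it visible when a join drops exactly the areas that are already underrepresented.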

Health Equity and Geospatial Data

  • Geospatial data is central to analyzing public health data. Many geospatial features are closely linked to social determinants of health and other demographic characteristics, making them essential for informing public health research and policy.
  • Several health equity concerns arise when using geospatial data, including biases in the data (such as those related to historic disparities and disease prevalence), lower-quality or less available data in rural and other low-population areas, and shifts in population distribution that can outpace data capture and availability.

Common Types of Geospatial Data

The table below describes several common geospatial data types.

Type: Spatial Data (Vector)
Common Usage: Points, lines, or polygons that describe geospatial features.
Example: Latitude and longitude coordinates

Type: Spatial Data (Raster)
Common Usage: Pixel (grid) data in which each cell holds a value representing the terrain or an aspect of it, such as land cover or elevation.
Example: Satellite imagery

Type: Attribute/Tabular Data
Common Usage: Non-spatial characteristics linked to geospatial elements. This is the most commonly used type of geospatial data in data analytics.
Examples:
  • Lung cancer rates by U.S. county
  • Demographic group counts by census block group
  • High and low daily temperatures by ZIP code

Type: Metadata
Common Usage: Additional information that describes the geospatial data itself.
Examples:
  • Data source and provenance
  • Scale and size
  • Number of elements

[Harbour, n.d.; IBM, n.d.]
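Vector and attribute data are often combined by assigning point locations to the polygons that contain them. This operation is called a spatial join. The sketch below shows the classic ray-casting point-in-polygon test in Python. The boundary and clinic locations are hypothetical. The code treats longitude and latitude as flat x/y coordinates, an assumption that holds only approximately for small areas. Points lying exactly on a boundary are not handled consistently, so production work would typically rely on a GIS library such as Shapely.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) = (longitude, latitude), treated as planar

def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a ray from the point toward +x and count edge crossings;
    an odd number of crossings means the point is inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line (y1 != y2 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular boundary and two hypothetical clinic locations.
county_boundary = [(-100.0, 40.0), (-99.0, 40.0), (-99.0, 41.0), (-100.0, 41.0)]
clinics = {"clinic_1": (-99.5, 40.5), "clinic_2": (-98.5, 40.5)}
for name, location in clinics.items():
    print(name, "inside" if point_in_polygon(location, county_boundary) else "outside")
```

Once each point is assigned to a polygon, its attributes can be aggregated by that polygon's identifier, producing the kind of attribute/tabular data described above.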

How Geospatial Data is Used

Some examples of how geospatial data is used in public and population health include:

  • Mapping and Allocating Resources during Emergency Responses

  • Tracking Health Disparities and Social Determinants of Health

  • Environmental Conditions and Issues

  • Mapping Disease Spread or Prevalence

Mitigating Bias in Geospatial Data

  • Choosing an appropriate spatial resolution (e.g., a small town may have too few samples for reliable estimates, while state-level data may lose important local detail)

  • Cross-checking the data against other datasets to confirm that it is valid and consistent

  • Using up-to-date maps and sources, since geospatial data can change rapidly as cultural, political, and environmental boundaries shift

[Burney, n.d.]
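Cross-checking can be as simple as comparing the same measure across two independent sources. Areas that disagree, or that appear in only one source, get flagged. The Python sketch below assumes hypothetical population estimates keyed by area. The 10% tolerance and the function name `flag_discrepancies` are illustrative choices, not a standard.

```python
from typing import Dict, List, Tuple

def flag_discrepancies(
    source_a: Dict[str, float],
    source_b: Dict[str, float],
    tolerance: float = 0.10,  # assumed: flag relative differences larger than 10%
) -> List[Tuple[str, str]]:
    """Compare the same measure from two sources, area by area."""
    flags = []
    for area_id in sorted(set(source_a) | set(source_b)):
        a, b = source_a.get(area_id), source_b.get(area_id)
        if a is None or b is None:
            flags.append((area_id, "missing from one source"))
            continue
        # Relative difference, measured against the larger of the two values.
        denom = max(abs(a), abs(b))
        rel_diff = abs(a - b) / denom if denom > 0 else 0.0
        if rel_diff > tolerance:
            flags.append((area_id, f"values differ by {rel_diff:.0%}"))
    return flags

# Made-up population estimates from two hypothetical sources.
census_estimates = {"area_1": 12000, "area_2": 3500, "area_3": 800}
survey_estimates = {"area_1": 12400, "area_2": 2600, "area_4": 950}

flags = flag_discrepancies(census_estimates, survey_estimates)
for area_id, reason in flags:
    print(area_id, reason)
```

Flagged areas are candidates for closer review rather than automatic exclusion. A disagreement may reveal an outdated source as easily as an error.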

Health Equity Considerations

While geospatial data is a key resource for understanding demographic trends and patterns, researchers should also be conscious of potential health equity concerns.

Challenge: Cultural contexts that may differ from historic geographic/political boundaries
Challenge Description: Current political boundaries may not reflect traditional, linguistic, or tribal boundaries. Geospatial boundaries also change over time, and those changes should be considered when working with geospatial data.
Health Equity Example: A researcher is studying smoking rates in the Navajo Nation. Smoking-rate data are available only at the county level, and, for many historic reasons, tribal boundaries do not align with county lines. This makes it difficult to produce accurate results from these data.
Recommended Best Practice: Investigate current and historic contexts and events. Use multiple, diverse maps and sources, and provide cultural context in your explanation.

Challenge: Rural and lower-density population areas
Challenge Description: Data for rural and lower-density areas may be unavailable or of lower quality. Gaps in radar and satellite coverage can also lead to inaccurate data and limited data availability.
Health Equity Example: A researcher studying infant mortality among different racial groups finds that cell sizes are particularly small in the West North Central Census division. These cells must be weighted to be represented appropriately.
Recommended Best Practice: Depending on the study, smaller population areas may need to be removed, combined with neighboring areas, or weighted appropriately to ensure they are fairly represented.

Challenge: Population movement and interaction across geospatial boundaries
Challenge Description: Populations in border regions, whether between states or between nations, are fluid. People may live in one jurisdiction and work in another, so the same individuals could be accounted for in multiple geographic regions.
Health Equity Example: A researcher is studying workplace influenza in New York (NY) but does not account for transmission occurring in neighboring New Jersey (NJ). Because the study region has a high density of workers from other states, several key factors in workplace disease are missed.
Recommended Best Practice: Use the most up-to-date sources available. Where feasible, consider larger geographic units, such as Census divisions and regions, that encompass both sides of the border.

Challenge: Population changes outpacing census and other datasets (e.g., effects of rapid immigration and migration)
Challenge Description: Populations change over time, and data collection methods can lag behind those changes. This is especially pertinent in areas undergoing rapid change due to a natural disaster or to health, economic, environmental, or other factors.
Health Equity Example: A researcher is estimating the number of hospitals and hospital workers across different European countries but does not account for the influx of migrants into several countries. That influx increased not only the patient population but also the size of the worker pool.
Recommended Best Practice: Use multiple, up-to-date data sources, and include contextual descriptions in your research results.
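The best practice for rural and lower-density areas suggests combining small areas. One way to do this is to pool any area with too few events into the larger region that contains it before computing rates. The Python sketch below assumes hypothetical county counts and a hypothetical county-to-region mapping. It also assumes a minimum of 20 events per reported rate. The threshold and the names are assumptions, so replace them with the suppression or reliability rules of your data source. Note that a pooled region can itself still fall below the threshold. In that case a further level of aggregation, or weighting, may be needed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def rates_with_pooling(
    counts: Dict[str, Tuple[int, int]],  # area -> (events, population); population assumed > 0
    parent: Dict[str, str],              # area -> larger region that contains it
    min_events: int = 20,                # assumed minimum events for a reportable rate
    per: int = 1000,                     # report rates per 1,000 people
) -> Dict[str, float]:
    """Report an area's own rate if it has enough events; otherwise pool it into its parent region."""
    rates: Dict[str, float] = {}
    pooled: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # region -> [events, population]
    for area, (events, population) in counts.items():
        if events >= min_events:
            rates[area] = per * events / population
        else:
            totals = pooled[parent[area]]
            totals[0] += events
            totals[1] += population
    for region, (events, population) in pooled.items():
        # Only the small areas contribute here; large areas are reported on their own.
        rates[f"{region} (pooled)"] = per * events / population
    return rates

# Made-up counts for three hypothetical counties in one hypothetical region.
counts = {"county_a": (45, 30000), "county_b": (6, 4000), "county_c": (9, 5000)}
parent = {"county_a": "region_1", "county_b": "region_1", "county_c": "region_1"}

rates = rates_with_pooling(counts, parent)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per 1,000")
```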

Case Study Example

This case study is for illustrative purposes only and does not represent a specific study from the literature.

Scenario: HK is a public health researcher studying food deserts and their potential health effects on the populations living in or near them.

Specific Model Objective: The goal is to identify potential food deserts based on factors such as the locations of grocery stores, farmers' markets, and convenience stores, combined with population density.

Data Source: HK uses data from the U.S. Census, BRFSS, Google APIs, and other sources, such as web scraping.

Analytic Method: HK creates a simple index based on the factors mentioned above.
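The case study does not specify HK's index beyond the factors listed. The following Python sketch is therefore only one hypothetical way such an index could be built. It combines two terms. The first is the great-circle (haversine) distance from an area's center to the nearest healthy-food outlet, scaled by a search radius chosen from population density. The second is the share of convenience stores among food outlets within that radius. The 1,000-people-per-square-mile urban cutoff, the 1- and 10-mile radii, the equal 0.5 weights, and all coordinates are assumptions. Higher scores indicate poorer access.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius

def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))

def food_access_index(
    area: LatLon,
    density_per_sq_mile: float,
    healthy_outlets: List[LatLon],      # e.g., grocery stores, farmers' markets
    convenience_outlets: List[LatLon],  # e.g., convenience stores
    urban_density: float = 1000.0,      # assumed urban/rural cutoff
    urban_radius: float = 1.0,          # assumed search radius (miles), urban
    rural_radius: float = 10.0,         # assumed search radius (miles), rural
) -> float:
    """Hypothetical 0-1 score; higher means poorer access to healthy food."""
    radius = urban_radius if density_per_sq_mile >= urban_density else rural_radius

    healthy_dists = [haversine_miles(area, p) for p in healthy_outlets]
    nearest = min(healthy_dists, default=math.inf)
    # Distance term: 0 at an outlet, capped at 1 once the nearest outlet is 2x the radius away.
    distance_score = min(nearest / radius, 2.0) / 2.0

    n_healthy = sum(d <= radius for d in healthy_dists)
    n_convenience = sum(haversine_miles(area, p) <= radius for p in convenience_outlets)
    total = n_healthy + n_convenience
    # Mix term: share of nearby outlets that are convenience stores (1.0 if none are nearby).
    convenience_share = n_convenience / total if total else 1.0

    return 0.5 * distance_score + 0.5 * convenience_share

# Made-up outlet locations and area centers (with population density) for illustration.
healthy = [(40.000, -75.000)]
convenience = [(40.005, -75.000), (40.010, -75.000)]
areas = {
    "area_a": ((40.001, -75.000), 5000.0),
    "area_b": ((40.100, -75.000), 200.0),
    "area_c": ((40.500, -75.000), 200.0),
}
scores: Dict[str, float] = {
    name: food_access_index(center, density, healthy, convenience)
    for name, (center, density) in areas.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Choosing the radius from population density reflects the idea that "nearby" means something different in a dense city than in a rural county. The cutoff, radii, and weights would all need justification before the score is used to label areas as food deserts.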

Results: HK identifies food deserts with moderate success and creates a data visualization showing them in different cities across the U.S.

Health Equity Considerations:

  • Because HK considers access to healthy food a social determinant of health, the study should also consider related outcomes such as obesity, heart disease, and diabetes rates.

  • HK may also want to consider broader systemic issues, such as those related to race, ethnicity, and income.

  • HK may also want to consider other geographic features that affect access to healthy food, such as weather, public transportation, and clean water access.

Considerations for Project Planning

  • How will you handle geographic areas that have few or no data points? Do you plan to collapse them into larger regions?
  • What other health equity concerns may be present in the geospatial data you are using?
  • What strategies do you plan to use to mitigate these risks?

Resources