Section 4. Computer Vision#

In this lesson, users will be given insight into how Computer Vision (CV) methods can impact health equity. This lesson includes an overview of common methods and research objectives, health equity challenges, and a more detailed case study.

Health Equity and Computer Vision

  • Automated computer vision tasks such as medical image segmentation, health monitoring, and diagnosis are increasingly integrated into public health, which makes mitigating the bias inherent in these methods especially important.
  • Computer vision methods may introduce new bias when using training data with an imbalanced class distribution. This is a common problem and leads to learning discriminating features that are biased against the minority class.
  • Computer vision has a history of poor performance across gender and racial/ethnic groups, which can lead to health disparities when not properly managed.

What is Computer Vision#

Computer Vision (CV) is the scientific field concerned with using computers to process visual information and determine the shape, appearance, or identity of objects. Computer vision is used for tasks such as monitoring safety compliance (e.g., tracking workers in an area), biomolecular research, and diagnostics. Computer vision tasks such as object recognition are inherently prone to bias because they are inverse problems, in which unknowns are estimated from partial information. Probabilistic models are used to help disambiguate between potential solutions, which introduces bias into the system.

The coded gaze embedded in AI-driven computer vision platforms sees the world through the programmatic bias built into it. Where computer vision platforms are built to detect different people, this bias can result in incorrect identification of individuals. Historically, computer vision platforms have had difficulty detecting people of color and classifying gender [Buolamwini, 2018], [Schwemmer et al., 2020], [Wang et al., 2022].

Computer vision problems often consist of one of the following research objectives:

  • Image Preprocessing: Involves formatting images before they are used to train a model. This can include methods such as image warping (used for image scaling, translation, and rotation, or to correct image distortion) and image de-noising (used to retrieve an unknown original image from a noisy one). A brief preprocessing sketch appears after this list.

  • Image Clustering: Image clustering algorithms are used to classify objects in an image when there is no labeled training data, making this a type of unsupervised learning. These clustering methods find natural groups in feature space.

  • Image Classification: Image classification relies on labeled training data and can produce stronger results than clustering. It is a type of supervised learning.

  • Boundary Detection: defines the boundary between two or more objects or locations in an image.

  • Object Detection: locates instances of an object in an image, often using feature analysis and neural networks.

  • Face Detection: detects the location of a face in an image, often using feature analysis and neural networks.
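
As a rough illustration of the preprocessing operations above, the sketch below uses OpenCV (the cv2 package) to warp (rescale and rotate) and de-noise an image. The file name scan.png, the target size, and the filter settings are hypothetical choices for illustration only.

```python
import cv2

# Hypothetical input image; any photograph or medical image would work here.
image = cv2.imread("scan.png")

# Image warping: rescale to a fixed size, then rotate about the image center.
resized = cv2.resize(image, (224, 224), interpolation=cv2.INTER_AREA)
center = (resized.shape[1] // 2, resized.shape[0] // 2)
rotation = cv2.getRotationMatrix2D(center, 5, 1.0)  # 5-degree rotation, no scaling
warped = cv2.warpAffine(resized, rotation, (224, 224))

# Image de-noising: estimate the underlying image from a noisy observation
# using non-local means (filter strength and window sizes are illustrative).
denoised = cv2.fastNlMeansDenoisingColored(warped, None, 10, 10, 7, 21)
```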

How Computer Vision is Used#

Some examples of computer vision problems within a public health context are:

  • Medical Image Segmentation: Supervised medical image segmentation uses edge detection and object detection to automate the identification of anatomical structures and regions of interest (a minimal segmentation sketch follows this list).

  • Health Monitoring: Computer vision classification algorithms may be applied on unlabeled facial scans for predicting early symptoms of infection and illness.

  • Early Diagnosis: A computer vision model may be built to detect cancerous cells in human-annotated tissue image samples.
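
For a minimal, unsupervised flavor of segmentation, the sketch below applies Otsu thresholding with OpenCV to separate candidate regions of interest from the background. The file name tissue_sample.png is a hypothetical placeholder, and production medical image segmentation would typically rely on a trained, supervised model rather than a global threshold.

```python
import cv2

# Hypothetical grayscale tissue image; the file name is illustrative only.
image = cv2.imread("tissue_sample.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks a global threshold separating foreground from background,
# giving a crude segmentation mask without any labeled training data.
_, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Contours of the mask serve as candidate object boundaries (boundary detection).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} candidate regions")
```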

Computer Vision Methods#

Bias resulting in misidentification of biometric attributes such as gender, age, and ethnicity carries potential for great consequences. Therefore, common methods in object detection are emphasized in more detail below:

  • Object Recognition is a type of object detection. In object detection, instances of an object are located in a series of images; in object recognition, the object itself is identified. Controlling for bias in object recognition is especially important because misidentifying objects can have serious consequences. [Waithe et al., 2020]

  • Haar Cascades is a method of object detection that uses a collection of positive and negative samples to identify the object. Positive samples include the object and negative samples do not. A minimal face-detection sketch using a pretrained Haar cascade appears after this list.

  • Face Recognition is a type of object detection in which individual faces are identified. Controlling for bias in facial recognition is especially important because misidentifying faces can have serious consequences. [Libby and Ehrenfeld, 2021]

  • Feature Analysis is a method of facial recognition in which individual features are used to identify the individual. Feature examples include distance between the eyes, length of the chin, tone of the lips, etc.

  • Convolutional Neural Networks (CNNs) are artificial neural networks with convolution and pooling layers that identify features. Their fully connected layers, in which each neuron is connected to every neuron in the adjacent layer, make them prone to overfitting. Regularization curbs this effect. We suggest taking special care when choosing a regularizer, testing multiple options before settling on one.

  • Adaptive Neuro-Fuzzy Inference System (ANFIS) is a type of artificial neural network that takes advantage of fuzzy logic principles and is reported to show high accuracy in facial recognition.

  • Eigenfaces is a method of facial recognition that uses dimensionality reduction of images. The eigenfaces are the eigenvectors of the covariance matrix computed over a set of face images; they form a basis that may be used to reconstruct, at least approximately, any of the original images in the set.

  • Fisherfaces is an improvement upon the eigenfaces method that can interpolate and extrapolate over lighting and facial expressions.

  • Thermal Face Recognition models learn the unique temperature patterns of the face. The ability of these models to extract features does not depend on the presence of makeup or glasses.
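
As a concrete example of the Haar cascade approach described above, the sketch below uses the pretrained frontal-face cascade bundled with opencv-python. The image path waiting_room.jpg is a hypothetical placeholder, and the scaleFactor and minNeighbors values are common illustrative defaults rather than tuned settings.

```python
import cv2

# Pretrained Haar cascade for frontal faces shipped with opencv-python.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("waiting_room.jpg")          # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale

# Scan the image at multiple scales; each detection is an (x, y, width, height) box.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"Detected {len(faces)} face(s)")
```

Note that pretrained cascades such as this one inherit whatever biases were present in the samples used to build them, which is one reason the health equity considerations below matter.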

The list above provides an overview of common methods in computer vision. More experienced readers may want to jump ahead to the next section on health equity considerations, which includes health equity challenges with computer vision and a more in-depth case study.

Health Equity Considerations#

Below are definitions of common sources of bias in computer vision and descriptions on how to mitigate these biases in the context of public health. For a broader review of bias in machine learning, please see the section on Machine Learning, which includes lessons on supervised, unsupervised, and reinforcement learning.

For each challenge below, a description is followed by a health equity example and a recommended best practice.

Measurement Bias

    Measurement bias occurs when the training data is not a good representation of the population or when there is a bias in the measurements made to create the data.

    You are writing a K-Nearest Neighbor algorithm that classifies efficacy of acne treatment based on demographic background using image data. With only 30 Latino persons included total, incorrect labeling of ten people of the Latino population results in measurement bias.

    To mitigate measurement bias the training data must be of good quality and the sample size of training data must be large. Additionally, care must be taken to ensure that the images used in the training data are of similar quality to the test images.

Selection Bias

    Selection bias is a type of measurement bias that occurs when the training dataset is inherently not representative of the population.

    You are building a computer vision platform that identifies cancer cells in an in vitro image of a tissue sample. You train your model on skin cancer cells and are asked to identify cancer cells in a brain slice. Because you trained your model on a different type of cell, your model might be subject to selection bias.

    To mitigate selection bias we must define the information that is pertinent to our population and make sure that our training dataset encompasses all the traits of interest.

Recall Bias

    Recall bias is a type of measurement bias that results from inconsistent labelling of your training dataset. It is also the bias that results from failure to recall events accurately.

    You are building a model that is trained to identify early signs of Covid-19 in video feed of people entering a building. To build your model, you train it with images of people who are healthy or sick. If some of the healthy people in your training dataset were actually sick, this would result in recall bias.

    To mitigate recall bias it is encouraged to validate the labels by having multiple people assign labels for each image. If there is data in the training dataset that does not obviously fall under one of the labels, it should be removed. Cross-validation is a common method to prevent recall bias.

Confirmation Bias

    Confirmation bias is the tendency of people to see what they expected to see. In terms of computer vision, it can apply both to incorrect labeling of the training set and to analysis of the results. It is a type of measurement bias.

    You are labeling images of people classifying individuals by disease status: healthy or sick. You happen to know that there is an outbreak in a certain region. When classifying people from that region you increase the number of sick labels for healthy individuals.

    To mitigate confirmation bias it is encouraged to validate labels by having multiple people assign labels for each image. A blind approach is recommended, where the person labeling the training dataset does not know about the people nor the reasons why the labels were chosen.

Demographic Bias

    Demographic bias is any bias in your model that skews the demographics of your results. The term is emphasized in computer vision because demographic bias in these platforms has great potential to impact society.

    You are classifying flu status by demographic background using face image data. You find that your algorithm is much worse at identifying black women compared to white men.

    You must remove the source of bias and reanalyze your results. Take care to ensure that a large enough sample size exists for persons of each demographic background. You may choose to normalize your data or add a regularizer.

Racial Bias

    Racial bias is the skewing of racial representation in your results. It is a type of demographic bias.

    In a comparison of different computer vision programs, Joy Buolamwini found that every program had higher accuracy on lighter-skinned individuals, with the error rates between lighter and darker skin differing by 11.8% to 19.2%, and that 93.6% of the faces incorrectly gendered were dark-skinned [Buolamwini, 2018].

    To mitigate racial bias the sample size of the training dataset must be demographically representative of the population. It is important to include images of people that are demographically representative in terms of the variables that are important for your model. If you want to create a computer vision model that identifies disease, you must include the same number of images of people of each race in each potential disease state.

Gender Bias

    Gender bias is the skewing of gender representation in your results. It is a type of demographic bias.

    One research group found that a popular computer vision platform labelled a congresswoman as a “smiling” “television presenter” with “black hair,” whereas a very similar photograph of a male senator was labeled as an “official,” “businessperson,” and “spokesperson.”

    To mitigate gender bias the sample size of the training dataset must be representative of all genders in the population. It is important to include images of people that are representative of gender in terms of the variables that are important for your model. If you want to create a computer vision model that identifies types of workers in a medical facility, you must include the same number of images of people of each gender working each position.

Overfitting and Underfitting

    Underfitting occurs when there is not enough data, or the model is too simple, to capture the underlying trend, and it can result in a lot of bias. Overfitting occurs when your algorithm learns rules based on noise or incorrect labels in the training data, so it performs well on the training data but generalizes poorly.

    You obtain map data, showing the location of flu occurrence in North America for different ethnicities. You decide to build a K-Means Clustering algorithm to classify the proximity of flu outbreak to the nearest city based on ethnicity. You find that your algorithm performs really well on your training data, but when you try to predict flu locations for the following year your results are not great. You realize you may have run into overfitting.

    We mitigate underfitting by including a larger sample size or a more expressive model. Beware that as model complexity grows you run the chance of overfitting, where your algorithm starts learning rules based on noise or incorrect labels. It is important to scrutinize your results to choose an appropriate sample size and model. To detect and avoid overfitting we can use cross-validation, where we split the training dataset into k segments and hold each segment out in turn for validation (a minimal sketch follows these challenges). We can also add features to address underfitting and remove features to address overfitting. A regularizer, which constrains the model, can also be used to avoid overfitting, but may itself introduce bias into your model.

Outliers and Exclusion

    Outliers are cases that diverge from the average by some predetermined amount and may be removed. When we remove outliers we must take care to avoid exclusion bias, which is the underrepresentation of the cases or groups that were removed.

    You obtain map data, showing the location of flu occurrence in North America for different ethnicities. You decide to build a classification algorithm to classify the proximity of flu outbreak to the nearest city based on ethnicity. You remove outliers from your dataset and then find out that 90% of your removed outliers were of Latino background. This is an example of exclusion and racial bias.

    If you find that outlier removal systematically shifts your results, you must redesign your model. In our case, the fact that 90% of the outliers are Latino suggests that the Latino population lives at distances that fall outside our outlier definition. Analyzing results for the Latino population separately might be the solution; alternatively, the city epicenter could be expanded to include the region where the Latino population lives, or the outlier cut-off could be widened to include this population.
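
To make the cross-validation and sample-balance recommendations above concrete, here is a minimal scikit-learn sketch. The feature matrix and labels are synthetic stand-ins for image-derived features, and class_weight="balanced" with five stratified folds are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-ins for image-derived features and binary labels
# (in practice these would come from your imaging pipeline).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
y = rng.integers(0, 2, size=500)

# class_weight="balanced" reweights examples inversely to class frequency,
# which helps when one label or group dominates the training data.
model = LogisticRegression(max_iter=1000, class_weight="balanced")

# Stratified k-fold keeps label proportions similar in every fold, giving a more
# honest out-of-sample estimate and helping to flag overfitting before deployment.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```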

Case Study Example#

This case study is for illustrative purposes and does not represent a specific study from the literature.

Scenario: A researcher wants to explore the spread of influenza in the workplace with the help of a computer vision platform.

Specific Model Objective: Build a computer vision platform based on a labeled image dataset composed of masked facial recognition data to support downstream modeling in order to analyze the effect of demographic background on the spread of influenza.

Data Source: Masked facial recognition data coupled with thermal screenings was obtained for 5000 people of known vaccine status from 50 corporate office sites, one in each state. The racial distribution of participants in the training dataset was as follows: black (36%), white (44%), latino (18%), and asian (2%).

Analytic Method: A Convolutional Neural Network (CNN) was trained to perform facial recognition on masked participants.

Results: Leveraging the CNN model, a predictive model achieved an overall accuracy of 96% for racial classification and overall accuracy of 92% for health monitoring (sick vs healthy status).

Health Equity Considerations: Computer vision platforms used for facial recognition should be able to correctly identify demographic data, including people of all races equally. The failure to recognize some races may skew results and therefore public health recommendations or policies and lead to disparities. Below are additional considerations for this study:

  • There is an imbalance in the training data that leads to a large false positive rate for asian people. The computer vision platform for this study was written to recognize people wearing masks using a training dataset that consisted of only 2% asian participants. This is indicative of demographic bias as the algorithm has learned to classify black and white people with much higher accuracy.

    • As a result of this bias, researchers may conclude that influenza spread is lower among the asian population than it is in reality, since fever measurement data is not properly tracked for this group.

    • Facial recognition technology can produce racially biased results if the underlying data used is not representative of the community being served. One way to mitigate this bias is to ensure the training dataset has a large enough sample size with an acceptable minimum number of samples representing each possible race in the dataset.

  • In the analysis, outliers were defined as samples lying outside of three standard deviations from the mean. Post-analysis, it was observed that all of the outliers removed were of a specific minority population. By removing them, the researchers introduced demographic bias into the system. To mitigate this bias, consider redefining the outlier boundary or developing separate models for different populations.

  • When tested on unseen individuals, it was found that recall bias was present in predicting healthy vs infected individuals. This indicates that fever readings may have been inaccurate and that the training data was improperly skewed toward a sick population through mislabeled data. Additional information, such as a survey of how participants are feeling or a PCR test, could be useful in applying correct labels.

  • Finally, researchers may also want to carefully consider the metric used during model training and evaluation. For example, one might choose a highly specific test to reduce misidentification of participant race (fewer false positives). On the other hand, one might prioritize a sensitive test to detect fever/infection at the expense of misclassifying healthy individuals as ill (more false positives). A minimal sketch of disaggregated, per-group evaluation follows this list.
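
One way to surface the demographic and recall biases described above is to disaggregate evaluation metrics by group, as in the sketch below. All data here are synthetic; the group proportions simply mirror the case study's training distribution, and the simulated error rates are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic held-out test set: true health status (1 = sick), model predictions,
# and a self-reported group label for each participant.
rng = np.random.default_rng(42)
groups = rng.choice(["black", "white", "latino", "asian"],
                    size=1000, p=[0.36, 0.44, 0.18, 0.02])
y_true = rng.integers(0, 2, size=1000)
# Simulate a model that is noisier for the under-represented group.
flip = rng.random(1000) < np.where(groups == "asian", 0.30, 0.08)
y_pred = np.where(flip, 1 - y_true, y_true)

# Disaggregated evaluation: a strong overall accuracy can hide large per-group gaps.
print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.3f}")
for group in np.unique(groups):
    mask = groups == group
    tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    print(f"{group:>7}: n={mask.sum():4d}  "
          f"accuracy={accuracy_score(y_true[mask], y_pred[mask]):.3f}  "
          f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")
```

Reporting per-group sensitivity and specificity alongside overall accuracy makes the trade-off described in the last bullet explicit and visible to reviewers.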

Considerations for Project Planning

  • Is your data set diverse in terms of geographic and demographic attributes? How have you identified and characterized any outliers?
  • Does your data have training labels and how have you validated that the annotations are accurate?
  • How have you validated your model's performance in order to mitigate bias? Does it show signs of overfitting? Does it perform better for certain groups over others?