Section 4. Computer Vision#

In this lesson, users will be given insight into how Computer Vision (CV) methods can impact health equity. This lesson includes an overview of common methods and research objectives, health equity challenges, and a more detailed case study.

Health Equity and Computer Vision

  • Automated computer vision tasks such as medical image segmentation, health monitoring, and diagnosis are increasingly integrated into public health, which makes mitigating the bias inherent in these methods especially important.
  • Computer vision methods may introduce new bias when using training data with an imbalanced class distribution. This is a common problem and leads to learning discriminating features that are biased against the minority class.
  • Computer vision has a history of poor performance across gender and racial/ethnic groups, which can lead to health disparities when not properly managed.

What is Computer Vision#

Computer Vision (CV) is the scientific field concerned with using computers to process visual information and determine the shape, appearance, or identity of objects. Computer vision is used for tasks such as monitoring safety compliance (e.g., tracking workers in an area), biomolecular research, and diagnostics. Computer vision tasks such as object recognition are inherently prone to bias because they are inverse problems, in which unknowns are estimated from partial information. Probabilistic models are used to help disambiguate between potential solutions, which introduces bias into the system.

The coded gaze embedded in AI-driven computer vision platforms sees the world through the programmatic bias built into it. Where computer vision platforms are built to detect different people, this bias can result in incorrect identification of individuals. Historically, computer vision platforms have had difficulty detecting people of color and classifying gender [Buolamwini, 2018], [Schwemmer et al., 2020], [Wang et al., 2022].

Computer vision problems often consist of one of the following research objectives:

  • Image Preprocessing: Involves formatting images before they are used to train a model. This can include methods such as image warping (used for image scaling, translation, and rotation, or to correct image distortion) and image de-noising (used to retrieve an unknown original image from a noisy one). A brief preprocessing sketch appears after this list.

  • Image Clustering: Image clustering algorithms are used to classify objects in an image when there is no labeled training data, making this a type of unsupervised learning. These clustering methods find natural groups in feature space.

  • Image Classification: Image classification relies on labeled training data and can produce stronger results than clustering. It is a type of supervised learning.

  • Boundary Detection: defines the boundary between two or more objects or locations in an image.

  • Object Detection: locates instances of an object in an image, often using feature analysis and neural networks.

  • Face Detection: detects the location of a face in an image, often using feature analysis and neural networks.
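
As a rough illustration of the preprocessing operations above, the sketch below uses OpenCV (the cv2 package) to warp (rescale and rotate) and de-noise an image. The file name scan.png, the target size, and the filter settings are hypothetical choices for illustration only.

```python
import cv2

# Hypothetical input image; any photograph or medical image would work here.
image = cv2.imread("scan.png")

# Image warping: rescale to a fixed size, then rotate about the image center.
resized = cv2.resize(image, (224, 224), interpolation=cv2.INTER_AREA)
center = (resized.shape[1] // 2, resized.shape[0] // 2)
rotation = cv2.getRotationMatrix2D(center, 5, 1.0)  # 5-degree rotation, no scaling
warped = cv2.warpAffine(resized, rotation, (224, 224))

# Image de-noising: estimate the underlying image from a noisy observation
# using non-local means (filter strength and window sizes are illustrative).
denoised = cv2.fastNlMeansDenoisingColored(warped, None, 10, 10, 7, 21)
```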

How Computer Vision is Used#

Some examples of computer vision problems within a public health context are:

  • Medical Image Segmentation: Supervised medical image segmentation uses edge detection and object detection to automate the identification of anatomical structures and regions of interest (a minimal segmentation sketch follows this list).

  • Health Monitoring: Computer vision classification algorithms may be applied on unlabeled facial scans for predicting early symptoms of infection and illness.

  • Early Diagnosis: A computer vision model may be built to detect cancerous cells in human-annotated tissue image samples.
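
For a minimal, unsupervised flavor of segmentation, the sketch below applies Otsu thresholding with OpenCV to separate candidate regions of interest from the background. The file name tissue_sample.png is a hypothetical placeholder, and production medical image segmentation would typically rely on a trained, supervised model rather than a global threshold.

```python
import cv2

# Hypothetical grayscale tissue image; the file name is illustrative only.
image = cv2.imread("tissue_sample.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks a global threshold separating foreground from background,
# giving a crude segmentation mask without any labeled training data.
_, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Contours of the mask serve as candidate object boundaries (boundary detection).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} candidate regions")
```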

Computer Vision Methods#

Bias resulting in misidentification of biometric attributes such as gender, age, and ethnicity carries potential for great consequences. Therefore, common methods in object detection are emphasized in more detail below:

  • Object Recognition is a type of object detection. In object detection, instances of an object are located in a series of images; in object recognition, the object itself is identified. Controlling for bias in object recognition is especially important because misidentifying objects can have serious consequences. [Waithe et al., 2020]

  • Haar Cascades is a method of object detection that uses a collection of positive and negative samples to identify the object. Positive samples include the object and negative samples do not. A minimal face-detection sketch using a pretrained Haar cascade appears after this list.

  • Face Recognition is a type of object detection in which individual faces are identified. Controlling for bias in facial recognition is especially important because misidentifying faces can have serious consequences. [Libby and Ehrenfeld, 2021]

  • Feature Analysis is a method of facial recognition in which individual features are used to identify the individual. Feature examples include distance between the eyes, length of the chin, tone of the lips, etc.

  • Convolutional Neural Networks (CNNs) are artificial neural networks with convolution and pooling layers that identify features. Their fully connected layers, in which each neuron is connected to every neuron in the adjacent layer, make them prone to overfitting. Regularization curbs this effect. We suggest taking special care when choosing a regularizer, testing multiple options before settling on one.

  • Adaptive Neuro-Fuzzy Inference System (ANFIS) is a type of artificial neural network that takes advantage of fuzzy logic principles and is reported to show high accuracy in facial recognition.

  • Eigenfaces is a method of facial recognition that uses dimensionality reduction of images. The eigenfaces are the eigenvectors of the covariance matrix computed over a set of face images; they form a basis that may be used to reconstruct, at least approximately, any of the original images in the set.

  • Fisherfaces is an improvement upon the eigenfaces method that can interpolate and extrapolate over lighting and facial expressions.

  • Thermal Face Recognition models learn the unique temperature patterns of the face. The ability of these models to extract features does not depend on the presence of makeup or glasses.
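
As a concrete example of the Haar cascade approach described above, the sketch below uses the pretrained frontal-face cascade bundled with opencv-python. The image path waiting_room.jpg is a hypothetical placeholder, and the scaleFactor and minNeighbors values are common illustrative defaults rather than tuned settings.

```python
import cv2

# Pretrained Haar cascade for frontal faces shipped with opencv-python.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("waiting_room.jpg")          # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale

# Scan the image at multiple scales; each detection is an (x, y, width, height) box.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"Detected {len(faces)} face(s)")
```

Note that pretrained cascades such as this one inherit whatever biases were present in the samples used to build them, which is one reason the health equity considerations below matter.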

The list above provides an overview of common methods in computer vision. More experienced readers may want to jump ahead to the next section on health equity considerations, which includes health equity challenges with computer vision and a more in-depth case study.

Health Equity Considerations#

Below are definitions of common sources of bias in computer vision and descriptions on how to mitigate these biases in the context of public health. For a broader review of bias in machine learning, please see the section on Machine Learning, which includes lessons on supervised, unsupervised, and reinforcement learning.

For each challenge below, a description is followed by a health equity example and a recommended best practice.

Measurement Bias

    Measurement bias occurs when the training data is not a good representation of the population or when there is a bias in the measurements made to create the data.

    You are writing a K-Nearest Neighbor algorithm that classifies efficacy of acne treatment based on demographic background using image data. With only 30 Latino persons included total, incorrect labeling of ten people of the Latino population results in measurement bias.

    To mitigate measurement bias the training data must be of good quality and the sample size of training data must be large. Additionally, care must be taken to ensure that the images used in the training data are of similar quality to the test images.

Selection Bias

    Selection bias is a type of measurement bias that occurs when the training dataset is inherently not representative of the population.

    You are building a computer vision platform that identifies cancer cells in an in vitro image of a tissue sample. You train your model on skin cancer cells and are asked to identify cancer cells in a brain slice. Because you trained your model on a different type of cell, your model might be subject to selection bias.

    To mitigate selection bias we must define the information that is pertinent to our population and make sure that our training dataset encompasses all the traits of interest.

Recall Bias

    Recall bias is a type of measurement bias that results from inconsistent labelling of your training dataset. It is also the bias that results from failure to recall events accurately.

    You are building a model that is trained to identify early signs of Covid-19 in video feed of people entering a building. To build your model, you train it with images of people who are healthy or sick. If some of the healthy people in your training dataset were actually sick, this would result in recall bias.

    To mitigate recall bias it is encouraged to validate the labels by having multiple people assign labels for each image. If there is data in the training dataset that does not obviously fall under one of the labels, it should be removed. Cross-validation is a common method to prevent recall bias.

Confirmation Bias

    Confirmation bias is the tendency of people to see what they expected to see. In terms of computer vision, it can apply both to incorrect labeling of the training set and to analysis of the results. It is a type of measurement bias.

    You are labeling images of people classifying individuals by disease status: healthy or sick. You happen to know that there is an outbreak in a certain region. When classifying people from that region you increase the number of sick labels for healthy individuals.

    To mitigate confirmation bias it is encouraged to validate labels by having multiple people assign labels for each image. A blind approach is recommended, where the person labeling the training dataset does not know about the people nor the reasons why the labels were chosen.

Demographic Bias

    Demographic bias is any bias in your model that skews the demographics of your results. The term is emphasized in computer vision because demographic bias in these platforms has great potential to impact society.

    You are classifying flu status by demographic background using face image data. You find that your algorithm is much worse at identifying black women compared to white men.

    You must remove the source of bias and reanalyze your results. Take care to ensure that a large enough sample size exists for persons of each demographic background. You may choose to normalize your data or add a regularizer.

Racial Bias

    Racial bias is the skewing of racial representation in your results. It is a type of demographic bias.

    In a comparison of different computer vision programs, Joy Buolamwini found that every program had higher accuracy on lighter-skinned individuals, with the error rates between lighter and darker skin differing by 11.8% to 19.2%, and that 93.6% of the faces incorrectly gendered were dark-skinned [Buolamwini, 2018].

    To mitigate racial bias the sample size of the training dataset must be demographically representative of the population. It is important to include images of people that are demographically representative in terms of the variables that are important for your model. If you want to create a computer vision model that identifies disease, you must include the same number of images of people of each race in each potential disease state.

Gender Bias

    Gender bias is the skewing of gender representation in your results. It is a type of demographic bias.

    One research group found that a popular computer vision platform labelled a congresswoman as a “smiling” “television presenter” with “black hair,” whereas a very similar photograph of a male senator was labeled as an “official,” “businessperson,” and “spokesperson.”

    To mitigate gender bias the sample size of the training dataset must be representative of all genders in the population. It is important to include images of people that are representative of gender in terms of the variables that are important for your model. If you want to create a computer vision model that identifies types of workers in a medical facility, you must include the same number of images of people of each gender working each position.

Overfitting and Underfitting

    Underfitting occurs when there is not enough data, or the model is too simple, to capture the underlying trend, and it can result in a lot of bias. Overfitting occurs when your algorithm learns rules based on noise or incorrect labels in the training data, so it performs well on the training data but generalizes poorly.

    You obtain map data, showing the location of flu occurrence in North America for different ethnicities. You decide to build a K-Means Clustering algorithm to classify the proximity of flu outbreak to the nearest city based on ethnicity. You find that your algorithm performs really well on your training data, but when you try to predict flu locations for the following year your results are not great. You realize you may have run into overfitting.

    We mitigate underfitting by including a larger sample size or a more expressive model. Beware that as model complexity grows you run the chance of overfitting, where your algorithm starts learning rules based on noise or incorrect labels. It is important to scrutinize your results to choose an appropriate sample size and model. To detect and avoid overfitting we can use cross-validation, where we split the training dataset into k segments and hold each segment out in turn for validation (a minimal sketch follows these challenges). We can also add features to address underfitting and remove features to address overfitting. A regularizer, which constrains the model, can also be used to avoid overfitting, but may itself introduce bias into your model.

Outliers and Exclusion

    Outliers are cases that diverge from the average by some predetermined amount and may be removed. When we remove outliers we must take care to avoid exclusion bias, which is the underrepresentation of the cases or groups that were removed.

    You obtain map data, showing the location of flu occurrence in North America for different ethnicities. You decide to build a classification algorithm to classify the proximity of flu outbreak to the nearest city based on ethnicity. You remove outliers from your dataset and then find out that 90% of your removed outliers were of Latino background. This is an example of exclusion and racial bias.

    If you find that outlier removal systematically shifts your results, you must redesign your model. In our case, the fact that 90% of the outliers are Latino suggests that the Latino population lives at distances that fall outside our outlier definition. Analyzing results for the Latino population separately might be the solution; alternatively, the city epicenter could be expanded to include the region where the Latino population lives, or the outlier cut-off could be widened to include this population.
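
To make the cross-validation and sample-balance recommendations above concrete, here is a minimal scikit-learn sketch. The feature matrix and labels are synthetic stand-ins for image-derived features, and class_weight="balanced" with five stratified folds are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-ins for image-derived features and binary labels
# (in practice these would come from your imaging pipeline).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
y = rng.integers(0, 2, size=500)

# class_weight="balanced" reweights examples inversely to class frequency,
# which helps when one label or group dominates the training data.
model = LogisticRegression(max_iter=1000, class_weight="balanced")

# Stratified k-fold keeps label proportions similar in every fold, giving a more
# honest out-of-sample estimate and helping to flag overfitting before deployment.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```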

Case Study Example#

This case study is for illustrative purposes and does not represent a specific study from the literature.

Scenario: A researcher wants to explore the spread of influenza in the workplace with the help of a computer vision platform.

Specific Model Objective: Build a computer vision platform based on a labeled image dataset composed of masked facial recognition data to support downstream modeling in order to analyze the effect of demographic background on the spread of influenza.

Data Source: Masked facial recognition data coupled with thermal screenings was obtained for 5000 people of known vaccine status from 50 corporate office sites, one in each state. The racial distribution of participants in the training dataset was as follows: black (36%), white (44%), latino (18%), and asian (2%).

Analytic Method: A Convolutional Neural Network (CNN) was trained to perform facial recognition on masked participants.

Results: Leveraging the CNN model, a predictive model achieved an overall accuracy of 96% for racial classification and overall accuracy of 92% for health monitoring (sick vs healthy status).

Health Equity Considerations: Computer vision platforms used for facial recognition should be able to correctly identify demographic data, including people of all races equally. The failure to recognize some races may skew results and therefore public health recommendations or policies and lead to disparities. Below are additional considerations for this study:

  • There is an imbalance in the training data that leads to a large false positive rate for asian people. The computer vision platform for this study was written to recognize people wearing masks using a training dataset that consisted of only 2% asian participants. This is indicative of demographic bias as the algorithm has learned to classify black and white people with much higher accuracy.

    • As a result of this bias, researchers may conclude that influenza spread is lower among the asian population than it is in reality, since fever measurement data is not properly tracked for this group.

    • Facial recognition technology can produce racially biased results if the underlying data used is not representative of the community being served. One way to mitigate this bias is to ensure the training dataset has a large enough sample size with an acceptable minimum number of samples representing each possible race in the dataset.

  • In the analysis, outliers were defined as samples lying outside of three standard deviations from the mean. Post-analysis, it was observed that all of the outliers removed were of a specific minority population. By removing them, the researchers introduced demographic bias into the system. To mitigate this bias, consider redefining the outlier boundary or developing separate models for different populations.

  • When tested on unseen individuals, it was found that recall bias was present in predicting healthy vs infected individuals. This indicates that fever readings may have been inaccurate and that the training data was improperly skewed toward a sick population through mislabeled data. Additional information, such as a survey of how participants are feeling or a PCR test, could be useful in applying correct labels.

  • Finally, researchers may also want to carefully consider the metric used during model training and evaluation. For example, one might choose a highly specific test to reduce misidentification of participant race (fewer false positives). On the other hand, one might prioritize a sensitive test to detect fever/infection at the expense of misclassifying healthy individuals as ill (more false positives). A minimal sketch of disaggregated, per-group evaluation follows this list.
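
One way to surface the demographic and recall biases described above is to disaggregate evaluation metrics by group, as in the sketch below. All data here are synthetic; the group proportions simply mirror the case study's training distribution, and the simulated error rates are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic held-out test set: true health status (1 = sick), model predictions,
# and a self-reported group label for each participant.
rng = np.random.default_rng(42)
groups = rng.choice(["black", "white", "latino", "asian"],
                    size=1000, p=[0.36, 0.44, 0.18, 0.02])
y_true = rng.integers(0, 2, size=1000)
# Simulate a model that is noisier for the under-represented group.
flip = rng.random(1000) < np.where(groups == "asian", 0.30, 0.08)
y_pred = np.where(flip, 1 - y_true, y_true)

# Disaggregated evaluation: a strong overall accuracy can hide large per-group gaps.
print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.3f}")
for group in np.unique(groups):
    mask = groups == group
    tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    print(f"{group:>7}: n={mask.sum():4d}  "
          f"accuracy={accuracy_score(y_true[mask], y_pred[mask]):.3f}  "
          f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")
```

Reporting per-group sensitivity and specificity alongside overall accuracy makes the trade-off described in the last bullet explicit and visible to reviewers.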

Considerations for Project Planning

  • Is your data set diverse in terms of geographic and demographic attributes? How have you identified and characterized any outliers?
  • Does your data have training labels and how have you validated that the annotations are accurate?
  • How have you validated your model's performance in order to mitigate bias? Does it show signs of overfitting? Does it perform better for certain groups over others?