{ "cells": [ { "cell_type": "markdown", "id": "69520859", "metadata": {}, "source": [ "# Section 4. Computer Vision" ] }, { "cell_type": "markdown", "id": "fbf00d13", "metadata": {}, "source": [ "In this lesson, users will be given insight into how Computer Vision (CV) methods can impact health equity. This lesson includes an overview of common methods and research objectives, health equity challenges, and a more detailed case study." ] }, { "cell_type": "markdown", "id": "90e594a8", "metadata": {}, "source": [ "
\n", "

Health Equity and Computer Vision

\n", " \n", " \n", " \n", " \n", "
\n", "
    \n", "
  • Automated computer vision tasks such as medical image segmentation, health monitoring, and diagnosis are increasingly integrated into public health, which makes mitigating the bias inherent in these methods important.
  • \n", "
  • Computer vision methods may introduce new bias when trained on data with an imbalanced class distribution. This common problem leads models to learn discriminating features that are biased toward the majority class.
  • \n", "
  • Computer vision has a history of poor performance when distinguishing gender and race/ethnicity, which can lead to health disparities when not properly managed.
  • \n", "
\n", "
\n", "
" ] }, { "cell_type": "markdown", "id": "ef89768f", "metadata": {}, "source": [ "## What is Computer Vision\n", "\n", "Computer Vision (CV) is the scientific field that gains insight into using computers for processing visual information to determine the shape, appearance, or identity of objects. Computer vision is used for tasks such as monitoring safety compliance (i.e. tracking workers in an area), biomolecular research, and diagnostics. Computer vision tasks such as object recognition are inherently prone to bias because they are an inverse problem, where unknowns are estimated using partial information. Probabilistic models are used to help disambiguate between potential solutions, which introduces bias into the system. \n", "\n", "The coded gaze, which is implanted into AI-driven computer vision platforms sees the world through the programmatic bias entered into it. Where computer vision platforms are built to detect different people, the implications of bias could result in incorrect identification of individuals. Historically, computer vision platforms have had issues detecting people of color and discriminating gender {cite:p}`Buolamwini2018GenderShades`,{cite:p}`Schwemmer2020GenderBias`,{cite:p}`Wang2022REVISE`.\n", "\n", "Computer vision problems often consist of one of the following research objectives:\n", "- **Image Preprocessing**: Involves formatting images prior to being used to train a model. This can include methods such as image warping (used for image scaling, translation, and rotation or may be used for correction of image distortion) and image de-noising (used for the retrieval of an unknown original image from a noisy one).\n", "- **Image Clustering**: Image clustering algorithms are used for classification of objects in an image when there is no classification training data, making is a type of [unsupervised learning](5-2-2.-unsupervised-learning.ipynb). These clustering methods find natural groups in feature space. 
\n", "- **Image Classification**: Image classification uses labeled training data and can produce stronger results than clustering. It is a type of [supervised learning](5-2-1.-supervised-learning.ipynb). \n", "- **Boundary Detection**: defines the boundary between two or more objects or locations in an image.\n", "- **Object Detection**: identifies an object in an image, often using feature analysis and neural networks.\n", "- **Face Detection**: detects the location of a face in an image, often using feature analysis and neural networks.\n", "\n", "\n", "### How Computer Vision is Used\n", "\n", "Some examples of computer vision problems within a public health context are:\n", "- **Medical Image Segmentation**: Supervised medical image segmentation uses edge detection and object detection to automate the identification of anatomical structures and regions of interest.\n", "- **Health Monitoring**: Computer vision classification algorithms may be applied to unlabeled facial scans to predict early symptoms of infection and illness.\n", "- **Early Diagnosis**: A computer vision model may be built to detect cancerous cells in human-annotated tissue image samples.\n", "\n", "### Computer Vision Methods\n", "\n", "Bias that results in misidentification of biometric attributes such as gender, age, and ethnicity can have serious consequences. Therefore, common methods in object detection are described in more detail below:\n", "\n", "- **Object Recognition** is a type of object detection. In object detection, instances of an object are identified in a series of images. In object recognition, the object itself is identified. Controlling for bias in object recognition is especially important because misidentifying objects can have serious consequences. 
{cite:p}`Waithe2020ObjectDetection`\n", "- **Haar Cascades** is a method of object detection that uses a collection of positive and negative samples to identify the object. Positive samples include the object and negative samples do not.\n", "- **Face Recognition** is a type of object detection in which faces are recognized. Controlling for bias in facial recognition is especially important because misidentifying faces can have serious consequences. {cite:p}`Libby2021FaceRecognition`\n", "- **Feature Analysis** is a method of facial recognition in which individual features are used to identify the individual. Feature examples include distance between the eyes, length of the chin, and tone of the lips.\n", "- **Convolutional Neural Networks (CNNs)** are artificial neural networks with convolution and pooling layers that identify features. They are regularized variants of multilayer perceptrons: in a fully connected multilayer perceptron, each neuron in a layer is connected to all neurons in the adjacent layer, which makes such networks prone to overfitting, whereas convolution layers connect each neuron only to a local region of the input. Additional regularization further curbs overfitting. We suggest taking special care when choosing a regularizer and testing multiple options before selecting one. \n", "- **Adaptive Neuro-Fuzzy Inference System (ANFIS)** is a type of artificial neural network that takes advantage of fuzzy logic principles and is reported to show high accuracy in facial recognition.\n", "- **Eigenfaces** is a method of facial recognition that uses dimensionality reduction of images. The eigenfaces are the eigenvectors of the covariance matrix computed over a vector space of face images, and they form a basis for that space. This basis may be used to reconstruct any of the original images in the set. \n", "- **Fisherfaces** is an improvement upon the eigenfaces method that can interpolate and extrapolate over lighting and facial expressions.\n", "- **Thermal Face Recognition** models learn the unique temperature patterns of the face. 
The ability of these models to extract features does not depend on the presence of makeup or glasses.\n", "\n", "The table below contains an overview of common methods in computer vision. More experienced readers may want to jump ahead to the next section on health equity considerations, which includes health equity challenges with computer vision and a more in-depth case study." ] }, { "cell_type": "markdown", "id": "d9566d7d", "metadata": {}, "source": [ "```{admonition} If you are already familiar with computer vision methods, please continue to the next section. Otherwise click here.\n", ":class: dropdown\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Model | Common Usage | Suggested Usage | Suggested Scale | Interpretability | Common Concerns
Image De-noising
    Image Preprocessing
  • Suggested preprocessing for all images
  • Small to large datasets
    Interpretability: High
  • Outliers
  • Standardization
  • Parameter tuning
  • Warping Images
      Image Preprocessing
  • For reversal of distortion
  • Image scaling, translation, and rotation, as well as morphing 2D or 3D transitions between images for film
  • Small to large datasets
    Interpretability: Low
  • Outliers
  • Standardization
  • Parameter tuning
  • Content-based Image Retrieval
      Image Preprocessing
  • Image search in database using exemplar image or semantic search
  • Small to large datasets
    Interpretability: High
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • K-Means Clustering
      Image Clustering
  • Clustering algorithm that separates a field of vectors into k clusters by their proximity to k centroids.
  • Image classification when there is no classification training data
  • Large datasets with multiple variables, but not too many variables
    Interpretability: Moderate
  • Overfitting and underfitting, depending on the number of clusters chosen
  • One may choose to remove centroids with few elements
  • Hierarchical Clustering
      Image Clustering
  • Agglomerative method: start with N clusters, where N is the number of observations, and merge clusters with each iteration
  • Divisive method: start with one cluster and subdivide with each iteration
  • Image classification when there is no classification training data
    • Large datasets
      Some publications suggest a minimum dataset size of N = 2^m, where m is the number of attributes
    Interpretability: Moderate
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Spectral Clustering
      Image Clustering
  • Clustering method based on graph theory
  • Uses spectrum (eigenvalues) to cluster in a reduced number of dimensions
  • Image classification when there is no classification training data
  • Large datasets with too many dimensions
    Interpretability: Moderate
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • K-Nearest Neighbors
      Image Classification
  • This algorithm classifies each datapoint by the proximity of the point to k closest neighbors in a training dataset
  • Image classification when there is classification training data.
  • Medium to large datasets
    Interpretability: Moderate
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Bayes Classifier
      Image Classification
  • This algorithm classifies each datapoint using Bayes' theorem; the commonly used naive Bayes variant assumes that features are conditionally independent given the class.
  • Bayes Theorem:
      P(y|X) = P(X|y)P(y)/P(X),
    where previous events are used to identify the probability of future events. P(y|X) is the probability of event y, given event X.
  • Image classification when there is classification training data.
  • Medium to large datasets
    Interpretability: High
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Support Vector Machines
      Image Classification
  • This algorithm classifies datapoints by determining a hyperplane that maximizes the margin (distance) between classes
  • Image classification when there is classification training data.
  • Identifying objects
  • Medium to large datasets
    Interpretability: Moderate
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Edge-Based Segmentation
      Image Segmentation / Boundary Detection
  • Selecting some pixels (2D) or voxels (3D) of image
  • Edge detection
  • Object Detection
  • Identify image edges using the pixels of the image.
  • Small to large datasets
    Interpretability: Low
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Threshold-Based Segmentation
      Image Segmentation / Boundary Detection
  • Selecting some pixels (2D) or voxels (3D) of image
  • Regions are classified by threshold values for pixel properties, such as intensity or color.
  • Edge detection
  • Object Detection
  • Compares pixel intensity.
  • Small to large datasets
    Interpretability: Low
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Region-Based Segmentation
      Image Segmentation / Boundary Detection
  • Selecting some pixels (2D) or voxels (3D) of image
  • Regions are classified by rules connecting pixels that exhibit similar properties.
  • Edge detection
  • Object Detection
  • Locates groups of pixels by similarity to seed points.
  • Small to large datasets
    Interpretability: Low
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Haar Cascades
      Object Recognition
  • Classification algorithm that analyses an image based on Haar wavelets as opposed to pixel intensity. A Haar wavelet is a sequence of rescaled "square-shaped" functions that form a wavelet basis.
  • Image classification when there is classification training data.
  • Identifying objects
  • Dimensionality reduction is often used as a preprocessing step in object recognition.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Incorrect Labeling of training dataset
  • Training dataset being truly representative of the population
  • Feature Analysis
      Object Recognition / Facial Recognition
  • An approach to classification based on the notion that perception of features of objects and faces is variable and can be used to categorize objects.
  • Image classification when there is classification training data.
  • Identifying objects and people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • Convolutional Neural Networks (CNNs)
      Object Recognition / Facial Recognition
  • The convolutional neural network processes the pixels representing color in the original image and condenses parts of the image into 3D tensors, which are stacks of feature maps.
  • Image classification when there is classification training data.
  • Identifying objects and people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Processing for skin tone should be comparable for all ethnicities {cite:p}`10.1001/jamadermatol.2021.3129`.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • Adaptive Neuro-Fuzzy Inference System (ANFIS)
      Object Recognition / Facial Recognition
  • ANFIS combines neural networks with fuzzy logic, allowing nonlinear estimation.
  • Image classification when there is classification training data.
  • Identifying objects and people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Processing for skin tone should be comparable for all ethnicities.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • Eigenfaces
      Facial Recognition
  • An eigenface is the eigenvector resulting from dimensionality reduction of a collection of face images using principal component analysis.
  • Image classification when there is classification training data.
  • Identifying people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Processing for skin tone should be comparable for all ethnicities.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • Fisherfaces
      Facial Recognition
  • A Fisherface is an eigenvector resulting from dimensionality reduction of a collection of face images using linear discriminant analysis.
  • Image classification when there is classification training data.
  • Identifying people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Processing for skin tone should be comparable for all ethnicities.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • Thermal Facial Recognition
      Facial Recognition
  • Face recognition using infrared images.
  • Image classification when there is classification training data.
  • Identifying people
  • Dimensionality reduction is often used as a preprocessing step in face recognition.
  • Processing for skin tone should be comparable for all ethnicities.
  • Large datasets
    Interpretability: Low
  • Overfitting and underfitting
  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning
  • Training dataset being truly representative of the population
  • Regularization
  • \n", "
    \n", "\n", "```" ] }, { "cell_type": "markdown", "id": "be0f2212", "metadata": {}, "source": [ "\n", "\n", "## Health Equity Considerations\n", "\n", "Below are definitions of common sources of bias in computer vision and descriptions of how to mitigate these biases in the context of public health. For a broader review of bias in machine learning, please see the section on [Machine Learning](5-2-0.-machine-learning.ipynb), which includes lessons on supervised, unsupervised, and reinforcement learning." ] }, { "cell_type": "markdown", "id": "23fba90d", "metadata": {}, "source": [ "| Challenge |Challenge Description | Health Equity Example | Recommended Best Practice          |\n", "|:-------- | :------------------------------------- |:------------------------------------------------ |:------------------------------------------------ |\n", "| **Measurement Bias** | |||\n", "| **Selection Bias** | |||\n", "| **Recall Bias** | |||\n", "| **Confirmation Bias** | |||\n", "| **Demographic Bias** | |||\n", "| **Racial Bias** | |||\n", "| **Gender Bias** | |||\n", "| **Overfitting and Underfitting** | |||\n", "| **Outliers and Exclusion** | |||" ] }, { "cell_type": "markdown", "id": "844eeef9", "metadata": {}, "source": [ "## Case Study Example\n", "\n", "This case study is for illustrative purposes and does not represent a specific study from the literature.\n", "\n", "**Scenario:**\n", "A researcher wants to explore the spread of influenza in the workplace with the help of a computer vision platform. \n", "\n", "**Specific Model Objective:**\n", "Build a computer vision platform based on a labeled image dataset composed of masked facial recognition data to support downstream modeling in order to analyze the effect of demographic background on the spread of influenza.\n", "\n", "**Data Source:** \n", "Masked facial recognition data coupled with thermal screenings was obtained for 5000 people of known vaccine status from 50 corporate office sites, one in each state. 
The racial distribution of participants in the training dataset was as follows: Black (36%), White (44%), Latino (18%), and Asian (2%).\n", "\n", "**Analytic Method:**\n", "A Convolutional Neural Network (CNN) was trained to perform facial recognition on masked participants.\n", "\n", "**Results:**\n", "Leveraging the CNN model, a predictive model achieved an overall accuracy of 96% for racial classification and an overall accuracy of 92% for health monitoring (sick vs healthy status).\n", "\n", "**Health Equity Considerations:**\n", "Computer vision platforms used for facial recognition should be able to correctly identify demographic data, including people of all races equally. The failure to recognize some races may skew results, and therefore public health recommendations or policies, and lead to disparities. Below are additional considerations for this study:\n", "* There is an imbalance in the training data that leads to a large false positive rate for Asian people. The computer vision platform for this study was written to recognize people wearing masks using a training dataset that consisted of only 2% Asian participants. This is indicative of [**demographic bias**](https://ieeexplore.ieee.org/document/8987334) as the algorithm has learned to classify Black and White people with much higher accuracy.\n", " * One effect of this bias is that researchers may conclude that influenza spread is lower among the Asian population than it is in reality, since fever measurement data is not properly tracked.\n", " * Facial recognition technology can produce racially biased results if the underlying data used is not representative of the community being served. 
One way to mitigate this bias is to ensure the training dataset has a large enough sample size with an acceptable minimum number of samples representing each possible race in the dataset.\n", "* In the analysis, [**outliers**](https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm) were defined as samples lying more than three standard deviations from the mean. Post-analysis, it was observed that all of the outliers removed belonged to a specific minority population. By removing them, the researchers introduced demographic bias into their system. To mitigate this bias, consider redefining the outlier boundary or developing separate models for different populations.\n", "* When tested on unseen individuals, it was found that [**recall bias**](https://pubmed.ncbi.nlm.nih.gov/2319285/) was present in predicting healthy vs infected individuals. This indicates that fever readings may have been inaccurate and that the training data were improperly skewed toward a sick population with mislabeled data. Additional information, such as a survey of how participants are feeling or a PCR test, could be useful in applying correct labels.\n", "* Finally, researchers may also want to carefully consider the metric used during model training and evaluation. For example, one might choose a specific test to reduce misidentification of participant race (reduce false positives). On the other hand, one might prioritize a sensitive test to detect fever/infection at the expense of misclassifying an individual as ill when they are in fact healthy (more false positives).\n" ] }, { "cell_type": "markdown", "id": "3aa411b6", "metadata": {}, "source": [ "
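The final consideration above, looking at error rates rather than overall accuracy alone, can be sketched in a few lines of Python. This is a minimal illustration and not the method used in the case study: the function name `per_group_rates`, the group labels, and the toy predictions are hypothetical and invented for this example. It tallies a confusion matrix for each demographic group and reports sensitivity, specificity, and false positive rate, so that a gap between groups stays visible even when the overall accuracy looks strong.

```python
from collections import defaultdict


def per_group_rates(y_true, y_pred, groups):
    '''Return sensitivity, specificity, and false positive rate per group.

    y_true and y_pred hold binary labels (1 = sick, 0 = healthy);
    groups holds a hypothetical demographic label for each sample.
    '''
    counts = defaultdict(lambda: {'tp': 0, 'fp': 0, 'tn': 0, 'fn': 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1 and pred == 1:
            key = 'tp'
        elif truth == 0 and pred == 1:
            key = 'fp'
        elif truth == 0 and pred == 0:
            key = 'tn'
        else:
            key = 'fn'
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c['tp'] + c['fn']
        negatives = c['tn'] + c['fp']
        rates[group] = {
            'n': positives + negatives,
            # None marks a rate that is undefined for this group.
            'sensitivity': c['tp'] / positives if positives else None,
            'specificity': c['tn'] / negatives if negatives else None,
            'false_positive_rate': c['fp'] / negatives if negatives else None,
        }
    return rates


# Toy labels invented purely for illustration (not study data).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

for group, r in sorted(per_group_rates(y_true, y_pred, groups).items()):
    print(group, r)
```

In this toy data the overall accuracy is 5/8, but the breakdown shows that every error falls in group `b`, which is the kind of disparity a single aggregate metric would hide.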
    \n", "

    Considerations for Project Planning

    \n", "\n", "
    \n", "
      \n", "
    • Is your data set diverse in terms of geographic and demographic attributes? How have you identified and characterized any outliers?
    • \n", "
    • Does your data have training labels and how have you validated that the annotations are accurate?
    • \n", "
    • How have you validated your model's performance in order to mitigate bias? Does it show signs of overfitting? Does it perform better for certain groups over others?
    • \n", "
    \n", "
    " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }