Section 2. Machine Learning

Machine Learning (ML) is an approach to Artificial Intelligence (AI) in which a computer algorithm is developed to analyze data and make predictions or classifications based on what the algorithm has learned from the data [Artificial Intelligence (AI), n.d.].

Health Equity and Machine Learning

  • Machine learning algorithms can be used to identify populations at risk of health disparities and to help address the underlying social and environmental determinants of health that often lead to those disparities. For example, machine learning can surface patterns in health data that point to populations that may not have adequate access to health care or resources.
  • Machine learning can be used to improve the accuracy and efficiency of identifying and addressing health disparities by making better predictions about how health services should be allocated and which interventions are likely to be most effective.
  • Machine learning offers powerful methods for finding useful patterns in public health data. However, it is important to address the types of bias that specific methods can introduce and to test those methods appropriately.

Types of Machine Learning

Artificial Intelligence (AI) and Machine Learning (ML) are often used interchangeably; however, they are not the same. We can think of artificial intelligence as enabling a computer program to simulate human intelligence through functions such as learning and problem-solving [Chandra, 2019]. A machine’s ability to learn and improve based on gained knowledge and experience is the core principle of AI. Machine learning is considered a subset of AI because it uses specific types of algorithms to perform learning and problem-solving tasks. Machine learning can be broken down into the following types:

Supervised learning models are defined by their use of labeled datasets to train algorithms to either predict which group an input belongs to (Classification) or predict/forecast a numeric value (Regression) [National Institute of Biomedical Imaging and Bioengineering, n.d.]. A labeled dataset refers to data that has been annotated with a label or class that the algorithm will be trained to predict.
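As a minimal sketch of the two supervised tasks described above, the Python example below fits a nearest-centroid classifier (Classification) and a one-variable least-squares line (Regression) using only the standard library. The data values, the labels "low_risk" and "high_risk", and the function names are hypothetical and chosen only for illustration; a real project would more likely use an established library and real, validated data.

```python
from statistics import mean

# Hypothetical labeled data: (feature vector, label) pairs. Values are made up.
train = [
    ((2.0, 1.0), "low_risk"),
    ((3.0, 2.0), "low_risk"),
    ((6.0, 7.0), "high_risk"),
    ((7.0, 8.0), "high_risk"),
]

def fit_centroids(examples):
    """Classification: learn one mean point (centroid) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }

def predict_label(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

# Classification: predict the group for a new, unlabeled input.
centroids = fit_centroids(train)
predicted = predict_label(centroids, (6.5, 6.0))

# Regression: fit a line to labeled numeric targets, then forecast a new value.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
forecast = slope * 5.0 + intercept
```

In both cases the labels (the class names or the numeric targets) are what make the task supervised: the algorithm is fit to known answers and then applied to inputs it has not seen.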

Unsupervised learning models are defined by an ability to analyze and group unlabeled datasets. Unlabeled data refers to data that does not come with predefined group labels and can include photos, news articles, tweets, medical scans, and audio or video files. Unsupervised learning models can cluster observed data into partitions of distinct subgroups (Clustering), help quantify relationships between different associated variables (Association Rules), or help reduce the number of variables to the most significant for analysis (Dimensionality Reduction) [Carbonell et al., 1983].
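To illustrate Clustering, the sketch below implements a plain k-means loop in standard-library Python. It is one of several possible clustering methods, not necessarily the one a given project should use; the points, the starting centers, and the function name are assumptions made for this example. Association rules and dimensionality reduction are not shown.

```python
from statistics import mean

def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate between assigning points and moving centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (unchanged if the cluster is empty).
        centers = [
            tuple(mean(col) for col in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabeled observations: no group labels are supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centers are picked by hand here; real uses typically randomize them.
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```

The algorithm is never told which points belong together; the two subgroups emerge from distances between observations alone.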

Reinforcement learning models are defined by their trial-and-error structure, in which an agent learns how to achieve a goal because desired behaviors are “rewarded” (Positive) and undesired behaviors are “punished” (Negative). The model learns through repeated play and environmental feedback in order to maximize the cumulative reward of its actions [Ladosz et al., 2022].
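The trial-and-error loop can be sketched with tabular Q-learning, one common reinforcement learning method, on a toy problem. The five-cell corridor, the reward values, and the hyperparameters below are all assumptions invented for this illustration: the agent starts in cell 0, is "rewarded" (+1) for reaching cell 4, and is mildly "punished" (-0.1) for every other move.

```python
import random

N_STATES = 5  # corridor cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Toy environment: +1 for reaching the goal, -0.1 for any other move."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, -0.1, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by repeated play with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _move in range(100):  # cap episode length as a safety net
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = train()
# Greedy policy: the highest-valued action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every cell, which is the behavior that maximizes cumulative reward in this toy environment.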

Considerations for Project Planning

  • Are you currently using machine learning? If so, what methods have you been using?
  • How will you identify and handle bias in your data and model?
  • What techniques or strategies will you employ to enhance fairness and mitigate biases during your projects?
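One possible starting point for the bias questions above is to compare model behavior across groups. The sketch below computes, per group, the selection rate (share of positive predictions) and the true positive rate, then reports the gaps between groups; these gaps are often called the demographic parity difference and the equal opportunity difference. The groups "A" and "B", the labels, and the predictions are made-up values, and these two metrics are only a small subset of the fairness checks a project might need.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (NaN if the list is empty)."""
    return sum(values) / len(values) if values else float("nan")

def group_metrics(y_true, y_pred, groups):
    """Per-group selection rate and true positive rate for binary predictions."""
    metrics = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        preds = [y_pred[i] for i in idx]
        preds_for_positives = [y_pred[i] for i in idx if y_true[i] == 1]
        metrics[g] = {
            "selection_rate": rate(preds),
            "true_positive_rate": rate(preds_for_positives),
        }
    return metrics

# Hypothetical evaluation data: true outcomes, model predictions, group membership.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = group_metrics(y_true, y_pred, groups)

# Gap between the best- and worst-treated group on each metric.
parity_gap = (max(m["selection_rate"] for m in metrics.values())
              - min(m["selection_rate"] for m in metrics.values()))
tpr_gap = (max(m["true_positive_rate"] for m in metrics.values())
           - min(m["true_positive_rate"] for m in metrics.values()))
```

A large gap does not by itself prove that a model is unfair, but it flags where the data collection, labels, or model deserve a closer look.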

Resources

SAS provides several practical resources and best practices for machine learning tasks in a concise format:

  1. SAS GitHub: Best practices for machine learning

  2. Best Practices Quick Reference

  3. Algorithms Quick Reference