Browse Articles

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Predicting smoking status based on RNA sequencing data
Image credit: Yang and Stanley 2024

Given an association between nicotine addiction and gene expression, we hypothesized that expression of genes commonly associated with smoking status would have variable expression between smokers and non-smokers. To test whether gene expression varies between smokers and non-smokers, we analyzed two publicly-available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.

Read More...

Predicting baseball pitcher efficacy using physical pitch characteristics

Oberoi et al. | Jan 11, 2024

Predicting baseball pitcher efficacy using physical pitch characteristics
Image credit: Antoine Schibler

Here, the authors sought to develop a new metric to evaluate the efficacy of baseball pitchers using machine learning models. They found that the frequency of balls, was the most predictive feature for their walks/hits allowed per inning (WHIP) metric. While their machine learning models did not identify a defining trait, such as high velocity, spin rate, or types of pitches, they found that consistently pitching within the strike zone resulted in significantly lower WHIPs.

Read More...

Determining viability of image processing models for forensic analysis of hair for related individuals

Wang et al. | Feb 04, 2025

Determining viability of image processing models for forensic analysis of hair for related individuals
Image credit: Taylor Smith

Here, the authors used machine learning to analyze microscopic images of hair, quantifying various features to distinguish individuals, even within families where traditional DNA analysis is limited. The Discriminant Analysis (DA) model achieved the highest accuracy (88.89%) in identifying individuals, demonstrating its potential to improve the reliability of hair evidence in forensic investigations.

Read More...

Groundwater prediction using artificial intelligence: Case study for Texas aquifers

Sharma et al. | Apr 19, 2024

Groundwater prediction using artificial intelligence: Case study for Texas aquifers

Here, in an effort to develop a model to predict future groundwater levels, the authors tested a tree-based automated artificial intelligence (AI) model against other methods. Through their analysis they found that groundwater levels in Texas aquifers are down significantly, and found that tree-based AI models most accurately predicted future levels.

Read More...