Browse Articles

Exploring the Factors that Drive Coffee Ratings

Agarwal et al. | May 19, 2025

This study explores the factors that influence coffee quality ratings using data from the Coffee Quality Institute. Using a regression model fitted by gradient descent, the authors aimed to predict coffee ratings (total cup points) and hypothesized that sweetness and the coffee producer would be the most influential factors.
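To make "regression fitted by gradient descent" concrete, here is a minimal sketch of the general technique in plain Python. It is not the authors' model: the function name `fit_linear_gd`, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration, and the two input columns merely stand in for hypothetical predictors of a rating.

```python
def fit_linear_gd(X, y, lr=0.05, epochs=5000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are arbitrary choices.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error of the current model on every row.
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - target
                  for row, target in zip(X, y)]
        # Gradient of the mean squared error w.r.t. each weight and the bias.
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Made-up data (not from the Coffee Quality Institute): the target is an
# exact linear function of two hypothetical predictors, so the fit should
# recover the coefficients 3, 2 and the intercept 1.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.5, 2.0]]
y = [3.0 * a + 2.0 * c + 1.0 for a, c in X]
w, b = fit_linear_gd(X, y)
```

In a fitted model of this kind, the relative size of each weight (on standardized inputs) is one simple way to compare how influential the predictors are.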

Creating a drought prediction model using convolutional neural networks

Bora et al. | Oct 08, 2024


Droughts kill over 45,000 people yearly and affect the livelihoods of 55 million others worldwide, and climate change is likely to worsen these effects. However, unlike for other natural disasters such as hurricanes, there is no early detection system that can predict droughts far enough in advance to be useful. Bora, Caulkins, and Joycutty tackle this issue by creating a drought prediction model.

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024


Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would be differentially expressed between smokers and non-smokers. To test this, we analyzed two publicly available datasets that profiled RNA gene expression in brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in the expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we trained logistic regression and random forest classification models on the gene expression data. The random forest classifier trained on lung tissue data showed the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data performed worse, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest that gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
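For readers unfamiliar with the AUC values quoted above, the following sketch shows one standard way to compute the metric for any binary classifier: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Hypothetical helper: labels are 1 (e.g. smoker) or 0 (non-smoker),
    scores are a classifier's predicted probabilities for class 1.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for six samples (three per class).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of 0.82 to 0.93 read as substantially stronger than 0.65 to 0.7.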
