Articles | Journal of Emerging Investigators

Evaluating the predicted eruption times of geysers in Yellowstone National Park

Rhee et al. | Jun 25, 2024

The authors compare the predicted versus actual geyser eruption times for the Old Faithful and Beehive Geysers at Yellowstone National Park.

An explainable model for content moderation

Cao et al. | Aug 16, 2023

The authors looked at the ability of machine learning algorithms to interpret language, given their increasing use in moderating content on social media. Using an explainable model, they were able to achieve 81% accuracy in detecting fake vs. real news based on the language of posts alone.

An analysis of junior rower performance and how it is affected by rower's features

Biller et al. | Jan 07, 2022

In this study, with consideration for the increasing participation of high school students in indoor rowing, the authors analyzed World Indoor Rowing Championship data. Statistical analysis revealed two key features that can determine the performance of a rower as well as increasing competitiveness in nearly all categories considered. They conclude by offering a 2000-meter ergometer time distribution that can help junior rowers assess their current performance relative to the world competition.

Using economic indicators to create an empirical model of inflation

Kasera et al. | Dec 01, 2022

Here, seeking to understand the correlation of 50 of the most important economic indicators with inflation, the authors used a rolling linear regression to identify the indicators most significantly correlated with the Month over Month Consumer Price Index Seasonally Adjusted (CPI). Ultimately, they concluded that the average gasoline price, U.S. import price index, and 5-year market expected inflation had the most significant correlation with the CPI.
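
As a rough illustration of what a rolling linear regression does, the sketch below fits an ordinary least-squares line of the CPI change on one indicator inside each sliding window and records the slope and R² per window. The function names, the window length, and the toy series are assumptions made for illustration; the abstract does not state the authors' window size, data preparation, or software.

```python
def ols_slope_r2(x, y):
    """Return (slope, r_squared) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant series carries no linear information in this window.
        return 0.0, 0.0
    return sxy / sxx, (sxy ** 2) / (sxx * syy)


def rolling_regression(indicator, cpi_change, window):
    """Fit one regression per sliding window (hypothetical helper)."""
    if len(indicator) != len(cpi_change):
        raise ValueError("series must have the same length")
    if window < 2 or window > len(indicator):
        raise ValueError("window must be between 2 and the series length")
    results = []
    for start in range(len(indicator) - window + 1):
        end = start + window
        results.append(ols_slope_r2(indicator[start:end], cpi_change[start:end]))
    return results


# Toy data (made up): the CPI change is an exact linear function of the indicator.
toy_indicator = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_cpi_change = [2.0 * v + 1.0 for v in toy_indicator]
fits = rolling_regression(toy_indicator, toy_cpi_change, window=3)
```

Comparing how the per-window R² behaves across indicators is one plausible way to rank them, though the ranking criterion the authors used may differ.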

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Given an association between nicotine addiction and gene expression, we hypothesized that expression of genes commonly associated with smoking status would have variable expression between smokers and non-smokers. To test whether gene expression varies between smokers and non-smokers, we analyzed two publicly-available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
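
For readers unfamiliar with the AUC values quoted above, the following minimal sketch computes the area under the ROC curve directly from its pairwise definition: the fraction of (smoker, non-smoker) pairs in which the smoker receives the higher predicted score, with ties counted as half. The function name, labels, and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    scores: predicted probability or score for the positive class.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.82 to 0.93 and 0.65 to 0.7 ranges should be read.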

Predicting college retention rates from Google Street View images of campuses

Dileep et al. | Jan 02, 2024

Every year, around 40% of undergraduate students in the United States discontinue their studies, resulting in a loss of valuable education for students and a loss of money for colleges. Even so, colleges across the nation struggle to discover the underlying causes of these high dropout rates. In this paper, the authors discuss the use of machine learning to find correlations between built-environment factors and the retention rates of colleges. They hypothesized that one way for colleges to improve their retention rates could be to improve the physical characteristics of their campus to be more pleasing. The authors used image classification techniques to look at images of colleges and correlate certain features like colors, cars, and people to higher or lower retention rates. With three possible options of high, medium, and low retention rates, a model that simply chose at random would reach the right conclusion 33% of the time. After finding that this 33%, or 0.33 mark, always fell outside of the 99% confidence intervals built around their models’ accuracies, the authors concluded that their machine learning techniques can be used to find correlations between certain environmental factors and retention rates.
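
The comparison against the 0.33 chance level can be sketched as follows. This is a minimal example assuming a normal-approximation (Wald) interval for a proportion; the abstract does not say which interval method the authors used, and the counts below are made up for illustration.

```python
import math
from statistics import NormalDist


def accuracy_confidence_interval(correct, total, confidence=0.99):
    """Normal-approximation (Wald) interval for a classifier's accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def beats_chance(correct, total, n_classes=3, confidence=0.99):
    """True when the chance level lies below the whole interval."""
    chance = 1 / n_classes
    lower, _upper = accuracy_confidence_interval(correct, total, confidence)
    return lower > chance


# Hypothetical counts: 150 of 300 test images classified correctly.
lower, upper = accuracy_confidence_interval(150, 300)
clearly_better = beats_chance(150, 300)
barely_better = beats_chance(105, 300)
```

With these made-up counts, an accuracy of 0.50 on 300 images gives an interval that sits entirely above 0.33, while an accuracy of 0.35 does not, so only the first would count as better than random guessing under this check.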

Predicting the factors involved in orthopedic patient hospital stay

D’Souza et al. | Dec 13, 2023

Long hospital stays can be stressful for the patient for many reasons. We hypothesized that age would be the greatest predictor of hospital stay among patients who underwent orthopedic surgery. Through our models, we found that severity of illness was the factor that contributed most to determining patient length of stay. The next two factors were the facility that the patient was staying in and the type of procedure that they underwent.

Predicting the Instance of Breast Cancer within Patients using a Convolutional Neural Network

Adhikesaven et al. | Oct 05, 2022

Using a convolutional neural network, these authors show machine learning can clinically diagnose breast cancer with high accuracy.

Predicting asthma-related emergency department visits and hospitalizations with machine learning techniques

Chatterjee et al. | Oct 25, 2021

Seeking to investigate the effects of ambient pollutants on human respiratory health, here the authors used machine learning to examine asthma in Los Angeles County, an area with substantial pollution. By using machine learning models and classification techniques, the authors identified that nitrogen dioxide and ozone levels were significantly correlated with asthma hospitalizations. Based on an identified seasonal surge in asthma hospitalizations, the authors suggest future directions to improve machine learning modeling to investigate these relationships.

Predicting Orbital Resonance of 2867 Šteins Using the Yarkovsky Effect

Rosenberg et al. | Jan 26, 2021

In this study, the impact of thermal effects on the orbit of an asteroid is investigated. This included determining whether the asteroid's orbit would be pushed into a region that is devoid of asteroids due to the gravitational pull of Jupiter.
