Articles | Journal of Emerging Investigators

Comparison of the ease of use and accuracy of two machine learning algorithms – forestry case study

Bhatia et al. | Mar 21, 2021

Machine learning algorithms are becoming increasingly popular for data crunching across a wide range of scientific disciplines. Here, the authors compare two machine learning algorithms with respect to accuracy and user-friendliness and find that random forest algorithms outperform logistic regression when applied to the same dataset.

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would vary in expression between smokers and non-smokers. To test this, we analyzed two publicly available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.

A comparative analysis of machine learning approaches for prediction of breast cancer

Nag et al. | May 11, 2021

Machine learning and deep learning techniques can be used to predict the early onset of breast cancer. The main objective of this analysis was to determine whether machine learning algorithms can be used to predict the onset of breast cancer with more than 90% accuracy. Based on research with supervised machine learning algorithms, Gaussian Naïve Bayes, K Nearest Neighbor, Random Forest, and Logistic Regression were considered because they offer a wide variety of classification methods and also provide high accuracy and performance. We hypothesized that all these algorithms would provide accurate results, and that Random Forest and Logistic Regression would provide better accuracy and performance than Naïve Bayes and K Nearest Neighbor.

The Effect of Various Preparation Methods on the Spoilage Rate of Roma Tomatoes (Solanum lycopersicum)

Cataltepe et al. | Feb 22, 2018

As levels of food waste continue to rise, it is essential to find improved techniques for prolonging the shelf life of produce. The authors aimed to find a simple, yet effective, method of slowing down spoilage in tomatoes. Linear regression analysis revealed that the tomatoes soaked in salt water and not dried displayed the lowest correlation between time and spoilage, confirming that this preparation was the most effective.

Understanding the battleground of identity fraud

Basu et al. | Oct 09, 2024

The authors looked at variables associated with identity fraud in the US. They found that national unemployment rate and online banking usage are among significant variables that explain identity fraud.

Using two-step machine learning to predict harmful algal bloom risk

Shukla et al. | Jul 04, 2025

Using machine learning to predict the risk of algae bloom

The effect of COVID-19 on the USA house market

Xiao et al. | Nov 19, 2022

COVID-19 has impacted the way many people go about their daily lives, but what are the main factors driving the changes in the housing market, particularly house prices?

Determining the relationship between unemployment and minimum wage in Turkey

Yalçın et al. | Sep 19, 2024

The authors looked at the relationship between unemployment and minimum wage in Turkey (Türkiye). They found that there is a positive correlation between minimum wage and unemployment.

Uncovering the hidden trafficking trade with geographic data and natural language processing

Aqid et al. | Oct 14, 2024

The authors use machine learning to develop an evidence-based detection tool for identifying human trafficking.

The influence of economic factors on United States household energy consumption in 2020

Ramanathan et al. | Jun 08, 2026

This study used machine learning models to examine which factors most influenced U.S. household energy consumption in 2020 using data from 18,496 households.
