Articles | Journal of Emerging Investigators

Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease

Kar et al. | Oct 10, 2020

Despite the prevalence of Parkinson’s disease (PD), diagnosis is expensive, requires specialized testing, and is often inaccurate. Moreover, diagnosis is often made late in the disease course, when treatments are less effective. Using existing voice data from patients with PD and healthy controls, the authors created and trained two different algorithms: one using logistic regression and another employing an artificial neural network (ANN).

Comparison of the ease of use and accuracy of two machine learning algorithms – forestry case study

Bhatia et al. | Mar 21, 2021

Machine learning algorithms are becoming increasingly popular for data analysis across a wide range of scientific disciplines. Here, the authors compare two machine learning algorithms with respect to accuracy and user-friendliness and find that random forest algorithms outperform logistic regression when applied to the same dataset.

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would be differentially expressed between smokers and non-smokers. To test this, we analyzed two publicly available datasets that profiled RNA gene expression in brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in the expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
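For readers unfamiliar with the AUC metric quoted above, the following is a minimal sketch of how an AUC value can be computed from classifier scores. It is not the authors' code: the function name `roc_auc` and the labels and scores are hypothetical, chosen only for illustration. It uses the rank interpretation of AUC, the probability that a randomly chosen positive example (here, a smoker) receives a higher score than a randomly chosen negative example, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = smoker, 0 = non-smoker; scores are made up.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of 0.82 to 0.93 indicate a considerably stronger classifier than values of 0.65 to 0.7.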

A comparative analysis of machine learning approaches for prediction of breast cancer

Nag et al. | May 11, 2021

Machine learning and deep learning techniques can be used to predict the early onset of breast cancer. The main objective of this analysis was to determine whether machine learning algorithms can be used to predict the onset of breast cancer with more than 90% accuracy. Based on research with supervised machine learning algorithms, Gaussian Naïve Bayes, K Nearest Neighbor, Random Forest, and Logistic Regression were considered because they offer a wide variety of classification methods and also provide high accuracy and performance. We hypothesized that all these algorithms would provide accurate results, and that Random Forest and Logistic Regression would provide better accuracy and performance than Naïve Bayes and K Nearest Neighbor.

Comparative study of machine learning models for water potability prediction

Lee et al. | Mar 31, 2025

The global issue of water quality has led to the use of machine learning models, such as artificial neural networks (ANN) and support vector machines (SVM), to predict water potability. However, these models can be complex and resource-intensive. This research aimed to find a simpler, more efficient model for water quality prediction.

Who controls U.S. politics? An analysis of major political endorsements in U.S. midterm elections

Huang et al. | Jul 07, 2023

The authors analyze political endorsement patterns and impacts from the 2018 and 2020 midterm elections and find that such endorsements may be predictable based on the ideological and demographic factors of the endorser.

Uncovering the hidden trafficking trade with geographic data and natural language processing

Aqid et al. | Oct 14, 2024

The authors use machine learning to develop an evidence-based detection tool for identifying human trafficking.

The utilization of Artificial Intelligence in enabling the early detection of brain tumors

Haider et al. | Feb 05, 2025

AI analysis of brain scans offers promise for helping doctors diagnose brain tumors. Haider and Drosis explore this field by developing machine learning models that classify brain scans as "cancer" or "non-cancer" diagnoses.

Epileptic seizure detection using machine learning on electroencephalogram data

Gokturk et al. | Sep 24, 2024

The authors use machine learning and electroencephalogram data to propose a method for improving epilepsy diagnosis.

Exploring differences in men’s marijuana consumption and cigarette smoking by race and citizenship status

Miriyala et al. | Sep 04, 2024

This study examined the relationship between citizenship status, racial background, and the use of marijuana and cigarettes among males in California using data from the 2017–2018 California Health Interview Survey. Findings indicated that non-citizens and naturalized citizens were less likely to use marijuana compared to US-born citizens, while Asian and Latino males were less likely to consume marijuana than White males. Additionally, various racial groups were more likely to smoke cigarettes compared to White males, suggesting that targeted health interventions based on citizenship status and race could be beneficial.
