Articles | Journal of Emerging Investigators

Model selection and optimization for poverty prediction on household data from Cambodia

Wong et al. | Sep 29, 2023

Here the authors sought to use three machine learning models to predict poverty levels in Cambodia based on available household data. They found that a multilayer perceptron outperformed the other models, with an accuracy of 87%. They suggest that data-driven approaches such as these could be used to more effectively target and alleviate poverty.

Assessing and Improving Machine Learning Model Predictions of Polymer Glass Transition Temperatures

Ramprasad et al. | Mar 18, 2020

In this study, the authors test whether providing a larger dataset of glass transition temperatures (T_g) to train the machine-learning platform Polymer Genome would improve its accuracy. Polymer Genome is a machine-learning-based, data-driven informatics platform for polymer property prediction, and T_g is one property needed to design new polymers in silico. They found that training the model with their larger, curated dataset improved the algorithm's T_g predictions, providing valuable improvements to this useful platform.

Comparative study of machine learning models for water potability prediction

Lee et al. | Mar 31, 2025

The global issue of water quality has led to the use of machine learning models, like ANN and SVM, to predict water potability. However, these models can be complex and resource-intensive. This research aimed to find a simpler, more efficient model for water quality prediction.

A comparative analysis of machine learning approaches for prediction of breast cancer

Nag et al. | May 11, 2021

Machine learning and deep learning techniques can be used to predict the early onset of breast cancer. The main objective of this analysis was to determine whether machine learning algorithms can be used to predict the onset of breast cancer with more than 90% accuracy. Based on research with supervised machine learning algorithms, Gaussian Naïve Bayes, K Nearest Neighbor, Random Forest, and Logistic Regression were considered because they offer a wide variety of classification methods and also provide high accuracy and performance. We hypothesized that all these algorithms would provide accurate results, and that Random Forest and Logistic Regression would provide better accuracy and performance than Naïve Bayes and K Nearest Neighbor.

Exploring the effects of diverse historical stock price data on the accuracy of stock price prediction models

Girma et al. | Sep 24, 2024

Algorithmic trading has been increasingly used by Americans. In this work, we tested whether including the opening, closing, and highest prices in three supervised learning models affected their performance. Indeed, we found that including all three prices decreased the error of the prediction significantly.

String analysis of exon 10 of the CFTR gene and the use of Bioinformatics in determination of the most accurate DNA indicator for CF prediction

Carroll et al. | Jul 12, 2020

Cystic fibrosis is a genetic disease caused by mutations in the CFTR gene. In this paper, the authors attempt to identify variations in stretches of up to 8 nucleotides in the protein-coding portions of the CFTR gene that are associated with disease development. This would allow screening of newborns or even fetuses in utero to determine the likelihood that they will develop cystic fibrosis.
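As a rough illustration of what string analysis over short nucleotide stretches can look like, the sketch below counts every substring of up to 8 nucleotides in two sequences and reports the ones whose counts differ. It is not the authors' method: the function names `kmer_counts` and `differing_kmers` are hypothetical, and the two sequences are toy strings invented for the example, not real CFTR data.

```python
from collections import Counter


def kmer_counts(sequence, max_k=8):
    """Count every substring of length 1..max_k in a DNA sequence."""
    seq = sequence.upper()
    counts = Counter()
    for k in range(1, max_k + 1):
        # Slide a window of width k across the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def differing_kmers(reference, variant, max_k=8):
    """Map each k-mer whose count differs to (reference count, variant count)."""
    ref = kmer_counts(reference, max_k)
    var = kmer_counts(variant, max_k)
    return {
        kmer: (ref[kmer], var[kmer])
        for kmer in set(ref) | set(var)
        if ref[kmer] != var[kmer]
    }


# Toy sequences, invented for illustration only (not real CFTR data).
reference = "ATCATCTTTGGT"
variant = "ATCATTGGT"  # the reference with one "CTT" removed

diff = differing_kmers(reference, variant, max_k=3)
for kmer, (in_ref, in_var) in sorted(diff.items()):
    print(kmer, in_ref, in_var)
```

Substrings that appear in one sequence but not the other are candidate indicators; deciding which of them actually tracks with disease would need labeled patient data and a statistical test, which this sketch leaves out.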

Validating DTAPs with large language models: A novel approach to drug repurposing

Curtis et al. | Mar 02, 2025

Here, the authors investigated the integration of large language models (LLMs) with drug target affinity predictors (DTAPs) to improve drug repurposing, demonstrating a significant increase in prediction accuracy, particularly with GPT-4, for psychotropic drugs and the sigma-1 receptor. This novel approach could accelerate drug discovery and reduce its cost by efficiently identifying new therapeutic uses for existing drugs.

Monitoring drought using explainable statistical machine learning models

Cheung et al. | Oct 28, 2024

Droughts have a wide range of effects, from ecosystems failing and crops dying, to increased illness and decreased water quality. Drought prediction is important because it can help communities, businesses, and governments plan and prepare for these detrimental effects. This study predicts drought conditions by using predictable weather patterns in machine learning models.

Using broad health-related survey questions to predict the presence of coronary heart disease

Chavda et al. | Aug 23, 2024

Coronary heart disease (CHD) is the leading cause of death in the U.S., responsible for nearly 700,000 deaths in 2021, and is marked by artery clogging that can lead to heart attacks. Traditional prediction methods require expensive clinical tests, but a new study explores using machine learning on demographic, clinical, and behavioral survey data to predict CHD.

Contrasting role of ASCC3 and ALKBH3 in determining genomic alterations in Glioblastoma Multiforme

Sriram et al. | Sep 27, 2022

Glioblastoma Multiforme (GBM) is the most malignant brain tumor with the highest fraction of genome alterations (FGA), manifesting poor disease-free status (DFS) and overall survival (OS). We explored The Cancer Genome Atlas (TCGA) and the cBioPortal public dataset Firehose Legacy GBM to study the DNA repair genes Activating Signal Cointegrator 1 Complex Subunit 3 (ASCC3) and Alpha-Ketoglutarate-Dependent Dioxygenase AlkB Homolog 3 (ALKBH3). To test our hypothesis that these genes correlate with FGA and can better determine prognosis and survival, we sorted the dataset to arrive at 254 patients. In our analysis using RStudio, ASCC3 and ALKBH3 demonstrated hypomethylation in 82.3% and 61.8% of patients, respectively. Interestingly, low mRNA expression was observed for both genes. We further conducted correlation tests between FGA and both the methylation and mRNA expression of these genes. ASCC3 was found to be negatively correlated, while ALKBH3 was found to be positively correlated, potentially indicating contrasting dysregulation of these two genes. Prognostic analysis showed that ASCC3 hypomethylation was significantly associated with DFS and that high ASCC3 mRNA expression was significantly associated with OS, demonstrating ASCC3’s potential as a disease prediction marker.
