Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would be differentially expressed between smokers and non-smokers. To test this hypothesis, we analyzed two publicly available datasets that profiled RNA expression in brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in the expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for logistic regression and random forest classification models. The random forest classifier trained on lung tissue data showed the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data performed worse, with AUC values consistently between 0.65 and 0.70 when using random forest. These results suggest that gene expression can be used to predict smoking status with traditional machine learning models. Additionally, based on our random forest model, we propose KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, together with the other genes identified in this study, present promising avenues for advancing applications related to the genetic basis of smoking-related characteristics.
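For readers unfamiliar with the AUC metric quoted above, the following is a minimal sketch of how it can be computed from classifier scores. It uses the pairwise (Mann-Whitney) definition: the fraction of smoker/non-smoker pairs in which the smoker receives the higher score. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up scores, not values from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the reported ranges should be read.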
Browse Articles
Predicting college retention rates from Google Street View images of campuses
Every year, around 40% of undergraduate students in the United States discontinue their studies, resulting in a loss of valuable education for students and a loss of money for colleges. Even so, colleges across the nation struggle to discover the underlying causes of these high dropout rates. In this paper, the authors discuss the use of machine learning to find correlations between built-environment factors and the retention rates of colleges. They hypothesized that one way for colleges to improve their retention rates could be to make the physical characteristics of their campuses more pleasing. The authors used image classification techniques to look at images of colleges and correlate certain features, such as colors, cars, and people, with higher or lower retention rates. With three possible classes (high, medium, and low retention), a model choosing at random would be correct about 33% of the time. After finding that this 33%, or 0.33, mark always fell outside the 99% confidence intervals built around their models’ accuracies, the authors concluded that their machine learning techniques can be used to find correlations between certain environmental factors and retention rates.
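The comparison against the chance level can be illustrated with a short sketch. It assumes a normal-approximation (Wald) interval for a proportion, which may differ from the interval the authors actually built; the function name and the counts used below are hypothetical.

```python
import math
from statistics import NormalDist


def accuracy_confidence_interval(correct, total, confidence=0.99):
    """Normal-approximation confidence interval for a classification accuracy.

    This is one common construction, assumed here for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = correct / total
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / total)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


CHANCE_LEVEL = 1 / 3  # three equally likely classes: high, medium, low

# Hypothetical counts, not results from the paper.
low, high = accuracy_confidence_interval(correct=60, total=100)
beats_chance = low > CHANCE_LEVEL
```

If the chance level lies below the lower bound of the interval, as in this made-up case, the model's accuracy is distinguishable from random guessing at the chosen confidence level.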
A novel approach for predicting Alzheimer’s disease using machine learning on DNA methylation in blood
Here, recognizing the difficulty of tracking the progression of dementia, the authors used machine learning models to distinguish among cognitive normalcy, mild cognitive impairment, and Alzheimer's disease based on blood DNA methylation levels, sex, and age. Using four machine learning models and two dimensionality reduction methods, they achieved an accuracy of 53.33%.
A novel encoding technique to improve non-weather-based models for solar photovoltaic forecasting
Several studies have applied different machine learning (ML) techniques to forecasting solar photovoltaic power production. Most of these studies use weather data as inputs to predict power production; however, there are numerous practical issues with procuring this data. This study proposes models that do not use weather data as inputs, but instead use past power production data as a more practical substitute. Our proposed models offer a better, cheaper, and more reliable alternative to current weather-based models.
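As a rough illustration of using past power production as model input, the sketch below turns a production series into lagged feature rows. This is a generic sliding-window setup assumed for illustration; it is not the encoding technique the study proposes, and the function name and readings are hypothetical.

```python
def make_lagged_dataset(series, n_lags):
    """Build (features, targets) where each target is predicted from the
    n_lags readings that immediately precede it."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    n_rows = len(series) - n_lags
    features = [list(series[i:i + n_lags]) for i in range(n_rows)]
    targets = [series[i + n_lags] for i in range(n_rows)]
    return features, targets


# Made-up hourly production readings in kW, not data from the study.
hourly_power_kw = [0.0, 1.2, 3.4, 4.1, 3.0, 0.8]
X, y = make_lagged_dataset(hourly_power_kw, n_lags=3)
```

Each row of `X` can then be passed to any regression model to predict the matching entry of `y`, with no weather inputs involved.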
Temperature and Precipitation Responses to a Stratospheric Aerosol Geoengineering Experiment Using the Community Climate System Model 4
We are changing our environment with steadily increasing carbon dioxide emissions, but we might be able to help. The authors here use a computer program called Community Climate System Model 4 to predict the effects of spraying small particles into the atmosphere to reflect away some of the sun's rays. The software predicts that this could reduce the amount of energy the Earth's atmosphere absorbs and may limit but will not completely counteract our carbon dioxide production.
Predicting Orbital Resonance of 2867 Šteins Using the Yarkovsky Effect
In this study, the authors investigate the impact of thermal effects on the orbit of an asteroid. This included determining whether those effects would push the asteroid's orbit into a region that Jupiter's gravitational pull has left devoid of asteroids.
Predicting the Instance of Breast Cancer within Patients using a Convolutional Neural Network
Using a convolutional neural network, the authors show that machine learning can be used to diagnose breast cancer with high accuracy.
Population Forecasting by Population Growth Models based on MATLAB Simulation
In this work, the authors investigate how accurately two different population growth models can predict population growth over time. They apply the Malthusian law and the logistic law to US population data from 1951 to 2019. To assess how closely each growth model fits the actual population data, a least-squares curve fit was applied, which revealed that the logistic law of population growth resulted in a smaller sum of squared residuals. These findings are important for ensuring that appropriate population growth models are applied to data, as population forecasting affects a country's economic and social structure.
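The model comparison described above can be sketched as follows. The authors worked in MATLAB; this Python version is only an illustration, uses a coarse grid search in place of a proper least-squares solver, and runs on synthetic data generated from a logistic curve rather than on US population figures. All names and parameter values are assumptions.

```python
import math
from itertools import product


def malthusian(t, p0, r):
    """Exponential growth: P(t) = P0 * exp(r * t)."""
    return p0 * math.exp(r * t)


def logistic(t, p0, r, k):
    """Logistic growth with carrying capacity K."""
    return k / (1.0 + (k - p0) / p0 * math.exp(-r * t))


def sum_squared_residuals(model, params, times, values):
    return sum((model(t, *params) - v) ** 2 for t, v in zip(times, values))


def grid_fit(model, grids, times, values):
    """Try every parameter combination; return (best_params, best_ssr)."""
    best = None
    for params in product(*grids):
        ssr = sum_squared_residuals(model, params, times, values)
        if best is None or ssr < best[1]:
            best = (params, ssr)
    return best


# Synthetic, noise-free observations (arbitrary units), NOT real census data.
times = list(range(0, 70, 5))
values = [logistic(t, 150.0, 0.03, 500.0) for t in times]

p0 = values[0]  # fix the initial population at the first observation
r_grid = [i / 1000 for i in range(1, 61)]          # growth rates 0.001..0.060
k_grid = [float(k) for k in range(300, 801, 50)]   # carrying capacities

malthus_params, malthus_ssr = grid_fit(malthusian, [[p0], r_grid], times, values)
logistic_params, logistic_ssr = grid_fit(logistic, [[p0], r_grid, k_grid], times, values)
better_model = "logistic" if logistic_ssr < malthus_ssr else "malthusian"
```

Because the synthetic data were generated from a logistic curve, the logistic model wins here by construction; the point is only to show how the two sums of squared residuals are compared.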
A Novel Model to Predict a Book's Success in the New York Times Best Sellers List
In this article, the authors identify the characteristics that make a book a best-seller. Knowing what, besides content, predicts the success of a book can help publishers maximize the success of their print products.
Genetic algorithm based features selection for predicting the unemployment rate of India
The authors used genetic algorithms to examine the Indian labor market and identify which features best explain the variation in the unemployment rate. They found that features such as economic growth and household consumption, among others, best explained this variation.