Browse Articles

Similarity Graph-Based Semi-supervised Methods for Multiclass Data Classification

Balaji et al. | Sep 11, 2021

The purpose of the study was to determine whether graph-based machine learning techniques, which have become more prevalent in the last few years, can accurately classify data into one of many classes while requiring less labeled training data and parameter tuning than traditional machine learning algorithms. The results showed that the accuracy of both graph-based and traditional classification algorithms depends directly on the number of features in each dataset, the number of classes in each dataset, and the amount of labeled training data used.
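To make the idea of graph-based semi-supervised classification concrete, the following is a minimal sketch of generic label propagation over a Gaussian similarity graph. It is not the method used in the study, which the summary does not specify; the function name `label_propagation`, the `sigma` bandwidth, and the toy points are all assumptions chosen for illustration.

```python
import math

def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=100):
    """Spread known labels to unlabeled points over a similarity graph.

    points: list of coordinate tuples.
    labels: list with a class index for labeled points, None otherwise.
    Illustrative sketch only; not the algorithm from the study.
    """
    n = len(points)
    # Gaussian similarity weights between every pair of distinct points.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # One score per class per point; labeled points start as one-hot.
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0

    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0.0:
                # Labeled points stay clamped to their known class.
                updated.append(scores[i][:])
                continue
            # Unlabeled points take the weighted average of their neighbors.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = updated

    return [max(range(n_classes), key=lambda c: scores[i][c]) for i in range(n)]

# Hypothetical toy data: three well-separated groups, one labeled point each.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3),
          (5, 5), (5.2, 4.9), (4.8, 5.1),
          (0, 5), (0.1, 5.2), (0.3, 4.8)]
labels = [0, None, None, 1, None, None, 2, None, None]
predicted = label_propagation(points, labels, n_classes=3)
```

In this toy case only three of the nine points carry labels, which mirrors the motivation of needing less labeled training data; how well such an approach holds up on real data depends on the factors the study names.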

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Image credit: Yang and Stanley 2024

Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would show variable expression between smokers and non-smokers. To test whether gene expression varies between smokers and non-smokers, we analyzed two publicly available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in the expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for logistic regression and random forest classification models. The random forest classifier trained on lung tissue data showed the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data performed worse, with AUC values consistently between 0.65 and 0.70 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
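For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from classifier scores: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical toy scores, not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of 0.82 to 0.93 read as considerably stronger than 0.65 to 0.70.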

Model selection and optimization for poverty prediction on household data from Cambodia

Wong et al. | Sep 29, 2023

Image credit: Paul Szewczyk

Here the authors sought to use three machine learning models to predict poverty levels in Cambodia based on available household data. They found that the multilayer perceptron outperformed the other models, with an accuracy of 87%. They suggest that data-driven approaches such as these could be used to more effectively target and alleviate poverty.

The characterization of quorum sensing trajectories of Vibrio fischeri using longitudinal data analytics

Abdel-Azim et al. | Dec 16, 2023

Quorum sensing (QS) is the process in which bacteria recognize and respond to the surrounding cell density, and it can be inhibited by certain antimicrobial substances. This study showed that illumination intensity data is insufficient for evaluating QS activity without proper statistical modeling. It concluded that modeling illumination intensity through time provides a more accurate evaluation of QS activity than conventional cross-sectional analysis.

Color photometry and light curve modeling of apparent transient 2023jri

Favretto et al. | Aug 13, 2024

Observing transients such as supernovae, which have short-lived brightness variations, helps astronomers understand cosmic phenomena. This study analyzed transient 2023jri, hypothesizing that it was a Type IIb supernova. By collecting and analyzing data over four weeks, including light and color curves, the authors confirmed its classification and provided additional insights into this less-studied supernova type.
