Every year, around 40% of undergraduate students in the United States discontinue their studies, resulting in a loss of valuable education for students and a loss of money for colleges. Even so, colleges across the nation struggle to discover the underlying causes of these high dropout rates. In this paper, the authors discuss the use of machine learning to find correlations between the built environment factors and the retention rates of colleges. They hypothesized that one way for colleges to improve their retention rates could be to improve the physical characteristics of their campus to be more pleasing. The authors used image classification techniques to look at images of colleges and correlate certain features like colors, cars, and people to higher or lower retention rates. With three possible options of high, medium, and low retention rates, the probability that their models reached the right conclusion if they simply chose randomly was 33%. After finding that this 33%, or 0.33 mark, always fell outside of the 99% confidence intervals built around their models’ accuracies, the authors concluded that their machine learning techniques can be used to find correlations between certain environmental factors and retention rates.
Read More...Browse Articles
Machine learning-based enzyme engineering of PETase for improved efficiency in plastic degradation
Here, recognizing the recognizing the growing threat of non-biodegradable plastic waste, the authors investigated the ability to use a modified enzyme identified in bacteria to decompose polyethylene terephthalate (PET). They used simulations to screen and identify an optimized enzyme based on machine learning models. Ultimately, they identified a potential mutant PETases capable of decomposing PET with improved thermal stability.
Read More...Hybrid Quantum-Classical Generative Adversarial Network for synthesizing chemically feasible molecules
Current drug discovery processes can cost billions of dollars and usually take five to ten years. People have been researching and implementing various computational approaches to search for molecules and compounds from the chemical space, which can be on the order of 1060 molecules. One solution involves deep generative models, which are artificial intelligence models that learn from nonlinear data by modeling the probability distribution of chemical structures and creating similar data points from the trends it identifies. Aiming for faster runtime and greater robustness when analyzing high-dimensional data, we designed and implemented a Hybrid Quantum-Classical Generative Adversarial Network (QGAN) to synthesize molecules.
Read More...Risk assessment modeling for childhood stunting using automated machine learning and demographic analysis
Over the last few decades, childhood stunting has persisted as a major global challenge. This study hypothesized that TPTO (Tree-based Pipeline Optimization Tool), an AutoML (automated machine learning) tool, would outperform all pre-existing machine learning models and reveal the positive impact of economic prosperity, strong familial traits, and resource attainability on reducing stunting risk. Feature correlation plots revealed that maternal height, wealth indicators, and parental education were universally important features for determining stunting outcomes approximately two years after birth. These results help inform future research by highlighting how demographic, familial, and socio-economic conditions influence stunting and providing medical professionals with a deployable risk assessment tool for predicting childhood stunting.
Read More...The Role of a Mask - Understanding the Performance of Deep Neural Networks to Detect, Segment, and Extract Cellular Nuclei from Microscopy Images
Cell segmentation is the task of identifying cell nuclei instances in fluorescence microscopy images. The goal of this paper is to benchmark the performance of representative deep learning techniques for cell nuclei segmentation using standard datasets and common evaluation criteria. This research establishes an important baseline for cell nuclei segmentation, enabling researchers to continually refine and deploy neural models for real-world clinical applications.
Read More...Can the attributes of an app predict its rating?
In this article the authors looked at different attributes of apps within the Google Play store to determine how those may impact the overall app rating out of five stars. They found that review count, amount of storage needed and when the app was last updated to be the most influential factors on an app's rating.
Read More...Part of speech distributions for Grimm versus artificially generated fairy tales
Here, the authors wanted to explore mathematical paradoxes in which there are multiple contradictory interpretations or analyses for a problem. They used ChatGPT to generate a novel dataset of fairy tales. They found statistical differences between the artificially generated text and human produced text based on the distribution of parts of speech elements.
Read More...Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease
Despite the prevalence of PD, diagnosing PD is expensive, requires specialized testing, and is often inaccurate. Moreover, diagnosis is often made late in the disease course when treatments are less effective. Using existing voice data from patients with PD and healthy controls, the authors created and trained two different algorithms: one using logistic regression and another employing an artificial neural network (ANN).
Read More...Applying centrality analysis on a protein interaction network to predict colorectal cancer driver genes
In this article the authors created an interaction map of proteins involved in colorectal cancer to look for driver vs. non-driver genes. That is they wanted to see if they could determine what genes are more likely to drive the development and progression in colorectal cancer and which are present in altered states but not necessarily driving disease progression.
Read More...Comparison of the ease of use and accuracy of two machine learning algorithms – forestry case study
Machine learning algorithms are becoming increasingly popular for data crunching across a vast area of scientific disciplines. Here, the authors compare two machine learning algorithms with respect to accuracy and user-friendliness and find that random forest algorithms outperform logistic regression when applied to the same dataset.
Read More...