Browse Articles

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

Image credit: Anand and Sampath 2025

The Western Ghats mountain chain on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Because vocalizations are distinctive to each frog species, they can be used for species recognition. Manually surveying frogs at night, in the rain, in forests inhabited by elephants and big cats is difficult, so the ability to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area, but the available dataset of frog vocalizations from this region has only a minimal number of audio recordings per species. Enhancing an ML model's performance with limited data is therefore necessary. The goal of this study was to assess data augmentation techniques on this dataset. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation), individually and in combination, on the frog vocalization data and on the public environmental sounds dataset (ESC-50). The relative accuracy gain from combining the augmentation techniques grew as the size of the dataset decreased. The combination of all four techniques improved the ML model's classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach that maximizes classification accuracy with sparse frog call recordings, making it possible to build a real-world automated field system for frog species identification. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.
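As a rough illustration of two of the four techniques named above, Time Shifting and Noise Injection can be sketched on a raw waveform as follows. This is a minimal sketch and not the authors' implementation: the function names `time_shift` and `inject_noise`, the circular shift, the Gaussian noise model, and the toy tone are all assumptions made for illustration.

```python
import math
import random


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples (positive = later)."""
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def inject_noise(samples, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio (dB)."""
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


# Toy stand-in for a call: a 440 Hz tone at 8 kHz (illustrative values only).
rate = 8000
call = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
augmented = inject_noise(time_shift(call, 100), snr_db=20, rng=rng)
```

Each augmented copy keeps the original label, so a small set of recordings can yield many slightly different training examples.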

Validating DTAPs with large language models: A novel approach to drug repurposing

Curtis et al. | Mar 02, 2025

Image credit: Growtika

Here, the authors investigated the integration of large language models (LLMs) with drug target affinity predictors (DTAPs) to improve drug repurposing, demonstrating a significant increase in prediction accuracy, particularly with GPT-4, for psychotropic drugs and the sigma-1 receptor. This novel approach could accelerate drug discovery and reduce its cost by efficiently identifying new therapeutic uses for existing drugs.

Determining viability of image processing models for forensic analysis of hair for related individuals

Wang et al. | Feb 04, 2025

Image credit: Taylor Smith

Here, the authors used machine learning to analyze microscopic images of hair, quantifying various features to distinguish individuals, even within families where traditional DNA analysis is limited. The Discriminant Analysis (DA) model achieved the highest accuracy (88.89%) in identifying individuals, demonstrating its potential to improve the reliability of hair evidence in forensic investigations.

Advancing pediatric cancer predictions through generative artificial intelligence and machine learning

Yadav et al. | Dec 21, 2024

Pediatric cancers pose unique challenges due to their rarity and distinct biological factors, emphasizing the need for accurate survival prediction to guide treatment. This study integrated generative AI and machine learning, including synthetic data, to analyze data from 9,184 pediatric cancer patients, identifying age at diagnosis, cancer type, and anatomical site as significant survival predictors. The findings highlight the potential of AI-driven approaches to improve survival prediction and inform personalized treatment strategies, with broader implications for innovative healthcare applications.

Survival analysis in cardiovascular epidemiology: nexus between heart disease and mortality

Lachwani et al. | Oct 23, 2024

In 2021, over 20 million people died from cardiovascular diseases, highlighting the need for a deeper understanding of factors influencing heart failure outcomes. This study examined multiple variables affecting mortality after heart failure, using random forest models to identify time, serum creatinine, and ejection fraction as key predictors. These findings could contribute to personalized medicine, improving survival rates by tailoring treatment strategies for heart failure patients.

Investigating ecosystem resiliency in different flood zones of south Brooklyn, New York

Ng et al. | Mar 23, 2024

Image credit: Ng and Zheng et al 2024

With climate change and rising sea levels, south Brooklyn is exposed to massive flooding and intense precipitation. Previous research discovered that flooding shifts plant species distribution, decreases soil pH, and increases salt concentration, nitrogen, phosphorus, and potassium levels. The authors predicted a decreasing trend from Zone 1 to 6: high-pH, high-salt, and high-nutrient soil in more flood-prone areas to low-pH, low-salt, and low-nutrient soil in less flood-prone regions. They performed DNA barcoding to identify plant species inhabiting the flood zones, expecting plant salt tolerance and moisture uptake to decrease from Zone 1 to 6. Furthermore, they predicted an increase in invasive species, ultimately resulting in a decrease in biodiversity. After barcoding, they researched existing information regarding each species' invasiveness, ideal soil, pH tolerance, and salt tolerance. They performed soil analyses to identify pH, nitrogen (N), phosphorus (P), and potassium (K) levels. For N and P levels, they discovered a general decreasing trend from Zone 1 to 6, with low and moderate statistical significance, respectively. Previous studies found that soil moisture can increase N and P uptake, helping plants adopt efficient resource-use strategies and reduce water stress from flooding. Although characteristics of plants were distributed throughout all zones, demonstrating overall diversity, the soil analyses hinted at the possibility of a rising trend of plants adapting to the increase in flooding. More expansive future research is needed to map these trends comprehensively. Ultimately, investigating trends between flood zones and the prevalence of different species will help guide solutions for weathering climate change and protecting biodiversity in Brooklyn.

Predicting baseball pitcher efficacy using physical pitch characteristics

Oberoi et al. | Jan 11, 2024

Image credit: Antoine Schibler

Here, the authors sought to develop a new metric to evaluate the efficacy of baseball pitchers using machine learning models. They found that the frequency of balls was the most predictive feature for their walks/hits allowed per inning (WHIP) metric. While their machine learning models did not identify a defining trait, such as high velocity, spin rate, or types of pitches, they found that consistently pitching within the strike zone resulted in significantly lower WHIPs.
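For readers unfamiliar with the metric, WHIP is conventionally computed as walks plus hits divided by innings pitched. The sketch below assumes that standard definition. The function name `whip` and the choice to pass outs recorded (three per inning) are illustrative assumptions, not taken from the paper.

```python
def whip(walks, hits, outs_recorded):
    """Walks plus hits per inning pitched, with innings = outs recorded / 3."""
    if outs_recorded <= 0:
        raise ValueError("outs_recorded must be positive")
    innings = outs_recorded / 3
    return (walks + hits) / innings


# Hypothetical outing: 2 walks and 5 hits over 7 innings (21 outs).
example = whip(walks=2, hits=5, outs_recorded=21)
```

A lower value means fewer baserunners allowed per inning, which is why a low frequency of balls would be expected to lower WHIP through fewer walks.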

Developing anticholinergic drugs for the treatment of asthma with improved efficacy

Wong et al. | Jul 05, 2023

Image credit: Wong et al.

Anticholinergics are used in treating asthma, a chronic inflammation of the airways. These drugs block human M1 and M2 muscarinic acetylcholine receptors, inhibiting bronchoconstriction. However, studies have reported complications of anticholinergic usage, such as exacerbated eosinophil production and worsened urinary retention. Modification of known anticholinergics using bioisosteric replacements to increase efficacy could potentially minimize these complications. The present study focuses on identifying viable analogs of anticholinergics to improve binding energy to the receptors compared to current treatment options. Glycopyrrolate (G), ipratropium (IB), and tiotropium bromide (TB) were chosen as parent drugs of interest, due to the presence of common functional groups within the molecules, specifically esters and alcohols. Docking score analysis via AutoDock Vina was used to evaluate the binding energy between drug analogs and the muscarinic acetylcholine receptors. The final results suggest that G-A3, IB-A3, and TB-A1 are the most viable analogs, as binding energy was improved when compared to the parent drug. G-A4, IB-A4, IB-A5, TB-A3, and TB-A4 are also potential candidates, although there were slight regressions in binding energy to both muscarinic receptors for these analogs. By researching the effects of bioisosteric replacements of current anticholinergics, it is evident that there is a potential to provide asthmatics with more effective treatment options.

Time-Efficient and Low-Cost Neural Network to detect plant disease on leaves and reduce food loss and waste

Singh et al. | Apr 24, 2023

About 25% of the food grown never reaches consumers due to spoilage, and 11.5 billion pounds of produce from gardens are wasted every year. Current solutions involve farmers manually looking for and treating diseased crops. These methods of tending crops are neither time-efficient nor feasible. I used a convolutional neural network to identify signs of plant disease on leaves for garden owners and farmers.

Differential privacy in machine learning for traffic forecasting

Vinay et al. | Dec 21, 2022

In this paper, we measured the privacy budgets and utilities of different differentially private mechanisms combined with different machine learning models that forecast traffic congestion at future timestamps. We expected artificial neural networks (ANNs) combined with the Staircase mechanism to perform best at every value in the privacy budget range, especially at medium-to-high values of the privacy budget. In this study, we used the Autoregressive Integrated Moving Average (ARIMA) and neural network models to forecast and then added differentially private Laplacian, Gaussian, and Staircase noise to our datasets. We tested two real traffic congestion datasets, experimented with the different models, and examined their utility for different privacy budgets. We found that a favorable combination for this application was neural networks with the Staircase mechanism. Our findings identify the optimal models for difficult time series forecasting problems and can be used in non-traffic applications like disease tracking and population growth.
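To make the noise-adding step concrete, the Laplacian mechanism can be sketched as below. This is a minimal sketch of the standard Laplace mechanism, which uses noise scale = sensitivity / privacy budget, and not the authors' code. The names `laplace_noise` and `privatize`, the sensitivity of 1.0, and the sample readings are hypothetical. The sampler relies on the fact that the difference of two independent exponential draws with mean `scale` is Laplace-distributed with that scale.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical congestion readings; a smaller epsilon means more noise.
readings = [12.0, 15.5, 30.2, 28.9]
rng = random.Random(42)  # fixed seed for reproducibility
noisy = privatize(readings, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Lowering `epsilon` strengthens the privacy guarantee but degrades forecast utility, which is the trade-off the study measures across mechanisms.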
