Browse Articles

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls
Image credit: Anand and Sampath 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Distinctive to each frog species, their vocalizations can be used for species recognition. Manually surveying frogs at night during the rain in elephant and big cat forests is difficult, so being able to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area. The goal of this study was to assess data augmentation techniques on a dataset of frog vocalizations from this region, which has a minimal number of audio recordings per species. Consequently, enhancing an ML model’s performance with limited data is necessary. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation) individually and their combined effect on the frog vocalization data and the public environmental sounds dataset (ESC-50). The effect of combined data augmentation techniques improved the model's relative accuracy as the size of the dataset decreased. The combination of all four techniques improved the ML model’s classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize the classification accuracy with sparse data of frog call recordings, thereby creating a possibility to build a real-world automated field frog species identifier system. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.

Read More...

Can the nucleotide content of a DNA sequence predict the sequence accessibility?

Balachandran et al. | Mar 10, 2023

Can the nucleotide content of a DNA sequence predict the sequence accessibility?
Image credit: Warren Umoh

Sequence accessibility is an important factor affecting gene expression. Sequence accessibility or openness impacts the likelihood that a gene is transcribed and translated into a protein and performs functions and manifests traits. There are many potential factors that affect the accessibility of a gene. In this study, our hypothesis was that the content of nucleotides in a genetic sequence predicts its accessibility. Using a machine learning linear regression model, we studied the relationship between nucleotide content and accessibility.

Read More...

Predicting asthma-related emergency department visits and hospitalizations with machine learning techniques

Chatterjee et al. | Oct 25, 2021

Predicting asthma-related emergency department visits and hospitalizations with machine learning techniques

Seeking to investigate the effects of ambient pollutants on human respiratory health, here the authors used machine learning to examine asthma in Lost Angeles County, an area with substantial pollution. By using machine learning models and classification techniques, the authors identified that nitrogen dioxide and ozone levels were significantly correlated with asthma hospitalizations. Based on an identified seasonal surge in asthma hospitalizations, the authors suggest future directions to improve machine learning modeling to investigate these relationships.

Read More...

Groundwater prediction using artificial intelligence: Case study for Texas aquifers

Sharma et al. | Apr 19, 2024

Groundwater prediction using artificial intelligence: Case study for Texas aquifers

Here, in an effort to develop a model to predict future groundwater levels, the authors tested a tree-based automated artificial intelligence (AI) model against other methods. Through their analysis they found that groundwater levels in Texas aquifers are down significantly, and found that tree-based AI models most accurately predicted future levels.

Read More...

The impact of genetic analysis on the early detection of colorectal cancer

Agrawal et al. | Aug 24, 2023

The impact of genetic analysis on the early detection of colorectal cancer

Although the 5-year survival rate for colorectal cancer is below 10%, it increases to greater than 90% if it is diagnosed early. We hypothesized from our research that analyzing non-synonymous single nucleotide variants (SNVs) in a patient's exome sequence would be an indicator for high genetic risk of developing colorectal cancer.

Read More...