Browse Articles

Propagation of representation bias in machine learning

Dass-Vattam et al. | Jun 10, 2021

Using facial recognition as a use case, we attempt to identify sources of bias in a model developed using transfer learning. To do this, we built a model on top of a pre-trained facial recognition model and evaluated its image classification accuracy across age, gender, and race to determine whether the model performed better on some demographic groups than on others. By identifying this bias and its potential sources, this work contributes a unique technical perspective, from the view of a small-scale developer, to emerging discussions of accountability and transparency in AI.
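
Checking whether a model performs better on some demographic groups than on others amounts to disaggregating accuracy by group. The sketch below is a minimal Python illustration, assuming you already have a true label, a predicted label, and a demographic group label for each test image. The function names `accuracy_by_group` and `accuracy_gap` and the toy data are hypothetical; they are not the authors' code or results.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each demographic group label."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest accuracy difference between any two groups (0.0 means parity)."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy, made-up identities and predictions for illustration only (not data from the study).
y_true = ["a", "b", "c", "d", "e", "f", "g", "h"]
y_pred = ["a", "b", "c", "x", "e", "y", "z", "h"]
groups = ["18-30", "18-30", "18-30", "18-30", "60+", "60+", "60+", "60+"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)                # {'18-30': 0.75, '60+': 0.5}
print(accuracy_gap(per_group))  # 0.25
```

In this toy example the overall accuracy is 62.5%, which hides the fact that one group is classified correctly 75% of the time and the other only 50% of the time; reporting per-group accuracy keeps such gaps from being averaged away.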

Monitoring drought using explainable statistical machine learning models

Cheung et al. | Oct 28, 2024

Droughts have a wide range of effects, from failing ecosystems and dying crops to increased illness and decreased water quality. Drought prediction is important because it can help communities, businesses, and governments plan and prepare for these detrimental effects. This study predicts drought conditions by using predictable weather patterns as inputs to machine learning models.

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

Image credit: Anand and Sampath 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Each species' vocalizations are distinctive, so they can be used for species recognition. Manually surveying frogs at night during the rain in forests inhabited by elephants and big cats is difficult, so the ability to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area, yet the available dataset of frog vocalizations from this region has only a minimal number of audio recordings per species. Consequently, enhancing an ML model's performance with limited data is necessary. The goal of this study was to assess data augmentation techniques on this dataset. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation), both individually and combined, on the frog vocalization data and on the public environmental sounds dataset (ESC-50). The relative accuracy gain from combining data augmentation techniques grew as the size of the dataset decreased. The combination of all four techniques improved the ML model's classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize classification accuracy with sparse frog call recordings, creating the possibility of building a real-world automated system for identifying frog species in the field. Such a system could significantly help conserve frog species in this vital biodiversity hotspot.
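
The four augmentation techniques named above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the study's pipeline: "Spectral Augmentation" is assumed here to mean SpecAugment-style frequency and time masking, the shift, noise, and mask sizes are arbitrary, `toy_predict` is a hypothetical stand-in for a trained classifier, and the signals are synthetic rather than frog recordings.

```python
import numpy as np

# Seeded generator so the random augmentations are reproducible.
rng = np.random.default_rng(0)


def time_shift(wave, max_shift):
    """Time shifting: circularly roll a 1-D waveform by a random offset in [-max_shift, max_shift] samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(wave, shift)


def inject_noise(wave, noise_level):
    """Noise injection: add Gaussian noise scaled to noise_level times the signal's standard deviation."""
    return wave + noise_level * np.std(wave) * rng.standard_normal(wave.shape)


def spec_mask(spec, max_freq_width, max_time_width):
    """Spectral augmentation (assumed SpecAugment-style): zero one random band of
    frequency bins and one random run of time frames in a (freq, time) spectrogram."""
    out = spec.copy()  # leave the caller's spectrogram untouched
    n_freq, n_time = out.shape
    f = int(rng.integers(0, max_freq_width + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time_width + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


def tta_predict(predict_fn, wave, n_views, max_shift):
    """Test-time augmentation: average class probabilities over the original
    input plus (n_views - 1) randomly time-shifted copies."""
    views = [wave] + [time_shift(wave, max_shift) for _ in range(n_views - 1)]
    probs = np.stack([predict_fn(v) for v in views])
    return probs.mean(axis=0)


def toy_predict(wave):
    """Hypothetical stand-in for a trained classifier: returns 2-class probabilities."""
    p = 1.0 / (1.0 + np.exp(-wave.mean()))
    return np.array([p, 1.0 - p])


# Synthetic inputs for illustration only (not frog recordings).
wave = np.sin(np.linspace(0, 20 * np.pi, 16000))
spec = np.abs(rng.standard_normal((64, 100)))  # placeholder magnitude "spectrogram"

shifted = time_shift(wave, max_shift=1600)
noisy = inject_noise(wave, noise_level=0.1)
masked = spec_mask(spec, max_freq_width=8, max_time_width=10)
probs = tta_predict(toy_predict, wave, n_views=5, max_shift=1600)
```

The first three functions would be applied at training time, while test-time augmentation only changes how predictions are made. When combining techniques, one natural ordering is to apply the waveform-level augmentations before computing a spectrogram and the spectral masking after it.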

Model selection and optimization for poverty prediction on household data from Cambodia

Wong et al. | Sep 29, 2023

Image credit: Paul Szewczyk

Here the authors sought to use three machine learning models to predict poverty levels in Cambodia based on available household data. They found that a multilayer perceptron outperformed the other models, with an accuracy of 87%. They suggest that data-driven approaches such as these could be used to more effectively target and alleviate poverty.

Assessing machine learning model efficacy for brain tumor MRI classification: a multi-model approach

Dhingra et al. | Mar 14, 2026

Image credit: Dhingra and Dhingra

This manuscript explores the performance of five different machine learning models in classifying brain tumors from a dataset of MRI scans. The authors find that several of the models showed >90% accuracy. Thus, the authors suggest that machine learning models demonstrate potential for effective implementation in clinical settings, including as a diagnostic tool that can be used to complement the expertise of neuroradiologists.

Deep learning for pulsar detection: Investigating hyperparameter effects on TensorFlow classification accuracy

Upadhyay et al. | Jan 31, 2026

This study investigates how two hyperparameters, the number of epochs and the batch size, affect the classification accuracy of a convolutional neural network (CNN) trained on pulsar candidate data. Our results reveal that accuracy improves with more epochs and with smaller batch sizes, suggesting that with optimized hyperparameters, high accuracy may be achievable with minimal training. These findings could help create more efficient machine learning classification models for pulsar signal detection, potentially accelerating pulsar discovery and advancing astrophysical research.
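
The two hyperparameters interact: for a fixed dataset, a smaller batch size means more weight updates per epoch. The sketch below shows where `epochs` and `batch_size` enter a mini-batch training loop (mirroring the arguments of the same name in Keras's `Model.fit`). It is a minimal illustration under loud assumptions: a hypothetical logistic-regression model and synthetic data stand in for the study's TensorFlow CNN and pulsar candidates, and the update-count argument is one plausible factor, not an explanation the abstract itself gives.

```python
import math

import numpy as np


def updates_per_run(n_samples, batch_size, epochs):
    """Total weight updates: one per mini-batch (the last may be partial), for every epoch."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return epochs * steps_per_epoch


def train_logreg(X, y, epochs, batch_size, lr=0.1, seed=0):
    """Mini-batch gradient descent on a logistic-regression model (stand-in for a CNN).

    Returns the learned weights, bias, and the number of updates performed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    updates = 0
    for _ in range(epochs):                    # one epoch = one full pass over the data
        order = rng.permutation(n)             # reshuffle the samples every epoch
        for start in range(0, n, batch_size):  # split the pass into mini-batches
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-(xb @ w + b)))  # predicted probabilities
            err = p - yb                             # log-loss gradient w.r.t. the logits
            w -= lr * xb.T @ err / len(idx)
            b -= lr * err.mean()
            updates += 1
    return w, b, updates


# Synthetic, linearly separable data (made up; not pulsar candidates).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

for batch_size in (32, 128):
    w, b, updates = train_logreg(X, y, epochs=5, batch_size=batch_size)
    acc = np.mean(((X @ w + b) > 0) == (y == 1))
    print(f"batch_size={batch_size}: {updates} updates, training accuracy {acc:.3f}")
```

With 1,000 samples and 5 epochs, a batch size of 32 yields 160 updates versus 40 for a batch size of 128, so the smaller batch performs four times as many weight updates for the same number of passes over the data.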