Browse Articles

Machine learning for retinopathy prediction: Unveiling the importance of age and HbA1c with XGBoost

Ramachandran et al. | Sep 05, 2024

Machine learning for retinopathy prediction: Unveiling the importance of age and HbA1c with XGBoost

The purpose of our study was to examine the correlation of glycosylated hemoglobin (HbA1c), blood pressure (BP) readings, and lipid levels with retinopathy. Our main hypothesis was that poor glycemic control, as evident by high HbA1c levels, high blood pressure, and abnormal lipid levels, causes an increased risk of retinopathy. We identified the top two features that were most important to the model as age and HbA1c. This indicates that older patients with poor glycemic control are more likely to show presence of retinopathy.

Read More...

Propagation of representation bias in machine learning

Dass-Vattam et al. | Jun 10, 2021

Propagation of representation bias in machine learning

Using facial recognition as a use-case scenario, we attempt to identify sources of bias in a model developed using transfer learning. To achieve this task, we developed a model based on a pre-trained facial recognition model, and scrutinized the accuracy of the model’s image classification against factors such as age, gender, and race to observe whether or not the model performed better on some demographic groups than others. By identifying the bias and finding potential sources of bias, his work contributes a unique technical perspective from the view of a small scale developer to emerging discussions of accountability and transparency in AI.

Read More...

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls
Image credit: Anand and Sampath 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Distinctive to each frog species, their vocalizations can be used for species recognition. Manually surveying frogs at night during the rain in elephant and big cat forests is difficult, so being able to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area. The goal of this study was to assess data augmentation techniques on a dataset of frog vocalizations from this region, which has a minimal number of audio recordings per species. Consequently, enhancing an ML model’s performance with limited data is necessary. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation) individually and their combined effect on the frog vocalization data and the public environmental sounds dataset (ESC-50). The effect of combined data augmentation techniques improved the model's relative accuracy as the size of the dataset decreased. The combination of all four techniques improved the ML model’s classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize the classification accuracy with sparse data of frog call recordings, thereby creating a possibility to build a real-world automated field frog species identifier system. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.

Read More...

Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease

Kar et al. | Oct 10, 2020

Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease

Despite the prevalence of PD, diagnosing PD is expensive, requires specialized testing, and is often inaccurate. Moreover, diagnosis is often made late in the disease course when treatments are less effective. Using existing voice data from patients with PD and healthy controls, the authors created and trained two different algorithms: one using logistic regression and another employing an artificial neural network (ANN).

Read More...

A novel encoding technique to improve non-weather-based models for solar photovoltaic forecasting

Ahmed et al. | Jun 09, 2023

A novel encoding technique to improve non-weather-based models for solar photovoltaic forecasting

Several studies have applied different machine learning (ML) techniques to the area of forecasting solar photovoltaic power production. Most of these studies use weather data as inputs to predict power production; however, there are numerous practical issues with the procurement of this data. This study proposes models that do not use weather data as inputs, but rather use past power production data as a more practical substitute to weather-based models. Our proposed models demonstrate a better, cheaper, and more reliable alternatives to current weather models.

Read More...

Determining viability of image processing models for forensic analysis of hair for related individuals

Wang et al. | Feb 04, 2025

Determining viability of image processing models for forensic analysis of hair for related individuals
Image credit: Taylor Smith

Here, the authors used machine learning to analyze microscopic images of hair, quantifying various features to distinguish individuals, even within families where traditional DNA analysis is limited. The Discriminant Analysis (DA) model achieved the highest accuracy (88.89%) in identifying individuals, demonstrating its potential to improve the reliability of hair evidence in forensic investigations.

Read More...