Articles | Journal of Emerging Investigators

A land use regression model to predict emissions from oil and gas production using machine learning

Cao et al. | Mar 24, 2023

Emissions from oil and natural gas (O&G) wells such as nitrogen dioxide (NO₂), volatile organic compounds (VOCs), and ozone (O₃) can severely impact the health of communities located near wells. In this study, we used O&G activity and wind-carried emissions to quantify the extent to which O&G wells affect the air quality of nearby communities, revealing that NO₂, NO_x, and NO are correlated to O&G activity. We then developed a novel land use regression (LUR) model using machine learning based on O&G prevalence to predict emissions.

Transfer Learning for Small and Different Datasets: Fine-Tuning A Pre-Trained Model Affects Performance

Gupta et al. | Oct 18, 2020

In this study, the authors seek to improve a machine learning algorithm used for image classification: identifying male and female images. In addition to fine-tuning the classification model, they investigate how accuracy is affected by their changes (an important task when developing and updating algorithms). To determine accuracy, a set of images is used to train the model and then a separate set of images is used for validation. They found that the validation accuracy was close to the training accuracy. This study contributes to the expanding areas of machine learning and its applications to image identification.

Machine learning predictions of additively manufactured alloy crack susceptibilities

Gowda et al. | Nov 12, 2024

Additive manufacturing (AM) is transforming the production of complex metal parts, but challenges like internal cracking can arise, particularly in critical sectors such as aerospace and automotive. Traditional methods to assess cracking susceptibility are costly and time-consuming, prompting the use of machine learning (ML) for more efficient predictions. This study developed a multi-model ML pipeline that predicts solidification cracking susceptibility (SCS) more accurately by considering secondary alloy properties alongside composition, with Random Forest models showing the best performance, highlighting a promising direction for future research into SCS quantification.

Machine learning for retinopathy prediction: Unveiling the importance of age and HbA1c with XGBoost

Ramachandran et al. | Sep 05, 2024

The purpose of our study was to examine the correlation of glycosylated hemoglobin (HbA1c), blood pressure (BP) readings, and lipid levels with retinopathy. Our main hypothesis was that poor glycemic control, as evident by high HbA1c levels, high blood pressure, and abnormal lipid levels, causes an increased risk of retinopathy. We identified the top two features that were most important to the model as age and HbA1c. This indicates that older patients with poor glycemic control are more likely to show presence of retinopathy.

Propagation of representation bias in machine learning

Dass-Vattam et al. | Jun 10, 2021

Using facial recognition as a use-case scenario, we attempt to identify sources of bias in a model developed using transfer learning. To achieve this task, we developed a model based on a pre-trained facial recognition model, and scrutinized the accuracy of the model’s image classification against factors such as age, gender, and race to observe whether or not the model performed better on some demographic groups than others. By identifying the bias and finding potential sources of bias, his work contributes a unique technical perspective from the view of a small scale developer to emerging discussions of accountability and transparency in AI.

Minimizing distortion with additive manufacturing parts using Machine Learning

Yarlagadda et al. | Jan 15, 2025

This study explores how to predict and minimize distortion in 3D printed parts, particularly when using affordable PLA filament. The researchers developed a model using a gradient boosting regressor trained on 3D printing data, aiming to predict the necessary CAD dimensions to counteract print distortion.

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Distinctive to each frog species, their vocalizations can be used for species recognition. Manually surveying frogs at night during the rain in elephant and big cat forests is difficult, so being able to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area. The goal of this study was to assess data augmentation techniques on a dataset of frog vocalizations from this region, which has a minimal number of audio recordings per species. Consequently, enhancing an ML model’s performance with limited data is necessary. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation) individually and their combined effect on the frog vocalization data and the public environmental sounds dataset (ESC-50). The effect of combined data augmentation techniques improved the model's relative accuracy as the size of the dataset decreased. The combination of all four techniques improved the ML model’s classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize the classification accuracy with sparse data of frog call recordings, thereby creating a possibility to build a real-world automated field frog species identifier system. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.

Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease

Kar et al. | Oct 10, 2020

Despite the prevalence of PD, diagnosing PD is expensive, requires specialized testing, and is often inaccurate. Moreover, diagnosis is often made late in the disease course when treatments are less effective. Using existing voice data from patients with PD and healthy controls, the authors created and trained two different algorithms: one using logistic regression and another employing an artificial neural network (ANN).

Comparison of the ease of use and accuracy of two machine learning algorithms – forestry case study

Bhatia et al. | Mar 21, 2021

Machine learning algorithms are becoming increasingly popular for data crunching across a vast area of scientific disciplines. Here, the authors compare two machine learning algorithms with respect to accuracy and user-friendliness and find that random forest algorithms outperform logistic regression when applied to the same dataset.

A novel encoding technique to improve non-weather-based models for solar photovoltaic forecasting

Ahmed et al. | Jun 09, 2023

Several studies have applied different machine learning (ML) techniques to the area of forecasting solar photovoltaic power production. Most of these studies use weather data as inputs to predict power production; however, there are numerous practical issues with the procurement of this data. This study proposes models that do not use weather data as inputs, but rather use past power production data as a more practical substitute to weather-based models. Our proposed models demonstrate a better, cheaper, and more reliable alternatives to current weather models.

Browse Articles

A land use regression model to predict emissions from oil and gas production using machine learning

Transfer Learning for Small and Different Datasets: Fine-Tuning A Pre-Trained Model Affects Performance

Machine learning predictions of additively manufactured alloy crack susceptibilities

Machine learning for retinopathy prediction: Unveiling the importance of age and HbA1c with XGBoost

Propagation of representation bias in machine learning

Minimizing distortion with additive manufacturing parts using Machine Learning

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Machine Learning Algorithm Using Logistic Regression and an Artificial Neural Network (ANN) for Early Stage Detection of Parkinson’s Disease

Comparison of the ease of use and accuracy of two machine learning algorithms – forestry case study

A novel encoding technique to improve non-weather-based models for solar photovoltaic forecasting

Search Articles

Popular Tags

Browse Articles

Search Articles

Category

School Level

Popular Tags