Articles | Journal of Emerging Investigators

Using data science along with machine learning to determine the ARIMA model’s ability to adjust to irregularities in the dataset

Choudhary et al. | Jul 26, 2021

Auto-Regressive Integrated Moving Average (ARIMA) models are widely applied to time series data. This statistical model uses past time series data to forecast future trends or values, which makes it a key contributor to crime mapping algorithms. However, the models may not perform to their full potential when analyzing data that contain many different patterns. To determine the potential of ARIMA models, we tested the model on irregularities in the data. We hypothesized that the ARIMA model would be able to adapt to irregularities in the data that do not correspond to a certain trend or pattern. Using theft crime data and an ARIMA model, we evaluated the ARIMA model’s forecasts and how their accuracy differed on days with irregularities in crime.

Using broad health-related survey questions to predict the presence of coronary heart disease

Chavda et al. | Aug 23, 2024

Coronary heart disease (CHD) is the leading cause of death in the U.S., responsible for nearly 700,000 deaths in 2021, and is marked by artery clogging that can lead to heart attacks. Traditional prediction methods require expensive clinical tests, but a new study explores using machine learning on demographic, clinical, and behavioral survey data to predict CHD.

SmartZoo: A Deep Learning Framework for an IoT Platform in Animal Care

Ji et al. | Aug 07, 2024

Zoos offer educational and scientific advantages but face high maintenance costs and challenges in animal care due to diverse species' habits. Challenges include tracking animals, detecting illnesses, and creating suitable habitats. We developed a deep learning framework called SmartZoo to address these issues and enable efficient animal monitoring, condition alerts, and data aggregation. We found that the data generated by our model are closer to real data than random data are, indicating that the model can generate data that resemble real-world data.

The utilization of Artificial Intelligence in enabling the early detection of brain tumors

Haider et al. | Feb 05, 2025

AI analysis of brain scans offers promise for helping doctors diagnose brain tumors. Haider and Drosis explore this field by developing machine learning models that classify brain scans as "cancer" or "non-cancer" diagnoses.

Part of speech distributions for Grimm versus artificially generated fairy tales

Arvind et al. | Nov 16, 2024

Here, the authors wanted to explore mathematical paradoxes in which there are multiple contradictory interpretations or analyses for a problem. They used ChatGPT to generate a novel dataset of fairy tales. They found statistical differences between the artificially generated text and human-produced text based on the distribution of parts of speech.

Correlation between shutdowns and CO levels across the United States.

Gupta et al. | Dec 05, 2021

Concerns regarding the rapid spread of SARS-CoV-2 in early 2020 led company and local governmental officials in many states to ask people to work from home and avoid leaving their homes, measures commonly referred to as shutdowns. Here, the authors investigate how shutdowns affected carbon monoxide (CO) levels in 15 US states using publicly available data. Their results suggest that CO levels decreased as a result of these measures over the course of 2020, a trend that started to reverse after shutdowns ended.

Machine learning on crowd-sourced data to highlight coral disease

Narayan et al. | Jul 26, 2021

Triggered largely by the warming and pollution of oceans, corals are experiencing bleaching and a variety of diseases caused by the spread of bacteria, fungi, and viruses. Identification of bleached or diseased corals enables implementation of measures to halt or retard disease. Benthic cover analysis, a standard metric used in large databases to assess live coral cover, is insufficient as a standalone measure of reef health for identifying coral bleaching or disease. Proposed herein is a solution that couples machine learning with crowd-sourced data – images from government archives, citizen science projects, and personal images collected by tourists – to build a model capable of identifying healthy, bleached, and/or diseased coral.

Training neural networks on text data to model human emotional understanding

Sathish et al. | Mar 14, 2025

The authors train a neural network to detect text-based emotions including joy, sadness, anger, fear, love, and surprise.

The gender gap in STEM at top U.S. Universities: change over time and relationship with ranking

Kruus et al. | Jun 25, 2024

The authors address the gender disparity in STEM fields, examining changes in gender diversity across male-dominated undergraduate programs over 19 years at 24 top universities. Analyzing data from NCES IPEDS, they find that STEM remains persistently male-dominated but note increasing gender diversity in many disciplines, particularly in recent years. Results indicate that, in disciplines like computer science and mechanical engineering, higher university ranking is weakly correlated with improved gender diversity, suggesting that effective initiatives can mitigate the gender gap in STEM despite ongoing challenges.

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Each frog species has distinctive vocalizations that can be used for species recognition. Manually surveying frogs at night during the rain in forests inhabited by elephants and big cats is difficult, so being able to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area, but the available dataset of frog vocalizations from this region has a minimal number of audio recordings per species, so enhancing an ML model’s performance with limited data is necessary. The goal of this study was to assess data augmentation techniques on this dataset. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation), individually and in combination, on the frog vocalization data and the public environmental sounds dataset (ESC-50). The combined data augmentation techniques improved the model’s relative accuracy as the size of the dataset decreased. The combination of all four techniques improved the ML model’s classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize the classification accuracy with sparse data of frog call recordings, thereby creating a possibility to build a real-world automated field frog species identifier system. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.
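
Two of the four techniques named in the abstract above, Time Shifting and Noise Injection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the circular-shift convention, the Gaussian noise model, and all parameter values are assumptions chosen for illustration, and a waveform is represented as a plain list of samples.

```python
import random


def time_shift(signal, shift):
    """Circularly shift a waveform by `shift` samples (positive = later).

    Circular wrapping is an assumption; padding with silence is another option.
    """
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def inject_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation `noise_std`."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment(signal, max_shift, noise_std, seed=0):
    """Apply a random time shift, then noise injection (hypothetical pipeline)."""
    rng = random.Random(seed)
    shift = rng.randint(-max_shift, max_shift)
    return inject_noise(time_shift(signal, shift), noise_std, rng)
```

Calling `augment(waveform, max_shift=100, noise_std=0.01)` with several different seeds would yield several perturbed copies of one recording, which is the general idea behind enlarging a sparse training set.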
