The purpose of the study was to determine whether graph-based machine learning techniques, which have become more prevalent in the last few years, can accurately classify data into one of many clusters while requiring less labeled training data and parameter tuning than traditional machine learning algorithms. The results showed that the accuracy of graph-based and traditional classification algorithms depends directly on the number of features in each dataset, the number of classes in each dataset, and the amount of labeled training data used.
Browse Articles
Using text embedding models as text classifiers with medical data
This article describes the classification of medical text data using vector databases and text embedding. Various large language models were used to generate this medical data for the classification task.
Collaboration beats heterogeneity: Improving federated learning-based waste classification
Based on the success of deep learning, recent works have attempted to develop a waste classification model using deep neural networks. This work presents federated learning (FL) as a solution, as it allows participants to aid in training the model using their own data. Results showed that with fewer clients, a higher participation ratio resulted in less accuracy degradation caused by data heterogeneity.
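As an illustration of the terms used in this abstract, the following is a minimal sketch of one federated averaging round, assuming the common FedAvg-style aggregation (the abstract does not state which aggregation rule the authors used). The function names `sample_clients` and `federated_average`, the client counts, and the parameter vectors are all hypothetical.

```python
import random


def sample_clients(num_clients, participation_ratio, rng):
    """Pick the clients that take part in one training round.

    participation_ratio is the fraction of all clients selected per round.
    """
    k = max(1, round(num_clients * participation_ratio))
    return sorted(rng.sample(range(num_clients), k))


def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical setup: 10 clients, 30% of them participate in this round.
rng = random.Random(0)
selected = sample_clients(num_clients=10, participation_ratio=0.3, rng=rng)

# Hypothetical locally trained parameters and dataset sizes for two clients.
global_weights = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
```

Raising `participation_ratio` means more clients contribute to each average, which is one way the effect of heterogeneous local data on the global model can be dampened.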
Transfer learning and data augmentation in osteosarcoma cancer detection
Osteosarcoma is a type of bone cancer that affects young adults and children. Early diagnosis of osteosarcoma is crucial to successful treatment. The current methods of diagnosis, which include imaging tests and biopsy, are time-consuming and prone to human error. Hence, we used deep learning to extract patterns and detect osteosarcoma from histological images. We hypothesized that the combination of two different techniques (transfer learning and data augmentation) would improve the efficacy of osteosarcoma detection in histological images. The dataset used for the study consisted of histological images for osteosarcoma and was quite imbalanced, as it contained very few images with tumors. Since transfer learning uses existing knowledge for the purpose of classification and detection, we hypothesized it would be proficient on such an imbalanced dataset. To further improve model training, we used data augmentation to include variations in the dataset. We further evaluated the efficacy of different convolutional neural network models on this task. We obtained an accuracy of 91.18% using the transfer learning model MobileNetV2 as the base model with various geometric transformations, outperforming the state-of-the-art convolutional neural network-based approach.
Predicting smoking status based on RNA sequencing data
Given an association between nicotine addiction and gene expression, we hypothesized that expression of genes commonly associated with smoking status would vary between smokers and non-smokers. To test this, we analyzed two publicly available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
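For readers unfamiliar with the AUC metric reported above, here is a minimal sketch of how it can be computed from classifier scores, using the pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The function name `roc_auc` and the labels and scores are made up for illustration; they are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    scores: predicted scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) labels and classifier scores.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.82 to 0.93 and 0.65 to 0.7 ranges in the abstract.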
Model selection and optimization for poverty prediction on household data from Cambodia
Here the authors sought to use three machine learning models to predict poverty levels in Cambodia based on available household data. They found that a multilayer perceptron outperformed the other models, with an accuracy of 87%. They suggest that data-driven approaches such as these could be used to more effectively target and alleviate poverty.
Color photometry and light curve modeling of apparent transient 2023jri
Observing transients like supernovae, which have short-lived brightness variations, helps astronomers understand cosmic phenomena. This study analyzed transient 2023jri, hypothesizing it was a Type IIb supernova. By collecting and analyzing data over four weeks, including light and color curves, the authors confirmed its classification and provided additional insights into this less-studied supernova type.
Utilizing meteorological data and machine learning to predict and reduce the spread of California wildfires
This study hypothesized that a machine learning model could accurately predict the severity of California wildfires and determine the most influential meteorological factors. It utilized a custom dataset with information from the World Weather Online API and a Kaggle dataset of wildfires in California from 2013-2020. The developed algorithms classified fires into seven categories with promising accuracy (around 55 percent). They found that higher temperatures, lower humidity, lower dew point, higher wind gusts, and higher wind speeds are the most significant contributors to the spread of a wildfire. This tool could vastly improve the efficiency and preparedness of firefighters as they deal with wildfires.
Quantitative analysis and development of alopecia areata classification frameworks
This article discusses alopecia areata, an autoimmune disorder causing sudden hair loss due to the immune system mistakenly attacking hair follicles. The article introduces the use of deep learning (DL) techniques, particularly convolutional neural networks (CNNs), for classifying images of healthy and alopecia-affected hair. The study presents a comparative analysis of newly optimized CNN models with existing ones, trained on datasets containing images of healthy and alopecia-affected hair. The Inception-Resnet-v2 model emerged as the most effective for classifying alopecia areata.
Evaluating TensorFlow image classification in classifying proton collision images for particle colliders
In this study, the authors sought to develop a more efficient particle collision classification method, with the goal of analyzing particle trajectories from large-scale particle collisions without loss of accuracy.