Browse Articles

Can the nucleotide content of a DNA sequence predict the sequence accessibility?

Balachandran et al. | Mar 10, 2023

Can the nucleotide content of a DNA sequence predict the sequence accessibility?
Image credit: Warren Umoh

Sequence accessibility is an important factor affecting gene expression. Sequence accessibility or openness impacts the likelihood that a gene is transcribed and translated into a protein and performs functions and manifests traits. There are many potential factors that affect the accessibility of a gene. In this study, our hypothesis was that the content of nucleotides in a genetic sequence predicts its accessibility. Using a machine learning linear regression model, we studied the relationship between nucleotide content and accessibility.

Read More...

Gradient boosting with temporal feature extraction for modeling keystroke log data

Barretto et al. | Oct 04, 2024

Gradient boosting with temporal feature extraction for modeling keystroke log data
Image credit: Barretto and Barretto 2024.

Although there has been great progress in the field of Natural language processing (NLP) over the last few years, particularly with the development of attention-based models, less research has contributed towards modeling keystroke log data. State of the art methods handle textual data directly and while this has produced excellent results, the time complexity and resource usage are quite high for such methods. Additionally, these methods fail to incorporate the actual writing process when assessing text and instead solely focus on the content. Therefore, we proposed a framework for modeling textual data using keystroke-based features. Such methods pay attention to how a document or response was written, rather than the final text that was produced. These features are vastly different from the kind of features extracted from raw text but reveal information that is otherwise hidden. We hypothesized that pairing efficient machine learning techniques with keystroke log information should produce results comparable to transformer techniques, models which pay more or less attention to the different components of a text sequence in a far quicker time. Transformer-based methods dominate the field of NLP currently due to the strong understanding they display of natural language. We showed that models trained on keystroke log data are capable of effectively evaluating the quality of writing and do it in a significantly shorter amount of time compared to traditional methods. This is significant as it provides a necessary fast and cheap alternative to increasingly larger and slower LLMs.

Read More...

Predicting smoking status based on RNA sequencing data

Yang et al. | Aug 30, 2024

Predicting smoking status based on RNA sequencing data
Image credit: Yang and Stanley 2024

Given an association between nicotine addiction and gene expression, we hypothesized that expression of genes commonly associated with smoking status would have variable expression between smokers and non-smokers. To test whether gene expression varies between smokers and non-smokers, we analyzed two publicly-available datasets that profiled RNA gene expression from brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We discovered statistically significant differences in expression of dozens of genes between smokers and non-smokers. To test whether gene expression can be used to predict whether a patient is a smoker or non-smoker, we used gene expression as the training data for a logistic regression or random forest classification model. The random forest classifier trained on lung tissue data showed the most robust results, with area under curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data had poorer performance, with AUC values consistently between 0.65 and 0.7 when using random forest. These results suggest gene expression can be used to predict smoking status using traditional machine learning models. Additionally, based on our random forest model, we proposed KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, coupled with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.

Read More...

SmartZoo: A Deep Learning Framework for an IoT Platform in Animal Care

Ji et al. | Aug 07, 2024

SmartZoo: A Deep Learning Framework for an IoT Platform in Animal Care

Zoos offer educational and scientific advantages but face high maintenance costs and challenges in animal care due to diverse species' habits. Challenges include tracking animals, detecting illnesses, and creating suitable habitats. We developed a deep learning framework called SmartZoo to address these issues and enable efficient animal monitoring, condition alerts, and data aggregation. We discovered that the data generated by our model is closer to real data than random data, and we were able to demonstrate that the model excels at generating data that resembles real-world data.

Read More...

Using two-stage deep learning to assist the visually impaired with currency differentiation

Nachnani et al. | Jun 02, 2024

Using two-stage deep learning to assist the visually impaired with currency differentiation
Image credit: Omer Shahzad

Here, recognizing the difficulty that visually impaired people may have differentiating United States currency, the authors sought to use artificial intelligence (AI) models to identify US currencies. With a one-stage AI they reported a test accuracy of 89%, finding that multi-level deep learning models did not provide any significant advantage over a single-level AI.

Read More...

A novel deep learning model for visibility correction of environmental factors in autonomous vehicles

Dey et al. | Oct 31, 2022

A novel deep learning model for visibility correction of environmental factors in autonomous vehicles

Intelligent vehicles utilize a combination of video-enabled object detection and radar data to traverse safely through surrounding environments. However, since the most momentary missteps in these systems can cause devastating collisions, the margin of error in the software for these systems is small. In this paper, we hypothesized that a novel object detection system that improves detection accuracy and speed of detection during adverse weather conditions would outperform industry alternatives in an average comparison.

Read More...

Predicting the factors involved in orthopedic patient hospital stay

D’Souza et al. | Dec 13, 2023

Predicting the factors involved in orthopedic patient hospital stay
Image credit: Pixabay

Long hospital stays can be stressful for the patient for many reasons. We hypothesized that age would be the greatest predictor of hospital stay among patients who underwent orthopedic surgery. Through our models, we found that severity of illness was indeed the highest factor that contributed to determining patient length of stay. The other two factors that followed were the facility that the patient was staying in and the type of procedure that they underwent.

Read More...