Although there has been great progress in the field of natural language processing (NLP) over the last few years, particularly with the development of attention-based models, less research has focused on modeling keystroke log data. State-of-the-art methods process textual data directly, and while this has produced excellent results, their time complexity and resource usage are high. Additionally, these methods assess only the content of a text and do not incorporate the writing process itself. We therefore proposed a framework for modeling textual data using keystroke-based features. Such features capture how a document or response was written rather than the final text that was produced. They differ greatly from features extracted from raw text but reveal information that is otherwise hidden. Transformer-based methods currently dominate NLP because of the strong understanding of natural language they display. We hypothesized that pairing efficient machine learning techniques with keystroke log information would produce results comparable to those of transformers, which use attention to weight the different components of a text sequence, in far less time. We showed that models trained on keystroke log data can effectively evaluate the quality of writing and do so in significantly less time than traditional methods. This is significant because it provides a fast and inexpensive alternative to increasingly large and slow large language models (LLMs).
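The summary does not list the keystroke features the authors used. As an illustration only, the sketch below assumes a hypothetical log of (timestamp, key) events and computes a few process features commonly derived from such logs: inter-key intervals, long pauses, deletion rate, and typing speed. The `KeyEvent` type, the feature names, and the 2-second pause threshold are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KeyEvent:
    time_ms: int  # hypothetical field: time the key was pressed, in milliseconds
    key: str      # hypothetical field: key label, e.g. "a", " ", "Backspace"


def keystroke_features(events, pause_ms=2000):
    """Summarize a keystroke log as a small fixed-length feature dict.

    pause_ms is an assumed threshold for what counts as a long pause."""
    if len(events) < 2:
        raise ValueError("need at least two key events")
    times = [e.time_ms for e in events]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-key intervals
    total_ms = times[-1] - times[0]
    deletions = sum(1 for e in events if e.key == "Backspace")
    return {
        "n_keys": len(events),
        "total_seconds": total_ms / 1000,
        "mean_gap_ms": mean(gaps),
        "n_long_pauses": sum(1 for g in gaps if g >= pause_ms),
        "deletion_rate": deletions / len(events),
        "keys_per_minute": len(events) / (total_ms / 60000) if total_ms else 0.0,
    }
```

Feature vectors like this are small and fixed-length, so they can be fed to lightweight classifiers or regressors, which is where the speed advantage over models that read the full text would come from.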
Using data science along with machine learning to determine the ARIMA model’s ability to adjust to irregularities in the dataset
Auto-Regressive Integrated Moving Average (ARIMA) models are widely used to analyze and forecast time series data. This statistical model uses past values of a time series to forecast future trends or values, making it a key component of crime mapping algorithms. However, ARIMA models may not perform to their full potential when the data contain many different patterns. To determine the potential of ARIMA models, we tested the model on irregularities in the data. We hypothesized that the ARIMA model would be able to adapt to irregularities in the data that do not follow a particular trend or pattern. Using theft crime data and an ARIMA model, we evaluated the model's forecasts and how their accuracy differed on days with irregular crime levels.
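The summary does not state which ARIMA order was used. To show what an ARIMA fit and forecast involve, here is a minimal dependency-free sketch of an ARIMA(1,1,0) model with a constant, estimated by conditional least squares on the first differences. The order, the function names, and the per-day error helper are assumptions; in practice a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) would normally be used instead.

```python
from statistics import mean


def fit_arima_110(series):
    """Fit d_t = c + phi * d_{t-1} + e_t, where d is the first difference
    of the series (ARIMA(1,1,0) with a constant), by least squares."""
    d = [b - a for a, b in zip(series, series[1:])]  # the "integrated" step
    x, y = d[:-1], d[1:]                              # lagged pairs for AR(1)
    if len(x) < 2:
        raise ValueError("need at least 4 observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Constant differences: a pure linear trend with no AR signal.
        return my, 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_arima_110(series, steps):
    """Forecast future levels by iterating the AR(1) on differences
    and adding each predicted difference back onto the last level."""
    c, phi = fit_arima_110(series)
    level, last_diff = series[-1], series[-1] - series[-2]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        out.append(level)
    return out


def absolute_errors(actual, predicted):
    """Per-day absolute forecast error, e.g. to compare irregular vs. regular days."""
    return [abs(a - p) for a, p in zip(actual, predicted)]
```

Comparing `absolute_errors` on days flagged as irregular against the remaining days is one simple way to quantify how well the model adapts to irregularities.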
Recognition of animal body parts via supervised learning
The application of machine learning techniques has enabled automatic annotation of behavior in video sequences, offering a promising approach for ethological studies by reducing the manual effort of annotating each video frame. Nevertheless, before relying solely on machine-generated annotations, it is essential to evaluate their accuracy to ensure that they are reliable and applicable. Although no annotation can be perfect, the error of machine-generated annotations should be comparable to the error between different human annotators. We hypothesized that machine learning supervised with adequate human annotations would be able to accurately predict the locations of body parts in video sequences. Here, we compared the quality of annotations generated by humans and by machines for the body parts of sheep walking on a treadmill. For human annotation, two annotators manually labeled six body parts of sheep in 300 frames. To generate machine annotations, we used the state-of-the-art pose estimation library DeepLabCut, trained on the frames labeled by the human annotators. As expected, the human annotations were highly consistent between annotators. Notably, the machine learning algorithm also generated accurate predictions, with errors comparable to those between humans. We also observed that abnormal annotations with high error could be corrected by applying Kalman filtering, which interpolates the trajectory of each body part over the time series and makes the annotations more robust. Our results suggest that conventional transfer learning methods can generate behavior annotations as accurate as those made by humans, showing great potential for further research.
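The summary does not give the Kalman filter configuration. The sketch below is a minimal constant-velocity Kalman filter for one coordinate of a body-part track, under stated assumptions: the tuning values `q`, `r`, and `gate` are hypothetical, the first frame must contain a measurement, and the outlier gate (skipping measurements whose innovation exceeds `gate` standard deviations so that the prediction fills the gap) is an added assumption, not a method described in the source.

```python
import math


def kalman_track(zs, dt=1.0, q=0.01, r=1.0, gate=3.0):
    """Filter one coordinate (x or y) of a body-part trajectory.

    zs: per-frame measurements; None marks a missing frame (zs[0] must be set).
    Measurements far from the prediction are treated as outliers and skipped."""
    pos, vel = zs[0], 0.0
    a, b, d = r, 0.0, 10.0  # state covariance P = [[a, b], [b, d]]
    out = [pos]
    for z in zs[1:]:
        # Predict: x = F x and P = F P F^T + q I, with F = [[1, dt], [0, 1]].
        pos += dt * vel
        a, b, d = a + 2 * dt * b + dt * dt * d + q, b + dt * d, d + q
        s = a + r  # innovation variance H P H^T + R, with H = [1, 0]
        if z is not None and abs(z - pos) <= gate * math.sqrt(s):
            k0, k1 = a / s, b / s  # Kalman gain K = P H^T / S
            innov = z - pos
            pos, vel = pos + k0 * innov, vel + k1 * innov
            # Covariance update P = (I - K H) P, written out element-wise.
            a, b, d = a * (1 - k0), b * (1 - k0), d - b * b / s
        out.append(pos)
    return out
```

This is a forward (causal) filter; a Rauch-Tung-Striebel smoother would additionally use future frames. A hard gate can also lock out genuine sudden movements if they are rejected several frames in a row, so the threshold needs tuning against the human annotations.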
Effects of Coolant Temperature on the Characteristics of Soil Cooling Curve
In this article, the authors investigate whether coolant temperature affects the cooling curves of soils with otherwise identical properties. The coolant temperature represents the environmental temperature, and the authors hypothesized that differences in this temperature would not affect the freezing temperature of soil. Their findings supported this hypothesis, providing information that helps explain how frost heaves happen and how to predict their occurrence more accurately.
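The summary does not describe how freezing temperature was read from each curve. One common convention, assumed here rather than taken from the source, is that the supercooling point is the lowest temperature reached before the first rise caused by latent heat release, and the freezing temperature is the highest temperature reached after that rise. The function name and the `min_rise` noise threshold are hypothetical.

```python
def cooling_curve_points(temps, min_rise=0.0):
    """Return (supercooling_temp, freezing_temp) from a sampled cooling curve.

    temps: temperatures in chronological order. A rise larger than min_rise
    between consecutive samples marks the release of latent heat.
    Returns None if no such rise occurs (no freezing event recorded)."""
    for i in range(len(temps) - 1):
        if temps[i + 1] - temps[i] > min_rise:
            supercooling = min(temps[: i + 1])
            freezing = max(temps[i + 1:])
            return supercooling, freezing
    return None
```

Applying this to runs at different coolant temperatures and comparing the returned freezing temperatures is one way to test the hypothesis that coolant temperature does not change the freezing point.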
Analysis of professional and amateur tennis serves using computer pose detection
The authors used computer pose detection to analyze the dynamics of tennis serves by professional and amateur athletes.
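The summary does not say which serve metrics were compared. A typical first step when analyzing pose-detection output is computing joint angles from 2D keypoints; the sketch below computes the angle at a middle keypoint (for example, the elbow between shoulder and wrist). The keypoint choice and the function name are assumptions, and no particular pose detector's output format is implied.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the segments b->a and b->c.

    a, b, c are (x, y) coordinates, e.g. shoulder, elbow and wrist."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

Computing this angle frame by frame through a serve gives a time series (for example, elbow extension) that can be compared between professional and amateur players.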
Differential privacy in machine learning for traffic forecasting
In this paper, we measured the privacy budgets and utilities of different differentially private mechanisms combined with different machine learning models that forecast traffic congestion at future timestamps. We expected artificial neural networks (ANNs) combined with the Staircase mechanism to perform best across the entire privacy budget range, especially at medium-to-high privacy budgets. We used Autoregressive Integrated Moving Average (ARIMA) and neural network models to forecast, and we added differentially private Laplace, Gaussian, and Staircase noise to our datasets. We tested two real traffic congestion datasets, experimented with the different models, and examined their utility for different privacy budgets. We found that neural networks combined with the Staircase mechanism were a favorable combination for this application. Our findings identify effective models for challenging time series forecasting and could be applied to non-traffic applications such as disease tracking and population growth.
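As a sketch of the three noise mechanisms named above, following their standard textbook definitions rather than the authors' code: the Laplace mechanism uses scale sensitivity/epsilon; the classic Gaussian mechanism uses sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon, which gives (epsilon, delta)-privacy for epsilon < 1; and the Staircase mechanism of Geng and Viswanath is sampled as a random sign times a geometric "period" plus a uniform offset within one of two steps, with gamma = 1/(1 + e^(epsilon/2)). The function names and the `privatize` helper are hypothetical.

```python
import math
import random


def laplace_noise(sensitivity, epsilon, rng=random):
    """Laplace mechanism: zero-mean noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def gaussian_noise(sensitivity, epsilon, delta, rng=random):
    """Classic Gaussian mechanism; this sigma is valid for epsilon < 1."""
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return rng.gauss(0.0, sigma)


def staircase_noise(sensitivity, epsilon, rng=random):
    """Staircase mechanism with gamma = 1 / (1 + e^(epsilon / 2))."""
    b = math.exp(-epsilon)
    gamma = 1 / (1 + math.exp(epsilon / 2))
    sign = 1 if rng.random() < 0.5 else -1
    # Period index G with P(G = k) = (1 - b) * b**k for k = 0, 1, 2, ...
    g = math.floor(math.log(1 - rng.random()) / math.log(b))
    u = rng.random()
    # Choose the taller first step [0, gamma) or the lower second step [gamma, 1).
    if rng.random() < gamma / (gamma + (1 - gamma) * b):
        magnitude = (g + gamma * u) * sensitivity
    else:
        magnitude = (g + gamma + (1 - gamma) * u) * sensitivity
    return sign * magnitude


def privatize(values, mechanism, **params):
    """Add independent noise from `mechanism` to every value of a series."""
    return [v + mechanism(**params) for v in values]
```

For the same epsilon, the staircase distribution puts more mass near zero than the Laplace distribution, which is consistent with it preserving more utility at larger privacy budgets.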
Genetic algorithm-based feature selection for predicting the unemployment rate of India
The authors used genetic algorithms to analyze the Indian labor market and identify which features best explain variation in the unemployment rate. They found that features such as economic growth and household consumption, among others, best explained this variation.
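The summary does not describe the genetic algorithm's settings. The sketch below shows the general technique: each candidate is a bitmask over features, evolved with tournament selection, single-point crossover, bit-flip mutation, and elitism. In a real study the fitness would be something like the negative validation error of a regression on the selected columns; here a toy fitness with a hypothetical "useful" set stands in, and the feature names are illustrative only.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=30, generations=60,
                         mutation_rate=None, seed=0):
    """Return the feature bitmask (tuple of 0/1) with the highest fitness found."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]

    def tournament(scored):
        # Pick two random candidates and keep the fitter one.
        x, y = rng.sample(scored, 2)
        return x[1] if x[0] >= y[0] else y[1]

    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]
        children = [max(scored, key=lambda s: s[0])[1]]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = tuple(1 - bit if rng.random() < mutation_rate else bit for bit in child)
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Hypothetical example: eight candidate indicators, two of them truly useful.
FEATURES = ["gdp_growth", "inflation", "household_consumption", "exports",
            "interest_rate", "fdi_inflows", "population", "exchange_rate"]
USEFUL = {0, 2}  # toy ground truth for the stand-in fitness


def toy_fitness(mask):
    # Reward useful features and penalize the rest (stand-in for model accuracy).
    return sum(1.0 if i in USEFUL else -0.5 for i, bit in enumerate(mask) if bit)
```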
Using Artificial Intelligence to Forecast Continuous Glucose Monitor (CGM) Readings for Type One Diabetes
People with Type One diabetes often rely on Continuous Glucose Monitors (CGMs) to track their blood glucose and manage their condition. Researchers are now working to help people with Type One diabetes monitor their health more easily by developing models that predict future blood glucose levels from CGM readings. Jalla and Ghanta tackle this problem by exploring the use of AI models to forecast blood glucose levels from CGM data.
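The summary does not name the AI models used. Most supervised forecasters share the same data preparation step, which turns a CGM series into (past window, future value) pairs; the sketch below shows that step together with a simple linear-extrapolation baseline that a learned model would be expected to beat. The function names are hypothetical, and the assumption of one reading every 5 minutes (so `horizon=6` means 30 minutes ahead) depends on the device.

```python
def make_windows(readings, window, horizon):
    """Turn a CGM series into training pairs.

    Each X row holds `window` consecutive readings; the matching y value is
    the reading `horizon` steps after the end of that window."""
    X, y = [], []
    for end in range(window, len(readings) - horizon + 1):
        X.append(readings[end - window:end])
        y.append(readings[end + horizon - 1])
    return X, y


def linear_extrapolate(window_values, horizon):
    """Baseline forecast: fit a least-squares line to the window and extend it."""
    n = len(window_values)
    t_mean = (n - 1) / 2
    y_mean = sum(window_values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(window_values))
    slope = sxy / sxx if sxx else 0.0
    # Evaluate the fitted line `horizon` steps past the last sample.
    return y_mean + slope * (n - 1 + horizon - t_mean)
```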
Role of Environmental Conditions on Drying of Paint
Reducing paint drying time is an important step in improving production efficiency and reducing costs. The authors hypothesized that decreased humidity would lead to faster drying, that ultraviolet (UV) light exposure would affect all paint colors equally, that white light exposure would allow longer-wavelength colors to dry faster than shorter-wavelength colors, and that substrates with higher roughness would dry more slowly. Contrary to the hypothesis, trials under high humidity dried slightly faster than trials under low humidity. Overall, the paint drying process depends strongly on its surrounding environment, and optimizing it requires a thorough understanding of environmental factors and their interactions with the paint constituents.
Estimation of cytokines in PHA-activated mononuclear cells isolated from human peripheral and cord blood
In this study, the authors investigated the time-dependent cytokine secretion of phytohemagglutinin (PHA)-activated T cells derived from human peripheral blood (PB) and cord blood (CB). They hypothesized that levels of the anti-inflammatory cytokine IL-10 and the pro-inflammatory cytokine TNFα would be higher in PHA-activated T cells from PB than in those from CB, and that these levels would decrease over time. Upon PHA activation, IL-10 levels were relatively high while TNFα levels decreased, suggesting that these findings could inform therapeutic treatments for conditions such as rheumatoid arthritis and psoriasis, as well as organ transplantation.