In this paper, we measured the utility that different differentially private mechanisms achieve across a range of privacy budgets when combined with different machine learning models that forecast traffic congestion at future timestamps. We expected artificial neural networks (ANNs) combined with the Staircase mechanism to perform best at every value in the privacy budget range, especially at medium-to-high privacy budgets. We used Autoregressive Integrated Moving Average (ARIMA) and neural network models to forecast congestion, and added differentially private Laplace, Gaussian, and Staircase noise to our datasets. We tested two real traffic congestion datasets, experimented with the different models, and examined their utility for different privacy budgets. We found that a favorable combination for this application was neural networks with the Staircase mechanism. Our findings help identify suitable models for challenging time series forecasting under privacy constraints and may extend to non-traffic applications such as disease tracking and population growth.
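To make the three noise mechanisms concrete, here is a minimal Python sketch (using NumPy) of how Laplace, Gaussian, and Staircase noise can be sampled and added to a congestion time series. It is not the authors' code. The sensitivity, ε, and δ values, the function names, and the congestion series are all hypothetical. The Gaussian scale uses the classical calibration, which is only proven for ε < 1. The Staircase sampler follows the generation procedure described by Geng and Viswanath, using the γ that minimizes expected absolute noise.

```python
import numpy as np

def laplace_noise(size, sensitivity, epsilon, rng):
    # Laplace mechanism: scale b = sensitivity / epsilon yields epsilon-DP.
    return rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=size)

def gaussian_noise(size, sensitivity, epsilon, delta, rng):
    # Classical Gaussian mechanism for (epsilon, delta)-DP; this sigma bound
    # is only proven for 0 < epsilon < 1.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return rng.normal(loc=0.0, scale=sigma, size=size)

def staircase_noise(size, sensitivity, epsilon, rng, gamma=None):
    # Staircase mechanism sampler (generation procedure of Geng and Viswanath).
    if gamma is None:
        # gamma that minimizes expected |noise| for a given epsilon
        gamma = 1.0 / (1.0 + np.exp(epsilon / 2.0))
    b = np.exp(-epsilon)
    sign = rng.choice([-1.0, 1.0], size=size)
    # NumPy's geometric counts trials (support 1, 2, ...); subtract 1 so that
    # P(G = i) = (1 - b) * b**i for i = 0, 1, 2, ...
    g = rng.geometric(1.0 - b, size=size) - 1
    u = rng.uniform(0.0, 1.0, size=size)
    # B = 0 with probability gamma / (gamma + (1 - gamma) * b), otherwise B = 1.
    p_b0 = gamma / (gamma + (1.0 - gamma) * b)
    bern = (rng.uniform(0.0, 1.0, size=size) >= p_b0).astype(float)
    # B = 0 samples the inner part [G, G + gamma) of step G;
    # B = 1 samples the outer part [G + gamma, G + 1).
    steps = (1.0 - bern) * (g + gamma * u) + bern * (g + gamma + (1.0 - gamma) * u)
    return sign * steps * sensitivity

# Usage on a hypothetical, normalized congestion series (illustrative values only).
rng = np.random.default_rng(0)
congestion = np.array([0.42, 0.55, 0.61, 0.58, 0.47])
eps = 0.5
noisy = {
    "laplace": congestion + laplace_noise(congestion.shape, 1.0, eps, rng),
    "gaussian": congestion + gaussian_noise(congestion.shape, 1.0, eps, 1e-5, rng),
    "staircase": congestion + staircase_noise(congestion.shape, 1.0, eps, rng),
}
```

In this sketch, the privacy budget ε enters each mechanism only through the noise scale. Sweeping ε and retraining a forecaster on the noisy series is therefore one way to trace a privacy-utility curve.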
Predicting the factors involved in orthopedic patient hospital stay
Long hospital stays can be stressful for patients for many reasons. We hypothesized that age would be the greatest predictor of length of hospital stay among patients who underwent orthopedic surgery. Our models instead showed that severity of illness was the factor that contributed most to patient length of stay, followed by the facility where the patient stayed and the type of procedure they underwent.
Gradient boosting with temporal feature extraction for modeling keystroke log data
Although natural language processing (NLP) has progressed greatly over the last few years, particularly with the development of attention-based models, less research has focused on modeling keystroke log data. State-of-the-art methods handle textual data directly. While this has produced excellent results, these methods have high time complexity and resource usage. They also ignore the writing process itself and assess only the content of the final text. We therefore proposed a framework for modeling textual data using keystroke-based features, which capture how a document or response was written rather than the final text that was produced. These features differ greatly from those extracted from raw text and reveal information that is otherwise hidden. Transformer-based methods, which use attention to weight the different components of a text sequence, currently dominate NLP because of their strong understanding of natural language. We hypothesized that pairing efficient machine learning techniques with keystroke log information would produce results comparable to transformer-based methods in far less time. We showed that models trained on keystroke log data can effectively evaluate writing quality, and do so in significantly less time than methods that process the raw text. This is significant because it provides a fast, inexpensive alternative to increasingly large and slow large language models (LLMs).
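To illustrate the kind of temporal features such a framework might extract, the following minimal Python sketch summarizes a keystroke log into timing statistics. Several details are assumptions rather than the authors' specification: the event schema (key-down and key-up times in milliseconds, plus an activity label such as "Input" or "Remove/Cut"), the 2-second pause threshold, and the feature names.

```python
from dataclasses import dataclass
from statistics import mean, median

@dataclass
class KeyEvent:
    down_time: float  # ms at which the key was pressed (hypothetical schema)
    up_time: float    # ms at which the key was released
    activity: str     # hypothetical labels, e.g. "Input" or "Remove/Cut"

def temporal_features(events, pause_ms=2000.0):
    """Summarize how a text was written (timing and revisions), not what it says."""
    if not events:
        raise ValueError("keystroke log is empty")
    events = sorted(events, key=lambda e: e.down_time)
    # Gap between releasing one key and pressing the next; may be negative
    # when keys overlap during fast typing.
    gaps = [nxt.down_time - cur.up_time for cur, nxt in zip(events, events[1:])]
    # Bursts: runs of consecutive keystrokes not interrupted by a long pause.
    bursts, run = [], 1
    for gap in gaps:
        if gap >= pause_ms:
            bursts.append(run)
            run = 1
        else:
            run += 1
    bursts.append(run)
    removals = sum(1 for e in events if e.activity == "Remove/Cut")
    return {
        "n_events": len(events),
        "total_minutes": (events[-1].up_time - events[0].down_time) / 60_000.0,
        "mean_gap_ms": float(mean(gaps)) if gaps else 0.0,
        "median_gap_ms": float(median(gaps)) if gaps else 0.0,
        "n_long_pauses": sum(1 for gap in gaps if gap >= pause_ms),
        "removal_ratio": removals / len(events),
        "mean_burst_len": float(mean(bursts)),
    }

# Usage with a tiny, made-up log: two quick keystrokes, a 2.27 s pause,
# then one more keystroke and a deletion.
log = [
    KeyEvent(0.0, 100.0, "Input"),
    KeyEvent(150.0, 230.0, "Input"),
    KeyEvent(2500.0, 2580.0, "Input"),
    KeyEvent(2600.0, 2680.0, "Remove/Cut"),
]
features = temporal_features(log)
print(features["n_long_pauses"], features["mean_burst_len"])  # 1 2.0
```

Each feature dictionary becomes one row of a tabular dataset. A gradient boosting model, for example scikit-learn's GradientBoostingRegressor, could then be trained on those rows against writing-quality scores. Because such features are cheap to compute and low-dimensional, this setup is consistent with the speed advantage described above.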
The influence of purpose-of-use on information overload in online social networking
Here, seeking to understand how social media use relates to social media fatigue and information overload, the authors used linear models to analyze the results of a survey of 27 respondents. Their results showed that a longer duration of social media use did not necessarily lead to fatigue, suggesting that quality of use may matter more than quantity. They also considered the purpose of each individual's social media use, as well as their engagement behavior during the COVID-19 pandemic.
Differences in Reliability and Predictability of Harvested Energy from Battery-less Intermittently Powered Systems
Solar and radio frequency harvesters are a viable alternative to batteries in many settings where a battery cannot easily be replaced. Using specifically designed circuit models, the authors quantify the reliability of different harvested energy sources to identify the most practical and efficient forms of renewable energy.
Cytokine Treatment for Myocarditis May Directly Impact Cardiomyocytes Negatively
The purpose of our study was to determine whether direct administration of CXCL1/KC to cardiomyocytes causes negative changes in cell density or proliferation. This molecule has been shown to reduce inflammation in certain instances. We used homocysteine to model the direct effect of an inflammatory agent on cardiomyocytes. Our question was whether these molecules directly affect cell density by interacting with the cell proliferation process. We hypothesized that cells treated with CXCL1/KC would maintain the same cell density as untreated cells. In contrast, cells treated with homocysteine, or with both homocysteine and CXCL1/KC, were expected to have a higher cell density than untreated cells.
Predicting college retention rates from Google Street View images of campuses
Every year, around 40% of undergraduate students in the United States discontinue their studies. This costs students a valuable education and costs colleges money. Yet colleges across the nation struggle to identify the underlying causes of these high dropout rates. In this paper, the authors discuss the use of machine learning to find correlations between built-environment factors and college retention rates. They hypothesized that colleges could improve their retention rates by making the physical characteristics of their campuses more pleasing. The authors applied image classification techniques to images of colleges to correlate features such as colors, cars, and people with higher or lower retention rates. With three possible classes (high, medium, and low retention), a model that guessed at random would be correct about 33% of the time. This 0.33 chance level always fell outside the 99% confidence intervals built around their models' accuracies. The authors therefore concluded that their machine learning techniques can be used to find correlations between certain environmental factors and retention rates.
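The chance-level argument can be checked with a short calculation. The minimal Python sketch below builds a 99% Wilson score interval for a classifier's accuracy on a held-out set and asks whether the whole interval lies above the 1/3 chance level for three classes. The test-set counts are hypothetical. The abstract also does not say how the authors built their intervals (they may, for example, have used repeated training runs instead), so this is only one reasonable way to perform the check.

```python
import math
from statistics import NormalDist

def wilson_interval(correct, total, confidence=0.99):
    """Wilson score interval for a binomial proportion such as accuracy."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # about 2.576 for 99%
    p = correct / total
    denom = 1.0 + z**2 / total
    center = (p + z**2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z**2 / (4.0 * total**2)) / denom
    return center - half, center + half

def above_chance(correct, total, n_classes=3, confidence=0.99):
    """True if the whole confidence interval lies above random guessing (1 / n_classes)."""
    low, high = wilson_interval(correct, total, confidence)
    return low > 1.0 / n_classes, (low, high)

# Hypothetical result: 150 of 300 test images classified correctly (accuracy 0.50).
ok, (low, high) = above_chance(150, 300)
print(ok, round(low, 3), round(high, 3))  # True 0.426 0.574
```

An interval lying entirely below 1/3 would also fall outside the chance level, but it would indicate worse-than-random performance. For that reason, the sketch checks the lower bound rather than simple exclusion.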
Machine learning-based enzyme engineering of PETase for improved efficiency in plastic degradation
Here, recognizing the growing threat of non-biodegradable plastic waste, the authors investigated whether a modified version of an enzyme found in bacteria could decompose polyethylene terephthalate (PET). They used machine learning models and simulations to screen for and identify an optimized enzyme. Ultimately, they identified potential mutant PETases capable of decomposing PET with improved thermal stability.
Hybrid Quantum-Classical Generative Adversarial Network for synthesizing chemically feasible molecules
Current drug discovery processes can cost billions of dollars and usually take five to ten years. Researchers have explored various computational approaches for searching the chemical space for promising molecules and compounds; this space may contain on the order of 10^60 molecules. One solution involves deep generative models. These artificial intelligence models learn the probability distribution of chemical structures from nonlinear data and generate new, similar structures from the patterns they identify. Aiming for faster runtime and greater robustness when analyzing high-dimensional data, we designed and implemented a Hybrid Quantum-Classical Generative Adversarial Network (QGAN) to synthesize molecules.
Risk assessment modeling for childhood stunting using automated machine learning and demographic analysis
Over the last few decades, childhood stunting has persisted as a major global challenge. This study hypothesized that TPOT (Tree-based Pipeline Optimization Tool), an AutoML (automated machine learning) tool, would outperform pre-existing machine learning models. We also expected it to reveal the positive impact of economic prosperity, strong familial traits, and access to resources on reducing stunting risk. Feature correlation plots revealed that maternal height, wealth indicators, and parental education were consistently important features for determining stunting outcomes approximately two years after birth. These results can inform future research by highlighting how demographic, familial, and socioeconomic conditions influence stunting. They also provide medical professionals with a deployable risk assessment tool for predicting childhood stunting.