Articles | Journal of Emerging Investigators

The Dependence of CO2 Removal Efficiency on its Injection Speed into Water

Chen et al. | Nov 08, 2024

Recent research confirms that climate change, driven by CO2 emissions from burning fossil fuels, poses a significant threat to humanity. In response, the authors explore methods to remove CO2 from the atmosphere, including breaking its molecular bonds through high-speed collisions.

Gradient boosting with temporal feature extraction for modeling keystroke log data

Barretto et al. | Oct 04, 2024

Although there has been great progress in the field of natural language processing (NLP) over the last few years, particularly with the development of attention-based models, less research has addressed modeling keystroke log data. State-of-the-art methods handle textual data directly, and while this has produced excellent results, the time complexity and resource usage of such methods are quite high. Additionally, these methods fail to incorporate the actual writing process when assessing text and instead focus solely on the content. Therefore, we proposed a framework for modeling textual data using keystroke-based features. Such methods pay attention to how a document or response was written, rather than the final text that was produced. These features are vastly different from the kind of features extracted from raw text but reveal information that is otherwise hidden. We hypothesized that pairing efficient machine learning techniques with keystroke log information should produce results comparable to those of transformer techniques, models which pay more or less attention to the different components of a text sequence, in far less time. Transformer-based methods currently dominate the field of NLP due to the strong understanding of natural language they display. We showed that models trained on keystroke log data are capable of effectively evaluating the quality of writing and do so in a significantly shorter amount of time than traditional methods. This is significant as it provides a much-needed fast and cheap alternative to increasingly large and slow LLMs.

Identification of potential therapeutic targets for multiple myeloma by gene expression analysis

Kochenderfer et al. | Apr 26, 2024

A central challenge of cancer therapy is identifying treatments that will effectively target cancer cells while minimizing effects on healthy cells. To identify potential targets for treating multiple myeloma, a frequently incurable cancer, Kochenderfer and Kochenderfer analyze RNA sequencing data from the Cancer Cell Line Encyclopedia to find genes with high expression in multiple myeloma cells and low expression in normal tissues.

Predicting baseball pitcher efficacy using physical pitch characteristics

Oberoi et al. | Jan 11, 2024

Here, the authors sought to develop a new metric to evaluate the efficacy of baseball pitchers using machine learning models. They found that the frequency of balls was the most predictive feature for their walks/hits allowed per inning (WHIP) metric. While their machine learning models did not identify a defining trait, such as high velocity, spin rate, or types of pitches, they found that consistently pitching within the strike zone resulted in significantly lower WHIPs.
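For readers unfamiliar with the metric, WHIP is conventionally computed as walks plus hits divided by innings pitched. The sketch below illustrates only that arithmetic; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits allowed per inning pitched (lower is better)."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Hypothetical outing: 2 walks and 5 hits over 7 innings.
print(whip(walks=2, hits=5, innings_pitched=7))
```

A pitcher who throws fewer balls tends to issue fewer walks, which lowers the numerator directly.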

Predicting college retention rates from Google Street View images of campuses

Dileep et al. | Jan 02, 2024

Every year, around 40% of undergraduate students in the United States discontinue their studies, resulting in a loss of valuable education for students and a loss of money for colleges. Even so, colleges across the nation struggle to discover the underlying causes of these high dropout rates. In this paper, the authors discuss the use of machine learning to find correlations between built-environment factors and the retention rates of colleges. They hypothesized that one way for colleges to improve their retention rates could be to make the physical characteristics of their campuses more pleasing. The authors used image classification techniques to look at images of colleges and correlate certain features like colors, cars, and people to higher or lower retention rates. With three possible classes (high, medium, and low retention), a model choosing at random would reach the right conclusion 33% of the time. After finding that this 33% (0.33) mark always fell outside the 99% confidence intervals built around their models’ accuracies, the authors concluded that their machine learning techniques can be used to find correlations between certain environmental factors and retention rates.
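The comparison against chance described above can be sketched as follows. This is a minimal illustration that assumes a normal-approximation (Wald) interval; the abstract does not say which interval construction the authors used, and the function names and example counts are hypothetical.

```python
import math

Z_99 = 2.576  # two-sided 99% critical value of the standard normal


def accuracy_interval(correct: int, total: int, z: float = Z_99):
    """Normal-approximation confidence interval for a classifier's accuracy."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def chance_outside_interval(correct: int, total: int, chance: float = 1 / 3) -> bool:
    """True when the chance-level accuracy lies outside the 99% interval."""
    low, high = accuracy_interval(correct, total)
    return chance < low or chance > high


# Hypothetical result: 120 of 240 test images classified correctly.
low, high = accuracy_interval(120, 240)
print(f"99% CI: [{low:.3f}, {high:.3f}]")
print("chance (0.33) outside interval:", chance_outside_interval(120, 240))
```

With small test sets the interval widens quickly, so the same accuracy may no longer be distinguishable from the 0.33 chance level.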

The extent to which storefront alcohol advertising differs by community profile in Michigan

Voyt et al. | May 17, 2023

Image credit: Steve Harvey

Here, recognizing that alcohol manufacturers may target ethnic minorities and youths with specific forms of advertisements based on previous studies, the authors considered how alcohol storefronts differ depending on the community they are located in. Specifically, they looked at differences between Metro-Detroit suburban communities of high and low incomes. They found that alcohol stores in the low-income areas had more and larger alcohol and malt liquor advertisements per store, in addition to being located within 1,000 feet of a school.

Tomato disease identification with shallow convolutional neural networks

Trinh et al. | Mar 03, 2023

Plant diseases can cause up to 50% crop yield loss for the popular tomato plant. A mobile device-based method to identify diseases from photos of symptomatic leaves via computer vision can be more effective due to its convenience and accessibility. To enable a practical mobile solution, a “shallow” convolutional neural network (CNN) with few layers, and thus low computational requirements, but with accuracy similar to that of deep CNNs is needed. In this work, we explored whether such a model was possible.

Hybrid Quantum-Classical Generative Adversarial Network for synthesizing chemically feasible molecules

Sikdar et al. | Jan 10, 2023

Current drug discovery processes can cost billions of dollars and usually take five to ten years. People have been researching and implementing various computational approaches to search for molecules and compounds from the chemical space, which can be on the order of 10^60 molecules. One solution involves deep generative models, which are artificial intelligence models that learn from nonlinear data by modeling the probability distribution of chemical structures and creating similar data points from the trends they identify. Aiming for faster runtime and greater robustness when analyzing high-dimensional data, we designed and implemented a Hybrid Quantum-Classical Generative Adversarial Network (QGAN) to synthesize molecules.

Differential privacy in machine learning for traffic forecasting

Vinay et al. | Dec 21, 2022

In this paper, we measured the privacy budgets and utilities of different differentially private mechanisms combined with different machine learning models that forecast traffic congestion at future timestamps. We expected the artificial neural networks (ANNs) combined with the Staircase mechanism to perform the best at every value in the privacy budget range, especially at the medium-high values of the privacy budget. In this study, we used the Autoregressive Integrated Moving Average (ARIMA) and neural network models to forecast and then added differentially private Laplacian, Gaussian, and Staircase noise to our datasets. We tested two real traffic congestion datasets, experimented with the different models, and examined their utility for different privacy budgets. We found that a favorable combination for this application was neural networks with the Staircase mechanism. Our findings identify the optimal models for challenging time series forecasting and can be used in non-traffic applications like disease tracking and population growth.
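As an illustration of how noise scales with the privacy budget, the sketch below applies the standard Laplace mechanism, in which each value is perturbed by Laplace noise of scale sensitivity / epsilon. It is not the authors' code: the function names, the sensitivity value, and the sample traffic counts are assumptions made for the example, and the Gaussian and Staircase mechanisms are not shown.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) draw, as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values, sensitivity: float, epsilon: float, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to every value."""
    if epsilon <= 0:
        raise ValueError("epsilon (the privacy budget) must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical hourly vehicle counts; sensitivity of 1 is an assumption.
counts = [120.0, 135.0, 160.0, 142.0]
print(privatize(counts, sensitivity=1.0, epsilon=0.5, seed=0))
```

A smaller epsilon gives stronger privacy but larger noise, which is the privacy-utility trade-off the study measures across mechanisms and models.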

Effect of hypervitaminosis A in regenerating planaria: A potential model for teratogenicity testing

Bennet et al. | Dec 12, 2022

This unique research study evaluated the potential use of the flatworm, brown planaria (Dugesia tigrina), as an alternative model for teratogenicity testing. In this study, we exposed amputated planaria to varying concentrations of a known teratogen, vitamin A (retinol), for approximately 2 weeks, and evaluated multiple parameters including the formation of blastema and eyes. The results from this study demonstrated that high concentrations of retinol caused defects in head and eye formation in regenerating planaria, with similarities to vitamin A-related teratogenicity findings in mammals. Based on these results, regenerating brown planaria are a promising alternative model for teratogenicity testing, which can potentially be paradigm-shifting as it can reduce cost, time, and pregnant animal use in research.
