All manuscripts published in JEI must be hypothesis-driven research. Hypotheses are a crucial part of the scientific thinking process, and professional scientific endeavors revolve around posing and testing hypotheses. We believe that it is important for students who publish with JEI to practice rigorous scientific thinking. This means that manuscripts that merely introduce an invention, methods optimization, or machine learning algorithm/model, no matter how impressive, are not appropriate for JEI. Common examples of unacceptable “hypotheses” for engineering projects are simple predictions that you can build something, or that your device or model will work.
In this guide, we describe some of the best strategies for converting an engineering- or machine learning-based manuscript into a hypothesis-driven one publishable with JEI. We draw on examples of past submissions that either implement one of these strategies or would no longer be acceptable, and we provide guidance on how to revise them.
It is often possible to convert an engineering- or machine learning-based manuscript into a hypothesis-driven one by adding a few experiments, and sometimes just by changing how the work is presented. Here are two strategies for revising manuscripts built around engineering, machine learning, or optimization projects so that they also include a clear, experimentally tested hypothesis, with examples drawn from previous JEI submissions.
This is the best way to use your invention to write a hypothesis-driven manuscript acceptable for JEI publication. Rather than centering your hypothesis on the invention or model itself, your hypothesis should predict a scientific finding, with the invention or model serving as part of the methodology for testing it. Below are examples of JEI manuscripts that pose a hypothesis and perform a series of experiments to test it using their invention, model, or optimization.
Every year, around 800,000 people die by suicide worldwide, and it is the second leading cause of death for individuals aged 15-24 years. The National Survey on Drug Use and Health (NSDUH) is conducted annually to ask the general United States population aged 12 years and older questions about mental health, substance use, and suicidal thoughts. Some of the factors the survey considers are substance use, including need for treatment and disorders, and mental health topics, including major depressive episodes, suicidal ideation and attempts, and mental illness. In our research, we sought to identify associations between suicidal ideation and relevant variables, such as sleep quality, hopelessness, and anxious behavior, by analyzing survey responses from the 2020 NSDUH. Because the survey is dense and no single variable has an obvious direct relationship with suicidal ideation, we aimed to clearly identify and display any associations. We hypothesized that frequent anxious thoughts and behavior (such as fidgeting and restlessness), feelings of sadness, and/or low sleep quality would be the factors most predictive of having suicidal thoughts. Using a random forest classifier, we found that sleep problems were highly predictive of suicidal ideation. Professionals and clinicians should keep these findings in mind when developing suicide prevention efforts, such as identifying and supporting people potentially at risk, intervening to improve sleep quality, and teaching coping skills.
In this work, the authors used a computational algorithm to address a specific scientific question. Their hypothesis was that several factors drawn from a large survey would be most predictive of suicidal thoughts. To test this hypothesis, the authors trained a random forest classifier and found that only one of the hypothesized factors was highly predictive of suicidal thoughts.
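The workflow described above, using a random forest's feature importances to ask which hypothesized variables actually predict an outcome, can be sketched as follows. This is a minimal illustration with invented variable names and synthetic data, not the authors' actual code or the NSDUH data.

```python
# Hypothetical sketch: ranking candidate survey variables by how strongly a
# random forest relies on them to predict an outcome. Variable names and data
# are invented for illustration; the real study used 2020 NSDUH responses.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
# Invented stand-ins for survey responses (higher = more severe).
sleep_problems = rng.integers(0, 5, n)
anxious_thoughts = rng.integers(0, 5, n)
sadness = rng.integers(0, 5, n)
# Synthetic outcome, constructed here to depend mostly on sleep problems.
ideation = (sleep_problems + rng.normal(0, 1, n) > 3).astype(int)

X = np.column_stack([sleep_problems, anxious_thoughts, sadness])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, ideation)

# Feature importances indicate which variable the model leans on most.
for name, imp in zip(["sleep", "anxiety", "sadness"], clf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Because the synthetic outcome was built from the sleep variable, its importance dominates, mirroring how the authors could conclude that one hypothesized factor was far more predictive than the others.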
The purpose of this experiment was to determine if a larger turbine on a small hydropower generator would produce a greater voltage than a smaller turbine. The experiment was conducted in the Salinas River, which flowed at 0.40-0.43 m/s during testing, and in a sink, which flowed at 3.78 liters per minute. Three turbines were used: a small one with scoops sized 1.25" × 1.25", a medium one with scoops sized 1.375" × 1.25", and a large one with scoops sized 1.75" × 1.5". Once in the water source, the instantaneous voltage produced by the generator was recorded each minute for a period of ten minutes, and five tests were conducted with each turbine in both the river and the sink. The small turbine produced the greatest voltage in both settings, generating an average of 1.216 V in the river and 4.06 V in the sink, compared with 0.99 V and 3.738 V for the medium turbine and 0.87 V and 3.419 V for the large turbine. As turbine size increased, the voltage produced decreased, with statistically significant differences between each data set. The original hypothesis was not supported because the large turbine produced the least voltage. This may be because the increased surface area of the scoops meant that the large turbine needed more water to turn, or because the weight of the turbine added resistance.
This manuscript contains two parts: an engineering portion, in which the authors built the portable hydropower generator and turbines and assembled the multimeter and other supplies necessary for testing, and a scientific testing portion, in which the authors performed experiments to compare the effect of turbine size on the voltage produced by a small hydropower generator. This is a good example of how you can use your inventions and engineered devices to test a scientific hypothesis.
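The statistical comparison described above, checking whether voltage differences between turbine sizes are significant, can be sketched with a two-sample t-test. The voltage readings below are invented placeholders, not the authors' data.

```python
# Minimal sketch of comparing two turbines' voltage measurements with an
# independent two-sample t-test. Readings are hypothetical, chosen only to
# illustrate the calculation; the real study used five ten-minute test runs.
from scipy import stats

small_turbine_v = [1.25, 1.19, 1.22, 1.20, 1.24]  # hypothetical readings (V)
large_turbine_v = [0.88, 0.85, 0.87, 0.86, 0.89]  # hypothetical readings (V)

t_stat, p_value = stats.ttest_ind(small_turbine_v, large_turbine_v)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) supports a real difference between turbines.
```

Each pairwise comparison (small vs. medium, medium vs. large) would be tested the same way to support a claim of statistical significance "between each data set."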
Millions of undetonated landmines remain in underdeveloped and developing countries as remnants of past military conflicts. Many modern detection and removal methods used in developed countries cannot be applied safely in these regions because the resources are too expensive, too risky, or unattainable due to a lack of training, practicality, or safety. The solution created in this project is to deploy many "modules," each containing a large inductor coil/metal detector, a GSM chip, a GPS chip, and a standalone microcontroller, and to have all modules send data to one location, where a client-side application parses, processes, and displays it. The client-side application receives location data along with inductor sensor readings to create a heat map overlaid onto satellite imagery that can be updated with new data. The design was successful at fulfilling its goal; the early prototypes of the module worked and performed well with the other hardware. The data from the coil test demonstrated the efficacy of the coil design and metal detector circuit in providing accurate and reliable readings. The application of the project is groundbreaking for humanitarian groups and for undeveloped and developing countries, and it extends beyond the scope of these users.
While the author claimed to have created a portable and cheap method to detect landmines, again, no clear hypothesis was presented or tested. The author hypothesized that they could create a solution to a problem without demonstrating that their system serves its intended purpose. Importantly, this manuscript falls under the category of unacceptable submissions where the author simply hypothesizes that they can build something. Please see below for ways to revise and convert this type of manuscript for publication in JEI.
While this manuscript discusses a very interesting and relevant problem, it lacks a clear hypothesis, and a good portion of the technology system was hypothetical rather than created by the student. However, there is a section within the manuscript where the author creates a custom metal detector coil circuit and tests its ability to detect metal in accordance with UN standards for metal detection. Here, the author demonstrates their ability to build a custom metal-detecting device and validate its effectiveness. Instead of hypothesizing that they have a solution to a large-scale problem, the author could pose a testable hypothesis, for example, a prediction about how a specific change to the coil design would affect its ability to detect metal.
In this case, the author would 1) validate that their coil design is acceptable by UN standards and 2) test how specific changes to its design would affect its ability to detect metal.
If an invention or computational algorithm/model explores a different way to accomplish a goal more efficiently or more accurately, you can design an experiment comparing the factors that may influence its effectiveness or accuracy. For this approach to be acceptable, you must identify the factors you compared and justify, based on informed knowledge of your invention or model, why you expected them to matter. These requirements ensure that you have thought critically about your invention/model.
Collisions of heavy ions, such as muons, result in jets and noise. In high-energy particle physics, researchers use jets as crucial event-shaped observable objects to determine the properties of a collision. However, many ionic collisions result in large amounts of energy lost as noise, thus reducing the efficiency of collisions with heavy ions. The purpose of our study is to analyze the relationships between properties of muons in a dimuon collision to optimize conditions of dimuon collisions and minimize the noise lost. We used principles of Newtonian mechanics at the particle level, allowing us to further analyze different models. We used simple Python algorithms as well as linear regression models with tools such as scikit-learn, NumPy, and Pandas to help analyze our results. We hypothesized that since the invariant mass, the energy, and the resultant momentum vector are correlated with noise, if we constrain these inputs optimally, there will be scenarios in which the noise of the heavy-ion collision is minimized.
In this work, the authors used various computational algorithms to assess how to best minimize the noise generated during collisions of heavy ions. Specifically, the authors hypothesized that three properties of particles correlate with noise, and so they decided to constrain these to obtain models that predict minimum noise generation.
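The modeling step described above, fitting a linear regression that relates particle properties to noise, can be sketched with the same tools the authors name (scikit-learn and NumPy). The data and coefficients below are synthetic placeholders, not physics results.

```python
# Hedged sketch of regressing noise on three particle properties, in the
# spirit of the authors' scikit-learn workflow. All values are synthetic:
# the "true" coefficients 0.5, 0.2, 0.1 are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
invariant_mass = rng.uniform(60, 120, n)
energy = rng.uniform(50, 500, n)
momentum = rng.uniform(0, 300, n)
# Synthetic noise constructed to depend linearly on the three inputs.
noise = 0.5 * invariant_mass + 0.2 * energy + 0.1 * momentum + rng.normal(0, 5, n)

X = np.column_stack([invariant_mass, energy, momentum])
model = LinearRegression().fit(X, noise)
print("coefficients:", model.coef_)  # ≈ [0.5, 0.2, 0.1] by construction
print("R^2:", model.score(X, noise))
```

With a fitted model in hand, "constraining the inputs optimally" amounts to searching the input ranges for combinations the model predicts will minimize noise.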
Middle school math forms the basis for advanced mathematical courses leading up to the university level. Large language models (LLMs) have the potential to power next-generation educational technologies, acting as digital tutors to students. The main objective of this study was to determine whether LLMs like ChatGPT, Bard, and Llama 2 can serve as reliable middle school math tutoring assistants on three tutoring tasks: hint generation, comprehensive solution, and exercise creation. Our first hypothesis was that ChatGPT would perform better than Bard and Llama 2 on all three tutoring tasks due to its larger model size (175 billion parameters). Our second hypothesis was that Bard would perform better than Llama 2 in generating comprehensive correct solutions due to its larger model size (137 billion parameters versus 70 billion parameters). We curated medium-difficulty, word-based middle school math problems on algebra, number theory, and counting/probability from The Art of Problem Solving and Khan Academy. A human tutor evaluated the LLMs' performance on each tutoring task. Contrary to our first hypothesis, results showed that ChatGPT did not perform uniformly better than Bard and Llama 2 on all the tasks; ChatGPT outperformed both Bard and Llama 2 only in the comprehensive solution task. Bard did not perform better than Llama 2 in the comprehensive solution task, which does not support our second hypothesis. We conclude that middle school math teachers can use a combination of ChatGPT, Bard, and Llama 2 as assistants based on the specific tutoring task.
Here, the authors had a specific hypothesis in mind – that large language models (LLMs) could serve as digital tutors – and they tested three different LLMs for their reliability on three tutoring tasks. The authors made predictions based on the underlying architecture of each LLM. Finally, the authors tested their hypotheses using a specific set of questions they curated, essentially creating a neutral testing ground for the three LLMs.
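The per-task comparison described above, aggregating a human tutor's ratings by model and task to see which model leads where, can be sketched with a small table. The ratings below are invented for illustration; the real study used a curated problem set and a human-tutor rubric.

```python
# Illustrative sketch (invented scores): averaging a human tutor's ratings
# per model and task, then identifying the best model for each task. This
# mirrors the conclusion that the best assistant depends on the task.
import pandas as pd

ratings = pd.DataFrame({
    "model": ["ChatGPT", "Bard", "Llama 2"] * 2,
    "task": ["hint"] * 3 + ["solution"] * 3,
    "score": [3, 4, 2, 5, 3, 3],  # hypothetical 1-5 tutor ratings
})
mean_scores = ratings.groupby(["model", "task"])["score"].mean().unstack()
best_per_task = mean_scores.idxmax()  # best-scoring model for each task
print(best_per_task)
```

In this toy data, different models top different tasks, which is exactly the kind of result that led the authors to recommend choosing an LLM per tutoring task.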
Breast cancer is one of the most dreaded diseases affecting women and their health. Breast cancer death rates are higher than those for any other cancer aside from lung cancer. Machine learning and deep learning techniques can be used to predict the early onset of breast cancer. The main objective of this analysis was to determine whether machine learning algorithms can be used to predict the onset of breast cancer with more than 90% accuracy. Based on research with supervised machine learning algorithms, Gaussian Naïve Bayes, K Nearest Neighbor, Random Forest, and Logistic Regression were considered because they offer a wide variety of classification methods and also provide high accuracy and performance. We hypothesized that all of these algorithms would provide accurate results, and that Random Forest and Logistic Regression would provide better accuracy and performance than Naïve Bayes and K Nearest Neighbor. The Wisconsin Breast Cancer dataset from the UC Irvine repository was used to compare the four supervised machine learning algorithms. Based on the results, the Random Forest algorithm performed best in malignant prediction (accuracy = 98%), and the Logistic Regression algorithm performed best in benign prediction (accuracy = 99%). All the algorithms performed well in the prediction of benign versus malignant cancer, with more than 90% accuracy based on their F1-score. The study results can be used for further research into cancer prediction using machine learning algorithms.
In this work, the authors tested several machine learning (ML) algorithms for their ability to predict the onset of breast cancer. They hypothesized that all algorithms would be accurate at predicting early-onset breast cancer but that some would perform better than others. Importantly, this type of hypothesis is no longer accepted at JEI. To make it acceptable, the authors would have had to focus their hypothesis on what would make a certain model accurate, based on informed knowledge. For instance, they could reference particular parameters in each ML algorithm that would make it more or less likely to predict breast cancer accurately, and then tune those parameters to test the hypothesis. A simple "these algorithms will work" is not a sufficiently rigorous hypothesis to be considered by JEI.
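For concreteness, the kind of comparison this manuscript ran can be sketched using scikit-learn's bundled copy of the Wisconsin breast cancer dataset and the four algorithm families the authors name. The exact scores will differ from the manuscript's reported accuracies, and the train/test split here is an assumption.

```python
# Hedged sketch of comparing four classifiers on the (scikit-learn bundled)
# Wisconsin breast cancer data by F1-score. Split and defaults are assumptions,
# not the authors' exact setup.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "GaussianNB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=5000),
}
f1_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    f1_scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {f1_scores[name]:.3f}")
```

Note that a JEI-acceptable version of this project would go further than reporting these numbers: per the guidance above, the hypothesis would predict how specific, justified parameter choices (for example, the forest's number of trees or KNN's neighbor count) affect performance, and the experiment would vary those parameters.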
Feel free to contact the JEI Editorial Staff if you have any more questions about how to write a hypothesis-driven manuscript for JEI. Find the links to the full manuscripts mentioned above, as well as some other acceptable machine learning-based manuscripts below:
Comparison of three large language models as middle school math tutoring assistants
A comparative analysis of machine learning approaches for prediction of breast cancer
Evaluating the effectiveness of machine learning models for detecting AI-generated art
Evaluating machine learning algorithms to classify forest tree species through satellite imagery
These guidelines were last updated on June 24, 2024. The primary reason for our new guidelines on computational algorithm manuscripts is that JEI currently does not have the necessary expertise to evaluate the increasing volume of computational manuscripts. We always strive to provide extensive feedback to our student authors so that they can have a positive and educational experience while publishing their (likely) first scientific work. At the moment, we do not feel that we can do this appropriately for all computational manuscripts.