Simple solving heuristics improve the accuracy of sudoku difficulty classifiers

Sudokus are logic puzzles that vary in difficulty. They are typically solved with commonly accepted strategies such as “obvious singles,” “hidden singles,” and “naked pairs.” Because strategies are central to how solvers approach sudoku puzzles, analyzing them may be useful in building sudoku difficulty classifiers. Previous classification attempts have used a convolutional neural network, a model trained on real-time human solving patterns, and a rating system based on the number of solving rounds needed to solve a puzzle. Our study aimed to improve the accuracy of sudoku difficulty classification by analyzing the predictive power of 17 variables, including metrics based on sudoku solving strategies. We collected 6,000 sudoku puzzles from puzzle-sudoku.com and based our classifier on the website’s difficulty ratings: basic, easy, intermediate, advanced, extreme, and evil. We grouped these six levels into pairs so that our classifier distinguished among only three levels of difficulty. We trained two models. The Simple Model was trained on the number of clues, the average number of possibilities per empty cell, clue variation, and clue placement; it had a 44% testing accuracy. The Solving Strategies Model was trained on all variables from the Simple Model plus eight additional solving-strategy features, which measured the accuracy of the “obvious singles,” “hidden singles,” and “naked pairs” strategies on the puzzles. We hypothesized that including these solving-strategy variables would improve accuracy in classifying sudoku difficulty because they reflect human solving behavior. The Solving Strategies Model distinguished among the three difficulty levels with 78% testing accuracy, significantly higher than the Simple Model’s 44%. This result indicates that the sudoku strategy metrics improved the model’s ability to classify sudoku difficulty.
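
To make the kinds of variables described above concrete, the following is a minimal sketch of how a few of them might be computed from a puzzle grid. It is an illustration under stated assumptions, not the feature definitions used in the study: the function names (`parse`, `candidates`, `features`), the 81-character input format, and the choice to count a "hidden single" only when the cell is not already an "obvious single" are all hypothetical.

```python
from itertools import product

def parse(puzzle):
    """Turn an 81-character string ('.' or '0' for blanks) into a 9x9 list of ints."""
    cells = [0 if ch in ".0" else int(ch) for ch in puzzle if not ch.isspace()]
    if len(cells) != 81:
        raise ValueError("expected 81 cells")
    return [cells[r * 9:(r + 1) * 9] for r in range(9)]

def candidates(grid, r, c):
    """Digits not yet used in the row, column, or 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - used

# All 27 units (9 rows, 9 columns, 9 boxes) as lists of (row, col) coordinates.
UNITS = (
    [[(r, c) for c in range(9)] for r in range(9)]
    + [[(r, c) for r in range(9)] for c in range(9)]
    + [[(br + i, bc + j) for i in range(3) for j in range(3)]
       for br in (0, 3, 6) for bc in (0, 3, 6)]
)

def features(puzzle):
    """Compute a few illustrative features (assumed definitions, see lead-in)."""
    grid = parse(puzzle)
    cand = {(r, c): candidates(grid, r, c)
            for r, c in product(range(9), repeat=2) if grid[r][c] == 0}
    # Obvious single: an empty cell with exactly one candidate digit.
    obvious = {cell for cell, digits in cand.items() if len(digits) == 1}
    # Hidden single: a digit that fits in only one cell of some unit,
    # counted here only if that cell is not already an obvious single.
    hidden = set()
    for unit in UNITS:
        for d in range(1, 10):
            spots = [cell for cell in unit if d in cand.get(cell, ())]
            if len(spots) == 1 and spots[0] not in obvious:
                hidden.add(spots[0])
    n_empty = len(cand)
    return {
        "n_clues": 81 - n_empty,
        "avg_candidates": sum(len(s) for s in cand.values()) / n_empty if n_empty else 0.0,
        "obvious_singles": len(obvious),
        "hidden_singles": len(hidden),
    }

# Example puzzle chosen for illustration only (not taken from the study's dataset).
PUZZLE = (
    "53..7...." "6..195..." ".98....6."
    "8...6...3" "4..8.3..1" "7...2...6"
    ".6....28." "...419..5" "....8..79"
)

if __name__ == "__main__":
    print(features(PUZZLE))
```

Each puzzle's feature dictionary could then be turned into one row of a feature table and passed, together with its difficulty label, to any standard classifier. The counts here describe only the initial grid; measuring how far each strategy gets when applied repeatedly would need a solving loop on top of this sketch.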