Model selection and optimization for poverty prediction on household data from Cambodia

(1) Wycombe Abbey, High Wycombe, England, (2) Department of Quantitative and Computational Biology, Princeton University, Princeton, New Jersey

https://doi.org/10.59720/22-290
Image credit: Paul Szewczyk

Addressing global poverty requires an understanding of the most poverty-stricken regions. One approach is poverty prediction, a task that entails classifying the poverty levels of households using available data. While machine learning (ML) has been applied in numerous fields with considerable success, its application to poverty prediction using exclusively household survey data has yet to be thoroughly explored. Household survey data offers a detailed view of the living conditions, lifestyle, and socio-economic factors affecting households. Hence, we aimed to explore the use of this data type in predicting poverty levels. Our study focused on three ML models: softmax classification, random forest classification, and the multilayer perceptron (MLP). We chose Cambodia for this study because of its unique socio-economic landscape and because it is representative of developing nations struggling with poverty. This analysis will serve as the foundation for applying the approach to other nations. The analysis was based on a dataset of 15,825 household samples and 1,873 features obtained through the Demographic and Health Surveys (DHS) program in Cambodia. The study's aim was to validate the effectiveness of ML in poverty prediction using household data and to identify the best-performing model among the three. We hypothesized that the MLP, due to its advanced neural network structure, would provide superior results compared to the softmax classification and random forest models. As anticipated, the MLP outperformed the other models, achieving an accuracy of 87%, compared with 81% for the random forest model and 80% for the softmax classification model.
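The three-model comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a small synthetic dataset in place of the DHS survey data, and picks the number of classes, features, and all hyperparameters arbitrarily. Softmax classification is represented here by scikit-learn's multinomial logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for household survey data (NOT the DHS dataset).
# Three classes and 20 features are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=0,
)

# Hold out a stratified test set so every model is scored on the same data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters below are assumptions, not values from the study.
models = {
    # With the default lbfgs solver, multiclass LogisticRegression fits a
    # multinomial (softmax) model.
    "softmax": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    ),
}

# Fit each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The accuracies printed by this sketch reflect the synthetic data only and say nothing about the reported 87%, 81%, and 80% figures. The inputs are standardized for the softmax and MLP models because both are sensitive to feature scale, whereas the random forest is not.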
