Effects of data amount and variation in deep learning-based tuberculosis diagnosis in chest X-ray scans

(1) Dominican Academy, (2) Radcliffe Department of Medicine, University of Oxford

https://doi.org/10.59720/24-278

Pulmonary tuberculosis ranks among the world's deadliest diseases and causes severe harm to global health. Although millions of cases are diagnosed each year, the aftermath of COVID-19 and inadequate screening in developing countries have disrupted tuberculosis diagnosis, hindering proper treatment and increasing tuberculosis mortality. Our study aims to expand diagnostic opportunities by developing a deep learning model that classifies pulmonary X-ray scans as "normal" or "tuberculosis." Using feature extraction, we incorporated the VGG-16 (Visual Geometry Group network with 16 layers) architecture to improve classification accuracy via transfer learning. We hypothesized that models trained on a greater amount and variety of data would outperform those trained on less, and less varied, data. To test this hypothesis, we developed four models with identical architectures, trained on pulmonary X-ray scans from the Montgomery County Chest X-ray Dataset (Dataset A), the Shenzhen Chest X-ray Dataset (Dataset B), the Tuberculosis Chest X-ray Database (Dataset C), and the combination of Datasets A, B, and C. Data amount and variability increased from Dataset A to Dataset C and were greatest in the combined dataset. When each model was tested on every dataset used in training, the mean accuracies were 45.6% for the model trained on Dataset A, 62.0% for Dataset B, 82.0% for Dataset C, and 95.9% for the combined dataset. These results suggest that models trained on larger and more varied data perform more accurately across datasets. Our study also highlights the efficacy of deep learning models in tuberculosis diagnosis and underscores the importance of data variability and a broader pulmonary X-ray database for potential implementation in global health.
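The abstract reports one mean accuracy per model but does not state exactly how the per-dataset scores were combined. The following is a minimal sketch of a cross-dataset evaluation under one plausible assumption: every trained model is scored on the test split of each dataset, and the per-dataset accuracies are averaged with equal weight. The function names, the toy "scans," and the threshold "models" are hypothetical stand-ins for the study's trained VGG-16 classifiers, not the authors' code.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical stand-in for a trained classifier: maps one scan to a label,
# 0 = "normal", 1 = "tuberculosis".
Model = Callable[[object], int]
# (scans, true labels) for one dataset's test split.
TestSet = Tuple[Sequence[object], Sequence[int]]


def accuracy(model: Model, scans: Sequence[object], labels: Sequence[int]) -> float:
    """Fraction of scans whose predicted class matches the true label."""
    if len(scans) != len(labels) or not scans:
        raise ValueError("need equally sized, non-empty scans and labels")
    correct = sum(1 for x, y in zip(scans, labels) if model(x) == y)
    return correct / len(scans)


def cross_dataset_accuracy(
    models: Dict[str, Model], test_sets: Dict[str, TestSet]
) -> Dict[str, Dict[str, float]]:
    """Score every model on every test set: result[model_name][dataset_name]."""
    return {
        m_name: {d_name: accuracy(model, xs, ys) for d_name, (xs, ys) in test_sets.items()}
        for m_name, model in models.items()
    }


def mean_accuracy(per_dataset: Dict[str, float]) -> float:
    """Assumed unweighted mean: each dataset counts equally regardless of its size."""
    return mean(per_dataset.values())


if __name__ == "__main__":
    # Toy synthetic data: each "scan" is a single number, not a real X-ray image.
    test_sets = {
        "A": ([0.2, 0.9], [0, 1]),
        "B": ([0.4, 0.6, 0.7, 0.1], [0, 1, 0, 0]),
    }
    models = {
        "threshold_0.5": lambda x: int(x > 0.5),  # predicts TB above 0.5
        "always_normal": lambda x: 0,             # trivial baseline
    }
    table = cross_dataset_accuracy(models, test_sets)
    for name, row in table.items():
        print(name, row, round(mean_accuracy(row), 3))
```

With the toy data, the threshold model scores 1.0 on A and 0.75 on B, for an unweighted mean of 0.875. Pooling all six test scans instead would give 5/6 (about 0.833), so the averaging choice matters whenever test sets differ in size, as the Montgomery, Shenzhen, and Tuberculosis Chest X-ray datasets do.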
