A novel CNN-based machine learning approach to identify skin cancers

Skin cancer is the most common type of cancer, yet its diagnostic procedures are costly, time-consuming, inaccurate, and inaccessible to many. The goal of this project was to determine the feasibility of using a machine learning algorithm to identify skin cancers and to compare its results with the conventional procedures of external visual inspection and biopsy. We used the HAM10000 dataset, a diverse collection of multisource clinical images of skin pathologies (classified by external appearance), which contained 11,034 unique image files of skin cancers and lesions at the time of this project. We trained and tested a machine learning model on the dataset and analyzed its accuracy, sensitivity, and runtime for seven skin pathologies, each of which was either a skin cancer or a pathology that could potentially develop into skin cancer. The model was created with AutoKeras, which automatically searches for and applies the best-performing model. The average accuracy of the model across the skin pathology types was 84.05%, which is 4.05 percentage points higher than the accuracy of histopathological diagnoses made by experienced dermatologists. For melanoma, the deadliest form of skin cancer, the model had a diagnostic accuracy of 70.63%. Furthermore, the average runtime of the model, 4.9775 seconds, is a significant advantage over the typical minimum wait for biopsy results. The improved performance of the machine learning model relative to conventional methods for identifying skin cancer makes it a feasible alternative.
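
As an illustration of how per-pathology accuracy and sensitivity of the kind reported above might be computed and averaged, the following is a minimal sketch in Python. It assumes a one-vs-rest definition of accuracy for each class and an unweighted (macro) average across classes; the abstract does not state which definitions were used. The function names and the toy labels are hypothetical and are not taken from the study's code or data.

```python
import math
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest accuracy and sensitivity for each class (assumed definitions)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    metrics: Dict[str, Dict[str, float]] = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        tn = n - tp - fn - fp                               # true negatives
        accuracy = (tp + tn) / n
        # Sensitivity is undefined when the class never occurs in y_true.
        sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
        metrics[c] = {"accuracy": accuracy, "sensitivity": sensitivity}
    return metrics


def macro_average(metrics: Dict[str, Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes, skipping undefined values."""
    values = [m[key] for m in metrics.values() if not math.isnan(m[key])]
    return sum(values) / len(values) if values else float("nan")


# Toy labels for illustration only; these are not results from the study.
toy_true = ["mel", "mel", "nv", "nv", "nv", "bkl"]
toy_pred = ["mel", "nv", "nv", "nv", "bkl", "bkl"]
toy_metrics = per_class_metrics(toy_true, toy_pred, ["mel", "nv", "bkl"])

for name, values in toy_metrics.items():
    print(name, values)
print("macro accuracy:", macro_average(toy_metrics, "accuracy"))
print("macro sensitivity:", macro_average(toy_metrics, "sensitivity"))
```

Reporting sensitivity alongside accuracy matters for an imbalanced dataset: a class with few examples can show high one-vs-rest accuracy even when most of its true cases are missed.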