Spire
December 1, 2021
Dermatology

Article of the Month – December 2021

Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks

Philipp Tschandl, MD, PhD; Cliff Rosendahl, PhD; Bengu Nisa Akay, MD; Giuseppe Argenziano, MD, PhD; Andreas Blum, MD; Ralph P. Braun, MD, PhD; Horacio Cabo, MD, PhD; Jean Yves Gourhant, MD; Jürgen Kreusch, MD, PhD; Aimilios Lallas, MD; Jan Lapins, MD, PhD; Ashfaq Marghoob, MD; Scott Menzies, MBBS, PhD; Nina Maria Neuber, MD; Jon Paoli, MD, PhD; Harold S. Rabinovitz, MD; Christoph Rinner, PhD; Alon Scope, MD; H. Peter Soyer, MD; Christoph Sinz, MD; Luc Thomas, MD, PhD; Iris Zalaudek, MD; Harald Kittler, MD

Importance

Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic and are more difficult to diagnose.

Objectives

To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience.

Design, Setting, and Participants

A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy.

Main Outcomes and Measures

The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures.
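As a rough illustration of the second outcome measure, the area under the ROC curve can be read as the probability that a randomly chosen malignant lesion receives a higher malignancy score than a randomly chosen benign one. The sketch below computes it that way in plain Python. The function name and the toy labels and scores are invented for illustration; they are not data from the study, and this is not the authors' analysis code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 = malignant, 0 = benign (hypothetical encoding).
    scores: malignancy score per lesion; higher means more suspicious.
    Ties between a malignant and a benign score count as half a win.
    """
    malignant = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Made-up toy data, NOT taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of benign and malignant cases.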

Results

Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P < .001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P = .001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P = .18).
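The sensitivity comparison above depends on first fixing the specificity. A minimal sketch of that idea, assuming the classifier outputs a continuous malignancy score: pick the lowest score threshold at which specificity reaches the target, then report the sensitivity at that threshold. The function name, the thresholding rule, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def sensitivity_at_specificity(labels, scores, target_specificity):
    """Return (threshold, specificity, sensitivity) at a fixed specificity.

    A lesion is called malignant when its score >= threshold.
    labels: 1 = malignant, 0 = benign (hypothetical encoding).
    """
    malignant = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    # Specificity never decreases as the threshold rises, so the first
    # threshold that reaches the target also gives the highest sensitivity.
    for threshold in sorted(set(scores)):
        specificity = sum(s < threshold for s in benign) / len(benign)
        if specificity >= target_specificity:
            sensitivity = sum(s >= threshold for s in malignant) / len(malignant)
            return threshold, specificity, sensitivity

    # No observed score reaches the target: call every lesion benign.
    return float("inf"), 1.0, 0.0


# Made-up toy data, NOT taken from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.1]
toy_result = sensitivity_at_specificity(toy_labels, toy_scores, 0.5)
print(toy_result)
```

Fixing specificity this way puts the classifier and the human raters at the same false-positive rate, so the remaining difference in sensitivity can be compared directly.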

A) Receiver operating characteristic (ROC) curves of pooled human ratings (orange) and the combined convolutional neural network (cCNN) rating (blue) show significantly higher performance by the automated classifier. B) Area under the curve (AUC) of corresponding reading sets of the cCNN and dermatologists, grouped by experience. The horizontal line in each box indicates the median (middle band), while the top and bottom borders of the box indicate the 75th and 25th percentiles, respectively.

Conclusions

Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.

Relevance to Healthcare Field

According to estimates of the American Academy of Dermatology, skin cancer is the most common cancer in the United States. Skin cancers fall into two broad types: pigmented and nonpigmented, with nonpigmented being the more frequent. Dermoscopy has proven accurate for diagnosing both pigmented and nonpigmented lesions, but diagnostic accuracy for nonpigmented skin cancer is lower than for pigmented skin lesions. Current convolutional neural networks (CNNs) can detect colors, contrasts, and edges, which allows them to be trained on images to determine the type of skin cancer. In this study, CNNs classified malignant nonpigmented skin lesions as accurately as expert dermatologists. However, CNNs were less accurate in diagnosing benign nonpigmented skin lesions and rare malignant cases such as amelanotic melanoma, mainly because fewer reference images were available to train the models. As a result, CNNs are not widely used today. Given the large variety of unique skin cancer cases, the CNN method will be helpful in the healthcare field once enough images of each type of skin lesion are available. It is an excellent diagnostic tool that complements the clinician’s assessment. It is also an easy procedure for patients who have not had a biopsy or lesion excision, and it should save both time and cost.
