AUTHOR=Tao Yi, Yu Yanyan, Wu Tong, Xu Xiangli, Dai Quan, Kong Hanqing, Zhang Lei, Yu Weidong, Leng Xiaoping, Qiu Weibao, Tian Jiawei TITLE=Deep learning for the diagnosis of suspicious thyroid nodules based on multimodal ultrasound images JOURNAL=Frontiers in Oncology VOLUME=Volume 12 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.1012724 DOI=10.3389/fonc.2022.1012724 ISSN=2234-943X ABSTRACT=Objectives: This study aimed to differentially diagnose thyroid nodules (TNs) of TI-RADS category 3–5 using a deep learning (DL) model based on multimodal ultrasound (US) images and to explore its auxiliary role for radiologists with varying degrees of experience. Methods: Preoperative multimodal US images of 1,138 TNs of TI-RADS category 3–5 were randomly divided into a training set, validation set, and test set in a 4:1:1.25 ratio. Gray-scale US (GSU), color Doppler flow imaging (CDFI), strain elastography (SE), and region of interest mask (Mask) images were acquired in both transverse and longitudinal sections, and all nodules were confirmed by pathology. The diagnostic performance of the mature DL model and that of radiologists were compared in the test set, and whether DL could assist radiologists in improving diagnostic performance was verified. Results: The AUCs of DL in the differentiation of TNs were 0.858 based on (GSU+SE), 0.909 based on (GSU+CDFI), 0.906 based on (GSU+CDFI+SE), and 0.881 based on (GSU+Mask), all of which were superior to the AUC of 0.825 based on GSU alone (P = 0.014; P < 0.001; P < 0.001; P = 0.002, respectively). The highest AUC of 0.928 was achieved by DL based on (G+C+E+M)US, the highest specificity of 89.5% was achieved by (G+C+E)US, and the highest accuracy of 86.2% and sensitivity of 86.9% were achieved by DL based on (G+C+M)US. With DL assistance, the AUC of junior radiologists increased from 0.720 to 0.796 (P < 0.001), which was slightly higher than that of senior radiologists without DL assistance (0.796 vs. 0.794, P > 0.05). Senior radiologists with DL assistance exhibited higher accuracy than DL based on GSU (83.4% vs. 78.9%, P = 0.041) and a comparable AUC (0.822 vs. 0.825, P = 0.512). However, the AUC of DL based on multimodal US images was significantly higher than that of visual diagnosis by radiologists (P < 0.05). Conclusion: The DL models based on multimodal US images showed exceptional performance in the differential diagnosis of suspicious TNs, effectively increased the diagnostic efficacy of TN evaluations by junior radiologists, and provided an objective assessment for subsequent clinical and surgical management.