AUTHOR=Zhu Jingjin, Geng Jiahui, Shan Wei, Zhang Boya, Shen Huaqing, Dong Xiaohan, Liu Mei, Li Xiru, Cheng Liuquan
TITLE=Development and validation of a deep learning model for breast lesion segmentation and characterization in multiparametric MRI
JOURNAL=Frontiers in Oncology
VOLUME=12
YEAR=2022
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.946580
DOI=10.3389/fonc.2022.946580
ISSN=2234-943X
ABSTRACT=Importance: The use of artificial intelligence to differentiate benign from malignant breast lesions in multiparametric MRI (mpMRI) can assist radiologists in improving diagnostic performance. Objectives: To develop an automated deep learning model for breast lesion segmentation and characterization, and to evaluate the performance of the models and of radiologists. Materials and Methods: For lesion segmentation, 2,823 patients were used for training, validation, and testing of the VNet-based segmentation models; the average Dice similarity coefficient (DSC) between the manual segmentations by radiologists and the masks generated by VNet was calculated. For lesion characterization, 3,303 female patients with 3,607 pathologically confirmed lesions (2,213 malignant and 1,394 benign) were used for the three ResNet-based characterization models (two single-input and one multi-input). Histopathology served as the diagnostic reference standard for assessing the characterization performance of the models and the radiologists in terms of sensitivity, specificity, accuracy, and AUC. An additional 123 patients with 136 lesions (81 malignant and 55 benign) from another institution were available for external testing. Results: Of the 5,811 patients included in the study, the mean age was 46.14 years (range, 11-89). In the segmentation task, the VNet-generated masks achieved a DSC of 0.860.
In the characterization task, the AUCs of the multi-input model and the two single-input models were 0.927, 0.821, and 0.795, respectively. Compared with the single-input DWI or DCE model, the multi-input DCE & DWI model achieved a significant increase in sensitivity, specificity, and accuracy (0.831 vs. 0.772/0.776, 0.874 vs. 0.630/0.709, and 0.846 vs. 0.721/0.752, respectively). Furthermore, the specificity of the multi-input model was higher than that of the radiologists whether BI-RADS category 3 or 4 was used as the cut-off point (0.874 vs. 0.404/0.841), and its accuracy was intermediate between the two assessment methods (0.846 vs. 0.773/0.882). In external testing, the performance of the three models remained robust, with AUCs of 0.812, 0.831, and 0.885, respectively. Conclusions: Combining DCE with DWI was superior to using a single sequence for breast lesion characterization. The deep learning model significantly improved specificity and achieved accuracy comparable to that of the radiologists, showing promise for clinical application in providing preliminary diagnoses.
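The segmentation results above are reported as a Dice similarity coefficient (DSC) between the radiologists' manual masks and the VNet output. As an illustration only (the function name and toy masks below are hypothetical, not from the paper), a minimal sketch of how DSC is typically computed for a pair of binary masks:

```python
import numpy as np

def dice_similarity_coefficient(pred, ref):
    """DSC = 2 * |pred AND ref| / (|pred| + |ref|).

    pred, ref: array-likes interpreted as binary masks.
    Returns 1.0 when both masks are empty (a common convention).
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom

# Toy example: two 5-pixel masks overlapping in 4 pixels -> DSC = 2*4/10 = 0.8
pred = np.zeros((4, 4), dtype=bool); pred[0, :3] = True; pred[1, :2] = True
ref  = np.zeros((4, 4), dtype=bool); ref[0, :4] = True; ref[1, :1] = True
print(round(dice_similarity_coefficient(pred, ref), 3))  # prints 0.8
```

A DSC of 0.860, as reported for the VNet masks, thus indicates substantial voxel-wise agreement with the manual reference segmentations.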