AUTHOR=An Ulzee, Bhardwaj Ankit, Shameer Khader, Subramanian Lakshminarayanan TITLE=High Precision Mammography Lesion Identification From Imprecise Medical Annotations JOURNAL=Frontiers in Big Data VOLUME=Volume 4 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2021.742779 DOI=10.3389/fdata.2021.742779 ISSN=2624-909X ABSTRACT=Breast cancer screening using mammography serves as the earliest defense against breast cancer, revealing anomalous tissue years before it can be detected through physical screening. Despite the use of high-resolution radiography, the presence of densely overlapping patterns challenges the consistency of human-driven diagnosis and drives interest in leveraging the state-of-the-art localization ability of deep convolutional neural networks (DCNN). The growing availability of digitized clinical archives enables the training of deep segmentation models, but training on the most widely available form of annotation, coarse hand-drawn outlines, works against learning the precise boundary of cancerous tissue: the resulting segmentations align with the annotations rather than the underlying lesions. The expense of collecting high-quality pixel-level data in medical science makes this even more difficult. To surmount this fundamental challenge, we propose LatentCADx, a deep learning segmentation model capable of precisely annotating the cancer lesions underlying hand-drawn annotations, which we obtain procedurally using joint classification training and a strict segmentation penalty. We demonstrate the capability of LatentCADx on a publicly available dataset of 2,620 mammogram case files, where LatentCADx obtains a classification ROC of 0.97, AP of 0.87, and segmentation AP of 0.75 (IoU = 0.5), giving comparable or better performance than other models.
Qualitative and precision evaluation of LatentCADx annotations on validation samples shows that LatentCADx increases the specificity of segmentations beyond that of existing models trained on hand-drawn annotations, with pixel-level specificity reaching 0.90. It also obtains sharp boundaries around lesions, unlike other methods, reducing the number of confused pixels in the output by more than 60%.
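The metrics quoted in the abstract (pixel-level specificity and IoU) can be made concrete with a short sketch. This is an illustrative helper, not code from the paper: `pixel_metrics` is a hypothetical function operating on binary lesion masks, where specificity measures the fraction of background pixels correctly left unmarked and IoU measures mask overlap (a prediction counts toward segmentation AP at IoU = 0.5 when its IoU with the ground truth is at least 0.5).

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Pixel-level specificity and IoU for binary segmentation masks.

    pred, truth: arrays of the same shape; True marks a lesion pixel.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tn = np.sum(~pred & ~truth)       # background correctly left unmarked
    fp = np.sum(pred & ~truth)        # background wrongly marked as lesion
    specificity = tn / (tn + fp)      # fraction of background kept clean
    inter = np.sum(pred & truth)
    union = np.sum(pred | truth)
    iou = inter / union               # overlap between predicted and true mask
    return specificity, iou

# Toy example: a 4x4 true lesion, prediction covers 12 of its 16 pixels
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:5] = True
spec, iou = pixel_metrics(pred, truth)  # spec = 1.0, iou = 0.75
```

In the toy example the prediction never touches background, so specificity is 1.0, while the 12-of-16-pixel overlap gives an IoU of 0.75; this mask would count as a true positive under the IoU = 0.5 matching threshold.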