<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3-mathml3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Bioinform.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Bioinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Bioinform.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2673-7647</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1609004</article-id>
<article-id pub-id-type="doi">10.3389/fbinf.2025.1609004</article-id>
<article-version article-version-type="Version of Record" vocab="NISO-RP-8-2008"/>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Analysis of breast region segmentation in thermal images using U-Net deep neural network variants</article-title>
<alt-title alt-title-type="left-running-head">Rosli et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fbinf.2025.1609004">10.3389/fbinf.2025.1609004</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Rosli</surname>
<given-names>Rafhanah Shazwani</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Project administration" vocab-term-identifier="https://credit.niso.org/contributor-roles/project-administration/">Project administration</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Formal analysis" vocab-term-identifier="https://credit.niso.org/contributor-roles/formal-analysis/">Formal analysis</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Visualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/visualization/">Visualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Data curation" vocab-term-identifier="https://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Software" vocab-term-identifier="https://credit.niso.org/contributor-roles/software/">Software</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x26; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/">Writing &#x2013; review and editing</role>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Habaebi</surname>
<given-names>Mohamed Hadi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/3031308"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Resources" vocab-term-identifier="https://credit.niso.org/contributor-roles/resources/">Resources</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x26; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/">Writing &#x2013; review and editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Islam</surname>
<given-names>Md Rafiqul</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1003614"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x26; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/">Writing &#x2013; review and editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Resources" vocab-term-identifier="https://credit.niso.org/contributor-roles/resources/">Resources</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Al Hussaini</surname>
<given-names>Mohammed Abdulla Salim</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/3197797"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x26; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/">Writing &#x2013; review and editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Software" vocab-term-identifier="https://credit.niso.org/contributor-roles/software/">Software</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Resources" vocab-term-identifier="https://credit.niso.org/contributor-roles/resources/">Resources</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Funding acquisition" vocab-term-identifier="https://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<institution>IoT and Wireless Communication Protocols Laboratory, Department of Electrical and Computer Engineering, International Islamic University Malaysia (IIUM)</institution>, <city>Kuala Lumpur</city>, <country country="MY">Malaysia</country>
</aff>
<aff id="aff2">
<label>2</label>
<institution>Faculty of Computer Studies, Arab Open University (AOU)</institution>, <city>Muscat</city>, <country country="OM">Oman</country>
</aff>
<author-notes>
<corresp id="c001">
<label>&#x2a;</label>Correspondence: Mohamed Hadi Habaebi, <email xlink:href="mailto:habaebi@iium.edu.my">habaebi@iium.edu.my</email>
</corresp>
<fn fn-type="other" id="fn1">
<label>
<sup>&#x2020;</sup>
</label>
<p>ORCID: Rafhanah Shazwani Rosli, <ext-link ext-link-type="uri" xlink:href="http://orcid.org/0000-0002-1508-5022">orcid.org/0000-0003-2378-3776</ext-link>
</p>
</fn>
</author-notes>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-10-10">
<day>10</day>
<month>10</month>
<year>2025</year>
</pub-date>
<pub-date publication-format="electronic" date-type="collection">
<year>2025</year>
</pub-date>
<volume>5</volume>
<elocation-id>1609004</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>04</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>09</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Rosli, Habaebi, Islam and Al Hussaini.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Rosli, Habaebi, Islam and Al Hussaini</copyright-holder>
<license>
<ali:license_ref start_date="2025-10-10">https://creativecommons.org/licenses/by/4.0/</ali:license_ref>
<license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Breast cancer detection using thermal imaging relies on accurate segmentation of the breast region from adjacent body areas. Reliable segmentation is essential to improve the effectiveness of computer-aided diagnosis systems.</p>
</sec>
<sec>
<title>Methods</title>
<p>This study evaluated three segmentation models&#x2014;U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;&#x2014;using five optimization algorithms (ADAM, NADAM, RMSPROP, SGDM, and ADADELTA). Performance was assessed through k-fold cross-validation with metrics including Intersection over Union (IoU), Dice coefficient, precision, recall, sensitivity, specificity, pixel accuracy, ROC-AUC, PR-AUC, and Grad-CAM heatmaps for qualitative analysis.</p>
</sec>
<sec>
<title>Results</title>
<p>The ADAM optimizer consistently outperformed the others, yielding superior accuracy and reduced loss. Among the models, the baseline U-Net, despite being less complex, demonstrated the most effective performance, with precision of 0.9721, recall of 0.9559, specificity of 0.9801, ROC-AUC of 0.9680, and PR-AUC of 0.9472. U-Net also achieved higher robustness in breast region overlap and noise handling compared to its more complex variants. The findings indicate that greater architectural complexity does not necessarily lead to improved outcomes.</p>
</sec>
<sec>
<title>Discussion</title>
<p>This research highlights that the original U-Net, when trained with the ADAM optimizer, remains highly effective for breast region segmentation in thermal images. The insights contribute to guiding the selection of suitable deep learning models and optimizers for medical image analysis, with the potential to enhance the efficiency and accuracy of breast cancer diagnosis using thermal imaging.</p>
</sec>
</abstract>
<kwd-group>
<kwd>breast region segmentation</kwd>
<kwd>thermal images</kwd>
<kwd>thermography</kwd>
<kwd>deep learning</kwd>
<kwd>deep neural network</kwd>
<kwd>artificial intelligence</kwd>
<kwd>U-Net</kwd>
<kwd>U-Net with spatial attention</kwd>
</kwd-group>
<funding-group>
<funding-statement>The author(s) declare that no financial support was received for the research and/or publication of this article.</funding-statement>
</funding-group>
<counts>
<fig-count count="18"/>
<table-count count="11"/>
<equation-count count="27"/>
<ref-count count="41"/>
<page-count count="26"/>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>section-in-acceptance</meta-name>
<meta-value>Computational BioImaging</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>Breast cancer remains a global health concern, underscoring the critical importance of early detection for improved patient prognosis (<xref ref-type="bibr" rid="B33">Sung et al., 2021</xref>). Recent advancements in medical imaging, particularly thermal imaging, offer potential for enhancing early detection capabilities (<xref ref-type="bibr" rid="B3">Allugunti, 2022</xref>). However, the effectiveness of these technologies relies heavily on the precision of image segmentation, particularly in isolating the breast region from surrounding anatomical structures (<xref ref-type="bibr" rid="B8">Dafni Rose et al., 2022</xref>). This study addresses the pressing need for accurate and efficient breast region segmentation in thermal images, with the overarching goal of advancing early breast cancer detection.</p>
<p>The motivation for this research stems from the recognition that thermal imaging holds promise in detecting breast cancer early, and its success hinges on the precision of the segmentation process. To optimize thermal imaging pre-processing, we focus on leveraging advanced deep learning techniques, specifically U-Net variants. U-Net&#x2019;s symmetrical expansive pathway proves advantageous, enabling precise delineation of intricate boundaries, a crucial requirement in medical imaging (<xref ref-type="bibr" rid="B41">Zhou et al., 2018</xref>). The decision to employ U-Net variants is informed by their efficiency, precision, and adaptability, especially in the challenging task of segmenting the breast region in thermal images.</p>
<p>In contrast to alternative models such as SegNet, DeepLabv3&#x2b;, Mask R-CNN, and EfficientNet, U-Net variants offer greater efficiency and adaptability for sparse data, making them the preferred choice for this study (<xref ref-type="bibr" rid="B5">Badrinarayanan et al., 2017</xref>). DeepLabv3&#x2b; and Mask R-CNN, while powerful, require larger training datasets and impose substantial computational loads, limiting their suitability for this application (<xref ref-type="bibr" rid="B7">Chen et al., 2018</xref>; <xref ref-type="bibr" rid="B15">He et al., 2017</xref>). The adoption of U-Net variants is therefore expected to improve the accuracy and efficiency of breast region segmentation, in line with the objectives of this research (<xref ref-type="bibr" rid="B35">Tan and Le, 2019</xref>).</p>
<p>Breast region segmentation in thermal images involves distinguishing the breast area from surrounding body parts, a complex task given variations in size, shape, and orientation across individuals (<xref ref-type="bibr" rid="B32">Soomro et al., 2022</xref>). Several deep learning models, including U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; (Nested U-Net), have shown promise in image segmentation but have not been thoroughly explored for breast region segmentation in thermal images (<xref ref-type="bibr" rid="B4">Azad et al., 2022</xref>; <xref ref-type="bibr" rid="B26">Radhi and Kamil, 2023</xref>; <xref ref-type="bibr" rid="B24">Punn and Agarwal, 2022</xref>; <xref ref-type="bibr" rid="B18">Liu et al., 2022</xref>; <xref ref-type="bibr" rid="B12">Gu et al., 2022</xref>; <xref ref-type="bibr" rid="B16">Islam Sumon et al., 2023</xref>; <xref ref-type="bibr" rid="B38">Yin et al., 2022</xref>; <xref ref-type="bibr" rid="B21">Micallef et al., 2021</xref>; <xref ref-type="bibr" rid="B23">Mokhtar et al., 2023</xref>; <xref ref-type="bibr" rid="B11">Gargari et al., 2022</xref>; <xref ref-type="bibr" rid="B40">Zhao et al., 2022</xref>). This study not only evaluates the performance of these models but also conducts a comprehensive comparison of different optimization algorithms, recognizing the optimizer&#x2019;s pivotal role in training deep learning models.</p>
<p>By systematically evaluating various optimizers and identifying the most effective one for training segmentation models, this study aims to provide a holistic assessment of the segmentation task. The research presents a comprehensive evaluation of U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; for breast region segmentation in thermal images, coupled with a thorough comparison of different optimizers. The insights generated from this study are poised to contribute significantly to the advancement of early breast cancer detection technologies, benefiting researchers and practitioners in the fields of medical diagnostics and artificial intelligence.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related work</title>
<p>Breast region segmentation in thermal images has emerged as a pivotal area of research, given its potential in breast cancer detection. Diverse methodologies, ranging from conventional image processing techniques to cutting-edge deep learning models, have been proposed to improve the precision and efficiency of segmentation. The significance of deep learning methodologies, particularly their potential to enhance computer-aided medical diagnosis, is emphasized by <xref ref-type="bibr" rid="B2">Al Husaini et al. (2023)</xref>.</p>
<p>In a study employing Distance-based Metrics and High-Temperature Region-based Adaptive Thresholding (DM-HTRAT) (<xref ref-type="bibr" rid="B37">Venkatachalam et al., 2023</xref>), an accuracy of 96.5% in breast boundary segmentation was achieved, contributing to more reliable and effective detection of breast abnormalities. However, limitations include susceptibility to unclear boundaries, a low signal-to-noise ratio, and poor contrast in thermal images.</p>
<p>Another study proposed an automatic segmentation algorithm (<xref ref-type="bibr" rid="B1">Adel et al., 2018</xref>) that successfully segmented all types of breasts with an accuracy of 98.73%. While demonstrating faster runtimes than the Hough transform, challenges may arise in real-time applications requiring instantaneous results.</p>
<p>A comprehensive review of various image processing techniques for automatic segmentation of clinically significant Regions of Interest (ROIs) emphasized the importance of automated segmentation for fast and reproducible analysis (<xref ref-type="bibr" rid="B31">Singh and Arora, 2020</xref>). The review also highlighted the potential of deep learning for effective computer-aided medical diagnosis, acknowledging the limitations of human-based diagnoses influenced by factors such as narcissus effect, negligence, visual exhaustion, and mental workload.</p>
<p>A methodology relying on local analysis to mitigate the impact of global noise offered a new alternative for automatic segmentation of thermal breast images, achieving 77.3% accuracy (<xref ref-type="bibr" rid="B28">S&#xe1;nchez-Ruiz et al., 2018</xref>). However, errors were observed in images with low contrast in the breast region and in those depicting amorphous breast structures.</p>
<p>Autoencoder-like convolutional and deconvolutional neural networks (C-DCNN) demonstrated the capability to learn essential features of breast regions and delineate them in thermal images (<xref ref-type="bibr" rid="B13">Guan et al., 2018</xref>). The study suggested a need for an improved evaluation metric to effectively assess the quality of the breast segmentation model.</p>
<p>The MultiResUnet deep-learning segmentation model exhibited an average accuracy of 91.47%, surpassing the autoencoder by about 2% (<xref ref-type="bibr" rid="B19">Lou et al., 2019</xref>). However, limitations in small breast segmentation, IoU errors, data augmentation, and manual challenges were identified, suggesting areas for improvement.</p>
<p>Utilizing Genetic Algorithms (GA) with a fitness function based on cardioids, a method successfully separated the breast region in 52 out of 58 images without manual seed point selection (<xref ref-type="bibr" rid="B20">Mendes et al., 2020</xref>). However, challenges were faced with ellipse techniques and metallic markers, and the algorithm required 60 s for optimal results.</p>
<p>U-Net Convolutional Neural Networks demonstrated efficiency for Region of Interest (ROI) segmentation, achieving an accuracy of 98.24% over frontal views and 93.6% over lateral views (<xref ref-type="bibr" rid="B6">Carlos de Carvalho et al., 2023</xref>). Notably, the efficacy of the method decreased when applied to lateral views.</p>
<p>A study incorporating Vector Pooling Block (VPB) and AVG-MAX VPB in Convolutional Neural Networks (U-Net, AlexNet, ResNet18, GoogleNet) achieved impressive results, including a global accuracy of 99.2% (<xref ref-type="bibr" rid="B22">Mohamed et al., 2022</xref>). However, the study noted the need for more efficient exploration of the pooling layer&#x2019;s effect in Convolutional Neural Networks (CNNs) within the existing literature.</p>
</sec>
<sec sec-type="materials|methods" id="s3">
<label>3</label>
<title>Materials and methods</title>
<p>This section details the methodological framework adopted in this study, encompassing the dataset acquisition and preparation, model architectures, experimental setup, and the subsequent training and evaluation processes.</p>
<sec id="s3-1">
<label>3.1</label>
<title>Dataset acquisition and preparation</title>
<p>The DMR-IR database is a publicly accessible repository containing multimodal breast examination data, including infrared thermography, digital mammography, and clinical records. For this study, only the frontal thermal images of 130 patients were used, acquired under the First Static Protocol to ensure standardized conditions. Patients included both healthy controls and individuals with benign and malignant breast lesions, thereby introducing variability essential for robust model evaluation. The infrared images were captured using a FLIR SC620 camera, with a sensitivity of &#x3c;0.04 &#xb0;C and a temperature range of &#x2212;40 &#xb0;C to 500 &#xb0;C, at a resolution of 640 &#xd7; 480 pixels. To minimize external variability, all acquisitions followed a controlled clinical protocol, where patients were acclimatized in a room maintained at 20 &#xb0;C&#x2013;23 &#xb0;C for 15 min before imaging. Manual annotations were performed by the study authors to generate ground-truth masks, with cross-verification among annotators to reduce bias. While not performed by certified radiologists, this procedure was explicitly designed for experimental, non-clinical purposes. The use of the DMR-IR database supports transparency and reproducibility, as the dataset is publicly available online (<ext-link ext-link-type="uri" xlink:href="http://visual.ic.uff.br/dmi">http://visual.ic.uff.br/dmi</ext-link>), allowing independent verification.</p>
<p>Because the main objective of this research is the comparative analysis of three U-Net deep neural network variants, an existing online dataset was used rather than dedicating resources to the creation of a new one, which allowed the study to concentrate on that objective.</p>
<sec id="s3-1-1">
<label>3.1.1</label>
<title>Data acquisition</title>
<p>A collection of thermal breast images from 130 patients was acquired from the Database for Breast Research with Infrared Image (DMR-IR) presented by <xref ref-type="bibr" rid="B30">Silva et al. (2014)</xref>. Specifically, frontal breast thermal images captured under the First Static Protocol were used. The DMR-IR incorporates infrared images, digitalized mammograms, and clinical data acquired from Ant&#xf4;nio Pedro University Hospital patients. These patients come from the screening department and the gynecology department. The DMR-IR contains data on healthy patients as well as patients with breast diseases, including cancer.</p>
<p>The infrared images, henceforth referred to as &#x2018;thermal images&#x2019;, were obtained with a FLIR thermal camera, model SC620, with a sensitivity of less than 0.04 &#xb0;C and a capture range of &#x2212;40 &#xb0;C to 500 &#xb0;C. The pixel dimensions of the infrared images are 640 &#xd7; 480. The procurement of the images and their use in research have been approved by the hospital&#x2019;s Ethical Committee and registered with the Brazilian Ministry of Health under the following CAAE number: 01042812.0.0000.5243. The DMR-IR is accessible via an online user-friendly interface (<ext-link ext-link-type="uri" xlink:href="http://visual.ic.uff.br/dmi">http://visual.ic.uff.br/dmi</ext-link>) for managing and retrieving data from breast examinations and clinical data from voluntary patients.</p>
</sec>
<sec id="s3-1-2">
<label>3.1.2</label>
<title>Annotation and mask generation</title>
<p>Manual annotations were performed by the authors to prepare masks for the thermal images. The breast region in each image was delineated, creating a mask that highlights the breast area and excludes other regions such as the armpit, neck, and lower chest. The masks served as the ground truth for training the segmentation models. It is important to highlight that the annotation was performed only for the purpose of experimentation, and not for use in a clinical setting, as it was not performed by a certified technician. <xref ref-type="fig" rid="F1">Figure 1</xref> shows a screenshot of the annotation of the breast area in a breast thermal image.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Annotation of breast area on breast thermal images.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g001.tif">
<alt-text content-type="machine-generated">Thermal image analysis interface displaying a breast with a highlighted polygon marking the breast area in green. Side panel includes tools for annotations such as rect, point, line, and polygon. Image thumbnails are shown on the left side.</alt-text>
</graphic>
</fig>
<p>Once the annotations were completed, they were exported in the VGG JSON format, a structured format for representing such annotations. The exported annotations were then used to generate a binary mask for each annotated image. A binary mask is a black-and-white representation in which the regions of interest are shown in white and everything else is black. A sample of an unsegmented breast thermal image and its corresponding binary mask is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>
<bold>(a)</bold> Breast thermal image; <bold>(b)</bold> Corresponding binary mask.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g002.tif">
<alt-text content-type="machine-generated">Panel (a) shows an infrared thermographic image of a human torso, with thermal variations visible. Panel (b) presents a corresponding binary mask highlighting the breast areas in white against a black background.</alt-text>
</graphic>
</fig>
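<p>The conversion from exported polygons to a binary mask can be sketched as follows. This is a minimal illustration and not the authors&#x2019; code: it assumes the export follows the commonly used VGG Image Annotator layout (a <monospace>regions</monospace> entry whose <monospace>shape_attributes</monospace> hold <monospace>all_points_x</monospace> and <monospace>all_points_y</monospace>), and the function names <monospace>polygon_to_mask</monospace> and <monospace>masks_from_via_json</monospace> are hypothetical. Pixels whose centres fall inside a polygon are set to white (255); all others stay black (0).</p>
<code language="python"><![CDATA[
```python
import json


def polygon_to_mask(xs, ys, width, height):
    """Rasterize one polygon into a binary mask using the even-odd rule.

    A pixel is set to 255 when its centre (col + 0.5, row + 0.5) lies inside.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(xs)
    for row in range(height):
        py = row + 0.5
        # x-positions where this row's scanline crosses the polygon edges
        crossings = []
        for i in range(n):
            x0, y0 = xs[i], ys[i]
            x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
            if (y0 <= py) != (y1 <= py):  # edge straddles the scanline
                crossings.append(x0 + (py - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # fill between successive pairs of crossings
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    mask[row][col] = 255
    return mask


def masks_from_via_json(text, width, height):
    """Return {filename: mask} for all polygon regions in a VIA-style export.

    Assumed layout (not confirmed by the article): each top-level entry has a
    "filename" and a "regions" list (or dict) of polygon shape attributes.
    """
    out = {}
    for entry in json.loads(text).values():
        mask = [[0] * width for _ in range(height)]
        regions = entry["regions"]
        if isinstance(regions, dict):  # some exports key regions by index
            regions = regions.values()
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue
            part = polygon_to_mask(
                shape["all_points_x"], shape["all_points_y"], width, height
            )
            # union of all polygons (e.g., left and right breast)
            for r in range(height):
                for c in range(width):
                    mask[r][c] = max(mask[r][c], part[r][c])
        out[entry["filename"]] = mask
    return out
```
]]></code>
<p>The pure-Python loops keep the sketch dependency-free; for full 640 &#xd7; 480 images a vectorized or library rasterizer would likely be preferable.</p>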
</sec>
<sec id="s3-1-3">
<label>3.1.3</label>
<title>Data preprocessing</title>
<p>Data preprocessing was conducted on each thermal image and its corresponding binary mask prior to the segmentation process. The process involved resizing the images to a consistent dimension of 256 &#xd7; 256 pixels, normalizing the pixel values to the range 0&#x2013;1, and generating augmented versions of both the images and the masks. This data augmentation step enhances the model&#x2019;s ability to generalize by introducing greater diversity into the training dataset.</p>
<p>The following transformations were conducted on both the images and their associated binary masks.<list list-type="simple">
<list-item>
<p>a. Rotation: To accommodate diverse breast orientations, images undergo random rotations of up to 20&#xb0;.</p>
</list-item>
<list-item>
<p>b. Width and Height Shifts: Images are shifted both horizontally and vertically by a maximum of 10% of their respective dimensions, aiding the model in identifying off-center regions.</p>
</list-item>
<list-item>
<p>c. Shear Transformation: The images are slanted with an intensity of up to 0.2, introducing a skewing effect.</p>
</list-item>
<list-item>
<p>d. Zooming: Images are randomly zoomed in or out by up to 20%, helping the model adapt to breast regions of different scales.</p>
</list-item>
<list-item>
<p>e. Flipping: Images are flipped both horizontally and vertically, useful for datasets where breast orientation is not consistent.</p>
</list-item>
<list-item>
<p>f. Pixel Fill Strategy: Transformations such as rotations or shifts expose new pixels at the image border. The &#x2018;reflect&#x2019; strategy fills them by mirroring the edge pixels of the image.</p>
</list-item>
</list>
</p>
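<p>A key requirement of the steps above is that every geometric transformation is applied identically to an image and its mask, so that the ground truth stays aligned. The following NumPy sketch illustrates this for normalization, flips, and shifts of up to 10% with reflective filling. It is an assumption-laden illustration rather than the authors&#x2019; pipeline: the library used in the study is not named in this section, the helper names are hypothetical, NumPy&#x2019;s <monospace>reflect</monospace> padding is only one mirroring convention, and resizing, rotation, shear, and zoom are omitted for brevity.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def normalize(image):
    """Scale 8-bit pixel values to the range 0-1."""
    return image.astype(np.float32) / 255.0


def shift_reflect(arr, dy, dx):
    """Shift a 2-D array by (dy, dx) pixels; exposed pixels are mirrored."""
    h, w = arr.shape
    pad_y, pad_x = abs(dy), abs(dx)
    padded = np.pad(arr, ((pad_y, pad_y), (pad_x, pad_x)), mode="reflect")
    # Crop the window so that output[r, c] == arr[r - dy, c - dx] where defined.
    return padded[pad_y - dy:pad_y - dy + h, pad_x - dx:pad_x - dx + w]


def augment_pair(image, mask, rng, max_shift=0.10):
    """Apply the same random flips and shift to an image and its mask."""
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    h, w = image.shape
    limit_y, limit_x = int(max_shift * h), int(max_shift * w)
    dy = int(rng.integers(-limit_y, limit_y + 1))
    dx = int(rng.integers(-limit_x, limit_x + 1))
    # The same (dy, dx) is used for both arrays to keep them aligned.
    return shift_reflect(image, dy, dx), shift_reflect(mask, dy, dx)
```
]]></code>
<p>Drawing the random parameters once per pair, rather than once per array, is what keeps each augmented mask consistent with its augmented image.</p>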
</sec>
</sec>
<sec id="s3-2">
<label>3.2</label>
<title>Model architectures</title>
<p>To improve breast region segmentation in thermal images, this study evaluated three distinct deep learning architectures. Each of these models is recognized for its image segmentation ability.</p>
<sec id="s3-2-1">
<label>3.2.1</label>
<title>U-Net</title>
<p>U-Net, introduced by <xref ref-type="bibr" rid="B27">Ronneberger et al. (2015)</xref>, is a deep learning architecture specifically designed for biomedical image segmentation. To address the challenge of effectively training deep neural networks with a limited number of annotated samples, the authors proposed a data augmentation-based approach. U-Net&#x2019;s architecture includes a contracting path for capturing context and an expanding path for precise localization. Despite training on a limited image dataset, U-Net outperformed previous methods. The network consists of 23 convolutional layers. The U-Net model, with 32 &#xd7; 32 pixels at its lowest resolution, is depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>. Each blue rectangle in this diagram represents a multichannel feature map, with the channel count indicated atop each rectangle and the x-y dimension indicated at its lower left. White rectangles represent copied feature maps, and the arrows denote the different operations.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>U-Net model (<xref ref-type="bibr" rid="B27">Ronneberger et al., 2015</xref>).</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g003.tif">
<alt-text content-type="machine-generated">Diagram of a U-Net architecture for image segmentation, showing the input image tile leading to the output segmentation map. It illustrates the path of operations including convolutional layers with 3x3 convolutions, ReLU activation, max pooling with 2x2 strides, up-convolutions, and 1x1 convolutions. The layers are organized in a U-shape, showing contracting and expansive paths with arrows indicating layer transitions. Layer dimensions are provided at each step, with color-coded arrows signifying different operations.</alt-text>
</graphic>
</fig>
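<p>The layer count and resolutions quoted above can be checked with a short shape trace. The sketch below assumes the unpadded 3 &#xd7; 3 convolutions and the 572 &#xd7; 572 input tile of the original U-Net figure; the function name is hypothetical, and the models trained in this study may use padded convolutions and a different input size.</p>
<code language="python"><![CDATA[
```python
def trace_unet(input_size, depth=4):
    """Follow the spatial size through a U-Net with unpadded 3x3 convolutions.

    Returns (size entering the lowest level, output size, number of
    convolutional layers, counting up-convolutions and the final 1x1 layer).
    """
    size, conv_layers = input_size, 0
    for _ in range(depth):            # contracting path
        size -= 4                     # two unpadded 3x3 convolutions, 2 pixels each
        conv_layers += 2
        size //= 2                    # 2x2 max pooling with stride 2
    lowest = size
    size -= 4                         # two convolutions at the lowest resolution
    conv_layers += 2
    for _ in range(depth):            # expanding path
        size = size * 2 - 4           # 2x2 up-convolution, then two 3x3 convolutions
        conv_layers += 3
    conv_layers += 1                  # final 1x1 convolution
    return lowest, size, conv_layers


lowest, output_size, n_conv = trace_unet(572)
```
]]></code>
<p>Under these assumptions, the trace reproduces the 32 &#xd7; 32 lowest resolution and the 23 convolutional layers mentioned in the text.</p>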
</sec>
<sec id="s3-2-2">
<label>3.2.2</label>
<title>U-net with spatial attention</title>
<p>Spatial Attention U-Net (SA-UNet) is a network with reduced computational complexity (<xref ref-type="bibr" rid="B14">Guo et al., 2021</xref>). It does not require a large number of annotated training samples and can be combined with data augmentation to make more efficient use of the existing annotated samples. A notable characteristic of SA-UNet is its spatial attention module, which infers an attention map along the spatial dimension and multiplies it with the input feature map to enable adaptive feature refinement. Furthermore, to mitigate overfitting, the network replaces the original convolutional blocks of U-Net with structured dropout convolutional blocks. <xref ref-type="fig" rid="F4">Figure 4</xref> illustrates the SA-UNet model, which has a U-shaped structure with an encoder on the left side and a decoder on the right side.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>U-Net with Spatial Attention model (<xref ref-type="bibr" rid="B14">Guo et al., 2021</xref>).</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g004.tif">
<alt-text content-type="machine-generated">Diagram of a neural network architecture featuring an encoder-decoder structure with convolutional layers. The input image passes through layers with operations such as 3x3 convolution, DropBlock, Batch Normalization, ReLU, and 1x1 convolution with sigmoid activation. Max pooling and transposed convolution are applied, with arrows indicating data flow, including spatial attention modules.</alt-text>
</graphic>
</fig>
<p>Each stage of the encoder consists of a structured dropout convolutional block and a 2 &#xd7; 2 max pooling operation. In each convolutional block, the convolutional layer is followed by a DropBlock, a batch normalization (BN) layer, and a rectified linear unit (ReLU). The max pooling operation then down-samples the data with a stride of 2, and the number of feature channels is doubled at each down-sampling step. Each step in the decoder applies a 2 &#xd7; 2 transposed convolution for up-sampling, which halves the number of feature channels. The result is concatenated with the corresponding feature map from the encoder and passed through a structured dropout convolutional block. The spatial attention module is placed between the encoder and the decoder. In the final layer, a 1 &#xd7; 1 convolution followed by the sigmoid activation function produces the segmentation map.</p>
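<p>The refinement performed by the spatial attention module can be sketched in plain Python. This is an illustration only: it assumes the common formulation in which channel-wise average and max pooling are followed by a single convolution and a sigmoid, and the function name, tensor layout, and kernel values are hypothetical rather than taken from the implementation used in this study.</p>
<code language="python"><![CDATA[
```python
import math


def spatial_attention(feature, kernel):
    """Refine a C x H x W feature map (nested lists) with a spatial attention map.

    `kernel` holds 2 x k x k convolution weights: one k x k slice for the
    average-pooled map and one for the max-pooled map (zero padding, stride 1).
    """
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool along the channel axis to obtain two H x W descriptor maps.
    avg_map = [[sum(feature[c][y][x] for c in range(channels)) / channels
                for x in range(width)] for y in range(height)]
    max_map = [[max(feature[c][y][x] for c in range(channels))
                for x in range(width)] for y in range(height)]
    pooled = [avg_map, max_map]
    attention = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for m in range(2):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += kernel[m][dy][dx] * pooled[m][yy][xx]
            attention[y][x] = 1.0 / (1.0 + math.exp(-acc))   # sigmoid
    # Multiply every channel by the same spatial attention map.
    refined = [[[feature[c][y][x] * attention[y][x] for x in range(width)]
                for y in range(height)] for c in range(channels)]
    return attention, refined
```
]]></code>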
</sec>
<sec id="s3-2-3">
<label>3.2.3</label>
<title>U-Net&#x2b;&#x2b;</title>
<p>UNet&#x2b;&#x2b; (<xref ref-type="bibr" rid="B41">Zhou et al., 2018</xref>) is a medical image segmentation architecture built on a deeply supervised encoder-decoder network. The encoder and decoder sub-networks are connected through nested, dense skip pathways that aim to reduce the semantic gap between their feature maps. The underlying argument is that the optimizer faces an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. <xref ref-type="fig" rid="F5">Figure 5a</xref> depicts an overview of the architecture. UNet&#x2b;&#x2b; begins with an encoder sub-network, or backbone, which is followed by a decoder sub-network. What differentiates UNet&#x2b;&#x2b; from U-Net (the black components in <xref ref-type="fig" rid="F5">Figure 5a</xref>) are the redesigned skip pathways that connect the two sub-networks (shown in green and blue; the first pathway is detailed in <xref ref-type="fig" rid="F5">Figure 5b</xref>) and the use of deep supervision (shown in red), which allows the network to be pruned at inference time (<xref ref-type="fig" rid="F5">Figure 5c</xref>).</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>
<bold>(a)</bold> UNet&#x2b;&#x2b; consists of an encoder and a decoder connected through nested dense convolutional blocks, which bridge the semantic gap between the encoder and decoder feature maps before fusion, for example between X<sup>0,0</sup> and X<sup>1,3</sup>. Black indicates the original U-Net, green and blue indicate the skip pathways, and red indicates deep supervision; the red, green, and blue components distinguish UNet&#x2b;&#x2b; from U-Net. <bold>(b)</bold> Detailed analysis of the first UNet&#x2b;&#x2b; skip pathway. <bold>(c)</bold> UNet&#x2b;&#x2b; can be pruned at inference time if it is trained with deep supervision (<xref ref-type="bibr" rid="B41">Zhou et al., 2018</xref>).</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g005.tif">
<alt-text content-type="machine-generated">Diagram illustrating U-Net&#x2b;&#x2b; architecture in three parts. (a) Shows the overall structure with layers marked for down-sampling, up-sampling, skip connections, and convolutions. (b) Details the computation for nodes with equations. (c) Displays variations of U-Net&#x2b;&#x2b; at different levels (L&#x2074;, L&#xB3;, L&#xB2;, L&#xB9;) with connections and loss function represented.</alt-text>
</graphic>
</fig>
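<p>The nested, dense skip pathways can be made concrete by listing which feature maps feed each node. The sketch below follows the node notation X<sup>i,j</sup> of Zhou et al. (2018), where i indexes the down-sampling level and j the position along the skip pathway; the function name and string labels are hypothetical.</p>
<code language="python"><![CDATA[
```python
def node_inputs(i, j):
    """Return the inputs of node X(i,j) in a UNet++-style nested architecture."""
    if j == 0:
        # Encoder (backbone) node: fed by the down-sampled node one level above.
        return ["input image"] if i == 0 else [f"down(X{i - 1},0)"]
    same_level = [f"X{i},{k}" for k in range(j)]       # dense skip connections
    return same_level + [f"up(X{i + 1},{j - 1})"]       # up-sampled node from below


# All nodes of a five-level network satisfy i + j <= 4.
nodes = [(i, j) for i in range(5) for j in range(5 - i)]
first_skip_pathway = {f"X0,{j}": node_inputs(0, j) for j in range(1, 5)}
```
]]></code>
<p>In this notation, a plain U-Net keeps only the backbone nodes and one decoder node per level, so each decoder node receives a single skip connection instead of the growing list shown here.</p>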
</sec>
</sec>
<sec id="s3-3">
<label>3.3</label>
<title>Experimental setup</title>
<sec id="s3-3-1">
<label>3.3.1</label>
<title>Software configuration</title>
<p>The experiments were conducted with the following software components:<list list-type="simple">
<list-item>
<p>a. Operating System: The experiments were run on Windows 11.</p>
</list-item>
<list-item>
<p>b. Deep Learning Frameworks: TensorFlow and Keras were used for model development, training, and evaluation.</p>
</list-item>
<list-item>
<p>c. Image Processing Libraries: OpenCV, an open-source computer vision library, was used for image manipulation, feature extraction, and visualization when preprocessing the thermal images and analyzing the results.</p>
</list-item>
<list-item>
<p>d. Data Management: The Python libraries NumPy and Pandas were used for data manipulation and analysis. NumPy handled numerical operations, while Pandas handled structured data.</p>
</list-item>
<list-item>
<p>e. Visualization: Matplotlib, a Python plotting library, was used to generate the graphs, charts, and figures that present the experimental results.</p>
</list-item>
</list>
</p>
</sec>
<sec id="s3-3-2">
<label>3.3.2</label>
<title>Hardware configuration</title>
<p>The experiments were run on the following hardware:<list list-type="simple">
<list-item>
<p>a. Processor: An Intel 13th Gen Core i9-13900HX processor with a base clock speed of 2.20 GHz.</p>
</list-item>
<list-item>
<p>b. Memory: 32 GB of RAM, which was sufficient for handling the dataset and training the neural networks.</p>
</list-item>
<list-item>
<p>c. Graphics Processing Unit (GPU): An NVIDIA GeForce RTX 4080 Laptop GPU, which accelerated the deep learning computations during training.</p>
</list-item>
</list>
</p>
</sec>
</sec>
<sec id="s3-4">
<label>3.4</label>
<title>Model training and evaluation</title>
<p>The training and evaluation of the U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models were conducted based on the flowchart of the algorithm as illustrated in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Flowchart for the model&#x2019;s training and evaluation algorithm.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g006.tif">
<alt-text content-type="machine-generated">Flowchart illustrating the process for developing a pre-trained model: start, load thermal images, data augmentation, split data into training and validation sets, compile model, train model with callbacks, evaluate model, save as pre-trained model, end.</alt-text>
</graphic>
</fig>
<p>The algorithm begins by loading the grayscale images and their corresponding ground-truth masks from the specified directories, then resizes them and normalizes their pixel values. To augment the dataset, the images and masks are subjected to the transformations described in Section 3.1, which introduces variability into the training data. A stratified split was adopted to divide the data into training and validation sets, with 20% of the augmented dataset reserved for validation. This allocation was not tuned experimentally; it was fixed to support the comparison between models and to ensure a balanced representation of the classes in both sets. The model is then initialized according to the selected model type (U-Net, U-Net with Spatial Attention, or U-Net&#x2b;&#x2b;) and compiled with the binary cross-entropy loss function and the accuracy metric, which measures the fraction of correctly classified pixels.</p>
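<p>The split, loss, and metric described above can be sketched without any deep learning framework. The code below is a minimal illustration under stated assumptions: the stratification is taken to be on a per-image class label, the masks are flattened to lists of 0/1 values, and all function names and default values are hypothetical rather than those of the study&#x2019;s implementation.</p>
<code language="python"><![CDATA[
```python
import math
import random


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with about val_fraction of each class held out."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx += indices[:n_val]
        train_idx += indices[n_val:]
    return sorted(train_idx), sorted(val_idx)


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)        # clip so that log() stays finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def pixel_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of pixels whose thresholded prediction equals the target."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```
]]></code>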
<p>Five optimizers were compared: ADAM, NADAM, RMSPROP, SGDM, and ADADELTA. They were selected for their widespread use and documented effectiveness in deep learning applications, particularly image segmentation. The equations that describe how each optimizer updates the model weights during training are as follows.</p>
<sec id="s3-4-1">
<label>3.4.1</label>
<title>ADAM (adaptive moment estimation)</title>
<p>ADAM combines the advantages of momentum-based optimization and RMSProp. It maintains an adaptive learning rate for each parameter using exponentially decaying averages of past gradients and past squared gradients. The ADAM update rules (<xref ref-type="bibr" rid="B17">Kingma and Ba, 2014</xref>) are given in <xref ref-type="disp-formula" rid="e1">Equations 1</xref>&#x2013;<xref ref-type="disp-formula" rid="e5">5</xref>:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xb7;</mml:mo>
<mml:msubsup>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
<disp-formula id="e3">
<mml:math id="m3">
<mml:mrow>
<mml:msubsup>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>t</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m4">
<mml:mrow>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>t</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msubsup>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:msqrt>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf1">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf2">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the first and second moment estimates,</p>
<p> <inline-formula id="inf3">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the gradient at time step t,</p>
<p> <inline-formula id="inf4">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf5">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are exponential decay rates for the moment estimates,</p>
<p> <inline-formula id="inf6">
<mml:math id="m11">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the learning rate,</p>
<p> <inline-formula id="inf7">
<mml:math id="m12">
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a small constant added to prevent division by zero.</p>
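<p>Equations 1&#x2013;5 translate directly into a single update step for a scalar parameter, as in the sketch below. The function name is hypothetical, and the default values are those suggested by Kingma and Ba (2014), not settings reported for this study.</p>
<code language="python"><![CDATA[
```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update of a scalar parameter; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad                       # Equation 1
    v = beta2 * v + (1.0 - beta2) * grad ** 2                  # Equation 2
    m_hat = m / (1.0 - beta1 ** t)                             # Equation 3
    v_hat = v / (1.0 - beta2 ** t)                             # Equation 4
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)   # Equation 5
    return theta, m, v
```
]]></code>
<p>Because of the bias correction in Equations 3 and 4, the first step has a magnitude close to the learning rate regardless of the gradient scale.</p>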
</sec>
<sec id="s3-4-2">
<label>3.4.2</label>
<title>NADAM (Nesterov-accelerated adaptive moment estimation)</title>
<p>The NADAM optimizer combines Nesterov&#x2019;s accelerated gradient with the benefits of ADAM. It uses the same equations as ADAM, but with Nesterov&#x2019;s momentum applied to the gradients before calculating <inline-formula id="inf8">
<mml:math id="m13">
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. In standard momentum, the update rule for a parameter <inline-formula id="inf9">
<mml:math id="m14">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is given by <xref ref-type="disp-formula" rid="e6">Equations 6</xref>, <xref ref-type="disp-formula" rid="e7">7</xref>:<disp-formula id="e6">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
<disp-formula id="e7">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf10">
<mml:math id="m17">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the learning rate,</p>
<p> <inline-formula id="inf11">
<mml:math id="m18">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the gradient of the objective function at the current parameters, </p>
<p>
<inline-formula id="inf12">
<mml:math id="m19">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the momentum parameter.</p>
<p>Nesterov momentum modifies this approach by calculating the gradient at a &#x201c;lookahead&#x201d; position (<xref ref-type="bibr" rid="B10">Dozat, 2016</xref>) as given by <xref ref-type="disp-formula" rid="e8">Equations 8</xref>&#x2013;<xref ref-type="disp-formula" rid="e10">10</xref>.<disp-formula id="e8">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mtext>lookahead</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
<disp-formula id="e9">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mtext>lookahead</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
<disp-formula id="e10">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
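<p>The difference between standard and Nesterov momentum can be seen on a toy objective. The sketch below takes the lookahead point as the current parameter minus &#x3b2; times the previous velocity, which is the direction in which the momentum term alone moves the parameter under the sign convention of Equation 10. The function names, the objective f(&#x3b8;) &#x3d; 0.5&#xb7;&#x3b8;<sup>2</sup>, and the hyperparameter values are hypothetical.</p>
<code language="python"><![CDATA[
```python
def momentum_step(theta, v, grad_fn, alpha=0.1, beta=0.9):
    """Standard momentum: the gradient is evaluated at the current parameter."""
    v = beta * v + alpha * grad_fn(theta)
    return theta - v, v


def nesterov_step(theta, v, grad_fn, alpha=0.1, beta=0.9):
    """Nesterov momentum: the gradient is evaluated at the lookahead position."""
    lookahead = theta - beta * v          # where the momentum term alone would land
    v = beta * v + alpha * grad_fn(lookahead)
    return theta - v, v


def grad(theta):
    """Gradient of the toy objective f(theta) = 0.5 * theta**2."""
    return theta


theta_m, v_m = 1.0, 0.0
theta_n, v_n = 1.0, 0.0
for _ in range(2):
    theta_m, v_m = momentum_step(theta_m, v_m, grad)
    theta_n, v_n = nesterov_step(theta_n, v_n, grad)
```
]]></code>
<p>After two steps the two variants already differ, because the Nesterov gradient is taken closer to the minimum and therefore yields a smaller velocity.</p>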
</sec>
<sec id="s3-4-3">
<label>3.4.3</label>
<title>RMSPROP (root mean square propagation)</title>
<p>RMSPROP adapts the learning rate for each parameter based on a moving average of recent squared gradient magnitudes. Dividing each gradient by the square root of this average keeps the update size stable when gradient magnitudes are very small or very large, as captured by <xref ref-type="disp-formula" rid="e11">Equations 11</xref>, <xref ref-type="disp-formula" rid="e12">12</xref> (<xref ref-type="bibr" rid="B36">Tieleman and Hinton, 2012</xref>).<disp-formula id="e11">
<mml:math id="m23">
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xb7;</mml:mo>
<mml:msubsup>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>
<disp-formula id="e12">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf13">
<mml:math id="m25">
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the moving average of squared gradients, </p>
<p>
<inline-formula id="inf14">
<mml:math id="m26">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the decay rate for the moving average, </p>
<p>
<inline-formula id="inf15">
<mml:math id="m27">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the learning rate, </p>
<p>
<inline-formula id="inf16">
<mml:math id="m28">
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a small constant added to prevent division by zero.</p>
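<p>A single RMSPROP update for a scalar parameter can be written as follows. This is a minimal sketch of Equations 11 and 12; the function name and default values are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def rmsprop_step(theta, grad, avg_sq, alpha=0.001, beta=0.9, eps=1e-8):
    """One RMSPROP update; avg_sq is the moving average of squared gradients."""
    avg_sq = beta * avg_sq + (1.0 - beta) * grad ** 2           # Equation 11
    theta = theta - alpha * grad / math.sqrt(avg_sq + eps)      # Equation 12
    return theta, avg_sq
```
]]></code>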
</sec>
<sec id="s3-4-4">
<label>3.4.4</label>
<title>SGDM (stochastic gradient descent with momentum)</title>
<p>SGDM incorporates momentum, which allows the optimizer to accumulate velocity along consistent gradient directions and to dampen oscillations. The momentum term also helps the optimizer move past local minima more effectively. The SGDM update rules in <xref ref-type="disp-formula" rid="e13">Equations 13</xref>, <xref ref-type="disp-formula" rid="e14">14</xref> are based on the concept of accumulated gradients (<xref ref-type="bibr" rid="B25">Qian, 1999</xref>).<disp-formula id="e13">
<mml:math id="m29">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(13)</label>
</disp-formula>
<disp-formula id="e14">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf17">
<mml:math id="m31">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the velocity or momentum term, </p>
<p>
<inline-formula id="inf18">
<mml:math id="m32">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the momentum coefficient, </p>
<p>
<inline-formula id="inf19">
<mml:math id="m33">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the learning rate, </p>
<p>
<inline-formula id="inf20">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the gradient at time step t.</p>
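<p>The accumulation of velocity in Equations 13 and 14 is easiest to see with a constant gradient, where the velocity grows as a geometric series toward &#x3b1;&#xb7;g/(1 &#x2212; &#x3b2;). The sketch below uses hypothetical values for the learning rate, momentum coefficient, and gradient.</p>
<code language="python"><![CDATA[
```python
def sgdm_step(theta, grad, v, alpha=0.1, beta=0.9):
    """One SGDM update of a scalar parameter."""
    v = beta * v + alpha * grad      # Equation 13
    theta = theta - v                # Equation 14
    return theta, v


theta, v = 0.0, 0.0
velocities = []
for _ in range(3):
    theta, v = sgdm_step(theta, grad=1.0, v=v)   # constant gradient g = 1
    velocities.append(v)
```
]]></code>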
</sec>
<sec id="s3-4-5">
<label>3.4.5</label>
<title>ADADELTA</title>
<p>ADADELTA dynamically adapts the learning rates based on past gradients without the need for manual tuning. It utilizes moving averages of both squared gradients and parameter updates to scale the gradients effectively, as shown in <xref ref-type="disp-formula" rid="e15">Equations 15</xref>&#x2013;<xref ref-type="disp-formula" rid="e17">17</xref> (<xref ref-type="bibr" rid="B39">Zeiler, 2012</xref>).<disp-formula id="e15">
<mml:math id="m35">
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xb7;</mml:mo>
<mml:msubsup>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
<label>(15)</label>
</disp-formula>
<disp-formula id="e16">
<mml:math id="m36">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(16)</label>
</disp-formula>
<disp-formula id="e17">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(17)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf21">
<mml:math id="m38">
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the exponentially decaying average of squared gradients,</p>
<p>
<inline-formula id="inf22">
<mml:math id="m39">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the decay rate,</p>
<p>
<inline-formula id="inf23">
<mml:math id="m40">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the parameter update.</p>
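<p>A single ADADELTA update can be sketched as below. The sketch follows the formulation of Zeiler (2012), in which the numerator of Equation 16 is computed from an exponentially decaying average of squared parameter updates; that running average is maintained explicitly here as <monospace>avg_sq_update</monospace>. The function name and default values are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def adadelta_step(theta, grad, avg_sq_grad, avg_sq_update, rho=0.95, eps=1e-6):
    """One ADADELTA update of a scalar parameter; no learning rate is needed."""
    # Equation 15: decaying average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Equation 16: ratio of the RMS of past updates to the RMS of gradients.
    update = -math.sqrt(avg_sq_update + eps) / math.sqrt(avg_sq_grad + eps) * grad
    # Decaying average of squared updates, used by the next step.
    avg_sq_update = rho * avg_sq_update + (1.0 - rho) * update ** 2
    # Equation 17: apply the update.
    theta = theta + update
    return theta, avg_sq_grad, avg_sq_update
```
]]></code>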
<p>All three models were trained with each of the five optimizers under identical conditions: the same number of epochs, batch size, and data augmentation techniques. Each model was trained for 30 epochs with a batch size of 20. The choice of 30 epochs was based on preliminary experiments, in which all three models consistently converged within this range without signs of overfitting. The number of epochs and the batch size were not tuned further; keeping them fixed ensured a fair and systematic comparison across models and optimizers. Callbacks were used to adjust the learning rate dynamically during training based on the validation loss, allowing the model to adapt as it learned. Although the dataset (130 patients) is relatively small, it was chosen for its availability in the DMR-IR database and for the variability it provides across healthy, benign, and malignant cases. This limitation is acknowledged, but data augmentation and k-fold cross-validation helped to mitigate its impact. The start and end times of training were recorded, and the total training time was computed to assess the computational efficiency of the training process. After training, the model was evaluated on the entire dataset, and the final loss and accuracy were recorded. The results are presented in <xref ref-type="sec" rid="s5-1">Section 5.1</xref>.</p>
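<p>The text does not name the callback used to adjust the learning rate. One common choice is a reduce-on-plateau rule, which lowers the learning rate when the validation loss stops improving; Keras provides this behavior in its <monospace>ReduceLROnPlateau</monospace> callback. The framework-free sketch below illustrates the logic only, and its function name, parameters, and values are hypothetical.</p>
<code language="python"><![CDATA[
```python
def reduce_on_plateau(val_losses, initial_lr=1e-3, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate in effect for each epoch, given the validation losses."""
    lr, best, wait, schedule = initial_lr, float("inf"), 0, []
    for loss in val_losses:
        schedule.append(lr)               # rate used during this epoch
        if loss < best:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:          # no improvement for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule


schedule = reduce_on_plateau([0.9, 0.7, 0.71, 0.72, 0.6, 0.61])
```
]]></code>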
</sec>
</sec>
<sec id="s3-5">
<label>3.5</label>
<title>Quantitative analysis</title>
<p>Quantitative analysis was conducted among the three deep learning models: U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. The flowchart in <xref ref-type="fig" rid="F7">Figure 7</xref> outlines the systematic process of conducting k-fold cross-validation analysis for evaluating the three pre-trained segmentation models.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Flowchart of k-fold cross-validation analysis for model&#x2019;s quantitative evaluation.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g007.tif">
<alt-text content-type="machine-generated">Flowchart depicting a machine learning evaluation process starts with loading a pre-trained model and test images with masks. It sets K-Fold cross-validation, splits data, and iterates through each fold. For each validation image, it loads images and masks, predicts masks using the model, and calculates evaluation metrics. Results are visualized, followed by aggregating metrics, computing the mean and standard deviation of metrics, and concludes with an &#x22;End&#x22; step.</alt-text>
</graphic>
</fig>
<p>The evaluation begins by loading the pre-trained model. Test images and their corresponding masks are then loaded, and the dataset is divided into subsets for k-fold cross-validation. Within each fold, the data is further split into training and validation sets. Each validation image is resized to match the model&#x2019;s input shape and then preprocessed. The model predicts a mask for each image, and the prediction is converted into binary format. The evaluation metrics are then calculated, and the original image, true mask, and predicted mask are visualized for inspection. After all validation images in a fold have been evaluated, the metrics are aggregated. Finally, the mean and standard deviation of each metric are computed across all folds, providing an overview of the model&#x2019;s overall performance and its consistency across different subsets of the dataset. The results are presented in <xref ref-type="sec" rid="s5-2">Section 5.2</xref>.</p>
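<p>The fold-wise evaluation loop can be sketched as follows. This is a simplified illustration, not the evaluation script used in the study: the function names, the random seed, the sample count, and the toy scoring function (which stands in for predicting a mask, binarizing it, and computing a metric) are all hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, k, score_fn):
    """Score every validation fold, then report the mean and standard deviation."""
    fold_scores = []
    for fold in k_fold_indices(n_samples, k):
        # score_fn is a placeholder for "predict mask, binarize, compute metric".
        per_image = [score_fn(i) for i in fold]
        fold_scores.append(statistics.mean(per_image))
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Toy scorer standing in for a real per-image IoU computation.
mean_iou, std_iou = cross_validate(
    n_samples=40, k=10, score_fn=lambda i: 0.90 + 0.001 * (i % 5)
)
```
]]></code>
<p>The standard deviation is taken over the per-fold means, which is what makes it a measure of consistency across subsets rather than across individual images.</p>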
<sec id="s3-5-1">
<label>3.5.1</label>
<title>Evaluation metrics</title>
<p>The following metrics are considered to evaluate the segmentation accuracy of the models, where:</p>
<p>TP (True Positives) are the pixels that are correctly classified as positive, </p>
<p>FP (False Positives) are the pixels that are incorrectly classified as positive, </p>
<p>TN (True Negatives) are the pixels that are correctly classified as negative, </p>
<p>FN (False Negatives) are the pixels that are incorrectly classified as negative.</p>
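<p>As a minimal sketch of these definitions, the four counts can be obtained by comparing a predicted binary mask with the true mask pixel by pixel. The encoding (1 for the positive class, 0 for the negative class) and the function name are assumptions made for this example.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN over two flattened binary masks (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# A 2x4 toy mask pair, flattened row by row.
true_mask = [1, 1, 0, 0, 1, 0, 0, 0]
pred_mask = [1, 0, 0, 1, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(true_mask, pred_mask)
```
]]></code>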
</sec>
<sec id="s3-5-2">
<label>3.5.2</label>
<title>Intersection over union (IoU)</title>
<p>This metric given in <xref ref-type="disp-formula" rid="e18">Equation 18</xref> evaluates the overlap between the predicted and true masks. A higher IoU indicates better segmentation accuracy.<disp-formula id="e18">
<mml:math id="m41">
<mml:mrow>
<mml:mtext>IoU</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mtext>TP</mml:mtext>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(18)</label>
</disp-formula>
</p>
<sec id="s3-5-2-1">
<label>3.5.2.1</label>
<title>Dice coefficient</title>
<p>The Dice coefficient given in <xref ref-type="disp-formula" rid="e19">Equation 19</xref> is another measure of the overlap between two binary images. It equals the harmonic mean of precision and recall, and therefore reflects both the model&#x2019;s precision and its sensitivity.<disp-formula id="e19">
<mml:math id="m42">
<mml:mrow>
<mml:mtext>Dice</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mtext>TP</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(19)</label>
</disp-formula>
</p>
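<p>A short sketch of Equations 18 and 19 computed from pixel counts is given here. The counts are hypothetical, and returning 1.0 when both masks are empty is a convention assumed for this example only.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def iou(tp, fp, fn):
    """Equation 18: intersection over union from pixel counts."""
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # both masks empty: treated as a perfect match


def dice(tp, fp, fn):
    """Equation 19: Dice coefficient from pixel counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical counts for one image.
tp, fp, fn = 80, 10, 10
iou_score = iou(tp, fp, fn)      # 80 / 100
dice_score = dice(tp, fp, fn)    # 160 / 180
```
]]></code>
<p>The two metrics are monotonically related by Dice = 2 IoU / (1 + IoU), so they rank segmentations of a single image identically, with Dice never smaller than IoU.</p>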
</sec>
<sec id="s3-5-2-2">
<label>3.5.2.2</label>
<title>Precision and recall</title>
<p>Precision, given by <xref ref-type="disp-formula" rid="e20">Equation 20</xref>, quantifies the proportion of positive predictions that are correct, while recall, given by <xref ref-type="disp-formula" rid="e21">Equation 21</xref>, measures the model&#x2019;s ability to identify all positive instances.<disp-formula id="e20">
<mml:math id="m43">
<mml:mrow>
<mml:mtext>Precision</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mtext>TP</mml:mtext>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FP</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(20)</label>
</disp-formula>
<disp-formula id="e21">
<mml:math id="m44">
<mml:mrow>
<mml:mtext>Recall</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mtext>TP</mml:mtext>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(21)</label>
</disp-formula>
</p>
</sec>
<sec id="s3-5-2-3">
<label>3.5.2.3</label>
<title>Sensitivity and specificity</title>
<p>Sensitivity in <xref ref-type="disp-formula" rid="e22">Equation 22</xref> gauges the model&#x2019;s ability to correctly identify positive instances, whereas specificity in <xref ref-type="disp-formula" rid="e23">Equation 23</xref> evaluates the model&#x2019;s performance in correctly identifying negative instances.<disp-formula id="e22">
<mml:math id="m45">
<mml:mrow>
<mml:mtext>Sensitivity</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mtext>TP</mml:mtext>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(22)</label>
</disp-formula>
<disp-formula id="e23">
<mml:math id="m46">
<mml:mrow>
<mml:mtext>Specificity</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mtext>TN</mml:mtext>
<mml:mrow>
<mml:mtext>TN</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FP</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(23)</label>
</disp-formula>
</p>
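<p>Equations 20 to 23 can be sketched directly from the pixel counts. The counts and the zero-denominator convention in <monospace>safe_div</monospace> are assumptions for this illustration. Note that sensitivity (Equation 22) is the same ratio as recall (Equation 21), so the two values always coincide.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (a convention assumed here)."""
    return num / den if den else 0.0


def precision(tp, fp):
    return safe_div(tp, tp + fp)      # Equation 20


def recall(tp, fn):
    return safe_div(tp, tp + fn)      # Equation 21


def sensitivity(tp, fn):
    return recall(tp, fn)             # Equation 22 is the same ratio as Equation 21


def specificity(tn, fp):
    return safe_div(tn, tn + fp)      # Equation 23


# Hypothetical counts for one image.
tp, fp, tn, fn = 80, 10, 900, 10
scores = {
    "precision": precision(tp, fp),
    "recall": recall(tp, fn),
    "sensitivity": sensitivity(tp, fn),
    "specificity": specificity(tn, fp),
}
```
]]></code>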
</sec>
<sec id="s3-5-2-4">
<label>3.5.2.4</label>
<title>Pixel accuracy</title>
<p>This metric in <xref ref-type="disp-formula" rid="e24">Equation 24</xref> determines the percentage of pixels that are correctly classified, offering a straightforward measure of the model&#x2019;s accuracy at the pixel level.<disp-formula id="e24">
<mml:math id="m47">
<mml:mrow>
<mml:mtext>Pixel&#x2009;Accuracy</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>TN</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FP</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(24)</label>
</disp-formula>
</p>
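<p>The following sketch evaluates Equation 24 on two hypothetical predictions for the same image. It illustrates a general property of pixel accuracy: when the foreground is small, a prediction that misses it entirely can still score highly, which is one reason to read this metric alongside IoU and Dice.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def pixel_accuracy(tp, tn, fp, fn):
    """Equation 24: fraction of all pixels that are classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical 100-pixel image in which only 10 pixels are foreground.
# Predicting "background" everywhere gives TP = 0, FN = 10, TN = 90, FP = 0.
all_background = pixel_accuracy(tp=0, tn=90, fp=0, fn=10)
# A prediction that finds 8 of the 10 foreground pixels with 2 false alarms.
reasonable = pixel_accuracy(tp=8, tn=88, fp=2, fn=2)
```
]]></code>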
</sec>
<sec id="s3-5-2-5">
<label>3.5.2.5</label>
<title>ROC-AUC</title>
<p>The Receiver Operating Characteristic Area Under the Curve provides a measure of the model&#x2019;s ability to distinguish between the classes, with a value closer to 1 indicating superior performance.</p>
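<p>One way to compute this value, shown here as a sketch rather than as the routine used in the study, is the pairwise-ranking form of ROC-AUC: the probability that a randomly chosen positive pixel receives a higher predicted score than a randomly chosen negative pixel. The function name and the toy data are hypothetical, and the quadratic loop is meant for clarity, not for full-resolution masks.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a positive pixel outranks a negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative pixel")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities for four pixels.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, probabilities)
```
]]></code>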
</sec>
<sec id="s3-5-2-6">
<label>3.5.2.6</label>
<title>PR-AUC</title>
<p>The Precision-Recall Area Under the Curve evaluates the model&#x2019;s precision-recall trade-off, especially useful when classes are imbalanced.</p>
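<p>A common estimator of this area is average precision, which sums the precision at each rank where a positive pixel is recovered, weighted by the resulting increase in recall. The sketch below assumes that estimator; the text does not state whether the study used it or a trapezoidal integration, and the two can give slightly different values. Tied scores are processed in sort order rather than grouped.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive pixel is required")
    # Visit pixels from the highest to the lowest predicted probability.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    tp = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            # Each newly recovered positive raises recall by 1 / n_pos.
            area += (tp / rank) / n_pos
    return area


# Toy labels and predicted probabilities for four pixels.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
ap = average_precision(labels, probabilities)
```
]]></code>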
</sec>
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Qualitative analysis</title>
<p>A qualitative analysis of the segmentation results generated by the three pre-trained segmentation models is performed through the utilization of Grad-CAM (Gradient-weighted Class Activation Mapping) heatmaps. The process of Grad-CAM heatmap visualization is outlined in the flowchart of <xref ref-type="fig" rid="F8">Figure 8</xref>.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Flowchart of Grad-CAM heatmap visualization for model&#x2019;s qualitative evaluation.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g008.tif">
<alt-text content-type="machine-generated">Flowchart illustrating a process starting with loading a pre-trained model, followed by loading test images and masks. It then sets K-fold cross-validation, splitting data for each fold. For each validation image, the process involves loading images and masks, predicting masks, calculating evaluation metrics, and visualizing results. Metrics are aggregated, with mean and standard deviation computed, concluding the process.</alt-text>
</graphic>
</fig>
<p>Grad-CAM heatmaps serve as a crucial tool for understanding the decision-making process of deep learning models, particularly in the context of image segmentation. The Grad-CAM heatmap visualization begins with the loading of the pre-trained segmentation model. Grayscale test images, representative of thermal data, are then loaded into the system. Each test image undergoes resizing to align with the pre-trained segmentation model&#x2019;s input dimensions, accompanied by pre-processing steps to ensure compatibility with the model&#x2019;s expectations. Subsequently, the pre-trained model processes the loaded test images, generating predictions while employing Grad-CAM to visualize regions of interest significantly contributing to the model&#x2019;s decision.</p>
<p>The Grad-CAM heatmap computation involves leveraging gradients of the target class, specifically features indicative of breast tissue, with respect to the model&#x2019;s final convolutional layer. These gradients are globally average-pooled to derive importance weights for each feature map. The identification of regions of interest is then accomplished by using these weights to highlight areas crucial for the model&#x2019;s decision-making. The ensuing step involves overlaying the generated Grad-CAM heatmaps onto the original grayscale images, visually elucidating the correspondence between highlighted regions and actual features in the thermal images. This overlay process is systematically repeated for all test images, facilitating a comprehensive qualitative analysis of the model&#x2019;s predictions and the corresponding regions of interest.</p>
<p>Grad-CAM involves computations that are succinctly expressed through the following formulas (<xref ref-type="bibr" rid="B29">Selvaraju et al., 2016</xref>).</p>
<sec id="s4-1">
<label>4.1</label>
<title>Gradient-weighted global average pooling</title>
<p>Grad-CAM calculates the importance weights by performing global average pooling on the gradients of the target class score with respect to the feature maps. This is mathematically represented by <xref ref-type="disp-formula" rid="e25">Equation 25</xref> as:<disp-formula id="e25">
<mml:math id="m48">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:msup>
<mml:mi>Y</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(25)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf24">
<mml:math id="m49">
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="normal">&#x3b1;</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the importance weight of the <inline-formula id="inf25">
<mml:math id="m50">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th feature map for class <inline-formula id="inf26">
<mml:math id="m51">
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, </p>
<p>
<inline-formula id="inf27">
<mml:math id="m52">
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the normalization factor (the number of spatial positions in a feature map), </p>
<p>
<inline-formula id="inf28">
<mml:math id="m53">
<mml:mrow>
<mml:msup>
<mml:mi>Y</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the final prediction score for class <inline-formula id="inf29">
<mml:math id="m54">
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, </p>
<p>
<inline-formula id="inf30">
<mml:math id="m55">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the activation of the <inline-formula id="inf31">
<mml:math id="m56">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th feature map at position (<inline-formula id="inf32">
<mml:math id="m57">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>).</p>
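<p>Equation 25 amounts to averaging each gradient map over its spatial positions. The sketch below uses plain nested lists and hypothetical gradient values; in practice the gradients would come from automatic differentiation in a deep learning framework, which is outside the scope of this illustration.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def importance_weights(gradients):
    """Equation 25: average each gradient map over all spatial positions.

    gradients[k][i][j] holds dY^c / dA^k_ij for feature map k.
    """
    weights = []
    for grad_map in gradients:
        z = sum(len(row) for row in grad_map)        # Z: number of positions
        total = sum(sum(row) for row in grad_map)
        weights.append(total / z)
    return weights


# Two hypothetical 2x2 gradient maps.
grads = [
    [[0.2, 0.4], [0.6, 0.8]],
    [[-1.0, 0.0], [0.0, 1.0]],
]
alphas = importance_weights(grads)
```
]]></code>
<p>The second map shows why the sign matters: its positive and negative gradients cancel, so the map receives a weight of zero and does not contribute to the heatmap.</p>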
</sec>
<sec id="s4-2">
<label>4.2</label>
<title>Weighted sum of feature maps</title>
<p>The weighted sum of feature maps is computed using <xref ref-type="disp-formula" rid="e26">Equation 26</xref> to obtain the heatmap, denoted as <inline-formula id="inf33">
<mml:math id="m58">
<mml:mrow>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mtext>Grad</mml:mtext>
<mml:mo>&#x2010;</mml:mo>
<mml:mtext>CAM</mml:mtext>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>:<disp-formula id="e26">
<mml:math id="m59">
<mml:mrow>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mtext>Grad</mml:mtext>
<mml:mo>&#x2010;</mml:mo>
<mml:mtext>CAM</mml:mtext>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>ReLU</mml:mtext>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(26)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>
<inline-formula id="inf34">
<mml:math id="m60">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> represents the <inline-formula id="inf35">
<mml:math id="m61">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th feature map.</p>
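<p>A minimal sketch of Equation 26 follows, again with nested lists and hypothetical weights and feature maps. The ReLU discards positions whose weighted activation is negative, that is, positions that would lower the class score.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def grad_cam_heatmap(weights, feature_maps):
    """Equation 26: ReLU of the weighted sum of feature maps."""
    rows = len(feature_maps[0])
    cols = len(feature_maps[0][0])
    heatmap = [[0.0] * cols for _ in range(rows)]
    for alpha, fmap in zip(weights, feature_maps):
        for i in range(rows):
            for j in range(cols):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only positions that push the class score up.
    return [[max(0.0, v) for v in row] for row in heatmap]


weights = [1.0, -2.0]                       # hypothetical alpha values
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 0.5], [2.0, 1.0]],
]
heatmap = grad_cam_heatmap(weights, feature_maps)
```
]]></code>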
</sec>
<sec id="s4-3">
<label>4.3</label>
<title>Overlaying heatmap onto original image</title>
<p>The overlay operation in <xref ref-type="disp-formula" rid="e27">Equation 27</xref> involves combining the Grad-CAM heatmap <inline-formula id="inf36">
<mml:math id="m62">
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mtext>Grad</mml:mtext>
<mml:mo>&#x2010;</mml:mo>
<mml:mtext>CAM</mml:mtext>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula> with the original image <inline-formula id="inf37">
<mml:math id="m63">
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula>:<disp-formula id="e27">
<mml:math display="block" id="m64">
<mml:mrow>
<mml:mrow>
<mml:mtext>Resultant</mml:mtext>
<mml:mspace width=".2em"/>
<mml:mtext>Image</mml:mtext>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mtext>Heatmap</mml:mtext>
<mml:mspace width=".2em"/>
<mml:mtext>Weight</mml:mtext>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mtext>Grad</mml:mtext>
<mml:mo>&#x2010;</mml:mo>
<mml:mtext>CAM</mml:mtext>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mtext>Heatmap</mml:mtext>
<mml:mspace width=".2em"/>
<mml:mtext>Weight</mml:mtext>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
<label>(27)</label>
</disp-formula>
</p>
<p>Where:</p>
<p>The <inline-formula id="inf38">
<mml:math id="m65">
<mml:mrow>
<mml:mtext>Heatmap&#x2009;Weight</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> determines the intensity of the heatmap overlay.</p>
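<p>The blend in Equation 27 can be sketched as a per-pixel weighted average. Several details here are assumptions: the heatmap is taken to be already resized to the image resolution, it is normalized to the range 0 to 1 before blending, both inputs are single-channel, and the default weight of 0.4 is arbitrary. A colormap would normally be applied to the heatmap before blending.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def normalize(heatmap):
    """Scale a non-negative heatmap to [0, 1] so it is comparable with a [0, 1] image."""
    peak = max(max(row) for row in heatmap)
    if peak == 0:
        return [[0.0 for _ in row] for row in heatmap]
    return [[v / peak for v in row] for row in heatmap]


def overlay(heatmap, image, heatmap_weight=0.4):
    """Equation 27: blend heatmap and image with a single weight in [0, 1]."""
    if not 0.0 <= heatmap_weight <= 1.0:
        raise ValueError("heatmap_weight must lie in [0, 1]")
    return [
        [heatmap_weight * h + (1.0 - heatmap_weight) * p for h, p in zip(h_row, p_row)]
        for h_row, p_row in zip(heatmap, image)
    ]


heat = normalize([[0.0, 1.0], [0.0, 2.0]])
gray = [[0.2, 0.2], [0.8, 0.8]]            # hypothetical grayscale intensities in [0, 1]
blended = overlay(heat, gray, heatmap_weight=0.5)
```
]]></code>
<p>A weight of 0 returns the original image and a weight of 1 returns the heatmap alone; intermediate values trade anatomical context against heatmap visibility.</p>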
<p>The Grad-CAM heatmap visualization offers valuable insights into the interpretability of the model. It aids in understanding which regions of the input images are pivotal for the model&#x2019;s predictions, thereby contributing to the overall assessment of the model&#x2019;s performance in thermography-based breast region segmentation. The results of the qualitative analysis are presented in <xref ref-type="sec" rid="s5-3">Section 5.3</xref>.</p>
<sec id="s4-3-1">
<label>4.3.1</label>
<title>Evaluation criteria</title>
<p>The segmentation outputs generated by each model are visually inspected from the Grad-CAM heatmaps to qualitatively assess the performance. The following criteria are analyzed:</p>
<sec id="s4-3-1-1">
<label>4.3.1.1</label>
<title>Breast region overlap</title>
<p>The extent to which the Grad-CAM heatmap aligns with the actual breast region in the thermal images is examined. The following scoring system is employed:<list list-type="simple">
<list-item>
<p>5 (Excellent): The Grad-CAM heatmap effectively highlights the breast region, aligning precisely with the breast boundaries.</p>
</list-item>
<list-item>
<p>4 (Good): The heatmap predominantly covers the breast area with minor inconsistencies in the highlighting.</p>
</list-item>
<list-item>
<p>3 (Moderate): The heatmap shows activations over parts of the breast, but with gaps or inaccuracies.</p>
</list-item>
<list-item>
<p>2 (Poor): Activations in the heatmap are sparse over the breast region, lacking coverage and accuracy.</p>
</list-item>
<list-item>
<p>1 (Very Poor): The heatmap does not effectively highlight the breast region, lacking clear correlation with the actual boundaries.</p>
</list-item>
</list>
</p>
</sec>
<sec id="s4-3-1-2">
<label>4.3.1.2</label>
<title>Noise handling</title>
<p>The presence of noise or random activations in non-relevant areas of the Grad-CAM heatmap is observed:<list list-type="simple">
<list-item>
<p>5 (Excellent): There is minimal to no noise, with activations concentrated on the breast area.</p>
</list-item>
<list-item>
<p>4 (Good): There are a few minor instances of noise, limited and not significantly affecting the heatmap quality.</p>
</list-item>
<list-item>
<p>3 (Moderate): Some noise is present in non-relevant areas but does not obscure the breast region entirely.</p>
</list-item>
<list-item>
<p>2 (Poor): Noticeable noise patterns interfere with the clear depiction of the breast region.</p>
</list-item>
<list-item>
<p>1 (Very Poor): The heatmap is predominantly noisy with little meaningful activation in the breast region, making accurate identification impossible.</p>
</list-item>
</list>
</p>
<p>The generated heatmaps use the &#x2018;jet&#x2019; colormap, in which cool colors represent low activations and warm colors represent high activations. These colors are interpreted in terms of the model&#x2019;s confidence, with warmer colors indicating higher confidence in the presence of breast tissue. The following aspects are considered:<list list-type="simple">
<list-item>
<p>1. Cool Colors (Blue/Green): Regions in the heatmap represented by cooler colors indicate low activations. These areas might correspond to regions where the model is less certain about the presence of breast tissue. The alignment of these low activation areas with non-breast regions or ambiguous features is examined.</p>
</list-item>
<list-item>
<p>2. Warm Colors (Yellow/Red): Areas in the heatmap represented by warmer colors indicate high activations. These regions correspond to the areas where the model is most confident about the presence of breast tissue. The accuracy of these high activation areas in capturing the actual breast tissue is assessed.</p>
</list-item>
<list-item>
<p>3. Transition Zones (Green to Yellow to Red): Transitional areas between cool and warm colors are analyzed. Smooth transitions from cool to warm colors along the boundaries of the breast tissue indicate gradual changes in activation levels, demonstrating accurate localization and segmentation.</p>
</list-item>
</list>
</p>
</sec>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s5">
<label>5</label>
<title>Results</title>
<p>As stated, the annotations were carried out solely for experimental purposes and not for clinical application, since they were not performed by certified technicians. To mitigate potential bias, we followed standardized guidelines and performed cross-verification among the authors to ensure consistency and accuracy of the masks. We believe that this limitation does not compromise the reliability of the reported findings. Nevertheless, we fully agree that the inclusion of annotations from certified medical experts would add another layer of validation, and we consider this an important direction for future work.</p>
<sec id="s5-1">
<label>5.1</label>
<title>Model training and evaluation results</title>
<p>To determine the most suitable optimizer for training the deep learning models, a comparative evaluation was conducted using five optimizers: ADAM, NADAM, RMSPROP, SGDM, and ADADELTA. The evaluation focused on three key metrics: final loss, final accuracy, and training time. These metrics provide insight into the efficacy and efficiency of each optimizer in training the segmentation models. The final loss measures how well the model fits the training data, the final accuracy indicates the proportion of training data correctly classified by the model, and the training time reflects the optimizer&#x2019;s computational efficiency. The results of this comparative evaluation are presented in <xref ref-type="table" rid="T1">Tables 1</xref>&#x2013;<xref ref-type="table" rid="T3">3</xref> and are also shown graphically in <xref ref-type="fig" rid="F9">Figures 9</xref>&#x2013;<xref ref-type="fig" rid="F11">11</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Final loss of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Optimizer</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">ADAM</td>
<td align="center">0.0357</td>
<td align="center">0.0437</td>
<td align="center">0.0381</td>
</tr>
<tr>
<td align="center">NADAM</td>
<td align="center">0.0514</td>
<td align="center">0.0502</td>
<td align="center">0.0584</td>
</tr>
<tr>
<td align="center">RMSPROP</td>
<td align="center">0.0416</td>
<td align="center">0.0442</td>
<td align="center">0.0424</td>
</tr>
<tr>
<td align="center">SGDM</td>
<td align="center">0.2041</td>
<td align="center">0.2860</td>
<td align="center">0.2800</td>
</tr>
<tr>
<td align="center">ADADELTA</td>
<td align="center">0.6732</td>
<td align="center">0.6806</td>
<td align="center">0.6777</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Final accuracy of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Optimizer</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">ADAM</td>
<td align="center">0.9637</td>
<td align="center">0.9613</td>
<td align="center">0.9631</td>
</tr>
<tr>
<td align="center">NADAM</td>
<td align="center">0.9584</td>
<td align="center">0.9590</td>
<td align="center">0.9561</td>
</tr>
<tr>
<td align="center">RMSPROP</td>
<td align="center">0.9617</td>
<td align="center">0.9614</td>
<td align="center">0.9622</td>
</tr>
<tr>
<td align="center">SGDM</td>
<td align="center">0.9030</td>
<td align="center">0.8679</td>
<td align="center">0.8702</td>
</tr>
<tr>
<td align="center">ADADELTA</td>
<td align="center">0.6598</td>
<td align="center">0.6675</td>
<td align="center">0.6605</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Training time of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Optimizer</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">ADAM</td>
<td align="center">663.81 s</td>
<td align="center">687.92 s</td>
<td align="center">1,036.33 s</td>
</tr>
<tr>
<td align="center">NADAM</td>
<td align="center">709.93 s</td>
<td align="center">711.29 s</td>
<td align="center">1,059.80 s</td>
</tr>
<tr>
<td align="center">RMSPROP</td>
<td align="center">677.83 s</td>
<td align="center">743.26 s</td>
<td align="center">1,054.91 s</td>
</tr>
<tr>
<td align="center">SGDM</td>
<td align="center">691.45 s</td>
<td align="center">702.98 s</td>
<td align="center">1,068.53 s</td>
</tr>
<tr>
<td align="center">ADADELTA</td>
<td align="center">679.32 s</td>
<td align="center">704.77 s</td>
<td align="center">1,082.66 s</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Graph of Final Loss of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; Models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g009.tif">
<alt-text content-type="machine-generated">Bar chart depicting final loss values for different optimizers. ADAM, NADAM, RMSPROP, SGDM, and ADADELTA are compared using U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. ADAM shows the lowest losses, while ADADELTA has the highest across all models.</alt-text>
</graphic>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Graph of Final Accuracy of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; Models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g010.tif">
<alt-text content-type="machine-generated">Bar chart comparing final accuracy of three models: U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; using different optimizers. ADAM: U-Net 0.9637, U-Net with Spatial Attention 0.9613, U-Net&#x2b;&#x2b; 0.9631. NADAM: 0.9584, 0.959, 0.9561. RMSPROP: 0.9617, 0.9614, 0.9622. SGDM: 0.903, 0.8679, 0.8702. ADADELTA: 0.6598, 0.6675, 0.6605.</alt-text>
</graphic>
</fig>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Graph of Training time of different optimizers across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; Models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g011.tif">
<alt-text content-type="machine-generated">Bar chart comparing training times of U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; across different optimizers: ADAM, NADAM, RMSPROP, SGDM, and ADADELTA. U-Net consistently shows the lowest training time, followed by U-Net with Spatial Attention, while U-Net&#x2b;&#x2b; has the highest training time across all optimizers.</alt-text>
</graphic>
</fig>
<p>The results show marked differences in the efficacy of the optimization algorithms across the three models, U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. ADAM is the strongest optimizer overall: it yields the lowest final loss for all three models and the highest accuracy for U-Net and U-Net&#x2b;&#x2b;. The U-Net model trained with ADAM achieved the best result in the comparison, with a final loss of 0.0357 and an accuracy of 0.9637. RMSPROP and NADAM perform comparably to ADAM, with slightly higher loss values and similar or slightly lower accuracy; for U-Net with Spatial Attention, RMSPROP (0.9614) is marginally ahead of ADAM (0.9613) in accuracy. In contrast, SGDM produces notably higher loss values (0.2041&#x2013;0.2860) and lower accuracy scores (0.8679&#x2013;0.9030) across all three models, suggesting that it is not well suited to these models under the present training settings. ADADELTA performs worst among the five optimizers, with loss values above 0.67 and accuracy scores below 0.67, and without any consistent training-time advantage. Regarding training time, U-Net and U-Net with Spatial Attention are consistently faster than U-Net&#x2b;&#x2b; regardless of the optimizer (approximately 660&#x2013;745 s versus 1,035&#x2013;1,085 s). The longer training duration of U-Net&#x2b;&#x2b; can be attributed to its more intricate architectural design.</p>
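<p>The ranking by final loss can be reproduced from the values in <xref ref-type="table" rid="T1">Table 1</xref> with a few lines of code. The sketch below only re-reads the tabulated numbers; averaging the loss over the three models is a summary chosen for this illustration and is not a statistic reported in the study.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# Final loss values copied from Table 1
# (columns: U-Net, U-Net with Spatial Attention, U-Net++).
final_loss = {
    "ADAM": (0.0357, 0.0437, 0.0381),
    "NADAM": (0.0514, 0.0502, 0.0584),
    "RMSPROP": (0.0416, 0.0442, 0.0424),
    "SGDM": (0.2041, 0.2860, 0.2800),
    "ADADELTA": (0.6732, 0.6806, 0.6777),
}

models = ("U-Net", "U-Net with Spatial Attention", "U-Net++")

# Optimizer with the lowest final loss for each model.
best_per_model = {
    model: min(final_loss, key=lambda opt: final_loss[opt][col])
    for col, model in enumerate(models)
}

# Optimizers ordered by their mean final loss across the three models.
mean_loss = {opt: sum(vals) / len(vals) for opt, vals in final_loss.items()}
ranking = sorted(mean_loss, key=mean_loss.get)
```
]]></code>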
<p>Using 20%&#x2013;30% of the data for validation is a common practice in deep learning and medical image classification studies, as it provides a balance between training and validation sizes (<xref ref-type="bibr" rid="B34">Szegedy et al., 2016</xref>). While this proportion allows for an initial assessment of model performance, we acknowledge that the dataset size remains limited for testing the model&#x2019;s generalizability to new or unseen data. Regarding the fixed 30 training epochs, this number was chosen to maintain consistency across all models and optimizers, with careful monitoring of training and validation loss to ensure convergence and prevent overfitting. Through our experiments, these settings proved suitable for all models to achieve optimal performance within the current dataset scope (<xref ref-type="bibr" rid="B9">Ding et al., 2022</xref>).</p>
<p>Given its superior performance in this study, the ADAM optimizer was used to train the three segmentation models. The training process is visualized through loss and accuracy curves over the number of epochs. <xref ref-type="fig" rid="F12">Figure 12</xref> depicts the convergence of each model throughout training. The training loss measures how well the model fits the data, whereas accuracy reflects how often the model&#x2019;s predictions agree with the ground truth. Over the course of 30 epochs, all models showed a progressive decline in loss and a steady improvement in accuracy under the ADAM optimizer.</p>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>
<bold>(a)</bold> Loss and <bold>(b)</bold> Accuracy of U-Net, <bold>(c)</bold> Loss and <bold>(d)</bold> Accuracy of U-Net with Spatial Attention, and <bold>(e)</bold> Loss and <bold>(f)</bold> Accuracy of U-Net&#x2b;&#x2b; over number of epochs using ADAM optimizer.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g012.tif">
<alt-text content-type="machine-generated">Six graphs displaying loss and accuracy metrics over 30 epochs for a machine learning model. Graphs (a), (c), and (e) show loss and validation loss decreasing, while (b), (d), and (f) show accuracy and validation accuracy increasing, with separate lines for training and validation.</alt-text>
</graphic>
</fig>
</sec>
<sec id="s5-2">
<label>5.2</label>
<title>Quantitative analysis results</title>
<p>Ten-fold cross-validation was conducted to evaluate the three pre-trained segmentation models using 30% of the entire dataset. The metrics outlined in Section 4.5 were computed for each fold. The evaluation results for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; are presented in <xref ref-type="table" rid="T4">Tables 4</xref>&#x2013;<xref ref-type="table" rid="T6">6</xref>, respectively. Each table provides a detailed breakdown of metrics such as Intersection over Union (IoU), Dice coefficient, precision, recall, sensitivity, specificity, pixel accuracy, ROC-AUC, and PR-AUC for every fold (k &#x3d; 1&#x2013;10).</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Evaluation metrics for the breast region segmentation folds using the U-Net model.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">k &#x3d; 1</th>
<th align="center">k &#x3d; 2</th>
<th align="center">k &#x3d; 3</th>
<th align="center">k &#x3d; 4</th>
<th align="center">k &#x3d; 5</th>
<th align="center">k &#x3d; 6</th>
<th align="center">k &#x3d; 7</th>
<th align="center">k &#x3d; 8</th>
<th align="center">k &#x3d; 9</th>
<th align="center">k &#x3d; 10</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">IoU</td>
<td align="center">0.9283</td>
<td align="center">0.9393</td>
<td align="center">0.9421</td>
<td align="center">0.9084</td>
<td align="center">0.9135</td>
<td align="center">0.9374</td>
<td align="center">0.9396</td>
<td align="center">0.9282</td>
<td align="center">0.9464</td>
<td align="center">0.9087</td>
</tr>
<tr>
<td align="center">Dice Coefficient</td>
<td align="center">0.9627</td>
<td align="center">0.9687</td>
<td align="center">0.9702</td>
<td align="center">0.9513</td>
<td align="center">0.9544</td>
<td align="center">0.9676</td>
<td align="center">0.9688</td>
<td align="center">0.9627</td>
<td align="center">0.9725</td>
<td align="center">0.9514</td>
</tr>
<tr>
<td align="center">Precision</td>
<td align="center">0.9545</td>
<td align="center">0.9862</td>
<td align="center">0.9875</td>
<td align="center">0.8681</td>
<td align="center">0.9889</td>
<td align="center">0.9867</td>
<td align="center">0.9914</td>
<td align="center">0.9903</td>
<td align="center">0.9753</td>
<td align="center">0.9922</td>
</tr>
<tr>
<td align="center">Recall</td>
<td align="center">0.9380</td>
<td align="center">0.9619</td>
<td align="center">0.9583</td>
<td align="center">0.9652</td>
<td align="center">0.9388</td>
<td align="center">0.9549</td>
<td align="center">0.9552</td>
<td align="center">0.9568</td>
<td align="center">0.9749</td>
<td align="center">0.9549</td>
</tr>
<tr>
<td align="center">Sensitivity</td>
<td align="center">0.9380</td>
<td align="center">0.9619</td>
<td align="center">0.9583</td>
<td align="center">0.9652</td>
<td align="center">0.9388</td>
<td align="center">0.9549</td>
<td align="center">0.9552</td>
<td align="center">0.9568</td>
<td align="center">0.9749</td>
<td align="center">0.9549</td>
</tr>
<tr>
<td align="center">Specificity</td>
<td align="center">0.9737</td>
<td align="center">0.9876</td>
<td align="center">0.9924</td>
<td align="center">0.9087</td>
<td align="center">0.9950</td>
<td align="center">0.9908</td>
<td align="center">0.9949</td>
<td align="center">0.9926</td>
<td align="center">0.9729</td>
<td align="center">0.9927</td>
</tr>
<tr>
<td align="center">Pixel Accuracy</td>
<td align="center">0.9605</td>
<td align="center">0.9752</td>
<td align="center">0.9792</td>
<td align="center">0.9304</td>
<td align="center">0.9768</td>
<td align="center">0.9758</td>
<td align="center">0.9799</td>
<td align="center">0.9768</td>
<td align="center">0.9740</td>
<td align="center">0.9741</td>
</tr>
<tr>
<td align="center">ROC-AUC</td>
<td align="center">0.9559</td>
<td align="center">0.9747</td>
<td align="center">0.9753</td>
<td align="center">0.9369</td>
<td align="center">0.9669</td>
<td align="center">0.9728</td>
<td align="center">0.9751</td>
<td align="center">0.9747</td>
<td align="center">0.9739</td>
<td align="center">0.9738</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Evaluation metrics for the breast region segmentation folds using the U-Net with Spatial Attention model.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">k &#x3d; 1</th>
<th align="center">k &#x3d; 2</th>
<th align="center">k &#x3d; 3</th>
<th align="center">k &#x3d; 4</th>
<th align="center">k &#x3d; 5</th>
<th align="center">k &#x3d; 6</th>
<th align="center">k &#x3d; 7</th>
<th align="center">k &#x3d; 8</th>
<th align="center">k &#x3d; 9</th>
<th align="center">k &#x3d; 10</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">IoU</td>
<td align="center">0.9267</td>
<td align="center">0.9235</td>
<td align="center">0.9422</td>
<td align="center">0.9239</td>
<td align="center">0.9239</td>
<td align="center">0.9451</td>
<td align="center">0.9373</td>
<td align="center">0.9230</td>
<td align="center">0.9316</td>
<td align="center">0.9125</td>
</tr>
<tr>
<td align="center">Dice Coefficient</td>
<td align="center">0.9617</td>
<td align="center">0.9601</td>
<td align="center">0.9702</td>
<td align="center">0.9601</td>
<td align="center">0.9603</td>
<td align="center">0.9717</td>
<td align="center">0.9676</td>
<td align="center">0.9598</td>
<td align="center">0.9646</td>
<td align="center">0.9535</td>
</tr>
<tr>
<td align="center">Precision</td>
<td align="center">0.9216</td>
<td align="center">0.9782</td>
<td align="center">0.9893</td>
<td align="center">0.8966</td>
<td align="center">0.9633</td>
<td align="center">0.9849</td>
<td align="center">0.9907</td>
<td align="center">0.9764</td>
<td align="center">0.9634</td>
<td align="center">0.9924</td>
</tr>
<tr>
<td align="center">Recall</td>
<td align="center">0.9595</td>
<td align="center">0.9665</td>
<td align="center">0.9644</td>
<td align="center">0.9769</td>
<td align="center">0.9644</td>
<td align="center">0.9704</td>
<td align="center">0.9583</td>
<td align="center">0.9641</td>
<td align="center">0.9690</td>
<td align="center">0.9593</td>
</tr>
<tr>
<td align="center">Sensitivity</td>
<td align="center">0.9595</td>
<td align="center">0.9665</td>
<td align="center">0.9644</td>
<td align="center">0.9769</td>
<td align="center">0.9644</td>
<td align="center">0.9704</td>
<td align="center">0.9583</td>
<td align="center">0.9641</td>
<td align="center">0.9690</td>
<td align="center">0.9593</td>
</tr>
<tr>
<td align="center">Specificity</td>
<td align="center">0.9520</td>
<td align="center">0.9801</td>
<td align="center">0.9934</td>
<td align="center">0.9299</td>
<td align="center">0.9825</td>
<td align="center">0.9893</td>
<td align="center">0.9945</td>
<td align="center">0.9816</td>
<td align="center">0.9597</td>
<td align="center">0.9928</td>
</tr>
<tr>
<td align="center">Pixel Accuracy</td>
<td align="center">0.9548</td>
<td align="center">0.9736</td>
<td align="center">0.9822</td>
<td align="center">0.9479</td>
<td align="center">0.9767</td>
<td align="center">0.9814</td>
<td align="center">0.9808</td>
<td align="center">0.9739</td>
<td align="center">0.9646</td>
<td align="center">0.9763</td>
</tr>
<tr>
<td align="center">ROC-AUC</td>
<td align="center">0.9558</td>
<td align="center">0.9733</td>
<td align="center">0.9789</td>
<td align="center">0.9534</td>
<td align="center">0.9735</td>
<td align="center">0.9799</td>
<td align="center">0.9764</td>
<td align="center">0.9728</td>
<td align="center">0.9644</td>
<td align="center">0.9761</td>
</tr>
<tr>
<td align="center">PR-AUC</td>
<td align="center">0.8992</td>
<td align="center">0.9615</td>
<td align="center">0.9679</td>
<td align="center">0.8847</td>
<td align="center">0.9405</td>
<td align="center">0.9681</td>
<td align="center">0.9652</td>
<td align="center">0.9572</td>
<td align="center">0.9498</td>
<td align="center">0.9721</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Evaluation metrics for the breast region segmentation folds using U-Net&#x2b;&#x2b; model.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">k &#x3d; 1</th>
<th align="center">k &#x3d; 2</th>
<th align="center">k &#x3d; 3</th>
<th align="center">k &#x3d; 4</th>
<th align="center">k &#x3d; 5</th>
<th align="center">k &#x3d; 6</th>
<th align="center">k &#x3d; 7</th>
<th align="center">k &#x3d; 8</th>
<th align="center">k &#x3d; 9</th>
<th align="center">k &#x3d; 10</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">IoU&#x2a;</td>
<td align="center">0.9340</td>
<td align="center">0.9352</td>
<td align="center">0.9252</td>
<td align="center">0.8994</td>
<td align="center">0.9187</td>
<td align="center">0.9469</td>
<td align="center">0.9243</td>
<td align="center">0.9113</td>
<td align="center">0.9342</td>
<td align="center">0.9215</td>
</tr>
<tr>
<td align="center">Dice Coefficient</td>
<td align="center">0.9657</td>
<td align="center">0.9664</td>
<td align="center">0.9610</td>
<td align="center">0.9460</td>
<td align="center">0.9575</td>
<td align="center">0.9727</td>
<td align="center">0.9606</td>
<td align="center">0.9535</td>
<td align="center">0.9659</td>
<td align="center">0.9588</td>
</tr>
<tr>
<td align="center">Precision</td>
<td align="center">0.9294</td>
<td align="center">0.9883</td>
<td align="center">0.9732</td>
<td align="center">0.8377</td>
<td align="center">0.9749</td>
<td align="center">0.9848</td>
<td align="center">0.9877</td>
<td align="center">0.9633</td>
<td align="center">0.9707</td>
<td align="center">0.9860</td>
</tr>
<tr>
<td align="center">Recall</td>
<td align="center">0.9616</td>
<td align="center">0.9653</td>
<td align="center">0.9690</td>
<td align="center">0.9695</td>
<td align="center">0.9489</td>
<td align="center">0.9658</td>
<td align="center">0.9575</td>
<td align="center">0.9619</td>
<td align="center">0.9651</td>
<td align="center">0.9569</td>
</tr>
<tr>
<td align="center">Sensitivity</td>
<td align="center">0.9616</td>
<td align="center">0.9653</td>
<td align="center">0.9690</td>
<td align="center">0.9695</td>
<td align="center">0.9489</td>
<td align="center">0.9658</td>
<td align="center">0.9575</td>
<td align="center">0.9619</td>
<td align="center">0.9651</td>
<td align="center">0.9569</td>
</tr>
<tr>
<td align="center">Specificity</td>
<td align="center">0.9571</td>
<td align="center">0.9894</td>
<td align="center">0.9832</td>
<td align="center">0.8831</td>
<td align="center">0.9884</td>
<td align="center">0.9893</td>
<td align="center">0.9927</td>
<td align="center">0.9711</td>
<td align="center">0.9680</td>
<td align="center">0.9868</td>
</tr>
<tr>
<td align="center">Pixel Accuracy</td>
<td align="center">0.9588</td>
<td align="center">0.9779</td>
<td align="center">0.9777</td>
<td align="center">0.9163</td>
<td align="center">0.9756</td>
<td align="center">0.9795</td>
<td align="center">0.9794</td>
<td align="center">0.9671</td>
<td align="center">0.9665</td>
<td align="center">0.9721</td>
</tr>
<tr>
<td align="center">ROC-AUC&#x2a;</td>
<td align="center">0.9593</td>
<td align="center">0.9774</td>
<td align="center">0.9761</td>
<td align="center">0.9263</td>
<td align="center">0.9686</td>
<td align="center">0.9775</td>
<td align="center">0.9751</td>
<td align="center">0.9665</td>
<td align="center">0.9666</td>
<td align="center">0.9719</td>
</tr>
<tr>
<td align="center">PR-AUC&#x2a;</td>
<td align="center">0.9079</td>
<td align="center">0.9706</td>
<td align="center">0.9550</td>
<td align="center">0.8239</td>
<td align="center">0.9415</td>
<td align="center">0.9654</td>
<td align="center">0.9618</td>
<td align="center">0.9434</td>
<td align="center">0.9550</td>
<td align="center">0.9648</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The U-Net model exhibited robust performance with an average IoU of 0.9292 and a standard deviation of 0.0136. Notably, the Dice coefficient averaged at 0.9630, indicating a high degree of accuracy in segmentation. Precision, recall, and specificity consistently maintained their values across folds, underscoring the model&#x2019;s effectiveness in classifying true positives and negatives. Pixel accuracy reached an average of 0.9703, signifying precise pixel-level segmentation. The model&#x2019;s discriminative ability, as measured by ROC-AUC and PR-AUC, was substantial, averaging at 0.9680 and 0.9472, respectively.</p>
<p>The U-Net with spatial attention model exhibited competitive results, with an average IoU of 0.9290 and a low standard deviation of 0.0095. The Dice coefficient showed a mean value of 0.9630, underlining accurate segmentation. Precision and specificity demonstrated consistent values across folds, indicating reliable positive classification. The ROC-AUC and PR-AUC averaged 0.9704 and 0.9466, respectively, emphasizing the model&#x2019;s strong discriminative ability.</p>
<p>The U-Net&#x2b;&#x2b; model showcased competitive performance, with an average IoU of 0.9251 and a standard deviation of 0.0128. The Dice coefficient reached an average of 0.9608, indicating accurate segmentation results. Precision and specificity displayed consistent values, highlighting the model&#x2019;s ability to accurately classify positive samples. The ROC-AUC and PR-AUC, measuring the model&#x2019;s discriminative ability, averaged 0.9665 and 0.9389, respectively.</p>
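<p>The per-model summaries above can be reproduced directly from the per-fold tables. The following is a minimal sketch, not the authors&#x2019; code, that recomputes the U-Net IoU mean and standard deviation from the ten fold values in Table 4. The function name <monospace>summarize</monospace> is hypothetical, and the choice of estimator is an assumption: the text does not state which one was used, but the population standard deviation (dividing by the number of folds) reproduces the reported 0.0136, whereas the sample form (dividing by one less) would give about 0.0143.</p>
<code language="python"><![CDATA[
```python
from statistics import fmean, pstdev

# Per-fold IoU values for U-Net, copied from Table 4 (k = 1..10).
unet_iou = [0.9283, 0.9393, 0.9421, 0.9084, 0.9135,
            0.9374, 0.9396, 0.9282, 0.9464, 0.9087]


def summarize(values):
    """Return (mean, population standard deviation), rounded to 4 decimals.

    Using the population form (pstdev) is an assumption; it is the
    estimator that matches the figures quoted in the text.
    """
    return round(fmean(values), 4), round(pstdev(values), 4)


mean_iou, std_iou = summarize(unet_iou)
print(mean_iou, std_iou)  # 0.9292 0.0136
```
]]></code>
<p>The same function can be applied to any other row of Tables 4&#x2013;6 to obtain the corresponding mean and spread for that metric and model.</p>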
<p>The segmentation models are compared using the mean and standard deviation of each evaluation metric, summarized in <xref ref-type="table" rid="T7">Tables 7</xref> and <xref ref-type="table" rid="T8">8</xref> and shown graphically in <xref ref-type="fig" rid="F13">Figures 13</xref> and <xref ref-type="fig" rid="F14">14</xref>.</p>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Mean of evaluation metrics for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">IoU</td>
<td align="center">0.9292</td>
<td align="center">0.9290</td>
<td align="center">0.9251</td>
</tr>
<tr>
<td align="center">Dice Coefficient</td>
<td align="center">0.9630</td>
<td align="center">0.9630</td>
<td align="center">0.9608</td>
</tr>
<tr>
<td align="center">Precision</td>
<td align="center">0.9721</td>
<td align="center">0.9657</td>
<td align="center">0.9596</td>
</tr>
<tr>
<td align="center">Recall</td>
<td align="center">0.9559</td>
<td align="center">0.9653</td>
<td align="center">0.9621</td>
</tr>
<tr>
<td align="center">Sensitivity</td>
<td align="center">0.9559</td>
<td align="center">0.9653</td>
<td align="center">0.9621</td>
</tr>
<tr>
<td align="center">Specificity</td>
<td align="center">0.9801</td>
<td align="center">0.9756</td>
<td align="center">0.9709</td>
</tr>
<tr>
<td align="center">Pixel Accuracy</td>
<td align="center">0.9703</td>
<td align="center">0.9712</td>
<td align="center">0.9671</td>
</tr>
<tr>
<td align="center">ROC-AUC</td>
<td align="center">0.9680</td>
<td align="center">0.9704</td>
<td align="center">0.9665</td>
</tr>
<tr>
<td align="center">PR-AUC</td>
<td align="center">0.9472</td>
<td align="center">0.9466</td>
<td align="center">0.9389</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>Standard deviation of evaluation metrics for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">IoU</td>
<td align="center">0.0136</td>
<td align="center">0.0095</td>
<td align="center">0.0128</td>
</tr>
<tr>
<td align="center">Dice Coefficient</td>
<td align="center">0.9630</td>
<td align="center">0.9630</td>
<td align="center">0.9608</td>
</tr>
<tr>
<td align="center">Precision</td>
<td align="center">0.9721</td>
<td align="center">0.9657</td>
<td align="center">0.9596</td>
</tr>
<tr>
<td align="center">Recall</td>
<td align="center">0.9559</td>
<td align="center">0.9653</td>
<td align="center">0.9621</td>
</tr>
<tr>
<td align="center">Sensitivity</td>
<td align="center">0.9559</td>
<td align="center">0.9653</td>
<td align="center">0.9621</td>
</tr>
<tr>
<td align="center">Specificity</td>
<td align="center">0.9801</td>
<td align="center">0.9756</td>
<td align="center">0.9709</td>
</tr>
<tr>
<td align="center">Pixel Accuracy</td>
<td align="center">0.9703</td>
<td align="center">0.9712</td>
<td align="center">0.9671</td>
</tr>
<tr>
<td align="center">ROC-AUC</td>
<td align="center">0.9680</td>
<td align="center">0.9704</td>
<td align="center">0.9665</td>
</tr>
<tr>
<td align="center">PR-AUC</td>
<td align="center">0.9472</td>
<td align="center">0.9466</td>
<td align="center">0.9389</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>Mean of evaluation metrics chart for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g013.tif">
<alt-text content-type="machine-generated">Bar chart comparing evaluation metrics for three models: U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. Metrics include Intersection over Union, Dice Coefficient, Precision, Recall, Sensitivity, Specificity, Pixel Accuracy, ROC-AUC, and PR-AUC. U-Net leads in most categories, especially in Pixel Accuracy and PR-AUC, while U-Net&#x2b;&#x2b; trails slightly across metrics.</alt-text>
</graphic>
</fig>
<fig id="F14" position="float">
<label>FIGURE 14</label>
<caption>
<p>Standard Deviation of evaluation metrics chart for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g014.tif">
<alt-text content-type="machine-generated">Bar chart showing the standard deviation of evaluation metrics for three models: U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. Metrics include IoU, Dice Coefficient, Precision, Recall, Sensitivity, Specificity, Pixel Accuracy, ROC-AUC, and PR-AUC. U-Net shows highest standard deviation in Precision and PR-AUC. U-Net&#x2b;&#x2b; shows highest in Recall, Sensitivity, Specificity, and Pixel Accuracy. U-Net with Spatial Attention generally has lower deviations across metrics.</alt-text>
</graphic>
</fig>
<p>In terms of IoU, U-Net and U-Net with Spatial Attention demonstrate similarly high values of 0.9292 and 0.9290, respectively, with U-Net&#x2b;&#x2b; slightly lower at 0.9251. This metric reflects the degree of overlap between the predicted and ground truth segmentations, indicating the models&#x2019; effectiveness in capturing the target region.</p>
<p>The Dice Coefficient, another measure of segmentation accuracy, is comparable among the models: U-Net and U-Net with Spatial Attention both reach a mean of 0.9630, with U-Net&#x2b;&#x2b; slightly lower at 0.9608.</p>
<p>Precision and Recall capture complementary aspects of classification accuracy; Sensitivity is identical to Recall, which is why the two rows coincide in every table. U-Net achieves the highest mean Precision (0.9721), emphasizing its ability to minimize false positives. On the other hand, U-Net with Spatial Attention (0.9653) and U-Net&#x2b;&#x2b; (0.9621) achieve higher mean Recall than U-Net (0.9559), highlighting their capacity to identify true positives.</p>
<p>Specificity measures the models&#x2019; ability to correctly identify true negatives, and U-Net maintains a slight advantage over the others in this regard. Pixel Accuracy, reflecting the overall accuracy of pixel-wise classification, indicates similar performance across the models.</p>
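<p>To make the relationship between these metrics concrete, the sketch below computes IoU, Dice, precision, recall (sensitivity), specificity, and pixel accuracy from the pixel-wise confusion counts of a predicted mask and a ground-truth mask. It is an illustrative implementation of the standard definitions, not the evaluation code used in this study; the function name <monospace>segmentation_metrics</monospace> and the 4 &#x00D7; 4 toy masks are invented for the example, and the function assumes both masks contain foreground and background pixels so that no denominator is zero.</p>
<code language="python"><![CDATA[
```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # breast pixel correctly predicted
            elif p:
                fp += 1      # background predicted as breast
            elif t:
                fn += 1      # breast pixel missed
            else:
                tn += 1      # background correctly predicted
    return {
        "iou": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # same quantity as sensitivity
        "specificity": tn / (tn + fp),
        "pixel_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Toy masks, invented for illustration only (1 = breast region, 0 = background).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
]]></code>
<p>For a single mask pair, Dice and IoU are tied by Dice &#x3d; 2&#x00B7;IoU/(1 &#x2b; IoU), so the two metrics always rank a given prediction the same way; this is consistent with the closely tracking IoU and Dice rows in Table 7.</p>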
<p>The ROC-AUC and PR-AUC values, assessing the models&#x2019; discrimination and precision-recall trade-offs, exhibit minor variations among the models. <xref ref-type="table" rid="T4">Tables 4</xref>&#x2013;<xref ref-type="table" rid="T6">6</xref> provide the per-fold performance metrics for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;, respectively, and are discussed in <xref ref-type="app" rid="app1">Appendix A</xref>.</p>
<p>The standard deviations provided in <xref ref-type="table" rid="T8">Table 8</xref> offer insights into the stability and consistency of each model&#x2019;s performance across folds. For IoU, U-Net with Spatial Attention shows the lowest standard deviation (0.0095), followed by U-Net&#x2b;&#x2b; (0.0128) and U-Net (0.0136), suggesting that the attention variant yields the most consistent overlap across folds while all three models remain stable.</p>
<p>In summary, the evaluation metrics collectively suggest that U-Net performs competitively, demonstrating strong segmentation accuracy and consistency. U-Net with Spatial Attention and U-Net&#x2b;&#x2b; exhibit comparable performance, with slight variations in specific metrics. These findings contribute valuable information for selecting an appropriate model based on the desired trade-offs in thermography-based breast region segmentation.</p>
</sec>
<sec id="s5-3">
<label>5.3</label>
<title>Qualitative analysis results</title>
<p>Visual inspection of the segmentation results was conducted using the Grad-CAM heatmaps, focusing on the predicted region of interest generated by each model. <xref ref-type="fig" rid="F15">Figures 15</xref>&#x2013;<xref ref-type="fig" rid="F17">17</xref> display the Grad-CAM heatmaps for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;, respectively. The color patterns and transitions are observed from the heatmaps, providing a visual representation of how the model assigns importance to different areas in the thermal images. This visual inspection aids in understanding which regions the model identifies as crucial for predicting the presence of breast tissue, contributing to the interpretability of the model&#x2019;s decision-making process.</p>
<fig id="F15" position="float">
<label>FIGURE 15</label>
<caption>
<p>Grad-CAM heatmaps of Predicted Region of Interest for U-Net.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g015.tif">
<alt-text content-type="machine-generated">Array of medical images showing predicted regions of interest using a U-Net model. Each image displays heatmaps with varying levels of activation, indicated by color gradients from blue to red. The grid consists of five rows and six columns, labeled with identifiers such as H81 to H100 and S21 to S30. A color scale on the right represents activation intensity from zero to one.</alt-text>
</graphic>
</fig>
<fig id="F16" position="float">
<label>FIGURE 16</label>
<caption>
<p>Grad-CAM heatmaps of Predicted Region of Interest for U-Net with Spatial Attention.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g016.tif">
<alt-text content-type="machine-generated">A grid of medical images displays predicted regions of interest using U-Net with spatial attention. Each image, labeled H81 to H100 and S21 to S30, depicts heat maps over anatomical scans, showing varying activation levels from blue (low) to red (high), with a color scale on the right indicating activation intensity.</alt-text>
</graphic>
</fig>
<fig id="F17" position="float">
<label>FIGURE 17</label>
<caption>
<p>Grad-CAM heatmaps of Predicted Region of Interest for U-Net&#x2b;&#x2b;.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g017.tif">
<alt-text content-type="machine-generated">A heatmap grid titled &#x22;Predicted Region of Interest (U-Net&#x2b;&#x2b;)&#x22; shows multiple images labeled H81 to H100 and S21 to S30. The images display human torso regions with color gradients indicating activation levels from blue (low) to red (high). A color scale on the right quantifies activation intensity from zero to one.</alt-text>
</graphic>
</fig>
<p>
<xref ref-type="table" rid="T9">Table 9</xref> presents the comparative scores of Breast Region Overlap (BRO) and Noise Handling (NH) for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; across the test images, which make up 30% of the dataset. The scores range from 1 (Poor) to 5 (Excellent), as explained in Section 4.6.1.</p>
<table-wrap id="T9" position="float">
<label>TABLE 9</label>
<caption>
<p>Comparative scores of Breast Region Overlap (BRO) and Noise Handling (NH) for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models across test images.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="center">Test image</th>
<th colspan="2" align="center">U-Net</th>
<th colspan="2" align="center">U-Net with spatial attention</th>
<th colspan="2" align="center">U-Net&#x2b;&#x2b;</th>
</tr>
<tr>
<th align="center">
<italic>BRO</italic>
</th>
<th align="center">
<italic>NH</italic>
</th>
<th align="center">
<italic>BRO</italic>
</th>
<th align="center">
<italic>NH</italic>
</th>
<th align="center">
<italic>BRO</italic>
</th>
<th align="center">
<italic>NH</italic>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">H81</td>
<td align="center">4</td>
<td align="center">4</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H82</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H83</td>
<td align="center">4</td>
<td align="center">4</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">H84</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H85</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H86</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H87</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H88</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H89</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H90</td>
<td align="center">4</td>
<td align="center">4</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H91</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">H92</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H93</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H94</td>
<td align="center">3</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H95</td>
<td align="center">3</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H96</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H97</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">H98</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">H99</td>
<td align="center">3</td>
<td align="center">4</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">H100</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S21</td>
<td align="center">4</td>
<td align="center">4</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S22</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S23</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">1</td>
<td align="center">3</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">S24</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S25</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S26</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">1</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S27</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S28</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S29</td>
<td align="center">4</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">S30</td>
<td align="center">3</td>
<td align="center">5</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<xref ref-type="table" rid="T10">Table 10</xref> presents the averaged qualitative evaluation scores for the U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models on two criteria: Breast Region Overlap and Noise Handling. <xref ref-type="fig" rid="F18">Figure 18</xref> plots the same averages.</p>
<table-wrap id="T10" position="float">
<label>TABLE 10</label>
<caption>
<p>Comparative averaged scores of qualitative evaluations for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Criterion</th>
<th align="center">U-Net</th>
<th align="center">U-Net with spatial attention</th>
<th align="center">U-Net&#x2b;&#x2b;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Breast Region Overlap</td>
<td align="center">4.10</td>
<td align="center">2.10</td>
<td align="center">2.13</td>
</tr>
<tr>
<td align="center">Noise Handling</td>
<td align="center">4.70</td>
<td align="center">2.53</td>
<td align="center">1.83</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F18" position="float">
<label>FIGURE 18</label>
<caption>
<p>Averaged scores of qualitative evaluations for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b; models.</p>
</caption>
<graphic xlink:href="fbinf-05-1609004-g018.tif">
<alt-text content-type="machine-generated">Bar chart titled &#x22;Averaged Scores of Qualitative Evaluations&#x22; comparing U-NET, U-NET with Spatial Attention, and U-NET&#x2b;&#x2b;. Blue bars represent Breast Region Overlap, and orange bars represent Noise Handling. U-NET scores 4.1 and 4.7; U-NET with Spatial Attention scores 2.1 and 2.53; U-NET&#x2b;&#x2b; scores 2.13 and 1.83.</alt-text>
</graphic>
</fig>
<p>For Breast Region Overlap, U-Net achieves the highest average score (4.10), indicating that its activations align closely with the breast boundaries in the thermal images. U-Net with Spatial Attention and U-Net&#x2b;&#x2b; score markedly lower (2.10 and 2.13, respectively), suggesting a weaker overlap with the actual breast region.</p>
<p>For Noise Handling, U-Net again scores highest (4.70), showing concentrated activations on the breast area with little background noise. U-Net with Spatial Attention (2.53) and U-Net&#x2b;&#x2b; (1.83) show noticeable noise patterns that reduce the clarity of the breast region.</p>
<p>Overall, U-Net outperforms both U-Net with Spatial Attention and U-Net&#x2b;&#x2b; on both criteria. The two more complex models are comparable on Breast Region Overlap (2.10 versus 2.13), whereas on Noise Handling U-Net with Spatial Attention scores higher than U-Net&#x2b;&#x2b; (2.53 versus 1.83).</p>
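<p>The averaging behind such a summary can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors&#x2019; code: the six rows are copied from the per-image score table above, the column order (overlap then noise for U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;) is an assumption inferred from the reported averages, and <monospace>average_scores</monospace> is a hypothetical helper. Because only six images are included, the printed means differ from the full-dataset averages reported above.</p>
<code language="python">
```python
from statistics import fmean

# ASSUMPTION: column order is (overlap, noise) for each model in turn.
COLUMNS = [
    ("U-Net", "Breast Region Overlap"),
    ("U-Net", "Noise Handling"),
    ("U-Net with Spatial Attention", "Breast Region Overlap"),
    ("U-Net with Spatial Attention", "Noise Handling"),
    ("U-Net++", "Breast Region Overlap"),
    ("U-Net++", "Noise Handling"),
]

# Subset of the per-image qualitative scores (six images only).
SCORES = {
    "H98": [4, 5, 2, 3, 2, 1],
    "H99": [3, 4, 3, 2, 2, 1],
    "H100": [4, 5, 2, 3, 2, 2],
    "S21": [4, 4, 2, 3, 2, 2],
    "S22": [5, 5, 2, 2, 3, 2],
    "S23": [4, 5, 2, 1, 3, 1],
}


def average_scores(scores):
    """Return {(model, criterion): mean score}, rounded to two decimals."""
    per_column = zip(*scores.values())  # transpose rows into columns
    return {key: round(fmean(col), 2) for key, col in zip(COLUMNS, per_column)}


averages = average_scores(SCORES)
for (model, criterion), value in averages.items():
    print(f"{model:30s} {criterion:22s} {value:.2f}")
```
</code>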
</sec>
</sec>
<sec sec-type="discussion" id="s6">
<label>6</label>
<title>Discussion</title>
<p>The evaluation of different optimizers for training deep learning models in breast region segmentation reveals notable variations in efficacy across U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. The ADAM optimizer consistently outperforms the other algorithms, yielding lower loss values and higher accuracy scores. Notably, the foundational U-Net trained with ADAM is the most effective configuration, challenging the assumption that greater architectural complexity is necessary for improved outcomes.</p>
<p>The competitive performance of U-Net, despite its foundational design, prompts a reconsideration of the presumed direct correlation between architectural complexity and segmentation accuracy. The nuanced perspective emerging from this study questions the prevailing notion that more intricate models necessarily yield superior results in the specific context of breast region segmentation in thermal images.</p>
<p>The comparative analysis of quantitative results across evaluation metrics provides valuable insights. U-Net exhibits strong segmentation accuracy and consistency, outperforming U-Net with Spatial Attention and U-Net&#x2b;&#x2b;. Despite comparable outcomes in certain metrics, U-Net maintains lower standard deviations, indicating more stable and consistent performance.</p>
<p>The findings highlight the significance of U-Net&#x2019;s foundational architecture, challenging assumptions about the need for complex models in breast region segmentation. The study&#x2019;s outcomes contribute valuable information for selecting models based on desired trade-offs in thermography-based breast region segmentation.</p>
<p>Visual inspection of Grad-CAM heatmaps reinforces the study&#x2019;s quantitative findings. U-Net&#x2019;s impressive Breast Region Overlap and Noise Handling scores suggest its robustness in precisely aligning with breast boundaries and handling noise. In contrast, U-Net with Spatial Attention and U-Net&#x2b;&#x2b; face challenges in noise handling, indicating potential areas for improvement in these models.</p>
<p>The averaged scores further underscore U-Net&#x2019;s superior performance in both criteria, highlighting its effectiveness in breast region segmentation. This aligns with the quantitative results and strengthens the argument for considering foundational U-Net as a viable option in this application.</p>
<p>The study opens avenues for future research by challenging established assumptions and providing a nuanced perspective on the relationship between model architecture, optimization strategies, and segmentation efficacy. Further investigations could explore the transferability of these findings to other medical imaging applications and datasets. Additionally, efforts to enhance the noise handling capabilities of more complex models like U-Net with Spatial Attention and U-Net&#x2b;&#x2b; may lead to improved overall performance.</p>
<p>In conclusion, this study challenges the <italic>status quo</italic> in deep learning for breast region segmentation by showcasing the effectiveness of the foundational U-Net with the ADAM optimizer. The findings have broader implications for the development of deep learning models in medical image analysis, encouraging researchers to reconsider the balance between model complexity and performance in specific applications. <xref ref-type="table" rid="T11">Table 11</xref> compares the performance of the three models. U-Net achieves the highest boundary accuracy and robustness to noise, and it trains faster and more stably when optimized with ADAM, making it the most effective model for breast region segmentation. Although U-Net with Spatial Attention and U-Net&#x2b;&#x2b; offer marginal improvements in some quantitative metrics, they handle noise less well and require longer, less stable training; ADAM remains the best-performing optimizer across all models.</p>
<table-wrap id="T11" position="float">
<label>TABLE 11</label>
<caption>
<p>Comparative chart summarizing the performance of U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Model</th>
<th align="center">Key performance metrics</th>
<th align="center">Qualitative observations</th>
<th align="center">Observations on noise handling</th>
<th align="center">Training time and stability</th>
<th align="center">Optimal optimizer</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">U-Net</td>
<td align="center">-IoU (&#x223c;0.935&#x2013;0.945)<break/>Dice (&#x223c;0.961&#x2013;0.972)<break/>- Precision (&#x223c;0.929&#x2013;0.987)<break/>- ROC-AUC &#x223c;0.955&#x2013;0.979</td>
<td align="center">- Strong Boundary and Overlap scores<break/>- Robustness demonstrated via Grad-CAM heatmaps</td>
<td align="center">- Handles noise effectively (scores &#x223c;4.7/5 in qualitative assessment)</td>
<td align="center">- More stable (lower standard deviation)<break/>- Faster training (&#x223c;30 epochs)</td>
<td align="center">ADAM</td>
</tr>
<tr>
<td align="center">U-Net with Spatial Attention</td>
<td align="center">- Slight improvements in some metrics but limited evidence of clear advantage</td>
<td align="center">- Slightly better in some cases but faces challenges with noise</td>
<td align="center">- Struggles with noise, noisier Grad-CAM heatmaps (&#x223c;2.53/5)</td>
<td align="center">- Slightly longer training time; more complex; less stable</td>
<td align="center">ADAM</td>
</tr>
<tr>
<td align="center">U-Net&#x2b;&#x2b;</td>
<td align="center">- IoU (&#x223c;0.913&#x2013;0.945)<break/>- Dice (&#x223c;0.953&#x2013;0.971)<break/>- Precision (&#x223c;0.837&#x2013;0.990)</td>
<td align="center">- Slight improvement in some metrics but less transparent in noise handling</td>
<td align="center">- Less effective noise suppression; higher noise artifacts observed</td>
<td align="center">- Longer training durations due to architectural complexity</td>
<td align="center">ADAM</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The choice of optimizer, particularly ADAM, proved to be crucial across all models, with U-Net trained using ADAM consistently achieving the lowest loss (&#x223c;0.0357) and the highest average accuracy, demonstrating its effectiveness in minimizing errors and enhancing model performance. Grad-CAM heatmaps further highlighted that simpler models like U-Net more effectively delineate breast borders and exhibit greater resilience under noisy conditions, which is essential for medical imaging applications. Although attention mechanisms are generally intended to improve model focus on relevant regions, empirical results indicated they do not significantly outperform the baseline U-Net in noisy thermography images and may introduce additional training instability. Overall, this comparison suggests that the foundational U-Net&#x2014;when optimized with ADAM&#x2014;strikes an optimal balance of simplicity, robustness, interpretability, and computational efficiency, whereas the added architectural complexity of U-Net&#x2b;&#x2b; and attention-based models does not substantially enhance performance and may even create vulnerabilities in handling noisy thermal data for breast region segmentation.</p>
<sec id="s6-1">
<label>6.1</label>
<title>Statistical validation and key insights</title>
<p>While the evaluation metrics demonstrate strong performance across all three U-Net variants, statistical validation is essential to assess whether the observed differences are significant. A pairwise Wilcoxon signed-rank test was applied across the folds of cross-validation for IoU and Dice scores, comparing U-Net against U-Net&#x2b;&#x2b; and U-Net with Spatial Attention. Results indicated no statistically significant improvement (p &#x3e; 0.05) for the more complex models over baseline U-Net. This suggests that architectural sophistication does not guarantee superior outcomes in breast region segmentation using thermal images.</p>
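<p>A paired comparison of this kind can be sketched as follows. This is a minimal, standard-library illustration of an exact two-sided Wilcoxon signed-rank test, not the authors&#x2019; implementation: the function name <monospace>wilcoxon_signed_rank</monospace> is hypothetical, and the per-fold Dice values are made-up placeholders, since the per-fold scores are not reported here.</p>
<code language="python">
```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (statistic, p_value), where statistic = min(W+, W-).
    Zero differences are dropped; tied magnitudes share an average rank.
    """
    # Round to suppress floating-point noise so that true ties are detected.
    diffs = [round(a - b, 12) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences (1 = smallest), averaging over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while n > start:
        end = start
        while n > end + 1 and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    total_rank = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total_rank - w_plus)

    # Exact null distribution: each of the 2**n sign patterns is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if statistic >= min(w, total_rank - w):
            hits += 1
    return statistic, hits / 2 ** n


# HYPOTHETICAL per-fold Dice scores, for illustration only.
unet_dice = [0.963, 0.961, 0.965, 0.960, 0.964]
unetpp_dice = [0.958, 0.962, 0.959, 0.957, 0.961]
stat, p = wilcoxon_signed_rank(unet_dice, unetpp_dice)
print(f"W = {stat}, p = {p:.4f}")
```
</code>
<p>In practice a library routine such as <monospace>scipy.stats.wilcoxon</monospace> would normally be used. With only a handful of folds the exact test has limited resolution: for <italic>k</italic> paired folds the smallest attainable two-sided p-value is 2/2<sup><italic>k</italic></sup> (0.0625 for five folds), which is worth bearing in mind when interpreting a non-significant result.</p>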
<p>A critical insight from this study is the effectiveness of simpler models. The baseline U-Net with ADAM optimizer consistently produced high Dice (0.9630), IoU (0.9292), and specificity (0.9801) while maintaining computational efficiency and stability. These findings highlight that in medical image analysis, especially with limited datasets, robust optimization and careful training can outweigh added architectural complexity. Thus, for clinical or resource-constrained applications, standard U-Net trained with ADAM offers the best balance between accuracy, interpretability, and computational cost, making it a practical and reliable choice.</p>
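<p>For readers who wish to reproduce metrics of this kind, the sketch below computes Dice, IoU, and specificity from a pair of binary masks using their standard confusion-matrix definitions. It is an illustration under stated assumptions, not the evaluation code used in this study: the helper names and the 4 &#xd7; 4 toy masks are hypothetical, and both masks are assumed to contain foreground and background pixels so that no denominator is zero.</p>
<code language="python">
```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (lists of rows)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return Dice, IoU and specificity for one predicted/ground-truth mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "specificity": tn / (tn + fp),
    }


# HYPOTHETICAL toy masks (1 = breast region, 0 = background).
truth_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
metrics = segmentation_metrics(pred_mask, truth_mask)
print({name: round(value, 4) for name, value in metrics.items()})
```
</code>
<p>For a single mask pair the two overlap measures are linked by Dice = 2&#xb7;IoU/(1 &#x2b; IoU); averages taken over many images need not satisfy this identity exactly.</p>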
</sec>
<sec id="s6-2">
<label>6.2</label>
<title>Novelty and contribution</title>
<p>This study makes a significant contribution to the field of thermography-based breast region segmentation by systematically evaluating and comparing the performance of three deep learning models&#x2014;U-Net, U-Net with Spatial Attention, and U-Net&#x2b;&#x2b;. The novelty of this research lies in its comprehensive analysis of the impact of different optimizers on model training, focusing on ADAM, NADAM, RMSPROP, SGDM, and ADADELTA. Beyond technical benchmarking, the study emphasizes dataset transparency, explicitly detailing the source, acquisition protocol, imaging device, and availability of the DMR-IR dataset, thereby ensuring reproducibility and reliability for future studies. A key finding is the superior performance of the baseline U-Net, particularly when trained with the ADAM optimizer. Despite being less complex than its variants, U-Net demonstrated high segmentation accuracy, interpretability through Grad-CAM, and reduced computational cost&#x2014;highlighting that simplicity coupled with robust optimization can outperform architectural complexity.</p>
<p>From a clinical perspective, these results are highly relevant. U-Net&#x2019;s strong precision and specificity reduce false positives, which is critical in breast cancer screening workflows. Meanwhile, the attention-based U-Net, with its improved sensitivity, may be suited to applications requiring the detection of subtle or ambiguous abnormalities. Together, these findings suggest that thermography, combined with deep learning segmentation, has potential as a low-cost adjunct to existing screening tools, particularly in resource-limited settings.</p>
<p>This research contributes insights into the selection of model architectures and optimizers for accurate and interpretable breast region segmentation in thermal images. The results provide a foundation for future research, guiding the development of advanced methodologies in medical imaging while also reinforcing the translational potential of thermography for clinical decision support.</p>
<p>A key limitation of this study is that the manual annotations used to generate ground-truth masks were performed solely by the authors. Although cross-verification procedures were applied to minimize bias, the absence of certified radiologist annotations restricts the clinical validity of the segmentation masks. Future work will address this limitation by incorporating expert medical annotations to further strengthen reliability.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="http://visual.ic.uff.br/dmi">http://visual.ic.uff.br/dmi</ext-link>.</p>
</sec>
<sec sec-type="ethics-statement" id="s8">
<title>Ethics statement</title>
<p>Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants&#x2019; legal guardians/next of kin in accordance with the national legislation and the institutional requirements.</p>
</sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>RR: Writing &#x2013; original draft, Project administration, Formal Analysis, Visualization, Data curation, Validation, Methodology, Investigation, Software, Writing &#x2013; review and editing. MH: Conceptualization, Resources, Investigation, Validation, Supervision, Writing &#x2013; review and editing, Methodology, Writing &#x2013; original draft. MI: Writing &#x2013; review and editing, Writing &#x2013; original draft, Validation, Supervision, Resources. MA: Writing &#x2013; review and editing, Software, Resources, Funding acquisition, Writing &#x2013; original draft, Supervision.</p>
</sec>
<ack>
<title>Acknowledgements</title>
<p>This work was conducted at the IoT and Wireless Communication Protocols Laboratory, ECE Department, KoE, International Islamic University Malaysia (IIUM).</p>
</ack>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s12">
<title>Generative AI statement</title>
<p>The author(s) declare that no Generative AI was used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p>
</sec>
<sec sec-type="disclaimer" id="s13">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn fn-type="custom" custom-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/951127/overview">Ayodeji Olalekan Salau</ext-link>, Afe Babalola University, Nigeria</p>
</fn>
<fn fn-type="custom" custom-type="reviewed-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3060935/overview">Ravichandran Sanmugasundaram</ext-link>, SRM Institute of Science and Technology, India</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3191536/overview">Saravanan D.</ext-link>, VIT Bhopal University, India</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Adel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Abdelhamid</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>El-Ramly</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Automatic image segmentation of breast thermograms</article-title>,&#x201d; in <conf-name>Proceedings of the 2018 7th International Conference on Bioinformatics and Biomedical Science</conf-name>, <fpage>88</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1145/3239264.3239279</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al Husaini</surname>
<given-names>M. A. S.</given-names>
</name>
<name>
<surname>Habaebi</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Suliman</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Islam</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Elsheikh</surname>
<given-names>E. A.</given-names>
</name>
<name>
<surname>Muhaisen</surname>
<given-names>N. A.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Influence of tissue thermophysical characteristics and situ-cooling on the detection of breast cancer</article-title>. <source>Appl. Sci.</source> <volume>13</volume> (<issue>15</issue>), <fpage>8752</fpage>. <pub-id pub-id-type="doi">10.3390/app13158752</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allugunti</surname>
<given-names>V. R.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Breast cancer detection based on thermographic images using machine learning and deep learning algorithms</article-title>. <source>Int. J. Eng. Comput. Sci.</source> <volume>4</volume> (<issue>1</issue>), <fpage>49</fpage>&#x2013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.33545/26633582.2022.v4.i1a.68</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Azad</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Aghdam</surname>
<given-names>E. K.</given-names>
</name>
<name>
<surname>Rauland</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Azad</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Avval</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Medical image segmentation review: the success of U-Net</article-title>. <source>arXiv preprint arXiv:2211.14830</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2211.14830</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Badrinarayanan</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kendall</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cipolla</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>SegNet: a deep convolutional encoder-decoder architecture for image segmentation</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>39</volume> (<issue>12</issue>), <fpage>2481</fpage>&#x2013;<lpage>2495</lpage>. <pub-id pub-id-type="doi">10.1109/tpami.2016.2644615</pub-id>
<pub-id pub-id-type="pmid">28060704</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carlos de Carvalho</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Martins Coelho</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Conci</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>de Freitas Oliveira Baffa</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>U-Net convolutional neural networks for breast IR imaging segmentation on frontal and lateral view</article-title>. <source>Comput. Methods Biomech. Biomed. Eng. Imaging Vis.</source> <volume>11</volume> (<issue>3</issue>), <fpage>311</fpage>&#x2013;<lpage>316</lpage>. <pub-id pub-id-type="doi">10.1080/21681163.2022.2040053</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>L.-C.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Papandreou</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Schroff</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Adam</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>,&#x201d; in <source>Computer vision &#x2013; ECCV 2018</source>, <person-group person-group-type="editor">
<name>
<surname>Ferrari</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Hebert</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sminchisescu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Weiss</surname>
<given-names>Y.</given-names>
</name>
</person-group>, Eds., in <source>Lecture notes in computer science</source>, <volume>11211</volume>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>. <fpage>833</fpage>&#x2013;<lpage>851</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01234-2_49</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dafni Rose</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>VijayaKumar</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>S. K.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Computer-aided diagnosis for breast cancer detection and classification using optimal region growing segmentation with MobileNet model</article-title>. <source>Concurr. Eng.</source> <volume>30</volume> (<issue>2</issue>), <fpage>181</fpage>&#x2013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1177/1063293x221080518</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2022</year>). &#x201c;<article-title>Scaling up your kernels to 31&#xd7;31: revisiting large kernel design in CNNs</article-title>,&#x201d; in <conf-name>Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit</conf-name>, <fpage>11953</fpage>&#x2013;<lpage>11965</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR52688.2022.01166</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Dozat</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Incorporating Nesterov momentum into Adam</source>.</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gargari</surname>
<given-names>M. S.</given-names>
</name>
<name>
<surname>Seyedi</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Alilou</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Segmentation of retinal blood vessels using U-Net&#x2b;&#x2b; architecture and disease prediction</article-title>. <source>Electronics</source> <volume>11</volume> (<issue>21</issue>), <fpage>3516</fpage>. <pub-id pub-id-type="doi">10.3390/electronics11213516</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Adaptive enhanced swin transformer with U-net for remote sensing image segmentation</article-title>. <source>Comput. Electr. Eng.</source> <volume>102</volume>, <fpage>108223</fpage>. <pub-id pub-id-type="doi">10.1016/j.compeleceng.2022.108223</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Guan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kamona</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Loew</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Segmentation of thermal breast images using convolutional and deconvolutional neural networks</article-title>,&#x201d; in <conf-name>2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Szemenyei</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>SA-UNet: spatial attention U-Net for retinal vessel segmentation</article-title>,&#x201d; in <conf-name>2020 25th International Conference on Pattern Recognition (ICPR)</conf-name> (<publisher-loc>Milan, Italy</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1236</fpage>&#x2013;<lpage>1242</lpage>. <pub-id pub-id-type="doi">10.1109/ICPR48806.2021.9413346</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gkioxari</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Doll&#xe1;r</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Mask R-CNN</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>, <fpage>2961</fpage>&#x2013;<lpage>2969</lpage>.</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Islam Sumon</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Bhattacharjee</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>Y. B.</given-names>
</name>
<name>
<surname>Rahman</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>H. C.</given-names>
</name>
<name>
<surname>Ryu</surname>
<given-names>W. S.</given-names>
</name>
<etal/>
</person-group> (<year>2023</year>). <article-title>Densely convolutional spatial attention network for nuclei segmentation of histological images for computational pathology</article-title>. <source>Front. Oncol.</source> <volume>13</volume>, <fpage>1009681</fpage>. <pub-id pub-id-type="doi">10.3389/fonc.2023.1009681</pub-id>
<pub-id pub-id-type="pmid">37305563</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="web">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Ba</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1412.6980">http://arxiv.org/abs/1412.6980</ext-link> (Accessed October 19, 2023)</comment>.</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.-D.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>SSAU-Net: a spectral&#x2013;spatial attention-based U-Net for hyperspectral image fusion</article-title>. <source>IEEE Trans. Geosci. Remote Sens.</source> <volume>60</volume>, <fpage>1</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1109/tgrs.2022.3217168</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lou</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kamona</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Loew</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Segmentation of infrared breast images using MultiResUnet neural networks</article-title>,&#x201d; in <conf-name>2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)</conf-name> (<publisher-loc>Washington, DC, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/AIPR47015.2019.9316541</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Mendes</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Rodrigues</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Izidoro</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Conci</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Liatsis</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>ROI extraction in thermographic breast images using genetic algorithms</article-title>,&#x201d; in <conf-name>2020 International Conference on Systems, Signals and Image Processing (IWSSIP)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>111</fpage>&#x2013;<lpage>115</lpage>.</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Micallef</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Seychell</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bajada</surname>
<given-names>C. J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Exploring the U-Net&#x2b;&#x2b; model for automatic brain tumor segmentation</article-title>. <source>IEEE Access</source> <volume>9</volume>, <fpage>125523</fpage>&#x2013;<lpage>125539</lpage>. <pub-id pub-id-type="doi">10.1109/access.2021.3111131</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohamed</surname>
<given-names>E. A.</given-names>
</name>
<name>
<surname>Gaber</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Karam</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Rashed</surname>
<given-names>E. A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>A novel CNN pooling layer for breast cancer segmentation and classification from thermograms</article-title>. <source>PLOS ONE</source> <volume>17</volume> (<issue>10</issue>), <fpage>e0276523</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0276523</pub-id>
<pub-id pub-id-type="pmid">36269756</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mokhtar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Abdel-Galil</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Khoriba</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Brain tumor semantic segmentation using residual U-Net&#x2b;&#x2b; encoder-decoder architecture</article-title>. <source>Int. J. Adv. Comput. Sci. Appl.</source> <volume>14</volume> (<issue>6</issue>). <pub-id pub-id-type="doi">10.14569/ijacsa.2023.01406119</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Punn</surname>
<given-names>N. S.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>RCA-IUnet: a residual cross-spatial attention-guided inception U-Net model for tumor segmentation in breast ultrasound imaging</article-title>. <source>Mach. Vis. Appl.</source> <volume>33</volume> (<issue>2</issue>), <fpage>27</fpage>. <pub-id pub-id-type="doi">10.1007/s00138-022-01280-3</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>On the momentum term in gradient descent learning algorithms</article-title>. <source>Neural Netw.</source> <volume>12</volume> (<issue>1</issue>), <fpage>145</fpage>&#x2013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1016/s0893-6080(98)00116-6</pub-id>
<pub-id pub-id-type="pmid">12662723</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Radhi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Kamil</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>An automatic segmentation of breast ultrasound images using U-Net model</article-title>. <source>Serb. J. Electr. Eng.</source> <volume>20</volume> (<issue>2</issue>), <fpage>191</fpage>&#x2013;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.2298/sjee2302191r</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ronneberger</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Brox</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>U-Net: convolutional networks for biomedical image segmentation</article-title>,&#x201d; in <conf-name>Medical Image Computing and Computer-Assisted Intervention&#x2013;MICCAI 2015: 18th International Conference</conf-name>, <conf-loc>Munich, Germany</conf-loc>, <conf-date>October 5-9, 2015</conf-date> (<publisher-name>Springer</publisher-name>), <fpage>234</fpage>&#x2013;<lpage>241</lpage>.</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>S&#xe1;nchez-Ruiz</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Pineda</surname>
<given-names>I. O.</given-names>
</name>
<name>
<surname>Olvera-L&#xf3;pez</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Automatic segmentation in breast thermographic images based on local pattern variations</article-title>. <source>Res. Comput. Sci.</source> <volume>147</volume> (<issue>11</issue>), <fpage>53</fpage>&#x2013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.13053/rcs-147-11-5</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selvaraju</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Cogswell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vedantam</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Parikh</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Batra</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Grad-CAM: visual explanations from deep networks via gradient-based localization</article-title>. <pub-id pub-id-type="doi">10.48550/arXiv.1610.02391</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Silva</surname>
<given-names>L. F.</given-names>
</name>
<name>
<surname>Saade</surname>
<given-names>D. C. M.</given-names>
</name>
<name>
<surname>Sequeiros</surname>
<given-names>G. O.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Paiva</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Bravo</surname>
<given-names>R. S.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>A new database for breast research with infrared image</article-title>. <source>J. Med. Imaging Health Inf.</source> <volume>4</volume> (<issue>1</issue>), <fpage>92</fpage>&#x2013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1166/jmihi.2014.1226</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Arora</surname>
<given-names>A. S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Automated approaches for ROIs extraction in medical thermography: a review and future directions</article-title>. <source>Multimed. Tools Appl.</source> <volume>79</volume>, <fpage>15273</fpage>&#x2013;<lpage>15296</lpage>. <pub-id pub-id-type="doi">10.1007/s11042-018-7113-z</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soomro</surname>
<given-names>T. A.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Afifi</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Ali</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Soomro</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Image segmentation for MR brain tumor detection using machine learning: a review</article-title>. <source>IEEE Rev. Biomed. Eng.</source> <volume>16</volume>, <fpage>70</fpage>&#x2013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1109/rbme.2022.3185292</pub-id>
<pub-id pub-id-type="pmid">35737636</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sung</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ferlay</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Siegel</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Laversanne</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Soerjomataram</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jemal</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries</article-title>. <source>CA Cancer J. Clin.</source> <volume>71</volume> (<issue>3</issue>), <fpage>209</fpage>&#x2013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.3322/caac.21660</pub-id>
<pub-id pub-id-type="pmid">33538338</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vanhoucke</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ioffe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shlens</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wojna</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Rethinking the inception architecture for computer vision</article-title>. <source>Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.</source> <volume>2016</volume>, <fpage>2818</fpage>&#x2013;<lpage>2826</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.308</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>EfficientNet: rethinking model scaling for convolutional neural networks</article-title>,&#x201d; in <source>International Conference on Machine Learning</source>. <publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>PMLR</publisher-name>, <fpage>6105</fpage>&#x2013;<lpage>6114</lpage>.</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tieleman</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude</article-title>. <source>COURSERA Neural Netw. Mach. Learn.</source> <volume>4</volume> (<issue>2</issue>), <fpage>26</fpage>&#x2013;<lpage>31</lpage>.</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venkatachalam</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Shanmugam</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Heltin Genitha</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Automated breast boundary segmentation to improve the accuracy of identifying abnormalities in breast thermograms</article-title>. <source>IETE J. Res.</source> <volume>70</volume>, <fpage>1462</fpage>&#x2013;<lpage>1471</lpage>. <pub-id pub-id-type="doi">10.1080/03772063.2023.2194277</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yin</surname>
<given-names>X.-X.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>U-Net-based medical image segmentation</article-title>. <source>J. Healthc. Eng.</source> <volume>2022</volume>, <fpage>1</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1155/2022/4189781</pub-id>
<pub-id pub-id-type="pmid">35463660</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeiler</surname>
<given-names>M. D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Adadelta: an adaptive learning rate method</article-title>. <source>arXiv preprint arXiv:1212.5701</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1212.5701</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shuai</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Segmentation of skin lesions image based on U-Net&#x2b;&#x2b;</article-title>. <source>Multimed. Tools Appl.</source> <volume>81</volume> (<issue>6</issue>), <fpage>8691</fpage>&#x2013;<lpage>8717</lpage>. <pub-id pub-id-type="doi">10.1007/s11042-022-12067-z</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Rahman Siddiquee</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Tajbakhsh</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>UNet&#x2b;&#x2b;: a nested U-Net architecture for medical image segmentation</article-title>,&#x201d; in <conf-name>Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018</conf-name>, <conf-loc>Granada, Spain</conf-loc>, <conf-date>September 20, 2018</conf-date> (<publisher-name>Springer</publisher-name>), <fpage>3</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation>
</ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix A</title>
<p>Because our results are reported for each fold of the ten-fold cross-validation (<xref ref-type="table" rid="T4">Tables 4</xref>&#x2013;<xref ref-type="table" rid="T6">6</xref>), we conducted a paired statistical analysis across the folds. We applied the Friedman test to the three models, followed by pairwise Wilcoxon signed-rank tests with Holm correction, and report effect sizes (Cohen&#x2019;s dz). The analysis showed no statistically significant differences between the models in IoU or Dice across the folds (Friedman test for IoU: p &#x2248; 0.90; for Dice: p &#x2248; 0.84; all pairwise comparisons were non-significant after correction). This result is consistent with the small observed differences and with the difficulty of achieving a substantial improvement over the baseline U-Net model. We note that the tests were conducted on cross-validation folds, which are not fully independent, so the analysis should be read conservatively; for this reason, we used non-parametric paired tests.</p>
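<p>The procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to produce the reported values. It assumes exactly three models (so the Friedman chi-square approximation has two degrees of freedom), applies no tie correction, and takes the raw pairwise p-values as given, for example from a Wilcoxon signed-rank routine such as scipy.stats.wilcoxon. All function names and all numbers in the demonstration are hypothetical placeholders.</p>
<code language="python"><![CDATA[
```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_three_models(fold_scores):
    """Friedman statistic Q and chi-square p-value for exactly three models.

    fold_scores holds one (model_a, model_b, model_c) tuple per fold.
    Assumption: no tie correction. With 2 degrees of freedom the chi-square
    survival function is exactly exp(-Q / 2).
    """
    n, k = len(fold_scores), 3
    rank_sums = [0.0] * k
    for row in fold_scores:
        if len(row) != k:
            raise ValueError("each fold must hold exactly three scores")
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, math.exp(-q / 2.0)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def cohens_dz(scores_a, scores_b):
    """Cohen's dz: mean paired difference divided by its sample standard deviation.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    # Placeholder per-fold scores for three hypothetical models (not study data).
    folds = [(0.91, 0.92, 0.90), (0.93, 0.91, 0.92), (0.90, 0.93, 0.91),
             (0.92, 0.90, 0.93), (0.91, 0.93, 0.92)]
    q, p = friedman_three_models(folds)
    print("Friedman Q:", q, "p:", p)
    # Placeholder raw pairwise p-values, as a signed-rank test would supply them.
    print("Holm-adjusted:", holm_adjust([0.62, 0.81, 0.44]))
    print("Cohen's dz (model 1 vs 2):",
          cohens_dz([f[0] for f in folds], [f[1] for f in folds]))
```
]]></code>
<p>With only ten folds, the chi-square approximation for the Friedman statistic is rough, and a statistics package that applies tie corrections or exact distributions may return slightly different p-values than this sketch.</p>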
</app>
</app-group>
</back>
</article>