AUTHOR=Muhammad Dost, Bendechache Malika

TITLE=More than just a heatmap: elevating XAI with rigorous evaluation metrics

JOURNAL=Frontiers in Medical Technology

VOLUME=Volume 7 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/medical-technology/articles/10.3389/fmedt.2025.1674343

DOI=10.3389/fmedt.2025.1674343

ISSN=2673-3129

ABSTRACT=Background: Magnetic Resonance Imaging (MRI) and ultrasound are central to tumour diagnosis and treatment planning. Although deep learning (DL) models achieve strong prediction performance, high computational demand and limited explainability can hinder clinical adoption. Common post hoc Explainable Artificial Intelligence (XAI) methods, namely Grad-CAM, LIME, and SHAP, often yield fragmented or anatomically misaligned saliency maps. Methods: We propose SpikeNet, a hybrid framework that combines Convolutional Neural Networks (CNNs) for spatial feature encoding with Spiking Neural Networks (SNNs) for efficient, event-driven processing. SpikeNet includes a native saliency module that produces explanations during inference. We also introduce XAlign, a metric that quantifies alignment between explanations and expert tumour annotations by integrating regional concentration, boundary adherence, and dispersion penalties. Evaluation follows patient-level cross-validation on TCGA–LGG (MRI, 22 folds) and BUSI (ultrasound, 5 folds), with slice-level predictions aggregated to patient-level decisions and BUSI treated as a three-class task. We report per-image latency and throughput alongside accuracy, precision, recall, F1, AUROC, and AUPRC. Results: SpikeNet achieved high prediction performance with tight variability across folds. On TCGA–LGG it reached 97.12±0.63% accuracy and 97.43±0.60% F1; on BUSI it reached 98.23±0.58% accuracy and 98.32±0.50% F1. Patient-level AUROC and AUPRC with 95% confidence intervals further support these findings. On a single NVIDIA RTX 3090 with batch size 16 and FP32 precision, per-image latency was about 31 ms and throughput about 32 images per second, with the same settings applied to all baselines. Using XAlign, SpikeNet produced explanations with higher alignment than Grad-CAM, LIME, and SHAP on both datasets. Dataset-level statistics, paired tests, and sensitivity analyses over XAlign weights and explanation parameters confirmed robustness. Conclusion: SpikeNet delivers accurate, low-latency, and explainable analysis for MRI and ultrasound by unifying CNN-based spatial encoding, sparse spiking computation, and native explanations. The XAlign metric provides a clinically oriented assessment of explanation fidelity and supports consistent comparison across methods. These results indicate the potential of SpikeNet and XAlign for trustworthy and efficient clinical decision support.