<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2025.1662984</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Search-optimized quantization in biomedical ontology alignment</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bouaggad</surname> <given-names>Oussama</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/3125418/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Grabar</surname> <given-names>Natalia</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2289904/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/resources/"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/project-administration/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>CNRS, Univ. Lille, UMR 8163 - STL - Savoirs Textes Langage</institution>, <addr-line>Lille</addr-line>, <country>France</country></aff>
<aff id="aff2"><sup>2</sup><institution>Univ. Lille, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille</institution>, <addr-line>Lille</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Mini Han Wang, Zhuhai People&#x00027;s Hospital, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Harishchander Anandaram, Amrita Vishwa Vidyapeetham University, India</p>
<p>Kai Xiao, South China University of Technology, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Oussama Bouaggad <email>oussama.bouaggad&#x00040;univ-lille.fr</email></corresp>
<fn fn-type="other" id="fn001"><p>&#x02020;ORCID: Oussama Bouaggad <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0009-0008-5792-6847">orcid.org/0009-0008-5792-6847</ext-link></p></fn>
<fn fn-type="other" id="fn002"><p>Natalia Grabar <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5120-0382">orcid.org/0000-0002-5120-0382</ext-link></p></fn></author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>10</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>8</volume>
<elocation-id>1662984</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>07</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>18</day>
<month>08</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2025 Bouaggad and Grabar.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Bouaggad and Grabar</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>As organizations and researchers develop increasingly advanced AI models, they face challenges stemming from the sheer size and computational demands of these models. Deploying them on edge devices or in resource-constrained environments adds further constraints on energy consumption, memory usage, and latency, which has motivated a growing body of work on efficient model optimization. Building on this premise, and employing supervised state-of-the-art transformer-based models, this research introduces a systematic method for ontology alignment grounded in cosine-based semantic similarity between a biomedical layman vocabulary and the Unified Medical Language System (UMLS) Metathesaurus. It leverages Microsoft Olive to search for target optimizations among different Execution Providers (EPs) using the ONNX Runtime backend, followed by an assembled process of dynamic quantization employing Intel Neural Compressor and IPEX (Intel Extension for PyTorch). We evaluate the optimized models extensively on the two tasks from the DEFT 2020 Evaluation Campaign, achieving a new state of the art in both. Performance metrics are preserved, while inference is on average 20x faster and memory usage is reduced by 70%.</p></abstract>
<kwd-group>
<kwd>UMLS Metathesaurus</kwd>
<kwd>ontology alignment</kwd>
<kwd>semantic similarity</kwd>
<kwd>transformer models</kwd>
<kwd>model optimization</kwd>
<kwd>model quantization</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="5"/>
<equation-count count="12"/>
<ref-count count="59"/>
<page-count count="13"/>
<word-count count="9148"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Medicine and Public Health</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>Biomedical ontology alignment refers to the process of matching semantically related entities across diverse knowledge sources (databases) to facilitate the integration of heterogeneous data. The historical impetus for biomedical ontology alignment arose from the need to consolidate independently developed knowledge sources, each characterized by distinct data vocabularies. In this domain, the Unified Medical Language System (UMLS) Metathesaurus (<xref ref-type="bibr" rid="B2">Bodenreider, 2004</xref>), developed under the auspices of the U.S. National Library of Medicine (NLM), serves as a cornerstone.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> The UMLS Metathesaurus, which comprises the most extensive collection of biomedical ontologies, including terminologies, controlled vocabularies, thesauri, and classifications, provides an essential framework for unifying standardized knowledge sources. With the ongoing evolution of this project, its size has reached over 10 million atoms, derived from more than 200 controlled vocabularies grouped into approximately 4 million concepts. Its maintenance process is costly, time-consuming, and places significant demands on expert editors. However, decades of meticulous manual curation provide ample material for modern supervised learning applications, establishing UMLS as a foundational resource for ontology alignment. Conversely, the biomedical layman vocabulary (<xref ref-type="bibr" rid="B26">Koptient and Grabar, 2020</xref>) is designed to support the adaptation and simplification of medical texts. Its purpose is to enhance understanding of health-related documents for non-expert audiences, such as patients. Its size is steadily increasing, although it remains significantly smaller than that of large-scale terminologies. 
The alignment of the layman vocabulary with UMLS is important for ensuring that structured medical knowledge is accessible and useful to non-experts, thereby improving the effectiveness of healthcare communication. This helps bridge the language gap between clinicians and patients, allowing for dynamic adjustment of linguistic complexity. Nevertheless, achieving accurate alignment between layman and expert terms presents significant challenges. These include lexical variation, contextual ambiguity, and the frequent absence of direct one-to-one concept mappings. Furthermore, layman expressions often lack the ontological grounding and semantic precision of formal vocabularies, making purely symbolic or rule-based methods inadequate.</p>
<p>In parallel, Natural Language Processing (NLP) techniques such as entity linking and semantic similarity continue to advance through state-of-the-art transformer-based supervised deep learning models that incorporate feature engineering with specialized domain knowledge. In this context, we propose using two models, the <sc>KRISSBERT</sc> (Knowledge-RIch Self-Supervision) model developed by Microsoft Research (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>) and the large variant of the S<sc>ap</sc>BERT model from Cambridge LTL (<xref ref-type="bibr" rid="B30">Liu et al., 2021</xref>), to align the layman vocabulary with UMLS via cosine-based semantic similarity.</p>
<p>Upon generating the vocabulary, the biomedical alignments are manually verified by expert human annotators using a six-point rating scale, ranging from 0 to 5, to assess degrees of similarity (<xref ref-type="bibr" rid="B8">Dagan et al., 2009</xref>). Additional semantic information is included by incorporating all Metathesaurus data file domains and their respective hierarchical structures. These are systematically aligned by means of a left join propagation based on the common <italic>CUI (Concept Unique Identifier)</italic> field.</p>
<p>In conjunction with this, model selection is based on the distinct characteristics of each model, as no single transformer is expected to consistently handle all nuanced details and noise in alignments. Hence, a dual-model approach is used, ensuring that inaccuracies from one model are mitigated by the other. To operationalize this complementarity, alignments are merged iteratively in descending order of rating: starting with all alignments rated 5 by one model, followed by those rated 5 by the other model that are not already included, and proceeding through lower-rated alignments until a comprehensive, high-confidence set is constructed. This dualism leverages the complementary strengths of <sc>KRISSBERT</sc> and S<sc>ap</sc>BERT, ensuring robust performance across diverse biomedical vocabulary contexts. The <sc>KRISSBERT</sc> model addresses ambiguity and context-ignorance, particularly where entities share similar surface forms, by harnessing contextual information to improve identification accuracy. This is achieved by training a contextual mention encoder using contrastive learning with a transformer-based encoder (<xref ref-type="bibr" rid="B47">Vaswani et al., 2017</xref>) and improving linking accuracy by re-ranking the top <italic>K</italic> candidates with a cross-attention encoder (<xref ref-type="bibr" rid="B31">Logeswaran et al., 2019</xref>; <xref ref-type="bibr" rid="B52">Wu L. et al., 2020</xref>). On the other hand, the large version of S<sc>ap</sc>BERT introduces a pretraining metric learning framework grounded in self-supervised masked language modeling. It learns to self-align synonymous biomedical entities, accurately capturing fine-grained semantic relationships by clustering synonyms under the same concept. 
It distinguishes itself from existing systems through a streamlined design that eliminates complex hybrid tuning components, directly encoding and aligning medical entities from raw text (<xref ref-type="bibr" rid="B54">Xu et al., 2020</xref>; <xref ref-type="bibr" rid="B22">Ji et al., 2020</xref>; <xref ref-type="bibr" rid="B45">Sung et al., 2020</xref>).</p>
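<p>The rating-ordered merge of the two models' alignments described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (layman term, concept identifier, rating), the function name <monospace>merge_by_rating</monospace>, the <monospace>min_rating</monospace> cut-off, and the choice to treat an alignment as "already included" when its layman term is already covered are all hypothetical, and the identifiers in the toy data are placeholders rather than real CUIs.</p>
<preformat>
```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (layman_term, concept_id, rating), rating in 0..5.
Alignment = Tuple[str, str, int]


def merge_by_rating(first: Iterable[Alignment],
                    second: Iterable[Alignment],
                    min_rating: int = 0) -> List[Alignment]:
    """Merge two alignment lists in descending order of rating.

    For each rating from 5 down to min_rating, alignments from the first
    model are added, then alignments from the second model whose layman
    term is not covered yet (assumed meaning of "not already included").
    """
    first_list, second_list = list(first), list(second)
    merged: Dict[str, Alignment] = {}
    for rating in range(5, min_rating - 1, -1):
        for source in (first_list, second_list):
            for term, concept, score in source:
                if score == rating and term not in merged:
                    merged[term] = (term, concept, score)
    return list(merged.values())


# Placeholder identifiers, not real UMLS CUIs.
model_a = [("heart attack", "CUI-1", 5), ("sugar disease", "CUI-2", 3)]
model_b = [("heart attack", "CUI-1", 4), ("sugar disease", "CUI-3", 5),
           ("nosebleed", "CUI-4", 5)]

merged = merge_by_rating(model_a, model_b)
for row in merged:
    print(row)
```
</preformat>
<p>In this toy run, the second model's alignment rated 5 for "sugar disease" takes precedence over the first model's alignment rated 3, which is the kind of mutual correction the dual-model approach relies on.</p>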
<p>The large scale of the alignment task imposes a significant computational cost, which creates a bottleneck. For this reason, we propose an interoperable optimization process focused on quantization. The practical performance of the alignment techniques is tied to two major factors: time requirements and computational resource limitations. Accordingly, M<sc>icrosoft</sc> O<sc>live</sc> is leveraged to search for optimizations among different Execution Providers (EPs) using the ONNX R<sc>untime</sc> backend. Subsequently, an accuracy-preserving quantization is applied using I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc> and IPEX, with S<sc>mooth</sc>Q<sc>uant</sc> (<xref ref-type="bibr" rid="B53">Xiao et al., 2024</xref>). This approach shifts quantization complexity from activations to weights. It parameterizes this process through the scaling factor matrix <italic>S</italic> and the smoothing factor &#x003B1;, in order to resolve both the dequantization complexity and the inherent incompatibility with modern hardware-accelerated computation kernels. These kernels require high efficiency and cannot tolerate lower-throughput operations. To further assess the optimization impact, calibration procedures are systematically conducted using diverse biomedical datasets, specifically aimed at evaluating model performance in aligning terminology across heterogeneous sources.</p>
<p>To rigorously quantify the robustness of our optimization strategies through the trade-off between performance, latency, and resource consumption, we conduct comprehensive evaluations using the <monospace>huggingface_metrics</monospace> backend. These are carried out on the two established benchmark tasks from the DEFT 2020 Evaluation Campaign (<xref ref-type="bibr" rid="B4">Cardon et al., 2020</xref>), as they closely align with our core research objectives. Our work democratizes the use of deep learning applications by offering a scalable, turnkey solution that significantly reduces serving costs without compromising model accuracy.</p>
</sec>
<sec id="s2">
<title>2 Related work</title>
<sec>
<title>2.1 Biomedical ontology alignment</title>
<p>Since knowledge source builders concerned with developing health systems for various model organisms joined to create the Gene Ontology Consortium in 1998, the need for biomedical ontology alignment applications (<xref ref-type="bibr" rid="B27">Lambrix, 2004</xref>) has grown significantly, aiming to determine correspondences between concepts across different ontologies (<xref ref-type="bibr" rid="B10">Euzenat and Shvaiko, 2007</xref>). Scalable logic-based ontology matching systems, including L<sc>og</sc>M<sc>ap</sc> (<xref ref-type="bibr" rid="B23">Jim&#x000E9;nez-Ruiz and Cuenca Grau, 2011</xref>) and A<sc>greement</sc>M<sc>aker</sc>L<sc>ight</sc> (AML) (<xref ref-type="bibr" rid="B12">Faria et al., 2013</xref>), treat alignment as a sequential process, starting with lexical matching, followed by mapping extension and correction. However, these systems primarily consider surface-level text forms, neglecting word semantics.</p>
<p>Recent machine learning approaches, such as D<sc>eep</sc>A<sc>lignment</sc> (<xref ref-type="bibr" rid="B25">Kolyvakis et al., 2018</xref>) and O<sc>nto</sc>E<sc>mma</sc> (<xref ref-type="bibr" rid="B48">Wang L. L. et al., 2018</xref>), map words into vector spaces using embeddings, where semantically closer words have smaller similarity distances. Yet, non-contextual embeddings limit their ability to disambiguate meaning. Fine-tuned BERT models (<xref ref-type="bibr" rid="B18">He et al., 2021</xref>) and Siamese Neural Networks (<sc>SiamNN</sc>) (<xref ref-type="bibr" rid="B6">Chen et al., 2021</xref>) demonstrate improved performance, but challenges remain due to limited annotated data and the large entity space.</p>
<p>To address these challenges, we adopt ontology alignment systems based on state-of-the-art supervised learning schemes, utilizing domain-specific knowledge from UMLS. Our approach combines <sc>KRISSBERT</sc> (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>), which effectively resolves variations and ambiguities among millions of entities through self-supervision, and the large S<sc>ap</sc>BERT variant (<xref ref-type="bibr" rid="B30">Liu et al., 2021</xref>), which employs an extensive metric learning framework to self-align synonymous biomedical entities, linking synonyms into a unified semantic notion. Unlike pragmatic pretrained models, notably <sc>Biobert</sc> (<xref ref-type="bibr" rid="B29">Lee et al., 2020</xref>), P<sc>ubMedbert</sc> (<xref ref-type="bibr" rid="B14">Gu et al., 2021</xref>), and <sc>Bioformer</sc> (<xref ref-type="bibr" rid="B11">Fang et al., 2023</xref>), which still require labeled data such as gold mention occurrences, are constrained by annotation scarcity across expansive biomedical domains, and struggle to produce well-differentiated embedding spaces, our approach captures contextual meaning more efficiently. It coherently retrieves all UMLS entities sharing surface forms and supports the generation of distinct representations for semantically different biomedical concepts.</p>
</sec>
<sec>
<title>2.2 Model optimizations</title>
<p>Techniques for accelerating and compressing deep learning models have garnered significant attention due to their ability to reduce parameters, computations, and energy-intensive memory access. Optimization methods in neural networks date back to the late 1980s (<xref ref-type="bibr" rid="B28">LeCun et al., 1989</xref>; <xref ref-type="bibr" rid="B36">Nowlan and Hinton, 1992</xref>), with quantization (approximating numerical components with low bit-width precision) (<xref ref-type="bibr" rid="B21">Jacob et al., 2018</xref>; <xref ref-type="bibr" rid="B51">Wu H. et al., 2020</xref>; <xref ref-type="bibr" rid="B40">Rokh et al., 2023</xref>), pruning (removing less important connections to create sparse networks) (<xref ref-type="bibr" rid="B17">Hassibi and Stork, 1992</xref>; <xref ref-type="bibr" rid="B13">Frankle and Carbin, 2019</xref>), and knowledge distillation (teacher-student neural model paradigm) (<xref ref-type="bibr" rid="B19">Hinton et al., 2015</xref>; <xref ref-type="bibr" rid="B55">Xu et al., 2017</xref>) becoming widely adopted. These techniques allow smaller models to operate efficiently within energy-saving on-chip memory, reducing reliance on high-latency off-chip DRAM. Recent advances highlight the importance of combining optimization strategies for greater efficiency (<xref ref-type="bibr" rid="B50">Wang et al., 2020</xref>; <xref ref-type="bibr" rid="B37">Park et al., 2022</xref>). Quantization, achieving significant compression with minimal accuracy loss (<xref ref-type="bibr" rid="B5">Carreira-Perpi&#x000F1;&#x000E1;n, 2017</xref>), is often paired with pruning (<xref ref-type="bibr" rid="B57">Yu et al., 2020</xref>; <xref ref-type="bibr" rid="B38">Qu et al., 2020</xref>), automatic mixed precision (<xref ref-type="bibr" rid="B32">Micikevicius et al., 2018</xref>; <xref ref-type="bibr" rid="B39">Rakka et al., 2022</xref>), and performance tuning (<xref ref-type="bibr" rid="B41">Roy et al., 2023</xref>) in sequential pipelines. 
Extensively applied in transformers (<xref ref-type="bibr" rid="B43">Shen et al., 2020</xref>; <xref ref-type="bibr" rid="B24">Kim et al., 2021</xref>; <xref ref-type="bibr" rid="B42">Schaefer et al., 2023</xref>), quantization benefits from techniques such as weight equalization (<xref ref-type="bibr" rid="B34">Nagel et al., 2019</xref>) and channel splitting (<xref ref-type="bibr" rid="B59">Zhao et al., 2019</xref>), which address weight outliers but fall short in handling activation outliers, a persistent bottleneck. To address this limitation, our proposed quantization approach mitigates activation outliers by shifting the complexity to weight quantization (<xref ref-type="bibr" rid="B53">Xiao et al., 2024</xref>), which streamlines computational operations.</p>
</sec>
<sec>
<title>2.3 End-to-end hardware-aware optimizations</title>
<p>Initially, researchers focused on software-level optimizations before addressing hardware efficiency (<xref ref-type="bibr" rid="B16">Han et al., 2015</xref>; <xref ref-type="bibr" rid="B7">Courbariaux et al., 2015</xref>). However, such a static approach fails to exploit the full potential of combining diverse compression techniques to improve performance (<xref ref-type="bibr" rid="B15">Guo et al., 2016</xref>; <xref ref-type="bibr" rid="B56">Yang et al., 2020</xref>). By optimizing memory access patterns and leveraging parallelism, compressed models significantly reduce both hardware costs and computational resource demands (<xref ref-type="bibr" rid="B44">Shivapakash et al., 2020</xref>; <xref ref-type="bibr" rid="B20">Huai et al., 2023</xref>; <xref ref-type="bibr" rid="B1">Balaskas et al., 2024</xref>). To this end, we leverage M<sc>icrosoft</sc> O<sc>live</sc>, with its dedicated hardware-aware ecosystem, to systematically engineer and automate the optimization process.</p>
</sec>
</sec>
<sec sec-type="methods" id="s3">
<title>3 Methodology</title>
<p>In line with our study objective, which focuses on aligning biomedical ontologies using cosine similarity measures, we align the concatenation of two fields, <italic>Biomedical Term</italic> and <italic>Public Explanation</italic>, from the layman biomedical vocabulary with all the French entries in the <italic>String (STR)</italic> field of the <monospace>MRCONSO.RRF</monospace> raw file from the 2024AB UMLS Metathesaurus release. To accomplish this, we devised a sequential algorithmic search process designed to optimize model performance across multiple EPs. It integrates network compression, parallel processing, and memory transfer optimization through M<sc>icrosoft</sc> O<sc>live</sc>, in conjunction with the ONNX R<sc>untime</sc> backend, thus enabling efficient and scalable execution. Furthermore, within this framework, we employ I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc> and IPEX, incorporating the logic of S<sc>mooth</sc>Q<sc>uant</sc>, to design a search-optimized, on-the-fly quantization strategy (W8A8). This approach uniformly shifts the burden from activation outliers to weights, thereby enhancing compatibility with specific hardware-accelerated kernels.</p>
<p>By adopting this strategy, memory usage is significantly reduced and inference speed improved, both critical factors for effective alignment. This synergy, essential to the performance of biomedical ontology systems, depends on these optimizations to ensure dynamic scalability.</p>
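<p>The idea of shifting the quantization burden from activation outliers to weights can be illustrated with a small, dependency-free sketch. It assumes the per-channel smoothing rule from the cited S<sc>mooth</sc>Q<sc>uant</sc> work, in which each input channel <italic>j</italic> receives a scale equal to the maximum absolute activation raised to &#x003B1;, divided by the maximum absolute weight raised to 1 &#x02212; &#x003B1;; the default &#x003B1; = 0.5, the <monospace>eps</monospace> floor, the helper names, and the toy matrices are illustrative assumptions and do not reproduce the Intel Neural Compressor or IPEX APIs.</p>
<preformat>
```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, kept dependency-free for illustration."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def smoothing_scales(x: Matrix, w: Matrix, alpha: float = 0.5) -> List[float]:
    """One scale per input channel j: max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    eps = 1e-5  # illustrative floor that avoids division by zero
    act_max = [max(max(abs(v) for v in col), eps) for col in zip(*x)]
    w_max = [max(max(abs(v) for v in row), eps) for row in w]
    return [a ** alpha / b ** (1.0 - alpha) for a, b in zip(act_max, w_max)]


def smooth(x: Matrix, w: Matrix, alpha: float = 0.5) -> Tuple[Matrix, Matrix]:
    """Return (X diag(s)^-1, diag(s) W); their product equals X W."""
    s = smoothing_scales(x, w, alpha)
    x_s = [[v / s_j for v, s_j in zip(row, s)] for row in x]
    w_s = [[v * s_j for v in row] for row, s_j in zip(w, s)]
    return x_s, w_s


# Toy activations (2 tokens x 2 channels); the second channel carries outliers.
X = [[1.0, 16.0], [-2.0, -8.0]]
W = [[2.0, -1.0], [0.5, 0.25]]

X_s, W_s = smooth(X, W)
Y, Y_s = matmul(X, W), matmul(X_s, W_s)
same = all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9)
           for r1, r2 in zip(Y, Y_s) for p, q in zip(r1, r2))
peak_before = max(abs(v) for row in X for v in row)
peak_after = max(abs(v) for row in X_s for v in row)
print(same, peak_before, round(peak_after, 3))
```
</preformat>
<p>The layer output is unchanged up to floating-point error, while the largest activation magnitude shrinks and the corresponding weight row grows, so both tensors become easier to represent on an 8-bit grid.</p>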
<sec>
<title>3.1 Formal definition</title>
<p>An ontology is typically defined as an explicit specification of a conceptualization. It often uses representational vocabularies to describe a domain of interest, with the main components being entities<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> and axioms. Ontology alignment involves matching entities across ontologies through equivalence, subsumption, or other relatedness relationships. The current study focuses on equivalence alignment between classes.<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref></p>
<p>The ontology alignment system inputs a pair of ontologies, <italic>O</italic> and <italic>O</italic>&#x02032;, with class sets <italic>C</italic> and <italic>C</italic>&#x02032;. It generates, using cosine similarity, a set of scored mappings in the form (<italic>c</italic> &#x02208; <italic>C, c</italic>&#x02032; &#x02208; <italic>C</italic>&#x02032;, <italic>P</italic>(<italic>c</italic> &#x02261; <italic>c</italic>&#x02032;)), where <italic>P</italic>(<italic>c</italic> &#x02261; <italic>c</italic>&#x02032;) &#x02208; [0, 1] is the probability score (<italic>mapping value</italic>) of equivalence between <italic>c</italic> and <italic>c</italic>&#x02032;. Final mappings are selected based on the highest scores, leveraging supervised SOTA learning schemes with feature engineering. When one model produces more accurate alignments, these are used to correct those of the other, with manual verification by human annotators to improve reliability.</p>
<p>In the present architecture, the input sequence includes a special token <monospace>[CLS]</monospace>, the tokens of two sentences <italic>A</italic> and <italic>B</italic>, and the special token <monospace>[SEP]</monospace> separating them. Each token embedding encodes its content, position, and sentence information. In <inline-formula><mml:math id="M1"><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:math></inline-formula> successive layers of the architecture, the multi-head self-attention block computes contextualized representations for each token. The output of layer <italic>l</italic> is the embedding sequence derived from the input, as defined in <xref ref-type="disp-formula" rid="E1">Equation 1</xref>:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M2"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mrow><mml:mi>C</mml:mi><mml:mi>L</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mrow><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mn>1</mml:mn><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x02032;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mi>&#x0211D;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x000D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <bold>x</bold> is the input sequence, <inline-formula><mml:math id="M3"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>v</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M4"><mml:msubsup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>v</mml:mi></mml:mstyle><mml:mi>j</mml:mi><mml:mrow><mml:mo>&#x02032;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> are <italic>d</italic>-dimensional vectors of the corresponding tokens. The final layer (<inline-formula><mml:math id="M5"><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:math></inline-formula>) outputs the resulting token embeddings. Unlike non-contextual embeddings such as Word2Vec (<xref ref-type="bibr" rid="B33">Mikolov et al., 2013</xref>), which assign one embedding per token, this configuration distinguishes occurrences of the same token in different contexts. This is critical in expanding biomedical domains where traditional embeddings are biased toward frequent meanings in training corpora. For instance, the acronym &#x0201C;MS&#x0201D; can refer to <italic>Multiple Sclerosis</italic>, a chronic neurological disease affecting the central nervous system, or to <italic>Mass Spectrometry</italic>, an analytical technique used to measure ion mass-to-charge ratios in chemical and biological samples.</p>
<p>Concordantly, given input ontologies <italic>O</italic> and <italic>O</italic>&#x02032; with class sets <italic>C</italic> and <italic>C</italic>&#x02032;, a naive algorithm computes alignments by looking up <inline-formula><mml:math id="M6"><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo class="qopname">arg</mml:mo><mml:msub><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>&#x02261;</mml:mo><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> for each <italic>c</italic> &#x02208; <italic>C</italic>, leading to <italic>O</italic>(<italic>n</italic><sup>2</sup>) time complexity. This is parametrically enhanced via M<sc>icrosoft</sc> O<sc>live</sc>, which employs an algorithmic search approach that calibrates a <monospace>joint</monospace><xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> execution order, backed by the TPE (Tree-structured Parzen Estimator) algorithm.</p>
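<p>The naive lookup described above can be sketched as follows: every source class is compared with every target class, which is where the quadratic cost comes from. This is a minimal sketch that assumes class embeddings have already been computed; the function names and the three-dimensional toy vectors are hypothetical, and the raw cosine value is used directly as the mapping score.</p>
<preformat>
```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def naive_align(source: Dict[str, Vector],
                target: Dict[str, Vector]) -> List[Tuple[str, str, float]]:
    """For each source class c, pick the target class c' with the highest score.

    Every (c, c') pair is scored, so the cost grows with |C| * |C'|.
    """
    mappings = []
    for c, u in source.items():
        best = max(target, key=lambda c_prime: cosine(u, target[c_prime]))
        mappings.append((c, best, cosine(u, target[best])))
    return mappings


# Hypothetical toy embeddings standing in for encoder outputs.
layman = {"heart attack": [0.9, 0.1, 0.0]}
umls = {"myocardial infarction": [1.0, 0.0, 0.0],
        "epistaxis": [0.0, 1.0, 0.0]}

mappings = naive_align(layman, umls)
print(mappings)
```
</preformat>
<p>At the scale of the Metathesaurus, this exhaustive comparison is what makes faster inference and lower memory use matter for alignment.</p>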
<p>Our search-optimized quantization pipeline (W8A8) further improves efficiency by shifting quantization difficulty from activations to weights, which enables integration with hardware-accelerated compute units and resolves<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> the dequantization issues illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption><p>Progression of quantization techniques applied to a generic neural network model. It begins with a linear forward pass using a 1 &#x000D7; 2 input <italic>x</italic> and a 2 &#x000D7; 2 weight matrix <italic>W</italic>, which produces the outputs <italic>y</italic><sub>1</sub> and <italic>y</italic><sub>2</sub> in a straightforward floating-point manner. In the middle section, per-tensor quantization is performed on activation outputs, and per-channel quantization on weights. The quantized outputs &#x00177;<sub>1</sub> and &#x00177;<sub>2</sub> can be dequantized to their original floating-point values <italic>y</italic><sub><italic>fp</italic>1</sub> and <italic>y</italic><sub><italic>fp</italic>2</sub> using the channel-specific scales 1.0/(<italic>s</italic><sub>1</sub><italic>s</italic><sub><italic>x</italic></sub>) and 1.0/(<italic>s</italic><sub>2</sub><italic>s</italic><sub><italic>x</italic></sub>), respectively. Finally, both weights and activations undergo per-channel quantization. This additional layer of complexity hinders accurate dequantization of &#x00177;<sub>1</sub> and &#x00177;<sub>2</sub> back to their original floating-point results, as the activation quantization depends on the specific channel.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-08-1662984-g0001.tif">
<alt-text>Diagram illustrating quantization in a feed-forward neural network with input, hidden, and output layers. The process starts with a linear forward pass using a 1 &#x000D7; 2 input x and a 2 &#x000D7; 2 weight matrix W, producing floating-point outputs y1 and y2. Next, per-tensor quantization is applied to activations and per-channel quantization to weights; dequantized values yfp1 and yfp2 are obtained using scales 1/(s1sx) and 1/(s2sx). Finally, both weights and activations are quantized per channel, making dequantization inconsistent since activation scaling depends on the channel.</alt-text>
</graphic>
</fig>
<p>This failure arises from the mathematical incompatibility<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> between the quantization scales applied to the different channels: once the activation scale varies by channel, it can no longer be factored out of the accumulated output, which prevents the straightforward dequantization that is possible in the earlier stage, where activations are quantized per tensor and weights per channel.</p>
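<p>The two cases of <xref ref-type="fig" rid="F1">Figure 1</xref> can be checked numerically. The following is a minimal sketch, not the pipeline's implementation: the input values, the symmetric scale choice 127/max|&#x000B7;|, and all variable names are illustrative assumptions, and clipping is omitted. Case A recovers the floating-point outputs by dividing by <italic>s</italic><sub><italic>j</italic></sub><italic>s</italic><sub><italic>x</italic></sub>; in case B no single divisor does.</p>
<preformat>
```python
def quantize(values, scale):
    """Scale and round to the nearest integer (clipping omitted for brevity)."""
    return [round(v * scale) for v in values]

# Illustrative values (assumptions, not taken from the paper).
x = [0.5, -1.25]                 # 1 x 2 input
W = [[0.2, -0.4],
     [1.0, 0.8]]                 # 2 x 2 weights, W[i][j]: input i to output j
n_in, n_out = 2, 2

# Floating-point reference: y_j = sum_i x_i * W_ij
y = [sum(x[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]

# Case A: one activation scale s_x (per tensor), one weight scale per output channel j.
s_x = 127 / max(abs(v) for v in x)
s_w = [127 / max(abs(W[i][j]) for i in range(n_in)) for j in range(n_out)]
x_q = quantize(x, s_x)
W_q = [[round(W[i][j] * s_w[j]) for j in range(n_out)] for i in range(n_in)]
y_q = [sum(x_q[i] * W_q[i][j] for i in range(n_in)) for j in range(n_out)]
# s_x and s_w[j] are constant inside each sum, so they factor out.
y_fp = [y_q[j] / (s_w[j] * s_x) for j in range(n_out)]

# Case B: one activation scale per input channel i.
s_xi = [127 / abs(v) for v in x]
x_q_b = [round(x[i] * s_xi[i]) for i in range(n_in)]
y_q_b = [sum(x_q_b[i] * W_q[i][j] for i in range(n_in)) for j in range(n_out)]
# Each term of the sum carries a different s_xi[i]; try every single divisor.
y_fp_b = [[y_q_b[j] / (s_w[j] * s) for j in range(n_out)] for s in s_xi]

print("reference:", y)
print("case A   :", y_fp)
print("case B   :", y_fp_b)
```
</preformat>
<p>In case A the only error left is rounding error, whereas in case B every candidate divisor yields outputs far from the reference, because the integer accumulator mixes terms with different activation scales.</p>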
</sec>
<sec>
<title>3.2 Mathematical model</title>
<p>Following optimization, the dynamically quantized model, along with the tokenizer <inline-formula><mml:math id="M7"><mml:mrow><mml:mi mathvariant="script">T</mml:mi></mml:mrow><mml:mo>:</mml:mo><mml:mrow><mml:mi mathvariant="script">D</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, is loaded, where <italic>B</italic> is the batch size, <italic>L</italic> is the sequence length, and <italic>D</italic> is the embedding dimension. The domain <inline-formula><mml:math id="M8"><mml:mrow><mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:math></inline-formula> denotes the set of raw text inputs.</p>
<p>Next, a batch-encoding function is introduced for the text lists of interest. It initializes the data structures that collect the batch embeddings and temporarily stores intermediate results for the subsequent alignment step. Processing the texts in batches bounds peak memory usage and improves overall throughput.</p>
<p>The set of texts <bold>T</bold> &#x0003D; {<italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, &#x02026;, <italic>T</italic><sub><italic>N</italic></sub>}, with <italic>N</italic> &#x0003D; |<bold>T</bold>|, is divided into batches of size <italic>B</italic> &#x0003D; 10, denoted as <bold>B</bold><sub><italic>k</italic></sub> for <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>K</italic>, where <inline-formula><mml:math id="M9"><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x02308;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x02309;</mml:mo></mml:mrow></mml:math></inline-formula>, as formulated in <xref ref-type="disp-formula" rid="E2">Equation 2</xref>:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle mathvariant="bold"><mml:mtext>T</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x022C3;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>B</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Each batch <bold>B</bold><sub><italic>k</italic></sub> is defined as in <xref ref-type="disp-formula" rid="E3">Equation 3</xref>:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>B</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
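<p>The partition in <xref ref-type="disp-formula" rid="E2">Equations 2</xref> and <xref ref-type="disp-formula" rid="E3">3</xref> can be sketched in a few lines. The function name <monospace>make_batches</monospace> and the placeholder texts are hypothetical; only the batch size <italic>B</italic> &#x0003D; 10 comes from the text.</p>
<preformat>
```python
import math

def make_batches(texts, batch_size=10):
    """Split texts into consecutive batches B_1..B_K of at most batch_size items."""
    return [texts[start:start + batch_size]
            for start in range(0, len(texts), batch_size)]

# Placeholder inputs T_1..T_23 (illustrative only).
texts = [f"T{i}" for i in range(1, 24)]
batches = make_batches(texts)

print(len(batches), math.ceil(len(texts) / 10))   # both equal K
print(batches[-1])                                # last batch holds the remainder
```
</preformat>
<p>With <italic>N</italic> &#x0003D; 23, the sketch gives <italic>K</italic> &#x0003D; 3 batches, the last of which holds the remaining three texts, matching the min(<italic>kB</italic>, <italic>N</italic>) upper index in <xref ref-type="disp-formula" rid="E3">Equation 3</xref>.</p>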
<p>Accordingly, the tokenizer <inline-formula><mml:math id="M12"><mml:mrow><mml:mi mathvariant="script">T</mml:mi></mml:mrow></mml:math></inline-formula> maps the textual input in each batch <bold>B</bold><sub><italic>k</italic></sub> to its numerical tensor representation <bold>X</bold><sub><italic>k</italic></sub>, as established in <xref ref-type="disp-formula" rid="E4">Equation 4</xref>:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="script">T</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>B</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the tokenized data <inline-formula><mml:math id="M14"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> represents each batch. Padding and truncation ensure uniform sequence lengths, with <italic>L</italic> &#x0003D; 512 set via the <monospace>max_length</monospace> parameter. The resulting outputs are converted into PyTorch tensors, giving a consistent format across batches, and are then cast to NumPy arrays for use in the ONNX-based pipeline.</p>
<p>ONNX R<sc>untime</sc> is then activated by initiating a session that processes the dynamically quantized model <inline-formula><mml:math id="M15"><mml:mrow><mml:mi mathvariant="script">M</mml:mi></mml:mrow><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, producing the embeddings <bold>H</bold><sub><italic>k</italic></sub>, given by <xref ref-type="disp-formula" rid="E5">Equation 5</xref>:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>H</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="script">M</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>H</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, with <inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> denoting the hidden-state vector corresponding to the <italic>j</italic>-th token of the <italic>i</italic>-th input in batch <italic>k</italic>, and <italic>H</italic> denoting the model&#x00027;s output hidden dimension.</p>
<p>Embeddings are subsequently converted into PyTorch tensors and averaged across the sequence length to produce fixed-size, batch-level representations, in accordance with <xref ref-type="disp-formula" rid="E6">Equation 6</xref>:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This yields <inline-formula><mml:math id="M20"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where each row <bold>e</bold><sub><italic>ki</italic></sub> corresponds to the mean-pooled embedding of a single input in batch <italic>k</italic>. The final dataset-level embedding matrix <bold>E</bold> &#x02208; &#x0211D;<sup><italic>N</italic>&#x000D7;<italic>H</italic></sup> is then constructed by stacking all individual embedding vectors <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>N</italic>), which are grouped into the batch-level matrices <bold>E</bold><sub><italic>k</italic></sub> (for <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>K</italic>), as detailed in <xref ref-type="disp-formula" rid="E7">Equation 7</xref>:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M22"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x022EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x022EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
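<p>A dependency-free sketch of <xref ref-type="disp-formula" rid="E6">Equations 6</xref> and <xref ref-type="disp-formula" rid="E7">7</xref> follows. The function names and toy hidden states are hypothetical, and nested lists stand in for the tensors used in practice. As in <xref ref-type="disp-formula" rid="E6">Equation 6</xref>, the mean is taken over all <italic>L</italic> positions, padding included.</p>
<preformat>
```python
def mean_pool(batch_hidden):
    """Average token vectors over the sequence axis.

    batch_hidden[i][j] is the H-dimensional vector h_kij of token j in input i.
    Returns one H-dimensional vector e_ki per input.
    """
    pooled = []
    for token_vectors in batch_hidden:
        length = len(token_vectors)
        hidden = len(token_vectors[0])
        pooled.append([sum(vec[d] for vec in token_vectors) / length
                       for d in range(hidden)])
    return pooled

def stack_batches(batch_embeddings):
    """Concatenate the batch-level matrices E_1..E_K row-wise into E."""
    return [row for batch in batch_embeddings for row in batch]

# Toy hidden states: two batches, L = 2 tokens, H = 2 dimensions.
H_1 = [[[1.0, 2.0], [3.0, 4.0]],
       [[0.0, 0.0], [2.0, 2.0]]]
H_2 = [[[5.0, 5.0], [7.0, 9.0]]]

E = stack_batches([mean_pool(H_1), mean_pool(H_2)])
print(E)
```
</preformat>
<p>The stacked matrix has one row per input text, so its shape is <italic>N</italic> &#x000D7; <italic>H</italic> regardless of how the texts were batched.</p>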
<p>Using this function, two sets of texts are encoded, as specified in <xref ref-type="disp-formula" rid="E8">Equation 8</xref>, producing the embedding tensors <bold>E</bold><sub><italic>L</italic></sub> and <bold>E</bold><sub><italic>M</italic></sub>, where <bold>L</bold> &#x0003D; {<italic>T</italic><sub><italic>L</italic><sub>1</sub></sub>, &#x02026;, <italic>T</italic><sub><italic>L</italic><sub><sub><italic>N</italic></sub><sub><italic>L</italic></sub></sub></sub>} and <bold>M</bold> &#x0003D; {<italic>T</italic><sub><italic>M</italic><sub>1</sub></sub>, &#x02026;, <italic>T</italic><sub><italic>M</italic><sub><sub><italic>N</italic></sub><sub><italic>M</italic></sub></sub></sub>} are the input collections from LEX and MRCONSO, respectively:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M23"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">EncodeBatch</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>L</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">EncodeBatch</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>M</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Cosine similarity is then computed to quantify pairwise semantic similarity between embeddings. For two vectors <bold>a</bold> and <bold>b</bold>, it is defined as in <xref ref-type="disp-formula" rid="E9">Equation 9</xref>:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M24"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">cosine_similarity</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>a</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>a</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msup><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>&#x02225;</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>a</mml:mtext></mml:mstyle><mml:msub><mml:mrow><mml:mo>&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02225;</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle><mml:msub><mml:mrow><mml:mo>&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The resulting matrix <inline-formula><mml:math id="M25"><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, where each element (<italic>i, j</italic>) represents the similarity between the <italic>i</italic>-th embedding vector <inline-formula><mml:math id="M26"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> in LEX and the <italic>j</italic>-th embedding vector <inline-formula><mml:math id="M27"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> in MRCONSO, is obtained as in <xref ref-type="disp-formula" rid="E10">Equation 10</xref>:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M28"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">cosine_similarity</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="true">&#x02225;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo stretchy="true">&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="true">&#x02225;</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>E</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo stretchy="true">&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Finally, each term <italic>T</italic><sub><italic>L</italic><sub><italic>i</italic></sub></sub> in LEX is aligned to its closest semantic counterpart in MRCONSO by selecting the index <inline-formula><mml:math id="M29"><mml:msubsup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> that maximizes the cosine similarity, as determined in <xref ref-type="disp-formula" rid="E11">Equation 11</xref>:</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo class="qopname">arg</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
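<p><xref ref-type="disp-formula" rid="E9">Equations 9</xref>&#x02013;<xref ref-type="disp-formula" rid="E11">11</xref> can be illustrated with the sketch below. The helper <monospace>align</monospace> and the toy embeddings are hypothetical, all vectors are assumed to be non-zero, and indices are 0-based, unlike the 1-based notation above. A production run over large collections would use vectorized matrix operations instead of Python loops.</p>
<preformat>
```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def align(emb_lex, emb_mrconso):
    """Return the similarity matrix S and, for each row i, the index j*_i."""
    sim = [[cosine_similarity(e_l, e_m) for e_m in emb_mrconso]
           for e_l in emb_lex]
    best = [max(range(len(row)), key=row.__getitem__) for row in sim]
    return sim, best

# Toy embeddings with H = 2 (illustrative only).
E_L = [[1.0, 0.0], [0.0, 2.0]]
E_M = [[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]

S, j_star = align(E_L, E_M)
print(j_star)
```
</preformat>
<p>Because cosine similarity normalizes by vector length, the second LEX vector matches the first MRCONSO vector exactly even though their magnitudes differ.</p>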
</sec>
</sec>
<sec id="s4">
<title>4 Experiments and discussions</title>
<sec>
<title>4.1 Experimental setups</title>
<sec>
<title>4.1.1 Preprocessing</title>
<p>The dataset of the French layman biomedical lexicon, originally in <monospace>TXT</monospace> format, is converted into a DataFrame and referred to as LEX. Similarly, the AB2024 version of MRCONSO (extracted by selecting all French entries via <sc>MetamorphoSys</sc>), originally in <monospace>RRF</monospace> format, is converted into a DataFrame and referred to as MRCONSO. Since the transformer-based models under study are trained on English text, LEX is augmented with English translations of the fields of interest, <italic>Biomedical Term</italic> and <italic>Public Explanation</italic>, using the <sc>Google Translate API</sc>. The same translation is applied to the <italic>String (ST)</italic> field of MRCONSO. Data integrity is then verified through statistical analysis of distributional properties, missing values, and outliers.</p>
<p>Subsequently, text preprocessing is performed via a multi-step pipeline of cleaning and normalization. This includes converting text to lowercase, removing non-alphanumeric characters, normalizing spaces, removing stopwords, and applying lemmatization through the <sc>ScispaCy</sc> model (<xref ref-type="bibr" rid="B35">Neumann et al., 2019</xref>). The resulting outputs are concatenated into a list format for modular processing.<xref ref-type="fn" rid="fn0007"><sup>7</sup></xref></p>
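<p>The cleaning steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline itself: the stopword list is a deliberately tiny, hypothetical stand-in, non-alphanumeric characters are replaced by spaces so that hyphenated words are split, not fused, and the lemmatization step is omitted because it requires a <sc>ScispaCy</sc> model.</p>
<preformat>
```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use the
# stopword set shipped with the chosen NLP library.
STOPWORDS = {"a", "an", "and", "in", "of", "the"}

def preprocess(text):
    """Lowercase, strip non-alphanumerics, normalize spaces, drop stopwords."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("The  Heart-Attack, of the PATIENT!"))
```
</preformat>
<p>The character class assumes English input, which holds here because the fields are translated before cleaning; accented characters would otherwise be discarded.</p>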
</sec>
<sec>
<title>4.1.2 AI high-performance computing (HPC)</title>
<p>The transformer-based models are optimized using M<sc>icrosoft</sc> O<sc>live</sc>. This optimization process refines architectural configurations by leveraging symbolic shape inference to determine tensor shapes.</p>
<p>M<sc>icrosoft</sc> O<sc>live</sc> is used to explore optimal configurations across ONNX R<sc>untime</sc> Execution Providers, specifically CUDAE<sc>xecution</sc>P<sc>rovider</sc> and T<sc>ensor</sc>RTE<sc>xecution</sc>P<sc>rovider</sc>. This is achieved using a JSON-based configuration file (<monospace>olive_config.json</monospace>) and a custom script (<monospace>user_script.py</monospace>) that configures the <italic>Input Model, Data Configurations, Evaluation Criteria, Devices, Engine</italic>, and <italic>Search Strategy</italic> modules. In <italic>Input Model</italic>, the operational domain of Hugging Face is defined, supporting the <monospace>sentence-similarity</monospace> task, while the MedSTS<xref ref-type="fn" rid="fn0008"><sup>8</sup></xref> (Medical Sentence Similarity) (<xref ref-type="bibr" rid="B49">Wang Y. et al., 2018</xref>) Train and Test datasets serve as resources for model calibration through the <italic>Data Configurations</italic> module. <italic>Evaluation Criteria</italic> include accuracy, precision, recall, F1-score, and latency (average, maximum, minimum). The cache directories store intermediate results, supporting reproducibility and scalability. Optimization goals are defined algorithmically and adhere to strict parametric thresholds: a maximum performance degradation of 0.01% and a minimum latency improvement of 20%. In the <italic>Devices</italic> module, <monospace>local_system</monospace> is designated as the GPU-supported system. <italic>Engine</italic> and <italic>Search Strategy</italic> employ the <monospace>joint</monospace> execution order with the <monospace>TPE</monospace> algorithm, for profiling and caching within the search space.</p>
</sec>
<sec>
<title>4.1.3 ONNX runtime passes</title>
<p>Optimization begins with <italic>OnnxConversion</italic>, which converts PyTorch models to ONNX format (<monospace>opset: 14</monospace>) for hardware-agnostic execution. Subsequently, the <italic>OrtTransformersOptimization</italic> module streamlines computational graphs by fusing adjacent layers and pruning redundant nodes. <italic>OrtMixedPrecision</italic> enhances throughput and reduces memory usage by applying FP16<xref ref-type="fn" rid="fn0009"><sup>9</sup></xref> arithmetic where applicable. Lastly, <italic>OrtPerfTuning</italic> profiles latency and throughput, performing runtime tuning<xref ref-type="fn" rid="fn0010"><sup>10</sup></xref> of model configurations. Because these passes are applied sequentially, the result of each step can be stored separately, allowing model assessment via Pareto frontier analysis.</p>
</sec>
<sec>
<title>4.1.4 Search-optimized quantization</title>
<p>The INT8 (W8A8) quantization logic is implemented using S<sc>mooth</sc>Q<sc>uant</sc> (<xref ref-type="bibr" rid="B53">Xiao et al., 2024</xref>), coordinating I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc> and IPEX (Intel Extension for PyTorch), together with M<sc>icrosoft</sc> O<sc>live</sc> and the ONNX R<sc>untime</sc> backend. The <italic>QOperator</italic> format includes <italic>QLinearMatMul, MatMulInteger, QLinearAdd</italic>, and <italic>QLinearRelu</italic> operators, configured via custom JSON settings. Quantization difficulty is redistributed from activations to weights through a smoothing factor &#x003B1; &#x0003D; 0.5, validated as optimal for the models from Microsoft Research and Cambridge LTL. NGC containers are used to integrate the previous configuration script (<monospace>user_script.py</monospace>) and the calibration datasets, ensuring scalable model deployment on accelerated hardware while retaining the optimization objectives.</p>
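<p>The role of the smoothing factor can be illustrated with the per-channel rule from the S<sc>mooth</sc>Q<sc>uant</sc> paper, <italic>s</italic><sub><italic>j</italic></sub> &#x0003D; max|<bold>X</bold><sub><italic>j</italic></sub>|<sup>&#x003B1;</sup> / max|<bold>W</bold><sub><italic>j</italic></sub>|<sup>1&#x02212;&#x003B1;</sup>. The sketch below is not the I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc> implementation: the function names and toy matrices are hypothetical, and every channel is assumed to have a non-zero maximum.</p>
<preformat>
```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def smooth(X, W, alpha=0.5):
    """Divide activation channel j by s_j and multiply weight row j by s_j."""
    n_in = len(W)
    act_max = [max(abs(row[j]) for row in X) for j in range(n_in)]
    w_max = [max(abs(w) for w in W[j]) for j in range(n_in)]
    s = [act_max[j] ** alpha / w_max[j] ** (1 - alpha) for j in range(n_in)]
    X_s = [[row[j] / s[j] for j in range(n_in)] for row in X]
    W_s = [[w * s[j] for w in W[j]] for j in range(n_in)]
    return X_s, W_s, s

# Toy data: input channel 1 of X carries activation outliers.
X = [[1.0, 16.0],
     [-2.0, 8.0]]            # tokens x input channels
W = [[0.5, -0.25],
     [0.1, 0.2]]             # input channels x output channels

X_s, W_s, s = smooth(X, W)
print(matmul(X, W))
print(matmul(X_s, W_s))      # same product, up to floating-point error
print([max(abs(row[j]) for row in X_s) for j in range(2)])
```
</preformat>
<p>The product is unchanged because the scaling cancels inside the matrix multiplication, while the activation range of the outlier channel shrinks from 16 to about 1.79. With &#x003B1; &#x0003D; 0.5, each smoothed activation channel ends up with the same maximum as its weight row, which is the sense in which the difficulty is shared evenly.</p>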
</sec>
</sec>
<sec>
<title>4.2 Main results and analysis</title>
<sec>
<title>4.2.1 DEFT 2020 evaluation campaign</title>
<p>Because our case study provides no test dataset for inference paired with a training dataset for calibration, the MedSTS resources are used for calibration, and inference is then applied directly as part of our approach. In addition, to quantify the efficiency of our optimization processes in terms of performance, latency, and consumption metrics, we use the datasets from the two tasks of the DEFT 2020 Evaluation Campaign (<xref ref-type="bibr" rid="B4">Cardon et al., 2020</xref>), as they are broadly representative of our core objective of biomedical ontology alignment.<xref ref-type="fn" rid="fn0011"><sup>11</sup></xref></p>
<p>In Task 1, which aims to identify the degree of semantic similarity between pairs of sentences, the <monospace>input_cols</monospace> parameter is set to <monospace>[sentence1, sentence2]</monospace>, corresponding to the <italic>source</italic> and <italic>target</italic> fields, respectively. These are formatted as paired token sequences, and the <monospace>label_cols</monospace> parameter is set to <monospace>[label]</monospace> for the <italic>mark</italic> field, representing human-assigned scores from 0 to 5 indicating pairwise sentence-level semantic correspondence.</p>
<p>The same setup is adapted for Task 2, which concerns the identification of parallel sentences.<xref ref-type="fn" rid="fn0012"><sup>12</sup></xref> Each data line is linked to its identifier in the <italic>num</italic> field, and lines sharing the same <italic>id</italic> are grouped into a virtual segment of three candidates. Within each segment, the candidate yielding the highest cosine similarity score is selected, and its identifier is matched against <monospace>[label]</monospace>, which represents the <italic>target</italic> field. Segments of three are used because the second task aims to identify, among three <italic>target</italic> sentences, the one that best corresponds to the <italic>source</italic> in terms of sentence-level parallelism.</p>
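<p>The selection of the best candidate within each group of three can be sketched as follows. The record layout (keys <monospace>id</monospace>, <monospace>num</monospace>, <monospace>source_vec</monospace>, <monospace>target_vec</monospace>) and the toy two-dimensional vectors are assumptions made for illustration; in practice the vectors would be sentence embeddings produced by the models under study.</p>
<preformat>
```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if one is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_parallel(rows):
    """Group rows by 'id' and return {id: 'num' of the most similar target}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row)
    best = {}
    for gid, candidates in groups.items():
        winner = max(
            candidates, key=lambda r: cosine(r["source_vec"], r["target_vec"])
        )
        best[gid] = winner["num"]
    return best


# Hypothetical segment: one source sentence compared with three targets.
rows = [
    {"id": "s1", "num": 1, "source_vec": [1.0, 0.0], "target_vec": [0.0, 1.0]},
    {"id": "s1", "num": 2, "source_vec": [1.0, 0.0], "target_vec": [0.9, 0.1]},
    {"id": "s1", "num": 3, "source_vec": [1.0, 0.0], "target_vec": [0.5, 0.5]},
]
prediction = pick_parallel(rows)
```
</preformat>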
</sec>
<sec>
<title>4.2.2 Configurational decorators</title>
<p>These configurations are implemented with logging wrappers (decorators) that log each step of the processing pipeline and generate the dataloader through H<sc>uggingface</sc>D<sc>ata</sc>C<sc>ontainer</sc>. In practice, this component makes it possible to test a wide range of evaluation metrics.</p>
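<p>A logging wrapper of this kind can be sketched with the Python standard library as shown below. The decorator name <monospace>logged</monospace> and the decorated step <monospace>post_process</monospace> are hypothetical; they only illustrate the pattern and are not the wrappers used in the released code.</p>
<preformat>
```python
import functools
import logging
import time

logger = logging.getLogger("pipeline")


def logged(func):
    """Log the start, the end, and the elapsed time of a pipeline step."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("start %s", func.__name__)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("end %s (%.3f s)", func.__name__, elapsed)

    return wrapper


@logged
def post_process(scores):
    """Hypothetical step: convert percentages to the [0, 1] range."""
    return [s / 100.0 for s in scores]


result = post_process([95.0, 43.0])
```
</preformat>
<p>Because <monospace>functools.wraps</monospace> preserves the name and docstring of the decorated function, the log entries remain attributable to the original pipeline step.</p>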
</sec>
<sec>
<title>4.2.3 Task 1</title>
<p>The first task, focused on continuous semantic evaluation (Semantic Similarity Evaluation), presented complications in converting the models&#x00027; inference outputs from cosine similarity percentages to the format required for evaluation. Specifically, for <sc>KRISSBERT</sc> (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>) in particular, the cosine similarity percentages are far higher than expected. This is presumably due to an improperly calibrated cross-entropy loss in the training of the cross-attention encoder, as briefly reported in Microsoft Research&#x00027;s study, which results in the re-ranking score being maximized even for partial or incorrect entities. The model&#x00027;s inferences, while excelling in Named Entity Linking (NEL), therefore lead to problems in cosine similarity score attribution. It is also advisable to review the linear layer applied to the encoding of the first <monospace>[CLS]</monospace> token to calculate the re-ranking score, as the score is observed to be very high even for nonsensical sentence pairs, potentially indicating poor discrimination. To address this, a feature scaling function using <monospace>MinMaxScaler</monospace> is manually added in the <monospace>post_process_data</monospace> module of H<sc>uggingface</sc>D<sc>ata</sc>C<sc>ontainer</sc>, acting as a corrective fine-tuning of the scores. Its effect is shown in <xref ref-type="table" rid="T1">Table 1</xref>, which highlights score overestimation by the models from both Microsoft Research and Cambridge LTL.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Examples highlighting a critical issue of score overestimation in the predictions made by the <sc>KRISSBERT</sc> and <sc>SapBERT-large</sc> models, which tend to disproportionately inflate the re-ranking scores, even for incomplete or incorrect entity matches.</p></caption>
<table frame="box" rules="none">
<tbody>
<tr>
<td valign="top" align="left"><bold>Source:</bold> &#x0201C;<italic>Royal jelly is a natural product very rich in vitamin B5 (C0001535), trace elements, acetylcholine (up to 0.1% by mass), and antibiotic factors notably active against Proteus and Escherichia coli B (C0001041), better known as colibacillus</italic>.&#x0201D;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Target:</bold> &#x0201C;<italic>Indeed, the smoke (C0037369) makes the bees (C0005108) perceive a fire, causing them to frantically gather honey reserves in their crop rather than defending their hive from the beekeeper</italic>.&#x0201D;</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#ff0000"><sc>KRISSBERT Prediction Score</sc>: 95%.</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#0000ff">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0002B; <sc>Corrective Fine-Tuning: 12%.</sc></td>
</tr>
<tr>
<td valign="top" align="left" style="color:#ff0000"><sc>SapBERT-large Prediction Score</sc>: 43%.</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#0000ff">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0002B; <sc>Corrective Fine-Tuning: 7%.</sc></td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left"><bold>Source</bold>: &#x0201C;<italic>The degrees of originality (C0006267) and hybridization (C0020155) of these breeds, as well as their homogeneity, are poorly described</italic>.&#x0201D;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Target</bold>: &#x0201C;<italic>Without this precaution when opening a hive, the excitement of a colony can rise, making it very dangerous (C0205166), given the number of bees (C0005108)</italic>.&#x0201D;</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#ff0000"><sc>KRISSBERT Prediction Score</sc>: 94%.</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#0000ff">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0002B; <sc>Corrective Fine-Tuning: 9%.</sc></td>
</tr>
<tr>
<td valign="top" align="left" style="color:#ff0000"><sc>SapBERT-large Prediction Score</sc>: 37%.</td>
</tr>
<tr>
<td valign="top" align="left" style="color:#0000ff">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0002B; <sc>Corrective Fine-Tuning: 5%.</sc></td>
</tr></tbody>
</table>
</table-wrap>
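<p>The min-max feature scaling mentioned for Task 1 can be sketched as follows. The function reproduces the linear formula that <monospace>MinMaxScaler</monospace> applies to a single feature, written in pure Python so that it is self-contained. The target range of 0 to 5 and the raw scores are assumptions chosen for illustration, not values taken from the experiments.</p>
<preformat>
```python
def min_max_scale(scores, feature_range=(0.0, 5.0)):
    """Rescale scores linearly so that the minimum maps to lo and the maximum to hi.

    The target range is an assumption; any other interval can be passed instead.
    """
    lo, hi = feature_range
    s_min, s_max = min(scores), max(scores)
    span = s_max - s_min
    if span == 0:
        # All scores are identical: map them to the lower bound.
        return [lo for _ in scores]
    return [lo + (s - s_min) * (hi - lo) / span for s in scores]


# Hypothetical raw similarities, compressed near the top of the range.
raw = [0.99, 0.95, 0.94, 0.97]
scaled = min_max_scale(raw)
```
</preformat>
<p>The rescaling preserves the ranking of the pairs and only spreads scores that were concentrated in a narrow band, which is why it does not alter rank-based measures such as the Spearman correlation.</p>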
<p>This rescaling enabled the use of the official EDRM evaluation metric (<xref ref-type="bibr" rid="B4">Cardon et al., 2020</xref>), which measures the average relative distance to the solution as a micro-average. For each similarity value, the reference value <italic>r</italic><sub><italic>i</italic></sub> is used to compute the maximum possible distance <italic>d</italic><sub>max</sub>(<italic>h</italic><sub><italic>i</italic></sub>, <italic>r</italic><sub><italic>i</italic></sub>) between the system&#x00027;s predicted response <italic>h</italic><sub><italic>i</italic></sub> and the reference, as formally defined in <xref ref-type="disp-formula" rid="E12">Equation 12</xref>:</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M31"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">EDRM</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dmax</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
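<p>Equation 12 can be sketched in a few lines. Two assumptions are made here and are not stated in the text: the distance <italic>d</italic> is taken as the absolute difference, and <italic>d</italic><sub>max</sub> is taken as the largest distance that any prediction on the 0 to 5 scale could have from the reference. The example values are hypothetical.</p>
<preformat>
```python
def edrm(predictions, references, low=0.0, high=5.0):
    """Mean of 1 - d(h_i, r_i) / dmax(h_i, r_i) over all pairs (Equation 12).

    Assumes d is the absolute difference and dmax is the largest distance
    any prediction in [low, high] could have from the reference r_i.
    """
    total = 0.0
    for h, r in zip(predictions, references):
        d = abs(h - r)
        d_max = max(r - low, high - r)
        total += 1.0 - d / d_max
    return total / len(references)


# Hypothetical predictions and references on the 0 to 5 scale.
score = edrm([5.0, 2.0, 0.0], [5.0, 3.0, 5.0])
```
</preformat>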
<p>Our technique surpassed the previous FP32 state-of-the-art achieved by UASZ (Universit&#x000E9; Assane Seck de Ziguinchor) (<xref ref-type="bibr" rid="B9">Dram&#x000E9; et al., 2020</xref>), as presented in <xref ref-type="table" rid="T2">Table 2</xref> and illustrated by the regression analysis in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Comparison of the study models, optimized to INT8 (W8A8) by M<sc>icrosoft</sc> O<sc>live</sc>, against the UASZ state-of-the-art (<xref ref-type="bibr" rid="B9">Dram&#x000E9; et al., 2020</xref>).</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center" colspan="3"><bold>Task</bold> &#x00040;1</th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>EDRM</bold></th>
<th valign="top" align="center"><bold>Spearman-correlation</bold></th>
<th valign="top" align="center"><italic><bold>p</bold></italic><bold>-value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><sc>KRISSBERT INT8</sc></td>
<td valign="top" align="center"><bold>0.8604</bold></td>
<td valign="top" align="center">0.8253</td>
<td valign="top" align="center">2.0724e-97</td>
</tr>
<tr>
<td valign="top" align="left"><sc>SapBERT-Large INT8</sc></td>
<td valign="top" align="center">0.8593</td>
<td valign="top" align="center"><bold>0.8289</bold></td>
<td valign="top" align="center"><bold>2.5965e-99</bold></td>
</tr>
<tr>
<td valign="top" align="left"><sc>UASZ</sc> (<xref ref-type="bibr" rid="B9">Dram&#x000E9; et al., 2020</xref>), 1</td>
<td valign="top" align="center">0.7947</td>
<td valign="top" align="center">0.7528</td>
<td valign="top" align="center">4.3371e-76</td>
</tr>
<tr>
<td valign="top" align="left"><sc>UASZ</sc> (<xref ref-type="bibr" rid="B9">Dram&#x000E9; et al., 2020</xref>), 2</td>
<td valign="top" align="center">0.8217</td>
<td valign="top" align="center">0.7691</td>
<td valign="top" align="center">2.3769e-81</td>
</tr>
<tr>
<td valign="top" align="left"><sc>UASZ</sc> (<xref ref-type="bibr" rid="B9">Dram&#x000E9; et al., 2020</xref>), 3</td>
<td valign="top" align="center">0.7755</td>
<td valign="top" align="center">0.7769</td>
<td valign="top" align="center">5.5766e-84</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The metrics include EDRM, Spearman&#x00027;s rank correlation, and <italic>p</italic>-values. Bold values indicate the best result per column.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption><p>Regression comparison of the study models applied to Task 1, using Linear Regression (LR), Support Vector Regression (SVR), Random Forest Regression (RF), and a Polynomial Trendline with Degree 3 (PTD3). The Radial Basis Function (RBF) kernel is applied in the SVR model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-08-1662984-g0002.tif">
<alt-text>Comparison of regression methods applied to Task 1 for two embeddings, KRISSBERT and SapBERT-Large. The x-axis shows annotator similarity scores (1-5), and the y-axis shows estimated similarity scores (%). Three predictive models are plotted: Linear Regression (green), Support Vector Regression with RBF kernel (blue), and Random Forest Regression (purple). In addition, a 3rd-degree polynomial trendline (red dashed) is included as a fitted reference curve. The plots illustrate how each approach aligns with human annotations, highlighting differences in smoothness and fit across methods.</alt-text>
</graphic>
</fig>
</sec>
<sec>
<title>4.2.4 Task 2</title>
<p>In the second task of DEFT 2020, which closely aligns with our main objective, the evaluation is classification-based: the Mean Average Precision (MAP), formulated in <xref ref-type="disp-formula" rid="E13">Equation 13</xref>, is computed as the mean of the non-interpolated precisions <inline-formula><mml:math id="M32"><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> at each position in the ranked list of hypotheses, for each of the <italic>n</italic><sub><italic>i</italic></sub> correct answers <inline-formula><mml:math id="M33"><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> associated with a given <italic>source</italic> sentence <italic>S</italic><sub><italic>i</italic></sub>, averaged over the <italic>N</italic> source sentences:</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="M34"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">MAP</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
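<p>Equation 13 can be sketched as follows. The function names and the two toy rankings are hypothetical; each run pairs a ranked list of hypotheses with the set of correct answers for one <italic>source</italic> sentence.</p>
<preformat>
```python
def average_precision(ranked, correct):
    """Mean of the precision values measured at the rank of each correct answer."""
    hits, precisions = 0, []
    for rank, item in enumerate(ranked, start=1):
        if item in correct:
            hits += 1
            precisions.append(hits / rank)
    # Correct answers that never appear in the ranking contribute zero.
    return sum(precisions) / len(correct) if correct else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked hypotheses, set of correct answers), one per source."""
    return sum(average_precision(r, c) for r, c in runs) / len(runs)


# Hypothetical rankings for two source sentences with three targets each.
runs = [
    (["t2", "t1", "t3"], {"t2"}),  # correct answer ranked first
    (["t1", "t3", "t2"], {"t3"}),  # correct answer ranked second
]
map_score = mean_average_precision(runs)
```
</preformat>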
<p>As detailed in <xref ref-type="table" rid="T3">Table 3</xref>, our approach outperforms the previous ones from both the University of Sorbonne (<xref ref-type="bibr" rid="B3">Buscaldi et al., 2020</xref>) and Synapse (<xref ref-type="bibr" rid="B46">Teiss&#x000E8;dre et al., 2020</xref>).</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Comparison of the study models, optimized to INT8 (W8A8) by M<sc>icrosoft</sc> O<sc>live</sc>, against the state-of-the-art benchmarks from Sorbonne (<xref ref-type="bibr" rid="B3">Buscaldi et al., 2020</xref>) and Synapse (<xref ref-type="bibr" rid="B46">Teiss&#x000E8;dre et al., 2020</xref>).</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center" colspan="4"><bold>Task</bold> &#x00040;2</th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>MAP-1</bold></th>
<th valign="top" align="center"><bold>MAP-2</bold></th>
<th valign="top" align="center"><bold>MAP-3</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><sc>KRISSBERT INT8</sc></td>
<td valign="top" align="center">0.9977</td>
<td valign="top" align="center"><bold>0.9991</bold></td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">0.9989</td>
</tr>
<tr>
<td valign="top" align="left"><sc>SapBERT-Large INT8</sc></td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">0.9974</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>0.9991</bold></td>
</tr>
<tr>
<td valign="top" align="left"><sc>Sorbonne</sc> (<xref ref-type="bibr" rid="B3">Buscaldi et al., 2020</xref>)</td>
<td valign="top" align="center">0.9887</td>
<td valign="top" align="center">0.9887</td>
<td valign="top" align="center">0.9887</td>
<td valign="top" align="center">0.9887</td>
</tr>
<tr>
<td valign="top" align="left"><sc>Synapse</sc> (<xref ref-type="bibr" rid="B46">Teiss&#x000E8;dre et al., 2020</xref>)</td>
<td valign="top" align="center">0.9906</td>
<td valign="top" align="center">0.9849</td>
<td valign="top" align="center">0.9396</td>
<td valign="top" align="center">0.9717</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The metrics include MAP classification scores (MAP-1, MAP-2, MAP-3), with their respective mean values used as the evaluation standard. Bold values indicate the highest performance per column.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>4.2.5 The impact of search-optimized quantization</title>
<p>Trade-off metrics between performance,<xref ref-type="fn" rid="fn0013"><sup>13</sup></xref> latency, power consumption, and estimated carbon emissions<xref ref-type="fn" rid="fn0014"><sup>14</sup></xref> are quantified using the <monospace>huggingface_metrics</monospace> backend, as reported in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Comparison of performance, latency, and consumption metrics for <sc>KRISSBERT</sc> and <sc>SapBERT-large</sc> models before and after optimization across the two tasks of the DEFT 2020 Evaluation Campaign.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th/>
<th valign="top" align="center" colspan="4"><bold>Performance</bold></th>
<th valign="top" align="center" colspan="3"><bold>Latency</bold></th>
<th valign="top" align="center" colspan="3"><bold>Consumption</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Task</bold> &#x00040;1</th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>F1-score</bold></th>
<th valign="top" align="center"><bold>Latency-avg</bold></th>
<th valign="top" align="center"><bold>Latency-max</bold></th>
<th valign="top" align="center"><bold>Latency-min</bold></th>
<th valign="top" align="center"><bold>Size</bold></th>
<th valign="top" align="center"><bold>GPU energy</bold></th>
<th valign="top" align="center"><bold>CO2</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><sc>KRISSBERT</sc> (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>)</td>
<td valign="top" align="center">0.8886</td>
<td valign="top" align="center">0.9047</td>
<td valign="top" align="center">0.8920</td>
<td valign="top" align="center">0.8983</td>
<td valign="top" align="center">19.9143</td>
<td valign="top" align="center">20.2043</td>
<td valign="top" align="center">19.6533</td>
<td valign="top" align="center">438</td>
<td valign="top" align="center">2.2127</td>
<td valign="top" align="center">1.0510</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; M<sc>icrosoft</sc> O<sc>live</sc></td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8886</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9047</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8920</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8983</td>
<td valign="top" align="center" style="background-color:#b2ffb2">1.2114</td>
<td valign="top" align="center" style="background-color:#b2ffb2">1.2165</td>
<td valign="top" align="center" style="background-color:#b2ffb2">1.2051</td>
<td valign="top" align="center" style="background-color:#b2ffb2">166.44</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.1346</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.0639</td>
</tr>
<tr>
<td valign="top" align="left"><sc>SapBERT-large</sc> (<xref ref-type="bibr" rid="B30">Liu et al., 2021</xref>)</td>
<td valign="top" align="center">0.8808</td>
<td valign="top" align="center">0.8851</td>
<td valign="top" align="center">0.8937</td>
<td valign="top" align="center">0.8894</td>
<td valign="top" align="center">64.0251</td>
<td valign="top" align="center">64.3159</td>
<td valign="top" align="center">63.7649</td>
<td valign="top" align="center">2293.76</td>
<td valign="top" align="center">7.1139</td>
<td valign="top" align="center">3.3791</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; M<sc>icrosoft</sc> O<sc>live</sc></td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8808</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8851</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8937</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.8894</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0494</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0562</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0453</td>
<td valign="top" align="center" style="background-color:#b2ffb2">756.94</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.3388</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.1609</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Task</bold> &#x00040;2</td>
<td valign="top" align="center"><bold>MAP-1</bold></td>
<td valign="top" align="center"><bold>MAP-2</bold></td>
<td valign="top" align="center"><bold>MAP-3</bold></td>
<td valign="top" align="center"><bold>Mean</bold></td>
<td valign="top" align="center"><bold>Latency-avg</bold></td>
<td valign="top" align="center"><bold>Latency-max</bold></td>
<td valign="top" align="center"><bold>Latency-min</bold></td>
<td valign="top" align="center"><bold>Size</bold></td>
<td valign="top" align="center"><bold>GPU energy</bold></td>
<td valign="top" align="center"><bold>CO2</bold></td>
</tr>
<tr>
<td valign="top" align="left"><sc>KRISSBERT</sc> (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>)</td>
<td valign="top" align="center">0.9977</td>
<td valign="top" align="center">0.9991</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.9989</td>
<td valign="top" align="center">55.3579</td>
<td valign="top" align="center">55.6289</td>
<td valign="top" align="center">55.1095</td>
<td valign="top" align="center">438</td>
<td valign="top" align="center">6.1509</td>
<td valign="top" align="center">2.9217</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; M<sc>icrosoft</sc> O<sc>live</sc></td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9977</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9991</td>
<td valign="top" align="center" style="background-color:#d9d9ff">1</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9989</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0276</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0351</td>
<td valign="top" align="center" style="background-color:#b2ffb2">3.0228</td>
<td valign="top" align="center" style="background-color:#b2ffb2">171.58</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.3364</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.1598</td>
</tr>
<tr>
<td valign="top" align="left"><sc>SapBERT-large</sc> (<xref ref-type="bibr" rid="B30">Liu et al., 2021</xref>)</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.9974</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.9991</td>
<td valign="top" align="center">185.5632</td>
<td valign="top" align="center">185.8308</td>
<td valign="top" align="center">185.3122</td>
<td valign="top" align="center">2293.76</td>
<td valign="top" align="center">20.6181</td>
<td valign="top" align="center">9.7936</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; M<sc>icrosoft</sc> O<sc>live</sc></td>
<td valign="top" align="center" style="background-color:#d9d9ff">1</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9974</td>
<td valign="top" align="center" style="background-color:#d9d9ff">1</td>
<td valign="top" align="center" style="background-color:#d9d9ff">0.9991</td>
<td valign="top" align="center" style="background-color:#b2ffb2">9.7195</td>
<td valign="top" align="center" style="background-color:#b2ffb2">9.7255</td>
<td valign="top" align="center" style="background-color:#b2ffb2">9.7138</td>
<td valign="top" align="center" style="background-color:#b2ffb2">762.13</td>
<td valign="top" align="center" style="background-color:#b2ffb2">1.0799</td>
<td valign="top" align="center" style="background-color:#b2ffb2">0.5130</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p><inline-formula><mml:math id="M35"><mml:mrow><mml:mstyle mathbackground="#d9d9ff"><mml:mtext>Blue</mml:mtext></mml:mstyle></mml:mrow></mml:math></inline-formula> indicates maintained performance metrics in both the original and the algorithm-driven optimized models, while the transition to <inline-formula><mml:math id="M36"><mml:mrow><mml:mstyle mathbackground="#b2ffb2"><mml:mtext>Green</mml:mtext></mml:mstyle></mml:mrow></mml:math></inline-formula> indicates improvements in both timing and resource utilization. In both cases, the optimization process yields reduced latency and energy consumption, while preserving overall performance. All results refer to inference.</p>
</table-wrap-foot>
</table-wrap>
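<p>The relative gains implied by Table 4 can be derived with simple arithmetic. The sketch below copies the average latency and size values from the table and computes, for each model and task, the speed-up ratio and the relative size reduction; the variable and function names are hypothetical. Under this calculation, the speed-ups range from roughly 16&#x000D7; to 21&#x000D7; and the size reductions from roughly 61% to 67%.</p>
<preformat>
```python
# (model, task): (latency before, latency after, size before, size after),
# copied from the Latency-avg and Size columns of Table 4.
TABLE_4 = {
    ("KRISSBERT", 1): (19.9143, 1.2114, 438.0, 166.44),
    ("SapBERT-large", 1): (64.0251, 3.0494, 2293.76, 756.94),
    ("KRISSBERT", 2): (55.3579, 3.0276, 438.0, 171.58),
    ("SapBERT-large", 2): (185.5632, 9.7195, 2293.76, 762.13),
}


def summarize(table):
    """Return the speed-up ratio and relative size reduction for each entry."""
    rows = {}
    for key, (lat0, lat1, size0, size1) in table.items():
        rows[key] = {
            "speedup": lat0 / lat1,
            "size_reduction": 1.0 - size1 / size0,
        }
    return rows


summary = summarize(TABLE_4)
mean_speedup = sum(r["speedup"] for r in summary.values()) / len(summary)
mean_reduction = sum(r["size_reduction"] for r in summary.values()) / len(summary)

for (model, task), r in summary.items():
    print(f"{model} task {task}: {r['speedup']:.1f}x faster, "
          f"{r['size_reduction']:.0%} smaller")
```
</preformat>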
<p>For observational purposes, the effectiveness of the process is validated during the verification phase using the <italic>Quantization Debug</italic> module of ONNX R<sc>untime</sc>, which provides a detailed graphical representation of the redistribution of computational complexity.<xref ref-type="fn" rid="fn0015"><sup>15</sup></xref> For simplicity, the comparison between the activation tensors from the original computation graph and its quantized counterpart is demonstrated in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption><p>Impact of search-optimized quantization on the distribution of activations in the models under study, before and after optimization. Several channels in the original activation map display significantly high magnitudes, while the variance within a particular activation channel is consistently and notably low throughout.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-08-1662984-g0003.tif">
<alt-text>3D plots of activation distributions for KRISSBERT (top) and SapBERT-Large (bottom), shown before (left) and after (right) search-optimized quantization. Axes represent channels, tokens, and absolute activation magnitude (ABS). In the original activation maps (left), several channels exhibit very high magnitudes, while the variance within each channel remains consistently low. After quantization (right), the activation magnitudes are redistributed more uniformly across channels, reducing extreme peaks while preserving the overall activation structure.</alt-text>
</graphic>
</fig>
</sec>
<sec>
<title>4.2.6 Biomedical ontology alignment</title>
<p>Once the vocabulary has been aligned between the LEX and MRCONSO domains using the <monospace>np.argmax</monospace> matrix logic (Section 3.2), a manual verification is conducted using the six-point rating scale. The resulting alignments, obtained from the two quantized transformer models, are then merged using the complementarity-based aggregation strategy, which iteratively integrates non-overlapping alignments in descending order of rating to increase coverage while preserving precision. The comparative rating distribution is reported in <xref ref-type="table" rid="T5">Table 5</xref>, followed by a Gaussian kernel density analysis in <xref ref-type="fig" rid="F4">Figure 4</xref>, which illustrates overall performance consistency across model formats.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Comparison of manual rating distributions over scores &#x00040;<italic>k</italic> for vocabulary alignments across individual models and their complementary combination.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>&#x00040;0</bold></th>
<th valign="top" align="center"><bold>&#x00040;1</bold></th>
<th valign="top" align="center"><bold>&#x00040;2</bold></th>
<th valign="top" align="center"><bold>&#x00040;3</bold></th>
<th valign="top" align="center"><bold>&#x00040;4</bold></th>
<th valign="top" align="center"><bold>&#x00040;5</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><sc>KRISSBERT INT8</sc></td>
<td valign="top" align="center">186</td>
<td valign="top" align="center">798</td>
<td valign="top" align="center">1,343</td>
<td valign="top" align="center">3,028</td>
<td valign="top" align="center">4,098</td>
<td valign="top" align="center">7,941</td>
</tr>
<tr>
<td valign="top" align="left"><sc>SapBERT-Large INT8</sc></td>
<td valign="top" align="center">205</td>
<td valign="top" align="center">687</td>
<td valign="top" align="center">1,403</td>
<td valign="top" align="center">2,928</td>
<td valign="top" align="center">4,169</td>
<td valign="top" align="center">8,002</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; <sc>Complementarity</sc></td>
<td valign="top" align="center">/</td>
<td valign="top" align="center">/</td>
<td valign="top" align="center">/</td>
<td valign="top" align="center">897</td>
<td valign="top" align="center">5,473</td>
<td valign="top" align="center">11,024</td>
</tr></tbody>
</table>
</table-wrap>
<fig position="float" id="F4">
<label>Figure 4</label>
<caption><p><bold>(A)</bold> Gaussian kernel density estimation of performance scores across floating-point (FP32/FP16) and quantized (INT8) model formats; <bold>(B)</bold> Detailed view of distribution shifts induced by format variation.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-08-1662984-g0004.tif">
<alt-text>Kernel density plots comparing floating-point (FP32/FP16) and quantized (INT8) models. Panel (A) shows Gaussian kernel density estimation of similarity scores, where FP and INT8 distributions largely overlap with minor deviations. Panel (B) presents a magnified view of performance density, highlighting subtle shifts caused by quantization. In both cases, FP32/FP16 curves (dark green) are contrasted with INT8 curves (light brown), illustrating the small but noticeable distributional changes introduced by quantization.</alt-text>
</graphic>
</fig>
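<p>One possible reading of the complementarity-based aggregation is sketched below. It assumes that each model provides one rated alignment per source term and that a source term is kept only once, at the highest rating available; this interpretation, the function name, and the placeholder identifiers are assumptions rather than the released implementation.</p>
<preformat>
```python
def merge_complementary(alignments_a, alignments_b):
    """Merge two alignment sets in descending order of rating.

    Each input maps a source term to (target identifier, rating in 0..5).
    Ratings are visited from 5 down to 0, and a source term is added only
    if no alignment has been kept for it yet.
    """
    merged = {}
    for rating in range(5, -1, -1):
        for alignments in (alignments_a, alignments_b):
            for source, (target, score) in alignments.items():
                if score == rating and source not in merged:
                    merged[source] = (target, score)
    return merged


# Hypothetical alignments with placeholder identifiers.
model_a = {"fever": ("CUI-1", 5), "cough": ("CUI-3", 2)}
model_b = {"fever": ("CUI-1", 5), "cough": ("CUI-2", 4)}
merged = merge_complementary(model_a, model_b)
```
</preformat>
<p>Under this reading, the merged set contains as many alignments as each individual model, which is consistent with the row totals of Table 5, while low-rated alignments are replaced whenever the other model offers a better-rated one.</p>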
</sec>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5 Conclusion</title>
<p>We present an optimization-driven solution for biomedical ontology alignment, leveraging M<sc>icrosoft</sc> O<sc>live</sc>, ONNX R<sc>untime</sc>, and a novel quantization strategy implemented through I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc> and IPEX. Empirical evaluations (<xref ref-type="table" rid="T4">Table 4</xref>) show inference speed-ups of roughly 16&#x000D7; to 21&#x000D7; (about 19&#x000D7; on average) and size reductions of 61% to 67%, achieved without degrading the task metrics. Validated on both tasks of the DEFT 2020 Evaluation Campaign, our approach establishes new state-of-the-art results.</p>
<p>Beyond reducing deployment costs, our approach enables scalability across resource-limited settings. By providing a robust, turnkey framework that preserves accuracy while maximizing efficiency, we contribute to the broader democratization of deep learning technologies. Future work will explore the application of this methodology to other domains, potentially extending its benefits across a wide range of research areas.</p>
</sec>
<sec id="s6">
<title>6 Limitations</title>
<p>The performance of our methods is influenced by external factors, including hardware configurations, software dependencies, and environmental conditions. A thorough analysis of these elements and their impact is essential for practical deployment and real-world applications. Such analysis should also be extended to different model architectures, including large language models.</p>
</sec>
<sec id="s7">
<title>Code availability</title>
<p>The code required to reproduce the findings is available at the GitHub repository <ext-link ext-link-type="uri" xlink:href="https://github.com/OussamaBouaggad/Quantization">https://github.com/OussamaBouaggad/Quantization</ext-link> and is distributed under the MIT License.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s8">
<title>Data availability statement</title>
<p>UMLS (<xref ref-type="bibr" rid="B2">Bodenreider, 2004</xref>) is licensed to individuals for research purposes. CNRS resources are provided under the End User License Agreement (EULA), as are the DEFT 2020 Evaluation Campaign datasets (<xref ref-type="bibr" rid="B4">Cardon et al., 2020</xref>). The MedSTS dataset (<xref ref-type="bibr" rid="B49">Wang Y. et al., 2018</xref>) is freely available for public use. <sc>KRISSBERT</sc> (<xref ref-type="bibr" rid="B58">Zhang et al., 2022</xref>) and <sc>SapBERT-large</sc> (<xref ref-type="bibr" rid="B30">Liu et al., 2021</xref>) models are distributed under the MIT License, as are M<sc>icrosoft</sc> O<sc>live</sc> and ONNX R<sc>untime</sc>. <sc>ScispaCy</sc> (<xref ref-type="bibr" rid="B35">Neumann et al., 2019</xref>), I<sc>ntel</sc> N<sc>eural</sc> C<sc>ompressor</sc>, and IPEX (Intel Extension for PyTorch) are released under the Apache License 2.0.</p>
</sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>OB: Conceptualization, Investigation, Visualization, Formal analysis, Software, Validation, Writing &#x02013; original draft. NG: Methodology, Resources, Data curation, Project administration, Supervision, Writing &#x02013; review &#x00026; editing.</p>
</sec>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research and/or publication of this article.</p>
</sec>
<ack><p>We extend our sincere gratitude to the reviewers from the University of Lille and Centrale Lille for their insightful comments and suggestions, which significantly improved the quality and rigor of our work. We also thank CNRS UMR 8163 STL for providing the valuable resources and contributions essential to the success of this research.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s11">
<title>Generative AI statement</title>
<p>The author(s) declare that no Gen AI was used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p>
</sec>
<sec sec-type="disclaimer" id="s12">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s13">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2025.1662984/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2025.1662984/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>The official UMLS resource is accessible at <ext-link ext-link-type="uri" xlink:href="https://www.nlm.nih.gov/research/umls/index.html">https://www.nlm.nih.gov/research/umls/index.html</ext-link>.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Entities include classes, instances, properties, relationships, data types, annotations, and cardinality constraints.</p></fn>
<fn id="fn0003"><p><sup>3</sup>A class of an ontology typically contains a list of labels (via annotation properties such as <italic>rdfs:label</italic>) that serve as alternative class names, descriptions, synonyms, or aliases.</p></fn>
<fn id="fn0004"><p><sup>4</sup>The search spaces of all passes are combined and jointly evaluated to find optimal parameters, using Optuna&#x00027;s TPESampler.</p></fn>
<fn id="fn0005"><p><sup>5</sup>Such an outcome involves <monospace>Mul</monospace> operations without folding, optimized in IPEX through system-level automatic fusion.</p></fn>
<fn id="fn0006"><p><sup>6</sup>Such behavior is particularly noticeable in scenarios involving activation outliers, where standard quantization methods struggle to maintain consistency across input distributions.</p></fn>
<fn id="fn0007"><p><sup>7</sup>The concatenation of diverse and evolving domains ensures comprehensive biomedical alignment (<xref ref-type="bibr" rid="B26">Koptient and Grabar, 2020</xref>).</p></fn>
<fn id="fn0008"><p><sup>8</sup>MedSTS, which incorporates UMLS concepts, is designed to measure biomedical semantic textual similarity, including sentence pairs annotated with similarity scores.</p></fn>
<fn id="fn0009"><p><sup>9</sup>In the present configuration, Float16 precision is enabled for CUDAExecutionProvider but disabled for TensorRTExecutionProvider, balancing compatibility and computational gains.</p></fn>
<fn id="fn0010"><p><sup>10</sup>The proposed runtime tuning enhances model calibration and inference through dynamic architectural optimization.</p></fn>
<fn id="fn0011"><p><sup>11</sup>In the Train module, the pretrained models are calibrated by selecting the model optimizations best matched to the performance capabilities of the target hardware, whereas in the Test module, the evaluation metrics are established.</p></fn>
<fn id="fn0012"><p><sup>12</sup>The sentences are parallel in the simple-complex sense; that is, one of the simple sentences (<italic>target</italic>) is always derived from the complex sentence (<italic>source</italic>).</p></fn>
<fn id="fn0013"><p><sup>13</sup>In Task 1, given the specificity of the EDRM evaluation, the accuracy, precision, recall, and F1-Score metrics are applied instead.</p></fn>
<fn id="fn0014"><p><sup>14</sup>The carbon emissions are calculated based on the GPU emission factor of 0.475 kg CO<sub>2</sub> per kWh.</p></fn>
<fn id="fn0015"><p><sup>15</sup>The module handles activation outliers, which commonly fall within the absolute value range of 2.5 to 5, with extreme cases peaking above 7.5, thus affecting scaling factors.</p></fn>
</fn-group>
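<notes>
<p>The conversion described in footnote 14 amounts to a single multiplication, sketched below. The factor of 0.475 kg CO<sub>2</sub> per kWh is the one quoted in the footnote; the function names, and the helper that derives energy from an average power draw and a duration, are hypothetical and included only for illustration, since the footnote does not specify how energy consumption was measured.</p>
<preformat>
```python
# Emission factor quoted in footnote 14, in kg of CO2 per kWh.
EMISSION_FACTOR_KG_PER_KWH = 0.475


def energy_kwh(avg_power_watts: float, duration_seconds: float) -> float:
    """Convert an average power draw and a duration into energy in kWh.

    Hypothetical helper: 1 kWh equals 3.6e6 joules (watt-seconds).
    """
    if not (avg_power_watts >= 0 and duration_seconds >= 0):
        raise ValueError("power and duration must be non-negative")
    return avg_power_watts * duration_seconds / 3.6e6


def carbon_emissions_kg(energy_in_kwh: float,
                        factor: float = EMISSION_FACTOR_KG_PER_KWH) -> float:
    """Estimate emissions in kg of CO2 as energy (kWh) times the factor."""
    if not energy_in_kwh >= 0:
        raise ValueError("energy must be non-negative")
    return energy_in_kwh * factor
```
</preformat>
<p>Under these assumptions, a hypothetical run drawing 250 W on average for one hour consumes 0.25 kWh, which corresponds to roughly 0.119 kg of CO<sub>2</sub>.</p>
</notes>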
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balaskas</surname> <given-names>K.</given-names></name> <name><surname>Karatzas</surname> <given-names>A.</given-names></name> <name><surname>Sad</surname> <given-names>C.</given-names></name> <name><surname>Siozios</surname> <given-names>K.</given-names></name> <name><surname>Anagnostopoulos</surname> <given-names>I.</given-names></name> <name><surname>Zervakis</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Hardware-aware DNN compression via diverse pruning and mixed-precision quantization</article-title>. <source>IEEE Trans. Emerg. Top. Comput</source>. <volume>12</volume>, <fpage>1079</fpage>&#x02013;<lpage>1092</lpage>. <pub-id pub-id-type="doi">10.1109/TETC.2023.3346944</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodenreider</surname> <given-names>O.</given-names></name></person-group> (<year>2004</year>). <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>. <source>Nucleic Acids Res</source>. <volume>32</volume>, <fpage>D267</fpage>&#x02013;<lpage>D270</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh061</pub-id><pub-id pub-id-type="pmid">14681409</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Buscaldi</surname> <given-names>D.</given-names></name> <name><surname>Felhi</surname> <given-names>G.</given-names></name> <name><surname>Ghoul</surname> <given-names>D.</given-names></name> <name><surname>Le Roux</surname> <given-names>J.</given-names></name> <name><surname>Lejeune</surname> <given-names>G.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Calcul de similarit&#x000E9; entre phrases : quelles mesures et quels descripteurs? (sentence similarity: a study on similarity metrics with words and character strings),&#x0201D;</article-title> in <source>Actes de la 6e conf&#x000E9;rence conjointe Journ&#x000E9;es d&#x00027;&#x000C9;tudes sur la Parole (JEP, 33e &#x000E9;dition), Traitement Automatique des Langues Naturelles (TALN, 27e &#x000E9;dition), Rencontre des &#x000C9;tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R&#x000C9;CITAL, 22e &#x000E9;dition). Atelier D&#x000C9;fi Fouille de Textes</source>, eds. R. Cardon, N. Grabar, C. Grouin, T. Hamon (<publisher-loc>Nancy, France</publisher-loc>: <publisher-name>ATALA et AFCP</publisher-name>), <fpage>14</fpage>&#x02013;<lpage>25</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cardon</surname> <given-names>R.</given-names></name> <name><surname>Grabar</surname> <given-names>N.</given-names></name> <name><surname>Grouin</surname> <given-names>C.</given-names></name> <name><surname>Hamon</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Pr&#x000E9;sentation de la campagne d&#x00027;&#x000E9;valuation DEFT 2020 : similarit&#x000E9; textuelle en domaine ouvert et extraction d&#x00027;information pr&#x000E9;cise dans des cas cliniques (Presentation of the DEFT 2020 challenge: Open domain textual similarity and precise information extraction from clinical cases),&#x0201D;</article-title> in <source>Actes de la 6e conf&#x000E9;rence conjointe Journ&#x000E9;es d&#x00027;&#x000C9;tudes sur la Parole (JEP, 33e &#x000E9;dition), Traitement Automatique des Langues Naturelles (TALN, 27e &#x000E9;dition), Rencontre des &#x000C9;tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R&#x000C9;CITAL, 22e &#x000E9;dition). Atelier D&#x000C9;fi Fouille de Textes</source>, eds. R. Cardon, N. Grabar, C. Grouin, T. Hamon (<publisher-loc>Nancy, France</publisher-loc>: <publisher-name>ATALA et AFCP</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>13</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carreira-Perpi&#x000F1;&#x000E1;n</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Model compression as constrained optimization, with application to neural nets. Part I: General framework</article-title>. <source>arXiv preprint arXiv:1707.01209</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1707.01209</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Jim&#x000E9;nez-Ruiz</surname> <given-names>E.</given-names></name> <name><surname>Horrocks</surname> <given-names>I.</given-names></name> <name><surname>Antonyrajah</surname> <given-names>D.</given-names></name> <name><surname>Hadian</surname> <given-names>A.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Augmenting ontology alignment by semantic embedding and distant supervision,&#x0201D;</article-title> in <source>The Semantic Web</source>, eds. R. Verborgh, K. Hose, H. Paulheim, P.-A. Champin, M. Maleshkova, O. Corcho, et al. (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>392</fpage>&#x02013;<lpage>408</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-77385-4_23</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Courbariaux</surname> <given-names>M.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>David</surname> <given-names>J.-P.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Binaryconnect: training deep neural networks with binary weights during propagations,&#x0201D;</article-title> in <source>Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS&#x00027;15</source> (<publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>3123</fpage>&#x02013;<lpage>3131</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dagan</surname> <given-names>I.</given-names></name> <name><surname>Dolan</surname> <given-names>B.</given-names></name> <name><surname>Magnini</surname> <given-names>B.</given-names></name> <name><surname>Roth</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Recognizing textual entailment: rational, evaluation and approaches</article-title>. <source>Nat. Lang. Eng</source>. <volume>15</volume>, <fpage>i</fpage>&#x02013;<lpage>xvii</lpage>. <pub-id pub-id-type="doi">10.1017/S1351324909990209</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dram&#x000E9;</surname> <given-names>K.</given-names></name> <name><surname>Sambe</surname> <given-names>G.</given-names></name> <name><surname>Diop</surname> <given-names>I.</given-names></name> <name><surname>Faty</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Approche supervis&#x000E9;e de calcul de similarit&#x000E9; s&#x000E9;mantique entre paires de phrases (supervised approach to compute semantic similarity between sentence pairs),&#x0201D;</article-title> in <source>Actes de la 6e conf&#x000E9;rence conjointe Journ&#x000E9;es d&#x00027;&#x000C9;tudes sur la Parole (JEP, 33e &#x000E9;dition), Traitement Automatique des Langues Naturelles (TALN, 27e &#x000E9;dition), Rencontre des &#x000C9;tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R&#x000C9;CITAL, 22e &#x000E9;dition). Atelier D&#x000C9;fi Fouille de Textes</source>, eds. R. Cardon, N. Grabar, C. Grouin, T. Hamon (<publisher-loc>Nancy, France</publisher-loc>: <publisher-name>ATALA et AFCP</publisher-name>), <fpage>49</fpage>&#x02013;<lpage>54</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Euzenat</surname> <given-names>J.</given-names></name> <name><surname>Shvaiko</surname> <given-names>P.</given-names></name></person-group> (<year>2007</year>). <source>Ontology Matching</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>Q.</given-names></name> <name><surname>Wei</surname> <given-names>C.-H.</given-names></name> <name><surname>Lu</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name></person-group> (<year>2023</year>). <article-title>Bioformer: An efficient transformer language model for biomedical text mining</article-title>. <source>arXiv preprint arXiv:2302.01588</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2302.01588</pub-id><pub-id pub-id-type="pmid">36945685</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Faria</surname> <given-names>D.</given-names></name> <name><surname>Pesquita</surname> <given-names>C.</given-names></name> <name><surname>Santos</surname> <given-names>E.</given-names></name> <name><surname>Palmonari</surname> <given-names>M.</given-names></name> <name><surname>Cruz</surname> <given-names>I. F.</given-names></name> <name><surname>Couto</surname> <given-names>F. M.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;The agreementmakerlight ontology matching system,&#x0201D;</article-title> in <source>On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>, eds. R. Meersman, H. Panetto, T. Dillon, J. Eder, Z. Bellahsene, N. Ritter et al. (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>), <fpage>527</fpage>&#x02013;<lpage>541</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-41030-7_38</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Frankle</surname> <given-names>J.</given-names></name> <name><surname>Carbin</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;The lottery ticket hypothesis: finding sparse, trainable neural networks,&#x0201D;</article-title> in <source>ICLR</source> (<publisher-loc>OpenReview.net</publisher-loc>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://dblp.uni-trier.de/db/conf/iclr/iclr2019.html&#x00023;FrankleC19">http://dblp.uni-trier.de/db/conf/iclr/iclr2019.html&#x00023;FrankleC19</ext-link></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>Y.</given-names></name> <name><surname>Tinn</surname> <given-names>R.</given-names></name> <name><surname>Cheng</surname> <given-names>H.</given-names></name> <name><surname>Lucas</surname> <given-names>M.</given-names></name> <name><surname>Usuyama</surname> <given-names>N.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Domain-specific language model pretraining for biomedical natural language processing</article-title>. <source>ACM Trans. Comput. Healthcare</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1145/3458754</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Yao</surname> <given-names>A.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Dynamic network surgery for efficient DNNs</article-title>. <source>arXiv preprint arXiv:1608.04493</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1608.04493</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>S.</given-names></name> <name><surname>Mao</surname> <given-names>H.</given-names></name> <name><surname>Dally</surname> <given-names>W. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding</article-title>. <source>arXiv preprint arXiv:1510.00149</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1510.00149</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hassibi</surname> <given-names>B.</given-names></name> <name><surname>Stork</surname> <given-names>D.</given-names></name></person-group> (<year>1992</year>). <article-title>&#x0201C;Second order derivatives for network pruning: optimal brain surgeon,&#x0201D;</article-title> in <source>Proceedings of the 6th International Conference on Neural Information Processing Systems, Denver, CO, NIPS&#x00027;92</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Morgan-Kaufmann</publisher-name>), <fpage>164</fpage>&#x02013;<lpage>171</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>He</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Antonyrajah</surname> <given-names>D.</given-names></name> <name><surname>Horrocks</surname> <given-names>I.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Biomedical ontology alignment with BERT,&#x0201D;</article-title> in <source>Proceedings of the 16th International Workshop on Ontology Matching co-located with the 20th International Semantic Web Conference (ISWC 2021), CEUR Workshop Proceedings, vol. 3063</source>, eds. P. Shvaiko, J. Euzenat, E. Jim&#x000E9;nez-Ruiz, O. Hassanzadeh, and C. Trojahn (CEUR-WS.org), <fpage>1</fpage>&#x02013;<lpage>12</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://ceur-ws.org/Vol-3063/om2021_LTpaper1.pdf">https://ceur-ws.org/Vol-3063/om2021_LTpaper1.pdf</ext-link></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G.</given-names></name> <name><surname>Vinyals</surname> <given-names>O.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Distilling the knowledge in a neural network</article-title>. <source>arXiv preprint arXiv:1503.02531</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1503.02531</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huai</surname> <given-names>S.</given-names></name> <name><surname>Kong</surname> <given-names>H.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>D.</given-names></name> <name><surname>Subramaniam</surname> <given-names>R.</given-names></name> <name><surname>Makaya</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>On hardware-aware design and optimization of edge intelligence</article-title>. <source>IEEE Des. Test</source> <volume>40</volume>, <fpage>149</fpage>&#x02013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1109/MDAT.2023.3307558</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacob</surname> <given-names>B.</given-names></name> <name><surname>Kligys</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>B.</given-names></name> <name><surname>Zhu</surname> <given-names>M.</given-names></name> <name><surname>Tang</surname> <given-names>M.</given-names></name> <name><surname>Howard</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>&#x0201C;Quantization and training of neural networks for efficient integer-arithmetic-only inference,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>, <fpage>2704</fpage>&#x02013;<lpage>2713</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00286</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>Z.</given-names></name> <name><surname>Wei</surname> <given-names>Q.</given-names></name> <name><surname>Xu</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>BERT-based ranking for biomedical entity normalization</article-title>. <source>AMIA Summits Transl. Sci. Proc</source>. <volume>2020</volume>:<fpage>269</fpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1908.03548.pdf">https://arxiv.org/pdf/1908.03548.pdf</ext-link></citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jim&#x000E9;nez-Ruiz</surname> <given-names>E.</given-names></name> <name><surname>Cuenca Grau</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;LogMap: logic-based and scalable ontology matching,&#x0201D;</article-title> in <source>The Semantic Web-ISWC 2011</source>, eds. L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, et al. (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>), <fpage>273</fpage>&#x02013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-25073-6_18</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Gholami</surname> <given-names>A.</given-names></name> <name><surname>Yao</surname> <given-names>Z.</given-names></name> <name><surname>Mahoney</surname> <given-names>M. W.</given-names></name> <name><surname>Keutzer</surname> <given-names>K.</given-names></name></person-group> (<year>2021</year>). <article-title>I-BERT: integer-only BERT quantization</article-title>. <source>arXiv preprint arXiv:2101.01321</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2101.01321</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kolyvakis</surname> <given-names>P.</given-names></name> <name><surname>Kalousis</surname> <given-names>A.</given-names></name> <name><surname>Kiritsis</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;DeepAlignment: unsupervised ontology matching with refined word vectors,&#x0201D;</article-title> in <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</source>, eds. M. Walker, H. Ji, A. Stent (<publisher-loc>New Orleans, Louisiana</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>787</fpage>&#x02013;<lpage>798</lpage>. <pub-id pub-id-type="doi">10.18653/v1/N18-1072</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Koptient</surname> <given-names>A.</given-names></name> <name><surname>Grabar</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Rated lexicon for the simplification of medical texts,&#x0201D;</article-title> in <source>The Fifth International Conference on Informatics and Assistive Technologies for Health-Care, Medical Support and Wellbeing HEALTHINFO 2020</source>, <publisher-loc>Porto, Portugal</publisher-loc>. <pub-id pub-id-type="doi">10.3233/SHTI210170</pub-id><pub-id pub-id-type="pmid">38112605</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lambrix</surname> <given-names>P.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Ontologies in bioinformatics and systems biology,&#x0201D;</article-title> in <source>Artificial Intelligence Methods And Tools For Systems Biology</source>, eds. D. Werner, and A. Francisco (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer Netherlands</publisher-name>), <fpage>129</fpage>&#x02013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1007/1-4020-2865-2_8</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Denker</surname> <given-names>J.</given-names></name> <name><surname>Solla</surname> <given-names>S.</given-names></name></person-group> (<year>1989</year>). <article-title>&#x0201C;Optimal brain damage,&#x0201D;</article-title> in <source>Proceedings of the 3rd International Conference on Neural Information Processing Systems, NIPS&#x00027;89</source> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>598</fpage>&#x02013;<lpage>605</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Yoon</surname> <given-names>W.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>D.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>So</surname> <given-names>C. H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>1234</fpage>&#x02013;<lpage>1240</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz682</pub-id><pub-id pub-id-type="pmid">31501885</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Shareghi</surname> <given-names>E.</given-names></name> <name><surname>Meng</surname> <given-names>Z.</given-names></name> <name><surname>Basaldella</surname> <given-names>M.</given-names></name> <name><surname>Collier</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Self-alignment pretraining for biomedical entity representations,&#x0201D;</article-title> in <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>, eds. K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, et al. (<publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>4228</fpage>&#x02013;<lpage>4238</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2021.naacl-main.334</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Logeswaran</surname> <given-names>L.</given-names></name> <name><surname>Chang</surname> <given-names>M.-W.</given-names></name> <name><surname>Lee</surname> <given-names>K.</given-names></name> <name><surname>Toutanova</surname> <given-names>K.</given-names></name> <name><surname>Devlin</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Zero-shot entity linking by reading entity descriptions,&#x0201D;</article-title> in <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>, eds. A. Korhonen, D. Traum, L. M&#x000E0;rquez (<publisher-loc>Florence, Italy</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>3449</fpage>&#x02013;<lpage>3460</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P19-1335</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Micikevicius</surname> <given-names>P.</given-names></name> <name><surname>Narang</surname> <given-names>S.</given-names></name> <name><surname>Alben</surname> <given-names>J.</given-names></name> <name><surname>Diamos</surname> <given-names>G.</given-names></name> <name><surname>Elsen</surname> <given-names>E.</given-names></name> <name><surname>Garcia</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>&#x0201C;Mixed precision training,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=r1gs9JgRZ">https://openreview.net/forum?id=r1gs9JgRZ</ext-link></citation>
</ref>
<ref id="B33">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Corrado</surname> <given-names>G.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;Efficient estimation of word representations in vector space,&#x0201D;</article-title> in <source>1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1301.3781">http://arxiv.org/abs/1301.3781</ext-link><pub-id pub-id-type="pmid">31752376</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagel</surname> <given-names>M.</given-names></name> <name><surname>van Baalen</surname> <given-names>M.</given-names></name> <name><surname>Blankevoort</surname> <given-names>T.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Data-free quantization through weight equalization and bias correction</article-title>. <source>arXiv preprint arXiv:1906.04721</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1906.04721</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Neumann</surname> <given-names>M.</given-names></name> <name><surname>King</surname> <given-names>D.</given-names></name> <name><surname>Beltagy</surname> <given-names>I.</given-names></name> <name><surname>Ammar</surname> <given-names>W.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;ScispaCy: fast and robust models for biomedical natural language processing,&#x0201D;</article-title> in <source>Proceedings of the 18th BioNLP Workshop and Shared Task</source>, eds. D. Demner-Fushman, K. B. Cohen, S. Ananiadou, J. Tsujii (<publisher-loc>Florence, Italy</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>319</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.18653/v1/W19-5034</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nowlan</surname> <given-names>S. J.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>1992</year>). <article-title>Simplifying neural networks by soft weight-sharing</article-title>. <source>Neural Comput</source>. <volume>4</volume>, <fpage>473</fpage>&#x02013;<lpage>493</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1992.4.4.473</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>J.-H.</given-names></name> <name><surname>Kim</surname> <given-names>K.-M.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Quantized sparse training: a unified trainable framework for joint pruning and quantization in DNNs</article-title>. <source>ACM Trans. Embed. Comput. Syst</source>. <volume>21</volume>:<fpage>60</fpage>. <pub-id pub-id-type="doi">10.1145/3524066</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Qu</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Thiele</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Adaptive loss-aware quantization for multi-bit networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-name>IEEE</publisher-name>), <fpage>7985</fpage>&#x02013;<lpage>7994</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00801</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rakka</surname> <given-names>M.</given-names></name> <name><surname>Fouda</surname> <given-names>M. E.</given-names></name> <name><surname>Khargonekar</surname> <given-names>P.</given-names></name> <name><surname>Kurdahi</surname> <given-names>F.</given-names></name></person-group> (<year>2022</year>). <article-title>Mixed-precision neural networks: a survey</article-title>. <source>arXiv preprint arXiv:2208.06064</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2208.06064</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rokh</surname> <given-names>B.</given-names></name> <name><surname>Azarpeyvand</surname> <given-names>A.</given-names></name> <name><surname>Khanteymoori</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>A comprehensive survey on model quantization for deep neural networks in image classification</article-title>. <source>ACM Trans. Intell. Syst. Technol.</source> <volume>14</volume>:<fpage>97</fpage>. <pub-id pub-id-type="doi">10.1145/3623402</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname> <given-names>S.</given-names></name> <name><surname>Mehera</surname> <given-names>R.</given-names></name> <name><surname>Pal</surname> <given-names>R.</given-names></name> <name><surname>Bandyopadhyay</surname> <given-names>S.</given-names></name></person-group> (<year>2023</year>). <article-title>Hyperparameter optimization for deep neural network models: a comprehensive study on methods and techniques</article-title>. <source>Innov. Syst. Softw. Eng</source>. <volume>21</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1007/s11334-023-00540-3</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schaefer</surname> <given-names>C. J.</given-names></name> <name><surname>Lambert-Shirzad</surname> <given-names>N.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Chou</surname> <given-names>C.</given-names></name> <name><surname>Jablin</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>Augmenting Hessians with inter-layer dependencies for mixed-precision post-training quantization</article-title>. <source>arXiv preprint arXiv:2306.04879</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2306.04879</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>S.</given-names></name> <name><surname>Zhen</surname> <given-names>D.</given-names></name> <name><surname>Ye</surname> <given-names>J.</given-names></name> <name><surname>Ma</surname> <given-names>L.</given-names></name> <name><surname>Yao</surname> <given-names>Z.</given-names></name> <name><surname>Gholami</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Q-BERT: Hessian based ultra low precision quantization of BERT</article-title>. <source>Proc. AAAI Conf. Artif. Intell</source>. <volume>34</volume>, <fpage>8815</fpage>&#x02013;<lpage>8821</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i05.6409</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shivapakash</surname> <given-names>S.</given-names></name> <name><surname>Jain</surname> <given-names>H.</given-names></name> <name><surname>Hellwich</surname> <given-names>O.</given-names></name> <name><surname>Gerfers</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;A power efficient multi-bit accelerator for memory prohibitive deep neural networks,&#x0201D;</article-title> in <source>2020 IEEE International Symposium on Circuits and Systems (ISCAS)</source>, <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS45731.2020.9180868</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sung</surname> <given-names>M.</given-names></name> <name><surname>Jeon</surname> <given-names>H.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Kang</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Biomedical entity representations with synonym marginalization,&#x0201D;</article-title> in <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)</source> (<publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>3641</fpage>&#x02013;<lpage>3650</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2020.acl-main.335</pub-id><pub-id pub-id-type="pmid">36400329</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Teiss&#x000E8;dre</surname> <given-names>C.</given-names></name> <name><surname>Belkacem</surname> <given-names>T.</given-names></name> <name><surname>Arens</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Similarit&#x000E9; s&#x000E9;mantique entre phrases : apprentissage par transfert interlingue (semantic sentence similarity: multilingual transfer learning),&#x0201D;</article-title> in <source>Actes de la 6e conf&#x000E9;rence conjointe Journ&#x000E9;es d&#x00027;&#x000C9;tudes sur la Parole (JEP, 33e &#x000E9;dition), Traitement Automatique des Langues Naturelles (TALN, 27e &#x000E9;dition), Rencontre des &#x000C9;tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R&#x000C9;CITAL, 22e &#x000E9;dition). Atelier D&#x000E9;fi Fouille de Textes</source>, eds. R. Cardon, N. Grabar, C. Grouin, T. Hamon (<publisher-loc>Nancy, France</publisher-loc>: <publisher-name>ATALA et AFCP</publisher-name>), <fpage>97</fpage>&#x02013;<lpage>107</lpage>.</citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A.</given-names></name> <name><surname>Shazeer</surname> <given-names>N.</given-names></name> <name><surname>Parmar</surname> <given-names>N.</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J.</given-names></name> <name><surname>Jones</surname> <given-names>L.</given-names></name> <name><surname>Gomez</surname> <given-names>A. N.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>&#x0201C;Attention is all you need,&#x0201D;</article-title> in <source>Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, NIPS&#x00027;17</source> (<publisher-loc>Red Hook, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>6000</fpage>&#x02013;<lpage>6010</lpage>.</citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>L. L.</given-names></name> <name><surname>Bhagavatula</surname> <given-names>C.</given-names></name> <name><surname>Neumann</surname> <given-names>M.</given-names></name> <name><surname>Lo</surname> <given-names>K.</given-names></name> <name><surname>Wilhelm</surname> <given-names>C.</given-names></name> <name><surname>Ammar</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Ontology alignment in the biomedical domain using entity definitions and context,&#x0201D;</article-title> in <source>Proceedings of the BioNLP 2018 workshop</source>, eds. D. Demner-Fushman, K. B. Cohen, S. Ananiadou, J. Tsujii (<publisher-loc>Melbourne, Australia</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>47</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.18653/v1/W18-2306</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Afzal</surname> <given-names>N.</given-names></name> <name><surname>Fu</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Shen</surname> <given-names>F.</given-names></name> <name><surname>Rastegar-Mojarad</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>MedSTS: a resource for clinical semantic textual similarity</article-title>. <source>arXiv preprint arXiv:1808.09397</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1808.09397</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>Y.</given-names></name> <name><surname>Blankevoort</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Differentiable joint pruning and quantization for hardware efficiency,&#x0201D;</article-title> in <source>Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIX</source> (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>), <fpage>259</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-58526-6_16</pub-id><pub-id pub-id-type="pmid">35675247</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>H.</given-names></name> <name><surname>Judd</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Isaev</surname> <given-names>M.</given-names></name> <name><surname>Micikevicius</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Integer quantization for deep learning inference: principles and empirical evaluation</article-title>. <source>arXiv preprint arXiv:2004.09602</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2004.09602</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>L.</given-names></name> <name><surname>Petroni</surname> <given-names>F.</given-names></name> <name><surname>Josifoski</surname> <given-names>M.</given-names></name> <name><surname>Riedel</surname> <given-names>S.</given-names></name> <name><surname>Zettlemoyer</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Scalable zero-shot entity linking with dense entity retrieval,&#x0201D;</article-title> in <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>, eds. B. Webber, T. Cohn, Y. He, Y. Liu (<publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>6397</fpage>&#x02013;<lpage>6407</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2020.emnlp-main.519</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiao</surname> <given-names>G.</given-names></name> <name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Seznec</surname> <given-names>M.</given-names></name> <name><surname>Wu</surname> <given-names>H.</given-names></name> <name><surname>Demouth</surname> <given-names>J.</given-names></name> <name><surname>Han</surname> <given-names>S.</given-names></name></person-group> (<year>2024</year>). <article-title>SmoothQuant: accurate and efficient post-training quantization for large language models</article-title>. <source>arXiv preprint arXiv:2211.10438</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2211.10438</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>D.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Bethard</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;A generate-and-rank framework with semantic type regularization for biomedical concept normalization,&#x0201D;</article-title> in <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>, eds. D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (<publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>8452</fpage>&#x02013;<lpage>8464</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2020.acl-main.748</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Z.</given-names></name> <name><surname>Hsu</surname> <given-names>Y.-C.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks</article-title>. <source>arXiv preprint arXiv:1709.00513</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1709.00513</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>H.</given-names></name> <name><surname>Gui</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Automatic neural network compression by sparsity-quantization joint learning: a constrained optimization-based approach,&#x0201D;</article-title> in <source>2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Los Alamitos, CA</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>2175</fpage>&#x02013;<lpage>2185</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00225</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>P.-H.</given-names></name> <name><surname>Wu</surname> <given-names>S.-S.</given-names></name> <name><surname>Klopp</surname> <given-names>J. P.</given-names></name> <name><surname>Chen</surname> <given-names>L.-G.</given-names></name> <name><surname>Chien</surname> <given-names>S.-Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Joint pruning and quantization for extremely sparse neural networks</article-title>. <source>arXiv preprint arXiv:2010.01892</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2010.01892</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Cheng</surname> <given-names>H.</given-names></name> <name><surname>Vashishth</surname> <given-names>S.</given-names></name> <name><surname>Wong</surname> <given-names>C.</given-names></name> <name><surname>Xiao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>&#x0201C;Knowledge-rich self-supervision for biomedical entity linking,&#x0201D;</article-title> in <source>Findings of the Association for Computational Linguistics: EMNLP 2022</source>, eds. Y. Goldberg, Z. Kozareva, Y. Zhang (<publisher-loc>Abu Dhabi</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>868</fpage>&#x02013;<lpage>880</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2022.findings-emnlp.61</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>R.</given-names></name> <name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Dotzel</surname> <given-names>J.</given-names></name> <name><surname>Sa</surname> <given-names>C. D.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name></person-group> (<year>2019</year>). <article-title>Improving neural network quantization without retraining using outlier channel splitting</article-title>. <source>arXiv preprint arXiv:1901.09504</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1901.09504</pub-id></citation>
</ref>
</ref-list>
</back>
</article>