<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Bioeng. Biotechnol.</journal-id>
<journal-title>Frontiers in Bioengineering and Biotechnology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Bioeng. Biotechnol.</abbrev-journal-title>
<issn pub-type="epub">2296-4185</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1228846</article-id>
<article-id pub-id-type="doi">10.3389/fbioe.2024.1228846</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioengineering and Biotechnology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Generative data augmentation and automated optimization of convolutional neural networks for process monitoring</article-title>
<alt-title alt-title-type="left-running-head">Schiemer et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fbioe.2024.1228846">10.3389/fbioe.2024.1228846</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Schiemer</surname>
<given-names>Robin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2254697/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>R&#xfc;dt</surname>
<given-names>Matthias</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1812784/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Hubbuch</surname>
<given-names>J&#xfc;rgen</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/960035/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Institute of Process Engineering in Life Sciences</institution>, <institution>Section IV: Biomolecular Separation Engineering</institution>, <institution>Karlsruhe Institute of Technology (KIT)</institution>, <addr-line>Karlsruhe</addr-line>, <country>Germany</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Institute of Life Technologies</institution>, <institution>HES-SO Valais-Wallis</institution>, <addr-line>Sion</addr-line>, <country>Switzerland</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/204761/overview">Krist V. Gernaey</ext-link>, Technical University of Denmark, Denmark</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/512754/overview">Mario A. Torres-Acosta</ext-link>, Monterrey Institute of Technology and Higher Education (ITESM), Mexico</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2569456/overview">Pedram Ramin</ext-link>, Technical University of Denmark, Denmark</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: J&#xfc;rgen Hubbuch, <email>juergen.hubbuch@kit.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>31</day>
<month>01</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>12</volume>
<elocation-id>1228846</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>05</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>01</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2024 Schiemer, R&#xfc;dt and Hubbuch.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Schiemer, R&#xfc;dt and Hubbuch</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Chemometric modeling for spectral data is considered a key technology in biopharmaceutical processing to realize real-time process control and release testing. Machine learning (ML) models have been shown to increase the accuracy of various spectral regression and classification tasks, remove challenging preprocessing steps for spectral data, and promise to improve the transferability of models when compared to commonly applied, linear methods. The training and optimization of ML models require large data sets which are not available in the context of biopharmaceutical processing. Generative methods to extend data sets with realistic <italic>in silico</italic> samples, so-called data augmentation, may provide the means to alleviate this challenge. In this study, we develop and implement a novel data augmentation method for generating <italic>in silico</italic> spectral data based on local estimation of pure component profiles for training convolutional neural network (CNN) models using four data sets. We simultaneously tune hyperparameters associated with data augmentation and the neural network architecture using Bayesian optimization. Finally, we compare the optimized CNN models with partial least-squares regression models (PLS) in terms of accuracy, robustness, and interpretability. The proposed data augmentation method is shown to produce highly realistic spectral data by adapting the estimates of the pure component profiles to the sampled concentration regimes. Augmenting CNNs with the <italic>in silico</italic> spectral data is shown to improve the prediction accuracy for the quantification of monoclonal antibody (mAb) size variants by up to 50% in comparison to single-response PLS models. Bayesian structure optimization suggests that multiple convolutional blocks are beneficial for model accuracy and enable transfer across different data sets. 
Model-agnostic feature importance methods and synthetic noise perturbation are used to directly compare the optimized CNNs with PLS models. This enables the identification of wavelength regions critical for model performance and suggests increased robustness against Gaussian white noise and wavelength shifts of the CNNs compared to the PLS models.</p>
</abstract>
<kwd-group>
<kwd>chemometrics</kwd>
<kwd>convolutional neural networks</kwd>
<kwd>process analytical technology</kwd>
<kwd>data augmentation</kwd>
<kwd>hyperparameter optimization</kwd>
<kwd>feature importance</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Bioprocess Engineering</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>Driven by the FDA initiative in 2004 (<xref ref-type="bibr" rid="B21">FDA, 2004</xref>), process analytical technology (PAT) has evolved over the past two decades from niche applications into a widely applied tool in biopharmaceutical research and manufacturing (<xref ref-type="bibr" rid="B57">Read et al., 2010a</xref>; <xref ref-type="bibr" rid="B58">Read et al., 2010b</xref>; <xref ref-type="bibr" rid="B79">&#xdc;ndey et al., 2010</xref>; <xref ref-type="bibr" rid="B26">Glassey et al., 2011</xref>; <xref ref-type="bibr" rid="B67">R&#xfc;dt et al., 2017b</xref>; <xref ref-type="bibr" rid="B70">Sauer et al., 2019</xref>; <xref ref-type="bibr" rid="B84">Wei et al., 2022</xref>; <xref ref-type="bibr" rid="B81">Wang et al., 2022</xref>). PAT enables efficient process monitoring and control and provides means for real-time release testing or in-process prediction of product quality attributes (<xref ref-type="bibr" rid="B34">Jiang et al., 2017</xref>; <xref ref-type="bibr" rid="B44">Markl et al., 2020</xref>). Optical spectroscopic techniques such as ultraviolet/visible (UV/Vis), infrared (IR), and Raman spectroscopy have been shown to enable real-time monitoring across a wide range of pharmaceutical processes (<xref ref-type="bibr" rid="B5">Bakeev, 2005</xref>; <xref ref-type="bibr" rid="B23">Feidl et al., 2019</xref>; <xref ref-type="bibr" rid="B76">Trampu&#x17e; et al., 2020</xref>; <xref ref-type="bibr" rid="B63">Romann et al., 2022</xref>; <xref ref-type="bibr" rid="B60">Rolinger et al., 2023</xref>).
In combination with multivariate data analysis, these techniques are suitable, e.g., for quantifying product and impurity species from process data (<xref ref-type="bibr" rid="B15">Capito et al., 2013</xref>; <xref ref-type="bibr" rid="B11">Brestrich et al., 2016</xref>, <xref ref-type="bibr" rid="B12">Brestrich et al., 2018</xref>; <xref ref-type="bibr" rid="B66">R&#xfc;dt et al., 2017a</xref>), identifying unknown sample compositions (<xref ref-type="bibr" rid="B39">Liu et al., 2017</xref>; <xref ref-type="bibr" rid="B83">Wegner and Hubbuch, 2022</xref>), or determining product modifications (<xref ref-type="bibr" rid="B38">Li et al., 2018</xref>; <xref ref-type="bibr" rid="B90">Zhang et al., 2019a</xref>; <xref ref-type="bibr" rid="B68">Sanden et al., 2019</xref>) owing to their fast and non-invasive characteristics and high selectivity in protein analysis.</p>
<p>Current approaches to the quantitative analysis of spectroscopic data heavily rely on multivariate linear regression methods such as partial least-squares regression (PLS) (<xref ref-type="bibr" rid="B6">Banner et al., 2021</xref>). Due to the linear behavior, these models typically need a limited number of samples for robust calibration and provide comprehensible metrics for critical model evaluation and interpretation (<xref ref-type="bibr" rid="B85">Wold et al., 2001</xref>). Machine learning (ML) methods have gradually been applied to the field of chemometrics and have been shown to sometimes outperform linear methods on various regression and classification tasks, employing artificial neural networks (ANNs) (<xref ref-type="bibr" rid="B40">Long et al., 1990</xref>; <xref ref-type="bibr" rid="B69">Santos et al., 2005</xref>), Gaussian process regression (GPR) (<xref ref-type="bibr" rid="B19">Cui and Fearn, 2017</xref>; <xref ref-type="bibr" rid="B43">Malek et al., 2018</xref>), support vector machines (SVMs) (<xref ref-type="bibr" rid="B19">Cui and Fearn, 2017</xref>), k-nearest neighbor (kNN) (<xref ref-type="bibr" rid="B82">Wang et al., 2023</xref>) or convolutional neural networks (CNNs) (<xref ref-type="bibr" rid="B1">Acquarelli et al., 2017</xref>; <xref ref-type="bibr" rid="B9">Bjerrum et al., 2017</xref>; <xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>; <xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>; <xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>; <xref ref-type="bibr" rid="B62">Rolinger et al., 2021</xref>; <xref ref-type="bibr" rid="B82">Wang et al., 2023</xref>). 
In addition to increased accuracy, ML models were found to reduce the amount of preprocessing needed prior to spectral modeling (<xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>; <xref ref-type="bibr" rid="B62">Rolinger et al., 2021</xref>; <xref ref-type="bibr" rid="B78">Tulsyan et al., 2021</xref>; <xref ref-type="bibr" rid="B71">Schiemer et al., 2023</xref>) and to increase robustness against variability in the data (<xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>; <xref ref-type="bibr" rid="B88">Yuanyuan and Zhibin, 2018</xref>). Major obstacles to successfully deploying these models for process monitoring in biopharmaceutical operations are the amount of data required for model calibration (<xref ref-type="bibr" rid="B77">Tulsyan et al., 2019</xref>; <xref ref-type="bibr" rid="B6">Banner et al., 2021</xref>), the high number of hyperparameters (<xref ref-type="bibr" rid="B56">Passos and Mishra, 2022</xref>), and the need for universally applicable diagnostic tools to reduce the black-box character of these models (<xref ref-type="bibr" rid="B14">Burkart and Huber, 2021</xref>).</p>
<p>In other branches of ML, where data is more abundantly available, nonlinear methods are state of the art in many applications. Major advances have been made in natural language processing and image analysis by using generative techniques such as data augmentation to further increase the amount and variability of data for building models (<xref ref-type="bibr" rid="B73">Shorten and Khoshgoftaar, 2019</xref>; <xref ref-type="bibr" rid="B24">Feng et al., 2021</xref>). <xref ref-type="bibr" rid="B9">Bjerrum et al. (2017)</xref> first introduced a data augmentation method for chemometric CNN models based on simple mathematical modifications of the underlying spectral data to induce artificial offset or slope effects and wavelength shifts. This method was generalized by <xref ref-type="bibr" rid="B10">Blazhko et al. (2021)</xref> using the theory of extended multiplicative scatter correction. Both approaches solely address variations in the spectral domain and do not extract component-specific information for augmenting experimental data. Other ML approaches have been tested using generative adversarial networks (GANs) (<xref ref-type="bibr" rid="B86">Wu et al., 2021</xref>; <xref ref-type="bibr" rid="B48">Mishra and Herrmann, 2021</xref>; <xref ref-type="bibr" rid="B46">McHardy et al., 2023</xref>) or variational autoencoders (VAEs) (<xref ref-type="bibr" rid="B31">Guo et al., 2020</xref>), where the input data are projected onto so-called latent structures before they are recombined into <italic>in silico</italic> representations. Both GANs and VAEs involve neural network structures and hence increase the overall complexity of the approach due to additional hyperparameters.
Alternatively, the feature dimension of the experimental data may be extended by stacking the outputs of multiple preprocessing methods (<xref ref-type="bibr" rid="B52">Mishra and Passos, 2021d</xref>; <xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>); however, this approach does not address the limited number of samples.</p>
<p>Finding the right architecture for the underlying problem and tuning the hyperparameters remain challenging and laborious tasks due to a high-dimensional search space and long computation times compared to linear methods (<xref ref-type="bibr" rid="B25">Feurer and Hutter, 2019</xref>). While several authors have proposed rather complex architectures with a large number of trainable parameters for their chemometric CNNs (<xref ref-type="bibr" rid="B9">Bjerrum et al., 2017</xref>; <xref ref-type="bibr" rid="B39">Liu et al., 2017</xref>; <xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>), others chose simple architectures employing only one convolutional layer to maintain interpretability (<xref ref-type="bibr" rid="B1">Acquarelli et al., 2017</xref>; <xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>). Automating the process of architecture search and hyperparameter tuning, which is commonly referred to as hyperparameter optimization (HPO), reduces the amount of manual work needed to build ML models and helps to identify the best overall configuration. Model-based HPO methods such as Bayesian optimization have been shown to be more efficient at finding the global optimum for computer vision (<xref ref-type="bibr" rid="B8">Bergstra et al., 2013</xref>) and chemometrics (<xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>, <xref ref-type="bibr" rid="B56">2022</xref>; <xref ref-type="bibr" rid="B62">Rolinger et al., 2021</xref>) compared to randomized or grid-based approaches.</p>
<p>While linear methods such as PLS are well understood and many evaluation metrics exist to assess model quality, ML models are often considered black boxes due to their larger number of parameters and different mathematical principles. For CNNs, various visualization methods exist to understand the trained convolutions and the corresponding feature importance (<xref ref-type="bibr" rid="B89">Zeiler and Fergus, 2013</xref>; <xref ref-type="bibr" rid="B87">Yosinski et al., 2015</xref>). Gradient-weighted class activation maps (GradCAMs) as proposed in <xref ref-type="bibr" rid="B72">Selvaraju et al. (2020)</xref> have already been applied to chemometrics (<xref ref-type="bibr" rid="B50">Mishra and Passos, 2021b</xref>; <xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>) to provide quantitative insights into the contributions of specific wavelengths. However, GradCAMs are not directly comparable to conventional evaluation metrics for PLS models such as regression coefficients or other PLS-specific importance metrics. Additive feature attribution methods such as Shapley additive explanations (SHAP) (<xref ref-type="bibr" rid="B41">Lundberg and Lee, 2017</xref>) or Shapley additive global importance (SAGE) (<xref ref-type="bibr" rid="B17">Covert et al., 2020a</xref>; <xref ref-type="bibr" rid="B18">Covert et al., 2020b</xref>) provide model-agnostic frameworks to compute quantitative feature importance based on multivariate permutations.</p>
<p>In this manuscript, we develop and implement a novel data augmentation method for generating synthetic spectral data based on the local estimation of the pure component profiles. We further establish a holistic modeling workflow for chemometric data considering data augmentation, HPO, and interpretation. The herein calibrated CNN models are evaluated using three different data sets from protein chromatography employing UV/Vis spectroscopy as well as one publicly available data set using IR spectroscopy. Firstly, the suitability of the proposed data augmentation method to enlarge small experimental data sets is demonstrated and a systematic tuning of the method is performed. Secondly, the optimal configuration of the CNN model is determined by automated HPO. Thirdly, we assess the interpretability of the optimized models by quantification of the importance of individual wavelengths. Finally, the robustness and transferability of the optimized CNNs are studied by <italic>in silico</italic> perturbations and model transfer to external data sets.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>2 Materials and methods</title>
<p>The evaluation of chemometric CNNs in this manuscript involves multiple steps which are performed on the basis of four data sets. <xref ref-type="fig" rid="F1">Figure 1</xref> provides an illustrative overview of the individual steps and the data sets used within each step. In this section, the methodology for the individual steps is explained in detail.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Illustration of the study design. The study can largely be divided into five steps: 1) data augmentation, 2) parameter study, 3) automated HPO, 4) model interpretability assessment, and 5) robustness and transferability evaluation. The round boxes in the bottom left corners indicate which data set was used for which step. Data set 1 consists of three model proteins, namely Rib A, Cyt C, and Lys. Data sets 2 and 3 comprise mAb size variants, and data set 4 stems from samples of corn.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g001.tif"/>
</fig>
<sec id="s2-1">
<title>2.1 Data and equipment</title>
<p>In this study, four data sets based on spectroscopic data were used to evaluate the methods presented herein. Data sets 1&#x2013;3 originate from chromatography experiments whose experimental details were presented elsewhere (<xref ref-type="bibr" rid="B11">Brestrich et al., 2016</xref>; <xref ref-type="bibr" rid="B12">Brestrich et al., 2018</xref>). Data set 4 was presented in (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>). A summary of the experimental conditions and the data subsets reserved for training and testing of the developed models is given in <xref ref-type="sec" rid="s11">Supplementary Table S1</xref>. For data sets 1&#x2013;3, the training and test sets were chosen as presented in the referenced literature, where the rationale was to evaluate the trained models on independent chromatography experiments with varying process conditions. For data set 4, a random split was used as no additional information about the underlying experiments was available. In the following, we refer to all data points in a data set as samples. One sample consists of an absorption spectrum and the corresponding concentration values obtained by fraction analytics. In data sets 1&#x2013;3, all samples stem from fractions of chromatography elution peaks.</p>
<sec id="s2-1-1">
<title>2.1.1 Data set 1</title>
<p>Experimental procedures for data set 1 can be found in (<xref ref-type="bibr" rid="B11">Brestrich et al., 2016</xref>). The data set consists of 233 samples stemming from five chromatography experiments with varying elution conditions. From each experiment, fractions were collected and analyzed for the concentrations of the three protein components ribonuclease A (Rib A), cytochrome C (Cyt C), and lysozyme (Lys). The experiments were monitored by UV/Vis spectroscopy using a wavelength range of 240&#xa0;nm&#x2013;300&#xa0;nm at a resolution of 1&#xa0;nm resulting in 61 features for regression modeling. This data set involves well-studied model proteins and therefore serves for method development within this study.</p>
</sec>
<sec id="s2-1-2">
<title>2.1.2 Data set 2</title>
<p>Data set 2 consists of 432 samples stemming from four chromatography experiments with varying elution conditions. The experimental procedures for the data set can be found in (<xref ref-type="bibr" rid="B12">Brestrich et al., 2018</xref>). The concentrations of monoclonal antibody (mAb) monomers and aggregates were obtained by fraction analytics. For UV/Vis monitoring, a variable path length spectrometer was used operating at a wavelength range of 240&#xa0;nm&#x2013;340&#xa0;nm at a resolution of 2&#xa0;nm, resulting in 51 features for regression modeling.</p>
</sec>
<sec id="s2-1-3">
<title>2.1.3 Data set 3</title>
<p>Experimental procedures for data set 3 can be found in (<xref ref-type="bibr" rid="B11">Brestrich et al., 2016</xref>). The data set consists of 348 samples stemming from three chromatography experiments with varying elution conditions. The concentrations of the mAb size variants low molecular weight species (LMWS), monomers, high molecular weight species (HMWS)1, and HMWS2 were obtained by fraction analytics. The experiments were monitored by UV/Vis spectroscopy using a wavelength range of 240&#xa0;nm&#x2013;300&#xa0;nm at a resolution of 1&#xa0;nm, resulting in 61 features for regression modeling.</p>
</sec>
<sec id="s2-1-4">
<title>2.1.4 Data set 4</title>
<p>Data set 4 was obtained from (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>) and consists of a total of 80 samples from IR spectroscopy. The data were obtained by analyzing corn samples, and the contents of oil, protein, and starch were given as reference values. The spectral range is 1,100&#xa0;cm<sup>&#x2212;1</sup> to 2,500&#xa0;cm<sup>&#x2212;1</sup> at a resolution of 2&#xa0;cm<sup>&#x2212;1</sup>, resulting in 701 features for regression modeling. The training and test subsets were assigned using a randomized 80:20 split.</p>
</sec>
<sec id="s2-1-5">
<title>2.1.5 Hardware and software</title>
<p>Data analysis was done in Python 3.8. Data augmentation was performed using <italic>numpy</italic> (v. 1.19.5), <italic>scikit-learn</italic> (v. 1.1.1) and <italic>scipy</italic> (v. 1.7.3). CNNs were implemented in <italic>tensorflow</italic> (v. 2.5.0). HPO was done in <italic>optuna</italic> (v. 3.1.0) in connection with a MySQL<sup>TM</sup> 8.0 database and <italic>PyMySQL</italic> (v. 1.0.2). SHAP values were computed using <italic>shap</italic> (v. 0.41.0). All computations were done on a workstation equipped with an AMD Ryzen 9 3900X 12-core processor and 32&#xa0;GB of memory running Microsoft Windows 10.</p>
</sec>
</sec>
<sec id="s2-2">
<title>2.2 Data augmentation</title>
<p>Before the data augmentation method is described mathematically, its motivation is laid out. In spectroscopy, each molecule is considered to possess a unique spectrum characterized by well-defined extinction coefficients. However, in practical scenarios, various factors such as detector saturation, noise, wavelength shifts, or interfering buffer species can influence the observed absorption spectra. The proposed data augmentation method aims to incorporate these effects by local approximations of the pure component profiles. The method can largely be divided into three consecutive steps: 1) concentration density approximation, 2) subset selection, and 3) spectra generation. An illustrative overview of the data augmentation method is presented in <xref ref-type="fig" rid="F2">Figure 2</xref>. Mathematically, a given data set can be described as <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for <italic>i</italic> &#x2208; [1, <italic>M</italic>] with <italic>M</italic> being the total number of samples and <inline-formula id="inf2">
<mml:math id="m2">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and <inline-formula id="inf3">
<mml:math id="m3">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> being absorbance spectra with <italic>N</italic> wavelengths and concentrations of <italic>P</italic> components, respectively. First, the value distribution in <bold>Y</bold> is approximated for each column using a kernel density estimate as implemented in <italic>scipy.stats.gaussian_kde</italic>. From the approximated distribution, a random concentration vector <inline-formula id="inf4">
<mml:math id="m4">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is sampled. The distance between the sampled vector <inline-formula id="inf5">
<mml:math id="m5">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and all instances in <bold>Y</bold> is then computed according to Eq. <xref ref-type="disp-formula" rid="e1">(1)</xref>
<disp-formula id="e1">
<mml:math id="m6">
<mml:mi>d</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="normal">f</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(1)</label>
</disp-formula>with <italic>l</italic> being the order of the vector norm. In the second step, the <italic>n</italic>
<sub>LSA</sub> samples with the smallest distances <inline-formula id="inf6">
<mml:math id="m7">
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are selected to form a local subset of available samples with matrices <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. In the third step, pure-component profiles are estimated based on these local subsets by solving the linear problem as given by<disp-formula id="e2">
<mml:math id="m9">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf8">
<mml:math id="m10">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> are the estimated pure-component profiles for the local subsets <inline-formula id="inf9">
<mml:math id="m11">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. Eq. <xref ref-type="disp-formula" rid="e2">(2)</xref> is solved either by an ordinary least-squares solver or by a non-negative least-squares (NNLS) solver, which constrains the solutions to non-negative values, as implemented in <italic>numpy</italic> and <italic>scipy</italic>, respectively. Finally, an <italic>in silico</italic> spectrum <bold>x</bold>
<sub>&#x2a;</sub> is calculated by Eq. <xref ref-type="disp-formula" rid="e3">(3)</xref>
<disp-formula id="e3">
<mml:math id="m12">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
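<p>The neighbour selection in the second step can be sketched as follows. This is a minimal illustration and not the authors&#x2019; implementation: it assumes that the distance in Eq. (1) is the <italic>l</italic>-norm of the difference between the sampled and an observed concentration vector, and the function names <italic>lnorm_distance</italic> and <italic>select_local_subset</italic> as well as the toy data are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def lnorm_distance(a, b, l=2):
    """l-norm of the difference between two concentration vectors."""
    return sum(abs(u - v) ** l for u, v in zip(a, b)) ** (1.0 / l)


def select_local_subset(y_star, Y, X, n_lsa, l=2):
    """Return the n_lsa samples whose concentrations are closest to y_star.

    Y holds one concentration vector per sample, X the matching spectra.
    """
    # Rank all sample indices by their distance to the sampled vector.
    ranked = sorted(range(len(Y)), key=lambda i: lnorm_distance(y_star, Y[i], l))
    nearest = ranked[:n_lsa]
    X_local = [X[i] for i in nearest]
    Y_local = [Y[i] for i in nearest]
    return nearest, X_local, Y_local


# Hypothetical toy data: four samples, two components, three wavelengths.
Y = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.1, 0.1]]
X = [[0.0, 0.0, 0.0], [0.5, 1.0, 0.5], [1.0, 2.0, 1.0], [0.05, 0.1, 0.05]]
nearest, X_local, Y_local = select_local_subset([0.0, 0.05], Y, X, n_lsa=2)
```
]]></preformat>
<p>The returned local matrices would then enter the least-squares problem of Eq. (2). The sketch deliberately stops before that step because the solver choice is itself a hyperparameter of the method.</p>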
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Visualization of the LSA data augmentation method. The method is based on experimentally derived response data <bold>(A)</bold> and spectral data <bold>(D)</bold>. First, the response data distributions are approximated by kernel density functions <bold>(B)</bold>. Second, for each sampled concentration vector for which a new spectrum is generated, the vector distances to all instances in the supplied experimental data are computed. The <italic>n</italic>
<sub>LSA</sub> nearest samples, i.e., those with the most similar concentration profiles, are then selected <bold>(C)</bold> to form a local subset that is representative of the sampled concentration vector and serves as the basis for deriving the pure-component spectra <bold>(E)</bold>. Given the sampled concentration vector, the pure-component spectra are combined into a newly generated sum spectrum <bold>(F)</bold>.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g002.tif"/>
</fig>
<p>Assembling a subset of samples with closely similar compositions is motivated by the aim of capturing local differences in the pure-component estimates between experimental data points, e.g., induced by concentration differences or higher noise levels. By focusing on samples that are closely related in composition, we aim to enhance the quality of the <italic>in silico</italic> spectra. As the generation of <italic>in silico</italic> spectra is based on local subsets of the available data, the method is termed local subset augmentation (LSA). A more detailed explanation of the approach can be found in <xref ref-type="sec" rid="s11">Supplementary Section S1</xref>. To add more variation to the synthesized spectrum, Gaussian white noise distributed as <inline-formula id="inf10">
<mml:math id="m13">
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>noise</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is added to each feature and a normally distributed wavelength shift <inline-formula id="inf11">
<mml:math id="m14">
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>shift</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is applied to the entire spectrum <inline-formula id="inf12">
<mml:math id="m15">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. In summary, the LSA method presents several configurable parameters, namely <italic>n</italic>
<sub>LSA</sub>, <italic>l</italic>, <italic>&#x3c3;</italic>
<sub>noise</sub>, <italic>&#x3c3;</italic>
<sub>shift</sub>, and the type of solver employed to derive the pure-component profiles. These hyperparameters were tuned automatically using a cross-validation scheme. The residuals between the <italic>in silico</italic> and the measured spectra were used as the quality metric and summarized by the root mean squared error (RMSE) of reconstruction. The hyperparameters of the LSA method were screened using a grid-based scheme, and the configuration with the lowest cross-validated reconstruction error (RMSECV) was selected. Depending on the data set, the determined local subset size served as an initial estimate and was further refined within the optimization procedure described in <xref ref-type="sec" rid="s2-3-3">Section 2.3.3</xref>.</p>
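<p>The two perturbations applied to a synthesized spectrum can be illustrated with the following sketch. Several details are assumptions made for illustration only: the shift is expressed in units of the wavelength grid spacing, it is realized by linear interpolation with clamping at the spectrum edges, and the function names <italic>shift_spectrum</italic> and <italic>perturb</italic> are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def shift_spectrum(x, shift):
    """Shift a spectrum along the wavelength axis by `shift` grid points.

    Assumption: linear interpolation, with positions clamped at the edges.
    """
    n = len(x)
    shifted = []
    for j in range(n):
        pos = min(max(j - shift, 0.0), n - 1.0)  # clamp to the valid range
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        shifted.append((1.0 - frac) * x[lo] + frac * x[hi])
    return shifted


def perturb(x, sigma_noise, sigma_shift, rng):
    """Apply one random shift to the whole spectrum, then noise per feature."""
    shift = rng.gauss(0.0, sigma_shift)  # a single draw for the entire spectrum
    shifted = shift_spectrum(x, shift)
    return [v + rng.gauss(0.0, sigma_noise) for v in shifted]


rng = random.Random(42)
spectrum = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]  # hypothetical absorbance values
augmented = perturb(spectrum, sigma_noise=0.001, sigma_shift=0.01, rng=rng)
```
]]></preformat>
<p>Drawing the shift once per spectrum and the noise once per feature mirrors the distinction made in the text between a shift of the entire spectrum and noise added to each feature.</p>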
</sec>
<sec id="s2-3">
<title>2.3 Convolutional neural networks</title>
<p>CNNs are multivariate regression models, which may be composed of several convolutional, pooling, fully connected (FC), dropout, and regularization layers. For a comprehensive, theoretical overview of CNNs the reader is referred to pertinent literature (<xref ref-type="bibr" rid="B28">Goodfellow et al., 2016</xref>; <xref ref-type="bibr" rid="B64">Rosebrock, 2018</xref>).</p>
<sec id="s2-3-1">
<title>2.3.1 Neural network architecture</title>
<p>The following design choices in neural network architecture were made based on existing studies (<xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>; <xref ref-type="bibr" rid="B62">Rolinger et al., 2021</xref>; <xref ref-type="bibr" rid="B56">Passos and Mishra, 2022</xref>), while aiming to keep model complexity low in order to reduce computational time during HPO. It is worth noting that the UV/Vis data predominantly used in this study are usually of lower dimension than the Raman or IR spectroscopy data used in other studies, and hence the required model complexity may be lower. CNNs were constructed from 1 to 3 convolutional layers. For each convolutional layer, 1&#x2013;10 convolutional filters with a customized filter width were defined. After the first and second convolutional layers, a maximum pooling layer was implemented using a window size of 2, effectively halving the number of features generated by the previous layer. As the convolutional filter width is constrained by the output dimension of the previous layer, the maximum allowed filter width was adjusted accordingly after each pooling step. After the convolutional block, a flattening operation was implemented to concatenate the outputs from all convolutional filters of the last convolutional layer into a one-dimensional vector. In the regression block, an FC layer with up to 100 units was used. The output layer was configured to use a rectified linear unit (ReLU) activation function to restrict the predictions to non-negative values. As activation functions for the convolutional and FC layers, linear and hyperbolic tangent (tanh) functions were used, respectively. Other activation functions may be chosen individually, and several options were tested within this study. Depending on the chosen architecture, the number of configurable hyperparameters may vary greatly, and hence a standardized workflow for optimization is required.
The base architectures and corresponding hyperparameters for training and data augmentation used for data sets 1 and 2 are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
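<p>How the filter width is bounded by the output dimension of the preceding layer can be made concrete with a short calculation. The sketch below rests on assumptions that are not stated in the text: unpadded (&#x201c;valid&#x201d;) convolutions with a stride of 1, pooling that rounds down, and an input length of 100 features. The function name <italic>flattened_length</italic> is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def flattened_length(n_inputs, filter_widths, n_filters, pool=2, pooled_layers=(0, 1)):
    """Length of the flattened vector that is passed to the FC layer.

    Assumptions: 'valid' convolution with stride 1 and max pooling applied
    after the layers listed in `pooled_layers` (first and second by default).
    """
    length = n_inputs
    for layer, width in enumerate(filter_widths):
        if width > length:
            raise ValueError(
                f"filter width {width} exceeds the {length} features entering layer {layer + 1}"
            )
        length = length - width + 1  # valid convolution shortens the signal
        if layer in pooled_layers:
            length //= pool  # a pooling window of 2 halves the feature count
    # Flattening concatenates the outputs of all filters of the last layer.
    return length * n_filters[-1]


# Two layers with filter widths [3, 9] and [2, 7] filters, 100 input features.
n_flat = flattened_length(100, filter_widths=[3, 9], n_filters=[2, 7])
```
]]></preformat>
<p>Under these assumptions the feature count evolves as 100, 98, 49, 41, 20, so the flattened vector holds 20 &#xd7; 7 = 140 values. A filter wider than the incoming feature count raises an error, which is the constraint a search space has to respect.</p>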
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Overview of hyperparameters used for data augmentation, CNN architecture and training for data sets 1 and 2.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="center">Category</th>
<th rowspan="2" align="left">Hyperparameter</th>
<th rowspan="2" align="left">Base</th>
<th colspan="2" align="center">Data set 1</th>
<th colspan="2" align="center">Data set 2</th>
</tr>
<tr>
<th align="left">Initial</th>
<th align="left">Optimized</th>
<th align="left">Initial</th>
<th align="left">Optimized</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="6" align="left">Data augmentation</td>
<td align="left">Number of generated samples</td>
<td align="left">1e5</td>
<td align="left">1e5</td>
<td align="left">1e5</td>
<td align="left">1e5</td>
<td align="left">1e5</td>
</tr>
<tr>
<td align="left">Local subset size</td>
<td align="left">5</td>
<td align="left">11</td>
<td align="left">11</td>
<td align="left">13</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Distance norm</td>
<td align="left">2</td>
<td align="left">2</td>
<td align="left">2</td>
<td align="left">1</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Solver type</td>
<td align="left">NNLS</td>
<td align="left">NNLS</td>
<td align="left">NNLS</td>
<td align="left">NNLS</td>
<td align="left">NNLS</td>
</tr>
<tr>
<td align="left">Std. white noise</td>
<td align="left">0.001</td>
<td align="left">0.001</td>
<td align="left">0.001</td>
<td align="left">0.001</td>
<td align="left">0.001</td>
</tr>
<tr>
<td align="left">Std. wavelength shift</td>
<td align="left">0.01</td>
<td align="left">0.01</td>
<td align="left">0.01</td>
<td align="left">0.03</td>
<td align="left">0.03</td>
</tr>
<tr>
<td rowspan="10" align="left">Model architecture</td>
<td align="left">Number of convolutional layers</td>
<td align="left">1</td>
<td align="left">1</td>
<td align="left">2 (1&#x2013;3)</td>
<td align="left">1</td>
<td align="left">3 (1&#x2013;3)</td>
</tr>
<tr>
<td align="left">Number of conv. filters</td>
<td align="left">5</td>
<td align="left">5</td>
<td align="left">[2, 7] (1&#x2013;10)</td>
<td align="left">5</td>
<td align="left">[2, 10, 8] (1&#x2013;10)</td>
</tr>
<tr>
<td align="left">Filter width</td>
<td align="left">9</td>
<td align="left">9</td>
<td align="left">[3, 9] (3&#x2013;61)</td>
<td align="left">9</td>
<td align="left">[7, 5, 11] (3&#x2013;51)</td>
</tr>
<tr>
<td align="left">Pooling width</td>
<td align="left">2</td>
<td align="left">2</td>
<td align="left">2</td>
<td align="left">2</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">Number of FC units</td>
<td align="left">12</td>
<td align="left">12</td>
<td align="left">29 (5&#x2013;100)</td>
<td align="left">12</td>
<td align="left">9 (5&#x2013;100)</td>
</tr>
<tr>
<td align="left">Activation function conv. layer</td>
<td align="left">linear</td>
<td align="left">linear</td>
<td align="left">linear</td>
<td align="left">linear</td>
<td align="left">linear</td>
</tr>
<tr>
<td align="left">Activation function FC units</td>
<td align="left">sigmoid</td>
<td align="left">tanh</td>
<td align="left">tanh</td>
<td align="left">tanh</td>
<td align="left">tanh</td>
</tr>
<tr>
<td align="left">Initialization function weights</td>
<td align="left">glorot uniform</td>
<td align="left">random uniform</td>
<td align="left">random uniform</td>
<td align="left">random uniform</td>
<td align="left">random uniform</td>
</tr>
<tr>
<td align="left">Dropout rate</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0 (0&#x2013;0.3)</td>
<td align="left">0</td>
<td align="left">0.07</td>
</tr>
<tr>
<td align="left">Regularization factor</td>
<td align="left">0</td>
<td align="left">0</td>
<td align="left">0 (10<sup>&#x2212;9</sup>&#x2013;10<sup>&#x2212;3</sup>)</td>
<td align="left">0</td>
<td align="left">3.07 &#xd7; 10<sup>&#x2212;9</sup> (10<sup>&#x2212;9</sup>&#x2013;10<sup>&#x2212;3</sup>)</td>
</tr>
<tr>
<td rowspan="4" align="left">Model training</td>
<td align="left">Learning rate</td>
<td align="left">10<sup>&#x2212;3</sup>
</td>
<td align="left">10<sup>&#x2212;3</sup>
</td>
<td align="left">10<sup>&#x2212;3</sup>
</td>
<td align="left">10<sup>&#x2212;3</sup>
</td>
<td align="left">10<sup>&#x2212;3</sup>
</td>
</tr>
<tr>
<td align="left">Batch size</td>
<td align="left">100</td>
<td align="left">100</td>
<td align="left">100</td>
<td align="left">100</td>
<td align="left">100</td>
</tr>
<tr>
<td align="left">Optimizer</td>
<td align="left">Adam</td>
<td align="left">Adam</td>
<td align="left">Adam</td>
<td align="left">Adam</td>
<td align="left">Adam</td>
</tr>
<tr>
<td align="left">Patience</td>
<td align="left">4</td>
<td align="left">4</td>
<td align="left">4</td>
<td align="left">4</td>
<td align="left">4</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The base configuration refers to hyperparameter values used during the single-factor parameter study. Data augmentation parameters were derived from tuning of the LSA method and a simple CNN architecture was assumed. The initial configuration refers to settings adapted after the parameter study and served as a comparison for the optimized models. The optimized configuration refers to values derived from HPO. Here, square brackets list the optimized values for the individual layers where a parameter was determined per layer. Parentheses denote the search spaces during HPO. If no search space is given, the parameter was not included in the optimization. For the convolutional filter width, the maximum allowed filter width was configured to halve with each additional layer due to the interposed pooling layers.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s2-3-2">
<title>2.3.2 Training, cross-validation and testing</title>
<p>In the context of training CNN models, data produced by the LSA method are denoted <italic>in silico</italic> data, while experimental data are split into training and test data as listed in <xref ref-type="sec" rid="s11">Supplementary Table S1</xref>. For the remainder of the manuscript, we will further refer to <italic>calibration</italic> as the process of fitting the CNN model weights and <italic>training</italic> as an entire cycle of generating <italic>in silico</italic> data, fitting the weights, and evaluation based on the training data.</p>
<p>To train a CNN model, the <italic>in silico</italic> subset is generated solely from the training data, and the CNN model is calibrated solely on the <italic>in silico</italic> data. For cross-validation, a similar data setup was used. Here, the experimental training data are rotated in a <italic>leave-one-group-out</italic> scheme, i.e., one of the assigned training experiments is held out in each rotation. During each rotation, the experimental training data are split into rotation-specific training and test sets. Again, the <italic>in silico</italic> data are generated from the rotation-specific training data and are then used to calibrate the CNN model. In both cases, training and cross-validation, the assigned training data serve to evaluate the stopping criteria, i.e., they act as the validation set. The CNN models were calibrated for a maximum of 100 epochs using the mean squared error (MSE) of all responses as the loss function and the stochastic gradient-based optimizer Adam (<xref ref-type="bibr" rid="B35">Kingma and Ba, 2015</xref>). For all parameter studies and HPO, the number of generated samples was set to 10<sup>5</sup>, and model calibration was stopped when the loss on the assigned validation set did not improve for 4 consecutive epochs, a threshold hereafter referred to as <italic>patience</italic>. Finally, the CNN models were evaluated on the independent test set, which was not used for <italic>in silico</italic> data generation or cross-validation.</p>
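<p>The rotation scheme and the patience-based stopping rule can be sketched in a few lines. The code is a simplified illustration: the group labels stand for the assigned training experiments, the loss values are invented, and the helper names <italic>leave_one_group_out</italic> and <italic>stopping_epoch</italic> are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def leave_one_group_out(groups):
    """Yield (held-out group, training indices, test indices) per rotation."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def stopping_epoch(val_losses, patience=4):
    """Return the 1-based epoch at which calibration would stop.

    Calibration stops once the loss has not improved for `patience`
    consecutive epochs, or when the supplied losses are exhausted.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical example: four samples originating from three experiments.
rotations = list(leave_one_group_out(["a", "a", "b", "c"]))
stopped_at = stopping_epoch([5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 1.0], patience=4)
```
]]></preformat>
<p>In the example, the loss last improves in epoch 3, so calibration stops after epoch 7 and the lower loss in epoch 8 is never reached. This is the trade-off that a larger patience value relaxes.</p>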
<p>To study the effect of hyperparameters associated with data generation, CNN architecture, or training on model performance, hyperparameters were varied in a one-factor-at-a-time scheme while all other parameters were held constant. Model performance was measured using the cross-validation error RMSECV across all response variables, and the optimal settings were adopted as the base configuration for subsequent HPO. This parameter study was conducted with data set 1 only. The findings for the initial configuration were then also used for HPO for data set 2.</p>
</sec>
<sec id="s2-3-3">
<title>2.3.3 Hyperparameter optimization</title>
<p>HPO routines were implemented in <italic>optuna</italic> for data sets 1 and 2. For both data sets, a combination of a random sampler and a tree-structured Parzen estimator (TPE)-based sampler was used. Hyperparameters were sampled randomly within pre-configured ranges for the first 100 trials, after which the optimizer switched to the TPE. The TPE can optimize continuous, discrete, and categorical variables simultaneously using a Bayesian approach based on kernel density estimation. For the technical details and the theory of the method, the reader is referred to <xref ref-type="bibr" rid="B8">Bergstra et al. (2013)</xref> and <xref ref-type="bibr" rid="B3">Akiba et al. (2019)</xref>. The sum of the component-specific cross-validated coefficients of determination <inline-formula id="inf13">
<mml:math id="m16">
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>CV,&#x2009;i</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> was used as the objective value. Automated pruning of unpromising hyperparameter combinations was configured to set in after 100 trials and was triggered when <inline-formula id="inf14">
<mml:math id="m17">
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>train,&#x2009;i</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> was lower than the median of all previously reported trials. This effectively reduces the computation time, as cross-validation does not need to be performed for the pruned trials. The search spaces for the optimizer were determined based on previously conducted individual parameter studies and are listed in <xref ref-type="table" rid="T1">Table 1</xref>. A MySQL<sup>TM</sup> database was used to facilitate distributed computation and thereby accelerate HPO. In total, the optimization was run for 500 trials. However, the effective number of completed trials differs due to automated pruning.</p>
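<p>The objective value and the pruning rule described above can be written down compactly. The following sketch is independent of <italic>optuna</italic>, which provides its own median-based pruner, and only mirrors the rule as described in the text. The function names and the example scores are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import median


def r_squared(y_obs, y_pred):
    """Coefficient of determination for a single response variable."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def summed_r_squared(Y_obs, Y_pred):
    """Sum of the component-specific R2 values (one series per component)."""
    return sum(r_squared(obs, pred) for obs, pred in zip(Y_obs, Y_pred))


def should_prune(train_score, previous_scores, n_startup=100):
    """Prune a trial whose training score falls below the running median.

    Pruning only sets in once `n_startup` trials have been reported.
    """
    if len(previous_scores) < n_startup:
        return False
    return train_score < median(previous_scores)


# Hypothetical two-component example: one perfect and one mean-only prediction.
score = summed_r_squared(
    [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
    [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]],
)
```
]]></preformat>
<p>Because the rule is evaluated on the training score, a pruned trial never enters the comparatively expensive cross-validation loop.</p>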
<p>Among the top 5 models, the best candidate was selected based on quantitative metrics such as training and cross-validation performance as well as qualitative metrics such as model complexity. The selected model was retrained for 10 repetitions using a modified patience of 10. The obtained performance metrics were compared with optimized PLS models as measured by the normalized error <inline-formula id="inf15">
<mml:math id="m18">
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> with <inline-formula id="inf16">
<mml:math id="m19">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> being the arithmetic mean of the observed concentrations for the respective data subset. As a second baseline comparison, the optimized CNN models were trained without any prior data augmentation. For this purpose, the models were trained for 300 epochs with the early stopping criteria disabled. The training data were randomly divided into calibration and validation subsets at a ratio of 80:20 to determine after which epoch the best performance was achieved.</p>
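<p>For reference, the normalized error can be computed as in the short sketch below. The function names <italic>rmse</italic> and <italic>nrmse</italic> are hypothetical, and the sketch assumes a single response variable with a non-zero mean concentration.</p>
<preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def nrmse(y_obs, y_pred):
    """RMSE divided by the arithmetic mean of the observed values."""
    mean_obs = sum(y_obs) / len(y_obs)
    return rmse(y_obs, y_pred) / mean_obs


# Hypothetical example: both predictions are off by one concentration unit.
example = nrmse([1.0, 3.0], [2.0, 2.0])
```
]]></preformat>
<p>Normalizing by the mean makes errors comparable between components that occur at very different concentration levels.</p>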
</sec>
</sec>
<sec id="s2-4">
<title>2.4 Partial least squares regression</title>
<p>PLS models were implemented in <italic>scikit-learn</italic> using the non-linear iterative partial least squares (NIPALS) algorithm. While the CNNs were used as multi-response models, i.e., predicting all target species with the same model, single-response PLS models were used for each component. Spectral data were preprocessed using a Savitzky-Golay filter (SGF) and mean-centered. HPO for PLS models was performed using a grid-based scheme. For this purpose, the number of PLS components (1&#x2013;10), the order of derivative (0&#x2013;2), and the width of the smoothing window of the SGF (3&#x2013;31) were varied within the pre-configured ranges stated in parentheses. The SGF was used with a second-degree polynomial. The optimal configuration was chosen using the cross-validated and scaled sum of squared errors SSECV<sub>scaled</sub> according to <xref ref-type="bibr" rid="B85">Wold et al. (2001)</xref> as given by Eq. <xref ref-type="disp-formula" rid="e4">(4)</xref>
<disp-formula id="e4">
<mml:math id="m20">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mi mathvariant="normal">V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>scaled</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mstyle>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>PLS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf17">
<mml:math id="m21">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and <italic>y</italic>
<sub>
<italic>i</italic>
</sub> denote the predicted and observed response values for a sample <italic>i</italic>, respectively, and <italic>n</italic>
<sub>PLS</sub> designates the number of PLS components.</p>
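<p>Eq. (4) translates directly into code. The sketch below is illustrative only: the function name <italic>ssecv_scaled</italic> is hypothetical, and the explicit check on the denominator is an addition made here to guard against configurations with too many PLS components for the available samples.</p>
<preformat preformat-type="code"><![CDATA[
```python
def ssecv_scaled(y_obs, y_pred, n_pls):
    """Cross-validated sum of squared errors scaled by M - n_PLS - 1."""
    m = len(y_obs)
    dof = m - n_pls - 1
    if dof <= 0:
        raise ValueError("need more samples than n_pls + 1")
    sse = sum((p - o) ** 2 for o, p in zip(y_obs, y_pred))
    return sse / dof


# Hypothetical example: five cross-validated predictions, two PLS components.
value = ssecv_scaled([1.0, 2.0, 3.0, 4.0, 7.0], [1.0, 2.0, 3.0, 4.0, 5.0], n_pls=2)
```
]]></preformat>
<p>Since the denominator shrinks as the number of PLS components grows, a more complex model must reduce the squared error by a larger margin to be preferred.</p>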
</sec>
<sec id="s2-5">
<title>2.5 Feature importance</title>
<p>To quantitatively evaluate the importance of individual wavelengths, GradCAM and SHAP were employed. While GradCAM can only be applied to the CNNs, SHAP is model-agnostic and can therefore be used to directly compare CNN and PLS models. For the PLS models, the regression coefficients and variable importance in projection (VIP) scores were used as evaluation metrics. Feature importance techniques were applied only to the optimized models from HPO for data sets 1 and 2.</p>
<sec id="s2-5-1">
<title>2.5.1 Gradient-weighted class activation maps</title>
<p>Guided GradCAM is a response-discriminative localization technique which was proposed by <xref ref-type="bibr" rid="B72">Selvaraju et al. (2020)</xref> and was implemented according to <xref ref-type="bibr" rid="B64">Rosebrock (2018)</xref>. GradCAM can be broadly divided into three steps: 1) computation of the gradients of each response variable with respect to the feature maps of the last convolutional layer for one specific input spectrum, 2) global average pooling of the computed gradients along the wavelength dimension to obtain a single weighting value for each filter in the last convolutional layer, and 3) computation of the GradCAM estimate. In cases where pooling layers are used between convolutional layers, the localization estimate is of reduced dimension compared to the original input spectrum and is thus linearly interpolated to match the original dimension.</p>
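<p>Steps 2 and 3, together with the final interpolation, can be sketched for a one-dimensional signal as follows. The gradients from step 1 are taken as given, since they depend on the deep learning framework in use. The rectification of the weighted sum follows the original GradCAM formulation and is an assumption with respect to this study. All function names and numbers are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def interpolate(values, n_out):
    """Linearly interpolate a sequence to n_out equally spaced points."""
    n_in = len(values)
    if n_in == 1 or n_out == 1:
        return [values[0]] * n_out
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def grad_cam_1d(activations, gradients, n_inputs):
    """GradCAM estimate for one response variable and one input spectrum.

    activations, gradients: one list per filter of the last convolutional
    layer, each holding one value per position along the wavelength axis.
    """
    # Step 2: global average pooling of the gradients gives one weight per filter.
    weights = [sum(g) / len(g) for g in gradients]
    # Step 3: weighted sum of the feature maps, rectified at zero.
    n_pos = len(activations[0])
    cam = [
        max(0.0, sum(w * a[j] for w, a in zip(weights, activations)))
        for j in range(n_pos)
    ]
    # Match the dimension of the original input spectrum.
    return interpolate(cam, n_inputs)


cam = grad_cam_1d(
    activations=[[1.0, 2.0], [1.0, 1.0]],
    gradients=[[1.0, 3.0], [-1.0, -1.0]],
    n_inputs=3,
)
```
]]></preformat>
<p>In the example, the pooled weights are 2 and &#x2212;1, the estimate on the reduced grid is [1, 3], and interpolation to three input positions yields [1, 2, 3].</p>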
</sec>
<sec id="s2-5-2">
<title>2.5.2 Shapley additive explanations</title>
<p>SHAP is a model-agnostic additive feature attribution method derived from economic game theory and can be used to quantify feature importances, which are in turn referred to as SHAP values. To compute a SHAP value for a specific wavelength, the absorbance values in a given data set are randomly permuted and replaced by absorbance values sampled from a conditional distribution. Every sample in the given data set is permuted <italic>d</italic> times and passed through the regression model to obtain the model prediction for the permuted input spectrum. The permuted model prediction <italic>w</italic>(<italic>S</italic>) is compared to the prediction using the original data <italic>w</italic>(<italic>S</italic> &#x222a; {<italic>i</italic>}). The SHAP value <italic>&#x3d5;</italic>
<sub>
<italic>i</italic>
</sub>(<italic>w</italic>) for a feature <italic>i</italic> and a permutation cycle <italic>w</italic> with <italic>d</italic> permutations (<italic>w</italic>
<sub>1</sub>, &#x2026;, <italic>w</italic>
<sub>
<italic>d</italic>
</sub>) is then defined according to <xref ref-type="bibr" rid="B41">Lundberg and Lee (2017)</xref> as given by Eq. <xref ref-type="disp-formula" rid="e5">(5)</xref>
<disp-formula id="e5">
<mml:math id="m22">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>&#x2286;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>&#x5c;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mtable class="matrix">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mi>d</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>&#x222a;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>w</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>D</italic> and <italic>S</italic> denote the entire feature set and the permuted feature subset, respectively. To account for inter-dependencies due to collinearity between multiple wavelengths, conditional sampling is performed for multiple wavelengths at the same time and repeated for a fixed number of permutations. For a more detailed overview of the theory and the differences from closely related methods, the reader is referred to the literature (<xref ref-type="bibr" rid="B41">Lundberg and Lee, 2017</xref>; <xref ref-type="bibr" rid="B18">Covert et al., 2020b</xref>; <xref ref-type="bibr" rid="B17">Covert et al., 2020a</xref>; <xref ref-type="bibr" rid="B7">Belle and Papantonis, 2021</xref>). To compute SHAP values within this study, the test subsets for data sets 1 and 2 were used to obtain permuted input spectra. All computations were performed as implemented in <italic>shap</italic> using the <italic>PermutationExplainer</italic>. In total, 10<sup>5</sup> permutations per input spectrum were used.</p>
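<p>For a small number of features, Eq. (5) can be evaluated exactly by enumerating all subsets, which may help in reading the formula. The sketch below takes <italic>d</italic> to be the number of features in <italic>D</italic>, which is an interpretation made here for illustration. It is not the sampling-based approximation of the <italic>PermutationExplainer</italic>, and the additive toy model as well as the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import comb


def shapley_value(i, features, w):
    """Exact evaluation of Eq. (5) for feature i by subset enumeration.

    w maps a frozenset of features to a model output. The cost grows
    exponentially with the number of features.
    """
    d = len(features)
    others = [f for f in features if f != i]
    total = 0.0
    for size in range(d):  # subset sizes 0 .. d - 1
        for subset in combinations(others, size):
            s = frozenset(subset)
            marginal = w(s | {i}) - w(s)  # effect of adding feature i to S
            total += marginal / comb(d - 1, size)
    return total / d


# Hypothetical additive model: the output is the sum of the feature weights.
weights = {0: 1.0, 1: 2.0, 2: 3.0}


def additive_model(subset):
    return sum(weights[f] for f in subset)


features = list(weights)
phi = [shapley_value(i, features, additive_model) for i in features]
```
]]></preformat>
<p>For an additive model each value equals the weight of the respective feature, and the values sum to the difference between the output for the full feature set and the output for the empty set. With several hundred wavelengths the enumeration is infeasible, which is why sampling-based estimators are used in practice.</p>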
</sec>
<sec id="s2-5-3">
<title>2.5.3 Variable importance in projection</title>
<p>VIP scores are, alongside the regression coefficients, a common metric to assess variable importance in PLS models. According to <xref ref-type="bibr" rid="B47">Mehmood et al. (2012)</xref>, the VIP score <italic>v</italic>
<sub>
<italic>j</italic>
</sub> for a wavelength <italic>j</italic> &#x2208; [1, <italic>N</italic>] is defined as Eq. <xref ref-type="disp-formula" rid="e6">(6)</xref>
<disp-formula id="e6">
<mml:math id="m23">
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:munderover>
</mml:mstyle>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>/</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:munderover>
</mml:mstyle>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>w</italic>
<sub>
<italic>a</italic>
</sub>, <italic>q</italic>
<sub>
<italic>a</italic>
</sub>, and <italic>t</italic>
<sub>
<italic>a</italic>
</sub> denote the loading weights, the y-loadings and the scores vector corresponding to the PLS component <italic>a</italic> &#x2208; [1, <italic>A</italic>], respectively. The total number of wavelengths is given by <italic>N</italic>.</p>
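<p>As an illustration, Eq. <xref ref-type="disp-formula" rid="e6">(6)</xref> can be evaluated with a few lines of NumPy. The following is a minimal sketch and not the implementation used in this study; the function name vip_scores and the assumed array layout (loading weights of shape <italic>N</italic> &#xd7; <italic>A</italic>, scores of shape samples &#xd7; <italic>A</italic>, y-loadings of length <italic>A</italic>) are hypothetical choices made for the example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def vip_scores(W, T, q):
    """VIP score per wavelength for a single-response PLS model, following Eq. (6).

    Assumed (hypothetical) array layout:
      W: (N, A) loading weights, one column w_a per component
      T: (n_samples, A) scores, one column t_a per component
      q: (A,) y-loadings
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    q = np.asarray(q, dtype=float).ravel()
    n_wavelengths = W.shape[0]

    # q_a^2 * t_a^T t_a for each component a
    ss = q ** 2 * np.sum(T ** 2, axis=0)
    # (w_aj / ||w_a||)^2 for each wavelength j and component a
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    # weighted sum over components, normalized by the total, scaled by N
    return np.sqrt(n_wavelengths * (w_norm_sq @ ss) / ss.sum())
```
]]></preformat>
<p>Because each normalized weight vector has unit length, the squared VIP scores returned by this sketch sum to <italic>N</italic>, i.e., their mean is 1, which is why wavelengths with scores above 1 are often read as having above-average importance.</p>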
</sec>
</sec>
<sec id="s2-6">
<title>2.6 Robustness and transferability</title>
<p>The robustness and transferability of the LSA method and the optimized CNN were evaluated by an <italic>in silico</italic> noise perturbation study and a model transfer to two external data sets based on UV/Vis and IR spectroscopy.</p>
<sec id="s2-6-1">
<title>2.6.1 <italic>In silico</italic> noise perturbation</title>
<p>To assess how well the models compensate for increasingly noisy data, both model types were evaluated on modified generated data sets with an increasing level of white noise <inline-formula id="inf18">
<mml:math id="m24">
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>noise</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with <italic>&#x3c3;</italic>
<sub>noise</sub> of {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}&#x2009;mAU and an increasing level of axial wavelength shifts <inline-formula id="inf19">
<mml:math id="m25">
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>shift</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with <italic>&#x3c3;</italic>
<sub>shift</sub> of {0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3}&#x2009;nm. We distinguish between models trained solely on the noise-free data set and models retrained at each noise level. The latter models are referred to as &#x201c;retrained.&#x201d;</p>
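<p>The two perturbations can be sketched as follows. This is a minimal example under stated assumptions and not the code used in this study: the second parameter of the normal distribution is treated as a standard deviation, one wavelength shift is drawn per spectrum, and the shifted spectrum is obtained by linear interpolation. The function name perturb_spectra and its arguments are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def perturb_spectra(spectra, wavelengths, sigma_noise=0.0, sigma_shift=0.0, seed=None):
    """Apply a random axial wavelength shift and additive white noise.

    Assumptions (not taken from the paper): sigma_* are standard deviations,
    one shift is drawn per spectrum, and shifting uses linear interpolation.
      spectra:     (n_samples, n_wavelengths) absorbances, e.g. in mAU
      wavelengths: (n_wavelengths,) increasing wavelength axis, e.g. in nm
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)

    shifts = rng.normal(0.0, sigma_shift, size=spectra.shape[0])
    shifted = np.empty_like(spectra)
    for i, (spectrum, shift) in enumerate(zip(spectra, shifts)):
        # Sampling at (wavelength - shift) moves spectral features by +shift;
        # np.interp holds the edge values constant outside the measured range.
        shifted[i] = np.interp(wavelengths - shift, wavelengths, spectrum)

    # Additive white noise, drawn independently for every data point
    return shifted + rng.normal(0.0, sigma_noise, size=shifted.shape)
```
]]></preformat>
<p>Looping this function over the listed values of <italic>&#x3c3;</italic><sub>noise</sub> and <italic>&#x3c3;</italic><sub>shift</sub> would yield one perturbed data set per level, on which a model can either be evaluated directly or retrained.</p>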
</sec>
<sec id="s2-6-2">
<title>2.6.2 Model transfer to data set 3</title>
<p>To evaluate model performance on a data set without additional HPO, the optimized model from data set 2 was transferred to data set 3. The CNN input and output dimensions were adjusted according to the experimental data. The CNN models were trained in the optimized configuration for 10 repetitions and evaluated against the PLS models which were optimized as described in <xref ref-type="sec" rid="s2-4">Section 2.4</xref>.</p>
</sec>
<sec id="s2-6-3">
<title>2.6.3 Model transfer to data set 4</title>
<p>To compare the LSA method to the augmentation method presented in <xref ref-type="bibr" rid="B10">Blazhko et al. (2021)</xref>, which will be referred to as extended multiplicative scatter augmentation (EMSA), both methods were applied to data sets 2 and 4. The CNN architecture found for data set 2 was therefore transferred to data set 4 without additional HPO. To account for the higher dimension of the IR data, the pooling window size of the first CNN layer was adjusted to 4 and the number of FC units was raised to 25. The EMSA method was obtained from <xref ref-type="bibr" rid="B10">Blazhko et al. (2021)</xref> and used with its default configuration. The LSA method was tuned as described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>. IR spectra were preprocessed using a second-derivative SGF with a window size of 19 and a second-order polynomial, as this was reported to improve the performance of the EMSA method (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>).</p>
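<p>The stated preprocessing settings map directly onto a standard Savitzky-Golay routine. The sketch below uses savgol_filter from SciPy as one possible realization; whether this library was used in the study is not stated, and the wrapper name second_derivative_sgf and the spacing argument are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from scipy.signal import savgol_filter


def second_derivative_sgf(spectra, spacing=1.0, window_length=19, polyorder=2):
    """Second-derivative Savitzky-Golay filter applied along the spectral axis.

    Defaults mirror the settings stated in the text (window of 19 points,
    second-order polynomial). `spacing` is the distance between neighboring
    spectral points and only scales the derivative; it is an assumed argument.
    Each spectrum must contain at least `window_length` points.
    """
    spectra = np.asarray(spectra, dtype=float)
    return savgol_filter(
        spectra,
        window_length=window_length,
        polyorder=polyorder,
        deriv=2,
        delta=spacing,
        axis=-1,
    )
```
]]></preformat>
<p>For a spectrum that is exactly quadratic in the spectral axis, this filter returns a constant second derivative, which provides a simple sanity check of the settings.</p>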
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Generating highly realistic <italic>in silico</italic> spectra from experimental data</title>
<p>The LSA method was used to generate <italic>in silico</italic> UV/Vis absorbance spectra based on the assigned training data as previously described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>. To systematically evaluate the suitability of the proposed data augmentation method and tune the corresponding hyperparameters, the LSA method was used to reconstruct the experimental data in a cross-validation scheme. The reconstruction accuracy for the cross-validation and test subsets of both data sets, as measured by the RMSECV, is displayed in <xref ref-type="fig" rid="F3">Figures 3A, B, D, E</xref> as a function of the local subset size and the standard deviation of the applied wavelength shift. The local subset size strongly affects the reconstruction RMSECV and shows optima at sizes of 5 and 13 samples for data sets 1 and 2, respectively. In both cases, the RMSECV remains stable for small wavelength shifts and grows exponentially starting at 0.1&#xa0;nm. The Cityblock norm slightly improves the RMSECV for data set 2 compared to the Euclidean norm, although the overall impact of the norm on the RMSECV is small compared to that of the local subset size. While the Cityblock norm uses absolute differences, the Euclidean norm is based on squared differences and hence can affect the selection of local subsets depending on the concentration ranges in the samples. The residuals for the spectra in the test sets are displayed in <xref ref-type="fig" rid="F3">Figures 3C, F</xref> and show maximum deviations of 5% and 15% for data sets 1 and 2, respectively, where each deviation is normalized by the maximum absorbance in the corresponding run. As LSA is based on local subsets of the experimental data, the pure component profiles differ depending on the selected data points. 
<xref ref-type="fig" rid="F4">Figure 4</xref> shows the local pure component profiles for all components from data sets 1 and 2 for concentration samples stemming from the test data. The color of the lines indicates the concentration of the respective component in the corresponding sample. The dashed lines indicate the global pure component profiles using the entire training data for estimation instead of the local subsets. For all components, the local profiles are scattered around the global profiles with larger deviations for samples in extraordinarily high or low concentration regimes. Particularly for data set 2, these effects are visible for the aggregate component, where the local pure component profiles strongly deviate from the global estimates for low concentration regimes by a factor <inline-formula id="inf20">
<mml:math id="m26">
<mml:mo>&#x3e;</mml:mo>
</mml:math>
</inline-formula>10. In contrast, for the monomer species, low concentration regimes cause the spectra to capture an increased level of noise in the data.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Tuning of the LSA data augmentation method for data set 1 <bold>(A&#x2013;C)</bold> and data set 2 <bold>(D&#x2013;F)</bold>. From left to right, the reconstruction RMSE as a function of the local subset size, the reconstruction RMSE as a function of the standard deviation of the wavelength shift, and a heatmap of wavelength-specific residuals for the respective test subsets are shown. The errors in <bold>(A, B, D, E)</bold> are shown for a fixed level of white noise <italic>&#x3c3;</italic>
<sub>noise</sub> &#x3d; 0.001.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Local estimates of the pure component profiles for Rib A <bold>(A)</bold>, Cyt C <bold>(B)</bold>, and Lys <bold>(C)</bold> for data set 1 and for the monomer <bold>(D)</bold> and aggregate <bold>(E)</bold> species for data set 2. The pure component spectra are shown for each sample in the test data (solid lines) with a local subset size of 11 for the data augmentation method and are colored according to the concentration of the corresponding component, with darker colors denoting higher concentrations. The pure component spectra estimated using the entire training data are shown as dashed black lines.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g004.tif"/>
</fig>
<p>In summary, LSA provides a concentration-adaptive data augmentation method by leveraging variations in the spectral and concentration domain. The tuned LSA method can generate highly realistic <italic>in silico</italic> spectra and can hence be used to augment experimental data sets.</p>
</sec>
<sec id="s3-2">
<title>3.2 Training convolutional networks with augmented spectral data</title>
<p>To study the effect of hyperparameters associated with data augmentation, CNN training, and model architecture on predictive performance, hyperparameters were varied in a one-factor-at-a-time scheme while all other parameters remained constant. The base configuration of the CNN for this parameter study can be found in <xref ref-type="table" rid="T1">Table 1</xref>. The obtained performance, as measured by the predictive RMSECV for all response variables for data set 1, is displayed in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Influence of selected hyperparameters on the predictive cross-validation performance (RMSECV) for each component in data set 1. The selected hyperparameters can be categorized in three groups: generation parameters <bold>(A&#x2013;C)</bold>, model parameters <bold>(D&#x2013;I)</bold> and training parameters <bold>(J&#x2013;L)</bold>.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g005.tif"/>
</fig>
<p>Considering the data augmentation hyperparameters in <xref ref-type="fig" rid="F5">Figures 5A&#x2013;C</xref>, kernel-density estimation (KDE) sampling outperforms uniform and normal sampling. The local subset size shows a stable performance for Cyt C and Lys between 5 and 15 samples, while the RMSECV for Rib A is reduced by approximately 25% by increasing <italic>n</italic>
<sub>LSA</sub> from 5 to 11, reaching a more balanced performance between all three components. This suggests that it may be beneficial to include the local subset size during HPO to ensure finding the optimal solution for all response variables. Hence, the local subset size was subsequently incorporated during HPO for both data sets 1 and 2. The number of <italic>in silico</italic> generated samples positively affects model performance, which starts to plateau at 10<sup>5</sup> samples. As increasing the number beyond that only marginally affects model performance while increasing computational time considerably, 10<sup>5</sup> was adopted for the base configuration for HPO.</p>
<p>The influence of initialization and activation functions for convolutional and FC layers is shown in <xref ref-type="fig" rid="F5">Figures 5D&#x2013;F</xref>. Initialization only marginally affects model performance, with random uniform providing the best option. While the linear function performs best for the convolutional layer, non-linear activation functions show superior accuracy for the FC layers, with tanh and the exponential linear unit (ELU) returning the lowest RMSECV values for all components. Increasing the complexity of the CNN by increasing the number of convolutional layers, the number of FC units, or the size of the convolutional window does not directly improve model performance. Performance gains are not equally distributed among all response variables and no overall trends can be extracted, as depicted in <xref ref-type="fig" rid="F5">Figures 5G&#x2013;I</xref>. Regarding the training parameters, the results suggest that a learning rate of 10<sup>&#x2212;3</sup>, a batch size of 100, and the Adam optimizer provide the best options, as shown in <xref ref-type="fig" rid="F5">Figures 5J&#x2013;L</xref>.</p>
<p>In summary, it was possible to identify optimal configurations for hyperparameters for data augmentation and model training. However, given the high dimensionality of the search space of the remaining hyperparameters and their unequally distributed effects on model performance, it is suggested to use automated HPO to identify the optimal model architecture for each data set individually.</p>
</sec>
<sec id="s3-3">
<title>3.3 Automating hyperparameter search by Bayesian optimization</title>
<p>As hyperparameter search is a multi-dimensional, computationally expensive problem, automated HPO was performed using a TPE-based Bayesian optimizer as implemented in <italic>optuna</italic>. The optimizer considers local and global hyperparameters, which enables the solution of optimization problems with multiple decision levels such as the choice of the number of convolutional layers and the optimization of a set of layer-specific hyperparameters. Using data set 2 as an example, <xref ref-type="fig" rid="F6">Figure 6</xref> presents the evolution of objective values with regard to all studied hyperparameters throughout the optimization process. As shown in <xref ref-type="fig" rid="F6">Figure 6A</xref>, the transition between random and TPE-based sampling is visible after 100 trials. The objective values form a band centered around 1.9 with scattered maxima at 1.93, with 2 being the maximum achievable objective value. The color of the points indicates the number of the optimization trial, with dark blue marking the end of the optimization process. The evolution profiles of global parameters (cf. <xref ref-type="fig" rid="F6">Figures 6B&#x2013;F</xref>) suggest an optimum at 3 convolutional layers with fewer than 25 FC units. Both regularization methods, governed by the regularization factor and the dropout rate, were found to positively influence model performance, although weight regularization using a fairly low regularization factor in the range of 10<sup>&#x2212;8</sup>&#x2013;10<sup>&#x2212;6</sup> was employed in later trials. In contrast to data set 1, where the local subset size needed to be adjusted for optimal model performance, here, the previously determined value was found to be suitable for the given data set and only varied slightly between 9 and 15 in later trials. 
According to <xref ref-type="fig" rid="F6">Figures 6G&#x2013;L</xref>, the three-layer CNN achieves the highest accuracy for all components with a convolutional window size between 1 and 10 for all three layers. The number of convolutional filters shows no clear optimum for the first and second layers, while higher counts are found beneficial for the third layer.</p>
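<p>The upper bound of 2 for the objective follows from summing the coefficient of determination over the two response variables of data set 2. A minimal sketch of such an objective is given below, assuming the standard definition of <italic>R</italic><sup>2</sup> evaluated per component; the function name sum_r2 is hypothetical and the exact evaluation used during HPO may differ.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def sum_r2(y_true, y_pred):
    """Sum of per-component coefficients of determination.

    Assumes the standard definition R^2 = 1 - SS_res / SS_tot per column.
      y_true, y_pred: (n_samples, n_components) concentrations
    The value is bounded above by n_components (perfect predictions).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.sum(1.0 - ss_res / ss_tot))
```
]]></preformat>
<p>With three response variables, as in data set 1, the same objective is bounded by 3.</p>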
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Hyperparameter evolution profiles during optimization for data set 2. The objective values <italic>&#x2211;R</italic>
<sup>2</sup> are shown for all trials <bold>(A)</bold> and each hyperparameter individually <bold>(B&#x2013;L)</bold>. The colors of the circles indicate the number of HPO trials with darker shades of blue corresponding to later sampling points. Random sampling was performed for the first 100 trials after which median pruning and the TPE-based optimization were enabled.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g006.tif"/>
</fig>
<p>For data set 1, optimal performance with an objective value close to 2.98 was achieved by employing a two-layer CNN with 2 and 7 convolutional filters, window sizes of 3 and 9, and 29 FC units. In contrast to data set 2, dropout and regularization were both found to be disadvantageous. The evolution plot can be found in <xref ref-type="sec" rid="s11">Supplementary Figure S1</xref>. The exact hyperparameters and visualization of the architecture for both optimized models can be found in <xref ref-type="table" rid="T1">Table 1</xref> and <xref ref-type="sec" rid="s11">Supplementary Figure S2</xref>, respectively.</p>
<p>The optimized CNN models were retrained for 10 repetitions using random initialization of the weights and an increased patience of 10 epochs. The obtained model predictions for training and test subsets for both data sets are summarized by the NRMSE and presented in <xref ref-type="fig" rid="F7">Figure 7</xref>. The boxplots show the distribution of errors for the 10 repetitions as a result of random initializations and the stochastic nature of the training process. CNN models using the initial configuration obtained from tuning the augmentation method are also included for reference. As a baseline comparison, the NRMSE values obtained from optimized single-response PLS models are indicated by the dashed lines. Optimized PLS hyperparameters can be found in <xref ref-type="sec" rid="s11">Supplementary Table S1</xref>. Additionally, the optimized architectures were trained without using any prior data augmentation as described in <xref ref-type="sec" rid="s2-3-3">Section 2.3.3</xref>. For reference, time-resolved predictions for the optimized CNN and the PLS can be found in <xref ref-type="sec" rid="s11">Supplementary Figure S3</xref>.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Model performance for initial and optimized CNN models for data set 1 <bold>(A)</bold> and data set 2 <bold>(B)</bold>. The normalized error NRMSE for training and test data is shown on the left and right, respectively. For both data sets, the initial and optimized CNN models were trained 10 times with an elevated patience of 10. The arithmetic mean of the obtained performances is indicated by the solid lines within the boxes. Outliers are shown as diamonds and were defined as such when the error exceeded 1.5 times the interquartile range. The dashed lines correspond to the NRMSE obtained from an optimized PLS model using a SGF for preprocessing as a benchmark.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g007.tif"/>
</fig>
<p>For data set 1 (cf. <xref ref-type="fig" rid="F7">Figure 7A</xref>), the initial and optimized CNN models generally show a lower average NRMSE than the PLS models on the training subset for all components. In the test subset, the prediction error for Rib A is reduced by up to 50%, while the test errors for Cyt C and Lys increase. This increase can be attributed to erroneous predictions of Cyt C and Lys during the elution of Rib A, as can be seen in the time-resolved predictions in <xref ref-type="sec" rid="s11">Supplementary Figure S3</xref>. The initial CNN architecture performs slightly better on the test subset than the optimized CNN architecture, as indicated by a lower average and variance. Interestingly, CNN models without prior data augmentation perform similarly to the PLS models, achieving higher accuracy for Cyt C and Lys and significantly lower accuracy for Rib A compared to the CNNs using data augmentation. For data set 2, the optimized CNN model reduces the NRMSE compared to the initial architecture and the PLS model by up to 50% for the aggregate species. The accuracy for the monomer species slightly decreases compared to the PLS model with a 5% increase in the NRMSE. The reduction for the aggregate species can be attributed to the improved capture of the onset of the elution peak, as can be seen in the time-resolved prediction profiles in <xref ref-type="sec" rid="s11">Supplementary Figure S3</xref>. In contrast to data set 1, the optimized CNN without augmentation results in a considerably higher NRMSE than both augmented CNN models and the PLS. In summary, HPO enabled the automated identification of the optimal model architecture for data sets 1 and 2, leading to improved quantification of Rib A and mAb aggregates, respectively, while sometimes reducing the accuracy for the other species compared to the benchmark methods.</p>
</sec>
<sec id="s3-4">
<title>3.4 Understanding model predictions through feature importance</title>
<p>To investigate the differences in model performance, model-specific and model-agnostic importance metrics were used. <xref ref-type="fig" rid="F8">Figure 8</xref> presents an overview of multiple importance metrics for data set 2. CNN-specific GradCAM localization maps are shown in <xref ref-type="fig" rid="F8">Figures 8A, B</xref>. PLS regression coefficients and VIP scores are shown in <xref ref-type="fig" rid="F8">Figures 8G, H</xref>. To enable a direct comparison between the two model types, SHAP values are illustrated in <xref ref-type="fig" rid="F8">Figures 8C&#x2013;F</xref>. The left and right columns correspond to the feature importances for the monomer and aggregate components, respectively. For the monomer species, both CNN- and PLS-SHAP values closely resemble the PLS regression coefficients, with wavelengths between 260 and 280&#xa0;nm positively contributing to the model output and wavelengths between 280 and 300&#xa0;nm being assigned negative values. A similar behavior can be observed for the GradCAM estimates, with both wavelength ranges being assigned positive importance as GradCAM is constrained to positive values by definition. Additionally, GradCAM identifies the border areas at the beginning and the end of the spectrum as important. The SHAP values generally confirm those observations for small wavelengths, whereas wavelengths above 315&#xa0;nm are shown not to affect the model output for both the CNN and PLS models. This is in accordance with the PLS-VIP scores as well as the spectral data, as no absorbance is detected at wavelengths above 315&#xa0;nm (cf. <xref ref-type="sec" rid="s11">Supplementary Figure S4</xref>). For the aggregate species, GradCAM provides a fairly similar profile compared to the monomer species, showing a shift in the importance peak from 280&#x2013;305 to 300&#x2013;305&#xa0;nm. 
The PLS regression coefficients support this observation with a maximum at 315&#xa0;nm for the aggregate species, while the VIP scores for all wavelengths between 280 and 340&#xa0;nm lie between 0.5 and 1. Regression coefficients for smaller wavelengths appear comparably noisy and without any discernible structure, in contrast to the coefficients for the monomer species. While CNN-SHAP values indicate a similar profile as seen for the monomer with an inverted importance for wavelengths between 295 and 305&#xa0;nm, PLS-SHAP suggests a high degree of noise and considerably lower importance for the previously identified wavelength region relative to the remainder of the spectrum. As a means for comparison, the visualization of the convolutional layers for data set 2 can be found in <xref ref-type="sec" rid="s11">Supplementary Figure S5</xref>. For each convolutional layer and each corresponding filter, the convolved input spectra for the test set are shown. To emphasize the differences between the monomer and aggregate components, the samples with the maximum concentration of monomers and aggregates are marked in green and blue, respectively. While the convolved signals in layer 1 largely resemble the original input spectra, more fluctuations between positive and negative outputs are introduced with layers 2 and 3, with some filters showing fairly similar profiles within one layer. Due to the interposed pooling layers, the resolution of the presented profiles gradually decreases across layers 1 to 3.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Model-specific and model-agnostic importance measures for data set 2. CNN-specific GradCAM importance values <bold>(A, B)</bold>, SHAP values for CNN and PLS <bold>(C, D)</bold> and <bold>(E, F)</bold>, respectively. Each dot in <bold>(C&#x2013;F)</bold> corresponds to the feature contribution for one specific model output in the test data set and is colored according to the measured absorbance. The absorbances were normalized by the maximum value at 280&#xa0;nm. PLS-specific regression coefficients (shown as lines) and VIP scores (shown as bars) are presented in <bold>(G, H)</bold>. The left and right columns correspond to importance metrics for the monomer and aggregate species, respectively.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g008.tif"/>
</fig>
<p>In summary, the employed feature importance methods point to an elevated importance for higher wavelengths for the aggregates compared to those for the monomer species and identify this area as critical for differentiation between both species. The filter visualization further provides insights into the working principle of the CNNs. Analogous to the presented comparison for data set 2, the feature importance evaluation and filter visualization for data set 1 can be found in <xref ref-type="sec" rid="s11">Supplementary Figures S6, S7</xref>, respectively.</p>
</sec>
<sec id="s3-5">
<title>3.5 Evaluating robustness and transferability</title>
<p>To evaluate the capability of the CNN and PLS models to compensate for increasingly noisy data, both model types were retrained using modified generated data sets with 1) an increasing level of Gaussian noise and 2) an increasing level of axial wavelength shifts. The model performance as summarized by the sum of NRMSE for all three components in data set 1 is shown in <xref ref-type="fig" rid="F9">Figures 9A, B</xref>. It is observed that the PLS model generally does not reach the same level of accuracy as the CNN model when trained on the generated data. In both cases, 1) and 2), the PLS model error starts to increase at lower levels of white noise and wavelength shift than the CNN model error. Similarly, when both models are retrained at each noise level, the CNN model is observed to be more robust against noisy data and can slightly decrease the model error at a noise level of 0.01&#x2009;mAU or a wavelength shift of 0.03&#xa0;nm.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Evaluation of CNN robustness and transferability. Comparison of CNN and PLS model robustness for increasing noise <bold>(A)</bold> and wavelength shifts <bold>(B)</bold> in spectral data for data set 1 as measured by the normalized error NRMSE. CNN performance after model transfer to data set 3 <bold>(C)</bold>. Comparison of model performance with data augmentation conducted by the EMSA and LSA method for data set 4 <bold>(D)</bold>.</p>
</caption>
<graphic xlink:href="fbioe-12-1228846-g009.tif"/>
</fig>
<p>Using the hyperparameters determined during HPO for data set 2, the CNN was trained on data set 3 including data augmentation by the LSA method to assess the generalizability of the identified architecture. All hyperparameters were obtained from data set 2 and remained unchanged. <xref ref-type="fig" rid="F9">Figure 9C</xref> presents a comparison of CNN and PLS models as measured by the NRMSE. The transferred CNN achieves NRMSE values lower than 0.25 for all components for the training subset and decreases the observed errors for the monomer, HMWS1 and HMWS2 species by up to 50%. For the test subset, the variance of the transferred CNN increases and the error for HMWS1 increases compared to the PLS model. The PLS model fails to predict the LMWS component, a failure that was observed to depend on the cross-validation split used for optimizing the PLS models.</p>
<p>Finally, the herein-developed LSA data augmentation method was compared to the EMSA method (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>). Both methods were evaluated based on data sets 2 and 4 to assess the applicability of EMSA on UV/Vis data as well as the transferability of the LSA method to IR data. To this end, the optimized CNN model from data set 2 was trained with <italic>in silico</italic> data from the LSA and the EMSA method for both data sets 2 and 4. A detailed overview of the results for both data sets including the performance of PLS models can be found in <xref ref-type="sec" rid="s11">Supplementary Table S3</xref>. <xref ref-type="fig" rid="F9">Figure 9D</xref> provides a summary of the obtained performances using the LSA method versus the EMSA method for data set 4. The LSA method in combination with a three-layer CNN could successfully be transferred to data set 4 using IR data. When compared with the EMSA method as employed in <xref ref-type="bibr" rid="B10">Blazhko et al. (2021)</xref>, the LSA method achieves relative differences of &#x2b;32%, &#x2b;4%, and &#x2212;34% in <italic>R</italic>
<sup>2</sup> for oil, protein, and starch content in the test subset, respectively. However, the training accuracy for the CNN-EMSA model is considerably higher for all three components, indicating potential overfitting of the model. When applied to data set 2, the EMSA method fails to generate high-quality data to be used for chemometric modeling, resulting in a two-fold increase in RMSE for both components compared to the LSA method.</p>
<p>In summary, the optimized CNN models in combination with the LSA method present a more flexible alternative to commonly used PLS models due to their increased robustness against noisy data and good generalization capability to other UV/Vis spectroscopy data. They may further be universally applied to other types of spectroscopic data without prior HPO.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4 Discussion</title>
<sec id="s4-1">
<title>4.1 Data augmentation</title>
<p>Data augmentation can be used to enlarge experimental data sets by extracting and recombining information from the collected data. This is considered to avoid overfitting and thus improve the generalizability of ML models (<xref ref-type="bibr" rid="B73">Shorten and Khoshgoftaar, 2019</xref>). The herein presented LSA method builds upon this idea and leverages latent spectral information in terms of pure component profiles. By employing a local subset selection, these pure component profiles are locally approximated and flexibly adapted to the concentration regime of the sample composition for which a new spectrum is generated. The local subset selection further induces the mixing of spectral information from different experimental runs. We consider the designed working principle beneficial to leverage locally available spectral information. While other methods solely take variations in the spectral domain into account, e.g., by modifying the spectra based on variations in model coefficients (<xref ref-type="bibr" rid="B9">Bjerrum et al., 2017</xref>; <xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>), the LSA method projects the information to a latent space in the spectral domain and also accounts for variations in the concentration domain. This idea is similar to algorithms employed in variational autoencoding or GANs (<xref ref-type="bibr" rid="B31">Guo et al., 2020</xref>; <xref ref-type="bibr" rid="B48">Mishra and Herrmann, 2021</xref>), while maintaining interpretability by employing simple numerical methods. By using a cross-validation-based reconstruction approach, we showed that the LSA method can be tuned by various hyperparameters and is able to reconstruct the hold-out test set of the experimental data with a maximum error of roughly 5% and 15% for data sets 1 and 2, respectively. 
While pure component profiles for samples within the center of the experimental data range showed slight variations around the global estimates, samples with exceptionally low or high concentration regimes were shown to incorporate more noise or considerably diverge from the global estimates. The shape and magnitude of the estimated profiles depend on the selected subsets. For samples with low concentrations, the selected spectra exhibit a low signal-to-noise ratio, potentially causing distortions in the spectral shape or numerical inflation of the estimated profiles. For samples with high concentrations, in contrast, the density of similar data points is lower, resulting in local adaptations of the pure component profiles. In the case of data set 2, there are several samples almost solely containing monomer or aggregate species due to the working principle of the underlying separation process from which the data were taken. When one component concentration is close to 0, the estimation algorithm will numerically inflate the component&#x2019;s profile and hence a higher absorption is observed. The described handling of different concentration regimes is considered to introduce more variance in the <italic>in silico</italic> data. Although the estimates obtained for the extreme data points are not realistic, their impact is considered to be low. It was further observed that the local subset size determined by the tuning method did not provide the optimal value in the case of data set 1. This may be explained by the low absorbance signal of Rib A causing a reduced influence on the reconstruction error compared to Cyt C and Lys (<xref ref-type="bibr" rid="B33">Hansen et al., 2011</xref>; <xref ref-type="bibr" rid="B65">R&#xfc;dt et al., 2019</xref>).</p>
<p>When comparing the LSA and the EMSA method for training CNNs, it was observed that CNN-EMSA models tend to overfit the training data, potentially because the EMSA method depends solely on variations in the spectral domain. In contrast, the CNN-LSA model showed consistent training and test performance. When applied to UV/Vis data, the CNN-EMSA models were not able to achieve the same level of accuracy as the CNN-LSA models. As the scatter correction method, on which EMSA is based, is routinely used with vibrational spectroscopy data (<xref ref-type="bibr" rid="B45">Martens and Stark, 1991</xref>; <xref ref-type="bibr" rid="B2">Afseth and Kohler, 2012</xref>), the lower accuracy may be explained by insufficient accuracy of the estimated model or an overestimation of the variations in the UV/Vis spectra. However, it should be noted that the EMSA method was used with its default configuration (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>) and may potentially be tuned to be applicable to UV/Vis data.</p>
<p>Many studies published in the literature do not use any type of data augmentation before chemometric modeling with CNNs (<xref ref-type="bibr" rid="B1">Acquarelli et al., 2017</xref>; <xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>; <xref ref-type="bibr" rid="B43">Malek et al., 2018</xref>; <xref ref-type="bibr" rid="B91">Zhang et al., 2019b</xref>). Omitting augmentation can be made feasible either by using comparably large data sets in combination with simple model architectures, e.g., in <xref ref-type="bibr" rid="B20">Cui and Fearn (2018)</xref>, or by longer training on the same data (<xref ref-type="bibr" rid="B91">Zhang et al., 2019b</xref>). For the herein evaluated data sets, training CNNs without prior data augmentation resulted in unbalanced model performance across the components, which was comparable to the performance of the PLS model. On the one hand, this suggests that non-linear models such as CNNs do not necessarily improve the accuracy compared to PLS models. In theory, the CNNs without augmentation have little leverage over PLS models as they are potentially overparameterized given the amount of available training data. On the other hand, this suggests that data augmentation effectively helps to extract more information from the data to enable more accurate multi-response modeling. This may especially be beneficial in cases where one of the target components contributes a comparably weak spectral signal or when all target components exhibit highly similar spectral profiles. While data set 1 comprises proteins with clearly differentiable UV/Vis absorbance spectra (<xref ref-type="bibr" rid="B33">Hansen et al., 2011</xref>), the size variants of a mAb are more challenging to distinguish (<xref ref-type="bibr" rid="B12">Brestrich et al., 2018</xref>). 
Based on the given results, we hypothesize that by augmenting spectral data, minor differences in spectral profiles or signals from underrepresented components can be amplified and therefore increase regression performance. Other spectroscopic methods such as IR or Raman can provide higher selectivity depending on the analytes and hence the benefits of data augmentation should be studied in more detail for those spectroscopic methods. <xref ref-type="bibr" rid="B10">Blazhko et al. (2021)</xref> pointed out that for the classification of yeast and mould species using IR data, the usage of data augmentation positively affected the classification performance.</p>
<p>In conclusion, the combination of CNNs and data augmentation via the LSA method provides a flexible, generally applicable approach for chemometric modeling of multiple quality attributes. Given the findings mentioned above, it would further be interesting to study the effect of spectral mixing introduced by the LSA method with data sets considerably larger than the ones used within this study, possibly including data from multiple spectrometers, target proteins, or cell lines.</p>
</sec>
<sec id="s4-2">
<title>4.2 Model architecture and hyperparameter optimization</title>
<p>Finding the optimal configuration for complex ML models such as CNNs is challenging due to the high-dimensional search space and potentially long computation times (<xref ref-type="bibr" rid="B25">Feurer and Hutter, 2019</xref>). In this study, automated HPO based on Bayesian optimization has been used to optimize CNN configurations for data sets 1 and 2. Here, the aim was to maximize the predictive accuracy of the CNN models, while keeping model complexity low. To weight all response variables, i.e., molecular species, equally, the optimizer used the sum of <italic>R</italic>
<sup>2</sup> of the cross-validation residuals. While equal weighting of all components may also be realized by other evaluation metrics, the sum of <italic>R</italic>
<sup>2</sup> was chosen to provide a simple figurative metric. During the optimization, the structure of the neural network was varied simultaneously with other hyperparameters such as the local subset size <italic>n</italic>
<sub>LSA</sub> as well as regularization parameters, namely the regularization factor and the dropout rate. In general, increasing the number of convolutional layers implicitly increases the number of hyperparameters to optimize as each layer is assigned an individual set of parameters. At the same time, the intermediate number of features is reduced by the pooling layers thus potentially reducing the number of units in the FC layer.</p>
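<p>The optimization objective described above can be illustrated with a minimal Python sketch that sums the per-component <italic>R</italic><sup>2</sup> over pooled cross-validation predictions. The function names r2_score and sum_r2_objective, the dictionary layout, and the toy values are illustrative assumptions and are not taken from the implementation used in this study.</p>
<preformat>
```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination for a single response variable."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sum_r2_objective(cv_true, cv_pred):
    """Sum of per-component R^2 over pooled cross-validation predictions.

    cv_true and cv_pred map a component name to a list of reference
    values and out-of-fold predictions, respectively (assumed layout).
    """
    return sum(r2_score(cv_true[name], cv_pred[name]) for name in cv_true)


# Toy data for two hypothetical components (values are made up).
cv_true = {"monomer": [1.0, 2.0, 3.0, 4.0], "aggregate": [0.1, 0.2, 0.3, 0.4]}
cv_perfect = {name: list(values) for name, values in cv_true.items()}
cv_mean = {name: [fmean(values)] * len(values) for name, values in cv_true.items()}

# Perfect predictions give R^2 = 1 per component; predicting the mean gives 0.
perfect_score = sum_r2_objective(cv_true, cv_perfect)
baseline_score = sum_r2_objective(cv_true, cv_mean)
```
</preformat>
<p>Because each component contributes at most 1 to the sum, a weakly absorbing species influences the objective as strongly as a dominant one, which is the equal weighting referred to above. In an HPO framework, such a function would be returned as the value to maximize for each trial.</p>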
<p>For data set 1, the identified architecture using two convolutional layers was found to achieve only marginally improved performance compared to the initial configuration. For the second data set, the optimized configuration could improve the initial configuration considerably with regard to the contaminating aggregate species. The ineffectiveness observed for data set 1 may be caused by a suboptimal splitting of the experimental runs or non-exhaustive exploration of the search space as well as potential overfitting on the training subset. It should however be noted that the achieved accuracy lies above 0.97 in terms of the <italic>R</italic>
<sup>2</sup> for all components and the differences between the studied models are considered marginal. In the context of chemometrics, TPE-based Bayesian optimization has previously been applied to the HPO of CNNs (<xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>; <xref ref-type="bibr" rid="B56">Passos and Mishra, 2022</xref>). While previous studies have used pruning and optimization based on the simple training-validation splits, the herein-used cross-validation approach is considered more robust and less prone to overfitting (<xref ref-type="bibr" rid="B61">Rolinger et al., 2020</xref>). A collection of more detailed practical remarks for using HPO can be found in <xref ref-type="sec" rid="s11">Supplementary Section S2</xref>.</p>
<p>Furthermore, it is unclear from previous studies which hyperparameters should be included during HPO. While <xref ref-type="bibr" rid="B55">Passos and Mishra (2021)</xref> vary the depth of the regression block by incorporating more FC layers, an increased number of convolutional layers was found beneficial within this study. Additionally, it was found that critical hyperparameters requiring optimization are the number of filters in each convolutional layer, the convolutional window size per layer, and the number of FC units in the regression block. Increasing the number of filters increases the model&#x2019;s capacity to learn complex features from the input data. However, a larger number of filters potentially requires a higher number of FC units and thus can lead to longer training times (<xref ref-type="bibr" rid="B75">Szegedy et al., 2015</xref>). The convolutional window size determines the size of the receptive field, which affects the model&#x2019;s ability to capture relevant features from the input data (<xref ref-type="bibr" rid="B28">Goodfellow et al., 2016</xref>, chap.&#x2009;9). Smaller sizes capture local features, while larger sizes capture more global features (<xref ref-type="bibr" rid="B29">Gu et al., 2018</xref>). Here, the optimal window sizes were found to be as small as 3 for the first convolutional layer, with a generally increasing trend for additional layers. With regard to CNN design choices, multiple suggestions have been made in the literature, including simple architectures with only one convolutional layer (<xref ref-type="bibr" rid="B1">Acquarelli et al., 2017</xref>; <xref ref-type="bibr" rid="B20">Cui and Fearn, 2018</xref>), or more complex models using multiple FC layers and a comparably high number of convolutional kernels (<xref ref-type="bibr" rid="B9">Bjerrum et al., 2017</xref>; <xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>, <xref ref-type="bibr" rid="B56">Passos and Mishra, 2022</xref>). 
Others adapted design choices made for widely applied computer vision architectures such as the so-called vgg-block design (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>) or inception modules (<xref ref-type="bibr" rid="B91">Zhang et al., 2019b</xref>) or even combined the convolutional block with other types of regression models such as Gaussian process regression (<xref ref-type="bibr" rid="B43">Malek et al., 2018</xref>). Given the variety of studied architectures, it is difficult to draw a conclusion on which architecture works best for which type of spectroscopic data as they are often combined with prior preprocessing. Transferring the model architecture identified for data set 2 to data sets 3 and 4 in this study was found suitable for modeling UV/Vis and IR spectroscopic data. The CNN with three convolutional layers and one FC layer, using weight and dropout regularization, was shown to work well with previously unknown data sets. Especially for data sets with few available experiments or samples, the transfer of CNN architectures is preferred over potentially unstable HPO. It has been shown that multiple factors may affect the successful transfer of chemometric CNNs (<xref ref-type="bibr" rid="B51">Mishra and Passos, 2021c</xref>; <xref ref-type="bibr" rid="B49">Mishra and Passos, 2021a</xref>), leading to a high degree of freedom in HPO.</p>
<p>While HPO has shown great potential for improving the performance of chemometric models in this study and several other cases (<xref ref-type="bibr" rid="B13">Brunel et al., 2021</xref>; <xref ref-type="bibr" rid="B55">Passos and Mishra, 2021</xref>), further studies are needed to investigate which CNN architectures are the most suitable for different types of spectroscopic data such as UV/Vis, IR, and Raman spectroscopy. In this regard, the authors advocate for a large-scale HPO study using multiple data sets at the same time. Such research could help identify the most viable CNN architectures for chemometrics in terms of accuracy, generalizability, and robustness, and pave the way to dedicated pre-trained models for biotechnological applications. Pre-trained models facilitate easier transfer from case to case by computationally less expensive transfer learning (<xref ref-type="bibr" rid="B51">Mishra and Passos, 2021c</xref>).</p>
</sec>
<sec id="s4-3">
<title>4.3 Model interpretability</title>
<p>Explainability of machine learning models has become an important topic of research in recent years (<xref ref-type="bibr" rid="B7">Belle and Papantonis, 2021</xref>; <xref ref-type="bibr" rid="B14">Burkart and Huber, 2021</xref>). In the context of spectroscopy, it is critical to understand which elements of the spectral data are rendered important and contribute to the model&#x2019;s predictions. In this study, an interpretability assessment compared the predictions of CNNs with PLS models. Four feature importance metrics were used to compare the models: Guided-GradCAM, SHAP values, PLS regression coefficients, and PLS VIP scores.</p>
<p>The results for data set 2 showed that three of four methods were able to identify a small wavelength region that was relevant for differentiating between the two target components. For data set 1, the distinction between the three target components was less pronounced as the importance profiles for Cyt C and Lys were found to be fairly similar for CNN and PLS models. For Rib A, it was possible to resolve clear differences between both model types which supports the improved performance of the CNN model. In general, the studied methods differ in their level of directness in assessing the importance of individual wavelengths. Established methods such as GradCAM or other visualization methods for CNNs (<xref ref-type="bibr" rid="B89">Zeiler and Fergus, 2013</xref>) are generally applicable to all types of model architectures (<xref ref-type="bibr" rid="B72">Selvaraju et al., 2020</xref>) and enable a fast inspection of the chemometric model. In other studies (<xref ref-type="bibr" rid="B53">Mishra et al., 2021</xref>; <xref ref-type="bibr" rid="B56">Passos and Mishra, 2022</xref>), it has been shown that GradCAM was able to identify the most suitable section of features from a preprocessing-based extension method. However, as the variable importance is computed from the feature maps of the final convolutional layer, it is directly influenced by previous normalization and pooling operations causing lower resolution profiles or potentially distorted peak locations (<xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>). Similar to GradCAM, PLS VIP scores provide a positively constrained importance metric. 
In combination with the PLS regression coefficients, the VIP scores are a commonly applied metric to evaluate the chemical information used by the PLS model (<xref ref-type="bibr" rid="B27">Goldrick et al., 2020</xref>; <xref ref-type="bibr" rid="B84">Wei et al., 2022</xref>) or conduct variable selection (<xref ref-type="bibr" rid="B47">Mehmood et al., 2012</xref>). While the regression coefficients display positive and negative attributions, the VIP scores help determine which variables mostly contribute to the prediction. Despite their practicality, the VIP scores are specific to PLS models and do not contain any information about the variable sensitivity.</p>
<p>In other chemometric studies, SHAP estimation was used to identify the most important variables for various chemometric models (<xref ref-type="bibr" rid="B32">Haghi et al., 2021</xref>; <xref ref-type="bibr" rid="B42">Mahynski et al., 2022</xref>; <xref ref-type="bibr" rid="B30">Guindo et al., 2023</xref>). By using SHAP values in this study, it was possible to identify the most relevant features in a direct comparison with the PLS model. Using SHAP or other closely related feature removal-based importance quantification techniques is computationally more expensive than analyzing regression coefficients or GradCAM importance (<xref ref-type="bibr" rid="B17">Covert et al., 2020a</xref>). Especially for spectroscopic data, where multiple wavelengths are highly correlated, permutation-based feature importance techniques should be handled carefully by ensuring a sufficient number of permutations and permuting a coalition of wavelengths rather than single wavelengths (<xref ref-type="bibr" rid="B74">&#x160;trumbelj and Kononenko, 2014</xref>). Using a single-wavelength approach based on infinitesimal perturbations, <xref ref-type="bibr" rid="B20">Cui and Fearn (2018)</xref> observed that the studied CNNs had increased robustness compared to PLS models. A similar single-wavelength perturbation approach was used in <xref ref-type="bibr" rid="B71">Schiemer et al. (2023)</xref> to perform automated variable selection for a Gaussian process regression model for monitoring an antibody-drug conjugation reaction.</p>
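<p>The recommendation to permute a coalition of wavelengths rather than single wavelengths can be sketched in Python as follows. This is plain permutation importance, not a SHAP estimator, and it is not the procedure used in this study; the function names, the toy model, and all numerical values are hypothetical and serve only to show the joint permutation of correlated channels.</p>
<preformat>
```python
import random


def mse(model, spectra, targets):
    """Mean squared error of a model over a list of spectra."""
    return sum((model(s) - t) ** 2 for s, t in zip(spectra, targets)) / len(targets)


def coalition_importance(model, spectra, targets, coalition, n_repeats, rng):
    """Mean increase in MSE when the wavelengths in `coalition` are
    permuted jointly across samples (all channels of the coalition are
    taken from the same donor sample, preserving their correlation)."""
    baseline = mse(model, spectra, targets)
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(spectra)))
        rng.shuffle(order)
        permuted = []
        for i, donor in enumerate(order):
            row = list(spectra[i])
            for j in coalition:
                row[j] = spectra[donor][j]
            permuted.append(row)
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats


def toy_model(spectrum):
    """Hypothetical model that only uses the first two channels."""
    return spectrum[0] + spectrum[1]


# Made-up spectra with four channels each.
spectra = [
    [0.1, 0.2, 0.9, 0.3],
    [0.4, 0.3, 0.1, 0.8],
    [0.7, 0.5, 0.4, 0.2],
    [0.9, 0.8, 0.6, 0.5],
    [0.2, 0.9, 0.7, 0.1],
]
targets = [toy_model(s) for s in spectra]

rng = random.Random(0)
used = coalition_importance(toy_model, spectra, targets, [0, 1], 20, rng)
ignored = coalition_importance(toy_model, spectra, targets, [2, 3], 20, rng)
```
</preformat>
<p>In this toy setting, permuting the channels the model relies on increases the error, whereas permuting the ignored channels leaves it unchanged. Averaging over several repeats reflects the need for a sufficient number of permutations.</p>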
<p>In addition to the identification of important features, removal-based methods such as SHAP may also be used to determine the sensitivity of specific features (<xref ref-type="bibr" rid="B18">Covert et al., 2020b</xref>), which could potentially be applied to detect failures in chemometric models. In the future, it would be interesting to employ such methods in real-time during the optimization or maintenance of chemometric models. When new data are available from a manufacturing process, individual feature contributions may be beneficial to identify potential process or model failures and hence trigger model updating mechanisms (<xref ref-type="bibr" rid="B54">Nikzad-Langerodi et al., 2018</xref>; <xref ref-type="bibr" rid="B36">Krause et al., 2021</xref>; <xref ref-type="bibr" rid="B22">FDA, 2023</xref>). In summary, we consider the usage of multiple orthogonal variable importance methods crucial to build explainable and robust chemometric models, and to ensure that the most relevant structural and chemical information present in the spectroscopic data is used for prediction.</p>
</sec>
<sec id="s4-4">
<title>4.4 Robustness</title>
<p>Spectral data are often corrupted by noise or baseline and peak shifts (<xref ref-type="bibr" rid="B9">Bjerrum et al., 2017</xref>; <xref ref-type="bibr" rid="B10">Blazhko et al., 2021</xref>). Operations routinely applied to counteract these effects include, among others, spectral smoothing, differentiation, and spectral corrections (<xref ref-type="bibr" rid="B4">Anderson et al., 2020</xref>; <xref ref-type="bibr" rid="B78">Tulsyan et al., 2021</xref>). Spectral data may contain different levels of noise or baseline effects when new data are recorded with a different spectrometer (<xref ref-type="bibr" rid="B49">Mishra and Passos, 2021a</xref>), are subject to biological variability (<xref ref-type="bibr" rid="B78">Tulsyan et al., 2021</xref>), or are acquired as part of a technology transfer to another site (<xref ref-type="bibr" rid="B16">Christler et al., 2021</xref>).</p>
<p>To evaluate the robustness of CNNs compared to PLS models, an <italic>in silico</italic> noise perturbation study was performed. In summary, the CNNs were found to be more robust against increasing noise and wavelength shifts than PLS models. CNNs have been shown to be less susceptible to noise in images and electrocardiograms depending on the architecture used (<xref ref-type="bibr" rid="B59">Rodr&#xed;guez-Rodr&#xed;guez et al., 2021</xref>; <xref ref-type="bibr" rid="B80">Venton et al., 2021</xref>). In general, the increased robustness may be attributed to various features inherent to CNNs, such as convolution operations and in-built regularization. Each input is transformed by a convolution operation followed by a non-linear activation and a max-pooling operation. By stacking multiple convolutional blocks in sequence, a funnel-like structure is created which is considered to increase robustness. Secondly, regularization methods like dropout and weight regularization can lead to increased robustness (<xref ref-type="bibr" rid="B37">Krizhevsky et al., 2017</xref>). In chemometrics, <xref ref-type="bibr" rid="B20">Cui and Fearn (2018)</xref> have shown that using dropout reduced noise in the trained weights of their CNNs, while <xref ref-type="bibr" rid="B49">Mishra and Passos (2021a)</xref> could show that tuning the regularization factor led to improved performance on newly acquired data.</p>
<p>In summary, the CNNs used in this study appear to be more robust against different sorts of noise compared to PLS models, although it is not clear which of the mentioned aspects are the main contributors to the robustness of the model. It is thus suggested to evaluate the robustness of chemometric models in an independent study specifically designed for that purpose.</p>
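<p>The two perturbation types discussed in this section, white noise and wavelength shifts, can be sketched in Python as shown below. The function names, the edge-padding rule for shifted spectra, the noise level, and the example spectrum are illustrative assumptions and do not reproduce the settings of the perturbation study.</p>
<preformat>
```python
import random


def add_white_noise(spectrum, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [a + rng.gauss(0.0, sigma) for a in spectrum]


def shift_wavelengths(spectrum, n_channels):
    """Shift a spectrum by whole channels, padding with the edge value.

    Positive values shift towards higher channel indices. Edge padding
    is an assumption made for this sketch.
    """
    if n_channels == 0:
        return list(spectrum)
    if n_channels > 0:
        return [spectrum[0]] * n_channels + list(spectrum[:-n_channels])
    return list(spectrum[-n_channels:]) + [spectrum[-1]] * (-n_channels)


def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


rng = random.Random(0)
spectrum = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]  # made-up absorbance values

noisy = add_white_noise(spectrum, sigma=0.01, rng=rng)
shifted_up = shift_wavelengths(spectrum, 1)
shifted_down = shift_wavelengths(spectrum, -2)
distortion = rmse(spectrum, noisy)
```
</preformat>
<p>In a robustness comparison, the perturbed spectra would be passed to the trained CNN and PLS models for a range of noise levels and shift sizes, and the resulting prediction errors would be compared against those obtained on the unperturbed spectra.</p>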
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5 Conclusion</title>
<p>This study demonstrates the augmentation, optimization, and interpretation of CNNs for process monitoring using spectroscopic data. The herein proposed LSA data augmentation method was shown to generate realistic <italic>in silico</italic> spectra for three data sets based on UV/Vis spectroscopy of model proteins and mAb size variants, and one data set based on IR spectroscopy. By augmenting CNN models with the <italic>in silico</italic> data, the prediction accuracy for the detection of mAb size variants could be improved by up to 50% compared to conventional PLS models.</p>
<p>Through automated optimization, the neural network architecture and the configuration of other model elements could be simultaneously tuned to maximize model performance. The combined optimization led to neural network structures with multiple convolutional blocks similar to previously published models, while most of the accuracy boost could be attributed to the data augmentation approach for UV/Vis spectroscopy data. Transferring the optimized architecture without prior HPO to a data set with multiple mAb size variants resulted in comparable or superior accuracy compared to PLS models, proving good generalizability of the optimized CNN model. Although the CNN model in combination with the LSA data augmentation method provided promising results for four data sets, further studies are required to validate the applicability of the LSA method to cases where more background effects are present such as IR or Raman spectroscopy data. Finally, the optimized CNN and PLS models were directly compared with regard to the importance of different wavelength regions and robustness against spectral noise. While model-specific methods such as GradCAM and VIPs were able to provide specific importance estimates for the CNN and PLS models, respectively, SHAP was able to resolve differences between the two model types directly and suggested an improved capability to discriminate between mAb size variants. Additionally, the studied CNNs appear to be more robust against synthetic white noise and peak shifts in the spectral data. However, this property should be evaluated in more detail with real data involving biological and spectral variability.</p>
<p>In summary, this study expands on previous works on CNNs for chemometrics and proves the applicability of the suggested methods for the quantification of model proteins and mAb size variants. The deployment of the demonstrated workflow is considered to improve the accuracy, generalizability, and scalability of chemometric models used for process monitoring and control.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>JH initiated and supervised the work. JH and MR were involved in funding acquisition. MR derived the idea for and prototyped the data augmentation method. RS evolved the concepts and methods presented in this manuscript, advanced the data augmentation method, performed hyperparameter optimization and interpretability assessment of the established models. RS analyzed and interpreted the data, drafted the figures and wrote the manuscript. All authors reviewed and approved the final manuscript.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This project received funding from the Ministry of Science, Research and the Arts of the state of Baden-W&#xfc;rttemberg within the initiative Ideenwettbewerb Biotechnologie (Grant number: 33-7533-7-11.10/11/14).</p>
</sec>
<ack>
<p>The authors would like to thank Christina H. Wegner and Annabelle Dietrich for careful proof-reading of the manuscript. We would further like to thank the Karlsruhe Graduate School for Computational and Data Science (KCDS) and particularly Sebastian Krumscheid for the valuable scientific discussions. We acknowledge support by the KIT-Publication Fund of the Karlsruhe Institute of Technology.</p>
</ack>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s11">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fbioe.2024.1228846/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fbioe.2024.1228846/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Acquarelli</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>van Laarhoven</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Gerretzen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Buydens</surname>
<given-names>L. M.</given-names>
</name>
<name>
<surname>Marchiori</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Convolutional neural networks for vibrational spectroscopic data analysis</article-title>. <source>Anal. Chim. Acta</source> <volume>954</volume>, <fpage>22</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.aca.2016.12.010</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afseth</surname>
<given-names>N. K.</given-names>
</name>
<name>
<surname>Kohler</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Extended multiplicative signal correction in vibrational spectroscopy, a tutorial</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>117</volume>, <fpage>92</fpage>&#x2013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2012.03.004</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Akiba</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sano</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yanase</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Koyama</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Optuna: a next-generation hyperparameter optimization framework</article-title>. <conf-name>Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>, <conf-loc>Anchorage, AK, USA</conf-loc>, <conf-date>August 4-8, 2019</conf-date>, <fpage>2623</fpage>&#x2013;<lpage>2631</lpage>. <pub-id pub-id-type="doi">10.1145/3292500.3330701</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anderson</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Subedi</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Achieving robustness across season, location and cultivar for a NIRS model for intact mango fruit dry matter content</article-title>. <source>Postharvest Biol. Technol.</source> <volume>168</volume>, <fpage>111202</fpage>. <pub-id pub-id-type="doi">10.1016/j.postharvbio.2020.111202</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bakeev</surname>
<given-names>K. A.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Process analytical technology</source>. <edition>1</edition>. <publisher-loc>Oxford, GB</publisher-loc>: <publisher-name>Blackwell Publishing</publisher-name>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Banner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Alosert</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Spencer</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Cheeks</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Farid</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>A decade in review: use of data analytics within the biopharmaceutical sector</article-title>. <source>Curr. Opin. Chem. Eng.</source> <volume>34</volume>, <fpage>100758</fpage>. <pub-id pub-id-type="doi">10.1016/j.coche.2021.100758</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Belle</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Papantonis</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Principles and practice of explainable machine learning</article-title>. <source>Front. Big Data</source> <volume>4</volume>, <fpage>1</fpage>&#x2013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.3389/fdata.2021.688969</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Bergstra</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yamins</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures (JMLR.org)</article-title>,&#x201d; in <conf-name>Proceedings of the 30th International Conference on Machine Learning, vol. 28, I&#x2013;115&#x2013;I&#x2013;123</conf-name>, <conf-loc>Atlanta, GA, USA</conf-loc>, <conf-date>16-21 June 2013</conf-date>.</citation>
</ref>
<ref id="B9">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Bjerrum</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Glahder</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Skov</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Data augmentation of spectral data for convolutional neural network (cnn) based deep chemometrics, 1&#x2013;10</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1710.01927">https://arxiv.org/abs/1710.01927</ext-link>.</comment>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blazhko</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Shapaval</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kovalev</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kohler</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Comparison of augmentation and pre-processing for deep learning and chemometric classification of infrared spectra</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>215</volume>, <fpage>104367</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104367</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brestrich</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Application of spectral deconvolution and inverse mechanistic modelling as a tool for root cause investigation in protein chromatography</article-title>. <source>J. Chromatogr. A</source> <volume>1437</volume>, <fpage>158</fpage>&#x2013;<lpage>167</lpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2016.02.011</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brestrich</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>B&#xfc;chler</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Selective protein quantification for preparative chromatography using variable pathlength UV/Vis spectroscopy and partial least squares regression</article-title>. <source>Chem. Eng. Sci.</source> <volume>176</volume>, <fpage>157</fpage>&#x2013;<lpage>164</lpage>. <pub-id pub-id-type="doi">10.1016/j.ces.2017.10.030</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brunel</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Alsamad</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Piot</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Toward automated machine learning in vibrational spectroscopy - use and settings of genetic algorithms for pre-processing and regression optimization</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>219</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104444</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burkart</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>M. F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A survey on the explainability of supervised machine learning</article-title>. <source>J. Artif. Intell. Res.</source> <volume>70</volume>, <fpage>245</fpage>&#x2013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1613/jair.1.12228</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Capito</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Skudas</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kolmar</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Stanislawski</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Host cell protein quantification by Fourier transform mid infrared spectroscopy (FT-MIR)</article-title>. <source>Biotechnol. Bioeng.</source> <volume>110</volume>, <fpage>252</fpage>&#x2013;<lpage>259</lpage>. <pub-id pub-id-type="doi">10.1002/bit.24611</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Christler</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Scharl</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sauer</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>K&#xf6;ppl</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tschelie&#xdf;nig</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Toy</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Technology transfer of a monitoring system to predict product concentration and purity of biopharmaceuticals in real-time during chromatographic separation</article-title>. <source>Biotechnol. Bioeng.</source> <volume>118</volume>, <fpage>3941</fpage>&#x2013;<lpage>3952</lpage>. <comment>Publisher: John Wiley and Sons Inc</comment>. <pub-id pub-id-type="doi">10.1002/bit.27870</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Covert</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Lundberg</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.-I.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Explaining by removing: a unified framework for model explanation</article-title>. <source>J. Mach. Learn. Res.</source> <volume>22</volume>, <fpage>1</fpage>&#x2013;<lpage>90</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Covert</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Lundberg</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.-I.</given-names>
</name>
</person-group> (<year>2020b</year>). <source>Understanding global feature contributions with additive importance measures. Advances in Neural Information Processing Systems 2020-Decem</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2004.00668">https://arxiv.org/abs/2004.00668</ext-link>.</comment>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cui</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Fearn</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Comparison of partial least squares regression, least squares support vector machines, and Gaussian process regression for a near infrared calibration</article-title>. <source>J. Near Infrared Spectrosc.</source> <volume>25</volume>, <fpage>5</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1177/0967033516678515</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cui</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Fearn</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Modern practical convolutional neural networks for multivariate regression: applications to NIR calibration</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>182</volume>, <fpage>9</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2018.07.008</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<collab>FDA</collab> (<year>2004</year>). <source>Guidance for industry PAT - a framework for innovative pharmaceutical development, manufacturing, and quality assurance</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070305.pdf">http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070305.pdf</ext-link>.</comment>
</citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<collab>FDA</collab> (<year>2023</year>). <source>Artificial intelligence in drug manufacturing</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/media/165743/download">https://www.fda.gov/media/165743/download</ext-link>.</comment>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feidl</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Luna</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Souquet</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Vogg</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Souquet</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Broly</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Combining mechanistic modeling and Raman spectroscopy for monitoring antibody chromatographic purification</article-title>. <source>Processes</source> <volume>7</volume>, <fpage>683</fpage>. <pub-id pub-id-type="doi">10.3390/pr7100683</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feng</surname>
<given-names>S. Y.</given-names>
</name>
<name>
<surname>Gangal</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chandar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Vosoughi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mitamura</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>A survey of data augmentation approaches for NLP</article-title>. <source>Find. Assoc. Comput. Linguistics ACL-IJCNLP</source> <volume>2021</volume>, <fpage>968</fpage>&#x2013;<lpage>988</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2021.findings-acl.84</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Feurer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hutter</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Hyperparameter optimization</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>, <fpage>3</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-05318-5_1</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glassey</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gernaey</surname>
<given-names>K. V.</given-names>
</name>
<name>
<surname>Clemens</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>T. W.</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Striedner</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Process analytical technology (PAT) for biopharmaceuticals</article-title>. <source>Biotechnol. J.</source> <volume>6</volume>, <fpage>369</fpage>&#x2013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1002/biot.201000356</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldrick</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Umprecht</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Zakrzewski</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Cheeks</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>High-throughput Raman spectroscopy combined with innovate data analysis workflow to enhance biopharmaceutical process development</article-title>. <source>Processes</source> <volume>8</volume>, <fpage>1179</fpage>. <pub-id pub-id-type="doi">10.3390/pr8091179</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I. J.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep learning</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.deeplearningbook.org">http://www.deeplearningbook.org</ext-link>.</comment>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Kuen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shahroudy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shuai</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Recent advances in convolutional neural networks</article-title>. <source>Pattern Recognit.</source> <volume>77</volume>, <fpage>354</fpage>&#x2013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2017.10.013</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guindo</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Kabir</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2023</year>). <article-title>Chemometric approach based on explainable AI for rapid assessment of macronutrients in different organic fertilizers using fusion spectra</article-title>. <source>Molecules</source> <volume>28</volume>, <fpage>799</fpage>. <pub-id pub-id-type="doi">10.3390/molecules28020799</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A deep learning just-in-time modeling approach for soft sensor based on variational autoencoder</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>197</volume>, <fpage>103922</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2019.103922</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haghi</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>P&#xe9;rez-Fern&#xe1;ndez</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>A. H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Prediction of various soil properties for a national spatial dataset of Scottish soils based on four different chemometric approaches: a comparison of near infrared and mid-infrared spectroscopy</article-title>. <source>Geoderma</source> <volume>396</volume>, <fpage>115071</fpage>. <pub-id pub-id-type="doi">10.1016/j.geoderma.2021.115071</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hansen</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Skibsted</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Staby</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>A label-free methodology for selective protein quantification by means of absorption measurements</article-title>. <source>Biotechnol. Bioeng.</source> <volume>108</volume>, <fpage>2661</fpage>&#x2013;<lpage>2669</lpage>. <pub-id pub-id-type="doi">10.1002/bit.23229</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Severson</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Love</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Swann</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zang</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Opportunities and challenges of real-time release testing in biopharmaceutical manufacturing</article-title>. <source>Biotechnol. Bioeng.</source> <volume>114</volume>, <fpage>2445</fpage>&#x2013;<lpage>2456</lpage>. <pub-id pub-id-type="doi">10.1002/bit.26383</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Ba</surname>
<given-names>J. L.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Adam: a method for stochastic optimization</article-title>,&#x201d; in <conf-name>3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 1&#x2013;15</conf-name>, <conf-loc>San Diego, CA, USA</conf-loc>, <conf-date>May 7-9, 2015</conf-date>.</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krause</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>G&#xfc;nder</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Gruna</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>New active learning algorithms for near-infrared spectroscopy in agricultural applications</article-title>. <source>A. T. - Autom.</source> <volume>69</volume>, <fpage>297</fpage>&#x2013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1515/auto-2020-0143</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x2013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>M.-Y.</given-names>
</name>
<name>
<surname>Ebel</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Paris</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chauchard</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Guedon</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Marc</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Real-time monitoring of antibody glycosylation site occupancy by <italic>in situ</italic> Raman spectroscopy during bioreactor CHO cell cultures</article-title>. <source>Biotechnol. Prog.</source> <volume>34</volume>, <fpage>486</fpage>&#x2013;<lpage>493</lpage>. <pub-id pub-id-type="doi">10.1002/btpr.2604</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Osadchy</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ashton</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Foster</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Solomon</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>S. J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Deep convolutional neural networks for Raman spectrum recognition: a unified solution</article-title>. <source>Analyst</source> <volume>142</volume>, <fpage>4067</fpage>&#x2013;<lpage>4074</lpage>. <pub-id pub-id-type="doi">10.1039/c7an01371j</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Long</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>Gregoriou</surname>
<given-names>V. G.</given-names>
</name>
<name>
<surname>Gemperline</surname>
<given-names>P. J.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Spectroscopic calibration and quantitation using artificial neural networks</article-title>. <source>Anal. Chem.</source> <volume>62</volume>, <fpage>1791</fpage>&#x2013;<lpage>1797</lpage>. <pub-id pub-id-type="doi">10.1021/ac00216a013</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lundberg</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.-I.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>A unified approach to interpreting model predictions</article-title>,&#x201d; in <source>Advances in neural information processing systems 30</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Guyon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Luxburg</surname>
<given-names>U. V.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wallach</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fergus</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Vishwanathan</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<publisher-loc>Red Hook, New York, United States</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>4765</fpage>&#x2013;<lpage>4774</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf">https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf</ext-link>
</comment>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahynski</surname>
<given-names>N. A.</given-names>
</name>
<name>
<surname>Ragland</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Schuur</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>V. K.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Building interpretable machine learning models to identify chemometric trends in seabirds of the North Pacific Ocean</article-title>. <source>Environ. Sci. Technol.</source> <volume>56</volume>, <fpage>14361</fpage>&#x2013;<lpage>14374</lpage>. <pub-id pub-id-type="doi">10.1021/acs.est.2c01894</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malek</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Melgani</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Bazi</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>One-dimensional convolutional neural networks for spectroscopic signal regression</article-title>. <source>J. Chemom.</source> <volume>32</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1002/cem.2977</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Markl</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Warman</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dumarey</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bergman</surname>
<given-names>E.-L.</given-names>
</name>
<name>
<surname>Folestad</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Review of real-time release testing of pharmaceutical tablets: state-of-the art, challenges and future perspective</article-title>. <source>Int. J. Pharm.</source> <volume>582</volume>, <fpage>119353</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijpharm.2020.119353</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martens</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Stark</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>1991</year>). <article-title>Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy</article-title>. <source>J. Pharm. Biomed. Analysis</source> <volume>9</volume>, <fpage>625</fpage>&#x2013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.1016/0731-7085(91)80188-F</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McHardy</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Antoniou</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Conn</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Palmer</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies</article-title>. <source>Analyst</source> <volume>148</volume>, <fpage>3860</fpage>&#x2013;<lpage>3869</lpage>. <comment>Publisher: Royal Society of Chemistry</comment>. <pub-id pub-id-type="doi">10.1039/D3AN00669G</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mehmood</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liland</surname>
<given-names>K. H.</given-names>
</name>
<name>
<surname>Snipen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>S&#xe6;b&#xf8;</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A review of variable selection methods in partial least squares regression</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>118</volume>, <fpage>62</fpage>&#x2013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2012.07.010</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Herrmann</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>GAN meets chemometrics: segmenting spectral images with pixel2pixel image translation with conditional generative adversarial networks</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>215</volume>, <fpage>104362</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104362</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2021a</year>). <article-title>Deep chemometrics: validation and transfer of a global deep near-infrared fruit model to use it on a new portable instrument</article-title>. <source>J. Chemom.</source> <volume>35</volume>, <fpage>e3367</fpage>. <pub-id pub-id-type="doi">10.1002/cem.3367</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2021b</year>). <article-title>Realizing transfer learning for updating deep learning models of spectral data to be used in new scenarios</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>212</volume>, <fpage>104283</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104283</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2021c</year>). <article-title>Realizing transfer learning for updating deep learning models of spectral data to be used in new scenarios</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>212</volume>, <fpage>104283</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104283</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2021d</year>). <article-title>A synergistic use of chemometrics and deep learning improved the predictive performance of near-infrared spectroscopy models for dry matter prediction in mango fruit</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>212</volume>, <fpage>104287</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104287</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Roger</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Marini</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Biancolillo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rutledge</surname>
<given-names>D. N.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Parallel pre-processing through orthogonalization (PORTO) and its application to near-infrared spectroscopy</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>212</volume>, <fpage>104190</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2020.104190</pub-id>
</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nikzad-Langerodi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lughofer</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Cernuda</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Reischer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kantner</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Pawliczek</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Calibration model maintenance in melamine resin production: integrating drift detection, smart sample selection and model adaptation</article-title>. <source>Anal. Chim. Acta</source> <volume>1013</volume>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1016/j.aca.2018.02.003</pub-id>
</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>An automated deep learning pipeline based on advanced optimisations for leveraging spectral classification modelling</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>215</volume>, <fpage>104354</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2021.104354</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Passos</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mishra</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>A tutorial on automatic hyperparameter tuning of deep spectral modelling for regression and classification tasks</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>223</volume>, <fpage>104520</fpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2022.104520</pub-id>
</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Read</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Brorson</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Rathore</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2010a</year>). <article-title>Process analytical technology (PAT) for biopharmaceutical products: Part I. Concepts and applications</article-title>. <source>Biotechnol. Bioeng.</source> <volume>105</volume>, <fpage>276</fpage>&#x2013;<lpage>284</lpage>. <pub-id pub-id-type="doi">10.1002/bit.22528</pub-id>
</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Read</surname>
<given-names>E. K.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>B. S.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Brorson</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Rathore</surname>
<given-names>A. S.</given-names>
</name>
</person-group> (<year>2010b</year>). <article-title>Process analytical technology (PAT) for biopharmaceutical products: Part II. Concepts and applications</article-title>. <source>Biotechnol. Bioeng.</source> <volume>105</volume>, <fpage>285</fpage>&#x2013;<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1002/bit.22529</pub-id>
</citation>
</ref>
<ref id="B59">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rodr&#xed;guez-Rodr&#xed;guez</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Molina-Cabello</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Ben&#xed;tez-Rochel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>L&#xf3;pez-Rubio</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>The impact of linear motion blur on the object recognition efficiency of deep convolutional neural networks</article-title>,&#x201d; in <source>Pattern recognition. ICPR international workshops and challenges</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Del Bimbo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cucchiara</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sclaroff</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Farinella</surname>
<given-names>G. M.</given-names>
</name>
<name>
<surname>Mei</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bertini</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <comment>Lecture Notes in Computer Science</comment>, <fpage>611</fpage>&#x2013;<lpage>622</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-68780-9_47</pub-id>
</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rolinger</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Monitoring of ultra- and diafiltration processes by Kalman-filtered Raman measurements</article-title>. <source>Anal. Bioanal. Chem.</source> <volume>415</volume>, <fpage>841</fpage>&#x2013;<lpage>854</lpage>. <pub-id pub-id-type="doi">10.1007/s00216-022-04477-7</pub-id>
</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rolinger</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A critical review of recent trends, and a future perspective of optical spectroscopy as PAT in biopharmaceutical downstream processing</article-title>. <source>Anal. Bioanal. Chem.</source> <volume>412</volume>, <fpage>2047</fpage>&#x2013;<lpage>2064</lpage>. <pub-id pub-id-type="doi">10.1007/s00216-020-02407-z</pub-id>
</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rolinger</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Comparison of UV- and Raman-based monitoring of the Protein A load phase and evaluation of data fusion by PLS models and CNNs</article-title>. <source>Biotechnol. Bioeng.</source> <volume>118</volume>, <fpage>4255</fpage>&#x2013;<lpage>4268</lpage>. <pub-id pub-id-type="doi">10.1002/bit.27894</pub-id>
</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Romann</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Kolar</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tobler</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Herwig</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bielser</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Villiger</surname>
<given-names>T. K.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Advancing Raman model calibration for perfusion bioprocesses using spiked harvest libraries</article-title>. <source>Biotechnol. J.</source> <volume>17</volume>, <comment>e2200184</comment>. <pub-id pub-id-type="doi">10.1002/biot.202200184</pub-id>
</citation>
</ref>
<ref id="B64">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rosebrock</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Deep learning for computer vision with Python</source> (<publisher-name>PyImageSearch</publisher-name>).</citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Andris</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schiemer</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Factorization of preparative protein chromatograms with hard-constraint multivariate curve resolution and second-derivative pretreatment</article-title>. <source>J. Chromatogr. A</source> <volume>1585</volume>, <fpage>152</fpage>&#x2013;<lpage>160</lpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2018.11.065</pub-id>
</citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Brestrich</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Rolinger</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017a</year>). <article-title>Real-time monitoring and control of the load phase of a Protein A capture step</article-title>. <source>Biotechnol. Bioeng.</source> <volume>114</volume>, <fpage>368</fpage>&#x2013;<lpage>373</lpage>. <pub-id pub-id-type="doi">10.1002/bit.26078</pub-id>
</citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Briskot</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017b</year>). <article-title>Advances in downstream processing of biologics &#x2013; spectroscopy: an emerging process analytical technology</article-title>. <source>J. Chromatogr. A</source> <volume>1490</volume>, <fpage>2</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2016.11.010</pub-id>
</citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanden</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Suhm</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>R&#xfc;dt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Fourier-transform infrared spectroscopy as a process analytical technology for near real time in-line estimation of the degree of PEGylation in chromatography</article-title>. <source>J. Chromatogr. A</source> <volume>1608</volume>, <fpage>460410</fpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2019.460410</pub-id>
</citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santos</surname>
<given-names>V. O.</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>F. C.</given-names>
</name>
<name>
<surname>Lima</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Petry</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Suarez</surname>
<given-names>P. A.</given-names>
</name>
<etal/>
</person-group> (<year>2005</year>). <article-title>A comparative study of diesel analysis by FTIR, FTNIR and FT-Raman spectroscopy using PLS and artificial neural network analysis</article-title>. <source>Anal. Chim. Acta</source> <volume>547</volume>, <fpage>188</fpage>&#x2013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1016/j.aca.2005.05.042</pub-id>
</citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sauer</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Melcher</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mosor</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Walch</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Berkemeyer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Scharl-Hirsch</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Real-time monitoring and model-based prediction of purity and quantity during a chromatographic capture of fibroblast growth factor 2</article-title>. <source>Biotechnol. Bioeng.</source> <volume>116</volume>, <fpage>1999</fpage>&#x2013;<lpage>2009</lpage>. <pub-id pub-id-type="doi">10.1002/bit.26984</pub-id>
</citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schiemer</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Weggen</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Schmitt</surname>
<given-names>K. M.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>An adaptive soft-sensor for advanced real-time monitoring of an antibody-drug conjugation reaction</article-title>. <source>Biotechnol. Bioeng.</source> <volume>120</volume>, <fpage>1914</fpage>&#x2013;<lpage>1928</lpage>. <pub-id pub-id-type="doi">10.1002/bit.28428</pub-id>
</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selvaraju</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Cogswell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vedantam</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Parikh</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Batra</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Grad-CAM: visual explanations from deep networks via gradient-based localization</article-title>. <source>Int. J. Comput. Vis.</source> <volume>128</volume>, <fpage>336</fpage>&#x2013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-019-01228-7</pub-id>
</citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shorten</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Khoshgoftaar</surname>
<given-names>T. M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A survey on image data augmentation for deep learning</article-title>. <source>J. Big Data</source> <volume>6</volume>, <fpage>60</fpage>. <pub-id pub-id-type="doi">10.1186/s40537-019-0197-0</pub-id>
</citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#x160;trumbelj</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Kononenko</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Explaining prediction models and individual predictions with feature contributions</article-title>. <source>Knowl. Inf. Syst.</source> <volume>41</volume>, <fpage>647</fpage>&#x2013;<lpage>665</lpage>. <pub-id pub-id-type="doi">10.1007/s10115-013-0679-x</pub-id>
</citation>
</ref>
<ref id="B75">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sermanet</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Reed</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Anguelov</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Going deeper with convolutions</article-title>. In <conf-name>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Boston, MA, USA</conf-loc>, <conf-date>7&#x2013;12 June 2015</conf-date>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2015.7298594</pub-id>
</citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trampu&#x17e;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tesli&#x107;</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Likozar</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Process analytical technology-based (PAT) model simulations of a combined cooling, seeded and antisolvent crystallization of an active pharmaceutical ingredient (API)</article-title>. <source>Powder Technol.</source> <volume>366</volume>, <fpage>873</fpage>&#x2013;<lpage>890</lpage>. <pub-id pub-id-type="doi">10.1016/j.powtec.2020.03.027</pub-id>
</citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tulsyan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Garvin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Undey</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Industrial batch process monitoring with limited data</article-title>. <source>J. Process Control</source> <volume>77</volume>, <fpage>114</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1016/j.jprocont.2019.03.002</pub-id>
</citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tulsyan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Khodabandehlou</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Schorner</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Coufal</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Undey</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Spectroscopic models for real-time monitoring of cell culture processes using spatiotemporal just-in-time Gaussian processes</article-title>. <source>AIChE J.</source> <volume>67</volume>. <pub-id pub-id-type="doi">10.1002/aic.17210</pub-id>
</citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#xdc;ndey</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ertun</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mistretta</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Looze</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Applied advanced process analytics in biopharmaceutical manufacturing: challenges and prospects in real-time monitoring and control</article-title>. <source>J. Process Control</source> <volume>20</volume>, <fpage>1009</fpage>&#x2013;<lpage>1018</lpage>. <pub-id pub-id-type="doi">10.1016/j.jprocont.2010.05.008</pub-id>
</citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venton</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Sundar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>N. A. S.</given-names>
</name>
<name>
<surname>Aston</surname>
<given-names>P. J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Robustness of convolutional neural networks to physiological electrocardiogram noise</article-title>. <source>Philosophical Trans. R. Soc. A Math. Phys. Eng. Sci.</source> <volume>379</volume>, <fpage>20200262</fpage>. <pub-id pub-id-type="doi">10.1098/rsta.2020.0262</pub-id>
</citation>
</ref>
<ref id="B81">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bowles-Welch</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Yeago</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Process analytical technologies in cell therapy manufacturing: state-of-the-art and future directions</article-title>. <source>J. Adv. Manuf. Process.</source> <volume>4</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1002/amp2.10106</pub-id>
</citation>
</ref>
<ref id="B82">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Studts</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>In-line product quality monitoring during biopharmaceutical manufacturing using computational Raman spectroscopy</article-title>. <source>mAbs</source> <volume>15</volume>, <fpage>2220149</fpage>. <pub-id pub-id-type="doi">10.1080/19420862.2023.2220149</pub-id>
</citation>
</ref>
<ref id="B83">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wegner</surname>
<given-names>C. H.</given-names>
</name>
<name>
<surname>Hubbuch</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Calibration-free PAT: locating selective crystallization or precipitation sweet spot in screenings with multi-way PARAFAC models</article-title>. <source>Front. Bioeng. Biotechnol.</source> <volume>10</volume>, <fpage>1</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.3389/fbioe.2022.1051129</pub-id>
</citation>
</ref>
<ref id="B84">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Woon</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Fish</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tai</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Handagama</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Multi-attribute Raman spectroscopy (MARS) for monitoring product quality attributes in formulated monoclonal antibody therapeutics</article-title>. <source>mAbs</source> <volume>14</volume>, <fpage>2007564</fpage>. <pub-id pub-id-type="doi">10.1080/19420862.2021.2007564</pub-id>
</citation>
</ref>
<ref id="B85">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wold</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sj&#xf6;str&#xf6;m</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Eriksson</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>PLS-regression: a basic tool of chemometrics</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>58</volume>, <fpage>109</fpage>&#x2013;<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1016/S0169-7439(01)00155-1</pub-id>
</citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Terentis</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Strasswimmer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Deep learning data augmentation for Raman spectroscopy cancer tissue classification</article-title>. <source>Sci. Rep.</source> <volume>11</volume>, <fpage>23842</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-021-02687-0</pub-id>
</citation>
</ref>
<ref id="B87">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Yosinski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Clune</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fuchs</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lipson</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Understanding neural networks through deep visualization</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1506.06579">https://arxiv.org/abs/1506.06579</ext-link>.</comment>
</citation>
</ref>
<ref id="B88">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuanyuan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhibin</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Quantitative analysis modeling of infrared spectroscopy based on ensemble convolutional neural networks</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>181</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2018.08.001</pub-id>
</citation>
</ref>
<ref id="B89">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zeiler</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Fergus</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Visualizing and understanding convolutional networks</article-title>,&#x201d; in <source>Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics)</source> <volume>8689</volume>, <fpage>818</fpage>&#x2013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-10590-1_53</pub-id>
</citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Springall</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Barman</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>Rapid, quantitative determination of aggregation and particle formation for antibody drug conjugate therapeutics with label-free Raman spectroscopy</article-title>. <source>Anal. Chim. Acta</source> <volume>1081</volume>, <fpage>138</fpage>&#x2013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1016/j.aca.2019.07.007</pub-id>
</citation>
</ref>
<ref id="B91">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ying</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>DeepSpectra: an end-to-end deep learning approach for quantitative spectral analysis</article-title>. <source>Anal. Chim. Acta</source> <volume>1058</volume>, <fpage>48</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1016/j.aca.2019.01.002</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>