<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3-mathml3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mater.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Materials</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mater.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2296-8016</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1732297</article-id>
<article-id pub-id-type="doi">10.3389/fmats.2025.1732297</article-id>
<article-version article-version-type="Version of Record" vocab="NISO-RP-8-2008"/>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Intelligent pavement moduli back-calculation using an SEM&#x2013;transformer framework</article-title>
<alt-title alt-title-type="left-running-head">Wang and Zhao</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fmats.2025.1732297">10.3389/fmats.2025.1732297</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wang</surname>
<given-names>Guozhong</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/3255188"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Software" vocab-term-identifier="https://credit.niso.org/contributor-roles/software/">Software</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Data curation" vocab-term-identifier="https://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Yanqing</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Funding acquisition" vocab-term-identifier="https://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x26; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x26; editing</role>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<institution>School of Infrastructure Engineering, Dalian University of Technology</institution>, <city>Dalian</city>, <country country="CN">China</country>
</aff>
<aff id="aff2">
<label>2</label>
<institution>Shanxi Provincial Transportation Construction Engineering Quality Inspection Center (Co., Ltd.)</institution>, <city>Taiyuan</city>, <country country="CN">China</country>
</aff>
<aff id="aff3">
<label>3</label>
<institution>Department of Transportation and Logistics, Dalian University of Technology</institution>, <city>Dalian</city>, <country country="CN">China</country>
</aff>
<author-notes>
<corresp id="c001">
<label>&#x2a;</label>Correspondence: Guozhong Wang, <email xlink:href="mailto:wangguozhong_41@126.com">wangguozhong_41@126.com</email>
</corresp>
</author-notes>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-01-28">
<day>28</day>
<month>01</month>
<year>2026</year>
</pub-date>
<pub-date publication-format="electronic" date-type="collection">
<year>2025</year>
</pub-date>
<volume>12</volume>
<elocation-id>1732297</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>10</month>
<year>2025</year>
</date>
<date date-type="rev-recd">
<day>17</day>
<month>12</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>12</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2026 Wang and Zhao.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Wang and Zhao</copyright-holder>
<license>
<ali:license_ref start_date="2026-01-28">https://creativecommons.org/licenses/by/4.0/</ali:license_ref>
<license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>This study proposes an intelligent back-calculation framework to estimate multilayer pavement elastic moduli from FWD deflection data under realistic measurement uncertainty. A spectral element method (SEM) model is used to simulate transient FWD responses and generate large-scale datasets. A Transformer regression model is trained to map peak deflection basins to layer moduli, considering four noise scenarios (no error, random, systematic, and combined). Baseline models (BPNN, SVR, and XGBoost) are also evaluated for comparison. The proposed SEM&#x2013;Transformer framework achieves strong accuracy and robustness, with average R<sup>2</sup> &gt; 0.94 and MAPE &#x003C; 8% across all noise cases, and shows superior performance for the base course under noisy conditions. The results demonstrate a reliable and efficient data-driven feasibility framework to support pavement structural evaluation and future digital-twin-based pavement management.</p>
</abstract>
<kwd-group>
<kwd>data-driven modeling</kwd>
<kwd>FWD</kwd>
<kwd>intelligent back-calculation</kwd>
<kwd>intelligent maintenance</kwd>
<kwd>SEM</kwd>
<kwd>transformer</kwd>
</kwd-group>
<funding-group>
<funding-statement>The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China NSFC (51678114), Urumqi Transportation Research Project (JSKJ201806), and Shanxi Province Transportation Research Project (19-JKKJ-4). The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.</funding-statement>
</funding-group>
<counts>
<fig-count count="8"/>
<table-count count="8"/>
<equation-count count="11"/>
<ref-count count="47"/>
<page-count count="00"/>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Computational Materials Science</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<label>1</label>
<title>Introduction</title>
<p>The Falling Weight Deflectometer (FWD) test has become one of the most widely used nondestructive evaluation techniques for assessing pavement structural performance (<xref ref-type="bibr" rid="B13">Elbagalati et al., 2018</xref>; <xref ref-type="bibr" rid="B24">Nam et al., 2016</xref>; <xref ref-type="bibr" rid="B26">Plati et al., 2016</xref>). By applying an impulse load to the pavement surface and recording the resulting deflection data, the FWD test provides valuable information about the mechanical response of pavement layers. However, the measured surface deflections do not directly yield the material properties of each layer; therefore, an inverse analysis, commonly referred to as back-calculation, is required to estimate key parameters such as elastic moduli. Most current studies focus on the surface layer (<xref ref-type="bibr" rid="B3">Shamiyeh et al., 2022</xref>; <xref ref-type="bibr" rid="B27">Plati et al., 2024</xref>), lacking an overall performance evaluation of the pavement structure including the base layers (<xref ref-type="bibr" rid="B39">Yang et al., 2025</xref>). Accurate parameter back-calculation is essential for evaluating the structural integrity, residual life, and load-bearing capacity of pavements, serving as a foundation for performance prediction and maintenance decision-making. With the increasing demand for data-driven and intelligent infrastructure management, the integration of intelligent algorithms into the back-calculation process has emerged as a promising approach to enhance efficiency, robustness, and automation of pavement performance evaluation and smart maintenance systems.</p>
<p>Over the past several decades, numerous back-calculation methodologies have been developed to interpret FWD deflection data and estimate pavement layer moduli. Classical approaches, such as the layered elastic theory (LET) and finite element-based iterative algorithms, have formed the foundation of conventional inverse analysis. Early methods, such as the ILLI-BACK (<xref ref-type="bibr" rid="B16">Ioannides et al., 1989</xref>), BISDEF (<xref ref-type="bibr" rid="B4">Bush, 1985</xref>), CHEVDEF (<xref ref-type="bibr" rid="B5">Bush and Alexander, 1985</xref>) and MODCOMP (<xref ref-type="bibr" rid="B17">Irwin, 1994</xref>; <xref ref-type="bibr" rid="B18">Irwin and Szebenyi, 1983</xref>) or MODULUS (<xref ref-type="bibr" rid="B28">Scullion et al., 1990</xref>) programs, relied heavily on deterministic optimization techniques such as the regression formula based on experience, Newton-Raphson, gradient descent, or least-squares fitting. These methods typically minimize the discrepancy between measured and calculated deflections by repeatedly adjusting material parameters within predefined bounds. Although these traditional approaches have contributed significantly to the advancement of pavement evaluation, they suffer from several inherent limitations. The inverse problem is often ill-posed and highly nonlinear, making the solution sensitive to measurement noise and initial guesses (<xref ref-type="bibr" rid="B19">Jiang et al., 2022</xref>; <xref ref-type="bibr" rid="B33">Ullidtz, 1998</xref>). Moreover, conventional optimization algorithms tend to converge to local minima, require significant computational effort, and exhibit poor adaptability when dealing with complex pavement structures or large-scale datasets (<xref ref-type="bibr" rid="B10">Coletti et al., 2024</xref>; <xref ref-type="bibr" rid="B32">Torquato E Silva et al., 2025</xref>). 
The phenomenon of modulus layering, which undermines the credibility of the assessment results, occurs from time to time (<xref ref-type="bibr" rid="B37">Wang et al., 2024</xref>). These shortcomings highlight the need for more robust, efficient, and intelligent back-calculation strategies capable of capturing the nonlinear mapping between deflection responses and pavement material properties.</p>
<p>In recent years, the rapid development of artificial intelligence (AI) and machine learning (ML) techniques has provided new opportunities for solving the complex and nonlinear inverse problems in pavement engineering. Data-driven models, such as artificial neural networks (ANNs) (<xref ref-type="bibr" rid="B20">Khazanovich and Roesler, 1997</xref>; <xref ref-type="bibr" rid="B29">Sharma and Das, 2008</xref>; <xref ref-type="bibr" rid="B31">Tarefder et al., 2015</xref>), BPNN (<xref ref-type="bibr" rid="B23">Meier et al., 1997</xref>; <xref ref-type="bibr" rid="B35">Wang and Zhao, 2022</xref>), support vector machines (SVMs) (<xref ref-type="bibr" rid="B36">Wang et al., 2023</xref>; <xref ref-type="bibr" rid="B40">Zhang et al., 2021</xref>), and deep learning architectures (<xref ref-type="bibr" rid="B9">Chen et al., 2025</xref>), have been successfully applied to capture the intricate relationships between FWD deflection data and pavement parameters. These intelligent approaches overcome many limitations of traditional iterative methods by learning from large datasets and establishing direct mappings between input and output variables without the need for repeated forward simulations. Among various deep learning frameworks, Transformer-based models have recently attracted growing attention due to their outstanding ability to process sequential data and model long-range dependencies through self-attention mechanisms (<xref ref-type="bibr" rid="B34">Vaswani et al., 2017</xref>). Unlike conventional neural networks, Transformers can effectively learn complex spatial-mechanical correlations in multi-layer pavement systems, enabling more accurate and robust modulus back-calculation under uncertain or noisy measurement conditions. Consequently, the integration of Transformer architectures into FWD-based parameter back-calculation represents a promising direction toward automated, data-driven, and intelligent pavement evaluation and maintenance.</p>
<p>Beyond pavement engineering, physics-informed and data-driven inverse analysis based on indirect structural responses has been extensively investigated in broader civil and structural engineering domains. In the context of acoustic emission (AE)&#x2013;based damage identification, deep residual learning has been successfully applied to AE source localization in steel&#x2013;concrete composite slabs, demonstrating strong capability in learning inverse mappings under complex wave propagation conditions (<xref ref-type="bibr" rid="B46">Zhou et al., 2024b</xref>). AE-based data-driven approaches have also been employed for damage pattern recognition in corroded reinforced concrete beams strengthened with CFRP anchorage systems (<xref ref-type="bibr" rid="B25">Pan et al., 2023</xref>), as well as for localized corrosion-induced damage monitoring of large-scale RC piles in marine environments (<xref ref-type="bibr" rid="B42">Zheng et al., 2020</xref>), highlighting the effectiveness of deep learning in extracting damage-sensitive features from high-dimensional AE signals. In parallel, hybrid physics&#x2013;data-driven frameworks that integrate numerical modeling with deep learning have gained increasing attention. Representative examples include a hybrid FEM and 1D-CNN methodology for structural damage detection in typical high-pile wharves (<xref ref-type="bibr" rid="B45">Zhou et al., 2022</xref>). Moreover, vibration-based damage localization frameworks combining ambient vibration measurements with multi&#x2013;1D CNN ensemble models have been proposed and validated on large-scale reinforced concrete pedestrian bridges (<xref ref-type="bibr" rid="B47">Zhou et al., 2025b</xref>), demonstrating the scalability of data-driven inverse identification methods to complex, real-world structures. 
At a more fundamental level, lattice modeling approaches have been developed to simulate complete AE waveforms and fracture-induced AE wave propagation in concrete, providing physically interpretable forward models for inverse analysis (<xref ref-type="bibr" rid="B43">Zhou et al., 2024a</xref>; <xref ref-type="bibr" rid="B44">Zhou et al., 2025a</xref>).</p>
<p>Although these studies focus on different sensing modalities (AE or vibration) and structural systems, they share a common methodological paradigm with the present work: leveraging physics-based models to generate informative data and employing deep learning architectures to learn inverse mappings from indirect measurements to internal structural states. The proposed SEM&#x2013;Transformer framework follows this paradigm in the context of pavement engineering by integrating high-fidelity numerical simulations with attention-based learning for FWD-based modulus back-calculation.</p>
<p>With the advancement of sensing technologies and the increasing availability of large-scale pavement monitoring data, data-driven pavement management and intelligent maintenance systems have become an emerging trend in modern infrastructure engineering (<xref ref-type="bibr" rid="B14">Golmohammadi et al., 2025</xref>; <xref ref-type="bibr" rid="B21">Li et al., 2025</xref>; <xref ref-type="bibr" rid="B22">Lu et al., 2025</xref>). By integrating FWD test results with other sensing and inspection data, it is now possible to continuously evaluate pavement health conditions, predict performance degradation, and optimize maintenance scheduling through automated analytical frameworks. In this context, intelligent back-calculation serves as a crucial component of smart pavement management, enabling real-time structural assessment and decision support. Leveraging powerful deep learning models such as Transformers, the back-calculation of pavement mechanical parameters can be achieved with high efficiency and accuracy, supporting predictive maintenance and life-cycle performance optimization. Therefore, this study aims to develop a Transformer-based intelligent back-calculation framework for modulus back-calculation of pavements, providing a foundation for data-driven performance evaluation and intelligent pavement operation and maintenance.</p>
<p>The remainder of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> introduces the overall methodology, including the spectral element method (SEM) for forward simulation, the Transformer-based intelligent back-calculation framework, and the evaluation metrics employed to assess model performance. <xref ref-type="sec" rid="s3">Section 3</xref> describes the procedures of data collection, extraction, and preprocessing, emphasizing the introduction of random and systematic measurement errors to simulate realistic field conditions. <xref ref-type="sec" rid="s4">Section 4</xref> presents the results and discussion, where the Transformer-based back-calculation model is comprehensively evaluated under four noise scenarios (no measurement error, random error, systematic error, and combined random&#x2013;systematic error) and benchmarked against representative machine learning models (BPNN, SVR, and XGBoost), followed by a comparative discussion of robustness and generalization, an assessment of the physical plausibility of the predicted moduli, and considerations regarding potential overfitting. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> summarizes the main findings of this study and outlines potential directions for future research in intelligent pavement performance evaluation and maintenance.</p>
<p>It should be emphasized that the present study focuses on a numerical feasibility investigation, in which both training and testing datasets are generated using a validated spectral element method (SEM). Although synthetic noise is introduced to approximate typical measurement uncertainty in Falling Weight Deflectometer (FWD) tests, no field FWD dataset is directly used for model validation at this stage. Consequently, the primary objective of this work is to evaluate the learning capability, robustness, and stability of the proposed SEM&#x2013;Transformer framework under controlled yet realistic conditions, rather than to claim immediate applicability to in-service pavements.</p>
</sec>
<sec sec-type="methods" id="s2">
<label>2</label>
<title>Methodology</title>
<p>The overall workflow of the proposed intelligent back-calculation system integrates three key components: 1) numerical simulation of pavement responses using the SEM, 2) machine learning-based modulus prediction using the Transformer architecture, and 3) performance evaluation through multiple statistical metrics. The methodology is illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Flowchart of the intelligent back-calculation methodology for pavement structure parameter prediction.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g001.tif">
<alt-text content-type="machine-generated">Flowchart illustrating a three-phase process. Phase 1: Data collection and preprocessing involve numerical simulation, feature selection, error handling, and data preprocessing, splitting data into training and testing sets. Phase 2: Model construction and training use a transformer model with input embedding, multi-head attention, and output embedding. Phase 3: Model assessment and result analysis assess performance using MAE, MSE, RMSE, R-squared, and MAPE metrics, with a graph comparing predicted versus actual values.</alt-text>
</graphic>
</fig>
<sec id="s2-1">
<label>2.1</label>
<title>SEM</title>
<p>The SEM is employed to simulate the pavement surface deflection response under FWD loading. Compared with conventional finite element or finite difference schemes, the SEM achieves high accuracy by interpolating the field variables with high-order spectral shape functions within each element and by describing the distributed mass inertia exactly. In this study, a one-dimensional axisymmetric SEM formulation is adopted following <xref ref-type="bibr" rid="B41">Zhao et al. (2015)</xref> and <xref ref-type="bibr" rid="B7">Cao et al. (2020)</xref>. The layered pavement structure is modeled as a stack of homogeneous, isotropic layers characterized by thickness, elastic modulus, Poisson&#x2019;s ratio, and density, resting on a semi-infinite subgrade.</p>
<p>The governing equations of motion for the axisymmetric elastic medium are given in <xref ref-type="disp-formula" rid="e1">Equation 1</xref>:<disp-formula id="e1">
<mml:math id="m42">
<mml:mrow>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="&#x7c;">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x2207;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mo>&#xb7;</mml:mo>
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:msup>
<mml:mo>&#x2207;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c1;</mml:mi>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mo>&#xa8;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf42">
<mml:math id="m43">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> represents the displacement vector composed of the radial component <inline-formula id="inf43">
<mml:math id="m44">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the vertical component <inline-formula id="inf44">
<mml:math id="m45">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf45">
<mml:math id="m46">
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mo>&#xa8;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> refers to the acceleration vector, <inline-formula id="inf46">
<mml:math id="m47">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents Lam&#xe9;&#x2019;s constant for the material, while <inline-formula id="inf47">
<mml:math id="m48">
<mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the shear modulus. <inline-formula id="inf48">
<mml:math id="m49">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> denotes the gradient differential operator, <inline-formula id="inf49">
<mml:math id="m50">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
<mml:mo>&#xb7;</mml:mo>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> represents the divergence of <inline-formula id="inf50">
<mml:math id="m51">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf51">
<mml:math id="m52">
<mml:mrow>
<mml:msup>
<mml:mo>&#x2207;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> represents the Laplacian of <inline-formula id="inf52">
<mml:math id="m53">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf53">
<mml:math id="m54">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> signifies the material density.</p>
<p>In the vertical direction, each pavement layer is discretized by a 2-node axisymmetric spectral layer element, while the semi-infinite subgrade is represented by a 1-node throw-off element that conducts energy out of the system. Each node carries two degrees of freedom (radial and vertical displacements). Within a spectral element, the displacement field is interpolated by high-order spectral shape functions constructed from Lagrange polynomials passing through Gauss&#x2013;Lobatto&#x2013;Legendre points, so that one spectral element per physical layer is sufficient and no further mesh refinement is required through the thickness.</p>
<p>In the radial direction, the domain is discretized by a graded mesh that is refined beneath and near the FWD loading area and gradually coarsened toward an outer truncation radius. This radius is chosen sufficiently large such that the computed surface vibration decays to negligible levels, which avoids spurious reflections from the lateral boundary. Axisymmetry is enforced at the centerline <inline-formula id="inf54">
<mml:math id="m55">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and the pavement surface is traction-free outside the circular loading area where the FWD pressure is applied. At the bottom of the truncated domain, vertical displacement is fixed while radial displacement continuity is maintained through the throw-off spectral element to mimic the semi-infinite half-space.</p>
<p>The spatial discretization leads to a semi-discrete system of second-order ordinary differential equations in time. This system is advanced using an explicit central-difference time integration scheme, with the time step selected according to the standard SEM stability criterion based on the smallest element size and the maximum wave speed. Since all layers are modeled as linear elastic materials and no additional material or Rayleigh damping is introduced, the computed response corresponds to the undamped elastic wave propagation problem. The resulting SEM formulation has been validated in previous studies (<xref ref-type="bibr" rid="B7">Cao et al., 2020</xref>; <xref ref-type="bibr" rid="B41">Zhao et al., 2015</xref>), confirming its accuracy and stability for simulating pavement surface deflection histories. The peak values of the computed surface deflection basins at the sensor locations are then extracted and used as input features for the Transformer-based learning model described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>.</p>
</sec>
<sec id="s2-2">
<label>2.2</label>
<title>Intelligent back-calculation methodology</title>
<p>It should be clarified that the term physics-informed in this study refers to the use of SEM-based forward simulations to generate physically consistent training and testing datasets, rather than to the explicit enforcement of physical laws or inequality constraints within the neural network architecture itself. The Transformer model is trained as a data-driven regression mapping from deflection basins to layer elastic moduli and does not impose hard constraints such as modulus ordering or monotonicity during learning. In this study, an intelligent back-calculation framework based on the Transformer architecture is established to predict the elastic modulus of pavement layers from measured deflection data. The Transformer model, originally proposed by <xref ref-type="bibr" rid="B34">Vaswani et al. (2017)</xref>, has demonstrated exceptional performance in capturing long-range dependencies through its self-attention mechanism, making it well-suited for modeling complex nonlinear relationships in pavement structural systems.</p>
<sec id="s2-2-1">
<label>2.2.1</label>
<title>Overall structure of the transformer</title>
<p>As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, the proposed model is designed as an Encoder-only Transformer architecture specifically optimized for regression-based back-calculation tasks. The input vector, composed of multiple deflection peaks extracted from FWD data, is first transformed into a high-dimensional feature representation through an input embedding layer, allowing the model to capture latent spatial and mechanical patterns. Within the Transformer encoder, each layer consists of two fundamental components: the Multi-Head Self-Attention (MHSA) mechanism and the feed-forward network (FFN). The MHSA module enables the model to learn global correlations among deflection points by dynamically computing the weighted relevance between all positions in the input sequence, effectively capturing inter-peak dependencies that reflect subsurface mechanical interactions. The subsequent FFN applies nonlinear transformations to further refine and abstract the learned features, thereby enhancing the model&#x2019;s expressive capability. Each sublayer is enclosed within a residual connection and layer normalization (Add and Norm), which collectively stabilizes the training process, prevents gradient degradation, and accelerates convergence. Finally, the encoder output is passed through a regression head, which maps the learned feature representations to the predicted modulus values of each pavement layer, enabling accurate and interpretable estimation of structural parameters for intelligent pavement evaluation.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Overall architecture of the Transformer-based intelligent back-calculation model.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g002.tif">
<alt-text content-type="machine-generated">Diagram of a Transformer model architecture with Encoder and Decoder blocks. The Encoder includes Input Embedding, Positional Encoding, Multi-Head Attention, Add &#x26; Norm, and Feed Forward layers. The Decoder contains Masked Multi-Head Attention, Add &#x26; Norm, Feed Forward, Linear, ReLU, and Softmax layers. Arrows indicate data flow, and components are color-coded.</alt-text>
</graphic>
</fig>
<p>An encoder-only Transformer architecture is adopted in this study because the back-calculation task involves a fixed-length regression mapping from FWD deflection basins to elastic moduli, rather than a sequence-to-sequence or generative problem. The encoder-only design is therefore sufficient and computationally efficient. Compared with simpler architectures such as one-dimensional convolutional neural networks or attention-augmented multilayer perceptrons, the Transformer encoder enables direct modeling of global, nonlocal interactions among all deflection sensors through self-attention, without imposing predefined receptive fields or handcrafted feature aggregation rules.</p>
</sec>
<sec id="s2-2-2">
<label>2.2.2</label>
<title>Multi-head attention mechanism</title>
<p>The core of the Transformer lies in its multi-head attention mechanism, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. For each attention head, the input sequence is linearly projected into three matrices: the Query (Q), Key (K), and Value (V). The scaled dot-product attention is computed as <xref ref-type="disp-formula" rid="e2">Equation 2</xref>:<disp-formula id="e2">
<mml:math id="m56">
<mml:mrow>
<mml:mtext>Attention</mml:mtext>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>K</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>Softmax</mml:mtext>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
</mml:mrow>
<mml:msqrt>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf55">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the dimensionality of the key vectors. Multiple attention heads operate in parallel to capture diverse feature interactions, and their outputs are concatenated and linearly transformed. This mechanism allows the model to learn complex dependencies between deflection measurements and corresponding modulus responses at multiple representation levels (<xref ref-type="bibr" rid="B11">Dosovitskiy et al., 2020</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Structure of the multi-head attention mechanism.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g003.tif">
<alt-text content-type="machine-generated">Diagram of multi-head attention architecture in transformers. It includes Scaled Dot-Product Attention with components: MatMul, Scale, optional Mask, and SoftMax, followed by MatMul. Multiple heads process linear transformations of input vectors V, K, and Q, and results are concatenated. Output passes through a Linear layer.</alt-text>
</graphic>
</fig>
</sec>
<sec id="s2-2-3">
<label>2.2.3</label>
<title>Application to pavement modulus back-calculation</title>
<p>The developed Transformer model is employed to perform back-calculation of pavement layer elastic moduli from FWD deflection data. It learns a direct mapping between surface deflection basins (either measured in the field or generated through numerical simulations) and the elastic moduli of individual pavement layers. Through supervised training on paired datasets of deflection responses and known material parameters, the model captures the complex nonlinear relationships between surface mechanical behavior and the internal structural characteristics of the pavement system. Unlike conventional iterative back-calculation algorithms that rely heavily on initial guesses and are prone to convergence to local minima, the Transformer exploits a data-driven learning mechanism and attention-based architecture to achieve high generalization performance across diverse pavement configurations, while also enabling efficient parallel computation and substantially reducing computational time. In addition, by incorporating global contextual dependencies among sensor readings, the Transformer exhibits strong robustness to measurement noise and maintains prediction stability under uncertain or imperfect data conditions. These characteristics make it a powerful and intelligent approach for accurately estimating layer moduli in multilayer pavement systems, thereby providing a solid foundation for automated and reliable pavement evaluation and maintenance decision-making.</p>
<p>The nonlinear mapping from the FWD deflection basin to the layer elastic moduli is realized by an encoder-only Transformer architecture. The model input is the vector of nine peak surface deflections <inline-formula id="inf56">
<mml:math id="m58">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mfenced open="{" close="}" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mn>9</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, measured at radial distances <inline-formula id="inf57">
<mml:math id="m59">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mfenced open="[" close="]" separators="&#x7c;">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>20</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>50</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>80</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>110</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>140</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>170</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>200</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> mm from the load center. To better exploit both the magnitude and spatial layout of these measurements, the raw deflections are first processed by a dedicated embedding module, denoted as PeakEmbed. In this module, each scalar peak deflection is projected from <inline-formula id="inf58">
<mml:math id="m60">
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> into a <inline-formula id="inf59">
<mml:math id="m61">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>model</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>-dimensional feature space by a fully connected layer. A sinusoidal positional encoding, similar to that used in the original Transformer formulation, is then added to retain the ordered sensor index information along the radial direction. Furthermore, the physical sensor spacing is explicitly encoded through a small multilayer perceptron (MLP) that maps the normalized sensor distance (in meters) through a 1&#x2013;128&#x2013;128 MLP with ReLU activation. The output of this distance MLP is added elementwise to the peak-value embedding, so that the final token representation accounts for both the measured deflection and its radial location. This procedure yields an input sequence of length <inline-formula id="inf60">
<mml:math id="m62">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>9</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> with feature dimension <inline-formula id="inf61">
<mml:math id="m63">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>model</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>On top of the PeakEmbed module, we employ an encoder-only Transformer with <inline-formula id="inf62">
<mml:math id="m64">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mtext>enc</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> identical encoder layers. Each encoder layer is implemented using the standard TransformerEncoderLayer in PyTorch with batch_first &#x3d; True. The multi-head self-attention (MHSA) block uses <inline-formula id="inf63">
<mml:math id="m65">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mtext>head</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> attention heads, leading to key and value dimensions <inline-formula id="inf64">
<mml:math id="m66">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>v</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>model</mml:mtext>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mtext>head</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>32</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> for each head. The position-wise feed-forward network (FFN) in each encoder layer consists of two fully connected layers with an intermediate dimension <inline-formula id="inf65">
<mml:math id="m67">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>ff</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>256</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and a ReLU nonlinearity. Residual connections, layer normalization, and dropout are applied around both the MHSA and FFN sublayers following the default PyTorch implementation, with a dropout rate of 0.1 in each encoder layer. To obtain a compact representation of the entire deflection basin, a learnable &#x201c;[CLS]&#x201d; token of size <inline-formula id="inf66">
<mml:math id="m68">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> is prepended to the embedded sequence. The CLS token is introduced as a learnable global representation to aggregate information from all sensor tokens through self-attention. Although the input deflection vector has a fixed length, the CLS-based aggregation provides a principled alternative to fixed pooling operations (e.g., mean or max pooling) and allows the model to adaptively learn the relative contribution of each deflection measurement to the inverse mapping. The concatenated sequence (CLS token plus nine sensor tokens) is passed through the Transformer encoder, and only the output corresponding to the CLS position is retained as a global feature vector. This global vector is then mapped to the target elastic moduli through a regression head comprising a two-layer MLP: a fully connected layer from 128 to 128 units with ReLU activation, followed by a linear layer from 128 to 3 units. The three outputs correspond to the standardized (Z-score) elastic moduli <inline-formula id="inf67">
<mml:math id="m69">
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula> of the surface layer, base layer, and subgrade, respectively.</p>
<p>Prior to training, both the input peak deflections and the output moduli are standardized by Z-score normalization using statistics (mean and standard deviation) computed solely from the training subset. The Transformer is trained to minimize the Smooth L1 loss (Smooth L1 Loss with <inline-formula id="inf68">
<mml:math id="m70">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) between the predicted and true standardized moduli, which provides a compromise between the robustness of L1 loss and the sensitivity of L2 loss to small errors. The optimizer is AdamW with an initial learning rate of <inline-formula id="inf69">
<mml:math id="m71">
<mml:mrow>
<mml:mn>1.0</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and a weight decay of <inline-formula id="inf70">
<mml:math id="m72">
<mml:mrow>
<mml:mn>1.0</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. Training is performed for 90 epochs with a mini-batch size of 256, on a GPU when available or otherwise on a CPU. During training, we monitor the average training loss in the standardized space as well as the mean absolute error (MAE) on the held-out test set to verify convergence and stability.</p>
<p>The number of encoder layers and attention heads was selected based on empirical trade-offs between model capacity, training stability, and overfitting risk. Given the relatively small number of input sensors (nine deflection measurements) and the synthetic nature of the dataset, deeper or wider Transformer configurations were found to offer limited performance gains while increasing computational cost and susceptibility to overfitting. Accordingly, a compact configuration with two encoder layers and four attention heads was adopted as a balanced and reproducible design for the present feasibility study.</p>
<p>All key architectural and training hyperparameters of the Transformer model are summarized in <xref ref-type="table" rid="T1">Table 1</xref> for ease of reference and reproducibility.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Transformer architecture and training hyperparameters used for the back-calculation of multilayer pavement elastic moduli from FWD deflection data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Item</th>
<th align="center">Description</th>
<th align="center">Value/setting</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Input dimension</td>
<td align="center">Number of FWD peak deflections in each sample</td>
<td align="center">9 sensors</td>
</tr>
<tr>
<td align="center">Sensor offsets</td>
<td align="center">Radial distances of FWD sensors from load center</td>
<td align="center">0, 20, 30, 50, 80, 110, 140, 170, 200 mm</td>
</tr>
<tr>
<td align="center">Input embedding (PeakEmbed)</td>
<td align="center">Linear projection from scalar peak value to model space</td>
<td align="center">Linear (1 &#x2192; 128)</td>
</tr>
<tr>
<td align="center">Positional encoding</td>
<td align="center">Sinusoidal positional encoding added to each sensor token</td>
<td align="center">Sinusoidal, max length &#x2265;64</td>
</tr>
<tr>
<td align="center">Distance embedding</td>
<td align="center">MLP applied to normalized sensor distance (in m) and added to token features</td>
<td align="center">MLP: 1 &#x2192; 128 &#x2192; 128, ReLU</td>
</tr>
<tr>
<td align="center">Model dimension</td>
<td align="center">Feature dimension of all tokens</td>
<td align="center">
<inline-formula id="inf71">
<mml:math id="m73">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>model</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Encoder type</td>
<td align="center">Encoder-only transformer (TransformerEncoder)</td>
<td align="center">PyTorch implementation</td>
</tr>
<tr>
<td align="center">Number of encoder layers</td>
<td align="center">Stacked self-attention &#x2b; FFN blocks</td>
<td align="center">
<inline-formula id="inf72">
<mml:math id="m74">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mtext>enc</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Number of attention heads</td>
<td align="center">Heads in multi-head self-attention</td>
<td align="center">
<inline-formula id="inf73">
<mml:math id="m75">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mtext>head</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Key/value dimension per head</td>
<td align="center">Dimension of key and value vectors</td>
<td align="center">
<inline-formula id="inf74">
<mml:math id="m76">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>v</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>32</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Feed-forward dimension</td>
<td align="center">Hidden size of position-wise FFN</td>
<td align="center">
<inline-formula id="inf75">
<mml:math id="m77">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mtext>ff</mml:mtext>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>256</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">FFN activation</td>
<td align="center">Nonlinear activation in FFN</td>
<td align="center">ReLU</td>
</tr>
<tr>
<td align="center">Dropout in encoder</td>
<td align="center">Dropout rate in each encoder layer</td>
<td align="center">0.1 (PyTorch default)</td>
</tr>
<tr>
<td align="center">Normalization</td>
<td align="center">Layer normalization with residual connections</td>
<td align="center">Post-attention and post-FFN</td>
</tr>
<tr>
<td align="center">CLS token</td>
<td align="center">Learnable global token prepended to sequence</td>
<td align="center">1 &#xd7; 1 &#xd7; 128 parameter</td>
</tr>
<tr>
<td align="center">Regression head</td>
<td align="center">MLP mapping CLS representation to moduli</td>
<td align="center">Linear (128 &#x2192; 128) &#x2b; ReLU &#x2b; Linear (128 &#x2192; 3)</td>
</tr>
<tr>
<td align="center">Output dimension</td>
<td align="center">Number of target variables</td>
<td align="center">3 (E1, E2, E3)</td>
</tr>
<tr>
<td align="center">Normalization of inputs/outputs</td>
<td align="center">z-score normalization using training statistics</td>
<td align="center">Mean and std from training set only</td>
</tr>
<tr>
<td align="center">Loss function</td>
<td align="center">Objective for regression</td>
<td align="center">Smooth L1 (&#x3b2; &#x3d; 0.5)</td>
</tr>
<tr>
<td align="center">Optimizer</td>
<td align="center">Optimization algorithm</td>
<td align="center">AdamW</td>
</tr>
<tr>
<td align="center">Initial learning rate</td>
<td align="center">Base learning rate for AdamW</td>
<td align="center">
<inline-formula id="inf76">
<mml:math id="m78">
<mml:mrow>
<mml:mn>1.0</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Weight decay</td>
<td align="center">L2 regularization through AdamW</td>
<td align="center">
<inline-formula id="inf77">
<mml:math id="m79">
<mml:mrow>
<mml:mn>1.0</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Batch size</td>
<td align="center">Mini-batch size during training</td>
<td align="center">256</td>
</tr>
<tr>
<td align="center">Number of epochs</td>
<td align="center">Training iterations over the dataset</td>
<td align="center">90</td>
</tr>
<tr>
<td align="center">Computation device</td>
<td align="center">Hardware used for training</td>
<td align="center">GPU if available, otherwise CPU</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s2-3">
<label>2.3</label>
<title>Evaluation metrics</title>
<p>The model&#x2019;s predictive performance is evaluated using five standard statistical metrics: MAE, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R<sup>2</sup>). These metrics assess both the magnitude and distribution of prediction errors, providing a comprehensive evaluation of model accuracy.</p>
<sec id="s2-3-1">
<label>2.3.1</label>
<title>MAE</title>
<p>The MAE calculates the average magnitude of the absolute differences between predicted values and observed values. It is a linear score, meaning all individual differences are weighted equally in the average. MAE (<xref ref-type="disp-formula" rid="e3">Equation 3</xref>) avoids the issue of error cancellation and thus accurately reflects the actual size of the prediction errors (<xref ref-type="bibr" rid="B38">Wudil et al., 2024</xref>).<disp-formula id="e3">
<mml:math id="m80">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:mfenced open="|" close="|" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
</sec>
<sec id="s2-3-2">
<label>2.3.2</label>
<title>MSE</title>
<p>The MSE is a statistical metric used to evaluate the accuracy of a model. It is calculated by taking the average of the squared differences between the actual and predicted values (<xref ref-type="bibr" rid="B15">Goodfellow et al., 2016</xref>). MSE (<xref ref-type="disp-formula" rid="e4">Equation 4</xref>) is sensitive to outliers&#x2014;since large deviations between predictions and true values become even larger after squaring&#x2014;but this property also allows it to effectively reflect the overall distribution of prediction errors.<disp-formula id="e4">
<mml:math id="m81">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
</sec>
<sec id="s2-3-3">
<label>2.3.3</label>
<title>RMSE</title>
<p>The RMSE (<xref ref-type="disp-formula" rid="e5">Equation 5</xref>) represents the sample standard deviation of the differences&#x2014;known as residuals&#x2014;between predicted and observed values (<xref ref-type="bibr" rid="B6">Bypour et al., 2024</xref>). It indicates the degree of dispersion of the sample errors. In practical measurements, the number of observations <inline-formula id="inf78">
<mml:math id="m82">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is always limited, and the true value can only be approximated by the most reliable (best-estimated) value.<disp-formula id="e5">
<mml:math id="m83">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
</sec>
<sec id="s2-3-4">
<label>2.3.4</label>
<title>MAPE</title>
<p>The MAPE (<xref ref-type="disp-formula" rid="e6">Equation 6</xref>) is a statistical metric used to measure the degree of error between predicted and actual values (<xref ref-type="bibr" rid="B8">Chen et al., 2024</xref>). It is calculated by taking the absolute difference between the predicted and actual values as a percentage of the actual value, and then averaging these percentages to reflect the overall accuracy of the predictions.<disp-formula id="e6">
<mml:math id="m84">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:mfenced open="|" close="|" separators="&#x7c;">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
</sec>
<sec id="s2-3-5">
<label>2.3.5</label>
<title>R<sup>2</sup>
</title>
<p>The <inline-formula id="inf79">
<mml:math id="m85">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="disp-formula" rid="e7">Equation 7</xref>) is a statistical measure based on the decomposition of the total sum of squares, used to evaluate how well a regression model fits the observed data. It represents the proportion of variance in the dependent variable that is explained by the regression model (<xref ref-type="bibr" rid="B12">Draper and Smith, 1998</xref>). Therefore, the higher the <inline-formula id="inf80">
<mml:math id="m86">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> value, the better the model fits the data.<disp-formula id="e7">
<mml:math id="m87">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf81">
<mml:math id="m88">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the predicted value, <inline-formula id="inf82">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the actual value, <inline-formula id="inf83">
<mml:math id="m90">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> the average value of the actual values, and <inline-formula id="inf84">
<mml:math id="m91">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> indicates the number of samples. Smaller values of MAE, MSE, RMSE, and MAPE indicate better predictive performance of the model, and the closer the value of R<sup>2</sup> is to 1, the better the performance of the model is.</p>
</sec>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Data collection, extraction and preprocessing</title>
<p>This section presents the complete workflow of data preparation for the intelligent back-calculation model, as illustrated in <xref ref-type="fig" rid="F4">Figure 4</xref>. The entire process integrates four major stages: numerical simulation, feature extraction, noise processing, and data preprocessing.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Flowchart of data collection, extraction, and preprocessing.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g004.tif">
<alt-text content-type="machine-generated">Diagram showing an applied load of 0.7 MPa on a pavement with measurement points labeled D1 to D9. A graph compares deflection peaks from numerical simulations and SEM in millimeters. The workflow includes data collection, error handling, feature selection, and database management. Various error scenarios are listed: none, random error, systematic error, and a combination of both.</alt-text>
</graphic>
</fig>
<p>Firstly, a three-layer pavement system comprising surface course, base course, and subgrade is modeled, and FWD loading is applied to reproduce field testing conditions. The dynamic responses of the pavement structure are computed using the SEM, which offers high precision and computational efficiency for transient wave propagation in layered media.</p>
<p>Secondly, the simulated deflection time histories at multiple measurement points are analyzed to obtain the peak deflection values, which serve as representative features reflecting the stiffness characteristics of the pavement layers. These extracted features are paired with their corresponding layer moduli to form the raw dataset.</p>
<p>Thirdly, in order to account for possible measurement uncertainty and improve model generalization, noise processing is introduced. Synthetic noise consistent with the statistical properties of field measurements is added to part of the dataset, simulating realistic variability in FWD test data.</p>
<p>Finally, the dataset undergoes preprocessing steps, including data normalization (via z-score standardization), sample shuffling, and train-test partitioning. These operations ensure that all features are dimensionally comparable and that the Transformer model can achieve stable convergence during training.</p>
<p>Overall, this systematic data preparation framework establishes a solid foundation for the subsequent intelligent back-calculation analysis, ensuring both the physical realism of the inputs and the statistical robustness of the learning process.</p>
<sec id="s3-1">
<label>3.1</label>
<title>Data collection and extraction</title>
<p>To train and evaluate the Transformer-based intelligent back-calculation model, a large-scale synthetic dataset was established through numerical simulations using the validated SEM model. The SEM approach provides high computational efficiency and accuracy for solving dynamic response problems of layered pavement systems, making it particularly suitable for simulating FWD tests.</p>
<p>The modeled pavement structure consists of three primary layers: a surface course, a base course, and a subgrade. Each layer is characterized by its elastic modulus, Poisson&#x2019;s ratio, and thickness, as summarized in <xref ref-type="table" rid="T2">Table 2</xref>. In the SEM simulations, pavement layers are modeled as linear elastic materials, and light viscous damping is introduced at the dynamic response level, following standard practice in SEM-based dynamic analysis of pavements (<xref ref-type="bibr" rid="B1">Al-Khoury et al., 2001</xref>; <xref ref-type="bibr" rid="B2">2002</xref>). The mechanical parameters were randomly combined within reasonable engineering ranges to ensure adequate representation of various pavement conditions, resulting in a total of 20592 combinations of pavement structures.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Structure information of asphalt pavements.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Layer</th>
<th align="center">Thickness (cm)</th>
<th align="center">Modulus (MPa)</th>
<th align="center">Poisson&#x2019;s ratio</th>
<th align="center">Density (kg/m<sup>3</sup>)</th>
<th align="center">Damping ratio</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Surface course</td>
<td align="center">10, 15, 20</td>
<td align="center">2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000, 22500, 25000, 27500, 30000</td>
<td align="center">0.30</td>
<td align="center">2400</td>
<td align="center">0.02</td>
</tr>
<tr>
<td align="center">Base course</td>
<td align="center">20, 30, 40, 50</td>
<td align="center">4000, 5500, 7000, 8500, 10000, 11500, 13000, 14500, 16000, 17500, 19000</td>
<td align="center">0.25</td>
<td align="center">2200</td>
<td align="center">0.02</td>
</tr>
<tr>
<td align="center">Subgrade</td>
<td align="center">-</td>
<td align="center">40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100</td>
<td align="center">0.40</td>
<td align="center">1600</td>
<td align="center">0.05</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>During the simulation, the FWD test applies an impulsive load to the pavement surface in the form of a half-sine pulse with a peak pressure of 0.7 MPa and a duration of 25 ms. The loading plate radius is set to 15 cm, following standard FWD testing procedures. The pavement response was monitored at nine measurement points located at 0, 20, 30, 50, 80, 110, 140, 170, and 200 cm from the load center, corresponding to the typical sensor arrangement used in field testing.</p>
<p>For each simulation case, the SEM model outputs the deflection time history at all nine sensors. The peak deflection values were extracted from these time histories using an automated peak detection algorithm, representing the maximum surface displacement under the dynamic load. These peak values form the input features for the back-calculation model, while the corresponding elastic moduli of the three pavement layers serve as the output targets. Consequently, a comprehensive dataset with nine input variables (deflection peaks) and three output variables (layer moduli) was constructed, containing 20592 samples in total. This dataset was subsequently normalized and divided into training and testing subsets for the Transformer model development and performance evaluation.</p>
</sec>
<sec id="s3-2">
<label>3.2</label>
<title>Data preprocessing</title>
<sec id="s3-2-1">
<label>3.2.1</label>
<title>Noise processing</title>
<p>From each simulated time-domain response, the peak deflection values are extracted as input features. To account for potential measurement imperfections in real-world applications, random and systematic errors are introduced into the simulated deflection data to simulate realistic noise conditions. The corresponding error assumptions are listed in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
<p>From each simulated time-domain response, the peak deflection values are extracted as input features representing the pavement structural stiffness. However, the idealized numerical simulations do not fully reflect the uncertainties that commonly occur in field FWD measurements, such as sensor inaccuracies, temperature effects, and load plate contact variations. To account for these potential measurement imperfections and to enhance the model&#x2019;s robustness, different error treatment strategies were implemented during data preprocessing, as summarized in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Specific information of measurement errors.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Measurement errors</th>
<th align="center">Specific assumed values</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Random error <inline-formula id="inf85">
<mml:math id="m92">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf86">
<mml:math id="m93">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center">Systematic error <inline-formula id="inf87">
<mml:math id="m94">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf88">
<mml:math id="m95">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>4</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Four distinct data processing scenarios were designed to assess the model&#x2019;s sensitivity to measurement noise:</p>
<sec id="s3-2-1-1">
<label>3.2.1.1</label>
<title>Case &#x2460;: No error treatment</title>
<p>The original simulated deflection data are used directly without any modification. This serves as the baseline condition, representing an ideal, noise-free environment where the inverse model is trained purely on clean data.</p>
</sec>
<sec id="s3-2-1-2">
<label>3.2.1.2</label>
<title>Case &#x2461;: Random error only</title>
<p>In this scenario, only random errors are introduced to each deflection value to simulate stochastic disturbances arising from equipment fluctuations or environmental noise. The random error term, denoted as <inline-formula id="inf89">
<mml:math id="m96">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, follows a Gaussian distribution <inline-formula id="inf90">
<mml:math id="m97">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, meaning the noise has a zero mean and a standard deviation of 2 &#x3bc;m, consistent with typical FWD sensor resolution limits (<xref ref-type="bibr" rid="B30">Stubstad et al., 2000</xref>).</p>
</sec>
<sec id="s3-2-1-3">
<label>3.2.1.3</label>
<title>Case &#x2462;: Systematic error only</title>
<p>To represent bias-type deviations caused by sensor miscalibration, temperature drift, or uneven load application, a systematic error term <inline-formula id="inf91">
<mml:math id="m98">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is applied uniformly across all sensors in each test. The error magnitude is randomly selected within the range from &#x2212;4% to &#x2b;4%, implying either underestimation or overestimation of the deflection amplitude by the entire measurement system.</p>
</sec>
<sec id="s3-2-1-4">
<label>3.2.1.4</label>
<title>Case &#x2463;: Combined random and systematic errors</title>
<p>In the most realistic scenario, both error components are simultaneously introduced. The final deflection at measurement point <italic>i</italic> is expressed as<disp-formula id="equ1">
<mml:math id="m99">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2a;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf92">
<mml:math id="m100">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the true simulated deflection, <inline-formula id="inf93">
<mml:math id="m101">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> the systematic error, and <inline-formula id="inf94">
<mml:math id="m102">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> the random error. This condition closely emulates the uncertainty characteristics encountered in actual FWD testing, where both instrument bias and stochastic fluctuations coexist.</p>
<p>Through this four-level noise injection strategy, the constructed datasets enable comprehensive evaluation of the Transformer model&#x2019;s robustness, generalization capability, and resistance to measurement uncertainty, ensuring its applicability to real-world pavement deflection data.</p>
<p>It should be emphasized that the present dataset is fully generated from numerical simulations, and the introduced noise scenarios only approximate, rather than fully reproduce, the complexity of real FWD measurements. In practice, measurement errors may exhibit spatial correlation among sensors, time-dependent drift, temperature-induced bias, and coupling effects between sensors and pavement surface conditions. These factors are not explicitly modeled in the current study. Therefore, the adopted noise model should be regarded as a first-order representation designed to test model robustness, rather than a comprehensive description of field measurement uncertainty.</p>
</sec>
</sec>
<sec id="s3-2-2">
<label>3.2.2</label>
<title>Z-score normalization</title>
<p>The deflection data and modulus labels are standardized using Z-score normalization, ensuring zero mean and unit variance. This normalization helps maintain numerical stability and prevents feature dominance caused by scale differences. Z-score normalization is a method that transforms data into a distribution with zero mean and unit variance by subtracting the mean and dividing by the standard deviation. Before model training, both the input features (FWD peak deflections) and output variables (elastic moduli of the three pavement layers) were standardized using the Z-score method to eliminate dimensional differences and improve numerical stability during training. Let the training samples be denoted as <inline-formula id="inf95">
<mml:math id="m103">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf96">
<mml:math id="m104">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, with their respective means and standard deviations represented by <inline-formula id="inf97">
<mml:math id="m105">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf98">
<mml:math id="m106">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The standardization formulas are as follows:<disp-formula id="e8">
<mml:math id="m107">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
<disp-formula id="e9">
<mml:math id="m108">
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>Both model training and prediction are performed in the standardized space. The means <inline-formula id="inf99">
<mml:math id="m109">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf100">
<mml:math id="m110">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and standard deviations <inline-formula id="inf101">
<mml:math id="m111">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf102">
<mml:math id="m112">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> used in <xref ref-type="disp-formula" rid="e8">Equations 8</xref>&#x2013;<xref ref-type="disp-formula" rid="e10">10</xref> are computed exclusively from the training subset. The same statistics are then applied to normalize the validation and test subsets, and to inverse-transform the predicted outputs back to physical units (MPa). This protocol ensures that no information from the validation or test data &#x201c;leaks&#x201d; into the training process through normalization, and that the reported performance truly reflects the model&#x2019;s generalization capability. After prediction, the results are transformed back to the physical units (MPa) through an inverse transformation as follows:<disp-formula id="e10">
<mml:math id="m113">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where <inline-formula id="inf103">
<mml:math id="m114">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the original value of the <inline-formula id="inf104">
<mml:math id="m115">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> input feature for the <inline-formula id="inf105">
<mml:math id="m116">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> sample (for example, the peak deflection measured by the <inline-formula id="inf106">
<mml:math id="m117">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> sensor); <inline-formula id="inf107">
<mml:math id="m118">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the original target value of the <inline-formula id="inf108">
<mml:math id="m119">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> output variable for the <inline-formula id="inf109">
<mml:math id="m120">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th sample (i.e., the elastic modulus of the corresponding layer, in MPa). <inline-formula id="inf110">
<mml:math id="m121">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf111">
<mml:math id="m122">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the mean and standard deviation of the <inline-formula id="inf112">
<mml:math id="m123">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> input feature in the training set, respectively. <inline-formula id="inf113">
<mml:math id="m124">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf114">
<mml:math id="m125">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> denote the mean and standard deviation of the <inline-formula id="inf115">
<mml:math id="m126">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th output variable (modulus) in the training set. <inline-formula id="inf116">
<mml:math id="m127">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> represents the dimensionless value obtained by applying Z-score normalization to the input feature <inline-formula id="inf117">
<mml:math id="m128">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf118">
<mml:math id="m129">
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the normalized value of the output variable <inline-formula id="inf119">
<mml:math id="m130">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf120">
<mml:math id="m131">
<mml:mrow>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the model&#x2019;s predicted output in the standardized space; and <inline-formula id="inf121">
<mml:math id="m132">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the actual predicted value after inverse standardization (in MPa). The notation &#x201c;<inline-formula id="inf122">
<mml:math id="m133">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mo>&#xb7;</mml:mo>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>&#x201d; indicates a standardized variable, while the &#x201c;&#x5e;&#x201d; symbol denotes a predicted value. To avoid numerical instability, a lower bound correction is applied to very small standard deviations, defined as <inline-formula id="inf123">
<mml:math id="m134">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>max</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>8</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Both training and testing data are standardized using the statistics (<inline-formula id="inf124">
<mml:math id="m135">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) computed from the training set to prevent data leakage.</p>
<p>For each noise scenario, the synthetic SEM-based dataset of noiseless input&#x2013;output pairs is first generated and then randomly partitioned into two mutually exclusive subsets, with 70% of the samples used for training and 30% reserved for testing, using a fixed random seed to ensure reproducibility (<xref ref-type="sec" rid="s3-1">Section 3.1</xref>). The clean deflection basins are computed from the SEM model and, after this train&#x2013;test split, are corrupted by the prescribed noise models: random measurement noise is introduced by adding zero-mean Gaussian noise to each sensor deflection with a standard deviation proportional to the local peak deflection magnitude, whereas systematic noise is represented by a constant bias term applied to all sensors; a combined-noise case is constructed by superposing the random and systematic components. Prior to training, both the input peak deflections and the output elastic moduli are standardized via Z-score normalization. All standardization operations are performed in a strictly training-only manner to avoid data leakage: the means and standard deviations of the input and output variables are computed from the training subset only and then used to normalize both the training and test data, as well as to inverse-transform the model predictions back to physical units (MPa). The normalized training subset is finally fed to the Transformer model described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>, and model training and loss computation are carried out entirely in this standardized space.</p>
</sec>
</sec>
</sec>
<sec sec-type="results|discussion" id="s4">
<label>4</label>
<title>Results and discussion</title>
<p>This section presents a comprehensive evaluation of the Transformer-based intelligent back-calculation model under four distinct noise conditions: (1) no measurement error, (2) random error, (3) systematic error, and (4) combined random and systematic error. Each condition corresponds to a realistic field scenario, reflecting the influence of measurement imperfections in FWD testing. Model performance was quantitatively assessed using the MAE, MSE, RMSE, MAPE, and R<sup>2</sup>. These metrics were calculated for each pavement layer (surface course, base course, and subgrade) as well as averaged over all layers to provide a holistic understanding of model behavior.</p>
<sec id="s4-1">
<label>4.1</label>
<title>Model performance without measurement error</title>
<p>The benchmark case, without any added noise, represents the ideal data condition for model evaluation. As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>; <xref ref-type="table" rid="T4">Table 4</xref>, the Transformer model demonstrates excellent agreement between predicted and true elastic moduli for all pavement layers. The predicted points in <xref ref-type="fig" rid="F5">Figure 5</xref> closely follow the 1:1 reference line, indicating strong consistency across the entire modulus range.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Comparison between predicted and true moduli for all pavement layers with no measurement error: <bold>(a)</bold> surface course modulus E<sub>1</sub>; <bold>(b)</bold> base course modulus E<sub>2</sub>; <bold>(c)</bold> subgrade modulus E<sub>3</sub>.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g005.tif">
<alt-text content-type="machine-generated">Three scatter plots (a, b, c) compare predicted versus actual values of E1, E2, and E3 in MPa for training and testing sets. Each plot includes a diagonal line y &#x3d; x, histograms on the axes, and a legend indicating cyan for training and magenta for testing data. There is a positive correlation in each plot.</alt-text>
</graphic>
</fig>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Model performance evaluation on the test dataset with no measurement error.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">Surface course</th>
<th align="center">Base course</th>
<th align="center">Subgrade</th>
<th align="center">Average</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">MAE (MPa)</td>
<td align="center">1284.68</td>
<td align="center">847.39</td>
<td align="center">0.79</td>
<td align="center">710.95</td>
</tr>
<tr>
<td align="center">MSE (&#xd7;10<sup>3</sup> MPa<sup>2</sup>)</td>
<td align="center">3658995.25</td>
<td align="center">1348095.88</td>
<td align="center">1.20</td>
<td align="center">1669030.78</td>
</tr>
<tr>
<td align="center">RMSE (MPa)</td>
<td align="center">1912.85</td>
<td align="center">1161.08</td>
<td align="center">1.10</td>
<td align="center">1025.01</td>
</tr>
<tr>
<td align="center">MAPE (%)</td>
<td align="center">8.76</td>
<td align="center">7.88</td>
<td align="center">1.14</td>
<td align="center">5.93</td>
</tr>
<tr>
<td align="center">R<sup>2</sup>
</td>
<td align="center">0.95</td>
<td align="center">0.94</td>
<td align="center">1.00</td>
<td align="center">0.96</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Quantitatively, the average MAE reaches 710.95 MPa, and the MAPE remains as low as 5.93%, signifying a high prediction accuracy. The average R<sup>2</sup> of 0.96 further confirms that the model captures over 96% of the variance in the true modulus values.</p>
<p>Among individual layers, the subgrade modulus exhibits a very high statistical correlation with the reference values (R<sup>2</sup> close to 1.00) and relatively small absolute errors (MAE &#x3d; 0.79 MPa), reflecting its dominant influence on the overall deflection basin under the considered parameter ranges and sensor configuration. Conversely, the surface course exhibits slightly larger deviations due to its higher stiffness and greater sensitivity to small perturbations in deflection measurements.</p>
<p>These results highlight the Transformer model&#x2019;s powerful feature extraction ability and its capacity to establish a robust nonlinear mapping between deflection patterns and pavement layer moduli under ideal conditions.</p>
</sec>
<sec id="s4-2">
<label>4.2</label>
<title>Model performance under random error</title>
<p>To emulate random fluctuations in field measurements, Gaussian noise with zero mean and specified variance was introduced into the input data. The corresponding results are illustrated in <xref ref-type="fig" rid="F6">Figure 6</xref>; <xref ref-type="table" rid="T5">Table 5</xref>. Remarkably, even in the presence of random noise, the model maintains a high level of predictive accuracy. The average MAE (674.37 MPa) and RMSE (979.83 MPa) are slightly lower than those in the noise-free case, and R<sup>2</sup> remains above 0.95, suggesting that the model benefits from minor data perturbations, which can enhance generalization by reducing overfitting.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Comparison between predicted and true moduli for all pavement layers with random error: <bold>(a)</bold> surface course modulus E<sub>1</sub>; <bold>(b)</bold> base course modulus E<sub>2</sub>; <bold>(c)</bold> subgrade modulus E<sub>3</sub>.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g006.tif">
<alt-text content-type="machine-generated">Scatter plots with histograms compare predicted versus actual values for three elastic moduli: E1, E2, and E3 in megapascals. In each plot, cyan dots represent the training set, and magenta dots represent the testing set. The diagonal line denotes y &#x3d; x. Histograms display distributions of actual values above and predicted values to the right of each plot. Panel (a) depicts E1, (b) E2, and (c) E3.</alt-text>
</graphic>
</fig>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Model performance evaluation on the test dataset with random error.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">Surface course</th>
<th align="center">Base course</th>
<th align="center">Subgrade</th>
<th align="center">Average</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">MAE (MPa)</td>
<td align="center">1226.26</td>
<td align="center">796.28</td>
<td align="center">0.56</td>
<td align="center">674.37</td>
</tr>
<tr>
<td align="center">MSE (&#xd7;10<sup>3</sup> MPa<sup>2</sup>)</td>
<td align="center">3345253.50</td>
<td align="center">1231481.75</td>
<td align="center">0.59</td>
<td align="center">1525578.61</td>
</tr>
<tr>
<td align="center">RMSE (MPa)</td>
<td align="center">1829.00</td>
<td align="center">1109.72</td>
<td align="center">0.77</td>
<td align="center">979.83</td>
</tr>
<tr>
<td align="center">MAPE (%)</td>
<td align="center">8.53</td>
<td align="center">8.04</td>
<td align="center">0.86</td>
<td align="center">5.81</td>
</tr>
<tr>
<td align="center">R<sup>2</sup>
</td>
<td align="center">0.95</td>
<td align="center">0.95</td>
<td align="center">1.00</td>
<td align="center">0.97</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The R<sup>2</sup> values remain consistently high (&#x2265;0.95 for all layers), indicating that the random disturbances do not significantly affect the model&#x2019;s regression capability. This robustness can be attributed to the self-attention mechanism in the Transformer architecture, which effectively identifies key spatial dependencies among deflection features and suppresses the influence of random noise.</p>
<p>In particular, the base course achieves an R<sup>2</sup> of 0.95 with MAPE below 8.1%, demonstrating the model&#x2019;s adaptability to intermediate stiffness layers. The scatter distribution in <xref ref-type="fig" rid="F6">Figure 6</xref> remains tightly clustered around the reference line, further confirming the model&#x2019;s insensitivity to random fluctuations.</p>
</sec>
<sec id="s4-3">
<label>4.3</label>
<title>Model performance under systematic error</title>
<p>Systematic errors, such as sensor calibration bias or consistent drift in FWD equipment, were next introduced to evaluate the model&#x2019;s resilience to directional deviations. The outcomes are summarized in <xref ref-type="fig" rid="F7">Figure 7</xref>; <xref ref-type="table" rid="T6">Table 6</xref>. Compared with the previous cases, the performance metrics show a moderate decline. The average MAE increases to 856.30 MPa, RMSE to 1205.76 MPa, and MAPE to 8.19%, while the average R<sup>2</sup> decreases slightly to 0.94.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Comparison between predicted and true moduli for all pavement layers with systematic error: <bold>(a)</bold> surface course modulus E<sub>1</sub>; <bold>(b)</bold> base course modulus E<sub>2</sub>; <bold>(c)</bold> subgrade modulus E<sub>3</sub>.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g007.tif">
<alt-text content-type="machine-generated">Three scatter plots, each comparing predicted versus actual values of elastic moduli (E1, E2, E3) in MPa. Panel (a) shows data for E1 from 0 to 30,000, (b) for E2 from 4,000 to 20,000, and (c) for E3 from 40 to 100. In each plot, training data is teal, testing data magenta, with a y&#x3d;x dashed line indicating perfect prediction. Vertical and horizontal histograms show data distribution.</alt-text>
</graphic>
</fig>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Model performance evaluation on the test dataset with systematic error.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">Surface course</th>
<th align="center">Base course</th>
<th align="center">Subgrade</th>
<th align="center">Average</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">MAE (MPa)</td>
<td align="center">1507.41</td>
<td align="center">1058.94</td>
<td align="center">2.55</td>
<td align="center">856.30</td>
</tr>
<tr>
<td align="center">MSE (&#xd7;10<sup>3</sup> MPa<sup>2</sup>)</td>
<td align="center">4768297.50</td>
<td align="center">2046352.00</td>
<td align="center">9.79</td>
<td align="center">2271553.10</td>
</tr>
<tr>
<td align="center">RMSE (MPa)</td>
<td align="center">2183.64</td>
<td align="center">1430.51</td>
<td align="center">3.13</td>
<td align="center">1205.76</td>
</tr>
<tr>
<td align="center">MAPE (%)</td>
<td align="center">9.37</td>
<td align="center">11.43</td>
<td align="center">3.78</td>
<td align="center">8.19</td>
</tr>
<tr>
<td align="center">R<sup>2</sup>
</td>
<td align="center">0.93</td>
<td align="center">0.91</td>
<td align="center">0.97</td>
<td align="center">0.94</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Visual inspection of <xref ref-type="fig" rid="F7">Figure 7</xref> reveals that the predicted moduli tend to deviate systematically from the 1:1 line, producing a slight offset pattern. This shift reflects the influence of persistent bias in the input data, which cannot be fully corrected by the model&#x2019;s internal learning process. The Transformer architecture, while capable of capturing complex nonlinear relationships, inherently inherits a portion of the systematic bias embedded in the training data distribution.</p>
<p>Nevertheless, even under such challenging conditions, the model&#x2019;s prediction accuracy remains acceptable for engineering applications. The R<sup>2</sup> values for all layers remain above 0.90, demonstrating that the model retains substantial predictive capability. These findings suggest that moderate systematic measurement errors do not critically impair the Transformer&#x2019;s inference reliability, making it feasible for use with field FWD data where small calibration biases are common.</p>
</sec>
<sec id="s4-4">
<label>4.4</label>
<title>Model performance under combined random and systematic errors</title>
<p>The most realistic testing condition involves the coexistence of both random and systematic errors. <xref ref-type="fig" rid="F8">Figure 8</xref>; <xref ref-type="table" rid="T7">Table 7</xref> show that under this comprehensive noise environment, the Transformer model continues to perform robustly. The average MAE increases modestly to 732.56 MPa, while the average R<sup>2</sup> remains high at 0.95. The MAPE of 7.48% indicates that overall prediction deviations remain within an acceptable engineering range.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Comparison between predicted and true moduli for all pavement layers with random and systematic error: <bold>(a)</bold> surface course modulus E<sub>1</sub>; <bold>(b)</bold> base course modulus E<sub>2</sub>; <bold>(c)</bold> subgrade modulus E<sub>3</sub>.</p>
</caption>
<graphic xlink:href="fmats-12-1732297-g008.tif">
<alt-text content-type="machine-generated">Three scatter plots (a, b, c) depict predicted versus actual values of E&#x2081;, E&#x2082;, and E&#x2083; in MPa. Each plot includes data points for training (cyan) and testing sets (pink), with a diagonal y&#x3d;x line indicating perfect prediction. Density plots and histograms along the axes and top right corners provide additional data distribution insights.</alt-text>
</graphic>
</fig>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Model performance evaluation on the test dataset with random and systematic error.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Metric</th>
<th align="center">Surface course</th>
<th align="center">Base course</th>
<th align="center">Subgrade</th>
<th align="center">Average</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">MAE (MPa)</td>
<td align="center">1311.87</td>
<td align="center">882.85</td>
<td align="center">2.95</td>
<td align="center">732.56</td>
</tr>
<tr>
<td align="center">MSE (&#xd7;10<sup>3</sup> MPa<sup>2</sup>)</td>
<td align="center">3719232.25</td>
<td align="center">1374503.75</td>
<td align="center">13.35</td>
<td align="center">1697916.45</td>
</tr>
<tr>
<td align="center">RMSE (MPa)</td>
<td align="center">1928.53</td>
<td align="center">1172.39</td>
<td align="center">3.65</td>
<td align="center">1034.86</td>
</tr>
<tr>
<td align="center">MAPE (%)</td>
<td align="center">9.09</td>
<td align="center">8.83</td>
<td align="center">4.51</td>
<td align="center">7.48</td>
</tr>
<tr>
<td align="center">R<sup>2</sup>
</td>
<td align="center">0.95</td>
<td align="center">0.94</td>
<td align="center">0.96</td>
<td align="center">0.95</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The subgrade layer once again demonstrates the highest stability, with an R<sup>2</sup> of 0.96, reflecting its lower sensitivity to noise due to smaller deflection amplitude variability. The surface and base layers experience minor performance degradation; however, the overall trend remains consistent, confirming that the Transformer effectively generalizes the underlying input&#x2013;output relationship even when measurement uncertainty increases.</p>
<p>The results collectively demonstrate that the Transformer-based back-calculation model is not only accurate under ideal conditions but also robust and reliable under realistic noise perturbations.</p>
</sec>
<sec id="s4-5">
<label>4.5</label>
<title>Comparison with common machine learning models based on the random and systematic error dataset</title>
<p>For a fair and consistent comparison, all baseline models (BPNN, SVR, and XGBoost) were trained and evaluated under the same experimental conditions as the proposed Transformer model. Specifically, all models used identical input features (nine FWD deflection peaks), output targets (layer elastic moduli), train&#x2013;test split (70%/30%), and data preprocessing procedures, including Z-score normalization. The comparative evaluation was conducted on the same synthetic dataset with combined random and systematic errors, and model performance was assessed on an identical test set using the same evaluation metrics (MAE, MSE, RMSE, MAPE, and R<sup>2</sup>) for the surface course, base course, and subgrade. The hyperparameters of all baseline models were selected using standard tuning strategies within commonly accepted ranges. For the BPNN, the number of hidden neurons and the learning rate were adjusted empirically based on validation performance. For SVR, key hyperparameters including the kernel type, penalty parameter, and kernel width were optimized using grid search. For XGBoost, the tree depth, learning rate, and number of estimators were tuned through empirical validation. Default parameter settings were avoided when they resulted in clear underfitting or overfitting. Consequently, the adopted configurations represent reasonable and competitive baselines rather than minimally tuned models. The corresponding test results are summarized in <xref ref-type="table" rid="T8">Table 8</xref>.</p>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>Test set performance of different models on the random and systematic noisy dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Model</th>
<th align="center">Layer</th>
<th align="center">MAE (MPa)</th>
<th align="center">MSE (&#xd7;10<sup>3</sup> MPa<sup>2</sup>)</th>
<th align="center">RMSE (MPa)</th>
<th align="center">MAPE (%)</th>
<th align="center">R<sup>2</sup>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="center">BPNN</td>
<td align="center">Surface course</td>
<td align="center">1176.47</td>
<td align="center">2343725.47</td>
<td align="center">1530.92</td>
<td align="center">10.70</td>
<td align="center">0.97</td>
</tr>
<tr>
<td align="center">Base course</td>
<td align="center">1073.63</td>
<td align="center">2064374.05</td>
<td align="center">1436.79</td>
<td align="center">10.63</td>
<td align="center">0.91</td>
</tr>
<tr>
<td align="center">Subgrade</td>
<td align="center">2.71</td>
<td align="center">10.72</td>
<td align="center">3.27</td>
<td align="center">4.00</td>
<td align="center">0.97</td>
</tr>
<tr>
<td rowspan="3" align="center">SVR</td>
<td align="center">Surface course</td>
<td align="center">2252.88</td>
<td align="center">9047769.73</td>
<td align="center">3007.95</td>
<td align="center">18.27</td>
<td align="center">0.88</td>
</tr>
<tr>
<td align="center">Base course</td>
<td align="center">1502.35</td>
<td align="center">3944127.77</td>
<td align="center">1985.98</td>
<td align="center">16.38</td>
<td align="center">0.83</td>
</tr>
<tr>
<td align="center">Subgrade</td>
<td align="center">2.61</td>
<td align="center">9.80</td>
<td align="center">3.13</td>
<td align="center">3.84</td>
<td align="center">0.97</td>
</tr>
<tr>
<td rowspan="3" align="center">XGBoost</td>
<td align="center">Surface course</td>
<td align="center">2939.54</td>
<td align="center">14370359.16</td>
<td align="center">3790.83</td>
<td align="center">23.51</td>
<td align="center">0.80</td>
</tr>
<tr>
<td align="center">Base course</td>
<td align="center">2537.81</td>
<td align="center">9855760.82</td>
<td align="center">3139.39</td>
<td align="center">28.04</td>
<td align="center">0.56</td>
</tr>
<tr>
<td align="center">Subgrade</td>
<td align="center">2.92</td>
<td align="center">12.95</td>
<td align="center">3.60</td>
<td align="center">4.32</td>
<td align="center">0.96</td>
</tr>
<tr>
<td rowspan="3" align="center">Transformer</td>
<td align="center">Surface course</td>
<td align="center">1311.87</td>
<td align="center">3719232.25</td>
<td align="center">1928.53</td>
<td align="center">9.09</td>
<td align="center">0.95</td>
</tr>
<tr>
<td align="center">Base course</td>
<td align="center">882.85</td>
<td align="center">1374503.75</td>
<td align="center">1172.39</td>
<td align="center">8.83</td>
<td align="center">0.94</td>
</tr>
<tr>
<td align="center">Subgrade</td>
<td align="center">2.95</td>
<td align="center">13.35</td>
<td align="center">3.65</td>
<td align="center">4.51</td>
<td align="center">0.96</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="T8">Table 8</xref>, the Transformer model achieves the lowest overall error levels and the most consistent performance across all three layers. In terms of MAPE, the average values across the three layers are approximately 7.48% for the Transformer, compared with 8.44% for BPNN, 12.83% for SVR, and 18.62% for XGBoost. The corresponding average R<sup>2</sup> values are about 0.95 for both the Transformer and BPNN, but decrease to roughly 0.89 and 0.77 for SVR and XGBoost, respectively. These results indicate that although BPNN can reach a comparable average R<sup>2</sup>, it still yields larger errors than the Transformer, whereas SVR and XGBoost suffer from a clear degradation in predictive accuracy under the random and systematic noisy condition.</p>
<p>The advantage of the Transformer is particularly evident for the base course, which is generally the most difficult layer to identify due to its intermediate position and strong interaction with both the surface course and the subgrade. For this layer, the Transformer attains MAE &#x3d; 882.85 MPa, RMSE &#x3d; 1172.39 MPa, MAPE &#x3d; 8.83%, and R<sup>2</sup> &#x3d; 0.94, outperforming BPNN (MAE &#x3d; 1073.63 MPa, MAPE &#x3d; 10.63%, R<sup>2</sup> &#x3d; 0.91) and substantially surpassing SVR and XGBoost (e.g., XGBoost yields MAPE &#x3d; 28.04% and R<sup>2</sup> &#x3d; 0.56). For the surface course and subgrade, the Transformer also provides competitive MAE/RMSE and high R<sup>2</sup> values, remaining at least as accurate as, and in several cases more accurate than, the baseline models.</p>
<p>These quantitative comparisons help clarify why a Transformer is preferred over simpler models in the proposed SEM &#x2b; Transformer framework. The multi-head self-attention mechanism enables the Transformer to explicitly capture global dependencies among all FWD deflection measurements, allowing the network to focus on physically informative deflection patterns and to down-weight noisy or less relevant components. In contrast, BPNN relies on fixed fully connected mappings, SVR depends on pre-defined kernel functions, and XGBoost aggregates a series of decision trees, all of which have more limited capacity to represent the highly nonlinear and ill-posed mapping from surface deflections to multilayer elastic moduli under noisy conditions. Consequently, the Transformer not only achieves lower errors and higher R<sup>2</sup> on the random and systematic noisy dataset, but also exhibits stronger robustness and generalization, especially for the critical base course.</p>
</sec>
<sec id="s4-6">
<label>4.6</label>
<title>Physical plausibility, identifiability, and limitations of unconstrained learning</title>
<p>The results presented above indicate that the proposed SEM&#x2013;Transformer framework achieves high predictive accuracy and strong robustness across all considered noise scenarios. Beyond numerical accuracy, however, two fundamental issues deserve careful discussion: physical plausibility of the predicted moduli and identifiability of the inverse mapping from FWD deflections to multilayer elastic properties. It should be emphasized that the adopted Transformer configuration is not claimed to be universally optimal. Rather, it represents a compact and effective design choice tailored to the specific characteristics of the FWD-based back-calculation problem considered in this study.</p>
<p>From the perspective of physical plausibility, the predicted moduli in this study remain consistent with expected pavement mechanics within the predefined parameter space. Across all experiments and noise levels, no non-physical outcomes such as negative elastic moduli or severe modulus layering were observed in the reported test cases. In particular, the stiffness ordering between the surface course, base course, and subgrade is generally preserved. This behavior can be attributed primarily to the use of SEM-generated training data, which inherently satisfy mechanical consistency and realistic stiffness hierarchies.</p>
<p>Nevertheless, it is important to emphasize that the Transformer model itself is unconstrained. No explicit monotonicity, ordering, or inequality constraints (e.g., <inline-formula id="inf125">
<mml:math id="m136">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mtext>surface</mml:mtext>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mtext>base</mml:mtext>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mtext>subgrade</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) are enforced during training or inference. As a result, the observed physical consistency arises implicitly from the data distribution rather than from hard constraints embedded in the learning model. In principle, when applied outside the training distribution or under substantially different field conditions, the model may produce physically inconsistent modulus combinations, such as a surface-layer modulus lower than that of the subgrade. This limitation is common to most purely data-driven back-calculation approaches and should be carefully considered in practical applications.</p>
<p>The issue of identifiability and uniqueness is intrinsic to FWD-based modulus back-calculation and is independent of the specific learning algorithm employed. The inverse mapping from surface deflection basins to multilayer elastic moduli is inherently ill-posed and non-unique: different combinations of layer properties may yield very similar surface deflection responses, particularly under limited sensor spacing and in the presence of measurement noise. Consequently, even under ideal noise-free conditions, a mathematically unique inverse solution does not generally exist.</p>
<p>In this context, the role of the proposed Transformer model is not to recover a unique physical solution, but rather to learn a statistically optimal inverse mapping conditioned on the assumed parameter ranges, pavement configurations, sensor layout, and noise characteristics represented in the training data. The predicted moduli should therefore be interpreted as the most probable estimates within this constrained statistical space, rather than as exact physical truths. This interpretation is consistent with both classical optimization-based back-calculation methods and recent data-driven approaches reported in the literature.</p>
<p>Finally, it should be recognized that a non-negligible domain shift exists between SEM-generated responses and real-world FWD measurements. Real pavements exhibit temperature-dependent and viscoelastic material behavior, layer heterogeneity, construction-induced variability, and non-ideal load&#x2013;pavement contact conditions, whereas the present SEM model assumes linear elasticity, homogeneity, and axisymmetry. Field FWD data are also affected by sensor coupling effects and spatially correlated measurement errors that are difficult to reproduce numerically. While the noise models adopted in this study provide a first-order approximation of measurement uncertainty, they do not fully capture these complexities. Addressing both physical constraint enforcement and the simulation-to-field domain gap will be essential steps toward reliable deployment of the proposed framework in real-world pavement evaluation and digital twin&#x2013;based management systems.</p>
<p>From the perspective of inverse problem theory, pavement modulus back-calculation based on FWD deflections is a fundamentally ill-posed problem. The surface deflection basin represents an aggregated structural response, and different combinations of layer moduli may produce very similar deflection profiles, particularly when sensor spacing is limited and measurement noise is present. The proposed Transformer model does not eliminate this ill-posedness, but rather provides a data-driven regularization by learning the most statistically probable inverse mapping under the assumed parameter ranges and noise conditions.</p>
<p>It should also be noted that different pavement layers exhibit markedly different sensitivities in FWD measurements. The subgrade modulus predominantly controls the overall curvature and far-field deflections of the basin, while the surface and base layers mainly affect near-load deflections. As a result, the inverse mapping is inherently more sensitive to subgrade stiffness variations than to variations in upper-layer moduli. This sensitivity imbalance explains the near-perfect R<sup>2</sup> values observed for the subgrade in the present study. Such high R<sup>2</sup> values reflect dominant sensitivity rather than guaranteed identifiability or uniqueness of the subgrade modulus.</p>
<p>The current study does not explicitly assess the model&#x2019;s ability to distinguish between different modulus combinations that generate nearly indistinguishable deflection basins. A rigorous identifiability or sensitivity analysis&#x2014;such as controlled perturbation studies or equivalence-class analysis&#x2014;would be required to quantify this capability and is left for future work. The goal of this study is not to exhaustively benchmark all possible architectures, but to demonstrate feasibility and robustness of a Transformer-based inverse framework. The comparative analysis is intended to provide contextual performance references under consistent experimental conditions, rather than a statistically exhaustive or uncertainty-aware benchmark across all model families.</p>
</sec>
<sec id="s4-7">
<label>4.7</label>
<title>Overall summary</title>
<p>Across all four scenarios, the Transformer-based intelligent back-calculation framework demonstrates high accuracy, stability, and adaptability. The model achieves an average R<sup>2</sup> of at least 0.94 under all noise conditions, confirming its robustness against both random and systematic measurement errors. The average MAPE values remain at or below approximately 8%, well within acceptable limits for pavement engineering applications.</p>
<p>These findings verify that the Transformer model effectively learns the intrinsic relationships between FWD deflection responses and layer elastic moduli, even in the presence of complex noise patterns. Consequently, this method provides a reliable and data-driven solution for practical modulus back-calculation tasks, offering improved accuracy and interpretability compared with traditional approaches.</p>
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This study developed an intelligent back-calculation framework integrating the SEM and a Transformer-based deep learning model to estimate multilayer pavement elastic moduli from FWD deflection data. Based on the numerical simulations, data preprocessing, and performance evaluations under four noise conditions, the main findings are summarized as follows:</p>
<sec id="s5-1">
<label>5.1</label>
<title>High prediction accuracy and robustness</title>
<p>The Transformer-based model achieved excellent predictive performance, with average <inline-formula id="inf126">
<mml:math id="m137">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> of at least 0.94 and MAPE of approximately 8% or lower across all scenarios. Even under combined random and systematic noise, the model maintained stable accuracy, demonstrating strong generalization and robustness to measurement uncertainty.</p>
</sec>
<sec id="s5-2">
<label>5.2</label>
<title>Superior feature learning and physical consistency</title>
<p>By leveraging multi-head self-attention, the Transformer effectively captured global dependencies among deflection sensors, enabling precise mapping between surface deflection patterns and underlying layer moduli. The predicted trends were physically consistent with pavement structural behavior&#x2014;subgrade moduli showed the highest stability due to smoother deformation responses, while the surface layer exhibited slightly higher variability owing to its stiffness contrast.</p>
</sec>
<sec id="s5-3">
<label>5.3</label>
<title>Efficiency and applicability</title>
<p>Once trained, the proposed model provided rapid, millisecond-level predictions, offering a computationally efficient and fully data-driven solution for modulus inversion. Its end-to-end design minimizes manual parameter tuning and avoids convergence issues common in traditional iterative back-calculation, supporting integration into real-time pavement condition evaluation and intelligent maintenance systems.</p>
<p>Overall, the developed SEM&#x2013;Transformer framework demonstrates strong potential for intelligent, accurate, and efficient pavement structural evaluation, and provides a promising basis for data-driven digital twin systems in pavement management. However, the present study also has clear limitations. All training and testing data are synthetically generated using an SEM model, and no field FWD dataset is used for direct validation. As a result, the current findings should be interpreted as a numerical benchmark demonstrating feasibility and robustness, rather than as evidence of immediate field applicability. Future work will focus on validating the proposed framework using large-scale field FWD datasets, incorporating temperature-dependent and viscoelastic material behavior, and developing physics-guided or domain-adaptive learning strategies to mitigate the simulation-to-field gap. These efforts are essential before the proposed method can be reliably deployed in real-world pavement evaluation and digital twin&#x2013;based infrastructure management systems. Future work will also explicitly address physical constraint enforcement by incorporating monotonicity or inequality constraints into the learning process, for example, through output reparameterization, physics-guided loss functions, or hybrid inversion frameworks. Such extensions are expected to further improve physical interpretability, reduce the risk of inconsistent predictions, and enhance robustness when applying the model to real-world FWD datasets. It should be emphasized that the reported prediction accuracy does not imply mathematical uniqueness of the inverse solution. The proposed framework provides statistically optimal estimates conditioned on the assumed data distribution, rather than resolving the intrinsic non-uniqueness of pavement modulus back-calculation. 
Future work will incorporate uncertainty quantification and statistical testing, such as repeated sampling, confidence interval estimation, or Bayesian approaches, to further strengthen the rigor of comparative performance evaluation.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>GW: Investigation, Methodology, Software, Writing &#x2013; original draft, Data curation. YZ: Conceptualization, Funding acquisition, Supervision, Writing &#x2013; review and editing.</p>
</sec>
<ack>
<title>Acknowledgements</title>
<p>The authors gratefully acknowledge the financial support received for this work.</p>
</ack>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of interest</title>
<p>Author GW was employed by Shanxi Provincial Transportation Construction Engineering Quality Inspection Center (Co., Ltd.).</p>
<p>The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s10">
<title>Generative AI statement</title>
<p>The author(s) declared that generative AI was not used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn fn-type="custom" custom-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2274698/overview">Alireza Tabarraei</ext-link>, University of North Carolina at Charlotte, United States</p>
</fn>
<fn fn-type="custom" custom-type="reviewed-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3263819/overview">Zeping Yang</ext-link>, Griffith University, Australia</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3302053/overview">Yubao Zhou</ext-link>, Delft University of Technology, Netherlands</p>
</fn>
</fn-group>
<fn-group>
<fn fn-type="abbr" id="abbrev1">
<label>Abbreviations:</label>
<p>
<inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, Displacement vector SI unit: m; <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mo>&#xa8;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> Acceleration vector; <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, Lame&#x2019;s constant of the material; <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, Gradient differential operator; <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
<mml:mo>&#xb7;</mml:mo>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, Divergence of <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf8">
<mml:math id="m8">
<mml:mrow>
<mml:msup>
<mml:mo>&#x2207;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, Laplacian of <inline-formula id="inf9">
<mml:math id="m9">
<mml:mrow>
<mml:mi mathvariant="bold-italic">u</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, Material density SI unit: kg/m<sup>3</sup>; <inline-formula id="inf11">
<mml:math id="m11">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Dimensionality of the key vectors in the attention mechanism; <inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, True simulated deflection value at the <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th measurement point SI unit: &#x3bc;m; <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Systematic error component for the <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th measurement; <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Random error component for the <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th measurement; <inline-formula id="inf18">
<mml:math id="m18">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Actual (observed) value of the target variable SI unit: MPa; <inline-formula id="inf19">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Predicted value obtained from the model SI unit: MPa; <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x2c9;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, Mean value of the actual target variable SI unit: MPa; <inline-formula id="inf21">
<mml:math id="m21">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, Number of samples; <inline-formula id="inf22">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Original value of the <inline-formula id="inf23">
<mml:math id="m23">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> input feature for the <inline-formula id="inf24">
<mml:math id="m24">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> sample (e.g., peak deflection) SI unit: &#x3bc;m; <inline-formula id="inf25">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Original target value of the <inline-formula id="inf26">
<mml:math id="m26">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> output variable for the <inline-formula id="inf27">
<mml:math id="m27">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> sample (elastic modulus) SI unit: MPa; <inline-formula id="inf28">
<mml:math id="m28">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Mean of the <inline-formula id="inf29">
<mml:math id="m29">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> input feature in the training set; <inline-formula id="inf30">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Standard deviation of the <inline-formula id="inf31">
<mml:math id="m31">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> input feature in the training set; <inline-formula id="inf32">
<mml:math id="m32">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Mean of the <inline-formula id="inf33">
<mml:math id="m33">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> output variable (modulus) in the training set SI unit: MPa; <inline-formula id="inf34">
<mml:math id="m34">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Standard deviation of the <inline-formula id="inf35">
<mml:math id="m35">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<sub>th</sub> output variable (modulus) in the training set SI unit: MPa; <inline-formula id="inf36">
<mml:math id="m36">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Standardized (dimensionless) value of input feature <inline-formula id="inf37">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> after Z-score normalization; <inline-formula id="inf38">
<mml:math id="m38">
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Standardized (dimensionless) value of output variable <inline-formula id="inf39">
<mml:math id="m39">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf40">
<mml:math id="m40">
<mml:mrow>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, Predicted output in the standardized space; <inline-formula id="inf41">
<mml:math id="m41">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, Predicted output after inverse standardization (restored to physical units) SI unit: MPa.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Khoury</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kasbergen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Scarpas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Blaauwendraad</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Spectral element technique for efficient parameter identification of layered media: part II: inverse calculation</article-title>. <source>Int. J. Solids Struct.</source> <volume>38</volume> (<issue>48</issue>), <fpage>8753</fpage>&#x2013;<lpage>8772</lpage>. <pub-id pub-id-type="doi">10.1016/S0020-7683(01)00109-3</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Khoury</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Scarpas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kasbergen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Blaauwendraad</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Spectral element technique for efficient parameter identification of layered media. Part III: viscoelastic aspects</article-title>. <source>Int. J. Solids Struct.</source> <volume>39</volume> (<issue>8</issue>), <fpage>2189</fpage>&#x2013;<lpage>2201</lpage>. <pub-id pub-id-type="doi">10.1016/S0020-7683(02)00079-3</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bush</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>1985</year>). <article-title>
<italic>Computer program BISDEF</italic>. Vicksburg, Miss.: US Army Corps of Engineers Waterways Experiment Station</article-title>.</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bush</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>1985</year>). <source>Pavement evaluation using deflection basin measurements and layered theory</source>, <volume>1022</volume>, <fpage>16</fpage>&#x2013;<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bypour</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mahmoudian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Yekrangnia</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kioumarsi</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>Explainable tuned machine learning models for assessing the impact of corrosion on bond strength in concrete</article-title>. <source>Clean. Eng. Technol.</source> <volume>23</volume>, <fpage>100834</fpage>. <pub-id pub-id-type="doi">10.1016/j.clet.2024.100834</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Effectiveness of static and dynamic backcalculation approaches for asphalt pavement</article-title>. <source>Can. J. Civ. Eng.</source> <volume>47</volume> (<issue>7</issue>), <fpage>846</fpage>&#x2013;<lpage>855</lpage>. <pub-id pub-id-type="doi">10.1139/cjce-2019-0052</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>Data-driven atmospheric corrosion prediction model for alloys based on a two-stage machine learning approach</article-title>. <source>Process Saf. Environ. Prot.</source> <volume>188</volume>, <fpage>1093</fpage>&#x2013;<lpage>1105</lpage>. <pub-id pub-id-type="doi">10.1016/j.psep.2024.06.028</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Abdel-Aty</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>A novel CPO-CNN-LSTM based deep learning approach for multi-time scale deflection basin area prediction in asphalt pavement</article-title>. <source>Constr. Build. Mater.</source> <volume>458</volume>, <fpage>139540</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2024.139540</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Coletti</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Romeo</surname>
<given-names>R. C.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>Bayesian backcalculation of pavement properties using parallel transitional markov chain monte carlo</article-title>. <source>Comput.-Aided Civ. Infrastruct. Eng.</source> <volume>39</volume> (<issue>13</issue>), <fpage>1911</fpage>&#x2013;<lpage>1927</lpage>. <pub-id pub-id-type="doi">10.1111/mice.13123</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dosovitskiy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Beyer</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Kolesnikov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Weissenborn</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhai</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Unterthiner</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>An image is worth 16x16 words: transformers for image recognition at scale</article-title>.</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Draper</surname>
<given-names>N. R.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Applied regression analysis</article-title>.</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elbagalati</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Elseifi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gaspard</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Development of the pavement structural health index based on falling weight deflectometer testing</article-title>. <source>Int. J. Pavement Eng.</source> <volume>19</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1080/10298436.2016.1149838</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Golmohammadi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hernando</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Van den Bergh</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Hasheminejad</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>Advanced data-driven FBG sensor-based pavement monitoring system using multi-sensor data fusion and an unsupervised learning approach</article-title>. <source>Measurement</source> <volume>242</volume>, <fpage>115821</fpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2024.115821</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep learning</source>. <publisher-name>The MIT Press</publisher-name>.</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Ioannides</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Barenberg</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Lary</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>1989</year>). &#x201c;<article-title>Interpretation of falling weight deflectometer results using principles of dimensional analysis</article-title>,&#x201d; in <source>Paper presented at the 4th international conference on concrete pavement design and rehabilitation: proceedings, West Lafayette</source>.</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>L. H.</given-names>
</name>
</person-group> (<year>1994</year>). <source>Instructional guide for back-calculation and the use of MODCOMP3 version 3.6</source>. <publisher-name>Ithaca, NY: Cornell University Local Roads Program, CLRP Publications</publisher-name>, <fpage>4</fpage>&#x2013;<lpage>10</lpage>.</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>L. H.</given-names>
</name>
<name>
<surname>Szebenyi</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>1983</year>). <source>User&#x27;s guide to modcomp2</source>. <publisher-loc>Ithaca, NY</publisher-loc>: <publisher-name>Cornell University Local Roads Program</publisher-name>, <fpage>83</fpage>&#x2013;<lpage>88</lpage>.</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gabrielson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bai</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Polaczyk</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Evaluation of inverted pavement by structural condition indicators from falling weight deflectometer</article-title>. <source>Constr. Build. Mater.</source> <volume>319</volume>, <fpage>125991</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2021.125991</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khazanovich</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Roesler</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>DIPLOBACK: neural-network-based backcalculation program for composite pavements</article-title>. <source>Transp. Res. Rec.</source> <volume>1570</volume> (<issue>1</issue>), <fpage>143</fpage>&#x2013;<lpage>150</lpage>. <pub-id pub-id-type="doi">10.3141/1570-17</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>Physics-informed neural network with fuzzy partial differential equation for pavement performance prediction</article-title>. <source>Autom. Constr.</source> <volume>171</volume>, <fpage>105983</fpage>. <pub-id pub-id-type="doi">10.1016/j.autcon.2025.105983</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>D&#x27;Avigneau</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Brilakis</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>Modeling heterogeneous spatiotemporal pavement data for condition prediction and preventive maintenance in digital twin-enabled highway management</article-title>. <source>Autom. Constr.</source> <volume>174</volume>, <fpage>106134</fpage>. <pub-id pub-id-type="doi">10.1016/j.autcon.2025.106134</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meier</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Freeman</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Using artificial neural networks as a forward approach to backcalculation</article-title>. <source>Transp. Res. Rec.</source> <volume>1570</volume>, <fpage>126</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.3141/1570-15</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nam</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>An</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Improvements to the structural condition index (SCI) for pavement structural evaluation at network level</article-title>. <source>Int. J. Pavement Eng.</source> <volume>17</volume> (<issue>8</issue>), <fpage>680</fpage>&#x2013;<lpage>697</lpage>. <pub-id pub-id-type="doi">10.1080/10298436.2015.1014369</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2023</year>). <article-title>Damage pattern recognition for corroded beams strengthened by CFRP anchorage system based on acoustic emission techniques</article-title>. <source>Constr. Build. Mater.</source> <volume>406</volume>, <fpage>133474</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2023.133474</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Plati</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Georgiou</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Papavasiliou</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Simulating pavement structural condition using artificial neural networks</article-title>. <source>Struct. Infrastruct. Eng.</source> <volume>12</volume> (<issue>9</issue>), <fpage>1127</fpage>&#x2013;<lpage>1136</lpage>. <pub-id pub-id-type="doi">10.1080/15732479.2015.1086384</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Plati</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Gkyrtis</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Loizos</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>A practice-based approach to diagnose pavement roughness problems</article-title>. <source>Int. J. Civ. Eng.</source> <volume>22</volume> (<issue>3</issue>), <fpage>453</fpage>&#x2013;<lpage>465</lpage>. <pub-id pub-id-type="doi">10.1007/s40999-023-00900-x</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scullion</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Uzan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Paredes</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>MODULUS: a microcomputer-based backcalculation system</article-title>. <source>Transp. Res. Rec.</source> <volume>1260</volume>, <fpage>180</fpage>&#x2013;<lpage>191</lpage>.</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Backcalculation of pavement layer moduli from falling weight deflectometer data using an artificial neural network</article-title>. <source>Can. J. Civ. Eng.</source> <volume>35</volume> (<issue>1</issue>), <fpage>57</fpage>&#x2013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1139/l07-083</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shamiyeh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gunduz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Shamiyeh</surname>
<given-names>M. E.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Assessment of pavement performance management indicators through analytic network process</article-title>. <source>IEEE Trans. Eng. Manage.</source> <volume>69</volume> (<issue>6</issue>), <fpage>2684</fpage>&#x2013;<lpage>2692</lpage>. <pub-id pub-id-type="doi">10.1109/TEM.2019.2952153</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stubstad</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Irwin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lukanen</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Clevenson</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>It&#x27;s 10 o&#x27;clock: do you know where your sensors are?</article-title> <source>Transp. Res. Rec.</source> <volume>1716</volume>, <fpage>10</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.3141/1716-02</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tarefder</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Ahsan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ahmed</surname>
<given-names>M. U.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Neural network&#x2013;based thickness determination model to improve backcalculation of layer moduli without coring</article-title>. <source>Int. J. Geomech.</source> <volume>15</volume> (<issue>3</issue>), <fpage>4014058</fpage>. <pub-id pub-id-type="doi">10.1061/(asce)gm.1943-5622.0000407</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torquato E Silva</surname>
<given-names>S. D. A.</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>J. L. F. D.</given-names>
</name>
<name>
<surname>Furtado</surname>
<given-names>L. B. G.</given-names>
</name>
<name>
<surname>Babadopulos</surname>
<given-names>L. F. A. L.</given-names>
</name>
<name>
<surname>Parente Junior</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Batista Dos Santos</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>Effect of the input of structural parameters&#x2019; uncertainties and analysts&#x2019; arbitrary decisions on the results of backcalculated pavement materials&#x2019; resilient moduli</article-title>. <source>Can. J. Civ. Eng.</source> <volume>52</volume> (<issue>9</issue>), <fpage>1743</fpage>&#x2013;<lpage>1751</lpage>. <pub-id pub-id-type="doi">10.1139/cjce-2024-0256</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Ullidtz</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1998</year>). <source>Modelling flexible pavement response and performance</source>. <publisher-loc>Lyngby</publisher-loc>: <publisher-name>Polyteknisk Forlag</publisher-name>.</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Vaswani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shazeer</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Parmar</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Uszkoreit</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gomez</surname>
<given-names>A. N.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>Paper presented at the proceedings of the 31st international conference on neural information processing systems</source>. <publisher-loc>Long Beach, California, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Predicting bedrock depth under asphalt pavement through a data-driven method based on particle swarm optimization-back propagation neural network</article-title>. <source>Constr. Build. Mater.</source> <volume>354</volume>, <fpage>129165</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2022.129165</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Influence of bedrock on viscoelastic responses and parametric back-calculation results for asphalt pavements and prediction of bedrock depth under FWD tests</article-title>. <source>Constr. Build. Mater.</source> <volume>377</volume>, <fpage>131158</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2023.131158</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>Intelligent back-calculation approach to obtain viscoelastic properties of asphalt pavements on bedrock using falling weight deflectometer tests</article-title>. <source>Transp. Res. Rec.</source> <volume>2679</volume> (<issue>4</issue>), <fpage>431</fpage>&#x2013;<lpage>447</lpage>. <pub-id pub-id-type="doi">10.1177/03611981241292582</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wudil</surname>
<given-names>Y. S.</given-names>
</name>
<name>
<surname>Shalabi</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>Al-Osta</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Gondal</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Al-Nahari</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2024</year>). <article-title>Effective corrosion detection in reinforced concrete <italic>via</italic> laser-induced breakdown spectroscopy and machine learning</article-title>. <source>Mater. Today Commun.</source> <volume>41</volume>, <fpage>111005</fpage>. <pub-id pub-id-type="doi">10.1016/j.mtcomm.2024.111005</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2025</year>). <article-title>Integrating FWD test and laboratory observation for assessing the damage state of semi-rigid base in asphalt pavement</article-title>. <source>Constr. Build. Mater.</source> <volume>496</volume>, <fpage>143769</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2025.143769</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Huyan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Predicting Marshall parameters of flexible pavement using support vector machine and genetic programming</article-title>. <source>Constr. Build. Mater.</source> <volume>306</volume>, <fpage>124924</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2021.124924</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Dynamic backcalculation of asphalt pavement layer properties using spectral element method</article-title>. <source>Road. Mater. Pavement Des.</source> <volume>16</volume> (<issue>4</issue>), <fpage>870</fpage>&#x2013;<lpage>888</lpage>. <pub-id pub-id-type="doi">10.1080/14680629.2015.1056214</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Localized corrosion induced damage monitoring of large-scale RC piles using acoustic emission technique in the marine environment</article-title>. <source>Constr. Build. Mater.</source> <volume>243</volume>, <fpage>118270</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2020.118270</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>A hybrid methodology for structural damage detection uniting FEM and 1d-CNNs: demonstration on typical high-pile wharf</article-title>. <source>Mech. Syst. Signal Proc.</source> <volume>168</volume>, <fpage>108738</fpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2021.108738</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Aydin</surname>
<given-names>B. B.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hendriks</surname>
<given-names>M. A. N.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2024a</year>). <article-title>A lattice modelling framework for fracture-induced acoustic emission wave propagation in concrete</article-title>. <source>Eng. Fract. Mech.</source> <volume>312</volume>, <fpage>110589</fpage>. <pub-id pub-id-type="doi">10.1016/j.engfracmech.2024.110589</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yue</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2024b</year>). <article-title>Deep residual learning for acoustic emission source localization in a steel-concrete composite slab</article-title>. <source>Constr. Build. Mater.</source> <volume>411</volume>, <fpage>134220</fpage>. <pub-id pub-id-type="doi">10.1016/j.conbuildmat.2023.134220</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Aydin</surname>
<given-names>B. B.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hendriks</surname>
<given-names>M. A. N.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2025a</year>). <article-title>Lattice modelling of complete acoustic emission waveforms in the concrete fracture process</article-title>. <source>Eng. Fract. Mech.</source> <volume>320</volume>, <fpage>111040</fpage>. <pub-id pub-id-type="doi">10.1016/j.engfracmech.2025.111040</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lian</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2025b</year>). <article-title>Ambient vibration measurement-aided multi-1d CNNs ensemble for damage localization framework: demonstration on a large-scale RC pedestrian bridge</article-title>. <source>Mech. Syst. Signal Proc.</source> <volume>224</volume>, <fpage>111937</fpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2024.111937</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</article>