Front. Immunol.

Frontiers in Immunology

1664-3224

Frontiers Media S.A.

10.3389/fimmu.2021.735135

Immunology

Hypothesis and Theory

How Naive T-Cell Clone Counts Are Shaped By Heterogeneous Thymic Output and Homeostatic Proliferation

Renaud Dessalles¹, Yunbei Pan², Mingtao Xia³, Davide Maestrini¹, Maria R. D’Orsogna¹ ², Tom Chou¹ ³ *

¹ Department of Computational Medicine, University of California at Los Angeles (UCLA), Los Angeles, CA, United States
² Department of Mathematics, California State University at Northridge, Los Angeles, CA, United States
³ Department of Mathematics, University of California at Los Angeles (UCLA), Los Angeles, CA, United States

Edited by: Grégoire Altan-Bonnet, Division of Cancer Biology (NCI), United States

Reviewed by: Carmen Molina-París, University of Leeds, United Kingdom; Antoine Toubert, Université Paris Diderot, France; Meriem Bensouda Koraichi, École Normale Supérieure, France

*Correspondence: Tom Chou, tomchou@ucla.edu

This article was submitted to Systems Immunology, a section of the journal Frontiers in Immunology

Received: 02 July 2021; Accepted: 06 December 2021; Published: 17 February 2022

Volume year: 2021; Article 735135

Copyright © 2022 Dessalles, Pan, Xia, Maestrini, D’Orsogna and Chou

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The specificity of T cells is that each T cell has only one T cell receptor (TCR). A T cell clone represents a collection of T cells with the same TCR sequence. Thus, the number of different T cell clones in an organism reflects the number of different T cell receptors (TCRs) that arise from recombination of the V(D)J gene segments during T cell development in the thymus. TCR diversity and more specifically, the clone abundance distribution, are important factors in immune functions. Specific recombination patterns occur more frequently than others while subsequent interactions between TCRs and self-antigens are known to trigger proliferation and sustain naive T cell survival. These processes are TCR-dependent, leading to clone-dependent thymic export and naive T cell proliferation rates. We describe the heterogeneous steady-state population of naive T cells (those that have not yet been antigenically triggered) by using a mean-field model of a regulated birth-death-immigration process. After accounting for random sampling, we investigate how TCR-dependent heterogeneities in immigration and proliferation rates affect the shape of clone abundance distributions (the number of different clones that are represented by a specific number of cells, or “clone counts”). By using reasonable physiological parameter values and fitting predicted clone counts to experimentally sampled clone abundances, we show that realistic levels of heterogeneity in immigration rates cause very little change to predicted clone-counts, but that modest heterogeneity in proliferation rates can generate the observed clone abundances. Our analysis provides constraints among physiological parameters that are necessary to yield predictions that qualitatively match the data. Assumptions of the model and potentially other important mechanistic factors are discussed.

naive T cells T-cell receptor repertoire diversity clone-count distributions mathematical modeling immigration-proliferation model heterogeneity

National Institutes of Health (10.13039/100000002)

National Science Foundation (10.13039/100000001)

Introduction

Naive T cells play a crucial role in the immune system’s response to pathogens and tumors. These cells are produced in the bone marrow, mature in the thymus, circulate through the blood, and migrate to the lymph nodes, where they may be presented with antigens from various pathogens. During maturation in the thymus, the so-called V, D, and J gene segments that code for T cell receptors undergo rearrangement. Most T cell receptors (TCRs) are composed of an alpha chain and a beta chain that are formed after VJ segment and VDJ segment recombination, respectively. The number of possible TCR gene sequences is extremely large, but while recombination is a nearly random process, not all TCRs are formed with the same probability.

The unique receptors expressed on the surface of circulating T cells enable them to recognize specific antigens; well-known examples include the naive forms of helper T cells (CD4+) and cytotoxic T cells (CD8+). Naive T cells that express the same TCR are said to belong to the same T cell clone. Upon encountering the antigens that activate their TCRs, naive T cells turn into effector cells that assist in eliminating infected cells. Most effector cells die after pathogen clearance, but some develop into memory T cells. Because of the large number of unknown pathogens, TCR clonal diversity is a key factor for mounting an effective immune response. Recent studies also reveal that human TCR clonal diversity is implicated in healthy aging, neonatal immunity, vaccination response, and T cell reconstitution following haematopoietic stem cell transplantation (1, 2). Despite the central role of the naive T cell pool in host defense, and broadly speaking in health and disease, TCR diversity is difficult to quantify. For example, the human body hosts a large repertoire of T cell clones; however, the actual distribution of clone sizes is not precisely known (3). Only recently have experimental and theoretical efforts been devoted to understanding the mechanistic origins of TCR diversity (4–9). The goal of this work is to formulate a realistic mathematical model that incorporates heterogeneity in naive T cell generation and reproduction. Model predictions are compared with T cell clone data to estimate reasonable and realistic parameter values.

One way to describe the TCR repertoire is by tallying the population $n_i$ of T cells carrying receptor $i$. Another is to use the clone abundance distribution or “clone count” that measures the number of distinct clones composed of exactly $k$ T cells, $\hat{c}_k := \sum_{i=1}^{\infty} \mathbb{1}(n_i, k)$, where the indicator function $\mathbb{1}(n, k) = 1$ if $n = k$ and 0 otherwise. Clone counts $\hat{c}_k$ do not carry TCR identity information as $n_i$ does; however, they can be used to construct other summary indices of T cell diversity such as Shannon’s entropy, Simpson’s index, or the whole-population richness $\hat{C} := \sum_{k=1}^{\infty} \hat{c}_k$ (10).
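The bookkeeping above can be illustrated with a minimal Python sketch. The toy list of clone populations is hypothetical, the function names are ours, and the Shannon and Simpson expressions use one common convention (entropy $-\sum_i p_i \ln p_i$ and index $\sum_i p_i^2$, with $p_i = n_i/N$), which is an assumption rather than the definition used in (10).

```python
from collections import Counter
from math import log

def clone_counts(populations):
    """Map clone size k -> number of clones with exactly k cells (k >= 1)."""
    return Counter(n for n in populations if n > 0)

def diversity_summary(populations):
    """Richness, total cell number, Shannon entropy and Simpson index from clone counts."""
    counts = clone_counts(populations)
    richness = sum(counts.values())                    # C = sum_k c_k
    total = sum(k * c for k, c in counts.items())      # N = sum_k k * c_k
    # Each of the c_k clones of size k contributes frequency k / N.
    shannon = -sum(c * (k / total) * log(k / total) for k, c in counts.items())
    simpson = sum(c * (k / total) ** 2 for k, c in counts.items())
    return {"richness": richness, "total_cells": total,
            "shannon": shannon, "simpson": simpson}

# Hypothetical toy repertoire: populations n_i of six TCRs, one of them absent.
n = [1, 1, 2, 3, 0, 1]
c = clone_counts(n)
summary = diversity_summary(n)
```

Note that the summary indices are computed from the clone counts alone, which is why the loss of TCR identity information does not matter for them.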

Clone counts $\hat{c}_k$ and the total number of circulating naive T cells are difficult to measure in humans. Nonetheless, high-throughput DNA sequencing of peripheral blood samples containing T cells (11–14) has provided some insight into TCR diversity. A commonly invoked model is that clone counts $\hat{c}_k$ exhibit a power-law distribution (4, 12, 15–17) in the clone abundance $k$. Several models have been developed to explain observed features of clone counts (3, 4, 15, 18, 19), including the apparent power-law behavior. One proposal is that T cells in different clones have TCRs with different affinities for the self-ligands that are necessary for peripheral proliferation (4–6), leading to clone-specific replication rates. An alternative hypothesis (7) is that specific TCR sequences are more likely to arise in the V(D)J recombination process in the thymus (20), leading to a higher probability that these TCRs are produced. De Greef et al. (7) estimated the probability of production of a given TCR sequence by using the Inference and Generation of Repertoires (IGoR) simulation tool that quantitatively characterizes the statistics of receptor generation from both cDNA and gDNA data (20).

Although power-law models have been motivated, this behavior has been observed across only about two decades of clone sizes k, as shown in Figure 1 . Moreover, the above models have not systematically incorporated and compared heterogeneity in both immigration and replication rates, and/or fitted models to measured TCR clone abundance distributions. Finally, some of them have not taken into account subsampling in measurements, which will affect the predicted clone counts, especially for small clone sizes k which can be missed in small samples. In this paper, we analyze the effects of heterogeneity and sampling within a dynamic mean-field model based on a stochastic clone-dependent birth-death-immigration (BDI) process that includes (i) immigration representing the arrival of new clones from the thymus, (ii) birth during homeostatic proliferation of naive T cells that yield newborn naive T cells with the same TCR as their parent, and (iii) death representing cell apoptosis (10). We also include a regulating “carrying capacity” mechanism through a total population-dependent death rate which may represent the global competition for cytokines, such as Interleukin-7 (21–25), needed for naive T cell survival and homeostasis (26, 27). Since these cytokine signals are TCR-independent, the regulatory interaction, which ensures a finite homeostatic naive T cell population, is clone-independent (23).

Figure 1

Normalized naive T cell clone count data from one patient in Oakes et al. (12) plotted on a log-log scale. Values of the normalized clone counts along the vertical axis are the average of three samples among CD4 and CD8 cell subgroups. Clones are defined by different nucleotide sequences associated with different alpha or beta chains of the TCR.

We derive analytic expressions for the steady-state clone counts in the entire organism and show that the predicted distributions are negative binomials. However, since T cell clone populations are measured in small blood subsamples extracted from an organism, we modify our predictions to include the effects of random subsampling and find that the negative binomial structure is preserved. Finally, the subsampled prediction is averaged over distributions of TCR generation (thymic output) rates and homeostatic proliferation rates. The distribution of TCR generation rates is extracted from recent computational tools: Inference and Generation of Repertoires (IGoR) (20) and Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences (OLGA) (28). Since there are no equivalent tools that measure proliferation rates, we assume simple functions for the distribution of homeostatic proliferation rates. These model-derived results depend on the rate parameters of the model and the hyperparameters defining the probability distributions over these T cell production and proliferation rates (see Table 1).

Table 1

Model parameters θ and hyperparameters θ₀.

(Hyper)parameter	Definition
α ∈ ℝ⁺	naive T cell production rate
ᾱ ∈ ℝ⁺	mean production rate across all possible Q TCRs
r ∈ [0, R]	naive T cell proliferation rate
r̄ ∈ ℝ⁺	mean proliferation rate across all possible Q TCRs
R ∈ ℝ⁺	maximum proliferation rate of all possible Q TCRs
w ∈ [0, 1]	dimensionless width of box distribution of r
μ* > R	naive T cell death rate at steady state
η ∈ [0, 1]	blood subsampling fraction

The dimensional parameters associated with our mechanistic population model. Hyperparameters such as ᾱ, r̄, R, and w define the probability distribution of, or heterogeneity in, the underlying rate parameters α and r. In our analyses, we typically nondimensionalize by normalizing all rates by R, the maximum proliferation rate across all clones.

Our results are then compared to the data shown in Figure 1 and used to estimate hyperparameters associated with the heterogeneity in the TCR-specific immigration and proliferation rates. Specifically, we quantify how the width of a simple uniform proliferation rate distribution and the heterogeneity of immigration rates from a generative model affect the predicted clone counts. Our analysis explicitly shows that within reasonable physiological parameter ranges, heterogeneity in the thymic immigration rate cannot significantly change clone count distributions. However, clone counts are sensitive to heterogeneity in T cell proliferation rates. Thus, different levels of heterogeneity in proliferation rates can give rise to qualitatively different clone count distributions. This finding of the dominance of proliferation in shaping clone count distributions is consistent with the observation that in older humans with severely reduced thymic output a broad clone count distribution is still maintained (9, 29).

Materials and Methods

To understand the observed clone counts, we focus on the clone count distribution c ^ k associated only with naive T cells, the first type of cells produced by the thymus that have not yet been activated by any antigen. Antigen-mediated activation initiates a largely irreversible cascade of differentiation into effector and memory T cells that we can subsume into a death rate. Thus, we limit our analysis to birth, death, and immigration within the naive T cell compartment. Here, we first present the mathematical framework of the BDI process to provide an initial qualitative understanding for clone counts.

Heterogeneous Birth-Death-Immigration Model

The multiclone BDI process is depicted in Figure 2. We define $Q$ to be the theoretical number of all possible functional naive T cell receptor clones that can be generated by V(D)J recombination in the thymus, which is estimated to be $Q \sim 10^{13}$–$10^{18}$ (6, 28). As we will later show, results of our model will not depend on the explicit value of $Q$ as long as $Q \gg 1$. Due to naive T cell death or removal from the sampling-accessible pool, not all possible clone types will be present in the organism, so we denote the number of clones actually present in the body (or “richness”) by $\hat{C} \ll Q$, where estimates of $\hat{C}$ range over $\sim 10^{6}$–$10^{8}$ in mice and humans (1, 6, 32, 33, 35, 36).

Figure 2

Schematic of a multiclone birth-death-immigration process. Clones are defined by distinct TCR sequences $i$. Each clone carries its own thymic output and peripheral proliferation rates, $\alpha_i$ and $r_i$, respectively. We assume all clones have the same population-dependent death rate $\mu(N)$, where $N$ is the total number of cells in the organism that influence the death rate. Since $Q \gg 1$, we impose a continuous distribution over the rates α and r. Theoretically, there may be $Q \gtrsim 10^{15}$ (6) or more (30, 31) possible viable V(D)J recombinations. The actual, effective number of different selected TCR sequences is expected to be much smaller since extremely low probability sequences may never be formed during the organism’s lifetime. A strict lower bound on $Q$ is the actual number of distinct clones $\hat{C}$ in an entire organism [$\hat{C} \sim 10^{6}$–$10^{8}$ for humans (1, 6, 32–34)].

Although naive T cells are difficult to distinguish from the entire T cell population, the total number of naive T cells (across all clones present) in humans has been estimated to be about $\hat{N} \sim 10^{11}$. Circulating naive T cells number approximately $10^{9}$ (37) but can exchange, at different time scales, with those that reside in peripheral tissue, which may carry their own proliferation and death rates. The effective pool that is ultimately sampled is thus difficult to estimate, but measurements show that the theoretical number of different clones is much larger than the total number of naive T cells, which is in turn much greater than the total number of different T cell clones actually in the body ($Q \gg \hat{N} \gg \hat{C}$). Regardless of the precise values of the discrete quantities $Q$, $\hat{N}$, and $\hat{C}$, they are related to the discrete clone counts $\hat{c}_k$ via

(1) $\hat{C} = \sum_{k \ge 1} \hat{c}_k \ll Q \quad \text{and} \quad \hat{N} = \sum_{k \ge 1} k\, \hat{c}_k.$

As depicted in Figure 2, each distinct clone $i$ (with $1 \le i \le Q$) is characterized by an immigration rate $\alpha_i$ and a per cell replication rate $r_i$. The immigration rate $\alpha_i$ is clone-specific because it depends on the preferential V(D)J recombination process; the replication rate $r_i$ is also clone-specific due to the different interactions with self-peptides that trigger proliferation. Since both the numbers of theoretically possible ($Q \gg 1$) and observed ($\hat{C} \gg 1$) clones are extremely large, we can define a continuous, normalized probability density $\pi(\alpha, r)$ from which immigration and proliferation rates α and r of a randomly chosen clone are drawn. This means that the probability that a randomly chosen clone has an immigration rate between $\alpha$ and $\alpha + \mathrm{d}\alpha$ and replication rate between $r$ and $r + \mathrm{d}r$ is $\pi(\alpha, r)\,\mathrm{d}\alpha\,\mathrm{d}r$, and $\int_0^\infty \mathrm{d}\alpha \int_0^\infty \mathrm{d}r\, \pi(\alpha, r) = 1$.

Since Q is finite and countable, there will exist maximum values A and R for the immigration and proliferation rates, respectively, such that π(α, r) = 0 for α > A or r > R. In the BDI process, the upper bound R on the proliferation rate prevents unbounded numbers of naive T cells and is necessary for a self-consistent solution. The heterogeneity in the immigration and replication rates allows us to go beyond typical “neutral” BDI models, where both rates are fixed to a specific value for all clones, α_i = α and r_i = r for all i.

Finally, we assume the per cell death rate $\mu(\hat{N})$ is clone-independent but a function of the total population $\hat{N}$. This dependence represents the competition among all naive T cells for a common resource (such as cytokines), which effectively imposes a carrying capacity on the population (24, 31, 38). The specific form of the regulation will not qualitatively affect our findings since we will ultimately be interested in only its value $\mu(N^*) \equiv \mu^*$ at the mean steady-state population $N^*$.

Mean-Field Approximation of the BDI Process

The exact steady-state probabilities of configurations of the discrete abundances $\hat{c}_k$ for a fully stochastic neutral BDI model with regulated death rate $\mu(\hat{N})$ were recently derived in Dessalles et al. (10), which gives the steady-state probability $P(\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_k)$ under uniform immigration, proliferation, and death rates α, r, and μ, respectively. The main contribution of this paper is to go beyond the neutral model (equal immigration, proliferation, and death rates for all clones) by allowing for heterogeneous distributions of these rates. To incorporate TCR-dependent immigration and replication rates in a non-neutral model, we must consider distinct values of $\alpha_i$ and $r_i$ for each clone $i$. In this case, an analytic solution for the probability distribution over $\hat{c}_k$, even at steady state, cannot be expressed in an explicit form. However, since the effective number of naive T cells ($\hat{N} \sim 10^{9}$–$10^{11}$ (35)) is large, we can exploit a mean-field approximation to the non-neutral BDI model and derive expressions for the mean values of the discrete clone counts $\hat{c}_k$. We will show later that under realistic parameter regimes, the mean-field approximation is quantitatively accurate. Breakdown of the mean-field approximation has been carefully analyzed in other studies (39).

i) Deterministic Approximation for the Total Population and the Effective Death Rate

To implement the mean-field approximation in the presence of a general regulated death rate $\mu(\hat{N})$, we start by writing the deterministic, “mass-action” ODE for the mean number of cells $n_{\alpha,r}(t)$ with a realized immigration rate α and proliferation rate r in a BDI process

(2) $\frac{\mathrm{d}n_{\alpha,r}(t)}{\mathrm{d}t} = \alpha + r\,n_{\alpha,r}(t) - \mu(N(t))\,n_{\alpha,r}(t).$

Next, we define and exploit the density of realized values of α and r. Since Q ≫ 1, the number of TCRs that are associated with immigration rate between α and α + dα and a replication rate between r and r + dr is denoted Qπ(α, r)dαdr, where π(α, r) is a normalized density that describes how these realized values of α and r are distributed. Our model for the total mean number N(t) of naive T cells can then be estimated as a weighted integral over all n_α,r (t)

(3) $N(t) = Q \int_0^A \mathrm{d}\alpha \int_0^R \mathrm{d}r\; n_{\alpha,r}(t)\, \pi(\alpha, r).$

Note that the limits of the integration above can equivalently be taken as A, R→∞ as long as π(α, r) = 0 when α > A or r > R. At steady-state, the solution to Eq. 2 can be simply expressed as

(4) $n^*_{\alpha,r} = \frac{\alpha}{\mu(N^*) - r},$

in which $N^*$ is the predicted steady-state value of $N(t)$ as $t \to \infty$. Thus, upon weighting Eq. 4 over all possible values of α and r, we find

(5) $N^* = Q \int_0^R \mathrm{d}r \int_0^\infty \mathrm{d}\alpha\; \frac{\alpha\, \pi(\alpha, r)}{\mu(N^*) - r},$

a self-consistent equation for $N^*$ which depends implicitly on the parameters that define the distribution $\pi(\alpha, r)$. Eq. 5 shows why a finite cutoff, $\pi(\alpha, r > R) = 0$ with $R < \mu(N^*)$, is required: the integral diverges if $\pi(\alpha, r \ge \mu(N^*)) > 0$. However, as long as $\pi(\alpha, r)$ decays faster than $1/\alpha^2$ at large α, the α-integration converges without an explicit cutoff $A$.
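The single-clone steady state in Eq. 4 can be checked numerically. The sketch below integrates Eq. 2 with a forward-Euler scheme while holding the death rate fixed at a constant value $\mu > r$; this is a simplifying assumption, since in the full model μ depends on $N(t)$. The function name and the parameter values are hypothetical.

```python
def integrate_clone_population(alpha, r, mu, n0=0.0, dt=0.01, steps=20000):
    """Forward-Euler integration of dn/dt = alpha + (r - mu) * n with constant mu."""
    n = n0
    for _ in range(steps):
        n += dt * (alpha + (r - mu) * n)
    return n

# Hypothetical dimensionless rates with mu > r, so the population stays bounded.
alpha, r, mu = 0.02, 0.5, 0.625
n_numeric = integrate_clone_population(alpha, r, mu)
n_steady = alpha / (mu - r)  # Eq. 4 with mu(N*) replaced by the constant mu
```

With these values the relaxation rate is $\mu - r = 0.125$, so the integration time of 200 units is long enough for the trajectory to settle onto the steady state.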

We will first assume that α and r are uncorrelated and that the distribution factorises: $\pi(\alpha, r) = \pi_\alpha(\alpha)\,\pi_r(r)$. Then, the self-consistent effective steady-state death rate $\mu^* \equiv \mu(N^*)$ depends only on the combination

$\frac{N^*}{\bar{\alpha} Q} = \int_0^R \mathrm{d}r\; \frac{\pi_r(r)}{\mu^* - r},$

where

$\bar{\alpha} \equiv \int_0^A \alpha\, \pi_\alpha(\alpha)\, \mathrm{d}\alpha$

is the mean immigration rate across all possible clones. To simplify subsequent notation, we normalize all rates by the maximum proliferation rate $R$. To avoid population blow-up, we impose that the maximum proliferation rate is smaller than the steady-state death rate, $R < \mu^*$. By measuring time in units of $1/R$, we redefine $r/R \to r \le 1$, $\alpha/R \to \alpha$, $\bar{\alpha}/R \to \bar{\alpha}$, $\mu^*/R \to \mu^*$, and $R^2 \pi(\alpha, r) \to \pi(\alpha, r)$ so that these quantities are now dimensionless, unless otherwise explicitly stated. The steady-state self-consistent condition becomes

(6) $\frac{N^*}{\bar{\alpha} Q} \equiv \frac{\lambda}{\bar{\alpha}} = \int_0^1 \mathrm{d}r\; \frac{\pi_r(r)}{\mu^* - r}.$

Since the effective $Q$ is a large, uncertain number, we parameterize our model in terms of $\lambda \equiv N^*/Q$, the total steady-state naive T cell population normalized by the total possible number of clones $Q$. It is sometimes deemed a measure of the “coverage” of the entire repertoire (6). Values of $N^*$ and $Q$ that are consistent with measurements and physiologic expectations give $\lambda \ll 1$. Once $\lambda/\bar{\alpha}$ and $\pi_r(r)$ are estimated, we can self-consistently determine $\mu^*$ from Eq. 6. Besides $\lambda/\bar{\alpha}$, the self-consistent value of $\mu^*$ will also depend on the function $\pi_r(r)$. Note that, because the right-hand side of Eq. 6 decreases as $\mu^*$ increases, the self-consistent $\mu^*$ decreases as λ increases at fixed $\bar{\alpha}$.
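As an illustration of how $\mu^*$ might be extracted from Eq. 6 for a given $\pi_r(r)$, the following sketch combines a midpoint-rule quadrature with bisection. It is not the procedure used for the results in this paper; the function names, grid size, and bracketing interval are assumptions. The example uses a uniform density on $[0, 1]$, for which Eq. 6 integrates to $\ln[\mu^*/(\mu^* - 1)] = \lambda/\bar{\alpha}$ and so provides an exact check.

```python
from math import exp

def eq6_integral(mu, density, r_max=1.0, n_grid=2000):
    """Midpoint-rule estimate of int_0^r_max dr density(r) / (mu - r), for mu > r_max."""
    h = r_max / n_grid
    total = 0.0
    for i in range(n_grid):
        r = (i + 0.5) * h
        total += density(r) / (mu - r)
    return total * h

def solve_mu_star(lam_over_alpha, density, r_max=1.0, mu_hi=1e6, rel_tol=1e-12):
    """Bisection on mu: the integral decreases monotonically as mu increases."""
    lo, hi = r_max * (1.0 + 1e-9), mu_hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if eq6_integral(mid, density, r_max) > lam_over_alpha:
            lo = mid  # integral too large, so mu* lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

uniform = lambda r: 1.0                   # pi_r(r) = 1 on [0, 1]
mu_numeric = solve_mu_star(2.0, uniform)  # hypothetical lambda / alpha_bar = 2
mu_exact = exp(2.0) / (exp(2.0) - 1.0)    # closed form for the uniform density
```

Because the integrand becomes sharply peaked as $\mu^*$ approaches the upper edge of the support, large values of $\lambda/\bar{\alpha}$ would require a finer grid or an analytic treatment of the integral.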

ii) Mean-Field Model of Clone Counts

Given a relationship such as Eq. 6 that determines $\mu^*$, we can explicitly develop a model that quantifies naive T cell subpopulations according to their immigration and proliferation rates α and r. For a given, realized value of α and r, we denote the expected number of clones of size $k$ with these immigration and proliferation rates by $c_k(\alpha, r)$. The mean-field equations for the dynamics of these mean clone counts in the neutral model were derived in (39, 40) and are reviewed in Section 1 of the Supplementary Material. In a neutral model, we assume that all $Q$ clones carry the same rates α and r so that the mean-field evolution equation for $c_k(\alpha, r)$ is given by solving (38, 39)

(7) $\frac{\mathrm{d}c_k(\alpha, r)}{\mathrm{d}t} = \alpha\left[c_{k-1}(\alpha, r) - c_k(\alpha, r)\right] + r\left[(k-1)\,c_{k-1}(\alpha, r) - k\,c_k(\alpha, r)\right] + \mu(N)\left[(k+1)\,c_{k+1}(\alpha, r) - k\,c_k(\alpha, r)\right],$

along with the constraint $\sum_{k=0}^{\infty} c_k(\alpha, r) = c_0 + \sum_{k=1}^{\infty} c_k(\alpha, r) = Q$. Note that $c_k(\alpha, r)$ and $n_{\alpha,r}$ are related via $\sum_{k=1}^{\infty} k\, c_k(\alpha, r) = n_{\alpha,r}$. We use the notation $c_k$ to denote the predicted clone counts derived from our mathematical model to distinguish them from measured clone counts $\hat{c}_k$. Equation 7 assumes that $c_k(\alpha, r)$ and $N$ are uncorrelated, allowing us to write the last term as a product of functions of the mean population $N = \sum_{k=1}^{\infty} k\, c_k$ and $c_{k+1}$, $c_k$. At steady state, we approximate $\mu(N)$ by $\mu^*$ found by solving Eq. 6 as a function of λ, $\bar{\alpha}$, and the hyperparameters defining $\pi_r(r)$. The steady-state solution of Eq. 7 follows a negative binomial distribution with parameters $\alpha/r$ and $r/\mu^* < 1$ (10, 39)

(8) $c_{k \ge 1}(\alpha, r, \mu^*) = Q \left(1 - \frac{r}{\mu^*}\right)^{\alpha/r} \left(\frac{r}{\mu^*}\right)^{k} \frac{1}{k!} \prod_{\ell=0}^{k-1} \left(\frac{\alpha}{r} + \ell\right).$

The predicted number of absent clones is $c_0 = Q - \sum_{k=1}^{\infty} c_k(\alpha, r, \mu^*)$. The solution in Eq. 8 depends implicitly on the parameter $\lambda/\bar{\alpha}$ through $\mu^*$ determined by Eq. 6. Although $c_k(\alpha, r, \mu^*)$ has not yet been averaged over α and r, it also implicitly depends on λ and the parameters that define $\pi_r(r)$ through $\mu^*$ and Eq. 6. Specifically, larger λ leads to smaller $\mu^*$, which results in a more slowly decaying $c_k(\alpha, r, \mu^*)$ as a function of $k$. This behavior will be propagated through subsampling and averaging over α and r.
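As a minimal sketch of how Eq. 8 can be evaluated numerically, the snippet below builds the clone counts with the recurrence $c_k = c_{k-1}\,(r/\mu^*)\,(\alpha/r + k - 1)/k$, which avoids computing large factorials. The function name, the normalization $Q = 1$, and the parameter values are illustrative assumptions, not fitted values.

```python
def clone_counts_eq8(alpha, r, mu_star, k_max, Q=1.0):
    """Clone counts c_1..c_k_max from Eq. 8, built with a stable recurrence."""
    a, p = alpha / r, r / mu_star
    f = (1.0 - p) ** a          # k = 0 term: fraction of absent clones
    counts = []
    for k in range(1, k_max + 1):
        f *= p * (a + k - 1) / k
        counts.append(Q * f)
    return counts

# Hypothetical dimensionless parameters with r / mu_star = 0.8.
alpha, r, mu_star = 0.02, 0.5, 0.625
counts = clone_counts_eq8(alpha, r, mu_star, k_max=400)
richness = sum(counts)                                         # sum_k c_k
mean_size = sum(k * c for k, c in enumerate(counts, start=1))  # sum_k k c_k
```

With $Q = 1$, the first moment `mean_size` should reproduce the per-clone steady state $\alpha/(\mu^* - r)$ of Eq. 4, and `richness` should equal $1 - (1 - r/\mu^*)^{\alpha/r}$, which gives a quick consistency check between the mean-field ODE and the clone-count solution.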

Subsampling

Unless an animal is sacrificed and its entire naive T cell population is sequenced, TCR clone distributions are typically measured by sequencing TCRs in a small blood sample. In such samples, low-population clones may be missed. In order to compare our predictions with measured clone abundance distributions, we must revise our predictions to allow for random cell sampling. We define η as the fraction of naive T cells in an organism that is drawn in a sample and assume that all naive T cells in the organism have the same probability η of being sampled. This is true only if naive T cells carrying different TCRs are not preferentially partitioned into different tissues and are uniformly distributed within an animal. Let us assume that a specific clone is represented by $\ell$ cells in an organism. If $N^* \eta \gg \ell$, the probability that $k$ cells are randomly sampled from the same clone approximately follows a binomial distribution with parameters $\ell$ and η (40–44)

(9) $\mathbb{P}[k \,|\, \ell] \approx \binom{\ell}{k} \eta^{k} (1 - \eta)^{\ell - k}, \quad k \le \ell.$

The associated mean sampled clone count $c_k^{\mathrm{s}}$ depends on the predicted whole-organism clone count and $\mathbb{P}[k \,|\, \ell]$ via the formula

(10) $c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta) \approx \sum_{\ell \ge k} c_\ell(\alpha, r, \mu^*)\, \mathbb{P}[k \,|\, \ell] = \sum_{\ell \ge k} c_\ell(\alpha, r, \mu^*) \binom{\ell}{k} \eta^{k} (1 - \eta)^{\ell - k},$

where c_ℓ (α, r, μ ^∗) is determined by Eq. 8. Explicitly performing the sum in Eq. 10 yields the sampled clone count

(11) $c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta) = \frac{Q}{k!} \left(\frac{\eta\, r/\mu^*}{1 - (1 - \eta)(r/\mu^*)}\right)^{k} \left(\frac{1 - r/\mu^*}{1 - (1 - \eta)(r/\mu^*)}\right)^{\alpha/r} \prod_{j=0}^{k-1} \left(\frac{\alpha}{r} + j\right).$

The total expected number of clones in the sample (the richness) can be found via direct summation:

(12) $C^{\mathrm{s}}(\alpha, r, \mu^*, \eta) = \sum_{k=1}^{\infty} c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta) = Q \left[1 - \left(\frac{1 - r/\mu^*}{1 - (1 - \eta)\, r/\mu^*}\right)^{\alpha/r}\right].$

As shown in Figure 3, random subsampling greatly affects the observed clone counts, with small sampling fractions η leading to fast decay in $k$ of $c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta)$ and shifting $c_k$ at large $k$ to much smaller values of $k$ while reducing the values of $c_k$ for small $k$ (42). Note that setting $\eta = 1$ in Eq. 11 leads to Eq. 8, the whole-body clone count. In Figures 3A, B we plot results from our model using two very different dimensionless parameter sets, $\alpha = 10^{-5}$, $r = 1/2$, $\lambda = 0.01$, and $\alpha = \lambda = 10$, $r = 1/2$, to generate two qualitatively different patterns of neutral-model clone counts $c_k$. If the sampling fraction $\eta \ll 1$ is sufficiently small, the resulting $c_k^{\mathrm{s}}$ corresponding to the two qualitatively different $c_k$ can appear similar. This implies that small sampling fractions make the estimation of whole-body clone counts from sampled data somewhat ill-conditioned, i.e., different whole-body clone counts, upon sampling, may yield similar sampled clone counts. Although sampling can strongly affect the inference of $c_k$, immigration and proliferation rate distributions may also affect the observed clone count, as we investigate below.
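The statement that binomial subsampling preserves the negative binomial form can be verified numerically. The sketch below compares the closed form of Eq. 11 with a direct (truncated) evaluation of the sum in Eq. 10 and also evaluates the sampled richness of Eq. 12. Function names, the truncation `l_max`, the normalization $Q = 1$, and the parameter values are assumptions made for illustration.

```python
from math import comb

def nb_counts(a, p, k_max, Q=1.0):
    """Q times negative-binomial weights for k = 0..k_max (list index = clone size)."""
    out = [Q * (1.0 - p) ** a]
    for k in range(1, k_max + 1):
        out.append(out[-1] * p * (a + k - 1) / k)
    return out

def sampled_counts_eq11(alpha, r, mu_star, eta, k_max, Q=1.0):
    """Closed form: Eq. 8 with r/mu* replaced by eta*p / (1 - (1 - eta)*p)."""
    a, p = alpha / r, r / mu_star
    p_s = eta * p / (1.0 - (1.0 - eta) * p)
    return nb_counts(a, p_s, k_max, Q)

def sampled_counts_eq10(alpha, r, mu_star, eta, k_max, l_max, Q=1.0):
    """Direct binomial thinning of the whole-organism counts, truncated at l_max."""
    c = nb_counts(alpha / r, r / mu_star, l_max, Q)
    return [sum(c[l] * comb(l, k) * eta ** k * (1.0 - eta) ** (l - k)
                for l in range(k, l_max + 1))
            for k in range(k_max + 1)]

def sampled_richness_eq12(alpha, r, mu_star, eta, Q=1.0):
    p = r / mu_star
    return Q * (1.0 - ((1.0 - p) / (1.0 - (1.0 - eta) * p)) ** (alpha / r))

# Hypothetical parameters: r / mu_star = 0.8 and a 10% sampling fraction.
args = (0.02, 0.5, 0.625, 0.1)
closed = sampled_counts_eq11(*args, k_max=5)
direct = sampled_counts_eq10(*args, k_max=5, l_max=400)
richness = sampled_richness_eq12(*args)
```

The closed form is far cheaper than the direct sum, which matters once the sampled counts must be averaged over many values of α and r.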

Figure 3

The effects of sampling on two different neutral-model relative clone counts $c_k^{\mathrm{s}}/C^{\mathrm{s}}$ plotted using the dimensionless proliferation rate $r = 1/2$ in Eqs. 11 and 12 or Eq. 10 and S9 from Section 2 of the Supplementary Material. In (A), we used $\alpha = 10^{-5}$, $\lambda = 0.01$. The effect of sampling is illustrated for $\eta = 1$ (no subsampling), $10^{-3}$, $10^{-4}$, $10^{-5}$, and $10^{-6}$. All clone counts are qualitatively similar, with subsampling increasing the rate of exponential decay in $c_k$. In (B), we use a physiologically unrealistic set of parameters, $\alpha = \lambda = 10$, which leads to a qualitatively different unsampled clone count pattern that exhibits a peak. However, under small subsampling fractions η, the clone count loses its peak as it shifts to a rapidly decreasing pattern $c_k^{\mathrm{s}}$ that is not significantly different from the sampled clone counts predicted using the parameters α and λ in (A). This indicates that inferring parameters using clone counts is ill-conditioned (rather insensitive to parameters) if η is too small.

Heterogeneity and Determination of π(α, r | θ₀)

The fundamental result given in Eq. 11 applies only to the clone count density in a neutral model in which the immigration and proliferation rates are α and r for all clones. We now average the sampled clone counts $c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta)$ (Eq. 11) and the richness $C^{\mathrm{s}}(\alpha, r, \mu^*, \eta)$ (Eq. 12) over a distribution of immigration and proliferation rates $\pi(\alpha, r)$ to capture the heterogeneity across TCR clones. This final result can then be compared with experimentally measured clone counts. Recall that $\pi(\alpha, r)$ can depend on hyperparameters $\theta_0$ that define the shape of π. We then explicitly denote the distribution by $\pi(\alpha, r \,|\, \theta_0)$.

Once $\pi(\alpha, r \,|\, \theta_0)$ is defined, we can weight sampled clone counts accordingly. For example, one may assume $\theta_0 = \{\bar{\alpha}, w\}$, with each of the two hyperparameters defining $\pi(\alpha, r \,|\, \theta_0) = \pi_\alpha(\alpha \,|\, \bar{\alpha})\, \pi_r(r \,|\, w)$, leading to

$c_k^{\mathrm{s}}(\mu^*, \eta, \theta_0 = \{\bar{\alpha}, w\}) = \int_0^\infty \mathrm{d}\alpha \int_0^1 \mathrm{d}r\; \pi(\alpha, r \,|\, \theta_0)\, c_k^{\mathrm{s}}(\alpha, r, \mu^*, \eta).$

i) Proliferation Rate Heterogeneity

First, we consider a distribution of TCR sequence-dependent proliferation rates. Since TCR-antigen affinity depends on the receptor amino-acid sequence, the rate of T cell activation and subsequent proliferation can be clone-specific (31, 45). Thus, the specific interactions between TCRs and low-affinity MHC/self-peptide complexes map to a distribution of proliferation rates among all the $Q$ possible clones. Since there are no data (known to us) that can be used to infer this mapping or the specific shape of $\pi_r(r \,|\, w)$, we assume, for simplicity, a uniform “box” distribution centered about a mean value $\bar{r} = 1/2$:

(13) $\pi_r(r \,|\, w) = \begin{cases} 1/w & \text{if } |r - 1/2| < w/2, \\ 0 & \text{otherwise,} \end{cases}$

where $0 \le w \le 1$ represents the relative width of the uniform box distribution. The minimum and maximum dimensionless proliferation rates in this distribution are then $1/2 - w/2$ and $1/2 + w/2$, respectively. The dimensionless self-consistency condition (Eq. 6) thus yields

(14) $\mu^* = \frac{\left(\frac{1}{2} + \frac{w}{2}\right) e^{\lambda w/\bar{\alpha}} - \left(\frac{1}{2} - \frac{w}{2}\right)}{e^{\lambda w/\bar{\alpha}} - 1}.$

To understand the effects of proliferation rate heterogeneity, we begin by considering its effects on whole-organism ($\eta = 1$) clone counts. Since the function $c_k(\alpha, r, \mu^*)$ defined by Eq. 8 contains the exponentially decaying term $(r/\mu^*)^k$, a fixed dimensionless value of $\mu^*$ and $r = 1/2$ leads to an exponential decay of $c_k$ in $k$. However, if $w > 0$, different values of $r$ and $\mu^*$ contribute to this decay term, yielding nontrivial behavior and a much slower decay, as seen in Figure 4 for $\lambda/\bar{\alpha} = 8, 80$ and different values of $w$.
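The slower decay under proliferation heterogeneity can be reproduced qualitatively with a short sketch that evaluates Eq. 14 and then averages Eq. 8 over the box distribution of Eq. 13 by a midpoint rule. For simplicity, the sketch assumes that every clone shares the same immigration rate $\alpha = \bar{\alpha}$ and uses $Q = 1$; the function names and grid size are likewise assumptions. The $w \to 0$ branch uses the limit $\mu^* \to 1/2 + \bar{\alpha}/\lambda$ of Eq. 14.

```python
from math import exp

def mu_star_box(lam_over_alpha, w):
    """Eq. 14; the w -> 0 limit is 1/2 + alpha_bar / lambda."""
    if w == 0.0:
        return 0.5 + 1.0 / lam_over_alpha
    e = exp(lam_over_alpha * w)
    return ((0.5 + 0.5 * w) * e - (0.5 - 0.5 * w)) / (e - 1.0)

def averaged_counts(alpha, lam_over_alpha, w, k_max, n_grid=400):
    """Average Eq. 8 (Q = 1, eta = 1) over the box density with the midpoint rule."""
    mu = mu_star_box(lam_over_alpha, w)
    if w == 0.0:
        rs = [0.5]
    else:
        rs = [0.5 - w / 2 + (i + 0.5) * w / n_grid for i in range(n_grid)]
    avg = [0.0] * k_max
    for r in rs:
        a, p = alpha / r, r / mu
        f = (1.0 - p) ** a                 # k = 0 weight for this value of r
        for k in range(1, k_max + 1):
            f *= p * (a + k - 1) / k       # recurrence for Eq. 8
            avg[k - 1] += f / len(rs)
    return avg

# Dimensionless values matching lambda / alpha_bar = 8 with alpha_bar = 1e-3.
narrow = averaged_counts(1e-3, 8.0, 0.0, k_max=100)
wide = averaged_counts(1e-3, 8.0, 0.8, k_max=100)
```

In this sketch the tail of `wide` is dominated by the clones whose proliferation rate lies close to $\mu^*$, which is the mechanism by which a modest width $w$ extends the clone-count distribution.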

Figure 4

An exploration of the effects of proliferation rate heterogeneity on the mean clone counts $c_k$ with $Q = 10^{13}$. (A) Various box distributions $\pi_r(r \,|\, w)$ for $w$ = 0, 0.2, 0.4, 0.6, 0.8, and 1. (B) Using Eq. 14 and the dimensionless values $\bar{\alpha} = 10^{-3}$, $\lambda = 8 \times 10^{-3}$ such that $\lambda/\bar{\alpha} = 8$, we plot, using the same color spectrum as (A), the corresponding clone counts $c_k$ and show that wider distributions typically generate longer-tailed $c_k$. However, if λ is set even larger such that $\lambda/\bar{\alpha} = 80$, even modest values of $w$ can generate a very long-tailed $c_k$, as shown in (C). The color spectrum in (C) is for visualization only and not associated with that in (A, B). In the limit of very large $\lambda/\bar{\alpha}$, the effects of heterogeneous proliferation saturate at very small $w$, beyond which increasing $w$ has a negligible effect on further extending the tail.

ii) Immigration Rate Heterogeneity

Next, we use previous studies that predict V(D)J recombination frequencies associated with each TCR sequence to construct a distribution π_α(α) for the TCR sequence-dependent thymic output. A statistical model for differential V(D)J recombination in humans is implemented in the Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences (OLGA) software (28), which is an updated version of the Inference and Generation of Repertoires (IGoR) software (20). Below, we estimate π_α(α|ᾱ) by sampling a large number of TCRs from OLGA, which draws sequences according to their generation probability. Our working assumption is that thymic selection is uncorrelated with V(D)J recombination, so the relative probabilities of forming different TCRs provide an accurate representation of the ratios of the TCRs exported into the periphery.

Both IGoR and OLGA can be used to generate the probabilities corresponding to each drawn sequence, but this requires significant computational time and memory. Equivalently, since the frequencies of the sequence draws are proportional to the underlying probabilities, we simply drew N_* sequences and counted the frequencies of each amino acid sequence. Out of N_* sequence draws from IGoR or OLGA, there will be C_* distinct amino acid sequences (the richness of the drawn sequences). Since some sequences are drawn j > 1 times, C_* ≤ N_*. If b_j distinct sequences are drawn j times, and the maximum observed frequency is max{j} ≡ J, then C_* = ∑_{j=1}^{J} b_j and N_* = ∑_{j=1}^{J} j b_j, while b_j/C_* is the fraction of distinct drawn sequences that appear j times. For N_* = 10⁹, we found C_* = 372,806,648 ≈ 3.73 × 10⁸ and J = 52,294 for the alpha chain and C_* = 875,920,705 ≈ 8.76 × 10⁸ and J = 6,430 for the beta chain.

We model the effective immigration rate of a TCR sequence drawn j times to be proportional to j so that α_j ≡ α_* j. To fix the proportionality constant α_*, we identify the mean immigration rate averaged across the C_* observed sequences with the mean physiological rate ᾱ,

(15) \alpha_{*} \frac{\sum_{j} j\, b_j}{C_{*}} \approx \bar{\alpha}

to find α_* = ᾱ C_*/N_* and thus

(16) \alpha_j = \frac{\bar{\alpha}\, j}{(N_{*}/C_{*})}.

The frequencies j of the drawn realization of clones are plotted in decreasing order against the C_* distinct sequences in Figures 5A, B. From these frequencies j and the number of sequences b_j exhibiting them, we approximate averages of any function y(α) over π_α(α|ᾱ) by taking a sum over the values α_j:

(17) \int \pi_\alpha(\alpha|\bar{\alpha})\, y(\alpha)\, d\alpha \approx \sum_{j=1}^{J} \frac{b_j}{C_{*}}\, y(\alpha_j).
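The counting procedure behind Eqs. 15–17 can be sketched in a few lines of Python. This is a minimal illustration on a toy list of draws; the sequence labels and function names are hypothetical and are not output or API of IGoR or OLGA.

```python
from collections import Counter

def frequency_spectrum(draws):
    """Return (b, C_star, N_star) for a list of drawn sequences.

    b maps each observed frequency j to b_j, the number of distinct
    sequences that were drawn exactly j times.
    """
    per_sequence = Counter(draws)            # sequence -> number of draws j
    b = Counter(per_sequence.values())       # j -> b_j
    return b, len(per_sequence), len(draws)  # C_* (richness), N_* (total draws)

def alpha_of_j(j, alpha_bar, C_star, N_star):
    """Eq. 16: alpha_j = alpha_bar * j / (N_*/C_*)."""
    return alpha_bar * j * C_star / N_star

def average_over_pi_alpha(y, b, C_star, N_star, alpha_bar):
    """Eq. 17: sum of y(alpha_j) weighted by b_j / C_*."""
    return sum(b_j / C_star * y(alpha_of_j(j, alpha_bar, C_star, N_star))
               for j, b_j in b.items())

# Toy draws with hypothetical labels (not real generated sequences).
draws = ["CASSL", "CASSL", "CASSL", "CASSP", "CASSP", "CAWSV", "CSARD"]
b, C_star, N_star = frequency_spectrum(draws)
alpha_bar = 1e-3
mean_alpha = average_over_pi_alpha(lambda a: a, b, C_star, N_star, alpha_bar)
```

Choosing y(α) = α recovers the mean ᾱ, which is the normalization imposed by Eq. 15.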

Figure 5

Ordered integer-valued frequencies j, plotted on a log-log scale, of the C_* distinct (A) alpha and (B) beta chains drawn using OLGA. The index 1 ≤ i ≤ C_* < N_* labels the distinct sequences drawn, while b_j is defined as the number of these sequences that exhibit the specific frequency j [b₁ and b₂ are explicitly indicated in (B)]. The highest frequency clone appears J times, such that b_j = 0 for j > J. Since C_* is comparable to N_*, the drawn sequences are dominated by the low probability ones that appear only once. The insets display the frequencies on a linear scale and indicate the long-tailed behavior of the frequencies. The shape of the frequency spectra is self-similar once N_* ≳ 10⁷, allowing us to use this sampling procedure to reliably estimate π_α(α|ᾱ).

Alternatively, when drawing sequences, IGoR and OLGA (using the Pgen feature) can also directly output their probabilities p_i, whose values would be proportional to the frequency j if large numbers of sequences are drawn as described above. We can use these countable sequences and probabilities to construct α and π_α (α) by defining α_i = ᾱ Q C_* p_i/p_T, where p_T = ∑_{i=1}^{C_*} p_i. By plotting the values of p_i, we arrive at a distribution similar to that shown in Figure 5. In this case too, we find that a large number of low-probability sequences dominates the averaging of clone counts using the distribution of immigration rates constructed using IGoR/OLGA.

Now that we have specified the distributions π_α(α|ᾱ) and π_r (r|w), we can compute the mean, sampled, immigration- and proliferation-averaged clone counts and compare them with measurements. The full formula for the immigration and proliferation rate-averaged clone counts under subsampling is

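(18) c_k^s(\bar{\alpha}, \mu^{*}, w, \eta) = \int_0^{\infty} d\alpha \int_0^{1} dr\, \pi_\alpha(\alpha|\bar{\alpha})\, \pi_r(r|w)\, c_k^s(\alpha, r, \mu^{*}, \eta) = \frac{Q}{k!} \sum_{j=1}^{J} \frac{b_j}{C_{*}} \int_{(1-w)/2}^{(1+w)/2} \frac{dr}{w} \left( \frac{\eta r/\mu^{*}}{1 - (1-\eta) r/\mu^{*}} \right)^{k} \left( \frac{1 - r/\mu^{*}}{1 - (1-\eta) r/\mu^{*}} \right)^{\alpha_j/r} \prod_{i=0}^{k-1} \left( \frac{\alpha_j}{r} + i \right),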

where α_j is given by Eq. 16 and μ^∗ is given by Eq. 14. Eq. 18 is our “full model” from which we make predictions of clone count-related quantities and compare them with data. Using this expression, we can mathematically study the importance of the heterogeneities in α and r by comparing predictions from simple forms of π_α(α|ᾱ) and π_r (r|w), as presented in Section 2 of the Supplementary Material, to those derived from π(α, r) = δ(α − ᾱ) δ(r − 1/2) of the neutral model.
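To make Eq. 18 concrete, the sketch below evaluates c_k^s/Q numerically in Python under a simplifying assumption that is ours, not the full model: the immigration distribution is collapsed to a single value, π_α(α|ᾱ) = δ(α − ᾱ). The r-integral is approximated by a midpoint rule, the product over i is evaluated through log-gamma functions, and μ^∗ is computed from an algebraically equivalent form of Eq. 14. All function names and the quadrature resolution n_r are illustrative choices.

```python
import math

def mu_star(lam, alpha_bar, w):
    """Eq. 14, written as 1/2 + (w/2) coth(lam*w/(2*alpha_bar))."""
    if w == 0.0:
        return 0.5 + alpha_bar / lam          # w -> 0 limit (neutral model)
    return 0.5 + 0.5 * w / math.tanh(0.5 * lam * w / alpha_bar)

def nb_term(k, a, p):
    """Gamma(a+k) / (Gamma(a) k!) * p^k * (1-p)^a, evaluated in log space."""
    return math.exp(math.lgamma(a + k) - math.lgamma(a) - math.lgamma(k + 1)
                    + k * math.log(p) + a * math.log1p(-p))

def sampled_clone_count(k, alpha_bar, lam, w, eta, n_r=400):
    """c_k^s / Q from Eq. 18, assuming pi_alpha = delta(alpha - alpha_bar)."""
    mu = mu_star(lam, alpha_bar, w)
    if w == 0.0:
        rs = [0.5]                            # box collapses to r = 1/2
    else:
        h = w / n_r                           # midpoint rule on [(1-w)/2, (1+w)/2]
        rs = [0.5 - 0.5 * w + (m + 0.5) * h for m in range(n_r)]
    total = 0.0
    for r in rs:
        p = r / mu
        p_s = eta * p / (1.0 - (1.0 - eta) * p)   # sampled decay factor
        total += nb_term(k, alpha_bar / r, p_s)
    return total / len(rs)                    # (1/w) * integral dr

# Example: fractions f_k^s = k c_k^s / (Q eta lam) for the first few k (Eq. 19).
alpha_bar, lam, w, eta = 1e-3, 8e-3, 0.5, 0.01
f = [k * sampled_clone_count(k, alpha_bar, lam, w, eta) / (eta * lam)
     for k in range(1, 11)]
```

As a consistency check, summing k c_k^s/Q over k should reproduce ηλ, the normalization N^s = Qηλ, up to quadrature and truncation error.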

From Figure 5, observe that b₁ ≫ b_j for j > 1. In fact, a majority of the naive T cell population is composed of clones that are produced only once. The linear-scale insets also show a long tail, indicating a large number of clones that are generated few times. Thus, for sufficiently small ᾱ, our formulae for c_k and all subsequent quantities can be approximated by taking the ᾱ/r ≪ 1 limit. As we show in Section 3 of the Supplementary Material, such a simpler expression remains highly accurate, provided the dimensionless ᾱ < 10^-2, and allows efficient computation. This implies that the full result arising from averaging c_k^s(α, r, μ^∗, η) over π_α(α|ᾱ) can also be approximated by using a single effective value c_k^s(ᾱ, r, μ^∗, η), supporting our overall conclusion that the predicted heterogeneity in human T cell immigration rates does not appreciably influence clone count distributions. While physiological distributions π_α(α|ᾱ) do not yield clone counts appreciably different from those of a neutral immigration model, small changes in proliferation rate heterogeneity w can significantly affect the clone count structure c_k^s. Nonetheless, for completeness, we will perform the full summation over α_j (Eq. 18). All parameters, hyperparameters, and variables used in our modeling and data analysis are listed in Tables 1, 2.
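The reason a single effective ᾱ suffices can be seen from a leading-order expansion of the integrand of Eq. 18. The following is a sketch that assumes α_j/r ≪ 1 for the sequences carrying most of the weight b_j/C_*; the shorthand a_j and p_s is introduced here for illustration only, and the Supplementary Material may organize the argument differently.

```latex
% Shorthand for this sketch: a_j = \alpha_j / r and
% p_s = (\eta r/\mu^{*}) / (1 - (1-\eta) r/\mu^{*}).
\begin{align*}
\frac{1}{k!}\prod_{i=0}^{k-1}\left(a_j + i\right)
  &= \frac{a_j}{k}\prod_{i=1}^{k-1}\left(1 + \frac{a_j}{i}\right)
   \approx \frac{a_j}{k},
\qquad
(1 - p_s)^{a_j} \approx 1
\qquad (a_j \ll 1), \\
c_k^s(\alpha_j, r, \mu^{*}, \eta)
  &\approx Q\,\frac{\alpha_j}{r}\,\frac{p_s^{\,k}}{k}, \\
\sum_{j=1}^{J}\frac{b_j}{C_{*}}\, c_k^s(\alpha_j, r, \mu^{*}, \eta)
  &\approx Q\,\frac{p_s^{\,k}}{k\,r}\sum_{j=1}^{J}\frac{b_j}{C_{*}}\,\alpha_j
   = Q\,\frac{\bar{\alpha}}{r}\,\frac{p_s^{\,k}}{k}
   \approx c_k^s(\bar{\alpha}, r, \mu^{*}, \eta).
\end{align*}
```

At this order the clone counts are linear in α_j, so averaging over the immigration rates only involves their mean, which Eq. 15 fixes to ᾱ.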

Table 2

Model variables and their definitions.

variables	definition
Q ∈ ℤ ⁺	theoretical number of possible TCRs ~10¹³ – 10¹⁸ (36)
N̂ ∈ ℤ⁺	number of naive T cells in organism ~10¹⁰ – 10¹¹ (5)
N(t) ∈ ℝ⁺	number of naive T cells from model
N^∗ ∈ ℝ⁺	steady-state number of naive T cells from model
N^s ≡ ηN^∗ ∈ ℝ⁺	subsampled number of naive cells from model
N_* ∈ ℤ⁺	number of draws from IGoR/OLGA
Ĉ ∈ ℤ⁺	total number of clones in organism (richness) ~10⁶ – 10⁸ (36)
Ĉ^s ∈ ℤ⁺	total number of sampled clones (sampled richness)
C(θ) ∈ ℝ⁺	total number of clones in organism from model
C^s(θ,η) ∈ ℝ⁺	total number of sampled clones from model
C_* ∈ ℤ⁺	number of different sequences drawn from IGoR/OLGA
ĉ_k ∈ ℤ⁺	discrete number of clones of size k
c_k(θ) ∈ ℝ⁺	model of number of clones containing k cells
ĉ_k^s ∈ ℤ⁺	discrete number of clones of size k in sample
c_k^s(θ,η) ∈ ℝ⁺	modeled number of sampled clones containing k cells
f_k^s = kĉ_k^s/Ĉ^s ∈ [0, 1]	fraction of all sampled cells in clones of size k
f_k^s(θ,η) = kc_k^s(θ,η)/C^s(θ,η) ∈ [0, 1]	modeled fraction of all sampled cells in clones of size k

Variables with a hat (^) denote measured numbers, while populations written as functions of parameters θ are those predicted from our model (the dimensionless parameters used in our model are θ = {α, r}). The probability distributions π(α, r|θ₀) are defined by hyperparameters θ₀ (the dimensionless hyperparameters used in this study are θ₀ = {ᾱ, w}). Upon averaging predicted quantities such as c_k^s(α, r) over π(α, r|θ₀), we find c_k^s(θ₀).

Results and Analysis

Before performing a quantitative comparison with measured clone counts from Oakes et al. (12), we discuss the qualitative features of our model and typical physiological parameter ranges. While even the basic model parameters are difficult to measure, our nondimensionalized model unifies the mechanisms and concepts common to the maintenance of diversity in the T cell repertoire across different organisms.

When considering the data, we observe that even after significant subsampling, there are appreciable clone counts at reasonably large clone sizes k, whereas the unsampled clone counts decay exponentially in k with rate log(μ ^∗/r). Even though r may take on a range of values, as determined by π _r (r), the slowest decay of c_k arises from the largest possible values of r. Thus, a larger proliferation rate heterogeneity w will generally yield a longer-tailed c_k , as illustrated in Figure 4 . Since the data we analyze are derived from human samples, we will use the following arguments as a rough guide to the relevant range of parameters:

The average total number of naive T cells is not completely known but is estimated to be about N^∗ ~ 10¹¹ (35). However, the circulating population in the peripheral blood is approximately two orders of magnitude smaller. These circulating naive T cells nonetheless exchange with those in the much larger population in the lymph and other tissues. The timescale of this exchange (relative to the age of the organism being sampled or the intersample times) will determine the effective statistically accessible N^∗ relevant for sampling clone counts c_k^s. We will use an order-of-magnitude estimate on the lower range of measurements and estimate N^∗ ~ 10¹⁰−10¹¹.

The theoretical total possible number Q of TCRs of either alpha or beta chains may be in the range 10¹³−10¹⁸ (46), but the actual number of clones with immigration rate α_i that allows it to be produced even once in a lifetime is more relevant and probably much smaller. Thus, the effective value of Q may reside at the lower range, leading to λ ≡ N ^∗/Q ~ 10^-4−10^-2.

The average (dimensional) immigration rate per clone ᾱ can be deduced from the total thymic output of all clones ᾱQ, which has been estimated across a wide range of values ᾱQ ∼ 10⁷−10⁸/day (29, 47–50). If we use an effective repertoire size of Q ~ 10¹³−10¹⁴, the average per clone immigration rate becomes ᾱ ∼ 10^-7−10^-5/day.

The mean proliferation rate r̄ is difficult to measure but has been estimated to be on the order of r̄ ∼ 10^-4−10^-3/day (29). If we nondimensionalize using R = 2r̄, the dimensionless ᾱ ∼ 10^-4−10^-1.

The sampling fraction η, although in principle determined experimentally, is also hard to quantify due to the uncertainty in N^∗. Blood sampling volume fractions from humans are typically η ~ 10^-3; however, in recent experiments (12) the number of enumerated sequences was ~10⁵, which, given rough estimates of the effective N^∗ ~ 10¹⁰−10¹¹, yields η ~ 10^-6−10^-4. Due to this uncertainty in η, we will explore different fixed values of η around 10^-5.
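The order-of-magnitude arithmetic behind the ranges of λ and ᾱ quoted above can be reproduced with simple interval ratios. The Python sketch below uses only the endpoint values stated in the text; the helper name and the choice to treat the endpoints of each range as independent are our own simplifications.

```python
# Order-of-magnitude ranges quoted in the text (dimensional rates are per day).
N_STAR = (1e10, 1e11)          # number of naive T cells, N*
Q_EFF = (1e13, 1e14)           # effective repertoire size Q
THYMIC_OUTPUT = (1e7, 1e8)     # total thymic output alpha_bar * Q, cells/day
R_BAR = (1e-4, 1e-3)           # mean proliferation rate r_bar, 1/day

def ratio_range(num, den):
    """Smallest and largest value of num/den over two (low, high) intervals."""
    return num[0] / den[1], num[1] / den[0]

lam = ratio_range(N_STAR, Q_EFF)                  # lambda = N*/Q
alpha_dim = ratio_range(THYMIC_OUTPUT, Q_EFF)     # per-clone immigration rate, 1/day
R = (2 * R_BAR[0], 2 * R_BAR[1])                  # nondimensionalizing rate R = 2 r_bar
alpha_dimless = ratio_range(alpha_dim, R)         # dimensionless alpha_bar = alpha_bar/R
```

These ratios give λ between 10^-4 and 10^-2 and a dimensional ᾱ between 10^-7 and 10^-5 per day. The dimensionless ᾱ comes out between 5 × 10^-5 and 5 × 10^-2, which agrees with the rounded range 10^-4−10^-1 to within a factor of two.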

Using the above guide for reasonable parameter ranges, we now consider fitting our results in Eqs. 18, S9-S14 to some of the available data (12). Before doing so, note that although the log-log plots shown in Figures 1A, B provide a simple visual for log c_k^s or log[c_k^s/C^s], fitting must be performed on the linear scale. The measured data include values of k for which no clones were detected, so that c_k^s = 0. These data points nonetheless should be included in the fitting as they represent realizations of the system. However, on the log scale these zero data points translate to log c_k^s → −∞, so numerical fitting on the log-log scale could be misleading once a value of c_k^s = 0 is encountered. Thus, we will fit our mean-field model on the linear scale to the fraction f_k^s of the total number of sampled cells that are in clones of size k,

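(19) f_k^s(\bar{\alpha}, \lambda, w, \eta) \equiv \frac{k\, c_k^s(\bar{\alpha}, \lambda, w, \eta)}{N^s} = \frac{k\, c_k^s(\bar{\alpha}, \lambda, w, \eta)}{\sum_{\ell=1}^{\infty} \ell\, c_\ell^s(\bar{\alpha}, \lambda, w, \eta)} = \frac{k\, c_k^s(\bar{\alpha}, \lambda, w, \eta)}{Q \eta \lambda},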

where the denominator Qηλ comes directly from the definition ∑_{ℓ=1}^{∞} ℓ c_ℓ^s(ᾱ, λ|η) ≡ N^s, the sampling relation N^s = ηN^∗, and Eq. 6. Note that we have switched the dependence from μ^∗ to λ (see Eq. 14). Rather than using N^s directly from the number of reads in an experimental sample, we equivalently use the model expression N^s = Qηλ to arrive at the last equality in Eq. 19. This form ensures strict normalization and is independent of the unknown repertoire size Q since c_k^s is proportional to Q. The implicit factor of Q in c_k^s from Eq. 11 cancels the explicit Q in the denominator of Eq. 19, so that f_k^s as well as c_k^s/C^s depend on Q only through the determination of μ^∗ through λ ≡ N^∗/Q in Eq. 6.

Our mathematical framework provides only mean sampled clone counts, while each sample of the data represents one realization. Large sample-to-sample variations in the clone counts would render the fitting less informative, but such large variations were not seen in the triplicate samples in Oakes et al. (12). Mechanistically, we expect that for large k the number of cells contributing to f_k^s is also large, so demographic stochasticity is relatively small and results in small uncertainties in the value of k, and not in the magnitude of f_k^s. Large clones are also likely to include memory T cells that have been produced after antigen stimulation of specific clones. Memory T cells are difficult to accurately distinguish from naive T cells (12), but we will see that large k components of f_k^s negligibly influence the fitting. We can now compare our model f_k^s(ᾱ, λ, w, η) with the data f_k^s(data) by constructing the error

(20) H(\bar{\alpha}, \lambda, w, \eta) = \sum_{k=1}^{\infty} \left| f_k^s(\mathrm{data}) - f_k^s(\bar{\alpha}, \lambda, w, \eta) \right|^2

and exploring how it depends on the parameters ᾱ, λ, w, and sampling fraction η. Our goal is to find relationships among the parameters λ, ᾱ, and w that minimize H(ᾱ, λ, w, η).
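The linear-scale error of Eq. 20 and its minimization over a parameter can be sketched as follows in Python. To keep the example self-contained, the model f_k^s is replaced by a hypothetical one-parameter geometric stand-in (toy_model), and the "data" are synthetic values generated from that same stand-in; neither is the model or the data used in this study. Sizes k with no observed clones enter as zeros, as required for fitting on the linear scale.

```python
def fit_error(f_data, f_model, k_max=None):
    """Squared error of Eq. 20, with the infinite sum truncated at k_max.

    f_data[k-1] is the measured fraction of sampled cells in clones of size k.
    Sizes beyond len(f_data) are treated as measured zeros, so they still
    contribute |f_model(k)|^2 to the error.
    """
    if k_max is None:
        k_max = len(f_data)
    total = 0.0
    for k in range(1, k_max + 1):
        observed = f_data[k - 1] if k <= len(f_data) else 0.0
        total += (observed - f_model(k)) ** 2
    return total

def toy_model(q):
    """Hypothetical stand-in for f_k^s: normalized geometric decay in k."""
    return lambda k: (1.0 - q) * q ** (k - 1)

# Synthetic "data" generated from the stand-in itself with q = 0.3.
f_data = [toy_model(0.3)(k) for k in range(1, 21)]

# Grid search over the single parameter of the stand-in model.
grid = [i / 100 for i in range(1, 100)]
best_q = min(grid, key=lambda q: fit_error(f_data, toy_model(q), k_max=50))
```

With a real model, f_model would be replaced by a function returning f_k^s(ᾱ, λ, w, η) for fixed parameters, and the grid search would run over λ, ᾱ, or w.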

In Figures 6A–C, the data f_k^s(data) were derived from the average of three samples of beta chain CD4 sequences from one patient (12). These data were used to compute and plot the error H(ᾱ, λ, w = 0, η) as a function of λ for various values of ᾱ using the neutral model (w = 0, Eq. S9 in Section 2 of the Supplementary Material). For reasonable values of dimensionless ᾱ ≈ 10^-5−0.01 and sampling fractions η = 10^-4, 10^-5, and 10^-6, we find that the value of λ that minimizes H(ᾱ, λ, w = 0, η), λ_min, is typically 𝒪(1) or larger. In Figures 6D–F, we use the full-width distribution π_r (r|w = 1) to show the error for the same data using the same sampling fractions η = 10^-4, 10^-5, 10^-6. Note that the values of λ_min are significantly smaller than those found using w = 0 in Figures 6A–C and that the results are rather insensitive to the sampling fraction η. These smaller values of λ_min are more consistent with known physiological understanding. Thus, the distributed proliferation rate model provides a much more self-consistent fit to the data than the fixed proliferation rate neutral model. Figure 6 also reveals that the values of H along the minimum valley are nearly constant, only slightly decreasing as ᾱ → 0. For each value of ᾱ we can identify the corresponding λ_min that minimizes H. However, since the values of H(ᾱ, λ_min, w = 0, η) for each (ᾱ, λ_min) pair do not change appreciably, we cannot independently determine both.

Figure 6

The error H(ᾱ, λ, w, η) plotted as a function of ᾱ (on a log₁₀ scale) and λ. Darker colors represent smaller values of error as shown by the scale bar on the right. The data used are the clone counts of beta chain sequences of naive CD4 cells from one patient, averaged over three samples. Panels (A–C) use the simple neutral model (Eqs. S9 and S10) and sampling fractions η = 10^-4, 10^-5, and 10^-6, respectively. Since ᾱ is on a log scale, the error is minimal along a line λ_min ∝ ᾱ; the error does not change appreciably along this path and only slightly decreases as λ and ᾱ become smaller. For the neutral model (w = 0), the error is very sensitive to the sampling fraction η. Here, a fixed, physiologically reasonable value of ᾱ results in a minimizing λ_min that is unreasonably large (in excess of one) and does not agree well with our expectation that λ = N^∗/Q ≪ 1. Panels (D–F) show results for the distributed proliferation rate model at full width (w = 1). In this case, the errors are insensitive to the specific choice of η and the minimizing λ_min values are much smaller, consistent with our estimates of N^∗ and repertoire size. For w = 1, the values of the errors H are also smaller along the λ_min−ᾱ minimum valley.

An alternate representation is shown in Figure 7, where the relationship between ᾱ and λ_min is seen to be approximately linear for both the neutral model (w = 0) and the heterogeneous, full-width model (w = 1). The color shading represents the corresponding value of H(ᾱ, λ_min, w, η). One major observation is that the full-width case yields values of (ᾱ, λ_min) that are closer to measured and expected physiological values and that these results are also less sensitive to η compared to those of the neutral case. On the other hand, although the variation in H is negligible across ᾱ in both cases, the fully heterogeneous model (w = 1) carries a slightly larger error than the neutral one (w = 0). This is solely a consequence of our use of f_k^s, which weights the small k values significantly more in the fitting.

Figure 7

Log-log plots of λ_min values as functions of ᾱ for η = 10^-4, 10^-5, and 10^-6 for (A) the neutral model, w = 0, and (B) the full-width distributed proliferation rate model, w = 1. These curves trace the values of λ_min along the minimum valley in Figure 6 and show the relative insensitivity of the distributed proliferation rate model to the subsampling fraction η. In both (A, B), the minimum line slopes are near one, with (B) showing a slightly greater slope, indicating λ_min is approximately proportional to ᾱ over the entire range of w. The color intensity along the lines in (A, B) indicates variation in the total error along the minimum valley; their uniformity shows that the errors are nearly constant along each line. (C) Log-linear plot of λ_min as a function of proliferation rate heterogeneity w for ᾱ = 2 × 10^-5, 10^-4. The lower darker curves in each pair correspond to η = 10^-4 while the lighter curves correspond to η = 10^-6. The curves show that even a small heterogeneity w quickly reduces λ_min to below one; however, if λ is forced to be even smaller, the required heterogeneity w increases.

Since experimentally we expect small λ, we also investigate whether small errors H emerge for values of (ᾱ, λ_min ≪ 1) at intermediate 0 < w < 1. In Figure 7C, we plot λ_min as a function of w for various values of ᾱ. Note that even small w significantly reduces the corresponding λ_min relative to the neutral case. However, if our target is λ_min ∼ 10^-4−10^-3, the required w can become quite large. These results indicate that more heterogeneity is associated with values of λ_min that are closer to the physiologically expected values of N^∗/Q.

Finally, to explore the dependence of the error on the proliferation rate heterogeneity w, we fix ᾱ, λ, and η, and plot H(ᾱ, λ, w, η) as a function of w. Figure 8 shows that the H-minimizing w is very sensitive to λ/ᾱ: for fixed η, as λ/ᾱ is decreased the error is lowest for larger proliferation heterogeneity w. The minimum value of H(ᾱ, λ, w, η), however, is rather insensitive to λ/ᾱ for all chosen η. Hence, near-optimal solutions with λ ≪ 1 can be found when the proliferation rate heterogeneity w is appreciable. Using the parameters associated with the minima in Figure 8A (η = 10^-4), we plot our predicted f_k^s against the data f_k^s(data) in Figure 9. As can be seen, when proliferation rate heterogeneity is allowed, the best fits have small error and are found using realistic values, λ ≪ 1. Note that most of the information in the data lies in how f_k^s(data) decreases over the first few values of k. The neutral model (w = 0) fits best for small values of k, but the corresponding values of λ and ᾱ are too large and too small, respectively. The goodness of fit of our model to the data depends mostly on the predicted initial decreases in f_k^s(ᾱ, λ, w, η). The constraints among the parameters λ, ᾱ, w, and η derived from our model can be applied to different clone counts such as the data shown in Figure 1. However, due to the ill-conditioning when η ≪ 1, these constraints do not vary appreciably across different data sets and are only quantitatively different. Generally, the more rapidly decaying a clone count, the smaller the w, the smaller the η, or the larger the λ, all else being equal.

Figure 8

The error H(ᾱ, λ, w, η) using CD4 alpha data from Oakes et al. (12) plotted as a function of w for various λ/ᾱ. We fixed λ = 10^-3 and varied, from left to right, ᾱ = 2 × 10^-5 (red), 6 × 10^-5 (green), 10^-4 (blue) and 1.4 × 10^-4 (black). From (A–C), η = 10^-4, 10^-5, and 10^-6. Smaller values of λ/ᾱ result in larger best-fit values of w.

Figure 9

Plots of the representative optimal solutions of clone counts f_k^s from Eq. 19 (using η = 10^-4 and λ = 10^-3 unless otherwise indicated) plotted alongside the data from Oakes et al. (12). The model predictions and CD4 beta chain data are shown on both (A) log-log and (B) linear scales (there are no zero-valued clone counts in this dataset). In (A), the best fit for the neutral model (w = 0 and π_α(α|ᾱ) = δ(α − ᾱ)) using ᾱ = 10^-4 is given by λ ≈ 3, shown by the solid black curve. The dashed curves represent best-fit curves using the values associated with the error minima in Figure 8A, where ᾱ = 2 × 10^-5, w ≈ 0.09 (red), 6 × 10^-5, w ≈ 0.3 (green), 10^-4, w ≈ 0.53 (blue) and 1.4 × 10^-4, w ≈ 0.76 (black). Note that the neutral model fits well for only the first 2-3 k-points, while the heterogeneous model (w > 0) fits better at larger k.

Discussion

Here, we review and justify a number of critical biological assumptions and mathematical approximations used in our analysis. The effects of relaxing our approximations are also discussed.

Distinct T Cell Components

It is known that naive T cells can change in time, with recent thymic emigrants evolving into mature naive T cells that carry different proliferation and death rates (51). For simplicity, we have assumed a single naive T cell compartment. To incorporate naive T cell evolution, we can allow the distribution π_r (r) to evolve in time to reflect the relative abundances of T cell subpopulations, or, one can explicitly include multiple compartments, with cells from a recent emigrant compartment transitioning into a mature compartment. Each compartment would be described by its own steady-state death rates, clone counts, and distributions of proliferation rates. An analysis of a related sequential cell state transition model has been developed for clonal tracking in hematopoiesis (41).

Factorization of π (α, r)

For mathematical tractability, we have assumed π(α, r|θ₀) = π_α(α|ᾱ) π_r(r|w). Given the typical physiological values of ᾱ, the clone count formulae derived from our model can be accurately approximated by a single value of ᾱ. Thus, we expect that the immigration rate distribution can be approximated by π_α(α|ᾱ) = δ(α − ᾱ). This allows further approximation of our formulae as shown in Section 3 of the Supplementary Material. In Section 4 of the Supplementary Material, we explicitly show that factorization is an accurate approximation.

We have also assumed that selection is uncorrelated with the generation probabilities of the TCR nucleotide sequences encoded in IGoR/OLGA. The assumption is that the recombination statistics are uncorrelated with the statistics of thymic selection, a process that is based on TCR amino acid sequences. However, we note that it has been suggested that selection pressure may induce a correlation between TCRs generated and selected (52). The corresponding statistics of the frequencies of selected TCRs would be modified from those of the generated TCRs shown in Figure 5 . Nonetheless, we assume that the resulting distribution can still be approximated by a single-α model which will not qualitatively alter our conclusions.

Mean-Field Approximation

Our mean-field approximation for the mean clone count c_k is embodied in Eq. 7, where correlations between fluctuations in the total population N = ∑_k k c_k in the regulation term μ(N) and the explicit c_k terms are neglected. This approximation has been shown to be accurate for k ≲ N^∗ when ᾱQ² > μ(N^∗) (39). The mean-field results overestimate the clone counts for k ≳ N^∗. Moreover, when the total steady-state T cell immigration rate is extremely small, the effects of competitive exclusion dominate and a single large clone arises (39, 53, 54). Nonetheless, an accurate approximation for the steady-state clone abundance c_k can be obtained using a variation of the two-species Moran model as shown in (39). For the naive T cell system, because Q is so large, the mean immigration rate ᾱ is such that competitive exclusion is not a dominant feature. Moreover, since N^∗ ≳ 10¹⁰, clone counts at comparable sizes are not observed and are predicted to be negligible in all models. Since the values of f_k^s(data) become exponentially smaller for large k, our inference is most sensitive to the values of f_k^s(data) for small to modest k. The information in the data is primarily manifested by how f_k^s(data) decays in k, well before the mean-field approximation deviates from the exact solution. Thus, the parameters associated with the human adaptive immune system satisfy the conditions for the mean-field approximation to be accurate, justifying its use in the BDI model.

Steady State Assumption

In this study, we only considered the steady state of our birth-death-immigration model in Eq. 8 because this limit allowed relatively easy derivations of analytical results. This was also the strategy in previous modeling work (4, 6, 7, 38, 39). However, the per-clone immigration and proliferation times may be on the order of months or years, a time scale over which thymic output diminishes as an individual ages (29, 55–57). Indeed, clone abundance distributions have been shown to exhibit specific patterns as a function of age (58–60). Although N(t), with fixed ᾱ and r̄, relaxes to steady state quickly, on a timescale of months, the different subpopulations of specific sizes described by their number c_k relax to quasi-steady state across a spectrum of time scales depending on the clone sizes k (39, 61). The timescales of relaxation of the largest clones can be estimated from the eigenvalues of the linearized system (Eq. 7) and are found to be ~10 years. Thymic involution could be modeled by using a time-dependent α(t) that slowly decreases with age (57). Although T cells are thought to be primarily maintained through proliferation, thymic regeneration has also been shown to affect the naive T cell pool many years after thymectomy in infants. Here, a time-dependent increase in α(t) after early thymectomy could be used. Indeed, the clone counts may be determined in early life (17), suggesting the dynamics of certain clones may be very slow, precluding a strict steady-state analysis for the entire repertoire.

In addition to time-dependent changes in α, more subtle time-inhomogeneities such as changes in proliferation and death rates have been demonstrated (55, 56, 62). Thus, our steady-state assumption could be relaxed by incorporation of time-dependent perturbations to the model parameters μ^∗ and/or π(α, r). Longitudinal measurements of clone abundances or experiments involving time-dependent perturbations would provide significant insight into the overall dynamics of clone abundances. The timescales required to reach steady state fall between 1/(ᾱQ) and 1/ᾱ. Thus, it is possible that some components of c_k do not reach steady state in an organism’s lifetime, in which case our steady-state model might not be valid for all c_k (57, 61) and a dynamic approach must be taken.
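A rough sense of the upper bound 1/ᾱ can be obtained from the dimensional estimates quoted earlier (ᾱQ ∼ 10⁷−10⁸/day and ᾱ ∼ 10^-7−10^-5/day). The short Python calculation below is only an order-of-magnitude illustration; the variable names and the conversion of 365.25 days per year are our own choices.

```python
DAYS_PER_YEAR = 365.25

thymic_output_per_day = (1e7, 1e8)   # alpha_bar * Q, range quoted in the text
alpha_bar_per_day = (1e-7, 1e-5)     # per-clone immigration rate, range quoted in the text

# Lower bound 1/(alpha_bar*Q), in days, using the largest total thymic output.
shortest_days = 1.0 / thymic_output_per_day[1]

# Upper bound 1/alpha_bar, in years, for the largest and smallest per-clone rates.
longest_years = (1.0 / alpha_bar_per_day[1] / DAYS_PER_YEAR,
                 1.0 / alpha_bar_per_day[0] / DAYS_PER_YEAR)
```

Under these estimates 1/ᾱ ranges from a few hundred to tens of thousands of years, far longer than a human lifetime, which is consistent with the possibility that some components of c_k never reach steady state.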

Clustered Immigration

Our mean-field model assumed that each immigration event introduced a single naive T cell into the immune system. However, T cells can divide before leaving the thymus and reach a homeostatic state in the periphery. This process can be described by the simultaneous immigration of more than one naive T cell with the same TCR. Clustered immigration of q cells can be implemented in the core model for c_k (Eq. 7) via an immigration term of the form α_q (c_{k−q}(α_q, r) − c_k(α_q, r)), where c_{k−q} = 0 for k − q < 0 (see Section 5 of the Supplementary Material). For q > 1, an informative analytic expression for c_k is not available. In Figure S2 in Section 5 of the Supplementary Material, we show the predicted clone abundance c_k for a neutral model in which q = 5. When compared to the case where there is only one cell per immigration, the clone abundance c_k will have a larger slope for k ≲ q, making it kink more downward near k ≈ q. Thus, from Figures S2 and 9A, we can see that paired immigration (q = 2) would increase f_k^s for k = 2, providing an improved fit to data over single-copy immigration (q = 1).
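Although no closed form is available for q > 1, the steady state can be generated numerically. The Python sketch below assumes that Eq. 7 has the standard birth-death-immigration form, with per-cell proliferation rate r, per-cell death rate μ^∗ held fixed at a given value, and the cluster immigration term quoted above; under that assumption, zero net flux across the cut between sizes k and k + 1 yields a forward recursion. The function name, the truncation at k_max, and the parameter values are illustrative, and the comparison does not rescale α_q or μ^∗ to keep the total population fixed.

```python
def clustered_clone_counts(alpha_q, r, mu, q, k_max):
    """Steady-state c_k / Q for k = 0..k_max with immigration in clusters of q.

    Zero net flux across each cut between sizes k and k+1 gives
        mu*(k+1)*c_{k+1} = r*k*c_k + alpha_q * sum_{j=max(0,k-q+1)}^{k} c_j,
    where the sum collects clones whose immigration jump j -> j+q crosses the cut.
    """
    c = [1.0]                                          # unnormalized c_0
    for k in range(k_max):
        crossing = sum(c[max(0, k - q + 1): k + 1])    # sizes that jump across the cut
        c.append((r * k * c[k] + alpha_q * crossing) / (mu * (k + 1)))
    total = sum(c)                                     # normalize so that sum_k c_k = Q
    return [x / total for x in c]

# Neutral-model example with r = 1/2 and an illustrative fixed mu.
single = clustered_clone_counts(alpha_q=1e-3, r=0.5, mu=0.625, q=1, k_max=200)
paired = clustered_clone_counts(alpha_q=1e-3, r=0.5, mu=0.625, q=2, k_max=200)
```

For q = 1 the recursion reduces to c_{k+1}/c_k = (α + rk)/(μ^∗(k + 1)), the ratio of successive negative binomial terms. For q = 2 the ratio c_2/c_1 is larger than in the single-cell case, in line with the expectation that paired immigration raises the counts at k = 2.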

Thus, in addition to appreciable sensitivity of the predicted clone counts to π _r (r|w), we also expect clustered immigration defined through the immigration rates α _q , q > 1 to control the goodness of fit to data. Indeed, Figure S2 suggests that the distribution of immigration cluster sizes q, in addition to the proliferation rate heterogeneity w, is an important determinant of measured clone counts and that α_q may be constrained by data. We leave this for future investigation.

General Conclusions

We developed a heterogeneous multispecies birth-death-immigration model and analyzed it in the context of T cell clonal heterogeneity; the clone abundance distribution is derived in the mean-field limit. Unlike previous studies (4), our modeling approach incorporated sampling statistics and provided simple formulae, allowing us to predict clone abundances under different rate distributions for arbitrarily large systems (N^∗ ∼ 10¹⁰ - 10¹¹), without the need for simulation. The properties of the BDI model and the overall shape of the sampled clone count data render the first few k-values of c_k^s or f_k^s the most important for determining the constraints among the model parameters. In other words, only the initial rate of the decrease in f_k^s(data) for small k governs the quality of fitting to the model, and one should not expect to be able to explicitly infer more than one or two free parameters.

Our heterogeneous BDI model produced mean sampled clone count distributions that we could directly compare with measured clone counts. The unsampled clone counts c_k of the neutral model (homogeneous α and r) follow a negative binomial distribution, which is further modified upon sampling and averaging over the heterogeneous immigration and proliferation rates. Although we determined π_α(α|ᾱ) through a code that implemented recombination statistics inferred from cDNA and gDNA sequences (20, 28), we found that the behavior of the model is rather insensitive to distributions π_α(α|ᾱ) with mean values ᾱ much smaller than the largest proliferation rates r. The model results are dominated by many low immigration-rate clones, and a model that replaces α with its mean value ᾱ is sufficient.

Conversely, we find that the shape of the clone count profile c_k is quite sensitive to the proliferation rate heterogeneity w. A small amount of heterogeneity quickly reduces the best-fit values of λ to reasonable values. For estimated values η ∼ 10⁻⁶–10⁻⁴, ᾱ ∼ 10⁻⁴, and small values of λ = N^∗/Q ≲ 10⁻³, fitting the data requires a best-fit width w ≈ 1. Heterogeneity is needed to generate clones of sufficiently large size that persist after sampling. Although the number of TCR clones with large proliferation rates r may be small, such clones proliferate more rapidly, contributing to higher clone counts at larger sizes. In particular, we found that the shape of the expected clone abundance is sensitive to the behavior of the proliferation rate distribution near the maximum dimensional proliferation rate R, π_r(r ≈ R). The predicted clone counts are also modestly sensitive to the distribution of immigration cluster sizes q (representing transient proliferation just before thymic output). When q > 1 cells of a clone are simultaneously exported by the thymus, the predicted mean clone counts decay much more slowly for small k ≲ q (see Figure S2). This modification will allow for better fitting since clustered immigration increases the predicted clone counts for larger k (c_2^s, c_3^s, etc.) and eventually f_2^s, f_3^s, etc. Thus, we expect that a model containing multiple clustered immigration rates α_q, q ≥ 1, will lower the error and provide better fitting, particularly at larger w. Additional analysis using a distribution of immigration cluster sizes may allow this type of clone count data to reveal more information about the physiological mechanism of naive T cell maintenance.
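The role of proliferation heterogeneity in producing large clones can be sketched numerically. The example below averages a negative binomial clone-size distribution over a uniform distribution of proliferation rates and compares it with a homogeneous population that has the same mean clone size. Several choices are assumptions made for illustration: the uniform distribution is taken over [R(1 − w), R], rates are expressed in units of a fixed death rate (so p = r/μ^∗ and the regulated steady state is not solved), and the function names and parameter values are hypothetical.

```python
import math

def nb_pmf(k, a, p):
    """Negative binomial probability of clone size k (shape a, ratio p)."""
    log_pmf = (a * math.log1p(-p) + math.lgamma(k + a)
               - math.lgamma(a) - math.lgamma(k + 1) + k * math.log(p))
    return math.exp(log_pmf)

def uniform_grid(p_max, w, n_grid=2000):
    """Midpoints of a uniform grid on [p_max*(1-w), p_max] (assumed form of pi_r)."""
    lo = p_max * (1.0 - w)
    step = (p_max - lo) / n_grid
    return [lo + (i + 0.5) * step for i in range(n_grid)]

def mixed_pmf(k, alpha, p_max, w):
    """Clone-size probability averaged over heterogeneous proliferation rates.
    alpha is the per-clone immigration rate in units of the death rate."""
    grid = uniform_grid(p_max, w)
    return sum(nb_pmf(k, alpha / p, p) for p in grid) / len(grid)

def mean_clone_size(alpha, p_max, w):
    """Mean cells per clone, using the negative binomial mean alpha / (1 - p)."""
    grid = uniform_grid(p_max, w)
    return sum(alpha / (1.0 - p) for p in grid) / len(grid)

# Illustrative (hypothetical) values.
alpha, p_max, w = 1e-4, 0.99, 1.0

# Homogeneous comparison: a single rate chosen to give the same mean clone size.
lam = mean_clone_size(alpha, p_max, w)
p_hom = 1.0 - alpha / lam

for k in (1, 10, 50):
    print(k, mixed_pmf(k, alpha, p_max, w), nb_pmf(k, alpha / p_hom, p_hom))
```

At equal mean clone size, the heterogeneous population places far more probability on large k than the homogeneous one, because the few clones with r close to R dominate the tail. This is only a qualitative illustration of the sensitivity to π_r(r ≈ R) described above.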

Even assuming modest heterogeneity, our work leads to the conclusion that typical immigration heterogeneity is not enough to influence measured clone counts and that varying levels of proliferation heterogeneity are needed to shape c_k^s (and f_k^s) (12). These results are consistent with the finding that naive T cells in humans are maintained by proliferation rather than thymic output (9). We have only investigated the effects of a uniform distribution for π_r(r|w); more complex shapes of π(α, r|θ₀) can be easily explored numerically using our modeling framework. Different parameter values and rate distributions appropriate for mice, in which naive T cells are maintained by thymic output, should also be explored within our modeling framework. Finally, it will be important to extend our steady-state model to allow α(t), π_r(r,t), and μ^∗(t) to be functions of time in order to predict clone abundances in the presence of thymic involution and reduced proliferation with age (62, 63), which can even arise differentially in different compartments (64).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.frontiersin.org/articles/10.3389/fimmu.2017.01267/full.

Author Contributions

RD, TC, and MRD developed and analyzed the model and wrote the manuscript. YP organized published data, and DM assisted in sorting and organizing generated data. TC, YP, and MX performed numerical analyses and data fitting. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the NIH through grant R01HL146552 (TC), the Army Research Office through grant W911NF-18-1-0345 (MRD), and the NSF through grants DMS-1814364 (TC) and DMS-1814090 (MRD). The authors also thank the Collaboratory in the Institute for Quantitative and Computational Biosciences at UCLA for support to RD.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2021.735135/full#supplementary-material

References

1. Qi, Liu, Cheng, Glanville, Zhang, Lee. Diversity and Clonal Selection in the Human T-Cell Repertoire. Proc Natl Acad Sci USA (2014) 111:13139–44. doi: 10.1073/pnas.1409155111
2. van den Broek, Borghans JAM, van Wijk. The Full Spectrum of Human Naive T Cells. Nat Rev Immunol (2018) 18:363–73. doi: 10.1038/s41577-018-0001-y
3. Laydon, Bangham CRM, Asquith. Estimating T-Cell Repertoire Diversity: Limitations of Classical Estimators and a New Approach. Phil Trans R Soc B (2015) 370:20140291. doi: 10.1098/rstb.2014.0291
4. Desponds, Mora, Walczak. Fluctuating Fitness Shapes the Clone-Size Distribution of Immune Repertoires. Proc Natl Acad Sci USA (2016) 113:274–9. doi: 10.1073/pnas.1512977112
5. Desponds, Mayer, Mora, Walczak. Population Dynamics of Immune Repertoires. In: Molina-París, Lythe, editors. Mathematical, Computational and Experimental T Cell Immunology. Cham, Switzerland: Springer (2021). p. 67–79.
6. Lythe, Callard, Hoare, Molina-París. How Many TCR Clonotypes Does a Body Maintain? J Theor Biol (2016) 389:214–24. doi: 10.1016/j.jtbi.2015.10.016
7. de Greef, Oakes, Gerritsen, Ismail, Heather, Hermsen. The Naive T-Cell Receptor Repertoire Has an Extremely Broad Distribution of Clone Sizes. eLife (2020) 9:e49900. doi: 10.7554/eLife.49900
8. Koch, Starenki, Cooper, Myers. powerTCR: A Model-Based Approach to Comparative Analysis of the Clone Size Distribution of the T Cell Receptor Repertoire. PLoS Comput Biol (2018) 14:e1006571. doi: 10.1371/journal.pcbi.1006571
9. den Braber, Mugwagwa, Vrisekoop, Westera, Mögling, de Boer. Maintenance of Peripheral Naive T Cells Is Sustained by Thymus Output in Mice But Not Humans. Immunity (2012) 36:288–97. doi: 10.1016/j.immuni.2012.02.006
10. Dessalles, D’Orsogna, Chou. Exact Steady-State Distributions of Multispecies Birth-Death-Immigration Processes: Effects of Mutations and Carrying Capacity on Diversity. J Stat Phys (2018) 173:182–221. doi: 10.1007/s10955-018-2128-4
11. Mora, Walczak, Bialek, Callan. Maximum Entropy Models for Antibody Diversity. Proc Natl Acad Sci USA (2010) 107:5405–10. doi: 10.1073/pnas.1001705107
12. Oakes, Heather, Best, Byng-Maddick, Husovsky, Ismail. Quantitative Characterization of the T Cell Receptor Repertoire of Naïve and Memory Subsets Using an Integrated Experimental and Computational Pipeline Which Is Robust, Economical, and Versatile. Front Immunol (2017) 8:1267. doi: 10.3389/fimmu.2017.01267
13. Aguilera-Sandoval, OYang, Jojic, Lovato, Chen, Boechat. Supranormal Thymic Output Up to Two Decades After HIV-1 Infection. AIDS (Lond Engl) (2016) 30:701–11. doi: 10.1097/QAD.0000000000001010
14. Gerritsen, Pandit, Andeweg, de Boer. RTCR: A Pipeline for Complete and Accurate Recovery of T Cell Repertoires From High Throughput Sequencing Data. Bioinformatics (2016) 32:3098–106. doi: 10.1093/bioinformatics/btw339
15. Naumov, Naumova, Hogan, Selin, Gorski. A Fractal Clonotype Distribution in the CD8+ Memory T Cell Repertoire Could Optimize Potential for Immune Responses. J Immunol (2003) 170:3994–4001. doi: 10.4049/jimmunol.170.8.3994
16. Meier, Roberts, Avent, Archer, Manjili, Toor. Fractal Organization of the Human T Cell Repertoire in Health and After Stem Cell Transplantation. Biol Blood Marrow Transplant (2013) 19:366–77. doi: 10.1016/j.bbmt.2012.12.004
17. Gaimann, Nguyen, Desponds, Mayer. Early Life Imprints the Hierarchy of T Cell Clone Sizes. eLife (2020) 9:e61639. doi: 10.7554/eLife.61639
18. Burgos, Moreno-Tovar. Zipf-Scaling Behavior in the Immune System. Biosystems (1996) 39:227–32. doi: 10.1016/0303-2647(96)01618-8
19. Weinstein, Jiang, White, Fisher, Quake. High-Throughput Sequencing of the Zebrafish Antibody Repertoire. Science (2009) 324:807–10. doi: 10.1126/science.1170020
20. Marcou, Mora, Walczak. High-Throughput Immune Repertoire Analysis With IGoR. Nat Commun (2018) 9:561. doi: 10.1038/s41467-018-02832-w
21. Tan, Dudl, LeRoy, Murray, Sprent, Weinberg. IL-7 Is Critical for Homeostatic Proliferation and Survival of Naïve T Cells. Proc Natl Acad Sci USA (2001) 98:8732–7. doi: 10.1073/pnas.161126098
22. Schluns, Kieper, Jameson, Lefrançois. Interleukin-7 Mediates the Homeostasis of Naïve and Memory CD8 T Cells In Vivo. Nat Immunol (2000) 1:426–32. doi: 10.1038/80868
23. Ciupe, Devlin, Markert, Kepler. The Dynamics of T-Cell Receptor Repertoire Diversity Following Thymus Transplantation for DiGeorge Anomaly. PLoS Comput Biol (2009) 5:e1000396. doi: 10.1371/journal.pcbi.1000396
24. Reynolds, Coles, Lythe, Molina-París. Mathematical Model of Naive T Cell Division and Survival IL-7 Thresholds. Front Immunol (2013) 4:434. doi: 10.3389/fimmu.2013.00434
25. Silva, Sousa. Establishment and Maintenance of the Human Naïve CD4+ T-Cell Compartment. Front Pediatr (2016) 4:119. doi: 10.3389/fped.2016.00119
26. Surh, Sprent. Homeostasis of Naive and Memory T Cells. Immunity (2008) 29:848–62. doi: 10.1016/j.immuni.2008.11.002
27. Farber, Yudanin, Restifo. Human Memory T Cells: Generation, Compartmentalization and Homeostasis. Nat Rev Immunol (2014) 14:24–35. doi: 10.1038/nri3567
28. Sethna, Elhanati, Callan, Walczak, Mora. OLGA: Fast Computation of Generation Probabilities of B- and T-Cell Receptor Amino Acid Sequences and Motifs. Bioinformatics (2019) 35:2974–81. doi: 10.1093/bioinformatics/btz035
29. Westera, van Hoeven, Drylewicz, Spierenburg, van Velzen, de Boer. Lymphocyte Maintenance During Healthy Aging Requires No Substantial Alterations in Cellular Turnover. Aging Cell (2015) 14:219–27. doi: 10.1111/acel.12311
30. Robins, Campregher, Srivastava, Wacher, Turtle, Kahsai. Comprehensive Assessment of T-Cell Receptor β-Chain Diversity in αβ T Cells. Blood (2009) 114:4099–107. doi: 10.1182/blood-2009-04-217604
31. Mayer, Zhang, Perelson, Wingreen. Regulation of T Cell Expansion by Antigen Presentation Dynamics. Proc Natl Acad Sci USA (2019) 116:5914–9. doi: 10.1073/pnas.1812800116
32. Arstila, Casrouge, Baron, Even, Kanellopoulos, Kourilsky. A Direct Estimate of the Human αβ T Cell Receptor Diversity. Science (1999) 286:958–61. doi: 10.1126/science.286.5441.958
33. Warren, Freeman, Zeng, Choe, Munro, Moore. Exhaustive T-Cell Repertoire Sequencing of Human Peripheral Blood Samples Reveals Signatures of Antigen Selection and a Directly Measured Repertoire Size of at Least 1 Million Clonotypes. Genome Res (2011) 21:790–7. doi: 10.1101/gr.115428.110
34. Zarnitsyna, Evavold, Schoettle, Blattman, Antia. Estimating the Diversity, Completeness, and Cross-Reactivity of the T Cell Repertoire. Front Immunol (2013) 4:485. doi: 10.3389/fimmu.2013.00485
35. Jenkins, Chu, McLachlan, Moon. On the Composition of the Preimmune Repertoire of T Cells Specific for Peptide–Major Histocompatibility Complex Ligands. Annu Rev Immunol (2010) 28:275–94. doi: 10.1146/annurev-immunol-030409-101253
36. Mora, Walczak. How Many Different Clonotypes do Immune Repertoires Contain? Curr Opin Syst Biol (2019) 18:104–10. doi: 10.1016/j.coisb.2019.10.001
37. Murphy, Weaver. Janeway’s Immunobiology. New York, NY: Garland Science/Taylor & Francis (2016).
38. Goyal, Kim, Chen ISY, Chou. Mechanisms of Blood Homeostasis: Lineage Tracking and a Neutral Model of Cell Populations in Rhesus Macaques. BMC Biol (2015) 13:85. doi: 10.1186/s12915-015-0191-8
39. Xu, Chou. Immigration-Induced Phase Transition in a Regulated Multispecies Birth-Death Process. J Phys A: Math Theor (2018) 51:425602. doi: 10.1088/1751-8121/aadcb4
40. Wiuf, Stumpf MPH. Binomial Subsampling. Proc R Soc A (2006) 462:1181–95. doi: 10.1098/rspa.2005.1622
41. Xu, Kim, Chen ISY, Chou. Modeling Large Fluctuations of Thousands of Clones During Hematopoiesis: The Role of Stem Cell Self-Renewal and Bursty Progenitor Dynamics in Rhesus Macaque. PLoS Comput Biol (2018) 14:e1006489. doi: 10.1371/journal.pcbi.1006489
42. Levina, Priesemann. Subsampling Scaling. Nat Commun (2017) 8:15140. doi: 10.1038/ncomms15140
43. Ferrarini, Molina-París, Lythe. Sampling From T Cell Receptor Repertoires. In: Graw, Matthäus, editors. Modeling Cellular Systems. Cham, Switzerland: Springer International Publishing (2017). p. 67–79.
44. Lythe, Molina-París. Some Deterministic and Stochastic Mathematical Models of Naive T-Cell Homeostasis. Immunol Rev (2018) 285:206–17. doi: 10.1111/imr.12696
45. Min, Foucras, Meier-Schellersheim, Paul. Spontaneous Proliferation, a Response of Naïve CD4 T Cells Determined by the Diversity of the Memory Cell Repertoire. Proc Natl Acad Sci USA (2004) 101:3874–9. doi: 10.1073/pnas.0400606101
46. Davis, Bjorkman. T-Cell Antigen Receptor Genes and T-Cell Recognition. Nature (1988) 334:395. doi: 10.1038/334395a0
47. Clark, de Boer, Wolthers, Miedema. T Cell Dynamics in HIV-1 Infection. Adv Immunol (1999) 73:301–27. doi: 10.1016/S0065-2776(08)60789-0
48. Ye, Kirschner. Reevaluation of T Cell Receptor Excision Circles as a Measure of Human Recent Thymic Emigrants. J Immunol (2002) 168:4968–79. doi: 10.4049/jimmunol.168.10.4968
49. Hazenberg, Borghans JAM, de Boer, Miedema. Thymic Output: A Bad TREC Record. Nat Immunol (2003) 4:97–9. doi: 10.1038/ni0203-97
50. Bains, Thiébaut, Yates, Callard. Quantifying Thymic Export: Combining Models of Naive T Cell Proliferation and TCR Excision Circle Dynamics Gives an Explicit Measure of Thymic Output. J Immunol (2009) 183:4329–36. doi: 10.4049/jimmunol.0900743
51. Cunningham, Helm, Fink. Reinterpreting Recent Thymic Emigrant Function: Defective or Adaptive? Curr Opin Immunol (2018) 51:1–6. doi: 10.1016/j.coi.2017.12.006
52. Elhanati, Murugan, Callan Jr, Mora, Walczak. Quantifying Selection in Immune Receptor Repertoires. Proc Natl Acad Sci USA (2014) 111:9875–80. doi: 10.1073/pnas.1409572111
53. Hardin. The Competitive Exclusion Principle. Science (1960) 131:1292–7. doi: 10.1126/science.131.3409.1292
54. Hutchinson. The Paradox of the Plankton. Am Nat (1961) 95:137–45. doi: 10.1086/282171
55. Hogan, Gossel, Yates, Seddon. Temporal Fate Mapping Reveals Age-Structured Heterogeneity in Naive CD4 and CD8 T Lymphocyte Populations in Mice. Proc Natl Acad Sci USA (2015) 112:E6917–26. doi: 10.1073/pnas.1517246112
56. Rane, Hogan, Seddon, Yates. Age Is Not Just a Number: Naive T Cells Increase Their Ability to Persist in the Circulation Over Time. PLoS Biol (2018) 16:e2003949. doi: 10.1371/journal.pbio.2003949
57. Lewkiewicz, Chuang, Chou. A Mathematical Model of the Effects of Aging on Naive T Cell Populations and Diversity. Bull Math Biol (2019) 81:2783–817. doi: 10.1007/s11538-019-00630-z
58. Johnson, Yates, Goronzy, Antia. Peripheral Selection Rather Than Thymic Involution Explains Sudden Contraction in Naive CD4 T-Cell Diversity With Age. Proc Natl Acad Sci USA (2012) 109:21432–7. doi: 10.1073/pnas.1209283110
59. Britanova, Shugay, Merzlyak, Staroverov, Putintseva, Turchaninova. Dynamics of Individual T Cell Repertoires: From Cord Blood to Centenarians. J Immunol (2016) 196:5005–13. doi: 10.4049/jimmunol.1600005
60. Egorov, Kasatskaya, Zubov, Izraelson, Nakonechnaya, Staroverov. The Changing Landscape of Naive T Cell Receptor Repertoire With Human Aging. Front Immunol (2018) 9:1618. doi: 10.3389/fimmu.2018.01618
61. Lewkiewicz, Chuang, Chou. Dynamics of T Cell Receptor Distributions Following Acute Thymic Atrophy and Resumption. Math Biosci Eng (2020) 17:28–55. doi: 10.3934/mbe.2020002
62. Mold, Réu, Olin, Bernard, Michaëlsson, Rane. Cell Generation Dynamics Underlying Naive T-Cell Homeostasis in Adult Humans. PLoS Biol (2019) 17:e3000383. doi: 10.1371/journal.pbio.3000383
63. van den Broek, Delemarre, Janssen, Nievelstein, Broen, Tesselaar. Neonatal Thymectomy Reveals Differentiation and Plasticity Within Human Naive T Cells. J Clin Invest (2016) 126:1126–36. doi: 10.1172/JCI84997
64. Thome JJC, Grinshpun, Kumar, Kubota, Ohmura, Lerner. Long-Term Maintenance of Human Naive T Cells Through In Situ Homeostasis in Lymphoid Tissue Sites. Sci Immunol (2016) 1:eaah6506. doi: 10.1126/sciimmunol.aah6506