Introduction

Front. Syst. Biol.

Frontiers in Systems Biology

Front. Syst. Biol.

2674-0702

Frontiers Media S.A.

893366

10.3389/fsysb.2023.893366

Systems Biology

Original Research

Clonal abundance patterns in hematopoiesis: Mathematical modeling and parameter estimation

Pan et al.

10.3389/fsysb.2023.893366

Pan

Yunbei

¹ ² D’Orsogna

Maria R.

¹ ² Tang

Min

³ * Stiehl

Thomas

⁴ Chou

Tom

¹ ⁵ *

¹ Department of Computational Medicine, UCLA, Los Angeles, CA, United States ² Department of Mathematics, California State University at Northridge, Los Angeles, CA, United States ³ Institute of Natural Sciences, Shanghai Jiaotong University, Shanghai, China ⁴ Institute of Computational Biomedicine, RWTH Aachen University, Aachen, Germany ⁵ Department of Mathematics, UCLA, Los Angeles, CA, United States

Edited by: Jayajit Das, The Ohio State University, United States

Reviewed by: Steven M. Abel, The University of Tennessee, United States

Wenjie Sun, Institute Curie, France

*Correspondence: Maria R. D’Orsogna, dorsogna@csun.edu; Min Tang, tangmin@sjtu.edu.cn; Tom Chou, tomchou@ucla.edu

This article was submitted to Integrative Systems Immunology, a section of the journal Frontiers in Systems Biology

09 02 2023

2023

893366

10 03 2022 09 01 2023

2023

Pan, D’Orsogna, Tang, Stiehl and Chou

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Hematopoiesis has been studied via stem cell labeling using barcodes, viral integration sites (VISs), or in situ methods. Subsequent proliferation and differentiation preserve the tag identity, thus defining a clone of mature cells across multiple cell type or lineages. By tracking the population of clones, measured within samples taken at discrete time points, we infer physiological parameters associated with a hybrid stochastic-deterministic mathematical model of hematopoiesis. We analyze clone population data from Koelle et al. (Koelle et al., 2017) and compare the states of clones (mean and variance of their abundances) and the state-space density of clones with the corresponding quantities predicted from our model. Comparing our model to the tagged granulocyte populations, we find parameters (stem cell carrying capacity, stem cell differentiation rates, and the proliferative potential of progenitor cells, and sample sizes) that provide reasonable fits in three out of four animals. Even though some observed features cannot be quantitatively reproduced by our model, our analyses provides insight into how model parameters influence the underlying mechanisms in hematopoiesis. We discuss additional mechanisms not incorporated in our model.

stem cells hematopoiesis barcodes clonal tracking differentiation

R01HL146552

W911NF-18-1-0345

DMS-1814090

Office of Extramural Research, National Institutes of Health10.13039/100006955

Army Research Office10.13039/100000183

National Science Foundation10.13039/100000001

Introduction

Hematopoiesis, the process by which hematopoietic stem cells (HSCs) generate all mature blood cells in an animal through proliferation and differentiation plays a crucial role in an organism’s immune response and maintaining overall homeostasis. Estimates of the number of actively cycling HSC range from 50000–200000 in humans (Lee-Six et al., 2018) and approximately 5,000 in mice (Busch et al., 2015; Mayle et al., 2015). It is well-known that these small numbers of hematopoietic stem cells can generate 10¹⁰–10¹² cells of multiple cell types daily, over an organism’s lifetime (Fliedner, 2002; Doulatov et al., 2012). Understanding the mechanisms of hematopoiesis can help guide clinical treatment, especially those related to bone marrow transplantation and in the context of blood cancers (Mendelson and Frenette, 2014; Busch et al., 2015; Goyal et al., 2015).

HSCs are often quiescent (Seita and Weissman, 2010), making them hard to track in vivo and difficult to control in vitro. Thus, the HSC dynamics in vivo can only be straightforwardly interrogated through analysis of populations of more downstream progenitors and differentiated blood cells (Bystrykh et al., 2012). One way to quantitatively probe the hematopoiesis process is the labeling of multipotent HSCs by tagging their genomes. The tags can take the form of viral integration sites or barcodes (Grosselin et al., 2013; Kim et al., 2014; Wu et al., 2014; Biasco et al., 2016; Koelle et al., 2017). In a typical in vivo experiment, CD34⁺ stem cells are extracted, tagged, and then autologously transplanted back into the animal, typically a mouse or a rhesus macaque. CD34⁺ cells contain HSCs as well as early hematopoietic stem and progenitor cells (HSPCs), with a wide range of estimated relative proportions (Corso et al., 2005; Attar, 2014; Parmentier et al., 2020). The downstream progenitor and mature cells that derive from proliferation of HSCs of a particular tag will form a clone of cells that share the same tag. Clonal tracking of cell tags is thus a powerful tool for interrogating the differentiation process during hematopoiesis (Lyne et al., 2018; Challen and Goodell, 2020; Cordes et al., 2021). For example, the abundances of the different tags that appear in the different types of mature cells can shed light on the branching structure of differentiation and on proliferation dynamics, particularly when coupled with mathematical models and/or simulations (Stiehl and Marciniak-Czochra, 2011; Sun and Komarova, 2012; Székely et al., 2014; Höfer and Rodewald, 2016; Xu J. et al., 2018).

Clonal tracking in mice (Copley et al., 2012; Sun et al., 2014) has revealed the timescales of repopulation dynamics under homeostasis and after bone marrow transplantation (Muller-Sieburg et al., 2012; Verovskaya et al., 2013; Busch et al., 2015), but typically involves very few clones that cover only a small fraction of the HSC population. To transplant many HSC clones in order to see patterns of how clones are distributed during hematopoiesis requires experiments on animals larger than mice.

Transplant experiments in rhesus macaque on the other hand allow for hundreds or thousands of clones to be engrafted into an organism that exhibits population levels and timescales closer to those in humans. One experiment in rhesus macaque involved tracking HSC clonal dynamics of lentivirus-tagged HSCs and early progenitor cells (HSPCs), and following hematopoiesis over a time period comparable to the animal’s life-span (Kim et al., 2010, 2014). Here, CD34⁺ HSPCs from the bone marrow, which include various progenitor cells, were marked via the integration of a lentivirus vector with an accompanying green fluorescent protein (GFP) tag at random viral integration sites (VISs). After sublethal myeloablative irradiation to eliminate a substantial number of cells in the bone marrow, the tagged HSPCs were autologously transplanted. If these cells divide and differentiate after transplantation, their progeny will inherit the unique VISs. Sampling and sequencing of these mature cells indicates which ones are descendants of a founder HSC. Data collected from four macaques over 14 years were analyzed showing how bias towards the lymphoid or myeloid differentiation branches changes over time. More detailed analyses were also performed in order to connect clonal patterns during hematopoiesis with a mathematical model that describes how self-renewal, differentiation, and subsampling of a multiclone population affects clone abundances and their fluctuations across time (Goyal et al., 2015; Xu S. et al., 2018). By fitting a simple mechanistic model to abundances of hundreds to thousands of clones, random initial differentiation events that each led to a subsequent burst of mature cells was proposed as a mechanism to explain observed population fluctuations. The number of generations L that progenitor cells traverse along a differentiation pathway (lineage) before terminal differentiation was also estimated to be L ∼ 24 for the granulocyte lineage (Xu S. et al., 2018). To obtain this result, a mean-field model for HSC self-renewal was developed and applied to experimental data on granulocytes, using only the mean and variance of clone populations in the data fitting.

In this paper, we improve on the model used in (Xu S. et al., 2018) by developing a framework that can explain population transients and that can predict the density of the number of clones with respect to mean clone sizes. Instead of analyzing VIS data from (Kim et al., 2010, 2014), we consider the barcode data from (Wu et al., 2014; Koelle et al., 2017). In the latter experimental studies, replication-incompetent HIV-derived lentiviral barcoding vectors were used to tag HSCs that were transplanted into four rhesus macaques. The barcode consists of a six base-pair library identification and a 35 base pair high-diversity cellular barcode. As with the VIS experiments, barcoded cells were reinfused in the animals after myeloablative total-body irradiation. Purified samples of blood cells were then subject to low-cycle PCR amplification with the two primers bracketing the barcode. This barcoding approach provides more precise quantification relative to other clonal tracking protocols such as VIS (Kim et al., 2014) and transposon tagging (Sun et al., 2014) approaches. Thus, we will analyze the barcoding data via a mathematical model with the goal of more accurately estimating physiological parameters such as HSC carrying capacity, progenitor cell division rates, and progenitor cell proliferative potential for the granulocyte cell lineage. Although clonal structure of mature cells of different lineages, such as T, B, monocytes, and NK cells, were quantified in (Wu et al., 2014; Koelle et al., 2017), lymphocyte maturation is more complex, involving additional intermediate steps and subsequent immune signaling and mature cell proliferation. Thus, we focus on the simpler and abundant mature granulocyte population (Bystrykh et al., 2012).

In the following Materials and Methods section, we briefly describe the raw data and present the mathematical model. In the Data Analysis and Results section, we describe how measured clone data is compared to predicted clone abundances and show that minimization of the difference leads to reasonable estimates of parameter estimates. Finally, in the Discussion and Conclusions, we provide qualitative insight into how model parameters affect the predicted clonal patterns and discuss further improvements and potential new modeling directions.

Materials and methods

In this section, we describe information extracted from the granulocyte abundance data in (Koelle et al., 2017) and the mathematical model used to describe this data. The experimental parameters associated with the experiments are listed in Table 1, which lists the number of cells (tagged and untagged) transplanted, the barcode library size used, and the total number of different barcodes observed across all samples of all lineages for each animal. These values inform us on the typical magnitude of experimental parameters to which our subsequent model must conform. In Table 2, we list parameters used in our mathematical model as determined either from experimental data or through estimates.

TABLE 1

Transplant parameters. The initial transplant populations for the four animals ZH33, ZG66, ZH19, and ZJ31. The total library size for the cell preparation was in the range C _L = 53319 − 109085. The total number of cells injected was H = 2.3 × 10⁷ − 4.8 × 10⁷, of which H* = 8.0 × 10⁶ − 1.67 × 10⁷ were barcoded (corresponding to 23%–35% GFP+ labeling). Across all peripheral blood samples and cell lineages, the total number of barcodes detected in each animal was in the range C ̂ s = 21450 − 62354 , i.e., roughly half of injected HSC barcodes were detected in the peripheral blood samples. Among granulocytes, the total sampled richness (across all time points) ranged from 2660 − 32363.

Variable Animal	Library size C _L	Injected cells H	Injected GFP+ H*	Total C _s	C _s (grans)
ZH33	63469	3.2 × 10⁷	1.11 × 10⁷	25325	9221
ZG66	53613	4.8 × 10⁷	1.67 × 10⁷	21450	2660
ZH19	53319	4.8 × 10⁷	1.1 × 10⁷	31929	10964
ZJ31	109085	2.3 × 10⁷	8.0 × 10⁶	62354	32363

TABLE 2

Overview of variables and parameters. Parameters and variables and their estimated values if known. Some values need to be calculated from our model and are denoted “calc.,” while others need to be self-consistently estimated. For example, from GFP tagging, the fraction of tagged HSCs is approximately ∑ i = 1 C h h i ( 0 ) / ∑ i = 0 C h h i ( 0 ) ≈ 15 − 35 % but can slowly vary in time. Values relating to sampled cell populations are derived from animal ZH33 in the experiment of (Koelle et al., 2017). HSC proliferation and death rates have been estimated in (Shepherd et al., 2007) and (Catlin et al., 2011). Numbers specific to granulocytes are indicated as such.

Variables/Parameters	Definition	Value
t _j, j = 1, …, J	Sampling time points	∼month
s ̂ i ( t j ) , i = 1 , … , C s	No. of cells with tag i in sample drawn at t _j (data)	∼ 0 − 1 0 4
s ̂ i = 1 J ∑ i = 1 C s s ̂ i ( t j )	Mean no. of cells with tag i in sample (data)	∼ 0 − 1 0 4
S ̂ ( t j ) = ∑ i = 1 s ̂ i ( t j )	Total no. of tagged granulocytes in each sample at t _j (data)	∼ 2 × 1 0 6
C ̂ s ( t j > 2 months )	Total no. of clones (richness) in sample j	∼ 1000 (grans)
C ̂ s > 2	Total richness across all t > t ₂	2,335 (grans)
h ₀	Untagged HSCs in bone marrow (model)	unknown, ∼ 1 0 4
h _i, i = 1, …, C _h	HSCs with barcode i in BM (model)	1–1,000
n i ( ℓ ) ( t )	No. of ℓ ^th-generation progenitor cells with tag/barcode i (model)	calc
m _i(t)	No. mature cells with tag i (model)	calc
s _i(t)	No. of cells with tag i in sample (model)	calc
K	HSC niche carrying capacity (model)	inferred, 10⁴–10⁵
C _h(t)	Total no. of engrafted clones (model)	unknown, ∼ 1 0 4
C _s(t)	Total no. of clones sampled at t (model)	simulated
r _h(0)	Intrinsic HSC self-renewal rate	≫ 0.01/day
μ _h	HSC death rate	< 0.01 /day
α	HSC differentiation rate	10^–3 − 0.02/day, inferred
r _n	Progenitor cell division rate	1–5/day
μ _n	Progenitor cell death rate	unknown, ∼ 0 /day
ω	Progenitor cell terminal differentiation rate	unknown, < r n
L	Proliferative potential of progenitor cells	inferred, L* = 22 (grans)
μ _m	Mature cell death rate	0.185/day (grans)
η	Average sample fraction	∼ 1 0 − 5 − 1 0 − 4
η(t _j)	Fraction of sample j	∼ 1 0 − 5 − 1 0 − 4

Measured quantities

First, we consider the observed data associated with each animal (Koelle et al., 2017), as shown in Figure 1. Granulocytes in blood samples drawn from each animal at times points t _j, j = 1, 2, …, J are sequenced and clonal (barcode) abundances tabulated. The total abundance (number of mature cells of a given cell type), S ̂ ( t j ) , and the richness C ̂ s ( t j ) (the total number of different barcodes detected in each sample) are also recorded and plotted in Figures 1A, B. In this study, S ̂ ( t j ) denotes the total measured number of granulocytes (barcoded and unbarcoded) in a sample taken at time t _j. The fluctuations of S ̂ ( t j ) and C ̂ s ( t j ) across t _j may arise from varying sampled sizes across time points and/or fluctuations in the state of the animal.

FIGURE 1

After transplantation, peripheral blood samples were taken across J time points t _j, j = 1, …, (J). Typically, measurements were taken over 20–49 months and J = 10 − 15. (A) The total population S ̂ ( t j ) of granulocytes sampled from animal ZH33 (Koelle et al., 2017) at times t _j = (1, 2, 3, 4.5, 6.5, 9.5, 12, 14, 21, 28, 30, 38, 43, 46, 49) months. (B) The total richness in each sample, C ̂ s ( t j ) . The richness at the first two time samples are large (as we shall see, due to transplantation of barcoded progenitor cells). After the first two time points, where the richness will arise from the richness of the transplanted HSCs, the typical richness at each time point C ̂ s ( t j ≳ 2 ) ≈ 1000 , while the richness across all J − 2 time points (for t _j > 2 months) is C ̂ s > 2 = 2335 . Across all J time points, 9,221 unique granulocyte clones were detected (out of a total of 25325 across all cell types). The individual clone abundances in the sampled granulocyte population are shown in (C) where the abundances of clone i in a sample taken at time t _j are denoted by s ̂ i ( t j ) . The mean and standard deviation σ ̂ i of the abundances of all clones across all sampling times are calculated using Eq. 1 and scatter-plotted in (D). Each point represents one of the 2,335 detected granulocyte clones.

An example of the abundances of each clone within the granulocyte population from Koelle et al. (Koelle et al., 2017) is shown in Figure 1C. In these experiments, tagged stem cells are transplanted back into a rhesus macaque at t = 0 so that initially each clone consists of a single cell. A series of 1 ≤ j ≤ J samples are taken at time t _j after implementation, yielding a set of mature cells. We denote the abundance of clone i (among granulocyte cells) in the sample taken at time t _j after transplantation as s ̂ i ( t j ) . The J measurements allow each clone of a particular mature cell type i to be characterized by a mean s ̂ i and variance σ ̂ i 2 defined by s ̂ i = 1 J ∑ j = 1 J s ̂ i t j σ ̂ i 2 = 1 J ∑ j = 1 J s ̂ i t j − s ̂ i 2 . (1)

Note that the total measured population of any cell type S ̂ ( t j ) = ∑ i = 1 C s s ̂ i ( t j ) . A scatter plot of σ ̂ i versus s ̂ i for all clones detected in a sample of granulocytes is shown in Figure 1D.

For each clone at ( s ̂ i , σ ̂ i ) we can evaluate the local density ρ ̂ , the number of clones within some size window. This density can be viewed as the concentration of data points shown in Figure 1D as a function of mean clone size s ̂ , and will be constructed using kernel density estimation (Rosenblatt, 1956; Parzen, 1962) of the data points in ( s ̂ , σ ̂ ) space. The unknown density function ρ ̂ is obtained by concatenating isotropic Gaussian kernel functions about each point and using an optimal, common bandwidth parameter, typically chosen as the value that minimizes the mean integrated squared error, or Kernel Density Estimation (KDE) (Silverman, 1986). The reconstructed density function can be thought of as a probability that a random clone arises in the volume (s, s + ds) × (σ, σ + dσ). For each clone at ( s ̂ i , σ ̂ i ) we can evaluate the local density ρ ̂ .

In the remainder of this paper we will develop a mathematical model that we can simulate to generate total populations, clonal populations, and their associated attributes (s _i, σ _i, ρ _i). Note that while there are many existing mathematical models of hematopoiesis (Colijn and Mackey, 2005; Peixoto et al., 2011; De Souza and Humphries, 2019), they describe only time variations in total populations, rather than that of lower-population, individual clones. We will tune parameters of the model so that its predictions provide a reasonable match to the aforementioned measurements, paying particular attention to clone abundances and clone size variability.

Mathematical modeling

Our mathematical model incorporates known and accepted features of hematopoiesis. Three main cell compartments are considered: hematopoietic stem cells (HSCs), transit amplifying progenitor cells, and peripheral mature cells. Although the stem cell population in bone marrow is large and can be described using a deterministic model, the initial populations within each clone are small and require a discrete stochastic description. We will then assume that a small sample of mature cells is drawn from the animals at times t _j, j = 1, 2, …, J and sequenced.

We first describe the initial conditions including the number of HSCs and HSPCs injected into each animal. As listed in Table 1, H is the total number of HSCs injected into each animal, among which H ₀ are untagged and H _i ≪ H ₀ contain barcode 1 ≤ i ≤ C _H (and are GFP+). The total tagged HSC population is H * ≡ ∑ i = 1 C H H i so that H = H ₀ + H*. The richness C _H ≲ C _L is the number of barcodes transferred into the animal, which is comparable to the richness of the barcode library C _L used in each experiment. Since H ₀ ∼ 10⁷ ≫ H _i, we will consider the probability distribution of only the tagged populations, which is described by the multinomial P H = H * ! ∏ i = 1 C H 1 C H H i 1 H i ! , (2) where H ≡ ( H 1 , H 2 , … , H C H ) and H* is the total number of GFP+ (barcoded) cells. Specifically, for animal ZH33 studied in (Koelle et al., 2017)) H ≈ 3 × 10⁷, H*/H ≈ 0.35, ∑ i = 1 C H H i ≈ 1.1 × 1 0 7 , C _H ≲ C _L ≈ 6 × 10⁵. Thus, the typical H _i ≈ H*/C _H ∼ 180.

A certain fraction η ₀ of the H HSCs home into the bone marrow, successfully engraft, and subsequently actively self-renewal and/or differentiate. Engrafted HSC populations are defined by h ( 0 ) = ( h 1 ( 0 ) , h 2 ( 0 ) , … , h C h ( 0 ) ( 0 ) ) , where the richness of engrafted HSCs in the bone marrow is C _h ≲ C _H. Transplantation efficiencies are typically single-digit percentages (Abbuehl et al., 2017; Radtke et al., 2020) and transplanted CD34⁺ cells contain significant numbers of progenitor cells. Thus, the fraction η ₀ ≪ 1; if η ₀ is sufficiently small (approximately ≲ 1/180), then we can safely assume that the initial clone populations in bone marrow are represented by very few cells. For simplicity, we approximate h _i(0) ≈ 1. Even if η ₀≰1/180, most barcodes will be represented by very few cells. We have verified that an initial condition in our model that allows for, say, some h _i(0) = 2, 3 does not qualitatively affect the mature cell populations.

The random selection of cells into the bone marrow can be thought of as a sampling (without replacement) process. Including the untagged population, the probability distribution of engrafted cells resulting from the injected tagged cell population H = ( H 1 , H 2 , … H C H ) is given by P h 0 | H = 1 H * h * 0 ∏ i = 1 C H H i h i 0 , (3) where H* and h * ( 0 ) = ∑ i = 1 C H h i ( 0 ) are the total initial numbers of barcoded injected cells and engrafted barcoded HSCs, summed over all clones. Note that the number of untagged transplanted cells h ₀(0) ≫ 1 is large so that we can approximate it by its deterministic value h ₀(0) ≈ η ₀ H ₀.

To extract the overall probability of initial condition h(0), we average Eq. 3 over the prior P ( H 1 , … , H C H ) and find P h 0 = ∑ H P h 0 | H P H = h * 0 ! ∏ i = 1 C H 1 C H h i 0 1 h i 0 ! , h * 0 ≡ ∑ i = 1 C H h i 0 . (4)

Besides the initial condition h _i(t = 0) = 1, the initial number of untagged HSCs h ₀(t = 0) is related to the transplantation efficiency and is generally unknown. Barcodes are associated with a GFP tag and the initial fraction of sampled cells that are GFP+ is ∼ 35 % . Since we assume a neutral model, it is reasonable to assume that the fraction of injected tagged cells H*/H is equivalent to the fraction of tagged cells in the engrafted population ∑ i = 1 C h ( 0 ) h i ( 0 ) / ∑ i = 0 C h ( 0 ) h i ( 0 ) ≈ 0.35 (although this ratio slowly decreases via extinction). The precise richness of HSC population in stem cell niche, C _h(t) < C _H is also unknown, but except for fluctuations, has a lower bound of C ̂ s , the total number of unique clones detected across all samples across all cell types. Thus, we take h ( 0 ) = ∑ i = 0 C h ( 0 ) h i ( 0 ) = h 0 ( 0 ) + C h ( 0 ) ≈ C h ( 0 ) / 0.35 .

Self-renewal, death, and differentiation into progenitor cells all contribute to the stochastic dynamics of h _i. Although the total HSC population in the niche h ( t ) ≈ ∑ i = 0 C h ( t ) h i ( t ) is large and can be approximated deterministically, the HSC population of each clone h _i(t) may be small and must be treated stochastically. Under our neutral assumption, the intrinsic self-renewal rate r _h of HSCs does not depend on the barcode identity i. Since HSCs reside in niches in the bone marrow that place limits on growth, we assume the HSC proliferation rate follows a linearly decreasing form defined by a carrying capacity and the engrafted HSC population h(t) r h h t = r h 0 1 − h t K , h t ≡ ∑ i = 0 C h h i t , (5) where r _h(0) is the intrinsic proliferation rate of a single, isolated HSC. Note that the untagged HSCs are included through h ₀. Finally, we assume that HSCs die at rate μ _h and differentiate at rate α and that these rates, like the growth rate in Eq. 5, do not depend on barcode identity.

As shown in the Supplementary Material, the richness C _h(t) may progressively decrease from random HSC death and extinction and can be estimated by solving for the stochastic birth-death process (neglecting outflux from differentiation) and using generating functions to find E C h t ≈ C h 0 ψ t + ϕ t , (6) where ψ t ≡ e − ∫ 0 t r h t ′ − μ h d t ′ , ϕ t ≡ ∫ 0 t r h t ′ ψ t ′ d t ′ . (7)

In this expression, r _h(t) is approximated by r h ( h ̄ 0 ( t ) + h ̄ * ( t ) ) , where h ̄ 0 + h ̄ * ( t ) is given by the explicit solution to the deterministic birth-death process with carrying capacity K. Thus, given C _h(0), r _h(0), K, and μ _h determine how the expected richness decreases. Henceforth we treat C _h(t) in our model as the expected value E [ C h ( t ) ] derived from our stochastic birth-death model, i.e., we use C _h(0)/(ψ(t) + ϕ(t)) as the model for C _h(t).

We will also simulate the stochastic birth-death process for HSCs (see Supplementary Material for details), with the differentiation rate α that allows HSCs to differentiate into the progenitor/transit amplifying cell compartment (see Figure 2). Progenitor cells are further distinguished by their generation ℓ. Thus, ℓ not only measures generation number but also an effective differentiation state. Each HSC differentiation event leads to an ℓ = 0 progenitor cell. Since the number of HSCs of any one clone is small, the initial differentiation events from clone i follow a Poisson process with rate αh _i. The populations of the subsequent generations of progenitor cells in clone i are denoted by n i ( ℓ ) . Once the ℓ = 0 generation cells are generated, the number of progenitor cells quickly expand, so their dynamics will be described by a deterministic model as developed in (Xu et al., 2018) d n i ℓ t d t = Poisson α h i t − r n 0 + μ n 0 n i 0 t ℓ = 0 , 2 r n ℓ − 1 n i ℓ − 1 t − r n ℓ + μ n ℓ n i ℓ t 1 ≤ ℓ ≤ L − 1 , 2 r n L − 1 n i L − 1 t − ω + μ n L n i L t ℓ = L , (8) where Poisson(αh _i(t)) is the time-inhomogeneous point Poisson process describing HSC differentiation events. In other words, after a differentiation event at time t ₁, the probability density of the time Δt to the next differentiation event is given by α h i ( t 1 + Δ t ) exp − α ∫ 0 Δ t h i ( t 1 + s ) d s . We will use the values of h _i(t) from our stochastic simulations and sample from this inter-event time density to simulate realizations of differentiation events.

FIGURE 2

Schematic of the hybrid stochastic-deterministic model. Tagged (barcoded) stem cells are transplanted into an animal initially with one cell (h _i(t = 0) ≈ 1) per clone. These cells, together with the untagged ones (h ₀(t = 0) ≫ 1) then undergo self-renewal and death, at rates r h h ( t ) = ∑ i = 0 h i ( t ) and μ _h, respectively, in the bone marrow. HSCs in all clones are also assumed to undergo asymmetric differentiation with rate α, forming a zeroth-generation progenitor cells. The population of ℓ ^th-generation (or stage) progenitor cells, denoted n i ( ℓ ) , further symmetrically differentiate with each division, up to a maximum of ℓ = L generations. The final-generation cell in clone i with population n i ( L ) can then undergo terminal differentiation at rate ω to form mature, circulating peripheral blood cells. Mature cells at population m _i are then randomly sampled (with sampling fraction η and generating a sample population s _i) and sequenced. We wish to infer some of the parameters of the model by comparing the predicted means, standard deviations, and clone number densities with those from data (Figure 1). Lineage differentiation is schematically shown as a splitting of the grey clone between generations ℓ = 1 and ℓ = 2, where a new cell type (squares) branches off. The division and death rates of progenitor cells in this new lineage, r n ′ and μ n ′ , may be different, as may the maximum number of generations L′. The mature cells turn over with rate a μ _m that may depend on lineage (but not clone identity within each lineage). In this paper, we assume that the lineages diverge at the zeroth-generation progenitor cell and analyze the model after the first differentiation step (rate α) independently for different cell types (in this paper, granulocytes).

In Eq. 8, r n ( ℓ ) and μ n ( ℓ ) represent the proliferation and death rates of generation-ℓ progenitor cells, respectively, and ω is the terminal differentiation rate into mature blood. Each division can also be thought of symmetric differentiation producing successively more differentiated progenitor cells. To model the finite proliferative potential of progenitor cells, we set the maximum number of generations to ℓ = L, after which the L ^th generation cell can only terminally differentiate to mature blood.

Consider a single isolated differentiation event of an HSC at t = 0 belonging to a particular clone. The resulting progenitor cell population after this event is described by Eq. 8 without the Poisson(αh _i(t)) term but with an initial condition corresponding to a single ℓ = 0 cell: n i ℓ 0 = 1 ℓ = 0 , 0 otherwise. (9)

The subsequent populations at time t form a temporal “burst” of cells that are described by n i ( ℓ ) ( t ) which is the solution of Eq. 8 without the Poisson(αh _i(t)) term but using the initial condition in Eq. 9. If we assume that all progenitor generations carry the same division and death rates, r n ( ℓ ) = r n for 0 ≤ ℓ ≤ L − 1 and μ n ( ℓ ) = μ n for 0 ≤ ℓ ≤ L, we can find an analytical solution associated with a single isolated burst as n i L t = e − ω + μ n t L − 1 ! 2 r n r n − ω L ∫ 0 r n − ω t z L − 1 e − z d z . (10) We can evaluate all populations n i ( ℓ ) ( t ) for ℓ < L by solving Eq. 8 and using Eq. 10, as detailed in the Supplementary Material.

If we assume that mature cells do not appreciably proliferate ¹ , the mature cell population in clone i obeys d m i t d t = ω n i L t − μ m m i t , (11) where μ _m is the lineage-dependent turnover rate of mature cells. Using the solution to n i ( L ) ( t ) , we solve Eq. 11 to find (see Supplementary Material) m i t = ω ∫ 0 t n i L t ′ e − μ m t − t ′ d t ′ . (12)

The mature cell population burst (of a specific clone) arising from a single, isolated differentiation event is plotted in Figure 3A. Note that the expression for a mature cell burst given in Eq. 12 is derived from the specific initial condition Eq. 9; however, some low-ℓ progenitors are also initially transplanted (see below). Thus, m _i(t) will in general depend on the initial numbers of n ^(ℓ>0)(0).

FIGURE 3

(A) The population m _i(t) of mature cells resulting from a single HSC differentiation event as obtained from Eq. 12. (B) Multiple concatenated bursts from a low-population HSC clone showing well-separated intermittent pulses obtained via Eq. 13. (C) When the HSC population of a clone is large, the resulting mature cell population bursts merge together and exhibit lower relative variability.

Since Eq. 8 are linear, populations arising from a sequence of Poisson-distributed differentiation events can be constructed by adding those derived from single events occurring at times T _k. In this case, the resulting mature cell population at time t is given by m i t = ∑ k = 1 k max m i t − T k , T k max + 1 > t > T k max , (13) where m _i(t − T _k) is the solution given in Eq. 12 (for the specific initial condition in Eq. 9). Two different sequences of bursts are shown in Figures 3B, C. In Figure 3B, we consider a clone with few HSCs such that αh _i(t) ≪ μ _m. This limit gives rise to differentiation events that occur rarely over the lifespan of mature cells, as depicted Figure 3B. In Figure 3C, we plot a sequence of more frequent mature cell population bursts that arise for more frequent differentiation events αh _i(t) ≫ μ _m from HSCs that are in higher population clones.

Recall that the calculations involving Eqs 8–12 are performed for each clone i, resulting in a series of time-dependent expressions for m _i(t) as per Eq. 12. These single event responses are then summed according to Eq. 13 to arrive at the total, time-dependent population of mature cells of a specific type and carrying the same barcode. These predicted whole-organism populations depend on the model parameters C h ( 0 ) , α , K , r n , μ h , μ n , μ m , L , ω . Since growth of transit amplifying progenitor cells is fast, we will henceforth assume r _n ≫ μ _n ≈ 0. All other rates are given in units of per day.

Since the clone abundances are derived from sequencing cells in a small fraction η (for rhesus macaque, η ∼ 10^–5 − 10^–4) of the animal’s blood taken at times t _j, we also need expressions for mature cell populations within a sample. Given the small sample sizes η, low population clones in the mature cell pool can easily be missed. For a given population m _i in the whole animal, the probability that s _i cells are captured in the sample is given by (Chao and Lin, 2012; Xu et al., 2020) P s i | m i , S , M = 1 M S ∏ i = 0 C h m i s i ≈ ∏ i = 0 m i s i η s i 1 − η m i − s i , (14) where, for a given cell type (for example granulocytes), S(t) = ∑ _i=0 s _i(t) is the total number of sampled cells (including untagged ones), M(t) = ∑ _i=0 m _i(t) is the total number of circulating mature cells (including untagged ones), and the sampling fraction is η = S/M (which we first assume is the same at each t _j).

After computing m _i at time t _j using Eq. 12, we take the nearest integer value and use it for m _i in Eq. 14. We then draw a single value s _i(t _j) from the binomial distribution, assuming η is given. Finally, we simulate our model to generate trajectories of M(t) and then determine the tagged sampled fraction S t ≈ H * H η M t . (15)

Equations 5–15 represent a hybrid stochastic-deterministic model since the self-renewal process of a small number of HSCs in each clone and the final sampling step (Eq. 14) are modeled as discrete stochastic processes, while proliferation and differentiation of higher population progenitor and mature cells are treated deterministically via Eqs 8–12. The values s _i(t _j) obtained from the model in Eqs 8–14 are used to generate the predicted clone population mean and standard deviations, s _i and σ _i, according to Eq. 1. Both s _i, σ _i are then used to determine the density of points ρ _i. These three values for each clone are then compared to their corresponding values constructed from data s ̂ i ( t j ) , as we detail in the next section. These predictions, along with the predicted richness C _s(t) and total sampled granulocyte population S(t) provide the basis for comparing with measured data and parameter inference.

Data analysis and results Data presentation

The published experimental data in (Koelle et al., 2017) provide data for four rhesus macaques ZH33, ZG66, ZH19, and ZJ31 over a period of 49, 42, 36 and 20 months, respectively. Samples were taken after autologous transplantation with myeloablative conditioning at times t _j, 1 ≤ j ≤ J. Possible sampled cells are of five types: T cells, B cells, monocytes, granulocytes and NK cells. Barcoded myeloid (granulocytes and monocytes) and B-cells reach a noisy equilibrium after approximately 1 month, whereas for T-cells the time frame is longer, between 5 to 17 months. Furthermore, since granulocytes comprise a majority of white blood cells, we apply our mathematical and statistical model to the granulocyte lineage. At each sampling time t _j, the experimental data from (Koelle et al., 2017) reveals how many cells are sampled from each clone and what type each cell is. Across sampling time points, these sampled populations contain information on the overall abundance, how these abundances fluctuate in time, and the density of the number of clones detected as a function of abundance and abundance variability. Since ZH33 has the longest follow-up period, we use data from this macaque to compare experimental data with our mathematical predictions.

Matching model to data

Validation of our model will rely on matching predictions with available data in the form of C ̂ s ( t j ) , S ̂ ( t j ) , and ( s ̂ i , σ ̂ i , ρ ̂ ( s , σ ) ) . Since the model contains many parameters and the data is noisy and “sparse,” the model will likely overfit. Therefore, we carry out the parameter estimation by hand in stages, imposing limit on parameter values that are physiologically feasible.

First, we compare C _h(t) (Eq. 6) with the richness C ̂ s ( t j ) shown in Figure 1A to provide a constraint among μ _h, K, C _h(0), and r _h(0). We assume that the C _h(0) associated with granulocytes is slightly greater than the total richness across all samples after the first two, C h ( 0 ) ≳ C ̂ s > 2 . This is equivalent to assuming that granulocyte richness after about 2 months arises solely from barcoded HSCs. The cumulative post-two-month richnesses C ̂ s > 2 for animals ZH33, ZG66, ZH19, and ZJ31 are 2,335, 2007, 4,007, and 30732, respectively. Although the sample specific C ̂ s ( t j ) quickly decreases for t > t ₁, our model prediction for C _h(t) follows Eq. 6 and decays more slowly. By estimating C _h(0) and using Eq. 6, we generate the predicted C _s(t) and S(t) by simulating our full model and comparing them to C ̂ s ( t j ) and S ̂ ( t j ) . This allows us to further constrain the parameters C _h(0), r _h(0), K, and μ _h. Note that no analytic formula exists for C _s(t).

We first set K ≈ 100C _h(0) as the niche carrying capacity since smaller or larger values of K cannot provide the correct average clone sizes or approximately matching values of C _s. This comparison allowed us to obtain rough constraints and approximations to some parameters, particularly μ _h, r _h(0), and C _h(0). Discrete sets of values that are consistent with C ̂ s ( t j ) were selected and further pruned by using the remaining data.

The sampled richness C ̂ s ( t j ) in ZH33 exhibits a sharp decrease after the first sample time, without a corresponding collapse in the sampled abundances of granulocytes S ̂ ( t j ) . Our model explains this phenomenon by the initial condition; namely, the initially transplanted population of CD34⁺ cells contains some partially differentiated HSPCs. Progenitor cells of barcode i (described in our model by the populations n i ( ℓ ) ) are initially transplanted so that some n i ( ℓ ) ( t = 0 ) > 0 particularly for small ℓ (cells with a low degree of differentiation).

As shown in Figure 4, a fraction of the initial clones are HSPCs. Once these HSPC clones generate a burst of mature cells, they disappear from the animal without being renewed since there are no corresponding HSCs carrying the same barcode. Thus, the HSPC contribution to the overall sampled richness C ̂ s ( t j ) largely disappears after about 2 months. However, the total mature granulocyte population S ̂ ( t j ) does not suffer a decline since HSPCs lost due to terminal differentiation are replaced by HSC differentiation. The subsequently sampled mature cell richness then reflects the richness C _h(0) of the initially transplanted HSCs. We propose this partial HSPC transplantation as a mechanism for the observed rapid decrease in C ̂ s observed in some animals. The shape of ρ ̂ ( s ) (see Figure 7D) can inform our estimate of the initial progenitor population n ^(ℓ)(t = 0). Maxima in ρ ̂ ( s ) can be accounted for by offspring of initially transplanted progenitor cells of different stages ℓ, with n ^(ℓ)(0) generating smaller clones for larger ℓ (fewer remaining generations to expand).

FIGURE 4

Schematic of a mixed HSC/HSPC initial condition. The transplanted CD34⁺ cells contain HSCs and some progenitor cells (HSPCs), which are exhausted after about 2 months. The remaining richness arises mainly from that of the long term HSCs C _h(0) which then slowly decreases as certain HSC clones become extinct.

Next, we consider the small clones, predominantly arising at short times, and find their average value at the first sample taken at t ₁. These small clones also yield the highest density values ρ ̂ ( s ) , but mostly disappear at longer times. Therefore, we assume they predominantly arise from initial progenitor cells. We can then generate the prediction m(t = t ₁) from our model assuming an initial condition n ^ℓ(0) (and assuming no HSC contribution by setting α = 0). This approximation provides a constraint on the deterministic progenitor cell parameters r _n, L, ω, η. In these experiments, the typical η ∼ 10^–5, so we find that L = 22, and collect a set of feasible values for r _n, ω, and η that provide a good starting point for estimating the other parameters in the model. Note that r _n, L, ω, η can largely compensate each other at this level of comparison. In other words, sets of different ranges of values of one parameter will fit equally well provided some other parameters are also correspondingly adjusted.

Next, we least-squares minimize S ̂ ( t j ) − S ( t j ) , where the model prediction S(t) is given by Eq. 15. This further helps fix α. Once a set of parameters that allow for a reasonable match of C _s(t _j) and S(t _j) have been defined, as shown in Figures 5A, B, we then refine the fitting to the mean−standard deviation ( s ̂ i , σ ̂ i ) scatter plot shown in Figure 1D.

FIGURE 5

For animal ZH33, coarse fitting to the total population of tagged granulocytes S(t _j) (A), and the total number of clones at each time point C _s(t _j) (B). This initial rough matching was achieved using parameter values L = 22, r _n ≈ 2, ω ≈ 0.2, and sampling size η = 10^–5.

To compare our modeling results to the remaining experimental data ( s ̂ i , σ ̂ i , and ρ ̂ i ) and to fine-tune the estimates of the other parameters, the most natural intuition would be to compute the Euclidean distance (or any other relevant distance metric) between model predictions and experimental datasets and tune parameters so as to minimize this distance. However, since a number of clones go extinct in our model, the data size (for s ̂ i , σ ̂ i , and ρ ̂ i ) varies in time.

Thus, before comparing predicted clone size distributions to measured results, we first cluster the data according to the values of s ̂ i and σ ̂ i . Recall that ρ ̂ ( s , σ ) ≈ ρ ̂ ( s ) (since σ is highly correlated with s) is still determined from KDE using the raw, unclustered data ( s ̂ i , σ ̂ i ) . Clustering is performed using k-means to partition the data into multiple regions (in s ̂ - and σ ̂ -space) such that the Euclidean distance between a point and the center of its cluster is smaller than its distance to all other cluster centers. The goal is not to cluster the ( s ̂ i , σ ̂ i ) points according to any real feature, but to simply reduce the dimensionality of the problem and to control the number of effective data points before applying least squares comparisons. Although there are no obvious features in the ( s ̂ i , σ ̂ i ) data, k-means clustering of the distribution of points does yield an optimal number of clusters k* via the “elbow” method where the curvature of the sum of square errors (distortion score) is maximal (Yuan and Yang, 2019). After implementing k-means clustering using the Python yellowbrick package, we find that the optimal number of clusters is typically k* ≈ 50 ± 3 depending on the initial randomization and partitioning process. Subsequent results, however, are insensitive to the precise numbers of clusters used as long as k* ≈ 50.

Figure 6A compares ( s ̂ i , σ ̂ i ) from experiments and from our hybrid multicompartment model. Figure 6B shows the distortion score (blue) and the convergence time (green) as a function of the number of clusters. The optimal number of clusters k* = 48 arises at the elbow of the distortion score curve. Figure 6C shows the clustered data (for animal ZH33) against the clustered model predictions. The radius of each circle, w _k, k = 1, …, 53 denotes the fraction of data points (fraction of the total number of observed clones) assigned to cluster k and is thus a coarse-grained representation of the local data density.

FIGURE 6

A total of 48 clusters are identified via k-means clustering of the data from animal ZH33. (A) The unclustered ( s ̂ i , σ ̂ i ) data and model predictions for an arbitrary set of parameters. (B) The squared errors in k-means clustering or distortion score (blue) and convergence time (green) as a function of the number of clusters. The optimal value from the elbow signalling diminishing returns was found in this case to be k* = 48. (C) The location of the cluster centers plotted on the ( s ̂ , σ ̂ ) plane. The radius of each circle indicates the fraction of all clones w _k that are associated with cluster k and is set to 2000w _k.

Thus, we have clustered measured data according to P = { p 1 , … , p k * } ≡ { ( s ̂ 1 , σ ̂ 1 , w ̂ 1 ) , … , ( s ̂ k * , σ ̂ k * , w ̂ k * ) } , where ( s ̂ k , σ ̂ k ) denotes the center values of cluster k and w ̂ k is the area of the k ^th cluster. For a fixed set of all model parameters, we generate predictions Q = { q 1 , … , q g * } ≡ { ( s 1 , σ 1 , w 1 ) , … , ( s ℓ * , σ ℓ * , w ℓ * ) } (in this expression and in the following, ℓ, ℓ* denote matrix indices and not progenitor cell generations). The optimal number of clusters derived from our hybrid stochastic-deterministic model, ℓ* will in general be different from the number of clusters k* derived from data, but is typically also about ℓ* ≈ 50.

In order to compare the clustered data P to the model prediction Q, which are matrices with different numbers of rows, we use the Wasserstein metric, or Earth mover’s distance (EMD) (Villani, 2009; Kolouri et al., 2017) to define a distance between them. Let d _k,ℓ denote the distance between cluster p _k and cluster q _ℓ so that the matrix a d = {d _k,ℓ} catalogues all possible cluster-cluster distances. We aim to compute a flow map f = {f _k,ℓ} that yields the minimal distance between clusters p = {p _k} and clusters q = {q _ℓ} by finding min f ∑ k = 1 k * ∑ ℓ = 1 ℓ * f k , ℓ d k , ℓ (16)

Subject to the constraints f k , ℓ ≥ 0 , ∑ ℓ = 1 k * f k , ℓ ≤ w k , ∑ k = 1 ℓ * f k , ℓ ≤ w ℓ , (17)

For all 1 ≤ k ≤ k* and 1 ≤ ℓ ≤ ℓ* and ∑ k = 1 k * ∑ ℓ = 1 ℓ * f k , ℓ = ∑ k = 1 k * w k = ∑ ℓ = 1 ℓ * w ℓ . (18)

After finding the optimal flow f, we evaluate the EMD as E M D P , Q = ∑ k = 1 k * ∑ ℓ = 1 ℓ f k , ℓ d k , ℓ ∑ k ′ = 1 k * ∑ ℓ ′ = 1 ℓ * f k ′ , ℓ ′ . (19)

Model parameters are varied until our model-derived predictions best match the clustered data by minimizing the EMD. In this paper, we consider only the granulocyte lineage since it is the most abundant and reliably measured with minimal complex dynamics and regulation.

Finally, note the fluctuations in S(t _j) which are too large to be captured by the intrinsic stochasticity in our model, as indicated in Figure 5B. These “unknown” fluctuations can arise from a number of processes, including variable sampling fractions η(t _j) at each time point t _j and fluctuating animal state due to infections, stress, inflammation, etc. These may influence total mature cell populations month to month. Some of these effects can be effectively accounted for by adjusting the mean sample fraction η = 10^–5 by an amount Δη(t _j) at each time point. Using these values of Δη(t _j) to match the data S(t _j), and then readjusting the parameters, we find good comparison between the experimental measurements and our model, as shown in Figures 7A–C.

FIGURE 7

Fitting of tagged granulocyte populations for animal ZH33 after time-dependent adjustment of sampling size η(t _j). For this animal t _j = [1, 2, 3, 4.5, 6.5, 9.5, 12, 14, 21, 28, 30, 38, 43, 46, 49] months. (A) By initially best-matching S(t _j), we find η(t _j) = [1, 2.03, 1.07, 0.94, 0.68, 1.25, 0.61, 0.83, 1.29, 1.29, 1.01, 1.01, 0.99, 1.14, 0.83]×10^–5. (B–D) By searching parameter space to find a good match to C ̂ s ( t j ) , s ̂ i , σ ̂ i , and ρ ̂ (which is shown with its maximum nnormalized to one), which is plotted with its maximum normalized to unity. We find parameters [in units of/day] that fit the data are μ _h = 0.02, r _h(0) = 0.08, C _h(0) = 2500, C _n(ℓ = 0) = 800, C _n(ℓ = 1) = 1600, C _n(ℓ = 2) = 3200, K = 2.5×10⁵, α = 0.016, r _n = 2, L = 22, μ _n = 0, μ _m = 0.185 and ω = 0.2.

To further show consistency, we then plotted the predicted density and compare it with the data-derived (using KDE) density ρ ̂ in Figure 7D. A few large, highly variable clones remain not well reproduced by our model and are discussed in the next section. The parameter estimation procedure was applied to the three other animals in (Koelle et al., 2017), showing reasonable, consistent matching (see Supplementary Material).

Discussion and conclusion

In this paper, we analyzed data from stem cell transplantation experiments in rhesus macaque (Koelle et al., 2017) in which barcoded stem and progenitor cells (HSPCs) were autologously transplanted after myeloablative conditioning. Typically, of the ∼ 30 million cells transplanted only ∼ 15 – 35 % are tagged and then only a fraction homes to functional bone marrow niches. Nonetheless, typically hundreds or thousands of barcodes are detected in peripheral blood samples. The counts of circulating mature cells derived from each clone fluctuate from sample to sample; these fluctuations are significantly larger than those expected from random small samples (Xu et al., 2018) and thus arise from intrinsic stochasticity (Abkowitz et al., 1996) during hematopoiesis and/or physiological changes in the animals over the months that samples were being drawn.

In order to explain clone size variability, we extend a mathematical model first presented in (Xu et al., 2018). Our hybrid stochastic-deterministic model delineates all the clone populations and assumes regulation of the stem cell proliferation rate through a carrying capacity K, a finite differentiation potential of progenitor cells, terminal differentiation after a fixed number of divisions, and a final sampling step. Since the numbers of HSCs within each clone are typically small, we treated the self-renewal of HSCs within its niche stochastically by a coupled (through the carrying capacity) discrete birth-death process for each clone. In (Xu et al., 2018), the coupling in the stochastic HSC birth-death process was treated using a mean field approximation, which leads to a smaller clone size variability, everything else equal. Random times of asymmetric differentiation of each HSC into the first-stage progenitor cell is described by a rate α. After L differentiation steps, the L ^th-stage progenitor cell terminally differentiates into a mature, circulating blood cell. The progenitor and mature cell pools are treated deterministically. We performed stochastic simulations using the Gillespie algorithm (Bortz et al., 1975; Gillespie, 2007) of the entire HSC pool and solve for the progenitor and mature cell populations numerically. Additional feedback mechanisms between the HSC and progenitor pools can be implemented (Klose et al., 2019), but in our case would require a more complex model incorporating two-way coupling between stochastic and deterministic dynamics.

Our model suggests that the sampled clone abundance variation arises primarily from random differentiation events by HSC clones that occur at rate α. Each differentiation event leads to a temporal burst of mature cells of the same clone. We hypothesize that these bursts of mature cell production lead to the variability in sampled clone populations. Other physiological factors related to animal state may still also play a role, but are not considered in our model. Another feature captured by our current model is the transient richness immediately after transplantation. This behavior is explained by our initial condition that contains short-term HSPCs that are within the initial CD34⁺ pool. These HSPCs quickly differentiate but are not replenished at long times. This distributed initial condition may depend on the experimental protocol and may be indicative of the efficiency, and especially, the composition of the transplant.

Our model also allows us to explore how clone abundance predictions change with parameters. For example, we find that the range of larger clone sizes increase with increases in L, K, and α, in this order. All parameters affect the density of predicted data and cluster sizes. Mature cell death rates, which vary across different cell types, also affect the predicted abundance variations, especially for larger values of μ _m.

We also note that while percentage of GFP+ cells within each mature cell type (lineage) fluctuated from sample to sample, they generally increased with animal age (after transplant) for all animals. T cells had the largest increase in their barcoded populations during the handful of months after transplantation. Since this timescale is much longer than the progenitor cell transients, this slow increase in tagged cell populations suggests that, assuming a neutral model (barcodes and barcode integration sites do not affect cell proliferation and death rates), (i) that a certain fraction of HSCs remained in situ after radiative ablation, and (ii) the CD34⁺ barcoded and transplanted HSC population with an enriched GFP+ fraction were slowly activated. Scenario (i) means that some HSCs remained in their niche and continuously generated blood. The transplanted cells with barcoded HSCs (GFP+) can increase in importance if they slowly become more proliferative as they settle into the animal. Thus, the slow increase in GFP+ fraction in all five measured lineages suggest that transplanted cells may recover slowly from the transplantation procedure and increase their contribution to mature blood cell formation.

Finally, we note that there appears to be additional fluctuation mechanisms that are not accounted for in our model. In all animals, there appears to be a few very large clones with very high variability σ ̂ i . Within a stochastic HSC population, adjusting birth-death parameters that allow for larger clones and larger variances would suppress the richness to below what is observed. We have extensively explored feasible regimes of all parameters and conclude that allowing large clones that vary in abundance precludes agreement with other basic observables such as S ̂ ( t j ) and C ̂ s ( t j ) . Nonetheless, such unexplained features can be mechanistically informative and we discuss a number of reasonable factors that may account for them. First, the fluctuations in the measured total abundances of the different mature cell lineages did not correlate, implying that the specific set of sampling sizes η(t _j) used to explain granulocyte population variations (as shown in Figure 7) cannot be used to analyze those of other mature cell lineages. The seemingly uncorrelated cross-lineage populations imply that time variations in animal state arise further downstream, affecting the development of individual lineages. If fluctuations occurred in stem or multipotent progenitor cells, they would affect multiple cell lineages in similar ways and lead to inter-lineage population correlations.

Animal ZJ31 (see Supplementary Material) appears to be uniquely different from the others in that it exhibited a much larger richness C ̂ s ( t j ) as well as a much smaller maximum clone size (which non-etheless had high variance σ ̂ i ). For example, C _s at t _j = 5 months dips to a very low value, while the abundance S ̂ ( t j ) seems to be at a local maximum. Thus, a small number of granulocyte clones expanded dramatically, potentially squeezing out the many smaller clones below sampling. At month seven, S ̂ is extremely low but C ̂ s has recovered to its long term value, indicating that the previously large granulocyte clones were quickly cleared out. The results for ZJ31 may indicate a lower level of competitive exclusion, but also some other mechanism contributing to high variability. Therefore, the overall observation of high variability and the magnitudes of C ̂ s indicates that other model features should be considered.

One assumption of our model that is likely an oversimplification is the neutrality of barcodes. Although different barcodes themselves may not influence cells, different VISs may. For example, aberrant self-renewal arises when using lentiviral vectors (Espinoza et al., 2019) and different VISs of HIV have been shown to affect cellular proliferation rates (e.g., if the VIS is near an oncogene) Yeh et al. (2021). Besides the non-neutrality, we have also neglected stochastic or variable proliferative potential L and the time course of the HSC homing and engraftment into the bone marrow. A random but distributed L would allow a few randomly selected clones to expand further. We also expect that HSC migration and successful settlement into the bone marrow niche is a time-continuous process that provide a proliferation head start for a few early arriving clones. This would ultimately result in fewer clones C _s with some of them at higher populations s _i(t). An instantaneous (more abrupt) HSC engraftment and a larger spread in L would be more consistent with animal ZJ31 than with the others.

Additional information can be extracted from the clone abundance data to further identify and interrogate such “opposing” behaviors. For example, we have only considered the average autocorrelation of the clone abundances (the variance) and have not constructed cross-correlations between cell types/lineages or correlations across time. Except for the initial period after transplantation, we have assumed a time-inhomogeneous process and have not considered explicit time-dependence such as aging (Muller-Sieburg et al., 2012; de Haan and Lazare, 2018). Physiological aging can be straightforwardly incorporated by e.g., allowing for slow degradation of HSCs, changes in progenitor proliferative potential (Marciniak-Czochra et al., 2009), or changes in HSC niche carrying capacity K(t). Mutations that arise with age may also increase HSC self-renewal (Challen and Goodell, 2020) which could be modeled by a r _h(0) that increases with time. Thymic interruptions or involution with age (Lewkiewicz et al., 2019a,b) could also be modeled by assuming a decreasing maturation rate ω(t) when considering the T cell lineage.

While our current model contains a large number of parameters, it seems that a number of them are compensatory and control specific properties of the model predictions. For example, we found that α, L, ω, and η can compensate for each other and form an unknown effective parameter function f(α, L, ω, η). This feature effectively reduces overfitting and might be better analyzed using machine learning methods. Incorporating the more realistic mechanisms discussed above would yield additional effective parameters allowing the model to more accurately reproduce the measured quantities; nonetheless, intermittent differentiation of HSCs remains the key proposed mechanism for understanding intersample clone abundance variations.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://doi.org/10.1182/blood-2016-07-728691.

Author contributions

TC, YP, and MD’O developed and analyzed the model and wrote the manuscript. YP organized the data, performed numerical analyses, and data fitting. MT contributed to the analysis of the model and helped shape the stochastic simulation approach. TS provided insight on stem cell biology.

Funding

This work was supported by grants from the NIH through grant R01HL146552 (TC) and the Army Research Office through grant W911NF-18-1-0345 (MD’O) and DMS-1814090 (MD’O).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fsysb.2023.893366/full#supplementary-material

Certain lineages such as T cells can intermittently proliferate, e.g., upon antigen activation, but we neglect this and use small effective death rates μ _m for such cell types.

References Abbuehl

J. P.

Tatarova

Held

Huelsken

(2017). Long-term engraftment of primary bone marrow stromal cells repairs niche damage and improves hematopoietic stem cell transplantation. Cell. Stem Cell. 21, 241–255. 10.1016/j.stem.2017.07.004 Abkowitz

J. L.

Catlin

S. N.

Guttorp

(1996). Evidence that hematopoiesis may be a stochastic process in vivo . Nat. Med. 2, 190–197. 10.1038/nm0296-190 Attar

(2014). Changes in the cell surface markers during normal hematopoiesis: A guide to cell isolation. Glob. J. Hematol. Blood Transfus. 1, 20–28. 10.15379/2408-9877.2014.01.01.4 Biasco

Pellin

Scala

Dionisio

Basso-Ricci

Leonardelli

(2016). In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell. Stem Cell. 19, 107–119. 10.1016/j.stem.2016.04.016 Bortz

A. B.

Kalos

M. H.

Lebowitz

J. L.

(1975). A new algorithm for Monte Carlo simulation of ising spin systems. J. Comput. Phys. 17, 10–18. 10.1016/0021-9991(75)90060-1 Busch

Klapproth

Barile

Flossdorf

Holland-Letz

Schlenner

S. M.

(2015). Fundamental properties of unperturbed haematopoiesis from stem cells in vivo . Nature 518, 542–546. 10.1038/nature14242 Bystrykh

L. V.

Verovskaya

Zwart

Broekhuis

de Haan

(2012). Counting stem cells: Methodological constraints. Nat. Methods 9, 567–574. 10.1038/nmeth.2043 Catlin

S. N.

Busque

Gale

R. E.

Guttorp

Abkowitz

J. L.

(2011). The replication rate of human hematopoietic stem cells in vivo . Blood 117, 4460–4466. 10.1182/blood-2010-08-303537 Challen

G. A.

Goodell

M. A.

(2020). Clonal hematopoiesis: Mechanisms driving dominance of stem cell clones. Blood 136, 1590–1598. 10.1182/blood.2020006510 Chao

Lin

C. W.

(2012). Nonparametric lower bounds for species richness and shared species richness under sampling without replacement. Biometrics 68, 912–921. 10.1111/j.1541-0420.2011.01739.x Colijn

Mackey

M. C.

(2005). A mathematical model of hematopoiesis: II. Cyclical neutropenia. J. Theor. Biol. 237, 133–146. 10.1016/j.jtbi.2005.03.034 Copley

M. R.

Beer

P. A.

Eaves

C. J.

(2012). Hematopoietic stem cell heterogeneity takes center stage. Cell. Stem Cell. 10, 690–697. 10.1016/j.stem.2012.05.006 Cordes

Dunbar

C. E.

(2021). Clonal tracking of haematopoietic cells: Insights and clinical implications. J. Haematol. 192, 819–831. 10.1111/bjh.17175 Corso

Varettoni

Mangiacavalli

Zappasodi

Klersy

Rusconi

(2005). Bone marrow CD34+ cell count is predictive for adequate peripheral progenitor cell collection. Leukemia Res. 29, 159–163. 10.1016/j.leukres.2004.05.018 de Haan

Lazare

S. S.

(2018). Aging of hematopoietic stem cells. Blood 131, 479–487. 10.1182/blood-2017-06-746412 De Souza

D. C.

Humphries

A. R.

(2019). Dynamics of a mathematical hematopoietic stem-cell population model. SIAM J. Appl. Dyn. Syst. 18, 808–852. 10.1137/18m1165086 Doulatov

Notta

Laurenti

Dick

J. E.

(2012). Hematopoiesis: A human perspective. Cell. Stem Cell. 10, 120–136. 10.1016/j.stem.2012.01.006 Espinoza

D. A.

Fan

Yang

Cordes

S. F.

Truitt

L. L.

Calvo

K. R.

(2019). Aberrant clonal hematopoiesis following lentiviral vector transduction of HSPCs in a rhesus macaque. Mol. Ther. 27, 1074–1086. 10.1016/j.ymthe.2019.04.003 Fliedner

M. C.

(2002). Research within the field of blood and marrow transplantation nursing: How can it contribute to higher quality of care? Int. J. Hematol. 76, 289–291. 10.1007/BF03165135 Gillespie

D. T.

(1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361. 10.1021/j100540a008 Gillespie

D. T.

(2007). Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55. 10.1146/annurev.physchem.58.032806.104637 Goyal

Kim

Chen

I. S.

Chou

(2015). Mechanisms of blood homeostasis: Lineage tracking and a neutral model of cell populations in rhesus macaques. BMC Biol. 13, 85. 10.1186/s12915-015-0191-8 Grosselin

Sii-Felice

Payen

Chretien

Roux

D. T. L.

Leboulch

(2013). Arrayed lentiviral barcoding for quantification analysis of hematopoietic dynamics. Stem Cells 31, 2162–2171. 10.1002/stem.1383 Höfer

Rodewald

(2016). Output without input: The lifelong productivity of hematopoietic stem cells. Curr. Opin. Cell. Biol. 43, 69–77. 10.1016/j.ceb.2016.08.003 Kim

Kim

Presson

Metzger

Bonifacino

Sehl

(2014). Dynamics of HSPC repopulation in nonhuman primates revealed by a decade-long clonal-tracking study. Cell. Stem Cell. 14, 473–485. 10.1016/j.stem.2013.12.012 Kim

Kim

Presson

A. P.

D. S.

Mao

S. H.

Bonifacino

A. C.

(2010). High-throughput, sensitive quantification of repopulating hematopoietic stem cell clones. J. Virology 84, 11771–11780. 10.1128/JVI.01355-10 Klose

Florian

M. C.

Gerbaulet

Geiger

Glauche

(2019). Hematopoietic stem cell dynamics are regulated by progenitor demand: Lessons from a quantitative modeling approach. Stem Cells 37, 948–957. 10.1002/stem.3005 Koelle

S. J.

Espinoza

D. A.

(2017). Quantitative stability of hematopoietic stem and progenitor cell clonal output in rhesus macaques receiving transplants. Blood 129, 1448–1457. 10.1182/blood-2016-07-728691 Kolouri

Thorpe

Slepcev

Rohde

G. K.

(2017). Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Process. Mag. 34, 43–59. 10.1109/MSP.2017.2695801 Lee-Six

Friesgaard-Obro

Shepherd

Grossmann

Dawson

Belmonte

(2018). Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478. 10.1038/s41586-018-0497-0 Lewkiewicz

Chuang

Y. L.

Chou

(2019a). A mathematical model of the effects of aging on naive T cell populations and diversity. Bull. Math. Biol. 81, 2783–2817. 10.1007/s11538-019-00630-z Lewkiewicz

Chuang

Y. L.

Chou

(2019b). Dynamics of T cell receptor distributions following acute thymic atrophy and resumption. Math. Biosci. Eng. 17, 28–55. 10.3934/mbe.2020002 Lyne

A. M.

Kent

D. G.

Laurenti

Cornils

Glauche

Perie

(2018). A track of the clones: New developments in cellular barcoding. Exp. Hematol. 68, 15–20. 10.1016/j.exphem.2018.11.005 Marciniak-Czochra

Stiehl

Wagner

(2009). Modeling of replicative senescence in hematopoietic development. Aging (Albany NY) 1, 723–732. 10.18632/aging.100072 Mayle

Luo

Jeong

Goodell

M. A.

(2015). Flow cytometry analysis of murine hematopoietic stem cells. Nature 518, 542–546. Mendelson

Frenette

P. S.

(2014). Hematopoietic stem cell niche maintenance during homeostasis and regeneration. Nat. Med. 20, 833–846. 10.1038/nm.3647 Muller-Sieburg

C. E.

Sieburg

H. B.

Bernitz

J. M.

Cattarossi

(2012). Stem cell heterogeneity: Implications for aging and regenerative medicine. Blood 119, 3900–3907. 10.1182/blood-2011-12-376749 Parmentier

Kramer

Weller

Schuler

Ordemann

Rall

(2020). Reevaluation of reference values for bone marrow differential counts in 236 healthy bone marrow donors. Ann. Hematol. 99, 2723–2729. 10.1007/s00277-020-04255-4 Parzen

(1962). On estimation of a probability density function and mode. Ann. Math. Statistics 33, 1065–1076. 10.1214/aoms/1177704472 Peixoto

Dingli

Pacheco

J. M.

(2011). Modelling hematopoiesis in health and disease. Math. Comput. Model. 53, 1546–1557. 10.1016/j.mcm.2010.04.013 Radtke

Colonna

Perez

A. M.

Hoffman

Kean

L. S.

Kiem

H. P.

(2020). Isolation of a highly purified HSC-enriched CD34⁺CD90⁺CD45RA⁻ cell subset for allogeneic transplantation in the nonhuman primate large-animal model. Transplant. Direct 6, e579. 10.1097/TXD.0000000000001029 Rosenblatt

(1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statistics 27, 832–837. 10.1214/aoms/1177728190 Seita

Weissman

I. L.

(2010). Hematopoietic stem cell: Self-renewal versus differentiation. Wiley Interdiscip. Rev. Syst. Biol. Med. 2, 640–653. 10.1002/wsbm.86 Shepherd

B. E.

Kiem

H. P.

Lansdorp

P. M.

Dunbar

C. E.

Aubert

LaRochelle

(2007). Hematopoietic stem-cell behavior in nonhuman primates. Blood 110, 1806–1813. 10.1182/blood-2007-02-075382 Silverman

B. W.

(1986). Density estimation for statistics and data analysis. London: Chapman & Hall. Stiehl

Marciniak-Czochra

(2011). Characterization of stem cells using mathematical models of multistage cell lineages. Math. Comput. Model. 53, 1505–1517. 10.1016/j.mcm.2010.03.057 Sun

Ramos

Chapman

Johnnidis

J. B.

Y. J.

(2014). Clonal dynamics of native haematopoiesis. Nature 514, 322–327. 10.1038/nature13824 Sun

Komarova

(2012). Stochastic modeling of stem-cell dynamics with control. Math. Biosci. 240, 231–240. 10.1016/j.mbs.2012.08.004 Székely

Burrage

Mangel

Bonsall

(2014). Stochastic dynamics of interacting haematopoietic stem cell niche lineages. PLoS Comput. Biol. 10, e1003794. 10.1371/journal.pcbi.1003794 Verovskaya

Broekhuis

M. J.

Zwart

Ritsema

van Os

de Haan

(2013). Heterogeneity of young and aged murine hematopoietic stem cells revealed by quantitative clonal analysis using cellular barcoding. Blood 122, 523–532. 10.1182/blood-2013-01-481135 Villani

(2009). Optimal transport, old and new. Berlin Heidelberg: Springer-Verlag. Wu

Koelle

S. J.

Yang

Jares

(2014). Clonal tracking of rhesus macaque hematopoiesis highlights a distinct lineage origin for natural killer cells. Cell. Stem Cell. 14, 486–499. 10.1016/j.stem.2014.01.020 Xu

Wang

Guttorp

Abkowitz

J. L.

(2018a). Visualizing hematopoiesis as a stochastic process. Blood Adv. 1, 2637–2645. 10.1182/bloodadvances.2018023705 Xu

Böttcher

Chou

(2020). Diversity in biology: Definitions, quantification and models. Phys. Biol. 17, 031001. 10.1088/1478-3975/ab6754 Xu

Kim

Chen

I. S. Y.

Chou

(2018b). Modeling large fluctuations of thousands of clones during hematopoiesis: The role of stem cell self-renewal and bursty progenitor dynamics in rhesus macaque. PLoS Comput. Biol. 14, e1006489. 10.1371/journal.pcbi.1006489 Yeh

Y. H. J.

Yang

Razmi

Y. C.

(2021). The clonal expansion dynamics of the HIV-1 reservoir: Mechanisms of integration site-dependent proliferation and HIV-1 persistence. Viruses 13, 1858. 10.3390/v13091858 Yuan

Yang

(2019). Research on K-value selection method of K-means clustering algorithm. J 2, 226–235. 10.3390/j2020016