Volume 6, Issue 1 , Pages 17-25, January 2012
A stochastic model of the processes in PCR based amplification of STR DNA in forensic applications
Article Outline
- Abstract
- 1. Introduction
- 2. The stochastic model
- 3. Mathematical treatment
- 3.1. Symbols and definitions
- 3.2. The situation at the end of cycle C and the progression to cycle C+1
- 3.3. Probability distributions of the newly formed amplicons
- 3.4. The situation after the first cycle (C=1)
- 3.5. Recursive relations for the amplicons without stutter
- 3.6. Recursive relations for the amplicons with 1 stutter
- 3.7. Recursive relations for the amplicons with 2 stutters
- 3.8. Solving the recursive relations assuming D0 is fixed and known
- 3.9. Converting the results to the situation that D0 is actually a Poisson variable
- 3.10. Some implications
- 4. Numerical example
- 5. Discussion
- Appendix A. Conversion of Eq. (5) to (6)
- Appendix B. Conversion of Eq. (7) to (8)
- Appendix C. Derivation of Eq. (9)
- Appendix D. Derivation of Eq. (10)
- Appendix E. Derivation of Eq. (11)
- Appendix F. Sum series for the chains with 2 stutters
- Appendix G. Exact solutions of the recursive equations
- Appendix H. Generalization to the situation that D0 is a Poisson random variable
- Appendix I. Supplementary data
- References
- Copyright
Abstract
In forensic DNA profiling use is made of the well-known technique of PCR. When the amount of DNA is high, generally unambiguous profiles can be obtained, but for low copy number DNA stochastic effects can play a major role. In order to shed light on these stochastic effects, we present a simple model for the amplification process. According to the model, three possible things can happen to an individual single DNA strand in each complete cycle: successful amplification, no amplification, or amplification with the introduction of stutter. The model is developed in mathematical terms using a recursive approach: given the numbers of chains at a given cycle, the numbers in the next can be described using a multinomial probability distribution. A full set of recursive relations is derived for the expectations and (co)variances of the number of amplicon chains with no, 1 or 2 stutters. The exact mathematical solutions of this set are given, revealing the development of the expectations and (co)variances as function of the cycle number. The equations reveal that the expected number of amplicon chains without stutter grows exponentially with the cycle number, but for the chains with stutter the relation is more complex. The relative standard deviation on the numbers of chains (coefficient of variation) is inversely proportional to the square root of the expected number of DNA strands entering the amplification. As such, for high copy number DNA the stochastic effects can be ignored, but they play an important role at low concentrations. For the allelic peak, the coefficient of variation rapidly stabilizes after a few cycles, but for the chains with stutter the decrease is more slowly. Further, the ratio of the expected intensity of the stutter peak over that of the allelic peak increases linearly with the number of cycles. Stochastic models, like the one developed in the current paper, can be important in further developing interpretation rules in a Bayesian context.
Keywords: DNA, PCR, STR, Stochastic model, Mathematics, Recursive relation
1. Introduction
Forensic DNA profiling is a widely used technique with generally unambiguous profiling results in case of sufficient amounts of available DNA. The results become less clear-cut when mixtures are analyzed, and when the amount of DNA is fairly low (referred to as LCN, low copy number). In the latter case, random processes play an important role in the amplification, as is acknowledged in many literature sources (e.g., [1], [2], [3], [4], [5], [6], [7]). An individual DNA chain has only a limited probability to be amplified successfully in the sense that a detectable signal can be generated from it. In case there are many DNA chains, the probability that none of them is amplified successfully can safely be ignored, but this is not the case when there are only a few DNA strands present. Due to this stutter peaks may reach intensities in the order of that of the allelic peak, some alleles may not be amplified to detectable levels (allele drop-out) and replicate results of the same DNA sample may lead to different results. For interpretation of profiles in a Bayesian context, knowledge on the stochastic processes is needed. The current paper provides an approach to shed light on these processes using a stochastic model describing relevant events in the amplification process.
Although, to the best of our knowledge, little work has been published on the subject so far, Gill et al. [2] describe a stochastic model with a recursive basis. Given the numbers of amplicons in a cycle, they use the binomial distribution to model the numbers of amplicons in the next cycle. Cowell [5] uses the same basis for his simulation study. In the current paper, we take this several steps further, by including the occurrence of stutter in a multinomial distribution model, and by including potential differences in stochastic properties between amplification of genomic DNA versus amplicon strands. Given the number of amplicons in a given cycle, we derive equations describing the next cycle. This approach leads to a set of recursive equations that can be solved, leading to equations for the expectation and variance of the number of amplicons corresponding to the allelic peak and to the stutter peak. Just as Gill et al. [2] and Cowell [5], we assume that the probabilities of the different events do not depend on the cycle number.
Extension of the theory to include amplicons with more stutters is readily possible, but since the incidence of higher order stutters is generally low [8], higher order stutters are not included in this study.
2. The stochastic model
The basic model can be depicted as illustrated in Fig. 1. During one complete cycle, one of three things can happen to each individual single DNA strand: nothing (no amplification), normal amplification (leading to double-stranded amplicon), or amplification with introduction of a stutter sequence due to some kind of folding (“slipped strand”) of the original strand (see, for example, [9], [10], [11]).

Fig. 1.
Schematic presentation of the stochastic model. During a complete cycle, there are three different possibilities with an individual strand, each with a certain probability.
In this building block of the model some assumptions are made. For example, only backward stutter plays a role, whereas forward stutter is ignored as several literature sources indicate that forward stutter is quantitatively far less important than backward stutter (see, for example, [8], [9], [10], [11], [12], [13]). Also it is assumed that the stutter indicated concerns the loss of a single repeat; the loop is not so big that in one step 2 or more repeats are lost. In future studies the model can be refined to include these types of processes. Further, it is assumed that the probabilities p1, p2 and p3 are constant and independent of the cycle number. Obviously, p1
+
p2
+
p3
=
1.
In the next amplification step, the newly formed double stranded chains are made single-stranded again, and subjected to the same cycle with the same probabilities. The newly formed amplicons take the place of the target strand in Fig. 1. If the amplicon was formed with stutter (probability p3 in the figure), the cycle leads to new amplicons with the same number of stutter with probability p2 or with an additional stutter with probability p3. As such, after several cycles amplicons with 1 or more stutters can appear. Throughout the paper, the number of stutters is always relative to the original genomic DNA strand that is subjected to the amplification. It is assumed that the number of stutter does not affect the values of the probabilities p1, p2 and p3.
The model needs further refinement as the whole process starts with long-chained genomic DNA. The same three steps can occur, but it is very well possible that the probabilities are different. For example, due to the long strand, annealing of the primer may be hampered, leading to a relatively large probability that nothing happens at all. Also, due to potentially more complex three-dimensional folding of the long strand, the probability of introducing a stutter may be larger. Therefore, the model takes 6 probabilities into account: p1, p2 and p3 as indicated for the amplification of amplicon, and q1, q2 and q3 for the same types of probabilities in case of genomic DNA.
In the current paper, we develop a mathematical model describing the expectation and variance of amplicon chain numbers with up to 2 stutters. The models will be built using recursive analyses: describe the situation at cycle number C
+
1 given the situation at cycle C. For the major part, the models will be built assuming that the number of genomic DNA copies is known and fixed. Given the nature of the problem, it is more natural to assume that the actual number of copies subjected to the amplification follows a Poisson distribution. This is addressed in the last part of the derivations.
3. Mathematical treatment
3.1. Symbols and definitions
+
p2.
=
0, 1 or 2).
Throughout the document, the number of stutters is relative to the original genomic DNA strand that is subjected to the amplification.
3.2. The situation at the end of cycle C and the progression to cycle C
+
1
At the start of the process, there are D0 single strands of genomic DNA. They are present throughout the amplification process. In addition, there are amplicons. At the end of cycle C, there are amplicon chains with 0, 1 or 2 stutters (and more, but this is not relevant for the current analysis). The genomic DNA and the amplicon strands are all sources for new amplicons formed in the next cycle. In detail, there are:
At the end of cycle C
+
1 there are in total
+
x0
+
y0,0 amplicon chains without stutter,
+
x1
+
y0,1
+
y1,1 amplicon chains with 1 stutter,
+
y1,2
+
y2,2 amplicon chains with 2 stutters.
The mathematical analyses focus on chains with maximally 2 stutters.
3.3. Probability distributions of the newly formed amplicons
Given the value of D0, the variables x0 and x1 follow a multinomial probability distribution according to
(1)The marginal probability distributions of x0 and x1 are binomials. Written out for x0 it follows
(2)Similarly, x1
∼
Binom[D0;q3].
Given the numbers of chains D0, S0, S1 and S2, the joint and the marginal probability distributions of the random variables y0,0, y0,1, y1,1, y1,2, and y2,2 are multinomial or binomial according to:
(3)The probability distributions of amplicon chains with more than 2 stutters are not relevant for the current paper.
The joint probability distribution of S0, S1 and S2 is not known, and will simply be referred to as fS.
3.4. The situation after the first cycle (C
=
1)
During the first cycle, the only source of amplicons is the genomic DNA. It is impossible to form amplicons with 2 stutters in the current model. The stochastic properties of the numbers of amplicons with 0 or 1 stutters follow immediately from the multinomial distribution of Eq. (1). As such:
(4)3.5. Recursive relations for the amplicons without stutter
The expected number of amplicons without stutter in cycle C
+
1 follows from calculating the expectation of the sum of the random variables S0
+
x0
+
y0,0. The (marginal) probability distribution functions of x0 and y0,0 conditional on S0 are known. The expectation can be written as the sum series
(5)In this expression, fS(S0) is the marginal probability distribution of S0, which is unknown. The range of possible values is also unknown, but it turns out that there is no need to further specify these. The number of copies formed from genomic DNA ranges from 0 to D0, and follows the marginal probability distribution of Eq. (2). The number of new amplicons without stutter (y0,0) that can be formed in the new cycle ranges from 0 to S0, with a marginal probability distribution as indicated in Eq. (3).
The main steps in evaluating the sum series of Eq. (5) are presented in Appendix A. The result is a recursive relation
(6)This relation is intuitively entirely clear: the expected number of copies formed from genomic DNA in any cycle is q2D0, and the expectation of newly formed amplicons from already existing amplicons equals p2μ(0)C. This leads to μ(0)C+1
=
μ(0)C
+
q2D0
+
p2μ(0)C, hence Eq. (6).
The relation for the variance of the chains without stutter follows from a similar summation as used in Eq. (5). Now the expectation of (S0
+
x0
+
y0,0
−
μ(0)C+1)2 needs to be assessed:
(7)Some details of the derivations can be found in Appendix B. In order to express μ(0)C+1 in terms of μ(0)C, use is made of the recursive relation of Eq. (6). The result is
(8)3.6. Recursive relations for the amplicons with 1 stutter
For the amplicons with one stutter, a similar way of reasoning is used as for the chains without stutter. Now μ(1)C+1 is the expectation of the sum S1
+
x1
+
y0,1
+
y1,1, with x1, y0,1 and y1,1 being mutually independent given the values of S0 and S1. As outlined in Appendix C, the following recursive relation is obtained:
(9)Also this relation is intuitively clear: there are μ(1)C copies already present at cycle C, q3D0 copies are formed from genomic DNA, p3μ(0)C copies are formed from amplicons without stutter, and p2μ(1)C copies are formed from amplicons with stutter. Adding them leads to Eq. (9).
The variance of the chains with one stutter is the expectation of (S1
+
x1
+
y0,1
+
y1,1
−
μ(1)C+1)2. Some details of the derivations can be found in Appendix D. The result is
(10)Note the appearance of the covariance of S0 and S1 (Θ(0,1)C) in this equation.
In order to be able to evaluate Eq. (10), a recursive relation for the covariance Θ(0,1)C is needed. The covariance is defined as the expectation of (S0
+
x0
+
y0,0
−
μ(0)C+1)(S1
+
x1
+
y0,1
+
y1,1
−
μ(1)C+1). Note that x0 and x1 are not mutually independent, and the same holds for y0,0 and y0,1 and for S0 and S1. The complete sum series can be found in Appendix E. Evaluation of the sum leads to the following recursive relation:
(11)3.7. Recursive relations for the amplicons with 2 stutters
With respect to the chains with 2 stutters, similar strategies can be used. The expression for μ(2)C+1 is obtained as the expectation of the sum S2
+
y1,2
+
y2,2. The expression for the variance is given by the expectation of (S2
+
y1,2
+
y2,2
−
μ(2)C+1)2. Now there are two covariances to consider; between chains with 0 and 2 stutters, and with 1 and 2 stutters. All the sum series can be found in Appendix F. The following recursive relations are obtained:
(12)3.8. Solving the recursive relations assuming D0 is fixed and known
Developing a spreadsheet to numerically evaluate the recursive relations is straightforward, but to study some properties exact mathematical solutions are needed. Solving the recursive relations given by Eqs. (6), (8), (9), (10), (11), (12) while taking the boundary conditions given by Eq. (4) into account, can be done using software packages like Mathematica (Wolfram Research, version 6.0, using the RSolve-function). The exact results are collected in Appendix G. For large cycle numbers the equations can be approximated to yield the following results:
(13)Use is made of the following definitions:
(14)Note that all entities in Eq. (13) are proportional to D0, the initially available number of copies of genomic DNA. The numbers of chains without stutter grow exponentially with the cycle number, but the chains with stutters show a different pattern.
No attempts were made to further derive and solve the recursive relations for the (co)variances of the chains with two stutters. They would become increasingly complex, and experimentally these chains are not very relevant at all. The theory can be used, however, to provide these equations if the need would arise.
3.9. Converting the results to the situation that D0 is actually a Poisson variable
Taking a (small) sample implies that random processes affect the number of genomic DNA strands that are actually subjected to the amplification. It is quite obvious to assume a Poisson distribution for the actual number of copies (see, for example, [14]). After extraction a fraction is retained. A binomial distribution can be taken for this step, but the final distribution for the number of copies entering the amplification is still a Poisson distribution. Let the expectation of the number of copies entering the amplification be N0, with the actual realization being D0. This latter value is fixed throughout the amplification.
Appendix H outlines the derivations. In order to distinguish the statistics taking the Poisson properties into account from the previous results, the tilde notation will be used. Applying the results of Appendix H to Eq. (13) yields
(15)The parameters are defined according to the definitions of Eq. (16).
(16)Note that Eqs. (13), (15) are essentially identical; only some of the parameters in Eq. (16) are different from those in Eq. (14).
3.10. Some implications
From Eq. (15) it follows that the relative error (defined as σ/μ) on the peak areas is proportional to 1/√N0. The relative error on the allelic peak area is independent of the cycle number, whereas this is not the case for the chains with 1 stutter. For infinite cycle numbers the ratios do approach the same level. Qualitatively, the same conclusion follows from Eq. (13).
On combining Eqs. (15), (16), the ratio of the expectations can now be written in approximation as
(17)This ratio therefore increases with the cycle number but does not depend on N0. Note that there is no upper limit for the ratio; for (very) large cycle numbers the expectation of the number of chains with stutter will always exceed that of the chains without stutter. The same result can be derived from Eq. (13).
4. Numerical example
In order to illustrate the implications of the current model, reasonable values for the individual probabilities are needed. Some indications can be found in literature. Shinde et al. [13] estimate the PCR-efficiency, hence p2, at 0.85 for cycles 1 through 50. Gill et al. [2] report a value of 0.82 and this value is also used by Cowell [5]. Roeder et al. [15] present allelic peak heights as ratios after 30 versus 28 cycles, after 34 versus 28 cycles and 34 versus 30 cycles, indicating that the peak heights approximately double every cycle. This indicates that the value of r2 is close to 2, hence p2 close to 1. Mulero et al. [11] present average peak heights as function of the cycle number, and observe an increase that is numerically close to a doubling in each cycle. A number of papers give numbers for the height of the stutter peak compared to the allelic peak, for 28-cycle PCR. The values differ somewhat and are dependent on the locus, but are typically below 20% mostly below 15% (see, for example, [1], [3], [4], [6], [8], [10], [12]). Without reporting numbers on the peak heights, both Gill et al. [12] and Gibb et al. [8] report that double stutters occur less frequently than single stutters. Our own experience indicates that amplification products with 2 stutters are rarely detectable, indicating that the peak area of double stutters is generally below 1% of the allelic peak.
In some studies the stutter areas are presented for both 28-cycle and 34-cycle PCR. Although clear numbers are missing, the figure in the paper of Petricevic et al. [7] indicates that the height is of the order of 7.5% (roughly 2.5–12%) for 28 and 15% (roughly 2.5–25%) for 34 cycles under the same conditions. The thesis of Graham (Section 3.5.2) [16] presents figures of several stutter peak areas for 28 and 34 cycles. In all cases, the stutter peaks are largest after 34 cycles. Combining these data with the linear relation of Eq. (17) then reveals that the slope p3/r2 attains values in the order of 0.001–0.01. In some other papers an estimate for p3 is given in the order of 0.002 [2] to 0.01 [13]. These values are in line with each other with respect to the apparently relevant order of magnitude.
Taking all this information, for the numerical example used in the current paper we chose p2
=
0.90, and p3
=
0.005. For the amplification of genomic DNA, it is assumed that the probability of no product is relatively high, whereas the probability of formation of stutter is once again small. The choices are q2
=
0.45 and q3
=
0.005. For the expected number of chains entering the amplification (N0) the value of 30 was chosen. Computer simulations (105 amplifications) were done, making use of the multinomial distribution (using the software package Mathematica version 6.0, using the RandomInteger[BinomialDistribution] function). The correlations between the numbers of chains are computed from the (co)variances using the standard definition of the correlation.
Fig. 2, Fig. 3, Fig. 4, Fig. 5 present the results of the computer simulations (symbols without lines) as well as of the exact calculations (solid lines) based on the recursive relations (or the exact equations as presented in the Appendix) and the approximated equations for as far as applicable (dashed lines; Eq. (15)), given the choices described above and taking the Poisson component into account.

Fig. 2.
Mean number of chains as function of the cycle number. The numbers near the lines reflect the number of stutters. The solid lines reflect the calculated exact results, the dashed lines the approximated results, and the symbols the simulated results. On the current scale, the approximation is not visible for chains with 2 stutters.

Fig. 3.
The ratios of the expected numbers of chains with no, 1 or 2 stutters. The numbers near the lines indicate the ratio plotted.

Fig. 4.
Coefficients of variation (expressed as percentage) for the chains with no, 1 or 2 stutters (symbols 0, 1 and 2, respectively). No approximation is available for 2 stutters.

Fig. 5.
The correlation between the numbers of chains with no and 1 stutter (symbol 0,1, including approximation), with no and 2 stutters (symbol 0,2), and with 1 and 2 stutters (symbol 1,2).
Fig. 2 presents the development of the expectations of the numbers of chains over the cycles, while Fig. 3 shows the ratios of them. At the chosen combination of probabilities, the stutter peak is about 8% of the allelic peak after 28 cycles, whereas the peak with 2 stutters is about 0.3%. The latter would therefore not be detectable. After 34 cycles, the ratios are about 9.5% and 0.4%, respectively. Fig. 4 presents the coefficients of variation. For the chains with stutter, the values decrease with increasing cycle number, whereas for the chains without stutter a plateau is reached after less than 5 cycles. Fig. 5 presents the correlation between the numbers of chains.
Large-scale computer simulations can also be used to get an impression on the type of bivariate probability distribution the numbers of chains with no and 1 stutter follow. This is illustrated in Fig. 6 (using a simulation size of 104). The results are presented on a logarithmic scale. The diagonal lines illustrate the gradual shift of the cloud of points to relatively more stuttered amplicon chains. The figure also shows that the distributions are markedly skewed. Studies are in progress to characterize the distributions mathematically.

Fig. 6.
The bivariate empirical probability distributions as obtained in the computer simulations after 28, 31 and 34 cycles. The lines correspond to the situations that the stutter peak intensity is 6%, 7.5% and 15% of the allelic peak intensity.
5. Discussion
The stochastic model described in the current document is based on a few simple assumptions. Essentially, there are three different outcomes for each individual chain after a single cycle: nothing happens, the chain is amplified, or amplified with addition of a stutter. It is assumed that there is no fourth alternative (such a backward stutter or double stutter, or degradation of a chain), and that the probabilities are constant over time. As first approximation this will be reasonable, although further refinements are clearly possible.
In spite of the relative simplicity of the model, the mathematics become already quite complex. For numerical calculations on expected numbers of chains, using the recursive relations is probably easier than implementing the exact solutions. The approximations to the exact solutions do provide some interesting insights.
The number of chains without stutter increases exponentially with the cycle number. For the chains with stutter, the increase is not exponential but contains an additional cycle dependent term. The expectations and (co)variances are all proportional to the expected number of DNA copies entering the amplification. As a result, the relative errors decrease as function of the copy numbers with a factor √N0. As such, the impact of stochastic processes is important in LCN samples where N0 is low. Many literature sources indicate that random processes are important in analyzing and interpreting LCN results (e.g., [1], [2], [3], [4], [5], [6], [7]). Note that the error on the peak intensities themselves also includes the random error of the measurement method. For high copy number samples this measurement error will exceed the stochastic PCR process error, but for low copy number DNA samples the reversed will be true.
The model implies that the relative intensity of the stutter peaks increases with increasing cycle number. This is reported by several authors (e.g., [7], [16], [17]). Although some studies present relative intensities after 28 and 34 cycles, no information could be found concerning intermediate cycle numbers hence on the development of the intensity as the PCR process proceeds. The current model implies that the relation is linear (Eq. (17)).
Increasing the number of cycles for LCN samples leads to higher signals, but also to higher stutter peaks. Both phenomena are well-supported by the models. Therefore, we advise other methods for LCN analysis, like a longer sample injection time or an increased voltage during electrophoresis.
One of the issues in analyzing LCN samples is the choice between either using more material per amplification or use replicates. The current model implies that using the double amount of material reduces the relative variation of the numbers of chains by a factor √2. Assessing the average of two separate experiments does the same. This implies that it actually does not matter very much which of the two procedures is used from an amplification point of view. Arguments in favor of performing multiple experiments are therefore related to issues like the incidence of contamination [17]. It needs also be acknowledged that the random error on the peak area detection is affected by the number of replicates in which the area is measured.
The current model provides (recursive) equations on the expectation and variance of peak heights, but not on the joint probability distribution of the numbers of chains. The latter would be needed in order to perform calculations on probabilities of relevant events. For example, the probability that allele-dropout occurs can be expressed in terms of the probability that the number of amplicons is below a certain threshold. The probability that a stutter peak is more than 15% of the allelic peak can also be expressed in terms of probabilities of numbers of amplicons. This is illustrated by the individual simulation results in Fig. 6 that are located above the highest diagonal line. Knowledge on the joint probability distribution function is needed to compute these types of probabilities and hence assess upper quantiles to be used as guidelines in interpreting LCN profiles. As first approach a multivariate Normal approximation might be used, using the covariance matrix and the expectations as given by the model. However, as the relative error can be extremely high, the validity of such an approach would be questionable. In addition, the example results in Fig. 6 indicate that the bivariate distribution function is not symmetric, as would be the case for the bivariate normal distribution. It might be tempting to speculate that alternatives might be found in multivariate distributions like the Multinomial-Dirichlet distribution or in the family based on the negative-multinomial distribution. For higher cycle numbers the discrete nature of the numbers of chains will be ignorable and the probability distribution could then be approximated using functions for continuous random variables, like a Gamma-distribution [5], [6] or some multivariate analogue of it. Indeed, the marginal probability distributions of the simulation data as presented in Fig. 6 can be studied separately and it was found that Gamma distributions can be used to approximate the marginal distributions to a satisfactory extent (not shown). In addition, using our results (the approximated Eqs. (13) or (15), or the exact solutions of Appendix G) it immediately follows that the scale parameter of the corresponding Gamma distribution is independent of the number of DNA copies entering the amplification, whereas the shape parameter does depend on this number, in line with the results of Cowell [5]. From here, however, it is still a big step to characterize the joint probability distribution function in mathematical terms. Further research is required to study this based on the current recursive models.
Appendix A. Conversion of Eq. (5) to (6)
Summation of Eq. (5) over all y0 is done by splitting the term S0
+
x0
+
y0,0

Using the properties of the binomial distribution for y0,0 it follows

Summation over all x0 proceeds similarly, leading to (use r2
=
p2
+
1)

As fS(S0) is the marginal probability distribution function of S0, the summations of S0.fS(S0) and fS(S0) equal, by definition, μ(0)C and 1, respectively. This leads to Eq. (6).
Appendix B. Conversion of Eq. (7) to (8)
The first step in the treatment of the sum series of Eq. (7) is to write
. Making use of the general properties of the binomial distribution it follows after summation over all y0,0:

Use the recursive Eq. (6) to replace μ(0)C+1 by q2D0
+
r2μ(0)C and reorganize to

Summation over all x0 leads to

As μ(0)C is the expectation of S0, the summation can be written in terms of the expectation and variance of S0, as depicted in Eq. (8).
Appendix C. Derivation of Eq. (9)
The summation for μ(1)C+1 is now given in full by

As S0 and S1 are stochastically dependent, summation is to be done over all possible combinations, making use of the joint probability distribution function.
Evaluation of this sum series proceeds via the same types of steps as described in Appendix A. The summation over y1,1, y0,1 and x1 lead to the appearance of the conditional expectations p2S1, p3S0 and q3D0, respectively. This leads to

Once again, as fS(S0,S1) reflects the joint probability distribution function of S0 and S1, it follows by definition that the sums over S0.fS(S0,S1) and over S1.fS(S0,S1) lead to the expectations μ(0)C and μ(1)C. This leads to Eq. (9).
Appendix D. Derivation of Eq. (10)
The variance of the number of chains with 1 stutter is expressed in full as

Evaluation of this sum series proceeds via similar steps as used in Appendix B. First expand the square, then evaluate the sums over the mutually independent random variables x1, y0,1 and y1,1. Use the recursive relation of Eq. (9) to replace μ(1)C+1 by μ(1)C, and it follows

Summation over all S0 and S1 leads to Eq. (10).
Appendix E. Derivation of Eq. (11)
The sum series of the covariance between the numbers of chains without stutter and with 1 stutter contains random variables that are not stochastically independent. The sum series to be evaluated is

Evaluation of the sum series does not involve any new calculus. Via similar procedures as used for the other sum series, the recursive relation given by Eq. (11) is obtained.
Appendix F. Sum series for the chains with 2 stutters
The recursive relation for the expectation and variance of the number of chains with 2 stutters can be expressed as


For the covariances, the expressions are


Evaluation of the sum series requires straightforward but rather tedious calculus.
Appendix G. Exact solutions of the recursive equations
Applying the software package Mathematica to the set of recursive equations yields the solutions. After simplification they can be written as:






For large cycle numbers only the terms with the highest powers of r2 are retained. For μ(0)C, μ(1)C and μ(2)C the highest order term is
, for the (co)variances the highest is
. After some slight rearrangements the results as presented in Eqs. (13), (14) are obtained.
Appendix H. Generalization to the situation that D0 is a Poisson random variable
Assume D0
∼
Poisson(N0). The equations in Appendix G show that μ(k)C,
and Θ(j,k)C are all proportional to D0. As such, it is convenient to define the D0-independent entities
,
and
. By definition, E(D0)
=
N0,
.
For the expectation of the number of chains with k stutters after cycle C it follows

Summation is to proceed over all possible values of D0 given N0. In the last step, the expectation of the Poisson distribution is used.
The variance follows after first deriving the expectation of
:

From this, it follows

A similar strategy is used for the covariance, calculated as E(S(j)CS(k)C)
−
E(S(j)C)E(S(k)C).

Appendix I. Supplementary data
References
- . A comparison of the characteristics of profiles produced with AmpFlSTR® SGM Plus™ multiplex system for both standard and low copy number (LCN) STR DNA analysis. Forensic Sci. Int. 2001;123:215–223
- . A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci. Nucleic Acids Res. 2005;33:632–643
- . DNA commission of the international society of forensic genetics: recommendations on the interpretation of mixtures. Forensic Sci. Int. 2006;160:90–101
- . Interpretation of low-copy-number DNA profile after post-PCR purification. Forensic Sci. Int. Genet. 2009;(Suppl. 2):542–543
- . Validation of an STR peak area model. Forensic Sci. Int. Genet. 2009;3:193–199
- R.G. Cowell, S.L. Lauritzen, J. Mortera, Probabilistic expert systems for handling artifacts in complex DNA mixtures. Forensic Sci. Int. Genet. (2010), doi:10.1016/j.fsigen.2010.03.008.
- . Validation and development of interpretation guidelines for low copy number (LCN) DNA profiling in New Zealand using the AmpFlSTR® SGM Plus™ multiplex. Forensic Sci. Int. Genet. 2010;4:305–310
- . Characterisation of forward stutter in the AmpFlSTR® SGM Plus® PCR. Sci. Just. 2009;49:24–31
- . Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA. Nucleic Acids Res. 1996;24:2807–2812
- . Systematic analysis of stutter percentages and allele peak height and peak area ratios at heterozygous STR loci for forensic casework and database samples. J. Forensic Sci. 2004;49:1–13
- . Development and validation of the AmpFlSTR® MiniFiler™ PCR amplification kit: a miniSTR multiplex for the analysis of degraded and/or PCR inhibited DNA. J. Forensic Sci. 2008;53:838–852
- . Report of the European network of forensic science institutes (ENFSI): formulation and testing of principles to evaluate STR multiplexes. Forensic Sci. Int. 2000;108:1–29
- . Taq DNA polymerase slippage mutation rates measured by PCR and quasi-likelihood analysis: (CA/GT)n and (A/T)n microsatellites. Nucleic Acids Res. 2003;31:974–980
- . Stochastic processes defining sensitivity and variability of internally calibrated quantitative NASBA-based viral load assays. Nucleic Acids Res. 2002;30:e137
- . Maximizing DNA profiling success from sub-optimal quantities of DNA: a staged approach. Forensic Sci. Int. Genet. 2009;3:128–137
- E.A.M. Graham, An Evaluation of Forensic DNA Profiling Techniques Currently Used in the United Kingdom. PhD Thesis October 2007 (date file retrieved from internet: May 31, 2010) (https://lra.le.ac.uk/bitstream/2381/4529/1/PhD%20Thesis%20EAMG%20FINAL%20VERSION.pdf).
- . An investigation of the rigor of interpretation rules for STRs derived from less than 100
pg of DNA. Forensic Sci. Int. 2000;112:17–40
PII: S1872-4973(11)00016-0
doi:10.1016/j.fsigen.2011.01.003
© 2011 Elsevier Ireland Ltd. All rights reserved.
Volume 6, Issue 1 , Pages 17-25, January 2012

