## Highlights

- •Convergence criteria strongly influence precision of MCMC DNA mixture algorithms.
- •Our Hamiltonian Monte Carlo can fulfil strong convergence criteria.
- •Our tool significantly reduces standard deviation of the log-likelihood ratios.
- •It is the first probabilistic genotyping algorithm to benefit from GPU acceleration.

## Abstract

**Motivation:**Analysing mixed DNA profiles is a common task in forensic genetics. Due to the complexity of the data, such analysis is often performed using Markov Chain Monte Carlo (MCMC)-based genotyping algorithms. These trade off precision against execution time. When default settings (including default chain lengths) are used, as large as a 10-fold changes in inferred log-likelihood ratios (LR) are observed when the software is run twice on the same case. So far, this uncertainty has been attributed to the stochasticity of MCMC algorithms. Since LRs translate directly to strength of the evidence in a criminal trial, forensic laboratories desire LR with small run-to-run variability.

### Results:

## Keywords

## 1. Introduction

*mixed DNA samples*) are often collected. The resulting short tandem repeat data are occasionally affected by stochastic events such as severe peak height imbalance, drop-outs, and drop-ins [

where $V$ is the observed EPG, ${S}_{j}$ represents a genotype set—a list of tuples denoting the allele designations of contributors. The summations are over all possible genotype sets $j$. ${H}_{p}$ and ${H}_{d}$ are the hypotheses of the prosecutor and the defendant respectively. A hypothesis assumes inclusion of certain contributors (suspect, victim, etc.) in the mixture, as well as the background allele frequencies of the populations the contributors allegedly belong to. Usually, the difference between the hypothesis of the prosecutor and the hypothesis of the defendant is the inclusion of the suspect in the former. In the setting we consider, the number of considered contributors is fixed beforehand. $P\left({S}_{j}\right|{H}_{n})$ can be estimated based on the background frequencies of the alleles in the populations of interest for any hypothesis ${H}_{n}$, $n=\{p,d\}$ [

*Probabilistic genotyping*(PG) refers to the set of statistical methods used to compute LR for given EPG data.

- Cowell R.
- Graversen T.
- Lauritzen S.L.
- Mortera J.

*J. R. Stat. Soc. Ser. C. Appl. Stat.*2014;

- Cowell R.G.
- Graversen T.
- Lauritzen S.L.
- Mortera J.

*J. R. Stat. Soc. Ser. C. Appl. Stat.*2015; 64 (This is the accepted version of the following article) (which has been published in final form at http://dx.doi.org/10.1111/rssc.12071): 1-48

with the prior probability density function (PDF) of the parameters $f\left(\cdot \right)$. In an ideal scenario, LR is independent of any choices made by the laboratory technician and of any random confounding factors. In practice, however, LR depends on the variation in the samples from the crime scene, stochastic events occurring during PCR amplification, allele frequency sampling, parameter settings in the data-processing software, hyper-parametrisation of the PG software, etc. [

*run-to-run variability*remains [

### 1.1 Precision of DNA mixture deconvolution

*precision*of DNA mixture deconvolution as the inverse variance between results of runs with identical hyper-parametrisation on the same EPG data for the same hypotheses. Precision has to be considered in addition to the accuracy of a PG system [

“The argument is that the existence of variabilityCourts are often unaware of run-to-run variability, as expert witnesses usually report a single LR number [[across PG runs — our note]raises doubts about whether any of the results should be accepted.” [[16]]

*Gelman–Rubin*(GR) diagnostic [,

“The condition of GR near 1 depends on the problem at hand; for most examples, values below 1.1 are acceptable, but for a final analysis in a critical problem, a higher level of precision may be required.” []Providing evidence in court should be considered a critical problem, as the consequences of wrong or doubtful answers are significant [

“We argue that a cutoff of GR $\le $1.1 is much too high to yield reasonable estimates of target quantities.” [This seems even more important since the statistical models used in both of these software tools combine continuous (e.g. peak intensities) and discrete (e.g. genotype sets) dimensions. However, GR cannot monitor convergence in discrete dimensions. It is therefore possible that convergence is deduced purely from the continuous parameters, while the genotype set distributions may not have converged at all, offering a possible explanation for the large run-to-run variability observed despite low GR thresholds. In our model, we avoid the issue of assessing convergence of genotype sets by marginalising them out (i.e., the genotype set is not part of the MCMC sampling process).[26]]

### 1.2 Trade-off with execution time

- •Laboratories might have to run software multiple times with different hyper-parametrisations in order to check the robustness of the results or to test hypotheses with different numbers of contributors, different analytical thresholds, different priors, etc.
- •Laboratories might want to quantify the precision of the results over several identical replicates.
- •Forensic laboratories are often working under time pressure, e.g., if cases attract great media attention or laws limit detention time without charges.

## 2. Materials and methods

$P\left({S}_{j}\right|M)$ is assumed to be a flat prior. This assumption leads to genotype weights that are independent from the person(s) of the interest. In Bayesian inference, evidence is usually neglected as it is too expensive to compute and constant w.r.t. model parameters. We thus obtain the unnormalised posterior

which we use for estimating $P\left(V\right|{S}_{j})$ (see Eq. (2)). We assume peak heights to be conditionally independent given ${S}_{j}$ and $M$, and alleles of a contributor in different loci to be independent from each other. To provide a mathematical formulation of the model, we denote:

- •${O}_{ar}^{l}$: random variable of observed peak height at locus $l$, allele $a$, replicate $r$;
- •${x}_{ar}^{l}$: expected peak height at locus $l$, allele $a$, replicate $r$;
- •${f}_{X}$: the PDF of a random variable $X$;
- •$Q$: the “catch-all” dropout allele [[9]];
- •${h}_{r}^{l}$: the analytical threshold of the EPG for locus $l$ and replicate $r$;
- •$\mathcal{N}(0,{s}^{2})$: a normal distribution with mean $0$ and standard deviation $s$;
- •${c}_{\psi}$: the peak height variance parameter for peak type $\psi $ (e.g. stutter).

where ${V}_{l,r}$ denotes the available data at locus $l$ for replicate $r$, and ${S}_{j,l}$ is the genotype set $j$ at locus $l$. The conditional probability is modelled as:

The inner multiplications are performed over the set $B$ of observed peaks and the set $D$ of hypothetical dropout peaks. This model formulates separately the relative likelihood of observed peaks and the probabilities of dropout events. In the following, we abbreviate the notation for ${x}_{ar}^{l}(m,{S}_{j,l})$ to ${x}_{ar}^{l}$.

where ${M}^{\prime}$ is the sample from the posterior.

### 2.1 Observed peaks

For the drop-in events, we use the model introduced by Euroformix [

Two hyper-parameters based on the level of noise in negative controls are required: the drop-in rate ${d}_{\mathrm{rate}}$ and the $\lambda $ of the exponential distribution. For the expected peaks, we assume a log-normal distribution following previous works [

The standard deviation of this distribution depends on the components of the expected peaks and their heights:

We define $\psi \in \Psi \left(a\right)=\{a+2,a+1,a,a-1\}$. This means that for a single allele $a$, we consider contributions from the allelic peak $a$, the backward stutter from the $a+1$ peak, the forward stutter from the $a-1$ peak, and the double backward stutter from the $a+2$ peak. Our method currently does not consider half-backward stutter.

The product contribution at a selected allele $a$ is defined as:

where ${m}_{a}^{l}$ is the molecular weight of the allele, and $\overline{m}$ is the average of the largest and smallest observed molecular weights on the EPG within the called peaks. The integer ${\zeta}_{an}^{l}$ is 1 if the genotype of contributor $n$ includes allele $a$ in locus $l$, and 0 otherwise. It is equal to 2 in case of a homozygote. The scalars ${\alpha}_{r}^{l}$ are the locus-specific amplification efficiencies (LSAE), ${w}_{n}$ are the weights of the contributors that sum up to 1, ${\delta}_{rn}$ are the degradation parameters, and ${t}_{r}^{\prime}$ is the total allelic product expressed in relative fluorescent units (RFU).

The sum in Eq. (12) is over the assumed contributors. Finally, ${\chi}_{\psi r}^{l}$ models the fact that peak variance is inversely proportional to peak height [

J.S. Buckleton, The probability of dropout and drop-in, URL https://johnbuckleton.files.wordpress.com/2019/08/the-probability-of-dropout-and-drop-in.pdf.

### 2.2 Dropout events

See Supplementary Material section 1 for details.

### 2.3 Parameters of the model and priors

Parameter | Meaning | Domain | Quantity |
---|---|---|---|

${c}_{s},{c}_{p}$ | Stutter and peak height standard deviation | ${\mathbb{R}}_{>0}$ | 2 |

${t}_{r}^{\prime}$ | Total allelic product | ${\mathbb{R}}_{>0}$ | $R$ |

${w}_{n}$ | Weights of contributors | (0,1) | $N-1$ |

${\delta}_{rn}$ | Degradation parameters | ${\mathbb{R}}_{>0}$ | $RN$ |

${\alpha}_{r}^{l}$ | Locus-specific amplification efficiencies (LSAE) | ${\mathbb{R}}_{>0}$ | ${L}_{\mathrm{all}}$ |

${i}^{2}$ | LSAE variance | ${\mathbb{R}}_{>0}$ | 1 |

Here, $P\left(\alpha \right|i)$ is the prior ln$\left({\alpha}_{r}^{l}\right)\sim \mathcal{N}(0,{i}^{2})$ that prevents the amplification efficiencies from drifting away from 1 too far. The prior variance ${i}^{2}\sim \mathrm{Exp}\left({\sigma}_{\alpha}\right)$, where ${\sigma}_{\alpha}$ is a hyper-parameter to be optimised by the laboratory [

### 2.4 Other considerations

- •the corresponding allele score is a part of any of the genotypes from the set;
- •or the height is lower than the drop-in cap and there is no other peak that can be explained as drop-in;
- •or the peak can be explained as a stutter of another peak with an observed stutter ratio lower than the maximum stutter ratio.

### 2.5 Sampling algorithm

**Three likely genotype sets**${\mathit{S}}_{\mathit{j}}$

**and the resulting contributions to the peaks of the locus in**Fig. 2

**from two contributors.**${C}_{n,a}={\rho}_{aa1}{t}_{an1}$ is the contribution of contributor $n$ from a single copy (i.e. $r=1$) of allele $a$. ${C}_{n,a}^{\prime}={\rho}_{(a-1)a1}{t}_{an1}$ is the contribution from stutter originating from allele $a$. For simplicity, only single-backward stutter and a single replicate are considered. Duplicate entries are highlighted with the same colour.

## 3. Results and discussions

*FBI extended caucasian population*[

*NIST 1036-Caucasian*background allele frequencies [

**Hyper-parameter values used in the present benchmarks.**SR stands for stutter ratio. The values of the variance rates and the standard deviation of the locus-specific amplification efficiency (LSAE) were adjusted to our formulation of the model (i.e., natural logarithm instead of ${log}_{10}$, 1/mean for LSAE standard deviation).

Hyper-parameter | ProvedIt | MIX05, MIX13 |
---|---|---|

Drop-in rate (${d}_{\mathrm{rate}}$) | 0.0015 | 0 |

Drop-in ($\lambda $) | 0.032 | N/A |

Allele variance (${c}_{p}^{2}$) shape | 5.653 | 3.57 |

Allele variance (${c}_{p}^{2}$) rate | 15.7 | 5.196 |

Stutter variance (${c}_{s}^{2}$) shape | 1.501 | 6.97 |

Stutter variance (${c}_{s}^{2}$) rate | 148.462 | 9.279 |

LSAE (${\alpha}_{r}^{l}$) standard deviation | 32.258 | 33.333 |

Rare allele frequency | $2.5\xf7361$ | $2.5\xf7202$ |

Drop-in cap | 3 ${h}_{r}^{l}$ | |

Max. observed backward SR | 0.3 | |

Max. observed forward SR | 0.15 | |

Max. observed double backward SR | 0.05 | |

Wright’s ${F}_{ST}$ for Balding–Nichols | 0.01 | |

Number of chains | 4 | |

Number of burnin steps | 1200 | |

Leapfrog steps per sample | 10 | |

GR stopping threshold | 1.05 |

**Accuracy of our method.**Average LRs over 10 runs for our method (“HMC”) in comparison with other solutions on the MIX05 and MIX13 benchmarks with known ground truth. In most cases, all tested methods correctly return LRs larger than 1 for ground-truth inclusion or smaller than 1 for ground-truth exclusion. For MIX13, we compare with the reported results of Euroformix (“EFM”) version 1.11.4 and STRmix™

Case | ${H}_{p}$ | Truth | ${log}_{10}\text{LR}$ | ||
---|---|---|---|---|---|

HMC | EFM | STRmix™ | |||

MIX05 Case 1 | Perpetrator+U | Incl. | 19.34 | – | – |

MIX05 Case 2 | Perpetrator+U | Incl. | 25.67 | – | – |

MIX05 Case 3 | Perpetrator+U | Incl. | 22.32 | – | – |

MIX05 Case 4 | Perpetrator+U | Incl. | 10.44 | – | – |

MIX13 Case 1 | V+S01A | Incl. | 20.15 | 20.18 | 20.15 |

MIX13 Case 2 | S02A+U+U | Incl. | 17.03 | 17.28 | 16.98 |

MIX13 Case 2 | S02B+U+U | Incl. | 7.50 | 7.88 | 7.26 |

MIX13 Case 2 | S02C+U+U | Incl. | 5.41 | 6.11 | 5.83 |

MIX13 Case 2 | S02D+U+U | Excl. | −16.18 | −2.36 | −14.03 |

MIX13 Case 3 | V+W+S03A | Incl. | 7.87 | 6.82 | 7.69 |

MIX13 Case 3 | V+W+S03B | Excl. | $-\infty $ | $-\infty $ | $-\infty $ |

MIX13 Case 4 | V+S | Incl. | 20.23 | 19.91 | 20.15 |

MIX13 Case 5 | S05A+U+U | Incl. | 3.38 | 9.26 | 3.45 |

MIX13 Case 5 | S05B+U+U | Incl. | 1.61 | 9.38 | 3.32 |

MIX13 Case 5 | S05C+U+U | Excl. | −8.66 | 6.45 | −9.22 |

MIX13 Case 5 | S05A+U+U+U | Incl. | 6.20 | – | – |

MIX13 Case 5 | S05B+U+U+U | Incl. | 5.96 | – | – |

MIX13 Case 5 | S05C+U+U+U | Excl. | 2.63 | – | – |

### 3.1 Accuracy: MIX05 and MIX13 benchmarks

### 3.2 Precision: ProvedIt benchmark

^{ 3}

### 3.3 Performance: computationally challenging mixtures

- •A05_RD14-0003-30_31_32_33_34-1;1;1;1;1-M3I22-0.315GF-Q1.3_01.15 s(named here Sample 3, known contributor is Contributor 31)
- •E04_RD14-0003-48_49_50_29_30-1;1;2;4;1-M2d-0.279GF-Q2.1_05.15 s(named here Sample 4, known contributor is Contributor 29)

**Inference runtimes.**We report the average (over 10 repetitions) inference times on a single-GPU cloud instance for the listed cases with different numbers of contributors (NoC) in minutes (m) and seconds (s).

Case | NoC | NoC known | Inference time |
---|---|---|---|

MIX13 Case 2 | 3 | 0 | 3 m 59 s |

MIX13 Case 3 | 3 | 0 | 1 m 59 s |

MIX13 Case 5 | 3 | 0 | 4 m 37 s |

MIX13 Case 5 | 4 | 0 | 31 m 15 s |

ProvedIt Sample 1 | 4 | 1 | 4 m 13 s |

ProvedIt Sample 1b | 4 | 0 | 19 m 39 s |

ProvedIt Sample 2b | 3 | 0 | 6 m 55 s |

ProvedIt Sample 3 | 5 | 1 | 59 m 55 s |

ProvedIt Sample 4 | 5 | 1 | 27 m 15 s |

## 4. Conclusions and future work

## Acknowledgements

## Appendix A. Supplementary data

- MMC S1
Further details on experiments, stutter models and memory consumption data.

## References

- Interpreting low template DNA profiles.
*Forensic Sci. Int.: Genetics.*2009; 4: 1-10 - Painting the target around the matching profile: the texas sharpshooter fallacy in forensic DNA interpretation.
*Law, Probab. Risk.*2009; 8: 257-276 - Report to the President - Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.CreateSpace Independent Publishing Platform, 2016
- Commentary: Likelihood ratio as weight of forensic evidence: A closer look.
*Front. Genetics.*2018; 9 - Forensic DNA Profiling : A Practical Guide to Assigning Likelihood Ratios.CRC Press, Boca Raton2021
- EuroForMix: An open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts.
*Forensic Sci. Int.: Genetics.*2016; 21: 35-44 - Analysis of forensic DNA mixtures with artefacts.
*J. R. Stat. Soc. Ser. C. Appl. Stat.*2014; - Analysis of forensic dna mixtures with artefacts.
*J. R. Stat. Soc. Ser. C. Appl. Stat.*2015; 64 (This is the accepted version of the following article) (which has been published in final form at http://dx.doi.org/10.1111/rssc.12071): 1-48 - Validating TrueAllele ® DNA mixture interpretation.
*J. Forensic Sci.*2011; 56: 1430-1447 - The interpretation of single source and mixed DNA profiles.
*Forensic Sci. Int.: Genetics.*2013; 7: 516-528 - Special issue on measuring and reporting the precision of forensic likelihood ratios: Introduction to the debate.
*Sci. Justice.*2016; 56: 371-373 - An illustration of the effect of various sources of uncertainty on DNA likelihood ratio calculations.
*Forensic Sci. Int.: Genetics.*2014; 11: 56-63 - Validating TrueAllele ® interpretation of DNA mixtures containing up to ten unknown contributors.
*J. Forensic Sci.*2019; 65: 380-398 - Developmental validation of STRmix™, expert software for the interpretation of forensic DNA profiles.
*Forensic Sci. Int.: Genetics.*2016; 23: 226-239 - STRmix™ collaborative exercise on DNA mixture interpretation.
*Forensic Sci. Int.: Genetics.*2019; 40: 1-8 - The variability in likelihood ratios due to different mechanisms.
*Forensic Sci. Int.: Genetics.*2015; 14: 187-190 - A review of probabilistic genotyping systems: EuroForMix, DNAStatistX and STRmix™.
*Genes.*2021; 12: 1559 - Error rates, likelihood ratios, and jury evaluation of forensic evidence.
*J. Forensic Sci.*2020; 65: 1199-1209 - Standards for the formulation of evaluative forensic science expert opinion.
*Sci. Justice.*2009; 49: 161-164 - ENFSI guideline for evaluative reporting in forensic science.2015
- Revisiting the STRmix™ likelihood ratio probability interval coverage considering multiple factors.2021 (Biorxiv Preprint Server)
- Examining performance and likelihood ratios for two likelihood ratio systems using the PROVEDIt dataset.
*PLOS ONE.*2021; 16e0256714 - Bayesian Data Analysis.second ed. Chapman & Hall/CRC, Boca Raton, Fla2004
- General methods for monitoring convergence of iterative simulations.
*J. Comput. Graph. Statist.*1998; 7: 434-455 - Genoproof Mixture 3—new software and process to resolve complex DNA mixtures.
*Forensic Sci. Int.: Genetics.*2017; 6: e549-e551 - The algorithm says you did it: The use of black box algorithms to analyze complex DNA evidence.
*Harvard J. Law Technol.*2017; 31: 275 - Revisiting the Gelman–Rubin diagnostic.
*Statist. Sci.*2021; 36 - Using probabilistic theory to develop interpretation guidelines for Y-STR profiles.
*Forensic Sci. Int.: Genetics.*2016; 21: 22-34 - Testing whether stutter and low-level DNA peaks are additive.
*Forensic Sci. Int.: Genetics.*2019; 43102166 - Examining the additivity of peak heights in forensic DNA profiles.
*Australian J. Forensic Sci.*2020; : 1-15 J.S. Buckleton, The probability of dropout and drop-in, URL https://johnbuckleton.files.wordpress.com/2019/08/the-probability-of-dropout-and-drop-in.pdf.

- Statement on minor miscoding in BN formulae STRmix V1.08 with additional comments.2016
- DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands.
*Forensic Sci. Int.*1994; 64: 125-140 - The ‘factor of two’ issue in mixed DNA profiles.
*J. Theoret. Biol.*2014; 363: 300-306 - Hybrid Monte Carlo.
*Phys. Lett. B.*1987; 195: 216-222 - A conceptual introduction to Hamiltonian Monte Carlo.2018
- TensorFlow distributions.2017
- A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt.
*Forensic Sci. Int.: Genetics.*2018; 32: 62-70 - NIST interlaboratory studies involving DNA mixtures (MIX05 and MIX13): Variation observed and lessons learned.
*Forensic Sci. Int.: Genetics.*2018; 37: 81-94 - NIST interlaboratory studies involving DNA mixtures (MIX13): A modern analysis.
*Forensic Sci. Int.: Genetics.*2018; 37: 172-179 - Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the united states.
*Forensic Sci. Int.: Genetics.*2016; 25: 175-181 - Corrigendum to ‘US population data for 29 autosomal STR loci’[Forensic Sci. Int. Genet. 7 (2013) e82–e83].
*Forensic Sci. Int.: Genetics.*2017; 31: e36-e40 - A sensitivity analysis to determine the robustness of STRmix™ with respect to laboratory calibration.
*Forensic Sci. Int.: Genetics.*2018; 35: 113-122 - A comparison of likelihood ratios obtained from EuroForMix and STRmix™.
*J. Forensic Sci.*2021;

## Article info

### Publication history

### Identification

### Copyright

### User license

Creative Commons Attribution (CC BY 4.0) |## Permitted

- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article
- Reuse portions or extracts from the article in other works
- Sell or re-use for commercial purposes

Elsevier's open access license policy