Highlights
- Stutter artefacts from STRs in MPS forensic data are characterized.
- Stutter proportions and candidate variables are investigated via beta regression.
- Parental uninterrupted stretch (PTUS) is used as an explanatory variable.
- Estimates are integrated into the probabilistic genotyping model MPSproto.
Abstract
Keywords
1. Introduction
Here, the two quantities are the read count for the allele and the read count for the stutter originating from that (parental) allele.
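The two read counts described above feed both the stutter ratio and the stutter proportion (compared in Section 4.2). As a minimal sketch, assuming the definitions commonly used in the forensic literature (ratio = stutter reads / parental reads; proportion = stutter reads / (stutter reads + parental reads)), the two quantities could be computed as follows. The function and argument names are illustrative and are not taken from the original analysis.

```python
def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Stutter reads relative to the reads of the parental allele."""
    if parent_reads <= 0 or stutter_reads < 0:
        raise ValueError("parent_reads must be > 0 and stutter_reads >= 0")
    return stutter_reads / parent_reads


def stutter_proportion(stutter_reads: int, parent_reads: int) -> float:
    """Stutter reads as a share of stutter plus parental reads.

    ASSUMPTION: this is the commonly used definition; with valid inputs the
    value lies in [0, 1), which is what makes a beta model applicable.
    """
    if parent_reads <= 0 or stutter_reads < 0:
        raise ValueError("parent_reads must be > 0 and stutter_reads >= 0")
    return stutter_reads / (stutter_reads + parent_reads)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    print(stutter_ratio(100, 900))
    print(stutter_proportion(100, 900))
```

Unlike the ratio, the proportion is bounded above by 1, so it can be modelled directly on the unit interval.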
2. Methods
2.1 Ethical declaration
2.2 DNA samples
2.3 MPS analysis: Library preparation and sequencing
2.4 Bioinformatics analyses
2.5 Data processing and identification of stutters
Here, the two quantities compared are the highest and second-highest read counts, respectively, per sample and locus combination. The sequence with the highest read count is always classified as a true allele, whereas the sequence with the second-highest read count is classified as a second true allele only if the criterion is met (the motivation for this criterion is described in Section 3.2).
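The classification rule described above can be sketched as follows. The exact criterion for the second allele is not recoverable from this text, so the sketch assumes it is a minimum ratio of the second-highest to the highest read count and leaves the value as a required argument; `call_alleles` and `min_ratio` are hypothetical names.

```python
from typing import Dict, List


def call_alleles(read_counts: Dict[str, int], min_ratio: float) -> List[str]:
    """Return the sequences called as true alleles for one sample/locus.

    read_counts maps each observed sequence to its read count.
    ASSUMPTION: the second allele is accepted when
    second_highest / highest >= min_ratio; the actual criterion and
    threshold used in the study are not given here.
    """
    if not read_counts:
        return []
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_seq, top_reads = ranked[0]
    alleles = [top_seq]  # the highest read count is always a true allele
    if len(ranked) > 1 and top_reads > 0:
        second_seq, second_reads = ranked[1]
        if second_reads / top_reads >= min_ratio:
            alleles.append(second_seq)
    return alleles


if __name__ == "__main__":
    # Hypothetical counts and threshold, for illustration only.
    print(call_alleles({"seqA": 1000, "seqB": 800, "seqC": 90}, min_ratio=0.5))
```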
1. Backward stutters: n-1 and n-2 types
2. Forward stutters: n+1 and n+2 types
3. n0 stutter: n-1 and n+1 products in the same sequence
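The three categories above can be expressed as a small labelling function. This is a simplified sketch, assuming each candidate stutter has already been aligned to its parental allele block by block, so that the input is the per-block change in repeat number (stutter minus parent); `stutter_type` is a hypothetical helper, and a change spread over two blocks in the same direction is returned as "other" in this simplification.

```python
from typing import Sequence


def stutter_type(block_diffs: Sequence[int]) -> str:
    """Label a stutter from per-block repeat differences (stutter - parent)."""
    changed = [d for d in block_diffs if d != 0]
    # n0: one block lost a repeat and another gained one (same total length).
    if sorted(changed) == [-1, 1]:
        return "n0"
    # Backward (n-1, n-2) and forward (n+1, n+2) stutters in a single block.
    if len(changed) == 1 and changed[0] in (-2, -1, 1, 2):
        return f"n{changed[0]:+d}"
    return "other"


if __name__ == "__main__":
    print(stutter_type([-1, 0]))   # one repeat lost in the first block
    print(stutter_type([-1, 1]))   # loss and gain in the same sequence
```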
2.6 Statistical analysis of stutters
2.6.1 Beta regression model
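Beta regression models a response on the open interval (0, 1), such as a stutter proportion. The sketch below evaluates the log-likelihood under the mean-precision parameterisation described in "Beta regression for modelling rates and proportions" (see References), where a response with mean mu and precision phi follows Beta(mu * phi, (1 - mu) * phi). The logit link and the constant precision are assumptions made for the sketch, and the covariate layout is hypothetical; the linear predictor actually used in the study is not reproduced here.

```python
import math
from typing import Sequence


def logistic(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_logpdf(y: float, mu: float, phi: float) -> float:
    """Log-density of Beta(mu*phi, (1-mu)*phi) at y, with 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_regression_loglik(y: Sequence[float],
                           X: Sequence[Sequence[float]],
                           coefs: Sequence[float],
                           phi: float) -> float:
    """Sum of log-densities with mu_i = logistic(x_i . coefs).

    ASSUMPTION: logit link and one shared precision phi.
    """
    total = 0.0
    for yi, xi in zip(y, X):
        eta = sum(c * x for c, x in zip(coefs, xi))
        total += beta_logpdf(yi, logistic(eta), phi)
    return total


if __name__ == "__main__":
    # Hypothetical data: intercept column plus one covariate (e.g. PTUS).
    y = [0.05, 0.08, 0.12]
    X = [[1.0, 10.0], [1.0, 12.0], [1.0, 15.0]]
    print(beta_regression_loglik(y, X, coefs=[-4.0, 0.12], phi=200.0))
```

In practice the coefficients and phi would be estimated by maximising this function, which is what a beta regression routine does internally.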
2.6.2 Characterization of LUS stutters
2.6.2.1 Investigated variables
1. PTUS (parental uninterrupted stretch; continuous): number of repeats of the parental uninterrupted stretch (parental motif).
2. LUSRep (continuous): number of repeats of the longest uninterrupted stretch of the parental allele (the motif from which the stutter originates).
3. MotifDiff (factor): the sequence motif that stutters and the difference in number of repeats compared to the parental allele.
4. stutterType (factor): type of stutter generated from the parental allele. It can be a longer sequence (e.g., n+1) or a shorter one (e.g., n-1). This variable was used to analyse data across all categories of stutter types.
5. Locus (factor): 27 STR markers were analysed. This variable was included in the model when the analysis was performed across all loci. See Supplementary Table S1 for the full list of markers.
6. Complexity (factor): STRs are divided into categories depending on the repeat pattern [48]:
   - 6.1 Simple repeats (repeat units/motifs are identical in length and sequence). E.g., [ATCT]11.
   - 6.2 Compound repeats (two or more simple adjacent repeats/motifs). E.g., TCTA TCTG [TCTA]12.
   - 6.3 Complex repeats (combination of repeat units/motifs with variable length and/or sequence). E.g., [TCTA]6 [TCTG]5 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCATA [TCTA]10.
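Two of the variables above, LUSRep and Complexity, can be derived directly from a bracket-format sequence. The sketch below is a heuristic reading of the category definitions, not the authors' procedure: it assumes one distinct motif means "simple", several motifs of equal length means "compound", and motifs of differing length means "complex". The helper names are hypothetical.

```python
import re
from typing import List, Tuple

Block = Tuple[str, int]  # (motif, number of repeats)


def parse_bracket(seq: str) -> List[Block]:
    """Parse e.g. 'TCTA TCTG [TCTA]12' into [('TCTA', 1), ('TCTG', 1), ('TCTA', 12)]."""
    blocks = []
    for token in seq.split():
        repeated = re.fullmatch(r"\[([ACGT]+)\](\d+)", token)
        if repeated:
            blocks.append((repeated.group(1), int(repeated.group(2))))
        elif re.fullmatch(r"[ACGT]+", token):
            blocks.append((token, 1))  # unbracketed unit occurs once
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return blocks


def lus_rep(seq: str) -> int:
    """Repeat count of the longest uninterrupted stretch."""
    return max(count for _, count in parse_bracket(seq))


def complexity(seq: str) -> str:
    """Heuristic category (ASSUMPTION, see lead-in): simple, compound or complex."""
    motifs = {motif for motif, _ in parse_bracket(seq)}
    if len(motifs) == 1:
        return "simple"
    if len({len(motif) for motif in motifs}) == 1:
        return "compound"
    return "complex"


if __name__ == "__main__":
    for s in ("[ATCT]11", "TCTA TCTG [TCTA]12"):
        print(s, lus_rep(s), complexity(s))
```

Applied to the three example sequences in the list, this heuristic returns the categories under which they are shown.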
2.6.2.2 Model comparison
1. All loci and stutter types together (ALL model)
2. Per marker, but all stutter types together (per-marker model)
3. Per stutter type, but all markers together (per-stuttertype model)
4. Per marker and per stutter type (per-marker-stuttertype model)
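The four modelling strategies above differ only in how the observations are partitioned before a model is fitted to each subset. A minimal sketch of that partitioning, assuming each observation is a dictionary with hypothetical `locus` and `stutter_type` keys:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Observation = Dict[str, object]

# Grouping keys per strategy; the key names are assumptions for illustration.
STRATA: Dict[str, Tuple[str, ...]] = {
    "ALL": (),
    "per-marker": ("locus",),
    "per-stuttertype": ("stutter_type",),
    "per-marker-stuttertype": ("locus", "stutter_type"),
}


def split_by_strategy(observations: Sequence[Observation],
                      strategy: str) -> Dict[tuple, List[Observation]]:
    """Partition observations into the subsets that each get their own model."""
    keys = STRATA[strategy]
    groups: Dict[tuple, List[Observation]] = defaultdict(list)
    for obs in observations:
        groups[tuple(obs[k] for k in keys)].append(obs)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical observations.
    data = [
        {"locus": "D12S391", "stutter_type": "n-1", "proportion": 0.10},
        {"locus": "D12S391", "stutter_type": "n+1", "proportion": 0.01},
        {"locus": "FGA", "stutter_type": "n-1", "proportion": 0.08},
        {"locus": "FGA", "stutter_type": "n+1", "proportion": 0.01},
    ]
    for name in STRATA:
        print(name, len(split_by_strategy(data, name)))
```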
2.6.3 Characterization of LUS and non-LUS stutters
Locus | Stutter type | Group: MotifDiff | Group: MotifType | Sequence type | Sequence (bracket format) |
---|---|---|---|---|---|
D12S391 | n-1 | AGAT:−1 | LUS | Parental | [AGAT]14 [AGAC]9 |
D12S391 | n-1 | AGAT:−1 | LUS | Stutter | [AGAT]13 [AGAC]9 |
D12S391 | n-1 | AGAC:−1 | non-LUS | Parental | [AGAT]14 [AGAC]9 |
D12S391 | n-1 | AGAC:−1 | non-LUS | Stutter | [AGAT]14 [AGAC]8 |
D6S1043 | n-1 | ATCT:−1 | LUS | Parental | [ATCT]6 ATGT [ATCT]13 |
D6S1043 | n-1 | ATCT:−1 | LUS | Stutter | [ATCT]6 ATGT [ATCT]12 |
D6S1043 | n-1 | ATCT:−1 | non-LUS | Parental | [ATCT]6 ATGT [ATCT]13 |
D6S1043 | n-1 | ATCT:−1 | non-LUS | Stutter | [ATCT]5 ATGT [ATCT]13 |
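The grouping in the table above can be reproduced by comparing a parental sequence with its stutter block by block. The sketch below is one possible way to do this, assuming both sequences share the same block structure and differ in exactly one block; `motif_diff` is a hypothetical helper, and the output uses an ASCII hyphen where the table uses a minus sign.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\[([ACGT]+)\](\d+)|([ACGT]+)")


def blocks(seq: str) -> List[Tuple[str, int]]:
    """'[ATCT]6 ATGT [ATCT]13' -> [('ATCT', 6), ('ATGT', 1), ('ATCT', 13)]."""
    return [(motif, int(count)) if motif else (single, 1)
            for motif, count, single in TOKEN.findall(seq)]


def motif_diff(parent: str, stutter: str) -> Tuple[str, str]:
    """Return (MotifDiff, MotifType) for a parent/stutter pair."""
    p, s = blocks(parent), blocks(stutter)
    if len(p) != len(s) or any(pm != sm for (pm, _), (sm, _) in zip(p, s)):
        raise ValueError("block structures differ; pair not handled by this sketch")
    changed = [(i, sc - pc)
               for i, ((_, pc), (_, sc)) in enumerate(zip(p, s)) if sc != pc]
    if len(changed) != 1:
        raise ValueError("expected exactly one changed block")
    index, diff = changed[0]
    # The LUS is the parental block with the most repeats (first one on ties).
    lus_index = max(range(len(p)), key=lambda j: p[j][1])
    motif_type = "LUS" if index == lus_index else "non-LUS"
    return f"{p[index][0]}:{diff:+d}", motif_type


if __name__ == "__main__":
    print(motif_diff("[AGAT]14 [AGAC]9", "[AGAT]13 [AGAC]9"))
    print(motif_diff("[ATCT]6 ATGT [ATCT]13", "[ATCT]5 ATGT [ATCT]13"))
```

The D6S1043 rows show why the block position matters: both stutters are ATCT:−1, but only the one arising in the 13-repeat stretch is a LUS stutter.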
2.6.4 Characterization of n0 stutters
3. Results
3.1 Quality check with FastQC
3.2 Genotype identification

3.3 Stutter analysis
3.3.1 Characterization of LUS stutters
3.3.1.1 Model comparison
| Linear predictor (Model) | Model type | logLik | Param (n) | nObs | BIC |
|---|---|---|---|---|---|
| | per marker / stutter type | 15497.85889 | 135 | 4813 | -29851 |
| | per marker / stutter type | 15530.10966 | 146 | 4813 | -29822 |
3.3.1.2 Extension of the precision model
Beta regression model type | logLik | nParam | nObs | BIC |
---|---|---|---|---|
Constant precision (small model) | 15497.86 | 135 | 4813 | -29851 |
Variable precision (large model) | 15566.32 | 180 | 4813 | -29606.4 |
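The BIC values in the tables above follow from the standard definition BIC = -2 logLik + k ln(n), with k parameters and n observations. The short check below recomputes them from the tabulated logLik, parameter counts and nObs; the dictionary labels are shorthand introduced for this sketch.

```python
import math


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2*logLik + k*ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# (logLik, number of parameters, number of observations) as tabulated above.
MODELS = {
    "135 parameters (constant precision)": (15497.85889, 135, 4813),
    "146 parameters": (15530.10966, 146, 4813),
    "180 parameters (variable precision)": (15566.32, 180, 4813),
}

if __name__ == "__main__":
    for label, (log_lik, k, n) in MODELS.items():
        print(f"{label}: BIC = {bic(log_lik, k, n):.1f}")
```

Under the usual convention that a lower BIC is preferred, the gain in log-likelihood of the larger models does not offset the penalty for their additional parameters.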
3.3.1.3 Stutter proportion sizes

3.3.2 Characterization of LUS and non-LUS stutters

3.3.3 Characterization of n0 stutters

4. Discussion
4.1 Setting the analytical thresholds
4.2 Stutter ratio vs stutter proportion
4.3 Model comparison for LUS stutters
4.4 Characterization of LUS stutters
4.5 Characterization of LUS and non-LUS stutters
4.6 Characterization of n0 stutters
4.7 Future research. Improvement of present model
4.8 MPSproto
4.8.1 Extension of the EuroForMix model (MPSproto)
1. Stutter model,
2. Noise model, and
3. Marker efficiency.
4.8.2 An illustration of mixture interpretation using MPSproto and EuroForMix
- Hp: “Ref1 + Ref2 are contributors”
- Hd: “Ref1 + 1 (unrelated) unknown are contributors”
4.8.3 Comparison of the two models
- D12S391: An allele dropout for Ref2 using T = 30. Several stutter artefacts cannot be accounted for by EuroForMix but are accounted for by MPSproto.
- D1S1656: An allele dropout for Ref2 using T = 30.
- FGA: An additional artefact that is not accounted for by EuroForMix but is accounted for by MPSproto.
- Penta E: Both alleles of Ref2 drop out using T = 30, but only one drops out using T = 10.
- vWA: A homozygous allele dropout for Ref2 using T = 30.
- D21S11: An n+1 stutter artefact (12 reads) was not captured by the MPSproto stutter model because the calibration dataset contained too few observations for it to be part of the model. The log10 LR increased by 0.3 when this stutter was accounted for in MPSproto.

5. Conclusion
CRediT authorship contribution statement
Acknowledgements
Appendix A. Supplementary material
Supplementary material.
References
1. Massively parallel sequencing techniques for forensics: a review. Electrophoresis. 2018; 39: 2642-2654
2. The first GHEP-ISFG collaborative exercise on forensic applications of massively parallel sequencing. Forensic Sci. Int. Genet. 2020; 49: 102391
3. Current state-of-art of STR sequencing in forensic genetics. Electrophoresis. 2018; 39: 2655-2668
4. European survey on forensic applications of massively parallel sequencing. Forensic Sci. Int. Genet. 2017; 29: e23-e25
5. FDSTools: a software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise. Forensic Sci. Int. Genet. 2017; 27: 27-40
6. Sequencing of 231 forensic genetic markers using the MiSeq FGx™ forensic genomics system – an evaluation of the assay and software. Forensic Sci. Res. 2018; 3: 111-123
7. Stutter analysis of complex STR MPS data. Forensic Sci. Int. Genet. 2018; 35: 107-112
8. From next generation sequencing to now generation sequencing in forensics. Forensic Sci. Int. Genet. 2019; 38: 175-180
9. A review of bioinformatic methods for forensic DNA analyses. Forensic Sci. Int. Genet. 2018; 33: 117-128
10. Sequencing technologies and tools for short tandem repeat variation detection. Brief. Bioinform. 2015; 16: 193-204
11. Understanding the behavior of stutter through the sequencing of STR alleles. Forensic Sci. Int. Genet. Suppl. Ser. 2019; 7: 115-116
12. A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis. In: Kalendar R., editor. PLOS One. 2017; 12
13. Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus VWA. Nucleic Acids Res. 1996; 24: 2807-2812
14. A study of the origin of ‘shadow bands’ seen when typing dinucleotide repeat polymorphisms by the PCR. Hum. Mol. Genet. 1993; 2: 411-415
15. Detection and quantitative characterization of artificial extra peaks following polymerase chain reaction amplification of 14 short tandem repeat systems used in forensic investigations. Electrophoresis. 1997; 18: 1928-1935
16. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 1987; 4: 203-221
17. Polymerase slippage in relation to the uniformity of tetrameric repeat stretches. Forensic Sci. Int. 2003; 135: 163-166
18. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 1992; 20: 211-215
19. Gill P., Bleka Ø, Hansson O., Benschop C., Haned H. Chapter 2 - Empirical characterization of DNA profiles. In: Forensic Practitioner’s Guide to the Interpretation of Complex DNA Profiles [Internet]. Academic Press; 2020. p. 55–88. Available from: 〈https://doi.org/10.1016/B978-0-12-820562-4.00010-9〉.
20. Butler JM. Forensic DNA typing: biology, technology, and genetics of STR markers. 2nd ed. Amsterdam; Boston: Elsevier Academic Press; 2005. 660 p.
21. DNA commission of the international society of forensic genetics: recommendations on the interpretation of mixtures. Forensic Sci. Int. 2006; 160: 90-101
22. Characterisation of forward stutter in the AmpFlSTR® SGM Plus® PCR. Sci. Justice. 2009; 49: 24-31
23. Characterising stutter in forensic STR multiplexes. Forensic Sci. Int. Genet. 2012; 6: 58-63
24. Assessment of the stochastic threshold, back- and forward stutter filters and low template techniques for NGM. Forensic Sci. Int. Genet. 2012; 6: 708-715
25. Developing allelic and stutter peak height models for a continuous method of DNA interpretation. Forensic Sci. Int. Genet. 2013; 7: 296-304
26. Investigation into stutter ratio variability between different laboratories. Forensic Sci. Int. Genet. 2014; 13: 79-81
27. Identifying and modelling the drivers of stutter in forensic DNA profiles. Aust. J. Forensic Sci. 2014; 46: 194-203
28. Compound stutter in D2S1338 and D12S391. Forensic Sci. Int. Genet. 2019; 39: 50-56
29. Characterizing stutter variants in forensic STRs with massively parallel sequencing. Forensic Sci. Int. Genet. 2020; 45: 102225
30. Geographical heterogeneity of Y-chromosomal lineages in Norway. Forensic Sci. Int. 2006; 164: 10-19
31. Verogen. ForenSeq DNA Signature Prep Reference Guide. 2018;42.
32. Thermo Fisher Scientific I. Qubit 4 Fluorometer User Guide. 2018;72.
33. Illumina. MiSeq FGx Instrument Reference Guide. 2015;84.
34. Andrews, Simon. Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data [Internet]. [cited 2021 Aug 2]. Available from: 〈https://www.bioinformatics.babraham.ac.uk/projects/fastqc/〉.
35. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32: 3047-3048
36. Fast STR allele identification with STRait Razor 3.0. Forensic Sci. Int. Genet. 2017; 30: 18-23
37. Mitchell, R., Standage, D. lusSTR [Internet]. bioforensics; 2021 [cited 2021 Aug 2]. Available from: 〈https://github.com/bioforensics/lusSTR〉.
38. R Core Team (2021). R: A language and environment for statistical computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing. Available from: 〈https://www.R-project.org/〉.
39. Qualitative and quantitative assessment of Illumina’s forensic STR and SNP kits on MiSeq FGx™. In: Kalendar R., editor. PLOS One. 2017; 12
40. Fox EJ, Reid-Bayliss KS. Accuracy of Next Generation Sequencing Platforms. J Gener Seq Appl [Internet]. 2014 [cited 2021 Jun 4];01(01). Available from: 〈https://www.omicsonline.org/open-access/accuracy-of-next-generation-sequencing-platforms-jngsa.1000106.php?aid=28132〉.
41. Analysis and interpretation of mixed forensic stains using DNA STR profiling. Forensic Sci. Int. 1998; 91: 55-70
42. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004; 31: 799-815
43. Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Stat. Model. 2003; 3: 193-213
44. Model selection criteria in beta regression with varying dispersion. Commun. Stat. Simul. Comput. 2017; 46: 729-46
45. Thomopoulos NT. Statistical Distributions [Internet]. Cham: Springer International Publishing; 2017 [cited 2021 Aug 19]. Available from: 〈http://link.springer.com/10.1007/978-3-319-65112-5〉.
46. Seefeld K., Ed M., Linder E. Statistics Using R with Biological Examples [Internet]. Durham, NH: University of New Hampshire, Department of Mathematics & Statistics; 2007. Available from: 〈https://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf〉.
47. Beta regression in R. J. Stat. Softw. [Internet]. 2010 [cited 2021 Apr 24]; 34(2)
48. Biology and genetics of new autosomal STR loci useful for forensic DNA analysis. Forensic Sci. Rev. 2012; 24: 15-26
49. Bayesian Data Analysis. 3rd ed. Boca Raton: CRC Press; 2014. 639 p.
50. Shmueli G. To Explain or to Predict? Stat Sci [Internet]. 2010 Aug 1 [cited 2022 Jan 20];25(3). Available from: 〈https://projecteuclid.org/journals/statistical-science/volume-25/issue-3/To-Explain-or-to-Predict/10.1214/10-STS330.full〉.
51. Investigation of the STR loci noise distributions of PowerSeq™ auto system. Croat. Med. J. 2017; 58: 214-221
52. Understanding the characteristics of sequence-based single-source DNA profiles. Forensic Sci. Int. Genet. 2020; 44: 102192
53. Analysis of global variability in 15 established and 5 new European Standard Set (ESS) STRs using the CEPH human genome diversity panel. Forensic Sci. Int. Genet. 2011; 5: 155-169
54. Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results. Forensic Sci. Int. Genet. 2018; 34: 197-205
55. Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers. Forensic Sci. Int. Genet. 2021; 51: 102459
56. Flanking variation influences rates of stutter in simple repeats. Genes. 2017; 8: 329
57. An examination of STR nomenclatures, filters and models for MPS mixture interpretation. Forensic Sci. Int. Genet. 2020; 48: 102319
58. Gill P., Bleka Ø, Benschop C., Haned H. Forensic Practitioner’s Guide to the Interpretation of Complex DNA Profiles [Internet]. First. Elsevier; 2020 [cited 2021 Nov 17]. Available from: 〈https://linkinghub.elsevier.com/retrieve/pii/C20190012332〉.
59. EuroForMix: An open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts. Forensic Sci. Int. Genet. 2016; 21: 35-44
60. Chapter 7 - The quantitative (continuous) model theory. In: Forensic Practitioner’s Guide to the Interpretation of Complex DNA Profiles. London: Academic Press; 2020: 181-238. Available from: 〈https://doi.org/10.1016/B978-0-12-820562-4.00010-9〉.
61. Modeling allelic analyte signals for aSTRs in NGS DNA profiles. J. Forensic Sci. 2021; 66: 1234-1245
User license: Creative Commons Attribution (CC BY 4.0)