A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt

Published: October 25, 2017
DOI:


      • We describe a large-scale database consisting of over 25,000 STR profiles.
      • The samples were generated over a four-year period using 144 laboratory conditions.
      • The database consists of samples containing DNA from one to five persons.


      DNA-based human identity testing is conducted by comparing PCR-amplified polymorphic Short Tandem Repeat (STR) motifs from a known source with STR profiles obtained from samples of uncertain origin. Samples such as those found at crime scenes often yield a signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, making interpretation an arduous task. To facilitate progress on these STR interpretation challenges, we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage measured by quantitative and end-point PCR. In addition, we characterize the complexity of the signal by examining the number of detected alleles in each profile.
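Counting detected alleles is a common first summary of mixture complexity. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a profile is stored as a mapping from locus name to the allele designations called above an analytical threshold, with stutter already removed. The function names and the example loci and alleles are hypothetical. It reports the per-locus allele count, the total allele count (TAC), the maximum allele count (MAC), and the usual allele-counting lower bound on the number of contributors, ceil(MAC / 2). This bound follows from the fact that each diploid person carries at most two alleles per autosomal locus. Because drop-out and allele sharing hide alleles, the true number of contributors can exceed this bound.

```python
import math
from typing import Dict, List

# Hypothetical representation (an assumption, not the dataset's file format):
# locus name -> allele designations called above the analytical threshold,
# with stutter peaks assumed to be filtered out already.
Profile = Dict[str, List[str]]


def allele_counts(profile: Profile) -> Dict[str, int]:
    """Number of distinct alleles detected at each locus."""
    return {locus: len(set(alleles)) for locus, alleles in profile.items()}


def total_allele_count(profile: Profile) -> int:
    """Total allele count (TAC): distinct alleles summed over all loci."""
    return sum(allele_counts(profile).values())


def max_allele_count(profile: Profile) -> int:
    """Maximum allele count (MAC) at any single locus; 0 for an empty profile."""
    return max(allele_counts(profile).values(), default=0)


def min_contributors(profile: Profile) -> int:
    """Allele-counting lower bound on the number of contributors.

    Each diploid contributor carries at most two alleles per autosomal
    locus, so at least ceil(MAC / 2) people are needed to explain the
    busiest locus. Drop-out and allele sharing can hide alleles, so the
    true number of contributors may be larger than this bound.
    """
    return math.ceil(max_allele_count(profile) / 2)


if __name__ == "__main__":
    # Hypothetical three-locus profile, for illustration only.
    example = {
        "D8S1179": ["12", "13", "14"],
        "D21S11": ["28", "29", "30", "31", "32"],
        "TH01": ["6", "9.3"],
    }
    print(allele_counts(example))       # {'D8S1179': 3, 'D21S11': 5, 'TH01': 2}
    print(total_allele_count(example))  # 10
    print(max_allele_count(example))    # 5
    print(min_contributors(example))    # 3
```

For the example profile, five alleles at one locus cannot come from fewer than three people. A two-person mixture with heavy drop-out, however, could show as few as one or two alleles per locus, which is why allele counts alone tend to underestimate the number of contributors in low-template or compromised samples.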

