Research paper | Volume 57, 102634, March 2022

Streamlining the decision-making process for international DNA kinship matching using Worldwide allele frequencies and tailored cutoff log10LR thresholds

Open Access | Published: November 26, 2021 | DOI: https://doi.org/10.1016/j.fsigen.2021.102634

      Highlights

      • International DNA kinship matching procedure for missing person cases and DVI.
      • Worldwide allele frequencies to evaluate kinship when ancestry is unknown.
      • Tailored cutoff log10LR thresholds determined for the 10 most common scenarios.
      • Interpretation tables to evaluate the match (report, reject or require more data).

      Abstract

      The identification of human remains belonging to missing persons is one of the main challenges for forensic genetics. Although other means of identification can be applied to missing person investigations, DNA is often extremely valuable to further support or refute potential associations. When reference DNA samples cannot be collected from personal items belonging to a missing person, a direct DNA identification cannot be carried out. However, identifications can be made indirectly using DNA from the missing person’s relatives. The ranking of likelihood ratio (LR) values, which measure the fit of a missing person for any given pedigree, is often the first step in selecting candidates in a DNA database. Although implementing DNA kinship matching in a national environment is feasible, many challenges need to be resolved before applying this method to an international configuration. In this study, we present an innovative and intuitive method to perform international DNA kinship matching and facilitate the comparison of DNA profiles when the ancestry is unknown or unsure and/or when different marker sets are used. This straightforward method, which is based on calculations performed with the DNA matching software BONAPARTE, Worldwide allele frequencies and tailored cutoff log10LR thresholds, allows for the classification of potential candidates according to the strength of the DNA evidence and the predicted proportion of adventitious matches. This is a powerful method for streamlining the decision-making process in missing person investigations and DVI processes, especially when there are low numbers of overlapping typed STRs. Intuitive interpretation tables and a decision tree will help strengthen international data comparison for the identification of reported missing individuals discovered outside their national borders.

      1. Introduction

      In 2018, it was estimated that 100,000 people around the world were reported to be missing [
      Declaration from Agnès Coutou, ICRC’s Protection Adviser, at the 2018 U.N.
      ]. While the great majority of reported missing persons (MP) are found safe, many families of missing persons live in constant grief and sorrow, often waiting years for news of their loved ones. A number of countries have established national missing person DNA programmes, which are very effective when both the disappearance of individuals and the discovery of unidentified human remains (UHR) occur within the same country. However, many countries have unsolved missing person investigations and UHR that cannot be identified using their national systems alone. Moreover, the coordination of international casework needs to be adequately considered in conjunction with the necessary standards to enable routine and effective identification. International cooperation in missing person investigations is therefore highly advisable in light of the ease of international travel (for business and leisure), increased global migration, the consequences of growing transnational crime and human trafficking, the vulnerability of migrants and refugees, and the high risk they run of falling victim to crime. In the event that human remains are discovered and are thought to belong to a reported missing person, a comparison between an ante-mortem DNA profile (conventionally obtained from personal items belonging to the missing person, such as a toothbrush or hairbrush, or from a previous medical sample) and a post-mortem DNA profile (obtained from human remains) is the most reliable method for identification [

      Recommendations on the Use of DNA for the Identification of Missing Persons and Unidentified Human Remains by the INTERPOL DNA Monitoring Expert Group, (2017). 〈http://www.interpol.int〉.

      ]. Autosomal STR-based profiles are the most conventional form of DNA data available in police investigations but additional DNA data information, including Y-STRs, mitochondrial DNA and/or SNP (Single Nucleotide Polymorphisms) may also be obtained from biological samples [
      • Budowle B.
      • Allard M.W.
      • Wilson M.R.
      • Chakraborty R.
      Forensics and mitochondrial DNA: applications, debates, and foundations.
      ,
      • Butler J.M.
      The future of forensic DNA analysis.
      ,
      • Kayser M.
      Forensic use of Y-chromosome DNA: a general overview.
      ,
      • Phillips C.
      • Manzo L.
      • de la Puente M.
      • Fondevila M.
      • Lareu M.V.
      The MASTiFF panel-a versatile multiple-allele SNP test for forensics.
      ,
      • Laurent F.-X.
      • Vibrac G.
      • Rubio A.
      • Thévenot M.-T.
      • Pène L.
      Les nouvelles technologies d′analyses ADN au service des enquêtes judiciaires.
      ]. However, in many cases, direct DNA matches are not possible as ante-mortem DNA profiles are either unavailable or insufficient to confirm the missing person's identity. This could be due to the lack of available personal items or medical records, often encountered in missing person investigations, or due to the displacement of populations and the destruction or lack of property. Consequently, in the majority of cases, ante-mortem DNA data can only be obtained by relatives donating biological samples to the requesting authorities.
      While most laboratories have the capacity and experience to perform relatively simple kinship testing, such as paternity tests, the evaluation of complex kinship scenarios is more challenging [
      • Coble M.D.
      • Buckleton J.
      • Butler J.M.
      • Egeland T.
      • Fimmers R.
      • Gill P.
      • Gusmão L.
      • Guttman B.
      • Krawczak M.
      • Morling N.
      • Parson W.
      • Pinto N.
      • Schneider P.M.
      • Sherry S.T.
      • Willuweit S.
      • Prinz M.
      DNA Commission of the International Society for Forensic Genetics: Recommendations on the validation of software programs performing biostatistical calculations for forensic genetics applications.
      ]. Specialized computer programs are often required to undertake comparisons of ante-mortem and post-mortem data to perform complex kinship calculations with large datasets of DNA profiles [
      • Dongen C.J.B.
      • Slooten K.
      • Burgers W.
      • Wiegerinck W.
      Bayesian networks for victim identification on the basis of DNA profiles.
      ,
      • Slooten K.
      Validation of DNA-based identification software by computation of pedigree likelihood ratios.
      ,
      • Kling D.
      • Tillmar A.O.
      • Egeland T.
      Familias 3 – extensions and new functionality.
      ,
      • Morimoto C.
      • Tsujii H.
      • Manabe S.
      • Fujimoto S.
      • Hirai E.
      • Hamano Y.
      • Tamaki K.
      Development of a software for kinship analysis considering linkage and mutation based on a Bayesian network.
      ,
      • Starinsky-Elbaz S.
      • Ram T.
      • Voskoboinik L.
      • Pasternak Z.
      Weight-of-evidence for DNA identification of missing persons and human remains using CODIS.
      ]. These programs compute likelihood ratios (LR), often presented as the logarithm of the likelihood ratio using base 10 (log10LR), which is the optimal basis for statistical decisions, regardless of whether there is a hypothesis about prior probabilities [
      • Collins A.
      • Morton N.E.
      Likelihood ratios for DNA identification.
      ]. Using specific allele frequencies from the reference population to which the missing person belongs, the fit of a missing person for any given pedigree (e.g. parent, child, or sibling of the missing person) is measured by comparing two hypotheses: H1 and H2. H1 supports that the individual is related to a defined pedigree, whereas H2 states that they are unrelated [
      • Coble M.D.
      • Buckleton J.
      • Butler J.M.
      • Egeland T.
      • Fimmers R.
      • Gill P.
      • Gusmão L.
      • Guttman B.
      • Krawczak M.
      • Morling N.
      • Parson W.
      • Pinto N.
      • Schneider P.M.
      • Sherry S.T.
      • Willuweit S.
      • Prinz M.
      DNA Commission of the International Society for Forensic Genetics: Recommendations on the validation of software programs performing biostatistical calculations for forensic genetics applications.
      ]. Although this method is relatively straightforward to implement in a national environment where the MP and UHR are reported within the same country, many challenges need to be resolved before applying this method to an international configuration.
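The comparison of H1 and H2 described above can be illustrated with a minimal sketch for the simplest pedigree, a single typed parent and a candidate child. It is not the BONAPARTE implementation: it assumes Hardy-Weinberg equilibrium, no mutation, no ϴ correction and independent loci, and the function names and allele frequencies are hypothetical.

```python
import math

def parent_child_lr(parent, child, freq):
    """Single-locus LR for H1 (parent-child) versus H2 (unrelated).

    parent, child: genotypes as 2-tuples of allele labels.
    freq: dict mapping allele label to population frequency (hypothetical values).
    Simplifying assumptions: HWE, no mutation, no theta correction.
    """
    a, b = child

    def transmit(allele):
        # Probability that the parent passes on this allele (0, 0.5 or 1).
        return parent.count(allele) / 2

    if a == b:
        numerator = transmit(a) * freq[a]          # P(child | parent, H1)
        denominator = freq[a] ** 2                 # P(child | H2)
    else:
        numerator = transmit(a) * freq[b] + transmit(b) * freq[a]
        denominator = 2 * freq[a] * freq[b]
    return numerator / denominator

def log10_lr(per_locus_lrs):
    """Combine independent per-locus LRs into a single log10LR."""
    if any(lr == 0 for lr in per_locus_lrs):
        # Without a mutation model, one incompatible locus excludes the relationship.
        return float("-inf")
    return sum(math.log10(lr) for lr in per_locus_lrs)
```

For a heterozygous parent and a heterozygous child sharing one allele of frequency p, this reduces to the familiar 1/(4p). Real casework software additionally models mutation and population substructure, so that a single incompatible locus does not drive the LR to zero.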
      The first challenge is the choice of method to perform the DNA kinship matching computations. Most methods use the observed number of alleles shared identical by state (IBS) to estimate the identical by descent (IBD) sharing probabilities or kinship coefficients [
      • Jin Y.
      • Schäeffer A.A.
      • Sherry S.T.
      • Feolo M.
      Quickly identifying identical and closely related subjects in large databases using genotype data.
      ]. The segment approach (used in Direct-To-Consumer genetic testing and genealogy applications) or the use of long runs of IBD to perform kinship matching appear to be the best methods for deciphering biological relationships outside the immediate family [
      • Henn B.M.
      • Hon L.
      • Macpherson J.M.
      • Eriksson N.
      • Saxonov S.
      • Pe’er I.
      • Mountain J.L.
      Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.
      ]. However, most of these methods require a very high marker density, with several hundred markers per centimorgan [
      • Seidman D.N.
      • Shenoy S.A.
      • Kim M.
      • Babu R.
      • Woods I.G.
      • Dyer T.D.
      • Lehman D.M.
      • Curran J.E.
      • Duggirala R.
      • Blangero J.
      • Williams A.L.
      Rapid, phase-free detection of long identity-by-descent segments enables effective relationship classification.
      ]. While these methods are readily applicable to SNPs, they are unsuitable for STRs. Indeed, only a handful of STR loci are commonly used in forensics, and these loci are well separated from each other and considered difficult to sequence accurately using long reads [
      • Tytgat O.
      • Gansemans Y.
      • Weymaere J.
      • Rubben K.
      • Deforce D.
      • Van Nieuwerburgh F.
      Nanopore sequencing of a forensic STR multiplex reveals loci suitable for single-contributor STR profiling.
      ]. STR-based profiles are the most conventional form of DNA data available in criminal and missing person investigations. Additionally, most national DNA databases only accept autosomal STR data, making the comparison of other sources of data almost impossible for the time being.
      The second challenge is access to reliable intelligence regarding the ancestry of the UHR sample, and to a lesser extent, that of the missing person him or herself. DNA kinship matching, based on autosomal STR profiles obtained through sharing international data, requires the use of allele frequencies from reference populations [
      • Goudet J.
      • Kay T.
      • Weir B.S.
      How to estimate kinship.
      ]. National or continental datasets can be used depending on data availability and on the accessibility of information regarding the ancestry of the individuals whose biological relationships will be tested. However, when applying this DNA kinship analysis to missing person investigations, intelligence on the ancestry of DNA profiles is very often lacking. Indeed, when performing international DNA kinship matching, the human remains may belong to a person whose ancestry differs completely from that of the population of the country where the remains were found, and the ancestry of the family members included in the tested pedigree could be unknown or inaccurately communicated due to false assumptions regarding nationality or country of origin. In both cases, the use of allele frequencies from the wrong reference population is thought to lead to erroneous conclusions, considering that rare alleles in one population may be more common in another one, leading to high false positive rates [
      • Goudet J.
      • Kay T.
      • Weir B.S.
      How to estimate kinship.
      ,
      • Fortier A.L.
      • Kim J.
      • Rosenberg N.A.
      Human-genetic ancestry inference and false positives in forensic familial searching.
      ]. One potential solution could involve the ancestry prediction of the unidentified human remains from the available DNA data (i.e. the 13 CODIS STR set). Algee-Hewitt et al. previously demonstrated that despite having been selected for individual identification and not for ancestry inference, the CODIS markers generated a non-trivial model-based clustering pattern [
      • Algee-Hewitt B.F.B.
      • Edge M.D.
      • Kim J.
      • Li J.Z.
      • Rosenberg N.A.
      Individual identifiability predicts population identifiability in forensic microsatellite markers.
      ]. Although the 13 CODIS markers have relatively low theta (ϴ) values, their high heterozygosities produce continental ancestry inference potential. Although this solution could theoretically be applied to some cases, little is known about the performance of these methods on admixed individuals (with parents or grandparents from diverse ancestry) or individuals from isolated populations. Admixed individuals who do not fully belong to a predefined continental group may require the use of a reference population whose allele frequencies are unavailable [
      • Thornton T.
      • Tang H.
      • Hoffmann T.J.
      • Ochs-Balcom H.M.
      • Caan B.J.
      • Risch N.
      Estimating kinship in admixed populations.
      ,
      • Dou J.
      • Sun B.
      • Sim X.
      • Hughes J.D.
      • Reilly D.F.
      • Tai E.S.
      • Liu J.
      • Wang C.
      Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.
      ]. Some forensic DNA laboratories perform the DNA kinship matching calculation using several sets of allele frequencies from various reference populations, but the interpretation of numerous log10LR values that can have significant differences due to the rarity of some alleles in some populations is very complex overall [
      • Goudet J.
      • Kay T.
      • Weir B.S.
      How to estimate kinship.
      ]. A final possibility could be to use several sets of allele frequencies for the two opposite hypotheses in a Bayesian calculation. The calculation would be more accurate if the correct allele frequencies were used at least for H1 if the ancestry of the MP were known. However, the majority of forensic software available, including BONAPARTE, does not support this type of calculation [
      • Kling D.
      • Tillmar A.O.
      • Egeland T.
      Familias 3 – extensions and new functionality.
      ,
      • van Dongen C.J.
      • Slooten K.
      • Slagter M.
      • Burgers W.
      • Wiegerinck W.
      Bonaparte: application of new software for missing persons program.
      ].
      The third challenge concerns the number of overlapping STR loci available for comparison which can be compounded at the international level. Although some countries use commercial STR typing technologies with a large range of STR loci, other countries employ less up-to-date technologies with smaller, or alternative, STR panels. This results in partial DNA matches between UHR and DNA profiles from relatives of a missing person and consequently leads to weaker log10LR values, making it more difficult to reach a conclusion [
      • Hines D.Z.C.
      • Vennemeyer M.
      • Amory S.
      • Huel R.L.M.
      • Hanson I.
      • Katzmarzyk C.
      • Parsons T.J.
      Prioritized sampling of bone and teeth for DNA analysis in commingled cases.
      ,
      • Amorim A.
      • Pereira L.
      Pros and cons in the use of SNPs in forensic kinship investigation: a comparative analysis with STRs.
      ]. Likewise, UHR samples from old human remains or damaged tissue are subject to DNA degradation and prevent full DNA profiles from being obtained.
      The fourth and final challenge concerns the log10LR interpretation and the report of a potential match. When DNA database searches are performed, a log10LR will be computed for each pedigree against all UHR samples available in the database. Generally, an arbitrary log10LR threshold is used to determine if a match should be reported or not [

      Familial DNA Database Search System-Hardware/Software Integration Project, Natl. Inst. Justice. (n.d.). https://nij.ojp.gov/library/publications/familial-dna-database-search-system-hardwaresoftware-integration-project (accessed January 20, 2021).

      ,
      • Ge J.
      • Budowle B.
      Kinship index variations among populations and thresholds for familial searching.
      ]. Such arbitrary thresholds can be adjusted using decision theory to balance the usefulness of true positive results against the uselessness of false positive results. Ultimately, however, the same threshold is applied to every kinship DNA case, regardless of the available pedigree. When performing DNA kinship matching, the weight of the DNA evidence is obviously increased by adding as many close relatives as possible to the pedigree or by performing pair-wise searches [
      • Ge J.
      • Budowle B.
      • Chakraborty R.
      Choosing relatives for DNA identification of missing persons.
      ,
      • Vigeland M.D.
      • Marsico F.L.
      • Herrera Piñero M.
      • Egeland T.
      Prioritising family members for genotyping in missing person cases: a general approach combining the statistical power of exclusion and inclusion.
      ,
      • Brustad H.K.
      • Colucci M.
      • Jobling M.A.
      • Sheehan N.A.
      • Egeland T.
      Strategies for pairwise searches in forensic kinship analysis.
      ]. The reality is that the best scenario to support the identification may not be feasible as it depends directly on which relatives were selected and were available to have their DNA collected at the time of the investigation, which can date back several decades. As a result, the great majority of pedigrees are only composed of one close relative (usually a parent or a child) or more distant family members (siblings, grandparents, grandchildren, etc.). As the relatives become more genetically distant from the missing person, the value of ante-mortem DNA data diminishes (e.g. parents and children yield more informative results than siblings or cousins) [
      • Ge J.
      • Budowle B.
      • Chakraborty R.
      Choosing relatives for DNA identification of missing persons.
      ]. This means that using a single threshold may be too stringent for several types of pedigrees and/or for partial DNA matches. Indeed, if the threshold is too low, it will lead to an increase in the number of false positive matches needing to be reviewed. On the other hand, a high threshold will decrease the number of true positive matches reported, leading to fewer identifications. This is particularly true for pedigrees providing a low level of genetic information, where the majority of LR values are below the conventional threshold used for the majority of DNA kinship cases. Although the threshold is informative about the maximum proportion of expected false positive matches (i.e. a log10LR threshold of 3 means that no more than one in 1,000 unrelated individuals achieves this score per missing person), it does not precisely estimate the true number of false positive matches that may be encountered. The number of false positive matches can be particularly high when pedigrees are compared with large databases of UHR profiles, resulting in an increased workload as ante-mortem and post-mortem data will have to be cross-checked manually to confirm or reject each potential match. A balance needs to be found to reduce false positive matches while enabling the detection of the highest number of true positive matches, even when the DNA evidence is not particularly informative.
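The upper bound mentioned above can be turned into a quick workload estimate. The sketch below assumes only what the text states, namely that at most a fraction 10^(-t) of unrelated individuals reaches a log10LR threshold t for a given pedigree. The function name and the database size are hypothetical.

```python
def max_expected_adventitious(threshold_log10lr, n_unrelated_profiles):
    """Upper bound on the expected number of adventitious matches per pedigree.

    Assumes the proportion of unrelated profiles reaching the threshold
    is at most 10 ** (-threshold_log10lr).
    """
    return n_unrelated_profiles * 10 ** (-threshold_log10lr)

# Hypothetical database of 100,000 UHR profiles searched with a threshold of 3:
# at most about 100 adventitious matches are expected per pedigree.
bound = max_expected_adventitious(3, 100_000)
```

Because this is only a bound, the number actually observed for a given scenario and number of overlapping loci can be far lower, which is why empirical related and unrelated distributions are more useful for setting cutoffs.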
      To enhance the potential of international DNA kinship matching, the aforementioned challenges were addressed to develop an intuitive and efficient method for selecting optimal candidate matches from DNA database searches in an international context, where ancestry is unknown or uncertain. This method relies on log10LR computation based on Worldwide allele frequencies and the determination of optimal tailored cutoff log10LR thresholds for 10 different scenarios and for a number of overlapping typed STRs ranging from 6 to 24 loci. The specifically devised interpretation tables presented in this study will facilitate the interpretation of all potential matches based on log10LR values. The method aims to streamline the decision-making process in missing person investigations and DVI processes by determining whether a potential match should be reviewed or rejected, or whether additional DNA data should be requested, thereby avoiding the additional workload caused by reviewing unnecessary false positive matches.

      2. Material and methods

      2.1 Optimal determination of ϴ for LR computation when ancestry is unknown

      Reference STR allele frequency data from global populations previously analyzed by the forensic community [

      Familial DNA Database Search System-Hardware/Software Integration Project, Natl. Inst. Justice. (n.d.). https://nij.ojp.gov/library/publications/familial-dna-database-search-system-hardwaresoftware-integration-project (accessed January 20, 2021).

      ,
      • Oldt R.F.
      • Kanthaswamy S.
      Expanded CODIS STR allele frequencies - evidence for the irrelevance of race-based DNA databases.
      ,
      • Amigo J.
      • Phillips C.
      • Salas T.
      • Formoso L.F.
      • Carracedo Á.
      • Lareu M.
      pop.STR—an online population frequency browser for established and new forensic STRs.
      ] was incorporated into the OmniPop 200.1 program available on STRbase (http://www.cstl.nist.gov/strbase/populationdata.htm). Ten anonymized STR profiles derived from previous studies [
      • Oldt R.F.
      • Kanthaswamy S.
      Expanded CODIS STR allele frequencies - evidence for the irrelevance of race-based DNA databases.
      ,
      • Amigo J.
      • Phillips C.
      • Salas T.
      • Formoso L.F.
      • Carracedo Á.
      • Lareu M.
      pop.STR—an online population frequency browser for established and new forensic STRs.
      ] with ethnic representation across the reference populations were used to calculate random match probabilities (RMPs). RMPs were generated for high population structure scenarios following the NRC II recommended Equation 4.10. LRs were deduced from RMP as LR= 1/RMP [

      National Research Council (US) Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence, National Academies Press (US), Washington (DC), 1996. http://www.ncbi.nlm.nih.gov/books/NBK232610/ (accessed January 20, 2021).

      ]. The calculated profile LRs were modelled under five different ϴ values: ϴ = 0, which assumes no population genetic structure; ϴ = 0.01, a minor ϴ adjustment recommended for genetically stratified populations such as US racial groups [

      National Research Council (US) Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence, National Academies Press (US), Washington (DC), 1996. http://www.ncbi.nlm.nih.gov/books/NBK232610/ (accessed January 20, 2021).

      ]; ϴ = 0.0251, the ϴ value calculated directly from the Worldwide reference frequency data [
      • Amigo J.
      • Phillips C.
      • Salas T.
      • Formoso L.F.
      • Carracedo Á.
      • Lareu M.
      pop.STR—an online population frequency browser for established and new forensic STRs.
      ] using Arlequin 3.5.2.2 population genetics software [
      • Excoffier L.
      • Lischer H.E.
      Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
      ]; ϴ = 0.03, a conservative adjustment factor recommended for Worldwide populations [
      • Steele C.D.
      • Balding D.J.
      Choice of population database for forensic DNA profile analysis.
      ]; and ϴ = 0.04, a ϴ value suggested for isolated populations such as US Native Americans [
      • McCulloh K.L.
      • Ng J.
      • Oldt R.F.
      • Weise J.A.
      • Viray J.
      • Budowle B.
      • Smith D.G.
      • Kanthaswamy S.
      The genetic structure of Native Americans in North America based on the Globalfiler® STRs, Legal Medicine.
      ].
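The effect of ϴ on the profile LR can be sketched as follows. The per-locus formulas are the ϴ-corrected conditional match probabilities as NRC II Equation 4.10 is commonly written (4.10a for homozygotes, 4.10b for heterozygotes); the function names, the two-locus profile and its allele frequencies are hypothetical and are not taken from the study data.

```python
def match_probability(genotype, freq, theta):
    """Theta-corrected genotype match probability at one locus.

    genotype: 2-tuple of allele labels; freq: dict allele -> frequency.
    """
    a, b = genotype
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:
        p = freq[a]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    pa, pb = freq[a], freq[b]
    return 2 * (theta + (1 - theta) * pa) * (theta + (1 - theta) * pb) / denom

def profile_lr(profile, freqs, theta):
    """LR = 1 / RMP, with the RMP taken as the product over independent loci."""
    rmp = 1.0
    for locus, genotype in profile.items():
        rmp *= match_probability(genotype, freqs[locus], theta)
    return 1.0 / rmp

# Hypothetical two-locus profile evaluated at the five theta values of the text.
freqs = {"TH01": {6: 0.2, 9.3: 0.3}, "FGA": {22: 0.15}}
profile = {"TH01": (6, 9.3), "FGA": (22, 22)}
lrs = {t: profile_lr(profile, freqs, t) for t in (0.0, 0.01, 0.0251, 0.03, 0.04)}
```

With ϴ = 0 the formulas reduce to the Hardy-Weinberg proportions p² and 2pq. For this example profile, larger ϴ values give larger match probabilities and therefore smaller, more conservative LRs.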

      2.2 STR data simulations for kinship construction

      Firstly, simulations were performed, generating data for a total of 1,000,000 full pedigrees containing DNA profiles of relatives with 24 STR markers (CSF1PO, D10S1248, D12S391, D13S317, D16S539, D18S51, D19S433, D1S1656, D21S11, D22S1045, D2S1338, D2S441, D3S1358, D5S818, D6S1043, D7S820, D8S1179, FGA, Penta D, Penta E, SE33, TH01, TPOX and vWA). Simulated STR profiles were generated by gene dropping, i.e. founder alleles are drawn from the population statistics and transmitted to the next generation by Mendelian inheritance and per locus (one allele from the mother, one from the father). This was followed with a uniform mutation mechanism with a mutation parameter of 1E-03, using an internal function in BONAPARTE 4.1 (Smart Research BV) [
      • Slooten K.
      Validation of DNA-based identification software by computation of pedigree likelihood ratios.
      ,
      • van Dongen C.J.
      • Slooten K.
      • Slagter M.
      • Burgers W.
      • Wiegerinck W.
      Bonaparte: application of new software for missing persons program.
      ]. The population statistics used to generate the STR profiles were compiled from a Worldwide population composed of several hundreds of previously published datasets [
      • Buckleton J.
      • Curran J.
      • Goudet J.
      • Taylor D.
      • Thiery A.
      • Weir B.S.
      Population-specific F values for forensic STR markers: A worldwide survey.
      ]. A detailed list of the adaptations made to the published reference populations used to compute these Worldwide allele frequencies is presented in Section 3.1.
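Gene dropping as described above can be sketched for a single locus. This is an illustration only, not BONAPARTE's internal function: the reading of "uniform mutation" used here (a mutated allele is replaced by any other known allele with equal probability) is an assumption, and the function names and frequencies are hypothetical.

```python
import random

def draw_founder(freq, rng):
    """Draw a founder genotype: two alleles sampled from the population frequencies."""
    alleles = list(freq)
    weights = [freq[a] for a in alleles]
    return tuple(rng.choices(alleles, weights=weights, k=2))

def transmit(parent, freq, rng, mutation_rate=1e-3):
    """Pass on one of the parent's two alleles, with an assumed uniform mutation step."""
    allele = rng.choice(parent)
    if rng.random() < mutation_rate:
        # Assumption: mutate to any other allele of the locus with equal probability
        # (requires at least two alleles in freq).
        allele = rng.choice([a for a in freq if a != allele])
    return allele

def drop_child(mother, father, freq, rng, mutation_rate=1e-3):
    """Child genotype: one allele from the mother, one from the father."""
    return (transmit(mother, freq, rng, mutation_rate),
            transmit(father, freq, rng, mutation_rate))

# Hypothetical single-locus frequencies; a full simulation repeats this per locus.
rng = random.Random(42)
freq = {12: 0.5, 13: 0.3, 14: 0.2}
mother, father = draw_founder(freq, rng), draw_founder(freq, rng)
child = drop_child(mother, father, freq, rng)
```

Repeating the drop for each of the 24 loci and for each non-founder in the pedigree, in generation order, yields one simulated full pedigree.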
      The full pedigrees were then adapted to recreate the 10 most frequently encountered types of scenarios in missing person cases with 100,000 pedigrees for each scenario (Fig. 1).
      Fig. 1. The ten most commonly found scenarios in missing person or DVI investigations. A circle represents a biological female and a square represents a biological male.
      The pedigrees were composed of relatives with full DNA profiles (24 STRs), reported as family member (FM) and relatives for whom DNA data was missing, reported as untyped (UN), according to each scenario. Secondly, one FM profile from each pedigree was converted into a UHR profile to recreate the scenarios from Fig. 1. Thirdly, for each UHR profile, 19 sub-profiles were generated by randomly selecting between 6 and 24 markers with equal probability using Microsoft Excel macros. This was carried out to mimic the effects of DNA degradation often encountered on UHR samples and the use of different loci sets across the world that do not contain the same STR markers. Allelic dropouts were not considered because the random inclusion of dropouts is not an available option when generating DNA profiles using BONAPARTE. A summary of the process used for data simulation is explained in Fig. 2. Fourthly, the measure of the fit of every UHR sub-profile to each pedigree was calculated in the manner of a blind DNA database search.
      Fig. 2. Summary of the process used for data simulation in this study to obtain the related and unrelated log10LR distribution curves. This figure summarizes the process for scenario 1, which was also carried out for the other nine scenarios.
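The sub-profile step of Section 2.2 can be sketched as below. The study used Microsoft Excel macros; this Python version is only an equivalent illustration, and it assumes that the 19 sub-profiles correspond to one sub-profile for each marker count from 6 to 24, with markers drawn without replacement and with equal probability. The function name and the placeholder genotypes are hypothetical.

```python
import random

MARKERS = [
    "CSF1PO", "D10S1248", "D12S391", "D13S317", "D16S539", "D18S51",
    "D19S433", "D1S1656", "D21S11", "D22S1045", "D2S1338", "D2S441",
    "D3S1358", "D5S818", "D6S1043", "D7S820", "D8S1179", "FGA",
    "Penta D", "Penta E", "SE33", "TH01", "TPOX", "vWA",
]

def make_subprofiles(profile, rng, min_loci=6, max_loci=24):
    """Return one reduced profile per marker count from min_loci to max_loci.

    profile: dict locus -> genotype for a full 24-marker UHR profile.
    Loci are kept by uniform random sampling without replacement.
    """
    subprofiles = []
    for n_loci in range(min_loci, max_loci + 1):
        kept = rng.sample(sorted(profile), n_loci)
        subprofiles.append({locus: profile[locus] for locus in kept})
    return subprofiles

# Placeholder genotypes; real input would be a simulated UHR profile.
full_profile = {locus: (10, 11) for locus in MARKERS}
subs = make_subprofiles(full_profile, random.Random(0))
```

Each full profile therefore yields 19 sub-profiles, the last of which contains all 24 markers.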

      2.3 Likelihood ratio calculation on simulated pedigrees

      The log10LR was calculated using BONAPARTE 4.1 [
      • Slooten K.
      Validation of DNA-based identification software by computation of pedigree likelihood ratios.
      ,
      • van Dongen C.J.
      • Slooten K.
      • Slagter M.
      • Burgers W.
      • Wiegerinck W.
      Bonaparte: application of new software for missing persons program.
      ] based on the probability of observed genotypes under each hypothesis H1 and H2. log10LR values were adjusted by BONAPARTE 4.1 for population substructure according to the recommendations of the second National Research Council (NRC II) Report [

      National Research Council (US) Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence, National Academies Press (US), Washington (DC), 1996. http://www.ncbi.nlm.nih.gov/books/NBK232610/ (accessed January 20, 2021).

      ] and from other research groups [
      • Chernomoretz A.
      • Balparda M.
      • Grutta L.L.
      • Calabrese A.
      • Martinez G.
      • Escobar M.S.
      • Sibilla G.
      GENis, an open-source multi-tier forensic DNA information system.
      ,
      • Ge J.
      • Budowle B.
      • Chakraborty R.
      DNA identification by pedigree likelihood ratio accommodating population substructure and mutations.
      ]. This adjustment fits the purpose of this study as the NRC II recommendations are designed to estimate random match probabilities when sub-group data is unavailable [

      National Research Council (US) Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence, National Academies Press (US), Washington (DC), 1996. http://www.ncbi.nlm.nih.gov/books/NBK232610/ (accessed January 20, 2021).

      ]. Two categories of match parameters for these computations were defined: a ϴ correction factor (used to correct for the degree of relatedness of alleles that have a common ancestry) and lambda (Λ) correction factors which are specifically used by BONAPARTE [

      BONAPARTE User Manual 4.1.10.

      ]. In the matching process, LRs are computed under the assumption that the alleles of founders in a pedigree are in Hardy-Weinberg equilibrium. The required relative allele frequencies are taken from the counts in the selected population statistics. These counts are corrected with a Λ-dependent factor that depends on whether the alleles are considered common, rare, or new [

      BONAPARTE User Manual 4.1.10.

      ]. Λ-dependent factors for common, rare, and new alleles were set to 3, 0.5, and 0.5, respectively. Likelihood ratio calculations were performed between each pedigree and each UHR profile. The total number of comparisons (i.e. one UHR profile compared to one pedigree – regardless of the number of overlapping STR loci and number of relatives in the pedigree) carried out was based on 100,000 pedigrees and totaled 190 billion for each tested scenario.
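The figure of 190 billion comparisons per scenario can be checked with a few lines of arithmetic, assuming that each of the 100,000 pedigrees yields one UHR profile, that each UHR profile yields 19 sub-profiles, and that every pedigree is compared with every sub-profile. The variable names are illustrative.

```python
pedigrees = 100_000            # pedigrees per scenario
uhr_profiles = 100_000         # one UHR profile extracted from each pedigree (assumption)
subprofiles_per_uhr = 19       # marker counts from 6 to 24

comparisons = pedigrees * uhr_profiles * subprofiles_per_uhr
```

Only 100,000 × 19 of these comparisons involve a UHR sub-profile and the pedigree it was extracted from; all the others are unrelated comparisons.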

      2.4 log10LR distributions of related and unrelated pedigrees and determination of cutoff

      Once all the computations had been carried out in the manner of a blind search, the extraction process (FM profile extracted to create a UHR profile and a pedigree with a missing person) was reversed to separate the “related matches” (matches between a pedigree and the UHR profile that was extracted from that pedigree) from the “unrelated matches” (matches between a pedigree and a UHR profile extracted from another pedigree). Related and unrelated log10LR values were then rounded down to the nearest whole number and used to draw the related and unrelated log10LR distribution curves for each condition (scenario and number of overlapping typed STRs). Violin plots were generated using BoxPlotR to summarize the related log10LR distributions for each scenario [
      • Spitzer M.
      • Wildenhain J.
      • Rappsilber J.
      • Tyers M.
      BoxPlotR: a web tool for generation of box plots.
      ]. Distributions were then converted into interpretation tables within Microsoft Excel to facilitate interpretation (see Section 3.4).

      2.5 Validation of interpretation tables with seven reference population datasets

      Four sets of 10,000 pedigrees, containing DNA profiles for 24 STR markers, were also generated using BONAPARTE for the same ten scenarios. Each set was generated using the allele frequencies of one of four populations (African American, Asian, Caucasian and Hispanic) described in the NIST 1,036 Revised U.S. Population Dataset (July 2017) [
      • Hill C.R.
      • Duewer D.L.
      • Kline M.C.
      • Coble M.D.
      • Butler J.M.
      U.S. population data for 29 autosomal STR loci.
      ,
      • Steffen C.R.
      • Coble M.D.
      • Gettings K.B.
      • Vallone P.M.
      Corrigendum to “U.S. Population Data for 29 Autosomal STR Loci” [Forensic Sci. Int. Genet. 7 (2013) e82-e83].
      ]. This dataset is composed of 1,036 DNA profiles of 29 autosomal STRs in total, separated into 342 African Americans, 97 Asians, 361 Caucasians, and 236 Hispanics. The log10LR values were calculated, based on Worldwide allele frequencies, for related and unrelated pairs, as previously described, and interpretation tables were used to assess the classification performance of potential matches. The same process was applied to three isolated reference populations previously genotyped with Globalfiler [
      • Ng J.
      • Oldt R.F.
      • McCulloh K.L.
      • Weise J.A.
      • Viray J.
      • Budowle B.
      • Smith D.G.
      • Kanthaswamy S.
      Native American population data based on the Globalfiler(®) autosomal STR loci.
      ,
      • Martínez-Cortés G.
      • Zuñiga-Chiquette F.
      • Celorio-Sánchez A.S.
      • Ruiz García E.
      • Antelo-Figueroa A.B.
      • Dalpozzo-Valenzuela V.
      • Valenzuela-Coronado A.
      • Rangel-Villalobos H.
      Population data for 21 autosomal STR loci (GlobalFiler kit) in two Mexican-Mestizo population from the northwest, Mexico.
      ,
      • Al-Eitan L.N.
      • Darwish N.N.
      • Hakooz N.M.
      • Dajani R.B.
      Investigation of the forensic GlobalFiler loci in the genetically isolated Circassian subpopulation in Jordan.
      ].

      3. Results & discussion

      3.1 Generation of Worldwide allele frequencies

      DNA population data from Buckleton et al. were used in this study. This survey provided ϴ values for 446 reference populations and represents the most complete reference population dataset to date [
      • Buckleton J.
      • Curran J.
      • Goudet J.
      • Taylor D.
      • Thiery A.
      • Weir B.S.
      Population-specific F values for forensic STR markers: A worldwide survey.
      ]. When creating Worldwide allele frequencies, one has to decide how to weight the different populations by selecting one of the following options, each with advantages and disadvantages. The first option is to mimic the “real world” by setting continental weights according to contemporary demographic data (e.g., 60% of individuals from Asia, 17% from Africa, 10% from Europe, 9% from America, etc.) [

      2020 World Population by Country, https://worldpopulationreview.com/.

      ]. The second option is to weight continental populations equally, regardless of their actual demographic size. The third option is to determine which areas of the world are more prone to reporting missing persons and/or discovering unidentified human remains, and to increase their weight in the final dataset. Although all of these options are valid, the availability of population data varies widely across the globe. Far more reference population datasets are available for European populations than for remote populations, despite the work of several laboratories attempting to fill this gap [
      • Bodner M.
      • Bastisch I.
      • Butler J.M.
      • Fimmers R.
      • Gill P.
      • Gusmão L.
      • Morling N.
      • Phillips C.
      • Prinz M.
      • Schneider P.M.
      • Parson W.
      Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal Short Tandem Repeat allele frequency databasing (STRidER).
      ].
      Additionally, datasets that are available online do not necessarily cover the full range of STR markers typically used in forensic DNA profiling [
      • Buckleton J.
      • Curran J.
      • Goudet J.
      • Taylor D.
      • Thiery A.
      • Weir B.S.
      Population-specific F values for forensic STR markers: A worldwide survey.
      ]. Indeed, most of the available data were obtained over a decade ago, using various PCR amplification kits that analyzed a smaller set of loci. Therefore, it was impossible to adjust the weight of the different populations for some markers without losing some information for other loci and/or affecting the weight of the populations for untouched markers. The decision was made to conserve most of the available populations from Buckleton et al. Reference populations that did not belong to one of the continental clusters defined in the publication were omitted. Duplicated populations, which were identified after careful curation of the datasets, were also omitted. As a result, the final Worldwide dataset was composed of 182,999 individuals from 369 populations, as shown in Table 1. This represents a total of 2,407,482 typed alleles.
      Table 1. Composition of the Worldwide population of this study.
      Cluster | Number of reference populations | Number of individuals | Weight in Worldwide population
      Africa | 36 | 7,344 | 4.01%
      Asian | 72 | 16,522 | 9.03%
      AusAb | 17 | 15,637 | 8.54%
      Caucn | 151 | 82,579 | 45.13%
      Hispc | 39 | 30,463 | 16.65%
      IndPk | 26 | 3,276 | 1.79%
      Inuit | 2 | 209 | 0.11%
      NatAm | 22 | 2,785 | 1.52%
      Polyn | 4 | 24,184 | 13.22%
      Total | 369 | 182,999 | 100.00%
      Detailed information on each cluster and reference populations for each STR marker is available in Tables S1 and S2. As expected, a high heterogeneity was observed between the number of total typed alleles for each marker, with a minimum of 1,064 typed alleles for D6S1043 and a maximum of 202,978 typed alleles for FGA. This was again unsurprising, as different amplification kits were used to obtain the data and some markers have traditionally been part of commercial multiplexes while others are typed as additional markers in casework investigations. In line with these findings, the Caucasian cluster represented almost half of the total number of individuals included in this study. However, some discrepancies were encountered when focusing on the cluster weight for each of the STR markers separately. For example, the Caucasian and the Hispanic clusters represented 32% and 33% respectively for CSF1PO, whereas they represented 75% and 4% respectively for D10S1248. While it would have been preferable to combine full datasets to avoid discrepancies in population representation for the 24 STR markers, computing a Worldwide population with data from more than 182,000 individuals led to an impressive number of 813 different alleles identified across the entire dataset. The main advantage of this dataset was that even very rare alleles had a defined allele frequency value instead of a default value, which can influence the accuracy of the log10LR calculation. A more accurate and representative Worldwide population may be available in the future using datasets obtained using recent technologies (including Massively Parallel Sequencing) to improve the weight of underrepresented continental populations [
      • Novroski N.M.M.
      • King J.L.
      • Churchill J.D.
      • Seah L.H.
      • Budowle B.
      Characterization of genetic sequence variation of 58 STR loci in four major population groups.
      ,
      • Casals F.
      • Anglada R.
      • Bonet N.
      • Rasal R.
      • van der Gaag K.J.
      • Hoogenboom J.
      • Solé-Morata N.
      • Comas D.
      • Calafell F.
      Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.
      ,
      • Churchill J.D.
      • Schmedes S.E.
      • King J.L.
      • Budowle B.
      Evaluation of the Illumina(®) Beta Version ForenSeqTM DNA Signature Prep Kit for use in genetic profiling.
      ,
      • Devesse L.
      • Ballard D.
      • Davenport L.
      • Riethorst I.
      • Mason-Buck G.
      • Syndercombe Court D.
      Concordance of the ForenSeqTM system and characterisation of sequence-specific autosomal STR alleles across two major population groups.
      ,
      • Hussing C.
      • Bytyci R.
      • Huber C.
      • Morling N.
      • Børsting C.
      The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeqTM DNA Signature Prep Kit.
      ,
      • Delest A.
      • Godfrin D.
      • Chantrel Y.
      • Ulus A.
      • Vannier J.
      • Faivre M.
      • Hollard C.
      • Laurent F.-X.
      Sequenced-based French population data from 169 unrelated individuals with Verogen’s ForenSeq DNA signature prep kit.
      ,
      • Laurent F.X.
      • Ausset L.
      • Clot M.
      • Jullien S.
      • Chantrel Y.
      • Hollard C.
      • Pene L.
      Automation of library preparation using Illumina ForenSeq kit for routine sequencing of casework samples.
      ,
      • Hollard C.
      • Ausset L.
      • Chantrel Y.
      • Jullien S.
      • Clot M.
      • Faivre M.
      • Suzanne É.
      • Pène L.
      • Laurent F.-X.
      Automation and developmental validation of the ForenSeqTM DNA Signature Preparation kit for high-throughput analysis in forensic laboratories.
      ].
      The Worldwide population data were then translated into Worldwide allele frequencies (Table S3) and further exploited for all the kinship simulations described in this publication.
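As a rough illustration of this step, the sketch below recomputes the cluster weights of Table 1 from the numbers of individuals and shows one way allele counts could be pooled into frequencies. Simple pooling of counts is an assumption here: it is consistent with the weights in Table 1 being proportional to the number of individuals, but the text does not spell out the procedure. The function name `pooled_frequencies` and the toy counts are hypothetical.

```python
# Individuals per cluster, copied from Table 1.
individuals = {
    "Africa": 7_344, "Asian": 16_522, "AusAb": 15_637,
    "Caucn": 82_579, "Hispc": 30_463, "IndPk": 3_276,
    "Inuit": 209, "NatAm": 2_785, "Polyn": 24_184,
}

total_individuals = sum(individuals.values())
weights = {cluster: n / total_individuals for cluster, n in individuals.items()}


def pooled_frequencies(counts_by_population):
    """Pool allele counts for one marker across populations into frequencies.

    Assumption: counts are summed without reweighting, so larger
    populations contribute proportionally more.
    """
    pooled = {}
    for counts in counts_by_population:
        for allele, n in counts.items():
            pooled[allele] = pooled.get(allele, 0) + n
    typed = sum(pooled.values())
    return {allele: n / typed for allele, n in pooled.items()}


# Hypothetical toy counts for one marker in two populations.
toy = pooled_frequencies([{"10": 30, "11": 70}, {"11": 50, "12": 50}])

for cluster, weight in weights.items():
    print(f"{cluster}: {weight:.2%}")
```

Because the counts are pooled directly, a marker typed mostly in one cluster will be dominated by that cluster, which is the per-marker imbalance described above for CSF1PO and D10S1248.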

      3.2 Determination of an appropriate ϴ value for LR calculation when ancestry is unknown

      As the purpose of this method is to perform international DNA kinship matching on all missing person investigations, regardless of the location, it was important to consider the possibility that individuals included in the pedigree could originate from population groups showing minimal or high differentiation [
      • Buckleton J.
      • Curran J.
      • Goudet J.
      • Taylor D.
      • Thiery A.
      • Weir B.S.
      Population-specific F values for forensic STR markers: A worldwide survey.
      ,
      • Weir B.S.
      • Goudet J.
      A unified characterization of population structure and relatedness.
      ]. Therefore, a ϴ correction factor had to be applied to account for the degree of relatedness of alleles with a closely shared ancestry. To determine the most appropriate ϴ value for future LR computations when ancestry is unknown, LRs were computed by entering reference STR data from 11 reference populations into the OmniPop 200.1 program, which is recognized and used by the National Institute of Standards and Technology (NIST) for the purpose of clustering autosomal markers. Each LR was modelled using five values of ϴ, and the results shown in Table S4 indicated that LRs varied widely across ϴ values. As the ϴ value increases, the LR can diminish by up to 6–8 orders of magnitude (as seen in the US Black and simulated Mbuti profiles, which are two communities made up of genetically differentiated ethnic groups). While increasing the ϴ value tends to reduce the difference in magnitude, the empirically estimated value of 0.03 was used in the forthcoming simulations. This value has previously been shown to be sufficiently conservative, as ϴ = 0.03 was greater than the majority of the median ϴ estimates from global comparisons of subpopulations with continental populations [
      • Steele C.D.
      • Balding D.J.
      Choice of population database for forensic DNA profile analysis.
      ,
      • Buckleton J.
      • Curran J.
      • Goudet J.
      • Taylor D.
      • Thiery A.
      • Weir B.S.
      Population-specific F values for forensic STR markers: A worldwide survey.
      ]. Additional methods recently published could be applied in the future to estimate the ϴ for arbitrary population structures, such as the Worldwide population, to reduce potential biases found in existing approaches [
      • Ochoa A.
      • Storey J.D.
      Estimating FST and kinship for arbitrary population structures.
      ].
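For intuition about how ϴ changes a calculation, the sketch below uses the widely cited Balding-Nichols genotype match probabilities for a single locus. This is a generic textbook formulation chosen for illustration. It is not taken from BONAPARTE or OmniPop, whose internal ϴ handling is not described here, and the allele frequencies are hypothetical.

```python
def match_probability(p_a, p_b=None, theta=0.03):
    """Balding-Nichols match probability for one genotype at one locus.

    p_a, p_b: allele frequencies; pass only p_a for a homozygote.
    theta:    co-ancestry coefficient (0 reduces to Hardy-Weinberg).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygote a/a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygote a/b
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


# Hypothetical frequencies: a rare allele (0.05) and a more common one (0.10).
for theta in (0.0, 0.01, 0.03):
    homozygote = match_probability(0.05, theta=theta)
    heterozygote = match_probability(0.05, 0.10, theta=theta)
    print(theta, homozygote, heterozygote)
```

In this simplified setting a larger ϴ raises the match probability, most strongly for rare alleles, so the corresponding LR for a direct match falls. That is the same direction of effect as reported for Table S4, although pedigree-based LRs are more involved than this single-locus case.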

      3.3 Comparison of log10LR distributions of the 10 scenarios

      Based on the simulated STR profiles generated with BONAPARTE, log10LR distributions for the related individuals (the person of interest is the MP) were plotted. Fig. 3 shows the mean and variance of the log10LR related distributions for the 10 scenarios presented as a violin plot.
      Fig. 3. Violin plot representing the log10LR distributions for related matches obtained from 6 to 24 overlapping STR loci. The centered white dots show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles.
      The information provided by each scenario was ranked in terms of detail. In accordance with previous publications [
      • Ge J.
      • Budowle B.
      Kinship index variations among populations and thresholds for familial searching.
      ,
      • Ge J.
      • Budowle B.
      • Chakraborty R.
      Choosing relatives for DNA identification of missing persons.
      ], scenario A consisting of a pedigree with both parents of a missing person was unsurprisingly the most informative scenario. Scenarios involving at least two parent-child relationships (scenarios E, F and I) also showed high log10LR values. Full-sibling scenarios (scenarios G and H) showed a high variance but were able to reach high log10LR means, especially when three siblings were added to the pedigree. As expected, pedigrees with one parent-child relationship (scenarios C and D) gave the lowest mean values, with a non-negligible proportion of log10LR being negative (although it is not represented in the figure because matches with a negative log10LR value cannot be exported in BONAPARTE). However, the addition of a spouse (not genetically related to the MP) helped to increase the log10LR value by several orders of magnitude, as presented in scenario B.

      3.4 From log10LR distribution curves to interpretation tables

      DNA database comparisons using kinship matching are executed to determine the strength of the evidence and how it supports the hypothesis for an identification based on the genetic data. The higher the log10LR value, the more strongly the proposed relationship is supported. Conversely, the lower the log10LR (typically a negative value), the more support there is for the alternative hypothesis. The goal of a successful DNA database search is then to communicate the evidential value when it exceeds a certain strength, i.e. a threshold that is low enough to enable identifications but high enough to prevent the reporting of coincidental evidence [
      • Slooten K.
      Likelihood ratio distributions and the (ir)relevance of error rates.
      ]. Indeed, too many potential candidate matches would severely increase the workload of the DNA experts and police officers as many of the potential candidate matches are actually adventitious [
      • Ge J.
      • Budowle B.
      How many familial relationship testing results could be wrong?.
      ]. Therefore, a log10LR threshold can be used to determine the likelihood of obtaining evidence that could be considered useful enough and the number of unrelated pairs that would achieve this likelihood ratio by chance. log10LR values of over five are generally considered to provide strong evidence for relatedness, based on previously established forensic guidelines [

      Familial DNA Database Search System-Hardware/Software Integration Project, Natl. Inst. Justice. (n.d.). https://nij.ojp.gov/library/publications/familial-dna-database-search-system-hardwaresoftware-integration-project (accessed January 20, 2021).

      ,
      • Ge J.
      • Budowle B.
      Kinship index variations among populations and thresholds for familial searching.
      ]. When the pedigree is a trio-case (with DNA data available for both parents of a missing person), kinship calculations produce large log10LR values in favour of H1, as observed in Fig. 3, that in practice will not be found by chance. Therefore, it is appropriate to use such a threshold because the evidence is strong enough even for a small prior probability on H1. However, applying the same threshold to less-informative scenarios reduces the chance of obtaining such strong evidence and ultimately implies that some missing persons may not be identified as potential candidates. Several research groups have attempted to determine appropriate cutoff thresholds, but these are often restricted to only one scenario or reference population and do not take into account the matches between partial DNA profiles [
      • Tamura T.
      • Osawa M.
      • Ochiai E.
      • Suzuki T.
      • Nakamura T.
      Evaluation of advanced multiplex short tandem repeat systems in pairwise kinship analysis.
      ,
      • Turrina S.
      • Ferrian M.
      • Caratti S.
      • Cosentino E.
      • De Leo D.
      Kinship analysis: assessment of related vs unrelated based on defined pedigrees.
      ,
      • Cho S.
      • Shin E.S.
      • Yu H.J.
      • Lee J.H.
      • Seo H.J.
      • Kim M.Y.
      • Lee S.D.
      Set up of cutoff thresholds for kinship determination using SNP loci.
      ,
      • Li R.
      • Li H.
      • Peng D.
      • Hao B.
      • Wang Z.
      • Huang E.
      • Wu R.
      • Sun H.
      Improved pairwise kinship analysis using massively parallel sequencing.
      ].
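The link between a log10LR threshold and the prior probability on H1 can be made concrete with Bayes' rule in odds form (posterior odds = LR × prior odds). The sketch below is illustrative only: the prior of 1 in 100,000 is an assumption chosen to mirror a search against 100,000 UHR profiles, not a value recommended in this study, and the function name is hypothetical.

```python
def posterior_probability(log10_lr, prior_probability):
    """Convert a log10 likelihood ratio and a prior probability of H1 into a posterior."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = 10 ** log10_lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


prior = 1 / 100_000  # illustrative assumption, not taken from the study
for log10_lr in (3, 5, 8):
    print(log10_lr, round(posterior_probability(log10_lr, prior), 4))
```

Under this assumed prior, a log10LR of 5 only brings the posterior to about one half, whereas the much larger values typical of trio pedigrees remain convincing. This is why a single fixed threshold behaves so differently across scenarios.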
      To simplify the evaluation of a potential candidate, the optimal cutoff log10LR thresholds were determined based on related and unrelated distributions for the 10 different scenarios and for a number of overlapping typed STRs ranging from 6 to 24 loci. The full process is summarized in Fig. 4. Four coloured zones were defined on the basis of related and unrelated distributions.
      • A “red zone” defined by the lowest 1/5000 values of the related distribution curve and all log10LR values below.
      • An “orange zone” defined by an average number of false positive matches of 50 per MP or above, in a database search of 100,000 UHR profiles.
      • A “green zone” defined by an average number of false positive matches below 50 per MP, in a database search of 100,000 UHR profiles.
      • A “grey zone” defined by log10LR values where related and unrelated distribution curves do not overlap, in a database search of 100,000 UHR profiles.
      Fig. 4. Overview of the calculation of optimal log10LR thresholds and formatting of the interpretation tables. Two examples corresponding to two pedigrees, one for scenario A and the other one for scenario C are shown. Computations are performed for each scenario and for a specific number of overlapping STR loci between simulated DNA profiles from relatives of a missing person and simulated DNA profiles from UHR samples. Related and unrelated distributions are plotted together and four colour zones were defined based on the balance of false positives and false negatives observed. Each colour determines the suggested instructions to follow concerning the reporting or rejection of the potential candidate.
      Based on these identified zones in each of the log10LR distribution curves, interpretation tables were computed for each scenario. The main advantage of these tables is that they streamline the decision-making process: the user reads off the “colour” associated with the log10LR value for a specific scenario and a defined number of overlapping typed STRs between the UHR DNA profile and the DNA profile(s) from relative(s). These interpretation tables for all ten scenarios are available in Fig. S1.
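The four zones can be sketched in code for a single condition (one scenario and one number of overlapping loci). This is a minimal illustration under stated assumptions, not the authors' implementation: the false positive count for a value is taken as the expected number of unrelated matches at or above that value in a search of 100,000 UHR profiles, the grey zone is taken as values observed in neither simulated distribution, and the red zone takes precedence over the others. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_zone_classifier(related, unrelated, db_size=100_000,
                          max_false_pos=50, red_one_in=5000):
    """Return a function mapping a log10LR value to a colour zone."""
    related = sorted(math.floor(v) for v in related)  # values are rounded down
    unrelated_counts = Counter(math.floor(v) for v in unrelated)
    n_unrelated = sum(unrelated_counts.values())
    observed = set(related) | set(unrelated_counts)

    # Red cutoff: highest value among the lowest 1/5000 of related matches.
    n_red = max(1, len(related) // red_one_in)
    red_cutoff = related[n_red - 1]

    def expected_false_positives(threshold):
        # Assumed definition: unrelated matches at or above the value, per MP.
        tail = sum(c for v, c in unrelated_counts.items() if v >= threshold)
        return db_size * tail / n_unrelated

    def classify(log10_lr):
        value = math.floor(log10_lr)
        if value <= red_cutoff:
            return "red"
        if value not in observed:
            return "grey"
        if expected_false_positives(value) >= max_false_pos:
            return "orange"
        return "green"

    return classify


# Hypothetical toy distributions, far smaller than those used in the study.
related_toy = [0] + [3] * 2000 + [6] * 2999
unrelated_toy = [-2] * 9000 + [0] * 900 + [2] * 99 + [3]
classify = build_zone_classifier(related_toy, unrelated_toy)

for value in (-5, 2.7, 3, 4, 6):
    print(value, classify(value))
```

Running the classifier over every integer log10LR value and every number of overlapping loci would give one row per condition, which is the layout of an interpretation table.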

      3.5 Suggested workflow to deal with potential candidates in forensic DNA databases when performing international DNA kinship matching

      To accompany these interpretation tables, we suggest using the decision tree presented in Fig. 5, which summarizes all of the potential outcomes of log10LR values and the most appropriate method of proceeding with them.
      Fig. 5. Decision tree to determine the optimal course of action when performing DNA kinship matching in the case of unknown or uncertain ancestry.
      The first goal of the decision tree is to efficiently identify candidates for which the strength of the DNA evidence is considered to be either very strong (matches that should be reviewed before reporting) or very weak (matches that should be rejected automatically). Based on the determination of the zones from the simulated data, potential matches falling into the green zone should yield fewer than 50 adventitious matches per MP in a database search with 100,000 UHR profiles. Therefore, most of these adventitious matches should be eliminated after review of each potential match by crosschecking ante-mortem and post-mortem metadata that can be obtained from the authorities in charge of the investigation. These can include the biological sex, dates (disappearance of the missing person occurs before the discovery of the human remains), age, height, personal belongings or clothes found on the body that could fit the last known outfit of the missing person, and other secondary identifiers such as scars, marks or tattoos to help with the identification process [

      Recommendations on the Use of DNA for the Identification of Missing Persons and Unidentified Human Remains by the INTERPOL DNA Monitoring Expert Group, (2017). 〈http://www.interpol.int〉.

      ]. This also includes other primary identifiers such as fingerprints or dental charts, if available.
      The second goal is to raise awareness about the insufficient strength of the DNA evidence to either report or reject the potential matches. In most of these cases, the list of potential matches will be too long (more than 50 adventitious matches on average per MP) and could lead to a long and complex process to eliminate false positive matches. If the log10LR value falls in the “orange” zone, with a high number of potential matches to review, additional intelligence should be obtained by collecting DNA profiles from other relatives to select a more efficient scenario or by increasing the number of STR markers. This is often the case with UHR DNA profiles, which may originate from degraded biological samples. A reanalysis using an extended STR marker set or new DNA sequencing technologies can help to type the missing loci [
      • Parsons T.J.
      • Huel R.M.L.
      • Bajunović Z.
      • Rizvić A.
      Large scale DNA identification: the ICMP experience.
      ,
      • Latham K.E.
      • Miller J.J.
      DNA recovery and analysis from skeletal material in modern forensic contexts.
      ], leading to more informative pedigrees and ultimately reducing the number of false positive matches to review. Even though the addition of new DNA data may not help find a potential match, it will still be useful to avoid adventitious matches in future searches.
      The third goal is to alert the user about unusual log10LR values. A potential match with a log10LR value within the “grey” zone would mean that the value was not observed for either the related or the unrelated pedigrees during the validation phase. This could be explained by the presence of germinal mutations or silent alleles between parent(s)/offspring duos or trios, which would decrease the log10LR value [
      • Machado P.
      • Gusmão L.
      • Conde-Sousa E.
      • Pinto N.
      The influence of the different mutation models in kinship evaluation.
      ]. It could also come from an allelic dropout affecting one profile from the pedigree which could be interpreted as a mutation since allelic dropouts could not be considered when generating DNA profiles using BONAPARTE. For all these cases, the calculations could be performed again with the exclusion of the marker exhibiting the mutation in order to verify that the new log10LR value is found in another colour zone. Additional kinship calculations, involving other commonly used mutation models, such as “equal”, “proportional to frequency”, “stepwise” and “extended stepwise” that are notably available in other probabilistic software may be used to measure the effect of modelling [
      • Kling D.
      • Tillmar A.O.
      • Egeland T.
      Familias 3 – extensions and new functionality.
      ,
      • Morimoto C.
      • Tsujii H.
      • Manabe S.
      • Fujimoto S.
      • Hirai E.
      • Hamano Y.
      • Tamaki K.
      Development of a software for kinship analysis considering linkage and mutation based on a Bayesian network.
      ]. Another potential explanation for a value in this zone could be that the expected pedigree is incorrect and one or several biological relationships have not been accurately reported. This could be the case if a log10LR computation was carried out based on pedigree B, but the supposed brother of the missing person is in fact a half-brother, which would correspond to pedigree I. Computation using different scenarios may be required to compare the values and decide whether an alternative scenario should be considered. One could argue that the size of the grey zone depends too much on the number of simulations and that it would not exist for the full distribution when using an unlimited number of pedigrees. However, it was decided to keep the grey zone in the interpretation table due to the sheer number of pedigrees used in this study and also because it was the only way of avoiding the automatic rejection of a match which could be of potential interest.

      3.6 Validating the performance of the interpretation tables using individuals from several continental populations

      Assessing the performance of the classification system, based on the interpretation tables, requires the use of pedigrees that are representative of the main population groups. This evaluation could potentially help to determine if the log10LR values obtained for specific continental population groups once classified based on the cutoff thresholds fit with the interpretation tables. Erroneous classifications would lead to an increase of false positive and false negative rates and suggest that a Worldwide model should not be used for these populations.
      The NIST 1036 Revised U.S. Population Dataset (July 2017), which includes allele frequencies for four US populations (African American, Asian, Caucasian, and Hispanic), was used to generate 10,000 pedigrees for each ancestry [
      • Steffen C.R.
      • Coble M.D.
      • Gettings K.B.
      • Vallone P.M.
      Corrigendum to “U.S. Population Data for 29 Autosomal STR Loci” [Forensic Sci. Int. Genet. 7 (2013) e82-e83].
      ]. log10LR values were calculated using Worldwide allele frequencies. The decision tree (featured in Fig. 5) was then used to classify each match into its colour zone, determining the most suitable outcome (rejection, request for additional information, or consideration as a potential candidate) therefore mimicking how international DNA kinship matching would be performed in real investigations when the ancestry is unknown.
      Fig. S2 shows the log10LR distributions of the related matches for the 10 different scenarios and for a number of overlapping typed STRs ranging from 6 to 24 loci. For all of the scenarios, log10LR distributions for the four reference populations did not show significant differences, in terms of the mean of distribution and variance. They may not be significantly different because the Worldwide population presented in this work is composed of more than 2 million alleles. Consequently, the great majority of alleles, which can be found Worldwide, are listed in the compiled population with a defined value determined for each allele, including almost all rare alleles known to date. Being able to add new DNA profiles for LR calculations containing very few or no additional undefined alleles would avoid an overestimation of the log10LR value for some populations and therefore eliminate the potential bias that can exist using small reference population sets. The apparent homogeneity of the distribution curves emphasizes the fact that this method may balance the differences between populations and could potentially be applied to other reference populations, under the assumption that they exhibit the same pattern. This was ultimately tested on three additional reference populations that are known to be genetically isolated and which were not included in the reference populations used to establish the Worldwide allele frequencies [
      • Ng J.
      • Oldt R.F.
      • McCulloh K.L.
      • Weise J.A.
      • Viray J.
      • Budowle B.
      • Smith D.G.
      • Kanthaswamy S.
      Native American population data based on the Globalfiler(®) autosomal STR loci.
      ,
      • Martínez-Cortés G.
      • Zuñiga-Chiquette F.
      • Celorio-Sánchez A.S.
      • Ruiz García E.
      • Antelo-Figueroa A.B.
      • Dalpozzo-Valenzuela V.
      • Valenzuela-Coronado A.
      • Rangel-Villalobos H.
      Population data for 21 autosomal STR loci (GlobalFiler kit) in two Mexican-Mestizo population from the northwest, Mexico.
      ,
      • Al-Eitan L.N.
      • Darwish N.N.
      • Hakooz N.M.
      • Dajani R.B.
      Investigation of the forensic GlobalFiler loci in the genetically isolated Circassian subpopulation in Jordan.
      ].
      Table S5 shows the average of log10LR values obtained from 10,000 simulated pedigrees generated from allele frequencies in the three isolated populations (Native American population, Circassian subpopulation in Jordan and Mexican-Mestizo population from northwest Mexico). log10LR was calculated for these pedigrees based on the allele frequencies from the three isolated populations and compared with the Worldwide allele frequencies. Interestingly, the same homogeneity was observed with mild changes in the values, regardless of the reference population used to generate the data or the allele frequencies used for the calculation. The value obtained from the Worldwide allele frequencies was always the closest to the value observed when using the same reference population for generating data and calculating the log10LR, compared to the two other tested populations. This result emphasizes the fact that the Worldwide allele frequencies could be applied to isolated populations, even though they were not part of the original sets of reference populations. Additional confirmation should be obtained in future by testing other isolated populations, although datasets for a large set of STR markers are rarely accessible online.
      After classification of related and unrelated matches based on either the interpretation tables or the use of a specific log10LR threshold, false positive matches (unrelated matches found in the green zone and considered for review) and false negative matches (related matches found in the red zone and immediately rejected) were estimated. The efficiency of both classification methods is shown in Table 2.
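For the fixed-threshold case, the two performance measures can be sketched as follows. The sketch assumes that a false positive is an unrelated match at or above the threshold, scaled to the number of UHR profiles searched per missing person, and that a false negative is a related match below the threshold. The function name and the toy values are hypothetical and do not come from the study.

```python
def fixed_threshold_performance(related, unrelated, threshold, db_size=100_000):
    """Return (average false positives per MP, false negative rate) for one threshold."""
    # Fraction of unrelated comparisons that would be reported by chance.
    false_pos_fraction = sum(v >= threshold for v in unrelated) / len(unrelated)
    # Fraction of true relatives that would be missed.
    false_neg_rate = sum(v < threshold for v in related) / len(related)
    return db_size * false_pos_fraction, false_neg_rate


# Hypothetical log10LR values for a tiny database of 4 UHR profiles.
related_values = [0.5, 2.1, 4.0, 6.3]
unrelated_values = [-3.0, -1.2, 0.4, 1.5]

for threshold in (1, 3, 5):
    fp, fn = fixed_threshold_performance(
        related_values, unrelated_values, threshold, db_size=4
    )
    print(threshold, fp, fn)
```

Raising the threshold trades false positives for false negatives, which is the pattern visible when moving down the threshold blocks of Table 2.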
      Table 2. Classification efficiency of related and unrelated matches using a specific threshold vs. the tailored thresholds determined in this manuscript.
      AVERAGE NUMBER OF FALSE POSITIVE MATCHES PER MISSING PERSONRATE OF FALSE NEGATIVE (BASED ON 10,000 RELATED MATCHES)
      ScenariosScenarios
      ABCDEFGHIJABCDEFGHIJ
      Threshold: log10LR ≥ 1Threshold: log10LR ≥ 1
      African-American0519917813106177245112African-American0.000%0.000%0.000%0.008%0.000%0.042%0.300%0.008%0.000%0.100%
      Asian0721519216112192316141Asian0.000%0.000%0.000%0.000%0.000%0.042%0.225%0.008%0.000%0.050%
      Caucasian01024520921121224368159Caucasian0.000%0.000%0.042%0.017%0.000%0.058%0.483%0.083%0.000%0.117%
      Hispanic0620818514110185295135Hispanic0.000%0.000%0.033%0.000%0.000%0.092%0.400%0.025%0.000%0.125%
      Threshold: log10LR ≥ 2Threshold: log10LR ≥ 2
      African-American027664311133316African-American0.000%0.000%0.592%0.617%0.000%0.333%1.258%0.067%0.000%0.750%
      Asian038571515185420Asian0.000%0.000%0.667%0.525%0.008%0.442%1.058%0.150%0.008%0.450%
      Caucasian049978621257628Caucasian0.000%0.000%1.117%0.975%0.017%0.683%1.575%0.225%0.008%0.917%
      Hispanic028068413164418Hispanic0.000%0.000%0.983%0.975%0.058%0.558%1.675%0.183%0.000%0.842%
      Threshold: log10LR ≥ 3Threshold: log10LR ≥ 3
      African-American011211011001African-American0.000%0.033%6.100%6.892%0.200%1.592%3.783%0.342%0.017%2.775%
      Asian011613122123Asian0.000%0.058%8.392%7.617%0.275%1.942%3.742%0.625%0.200%2.600%
      Caucasian022018223234Caucasian0.011%0.125%10.875%11.267%0.433%2.908%5.233%0.933%0.075%4.000%
      Hispanic011412111011Hispanic0.006%0.058%9.817%9.908%0.625%2.450%5.325%0.683%0.058%4.250%
      Threshold: log10LR ≥ 4Threshold: log10LR ≥ 4
      African-American0000000000African-American0.114%1.050%23.350%24.142%1.358%5.117%9.025%1.342%0.475%8.717%
      Asian0010000000Asian0.170%1.017%28.883%27.975%2.175%6.467%9.667%2.400%1.000%9.383%
      Caucasian0020000000Caucasian0.336%2.183%34.433%34.817%2.475%9.317%12.825%2.817%1.042%13.075%
      Hispanic0000000000Hispanic0.245%1.642%32.475%31.317%2.567%7.633%12.258%2.383%1.008%13.417%
      Threshold: log10LR ≥ 5Threshold: log10LR ≥ 5
      African-American0000000000African-American1.171%6.359%47.608%48.167%5.317%12.125%18.467%3.825%2.833%20.125%
      Asian0000000000Asian1.670%8.631%54.150%52.633%6.875%14.817%20.175%6.325%4.408%22.058%
      Caucasian0000000000Caucasian2.376%11.489%61.475%61.933%8.625%18.808%25.283%8.292%5.600%28.367%
      Hispanic0000000000Hispanic2.127%9.926%59.233%58.392%8.500%17.400%23.883%6.467%5.167%28.567%
      Tailored thresholds determined in this manuscriptTailored thresholds determined in this manuscript
      African-American02191809255310African-American0.000%0.000%0.000%0.000%0.000%0.000%0.058%0.000%0.000%0.033%
      Asian042823216398515Asian0.000%0.000%0.000%0.000%0.000%0.000%0.050%0.000%0.000%0.000%
      Caucasian0530283194410617Caucasian0.000%0.000%0.017%0.008%0.000%0.000%0.100%0.000%0.000%0.050%
      Hispanic032221113296412Hispanic0.000%0.000%0.017%0.000%0.000%0.000%0.142%0.000%0.000%0.000%
      The first notable result is the effect of specific thresholds versus the tailored thresholds determined in this manuscript on the average number of reported false positive matches per missing person. This metric is relevant because it indicates the average number of matches (related or unrelated) that would be subjected to a manual review, as their log10LR values would be above the set thresholds when performing database searches. The results in Table 2 show large numbers of false positive matches needing review for the least informative scenarios (C, D, F, G and J) when the lowest specific log10LR threshold of 1 was applied. Increasing the log10LR threshold greatly reduces the average number of reported false positive matches, with a log10LR threshold of 5 leading to no false positive matches in any scenario. The average numbers of false positive matches obtained with the tailored thresholds determined in this study are low, below 50 for every scenario and in each reference population. The values are close to those obtained for a specific threshold between 2 and 3 and show that the tailored thresholds can be safely implemented to prevent false positive matches as much as possible.
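The two metrics discussed here can be sketched as follows for a single specific threshold. This is an illustration only, assuming that a match is kept for review when its log10LR is at or above the cutoff and rejected otherwise; the function name and the toy log10LR values are invented and are not taken from Table 2.

```python
def classification_efficiency(related, unrelated, threshold, n_missing_persons):
    """Return (false positives per missing person, false negative rate).

    related / unrelated: log10LR values for true and adventitious matches.
    A match is kept for review when log10LR >= threshold, rejected otherwise.
    """
    false_positives = sum(1 for value in unrelated if value >= threshold)
    false_negatives = sum(1 for value in related if value < threshold)
    return false_positives / n_missing_persons, false_negatives / len(related)

# Invented toy values, not taken from Table 2.
related = [0.5, 2.5, 3.5, 6.0]
unrelated = [-4.0, -1.0, 1.2, 2.1]
for threshold in (1, 3, 5):
    fp, fn = classification_efficiency(related, unrelated, threshold, n_missing_persons=2)
    print(threshold, fp, fn)
```

Even on this toy input, raising the cutoff lowers the number of false positives to review while increasing the share of related matches that are rejected, which is the trade-off the tailored thresholds are meant to balance.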
      The most interesting observation remains the comparison of the false negative rates obtained with the two types of log10LR thresholds. The use of a specific threshold at 3 showed high heterogeneity between the scenarios due to the different levels of information deduced from them. As expected, scenario A demonstrated the lowest false negative rate, which can be explained by the fact that most of the log10LR values for the related matches were above 3, even with only six overlapping STRs between the UHR and the FM DNA profiles. The highest false negative rates were observed for the less informative scenarios (C and D, with only one FM available) with the specific threshold set at 5, reaching around 60%. It is interesting to note that these values are false negative rates pooled over all calculations with 6 to 24 overlapping markers. Hence, nearly all related matches would be false negative matches when the number of overlapping markers is low (whereas no false negative matches would be obtained when calculating the log10LR based on 24 markers). This therefore reduces the chance of identifying missing persons with a low-informative pedigree and a low number of overlapping STRs. When applying tailored thresholds, the false negative rates dropped to almost 0% for all tested scenarios. Nearly all of these matches fell into the orange or grey zones, highlighting the probable requirement for additional DNA information and tests to either confirm or reject the match, thereby reducing the overall workload. This method clearly helps to accurately classify all potential matches by minimizing the report of false positive matches and virtually eliminating false negative matches that could adversely affect the search for missing persons with low-informative pedigrees.
Applying cross-validation checks between ante-mortem and post-mortem data only to those potential matches considered informative enough under the proposed tailored thresholds will help to identify and reject adventitious matches.
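The zone-based triage described above can be sketched as a small lookup. This is a hypothetical illustration: the cutoff values below are invented placeholders and are not the values of the interpretation tables, the orange and grey zones are merged into a single "more data needed" outcome, and a single table stands in for what would be one table per scenario.

```python
from bisect import bisect_right

# Hypothetical cutoffs for one scenario:
# (minimum number of overlapping STRs, reject below, report from)
HYPOTHETICAL_CUTOFFS = [
    (6, -1.0, 1.5),
    (12, -0.5, 2.5),
    (18, 0.0, 3.5),
]

def classify_match(log10lr, n_overlapping, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Assign a match to the green, orange/grey or red zone."""
    minimums = [row[0] for row in cutoffs]
    index = bisect_right(minimums, n_overlapping) - 1
    if index < 0:
        raise ValueError("too few overlapping STRs for this table")
    _, reject_below, report_from = cutoffs[index]
    if log10lr >= report_from:
        return "green"        # considered for review
    if log10lr < reject_below:
        return "red"          # rejected
    return "orange/grey"      # more DNA data or metadata needed

print(classify_match(2.0, 8))    # few overlapping STRs
print(classify_match(2.0, 20))   # same log10LR, many overlapping STRs
```

The point of the sketch is only that the same log10LR can land in different zones depending on how many STRs overlap, so the cutoff has to be looked up rather than fixed.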
      In the long term, the most efficient option for reducing inconclusive matches would be first to raise awareness that international DNA kinship matching is more effective if at least two family members are available. Secondly, recommending the analysis of DNA samples using the most complete STR sets to increase the number of overlapping markers as much as possible would be beneficial to reduce matches found in the orange zone. Thirdly, ante-mortem and/or post-mortem metadata can be attached to the DNA profile to facilitate the interpretation process. It is also important to note that using the same reference population (i.e. Worldwide population) and same log10LR thresholds across several countries would facilitate the comparison of the results obtained through international DNA kinship matching.
      In the future, forensic identifications could also be enhanced using the BONAPARTE software feature enabling the computation of log10LR for other DNA markers such as Y-STR, mitochondrial DNA, and identity SNPs. In the longer term, Massively Parallel Sequencing data will help to reduce adventitious matches by increasing the number of loci and consequently the catalogue of allele sequences, which could positively affect the log10LR value [Casals et al., 2017; Delest et al., 2020; Parsons et al., 2019; Barrio et al., 2019]. However, this type of data is often unavailable and/or not shared with international DNA kinship requests. International recommendations would certainly be helpful to set standards to allow further participation in solving ongoing unsolved missing person investigations.

      4. Conclusion

      In this publication, we propose an innovative method to perform international DNA kinship matching for missing person investigations to search for potential candidates in DNA databases. Using Worldwide allele frequencies, compiled from 369 previously published reference populations, this method does not require ancestry information about the missing person or the human remains, which often cannot be provided or accurately determined. The interpretation of log10LR values, computed with BONAPARTE, is simplified using optimal cutoff thresholds adapted for low and high numbers of overlapping typed STRs between the DNA profiles of relatives included in the pedigree and the DNA profile of the unidentified human remains. Specifically devised interpretation tables, available for the 10 most common scenarios encountered in missing person investigations, help to assess the strength of the DNA evidence and accurately determine the best outcome. This method is highly effective, with a negligible false positive rate and a very high true-positive rate, and it correctly classifies matches, including those for which the strength of the DNA evidence is insufficient and which could benefit from additional information. It should be kept in mind that the only purpose of using cutoff log10LR thresholds is to retrieve promising candidates through a database search. The obtained log10LR values cannot be extrapolated to, or used as, the probability that H1 (the MP and UHR sample are the same individual) is true [Slooten, 2020]. This log10LR result has to be combined with other facts in the case to reach a conclusion.
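One standard way to see why a log10LR is not itself the probability of H1 is Bayes' rule in odds form: posterior odds equal the LR multiplied by the prior odds. The sketch below is a generic illustration under that assumption, not a procedure described in this study, and the prior probabilities are invented.

```python
def posterior_probability(log10_lr, prior_probability):
    """Posterior probability of H1 from a log10 LR and a prior probability."""
    likelihood_ratio = 10 ** log10_lr
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

# The same log10LR of 4 under two invented priors.
print(posterior_probability(4, 0.5))    # other case facts already point to this person
print(posterior_probability(4, 1e-5))   # blind search over a large candidate pool
```

With the same DNA evidence, the posterior is close to 1 in the first case and below 0.1 in the second, which is why the log10LR has to be weighed together with the non-genetic information in the case.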
      Our method is an excellent first step when the objective is to quickly and efficiently triage potential candidates in the DNA database. It is also extremely useful in scenarios with little genetic information and/or a low number of overlapping STRs, which have a high probability of being rejected when using conventional methods (i.e. a specific threshold) [Marsico et al., 2021]. When additional means of identification are used, such as dental data or personal belongings, and when these indicate that the remains could belong to a potentially identified missing person, we recommend recalculating the log10LR using available national or continental allele frequencies to determine the true strength of the DNA evidence. In the future, we hope to further increase the weight of reference populations not yet included in the Worldwide allele frequencies by adding recently published population studies, including those on isolated populations. This would require access to the genotypes (and not just the allele frequencies) and, ideally, a quality check such as the one performed by STRidER [Bodner et al., 2016]. The method presented in this manuscript is now applied to I-Familia, a new service proposed by the International Criminal Police Organization - INTERPOL to its 195 member countries to identify missing persons globally through international DNA kinship matching. The international comparison of family DNA and unidentified human remains in I-Familia, using the streamlined procedure presented in this study, has already helped to solve missing person investigations, ultimately bringing closure to many families.

      Conflict of interest

      The authors have no conflicts of interest to declare.

      Acknowledgments

      The authors would like to thank Anna Nikolaeva, Klaas Slooten, Bruce Weir and the members of the INTERPOL DNA Monitoring Expert Group for their help, suggestions and recommendations during the developmental validation. The authors would like to acknowledge the three anonymous reviewers for their thorough evaluation of our manuscript and for their much appreciated remarks, which helped to strengthen this work before publication. The authors would like to thank Wim Wiegerinck and Willem Burgers (Smart Research BV) for the continuous support with the implementation and use of BONAPARTE. Lastly, the authors would like to acknowledge the INTERPOL English Language Department and Katriona Laurent for proofreading the manuscript.

      Appendix A. Supplementary material

      References

      1. Declaration from Agnès Coutou, ICRC’s Protection Adviser, at the 2018 U.N. General Assembly’s Human Rights Committee. 2018.
      2. Recommendations on the Use of DNA for the Identification of Missing Persons and Unidentified Human Remains by the INTERPOL DNA Monitoring Expert Group, (2017). 〈http://www.interpol.int〉.

        • Budowle B.
        • Allard M.W.
        • Wilson M.R.
        • Chakraborty R.
        Forensics and mitochondrial DNA: applications, debates, and foundations.
        Annu. Rev. Genom. Hum. Genet. 2003; 4: 119-141 https://doi.org/10.1146/annurev.genom.4.070802.110352
        • Butler J.M.
        The future of forensic DNA analysis.
        Philos. Trans. R. Soc. Lond. B. Biol. Sci. 2015; 370 https://doi.org/10.1098/rstb.2014.0252
        • Kayser M.
        Forensic use of Y-chromosome DNA: a general overview.
        Hum. Genet. 2017; 136: 621-635 https://doi.org/10.1007/s00439-017-1776-9
        • Phillips C.
        • Manzo L.
        • de la Puente M.
        • Fondevila M.
        • Lareu M.V.
        The MASTiFF panel-a versatile multiple-allele SNP test for forensics.
        Int. J. Leg. Med. 2020; 134: 441-450 https://doi.org/10.1007/s00414-019-02233-8
        • Laurent F.-X.
        • Vibrac G.
        • Rubio A.
        • Thévenot M.-T.
        • Pène L.
        Les nouvelles technologies d′analyses ADN au service des enquêtes judiciaires.
        Médecine/Sci. 2017; 33: 971-978 https://doi.org/10.1051/medsci/20173311014
        • Coble M.D.
        • Buckleton J.
        • Butler J.M.
        • Egeland T.
        • Fimmers R.
        • Gill P.
        • Gusmão L.
        • Guttman B.
        • Krawczak M.
        • Morling N.
        • Parson W.
        • Pinto N.
        • Schneider P.M.
        • Sherry S.T.
        • Willuweit S.
        • Prinz M.
        DNA Commission of the International Society for Forensic Genetics: Recommendations on the validation of software programs performing biostatistical calculations for forensic genetics applications.
        Forensic Sci. Int. Genet. 2016; 25: 191-197 https://doi.org/10.1016/j.fsigen.2016.09.002
        • Dongen C.J.B.
        • Slooten K.
        • Burgers W.
        • Wiegerinck W.
        Bayesian networks for victim identification on the basis of DNA profiles.
        Forensic Sci. Int. Genet. Suppl. Ser. 2009; 2: 466-468 https://doi.org/10.1016/j.fsigss.2009.08.024
        • Slooten K.
        Validation of DNA-based identification software by computation of pedigree likelihood ratios.
        Forensic Sci. Int. Genet. 2011; 5: 308-315 https://doi.org/10.1016/j.fsigen.2010.06.005
        • Kling D.
        • Tillmar A.O.
        • Egeland T.
        Familias 3 – extensions and new functionality.
        Forensic Sci. Int. Genet. 2014; 13: 121-127 https://doi.org/10.1016/j.fsigen.2014.07.004
        • Morimoto C.
        • Tsujii H.
        • Manabe S.
        • Fujimoto S.
        • Hirai E.
        • Hamano Y.
        • Tamaki K.
        Development of a software for kinship analysis considering linkage and mutation based on a Bayesian network.
        Forensic Sci. Int. Genet. 2020; 47: 102279 https://doi.org/10.1016/j.fsigen.2020.102279
        • Starinsky-Elbaz S.
        • Ram T.
        • Voskoboinik L.
        • Pasternak Z.
        Weight-of-evidence for DNA identification of missing persons and human remains using CODIS.
        Forensic Sci. Med. Pathol. 2020; 16: 389-394 https://doi.org/10.1007/s12024-020-00248-x
        • Collins A.
        • Morton N.E.
        Likelihood ratios for DNA identification.
        Proc. Natl. Acad. Sci. 1994; 91: 6007-6011 https://doi.org/10.1073/pnas.91.13.6007
        • Jin Y.
        • Schäeffer A.A.
        • Sherry S.T.
        • Feolo M.
        Quickly identifying identical and closely related subjects in large databases using genotype data.
        PLoS One. 2017; 12: e0179106 https://doi.org/10.1371/journal.pone.0179106
        • Henn B.M.
        • Hon L.
        • Macpherson J.M.
        • Eriksson N.
        • Saxonov S.
        • Pe’er I.
        • Mountain J.L.
        Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.
        PLOS ONE. 2012; 7: e34267 https://doi.org/10.1371/journal.pone.0034267
        • Seidman D.N.
        • Shenoy S.A.
        • Kim M.
        • Babu R.
        • Woods I.G.
        • Dyer T.D.
        • Lehman D.M.
        • Curran J.E.
        • Duggirala R.
        • Blangero J.
        • Williams A.L.
        Rapid, phase-free detection of long identity-by-descent segments enables effective relationship classification.
        Am. J. Hum. Genet. 2020; 106: 453-466 https://doi.org/10.1016/j.ajhg.2020.02.012
        • Tytgat O.
        • Gansemans Y.
        • Weymaere J.
        • Rubben K.
        • Deforce D.
        • Van Nieuwerburgh F.
        Nanopore sequencing of a forensic STR multiplex reveals loci suitable for single-contributor STR profiling.
        Genes. 2020; 11 https://doi.org/10.3390/genes11040381
        • Goudet J.
        • Kay T.
        • Weir B.S.
        How to estimate kinship.
        Mol. Ecol. 2018; 27: 4121-4135 https://doi.org/10.1111/mec.14833
        • Fortier A.L.
        • Kim J.
        • Rosenberg N.A.
        Human-genetic ancestry inference and false positives in forensic familial searching.
        G3 Genes. 2020; 10: 2893-2902 https://doi.org/10.1534/g3.120.401473
        • Algee-Hewitt B.F.B.
        • Edge M.D.
        • Kim J.
        • Li J.Z.
        • Rosenberg N.A.
        Individual identifiability predicts population identifiability in forensic microsatellite markers.
        Curr. Biol. 2016; 26: 935-942 https://doi.org/10.1016/j.cub.2016.01.065
        • Thornton T.
        • Tang H.
        • Hoffmann T.J.
        • Ochs-Balcom H.M.
        • Caan B.J.
        • Risch N.
        Estimating kinship in admixed populations.
        Am. J. Hum. Genet. 2012; 91: 122-138 https://doi.org/10.1016/j.ajhg.2012.05.024
        • Dou J.
        • Sun B.
        • Sim X.
        • Hughes J.D.
        • Reilly D.F.
        • Tai E.S.
        • Liu J.
        • Wang C.
        Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.
        PLOS Genet. 2017; 13: e1007021 https://doi.org/10.1371/journal.pgen.1007021
        • van Dongen C.J.
        • Slooten K.
        • Slagter M.
        • Burgers W.
        • Wiegerinck W.
        Bonaparte: application of new software for missing persons program.
        Forensic Sci. Int. Genet. Suppl. Ser. 2011; 3: e119-e120 https://doi.org/10.1016/j.fsigss.2011.08.059
        • Hines D.Z.C.
        • Vennemeyer M.
        • Amory S.
        • Huel R.L.M.
        • Hanson I.
        • Katzmarzyk C.
        • Parsons T.J.
        Prioritized sampling of bone and teeth for DNA analysis in commingled cases.
        Commingled Hum. Remains Methods Recovery Anal. Identif. 2014: 275-305 https://doi.org/10.1016/B978-0-12-405889-7.00013-7
        • Amorim A.
        • Pereira L.
        Pros and cons in the use of SNPs in forensic kinship investigation: a comparative analysis with STRs.
        Forensic Sci. Int. 2005; 150: 17-21 https://doi.org/10.1016/j.forsciint.2004.06.018
      3. Familial DNA Database Search System-Hardware/Software Integration Project, Natl. Inst. Justice. (n.d.). https://nij.ojp.gov/library/publications/familial-dna-database-search-system-hardwaresoftware-integration-project (accessed January 20, 2021).

        • Ge J.
        • Budowle B.
        Kinship index variations among populations and thresholds for familial searching.
        PLOS ONE. 2012; 7: e37474 https://doi.org/10.1371/journal.pone.0037474
        • Ge J.
        • Budowle B.
        • Chakraborty R.
        Choosing relatives for DNA identification of missing persons.
        J. Forensic Sci. 2011; 56 (Suppl 1): S23-S28 https://doi.org/10.1111/j.1556-4029.2010.01631.x
        • Vigeland M.D.
        • Marsico F.L.
        • Herrera Piñero M.
        • Egeland T.
        Prioritising family members for genotyping in missing person cases: a general approach combining the statistical power of exclusion and inclusion.
        Forensic Sci. Int. Genet. 2020; 49: 102376 https://doi.org/10.1016/j.fsigen.2020.102376
        • Brustad H.K.
        • Colucci M.
        • Jobling M.A.
        • Sheehan N.A.
        • Egeland T.
        Strategies for pairwise searches in forensic kinship analysis.
        Forensic Sci. Int. Genet. 2021; 54: 102562 https://doi.org/10.1016/j.fsigen.2021.102562
        • Oldt R.F.
        • Kanthaswamy S.
        Expanded CODIS STR allele frequencies - evidence for the irrelevance of race-based DNA databases.
        Leg. Med. Tokyo Jpn. 2020; 42: 101642 https://doi.org/10.1016/j.legalmed.2019.101642
        • Amigo J.
        • Phillips C.
        • Salas T.
        • Formoso L.F.
        • Carracedo Á.
        • Lareu M.
        pop.STR—an online population frequency browser for established and new forensic STRs.
        Forensic Sci. Int. Genet. Suppl. Ser. 2009; 2: 361-362 https://doi.org/10.1016/j.fsigss.2009.08.178
      4. National Research Council (US) Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence, National Academies Press (US), Washington (DC), 1996. http://www.ncbi.nlm.nih.gov/books/NBK232610/ (accessed January 20, 2021).

        • Excoffier L.
        • Lischer H.E.
        Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
        Mol. Ecol. Resour. 2010; 10: 564-567 https://doi.org/10.1111/j.1755-0998.2010.02847.x
        • Steele C.D.
        • Balding D.J.
        Choice of population database for forensic DNA profile analysis.
        Sci. Justice J. Forensic Sci. Soc. 2014; 54: 487-493 https://doi.org/10.1016/j.scijus.2014.10.004
        • McCulloh K.L.
        • Ng J.
        • Oldt R.F.
        • Weise J.A.
        • Viray J.
        • Budowle B.
        • Smith D.G.
        • Kanthaswamy S.
        The genetic structure of Native Americans in North America based on the Globalfiler® STRs.
        Leg. Med. Tokyo Jpn. 2016; 23: 49-54 https://doi.org/10.1016/j.legalmed.2016.09.007
        • Buckleton J.
        • Curran J.
        • Goudet J.
        • Taylor D.
        • Thiery A.
        • Weir B.S.
        Population-specific F values for forensic STR markers: A worldwide survey.
        Forensic Sci. Int. Genet. 2016; 23: 91-100 https://doi.org/10.1016/j.fsigen.2016.03.004
        • Chernomoretz A.
        • Balparda M.
        • Grutta L.L.
        • Calabrese A.
        • Martinez G.
        • Escobar M.S.
        • Sibilla G.
        GENis, an open-source multi-tier forensic DNA information system.
        Forensic Sci. Int. Rep. 2020; 2: 100132 https://doi.org/10.1016/j.fsir.2020.100132
        • Ge J.
        • Budowle B.
        • Chakraborty R.
        DNA identification by pedigree likelihood ratio accommodating population substructure and mutations.
        Invest. Genet. 2010; 1: 8 https://doi.org/10.1186/2041-2223-1-8
      5. BONAPARTE User Manual 4.1.10.

        • Spitzer M.
        • Wildenhain J.
        • Rappsilber J.
        • Tyers M.
        BoxPlotR: a web tool for generation of box plots.
        Nat. Methods. 2014; 11: 121-122 https://doi.org/10.1038/nmeth.2811
        • Hill C.R.
        • Duewer D.L.
        • Kline M.C.
        • Coble M.D.
        • Butler J.M.
        U.S. population data for 29 autosomal STR loci.
        Forensic Sci. Int. Genet. 2013; 7: e82-e83 https://doi.org/10.1016/j.fsigen.2012.12.004
        • Steffen C.R.
        • Coble M.D.
        • Gettings K.B.
        • Vallone P.M.
        Corrigendum to “U.S. Population Data for 29 Autosomal STR Loci” [Forensic Sci. Int. Genet. 7 (2013) e82-e83].
        Forensic Sci. Int. Genet. 2017; 31: e36-e40 https://doi.org/10.1016/j.fsigen.2017.08.011
        • Ng J.
        • Oldt R.F.
        • McCulloh K.L.
        • Weise J.A.
        • Viray J.
        • Budowle B.
        • Smith D.G.
        • Kanthaswamy S.
        Native American population data based on the Globalfiler® autosomal STR loci.
        Forensic Sci. Int. Genet. 2016; 24: e12-e13 https://doi.org/10.1016/j.fsigen.2016.06.014
        • Martínez-Cortés G.
        • Zuñiga-Chiquette F.
        • Celorio-Sánchez A.S.
        • Ruiz García E.
        • Antelo-Figueroa A.B.
        • Dalpozzo-Valenzuela V.
        • Valenzuela-Coronado A.
        • Rangel-Villalobos H.
        Population data for 21 autosomal STR loci (GlobalFiler kit) in two Mexican-Mestizo population from the northwest, Mexico.
        Int. J. Leg. Med. 2019; 133: 781-783 https://doi.org/10.1007/s00414-018-1950-1
        • Al-Eitan L.N.
        • Darwish N.N.
        • Hakooz N.M.
        • Dajani R.B.
        Investigation of the forensic GlobalFiler loci in the genetically isolated Circassian subpopulation in Jordan.
        Gene. 2020; 733: 144269 https://doi.org/10.1016/j.gene.2019.144269
      6. 2020 World Population by Country, https://worldpopulationreview.com/.

        • Bodner M.
        • Bastisch I.
        • Butler J.M.
        • Fimmers R.
        • Gill P.
        • Gusmão L.
        • Morling N.
        • Phillips C.
        • Prinz M.
        • Schneider P.M.
        • Parson W.
        Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal Short Tandem Repeat allele frequency databasing (STRidER).
        Forensic Sci. Int. Genet. 2016; 24: 97-102 https://doi.org/10.1016/j.fsigen.2016.06.008
        • Novroski N.M.M.
        • King J.L.
        • Churchill J.D.
        • Seah L.H.
        • Budowle B.
        Characterization of genetic sequence variation of 58 STR loci in four major population groups.
        Forensic Sci. Int. Genet. 2016; 25: 214-226 https://doi.org/10.1016/j.fsigen.2016.09.007
        • Casals F.
        • Anglada R.
        • Bonet N.
        • Rasal R.
        • van der Gaag K.J.
        • Hoogenboom J.
        • Solé-Morata N.
        • Comas D.
        • Calafell F.
        Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.
        Forensic Sci. Int. Genet. 2017; 30: 66-70 https://doi.org/10.1016/j.fsigen.2017.06.006
        • Churchill J.D.
        • Schmedes S.E.
        • King J.L.
        • Budowle B.
        Evaluation of the Illumina® Beta Version ForenSeq™ DNA Signature Prep Kit for use in genetic profiling.
        Forensic Sci. Int. Genet. 2016; 20: 20-29 https://doi.org/10.1016/j.fsigen.2015.09.009
        • Devesse L.
        • Ballard D.
        • Davenport L.
        • Riethorst I.
        • Mason-Buck G.
        • Syndercombe Court D.
        Concordance of the ForenSeq™ system and characterisation of sequence-specific autosomal STR alleles across two major population groups.
        Forensic Sci. Int. Genet. 2018; 34: 57-61 https://doi.org/10.1016/j.fsigen.2017.10.012
        • Hussing C.
        • Bytyci R.
        • Huber C.
        • Morling N.
        • Børsting C.
        The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.
        Int. J. Leg. Med. 2019; 133: 325-334 https://doi.org/10.1007/s00414-018-1854-0
        • Delest A.
        • Godfrin D.
        • Chantrel Y.
        • Ulus A.
        • Vannier J.
        • Faivre M.
        • Hollard C.
        • Laurent F.-X.
        Sequenced-based French population data from 169 unrelated individuals with Verogen’s ForenSeq DNA signature prep kit.
        Forensic Sci. Int. Genet. 2020; 47: 102304 https://doi.org/10.1016/j.fsigen.2020.102304
        • Laurent F.X.
        • Ausset L.
        • Clot M.
        • Jullien S.
        • Chantrel Y.
        • Hollard C.
        • Pene L.
        Automation of library preparation using Illumina ForenSeq kit for routine sequencing of casework samples.
        Forensic Sci. Int. Genet. Suppl. Ser. 2017; 6: e415-e417 https://doi.org/10.1016/j.fsigss.2017.09.156
        • Hollard C.
        • Ausset L.
        • Chantrel Y.
        • Jullien S.
        • Clot M.
        • Faivre M.
        • Suzanne É.
        • Pène L.
        • Laurent F.-X.
        Automation and developmental validation of the ForenSeq™ DNA Signature Preparation kit for high-throughput analysis in forensic laboratories.
        Forensic Sci. Int. Genet. 2019; 40: 37-45 https://doi.org/10.1016/j.fsigen.2019.01.010
        • Weir B.S.
        • Goudet J.
        A unified characterization of population structure and relatedness.
        Genetics. 2017; 206: 2085-2103 https://doi.org/10.1534/genetics.116.198424
        • Ochoa A.
        • Storey J.D.
        Estimating FST and kinship for arbitrary population structures.
        PLOS Genet. 2021; 17: e1009241 https://doi.org/10.1371/journal.pgen.1009241
        • Slooten K.
        Likelihood ratio distributions and the (ir)relevance of error rates.
        Forensic Sci. Int. Genet. 2020; 44: 102173 https://doi.org/10.1016/j.fsigen.2019.102173
        • Ge J.
        • Budowle B.
        How many familial relationship testing results could be wrong?.
        PLoS Genet. 2020; 16 https://doi.org/10.1371/journal.pgen.1008929
        • Tamura T.
        • Osawa M.
        • Ochiai E.
        • Suzuki T.
        • Nakamura T.
        Evaluation of advanced multiplex short tandem repeat systems in pairwise kinship analysis.
        Leg. Med. Tokyo Jpn. 2015; 17: 320-325 https://doi.org/10.1016/j.legalmed.2015.03.005
        • Turrina S.
        • Ferrian M.
        • Caratti S.
        • Cosentino E.
        • De Leo D.
        Kinship analysis: assessment of related vs unrelated based on defined pedigrees.
        Int. J. Leg. Med. 2016; 130: 113-119 https://doi.org/10.1007/s00414-015-1290-3
        • Cho S.
        • Shin E.S.
        • Yu H.J.
        • Lee J.H.
        • Seo H.J.
        • Kim M.Y.
        • Lee S.D.
        Set up of cutoff thresholds for kinship determination using SNP loci.
        Forensic Sci. Int. Genet. 2017; 29: 1-8 https://doi.org/10.1016/j.fsigen.2017.03.009
        • Li R.
        • Li H.
        • Peng D.
        • Hao B.
        • Wang Z.
        • Huang E.
        • Wu R.
        • Sun H.
        Improved pairwise kinship analysis using massively parallel sequencing.
        Forensic Sci. Int. Genet. 2019; 38: 77-85 https://doi.org/10.1016/j.fsigen.2018.10.006
        • Parsons T.J.
        • Huel R.M.L.
        • Bajunović Z.
        • Rizvić A.
        Large scale DNA identification: the ICMP experience.
        Forensic Sci. Int. Genet. 2019; 38: 236-244 https://doi.org/10.1016/j.fsigen.2018.11.008
        • Latham K.E.
        • Miller J.J.
        DNA recovery and analysis from skeletal material in modern forensic contexts.
        Forensic Sci. Res. 2018; 4: 51-59 https://doi.org/10.1080/20961790.2018.1515594
        • Machado P.
        • Gusmão L.
        • Conde-Sousa E.
        • Pinto N.
        The influence of the different mutation models in kinship evaluation.
        Forensic Sci. Int. Genet. Suppl. Ser. 2017; 6: e255-e256 https://doi.org/10.1016/j.fsigss.2017.09.093
        • Barrio P.A.
        • Martín P.
        • Alonso A.
        • Müller P.
        • Bodner M.
        • Berger B.
        • Parson W.
        • Budowle B.
        DNASEQEX Consortium, Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power.
        Forensic Sci. Int. Genet. 2019; 42: 49-55 https://doi.org/10.1016/j.fsigen.2019.06.009
        • Marsico F.L.
        • Vigeland M.D.
        • Egeland T.
        • Piñero M.H.
        Making decisions in missing person identification cases with low statistical power.
        Forensic Sci. Int. Genet. 2021; 54: 102519 https://doi.org/10.1016/j.fsigen.2021.102519