Research paper | Volume 59, 102705, July 2022

Microhaplotype and Y-SNP/STR (MY): A novel MPS-based system for genotype pattern recognition in two-person DNA mixtures

  • Haoliang Fan
    Corresponding author at: Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China.
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China

    Institute of Archaeological Science, Fudan University, Shanghai 200433, China

    School of Basic Medicine and Life Science, Hainan Medical University, Haikou 571199, Hainan, China
  • Qiqian Xie
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Lingxiang Wang
    Affiliations
    Institute of Archaeological Science, Fudan University, Shanghai 200433, China
  • Kai Ru
    Affiliations
    Institute of Archaeological Science, Fudan University, Shanghai 200433, China
  • Xiaohui Tan
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Jiuyang Ding
    Affiliations
    School of Forensic Medicine, Guizhou Medical University, Guiyang 550004, Guizhou, China
  • Xiao Wang
    Affiliations
    Department of Psychiatry, The First Clinical Medical College, Shanxi Medical University, Taiyuan 030001, Shanxi, China
  • Jian Huang
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Zhuo Wang
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China

    Department of Infertility and Sexual Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, Guangdong, China
  • Yanning Li
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China

    School of Basic Medicine, Gannan Medical University, Ganzhou 341000, Jiangxi, China
  • Xiaohan Wang
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Yitong He
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Cihang Gu
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
  • Min Liu
    Affiliations
    School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  • Shiwen Ma
    Affiliations
    Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
  • Shaoqing Wen
    Corresponding authors.
    Affiliations
    Institute of Archaeological Science, Fudan University, Shanghai 200433, China
  • Pingming Qiu
    Corresponding authors.
    Affiliations
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
Open Access | Published: April 13, 2022 | DOI: https://doi.org/10.1016/j.fsigen.2022.102705

      Highlights

      • A Microhaplotype and Y-SNP/STR (MY) system comprising 114 Y-SNPs, 45 Y-STRs, and 22 microhaplotypes was developed based on multiplex PCR and 150-bp paired-end sequencing.
      • Twenty-six two-person genotype combinations were integrated into nine genotype patterns.
      • MY system-based genotype pattern recognition, a regression-based method to identify the genotype pattern for each MH locus, is proposed for two-person DNA mixture deconvolution (application range: 1:10–1:2).

      Abstract

      Background

      Y-chromosomal haplotypes based on Y-short tandem repeats (Y-STRs) and Y-single nucleotide polymorphisms/insertion and deletion polymorphisms (Y-SNPs/InDels) are used to characterize the paternal lineages of unknown male trace donors. However, Y-chromosomal genetic markers are currently insufficient for precise individual identification. Microhaplotypes (MHs), autosomal loci generally < 200 bp that consist of two or more SNPs, were recently introduced into forensic genetics with the development of massively parallel sequencing (MPS) technology and may facilitate individual identification and DNA mixture deconvolution. Combining the two kinds of genetic markers may therefore be beneficial in many forensic scenarios, especially cases with male suspects, such as sexual assaults.

      Methods

      In the present study, we developed a novel MPS-based panel, the Microhaplotype and Y-SNP/STR (MY) system, using multiplex PCR and 150-bp paired-end sequencing. It includes 114 Y-SNPs (covering twelve dominant Y-DNA haplogroups), 45 Y-STRs (N-1 stutter < 0.09; estimated mutation rate < 5 × 10⁻³), and 22 MHs (allele coverage ratio > 0.91; pairwise distance > 10 Mb). We also propose MY system-based genotype pattern recognition (GPR), a regression-based method that identifies the genotype pattern at each MH locus, for two-person DNA mixture deconvolution. We integrated 26 two-person genotype combinations into nine genotype patterns and validated the application range of GPR using DNA profiles of ten sets of simulated male-male DNA mixtures (1:10–1:2).
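      The regression-and-range idea behind GPR can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not the authors' definition: RDoC is taken to be the lowest allele depth divided by the highest allele depth at one MH locus, allele depths are assumed proportional to allele copy number times contributor share (no stutter, no amplification bias), only four of the nine patterns are modelled, and noise-free model values stand in for real calibration mixtures. The sketch fits one straight line of RDoC against mixing ratio per pattern with NumPy's `polyfit`, turns each line into an RDoC interval over the 1:10–1:2 range, and reports every pattern whose interval contains an observed RDoC; more than one hit corresponds to an overlapping area.

```python
import numpy as np

# Hypothetical RDoC used only in this sketch: lowest allele depth / highest allele depth.
# r is the minor:major mixing ratio as a fraction (1:10 -> 0.1).
# Keys read "major genotype+minor genotype"; comments give relative allele depths
# (copies x contributor share), assuming no stutter or amplification bias.
IDEAL_RDOC = {
    "aa+bc": lambda r: r / 2,        # a=2, b=r, c=r
    "aa+ab": lambda r: r / (2 + r),  # a=2+r, b=r
    "ab+ac": lambda r: r / (1 + r),  # a=1+r, b=1, c=r
    "ab+cd": lambda r: r,            # a=b=1, c=d=r
}

R_MIN, R_MAX = 1 / 10, 1 / 2  # application range 1:10 to 1:2


def fit_rdoc_ranges(n_points=9):
    """Fit RDoC = slope * r + intercept per pattern; return the fitted RDoC interval over [R_MIN, R_MAX]."""
    r = np.linspace(R_MIN, R_MAX, n_points)
    ranges = {}
    for pattern, model in IDEAL_RDOC.items():
        rdoc = np.array([model(x) for x in r])  # noise-free stand-in for calibration data
        slope, intercept = np.polyfit(r, rdoc, 1)
        ends = (slope * R_MIN + intercept, slope * R_MAX + intercept)
        ranges[pattern] = (min(ends), max(ends))
    return ranges


def candidate_patterns(observed_rdoc, ranges):
    """Return every pattern whose fitted RDoC interval contains the observation."""
    return [p for p, (lo, hi) in ranges.items() if lo <= observed_rdoc <= hi]


ranges = fit_rdoc_ranges()
print(candidate_patterns(0.45, ranges))  # ['ab+cd']
print(candidate_patterns(0.15, ranges))  # several candidates: an overlapping area
```

      A single candidate means the pattern can be read directly from the observed RDoC; several candidates mark the situation in which, according to the Results, likelihood ratio methods were needed to separate the major and minor genotypes.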

      Results

      The effective number of alleles (Ae) ranged from 3.62 to 14.72 (average 7.17) in 100 Guangdong Han Chinese individuals. The cumulative discrimination power was 1 − 5.00 × 10⁻³¹, and the cumulative power of exclusion was 1 − 5.00 × 10⁻⁸ and 1 − 4.85 × 10⁻¹² for duo and trio paternity testing, respectively. Furthermore, regression relationships between the actual mixing ratio (AMR) and the depth of coverage (DoC) ratio (RDoC) were established for different genetic markers and genotype patterns. In five overlapping areas, distinguishing the genotypes of the major and minor contributors required likelihood ratio methods. In nonoverlapping areas, the genotype pattern could be recognized by comparing the observed RDoC with the established RDoC ranges.
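      The summary statistics above follow standard population-genetic definitions that the text does not spell out. The minimal Python sketch below assumes independent autosomal loci and Hardy-Weinberg proportions (our assumptions; linked Y-chromosomal markers are evaluated as haplotypes instead). It computes Ae = 1/Σpᵢ², a per-locus discrimination power, and the usual combination rule 1 − ∏(1 − xᵢ), which applies equally to combining per-locus powers of exclusion. The per-locus exclusion formulas for duo and trio testing are not reproduced here.

```python
from itertools import combinations
from math import prod


def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p_i^2) for one locus; freqs should sum to 1."""
    return 1.0 / sum(p * p for p in freqs)


def discrimination_power(freqs):
    """DP = 1 - sum of squared genotype frequencies, assuming Hardy-Weinberg proportions."""
    genotype_freqs = [p * p for p in freqs]                           # homozygotes
    genotype_freqs += [2 * p * q for p, q in combinations(freqs, 2)]  # heterozygotes
    return 1.0 - sum(g * g for g in genotype_freqs)


def complement_product(per_locus_powers):
    """prod(1 - x_i) over independent loci; the combined power is 1 minus this value."""
    return prod(1.0 - x for x in per_locus_powers)


print(effective_number_of_alleles([0.25] * 4))  # 4.0
print(1.0 - 5e-31 == 1.0)                       # True
```

      The second print shows why combined values are reported as "1 − 5.00 × 10⁻³¹": in double-precision arithmetic the subtraction rounds to exactly 1.0, so keeping the complement product is the numerically meaningful quantity.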

      Conclusion

      The GPR can be used to deconvolute two-person DNA mixtures (application range: 1:10–1:2) for individual identification.


      1. Introduction

      In crime scene investigation, Y-chromosomal haplotypes composed of Y-chromosomal short tandem repeats (Y-STRs) and Y-DNA haplogroups can exclude male suspects from involvement in a crime, identify the paternal lineage of male perpetrators, reveal multiple male contributors to a trace, and provide investigative leads for finding unknown male perpetrators. In principle, with a sufficient number of rapidly mutating (RM) Y-STR markers, closely and especially distantly related men can be distinguished through observed mutations [
      • Kayser M.
      Forensic use of Y-chromosome DNA: a general overview.
      ]. However, the available RM Y-STRs are currently insufficient for precise individual identification [
      • Fan H.
      • Zeng Y.
      • Wu W.
      • et al.
      The Y-STR landscape of coastal southeastern Han: Forensic characteristics, haplotype analyses, mutation rates, and population genetics.
      ,
      • Ballantyne K.N.
      • Goedbloed M.
      • Fang R.
      • et al.
      Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications.
      ,
      • Ay M.
      • Serin A.
      • Sevay H.
      • Gurkan C.
      • Canan H.
      Genetic characterisation of 13 rapidly mutating Y-STR loci in 100 father and son pairs from South and East Turkey.
      ,
      • Claerhout S.
      • Vandenbosch M.
      • Nivelle K.
      • et al.
      Determining Y-STR mutation rates in deep-routing genealogies: Identification of haplogroup differences.
      ,
      • Ge J.
      • Budowle B.
      • Aranda X.G.
      • Planz J.V.
      • Eisenberg A.J.
      • Chakraborty R.
      Mutation rates at Y chromosome short tandem repeats in Texas populations.
      ].
      Biological samples collected from bodily fluids (e.g., blood, saliva, vaginal secretions, and semen) in criminal cases (e.g., sexual and physical assault, sodomy, and murder) can yield mixed-DNA profiles of two or more donors of the same or opposite sex [
      • Oldoni F.
      • Podini D.
      Forensic molecular biomarkers for mixture analysis.
      ]. Although STRs are useful for addressing many forensic DNA questions, stutter products generated by polymerase slippage during PCR are the most serious limitation for DNA mixture deconvolution, because stutter peaks can mask the alleles of the minor donor at more unbalanced mixture ratios [
      • Fregeau C.J.
      • Bowen K.L.
      • Leclair B.
      • Trudel I.
      • Bishop L.
      • Fourney R.M.
      AmpFlSTR profiler Plus short tandem repeat DNA analysis of casework samples, mixture samples, and nonhuman DNA samples amplified under reduced PCR volume conditions (25 microL).
      ,
      • Green R.L.
      • Lagace R.E.
      • Oldroyd N.J.
      • Hennessy L.K.
      • Mulero J.J.
      Developmental validation of the AmpFlSTR® NGM SElect PCR amplification kit: a next-generation STR multiplex with the SE33 locus. Forensic Science International.
      ]. The capillary electrophoresis (CE)-based mixed-DNA profiles conventionally used for mixture interpretation can be affected by various factors, including but not limited to allele drop-in/drop-out, pull-up caused by spectral overlap between dye colours, shared alleles, and split peaks caused by incomplete adenylation [
      • Oldoni F.
      • Podini D.
      Forensic molecular biomarkers for mixture analysis.
      ,
      • Walsh P.S.
      • Fildes N.J.
      • Reynolds R.
      Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA.
      ,
      • Butler J.M.
      • Kline M.C.
      • Coble M.D.
      NIST interlaboratory studies involving DNA mixtures (MIX05 and MIX13): Variation observed and lessons learned. Forensic science international.
      ,
      • Barrio P.A.
      • Crespillo M.
      • Luque J.A.
      • et al.
      GHEP-ISFG collaborative exercise on mixture profiles (GHEP-MIX06). Reporting conclusions: results and evaluation.
      ]. In addition, the number of STR markers that can be typed in a single CE assay is generally limited to 40. Hence, the traditional STR-CE method may not be the optimal solution for DNA mixture deconvolution.
      With the development of massively parallel sequencing (MPS) technology [
      • Van Neste C.
      • Van Nieuwerburgh F.
      • Van Hoofstat D.
      • Deforce D.
      Forensic STR analysis using massive parallel sequencing.
      ,
      • Ambers A.D.
      • Churchill J.D.
      • King J.L.
      • et al.
      More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing.
      ,
      • Borsting C.
      • Morling N.
      Next generation sequencing and its applications in forensic genetics.
      ,
      • Budowle B.
      • Schmedes S.E.
      • Wendt F.R.
      Increasing the reach of forensic genetics with massively parallel sequencing.
      ,
      • Gettings K.B.
      • Aponte R.A.
      • Vallone P.M.
      • Butler J.M.
      STR allele sequence variation: current knowledge and future issues.
      ,
      • Fan H.
      • Du Z.
      • Wang F.
      • et al.
      The forensic landscape and the population genetic analyses of Hainan Li based on massively parallel sequencing DNA profiling.
      ,
      • Almalki N.
      • Chow H.Y.
      • Sharma V.
      • Hart K.
      • Siegel D.
      • Wurmbach E.
      Systematic assessment of the performance of illumina's MiSeq FGx forensic genomics system.
      ,
      • Butler J.M.
      The future of forensic DNA analysis.
      ,
      • Barrio P.A.
      • Martin P.
      • Alonso A.
      • et al.
      Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power.
      ,
      • Yu Z.
      • Xie Q.
      • Zhao Y.
      • Duan L.
      • Qiu P.
      • Fan H.
      NGS plus bacterial culture: a more accurate method for diagnosing forensic-related nosocomial infections.
      ,
      • Gorden E.M.
      • Sturk-Andreaggi K.
      • Marshall C.
      Capture enrichment and massively parallel sequencing for human identification.
      ,
      • Fan H.
      • Wang L.
      • Liu C.
      • et al.
      Development and validation of a novel 133-plex forensic STR panel (52 STRs and 81 Y-STRs) using single-end 400 bp massive parallel sequencing.
      ], emerging microhaplotypes (i.e., microhaps or MHs, generally <200 bp and consisting of two or more closely linked single nucleotide polymorphisms (SNPs) with three or more allelic combinations [
      • Oldoni F.
      • Podini D.
      Forensic molecular biomarkers for mixture analysis.
      ,
      • Oldoni F.
      • Kidd K.K.
      • Podini D.
      Microhaplotypes in forensic genetics.
      ,
      • Kidd K.K.
      • Pakstis A.J.
      • Speed W.C.
      • et al.
      Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics.
      ]) exhibit promising characteristics for further improving DNA mixture deconvolution. MHs offer several advantages over standard STRs: (1) they are multiallelic but lack repeat structures, which prevents Taq polymerase slippage and the generation of stutter peaks; (2) their amplicons have similar lengths, so PCR amplification is balanced (whereas shorter STR alleles are commonly amplified preferentially); and (3) they have lower mutation rates than STRs. Since Kidd proposed this powerful new type of genetic marker in 2013 [
      • Kidd K.K.
      • Pakstis A.J.
      • Speed W.C.
      • et al.
      Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics.
      ], an increasing number of studies have focused mainly on three aspects: (1) identifying and evaluating MHs with a higher effective number of alleles (Ae) to achieve low random match probabilities (RMPs) and improve deconvolution performance [
      • Wu R.
      • Li H.
      • Li R.
      • et al.
      Identification and sequencing of 59 highly polymorphic microhaplotypes for analysis of DNA mixtures.
      ,
      • Gandotra N.
      • Speed W.C.
      • Qin W.
      • et al.
      Validation of novel forensic DNA markers using multiplex microhaplotype sequencing.
      ]; (2) extending MHs to further forensic applications (e.g., biogeographic ancestry inference [
      • Gandotra N.
      • Speed W.C.
      • Qin W.
      • et al.
      Validation of novel forensic DNA markers using multiplex microhaplotype sequencing.
      ,
      • Bulbul O.
      • Speed W.C.
      • Gurkan C.
      • et al.
      Improving ancestry distinctions among Southwest Asian populations.
      ,
      • Chen P.
      • Zhu W.
      • Tong F.
      • et al.
      Identifying novel microhaplotypes for ancestry inference.
      ,
      • Cheung E.Y.Y.
      • Phillips C.
      • Eduardoff M.
      • Lareu M.V.
      • McNevin D.
      Performance of ancestry-informative SNP and microhaplotype markers.
      ,
      • de la Puente M.
      • Ruiz-Ramirez J.
      • Ambroa-Conde A.
      • et al.
      Broadening the applicability of a custom multi-platform panel of microhaplotypes: bio-geographical ancestry inference and expanded reference data.
      ,
      • Jin X.
      • Zhang X.
      • Shen C.
      • et al.
      A highly polymorphic panel consisting of microhaplotypes and compound markers with the NGS and its forensic efficiency evaluations in Chinese two groups.
      ,
      • Kidd K.K.
      • Bulbul O.
      • Gurkan C.
      • et al.
      Genetic relationships of Southwest Asian and Mediterranean populations.
      ,
      • Oldoni F.
      • Yoon L.
      • Wootton S.C.
      • Lagace R.
      • Kidd K.K.
      • Podini D.
      Population genetic data of 74 microhaplotypes in four major U.S. population groups.
      ,
      • Phillips C.
      • McNevin D.
      • Kidd K.K.
      • et al.
      MAPlex - A massively parallel sequencing ancestry analysis multiplex for Asia-Pacific populations.
      ,
      • Xavier C.
      • de la Puente M.
      • Phillips C.
      • et al.
      Forensic evaluation of the Asia Pacific ancestry-informative MAPlex assay.
      ,
      • Zhu J.
      • Lv M.
      • Zhou N.
      • et al.
      Genotyping polymorphic microhaplotype markers through the Illumina® MiSeq platform for forensics.
      ,
      • Bulbul O.
      • Pakstis A.J.
      • Soundararajan U.
      • et al.
      Ancestry inference of 96 population samples using microhaplotypes.
      ,
      • Kidd K.K.
      • Speed W.C.
      • Pakstis A.J.
      • et al.
      Evaluating 130 microhaplotypes across a global set of 83 populations.
      ], analysis of degraded samples [
      • Bose N.
      • Carlberg K.
      • Sensabaugh G.
      • Erlich H.
      • Calloway C.
      Target capture enrichment of nuclear SNP markers for massively parallel sequencing of degraded and mixed samples.
      ,
      • de la Puente M.
      • Phillips C.
      • Xavier C.
      • et al.
      Building a custom large-scale panel of novel microhaplotypes for forensic identification using MiSeq and Ion S5 massively parallel sequencing systems.
      ,
      • Fregeau C.J.
      Validation of the Verogen ForenSeq DNA Signature Prep kit/Primer Mix B for phenotypic and biogeographical ancestry predictions using the Micro MiSeq® Flow Cells. Forensic Science International.
      ,
      • Liu J.
      • Li W.
      • Wang J.
      • et al.
      A new set of DIP-SNP markers for detection of unbalanced and degraded DNA mixtures.
      ], human and missing-person identification [
      • Wu R.
      • Li H.
      • Li R.
      • et al.
      Identification and sequencing of 59 highly polymorphic microhaplotypes for analysis of DNA mixtures.
      ,
      • Zhu J.
      • Lv M.
      • Zhou N.
      • et al.
      Genotyping polymorphic microhaplotype markers through the Illumina® MiSeq platform for forensics.
      ,
      • Jin X.Y.
      • Cui W.
      • Chen C.
      • et al.
      Developing and population analysis of a new multiplex panel of 18 microhaplotypes and compound markers using next generation sequencing and its application in the Shaanxi Han population.
      ,
      • Kureshi A.
      • Li J.
      • Wen D.
      • Sun S.
      • Yang Z.
      • Zha L.
      Construction and forensic application of 20 highly polymorphic microhaplotypes.
      ,
      • Pang J.B.
      • Rao M.
      • Chen Q.F.
      • et al.
      A 124-plex microhaplotype panel based on next-generation sequencing developed for forensic applications.
      ,
      • Phillips C.
      • Amigo J.
      • Tillmar A.O.
      • et al.
      A compilation of tri-allelic SNPs from 1000 Genomes and use of the most polymorphic loci for a large-scale human identification panel.
      ,
      • Turchi C.
      • Melchionda F.
      • Pesaresi M.
      • Tagliabracci A.
      Evaluation of a microhaplotypes panel for forensic genetics using massive parallel sequencing technology.
      ,
      • van der Gaag K.J.
      • de Leeuw R.H.
      • Laros J.F.J.
      • den Dunnen J.T.
      • de Knijff P.
      Short hypervariable microhaplotypes: a novel set of very short high discriminating power loci without stutter artefacts.
      ], and relationship testing [
      • Staadig A.
      • Tillmar A.
      Evaluation of microhaplotypes in forensic kinship analysis from a Swedish population perspective.
      ,
      • Sun S.
      • Liu Y.
      • Li J.
      • et al.
      Development and application of a nonbinary SNP-based microhaplotype panel for paternity testing involving close relatives.
      ,
      • Wu R.
      • Chen H.
      • Li R.
      • et al.
      Pairwise kinship testing with microhaplotypes: can advancements be made in kinship inference with these markers?
      ,
      • Zhu J.
      • Chen P.
      • Qu S.
      • et al.
      Evaluation of the microhaplotype markers in kinship analysis.
      ,
      • Bai Z.
      • Zhao H.
      • Lin S.
      • et al.
      Evaluation of a microhaplotype-based noninvasive prenatal test in twin gestations: determination of paternity, zygosity, and fetal fraction.
      ,
      • Ou X.
      • Qu N.
      Noninvasive prenatal paternity testing by target sequencing microhaps.
      ,
      • Qu N.
      • Xie Y.
      • Li H.
      • et al.
      Noninvasive prenatal paternity testing using targeted massively parallel sequencing.
      ,
      • Wang J.Y.T.
      • Whittle M.R.
      • Puga R.D.
      • Yambartsev A.
      • Fujita A.
      • Nakaya H.I.
      Noninvasive prenatal paternity determination using microhaplotypes: a pilot study.
      ]); and (3) deconvolution of DNA mixtures of two or more persons using various probabilistic models [
      • de la Puente M.
      • Phillips C.
      • Xavier C.
      • et al.
      Building a custom large-scale panel of novel microhaplotypes for forensic identification using MiSeq and Ion S5 massively parallel sequencing systems.
      ,
      • Bennett L.
      • Oldoni F.
      • Long K.
      • et al.
      Mixture deconvolution by massively parallel sequencing of microhaplotypes.
      ,
      • Chen P.
      • Deng C.
      • Li Z.
      • et al.
      A microhaplotypes panel for massively parallel sequencing analysis of DNA mixtures.
      ,
      • Chen P.
      • Yin C.
      • Li Z.
      • et al.
      Evaluation of the microhaplotypes panel for DNA mixture analyses.
      ,
      • Crysup B.
      • Woerner A.E.
      • King J.L.
      • Budowle B.
      Graph algorithms for mixture interpretation.
      ,
      • Oldoni F.
      • Bader D.
      • Fantinato C.
      • et al.
      A sequence-based 74plex microhaplotype assay for analysis of forensic DNA mixtures.
      ,
      • Coble M.D.
      • Bright J.A.
      Probabilistic genotyping software: an overview.
      ,
      • Alladio E.
      • Omedei M.
      • Cisana S.
      • et al.
      DNA mixtures interpretation - a proof-of-concept multi-software comparison highlighting different probabilistic methods’ performances on challenging samples.
      ,
      • Benschop C.C.
      • Sijen T.
      LoCIM-tool: an expert’s assistant for inferring the major contributor’s alleles in mixed consensus DNA profiles.
      ,
      • Bleka O.
      • Benschop C.C.G.
      • Storvik G.
      • Gill P.
      A comparative study of qualitative and quantitative models used to interpret complex STR DNA profiles.
      ,
      • Bleka O.
      • Eduardoff M.
      • Santos C.
      • Phillips C.
      • Parson W.
      • Gill P.
      Open source software EuroForMix can be used to analyse complex SNP mixtures.
      ,
      • Bleka O.
      • Storvik G.
      • Gill P.
      EuroForMix: an open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts.
      ,
      • Bright J.A.
      • Taylor D.
      • McGovern C.
      • et al.
      Developmental validation of STRmix, expert software for the interpretation of forensic DNA profiles. Forensic science international.
      ,
      • Buckleton J.S.
      • Bright J.A.
      • Gittelson S.
      • et al.
      The Probabilistic Genotyping Software STRmix: Utility and Evidence for its Validity.
      ,
      • Taylor D.
      • Bright J.A.
      • Buckleton J.
      The interpretation of single source and mixed DNA profiles.
      ]. To date, however, MHs have remained complements to conventional STR analysis in mixture profiling, owing to several disadvantages: (1) they have fewer alleles than most STRs, so more MHs are required to reach efficiencies comparable to those of STRs; (2) the population data necessary for forensic applications are lacking; and (3) appropriate workflows and pipelines for sequencing, data assembly, and analysis have not been established within the global forensic DNA community. Even so, compared with the STR-CE approach, MHs typed by MPS have considerable potential for DNA mixture deconvolution [
      • Oldoni F.
      • Podini D.
      Forensic molecular biomarkers for mixture analysis.
      ,
      • Oldoni F.
      • Kidd K.K.
      • Podini D.
      Microhaplotypes in forensic genetics.
      ,
      • Yang J.
      • Lin D.
      • Deng C.
      • et al.
      The advances in DNA mixture interpretation.
      ].
      Given the complementary advantages and disadvantages of Y-SNPs/STRs and MHs, using both types of genetic markers together could aid crime scene investigations, especially those involving male suspects. In the present study, we developed a novel MY system comprising 114 Y-SNPs, 45 Y-STRs, and 22 MHs based on multiplex PCR and 150-bp paired-end sequencing [
      • Wu R.
      • Li H.
      • Li R.
      • et al.
      Identification and sequencing of 59 highly polymorphic microhaplotypes for analysis of DNA mixtures.
      ,
      • Ou X.
      • Qu N.
      Noninvasive prenatal paternity testing by target sequencing microhaps.
      ,
      • Bootsma M.L.
      • Gruenthal K.M.
      • McKinney G.J.
      • et al.
      A GT-seq panel for walleye (Sander vitreus) provides important insights for efficient development and implementation of amplicon panels in non-model organisms.
      ,
      • Qu N.
      • Lin S.
      • Gao Y.
      • Liang H.
      • Zhao H.
      • Ou X.
      A microhap panel for kinship analysis through massively parallel sequencing technology.
      ]. The MY system has two main components: Y-chromosomal markers, which can provide additional investigative information (e.g., Y haplotype-based familial searching and paternal/kinship determination), and MH loci, which can be used to deconvolute DNA mixtures for individual identification. Based on the MY system, we further propose genotype pattern recognition (GPR) for two-person DNA mixtures. A total of 26 genotype combinations were integrated into nine distinct genotype patterns in two-person DNA mixtures, with a deduced application range of 1:10.11–1:2.10. To validate this application range, we simulated ten sets of male-male DNA mixtures with actual mixing ratios (AMRs) ranging from 1:10 to 1:2. Regression relationships between the AMR and the depth of coverage (DoC) ratio (RDoC) were established for different genetic markers and genotype patterns, and the major and minor genotypes could be inferred using GPR and likelihood ratio (LR)-based methods. Hence, GPR could be used to deconvolute two-person DNA mixtures (application range: 1:10–1:2) for individual identification.
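      Why a two-person locus collapses into a small number of patterns can be checked combinatorially. The minimal Python sketch below is our own illustration, not the authors' procedure: it lists every (major, minor) genotype pair at one locus over four placeholder alleles and merges pairs that differ only by a relabelling of alleles, keeping the major and minor roles fixed. This yields nine classes, matching the number of patterns stated above, but we have not verified that these classes coincide with the authors' nine patterns, and the sketch does not reproduce how the 26 genotype combinations are counted.

```python
from itertools import combinations_with_replacement, permutations

ALLELES = "abcd"  # two people carry at most four distinct alleles at one locus


def genotypes():
    """All unordered diploid genotypes over the four placeholder alleles."""
    return list(combinations_with_replacement(ALLELES, 2))


def canonical(major, minor):
    """Smallest (major, minor) pair reachable by relabelling alleles; roles are never swapped."""
    best = None
    for perm in permutations(ALLELES):
        relabel = dict(zip(ALLELES, perm))
        key = (tuple(sorted(relabel[x] for x in major)),
               tuple(sorted(relabel[x] for x in minor)))
        if best is None or key < best:
            best = key
    return best


patterns = sorted({canonical(M, m) for M in genotypes() for m in genotypes()})
for major, minor in patterns:
    print("".join(major), "+", "".join(minor))
print(len(patterns))  # 9
```

      The classes split as four with a homozygous major contributor (minor aa, ab, bb, bc) and five with a heterozygous one (minor aa, ab, ac, cc, cd), which is the kind of allele-sharing structure that determines how many alleles appear and how their depths relate to the mixing ratio.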

      2. Material and methods

      2.1 MY system

      2.1.1 Y-SNPs

      According to the International Society of Genetic Genealogy (ISOGG), the 114 selected Y-SNPs of the MY system mainly cover twelve dominant and representative Y haplogroups (C, D, E, F, G, I, J, K, Q, R, N, and O). The C, D, N and O haplogroups account for more than 93% of the Y-chromosomal genetic make-up in East Asian (EAS) populations [
      • Jin L.
      • Su B.
      Natives or immigrants: modern human origin in east Asia.
      ,
      • Shi H.
      • Zhong H.
      • Peng Y.
      • et al.
      Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations.
      ,
      • Su B.
      • Xiao J.
      • Underhill P.
      • et al.
      Y-Chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age.
      ]; therefore, we selected 99 common and representative Y-SNPs to refine these four major haplogroups, especially haplogroup O (71 Y-SNPs are used to refine its downstream subhaplogroups). Detailed Y-SNP information is provided in Supplementary Table S1 and Fig. 1.
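      Resolving a sample with a hierarchical Y-SNP set amounts to finding the deepest node of the haplogroup tree that is supported by derived calls. The minimal Python sketch below works on a toy tree: only the top-level letters come from the text, while the subclade names and all defining-marker names are hypothetical placeholders rather than the MY panel's Y-SNPs or ISOGG nomenclature. It assumes a strict rule that every ancestral defining SNP must also be derived; practical tools may handle missing or back-mutated calls more leniently.

```python
# Toy tree for illustration only: subclade names and defining-marker names are
# hypothetical placeholders, not ISOGG nomenclature or the MY panel's Y-SNPs.
PARENT = {"C": None, "D": None, "N": None, "O": None,
          "O-x": "O", "O-x1": "O-x", "O-y": "O"}
DEFINING_SNP = {node: f"snp_{node}" for node in PARENT}


def lineage(node):
    """Nodes from `node` up to its top-level haplogroup."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def assign_haplogroup(derived_snps):
    """Deepest node whose defining SNP and all ancestral defining SNPs are derived (strict rule)."""
    best, best_depth = None, 0
    for node in PARENT:
        path = lineage(node)
        if all(DEFINING_SNP[n] in derived_snps for n in path) and len(path) > best_depth:
            best, best_depth = node, len(path)
    return best


print(assign_haplogroup({"snp_O", "snp_O-x", "snp_O-x1"}))  # O-x1
```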
      Fig. 1. Detailed information on the 181 genetic markers (114 Y-SNPs, 45 Y-STRs, and 22 MHs) in the MY system. *, global average Ae of 2504 individuals from 26 global populations; Ae, effective number of alleles; MH, microhaplotype. Detailed information on the MY genetic markers is provided in .

      2.1.2 Y-STRs

      Here, the Y-STR amplicon length, mutation rate (μ) [
      • Fan H.
      • Zeng Y.
      • Wu W.
      • et al.
      The Y-STR landscape of coastal southeastern Han: Forensic characteristics, haplotype analyses, mutation rates, and population genetics.
      ,
      • Ballantyne K.N.
      • Goedbloed M.
      • Fang R.
      • et al.
      Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications.
      ], and stutter ratio (N-1 stutter < 0.09; see Section 2.6.2 for details) were considered during panel design and optimization. In total, 45 single-copy Y-STRs, consisting of 23 slowly mutating (SM) Y-STRs (μ < 1 × 10⁻³), 19 moderately mutating (MM) Y-STRs (μ, 1 × 10⁻³ to 5 × 10⁻³), 2 fast-mutating (FM) Y-STRs (μ, 5 × 10⁻³ to 1 × 10⁻²), and one RM Y-STR (μ > 1 × 10⁻²), were included in the MY system: DYS388, DYS391, DYS392, DYS393, DYS434, DYS435, DYS439, DYS450, DYS453, DYS454, DYS455, DYS460, DYS462, DYS472, DYS476, DYS485, DYS492, DYS502, DYS508, DYS511, DYS512, DYS513, DYS530, DYS531, DYS533, DYS538, DYS541, DYS549, DYS556, DYS565, DYS568, DYS570, DYS571, DYS572, DYS573, DYS576, DYS578, DYS585, DYS588, DYS590, DYS613, DYS616, DYS638, DYS640, and DYS641. The detailed locations, estimated mutation rates, and repeat motifs of the Y-STRs are provided in Supplementary Table S2 and Fig. 1.
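      The two Y-STR selection criteria above can be expressed as simple checks. The minimal Python sketch below assumes that the N-1 stutter ratio is the read count at the allele one repeat shorter divided by the read count of the parent allele (a common definition; the authors' own is in Section 2.6.2) and that exact boundary values fall into the lower mutation-rate class, which the text leaves open. The read counts in the usage line are hypothetical.

```python
# Mutation-rate classes as listed in the text; handling of exact boundary values is our assumption.
def mutation_class(mu):
    if mu < 1e-3:
        return "SM"   # slowly mutating
    if mu <= 5e-3:
        return "MM"   # moderately mutating
    if mu <= 1e-2:
        return "FM"   # fast-mutating
    return "RM"       # rapidly mutating


def n_minus_1_stutter_ratio(stutter_reads, parent_reads):
    """Reads at the allele one repeat shorter divided by reads of the parent allele."""
    return stutter_reads / parent_reads


def passes_stutter_filter(stutter_reads, parent_reads, threshold=0.09):
    """True if the N-1 stutter ratio is below the 0.09 threshold used for the MY Y-STRs."""
    return n_minus_1_stutter_ratio(stutter_reads, parent_reads) < threshold


print(mutation_class(2e-3), passes_stutter_filter(80, 1000))  # MM True
```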

      2.1.3 MHs

      Initially, 456 target fragments (Ae > 5 in EASs) were selected from a genome-wide pool of 10,966 fragments (nucleotide diversity (π) > 0.01; 30× whole-genome data, GRCh38 [
      • Byrska-Bishop M.
      • Evani U.S.
      • Zhao X.
      • et al.
      High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. bioRxiv 2021.02.06.430068 (2021). doi: 10.1101/2021.02.06.430068.
      ]). From these 456 target fragments, we then compiled a candidate set of 135 MHs (Supplementary Table S3) with pairwise distances (d) > 10 Mb that were suitable for 150-bp paired-end sequencing. We further applied relatively stringent selection criteria: an allele coverage ratio (ACR) > 0.91 and an informativeness for ancestry inference (In) > 0.185. Finally, after more than ten rounds of optimization (adjusting the panel design, amplification primers, multiplex PCR amplification system, and quality control), 22 MHs with a global average Ae of 8.32 (Fig. 1) showed balanced amplification and were selected for the MY system. The SNPs composing each MH are listed in Supplementary Table S4.
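      The MH screening steps can be condensed into a filter-then-space selection. The minimal Python sketch below uses hypothetical candidate records: it applies the stated thresholds (Ae > 5 in EASs, ACR > 0.91, In > 0.185) and then greedily keeps loci in descending Ae order while enforcing d > 10 Mb between loci on the same chromosome. The greedy Ae-first ordering, the single representative coordinate per locus, and collapsing the reported stages into one pass are our assumptions; suitability for 150-bp paired-end sequencing and the wet-lab optimization rounds are not modelled.

```python
from dataclasses import dataclass

MIN_DISTANCE_BP = 10_000_000  # pairwise distance d > 10 Mb


@dataclass
class CandidateMH:
    name: str
    chrom: str
    pos: int             # representative GRCh38 coordinate in bp (hypothetical)
    ae_eas: float        # effective number of alleles in East Asians
    acr: float           # allele coverage ratio
    in_ancestry: float   # informativeness for ancestry inference (In)


def passes_thresholds(c):
    return c.ae_eas > 5 and c.acr > 0.91 and c.in_ancestry > 0.185


def far_enough(c, chosen):
    """True if c lies on another chromosome than, or > 10 Mb from, every chosen locus."""
    return all(o.chrom != c.chrom or abs(o.pos - c.pos) > MIN_DISTANCE_BP for o in chosen)


def select_mhs(candidates):
    """Greedy pass: most informative loci first, then enforce spacing."""
    chosen = []
    for c in sorted(filter(passes_thresholds, candidates), key=lambda c: c.ae_eas, reverse=True):
        if far_enough(c, chosen):
            chosen.append(c)
    return chosen


cands = [
    CandidateMH("mh_A", "chr1", 1_000_000, 7.0, 0.95, 0.20),
    CandidateMH("mh_B", "chr1", 5_000_000, 6.0, 0.95, 0.20),  # within 10 Mb of mh_A
    CandidateMH("mh_C", "chr2", 1_000_000, 5.5, 0.93, 0.30),
    CandidateMH("mh_D", "chr3", 1_000_000, 4.0, 0.99, 0.30),  # fails Ae > 5
    CandidateMH("mh_E", "chr4", 1_000_000, 9.0, 0.80, 0.30),  # fails ACR > 0.91
]
print([c.name for c in select_mhs(cands)])  # ['mh_A', 'mh_C']
```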
      The thresholds (N-1 stutter < 0.09 and ACR > 0.91) were chosen by weighing several considerations (see 2.6 for additional details): (i) in DNA mixtures, the Y-STR analysis threshold must be higher than the N-1 stutter ratio to suppress noise, i.e., the N-1 stutter ratio must be lower than the mixing ratio; (ii) the ACR threshold of the MHs must be balanced against the application range of GPR (e.g., ACR > 0.91 gives an application range of 1:10.11–1:2.10, whereas ACR > 0.90 gives 1:9.00–1:2.11); (iii) enough MHs must be retained for the system to be sufficiently effective for individual identification; and (iv) the application ranges of the Y-chromosomal markers and the MHs must be compatible. With ACR > 0.91, the deduced application range for GPR in two-person DNA mixtures is 1:10.11–1:2.10 (Supplementary Figs. S1–S3). To take full advantage of the Y-chromosomal information in two-person DNA mixtures with male contributors, the N-1 stutter ratio of each Y-STR should be below 0.09 (Supplementary Fig. S4).
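      The two application ranges quoted above happen to be numerically consistent with two simple relations in the ACR threshold t: a most unbalanced usable ratio of 1:t/(1 − t) and a most balanced usable ratio of 1:(1 + 1/t) (likewise, 1:1/t gives the 1:1.10 upper bound stated below for pattern MHallele=4). These relations are our own reading of the numbers, not the authors' derivation, which is given in Supplementary Figs. S1–S3; the sketch below, with the hypothetical function name gpr_ratio_bounds, only checks that they reproduce the stated values.

```python
def gpr_ratio_bounds(acr_threshold: float) -> tuple[float, float]:
    """Return (k_unbalanced, k_balanced) for a GPR range of 1:k_unbalanced to 1:k_balanced.

    Hypothetical relations chosen because they reproduce the ranges quoted in
    the text; they are not the authors' derivation (Supplementary Figs. S1-S3).
    """
    t = acr_threshold
    if not 0.0 < t < 1.0:
        raise ValueError("the ACR threshold must lie strictly between 0 and 1")
    k_unbalanced = t / (1.0 - t)  # most unbalanced usable ratio, 1:k_unbalanced
    k_balanced = 1.0 + 1.0 / t    # most balanced usable ratio, 1:k_balanced
    return k_unbalanced, k_balanced


for t in (0.91, 0.90):
    lo, hi = gpr_ratio_bounds(t)
    print(f"ACR > {t:.2f}: 1:{lo:.2f} to 1:{hi:.2f}")
```

      Running the loop prints 1:10.11 to 1:2.10 for ACR > 0.91 and 1:9.00 to 1:2.11 for ACR > 0.90, matching the ranges in the text.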

      2.2 Sample collection and mixture simulation

      2.2.1 Human DNA samples

      One hundred unrelated healthy Han Chinese volunteers (80 males and 20 females) were recruited from Guangdong Province of China (GDH), and their peripheral blood samples were collected in EDTA anticoagulant tubes (2 ml). Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. The quantity of the DNA template was determined using a Qubit® 4.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) with a Qubit® dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions.
      All peripheral blood samples were collected with written consent from the donors, who gave their permission for the analysis of their DNA and the dissemination of their results via a scientific publication. This study was approved by the Biomedical Ethical Committee of Southern Medical University (No. 2021–007) and was conducted in accordance with the standards of the Declaration of Helsinki.

      2.2.2 Artificial DNA mixtures

      • (1)
        DNA quantification
        For each selected male DNA sample, two skilled laboratory assistants quantified the DNA concentration 15 times. The maximum and minimum readings were then discarded, and the mean of the remaining readings was taken as the final quantity.
      • (2)
        Two-person DNA mixtures
        To validate the deduced application range for GPR (1:10.11–1:2.10, details in 2.6.1), we randomly selected ten different male DNA samples for artificial DNA mixture simulations and prepared 10 sets of male-male DNA mixtures at the same nine ratios: 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, and 1:10 (detailed simulations in Supplementary Table S5).
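      The quantification rule in (1) is a trimmed mean, and preparing a mixture at a nominal ratio of 1:k is simple proportion arithmetic. The sketch below illustrates both under stated assumptions: the readings are hypothetical, the trimmed mean is used as the stock concentration, and each mixture is designed to contain 1 ng of DNA in total (the template amount given in 2.3). The function names are hypothetical; the actual simulation scheme is in Supplementary Table S5.

```python
from statistics import mean


def trimmed_concentration(readings: list[float]) -> float:
    """Mean of repeated readings after dropping one maximum and one minimum value."""
    if len(readings) < 3:
        raise ValueError("need at least three readings")
    ordered = sorted(readings)
    return mean(ordered[1:-1])


def mixture_volumes(c_minor: float, c_major: float, k: float,
                    total_ng: float = 1.0) -> tuple[float, float]:
    """Volumes (uL) of minor and major stocks (ng/uL) giving a 1:k mixture of total_ng DNA."""
    ng_minor = total_ng / (1.0 + k)  # minor contributor's share of the template
    ng_major = total_ng - ng_minor   # major share, k times the minor share
    return ng_minor / c_minor, ng_major / c_major


# Hypothetical readings (ng/uL); real data would contain 15 readings per sample.
c = trimmed_concentration([0.50, 0.52, 0.48, 0.51, 0.90])
v_minor, v_major = mixture_volumes(c, c, k=4)
print(round(c, 4), round(v_minor, 4), round(v_major, 4))
```

      Trimming one value at each end makes the final quantity robust to a single aberrant reading, such as the 0.90 above.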

      2.3 Library preparation and sequencing

      We used 1 ng of DNA or mixed DNA sample as the template, with the following thermal cycling conditions to amplify targeted loci: 98 °C for 3 min; followed by 18 cycles of 98 °C for 30 s and 60 °C for 4 min; a final extension at 72 °C for 2 min; and an infinite hold at 10 °C. PCR products were purified using Agencourt® AMPure® XP Beads (Beckman Coulter, Brea, CA, USA). Each targeted amplicon was barcoded and amplified using the following PCR cycling conditions: an initial incubation at 98 °C for 1 min; followed by 6 cycles of 98 °C for 20 s, 60 °C for 20 s, and 72 °C for 30 s; a final extension at 72 °C for 2 min; and an infinite hold at 10 °C. Libraries were purified using AMPure® XP Beads, quantified with a Qubit® dsDNA HS Assay Kit according to the manufacturer's instructions, and normalized so that equal volumes could be pooled before sequencing. Sequencing was performed on the Illumina® NovaSeq™ 6000 System (Illumina, San Diego, CA, USA) with a 2 × 150-bp strategy according to the manufacturer's recommendation.

      2.4 MPS data processing

      Illumina bcl2fastq v.2.17 software (Illumina, San Diego, CA, USA) was used to obtain demultiplexed FASTQ data, trim adaptors, and calculate sample coverages and “%Q30 Bases” (the percentage of bases with a Q-score ≥ 30) to evaluate sample quality. The FASTQ data were further filtered by Trimmomatic v.0.4 [
      Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data.
      ] to remove low-quality (averaged Q-score < 20) and short (< 100 bp) reads. In addition, PANDAseq [
      Masella A.P., Bartram A.K., Truszkowski J.M., Brown D.G., Neufeld J.D. PANDAseq: paired-end assembler for Illumina sequences.
      ] was used to assemble the overlapping paired-end reads into consensus sequences; the tool scales to billions of paired-end reads.
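      The two read-level criteria above (average Q-score ≥ 20 and length ≥ 100 bp) correspond roughly to Trimmomatic's AVGQUAL and MINLEN steps, although the exact command used is not stated. As a minimal illustration of what the filter does, the hypothetical function below applies both criteria to a single read, assuming Phred+33 quality encoding; it is not a replacement for Trimmomatic.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence: str, quality: str,
              min_mean_q: float = 20.0, min_len: int = 100) -> bool:
    """Keep a read only if it is at least min_len bases long with mean Q >= min_mean_q."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) < min_len:
        return False                            # too short
    return mean_phred(quality) >= min_mean_q    # low average quality is discarded


print(keep_read("A" * 150, "I" * 150))  # Q40 throughout -> True
print(keep_read("A" * 150, "4" * 150))  # Q19 throughout -> False
print(keep_read("A" * 99, "I" * 99))    # shorter than 100 bp -> False
```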
      We detected the SNPs and insertions/deletions (InDels) of the Y chromosome by mapping the reads to the human reference genome GRCh38 with Burrows–Wheeler Aligner (BWA) [
      Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform.
      ] and calling variants with the Genome Analysis Toolkit (GATK) best practice pipelines [
      McKenna A., Hanna M., Banks E., et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data;
      Van der Auwera G.A., Carneiro M.O., Hartl C., et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline.
      ]. In addition, the Y-haplogroup of each sample was determined with an in-house Python script following the ISOGG (International Society of Genetic Genealogy) Y-DNA haplogroup tree.
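      The in-house script is not described further. A common way to assign a haplogroup is to report the deepest branch of the tree whose defining SNP, and those of all its ancestors, are in the derived state. The sketch below shows that idea on a tiny, entirely hypothetical tree (node and SNP names are made up, not ISOGG data); a real implementation would also have to handle missing calls, back mutations and thousands of nodes.

```python
# Hypothetical toy tree: node -> (parent, defining SNP). Not ISOGG data.
TOY_TREE = {
    "R": (None, None),
    "X": ("R", "snpX"),
    "X1": ("X", "snpX1"),
    "X1a": ("X1", "snpX1a"),
    "X2": ("X", "snpX2"),
}


def assign_haplogroup(derived: set[str], tree: dict = TOY_TREE) -> str:
    """Deepest node whose defining SNP and all ancestral defining SNPs are derived."""
    best, best_depth = "R", 0
    for node in tree:
        depth, cur, consistent = 0, node, True
        while cur is not None:            # walk from the node up to the root
            parent, snp = tree[cur]
            if snp is not None:
                if snp not in derived:
                    consistent = False    # an ancestor (or the node) is ancestral
                    break
                depth += 1
            cur = parent
        if consistent and depth > best_depth:
            best, best_depth = node, depth
    return best


print(assign_haplogroup({"snpX", "snpX1", "snpX1a"}))  # X1a
print(assign_haplogroup({"snpX", "snpX2"}))            # X2
```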
      STR calling was conducted using STRait Razor 3.0 [
      Woerner A.E., King J.L., Budowle B. Fast STR allele identification with STRait Razor 3.0.
      ] with adjusted configuration files. Quality control and interpretation of the genotyping results followed the addendum to the “SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories”. Floating and fixed analytical thresholds (AT) and interpretation thresholds (IT) were used in combination: when the depth exceeded 650 reads, AT and IT were set to 1.5% and 4.5% of the depth, respectively; when the depth was below 650 reads, fixed thresholds of 10 reads (AT) and 30 reads (IT) were applied.
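      At 650 reads the floating thresholds equal about 9.75 and 29.25 reads, so the switch to the fixed 10- and 30-read thresholds is nearly continuous. The hypothetical function below sketches the combined rule for a single allele, under two assumptions not stated in the text: "depth" means the total depth of the locus, and a depth of exactly 650 reads uses the fixed thresholds.

```python
def classify_allele(allele_reads: int, locus_depth: int) -> str:
    """Label an STR allele as 'called', 'below IT' or 'below AT' (assumed semantics)."""
    if locus_depth > 650:
        at, it = 0.015 * locus_depth, 0.045 * locus_depth  # floating thresholds
    else:
        at, it = 10, 30                                     # fixed thresholds (reads)
    if allele_reads < at:
        return "below AT"   # treated as noise and not reported
    if allele_reads < it:
        return "below IT"   # detected but not used for interpretation
    return "called"


print(classify_allele(100, 1000))  # AT = 15, IT = 45 -> called
print(classify_allele(29, 500))    # fixed IT of 30 reads -> below IT
```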
      To call phased MHs, we constructed a partial reference genome comprising the amplicon sequence of each MH. The raw reads in FASTQ format were aligned to this reference using BWA, and the alignments were further processed with SAMtools [
      Li H., Handsaker B., Wysoker A., et al. The sequence alignment/map format and SAMtools.
      ]. Then, we applied an R package (microhaplot, https://github.com/ngthomas/microhaplot) to obtain the allele strings and depth for each MH. In addition, the MPS data of 22 MH loci were also validated in Integrative Genomics Viewer (IGV) v2.9.2 [
      • Thorvaldsdottir H.
      • Robinson J.T.
      • Mesirov J.P.
      Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.
      ] to check for MH calling concordance. We randomly selected two samples and prepared both of them in triplicate to assess the “within-run” and “between-run” concordance (two in the same run, one in another run) [
      Gandotra N., Speed W.C., Qin W., et al. Validation of novel forensic DNA markers using multiplex microhaplotype sequencing.
      ].
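      Phasing works because a single read spanning every composite SNP of an MH reports the bases on one chromosome, so the concatenated bases form one allele string. microhaplot itself works from the SAM alignments; the function below is only a simplified, hypothetical illustration that assumes ungapped reads given as (start, sequence) on the amplicon and ignores CIGAR strings and base qualities.

```python
from collections import Counter


def microhaplotype_counts(reads, snp_positions):
    """Count phased MH allele strings from ungapped reads aligned to an amplicon.

    reads: iterable of (start, sequence) with a 0-based start on the amplicon.
    snp_positions: 0-based amplicon coordinates of the composite SNPs.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if any(not (start <= p < end) for p in snp_positions):
            continue                  # read does not span every SNP, so it cannot be phased
        allele = "".join(seq[p - start] for p in snp_positions)
        if "N" in allele:
            continue                  # ambiguous base at a SNP position
        counts[allele] += 1
    return counts


reads = [(0, "ACGTACGTAC"), (0, "ACCTACGTAC"), (2, "GTACGTAC"), (5, "CGTAC")]
print(microhaplotype_counts(reads, [2, 6]))
```

      The allele depths obtained this way are the DoC values that feed the ACR and RDoC calculations.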

      2.5 Forensic parameters

      The mean values and standard deviations (SDs) of the DoC, N-1 stutter ratio for Y-STRs, and ACR values for MHs were calculated by SAS® 9.4 software (SAS Institute Inc., Cary, NC, USA). The other relevant forensic parameters were determined as described in our previous studies [
      Fan H., Wang X., Chen H., et al. The evaluation of forensic characteristics and the phylogenetic analysis of the Ong Be language-speaking population based on Y-STR;
      Ding J., Fan H., Zhou Y., et al. Genetic polymorphisms and phylogenetic analyses of the Ü-Tsang Tibetan from Lhasa based on 30 slowly and moderately mutated Y-STR loci;
      Fan H., Wang X., Chen H., Li W., Wang W., Deng J. The Ong Be language-speaking population in Hainan Island: genetic diversity, phylogenetic characteristics and reflections on ethnicity;
      Li W., Wang X., Wang X., et al. Forensic characteristics and phylogenetic analyses of one branch of Tai-Kadai language-speaking Hainan Hlai (Ha Hlai) via 23 autosomal STRs included in the Huaxia™ Platinum System;
      Fan H., Wang X., Chen H., et al. Population analysis of 27 Y-chromosomal STRs in the Li ethnic minority from Hainan province, southernmost China;
      Fan H., Wang X., Ren Z., et al. Population data of 19 autosomal STR loci in the Li population from Hainan Province in southernmost China;
      Fan H., Xie Q., Zhang Z., Wang J., Chen X., Qiu P. Chronological age prediction: developmental evaluation of DNA methylation-based machine learning models;
      Wang F., Du Z., Han B., et al. Genetic diversity, forensic characteristics and phylogenetic analysis of the Qiongzhong aborigines residing in the tropical rainforests of Hainan Island via 19 autosomal STRs;
      Fan H., Xie Q., Li Y., Wang L., Wen S.Q., Qiu P. Insights into forensic features and genetic structures of Guangdong Maoming Han based on 27 Y-STRs;
      Fan H., Zhang X., Wang X., et al. Genetic analysis of 27 Y-STR loci in Han population from Hainan province, southernmost China;
      Luo C., Duan L., Li Y., et al. Insights from Y-STRs: forensic characteristics, genetic affinities, and linguistic classifications of Guangdong Hakka and She groups;
      Fan H., He Y., Li S., et al. Systematic evaluation of a novel 6-dye direct and multiplex PCR-CE-based indel typing system for forensic purposes.
      ]. Ae was calculated using the formula proposed by Kidd et al. [
      Kidd K.K., Speed W.C. Criteria for selecting microhaplotypes: mixture detection and deconvolution.
      ], $A_e = 1 / \sum_i P_i^2$, where $P_i$ is the frequency of allele $i$ and the summation runs over all alleles at the locus. The Spearman correlation between the composite SNP number and Ae was computed in the R statistical environment. The In value was calculated using the program INFOCALC [
      Rosenberg N.A., Li L.M., Ward R., Pritchard J.K. Informativeness of genetic markers for inference of ancestry.
      ] for the informativeness of each MH across the 1000 Genomes Project (1KG) populations [
      1000 Genomes Project Consortium, Auton A., Brooks L.D., et al. A global reference for human genetic variation;
      Sudmant P.H., Rausch T., Gardner E.J., et al. An integrated map of structural variation in 2,504 human genomes.
      ] and GDH population.
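      The "previous studies" are not reproduced here, so the formulas below are an assumption: they are the standard definitions of Ae, unbiased expected heterozygosity, typical paternity index (TPI) and power of exclusion (PE, Brenner and Morris form). They do appear consistent with Table 1; for example, Ho = 0.90 gives TPI = 5.0000 and PE = 0.7954, and for MH02FHL-006 the value 1 − 1/14.72 = 0.9321 multiplied by 200/199 gives He ≈ 0.9367. Function names are hypothetical.

```python
def effective_alleles(freqs: list[float]) -> float:
    """Ae = 1 / sum(p_i^2), summed over all alleles at the locus."""
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs: list[float], n_individuals: int) -> float:
    """Nei's unbiased He for n diploid individuals (2n sampled alleles)."""
    two_n = 2 * n_individuals
    return two_n / (two_n - 1) * (1.0 - sum(p * p for p in freqs))


def typical_paternity_index(ho: float) -> float:
    """TPI = 1 / (2 * (1 - Ho))."""
    return 1.0 / (2.0 * (1.0 - ho))


def power_of_exclusion(ho: float) -> float:
    """PE = h^2 * (1 - 2*h*H^2), with h = Ho and H = 1 - Ho."""
    hom = 1.0 - ho
    return ho ** 2 * (1.0 - 2.0 * ho * hom ** 2)


print(round(typical_paternity_index(0.90), 4), round(power_of_exclusion(0.90), 4))
```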

      2.6 Genotype pattern recognition in two-person DNA mixtures

      Pattern recognition involves the development of systems that learn to solve a given problem (including clustering, classification and dimensionality reduction) using a set of example instances, each represented by a number of features [
      de Ridder D., de Ridder J., Reinders M.J. Pattern recognition in bioinformatics.
      ]. Here, drawing on the concept of pattern recognition, we used theoretical derivation to semi-automatically classify the genotype combinations of two-person DNA mixtures into genotype patterns according to the expected distribution of sequencing depth among alleles at given mixing ratios. Based on these fixed genotype patterns, we propose a regression-based GPR approach for two-person DNA mixtures that determines the genotype pattern of each MH locus.

      2.6.1 MHs: From genotype combination to genotype pattern

      There are three different types of two-person DNA mixtures (female-female, female-male, and male-male DNA mixtures). In theory, for any type of two-person DNA mixture, each MH locus could have four alleles (MHallele=4), three alleles (MHallele=3), two alleles (MHallele=2), or one allele (MHallele=1). We used 9 different genotype patterns to represent the 26 different genotype combinations of any two-person DNA mixture, and the genotype combination of each genotype pattern was unique within a certain application range.
      • (1)
        MHallele=4: When four alleles are observed at a MH locus, six genotype combinations are possible for the two contributors, namely, (AB,CD), (AC,BD), (AD,BC), (BC,AD), (BD,AC), and (CD,AB), where A, B, C and D are the four observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into the pattern MHallele=4 (major contributor’s genotype + minor contributor’s genotype, DoCA > DoCB > DoCC > DoCD). Supplementary Fig. S1 presents the detailed derivation process of the AMR range when the ACR is more than 0.91. The deduced application range is (0, 1:1.10) for the pattern MHallele=4.
        Fig. 2 Integrating different genotype combinations of the microhaplotype loci into the same genotype pattern and the deduced application range for each genotype pattern. (The first genotype combination as a representative pattern and detailed processes of deducing the application ranges for seven different genotype patterns are shown in .).
      • (2)
        MHallele=3: When three alleles are observed at a MH locus, twelve genotype combinations are possible for the two contributors, namely, (AB,AC), (BC,AC), (AB,BC), (AC,AB), (AC,BC), (BC,AB), (AB,CC), (AC,BB), (BC,AA), (AA,BC), (BB,AC), and (CC,AB), where A, B and C are the three observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into three different genotype patterns (α, β, and γ, DoCA > DoCB > DoCC). The deduced application ranges of patterns α, β, and γ are (1:10.11, 1:1.10), (0, 1:2.10), and (0, 1:11], respectively (details in Supplementary Fig. S2).
      • (3)
        MHallele=2: When two alleles are observed at a MH locus, seven genotype combinations are possible for the two contributors, namely, (AB,AA), (AB,BB), (AA,AB), (BB,AB), (AA,BB), (BB,AA), and (AB,AB), where A and B are the two observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into four different genotype patterns (δ, ε, ζ, and η, DoCA > DoCB). The deduced application ranges of patterns δ, ε, ζ, and η are (1:21.23, 1:1], (0, 1:1], (0, 1:1), and (0, 1:1], respectively (details in Supplementary Fig. S3).
      • (4)
        MHallele=1: When one allele is observed at a MH locus, only one genotype combination is possible for the two contributors, (AA,AA), where A is the only observed allele (Fig. 2). Therefore, the genotype pattern is MHallele=1. The deduced application range of pattern MHallele=1 is (0, 1:1].
      For all types of two-person DNA mixtures, MHallele=1 and MHallele=4 each correspond to a single genotype pattern and therefore do not need to be recognized, whereas the three patterns of MHallele=3 and the four patterns of MHallele=2 do. Once the genotype pattern is recognized, the genotypes of the two contributors are unambiguously determined. When the ACR of each MH is more than 0.91, the deduced application range for GPR is (1:10.11, 1:2.10).
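      The counts of 6, 12, 7 and 1 genotype combinations (26 in total) can be checked by brute force: list every diploid genotype over the observed alleles, form all ordered (major, minor) pairs, and keep the pairs whose combined alleles are exactly the observed set. The function name below is hypothetical.

```python
from itertools import combinations_with_replacement, product


def genotype_combinations(alleles: str) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """Ordered (major, minor) genotype pairs whose union is exactly the observed alleles."""
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    target = set(alleles)
    return [(g1, g2) for g1, g2 in product(genotypes, repeat=2)
            if set(g1) | set(g2) == target]


counts = {k: len(genotype_combinations("ABCD"[:k])) for k in (1, 2, 3, 4)}
print(counts, sum(counts.values()))  # {1: 1, 2: 7, 3: 12, 4: 6} 26
```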

      2.6.2 Statistical analysis

      We also defined the following terminologies:
      • (1)
        AMR: Actual mixing ratio, which is calculated according to two contributors’ DNA concentrations and volumes (Minor, minor contributor; Major, major contributor; C, concentration; V, volume):
        $$\mathrm{AMR} = \frac{C_{\mathrm{Minor}}\,V_{\mathrm{Minor}}}{C_{\mathrm{Major}}\,V_{\mathrm{Major}}}$$


      • (2)
        RDoC: DoC ratio, which is determined according to the following self-defined formulas:
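        $$
        R_{\mathrm{DoC}} =
        \begin{cases}
        \dfrac{\mathrm{DoC}_B}{\mathrm{DoC}_A} & (\text{Y-SNP/STR and } \mathrm{MH}_{\mathrm{allele}=2})\\[6pt]
        \dfrac{\mathrm{DoC}_B + \mathrm{DoC}_C}{\mathrm{DoC}_A} & (\mathrm{MH}_{\mathrm{allele}=3})\\[6pt]
        \dfrac{\mathrm{DoC}_C + \mathrm{DoC}_D}{\mathrm{DoC}_A + \mathrm{DoC}_B} & (\mathrm{MH}_{\mathrm{allele}=4})
        \end{cases}
        $$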


        In a male-male DNA mixture, we obtain the Y-DNA profiles of both the minor and major contributors, and the mixing ratio can be calculated from the ratio of the minor contributor's DoC to the major contributor's DoC. If two different Y-DNA haplogroups (A and B) are detected, or two alleles (also labeled A and B for consistency of the formulas) are detected at the same single-copy Y-STR locus, the sample can be confirmed as a male-male DNA mixture. A and B denote the major and minor Y-haplogroups or Y-STR alleles, and DoCA and DoCB denote their respective DoC (DoCA > DoCB).
      • (3)
        E(AMR): Expected AMR, which is denoted by the 95% confidence interval (CI) of AMR.
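      The AMR and RDoC definitions above translate directly into code. In the sketch below (hypothetical function names and example depths), the observed allele depths are sorted in descending order so that A is always the deepest allele, which enforces DoCA > DoCB > DoCC > DoCD before the appropriate case of the RDoC formula is applied.

```python
def amr(c_minor: float, v_minor: float, c_major: float, v_major: float) -> float:
    """Actual mixing ratio: minor DNA amount divided by major DNA amount."""
    return (c_minor * v_minor) / (c_major * v_major)


def rdoc(depths: list[float]) -> float:
    """DoC ratio for one locus, given the depths of its 2-4 observed alleles."""
    d = sorted(depths, reverse=True)       # d[0] = DoC_A, d[1] = DoC_B, ...
    if len(d) == 2:                        # Y-SNP/STR or MHallele=2
        return d[1] / d[0]
    if len(d) == 3:                        # MHallele=3
        return (d[1] + d[2]) / d[0]
    if len(d) == 4:                        # MHallele=4
        return (d[2] + d[3]) / (d[0] + d[1])
    raise ValueError("RDoC is defined only for two to four observed alleles")


# Hypothetical depths for an MHallele=4 locus in a roughly 1:4 mixture.
print(round(rdoc([400, 390, 95, 105]), 4), amr(0.5, 0.4, 0.5, 1.6))
```

      For MHallele=4, an ideal 1:4 mixture gives two minor alleles of depth m and two major alleles of depth 4m, so RDoC = 2m/8m = 0.25, the same value as AMR.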
      The regression analyses (linear, quadratic, and cubic models) and Spearman correlations between AMR and RDoC were conducted in IBM® SPSS® Statistics 26 (IBM Corporation, Armonk, NY, USA). In addition, normality tests and one-way analysis of variance (ANOVA) (with post hoc multiple comparisons) were also conducted in IBM® SPSS® Statistics 26.
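      The fits were done in SPSS, and the procedure used to obtain E(AMR) is not spelled out here. As a minimal sketch, assuming a linear calibration RDoC = a + b·AMR like the one used for Y-SNPs, Y-STRs and MHallele=4, the code below fits the line by ordinary least squares and then inverts it for an observed RDoC. The interval uses only the residual SD with a normal quantile, which is a simplification: a full calibration interval would also account for uncertainty in a and b and would be wider. Function names and data are hypothetical.

```python
from statistics import NormalDist, fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual SD)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (n - 2)) ** 0.5


def estimate_amr(rdoc_obs: float, a: float, b: float, s: float, level: float = 0.95):
    """Invert RDoC = a + b*AMR; approximate interval from the residual SD only."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    point = (rdoc_obs - a) / b
    half_width = z * s / abs(b)
    return point, point - half_width, point + half_width


# Hypothetical calibration data: AMR of 1:10, 1:8, 1:6, 1:4 and 1:2 mixtures.
amr_values = [1 / 10, 1 / 8, 1 / 6, 1 / 4, 1 / 2]
rdoc_values = [0.11, 0.13, 0.17, 0.26, 0.49]
a, b, s = fit_line(amr_values, rdoc_values)
print(estimate_amr(0.20, a, b, s))
```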

      3. Results and discussion

      3.1 MY system

      3.1.1 Characterization and MPS performance

      The MY system (Fig. 1) is composed of three different types of genetic markers: 114 Y-SNPs (Supplementary Table S1), 45 Y-STRs (Supplementary Table S2), and 22 MHs (Supplementary Tables S3–4). Initially, we selected a set of 135 highly polymorphic MHs (Ae > 5 and d > 10 Mb) in EASs that were suitable for 150-bp paired-end sequencing (Supplementary Table S3). The number of composite SNPs making up each MH varied from 7 (MH21FHL-002) to 85 (MH09FHL-008), and the Ae values ranged from 5.00 (MH14FHL-004) to 16.61 (MH21FHL-001), with an average Ae of 6.94 in EASs. Eventually, 22 MHs were incorporated into the MY system according to our selection criteria (details in 2.1.3). They were distributed on 15 different autosomes (Fig. 1), and their global average Ae was 8.32 (3.93–13.44). Among these 22 MH markers, we identified 6 novel SNPs (minor allele frequency, MAF ≥ 0.05) in the 100 GDH individuals (Supplementary Table S4); these may provide additional polymorphic, population-specific information for forensic applications, especially regional individual identification and cases involving these variants.
      There was a significant correlation between the composite SNP number and Ae value (r = 0.3275, p = 0.0001) in our 135-MH set. In addition, we collected a total of 777 MHs from global populations [
      Wu R., Li H., Li R., et al. Identification and sequencing of 59 highly polymorphic microhaplotypes for analysis of DNA mixtures;
      Gandotra N., Speed W.C., Qin W., et al. Validation of novel forensic DNA markers using multiplex microhaplotype sequencing;
      Jin X., Zhang X., Shen C., et al. A highly polymorphic panel consisting of microhaplotypes and compound markers with the NGS and its forensic efficiency evaluations in Chinese two groups;
      Bulbul O., Pakstis A.J., Soundararajan U., et al. Ancestry inference of 96 population samples using microhaplotypes;
      Kidd K.K., Speed W.C., Pakstis A.J., et al. Evaluating 130 microhaplotypes across a global set of 83 populations;
      de la Puente M., Phillips C., Xavier C., et al. Building a custom large-scale panel of novel microhaplotypes for forensic identification using MiSeq and Ion S5 massively parallel sequencing systems;
      Jin X.Y., Cui W., Chen C., et al. Developing and population analysis of a new multiplex panel of 18 microhaplotypes and compound markers using next generation sequencing and its application in the Shaanxi Han population;
      Kureshi A., Li J., Wen D., Sun S., Yang Z., Zha L. Construction and forensic application of 20 highly polymorphic microhaplotypes;
      Staadig A., Tillmar A. Evaluation of microhaplotypes in forensic kinship analysis from a Swedish population perspective;
      Sun S., Liu Y., Li J., et al. Development and application of a nonbinary SNP-based microhaplotype panel for paternity testing involving close relatives;
      Ou X., Qu N. Noninvasive prenatal paternity testing by target sequencing microhaps;
      Wang J.Y.T., Whittle M.R., Puga R.D., Yambartsev A., Fujita A., Nakaya H.I. Noninvasive prenatal paternity determination using microhaplotypes: a pilot study;
      Chen P., Deng C., Li Z., et al. A microhaplotypes panel for massively parallel sequencing analysis of DNA mixtures;
      Chen P., Yin C., Li Z., et al. Evaluation of the microhaplotypes panel for DNA mixture analyses;
      Kidd K.K., Pakstis A.J., Speed W.C., Lagace R., Wootton S., Chang J. Selecting microhaplotypes optimized for different purposes;
      Wen D., Sun S., Liu Y., et al. Considering the flanking region variants of nonbinary SNP and phenotype-informative SNP to constitute 30 microhaplotype loci for increasing the discriminative ability of forensic applications.
      ] and found that the correlation was also significant in this global MH dataset (r = 0.8368, p < 2.2 × 10−16; Supplementary Fig. S5).
      The average amplicon sizes of the Y-SNPs, Y-STRs, and MHs were 229 ± 34 nt, 136 ± 5 nt, and 263 ± 15 nt, respectively. The average DoC ± SD values were 2381 ± 1786 × (Y-SNPs, Supplementary Table S6), 1033 ± 1018 × (Y-STRs, Supplementary Table S7), and 1653 ± 1076 × (MHs, Supplementary Table S8). Among the Y-SNPs, the highest DoC was observed at F400 (8438 ± 4031 ×) and the lowest at F138 (706 ± 342 ×). The maximum and minimum DoC values of the Y-STRs were 2790 ± 1922 × (DYS460) and 394 ± 170 × (DYS450), respectively. The mean N-1 stutter ratios of the 45 Y-STRs did not exceed 0.0818 ± 0.0197, the value observed at DYS392 (Supplementary Table S9). The MH locus with the highest DoC was MH07FHL-004 (3568 ± 1471 ×), and that with the lowest DoC was MH17FHL-005 (540 ± 252 ×). In addition, the ACR of the 22 MHs ranged from 0.9193 ± 0.0588 (MH17FHL-005) to 0.9647 ± 0.0266 (MH10FHL-007) (Supplementary Table S10).
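      The stutter and ACR summaries above are means ± SDs of per-sample ratios. The sketch below assumes the usual definitions, which this section does not restate: the N-1 stutter ratio is the depth of the product one repeat shorter divided by the depth of the parent allele, and the ACR of a heterozygous MH is the lower allele depth divided by the higher allele depth. Function names and read counts are hypothetical.

```python
from statistics import mean, stdev


def n_minus_1_stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Reads of the one-repeat-shorter product divided by reads of the parent allele."""
    return stutter_reads / parent_reads


def allele_coverage_ratio(depth_1: int, depth_2: int) -> float:
    """Lower allele depth divided by higher allele depth (assumed ACR definition)."""
    low, high = sorted((depth_1, depth_2))
    return low / high


def mean_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation, reported as mean +/- SD."""
    return mean(values), stdev(values)


# Hypothetical per-sample read counts for one Y-STR and one heterozygous MH.
stutter = [n_minus_1_stutter_ratio(s, p) for s, p in [(80, 1000), (60, 900), (95, 1100)]]
acr = [allele_coverage_ratio(x, y) for x, y in [(950, 1000), (1200, 1100), (700, 760)]]
print(mean_sd(stutter), mean_sd(acr))
```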

      3.1.2 Forensic characteristics of GDH individuals

      • (1)
        Y-SNPs/STRs
        Supplementary Table S11 presents a total of 29 different Y subhaplogroups and the frequency distribution of Y-DNA haplogroups in 80 GDH individuals. We found four major Y-DNA haplogroups, namely, O (92.50%), N (3.75%), C (2.50%), and Q (1.25%). Haplogroups O2a, O1a, and O1b accounted for 60.00%, 18.75%, and 13.75% of the samples, respectively, consistent with the results of [
        Li R., Zhang C., Li H., et al. SNP typing using the HID-Ion AmpliSeq Identity Panel in a southern Chinese population.
        ]. In addition, the three most frequent O subhaplogroups were O1b1a1a1a1-F2924 (11.25%), O1a1a1-F446 (8.75%), and O2a2a-M188 (8.75%).
        The Y-chromosomal haplotype profiles of 80 GDH individuals are shown in Supplementary Table S12. A total of 77 different Y-chromosomal haplotypes were found, of which 74 (96.10%) were unique and 3 occurred twice (H001-H003). We found 4 null alleles at DYS531 (2 nulls), DYS508, and DYS588, which were further confirmed by a CE-based AGCU Y-LM Kit (DYS531 and DYS508) [
        Fan H., Zeng Y., Wu W., et al. The Y-STR landscape of coastal southeastern Han: Forensic characteristics, haplotype analyses, mutation rates, and population genetics.
        ] and self-designed primers for DYS588 according to [
        Hanson E.K., Ballantyne J. A highly discriminating 21 locus Y-STR “megaplex” system designed to augment the minimal haplotype loci for forensic casework.
        ]. No duplicated, triplicated, or intermediate alleles were observed in the GDH individuals. The Y-STR repeat region sequences and allele frequencies (length-based and sequence-based) are detailed in Supplementary Table S13. Among the 3600 Y-STR alleles (80 individuals × 45 loci) generated by the MY system, 168 distinct length variants and 4 repeat sequence subvariants (i.e., isoalleles, defined as alleles with the same length but different sequences) were identified. Sequence variants were detected at 4 of the 45 Y-STRs (8.89%), namely DYS531 (allele 12), DYS485 (allele 15), DYS578 (allele 8), and DYS392 (allele 14), so that the sequence-based genetic diversity (GD) at these loci was higher than the length-based GD (by 0.20%–3.65%, Supplementary Table S14). The highest GD was found at the RM Y-STR DYS576 (0.7937), whereas three Y-STRs (DYS613, DYS472, and DYS502) were monomorphic in the 80 GDH individuals. The overall haplotype diversity (HD) was 0.9991, with a discrimination capacity (DC) of 0.9625 (Supplementary Table S15).
      • (2)
        MHs
      The forensic parameters of the 22 MHs in the 100 GDH individuals are presented in Table 1. Ae ranged from 3.62 (MH13FHL-002) to 14.72 (MH02FHL-006), with an average of 7.17 (± 3.22). As shown in Supplementary Fig. S5, apart from our 135-MH set, the majority (578/642, ≈90%) of MHs in the global MH dataset had Ae values below 5. The In values of all 22 MHs were ≥ 0.1868, indicating that the MY system can help differentiate intercontinental populations. The expected heterozygosity (He) of the 22 MHs ranged from 0.7277 (MH13FHL-002) to 0.9367 (MH02FHL-006), with an average of 0.8373 (± 0.0535). The polymorphism information content (PIC) varied from 0.6854 (MH13FHL-002) to 0.9284 (MH02FHL-006), with an average of 0.8129 (± 0.0626). The average discrimination power (DP) was 0.9419 (± 0.0266), ranging from 0.8792 (MH13FHL-002) to 0.9826 (MH21FHL-002). The power of exclusion (PE) ranged from 0.3834 (MH13FHL-002) to 0.8159 (MH16FHL-004), with an average of 0.6413 (± 0.1251). The MH allele frequencies are listed in Supplementary Table S16. A total of 343 distinct alleles were observed across the 22 MH loci; the number of alleles per locus varied from 6 (MH06FHL-002) to 43 (MH02FHL-006), and allele frequencies ranged from 0.0005 to 0.4250. For the 22 independent MHs of the MY system, the cumulative discrimination power (CDP) was 1 − 5.00 × 10−31, and the combined power of exclusion for duo (CPEduo) and trio (CPEtrio) paternity testing was 1 − 5.00 × 10−8 and 1 − 4.85 × 10−12, respectively. The system efficiency of the 22 MHs was equivalent to that of 26–28 forensic CODIS and non-CODIS STRs in the GDH individuals (Supplementary Table S17), indicating that the MHs of the MY system are as effective as, or more effective than, the STRs commonly used in forensics.
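      The Y-chromosomal HD and DC values reported above for the 80 GDH males can be reproduced from the haplotype counts given in the text (74 unique haplotypes plus 3 haplotypes observed twice), assuming HD is Nei's unbiased diversity n/(n − 1)·(1 − Σp²) and DC is the number of distinct haplotypes divided by the number of samples. The haplotype labels below are placeholders, and the function names are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def discrimination_capacity(haplotypes: list[str]) -> float:
    """Number of distinct haplotypes divided by the number of samples."""
    return len(set(haplotypes)) / len(haplotypes)


# Counts from the text: 74 unique haplotypes plus H001-H003 each seen twice (n = 80).
sample = [f"U{i}" for i in range(74)] + ["H001", "H001", "H002", "H002", "H003", "H003"]
print(round(haplotype_diversity(sample), 4), discrimination_capacity(sample))  # 0.9991 0.9625
```

      The same formula applied to the allele counts of a single Y-STR gives its gene diversity (GD).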
      Table 1Forensic parameters of 22 microhaplotypes in Guangdong Han population (GDH, n = 100).
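      | MH ID | NSNP | Nallele | Ae | In | He | Ho | PIC | MP | DP | PE | TPI | p-HWE |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | MH01FHL-009 | 8 | 10 | 4.87 | 0.2009 | 0.7975 | 0.7300 | 0.7669 | 0.0740 | 0.9260 | 0.4762 | 1.8519 | 0.0890 |
      | MH02FHL-003 | 11 | 10 | 6.49 | 0.1910 | 0.8501 | 0.8500 | 0.8278 | 0.0464 | 0.9536 | 0.6949 | 3.3333 | 0.8750 |
      | MH02FHL-006 | 44 | 43 | 14.72 | 0.5855 | 0.9367 | 0.9000 | 0.9284 | 0.0230 | 0.9770 | 0.7954 | 5.0000 | 0.5610 |
      | MH02FHL-010 | 18 | 13 | 6.40 | 0.4016 | 0.8480 | 0.8700 | 0.8259 | 0.0490 | 0.9510 | 0.7346 | 3.8462 | 0.8840 |
      | MH03FHL-001 | 9 | 10 | 6.34 | 0.1943 | 0.8443 | 0.8500 | 0.8196 | 0.0518 | 0.9482 | 0.6949 | 3.3333 | 0.4870 |
      | MH03FHL-003 | 10 | 12 | 4.96 | 0.1868 | 0.8024 | 0.7500 | 0.7767 | 0.0676 | 0.9324 | 0.5098 | 2.0000 | 0.3160 |
      | MH04FHL-005 | 13 | 8 | 4.86 | 0.2624 | 0.7982 | 0.7300 | 0.7668 | 0.0694 | 0.9306 | 0.4762 | 1.8519 | 0.2520 |
      | MH06FHL-001 | 9 | 8 | 4.34 | 0.1929 | 0.7734 | 0.7600 | 0.7423 | 0.0860 | 0.9140 | 0.5270 | 2.0833 | 0.9120 |
      | MH06FHL-002 | 33 | 6 | 4.42 | 0.3559 | 0.7776 | 0.7100 | 0.7379 | 0.1084 | 0.8916 | 0.4439 | 1.7241 | 0.1480 |
      | MH07FHL-001 | 11 | 7 | 5.64 | 0.1946 | 0.8270 | 0.8000 | 0.7988 | 0.0626 | 0.9374 | 0.5990 | 2.5000 | 0.2570 |
      | MH07FHL-002 | 14 | 9 | 5.37 | 0.2849 | 0.8178 | 0.8400 | 0.7874 | 0.0688 | 0.9312 | 0.6753 | 3.1250 | 0.4460 |
      | MH07FHL-004 | 21 | 27 | 11.09 | 0.4406 | 0.8397 | 0.8300 | 0.8159 | 0.0544 | 0.9456 | 0.6559 | 2.9412 | 0.2510 |
      | MH08FHL-006 | 11 | 13 | 5.15 | 0.2504 | 0.8101 | 0.8600 | 0.7829 | 0.0686 | 0.9314 | 0.7147 | 3.5714 | 0.7870 |
      | MH09FHL-002 | 8 | 8 | 4.84 | 0.2581 | 0.7974 | 0.7900 | 0.7618 | 0.0802 | 0.9198 | 0.5806 | 2.3810 | 0.6220 |
      | MH10FHL-001 | 33 | 24 | 7.87 | 0.5657 | 0.8520 | 0.8300 | 0.8320 | 0.0444 | 0.9556 | 0.6559 | 2.9412 | 0.7090 |
      | MH10FHL-007 | 9 | 11 | 6.33 | 0.2004 | 0.8463 | 0.8500 | 0.8219 | 0.0498 | 0.9502 | 0.6949 | 3.3333 | 0.6670 |
      | MH11FHL-007 | 24 | 19 | 6.28 | 0.4684 | 0.8450 | 0.8400 | 0.8229 | 0.0536 | 0.9464 | 0.6753 | 3.1250 | 0.3790 |
      | MH13FHL-002 | 8 | 9 | 3.62 | 0.2363 | 0.7277 | 0.6700 | 0.6854 | 0.1208 | 0.8792 | 0.3834 | 1.5152 | 0.0340 |
      | MH16FHL-004 | 14 | 23 | 12.30 | 0.4434 | 0.9233 | 0.9100 | 0.9130 | 0.0224 | 0.9776 | 0.8159 | 5.5556 | 0.0900 |
      | MH17FHL-005 | 19 | 23 | 7.15 | 0.2845 | 0.8594 | 0.8600 | 0.8449 | 0.0362 | 0.9638 | 0.7147 | 3.5714 | 0.7550 |
      | MH18FHL-004 | 26 | 30 | 11.11 | 0.3894 | 0.9146 | 0.9000 | 0.9036 | 0.0224 | 0.9776 | 0.7954 | 5.0000 | 0.4760 |
      | MH21FHL-002 | 7 | 20 | 13.61 | 0.2774 | 0.9312 | 0.9000 | 0.9218 | 0.0174 | 0.9826 | 0.7954 | 5.0000 | 0.6930 |
      | Mean | 16 | 16 | 7.17 | 0.3121 | 0.8373 | 0.8195 | 0.8129 | 0.0581 | 0.9419 | 0.6413 | 3.1629 | |
      | SD | 10 | 9 | 3.22 | 0.1249 | 0.0535 | 0.0681 | 0.0626 | 0.0266 | 0.0266 | 0.1251 | 1.1676 | |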
      MH, microhaplotype; n, sample number of GDH; NSNP, composite SNP number; Nallele, allele number; Ae, effective number of alleles; In, informativeness for ancestry inference (1KG and GDH); He, expected heterozygosity; Ho, observed heterozygosity; PIC, polymorphism information content; MP, match probability; DP, discrimination power; PE, power of exclusion; TPI, typical paternity index; p-HWE, probability value of Hardy-Weinberg equilibrium.

      3.2 GPR

      The deduced application range of GPR in two-person DNA mixtures with the MY system is (1:10.11, 1:2.10). The MY system was used to generate DNA profiles for the ten sets of simulated male-male DNA mixtures at ratios from 1:10 to 1:2. As presented in Supplementary Table S18, two different Y-DNA haplogroups were detected in most of the male-male DNA mixtures (Y-DNA haplogroups detailed in Supplementary Table S19). In these mixtures, the number of Y-STR loci with two different alleles ranged from 6 to 18. Of the 220 MH loci detected in the simulated DNA mixtures, 0.91%, 14.55%, 47.27%, and 37.27% showed MHallele=1, MHallele=2, MHallele=3, and MHallele=4, respectively.

      3.2.1 Regression analyses between the AMR and RDoC

      • (1)
        Y-SNPs/STRs and MHallele=4
        The mean and SD of the RDoC values of Y-SNPs, Y-STRs, and MHallele=4 for each AMR group from 1:10 to 1:2 are listed in Supplementary Table S20. Each AMR group comprised 58–146 RDoC values, and the values in each group were consistent with a normal distribution (p > 0.05). As shown in Fig. 3, the correlation coefficients (r) between AMR and RDoC were 0.9806, 0.9446, and 0.9710 for Y-SNPs, Y-STRs, and MHallele=4, respectively. We established linear regression equations between AMR and RDoC for Y-SNPs, Y-STRs, and MHallele=4, from which the 95% CI of the AMR (i.e., E(AMR)) could be obtained.
        Fig. 3 Regression analyses between the actual mixing ratio (AMR) and DoC ratio (RDoC) for Y-SNPs (A), Y-STRs (B), and MHallele=4 (C) based on ten sets of simulated male-male DNA mixtures (1:10–1:2). (N, total number of generated RDoC values; r, correlation coefficient; R², coefficient of determination.)
      • (2)
        MHallele=2 and MHallele=3
      There are seven genotype patterns in total for MHallele=3 (patterns α, β, and γ) and MHallele=2 (patterns δ, ε, ζ, and η) in two-person DNA mixtures. Each AMR group of the seven genotype patterns followed a normal distribution (p > 0.05, Supplementary Table S21) and consisted of 6–70 RDoC values (except pattern η, for which no measured data were available). Apart from patterns α and η, whose RDoC values are approximately constant across AMR, the genotype patterns showed highly positive Spearman correlations between AMR and RDoC (r range: 0.9658–0.9796, Table 2). In addition, we used different regression models (linear, quadratic, and cubic) to establish the relationships between RDoC and AMR for the seven genotype patterns (Table 2 and Fig. 4A, B).
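      Table 2. Correlations between actual mixing ratio (AMR) and DoC ratio (RDoC) for seven genotype patterns of MHallele=3 and MHallele=2 based on ten sets of simulated male-male DNA mixtures (1:10–1:2).
      | MH | Genotype pattern | N | r | R² | RDoC range | Regression equation# |
      |---|---|---|---|---|---|---|
      | MHallele=3 | β | 153 | 0.9658 | 0.9328 | 1.0727–1.9405 | Fit: RDoC = 1.1744 − 0.8763·AMR + 11.3493·AMR² − 13.9876·AMR³ |
      | | | | | | | Lower: RDoC = 1.0566 − 0.8093·AMR + 11.0726·AMR² − 13.6835·AMR³ |
      | | | | | | | Upper: RDoC = 1.2921 − 0.9434·AMR + 11.6261·AMR² − 14.2917·AMR³ |
      | MHallele=3 | α | 630 | – | – | 0.8919–1.1224 | Fit: RDoC = 1.0072 |
      | | | | | | | Lower: RDoC = 0.8919 |
      | | | | | | | Upper: RDoC = 1.1224 |
      | MHallele=3 | γ | 153 | 0.9754 | 0.9513 | 0.0505–0.5502 | Fit: RDoC = 0.0071 + 0.9761·AMR |
      | | | | | | | Lower: RDoC = −0.0469 + 0.9743·AMR |
      | | | | | | | Upper: RDoC = 0.0612 + 0.9779·AMR |
      | MHallele=2 | η* | – | – | – | 0.8100–1.0000 | Fit: RDoC = 0.90 |
      | | | | | | | Lower: RDoC = 0.81 |
      | | | | | | | Upper: RDoC = 1.00 |
      | MHallele=2 | δ | 117 | 0.9671 | 0.9353 | 0.4341–0.9014 | Fit: RDoC = 0.9462 − 0.9520·AMR − 1.3617·AMR² + 2.9266·AMR³ |
      | | | | | | | Lower: RDoC = 0.8831 − 0.9060·AMR − 1.5520·AMR² + 3.1357·AMR³ |
      | | | | | | | Upper: RDoC = 1.0092 − 0.9881·AMR − 1.1715·AMR² + 2.7175·AMR³ |
      | MHallele=2 | ζ | 54 | 0.9796 | 0.9595 | 0.0544–0.5398 | Fit: RDoC = 0.0084 + 0.9591·AMR |
      | | | | | | | Lower: RDoC = −0.0411 + 0.9545·AMR |
      | | | | | | | Upper: RDoC = 0.0579 + 0.9637·AMR |
      | MHallele=2 | ε | 117 | 0.9735 | 0.9478 | 0.0246–0.2246 | Fit: RDoC = −0.0059 + 0.5630·AMR − 0.2980·AMR² |
      | | | | | | | Lower: RDoC = −0.0289 + 0.5651·AMR − 0.3041·AMR² |
      | | | | | | | Upper: RDoC = 0.0171 + 0.5609·AMR − 0.2920·AMR² |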
      N, total number of RDoC values; r, correlation coefficient; R², coefficient of determination; *, the numerical relationships for pattern η were derived theoretically because measured values were lacking; #, for each pattern, the fitted regression equation is given first, followed by the lower and upper bounds of the 95% prediction interval (PI) of RDoC.
      Fig. 4. Regression analyses between the actual mixing ratio (AMR) and DoC ratio (RDoC) for different genotype patterns of MHallele=3 (A) and MHallele=2 (B), and RDoC intersections of pairwise genotype patterns of MHallele=3 (C) and MHallele=2 (D), based on ten sets of simulated male-male DNA mixtures (1:10–1:2). Patterns α, β, and γ belong to MHallele=3 and patterns δ, ε, ζ, and η to MHallele=2. Although the deduced application range of pattern β is 0–1:2.10, the allele distribution in the ten sets of simulations also follows pattern β at 1:2.
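Table 2 reports linear, quadratic, and cubic fits of RDoC on AMR. The sketch below shows polynomial least squares (normal equations solved by Gaussian elimination) and R² without external libraries. It first checks that a quadratic fit recovers the tabulated ε coefficients from noise-free points, then compares model degrees on hypothetical replicates whose noise level (SD 0.01) is an assumption, not a property of the study's data.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [b0, b1, ..., b_degree] for y = b0 + b1*x + ... + b_degree*x**degree.
    """
    n = degree + 1
    # Normal equations (X^T X) b = X^T y for the Vandermonde design matrix X
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            v[r] -= f * v[col]
    # Back substitution
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (v[r] - sum(a[r][c] * b[c] for c in range(r + 1, n))) / a[r][r]
    return b


def predict(b, x):
    return sum(bk * x ** k for k, bk in enumerate(b))


def r_squared(b, xs, ys):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(b, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def eps_fit(amr):
    """Fitted equation for pattern epsilon from Table 2."""
    return -0.0059 + 0.5630 * amr - 0.2980 * amr ** 2


amr_grid = [1 / k for k in range(10, 1, -1)]  # 1:10, 1:9, ..., 1:2

# Noise-free check: a quadratic fit recovers the tabulated coefficients
exact = polyfit(amr_grid, [eps_fit(a) for a in amr_grid], 2)
print([round(c, 4) for c in exact])

# HYPOTHETICAL replicates: 10 per AMR with Gaussian noise (SD 0.01)
rng = random.Random(42)
xs = [a for a in amr_grid for _ in range(10)]
ys = [eps_fit(a) + rng.gauss(0, 0.01) for a in xs]
r2 = {}
for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    r2[name] = r_squared(polyfit(xs, ys, degree), xs, ys)
    print(f"{name:<9} R^2 = {r2[name]:.4f}")
```

For nested least-squares models, R² cannot decrease when a term is added, so R² alone does not justify a higher-degree fit. This is a general property and says nothing about how the models in Table 2 were chosen.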

      3.2.2 RDoC ranges and RDoC intersections

      Based on the above regression analyses, the relationships between AMR and RDoC have been established for Y-SNPs/STRs and MHs in two-person DNA mixtures with 1:10–1:2 mixing ratios. As shown in Table 2, the RDoC range of each of the seven genotype patterns is calculated from its regression equations (the 95% PI bounds) over the AMR range 1:10–1:2. The RDoC intersections are the overlaps of pairwise RDoC ranges. For Y-SNPs/STRs and pattern MHallele=4, there is no RDoC intersection (Fig. 3). For MHallele=3 and MHallele=2, Fig. 4C (MHallele=3) and 4D (MHallele=2) present four RDoC intersections among the seven genotype patterns: α ∩ β = [1.0727, 1.1224], η ∩ δ = [0.8100, 0.9014], δ ∩ ζ = [0.4341, 0.5398], and ζ ∩ ε = [0.0544, 0.2246]. Observed RDoC values falling within these intersections require further recognition.
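The RDoC ranges and their intersections can be reproduced from the Table 2 prediction-interval equations. This is a minimal sketch, assuming that each range spans from the smallest lower-PI value to the largest upper-PI value over AMR 0.1–0.5 (1:10–1:2, AMR as the decimal minor/major ratio), and that intersections are only sought between patterns with the same MH allele number. The dictionary and function names are illustrative.

```python
from itertools import combinations

# 95% PI bounds from Table 2: (lower, upper), coefficients in ascending powers of AMR
PI_BOUNDS = {
    "alpha": ((0.8919,), (1.1224,)),
    "beta": ((1.0566, -0.8093, 11.0726, -13.6835), (1.2921, -0.9434, 11.6261, -14.2917)),
    "gamma": ((-0.0469, 0.9743), (0.0612, 0.9779)),
    "eta": ((0.81,), (1.00,)),  # theoretical; no measured data
    "delta": ((0.8831, -0.9060, -1.5520, 3.1357), (1.0092, -0.9881, -1.1715, 2.7175)),
    "zeta": ((-0.0411, 0.9545), (0.0579, 0.9637)),
    "epsilon": ((-0.0289, 0.5651, -0.3041), (0.0171, 0.5609, -0.2920)),
}
MH_TYPES = {"MHallele=3": ("alpha", "beta", "gamma"),
            "MHallele=2": ("eta", "delta", "zeta", "epsilon")}
AMR_GRID = [0.1 + 0.4 * i / 4000 for i in range(4001)]  # AMR 1:10 ... 1:2


def poly(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def rdoc_range(pattern):
    """Smallest lower-PI and largest upper-PI value over the AMR grid."""
    lower, upper = PI_BOUNDS[pattern]
    return (min(poly(lower, a) for a in AMR_GRID),
            max(poly(upper, a) for a in AMR_GRID))


def rdoc_intersections():
    """Overlaps of pairwise RDoC ranges among patterns of the same MH type."""
    found = {}
    for patterns in MH_TYPES.values():
        for p, q in combinations(patterns, 2):
            (lo_p, hi_p), (lo_q, hi_q) = rdoc_range(p), rdoc_range(q)
            lo, hi = max(lo_p, lo_q), min(hi_p, hi_q)
            if lo <= hi:
                found[(p, q)] = (lo, hi)
    return found


for (p, q), (lo, hi) in rdoc_intersections().items():
    print(f"{p} & {q}: [{lo:.4f}, {hi:.4f}]")
```

Under these assumptions the sketch finds exactly the four intersections listed above, agreeing with them to within rounding of the fourth decimal.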

      3.2.3 Pattern recognition of each genotype

      (1) MHallele=2 and MHallele=3
      As shown in Fig. 4C and D, a genotype pattern whose observed RDoC falls within an RDoC intersection needs further examination, whereas genotype patterns outside the RDoC intersections are unambiguous.
      Each RDoC intersection has two components: a nonoverlapping area and an overlapping area (Fig. 5). Fig. 5A presents the RDoC intersection of patterns α and β (α ∩ β = [1.0727, 1.1224]). When the AMR is between 1:7.16 and 1:2 (the nonoverlapping area), the genotype pattern is recognized as pattern α. When the AMR is between 1:10 and 1:7.16 (the overlapping area), the genotype pattern is uncertain (pattern α or β), and LR-based probabilistic genotyping (PG) systems (EuroForMix, DNAStatistX, and STRmix™) are required to infer the major and minor contributors’ genotypes [Gill P., Benschop C., Buckleton J., Bleka O., Taylor D. A Review of Probabilistic Genotyping Systems: EuroForMix, DNAStatistX and STRmix.]. In total, five overlapping areas within the RDoC intersections require these LR-based methods (Fig. 5A for MHallele=3 and Fig. 5B–D for MHallele=2). In contrast, when the AMR of a two-person DNA mixture is between 1:4.98 and 1:2.39, none of the seven genotype patterns has an overlapping area, so the genotype patterns of the major and minor contributors can be recognized directly by comparing the observed RDoC with the RDoC ranges, without LR-based PG systems.
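The AMR at which an overlapping area begins or ends can be located numerically as the point where two 95% PI bands cross. The sketch below uses bisection on the Table 2 PI equations. For α ∩ β, it solves β_lower(AMR) = 1.1224 (α's upper bound), which gives about 1:7.16 as quoted above. For δ ∩ ζ, it solves δ_lower(AMR) = ζ_upper(AMR), which gives about 1:2.39, apparently the upper end of the overlap-free window. Treating the band crossing as the boundary is an assumed reading of Fig. 5, not a documented procedure, and the helper names are hypothetical.

```python
def poly(coef, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... for coefficients in ascending order."""
    return sum(c * x ** k for k, c in enumerate(coef))


def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return (lo + hi) / 2


# Table 2 PI bounds (ascending powers of AMR)
BETA_LOWER = (1.0566, -0.8093, 11.0726, -13.6835)
ALPHA_UPPER = 1.1224
DELTA_LOWER = (0.8831, -0.9060, -1.5520, 3.1357)
ZETA_UPPER = (0.0579, 0.9637)

# alpha vs beta: beta's lower PI bound rises above alpha's upper bound
amr_ab = bisect_root(lambda a: poly(BETA_LOWER, a) - ALPHA_UPPER, 0.1, 0.5)
# delta vs zeta: delta's lower PI bound falls below zeta's upper bound
amr_dz = bisect_root(lambda a: poly(DELTA_LOWER, a) - poly(ZETA_UPPER, a), 0.1, 0.5)

print(f"alpha/beta bands separate for AMR above 1:{1 / amr_ab:.2f}")
print(f"delta/zeta bands overlap for AMR above 1:{1 / amr_dz:.2f}")
```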
      Fig. 5. Detailed processes of genotype pattern recognition (GPR) in different RDoC intersections of MHallele=3 and MHallele=2. A. GPR in the RDoC intersection of patterns α and β (α ∩ β); B. GPR in the RDoC intersection of patterns η and δ (η ∩ δ); C. GPR in the RDoC intersection of patterns δ and ζ (δ ∩ ζ); D. GPR in the RDoC intersection of patterns ζ and ε (ζ ∩ ε).
      (2) MHallele=4 and MHallele=1
      In two-person DNA mixtures with 1:10–1:2 mixing ratios, the patterns MHallele=4 and MHallele=1 are unique genotype patterns (Fig. 2). Therefore, GPR is not needed.
      In summary, based on the MY system, two-person DNA mixtures (application range: 1:10–1:2) can be deconvoluted by GPR (Fig. 6). When the DNA profile of a two-person mixture with an unknown AMR is generated by the MY system, the mixture type can be determined with high probability from the Y-chromosomal genetic markers. The Y-chromosomal haplotypes can also provide additional investigative clues, especially in sexual assault cases (e.g., rape and sodomy). The MH markers of the MY system are used for individual identification, and the MH type of each MH locus is determined from its allele number (1–4 alleles). Because the 22 selected MHs have relatively high average Ae values (8.32 in 1KG and 7.17 in GDH individuals) and MHallele=4 is a determinate pattern in two-person mixtures, the mean RDoC value (R̄DoC) of the MHallele=4 loci can be used to obtain the E(AMR) of the two-person DNA mixture (Fig. 3). In male-male mixtures, the E(AMR) can also be obtained from the R̄DoC of the Y-chromosomal genetic markers (Y-SNPs/STRs). If the E(AMR) falls within the application range of GPR (1:10–1:2), the two-person DNA mixture can be deconvoluted. For each observed RDoC of an MHallele=2 or MHallele=3 locus, the RDoC intersections must then be checked. If the observed RDoC lies within an RDoC intersection, there are two scenarios: (1) if the E(AMR) lies in the nonoverlapping area, the genotype pattern of the MH locus can be recognized by comparing the RDoC ranges; (2) if the E(AMR) lies in the overlapping area, LR-based PG systems are needed to infer the major and minor contributors’ genotypes. If the observed RDoC lies outside the RDoC intersections, the genotype pattern of the MH locus can be recognized directly by comparing the RDoC ranges. Thus, the major and minor contributors’ genotypes in the two-person mixture can be obtained for further individual identification.
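The decision flow summarized above (Fig. 6) can be sketched in code. This is one possible reading, not the authors' in-house scripts. E(AMR) is taken from the mean RDoC of the MHallele=4 loci through a linear calibration whose coefficients (`CAL_A`, `CAL_B`) are placeholders, because the fitted values appear only in Fig. 3. Each MHallele=2 or MHallele=3 locus is then matched against every pattern whose Table 2 95% PI at E(AMR) contains its observed RDoC. A single match corresponds to direct recognition, several matches to an overlapping area that needs LR-based PG, and no match is simply flagged. All function names are hypothetical.

```python
# 95% PI bounds from Table 2 as (lower, upper) coefficient tuples, ascending powers of AMR
PI_BOUNDS = {
    3: {"alpha": ((0.8919,), (1.1224,)),
        "beta": ((1.0566, -0.8093, 11.0726, -13.6835),
                 (1.2921, -0.9434, 11.6261, -14.2917)),
        "gamma": ((-0.0469, 0.9743), (0.0612, 0.9779))},
    2: {"eta": ((0.81,), (1.00,)),
        "delta": ((0.8831, -0.9060, -1.5520, 3.1357),
                  (1.0092, -0.9881, -1.1715, 2.7175)),
        "zeta": ((-0.0411, 0.9545), (0.0579, 0.9637)),
        "epsilon": ((-0.0289, 0.5651, -0.3041), (0.0171, 0.5609, -0.2920))},
}

# HYPOTHETICAL calibration RDoC = CAL_A + CAL_B * AMR for MHallele=4 loci.
# The fitted coefficients are shown only in Fig. 3; these are placeholders.
CAL_A, CAL_B = 0.0, 1.0
AMR_MIN, AMR_MAX = 0.1, 0.5  # GPR application range, 1:10 to 1:2


def poly(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def expected_amr(mh4_rdoc_values):
    """Invert the linear calibration at the mean RDoC of the MHallele=4 loci."""
    mean_rdoc = sum(mh4_rdoc_values) / len(mh4_rdoc_values)
    return (mean_rdoc - CAL_A) / CAL_B


def recognize(n_alleles, observed_rdoc, e_amr):
    """Return the pattern(s) whose 95% PI at E(AMR) contains the observed RDoC."""
    if not AMR_MIN <= e_amr <= AMR_MAX:
        return "outside GPR application range"
    if n_alleles in (1, 4):
        return f"MHallele={n_alleles} (unique pattern)"
    hits = [name for name, (lower, upper) in PI_BOUNDS[n_alleles].items()
            if poly(lower, e_amr) <= observed_rdoc <= poly(upper, e_amr)]
    if len(hits) == 1:
        return hits[0]
    if hits:
        return "ambiguous (" + "/".join(hits) + "): use LR-based PG"
    return "no pattern matches"


e_amr = expected_amr([0.28, 0.30, 0.32])  # about 0.30, i.e. roughly 1:3.3
print(recognize(2, 0.30, e_amr))  # zeta
print(recognize(3, 1.10, 0.12))   # ambiguous (alpha/beta): use LR-based PG
```

With these placeholder values, an observed RDoC of 1.10 at an MHallele=3 locus is assigned to α when E(AMR) is about 1:3.3, but is ambiguous between α and β at about 1:8.3. This agrees with the α ∩ β overlapping area (1:10–1:7.16) described above.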
      Fig. 6. Detailed processes of two-person DNA mixture (1:10–1:2) deconvolution based on the MY system.
      Overall, two-person DNA mixtures (application range: 1:10–1:2) can be deconvoluted using the GPR strategy. Mixture deconvolution has been a persistent challenge in forensic DNA analysis; in this study, we focused on the relatively simple case of two-person DNA mixtures to find an effective way to deconvolute them. Although GPR cannot deconvolute balanced (1:1) or extremely unbalanced (> 1:10) two-person DNA mixtures, it represents a small step forward in using DoC information for DNA mixture deconvolution (from pattern recognition to genotype inference). GPR should be further validated with low-quality, degraded, and casework-like mixed samples. For the MY system, more region-specific Y-DNA haplogroups and RM Y-STRs could be considered for population-specific human identification and regional forensic genealogy. The combined use of Y-chromosomal genetic markers and MHs can provide more useful information for crime scene investigations, especially those involving male suspects.

      4. Conclusion

      Combining Y-SNP/STR and MH genetic markers for mixed traces with male contributors is beneficial for familial searching, paternal/kinship determination, and mixture deconvolution. In the present study, we developed a novel MPS-based MY system consisting of 114 Y-SNPs (covering twelve dominant Y-DNA haplogroups), 45 Y-STRs (μ < 5 × 10⁻³ and N-1 stutter < 0.09), and 22 MHs (Ae > 5, In > 0.185, ACR > 0.91, and d > 10 Mb). For the 22 independent MHs in the MY system, the Ae ranged from 3.62 to 14.72, with an average of 7.17 in GDH individuals. The CDP was 1 − 5.00 × 10⁻³¹, and the CPEduo and CPEtrio were 1 − 5.00 × 10⁻⁸ and 1 − 4.85 × 10⁻¹², respectively. In addition, we proposed a GPR method for two-person DNA mixtures based on the MY system. We integrated 26 different genotype combinations into nine genotype patterns and validated the application range (1:10–1:2) of GPR using ten sets of simulated male-male DNA mixtures. The regression relationships between AMR and RDoC were established for different genetic markers and genotype patterns. For the five overlapping areas within the RDoC intersections, LR-based methods are needed to infer the genotypes of the major and minor contributors. In the nonoverlapping areas (the large areas outside the RDoC intersections and the nonoverlapping areas within them), the genotype patterns can be recognized by comparing the observed RDoC with the RDoC ranges with the assistance of E(AMR). In conclusion, based on the MY system, two-person DNA mixtures (1:10–1:2) could be deconvoluted using the GPR strategy for individual identification.
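The marker statistics above follow standard population-genetic definitions. Assuming the usual formulas (Ae = 1/Σpᵢ²; a per-locus match probability PM from Hardy-Weinberg genotype frequencies; and CDP = 1 − ∏PMᵢ over independent loci, consistent with the d > 10 Mb spacing of the selected MHs), the sketch below computes them for hypothetical allele frequencies. It also shows why the CDP is written as 1 − 5.00 × 10⁻³¹: the combined match probability is far smaller than double-precision resolution near 1, so it is reported separately.

```python
from math import prod


def effective_alleles(freqs):
    """Effective number of alleles, Ae = 1 / sum(p_i^2)."""
    return 1 / sum(p * p for p in freqs)


def match_probability(freqs):
    """Per-locus match probability: sum of squared HWE genotype frequencies."""
    pm = sum((p * p) ** 2 for p in freqs)  # homozygotes, frequency p_i^2
    for i, p in enumerate(freqs):
        for q in freqs[i + 1:]:
            pm += (2 * p * q) ** 2  # heterozygotes, frequency 2 p_i p_j
    return pm


def combined_match_probability(loci):
    """Product of per-locus match probabilities; CDP = 1 - this value."""
    return prod(match_probability(freqs) for freqs in loci)


# HYPOTHETICAL microhaplotype locus with 8 equally frequent alleles
locus = [1 / 8] * 8
print(effective_alleles(locus))
cmp_22 = combined_match_probability([locus] * 22)
print(f"CDP = 1 - {cmp_22:.2e}")  # 1 - cmp_22 would round to 1.0 in floating point
```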

      Availability of data and material

      The raw data for this article and the in-house scripts are available upon reasonable request to the corresponding authors.

      Conflicts of interest

      The authors declare that they have no conflicts of interest.

      Acknowledgements

      This study benefited from the valuable comments of Prof. Bofeng Zhu (Southern Medical University), Prof. Feng Chen (Nanjing Medical University), Prof. Weibo Liang (Sichuan University), Prof. Jianye Ge (University of North Texas Health Science Center), Prof. Peng Chen (Nanjing Medical University), and Fang Zhao (Shanxi University).
      The authors sincerely thank all the volunteers who contributed samples for this study and Homgen BioTech. for technical assistance. This study was supported by the Program of Hainan Association for Science and Technology Plans to Youth R&D Innovation (QCXM201705); National Undergraduate Innovation and Entrepreneurship Training Program (No. 201911810008 and No. 201911810023); Shanghai Key Laboratory of Forensic Medicine (Academy of Forensic Science) Open Project Foundation (No. KF1812); Science Foundation of the School of Forensic Medicine, Southern Medical University (No. 2021KY02); and National Natural Science Foundation of China (NSFC, No. 81671865, No. 81971786, and No. 32070576).

      Appendix A. Supplementary material

      References

        • Kayser M.
        Forensic use of Y-chromosome DNA: a general overview.
        Hum. Genet. 2017; 136: 621-635. https://doi.org/10.1007/s00439-017-1776-9
        • Fan H.
        • Zeng Y.
        • Wu W.
        • et al.
        The Y-STR landscape of coastal southeastern Han: Forensic characteristics, haplotype analyses, mutation rates, and population genetics.
        Electrophoresis. 2021; 42: 1578-1593. https://doi.org/10.1002/elps.202100037
        • Ballantyne K.N.
        • Goedbloed M.
        • Fang R.
        • et al.
        Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications.
        Am. J. Hum. Genet. 2010; 87: 341-353. https://doi.org/10.1016/j.ajhg.2010.08.006
        • Ay M.
        • Serin A.
        • Sevay H.
        • Gurkan C.
        • Canan H.
        Genetic characterisation of 13 rapidly mutating Y-STR loci in 100 father and son pairs from South and East Turkey.
        Ann. Hum. Biol. 2018; 45: 506-515. https://doi.org/10.1080/03014460.2018.1559353
        • Claerhout S.
        • Vandenbosch M.
        • Nivelle K.
        • et al.
        Determining Y-STR mutation rates in deep-routing genealogies: Identification of haplogroup differences.
        Forensic Sci. Int. Genet. 2018; 34: 1-10. https://doi.org/10.1016/j.fsigen.2018.01.005
        • Ge J.
        • Budowle B.
        • Aranda X.G.
        • Planz J.V.
        • Eisenberg A.J.
        • Chakraborty R.
        Mutation rates at Y chromosome short tandem repeats in Texas populations.
        Forensic Sci. Int. Genet. 2009; 3: 179-184. https://doi.org/10.1016/j.fsigen.2009.01.007
        • Oldoni F.
        • Podini D.
        Forensic molecular biomarkers for mixture analysis.
        Forensic Sci. Int. Genet. 2019; 41: 107-119. https://doi.org/10.1016/j.fsigen.2019.04.003
        • Fregeau C.J.
        • Bowen K.L.
        • Leclair B.
        • Trudel I.
        • Bishop L.
        • Fourney R.M.
        AmpFlSTR profiler Plus short tandem repeat DNA analysis of casework samples, mixture samples, and nonhuman DNA samples amplified under reduced PCR volume conditions (25 microL).
        J. Forensic Sci. 2003; 48: 1014-1034
        • Green R.L.
        • Lagace R.E.
        • Oldroyd N.J.
        • Hennessy L.K.
        • Mulero J.J.
        Developmental validation of the AmpFlSTR® NGM SElect PCR amplification kit: a next-generation STR multiplex with the SE33 locus.
        Forensic Sci. Int. Genet. 2013; 7: 41-51. https://doi.org/10.1016/j.fsigen.2012.05.012
        • Walsh P.S.
        • Fildes N.J.
        • Reynolds R.
        Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA.
        Nucleic Acids Res. 1996; 24: 2807-2812. https://doi.org/10.1093/nar/24.14.2807
        • Butler J.M.
        • Kline M.C.
        • Coble M.D.
        NIST interlaboratory studies involving DNA mixtures (MIX05 and MIX13): Variation observed and lessons learned.
        Forensic Sci. Int. Genet. 2018; 37: 81-94. https://doi.org/10.1016/j.fsigen.2018.07.024
        • Barrio P.A.
        • Crespillo M.
        • Luque J.A.
        • et al.
        GHEP-ISFG collaborative exercise on mixture profiles (GHEP-MIX06). Reporting conclusions: results and evaluation.
        Forensic Sci. Int. Genet. 2018; 35: 156-163. https://doi.org/10.1016/j.fsigen.2018.05.005
        • Van Neste C.
        • Van Nieuwerburgh F.
        • Van Hoofstat D.
        • Deforce D.
        Forensic STR analysis using massive parallel sequencing.
        Forensic Sci. Int. Genet. 2012; 6: 810-818. https://doi.org/10.1016/j.fsigen.2012.03.004
        • Ambers A.D.
        • Churchill J.D.
        • King J.L.
        • et al.
        More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing.
        BMC Genom. 2016; 17: 750. https://doi.org/10.1186/s12864-016-3087-2
        • Borsting C.
        • Morling N.
        Next generation sequencing and its applications in forensic genetics.
        Forensic Sci. Int. Genet. 2015; 18: 78-89. https://doi.org/10.1016/j.fsigen.2015.02.002
        • Budowle B.
        • Schmedes S.E.
        • Wendt F.R.
        Increasing the reach of forensic genetics with massively parallel sequencing.
        Forensic Sci., Med., Pathol. 2017; 13: 342-349. https://doi.org/10.1007/s12024-017-9882-5
        • Gettings K.B.
        • Aponte R.A.
        • Vallone P.M.
        • Butler J.M.
        STR allele sequence variation: current knowledge and future issues.
        Forensic Sci. Int. Genet. 2015; 18: 118-130. https://doi.org/10.1016/j.fsigen.2015.06.005
        • Fan H.
        • Du Z.
        • Wang F.
        • et al.
        The forensic landscape and the population genetic analyses of Hainan Li based on massively parallel sequencing DNA profiling.
        Int. J. Leg. Med. 2021; 135: 1295-1317. https://doi.org/10.1007/s00414-021-02590-3
        • Almalki N.
        • Chow H.Y.
        • Sharma V.
        • Hart K.
        • Siegel D.
        • Wurmbach E.
        Systematic assessment of the performance of illumina's MiSeq FGx forensic genomics system.
        Electrophoresis. 2017; 38: 846-854. https://doi.org/10.1002/elps.201600511
        • Butler J.M.
        The future of forensic DNA analysis.
        Philos. Trans. R. Soc. Lond. Ser. B, Biol. Sci. 2015; 370. https://doi.org/10.1098/rstb.2014.0252
        • Barrio P.A.
        • Martin P.
        • Alonso A.
        • et al.
        Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power.
        Forensic Sci. Int. Genet. 2019; 42: 49-55. https://doi.org/10.1016/j.fsigen.2019.06.009
        • Yu Z.
        • Xie Q.
        • Zhao Y.
        • Duan L.
        • Qiu P.
        • Fan H.
        NGS plus bacterial culture: a more accurate method for diagnosing forensic-related nosocomial infections.
        Leg. Med. (Tokyo). 2021; 52: 101910. https://doi.org/10.1016/j.legalmed.2021.101910
        • Gorden E.M.
        • Sturk-Andreaggi K.
        • Marshall C.
        Capture enrichment and massively parallel sequencing for human identification.
        Forensic Sci. Int. Genet. 2021; 53: 102496. https://doi.org/10.1016/j.fsigen.2021.102496
        • Fan H.
        • Wang L.
        • Liu C.
        • et al.
        Development and validation of a novel 133-plex forensic STR panel (52 STRs and 81 Y-STRs) using single-end 400 bp massive parallel sequencing.
        Int. J. Leg. Med. 2022; 136: 447-464. https://doi.org/10.1007/s00414-021-02738-1
        • Oldoni F.
        • Kidd K.K.
        • Podini D.
        Microhaplotypes in forensic genetics.
        Forensic Sci. Int. Genet. 2019; 38: 54-69. https://doi.org/10.1016/j.fsigen.2018.09.009
        • Kidd K.K.
        • Pakstis A.J.
        • Speed W.C.
        • et al.
        Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics.
        Forensic Sci. Int. Genet. 2014; 12: 215-224. https://doi.org/10.1016/j.fsigen.2014.06.014
        • Wu R.
        • Li H.
        • Li R.
        • et al.
        Identification and sequencing of 59 highly polymorphic microhaplotypes for analysis of DNA mixtures.
        Int. J. Leg. Med. 2021; 135: 1137-1149. https://doi.org/10.1007/s00414-020-02483-x
        • Gandotra N.
        • Speed W.C.
        • Qin W.
        • et al.
        Validation of novel forensic DNA markers using multiplex microhaplotype sequencing.
        Forensic Sci. Int. Genet. 2020; 47: 102275. https://doi.org/10.1016/j.fsigen.2020.102275
        • Bulbul O.
        • Speed W.C.
        • Gurkan C.
        • et al.
        Improving ancestry distinctions among Southwest Asian populations.
        Forensic Sci. Int. Genet. 2018; 35: 14-20. https://doi.org/10.1016/j.fsigen.2018.03.010
        • Chen P.
        • Zhu W.
        • Tong F.
        • et al.
        Identifying novel microhaplotypes for ancestry inference.
        Int. J. Leg. Med. 2019; 133: 983-988. https://doi.org/10.1007/s00414-018-1881-x
        • Cheung E.Y.Y.
        • Phillips C.
        • Eduardoff M.
        • Lareu M.V.
        • McNevin D.
        Performance of ancestry-informative SNP and microhaplotype markers.
        Forensic Sci. Int. Genet. 2019; 43: 102141. https://doi.org/10.1016/j.fsigen.2019.102141
        • de la Puente M.
        • Ruiz-Ramirez J.
        • Ambroa-Conde A.
        • et al.
        Broadening the applicability of a custom multi-platform panel of microhaplotypes: bio-geographical ancestry inference and expanded reference data.
        Front. Genet. 2020; 11: 581041. https://doi.org/10.3389/fgene.2020.581041
        • Jin X.
        • Zhang X.
        • Shen C.
        • et al.
        A highly polymorphic panel consisting of microhaplotypes and compound markers with the NGS and its forensic efficiency evaluations in Chinese two groups.
        Genes. 2020; 11. https://doi.org/10.3390/genes11091027
        • Kidd K.K.
        • Bulbul O.
        • Gurkan C.
        • et al.
        Genetic relationships of Southwest Asian and Mediterranean populations.
        Forensic Sci. Int. Genet. 2021; 53: 102528. https://doi.org/10.1016/j.fsigen.2021.102528
        • Oldoni F.
        • Yoon L.
        • Wootton S.C.
        • Lagace R.
        • Kidd K.K.
        • Podini D.
        Population genetic data of 74 microhaplotypes in four major U.S. population groups.
        Forensic Sci. Int. Genet. 2020; 49: 102398. https://doi.org/10.1016/j.fsigen.2020.102398
        • Phillips C.
        • McNevin D.
        • Kidd K.K.
        • et al.
        MAPlex - A massively parallel sequencing ancestry analysis multiplex for Asia-Pacific populations.
        Forensic Sci. Int. Genet. 2019; 42: 213-226. https://doi.org/10.1016/j.fsigen.2019.06.022
        • Xavier C.
        • de la Puente M.
        • Phillips C.
        • et al.
        Forensic evaluation of the Asia Pacific ancestry-informative MAPlex assay.
        Forensic Sci. Int. Genet. 2020; 48: 102344. https://doi.org/10.1016/j.fsigen.2020.102344
        • Zhu J.
        • Lv M.
        • Zhou N.
        • et al.
        Genotyping polymorphic microhaplotype markers through the Illumina® MiSeq platform for forensics.
        Forensic Sci. Int. Genet. 2019; 39: 1-7. https://doi.org/10.1016/j.fsigen.2018.11.005
        • Bulbul O.
        • Pakstis A.J.
        • Soundararajan U.
        • et al.
        Ancestry inference of 96 population samples using microhaplotypes.
        Int. J. Leg. Med. 2018; 132: 703-711. https://doi.org/10.1007/s00414-017-1748-6
        • Kidd K.K.
        • Speed W.C.
        • Pakstis A.J.
        • et al.
        Evaluating 130 microhaplotypes across a global set of 83 populations.
        Forensic Sci. Int. Genet. 2017; 29: 29-37. https://doi.org/10.1016/j.fsigen.2017.03.014
        • Bose N.
        • Carlberg K.
        • Sensabaugh G.
        • Erlich H.
        • Calloway C.
        Target capture enrichment of nuclear SNP markers for massively parallel sequencing of degraded and mixed samples.
        Forensic Sci. Int. Genet. 2018; 34: 186-196. https://doi.org/10.1016/j.fsigen.2018.01.010
        • de la Puente M.
        • Phillips C.
        • Xavier C.
        • et al.
        Building a custom large-scale panel of novel microhaplotypes for forensic identification using MiSeq and Ion S5 massively parallel sequencing systems.
        Forensic Sci. Int. Genet. 2020; 45: 102213. https://doi.org/10.1016/j.fsigen.2019.102213
        • Fregeau C.J.
        Validation of the Verogen ForenSeq DNA Signature Prep kit/Primer Mix B for phenotypic and biogeographical ancestry predictions using the Micro MiSeq® Flow Cells.
        Forensic Sci. Int. Genet. 2021; 53: 102533. https://doi.org/10.1016/j.fsigen.2021.102533
        • Liu J.
        • Li W.
        • Wang J.
        • et al.
        A new set of DIP-SNP markers for detection of unbalanced and degraded DNA mixtures.
        Electrophoresis. 2019; 40: 1795-1804. https://doi.org/10.1002/elps.201900017
        • Jin X.Y.
        • Cui W.
        • Chen C.
        • et al.
        Developing and population analysis of a new multiplex panel of 18 microhaplotypes and compound markers using next generation sequencing and its application in the Shaanxi Han population.
        Electrophoresis. 2020; 41: 1230-1237. https://doi.org/10.1002/elps.201900451
        • Kureshi A.
        • Li J.
        • Wen D.
        • Sun S.
        • Yang Z.
        • Zha L.
        Construction and forensic application of 20 highly polymorphic microhaplotypes.
        R. Soc. Open Sci. 2020; 7: 191937. https://doi.org/10.1098/rsos.191937
        • Pang J.B.
        • Rao M.
        • Chen Q.F.
        • et al.
        A 124-plex microhaplotype panel based on next-generation sequencing developed for forensic applications.
        Sci. Rep. 2020; 10: 1945. https://doi.org/10.1038/s41598-020-58980-x
        • Phillips C.
        • Amigo J.
        • Tillmar A.O.
        • et al.
        A compilation of tri-allelic SNPs from 1000 Genomes and use of the most polymorphic loci for a large-scale human identification panel.
        Forensic Sci. Int. Genet. 2020; 46: 102232. https://doi.org/10.1016/j.fsigen.2020.102232
        • Turchi C.
        • Melchionda F.
        • Pesaresi M.
        • Tagliabracci A.
        Evaluation of a microhaplotypes panel for forensic genetics using massive parallel sequencing technology.
        Forensic Sci. Int. Genet. 2019; 41: 120-127. https://doi.org/10.1016/j.fsigen.2019.04.009
        • van der Gaag K.J.
        • de Leeuw R.H.
        • Laros J.F.J.
        • den Dunnen J.T.
        • de Knijff P.
        Short hypervariable microhaplotypes: a novel set of very short high discriminating power loci without stutter artefacts.
        Forensic Sci. Int. Genet. 2018; 35: 169-175. https://doi.org/10.1016/j.fsigen.2018.05.008
        • Staadig A.
        • Tillmar A.
        Evaluation of microhaplotypes in forensic kinship analysis from a Swedish population perspective.
        Int. J. Leg. Med. 2021; 135: 1151-1160. https://doi.org/10.1007/s00414-021-02509-y
        • Sun S.
        • Liu Y.
        • Li J.
        • et al.
        Development and application of a nonbinary SNP-based microhaplotype panel for paternity testing involving close relatives.
        Forensic Sci. Int. Genet. 2020; 46: 102255. https://doi.org/10.1016/j.fsigen.2020.102255
        • Wu R.
        • Chen H.
        • Li R.
        • et al.
        Pairwise kinship testing with microhaplotypes: can advancements be made in kinship inference with these markers?
        Forensic Sci. Int. 2021; 325: 110875. https://doi.org/10.1016/j.forsciint.2021.110875
        • Zhu J.
        • Chen P.
        • Qu S.
        • et al.
        Evaluation of the microhaplotype markers in kinship analysis.
        Electrophoresis. 2019; 40: 1091-1095. https://doi.org/10.1002/elps.201800351
        • Bai Z.
        • Zhao H.
        • Lin S.
        • et al.
        Evaluation of a microhaplotype-based noninvasive prenatal test in twin gestations: determination of paternity, zygosity, and fetal fraction.
        Genes. 2020; 12. https://doi.org/10.3390/genes12010026
        • Ou X.
        • Qu N.
        Noninvasive prenatal paternity testing by target sequencing microhaps.
        Forensic Sci. Int. Genet. 2020; 48: 102338. https://doi.org/10.1016/j.fsigen.2020.102338
        • Qu N.
        • Xie Y.
        • Li H.
        • et al.
        Noninvasive prenatal paternity testing using targeted massively parallel sequencing.
        Transfusion. 2018; 58: 1792-1799. https://doi.org/10.1111/trf.14577
        • Wang J.Y.T.
        • Whittle M.R.
        • Puga R.D.
        • Yambartsev A.
        • Fujita A.
        • Nakaya H.I.
        Noninvasive prenatal paternity determination using microhaplotypes: a pilot study.
        BMC Med. Genom. 2020; 13: 157. https://doi.org/10.1186/s12920-020-00806-w
        • Bennett L.
        • Oldoni F.
        • Long K.
        • et al.
        Mixture deconvolution by massively parallel sequencing of microhaplotypes.
        Int. J. Leg. Med. 2019; 133: 719-729. https://doi.org/10.1007/s00414-019-02010-7
        • Chen P.
        • Deng C.
        • Li Z.
        • et al.
        A microhaplotypes panel for massively parallel sequencing analysis of DNA mixtures.
        Forensic Sci. Int. Genet. 2019; 40: 140-149. https://doi.org/10.1016/j.fsigen.2019.02.018
        • Chen P.
        • Yin C.
        • Li Z.
        • et al.
        Evaluation of the microhaplotypes panel for DNA mixture analyses.
        Forensic Sci. Int. Genet. 2018; 35: 149-155. https://doi.org/10.1016/j.fsigen.2018.05.003
        • Crysup B.
        • Woerner A.E.
        • King J.L.
        • Budowle B.
        Graph algorithms for mixture interpretation.
        Genes. 2021; 12. https://doi.org/10.3390/genes12020185
        • Oldoni F.
        • Bader D.
        • Fantinato C.
        • et al.
        A sequence-based 74plex microhaplotype assay for analysis of forensic DNA mixtures.
        Forensic Sci. Int. Genet. 2020; 49: 102367. https://doi.org/10.1016/j.fsigen.2020.102367
        • Coble M.D.
        • Bright J.A.
        Probabilistic genotyping software: an overview.
        Forensic Sci. Int. Genet. 2019; 38: 219-224. https://doi.org/10.1016/j.fsigen.2018.11.009
        • Alladio E.
        • Omedei M.
        • Cisana S.
        • et al.
        DNA mixtures interpretation - a proof-of-concept multi-software comparison highlighting different probabilistic methods’ performances on challenging samples.
        Forensic Sci. Int. Genet. 2018; 37: 143-150. https://doi.org/10.1016/j.fsigen.2018.08.002
        • Benschop C.C.
        • Sijen T.
        LoCIM-tool: an expert’s assistant for inferring the major contributor’s alleles in mixed consensus DNA profiles.
        Forensic Sci. Int. Genet. 2014; 11: 154-165. https://doi.org/10.1016/j.fsigen.2014.03.012
        • Bleka O.
        • Benschop C.C.G.
        • Storvik G.
        • Gill P.
        A comparative study of qualitative and quantitative models used to interpret complex STR DNA profiles.
        Forensic Sci. Int. Genet. 2016; 25: 85-96. https://doi.org/10.1016/j.fsigen.2016.07.016
        • Bleka O.
        • Eduardoff M.
        • Santos C.
        • Phillips C.
        • Parson W.
        • Gill P.
        Open source software EuroForMix can be used to analyse complex SNP mixtures.
        Forensic Sci. Int. Genet. 2017; 31: 105-110. https://doi.org/10.1016/j.fsigen.2017.08.001
        • Bleka O.
        • Storvik G.
        • Gill P.
        EuroForMix: an open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts.
        Forensic Sci. Int. Genet. 2016; 21: 35-44. https://doi.org/10.1016/j.fsigen.2015.11.008
        • Bright J.A.
        • Taylor D.
        • McGovern C.
        • et al.
        Developmental validation of STRmix, expert software for the interpretation of forensic DNA profiles.
        Forensic Sci. Int. Genet. 2016; 23: 226-239. https://doi.org/10.1016/j.fsigen.2016.05.007
        • Buckleton J.S.
        • Bright J.A.
        • Gittelson S.
        • et al.
        The Probabilistic Genotyping Software STRmix: Utility and Evidence for its Validity.
        J. Forensic Sci. 2019; 64: 393-405. https://doi.org/10.1111/1556-4029.13898
        • Taylor D.
        • Bright J.A.
        • Buckleton J.
        The interpretation of single source and mixed DNA profiles.
        Forensic Sci. Int. Genet. 2013; 7: 516-528. https://doi.org/10.1016/j.fsigen.2013.05.011
        • Yang J.
        • Lin D.
        • Deng C.
        • et al.
        The advances in DNA mixture interpretation.
        Forensic Sci. Int. 2019; 301: 101-106. https://doi.org/10.1016/j.forsciint.2019.05.024
        • Bootsma M.L.
        • Gruenthal K.M.
        • McKinney G.J.
        • et al.
        A GT-seq panel for walleye (Sander vitreus) provides important insights for efficient development and implementation of amplicon panels in non-model organisms.
        Mol. Ecol. Resour. 2020; 20: 1706-1722. https://doi.org/10.1111/1755-0998.13226
        • Qu N.
        • Lin S.
        • Gao Y.
        • Liang H.
        • Zhao H.
        • Ou X.
        A microhap panel for kinship analysis through massively parallel sequencing technology.
        Electrophoresis. 2020; 41: 246-253. https://doi.org/10.1002/elps.201900337
        • Jin L.
        • Su B.
        Natives or immigrants: modern human origin in east Asia.
        Nat. Rev. Genet. 2000; 1: 126-133 https://doi.org/10.1038/35038565
        • Shi H.
        • Zhong H.
        • Peng Y.
        • et al.
        Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations.
        BMC Biol. 2008; 6: 45 https://doi.org/10.1186/1741-7007-6-45
        • Su B.
        • Xiao J.
        • Underhill P.
        • et al.
        Y-Chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age.
        Am. J. Hum. Genet. 1999; 65: 1718-1724 https://doi.org/10.1086/302680
        • Byrska-Bishop M.
        • Evani U.S.
        • Zhao X.
        • et al.
        High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
        bioRxiv. 2021; 2021.02.06.430068 https://doi.org/10.1101/2021.02.06.430068
        • Bolger A.M.
        • Lohse M.
        • Usadel B.
        Trimmomatic: a flexible trimmer for Illumina sequence data.
        Bioinformatics. 2014; 30: 2114-2120 https://doi.org/10.1093/bioinformatics/btu170
        • Masella A.P.
        • Bartram A.K.
        • Truszkowski J.M.
        • Brown D.G.
        • Neufeld J.D.
        PANDAseq: paired-end assembler for illumina sequences.
        BMC Bioinforma. 2012; 13: 31 https://doi.org/10.1186/1471-2105-13-31
        • Li H.
        • Durbin R.
        Fast and accurate short read alignment with Burrows-Wheeler transform.
        Bioinformatics. 2009; 25: 1754-1760 https://doi.org/10.1093/bioinformatics/btp324
        • McKenna A.
        • Hanna M.
        • Banks E.
        • et al.
        The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
        Genome Res. 2010; 20: 1297-1303 https://doi.org/10.1101/gr.107524.110
        • Van der Auwera G.A.
        • Carneiro M.O.
        • Hartl C.
        • et al.
        From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline.
        Curr. Protoc. Bioinforma. 2013; 43: 11.10.1-11.10.33 https://doi.org/10.1002/0471250953.bi1110s43
        • Woerner A.E.
        • King J.L.
        • Budowle B.
        Fast STR allele identification with STRait Razor 3.0.
        Forensic Sci. Int. Genet. 2017; 30: 18-23 https://doi.org/10.1016/j.fsigen.2017.05.008
        • Li H.
        • Handsaker B.
        • Wysoker A.
        • et al.
        The sequence alignment/map format and SAMtools.
        Bioinformatics. 2009; 25: 2078-2079 https://doi.org/10.1093/bioinformatics/btp352
        • Thorvaldsdottir H.
        • Robinson J.T.
        • Mesirov J.P.
        Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.
        Brief. Bioinforma. 2013; 14: 178-192 https://doi.org/10.1093/bib/bbs017
        • Fan H.
        • Wang X.
        • Chen H.
        • et al.
        The evaluation of forensic characteristics and the phylogenetic analysis of the Ong Be language-speaking population based on Y-STR.
        Forensic Sci. Int. Genet. 2018; 37: e6-e11 https://doi.org/10.1016/j.fsigen.2018.09.008
        • Ding J.
        • Fan H.
        • Zhou Y.
        • et al.
        Genetic polymorphisms and phylogenetic analyses of the Ü-Tsang Tibetan from Lhasa based on 30 slowly and moderately mutated Y-STR loci.
        Forensic Sci. Res. 2020; 1-8 https://doi.org/10.1080/20961790.2020.1810882
        • Fan H.
        • Wang X.
        • Chen H.
        • Li W.
        • Wang W.
        • Deng J.
        The Ong Be language-speaking population in Hainan Island: genetic diversity, phylogenetic characteristics and reflections on ethnicity.
        Mol. Biol. Rep. 2019; 46: 4095-4103 https://doi.org/10.1007/s11033-019-04859-8
        • Li W.
        • Wang X.
        • Wang X.
        • et al.
        Forensic characteristics and phylogenetic analyses of one branch of Tai-Kadai language-speaking Hainan Hlai (Ha Hlai) via 23 autosomal STRs included in the Huaxia™ Platinum System.
        Mol. Genet. Genom. Med. 2020; 8: e1462 https://doi.org/10.1002/mgg3.1462
        • Fan H.
        • Wang X.
        • Chen H.
        • et al.
        Population analysis of 27 Y-chromosomal STRs in the Li ethnic minority from Hainan province, southernmost China.
        Forensic Sci. Int. Genet. 2018; 34: e20-e22 https://doi.org/10.1016/j.fsigen.2018.01.007
        • Fan H.
        • Wang X.
        • Ren Z.
        • et al.
        Population data of 19 autosomal STR loci in the Li population from Hainan Province in southernmost China.
        Int. J. Leg. Med. 2019; 133: 429-431 https://doi.org/10.1007/s00414-018-1828-2
        • Fan H.
        • Xie Q.
        • Zhang Z.
        • Wang J.
        • Chen X.
        • Qiu P.
        Chronological age prediction: developmental evaluation of DNA methylation-based machine learning models.
        Front. Bioeng. Biotechnol. 2021; 9: 819991 https://doi.org/10.3389/fbioe.2021.819991
        • Wang F.
        • Du Z.
        • Han B.
        • et al.
        Genetic diversity, forensic characteristics and phylogenetic analysis of the Qiongzhong aborigines residing in the tropical rainforests of Hainan Island via 19 autosomal STRs.
        Ann. Hum. Biol. 2021; 48: 335-342 https://doi.org/10.1080/03014460.2021.1951352
        • Fan H.
        • Xie Q.
        • Li Y.
        • Wang L.
        • Wen S.Q.
        • Qiu P.
        Insights into forensic features and genetic structures of Guangdong Maoming Han based on 27 Y-STRs.
        Front. Genet. 2021; 12: 690504 https://doi.org/10.3389/fgene.2021.690504
        • Fan H.
        • Zhang X.
        • Wang X.
        • et al.
        Genetic analysis of 27 Y-STR loci in Han population from Hainan province, southernmost China.
        Forensic Sci. Int. Genet. 2018; 33: e9-e10 https://doi.org/10.1016/j.fsigen.2017.12.009
        • Luo C.
        • Duan L.
        • Li Y.
        • et al.
        Insights from Y-STRs: forensic characteristics, genetic affinities, and linguistic classifications of Guangdong Hakka and She groups.
        Front. Genet. 2021; 12: 676917 https://doi.org/10.3389/fgene.2021.676917
        • Fan H.
        • He Y.
        • Li S.
        • et al.
        Systematic evaluation of a novel 6-dye direct and multiplex PCR-CE-based indel typing system for forensic purposes.
        Front. Genet. 2021; 12: 744645 https://doi.org/10.3389/fgene.2021.744645
        • Kidd K.K.
        • Speed W.C.
        Criteria for selecting microhaplotypes: mixture detection and deconvolution.
        Invest. Genet. 2015; 6: 1 https://doi.org/10.1186/s13323-014-0018-3
        • Rosenberg N.A.
        • Li L.M.
        • Ward R.
        • Pritchard J.K.
        Informativeness of genetic markers for inference of ancestry.
        Am. J. Hum. Genet. 2003; 73: 1402-1422 https://doi.org/10.1086/380416
        • 1000 Genomes Project Consortium
        • Auton A.
        • Brooks L.D.
        • et al.
        A global reference for human genetic variation.
        Nature. 2015; 526: 68-74 https://doi.org/10.1038/nature15393
        • Sudmant P.H.
        • Rausch T.
        • Gardner E.J.
        • et al.
        An integrated map of structural variation in 2,504 human genomes.
        Nature. 2015; 526: 75-81 https://doi.org/10.1038/nature15394
        • de Ridder D.
        • de Ridder J.
        • Reinders M.J.
        Pattern recognition in bioinformatics.
        Brief. Bioinforma. 2013; 14: 633-647 https://doi.org/10.1093/bib/bbt020
        • Kidd K.K.
        • Pakstis A.J.
        • Speed W.C.
        • Lagace R.
        • Wootton S.
        • Chang J.
        Selecting microhaplotypes optimized for different purposes.
        Electrophoresis. 2018; 39: 2815-2823 https://doi.org/10.1002/elps.201800092
        • Wen D.
        • Sun S.
        • Liu Y.
        • et al.
        Considering the flanking region variants of nonbinary SNP and phenotype-informative SNP to constitute 30 microhaplotype loci for increasing the discriminative ability of forensic applications.
        Electrophoresis. 2021; 42: 1115-1126 https://doi.org/10.1002/elps.202000341
        • Li R.
        • Zhang C.
        • Li H.
        • et al.
        SNP typing using the HID-Ion AmpliSeq Identity Panel in a southern Chinese population.
        Int. J. Leg. Med. 2018; 132: 997-1006 https://doi.org/10.1007/s00414-017-1706-3
        • Hanson E.K.
        • Ballantyne J.
        A highly discriminating 21 locus Y-STR “megaplex” system designed to augment the minimal haplotype loci for forensic casework.
        J. Forensic Sci. 2004; 49: 40-51
        • Gill P.
        • Benschop C.
        • Buckleton J.
        • Bleka O.
        • Taylor D.
        A Review of Probabilistic Genotyping Systems: EuroForMix, DNAStatistX and STRmix.
        Genes (Basel). 2021; 12: 1559 https://doi.org/10.3390/genes12101559