Corresponding author at: Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China.
Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, Guangdong, China
Institute of Archaeological Science, Fudan University, Shanghai 200433, China
School of Basic Medicine and Life Science, Hainan Medical University, Haikou 571199, Hainan, China
Department of Infertility and Sexual Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, Guangdong, China
School of Basic Medicine, Gannan Medical University, Ganzhou 341000, Jiangxi, China
• A Microhaplotype and Y-SNP/STR (MY) system comprising 114 Y-SNPs, 45 Y-STRs, and 22 microhaplotypes was developed based on multiplex PCR and 150-bp paired-end sequencing.
• Twenty-six two-person genotype combinations were integrated into nine genotype patterns.
• MY system-based genotype pattern recognition, a regression-based method that identifies the genotype pattern at each MH locus, is proposed for two-person DNA mixture deconvolution (application range: 1:10–1:2).
Abstract
Background
Y-chromosomal haplotypes based on Y-short tandem repeats (STRs) and Y-single nucleotide polymorphisms/insertion and deletion polymorphisms (SNPs/InDels) are used to characterize the paternal lineages of unknown male trace donors. However, Y-chromosomal genetic markers are not currently sufficient for precise individual identification. Microhaplotypes (MHs), which are autosomal, generally < 200 bp long, and composed of two or more SNPs, were recently introduced into forensic genetics with the development of massively parallel sequencing technology and may facilitate individual identification and DNA mixture deconvolution. Therefore, combining the two kinds of genetic markers may be beneficial in many forensic scenarios, especially crime scenes with male suspects, such as sexual assault cases.
Methods
In the present study, we developed a novel MPS-based panel, Microhaplotype and Y-SNP/STR (MY), using multiplex PCR and 150-bp paired-end sequencing. The panel includes 114 Y-SNPs (covering twelve dominant Y-DNA haplogroups), 45 Y-STRs (N-1 stutter < 0.09; estimated mutation rate < 5 × 10⁻³), and 22 MHs (allele coverage ratio > 0.91; pairwise distance > 10 Mb). Additionally, we propose MY system-based genotype pattern recognition (GPR), a regression-based method that identifies the genotype pattern at each MH locus, for two-person DNA mixture deconvolution. We integrated 26 two-person genotype combinations into nine genotype patterns and validated the application range of GPR using the DNA profiles of ten sets of simulated male-male DNA mixtures (1:10–1:2).
Results
The effective number of alleles (Ae) ranged from 3.62 to 14.72, with an average of 7.17, in 100 Chinese Guangdong Han individuals. The cumulative discrimination power was 1 − 5.00 × 10⁻³¹, and the cumulative power of exclusion was 1 − 5.00 × 10⁻⁸ and 1 − 4.85 × 10⁻¹² for duo and trio paternity testing, respectively. Furthermore, regression relationships between the actual mixing ratio (AMR) and the depth of coverage (DoC) ratio (RDoC) were established for different genetic markers and genotype patterns. In five overlapping areas, likelihood ratio methods were required to differentiate the genotypes of the major and minor contributors. In nonoverlapping areas, the genotype pattern could be recognized by comparing the observed RDoC with the RDoC ranges.
Conclusion
The GPR can be used to deconvolute two-person DNA mixtures (application range: 1:10–1:2) for individual identification.
In crime scene investigation, Y-chromosomal haplotypes composed of Y-chromosomal short tandem repeats (Y-STRs) and Y-DNA haplogroups can exclude male suspects from involvement in a crime, identify the paternal lineage of male perpetrators, reveal multiple male contributors to a trace, and provide investigative leads for finding unknown male perpetrators. In principle, with sufficient numbers of rapidly mutating (RM) Y-STR markers, closely and especially distantly related men can be separated by means of observed mutations […].
Biological samples collected from bodily fluids (e.g., blood, saliva, vaginal secretions, and semen) in criminal cases (e.g., sexual and physical assault, sodomy, and murder) can yield mixed-DNA profiles of two or more same- or opposite-sex donors […]. Although STRs are useful for addressing forensic DNA-oriented questions, the stutters generated by PCR replication slippage are the most serious limitation for DNA mixture deconvolution, because the stutter background can mask the alleles of the minor donor at higher mixture ratios […]. Mixed-DNA profiles generated by capillary electrophoresis (CE), which are conventionally used for mixture interpretation, can be influenced by various factors, including but not limited to allele drop-in/drop-out, pull-up between dye colours, shared alleles, and split peaks caused by incomplete adenylation […]. In addition, the number of STR markers is generally limited to 40 with the CE method. Hence, the traditional STR-CE method used in forensic genetics may not be the optimal solution for DNA mixture deconvolution.
With the development of massively parallel sequencing (MPS) technology […], emerging microhaplotypes (i.e., microhaps or MHs, generally < 200 bp and consisting of two or more closely linked single nucleotide polymorphisms (SNPs) with three or more allelic combinations […]) exhibit promising characteristics for further enhancing the deconvolution of DNA mixtures. MHs have several advantages over standard STRs: (1) they are multiallelic and lack STR structures, which prevents Taq polymerase slippage and stutter peak generation; (2) their PCR amplification is balanced because MH amplicons have similar lengths (whereas STRs commonly show preferential amplification of shorter alleles); and (3) their mutation rates are lower than those of STRs. Since Kidd proposed this powerful new type of genetic marker in 2013 […], an increasing number of studies have focused on the following aspects: (1) finding and evaluating more MHs with higher effective numbers of alleles (Ae) to yield low random match probabilities (RMPs) and better deconvolution performance […]. To date, however, these markers have remained complements to conventional STR analysis in mixture profiling. This is due to the disadvantages of MHs: (1) they have fewer alleles than most STRs, so more MHs are required to reach efficiencies comparable to those of STRs; (2) the population data necessary for forensic applications are lacking; and (3) appropriate workflows and pipelines for sequencing, data assembly, and analysis are absent within the global forensic DNA community. Even so, compared with the STR-CE approach, MHs genotyped by MPS have enormous potential for DNA mixture deconvolution […].
Thus, given the complementary advantages and disadvantages of Y-SNPs/STRs and MHs, using these genetic markers together could be helpful for crime scene investigations, especially those with male suspects. In the present study, we developed a novel MY system comprising 114 Y-SNPs, 45 Y-STRs, and 22 MHs, based on multiplex PCR and 150-bp paired-end sequencing technologies […]. The MY system contains two main components: Y-chromosomal markers, which can provide additional investigative information (e.g., Y haplotype-based familial searching and paternal-lineage/kinship determination), and MH loci, which can be applied to deconvolute DNA mixtures for individual identification. In addition, we propose genotype pattern recognition (GPR) of two-person DNA mixtures based on the MY system. A total of 26 different genotype combinations in two-person DNA mixtures were integrated into nine distinct genotype patterns, with a deduced application range of 1:10.11–1:2.10. To validate the application range of GPR, we simulated ten sets of male-male DNA mixtures with actual mixing ratios (AMRs) ranging from 1:10 to 1:2. Regression relationships between the AMR and the depth of coverage (DoC) ratio (RDoC) were established for different genetic markers and genotype patterns, and the major and minor genotypes were inferred by GPR and likelihood ratio (LR)-based methods. Hence, GPR could be used to deconvolute two-person DNA mixtures (application range: 1:10–1:2) for individual identification.
2. Material and methods
2.1 MY system
2.1.1 Y-SNPs
According to the International Society of Genetic Genealogy (ISOGG), the 114 selected Y-SNPs of the MY system mainly cover twelve dominant and representative Y haplogroups (C, D, E, F, G, I, J, K, Q, R, N, and O). Because the C, D, N, and O haplogroups account for more than 93% of the Y-chromosomal genetic make-up of East Asian (EAS) populations […], we selected 99 common and representative Y-SNPs to refine these four major haplogroups, especially haplogroup O (71 Y-SNPs refine its downstream subhaplogroups). Detailed Y-SNP information is provided in Supplementary Table S1 and Fig. 1.
Fig. 1. Detailed information on the 181 genetic markers (114 Y-SNPs, 45 Y-STRs, and 22 MHs) in the MY system. (*, global average Ae value of 2504 individuals from 26 global populations; Ae, effective number of alleles; MH, microhaplotype. Detailed information on the MY genetic markers is provided in Supplementary Tables S1–S4.)
2.1.2 Y-STRs
… and stutter ratio (N-1 stutter < 0.09; see 2.6.2 for details) were taken into consideration for design and optimization. Eventually, a total of 45 single-copy Y-STRs, consisting of 23 slowly mutating (SM) Y-STRs (μ < 1 × 10⁻³), 19 moderately mutating (MM) Y-STRs (μ = 1 × 10⁻³ to 5 × 10⁻³), 2 fast-mutating (FM) Y-STRs (μ = 5 × 10⁻³ to 1 × 10⁻²), and one RM Y-STR (μ > 1 × 10⁻²), were included in the MY system: DYS388, DYS391, DYS392, DYS393, DYS434, DYS435, DYS439, DYS450, DYS453, DYS454, DYS455, DYS460, DYS462, DYS472, DYS476, DYS485, DYS492, DYS502, DYS508, DYS511, DYS512, DYS513, DYS530, DYS531, DYS533, DYS538, DYS541, DYS549, DYS556, DYS565, DYS568, DYS570, DYS571, DYS572, DYS573, DYS576, DYS578, DYS585, DYS588, DYS590, DYS613, DYS616, DYS638, DYS640, and DYS641. The detailed locations, estimated mutation rates, and repeat motifs of the Y-STRs are provided in Supplementary Table S2 and Fig. 1.
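As an illustration of the mutation-rate classes listed above, the hypothetical helper below maps an estimated mutation rate μ to its SM/MM/FM/RM label. How a value falling exactly on a class boundary is assigned is an assumption, because the text gives only the ranges.

```python
def mutation_class(mu: float) -> str:
    """Return the Y-STR mutation-rate class for an estimated rate mu.

    Hypothetical helper; boundary handling (values exactly at 1e-3, 5e-3,
    or 1e-2) is an assumption, not taken from the study.
    """
    if mu < 1e-3:
        return "SM"  # slowly mutating
    if mu < 5e-3:
        return "MM"  # moderately mutating
    if mu <= 1e-2:
        return "FM"  # fast-mutating
    return "RM"      # rapidly mutating (mu > 1e-2)


# The class sizes quoted above add up to the 45 Y-STRs in the panel.
CLASS_SIZES = {"SM": 23, "MM": 19, "FM": 2, "RM": 1}
print(sum(CLASS_SIZES.values()))  # 45
```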
2.1.3 MHs
Initially, 456 targeted fragments (Ae > 5 in EASs) were chosen from a genome-wide pool of 10966 fragments (nucleotide diversity (π) > 0.01) identified in the high-coverage (30×) 1000 Genomes Project data aligned to GRCh38 [Byrska-Bishop M., Evani U.S., Zhao X., et al. (2021) High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. bioRxiv 2021.02.06.430068. doi: 10.1101/2021.02.06.430068]. From these 456 fragments, we then compiled our MH set (135 MHs, Supplementary Table S3), whose members had pairwise distances (d) > 10 Mb and were suitable for 150-bp paired-end sequencing. Additionally, we applied relatively rigorous criteria for MH selection, i.e., an allele coverage ratio (ACR) > 0.91 and informativeness for ancestry inference (In) > 0.185. Finally, after multiple rounds of optimization (more than 10 rounds of adjustments to the panel design, amplification primers, multiplex PCR amplification system, and quality control), a total of 22 MHs with a global average Ae of 8.32 (Fig. 1) showed balanced amplification and were selected for the MY system. The composite SNPs making up each MH are presented in Supplementary Table S4.
The thresholds (N-1 stutter < 0.09 and ACR > 0.91) were determined by considering the following factors, among others (see 2.6 for additional details): for DNA mixtures, the Y-STR analysis threshold should be higher than the N-1 stutter ratio to suppress noise interference (i.e., the N-1 stutter ratio should be lower than the mixing ratio); the ACR of MHs must be balanced against the application range of GPR (e.g., for ACR > 0.91, the application range is 1:10.11–1:2.10, whereas for ACR > 0.90, it is 1:9.00–1:2.11); enough MHs must be retained to achieve sufficient system effectiveness for individual identification; and the application ranges of both the Y-chromosomal genetic markers and the MHs must be considered. When the ACR is more than 0.91, the deduced application range of GPR in two-person DNA mixtures is 1:10.11–1:2.10 (Supplementary Figs. S1–S3). To take full advantage of the Y-chromosomal information in two-person DNA mixtures with male contributors, the N-1 stutter ratio of each Y-STR should be less than 0.09 (Supplementary Fig. S4).
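Both worked examples in the paragraph above (ACR > 0.91 giving 1:10.11–1:2.10; ACR > 0.90 giving 1:9.00–1:2.11) are reproduced by the bounds a/(1 − a) and 1 + 1/a, where a is the ACR threshold; the related value 1/0.91 ≈ 1.10 also matches the MHallele=4 bound given in 2.6.1. The sketch below only checks this arithmetic. The closed-form bounds are an inference from the quoted numbers, not the derivation given in Supplementary Figs. S1–S3, and the function name is hypothetical.

```python
def gpr_range_from_acr(acr: float) -> tuple[float, float]:
    """Return (x_low, x_high) such that the range is 1:x_low to 1:x_high.

    Hypothetical reconstruction: x_low = acr / (1 - acr) and
    x_high = 1 + 1 / acr. These reproduce the two examples in the text
    but are not the authors' published derivation.
    """
    if not 0.0 < acr < 1.0:
        raise ValueError("ACR must lie strictly between 0 and 1")
    return acr / (1.0 - acr), 1.0 + 1.0 / acr


for a in (0.91, 0.90):
    low, high = gpr_range_from_acr(a)
    print(f"ACR > {a:.2f}: 1:{low:.2f} to 1:{high:.2f}")
# ACR > 0.91: 1:10.11 to 1:2.10
# ACR > 0.90: 1:9.00 to 1:2.11
```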
2.2 Sample collection and mixture simulation
2.2.1 Human DNA samples
One hundred unrelated healthy Han Chinese volunteers (80 males and 20 females) were recruited from Guangdong Province of China (GDH), and their peripheral blood samples were collected in EDTA anticoagulant tubes (2 ml). Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. The quantity of the DNA template was determined using a Qubit® 4.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) with a Qubit® dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions.
All peripheral blood samples were collected with written consent from the donors who gave their permission for the analysis of their DNA and the dissemination of their results via a scientific publication. This study was approved by the Biomedical Ethical Committee of Southern Medical University (No. 2021–007) and in accordance with the standards of the Declaration of Helsinki.
2.2.2 Artificial DNA mixtures
(1) DNA quantification
For each selected male DNA sample, two skilled laboratory assistants quantified the DNA concentration 15 times. Then, after removing the maximum and minimum values, we took the average value as the final quantity.
(2) Two-person DNA mixtures
To validate the deduced application range for GPR (1:10.11–1:2.10, details in 2.6.1), we randomly selected ten different male DNA samples for artificial DNA mixture simulations and prepared 10 sets of male-male DNA mixtures at the same nine ratios: 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, and 1:10 (detailed simulations in Supplementary Table S5).
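The quantification in step (1) amounts to a trimmed mean: of the 15 readings per sample, the single highest and the single lowest are discarded and the remaining readings are averaged. A minimal sketch, with a hypothetical function name:

```python
from statistics import mean


def final_quantity(readings: list[float]) -> float:
    """Drop one maximum and one minimum reading, then average the rest."""
    if len(readings) < 3:
        raise ValueError("need at least three readings to trim both ends")
    ordered = sorted(readings)
    return mean(ordered[1:-1])  # exclude the first (min) and last (max) values


print(final_quantity([1, 2, 3, 4, 100]))  # 3
```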
2.3 Library preparation and sequencing
We used 1 ng of DNA or mixed DNA sample as the template, with the following thermal cycling conditions to amplify targeted loci: 98 °C for 3 min; followed by 18 cycles of 98 °C for 30 s and 60 °C for 4 min; a final extension at 72 °C for 2 min; and an infinite hold at 10 °C. PCR products were purified using Agencourt® AMPure® XP Beads (Beckman Coulter, Brea, CA, USA). Each targeted amplicon was barcoded and amplified using the following PCR cycling conditions: an initial incubation at 98 °C for 1 min; followed by 6 cycles of 98 °C for 20 s, 60 °C for 20 s, and 72 °C for 30 s; a final extension at 72 °C for 2 min; and an infinite hold at 10 °C. Libraries were purified using AMPure® XP Beads and quantified by a Qubit® dsDNA HS Assay Kit according to the manufacturer’s instructions for final normalization to produce equal volumes before sequencing. Sequencing was performed using the Illumina® NovaSeq™ 6000 System (Illumina, San Diego, CA, USA) with a 2 × 150-bp strategy according to the manufacturer’s recommendation.
2.4 MPS data processing
Illumina bcl2fastq v.2.17 software (Illumina, San Diego, CA, USA) was used to obtain demultiplexed FASTQ data, trim adaptors, and calculate sample coverage and “%Q30 Bases” (the percentage of bases with a Q-score ≥ 30) to evaluate sample quality. The FASTQ data were further filtered by Trimmomatic v.0.4 […]. … […] was used to assemble sequences and scale them to billions of paired-end reads.
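The “%Q30 Bases” metric can be recomputed from FASTQ quality strings. The sketch below assumes Phred+33 encoding (the convention of current Illumina FASTQ output); it illustrates the definition and is not bcl2fastq's own implementation.

```python
def percent_q30(quality_strings, offset: int = 33) -> float:
    """Percentage of base calls with Phred quality >= 30.

    quality_strings: iterable of FASTQ quality lines (assumed Phred+33).
    """
    total = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:  # decode the ASCII character to a Q-score
                q30 += 1
    return 100.0 * q30 / total if total else 0.0


print(percent_q30(["??55"]))  # '?' is Q30, '5' is Q20, so 50.0
```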
We detected Y-chromosomal SNPs and insertions/deletions (InDels) by mapping the reads to the human reference genome GRCh38 with the Burrows–Wheeler Aligner (BWA) […]. … […] with adjusted configuration files. Quality control and interpretation of genotyping results were performed according to the addendum to “SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories”. Floating and fixed analytical thresholds (ATs) and interpretation thresholds (ITs) were used in combination: an AT of 1.5% and an IT of 4.5% when the depth exceeded 650 reads, and a fixed AT of 10× and IT of 30× when the depth was below 650 reads.
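The combined floating/fixed threshold rule can be written as a small function. This is a sketch under stated assumptions: “depth” is taken to be the total read depth of the locus, the percentages are applied to that depth, a depth of exactly 650 reads (not covered by the text) is handled with the fixed thresholds, and the function names and category labels are hypothetical.

```python
def thresholds(locus_depth: int) -> tuple[float, float]:
    """Return (AT, IT) in reads for a locus with the given total depth."""
    if locus_depth > 650:
        # Floating thresholds: 1.5% (AT) and 4.5% (IT) of the locus depth.
        return 0.015 * locus_depth, 0.045 * locus_depth
    # Fixed thresholds: 10 reads (AT) and 30 reads (IT).
    # Assumption: a depth of exactly 650 also uses the fixed thresholds.
    return 10.0, 30.0


def classify_allele(allele_depth: int, locus_depth: int) -> str:
    """Label an allele read count relative to the AT and IT."""
    at, it = thresholds(locus_depth)
    if allele_depth < at:
        return "below AT"          # treated as noise
    if allele_depth < it:
        return "between AT and IT"  # detected, but not used for interpretation
    return "interpretable"


print(classify_allele(50, 1000))  # AT = 15, IT = 45 reads, so "interpretable"
```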
To compile phased MHs, we constructed a partial reference genome comprising each MH amplicon. The raw reads in FASTQ format were aligned to this reference using BWA, and the alignments were further processed with SAMtools […]. Then, we applied an R package (microhaplot, https://github.com/ngthomas/microhaplot) to obtain the allele strings and depth for each MH. In addition, the MPS data of the 22 MH loci were inspected in Integrative Genomics Viewer (IGV) v2.9.2 […] to check MH calling concordance. We randomly selected two samples and prepared each in triplicate to assess “within-run” and “between-run” concordance (two replicates in the same run and one in another run) […].
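In this workflow, microhaplot performs the actual MH calling. To make “allele string and depth” concrete, the simplified sketch below (not microhaplot's algorithm or API; the function name is hypothetical) reads the base at each composite SNP position from gapless alignments to the amplicon reference, keeps only reads that span every SNP so that phase is observed directly, and counts each resulting allele string.

```python
from collections import Counter


def call_microhaplotypes(reads, snp_positions):
    """Count phased allele strings across aligned reads.

    reads: iterable of (start, sequence) pairs; start is the 0-based
        reference position of the first base (gapless alignment assumed).
    snp_positions: 0-based reference positions of the composite SNPs.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if any(not (start <= p < end) for p in snp_positions):
            continue  # read does not span every SNP, so its phase is unknown
        allele = "".join(seq[p - start] for p in snp_positions)
        if "N" not in allele:  # skip reads with an uncalled base at a SNP
            counts[allele] += 1
    return counts


reads = [(0, "ACGTACGT"), (0, "ACGAACGT"), (2, "GTAC")]
print(dict(call_microhaplotypes(reads, [1, 3])))  # {'CT': 1, 'CA': 1}
```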
The mean values and standard deviations (SDs) of the DoC, the N-1 stutter ratio for Y-STRs, and the ACR values for MHs were calculated with SAS® 9.4 software (SAS Institute Inc., Cary, NC, USA). The other relevant forensic parameters were determined as described in our previous studies […]. The effective number of alleles was calculated as $A_e = 1/\sum_{i} P_i^{2}$, where $P_i$ is the frequency of allele $i$ and the summation is performed over all alleles at the locus. The Spearman correlation between the composite SNP number and the Ae value was computed in the R statistical environment. The In value was calculated using the program INFOCALC […].
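For reference, Ae as defined above, together with the expected heterozygosity (He) and Botstein's polymorphism information content (PIC), can be computed from allele frequencies as sketched below. The He function optionally applies Nei's small-sample factor n/(n − 1), with n the number of sampled chromosomes. That this is the estimator behind Table 1 is an assumption; with n = 200 it maps Ae = 14.72 to He ≈ 0.9367, the value reported for MH02FHL-006.

```python
def effective_alleles(freqs):
    """Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs, n_alleles=None):
    """He = 1 - sum(p_i^2), optionally scaled by n / (n - 1) (Nei)."""
    h = 1.0 - sum(p * p for p in freqs)
    if n_alleles is not None:
        h *= n_alleles / (n_alleles - 1)
    return h


def pic(freqs):
    """Botstein PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    freqs = list(freqs)
    s2 = sum(p * p for p in freqs)
    pairs = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - s2 - pairs


print(effective_alleles([0.5, 0.5]), pic([0.5, 0.5]))  # 2.0 0.375
```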
2.6 Genotype pattern recognition in two-person DNA mixtures
Pattern recognition involves developing systems that learn to solve a given problem (including clustering, classification, and dimensionality reduction) from a set of example instances, each represented by a number of features […]. Here, drawing on the concept of pattern recognition, we semi-automatically classified the genotype combinations of two-person DNA mixtures into different genotype patterns by theoretical derivation, according to the distribution of sequencing depth at given mixing ratios. Based on these fixed genotype patterns, we propose regression-based GPR of two-person DNA mixtures to determine the corresponding genotype pattern for each MH locus.
2.6.1 MHs: From genotype combination to genotype pattern
There are three different types of two-person DNA mixtures (female-female, female-male, and male-male DNA mixtures). In theory, for any type of two-person DNA mixture, each MH locus could have four alleles (MHallele=4), three alleles (MHallele=3), two alleles (MHallele=2), or one allele (MHallele=1). We used 9 different genotype patterns to represent the 26 different genotype combinations of any two-person DNA mixture, and the genotype combination of each genotype pattern was unique within a certain application range.
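The total of 26 genotype combinations can be checked by enumeration: for k observed alleles, list every ordered (major, minor) pair of genotypes whose alleles together are exactly those k alleles. The sketch below (hypothetical helper name) reproduces the 6, 12, 7, and 1 combinations listed in (1)–(4).

```python
from itertools import combinations_with_replacement


def genotype_combinations(alleles):
    """Ordered (major, minor) genotype pairs that show exactly `alleles`."""
    genotypes = ["".join(g) for g in combinations_with_replacement(sorted(alleles), 2)]
    return [
        (g1, g2)
        for g1 in genotypes
        for g2 in genotypes
        if set(g1) | set(g2) == set(alleles)  # union must equal the observed alleles
    ]


counts = {k: len(genotype_combinations("ABCD"[:k])) for k in (4, 3, 2, 1)}
print(counts, sum(counts.values()))  # {4: 6, 3: 12, 2: 7, 1: 1} 26
```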
(1) MHallele=4: When four alleles are observed at a MH locus, six genotype combinations are possible for the two contributors, namely, (AB,CD), (AC,BD), (AD,BC), (BC,AD), (BD,AC), and (CD,AB), where A, B, C and D are the four observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into the pattern MHallele=4 (major contributor’s genotype + minor contributor’s genotype, DoCA > DoCB > DoCC > DoCD). Supplementary Fig. S1 presents the detailed derivation process of the AMR range when the ACR is more than 0.91. The deduced application range is (0, 1:1.10) for the pattern MHallele=4.
Fig. 2. Integrating different genotype combinations of the microhaplotype loci into the same genotype pattern and the deduced application range for each genotype pattern. (The first genotype combination is shown as a representative pattern; detailed processes of deducing the application ranges for seven different genotype patterns are shown in Supplementary Figs. S1–S3.)
(2) MHallele=3: When three alleles are observed at a MH locus, twelve genotype combinations are possible for the two contributors, namely, (AB,AC), (BC,AC), (AB,BC), (AC,AB), (AC,BC), (BC,AB), (AB,CC), (AC,BB), (BC,AA), (AA,BC), (BB,AC), and (CC,AB), where A, B and C are the three observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into three different genotype patterns (α, β, and γ, DoCA > DoCB > DoCC). The deduced application ranges of patterns α, β, and γ are (1:10.11, 1:1.10), (0, 1:2.10), and (0, 1:1], respectively (details in Supplementary Fig. S2).
(3) MHallele=2: When two alleles are observed at a MH locus, seven genotype combinations are possible for the two contributors, namely, (AB,AA), (AB,BB), (AA,AB), (BB,AB), (AA,BB), (BB,AA), and (AB,AB), where A and B are the two observed alleles (Fig. 2). In addition, according to descending order of the DoC values, we integrated these genotype combinations into four different genotype patterns (δ, ε, ζ, and η, DoCA > DoCB). The deduced application ranges of patterns δ, ε, ζ, and η are (1:21.23, 1:1], (0, 1:1], (0, 1:1), and (0, 1:1], respectively (details in Supplementary Fig. S3).
(4) MHallele=1: When one allele is observed at a MH locus, only one genotype combination is possible for the two contributors, (AA,AA), where A is the only observed allele (Fig. 2). Therefore, the genotype pattern is MHallele=1. The deduced application range of pattern MHallele=1 is (0, 1:1].
For all different types of two-person DNA mixtures, the genotype patterns of MHallele=1 and MHallele=4 do not need to be recognized. There are three and four different genotype patterns for MHallele=3 and MHallele=2, respectively, which need to be recognized. Once the genotype pattern is recognized, the genotypes of the two contributors are definitive. When the ACR for each MH is more than 0.91, the deduced application range for GPR is (1:10.11, 1:2.10).
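The overall GPR range is the set of mixing ratios lying inside every pattern range that must be recognized. The sketch below intersects the ranges of α, β, γ, δ, ε, ζ, and η as transcribed from (2) and (3), with γ taken as (0, 1:1] and the open/closed status of each endpoint ignored; the names are hypothetical. It recovers the stated range (1:10.11, 1:2.10).

```python
# (low, high) bounds on r = minor/major for each pattern that needs recognition.
PATTERN_RANGES = {
    "alpha": (1 / 10.11, 1 / 1.10),
    "beta": (0.0, 1 / 2.10),
    "gamma": (0.0, 1.0),
    "delta": (1 / 21.23, 1.0),
    "epsilon": (0.0, 1.0),
    "zeta": (0.0, 1.0),
    "eta": (0.0, 1.0),
}


def common_range(ranges):
    """Intersect the pattern ranges; endpoint openness is ignored."""
    low = max(lo for lo, _ in ranges.values())
    high = min(hi for _, hi in ranges.values())
    if low >= high:
        raise ValueError("no mixing ratio lies inside every pattern range")
    return low, high


low, high = common_range(PATTERN_RANGES)
print(f"1:{1 / low:.2f} to 1:{1 / high:.2f}")  # 1:10.11 to 1:2.10
```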
2.6.2 Statistical analysis
We also defined the following terms:
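(1) AMR: actual mixing ratio, calculated from the two contributors’ DNA concentrations and volumes (Minor, minor contributor; Major, major contributor; C, concentration; V, volume):
$$\mathrm{AMR} = \frac{C_{\mathrm{Minor}} \times V_{\mathrm{Minor}}}{C_{\mathrm{Major}} \times V_{\mathrm{Major}}}$$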
(2) RDoC: DoC ratio, which is determined according to the following self-defined formulas:
In a male-male DNA mixture, we obtained the Y-DNA profiles of both the minor and major contributors, and the mixing ratio could be calculated from the DoC ratios of the minor and major contributors. If two different Y-DNA haplogroups (A and B) are detected, or two alleles (also named A and B for formula standardization) are detected at the same single-copy Y-STR locus, the sample can be confirmed as a male-male DNA mixture. We defined A and B as the major and minor Y-haplogroups or Y-STR alleles, respectively, with DoCA and DoCB representing their DoC (DoCA > DoCB).
(3) E(AMR): expected AMR, denoted by the 95% confidence interval (CI) of the AMR.
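The AMR and a single-locus Y-STR RDoC can be sketched as below. Both functions rest on assumptions: the AMR is taken as the ratio of the minor to the major contributor’s DNA amount (concentration × volume), and, because the self-defined RDoC formulas are not reproduced in this text, the Y-STR RDoC is taken as DoCB/DoCA, following the definitions of A and B in (2). Function names are hypothetical.

```python
def actual_mixing_ratio(c_minor, v_minor, c_major, v_major):
    """Assumed AMR = (C_minor * V_minor) / (C_major * V_major); r means 1:(1/r)."""
    return (c_minor * v_minor) / (c_major * v_major)


def rdoc_y(doc_major, doc_minor):
    """Assumed single-locus RDoC for a male-male mixture: DoC_B / DoC_A."""
    if doc_minor > doc_major:
        raise ValueError("allele A is defined as the one with the larger DoC")
    return doc_minor / doc_major


r = actual_mixing_ratio(1.0, 1.0, 2.0, 5.0)  # 1 ng minor vs 10 ng major
print(r, f"1:{1 / r:.0f}")                   # 0.1 1:10
```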
The regression analyses (linear, quadratic, and cubic models) and Spearman correlations between AMR and RDoC were conducted in IBM® SPSS® Statistics 26 (IBM Corporation, Armonk, NY, USA). In addition, normality tests and one-way analysis of variance (ANOVA) (with post hoc multiple comparisons) were also conducted in IBM® SPSS® Statistics 26.
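The linear, quadratic, and cubic regressions were run in SPSS; an equivalent least-squares fit with R² can be sketched with NumPy’s polyfit. Treating RDoC as the response and AMR as the predictor is an assumption, the function name is hypothetical, and the data in the check below are synthetic rather than study data.

```python
import numpy as np


def fit_polynomials(amr, rdoc, degrees=(1, 2, 3)):
    """Least-squares fits of RDoC on AMR; returns {degree: (coefficients, R^2)}."""
    x = np.asarray(amr, dtype=float)
    y = np.asarray(rdoc, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    results = {}
    for d in degrees:
        coeffs = np.polyfit(x, y, d)  # coefficients, highest power first
        residuals = y - np.polyval(coeffs, x)
        results[d] = (coeffs, 1.0 - np.sum(residuals ** 2) / ss_tot)
    return results


# Synthetic check: nine ratios from 1:10 to 1:2 with an exactly linear response.
x_demo = [1 / k for k in range(10, 1, -1)]
y_demo = [2 * v + 0.05 for v in x_demo]
print(fit_polynomials(x_demo, y_demo)[1])
```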
3. Results and discussion
3.1 MY system
3.1.1 Characterization and MPS performance
The MY system (Fig. 1) is composed of three different types of genetic markers: 114 Y-SNPs (Supplementary Table S1), 45 Y-STRs (Supplementary Table S2), and 22 MHs (Supplementary Tables S3–S4). Initially, we selected a set of 135 MHs that were highly polymorphic in EASs (Ae > 5 and d > 10 Mb) and suitable for 150-bp paired-end sequencing (Supplementary Table S3). The number of composite SNPs per MH varied from 7 (MH21FHL-002) to 85 (MH09FHL-008), and the Ae values ranged from 5.00 (MH14FHL-004) to 16.61 (MH21FHL-001), with an average Ae of 6.94 in EASs. Eventually, 22 MHs were incorporated into the MY system according to our selection criteria (details in 2.1.3). They were distributed on 15 different autosomes (Fig. 1), and their global average Ae was 8.32 (3.93–13.44). Among these 22 MH markers, we identified 6 novel SNPs (minor allele frequency, MAF ≥ 0.05) in 100 GDH individuals (Supplementary Table S4), which could provide additional polymorphic and population-specific information for forensic applications, especially for regional individual identification and for cases involving these variants.
There was a significant correlation between the composite SNP number and the Ae value (r = 0.3275, p = 0.0001) in our 135-MH set. In addition, we collected a total of 777 MHs from global populations […] and found that the correlation was also significant (r = 0.8368, p < 2.2 × 10⁻¹⁶) in this global MH dataset (Supplementary Fig. S5).
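Spearman’s correlation is Pearson’s correlation computed on ranks, with tied values sharing their average rank. The dependency-free sketch below (hypothetical function names) illustrates this; it returns only the coefficient, not the p-value, which in practice would come from R’s cor.test(method = "spearman") or scipy.stats.spearmanr.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```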
The average amplicon sizes of Y-SNPs, Y-STRs, and MHs were 229 ± 34 nt, 136 ± 5 nt, and 263 ± 15 nt, respectively. The average DoC ± SD values were 2381 ± 1786× (Y-SNPs, Supplementary Table S6), 1033 ± 1018× (Y-STRs, Supplementary Table S7), and 1653 ± 1076× (MHs, Supplementary Table S8). Among the Y-SNPs, the highest DoC was observed at F400 (8438 ± 4031×) and the lowest at F138 (706 ± 342×). The maximum and minimum DoC values of the Y-STRs were 2790 ± 1922× (DYS460) and 394 ± 170× (DYS450), respectively. Furthermore, the mean N-1 stutter ratios of all 45 Y-STRs were at most 0.0818 ± 0.0197 (DYS392) (Supplementary Table S9). The MH locus with the highest DoC was MH07FHL-004 (3568 ± 1471×), and that with the lowest DoC was MH17FHL-005 (540 ± 252×). In addition, the ACRs of the 22 MHs ranged from 0.9193 ± 0.0588 (MH17FHL-005) to 0.9647 ± 0.0266 (MH10FHL-007) (Supplementary Table S10).
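The two per-sample quality metrics above can be expressed as follows. The N-1 stutter ratio is the depth of the product one repeat shorter than the true allele divided by the depth of that allele. The ACR of a heterozygous MH is assumed here to be the lower allele depth divided by the higher one, which matches its use as a balance measure (ACR ≤ 1) but is not defined explicitly in the text. Function names are hypothetical.

```python
def n_minus_1_stutter_ratio(doc_stutter, doc_parent):
    """Reads at the n-1 repeat position divided by reads at the true allele n."""
    return doc_stutter / doc_parent


def allele_coverage_ratio(doc_allele_1, doc_allele_2):
    """Assumed ACR for a heterozygous MH: lower allele DoC / higher allele DoC."""
    low, high = sorted((doc_allele_1, doc_allele_2))
    return low / high


# Compare against the panel thresholds (N-1 stutter < 0.09, ACR > 0.91).
print(n_minus_1_stutter_ratio(82, 1000) < 0.09, allele_coverage_ratio(950, 1000) > 0.91)
```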
3.1.2 Forensic characteristics of GDH individuals
(1) Y-SNPs/STRs
Supplementary Table S11 presents a total of 29 different Y subhaplogroups and the frequency distribution of Y-DNA haplogroups in the 80 GDH individuals. We found four major Y-DNA haplogroups, namely, O (92.50%), N (3.75%), C (2.50%), and Q (1.25%). Haplogroups O2a, O1a, and O1b accounted for 60.00%, 18.75%, and 13.75% of the samples, respectively, consistent with previous results […]. In addition, the three most frequent O subhaplogroups were O1b1a1a1a1-F2924 (11.25%), O1a1a1-F446 (8.75%), and O2a2a-M188 (8.75%).
The Y-chromosomal haplotype profiles of the 80 GDH individuals are shown in Supplementary Table S12. A total of 77 different Y-chromosomal haplotypes were found, of which 74 (96.10%) were unique and 3 occurred twice (H001–H003). We found 4 null alleles at DYS531 (2 nulls), DYS508, and DYS588; those at DYS531 and DYS508 were further confirmed with the CE-based AGCU Y-LM Kit […]. No duplicated, triplicated, or intermediate alleles were identified in the GDH individuals. The detailed Y-STR repeat region sequences and allele frequencies (length-based and sequence-based) are shown in Supplementary Table S13. Among the 3600 Y-STR alleles generated by the MY system, 168 distinct length variants and 4 repeat sequence subvariants (i.e., isoalleles, defined as alleles with the same length but different sequences) were identified across all 45 Y-STR loci in the 80 GDH individuals. Sequence variations (at 8.89% of the Y-STRs in the MY system) were detected at DYS531 (allele 12), DYS485 (allele 15), DYS578 (allele 8), and DYS392 (allele 14), making the sequence-based genetic diversity (GD) higher than the length-based GD (by 0.20%–3.65%, Supplementary Table S14). The highest GD was found at the RM Y-STR DYS576 (0.7937). Three Y-STRs, namely, DYS613, DYS472, and DYS502, were monomorphic in the 80 GDH individuals. The overall haplotype diversity (HD) was 0.9991, with a discrimination capacity (DC) of 0.9625 (Supplementary Table S15).
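The reported HD and DC follow from the haplotype counts above if HD is Nei’s unbiased diversity, n/(n − 1)(1 − Σpᵢ²), and DC is the number of distinct haplotypes divided by n; that these are the exact estimators used is an assumption. The sketch below rebuilds a sample of 74 singletons and three doubletons (placeholder labels, not real haplotypes) and recovers both values.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def discrimination_capacity(haplotypes):
    """Number of distinct haplotypes divided by the sample size."""
    return len(set(haplotypes)) / len(haplotypes)


# Placeholder labels: 74 unique haplotypes plus H001-H003 each observed twice (n = 80).
sample = [f"U{i}" for i in range(74)] + ["H001", "H001", "H002", "H002", "H003", "H003"]
print(round(haplotype_diversity(sample), 4), discrimination_capacity(sample))  # 0.9991 0.9625
```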
(2)
MHs
The forensic parameters of the 22 MHs in 100 GDH individuals are presented in Table 1. Ae ranged from 3.62 (MH13FHL-002) to 14.72 (MH02FHL-006), with a mean of 7.17 (± 3.22). By contrast, as shown in Supplementary Fig. S5, the majority (578/642, ≈90%) of MHs in the global MH dataset, excluding our 135-MH set, had Ae values below 5. All 22 MHs had In values of at least 0.1868 (the minimum, at MH03FHL-003), indicating the ability of the MY system to differentiate intercontinental populations. The expected heterozygosity (He) ranged from 0.7277 (MH13FHL-002) to 0.9367 (MH02FHL-006), with a mean of 0.8373 (± 0.0535). The polymorphism information content (PIC) varied from 0.6854 (MH13FHL-002) to 0.9284 (MH02FHL-006), with a mean of 0.8129 (± 0.0626). The discrimination power (DP) ranged from 0.8792 (MH13FHL-002) to 0.9826 (MH21FHL-002), with a mean of 0.9419 (± 0.0266). The power of exclusion (PE) ranged from 0.3834 (MH13FHL-002) to 0.8159 (MH16FHL-004), with a mean of 0.6413 (± 0.1251). MH allele frequencies are listed in Supplementary Table S16. A total of 343 distinct alleles were observed across the 22 MH loci; the number of alleles per locus varied from 6 (MH06FHL-002) to 43 (MH02FHL-006), and allele frequencies ranged from 0.0005 to 0.4250. For the 22 independent MHs of the MY system, the cumulative discrimination power (CDP) was 1 − 5.00 × 10⁻³¹, and the combined power of exclusion for duo (CPEduo) and trio (CPEtrio) paternity testing was 1 − 5.00 × 10⁻⁸ and 1 − 4.85 × 10⁻¹², respectively. The system efficiency of the 22 MHs was equivalent to that of 26–28 forensic CODIS and non-CODIS STRs in GDH individuals (Supplementary Table S17), demonstrating that the MHs in the MY system are as effective as, or more effective than, the STRs frequently used in forensics.
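For readers who want to recompute per-locus parameters from the allele frequencies in Supplementary Table S16, a minimal sketch assuming the standard textbook formulas follows: Ae = 1/Σpᵢ², unbiased He = 2n/(2n − 1)·(1 − Σpᵢ²), Botstein's PIC, and MP/DP from Hardy-Weinberg-expected genotype frequencies. The function name `locus_parameters` is hypothetical, and an MP computed from observed genotype counts will generally differ somewhat from this HWE-based value.

```python
from itertools import combinations


def locus_parameters(freqs, n_individuals):
    """Per-locus parameters from allele frequencies (sketch, HWE assumed)."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sum_p2 = sum(p * p for p in freqs)
    pairs = list(combinations(freqs, 2))
    two_n = 2 * n_individuals
    ae = 1.0 / sum_p2                                    # effective number of alleles
    he = two_n / (two_n - 1) * (1.0 - sum_p2)            # unbiased expected heterozygosity
    pic = 1.0 - sum_p2 - sum(2 * (p * q) ** 2 for p, q in pairs)  # Botstein et al.
    # Match probability: sum of squared HWE genotype frequencies
    mp = sum(p ** 4 for p in freqs) + sum((2 * p * q) ** 2 for p, q in pairs)
    return {"Ae": ae, "He": he, "PIC": pic, "MP": mp, "DP": 1.0 - mp}


# Illustrative frequencies only (not taken from Supplementary Table S16)
params = locus_parameters([0.4, 0.3, 0.2, 0.1], n_individuals=100)
print({k: round(v, 4) for k, v in params.items()})
```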
Table 1. Forensic parameters of 22 microhaplotypes in the Guangdong Han population (GDH, n = 100).

| MH ID | NSNP | Nallele | Ae | In | He | Ho | PIC | MP | DP | PE | TPI | p-HWE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MH01FHL-009 | 8 | 10 | 4.87 | 0.2009 | 0.7975 | 0.7300 | 0.7669 | 0.0740 | 0.9260 | 0.4762 | 1.8519 | 0.0890 |
| MH02FHL-003 | 11 | 10 | 6.49 | 0.1910 | 0.8501 | 0.8500 | 0.8278 | 0.0464 | 0.9536 | 0.6949 | 3.3333 | 0.8750 |
| MH02FHL-006 | 44 | 43 | 14.72 | 0.5855 | 0.9367 | 0.9000 | 0.9284 | 0.0230 | 0.9770 | 0.7954 | 5.0000 | 0.5610 |
| MH02FHL-010 | 18 | 13 | 6.40 | 0.4016 | 0.8480 | 0.8700 | 0.8259 | 0.0490 | 0.9510 | 0.7346 | 3.8462 | 0.8840 |
| MH03FHL-001 | 9 | 10 | 6.34 | 0.1943 | 0.8443 | 0.8500 | 0.8196 | 0.0518 | 0.9482 | 0.6949 | 3.3333 | 0.4870 |
| MH03FHL-003 | 10 | 12 | 4.96 | 0.1868 | 0.8024 | 0.7500 | 0.7767 | 0.0676 | 0.9324 | 0.5098 | 2.0000 | 0.3160 |
| MH04FHL-005 | 13 | 8 | 4.86 | 0.2624 | 0.7982 | 0.7300 | 0.7668 | 0.0694 | 0.9306 | 0.4762 | 1.8519 | 0.2520 |
| MH06FHL-001 | 9 | 8 | 4.34 | 0.1929 | 0.7734 | 0.7600 | 0.7423 | 0.0860 | 0.9140 | 0.5270 | 2.0833 | 0.9120 |
| MH06FHL-002 | 33 | 6 | 4.42 | 0.3559 | 0.7776 | 0.7100 | 0.7379 | 0.1084 | 0.8916 | 0.4439 | 1.7241 | 0.1480 |
| MH07FHL-001 | 11 | 7 | 5.64 | 0.1946 | 0.8270 | 0.8000 | 0.7988 | 0.0626 | 0.9374 | 0.5990 | 2.5000 | 0.2570 |
| MH07FHL-002 | 14 | 9 | 5.37 | 0.2849 | 0.8178 | 0.8400 | 0.7874 | 0.0688 | 0.9312 | 0.6753 | 3.1250 | 0.4460 |
| MH07FHL-004 | 21 | 27 | 11.09 | 0.4406 | 0.8397 | 0.8300 | 0.8159 | 0.0544 | 0.9456 | 0.6559 | 2.9412 | 0.2510 |
| MH08FHL-006 | 11 | 13 | 5.15 | 0.2504 | 0.8101 | 0.8600 | 0.7829 | 0.0686 | 0.9314 | 0.7147 | 3.5714 | 0.7870 |
| MH09FHL-002 | 8 | 8 | 4.84 | 0.2581 | 0.7974 | 0.7900 | 0.7618 | 0.0802 | 0.9198 | 0.5806 | 2.3810 | 0.6220 |
| MH10FHL-001 | 33 | 24 | 7.87 | 0.5657 | 0.8520 | 0.8300 | 0.8320 | 0.0444 | 0.9556 | 0.6559 | 2.9412 | 0.7090 |
| MH10FHL-007 | 9 | 11 | 6.33 | 0.2004 | 0.8463 | 0.8500 | 0.8219 | 0.0498 | 0.9502 | 0.6949 | 3.3333 | 0.6670 |
| MH11FHL-007 | 24 | 19 | 6.28 | 0.4684 | 0.8450 | 0.8400 | 0.8229 | 0.0536 | 0.9464 | 0.6753 | 3.1250 | 0.3790 |
| MH13FHL-002 | 8 | 9 | 3.62 | 0.2363 | 0.7277 | 0.6700 | 0.6854 | 0.1208 | 0.8792 | 0.3834 | 1.5152 | 0.0340 |
| MH16FHL-004 | 14 | 23 | 12.30 | 0.4434 | 0.9233 | 0.9100 | 0.9130 | 0.0224 | 0.9776 | 0.8159 | 5.5556 | 0.0900 |
| MH17FHL-005 | 19 | 23 | 7.15 | 0.2845 | 0.8594 | 0.8600 | 0.8449 | 0.0362 | 0.9638 | 0.7147 | 3.5714 | 0.7550 |
| MH18FHL-004 | 26 | 30 | 11.11 | 0.3894 | 0.9146 | 0.9000 | 0.9036 | 0.0224 | 0.9776 | 0.7954 | 5.0000 | 0.4760 |
| MH21FHL-002 | 7 | 20 | 13.61 | 0.2774 | 0.9312 | 0.9000 | 0.9218 | 0.0174 | 0.9826 | 0.7954 | 5.0000 | 0.6930 |
| Mean | 16 | 16 | 7.17 | 0.3121 | 0.8373 | 0.8195 | 0.8129 | 0.0581 | 0.9419 | 0.6413 | 3.1629 | – |
| SD | 10 | 9 | 3.22 | 0.1249 | 0.0535 | 0.0681 | 0.0626 | 0.0266 | 0.0266 | 0.1251 | 1.1676 | – |
MH, microhaplotype; n, sample number of GDH; NSNP, composite SNP number; Nallele, allele number; Ae, effective number of alleles; In, informativeness for ancestry inference (1KG and GDH); He, expected heterozygosity; Ho, observed heterozygosity; PIC, polymorphism information content; MP, match probability; DP, discrimination power; PE, power of exclusion; TPI, typical paternity index; p-HWE, probability value of Hardy-Weinberg equilibrium.
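The TPI and PE columns of Table 1 depend only on Ho. Assuming the commonly used formulas TPI = 1/(2H) and PE = h²(1 − 2hH²), with h = Ho and H = 1 − Ho, the sketch below reproduces the Table 1 entries for the two loci shown; the function names are illustrative.

```python
def typical_paternity_index(ho):
    """TPI = 1 / (2H), where H = 1 - Ho is the observed homozygosity."""
    return 1.0 / (2.0 * (1.0 - ho))


def power_of_exclusion(ho):
    """PE = h^2 * (1 - 2 h H^2), with h = Ho and H = 1 - Ho."""
    h, hom = ho, 1.0 - ho
    return h * h * (1.0 - 2.0 * h * hom * hom)


# Ho values taken from Table 1
for locus, ho in [("MH01FHL-009", 0.73), ("MH16FHL-004", 0.91)]:
    print(locus, round(typical_paternity_index(ho), 4), round(power_of_exclusion(ho), 4))
# MH01FHL-009 1.8519 0.4762
# MH16FHL-004 5.5556 0.8159
```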
Based on the MY system, the deduced application range of GPR for two-person DNA mixtures is (1:10.11, 1:2.10). DNA profiles of ten sets of simulated male-male DNA mixtures with mixing ratios from 1:10 to 1:2 were generated with the MY system. As presented in Supplementary Table S18, two different Y-DNA haplogroups were detected in most of the male-male mixtures (detailed Y-DNA haplogroups in Supplementary Table S19). In these mixtures, the number of Y-STR loci showing two different alleles ranged from 6 to 18. Of the 220 MH loci detected in the simulated mixtures (22 loci × 10 sets), the proportions of MHallele=1, MHallele=2, MHallele=3, and MHallele=4 were 0.91%, 14.55%, 47.27%, and 37.27%, respectively.
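The allele-count classes above are the number of distinct alleles that two contributors' genotypes produce at a locus. A short sketch (function names hypothetical) counts them and converts a tally to percentages; the counts 2, 32, 104, and 82 used here are simply those implied by the reported percentages over 220 loci.

```python
from collections import Counter


def mixture_allele_count(genotype_a, genotype_b):
    """Distinct alleles seen at one locus in a two-person mixture (1-4)."""
    return len(set(genotype_a) | set(genotype_b))


def class_proportions(allele_counts):
    """Percentage of loci showing 1, 2, 3 or 4 distinct alleles."""
    tally = Counter(allele_counts)
    total = len(allele_counts)
    return {k: round(100 * tally[k] / total, 2) for k in (1, 2, 3, 4)}


print(mixture_allele_count(("A", "B"), ("C", "D")))  # 4: both heterozygous, no sharing
print(mixture_allele_count(("A", "B"), ("A", "C")))  # 3: one shared allele

# Counts implied by the reported percentages over the 220 detected MH loci
observed = [1] * 2 + [2] * 32 + [3] * 104 + [4] * 82
print(class_proportions(observed))  # {1: 0.91, 2: 14.55, 3: 47.27, 4: 37.27}
```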
3.2.1 Regression analyses between the AMR and RDoC
(1)
Y-SNPs/STRs and MHallele=4
The mean and SD of the RDoC values of Y-SNPs, Y-STRs, and MHallele=4 for each AMR group (1:10–1:2) are listed in Supplementary Table S20. Each AMR group comprised 58–146 RDoC values, which were consistent with a normal distribution (p > 0.05). As shown in Fig. 3, the correlation coefficients (r) between AMR and RDoC were 0.9806, 0.9446, and 0.9710 for Y-SNPs, Y-STRs, and MHallele=4, respectively. We established linear regression equations between AMR and RDoC for Y-SNPs, Y-STRs, and MHallele=4, from which the 95% CI of the AMR, denoted E(AMR), can be obtained.
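The regression step can be illustrated with ordinary least squares and a simple inverse prediction. This is a sketch under stated assumptions, not the authors' procedure: AMR 1:k is encoded as 1/k, the calibration data are synthetic, and the interval is obtained by inverting RDoC ± 1.96·s (s = residual SD), which ignores uncertainty in the fitted coefficients and is therefore narrower than a full calibration interval. `fit_line` and `estimate_amr` are hypothetical names.

```python
import math


def fit_line(x, y):
    """Ordinary least squares y = a*x + b; returns (a, b, residual SD, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    r = sxy / math.sqrt(sxx * syy)
    return a, b, s, r


def estimate_amr(rdoc, a, b, s, z=1.96):
    """Invert RDoC = a*AMR + b. The interval inverts rdoc -/+ z*s and
    ignores uncertainty in a and b (a simplification)."""
    point = (rdoc - b) / a
    lo, hi = sorted([(rdoc - z * s - b) / a, (rdoc + z * s - b) / a])
    return point, (lo, hi)


# Synthetic calibration data (NOT the study's measurements): AMR 1:k -> 1/k.
amr = [1 / k for k in range(10, 1, -1)]
rdoc = [1.02 * v + (0.01 if i % 2 else -0.01) for i, v in enumerate(amr)]
a, b, s, r = fit_line(amr, rdoc)
point, (lo, hi) = estimate_amr(0.30, a, b, s)
print(f"r = {r:.4f}; E(AMR) about 1:{1 / point:.2f} (1:{1 / lo:.2f} to 1:{1 / hi:.2f})")
```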
Fig. 3. Regression analyses between the actual mixing ratio (AMR) and DoC ratio (RDoC) for Y-SNPs (A), Y-STRs (B), and MHallele=4 (C) based on ten sets of simulated male-male DNA mixtures (1:10–1:2). N, total number of generated RDoC values; r, correlation coefficient; R², coefficient of determination.
In two-person DNA mixtures, there are seven different genotype patterns in total: three for MHallele=3 (patterns α, β, and γ) and four for MHallele=2 (patterns δ, ε, ζ, and η). Each AMR group of the seven genotype patterns comprised 6–70 RDoC values and was consistent with a normal distribution (p > 0.05, Supplementary Table S21), except pattern η, for which no measured data were available. Except for patterns α and η (which tended to form a straight line), strong positive Spearman correlations between AMR and RDoC were observed for the genotype patterns (r range: 0.9658–0.9796, Table 2). In addition, we used linear, quadratic, and cubic regression models to establish the relationships between RDoC and AMR for the seven genotype patterns (Table 2 and Fig. 4A, B).
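Spearman's r and the linear, quadratic, and cubic fits can be reproduced with standard-library Python. The sketch below uses hypothetical helper names and synthetic data: it ranks with average ranks for ties, fits polynomials by solving the normal equations with Gaussian elimination, and reports R² for each degree. For nested least-squares fits R² cannot decrease as the degree increases, so model choice should not rest on R² alone.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def polyfit(x, y, degree):
    """Least-squares polynomial; returns c with y ~ sum(c[i] * x**i)."""
    m = degree + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    v = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for c in range(col, m):
                A[row][c] -= f * A[col][c]
            v[row] -= f * v[col]
    coef = [0.0] * m
    for row in range(m - 1, -1, -1):  # back substitution
        coef[row] = (v[row] - sum(A[row][c] * coef[c] for c in range(row + 1, m))) / A[row][row]
    return coef


def r_squared(x, y, coef):
    my = sum(y) / len(y)
    pred = [sum(c * xi ** i for i, c in enumerate(coef)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic pattern data (hypothetical): AMR 1:k -> 1/k, slightly curved RDoC
amr = [1 / k for k in range(10, 1, -1)]
rdoc = [0.2 + 1.5 * a - 0.8 * a * a for a in amr]
print("Spearman r:", round(spearman(amr, rdoc), 4))
for d in (1, 2, 3):
    print("degree", d, "R2 =", round(r_squared(amr, rdoc, polyfit(amr, rdoc, d)), 6))
```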
Table 2. Correlations between the actual mixing ratio (AMR) and DoC ratio (RDoC) for seven genotype patterns of MHallele=3 and MHallele=2 based on ten sets of simulated male-male DNA mixtures (1:10–1:2).
N, total number of RDoC values; r, correlation coefficient; R², coefficient of determination; *, numerical relationships of pattern η derived theoretically because measured values were lacking; #, regression equations in bold and 95% prediction intervals (PIs) of RDoC in normal font.
Fig. 4. Regression analyses between the actual mixing ratio (AMR) and DoC ratio (RDoC) for different genotype patterns of MHallele=3 (A) and MHallele=2 (B), and RDoC intersections of pairwise genotype patterns of MHallele=3 (C) and MHallele=2 (D), based on ten sets of simulated male-male DNA mixtures (1:10–1:2). Patterns α, β, and γ belong to MHallele=3, and patterns δ, ε, ζ, and η to MHallele=2. Although the deduced application range of pattern β is 0–1:2.10, the allele distributions of the ten simulated sets also follow pattern β at 1:2.
Based on the above regression analyses, the relationships between AMR and RDoC were established for Y-SNPs/STRs and MHs at mixing ratios of 1:10–1:2 in two-person DNA mixtures. As shown in Table 2, the RDoC range of each of the seven genotype patterns was calculated from its regression equation over the AMR range 1:10–1:2. RDoC intersections are the overlaps between pairwise RDoC ranges. For Y-SNPs/STRs and pattern MHallele=4, there is no RDoC intersection (Fig. 3). For MHallele=3 and MHallele=2, Fig. 4C and D, respectively, show four RDoC intersections among the seven genotype patterns: α ∩ β = [1.0727, 1.1224], η ∩ δ = [0.8100, 0.9014], δ ∩ ζ = [0.4341, 0.5398], and ζ ∩ ε = [0.0544, 0.2246]. Within these intersections, the genotype pattern requires further recognition.
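Once each pattern has a fitted equation, its RDoC range and the pairwise intersections are straightforward to compute. In this sketch the coefficient lists are hypothetical (not the Table 2 equations) and the helper names are illustrative; the range is taken as the minimum and maximum over a dense AMR grid from 1:10 to 1:2, which also handles quadratic and cubic curves that are not monotonic. Using prediction-interval bands instead of the fitted curves would widen these ranges.

```python
from itertools import combinations


def rdoc_range(coef, amr_lo=1 / 10, amr_hi=1 / 2, steps=1000):
    """Min and max of RDoC(AMR) = sum(coef[k] * AMR**k) over [amr_lo, amr_hi].

    A dense grid handles non-monotonic quadratic or cubic fits;
    AMR 1:k is encoded as 1/k.
    """
    values = [
        sum(c * amr ** k for k, c in enumerate(coef))
        for amr in (amr_lo + (amr_hi - amr_lo) * i / steps for i in range(steps + 1))
    ]
    return min(values), max(values)


def intersection(r1, r2):
    """Overlap of two closed intervals, or None if they are disjoint."""
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (lo, hi) if lo <= hi else None


def pairwise_intersections(ranges):
    """All non-empty overlaps between the RDoC ranges of pattern pairs."""
    out = {}
    for (p, r1), (q, r2) in combinations(ranges.items(), 2):
        common = intersection(r1, r2)
        if common is not None:
            out[f"{p} ∩ {q}"] = common
    return out


# Hypothetical coefficient lists [c0, c1, ...], NOT the Table 2 equations
ranges = {
    "P1": rdoc_range([0.0, 1.0]),
    "P2": rdoc_range([0.35, 1.0]),
    "P3": rdoc_range([1.0, 0.5]),
}
print(pairwise_intersections(ranges))  # only P1 and P2 overlap, near [0.45, 0.50]
```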
3.2.3 Pattern recognition of each genotype
(1)
MHallele=2 and MHallele=3
As shown in Fig. 4C and D, the genotype pattern must be examined further when the observed RDoC falls within an RDoC intersection; outside the intersections, the genotype patterns are unambiguous.
The RDoC intersections have two main components: nonoverlapping and overlapping areas (Fig. 5). Fig. 5A presents the RDoC intersection of patterns α and β (α ∩ β = [1.0727, 1.1224]). When the AMR range is 1:7.16–1:2 (the nonoverlapping area), the genotype pattern is recognized as pattern α. When the AMR range is 1:10–1:7.16 (the overlapping area), the genotype pattern is uncertain (pattern α or β), and LR-based probabilistic genotyping (PG) systems (EuroForMix, DNAStatistX, and STRmix™) are required to infer the genotypes of the major and minor contributors [ ]. In total, the RDoC intersections contain five overlapping areas in which these LR-based methods are needed (Fig. 5A for MHallele=3 and Fig. 5B-D for MHallele=2). Moreover, when the AMR of a two-person DNA mixture is between 1:4.98 and 1:2.39, none of the seven genotype patterns has an overlapping area, so the genotype patterns of the major and minor contributors can be recognized directly by comparing the observed RDoC with the RDoC ranges, without LR-based PG systems.
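The α ∩ β rule can be written as a small decision function using the values reported above. Treating E(AMR) as an interval and sending it to PG whenever any part of it reaches the overlapping area is a conservative assumption of this sketch, not a rule stated in the text; AMR 1:k is again encoded as 1/k, and the function names are hypothetical.

```python
# Values reported in the text for the alpha ∩ beta intersection (Fig. 5A)
ALPHA_BETA_RDOC = (1.0727, 1.1224)
OVERLAP_AMR = (1 / 10, 1 / 7.16)           # 1:10-1:7.16: alpha or beta, use PG
NONOVERLAP_AMR = (1 / 7.16, 1 / 2)         # 1:7.16-1:2: pattern alpha
NO_OVERLAP_WINDOW = (1 / 4.98, 1 / 2.39)   # no overlapping area for any pattern


def recognise_alpha_beta(rdoc, e_amr):
    """Return 'alpha', 'PG', or None.

    None means the RDoC is outside alpha ∩ beta or E(AMR) leaves 1:10-1:2.
    e_amr is an interval (lo, hi); sending any interval that reaches the
    overlapping area to PG is an assumption of this sketch.
    """
    if not ALPHA_BETA_RDOC[0] <= rdoc <= ALPHA_BETA_RDOC[1]:
        return None
    lo, hi = e_amr
    if lo < OVERLAP_AMR[0] or hi > NONOVERLAP_AMR[1]:
        return None
    if lo >= NONOVERLAP_AMR[0]:
        return "alpha"
    return "PG"


def direct_recognition_possible(e_amr):
    """True if E(AMR) lies wholly inside the 1:4.98-1:2.39 window."""
    lo, hi = e_amr
    return NO_OVERLAP_WINDOW[0] <= lo and hi <= NO_OVERLAP_WINDOW[1]


print(recognise_alpha_beta(1.10, (0.20, 0.30)))   # alpha
print(recognise_alpha_beta(1.10, (0.11, 0.16)))   # PG
print(direct_recognition_possible((0.25, 0.35)))  # True
```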
Fig. 5. Detailed process of genotype pattern recognition (GPR) in the different RDoC intersections of MHallele=3 and MHallele=2. A. GPR in the RDoC intersection of patterns α and β (α ∩ β); B. GPR in the RDoC intersection of patterns η and δ (η ∩ δ); C. GPR in the RDoC intersection of patterns δ and ζ (δ ∩ ζ); D. GPR in the RDoC intersection of patterns ζ and ε (ζ ∩ ε).
In two-person DNA mixtures with mixing ratios of 1:10–1:2, MHallele=4 and MHallele=1 each correspond to a single, unique genotype pattern (Fig. 2); therefore, GPR is not needed for these loci.
In summary, based on the MY system, two-person DNA mixtures within the application range of 1:10–1:2 can be deconvoluted by GPR (Fig. 6). When the DNA profile of a two-person mixture with an unknown AMR is generated by the MY system, the mixture type can be determined with high probability from the Y-chromosomal genetic markers, and the Y-chromosomal haplotypes can provide additional investigative clues, especially in sexual assault cases (e.g., rape and sodomy). The MH markers of the MY system are used for individual identification, and the MH type of each locus is determined by its allele number (1–4 alleles). Because the 22 selected MHs have relatively high average Ae values (8.32 in 1KG and 7.17 in GDH individuals) and the pattern MHallele=4 is unambiguous in two-person mixtures, the mean RDoC of the MHallele=4 loci can be used to obtain the E(AMR) of the mixture (Fig. 3). In male-male mixtures, the E(AMR) can also be obtained from the RDoC values of the Y-chromosomal genetic markers (Y-SNPs/STRs). If the E(AMR) falls within the application range of GPR (1:10–1:2), the mixture can be deconvoluted. For each observed RDoC value at MHallele=2 and MHallele=3 loci, the RDoC intersections must be checked. If the observed RDoC lies within an RDoC intersection, there are two scenarios: (1) if the E(AMR) lies in a nonoverlapping area, the genotype pattern of the MH locus can be recognized by comparing the different RDoC ranges; (2) if the E(AMR) lies in an overlapping area, LR-based PG systems are needed to infer the major and minor contributors’ genotypes. If the observed RDoC lies outside the RDoC intersections (as for nonoverlapping areas), the genotype pattern of the MH locus can be recognized directly by comparing the different RDoC ranges.
Thus, the major and minor contributors’ genotypes in the two-person mixture can be obtained for further individual identification.
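The Fig. 6 workflow can be expressed as one hypothetical function. Here each candidate pattern is represented by an RDoC band as a function of AMR, and a locus is assigned when exactly one pattern's band contains the observed RDoC somewhere within E(AMR); ambiguous loci are passed to PG. This encoding of nonoverlapping versus overlapping areas is an assumption of the sketch, not the authors' implementation, and the calibration and bands in the example are invented for illustration.

```python
def candidate_patterns(rdoc, e_amr, bands, grid=200):
    """Patterns whose RDoC band contains rdoc for some AMR within E(AMR)."""
    lo, hi = e_amr
    hits = []
    for name, band in bands.items():
        for i in range(grid + 1):
            b_lo, b_hi = band(lo + (hi - lo) * i / grid)
            if b_lo <= rdoc <= b_hi:
                hits.append(name)
                break
    return hits


def deconvolute(loci, amr_from_rdoc, bands_by_count, app_range=(1 / 10, 1 / 2)):
    """Sketch of the Fig. 6 flow. loci maps name -> (n_alleles, rdoc)."""
    four = [rdoc for n, rdoc in loci.values() if n == 4]
    if not four:
        raise ValueError("no MHallele=4 loci to estimate E(AMR)")
    e_amr = amr_from_rdoc(sum(four) / len(four))  # E(AMR) from mean RDoC
    if e_amr[0] < app_range[0] or e_amr[1] > app_range[1]:
        return e_amr, None                          # outside the GPR range
    calls = {}
    for name, (n, rdoc) in loci.items():
        if n in (1, 4):                             # unique patterns, no GPR
            calls[name] = f"MHallele={n}"
            continue
        hits = candidate_patterns(rdoc, e_amr, bands_by_count[n])
        if len(hits) == 1:
            calls[name] = hits[0]                   # recognised directly
        elif hits:
            calls[name] = "PG"                      # overlapping area
        else:
            calls[name] = "unassigned"
    return e_amr, calls


def calibrate(mean_rdoc):
    """Hypothetical calibration returning an E(AMR) interval."""
    return mean_rdoc - 0.03, mean_rdoc + 0.03


# Hypothetical bands for two 3-allele patterns, for illustration only
bands3 = {"A": lambda a: (a - 0.03, a + 0.03), "B": lambda a: (2 * a - 0.03, 2 * a + 0.03)}
loci = {"L1": (4, 0.30), "L2": (4, 0.32), "L3": (3, 0.62), "L4": (3, 0.31), "L5": (1, 0.0)}
print(deconvolute(loci, calibrate, {3: bands3}))
```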
Fig. 6. Detailed process of two-person DNA mixture (1:10–1:2) deconvolution based on the MY system.
Overall, two-person DNA mixtures within the application range of 1:10–1:2 can be deconvoluted using the GPR strategy. Mixture deconvolution has been a persistent challenge in forensic DNA analysis; in this study, we focused on the relatively simple case of two-person mixtures to find an effective deconvolution approach. Although GPR cannot deconvolute balanced (1:1) or extremely unbalanced (beyond 1:10) two-person mixtures, it represents a small step forward in using DoC information for mixture deconvolution (from pattern recognition to genotype inference). GPR should be further validated with low-quality, degraded, and casework-like mixed samples. For the MY system, additional region-specific Y-DNA haplogroups and RM Y-STRs could be considered for population-specific human identification and regional forensic genealogy. Combining Y-chromosomal genetic markers with MHs can provide more useful information for crime scene investigations, especially where male suspects are involved.
4. Conclusion
Combining Y-SNP/STR and MH genetic markers for mixed traces with male contributors is beneficial for familial searching, paternal/kinship determination, and mixture deconvolution. In the present study, we developed a novel MPS-based MY system consisting of 114 Y-SNPs (covering twelve dominant Y-DNA haplogroups), 45 Y-STRs (μ < 5 × 10⁻³ and N-1 stutter < 0.09), and 22 MHs (Ae > 5, In > 0.185, ACR > 0.91, and d > 10 Mb). For the 22 independent MHs in the MY system, Ae ranged from 3.62 to 14.72, with an average of 7.17 in GDH individuals. The CDP was 1 − 5.00 × 10⁻³¹, and the CPEduo and CPEtrio were 1 − 5.00 × 10⁻⁸ and 1 − 4.85 × 10⁻¹², respectively. In addition, we proposed a GPR method for two-person DNA mixtures based on the MY system. We integrated 26 different genotype combinations into nine genotype patterns and validated the application range (1:10–1:2) of GPR using ten sets of simulated male-male DNA mixtures. Regression relationships between AMR and RDoC were established for the different genetic markers and genotype patterns. For the five overlapping areas within the RDoC intersections, LR-based methods are needed to infer the genotypes of the major and minor contributors. In the nonoverlapping areas (the predominant areas outside the RDoC intersections, plus the nonoverlapping areas within them), genotype patterns can be recognized by comparing the observed RDoC with the RDoC ranges, with the assistance of E(AMR). In conclusion, based on the MY system, two-person DNA mixtures (1:10–1:2) can be deconvoluted using the GPR strategy for individual identification.
Availability of data and material
The raw data for this article and the in-house scripts are available upon reasonable request to the corresponding authors.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Acknowledgements
This study benefited from the valuable comments of Prof. Bofeng Zhu (Southern Medical University), Prof. Feng Chen (Nanjing Medical University), Prof. Weibo Liang (Sichuan University), Prof. Jianye Ge (University of North Texas Health Science Center), Prof. Peng Chen (Nanjing Medical University), and Fang Zhao (Shanxi University).
The authors sincerely thank all the volunteers who contributed samples for this study and Homgen BioTech. for technical assistance. This study was supported by the Program of Hainan Association for Science and Technology Plans to Youth R&D Innovation (QCXM201705); National Undergraduate Innovation and Entrepreneurship Training Program (No. 201911810008 and No. 201911810023); Shanghai Key Laboratory of Forensic Medicine (Academy of Forensic Science) Open Project Foundation (No. KF1812); Science Foundation of the School of Forensic Medicine, Southern Medical University (No. 2021KY02); and National Natural Science Foundation of China (NSFC, No. 81671865, No. 81971786, and No. 32070576).
AmpFlSTR profiler Plus short tandem repeat DNA analysis of casework samples, mixture samples, and nonhuman DNA samples amplified under reduced PCR volume conditions (25 microL).
Developmental validation of the AmpFlSTR(R) NGM SElect PCR amplification kit: a next-generation STR multiplex with the SE33 locus. Forensic science international.
Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power.
A highly polymorphic panel consisting of microhaplotypes and compound markers with the NGS and its forensic efficiency evaluations in Chinese two groups.
Validation of the Verogen ForenSeq DNA Signature Prep kit/Primer Mix B for phenotypic and biogeographical ancestry predictions using the Micro MiSeq(R) Flow Cells. Forensic science international.
Developing and population analysis of a new multiplex panel of 18 microhaplotypes and compound markers using next generation sequencing and its application in the Shaanxi Han population.
DNA mixtures interpretation - a proof-of-concept multi-software comparison highlighting different probabilistic methods’ performances on challenging samples.
A GT-seq panel for walleye (Sander vitreus) provides important insights for efficient development and implementation of amplicon panels in non-model organisms.
Byrska-Bishop M., Evani US, Zhao X. et al. (2021) High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. bioRxiv: 2021.02.06.430068. doi: 10.1101/2021.02.06.430068.
Forensic characteristics and phylogenetic analyses of one branch of Tai-Kadai language-speaking Hainan Hlai (Ha Hlai) via 23 autosomal STRs included in the Huaxia(.) Platinum System.
Genetic diversity, forensic characteristics and phylogenetic analysis of the Qiongzhong aborigines residing in the tropical rainforests of Hainan Island via 19 autosomal STRs.
Considering the flanking region variants of nonbinary SNP and phenotype-informative SNP to constitute 30 microhaplotype loci for increasing the discriminative ability of forensic applications.