Highlights
- The Y-SNP Pedigree Tagging System combined with the 41Y panel is used to build a regional Y-DNA database.
- Y-SNP haplogroups can discriminate some near-matching Y-STR haplotype pairs typed with the 41Y panel.
- Machine-learning approaches are used to classify Y-STR haplotypes from different Y-SNP haplogroups.
- Haplogroup prediction and Y-SNP-based discrimination of Y-STR haplotype pairs could be used to improve Y-STR databases.
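The highlights rest on two operations: finding Y-STR haplotype pairs that match or nearly match, and checking whether their Y-SNP haplogroups differ. A minimal Python sketch of that comparison follows. It assumes each profile is a mapping from single-copy Y-STR loci to repeat counts and that a pair counts as near-matching when it differs at no more than `max_mismatches` loci; this threshold, the function names, the toy profiles and their haplogroup labels are hypothetical illustrations, not the study's data or its NM definition. Multi-copy loci (e.g. DYS385a/b) and nested loci (DYS389I/II) would need separate handling in practice.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# A Y-STR profile: single-copy locus name -> allele (repeat count).
Haplotype = Dict[str, int]

# Toy profiles invented for illustration only (not study data).
# Each entry: sample ID -> (Y-STR haplotype, hypothetical Y-SNP haplogroup).
TOY_SAMPLES: Dict[str, Tuple[Haplotype, str]] = {
    "S1": ({"DYS19": 15, "DYS389I": 13, "DYS390": 23, "DYS391": 10, "DYS392": 13, "DYS393": 12}, "O2"),
    "S2": ({"DYS19": 15, "DYS389I": 13, "DYS390": 23, "DYS391": 10, "DYS392": 13, "DYS393": 13}, "O1b"),
    "S3": ({"DYS19": 15, "DYS389I": 13, "DYS390": 23, "DYS391": 10, "DYS392": 13, "DYS393": 12}, "O2"),
    "S4": ({"DYS19": 17, "DYS389I": 14, "DYS390": 24, "DYS391": 11, "DYS392": 11, "DYS393": 13}, "N"),
}


def mismatch_count(a: Haplotype, b: Haplotype) -> int:
    """Count loci typed in both profiles whose alleles differ."""
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


def near_matching_pairs(
    samples: Dict[str, Tuple[Haplotype, str]], max_mismatches: int = 1
) -> List[Tuple[str, str, int, bool]]:
    """Return (id1, id2, mismatches, same_haplogroup) for every pair that
    differs at <= max_mismatches loci (0 means an exact Y-STR match)."""
    pairs = []
    for (id1, (hap1, hg1)), (id2, (hap2, hg2)) in combinations(samples.items(), 2):
        d = mismatch_count(hap1, hap2)
        if d <= max_mismatches:
            pairs.append((id1, id2, d, hg1 == hg2))
    return pairs


if __name__ == "__main__":
    for id1, id2, d, same_hg in near_matching_pairs(TOY_SAMPLES):
        verdict = "not resolved" if same_hg else "resolved by Y-SNP haplogroup"
        print(f"{id1}-{id2}: {d} mismatch(es), {verdict}")
```

In the toy data, S1 and S3 are indistinguishable at both levels, while S2 differs from each of them at a single locus but carries a different haplogroup, so the Y-SNP call separates pairs that the Y-STR comparison alone would only flag as near-matching.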
Abstract
Improving the resolution of the currently widely used Y-chromosomal short tandem repeat
(Y-STR) datasets is of great importance to forensic investigators, yet current approaches
are largely limited to adding more Y-STR loci. In this study, a regional Y-DNA database
was investigated to improve Y-STR haplotype resolution using a Y-SNP Pedigree Tagging
System comprising 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This
pilot study was conducted in the Han population of Zhaoyang, Yunnan, China, and enrolled
3473 unrelated males. Based on male haplogroup data obtained under different panels,
matched or near-matching (NM) Y-STR haplotype pairs that belonged to different haplogroups
indicated the critical role of haplogroups in improving regional Y-STR haplotype resolution.
A classic median-joining network analysis was performed on Y-STR data alone or on combined
Y-STR/Y-SNP data to reconstruct population substructure, revealing that Y-SNPs can correct
misclassifications made from Y-STRs. Population substructure was also reconstructed using
several unsupervised and supervised dimensionality reduction methods, which indicated
the potential of Y-STR haplotypes for predicting Y-SNP haplogroups. Haplogroup prediction
models were then built using nine publicly accessible machine-learning (ML) approaches.
The best prediction accuracy reached 99.71% for major haplogroups and 98.54% for detailed
haplogroups. Potential influences on prediction accuracy were assessed by varying the
number of Y-STR loci, selecting Y-STR loci with different mutation rates, and applying
data processing. ML-based predictors generally achieved better prediction accuracy than
two available predictors (NevGen and EA-YPredictor). Three tree models developed on the
Yfiler Plus panel with unprocessed input data showed strong generalization ability when
classifying various Chinese Han subgroups (validation dataset). In conclusion, this study
demonstrates the significance and application prospects of Y-SNP haplogroups for improving
regional Y-STR databases: Y-SNP haplogroups can discriminate NM Y-STR haplotype pairs,
and forensic Y-STR databases should develop haplogroup prediction tools to improve the
accuracy of biogeographic ancestry inference.
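The prediction step can be sketched with scikit-learn, which provides several standard classifier families (logistic regression, k-nearest neighbours, support vector machines, random forests). The sketch below is an illustration under explicit assumptions, not the study's pipeline: it simulates allele matrices around hypothetical modal haplotypes instead of using real data, compares four classifiers rather than the study's nine, and reports mean 5-fold cross-validated accuracy for different numbers of loci to mimic the locus-number experiment. The function names and simulation parameters are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def simulate_haplotypes(n_per_group=100, n_loci=20, n_groups=3, seed=0):
    """Simulate a synthetic Y-STR allele matrix (hypothetical, not real data).

    All groups share a base haplotype; each hypothetical haplogroup shifts some
    loci by +/-1 repeat to form its modal haplotype, and individuals then
    deviate from their group's modal haplotype by random single-step changes.
    """
    rng = np.random.default_rng(seed)
    base = rng.integers(10, 25, size=n_loci)
    modal = base + rng.choice([-1, 0, 1], size=(n_groups, n_loci), p=[0.2, 0.6, 0.2])
    X, y = [], []
    for g in range(n_groups):
        steps = rng.choice([-1, 0, 1], size=(n_per_group, n_loci), p=[0.1, 0.8, 0.1])
        X.append(modal[g] + steps)
        y.extend([g] * n_per_group)
    return np.vstack(X), np.array(y)


def compare_models(X, y, locus_counts=(5, 10, 20), cv=5):
    """Mean cross-validated accuracy per (model, number of loci used).

    Only the first k columns (loci) are used for each k, mimicking the
    removal of loci from a panel.
    """
    models = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "svm": make_pipeline(StandardScaler(), SVC()),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    results = {}
    for k in locus_counts:
        for name, model in models.items():
            scores = cross_val_score(model, X[:, :k], y, cv=cv, scoring="accuracy")
            results[(name, k)] = float(scores.mean())
    return results


if __name__ == "__main__":
    X, y = simulate_haplotypes()
    for (name, k), acc in sorted(compare_models(X, y).items()):
        print(f"{name:20s} loci={k:2d} mean CV accuracy={acc:.3f}")
```

Because `cross_val_score` clones the estimator for every fold, the same model objects can be reused across locus counts. Standardization is applied only to the distance- and margin-based models; tree ensembles are insensitive to feature scaling.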
Published in Forensic Science International: Genetics.
Article info
Publication history
Published online: December 29, 2021
Accepted: December 27, 2021
Received in revised form: December 14, 2021
Received: April 14, 2021
Copyright
© 2021 Elsevier B.V. All rights reserved.