Highlights
- Well-performing ML models were selected to discern population structure in a complex Asian population.
- Classical methods of population study were applied for population estimation.
- Dimensionality reduction strategies were adopted to capture and visualize population structure.
- Diverse ML methods were assessed for biogeographical ancestry prediction.
- Various coding schemes were applied to explore the impact of input format on prediction accuracy.
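To make the last highlight concrete, the sketch below shows two common ways a genotype can be coded as model input: an additive count of one allele, and a one-hot indicator per genotype class. The highlights do not say which coding schemes the study used, so the function names, the allele labels, and the sample genotypes here are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical example: two ways to turn biallelic SNP genotypes into
# numeric features. Neither is confirmed as a scheme used in the study.
GENOTYPES = ("AA", "AG", "GG")  # illustrative genotype classes for one SNP


def additive_encode(genotype: str, alt_allele: str) -> int:
    """Count copies of the chosen allele in the genotype (0, 1, or 2)."""
    return genotype.count(alt_allele)


def one_hot_encode(genotype: str, classes: Sequence[str]) -> List[int]:
    """Return an indicator vector with a 1 at the genotype's class."""
    normalized = "".join(sorted(genotype))  # treat "GA" and "AG" alike
    return [1 if normalized == c else 0 for c in classes]


# One made-up individual typed at three made-up SNPs.
sample = ["AA", "GA", "GG"]
additive = [additive_encode(g, "G") for g in sample]
one_hot = [bit for g in sample for bit in one_hot_encode(g, GENOTYPES)]

print(additive)  # one feature per SNP
print(one_hot)   # three features per SNP
```

The additive form keeps one column per marker and imposes an ordering on genotypes, while the one-hot form triples the feature count but makes no ordering assumption. That difference in input format is the kind of choice that could plausibly affect classifier accuracy.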
Abstract
Keywords
Journal: Forensic Science International: Genetics