Research Article | Volume 59, 102702, July 2022

Application of machine learning for ancestry inference using multi-InDel markers

  • Kuan Sun
    Correspondence
    Corresponding author at: Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China.
    Footnotes
    1 These authors contributed equally to this work.
    Affiliations
    Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China

    Department of Fetal Medicine and Prenatal Diagnosis Center, Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, 2699 West Gaoke Rd, Shanghai 201204, China

    Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai 200092, China
  • Yining Yao
    Affiliations
    Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
  • Libing Yun
    Affiliations
    West China School of Basic Medical Science and Forensic Medicine, Sichuan University, Chengdu, Sichuan, China
  • Chen Zhang
    Affiliations
    Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China
  • Jianhui Xie
    Affiliations
    Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
  • Xiaoqin Qian
    Affiliations
    Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
  • Qiqun Tang
    Affiliations
    Department of Biochemistry and Molecular Biology, Shanghai Medical College of Fudan University, Shanghai 200032, China
  • Luming Sun
    Correspondence
    Corresponding author at: Department of Fetal Medicine and Prenatal Diagnosis Center, Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, 2699 West Gaoke Rd, Shanghai 201204, China.
    Footnotes
    1 These authors contributed equally to this work.
    Affiliations
    Department of Fetal Medicine and Prenatal Diagnosis Center, Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, 2699 West Gaoke Rd, Shanghai 201204, China

    Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai 200092, China

      Highlights

      • Well-performing ML models were selected to discern population structure in a complex Asian population.
      • Classical population genetics methods were applied for population estimation.
      • Dimensionality reduction strategies were adopted to capture and visualize population structure.
      • Diverse ML methods were assessed for biogeographical ancestry prediction.
      • Various coding schemes were applied to explore the impact of input format on prediction accuracy.
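To make the idea of a coding scheme concrete, here is a minimal sketch that contrasts two common input formats for categorical genotypes: integer label encoding and one-hot encoding. This page does not list the schemes the authors compared, so the genotype strings (`H1/H1` and so on), the helper names, and the choice of these two schemes are illustrative assumptions only.

```python
# Hypothetical genotype strings and helper names; not taken from the article.

def build_vocabularies(samples):
    """Map each marker's observed genotypes to integer codes (sorted for stability)."""
    n_markers = len(samples[0])
    return [
        {g: i for i, g in enumerate(sorted({row[m] for row in samples}))}
        for m in range(n_markers)
    ]


def label_encode(samples, vocabs):
    """One integer per marker; implies an ordering between genotypes."""
    return [[vocabs[m][g] for m, g in enumerate(row)] for row in samples]


def one_hot_encode(samples, vocabs):
    """One binary column per (marker, genotype) pair; no implied ordering."""
    encoded = []
    for row in samples:
        vector = []
        for m, g in enumerate(row):
            block = [0] * len(vocabs[m])
            block[vocabs[m][g]] = 1
            vector.extend(block)
        encoded.append(vector)
    return encoded


# Three made-up individuals typed at two made-up markers.
samples = [["H1/H1", "H2/H3"], ["H1/H2", "H2/H2"], ["H1/H1", "H2/H2"]]
vocabs = build_vocabularies(samples)
labels = label_encode(samples, vocabs)
one_hot = one_hot_encode(samples, vocabs)
print(labels)   # [[0, 1], [1, 0], [0, 0]]
print(one_hot)  # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]
```

Label encoding keeps the input compact but lets distance-based and linear models treat genotype codes as ordered quantities. One-hot encoding avoids that at the cost of more columns.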

      Abstract

      Ancestry inference through population stratification plays an important role in forensic applications: ancestry information inferred from forensic DNA evidence can provide vital clues for criminal investigations. Recent advances in ancestry inference have focused mostly on ancestry informative markers. Among these, multi-InDel markers have been proposed as compound markers that perform well in complex ancestral classification of Asian subpopulations. However, research on the analytical methods needed to make reliable predictions is lacking, and the newly proposed compound markers could be assessed with alternative methods. In this study, promising discriminant methods were explored using multi-InDel markers for forensic ancestry inference. As a prerequisite, the adopted multi-InDel markers were assessed by classical population genetics methods, such as FST analysis, multidimensional scaling (MDS) and STRUCTURE. In addition, dimensionality reduction methods and serial reduction strategies were applied for data visualization. Subsequently, machine learning methods, including logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN) and extreme gradient boosting (XGBoost), were evaluated by diverse approaches. Across these comparisons, XGBoost with one-hot encoding was shown to be more effective in population stratification and ancestry inference for challenging cases with admixed populations.
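The classifier comparison described in the abstract can be sketched roughly as follows, using scikit-learn on simulated genotypes. This is not the authors' pipeline. The population count, marker count, allele frequencies, hyperparameters and the cross-validated macro F1 metric are all assumptions chosen for illustration. XGBoost is left out so that the sketch depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC


def simulate_genotypes(n_pops=3, n_per_pop=40, n_markers=10, n_alleles=4, seed=0):
    """Draw synthetic unphased genotypes; every number here is made up."""
    rng = np.random.default_rng(seed)
    # One haplotype-frequency vector per (population, marker).
    freqs = rng.dirichlet(np.ones(n_alleles), size=(n_pops, n_markers))
    rows, labels = [], []
    for pop in range(n_pops):
        for _ in range(n_per_pop):
            row = []
            for marker in range(n_markers):
                a, b = sorted(rng.choice(n_alleles, size=2, p=freqs[pop, marker]))
                row.append(f"{a}/{b}")
            rows.append(row)
            labels.append(pop)
    return np.array(rows), np.array(labels)


def compare_classifiers(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated macro F1 for each model."""
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        # Encoding inside the pipeline means it is refit on each training fold.
        pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro").mean()
    return scores


X, y = simulate_genotypes()
scores = compare_classifiers(X, y)
for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```

If the `xgboost` package is installed, its scikit-learn-compatible `XGBClassifier` could be added to the `models` dictionary in the same way. The scores printed here describe only the simulated data and say nothing about the study's results.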

      Published in Forensic Science International: Genetics
