Sequence diversity of the uniparentally transmitted portions of the genome in the resident population of Catalonia

Open Access. Published: September 27, 2022. DOI: https://doi.org/10.1016/j.fsigen.2022.102783

      Highlights

      • Sequenced the whole mtDNA in 808 individuals, 8 Mb of Y chromosome in 399.
      • Samples from population residing in Catalonia, regardless of ancestry.
      • Increased informativeness over mtDNA control region or Y-STR haplotypes.
      • Non-European sequences could represent recent or ancient migrations.
      • ∼22% of the individuals carry at least one heteroplasmic site in their mtDNA.

      Abstract

      Genomic reference databases of resident populations are available for different countries and regions. Since they represent the whole genetic diversity of a geographical region, they have wide applications, from biomedical studies to forensic identification. Uniparentally transmitted portions of the genome specifically are highly suitable for kinship analyses, mixed DNA cases and geographical ancestry inference. We have sampled 808 individuals currently residing in Catalonia within the GCAT cohort, from which we have generated 808 high-quality whole mitochondrial DNA (mtDNA) genomes and 399 sequences of the male-specific part of the Y chromosome (MSY). We observe higher genetic diversity than in classical population genetics datasets. We tested the robustness of whole sequences for unequivocal identification and found that they have higher resolution than the mitochondrial control region and Y chromosome short tandem repeats (Y-STRs), and that most of the variants they present are at low frequencies, which increases the discrimination capacity between individuals. These results confirm the forensic applicability of whole uniparental sequences and provide one of the largest high-quality reference datasets ever published.

      1. Introduction

      Uniparentally transmitted portions of the genome have been extensively used in forensic studies. They are a powerful tool to infer biogeographical origin, kinship or paternity, and even identification. The mitochondrial genome has become of special relevance when dealing with highly-degraded samples (e.g. ancient bones and hair shafts), where the extraction of mitochondrial DNA (mtDNA) is possible even when autosomal DNA is hard to retrieve, because each cell contains a high number of mtDNA copies. Due to the development of new sequencing protocols, complete mtDNA sequences are increasingly available and have proved to outperform control region-based analyses with a higher fine-scale resolution [
      • Nilsson M.
      • Andréasson-Jansson H.
      • Ingman M.
      • Allen M.
      Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis.
      ,
      • Just R.S.
      • et al.
      Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
      ,
      • Parson W.
      • et al.
      DNA Commission of the International society for forensic genetics: revised and extended guidelines for mitochondrial DNA typing.
      ]. The non-recombining part of the Y chromosome (MSY) is the region that does not recombine with the pseudoautosomal parts of the X chromosome. Within this region, 8.9 Mb of sequence are single-copy and much easier to map and to call variation on than the rest. These unique regions of the MSY have been widely used in population genetic studies [
      • García-Fernández C.
      • et al.
      Sex-biased patterns shaped the genetic history of Roma.
      ,
      • Pinotti T.
      • et al.
      Y Chromosome sequences reveal a short beringian standstill, rapid expansion, and early population structure of native American Founders.
      ,
      • Solé-Morata N.
      • et al.
      Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81).
      ]. The advent of cost-effective techniques to retrieve the MSY, such as target enrichment or flow cytometry, is increasing the availability of reference datasets for different populations [
      • Kutanan W.
      • et al.
      Contrasting maternal and paternal genetic variation of hunter-gatherer groups in Thailand.
      ]. These techniques have also improved the extraction of Y chromosome information from ancient or degraded samples [
      • Petr M.
      • et al.
      The evolutionary history of neanderthal and denisovan Y chromosomes.
      ], increasing the potential for using whole MSY sequences in forensic studies [
      • de Knijff P.
      On the forensic use of Y-chromosome polymorphisms.
      ]. Combining the analysis of both uniparentally transmitted portions of the genome could complete the information in complex cases, such as those involving degraded or mixed DNA samples.
      Catalonia occupies the NE corner of the Iberian Peninsula, bordering France and Andorra to the North, the Spanish regions of Aragon and Valencia to the West and South, and the Mediterranean Sea to the East. It covers 32,108 km² and has a population of 7,727,029 (2020). The distinct Catalan culture and identity is mostly based on the Catalan language, which is also spoken, in different varieties, in SW France, Valencia, and the Balearic Islands. The region has historically been a population crossroads: the local Iron-Age Iberians had contacts with Greek settlers and Phoenician merchants, but were brought into the Roman dominion in the 2nd century BCE. After the Germanic invasions, the region was completely or partly under Muslim, North African control between the 8th and the 12th centuries CE. A dynastic union brought the Catalan counties (and foremost among them, the earldom of Barcelona) under the kingdom of Aragon, which, between the 13th and 15th centuries, expanded southwards towards Valencia and eastwards to the Western Mediterranean, including southern Italy, Corsica, Sardinia, Sicily, and briefly Athens. The late 16th and early 17th centuries saw an influx of French migrants fleeing the Wars of Religion, who assimilated into the local population even though they may have amounted to up to a quarter of it [
      • Nadal J.
      • Giralt E.
      • Braudel F.
      La population catalane de 1553 à 1717. L’immigration française et les autres facteurs de son développement. VI e Section, Centre de Recherches historiques, coll. " Démographie et Sociétés ", III.
      ]. Catalonia industrialized earlier and more intensively than the rest of Spain, which attracted immigration.
      The genetic profile of the Catalan population has been previously characterized in autosomal SNP-array population genetic studies of the Iberian Peninsula [
      • Bycroft C.
      • et al.
      Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula.
      ]. As for mitochondrial DNA, 121 mitogenomes from Catalonia were sequenced as part of a large Spanish dataset [
      • Silva M.
      • et al.
      Biomolecular insights into North African-related ancestry, mobility and diet in eleventh-century Al-Andalus.
      ], and showed haplogroup frequencies that were similar to the countrywide averages. In previous publications, hypervariable region I (and, occasionally, hypervariable region II) was sequenced and showed haplotype and haplogroup distributions within the range of Western European diversity [
      • Plaza S.
      • et al.
      Joining the pillars of Hercules: mtDNA sequences show multidirectional gene flow in the western Mediterranean.
      ,
      • Crespillo M.
      • et al.
      Mitochondrial DNA sequences for 118 individuals from northeastern Spain.
      ,
      • Côrte-Real H.B.
      • et al.
      Genetic diversity in the Iberian Peninsula determined from mitochondrial sequence analysis.
      ]. As for the Y chromosome, a survey of the MSY haplogroups in the Iberian Peninsula showed the North-to-South gradients that were also prominent in autosomal SNPs [
      • Bycroft C.
      • et al.
      Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula.
      ]. A notable feature of haplogroup frequencies in Catalonia is the high frequency of R1b-DF27 [
      • Solé-Morata N.
      • et al.
      Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ.
      ], and, in particular, of R1b-M167 [
      • Hurles M.E.
      • et al.
      Recent male-mediated gene flow over a linguistic barrier in Iberia, suggested by analysis of a Y-chromosomal DNA polymorphism.
      ]. Over 60 SNPs and the Yfiler™ Y-STR kit were typed in > 1500 Catalan men in a study of surnames [
      • Solé-Morata N.
      • Bertranpetit J.
      • Comas D.
      • Calafell F.
      Y-chromosome diversity in Catalan surname samples: insights into surname origin and frequency.
      ]. Other studies of Y-STRs in Catalonia can be found in references [
      • Gené M.
      • et al.
      Haplotype frequencies of eight Y-chromosome STR loci in Barcelona (North-East Spain).
      ,
      • Pérez-Lezaun A.
      • et al.
      Population genetics of Y-chromosome short tandem repeats in humans.
      ].
      However, the studies cited above were carried out on samples of autochthonous Catalans, which represent only a fraction of the actual resident population in the region. Several waves of migration from elsewhere in Spain (particularly in the 1920s, 1950s and 1960s) and from abroad (mostly from Morocco, Romania and Latin America, from ∼1995–2008) have contributed a non-negligible part of the population currently residing in Catalonia. Today 16.0% of the Catalan residents were born elsewhere in Spain (and probably as many are second- and third-generation immigrants) and 20.4% were born abroad (data from the Catalan Institute of Statistics, www.idescat.cat). For forensic purposes, reference databases need to cover the present genetic diversity of the population. In this study we aim to describe the current genetic diversity of the Catalan resident population, obtaining a comprehensive database for future studies. To do so we analyse the mtDNA and MSY sequences from a cohort of 808 residents in Catalonia, also assessing the quality and suitability of the whole sequences to unequivocally identify individuals.

      2. Materials and methods

      2.1 Sampling

      The samples used in this study are a subset of the GCAT Genomes For Life Cohort [
      • Obón-Santacana M.
      • et al.
      GCAT=Genomes for life: a prospective cohort study of the genomes of Catalonia.
      ,
      • Galván-Femenía I.
      • et al.
      Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort.
      ] (see full details in www.genomesforlife.com). The participants were blood donors with access to the public national healthcare system, recruited from the general population (2014–2017) with the only restriction of having lived for at least five years in Catalonia and being aged between 40 and 65 years. All participants who agreed to be part of the study provided informed consent and were asked to sign a consent agreement. This work was approved by Hospital Germans Trias i Pujol IRB, ref. PI-19–081, on April 5th, 2019.

      2.2 Sequencing

      A random, gender-balanced sample of 808 individuals was selected for whole-genome sequencing. Sequencing data for the selected individuals were obtained in FASTQ format as parallel short-read sequences on a HiSeq 4000 sequencer (Illumina; 30X coverage, read length 150 bp, insert size 600 bp). FASTQ files are deposited in the European Genome-Phenome Archive (EGA, EGAS00001003018) [
      • Obón-Santacana M.
      • et al.
      GCAT=Genomes for life: a prospective cohort study of the genomes of Catalonia.
      ].

      2.3 Sequence preprocessing

      We mapped the raw sequencing reads to the human reference genome hs37d5 including rCRS mtDNA reference [
      • Andrews R.M.
      • et al.
      Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
      ], using the BWA mem algorithm [
      • Li H.
      • Durbin R.
      Fast and accurate short read alignment with Burrows-Wheeler transform.
      ]. Only those reads mapping to the mtDNA and the Y chromosome (Ychr) were selected; PCR duplicates were removed and base quality scores were recalibrated using Picard tools and GATK v3.7-0 [
      • McKenna A.
      • et al.
      The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data.
      ]. Following GATK best practices [
      • DePristo M.A.
      • et al.
      A framework for variation discovery and genotyping using next-generation DNA sequencing data.
      ] we then called the sequence variants using HaplotypeCaller and GenotypeGVCFs GATK tools [
      • McKenna A.
      • et al.
      The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data.
      ].
      The mtDNA variant calls were manually curated using the BAM files and the Integrative Genomics Viewer v 2.12.2 [
      • Robinson J.T.
      • et al.
      Integrative genomics viewer.
      ], with special attention to unexpected variants or missing calls. The final mtDNA dataset underwent EMPOP [
      • Parson W.
      • Dür A.
      EMPOP—a forensic mtDNA database.
      ] quality control (EMPOP accession id EMP00860) and contains 808 high quality complete sequences with 1767 polymorphic sites in 16,569 base pairs (Table S1). Analyses performed using the whole mtDNA sequence include the range 1–16,569 and control region analyses used nucleotide positions 1–576 plus 16024–16569.
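      Because the mtDNA molecule is circular, the control region defined above wraps around the origin of the rCRS numbering. The following minimal sketch shows one way to separate control-region from coding-region positions using the coordinates given in the text; the function names are illustrative and are not part of the authors' pipeline.

```python
# Sketch: partition rCRS positions into control region and coding region.
# Coordinates (1-based, inclusive) come from the text; names are hypothetical.
MT_LENGTH = 16569
CONTROL_REGION = [(16024, 16569), (1, 576)]  # two blocks spanning the origin


def in_control_region(pos: int) -> bool:
    """Return True if a 1-based rCRS position lies in the control region."""
    if not 1 <= pos <= MT_LENGTH:
        raise ValueError(f"position {pos} is outside 1-{MT_LENGTH}")
    return any(start <= pos <= end for start, end in CONTROL_REGION)


def control_region_length() -> int:
    """Total number of positions covered by the control-region blocks."""
    return sum(end - start + 1 for start, end in CONTROL_REGION)


def split_variants(positions):
    """Split variant positions into (control_region, coding_region) lists."""
    control = [p for p in positions if in_control_region(p)]
    coding = [p for p in positions if not in_control_region(p)]
    return control, coding


if __name__ == "__main__":
    print(control_region_length())
    print(split_variants([73, 263, 750, 1438, 16519]))
```

      With these coordinates the control region covers 546 + 576 = 1,122 of the 16,569 positions.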
      We filtered the Ychr variants using VariantFiltration according to GATK best practices recommendations [
      • DePristo M.A.
      • et al.
      A framework for variation discovery and genotyping using next-generation DNA sequencing data.
      ], applying a coverage filter threshold between half and double of the average coverage [
      • Mondal M.
      • et al.
      Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese.
      ]. All the analyses were restricted to the 8.97 Mb of high quality regions of the Y chromosome as defined by Wei et al. [
      • Wei W.
      • et al.
      A calibrated human Y-chromosomal phylogeny based on resequencing.
      ]. The final MSY dataset is composed of 399 male sequences and 23,594 variant positions.
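      The coverage filter described above (between half and double the average coverage) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the depths are taken to be per-site values within one sample, the bounds are treated as inclusive, and the function names are hypothetical.

```python
# Sketch of a depth filter keeping sites with coverage between half and
# double the average. Assumptions: per-site depths, inclusive bounds.
from statistics import mean


def coverage_bounds(site_depths):
    """Return (lower, upper) bounds as half and double the mean depth."""
    avg = mean(site_depths)
    return avg / 2, avg * 2


def filter_by_coverage(site_depths):
    """Return the indices of sites whose depth lies within the bounds."""
    lower, upper = coverage_bounds(site_depths)
    return [i for i, depth in enumerate(site_depths) if lower <= depth <= upper]


if __name__ == "__main__":
    depths = [30, 28, 5, 32, 90, 31]  # toy per-site depths, mean = 36
    print(coverage_bounds(depths))
    print(filter_by_coverage(depths))
```

      In the toy example the site with depth 5 is discarded as under-covered and the site with depth 90 as over-covered, the latter being the typical signature of reads from repeated sequence collapsing onto one locus.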
      The assignment of haplogroups was performed with yHaplo [
      • Poznik G.D.
      Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men.
      ] for the MSY, and EMMA [
      • Röck A.W.
      • Dür A.
      • van Oven M.
      • Parson W.
      Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA).
      ] based on phylotree build 17 [
      • van Oven M.
      • Kayser M.
      Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.
      ] for the mtDNA. Haplogroup assignment was subsequently manually curated, and, in the case of the MSY, independently verified by YHRD (www.yhrd.org) [
      • Willuweit S.
      • Roewer L.
      The new Y chromosome haplotype reference database.
      ] with accession number YA004753.

      2.4 Quality control analyses

      The quality of the mapping process and of the variant calling of the uniparental sequences was assessed on the BAM and VCF files, respectively. As measures of mapping quality, we estimated the mapping efficiency (number of mapped reads / total number of reads), the mean read length and the strand balance (forward strand read depth / total read depth), considering strand bias to be present when the strand balance falls outside the 0.3–0.7 interval [
      • García Ó.
      • Alonso S.
      • Huber N.
      • Bodner M.
      • Parson W.
      Forensically relevant phylogeographic evaluation of mitogenome variation in the Basque Country.
      ]. We also estimated the proportion of PCR duplicates using SAMtools [
      • Li H.
      • et al.
      The sequence alignment/map format and SAMtools.
      ], and the coverage per site using VCFtools [
      • Danecek P.
      • et al.
      The variant call format and VCFtools.
      ]. We next analysed the quality of the MSY variant calling by calculating the genotype missingness per site and per sample, the genotype quality (phred-scaled score of the confidence of the genotype assignment), and the SNP quality (phred-scaled score of the confidence that the site is a variant) of each variant using VCFtools [
      • Danecek P.
      • et al.
      The variant call format and VCFtools.
      ]. mtDNA variant calling was manually curated as explained above.
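      The two ratio-based mapping metrics defined above reduce to simple arithmetic once read counts and strand-specific depths are available. The sketch below assumes those counts have already been extracted from the BAM files (for example with SAMtools); the function names are illustrative only.

```python
# Sketch of the mapping QC ratios defined in the text. Input counts are
# assumed to have been extracted from the BAM files beforehand.

def mapping_efficiency(mapped_reads: int, total_reads: int) -> float:
    """Number of mapped reads divided by total number of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def strand_balance(forward_depth: int, reverse_depth: int) -> float:
    """Forward-strand read depth divided by total read depth."""
    total = forward_depth + reverse_depth
    if total <= 0:
        raise ValueError("total depth must be positive")
    return forward_depth / total


def has_strand_bias(balance: float, low: float = 0.3, high: float = 0.7) -> bool:
    """Flag strand bias when the balance falls outside the 0.3-0.7 interval."""
    return not (low <= balance <= high)


if __name__ == "__main__":
    print(mapping_efficiency(950, 1000))
    for fwd, rev in [(50, 50), (20, 80), (75, 25)]:
        balance = strand_balance(fwd, rev)
        print(fwd, rev, balance, has_strand_bias(balance))
```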

      2.5 Statistical analyses

      We constructed two different reference datasets, one for each uniparentally transmitted portion of the genome. For the MSY we used 54 Spanish (IBS, from the 1000 Genomes Project [
      • Byrska-Bishop M.
      • et al.
      High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
      ]), 15 Spanish Basques [
      • Bergström A.
      • et al.
      Insights into human genetic variation and population history from 929 diverse genomes.
      ], 53 Tuscan (TSI) [
      • Byrska-Bishop M.
      • et al.
      High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
      ] and 27 other Italians [
      • Bergström A.
      • et al.
      Insights into human genetic variation and population history from 929 diverse genomes.
      ] (7 Bergamese, 6 Tuscan and 14 Sardinian), 11 French [
      • Bergström A.
      • et al.
      Insights into human genetic variation and population history from 929 diverse genomes.
      ], 46 British (BRI) [
      • Byrska-Bishop M.
      • et al.
      High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
      ], 7 Orcadian [
      • Bergström A.
      • et al.
      Insights into human genetic variation and population history from 929 diverse genomes.
      ], 60 European-Americans (CEU) [
      • Byrska-Bishop M.
      • et al.
      High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
      ], 38 Finns (FIN) [
      • Byrska-Bishop M.
      • et al.
      High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
      ], and 12 samples from North Africa [
      • Serra-Vidal G.
      • et al.
      Heterogeneity in palaeolithic population continuity and neolithic expansion in North Africa.
      ] (2 Algerian non-Imazighen, 2 Tunisian Imazighen, 2 Algerian Imazighen Zenata, 2 Egyptian non-Imazighen, 2 Moroccan non-Imazighen, 1 Saharawi and 1 Tunisian non-Imazighen). Samples from the Human Genome Diversity Project were downloaded in VCF format and lifted to the hs37d5 reference with Picard tools LiftoverVcf, and samples from Serra-Vidal et al. were downloaded in FASTQ format and processed with the same pipeline as the GCAT cohort. For the mtDNA, we included the following datasets as reference: 101 individuals from Zamora (Spain) [
      • Ramos A.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], 57 Sephardic Portuguese [
      • Nogueiro I.
      • Teixeira J.
      • Amorim A.
      • Gusmão L.
      • Alvarez L.
      Echoes from Sepharad: signatures on the maternal gene pool of crypto-Jewish descendants.
      ], 83 US European-descent individuals (USEuropeans1) [
      • King J.L.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ], 263 US European-descent individuals (USEuropeans2) [
      • Just R.S.
      • et al.
      Full mtGenome reference data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations.
      ], 96 Hungarians [
      • Malyarchuk B.
      • et al.
      Whole mitochondrial genome diversity in two Hungarian populations.
      ], 225 Serbians [
      • Davidovic S.
      • et al.
      Complete mitogenome data for the Serbian population: the contribution to high-quality forensic databases.
      ], 206 Armenians [
      • Margaryan A.
      • et al.
      Eight millennia of matrilineal genetic continuity in the South Caucasus.
      ], and 177 Spanish Basques (Basques1) [
      • García Ó.
      • Alonso S.
      • Huber N.
      • Bodner M.
      • Parson W.
      Forensically relevant phylogeographic evaluation of mitogenome variation in the Basque Country.
      ]. All these datasets are deposited in EMPOP and underwent the corresponding EMPOP [
      • Parson W.
      • Dür A.
      EMPOP—a forensic mtDNA database.
      ] quality controls (mtDNA reference dataset 1) (Table 1). To increase geographic coverage of the reference datasets, we also downloaded from GenBank in FASTA format the following datasets, all of which were published as population studies: 181 French [
      • Hartmann A.
      • et al.
      Validation of microarray-based resequencing of 93 worldwide mitochondrial genomes.
      ,
      • Behar D.M.
      • et al.
      The Basque paradigm: genetic evidence of a maternal continuity in the Franco-Cantabrian region since pre-Neolithic times.
      ,
      • Ingman M.
      • Kaessmann H.
      • Pääbo S.
      • Gyllensten U.
      Mitochondrial genome variation and the origin of modern humans.
      ,
      • Lippold S.
      • et al.
      Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences.
      ,
      • Gómez-Carballa A.
      • et al.
      Genetic continuity in the Franco-Cantabrian region: new clues from autochthonous mitogenomes.
      ,
      • Guillet V.
      • et al.
      Adenine nucleotide translocase is involved in a mitochondrial coupling defect in MFN2-related Charcot-Marie-Tooth type 2A disease.
      ], 204 North Africans (32 Mozabites, 19 Egyptian non-Imazighen, 48 Moroccan non-Imazighen, 47 Moroccan Imazighen, and 52 Tunisian non-Imazighen [
      • Hartmann A.
      • et al.
      Validation of microarray-based resequencing of 93 worldwide mitochondrial genomes.
      ,
      • Lippold S.
      • et al.
      Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences.
      ,
      • Olivieri A.
      • et al.
      The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa.
      ,
      • Behar D.M.
      • et al.
      The dawn of human matrilineal diversity.
      ,
      • Musilová E.
      • et al.
      Population history of the Red Sea--genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup.
      ,
      • Ennafaa H.
      • et al.
      Mitochondrial DNA haplogroup H structure in North Africa.
      ,
      • Maca-Meyer N.
      • González A.M.
      • Larruga J.M.
      • Flores C.
      • Cabrera V.M.
      Major genomic mitochondrial lineages delineate early human expansions.
      ,
      • Maca-Meyer N.
      • et al.
      Mitochondrial DNA transit between West Asia and North Africa inferred from U6 phylogeography.
      ,
      • Cerný V.
      • et al.
      Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia.
      ,
      • Pala M.
      • et al.
      Mitochondrial haplogroup U5b3: a distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians.
      ,
      • Pennarun E.
      • et al.
      Divorcing the Late Upper Palaeolithic demographic histories of mtDNA haplogroups M1 and U6 in Africa.
      ,
      • Pereira L.
      • et al.
      Population expansion in the North African late Pleistocene signalled by mitochondrial DNA haplogroup U6.
      ,
      • Achilli A.
      • et al.
      Saami and Berbers--an unexpected mitochondrial DNA link.
      ,
      • Costa M.D.
      • et al.
      Data from complete mtDNA sequencing of Tunisian centenarians: testing haplogroup association and the ‘golden mean’ to longevity.
      ]), 65 Portuguese [
      • Pereira L.
      • et al.
      Population expansion in the North African late Pleistocene signalled by mitochondrial DNA haplogroup U6.
      ,
      • Pereira L.
      • Gonçalves J.
      • Bandelt H.-J.
      Mutation C11994T in the mitochondrial ND4 gene is not a cause of low sperm motility in Portugal.
      ], 352 Italians and 28 Sardinians [
      • Lippold S.
      • et al.
      Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences.
      ,
      • Gómez-Carballa A.
      • et al.
      Genetic continuity in the Franco-Cantabrian region: new clues from autochthonous mitogenomes.
      ,
      • Olivieri A.
      • et al.
      The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa.
      ,
      • Pala M.
      • et al.
      Mitochondrial haplogroup U5b3: a distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians.
      ,
      • Brisighelli F.
      • et al.
      The Etruscan timeline: a recent Anatolian connection.
      ,
      • Ermini L.
      • et al.
      Complete mitochondrial genome sequence of the Tyrolean Iceman.
      ,
      • Pichler I.
      • et al.
      Genetic structure in contemporary South Tyrolean isolated populations revealed by analysis of Y-chromosome, mtDNA, and Alu polymorphisms. 2006.
      ,
      • Santoro A.
      • et al.
      Evidence for sub-haplogroup h5 of mitochondrial DNA as a risk factor for late onset Alzheimer’s disease.
      ,
      • Zaragoza M.V.
      • Brandon M.C.
      • Diegoli M.
      • Arbustini E.
      • Wallace D.C.
      Mitochondrial cardiomyopathies: how to identify candidate pathogenic mutations by mitochondrial DNA sequencing, MITOMASTER and phylogeny.
      ,
      • Achilli A.
      • et al.
      Mitochondrial DNA backgrounds might modulate diabetes complications rather than T2DM as a whole.
      ,
      • Raule N.
      • et al.
      The co-occurrence of mtDNA mutations on different oxidative phosphorylation subunits, not detected by haplogroup analysis, affects human longevity and is population specific.
      ,
      • Bodner M.
      • et al.
      Helena, the hidden beauty: resolving the most common West Eurasian mtDNA control region haplotype by massively parallel sequencing an Italian population sample.
      ,
      • NCBI Resource Coordinators
      Database resources of the national center for biotechnology information.
      ,
      • Clark K.
      • Karsch-Mizrachi I.
      • Lipman D.J.
      • Ostell J.
      • Sayers E.W.
      GenBank.
      ] (mtDNA reference dataset 2). For the mtDNA control region analyses, we included an additional set of reference datasets with control region sequences deposited at EMPOP (Table 1).
      Table 1. Genetic diversity metrics for the mtDNA GCAT cohort and mtDNA reference dataset 1. N Haps = number of haplotypes. π = nucleotide diversity. S = number of segregating sites. MPD = mean pairwise differences. RMP = Random Match Probability for the whole mtDNA sequence. N shared Hap = Number of shared haplotypes between Catalonia and reference datasets. Freq. shared Hap = Frequency of shared haplotypes between Catalonia and reference datasets. Number of haplotypes and RMP calculated considering or not heteroplasmic positions (indicated with a and b respectively). MPD, S, and π calculated without considering heteroplasmic positions, insertions or deletions. A. Metrics using the whole mtDNA sequence (1–16,569). B. Metrics from the control region (16024–576). See Table S3 for diversity measures of the mtDNA reference dataset 2.

      A) Whole mtDNA sequence
      Population | N | N Haps (a) | N Haps (b) | π | S | MPD | RMP (a) | RMP (b) | N shared Hap | Freq. shared Hap | EMPOP | Reference
      Catalonia | 808 | 777 | 759 | 0.0017 ± 0.0008 | 1685 | 28.9 ± 12.6 | 0.0014 | 0.0014 | - | - | EMP00860 | present study
      Basques1 | 177 | 149 | 149 | 0.0015 ± 0.0007 | 490 | 24.1 ± 10.6 | 0.0095 | 0.0095 | 10 | 0.1525 | EMP00756 | García O et al. 2020
      Zamora | 101 | 98 | 96 | 0.0018 ± 0.0009 | 540 | 30.0 ± 13.2 | 0.0105 | 0.0109 | 4 | 0.0495 | EMP00555 | Ramos A et al. 2013
      Sephardic_Portuguese | 57 | 47 | 47 | 0.0017 ± 0.0009 | 279 | 28.7 ± 12.7 | 0.0274 | 0.0274 | 2 | 0.0351 | EMP00619 | Nogueiro I et al. 2015
      USEuropeans1 | 83 | 83 | 83 | 0.0019 ± 0.0009 | 492 | 31.3 ± 13.8 | 0.012 | 0.012 | 1 | 0.0120 | EMP00659 | King JL et al. 2014
      USEuropeans2 | 263 | 261 | 258 | 0.0019 ± 0.0009 | 938 | 30.7 ± 13.4 | 0.0039 | 0.0040 | 0 | 0.0000 | EMP00689 | Just RS et al. 2015
      Serbian | 225 | 212 | 211 | 0.0017 ± 0.0008 | 773 | 27.5 ± 12.1 | 0.0050 | 0.0050 | 0 | 0.0000 | EMP00739 | Davidovic S et al. 2020
      Hungarian | 96 | 91 | 91 | 0.0018 ± 0.0009 | 480 | 29.1 ± 12.82 | 0.0115 | 0.0115 | 1 | 0.0044 | EMP00735 | Malyarchuk B et al. 2018
      Armenian | 206 | 187 | 187 | 0.0021 ± 0.001 | 909 | 34.17 ± 14.9 | 0.0058 | 0.0058 | 0 | 0.0000 | EMP00740 | Margaryan A et al. 2017

      B) Control Region
      Population | N | N Haps (a) | N Haps (b) | π | S | MPD | RMP (a) | RMP (b) | N shared Hap | Freq. shared Hap | EMPOP | Reference
      Catalonia | 808 | 662a | 626 | 0.0078 ± 0.004 | 262 | 8.8 ± 4.1 | 0.0033 | 0.0042 | - | - | EMP00860 | present study
      Mallorca | 79 | 72 | 72 | 0.0077 ± 0.004 | 98 | 8.6 ± 4.0 | 0.0155 | 0.0155 | 15 | 0.2031 | EMP00672 | Ferragut JC, et al. 2015
      Xuetes | 104 | 60 | 60 | 0.0085 ± 0.004 | 93 | 8.4 ± 3.9 | 0.0201 | 0.0201 | 10 | 0.194 | EMP00672 | Ferragut JC, et al. 2015
      Basques1 | 177 | 109 | 109 | 0.0063 ± 0.003 | 115 | 7.1 ± 3.4 | 0.0182 | 0.0182 | 26 | 0.3979 | EMP00756 | García O et al. 2020
      Basques2 | 106 | 70 | 70 | 0.0075 ± 0.004 | 93 | 8.4 ± 3.9 | 0.0201 | 0.0201 | 24 | 0.4661 | EMP00365 | Cardoso S et al. 2012
      Basques3 | 158 | 97 | 88 | 0.0071 ± 0.004 | 101 | 8.0 ± 3.7 | 0.0173 | 0.018 | 21 | 0.3478 | EMP00668 | Palencia-Madrid L et al. 2017
      Zamora | 101 | 89 | 86 | 0.0084 ± 0.004 | 124 | 9.4 ± 4.4 | 0.0130 | 0.0136 | 16 | 0.2376 | EMP00555 | Ramos A et al. 2013
      PasValley | 61 | 34 | 34 | 0.0067 ± 0.003 | 60 | 7.4 ± 3.5 | 0.0523 | 0.0523 | 6 | 0.2334 | EMP00400 | Cardoso S et al. 2010
      Portuguese1 | 292 | 248 | 248 | 0.0082 ± 0.004 | 177 | 9.2 ± 4.2 | 0.0047 | 0.0047 | 30 | 0.1919 | EMP00292 EMP00552–554 | Marques SL et al. 2015
      Portuguese2 | 121 | 75 | 75 | 0.0071 ± 0.004 | 101 | 8.0 ± 3.7 | 0.0200 | 0.0200 | 13 | 0.2729 | EMP00617 | Mairal Q et al. 2013
      Sephardic_Portuguese | 57 | 40 | 40 | 0.0084 ± 0.004 | 124 | 9.4 ± 4.4 | 0.0130 | 0.0130 | 6 | 0.1577 | EMP00619 | Nogueiro I et al. 2015
      USEuropeans1 | 83 | 81 | 81 | 0.0089 ± 0.005 | 118 | 10.0 ± 4.6 | 0.0130 | 0.0130 | 9 | 0.1342 | EMP00659 | King JL et al. 2014
      USEuropeans2 | 263 | 241 | 237 | 0.0086 ± 0.004 | 176 | 9.6 ± 4.4 | 0.0048 | 0.0051 | 31 | 0.1757 | EMP00689 | Just RS et al. 2015
      Dutch | 678 | 504 | 496 | 0.0080 ± 0.004 | 116 | 9.0 ± 4.2 | 0.0038 | 0.0040 | 40 | 0.1901 | EMP00666 | Chaitanya L et al. 2016
      Germany | 100 | 91 | 90 | 0.0082 ± 0.004 | 114 | 9.2 ± 4.3 | 0.0122 | 0.0126 | 15 | 0.202 | EMP00020 | Brandstätter A et al. 2006
      Austria | 273 | 227 | 222 | 0.0078 ± 0.004 | 167 | 8.7 ± 4.0 | 0.0060 | 0.0063 | 31 | 0.1846 | EMP00001 | Brandstätter A et al. 2007
      Croatian | 200 | 182 | 181 | 0.0086 ± 0.004 | 176 | 9.6 ± 4.4 | 0.0048 | 0.0061 | 17 | 0.1155 | EMP00738 | Barbarić L et al. 2020
      Serbian | 225 | 191 | 191 | 0.0079 ± 0.004 | 165 | 8.9 ± 4.1 | 0.0062 | 0.0062 | 23 | 0.1789 | EMP00739 | Davidovic S et al. 2020
      Hungarian | 96 | 83 | 83 | 0.0080 ± 0.004 | 116 | 9.0 ± 4.2 | 0.0148 | 0.0148 | 9 | 0.1263 | EMP00735 | Malyarchuk B et al. 2018
      Greece | 319 | 256 | 246 | 0.0084 ± 0.004 | 124 | 9.4 ± 4.4 | 0.0050 | 0.0061 | 18 | 0.1033 | EMP00026 | Irwin J et al. 2008
      Armenian | 206 | 177 | 177 | 0.0098 ± 0.005 | 197 | 11.0 ± 5.0 | 0.0066 | 0.0066 | 8 | 0.0636 | EMP00740 | Margaryan A et al. 2017
      Lebanon | 195 | 177 | 172 | 0.0091 ± 0.005 | 175 | 10.2 ± 4.7 | 0.0061 | 0.0064 | 7 | 0.0672 | EMP00717 | Zimmermann B et al. 2019
      Cyprus | 91 | 76 | 76 | 0.0092 ± 0.005 | 179 | 10.3 ± 4.7 | 0.0168 | 0.0168 | 5 | 0.1444 | EMP00016 | Irwin J et al. 2008
      Jordan | 213 | 185 | 182 | 0.0117 ± 0.006 | 194 | 13.2 ± 5.9 | 0.0063 | 0.0065 | 1 | 0.0047 | EMP00333 | Zimmermann B et al. 2019
      Iraqi | 203 | 148 | 148 | 0.0095 ± 0.005 | 134 | 10.7 ± 4.9 | 0.0086 | 0.0086 | 13 | 0.0846 | EMP00814 | Jabbar SM et al. 2021
      Bahrain | 202 | 184 | 182 | 0.0092 ± 0.005 | 179 | 10.3 ± 4.7 | 0.0063 | 0.0065 | 13 | 0.1197 | EMP00012 | Zimmermann B et al. 2019

      The forensic informativity of the haploid sequences generated was measured with the random match probability (RMP), i.e. the probability that two randomly selected sequences carry identical haplotypes by chance [
      • Prieto L.
      • et al.
      The GHEP-EMPOP collaboration on mtDNA population data--A new resource for forensic casework.
      ,
      • Stoneking M.
      • Hedgecock D.
      • Higuchi R.G.
      • Vigilant L.
      • Erlich H.A.
      Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes.
      ], for the complete mtDNA and control region using Arlequin 3.5 [
      • Excoffier L.
      • Lischer H.E.L.
      Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
      ]. To characterize population diversity, we estimated the following metrics for the Catalan and reference datasets with Arlequin 3.5 [
      • Excoffier L.
      • Lischer H.E.L.
      Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
      ]: number of haplotypes, nucleotide diversity (π), number of segregating sites (S) and mean pairwise differences (MPD). The same software was used to compute the mismatch distributions for the Catalan MSY and mtDNA datasets. In mtDNA, indels and heteroplasmic sites were excluded from the π and MPD calculations.
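      The statistics named above can be illustrated with a minimal sketch. It is not Arlequin's implementation: RMP is taken here as the plain sum of squared haplotype frequencies, which is an assumption (the exact estimator and any sample-size correction are not specified in this section), and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample (assumed estimator)."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def segregating_sites(haplotypes):
    """Number of alignment columns showing more than one allele (S)."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of sequences (MPD)."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)


def nucleotide_diversity(haplotypes):
    """MPD divided by the number of aligned sites (pi)."""
    return mean_pairwise_differences(haplotypes) / len(haplotypes[0])


# Hypothetical toy alignment: four aligned sequences of four sites each.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(random_match_probability(toy))   # two copies of one haplotype, two singletons
print(segregating_sites(toy))          # first and last columns vary
print(mean_pairwise_differences(toy))  # 7 differences over 6 pairs
print(nucleotide_diversity(toy))
```

      In this sketch, indels and heteroplasmic sites would have to be removed from the alignment beforehand, as described for the π and MPD calculations.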
      We then focused on the frequency distribution of the MSY and mtDNA variants, specifically on the proportion of singletons and the minor allele frequency spectrum in the GCAT cohort using VCFtools [
      • Danecek P.
      • et al.
      The variant call format and VCFtools.
      ] and Arlequin 3.5 [
      • Excoffier L.
      • Lischer H.E.L.
      Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
      ].
      Heteroplasmic sites at mtDNA were called by visual inspection of the BAM files with IGV 2.12.2. A site was deemed heteroplasmic if it contained at least two alleles with a quality-weighted frequency above 10%. For calculations such as π and MPD that require a single allele per site, the most frequent allele at heteroplasmic sites was used.
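      The 10% criterion can be sketched as follows. Sites were called by visual inspection, so this is only an illustration of the threshold rule; the weighting scheme (each read contributes its base quality) is an assumption, and the input layout and function names are hypothetical.

```python
def quality_weighted_frequencies(base_quals):
    """base_quals maps each allele to the list of base qualities of reads carrying it.

    Assumption: a read's weight is its Phred base quality.
    """
    total = sum(sum(quals) for quals in base_quals.values())
    if total == 0:
        return {}
    return {allele: sum(quals) / total for allele, quals in base_quals.items()}


def call_site(base_quals, threshold=0.10):
    """Flag a site as heteroplasmic if at least two alleles exceed the threshold."""
    freqs = quality_weighted_frequencies(base_quals)
    if not freqs:
        return {"heteroplasmic": False, "alleles": [], "major": None}
    passing = sorted(a for a, f in freqs.items() if f > threshold)
    # The most frequent allele is the one used for pi and MPD.
    major = max(freqs, key=freqs.get)
    return {"heteroplasmic": len(passing) >= 2, "alleles": passing, "major": major}


# Hypothetical pileups: 15% and 5% minor allele, all reads at quality 30.
site_15 = {"A": [30] * 85, "G": [30] * 15}
site_05 = {"A": [30] * 95, "G": [30] * 5}
print(call_site(site_15))
print(call_site(site_05))
```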

      3. Results and discussion

      3.1 Quality assessment of the uniparental sequences

      We first assessed the quality of the sequencing and mapping processes. The proportion of PCR duplicates in the whole-genome raw reads was 9% (Fig. S1). The mean read length was 150 base pairs for mtDNA and 141 for the Ychr, within the range expected for the sequencing platform used (see Materials and Methods). The mean strand balance per site was 0.498 and 0.499 for mtDNA and Ychr sequences, respectively (Fig. S2), and, in both cases, the fraction of sites with a strand balance between 0.3 and 0.7 was > 99%. Strand balance is expected to be around 0.5 when both DNA strands are sequenced equally, and a balanced value provides a higher degree of confidence in the base calling [
      • Seo S.
      • et al.
      Underlying data for sequencing the mitochondrial genome with the massively parallel sequencing platform ion torrent™ PGM™.
      ]. The mean numbers of mapped reads per sample were 324,843 and 7,059,683 for mtDNA and the Ychr, respectively (Fig. S3), and the mapping efficiency was close to the maximum of 1 (0.997 and 0.996, respectively). The mean coverage per site was 1937X for mtDNA and 6.2X for the Ychr (Fig. S4), which is in both cases sufficient to perform reliable subsequent analyses. Overall, all the metrics point to a high quality of both the sequencing and mapping processes.
      We then evaluated the quality of the MSY variant calling process. The mean genotype quality score per sample is above 98 (Fig. S5), on a scale of 0–99 [
      • DePristo M.A.
      • et al.
      A framework for variation discovery and genotyping using next-generation DNA sequencing data.
      ], and the mean SNP quality score is 1878.6 (Fig. S6). SNP quality scores tend to scale to high values when quality increases [
      • DePristo M.A.
      • et al.
      A framework for variation discovery and genotyping using next-generation DNA sequencing data.
      ]. Given the stringent filtering applied, each sample has a mean of 194.9 missing sites (Fig. S7), and on average each site is missing in 2.19 samples (Fig. S8). However, in relative terms, these are 0.002% of the sites and 0.30% of the samples, which implies that variant calling in our dataset is fairly robust. A distribution of the missing sites along the MSY sequence is shown in Fig. S9. We could call all sites for all samples in mtDNA, with no missing values.

      3.2 Genetic diversity and forensic informativity parameters in Catalonia

      Among the 808 mtDNA and 399 MSY sequences in our dataset, we found 777 (759 when heteroplasmies are not considered) and 399 distinct haplotypes, respectively (Table 1, Table 2).
      Table 2. Genetic diversity metrics for the Catalan and reference datasets in the MSY. N Haplotypes = number of haplotypes. π = nucleotide diversity (×1000). S = number of segregating sites. MPD = mean pairwise differences. The North African group includes the following non-Imazighen populations: Algerian, Tunisian, Egyptian, Libyans, Moroccans, and one Saharawi, together with Zenata and Tunisian Imazighen.
      Population | N | N Haplotypes | π | S | MPD
      Catalonia | 399 | 399 | 0.0622 ± 0.0266 | 23594 | 553.9 ± 237.0
      Spanish (IBS) | 54 | 54 | 0.0128 ± 0.0062 | 23806 | 115.3 ± 55.4
      Basque | 15 | 15 | 0.0266 ± 0.0121 | 1443 | 236.9 ± 107.5
      French | 11 | 11 | 0.0562 ± 0.0261 | 1979 | 499.8 ± 231.7
      Tuscan | 6 | 6 | 0.0722 ± 0.0361 | 1737 | 642.0 ± 321.3
      Tuscan (TSI) | 53 | 53 | 0.0214 ± 0.0103 | 26221 | 191.9 ± 92.3
      Bergamese | 7 | 7 | 0.0396 ± 0.0194 | 1043 | 352.3 ± 172.2
      Sardinian | 14 | 14 | 0.0505 ± 0.0230 | 1985 | 449.2 ± 204.4
      British (GBR) | 46 | 46 | 0.0092 ± 0.0044 | 19839 | 82.6 ± 39.9
      Orcadian | 7 | 7 | 0.0420 ± 0.0205 | 977 | 373.2 ± 182.4
      European-American (CEU) | 60 | 60 | 0.0155 ± 0.0074 | 24093 | 138.8 ± 66.6
      Finnish (FIN) | 38 | 38 | 0.0112 ± 0.0055 | 17308 | 100.9 ± 48.9
      North African non-Imazighen | 7 | 7 | 0.0399 ± 0.0195 | 1081 | 355.3 ± 173.6
      North African | 12 | 12 | 0.0251 ± 0.0116 | 1127 | 224.0 ± 103.3
      For the complete mitochondrial genome, π and MPD are 0.0019 ± 0.001 and 28.9 ± 12.6, respectively (Fig. S10). These values fall within the range for forensic-quality mitogenomes in Europe (Table 1A); for instance, almost all MPD values range between 27 and 31, the exceptions being the isolated Basques (24.1) and the Armenians (34.2). RMP is inversely correlated with sample size (whole mitogenome, Spearman’s rho = −0.733, p = 0.025; control region, Spearman’s rho = −0.733, p = 0.000021), which is expected, since singleton haplotypes have a lower frequency in larger population samples and thus contribute much less to RMP. Thus, given that our sample is almost four times larger than any EMPOP-curated, whole-mtDNA West Eurasian population sample, its RMP (0.0014) is the lowest, which increases the informativity of this sample in a forensic context. When comparing whole mtDNA haplotypes (Table S2), matches between Catalan residents and non-Iberians are rare (only one each in European-Americans and in Hungarians), while we found 2–10 matches with other Iberian samples.
      Comparisons with non-forensic general population samples (reference dataset 2), chosen to improve the geographical coverage of the reference datasets (Table S3), may be harder to interpret. π and MPD values are again similar between Catalan residents and other West Eurasians, with the exception of the French, with lower values, and the North Africans, where sub-Saharan admixture carrying the divergent L-haplogroup haplotypes can explain the higher values [
      • Behar D.M.
      • et al.
      The dawn of human matrilineal diversity.
      ]. Haplotype matches (Table S4) are rare, ranging from four with an Italian sample and six with a French one to 24 with a large (N = 1023) Spanish sample. Still, in relative terms, individuals carrying these matching haplotypes have joint frequencies of 1–4% in the respective reference populations.
      Since EMPOP-curated, whole-mtDNA West Eurasian population samples are still scarce (at the time of writing, only eight population samples meet these criteria), we also compared the Catalan mtDNA control region (CR) sequences with reference datasets. In Table 1B, we present the informativity statistics for Catalans and 25 other datasets. At a different scale, informativity statistics for the CR replicate the patterns observed for the whole mitogenomes. Thus, π and MPD values are again similar between Catalan residents and other West Eurasians, although MPD in the CR has a range of 8–10 differences, about three times smaller than in the whole mtDNA. At 0.0033, Catalans have the lowest RMP, followed closely by a large Dutch dataset (N = 678, RMP = 0.0038). As expected, many more haplotype matches are found for the CR (Table S5) than for the whole mitogenome: haplotypes shared with Catalan residents make up 10–20% of the individuals in non-Iberian European datasets, while in Iberia that frequency rises to 30–40%. Again, these patterns are similar in reference dataset 2 (Table S3B, S6).
      The Catalan resident mitogenomes present significant ϕst values with all reference populations except Zamora (Spain), probably due to the low sample size of the latter. ϕst values range between 0.0007 and 0.0046 with European populations, and reach 0.01473 with Armenians (Table S7A). Considering the CR of forensic-grade mtDNA datasets, ϕst values (Table S7B) tend to be higher. They range from 0 with Mallorca (which was repopulated in the Middle Ages from Catalonia) to 0.0078, although some isolated populations such as the Pas Valley (Spain) and the Xueta crypto-Jews are more differentiated from Catalans. For Middle Eastern populations, ϕst distances to Catalans are larger and range from 0.012 to 0.040. For reference dataset 2, patterns are similar. For whole mitogenomes, ϕst values range between 0.0011 and 0.0520 with SW European populations, and they are clearly larger with North African populations (0.0351–0.2888) (Table S8A); similar values can be found for the control region (Table S8B).
      As for the MSY, reference datasets are generally scarcer than for mtDNA, and, unlike for mtDNA, no forensic standard for quality control and for reporting haplotypes has been developed for MSY whole sequences. Moreover, batch effects and variations in sequencing coverage may also bias comparisons. Thus, although we report below informativity statistics for some MSY datasets, comparisons should be taken with care. In particular, population samples from the 1000 Genomes Project show a high number of polymorphic sites, but lower nucleotide diversity. As mentioned above, all 399 male individuals in our sample carried different MSY haplotypes, and thus, RMP = 0. Indeed, that was the case for all reference populations (Table 2) and no haplotype matches were observed across populations either. The RMP using 23 Y-STRs at a global scale is around 5.63 × 10⁻⁵ [
      • Purps J.
      • et al.
      A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.
      ]; however, it has been shown that this probability is different in each geographical region, probably due to differences in social structure [
      • Iacovacci G.
      • et al.
      Forensic data and microvariant sequence characterization of 27 Y-STR loci analyzed in four Eastern African countries.
      ]. As far as we are aware, no West Eurasian MSY sequence dataset that represents a random population sample is as large as ours; in fact, the sample sizes in Table 2 are an order of magnitude lower. Catalan residents had one of the largest nucleotide diversity values, with 0.0622 × 10⁻³, while the range in West Eurasia was 0.009–0.075 × 10⁻³. This may reflect a sampling bias, since most other datasets were composed of autochthonous rather than resident individuals. Note that these values are much lower than those observed for mtDNA, a phenomenon that has been previously described and attributed to the smaller effective population size of males compared to females and to various historical bottlenecks in many populations [
      • Poznik G.D.
      Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men.
      ]. Still, the size of the target regions sequenced (8.93 Mb) compensates for the lower nucleotide diversity, and the MPD between MSY sequences reaches 553.9 in Catalan residents, with a range of 80–650 in the reference datasets (Fig. S10). Finally, ϕst values between Catalan and other populations (Table S9) range between 0.04 and 0.08 with other West European populations, and are larger with Sardinians (0.33) and North Africans (0.46–0.60), which reflects their different haplogroup composition [
      • Francalacci P.
      • et al.
      Peopling of three Mediterranean islands (Corsica, Sardinia, and Sicily) inferred by Y-chromosome biallelic variability.
      ,
      • Contu D.
      • et al.
      Y-chromosome based evidence for pre-neolithic origin of the genetically homogeneous but diverse Sardinian population: inference for association scans.
      ,
      • Triki-Fendri S.
      • et al.
      Paternal lineages in Libya inferred from Y-chromosome haplogroups.
      ,
      • Fadhlaoui-Zid K.
      • et al.
      Sousse: extreme genetic heterogeneity in North Africa.
      ,
      • Bekada A.
      • et al.
      Introducing the Algerian mitochondrial DNA and Y-chromosome profiles into the North African landscape.
      ,
      • Fadhlaoui-Zid K.
      • et al.
      Genetic structure of Tunisian ethnic groups revealed by paternal lineages.
      ].
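      The statement that the size of the target region compensates for the low nucleotide diversity rests on the relation MPD ≈ π × L, where L is the number of sites compared. As a quick consistency check, and only as a sketch, the values in Table 2 can be used to recover the implied L for each dataset; the variable and function names are hypothetical, and small deviations from 8.93 Mb are expected from rounding in the table.

```python
# (pi x 1000, MPD) pairs copied from Table 2.
TABLE2 = {
    "Catalonia": (0.0622, 553.9),
    "Spanish (IBS)": (0.0128, 115.3),
    "Basque": (0.0266, 236.9),
    "French": (0.0562, 499.8),
    "Tuscan": (0.0722, 642.0),
    "Tuscan (TSI)": (0.0214, 191.9),
    "Bergamese": (0.0396, 352.3),
    "Sardinian": (0.0505, 449.2),
    "British (GBR)": (0.0092, 82.6),
    "Orcadian": (0.0420, 373.2),
    "European-American (CEU)": (0.0155, 138.8),
    "Finnish (FIN)": (0.0112, 100.9),
    "North African non-Imazighen": (0.0399, 355.3),
    "North African": (0.0251, 224.0),
}

TARGET_BP = 8.93e6  # size of the sequenced MSY target regions given in the text


def implied_length(pi_x1000, mpd):
    """Number of sites implied by MPD = pi * L, with pi reported x1000."""
    return mpd / (pi_x1000 / 1000)


for population, (pi_x1000, mpd) in TABLE2.items():
    length = implied_length(pi_x1000, mpd)
    print(f"{population}: {length / 1e6:.2f} Mb")
```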

      3.3 Frequency distribution of variants

      Given the lack of recombination, the basic unit of inheritance in mtDNA and the MSY is the haplotype; for instance, the relevant figure needed when assessing a match is the haplotype frequency. Still, mtDNA and MSY haplotypes are (mostly) composed of SNPs, and, ultimately, the power of mtDNA and MSY to discriminate is based on the variants they contain. The number of singletons in the mtDNA and MSY GCAT dataset is 864 (1.07 singletons per sample) and 16,352 (40.98 singletons per sample) respectively (Fig. S11). mtDNA values are in agreement with previous estimates using complete mitogenomes (1.09 singletons per sample) [
      • Andersen M.M.
      • Balding D.J.
      How many individuals share a mitochondrial genome?.
      ]. The mtDNA control region contains more singletons per site (0.106) than the coding region (0.048), since the former is the most polymorphic region in the mtDNA [
      • Stoneking M.
      • Hedgecock D.
      • Higuchi R.G.
      • Vigilant L.
      • Erlich H.A.
      Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes.
      ]. The mean number of singletons per individual in the MSY present in our dataset is much higher than the one found in previous studies in the Spanish population (15 singletons per sample) [
      • Batini C.
      • et al.
      Large-scale recent expansion of European patrilineages shown by population resequencing.
      ], probably because that study included, besides the Spanish sample, populations from across Eurasia.
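      Counting singletons can be sketched as below. The paper used VCFtools and Arlequin; this is not their code. A singleton is assumed here to be a site whose minor allele is carried by exactly one sample, and the haploid 0/1 genotype matrix is a hypothetical toy example. The last two lines simply redo the per-sample arithmetic from the counts given in the text.

```python
def count_singletons(genotypes):
    """Count sites whose minor allele occurs in exactly one sample.

    genotypes: one list per sample, with 0 (reference) or 1 (alternate) per site.
    """
    n_samples = len(genotypes)
    singletons = 0
    for site in zip(*genotypes):  # iterate over sites (columns)
        alt = sum(site)
        if min(alt, n_samples - alt) == 1:
            singletons += 1
    return singletons


# Hypothetical haploid genotypes: 4 samples x 5 sites.
toy = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(count_singletons(toy), count_singletons(toy) / len(toy))

# Per-sample rates from the counts reported in the text.
print(round(864 / 808, 2))    # mtDNA
print(round(16352 / 399, 2))  # MSY
```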
      We next examined the minor-allele frequency (MAF) spectrum, i.e. the proportion of variants in each frequency bin, in mtDNA and MSY sequences. In mtDNA and the MSY, respectively, 77.96% and 68.9% of variants have a MAF ≤ 0.5%, 18.19% and 24.9% have a MAF between 0.5% and 5%, and the remaining 3.85% and 6.14% have a MAF ≥ 5%. Most variants are rare in the mtDNA and MSY sequences as previously suggested [
      • Yamamoto K.
      • et al.
      Genetic and phenotypic landscape of the mitochondrial genome in the Japanese population.
      ], which further exemplifies the ability to differentiate two samples using complete uniparental sequences. Previous studies have already shown the potential of complete mitogenomes for forensics [
      • Nilsson M.
      • Andréasson-Jansson H.
      • Ingman M.
      • Allen M.
      Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis.
      ,
      • Just R.S.
      • et al.
      Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
      ,
      • García Ó.
      • Alonso S.
      • Huber N.
      • Bodner M.
      • Parson W.
      Forensically relevant phylogeographic evaluation of mitogenome variation in the Basque Country.
      ], while Y-STRs are the most analysed markers for the MSY [
      • Purps J.
      • et al.
      A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.
      ,
      • Wróbel M.
      • Parys-Proszek A.
      • Marcińska M.
      • Kupiec T.
      Y chromosome sequence variation of common forensic STR markers and their flanking regions among Polish population.
      ,
      • Dente Â.
      • et al.
      Study of Y chromosome markers with forensic relevance in Lisbon immigrants from African countries – Allelic variants study.
      ]. However, MSY sequences present high levels of rare variants and singletons and can be a potential tool with forensic applications to infer paternal biogeographic ancestry, and in cases in which close male-line relatives are involved.
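      The three frequency bins used above can be reproduced with a short sketch. The bin boundaries follow the text (MAF ≤ 0.5%, between 0.5% and 5%, ≥ 5%); the function name, the bin labels and the toy allele counts are hypothetical, and integer arithmetic is used only to avoid floating-point ties at the boundaries.

```python
def maf_spectrum(alt_counts, n_samples):
    """Proportion of variants per MAF bin, given alternate-allele counts per site."""
    bins = {"<=0.5%": 0, "0.5-5%": 0, ">=5%": 0}
    for alt in alt_counts:
        minor = min(alt, n_samples - alt)  # minor allele count at this site
        if minor * 200 <= n_samples:       # minor / n <= 0.005
            bins["<=0.5%"] += 1
        elif minor * 20 < n_samples:       # minor / n < 0.05
            bins["0.5-5%"] += 1
        else:
            bins[">=5%"] += 1
    total = len(alt_counts)
    return {label: count / total for label, count in bins.items()}


# Hypothetical alternate-allele counts at six variable sites in 400 samples.
spectrum = maf_spectrum([1, 2, 3, 20, 100, 390], n_samples=400)
print(spectrum)
```

      In the toy example, the site with 390 alternate alleles falls in the middle bin because its minor (reference) allele is carried by 10 of 400 samples.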

      3.4 Heteroplasmy

      Massively parallel sequencing affords unprecedented power not only to detect heteroplasmic sites at mtDNA, but also to produce precise estimates of the frequency of each allele at each site. Here, we report point heteroplasmies rather than the length heteroplasmies that are exceedingly common in poly-C tracts such as 303–309 or 311–315. A total of 207 heteroplasmic sites were detected in 181 individuals; that is, 22.4% of the individuals in our sample carried at least one heteroplasmic site. Twenty-two individuals carried two heteroplasmies, and three heteroplasmies were detected in each of two individuals. The distribution of the number of heteroplasmies per individual did not deviate from a Poisson distribution (χ² = 0.211, p = 0.646), which suggests that the accumulation of heteroplasmies is independent of any individual background factors. Heteroplasmies were found in 163 different nucleotide positions (Table S10), and tended to accumulate in known sites such as 16183M (12 samples), 16182M (9 samples) and 16192Y (6 samples). 43% of the heteroplasmies occurred in the control region, which spans only 6.8% of the mtDNA sequence. Notably, 30 out of 89 heteroplasmies in the control region were found between positions 16182 and 16192, but even if those were discounted, the control region would still account for one third of the total heteroplasmies. Most (86.5%) of the heteroplasmies are transitions, and this figure reaches 95.7% if 16183M and 16182M are omitted.
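      The Poisson comparison can be sketched from the counts given above: 808 individuals, of whom 181 carry at least one heteroplasmy, 22 carry two and 2 carry three, which leaves 627 with none and 157 with exactly one. The pooling of classes (0, 1 and ≥ 2, giving one degree of freedom) is an assumption, since the exact procedure is not described, so the statistic computed here need not equal the reported value.

```python
import math

# Individuals by number of heteroplasmies, derived from the counts in the text.
observed = {0: 808 - 181, 1: 181 - 22 - 2, 2: 22, 3: 2}
n_individuals = sum(observed.values())
n_heteroplasmies = sum(k * count for k, count in observed.items())
lam = n_heteroplasmies / n_individuals  # Poisson mean estimated from the data


def poisson_pmf(k, mean):
    """Probability of observing k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Assumed pooling into three classes: 0, 1, and 2 or more heteroplasmies.
obs = [observed[0], observed[1], observed[2] + observed[3]]
exp0 = n_individuals * poisson_pmf(0, lam)
exp1 = n_individuals * poisson_pmf(1, lam)
exp = [exp0, exp1, n_individuals - exp0 - exp1]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
# Three classes minus one estimated parameter minus one leaves 1 degree of
# freedom, for which the chi-square upper tail is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(n_heteroplasmies, round(lam, 4))
print(round(chi2, 3), round(p_value, 3))
```

      Under these assumptions the test does not reject the Poisson model, in line with the conclusion drawn in the text.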
      The alleles found at a heteroplasmic site can be classified as identical to the reference (in this case, the revised Cambridge Reference Sequence [
      • Andrews R.M.
      • et al.
      Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
      ]) or alternate. The distribution of the frequency of the alternate allele is shown in Fig. S12A; it can be seen that it is skewed towards lower values. This is expected, since the random change in frequencies at a heteroplasmic site from generation to generation may cause a heteroplasmy to revert to its original, homoplasmic state, and, thus, it is likelier that a low frequency of the derived allele be observed, either because it is recent or because it has reverted to low values from higher frequencies. Obviously, the distinction we made between reference and alternate alleles cannot be equated with ancestral and derived alleles in this mutation process. Alternatively, we can note simply the frequency of the rarer allele at a heteroplasmic site; the distribution of the minor allele frequency (Fig. S12B) also skews towards low values.

      3.5 Haplogroup description

      The mtDNA GCAT dataset contains 378 different haplogroups (Table S1, S11), with superhaplogroup H being the most common (43.7%), followed by U (16.3%) and K (9.7%) (Fig. 1). In West Eurasia, superhaplogroup H is the most frequent [
      • Achilli A.
      • et al.
      The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool.
      ], and within it, H1 and H3 have the highest frequencies in Catalonia (16.9% and 7.9%) and in the Iberian Peninsula [
      • Achilli A.
      • et al.
      The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool.
      ]. The U superhaplogroup is highly common in Europe and Southwest Asia, with U5 (9.7%), U2e (2.3%) and U4 (1.6%) the most common U haplogroups in our dataset and in Europe [
      • Malyarchuk B.
      • et al.
      The Peopling of Europe from the Mitochondrial Haplogroup U5 Perspective.
      ,
      • Richards M.
      • et al.
      Tracing European founder lineages in the Near Eastern mtDNA pool.
      ,
      • Quintana-Murci L.
      • et al.
      Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor.
      ]. Superhaplogroup K is the third most common GCAT lineage and it is mainly distributed in Europe [
      • Costa M.D.
      • et al.
      A substantial prehistoric European ancestry amongst Ashkenazi maternal lineages.
      ]. When grouping samples according to place of birth, a similar haplogroup distribution is found among autonomous communities in Spain, suggesting that GCAT genetic diversity is not structured by geography (Table S11). As mentioned above, the GCAT cohort includes present-day residents in Catalonia (without considering genealogical background), which may explain why 3.9% of the samples carry an mtDNA haplogroup with an origin outside Europe. Native American haplogroups A2 + 64, B2b, B2c1, C1b, and C1c [
      • Achilli A.
      • et al.
      The Phylogeny of the Four Pan-American MtDNA Haplogroups: Implications for Evolutionary and Disease Studies.
      ] are identified in 5 participants born in Ecuador, Mexico and Colombia. North African haplogroups are found in 11 individuals born in Spain: U6 is mainly distributed in West North Africa with U6b1a being a Canary Islands-specific lineage [
      • Maca-Meyer N.
      • et al.
      Mitochondrial DNA transit between West Asia and North Africa inferred from U6 phylogeography.
      ]; M1a is found mostly in East Africa and the Mediterranean region of North Africa [
      • Pereira L.
      • et al.
      Population expansion in the North African late Pleistocene signalled by mitochondrial DNA haplogroup U6.
      ]. L1, L2 and L3 lineages are found in 15 GCAT samples (14 individuals born in Spain and 1 in Argentina) and are widespread in Africa [
      • Salas A.
      • et al.
      The making of the African mtDNA landscape.
      ]. The presence of haplogroups with a non-European origin can be the result of recent movement of people (e.g. Native American haplogroups) or more ancient gene flow (e.g. North African lineages).
      Fig. 1. Mitochondrial haplogroup (superhaplogroup) relative frequencies for the GCAT cohort.
      A total of 121 different haplogroups were assigned in the MSY GCAT dataset (Table S12). The haplogroup with the highest frequency is R1b1a2a1 (R-L51) and its derivatives (66.4%), of which R-DF27 represents 64.1% (or 42.6% of the total sample) (Fig. 2). R-DF27 is highly prevalent in Western Europe, and especially in the Iberian Peninsula [
      • Solé-Morata N.
      • et al.
      Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ.
      ], where its parental haplogroup arrived at the beginning of the Bronze Age as a result of the Bell Beaker expansion [
      • Olalde I.
      • et al.
      The genomic history of the Iberian Peninsula over the past 8000 years.
      ]. The second lineage in frequency is E1b1 (E-P2, 8.8%), of which E-V13 (31.4%) and E-M183 (25.7%) are the most abundant (Fig. 2, Table S12). These lineages are very common in the Mediterranean; E-V13 specifically in Greece and the Balkans [
      • Cruciani F.
      • et al.
      Tracing past human male movements in Northern/Eastern Africa and Western Eurasia: New Clues from Y-Chromosomal Haplogroups E-M78 and J-M12.
      ] and E-M183 in North African populations [
      • Solé-Morata N.
      • et al.
      Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81).
      ]. Similar patterns of haplogroup frequencies are found when dividing samples by the autonomous community of birth (Table S12). Six individuals (1.5%) carried MSY lineages that are not common in Europe, a proportion that is lower than that in mtDNA (3.9%; p = 0.0238, χ² test). Five out of these six individuals were born in Catalonia and carried R1a1a1b2-Z93 haplotypes, a haplogroup that has its origin in India and has been related to the Roma diaspora [
      • Underhill P.A.
      • et al.
      The phylogenetic and geographic structure of Y-chromosome haplogroup R1a.
      ]. The remaining individual was born in Mexico and carried a Q1a2a1a1-M3 lineage, which is frequent among Native Americans [
      • Grugni V.
      Analysis of the human Y-chromosome haplogroup Q characterizes ancient population movements in Eurasia and the Americas.
      ] (his mtDNA sequence also carried a Native American lineage, B2c1). The fact that we observe just one Native American Y haplogroup in the GCAT dataset does not imply a higher female Latin American migration rate into Catalonia. It is instead consistent with the previously described sex-biased process during the colonization of the Americas, where gene flow was mainly driven by European men and Native American women [
      • D’Atanasio E.
      • et al.
      Y haplogroup diversity of the Dominican Republic: reconstructing the effect of the European colonisation and the trans-Atlantic slave trades.
      ].
      Fig. 2. Y-chromosome haplogroup relative frequencies for the GCAT cohort. Haplogroup longform follows the Y Chromosome Consortium
      [
      • Consortium T.Y.C.
      A Nomenclature System for the Tree of Human Y-Chromosomal Binary Haplogroups.
      ]
      and ISOGG 2016
      [

      ISOGG. International Society of Genetic Genealogy (2016): 〈http://www.isogg.org/〉.

      ]
      nomenclature. A representative SNP for each haplogroup is shown between brackets.

      4. Conclusion

      We report here for the first time 808 mtDNA and 399 MSY complete sequences in individuals from the GCAT cohort. This dataset aims to increase the representation of the population residing in Catalonia, and it could be useful as a forensic reference database given the high quality of the sequences and the high number of rare variants. In addition, this dataset illustrates the potential of whole mtDNA and MSY sequences, which have lower random match probabilities than mtDNA control regions and Y-STRs. We caution, however, that it might not include all the genetic variability in the resident population, since particular self-reported ethnic minorities (e.g. Asian populations, Romani people), age ranges and socioeconomic groups are underrepresented in the dataset.

      Acknowledgments

      This work was supported by the Spanish Ministry of Economy and Competitiveness and Agencia Estatal de Investigación (grant numbers CGL2016–75389-P (MINEICO/FEDER, UE), PID2019–106485GB-I00/AEI/10.13039/501100011033 (MINEICO), and “Unidad María de Maeztu” (CEX2018–000792-M)) to FC and DC, and by the Agència de Gestió d’Ajuts Universitaris i de la Recerca (Generalitat de Catalunya, grant 2017SGR00702). Computing time at Barcelona Supercomputing Centre was granted by Red Española de Supercomputación (BCV-2019–3-00002). NF-P was supported by a FPU17/03501 fellowship. This study makes use of data generated by the GCAT=Genomes for Life. Cohort study of the Genomes of Catalonia, Fundacio IGTP with registration number PI-2018–03. IGTP is part of the CERCA Program / Generalitat de Catalunya. GCAT is supported by Acción de Dinamización del ISCIII-MINECO, the Ministry of Health of the Generalitat of Catalunya (ADE 10/00026), and the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) (2017-SGR 529). We thank IGTP's scientific director Dr. Jordi Barretina for his support. This study was carried out using anonymized data provided by the Catalan Agency for Quality and Health Assessment, within the framework of the PADRIS Program. The authors of this study would like to acknowledge all GCAT project investigators who contributed to the generation of the GCAT data. A full list of the investigators is available from www.genomesforlife.com. We thank the Blood and Tissue Bank from Catalonia (BST) and all the GCAT volunteers that participated in the study.

      Appendix A. Supplementary material

      References

        • Nilsson M.
        • Andréasson-Jansson H.
        • Ingman M.
        • Allen M.
        Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis.
        Forensic Sci. Int. Genet. 2008; 2: 1-8
        • Just R.S.
        • et al.
        Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
        Forensic Sci. Int. Genet. 2014; 10: 73-79
        • Parson W.
        • et al.
        DNA Commission of the International society for forensic genetics: revised and extended guidelines for mitochondrial DNA typing.
        Forensic Sci. Int. Genet. 2014; 13: 134-142
        • García-Fernández C.
        • et al.
        Sex-biased patterns shaped the genetic history of Roma.
        Sci. Rep. 2020; 10: 14464
        • Pinotti T.
        • et al.
        Y Chromosome sequences reveal a short beringian standstill, rapid expansion, and early population structure of native American Founders.
        Curr. Biol. 2019; 29: 149-157.e3
        • Solé-Morata N.
        • et al.
        Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81).
        Sci. Rep. 2017; 7: 15941
        • Kutanan W.
        • et al.
        Contrasting maternal and paternal genetic variation of hunter-gatherer groups in Thailand.
        Sci. Rep. 2018; 8: 1536
        • Petr M.
        • et al.
        The evolutionary history of neanderthal and denisovan Y chromosomes.
        Science. 2020; 369: 1653-1656
        • de Knijff P.
        On the forensic use of Y-chromosome polymorphisms.
        Genes. 2022; 13: 898
        • Nadal J.
        • Giralt E.
        • Braudel F.
        La population catalane de 1553 à 1717. L’immigration française et les autres facteurs de son développement. VI e Section, Centre de Recherches historiques, coll. " Démographie et Sociétés ", III.
        Année Sociol. (1940/1948-). 1960; 12: 266-269
        • Bycroft C.
        • et al.
        Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula.
        Nat. Commun. 2019; 10: 551
        • Silva M.
        • et al.
        Biomolecular insights into North African-related ancestry, mobility and diet in eleventh-century Al-Andalus.
        Sci. Rep. 2021; 11: 18121
        • Plaza S.
        • et al.
        Joining the pillars of Hercules: mtDNA sequences show multidirectional gene flow in the western Mediterranean.
        Ann. Hum. Genet. 2003; 67: 312-328
        • Crespillo M.
        • et al.
        Mitochondrial DNA sequences for 118 individuals from northeastern Spain.
        Int. J. Leg. Med. 2000; 114: 130-132
        • Côrte-Real H.B.
        • et al.
        Genetic diversity in the Iberian Peninsula determined from mitochondrial sequence analysis.
        Ann. Hum. Genet. 1996; 60: 331-350
        • Solé-Morata N.
        • et al.
        Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ.
        Sci. Rep. 2017; 7: 7341
        • Hurles M.E.
        • et al.
        Recent male-mediated gene flow over a linguistic barrier in Iberia, suggested by analysis of a Y-chromosomal DNA polymorphism.
        Am. J. Hum. Genet. 1999; 65: 1437-1448
        • Solé-Morata N.
        • Bertranpetit J.
        • Comas D.
        • Calafell F.
        Y-chromosome diversity in Catalan surname samples: insights into surname origin and frequency.
        Eur. J. Hum. Genet. 2015; 23: 1549-1557
        • Gené M.
        • et al.
        Haplotype frequencies of eight Y-chromosome STR loci in Barcelona (North-East Spain).
        Int. J. Leg. Med. 1999; 112: 403-405
        • Pérez-Lezaun A.
        • et al.
        Population genetics of Y-chromosome short tandem repeats in humans.
        J. Mol. Evol. 1997; 45: 265-270
        • Obón-Santacana M.
        • et al.
        GCAT=Genomes for life: a prospective cohort study of the genomes of Catalonia.
        BMJ Open. 2018; 8: e018324
        • Galván-Femenía I.
        • et al.
        Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort.
        J. Med. Genet. 2018; 55: 765-778
        • Andrews R.M.
        • et al.
        Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
        Nat. Genet. 1999; 23: 147
        • Li H.
        • Durbin R.
        Fast and accurate short read alignment with Burrows-Wheeler transform.
        Bioinforma. Oxf. Engl. 2009; 25: 1754-1760
        • McKenna A.
        • et al.
        The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
        Genome Res. 2010; 20: 1297-1303
        • DePristo M.A.
        • et al.
        A framework for variation discovery and genotyping using next-generation DNA sequencing data.
        Nat. Genet. 2011; 43: 491-498
        • Robinson J.T.
        • et al.
        Integrative genomics viewer.
        Nat. Biotechnol. 2011; 29: 24-26
        • Parson W.
        • Dür A.
        EMPOP—a forensic mtDNA database.
        Forensic Sci. Int. Genet. 2007; 1: 88-92
        • Mondal M.
        • et al.
        Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese.
        Hum. Genet. 2017; 136: 499-510
        • Wei W.
        • et al.
        A calibrated human Y-chromosomal phylogeny based on resequencing.
        Genome Res. 2013; 23: 388-395
        • Poznik G.D.
        Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men.
        bioRxiv. 2016; https://doi.org/10.1101/088716
        • Röck A.W.
        • Dür A.
        • van Oven M.
        • Parson W.
        Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA).
        Forensic Sci. Int. Genet. 2013; 7: 601-609
        • van Oven M.
        • Kayser M.
        Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.
        Hum. Mutat. 2009; 30: E386-394
        • Willuweit S.
        • Roewer L.
        The new Y chromosome haplotype reference database.
        Forensic Sci. Int. Genet. 2015; 15: 43-48
        • García Ó.
        • Alonso S.
        • Huber N.
        • Bodner M.
        • Parson W.
        Forensically relevant phylogeographic evaluation of mitogenome variation in the Basque Country.
        Forensic Sci. Int. Genet. 2020; 46: 102260
        • Li H.
        • et al.
        The sequence alignment/map format and SAMtools.
        Bioinforma. Oxf. Engl. 2009; 25: 2078-2079
        • Danecek P.
        • et al.
        The variant call format and VCFtools.
        Bioinforma. Oxf. Engl. 2011; 27: 2156-2158
        • Byrska-Bishop M.
        • et al.
        High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
        Cell. 2022; 185: 3426-3440.e19
        • Bergström A.
        • et al.
        Insights into human genetic variation and population history from 929 diverse genomes.
        Science. 2020; 367: eaay5012
        • Serra-Vidal G.
        • et al.
        Heterogeneity in Palaeolithic population continuity and Neolithic expansion in North Africa.
        Curr. Biol. 2019; 29: 3953-3959.e4
      1. Picard Tools. Broad Institute: 〈http://broadinstitute.github.io/picard/〉.

        • Ramos A.
        • et al.
        Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
        PloS One. 2013; 8: e74636
        • Nogueiro I.
        • Teixeira J.
        • Amorim A.
        • Gusmão L.
        • Alvarez L.
        Echoes from Sepharad: signatures on the maternal gene pool of crypto-Jewish descendants.
        Eur. J. Hum. Genet. 2015; 23: 693-699
        • King J.L.
        • et al.
        High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
        Forensic Sci. Int. Genet. 2014; 12: 128-135
        • Just R.S.
        • et al.
        Full mtGenome reference data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations.
        Forensic Sci. Int. Genet. 2015; 14: 141-155
        • Malyarchuk B.
        • et al.
        Whole mitochondrial genome diversity in two Hungarian populations.
        Mol. Genet. Genom. 2018; 293: 1255-1263
        • Davidovic S.
        • et al.
        Complete mitogenome data for the Serbian population: the contribution to high-quality forensic databases.
        Int. J. Leg. Med. 2020; 134: 1581-1590
        • Margaryan A.
        • et al.
        Eight millennia of matrilineal genetic continuity in the South Caucasus.
        Curr. Biol. 2017; 27: 2023-2028.e7
        • Hartmann A.
        • et al.
        Validation of microarray-based resequencing of 93 worldwide mitochondrial genomes.
        Hum. Mutat. 2009; 30: 115-122
        • Behar D.M.
        • et al.
        The Basque paradigm: genetic evidence of a maternal continuity in the Franco-Cantabrian region since pre-Neolithic times.
        Am. J. Hum. Genet. 2012; 90: 486-493
        • Ingman M.
        • Kaessmann H.
        • Pääbo S.
        • Gyllensten U.
        Mitochondrial genome variation and the origin of modern humans.
        Nature. 2000; 408: 708-713
        • Lippold S.
        • et al.
        Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences.
        Investig. Genet. 2014; 5: 13
        • Gómez-Carballa A.
        • et al.
        Genetic continuity in the Franco-Cantabrian region: new clues from autochthonous mitogenomes.
        PloS One. 2012; 7: e32851
        • Guillet V.
        • et al.
        Adenine nucleotide translocase is involved in a mitochondrial coupling defect in MFN2-related Charcot-Marie-Tooth type 2A disease.
        Neurogenetics. 2010; 11: 127-133
        • Olivieri A.
        • et al.
        The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa.
        Science. 2006; 314: 1767-1770
        • Behar D.M.
        • et al.
        The dawn of human matrilineal diversity.
        Am. J. Hum. Genet. 2008; 82: 1130-1140
        • Musilová E.
        • et al.
        Population history of the Red Sea--genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup.
        Am. J. Phys. Anthropol. 2011; 145: 592-598
        • Ennafaa H.
        • et al.
        Mitochondrial DNA haplogroup H structure in North Africa.
        BMC Genet. 2009; 10: 8
        • Maca-Meyer N.
        • González A.M.
        • Larruga J.M.
        • Flores C.
        • Cabrera V.M.
        Major genomic mitochondrial lineages delineate early human expansions.
        BMC Genet. 2001; 2: 13
        • Maca-Meyer N.
        • et al.
        Mitochondrial DNA transit between West Asia and North Africa inferred from U6 phylogeography.
        BMC Genet. 2003; 4: 15
        • Cerný V.
        • et al.
        Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia.
        Mol. Biol. Evol. 2011; 28: 71-78
        • Pala M.
        • et al.
        Mitochondrial haplogroup U5b3: a distant echo of the Epipaleolithic in Italy and the legacy of the early Sardinians.
        Am. J. Hum. Genet. 2009; 84: 814-821
        • Pennarun E.
        • et al.
        Divorcing the Late Upper Palaeolithic demographic histories of mtDNA haplogroups M1 and U6 in Africa.
        BMC Evol. Biol. 2012; 12: 234
        • Pereira L.
        • et al.
        Population expansion in the North African late Pleistocene signalled by mitochondrial DNA haplogroup U6.
        BMC Evol. Biol. 2010; 10: 390
        • Achilli A.
        • et al.
        Saami and Berbers--an unexpected mitochondrial DNA link.
        Am. J. Hum. Genet. 2005; 76: 883-886
        • Costa M.D.
        • et al.
        Data from complete mtDNA sequencing of Tunisian centenarians: testing haplogroup association and the ‘golden mean’ to longevity.
        Mech. Ageing Dev. 2009; 130: 222-226
        • Pereira L.
        • Gonçalves J.
        • Bandelt H.-J.
        Mutation C11994T in the mitochondrial ND4 gene is not a cause of low sperm motility in Portugal.
        Fertil. Steril. 2008; 89: 738-741
        • Brisighelli F.
        • et al.
        The Etruscan timeline: a recent Anatolian connection.
        Eur. J. Hum. Genet. 2009; 17: 693-696
        • Ermini L.
        • et al.
        Complete mitochondrial genome sequence of the Tyrolean Iceman.
        Curr. Biol. 2008; 18: 1687-1693
        • Pichler I.
        • et al.
        Genetic structure in contemporary South Tyrolean isolated populations revealed by analysis of Y-chromosome, mtDNA, and Alu polymorphisms. 2006.
        Hum. Biol. 2009; 81: 875-898
        • Santoro A.
        • et al.
        Evidence for sub-haplogroup H5 of mitochondrial DNA as a risk factor for late onset Alzheimer’s disease.
        PloS One. 2010; 5: e12037
        • Zaragoza M.V.
        • Brandon M.C.
        • Diegoli M.
        • Arbustini E.
        • Wallace D.C.
        Mitochondrial cardiomyopathies: how to identify candidate pathogenic mutations by mitochondrial DNA sequencing, MITOMASTER and phylogeny.
        Eur. J. Hum. Genet. 2011; 19: 200-207
        • Achilli A.
        • et al.
        Mitochondrial DNA backgrounds might modulate diabetes complications rather than T2DM as a whole.
        PloS One. 2011; 6: e21029
        • Raule N.
        • et al.
        The co-occurrence of mtDNA mutations on different oxidative phosphorylation subunits, not detected by haplogroup analysis, affects human longevity and is population specific.
        Aging Cell. 2014; 13: 401-407
        • Bodner M.
        • et al.
        Helena, the hidden beauty: resolving the most common West Eurasian mtDNA control region haplotype by massively parallel sequencing an Italian population sample.
        Forensic Sci. Int. Genet. 2015; 15: 21-26
        • NCBI Resource Coordinators
        Database resources of the National Center for Biotechnology Information.
        Nucleic Acids Res. 2018; 46: D8-D13
        • Clark K.
        • Karsch-Mizrachi I.
        • Lipman D.J.
        • Ostell J.
        • Sayers E.W.
        GenBank.
        Nucleic Acids Res. 2016; 44: D67-D72
        • Prieto L.
        • et al.
        The GHEP-EMPOP collaboration on mtDNA population data--A new resource for forensic casework.
        Forensic Sci. Int. Genet. 2011; 5: 146-151
        • Stoneking M.
        • Hedgecock D.
        • Higuchi R.G.
        • Vigilant L.
        • Erlich H.A.
        Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes.
        Am. J. Hum. Genet. 1991; 48: 370-382
        • Excoffier L.
        • Lischer H.E.L.
        Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
        Mol. Ecol. Resour. 2010; 10: 564-567
        • Seo S.
        • et al.
        Underlying data for sequencing the mitochondrial genome with the massively parallel sequencing platform Ion Torrent™ PGM™.
        BMC Genom. 2015; 16: S4
        • Purps J.
        • et al.
        A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.
        Forensic Sci. Int. Genet. 2014; 12: 12-23
        • Iacovacci G.
        • et al.
        Forensic data and microvariant sequence characterization of 27 Y-STR loci analyzed in four Eastern African countries.
        Forensic Sci. Int. Genet. 2017; 27: 123-131
        • Francalacci P.
        • et al.
        Peopling of three Mediterranean islands (Corsica, Sardinia, and Sicily) inferred by Y-chromosome biallelic variability.
        Am. J. Phys. Anthropol. 2003; 121: 270-279
        • Contu D.
        • et al.
        Y-chromosome based evidence for pre-Neolithic origin of the genetically homogeneous but diverse Sardinian population: inference for association scans.
        PloS One. 2008; 3: e1430
        • Triki-Fendri S.
        • et al.
        Paternal lineages in Libya inferred from Y-chromosome haplogroups.
        Am. J. Phys. Anthropol. 2015; 157: 242-251
        • Fadhlaoui-Zid K.
        • et al.
        Sousse: extreme genetic heterogeneity in North Africa.
        J. Hum. Genet. 2015; 60: 41-49
        • Bekada A.
        • et al.
        Introducing the Algerian mitochondrial DNA and Y-chromosome profiles into the North African landscape.
        PloS One. 2013; 8: e56775
        • Fadhlaoui-Zid K.
        • et al.
        Genetic structure of Tunisian ethnic groups revealed by paternal lineages.
        Am. J. Phys. Anthropol. 2011; 146: 271-280
        • Andersen M.M.
        • Balding D.J.
        How many individuals share a mitochondrial genome?
        PLOS Genet. 2018; 14: e1007774
        • Batini C.
        • et al.
        Large-scale recent expansion of European patrilineages shown by population resequencing.
        Nat. Commun. 2015; 6: 7152
        • Yamamoto K.
        • et al.
        Genetic and phenotypic landscape of the mitochondrial genome in the Japanese population.
        Commun. Biol. 2020; 3: 104
        • Wróbel M.
        • Parys-Proszek A.
        • Marcińska M.
        • Kupiec T.
        Y chromosome sequence variation of common forensic STR markers and their flanking regions among Polish population.
        Forensic Sci. Int. Genet. Suppl. Ser. 2019; 7: 557-560
        • Dente Â.
        • et al.
        Study of Y chromosome markers with forensic relevance in Lisbon immigrants from African countries – Allelic variants study.
        Forensic Sci. Int. Genet. Suppl. Ser. 2019; 7: 906-907
        • Achilli A.
        • et al.
        The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool.
        Am. J. Hum. Genet. 2004; 75: 910-918
        • Malyarchuk B.
        • et al.
        The peopling of Europe from the mitochondrial haplogroup U5 perspective.
        PloS One. 2010; 5: e10285
        • Richards M.
        • et al.
        Tracing European founder lineages in the Near Eastern mtDNA pool.
        Am. J. Hum. Genet. 2000; 67: 1251-1276
        • Quintana-Murci L.
        • et al.
        Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor.
        Am. J. Hum. Genet. 2004; 74: 827-845
        • Costa M.D.
        • et al.
        A substantial prehistoric European ancestry amongst Ashkenazi maternal lineages.
        Nat. Commun. 2013; 4: 2543
        • Achilli A.
        • et al.
        The phylogeny of the four pan-American mtDNA haplogroups: implications for evolutionary and disease studies.
        PloS One. 2008; 3: e1764
        • Salas A.
        • et al.
        The making of the African mtDNA landscape.
        Am. J. Hum. Genet. 2002; 71: 1082-1111
        • Olalde I.
        • et al.
        The genomic history of the Iberian Peninsula over the past 8000 years.
        Science. 2019; 363: 1230-1234
        • Cruciani F.
        • et al.
        Tracing past human male movements in northern/eastern Africa and western Eurasia: new clues from Y-chromosomal haplogroups E-M78 and J-M12.
        Mol. Biol. Evol. 2007; 24: 1300-1311
        • Underhill P.A.
        • et al.
        The phylogenetic and geographic structure of Y-chromosome haplogroup R1a.
        Eur. J. Hum. Genet. 2015; 23: 124-131
        • Grugni V.
        Analysis of the human Y-chromosome haplogroup Q characterizes ancient population movements in Eurasia and the Americas.
        BMC Biol. 2019; 17: 3
        • D’Atanasio E.
        • et al.
        Y haplogroup diversity of the Dominican Republic: reconstructing the effect of the European colonisation and the trans-Atlantic slave trades.
        Genome Biol. Evol. 2020; 12: 1579-1590
        • Y Chromosome Consortium
        A nomenclature system for the tree of human Y-chromosomal binary haplogroups.
        Genome Res. 2002; 12: 339-348
      2. ISOGG. International Society of Genetic Genealogy (2016): 〈http://www.isogg.org/〉.