Forensic Population Genetics – Original Research | Volume 14, P141-155, January 2015

Full mtGenome reference data: Development and characterization of 588 forensic-quality haplotypes representing three U.S. populations

  • Rebecca S. Just
    Correspondence
    Corresponding author at: American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States. Tel.: +1 301 257 0794.
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States

    University of Maryland, College Park, 8082 Baltimore Ave., College Park, MD 20740, United States
  • Melissa K. Scheible
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Spence A. Fast
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Kimberly Sturk-Andreaggi
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Alexander W. Röck
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Jocelyn M. Bush
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Jennifer L. Higginbotham
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Michelle A. Peck
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Joseph D. Ring
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Gabriela E. Huber
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Catarina Xavier
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Christina Strobl
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Elizabeth A. Lyons
    Footnotes
    1 Present address: Michigan State Police, 333 S. Grand Ave., Lansing, MI 48909, United States.
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Toni M. Diegoli
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
  • Martin Bodner
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Liane Fendt
    Footnotes
    2 Present address: Division of Genetic Epidemiology, Innsbruck Medical University, Schöpfstrasse 41, Innsbruck, Austria.
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Petra Kralj
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Simone Nagl
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Daniela Niederwieser
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Bettina Zimmermann
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria
  • Walther Parson
    Affiliations
    Institute of Legal Medicine, Innsbruck Medical University, Müllerstrasse 44, Innsbruck, Austria

    Penn State Eberly College of Science, 517 Thomas Building, University Park, PA 16802, United States
  • Jodi A. Irwin
    Footnotes
    3 Present address: Federal Bureau of Investigation, 2501 Investigation Parkway, Quantico, VA 22135, United States.
    Affiliations
    Armed Forces DNA Identification Laboratory, 115 Purple Heart Dr., Dover AFB, DE 19902, United States

    American Registry of Pathology, 15245 Shady Grove Rd, Suite 335, Rockville, MD 20850, United States
Open Access | Published: October 04, 2014 | DOI: https://doi.org/10.1016/j.fsigen.2014.09.021

      Highlights

      • 588 complete mtDNA haplotypes from three U.S. population groups are reported.
      • Nearly all haplotypes (>90%) were unique within each population sample.
      • Point heteroplasmy was observed in 23.8% of individuals.
      • The Sanger-based datasets were developed to current forensic quality standards.
      • The haplotypes can serve as a benchmark for evaluation of future mtGenome datasets.

      Abstract

      Though investigations into the use of massively parallel sequencing technologies for the generation of complete mitochondrial genome (mtGenome) profiles from difficult forensic specimens are well underway in multiple laboratories, the high quality population reference data necessary to support full mtGenome typing in the forensic context are lacking. To address this deficiency, we have developed 588 complete mtGenome haplotypes, spanning three U.S. population groups (African American, Caucasian and Hispanic) from anonymized, randomly-sampled specimens. Data production utilized an 8-amplicon, 135 sequencing reaction Sanger-based protocol, performed in semi-automated fashion on robotic instrumentation. Data review followed an intensive multi-step strategy that included a minimum of three independent reviews of the raw data at two laboratories; repeat screenings of all insertions, deletions, heteroplasmies, transversions and any additional private mutations; and a check for phylogenetic feasibility. For all three populations, nearly complete resolution of the haplotypes was achieved with full mtGenome sequences: 90.3–98.8% of haplotypes were unique per population, an improvement of 7.7–29.2% over control region sequencing alone, and zero haplotypes overlapped between populations. Inferred maternal biogeographic ancestry frequencies for each population and heteroplasmy rates in the control region were generally consistent with published datasets. In the coding region, nearly 90% of individuals exhibited length heteroplasmy in the 12418-12425 adenine homopolymer; and despite a relatively high rate of point heteroplasmy (23.8% of individuals across the entire molecule), coding region point heteroplasmies shared by more than one individual were notably absent, and transversion-type heteroplasmies were extremely rare. 
The ratio of nonsynonymous to synonymous changes among point heteroplasmies in the protein-coding genes (1:1.3) and average pathogenicity scores in comparison to data reported for complete substitutions in previous studies seem to provide some additional support for the role of purifying selection in the evolution of the human mtGenome. Overall, these thoroughly vetted full mtGenome population reference data can serve as a standard against which the quality and features of future mtGenome datasets (especially those developed via massively parallel sequencing) may be evaluated, and will provide a solid foundation for the generation of complete mtGenome haplotype frequency estimates for forensic applications.

      1. Introduction

      Massively parallel sequencing (MPS) technologies hold great potential for efforts to expand forensic mitochondrial DNA (mtDNA) typing beyond current capabilities. Since the first such technology was introduced in 2005 [Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., et al., "Genome sequencing in microfabricated high-density picolitre reactors"], MPS has transformed genetic data generation in many fields of research, including ancient DNA (for an overview of some ancient DNA studies that have used MPS, see Table 1 in Knapp and Hofreiter [Knapp M., Hofreiter M., "Next generation sequencing of ancient DNA: requirements, strategies and perspectives"]; and for a review of the application of MPS to mtDNA sequencing in particular, see Ho and Gilbert [Ho S.Y., Gilbert M.T., "Ancient mitogenomics"] and Paijmans et al. [Paijmans J.L., Gilbert M.T., Hofreiter M., "Mitogenomic analyses from ancient DNA"]). The advantages of MPS in comparison to traditional Sanger-type sequencing that have been exploited for analyses of ancient samples also have clear relevance to the low DNA quantity and/or quality specimens to which mtDNA typing is often applied in forensics, where typically only the control region (CR) or portions thereof are targeted due to both limited sample quantities and the enormous cost and effort required to generate Sanger-based profiles to forensic standards. Recent studies have demonstrated that (1) MPS can effectively recover complete mitochondrial genome (mtGenome) profiles even from highly damaged and degraded forensic samples [Loreille O., Koshinsky H., Fofanov V.Y., Irwin J.A., "Application of next generation sequencing technologies to the identification of highly degraded unknown soldiers’ remains"; Templeton J.E., Brotherton P.M., Llamas B., Soubrier J., Haak W., Cooper A., et al., "DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification"], and (2) full mtGenome sequencing by MPS may be cost-effective in comparison to methods currently used by the forensic community for mtDNA data generation [King J.L., LaRue B.L., Novroski N., Stoljarova M., Seo S.B., Zeng X., et al., "High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq"]. While much further work remains before MPS-based protocols (whether for mtGenome or nuclear genome typing) can be fully validated for forensic use and routinely applied to forensic casework specimens, the ongoing research into MPS for forensic application in many laboratories [Loreille O., Koshinsky H., Fofanov V.Y., Irwin J.A., "Application of next generation sequencing technologies to the identification of highly degraded unknown soldiers’ remains"; Templeton J.E., Brotherton P.M., Llamas B., Soubrier J., Haak W., Cooper A., et al., "DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification"; King J.L., LaRue B.L., Novroski N., Stoljarova M., Seo S.B., Zeng X., et al., "High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq"; Mikkelsen M., Rockenbauer E., Wächter A., Fendt L., Zimmermann B., Parson W., et al., "Application of full mitochondrial genome sequencing using 454 GS FLX pyrosequencing"; Holland M.M., McQuillan M.R., O’Hanlon K.A., "Second generation sequencing allows for mtDNA mixture deconvolution and high resolution detection of heteroplasmy"; Irwin J., Just R., Scheible M., Loreille O., "Assessing the potential of next generation sequencing technologies for missing persons identification efforts"; Van Neste C., Van Nieuwerburgh F., Van Hoofstat D., Deforce D., "Forensic STR analysis using massive parallel sequencing"; Bornman D.M., Hester M.E., Schuetter J.M., Kasoji M.D., Minard-Smith A., Barden C.A., et al., "Short-read, high-throughput sequencing technology for STR genotyping"; Parson W., Strobl C., Huber G., Zimmermann B., Gomes S.M., Souto L., et al., "Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)"; Rockenbauer E., Hansen S., Mikkelsen M., Borsting C., Morling N., "Characterization of mutations and sequence variants in the D21S11 locus by next generation sequencing"; Weber-Lehmann J., Schilling E., Gradl G., Richter D.C., Wiehler J., Rolf B., "Finding the needle in the haystack: differentiating “identical” twins in paternity testing and forensics by ultra-deep next generation sequencing"; Bintz B.J., Dixon G.B., Wilson M.R., "Simultaneous detection of human mitochondrial DNA and nuclear-inserted mitochondrial-origin sequences (NumtS) using forensic mtDNA amplification strategies and pyrosequencing technology"; Scheible M.K., Loreille O., Just R.S., Irwin J.A., "Short tandem repeat typing on the 454 platform: strategies and considerations for targeted sequencing of common forensic markers"; McElhoe J.A., Holland M.M., Makova K.D., Su M.S., Paul I.M., Baker C.H., et al., "Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq"; Mikkelsen M., Frank-Hansen R., Hansen A.J., Morling N., "Massively parallel pyrosequencing of the mitochondrial genome with the 454 methodology in forensic genetics"] clearly indicates the direction in which the field is moving.
      At present, though, one of the barriers to wider implementation of complete mtGenome typing in forensic casework is the lack of appropriate reference databases [Templeton J.E., Brotherton P.M., Llamas B., Soubrier J., Haak W., Cooper A., et al., "DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification"; Irwin J.A., Parson W., Coble M.D., Just R.S., "mtGenome reference population databases and the future of forensic mtDNA analysis"]. In forensics, weight is assigned to the results of an mtDNA match comparison by estimating the frequency of the mtDNA haplotype given a relevant population sample. While concerted efforts have been put forth in recent years to establish high-quality mtDNA control region reference datasets representing U.S. and global population groups [Parson W., Bandelt H.J., "Extended guidelines for mtDNA typing of population data in forensic science"; Irwin J.A., Saunier J.L., Strouss K.M., Sturk K.A., Diegoli T.M., Just R.S., et al., "Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation"; Parson W., Dur A., "EMPOP – a forensic mtDNA database"], similar initiatives targeting the mtDNA coding region have been lacking. Although more than 20,000 complete human mtGenome sequences have now been published (see the PhyloTree website http://www.phylotree.org/mtDNA_seqs.htm [van Oven M., Kayser M., "Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation"] for a comprehensive list of publications as of 19 February 2014), none have been developed as U.S.-wide population reference data that meet current forensic standards [Irwin J.A., Parson W., Coble M.D., Just R.S., "mtGenome reference population databases and the future of forensic mtDNA analysis"; Scientific Working Group on DNA Analysis Methods (SWGDAM), "Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories"; Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., et al., "DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing"].
      To meet the need for forensic-quality population reference data for the full mtGenome, we report here 588 mtGenome haplotypes from three U.S. populations (African American, U.S. Caucasian and U.S. Hispanic). These Sanger-based data were developed in accordance with current best practices for mtDNA data generation [Scientific Working Group on DNA Analysis Methods (SWGDAM), "Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories"; Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., et al., "DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing"] to ensure their suitability for forensic use. In this paper we report summary statistics for the complete mtGenome and evaluate the statistical weight of a previously unobserved haplotype, and we compare the composition of each population sample to previously published CR-based datasets to establish their consistency and representativeness. In addition, we examine the coding region insertion/deletion polymorphisms (indels) and the heteroplasmies detected in the haplotypes in detail to help inform future analyses and use of complete mtGenome data for forensic and other purposes.
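      As a hedged illustration of how weight might be assigned to a haplotype that has not previously been observed in a reference sample, the sketch below computes a one-sided exact binomial (Clopper-Pearson) upper confidence bound for zero observations, which has the closed form 1 − α^(1/n). The choice of this estimator, the function name and the example database sizes are assumptions made for illustration; this section does not state which estimator was applied to the data.

```python
def upper_bound_unobserved(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper confidence bound on the frequency of a
    haplotype observed 0 times in a database of n haplotypes.

    With zero observations, solving (1 - p)**n = alpha for p gives
    p = 1 - alpha**(1/n).
    """
    if n <= 0:
        raise ValueError("database size n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n)


# Hypothetical database sizes, used only to show that the bound shrinks as n grows.
for size in (100, 200, 500):
    print(size, round(upper_bound_unobserved(size), 4))
```

      Under this assumed estimator, a larger reference sample gives a smaller upper bound for an unobserved haplotype, which is one reason the size and quality of the reference data matter.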

      2. Materials and methods

      2.1 Specimens and sampling

      The samples used for this databasing initiative were anonymized blood serum specimens from the Department of Defense Serum Repository (DoDSR; [Serum specimens from the Department of Defense Serum Repository: The Armed Forces Health Surveillance Center, U.S. Department of Defense, Silver Spring, MD [November 8, 2010; August 1, 2011; and October 20, 2011]]). The 175 African-American, 275 U.S. Caucasian, and 175 U.S. Hispanic samples initially targeted for processing were selected randomly from specimens in the DoDSR collection. Specimens were received with only state and self-reported population/ethnicity information.
      This research involving human subjects, human material or human data was reviewed by the U.S. Army Medical Research and Materiel Command's Office of Research Protections, Institutional Review Board Office, and was granted an exemption from requiring ethics approval.

      2.2 mtGenome data generation

      Full mtGenome haplotypes were generated from the blood serum specimens using the protocol and high-throughput processing strategy described in Lyons et al. [Lyons E.A., Scheible M.K., Sturk-Andreaggi K., Irwin J.A., Just R.S., "A high-throughput Sanger strategy for human mitochondrial genome sequencing"], with the minor modifications described in Just et al. [Just R.S., Scheible M.K., Fast S.A., Sturk-Andreaggi K., Higginbotham J.L., Lyons E.A., et al., "Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens"]. In brief:
      Blood serum specimens were robotically transferred from tubes to 96-well plates. Genomic DNA was extracted from 100 μl of blood serum using the QIAamp 96 DNA Blood Kit (QIAGEN, Valencia, CA), and a combination of robotic pipetting and manual centrifugation. DNA was eluted from the silica columns using either 100 μl or 200 μl of TE buffer (10 mM Tris and 0.5 mM EDTA), and the eluate was evaporated to eliminate any potential ethanol carryover. DNA extracts were resuspended in 100 μl of either UV-irradiated deionized water or TE buffer. Some, but not all, DNA extracts were quantified prior to PCR, using an mtDNA quantitative PCR (qPCR) assay [Diegoli T.M., Coble M.D., Niederstatter H., Loreille O.M., Parsons T.J., "The use of a mitochondrial DNA-specific qPCR Assay to assess degradation and inhibition"] adapted from Niederstätter et al. [Niederstätter H., Kochl S., Grubwieser P., Pavlic M., Steinlechner M., Parson W., "A modular real-time PCR concept for determining the quantity and quality of human nuclear and mitochondrial DNA"].
      Amplification of the complete mtGenome was performed in eight overlapping fragments on robotic instrumentation, using the primers and conditions detailed in Lyons et al. [Lyons E.A., Scheible M.K., Sturk-Andreaggi K., Irwin J.A., Just R.S., "A high-throughput Sanger strategy for human mitochondrial genome sequencing"] and Just et al. [Just R.S., Scheible M.K., Fast S.A., Sturk-Andreaggi K., Higginbotham J.L., Lyons E.A., et al., "Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens"]. When qPCR results indicated DNA quantities less than 10 pg/μl, extract input for PCR was doubled from 3 μl to 6 μl. In some cases, such as when specimens from the same extract plate had previously exhibited evidence of inhibition, or to improve first-pass processing success for one or two of the eight mtGenome region targets with the poorest amplification efficiency among the lowest DNA quantity specimens, polymerase (AmpliTaq Gold, Life Technologies, Applied Biosystems, Foster City, CA) inputs were doubled from 2.5 to 5 units.
      Amplification success was evaluated via capillary electrophoresis using automated injection directly from the 96-well PCR plate. When only one of the eight target fragments failed to amplify for a sample, the failed PCR was repeated manually, and the successful PCR product was manually transferred to the original 96-well PCR plate for further processing. When two or more PCR failures for a single sample were encountered, typically no further attempts at amplification were made, and the sample was not carried through to sequencing. PCR product purification of successfully amplified extracts was performed enzymatically in the 96-well PCR plates.
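      The extract-input adjustment and the handling of failed amplifications described in the two paragraphs above can be sketched as simple decision rules. This is a minimal illustration only: the function names, the return labels, and the treatment of unquantified extracts are assumptions, and the text notes that samples with two or more failures were "typically" (not always) dropped.

```python
from typing import List, Optional

LOW_QUANT_PG_PER_UL = 10.0  # threshold stated in the text
N_AMPLICONS = 8             # eight overlapping mtGenome fragments


def extract_input_ul(qpcr_pg_per_ul: Optional[float]) -> float:
    """Return the PCR extract input in microlitres.

    3 ul is the default; 6 ul is used when qPCR indicates < 10 pg/ul.
    Assumption: extracts that were not quantified (None) keep the default.
    """
    if qpcr_pg_per_ul is not None and qpcr_pg_per_ul < LOW_QUANT_PG_PER_UL:
        return 6.0
    return 3.0


def triage_sample(amplified: List[bool]) -> str:
    """Decide the next step for one sample from its eight PCR outcomes."""
    if len(amplified) != N_AMPLICONS:
        raise ValueError("expected one result per amplicon (8 in total)")
    failures = amplified.count(False)
    if failures == 0:
        return "proceed_to_sequencing"
    if failures == 1:
        # A single failed fragment was re-amplified manually.
        return "repeat_failed_pcr_manually"
    # Two or more failures: typically no further amplification attempts.
    return "not_carried_forward"


print(extract_input_ul(4.2))                   # low-quantity extract
print(triage_sample([True] * 7 + [False]))     # one failed fragment
```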
      Sanger sequencing was performed in 96-well plate format on robotic instrumentation using the 135 primers and conditions described in Lyons et al. [Lyons E.A., Scheible M.K., Sturk-Andreaggi K., Irwin J.A., Just R.S., "A high-throughput Sanger strategy for human mitochondrial genome sequencing"]. Sequencing products were purified via gel filtration columns using a combination of automated pipetting and manual centrifugation. Purified sequencing products were evaporated, resuspended in formamide, and detected on an Applied Biosystems 3730 DNA Analyzer (Life Technologies, Applied Biosystems) using a 50 cm capillary array.
      All sample transfer steps (and nearly all liquid-handling steps) for all stages of the automated sample processing were performed robotically. For any manual re-processing, at least one, and sometimes two, witnesses were used for all sample/PCR product pipetting steps during reaction set-ups and transfers.

      2.3 Data review

      The data review workflow employed for this project is described in brief in Just et al. [Just R.S., Scheible M.K., Fast S.A., Sturk-Andreaggi K., Higginbotham J.L., Lyons E.A., et al., "Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens"], and is a version of the review strategy described by Irwin et al. [Irwin J.A., Saunier J.L., Strouss K.M., Sturk K.A., Diegoli T.M., Just R.S., et al., "Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation"] modified for complete mtGenome data developed using a multi-amplicon strategy. The workflow is in accordance with the current Scientific Working Group on DNA Analysis Methods (SWGDAM) and International Society for Forensic Genetics (ISFG) best practice guidelines for forensic mtDNA data development [Scientific Working Group on DNA Analysis Methods (SWGDAM), "Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories"; Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., et al., "DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing"]. Data review was performed by at least three distinct scientists at two different laboratories: the Armed Forces DNA Identification Laboratory (AFDIL); and the Institute of Legal Medicine, Innsbruck Medical University (GMI), curator of the European DNA Profiling Group mtDNA population (EMPOP) database (www.empop.org) [Parson W., Dur A., "EMPOP – a forensic mtDNA database"]. In detail, the review steps were as follows.

      2.3.1 AFDIL primary analysis

      Initial assembly, trimming and review of the raw sequence data for each sample was performed in Sequencher version 4.8 or 5.0 (Gene Codes Corporation, Ann Arbor, MI). Sequences were aligned to the revised Cambridge Reference Sequence (rCRS; [Anderson S., Bankier A.T., Barrell B.G., de Bruijn M.H., Coulson A.R., Drouin J., et al., "Sequence and organization of the human mitochondrial genome"; Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N., "Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA"]) following phylogenetic alignment rules [Scientific Working Group on DNA Analysis Methods (SWGDAM), "Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories"; Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., et al., "DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing"; Bandelt H.J., Parson W., "Consistent treatment of length variants in the human mtDNA control region: a reappraisal"].
      In cases of length heteroplasmy (LHP), a single dominant variant was identified (as per recommendations for mtDNA data interpretation [Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., et al., "DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing"; Bandelt H.J., Parson W., "Consistent treatment of length variants in the human mtDNA control region: a reappraisal"]). With regard to point heteroplasmy (PHP), an mtGenome position was deemed heteroplasmic only if specific criteria were met upon visual review of the raw sequence data:
      (1) If the minor sequence variant was readily visible (i.e., a distinct peak of normal morphology with white space beneath it could be seen in the trace data without changing the chromatogram view in Sequencher to examine the signal closer to the baseline) in all of the sequences covering the position, and those sequences were generated using both forward and reverse primers, a PHP was called.
      (2) If the minor sequence variant was readily visible in some but not all sequences, data closer to the baseline were inspected for each sequence. If the baseline view demonstrated that the minor variant was substantially higher than any sequence background/noise in (a) the majority of the sequences, and (b) both forward and reverse sequences, a PHP was called.
      When heteroplasmy was suspected but not confirmed according to the above criteria, additional sequence data were generated for the sample/region to clarify the presence or absence of heteroplasmy.
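      The two PHP criteria and the fallback to additional sequencing can be expressed as a small decision function. The sketch below is illustrative only: the calls in the study were made by visual review of trace data, so the boolean fields, class name and return labels are hypothetical stand-ins for an analyst's judgement. It also assumes that a read in which the minor variant is readily visible counts as supporting the variant at the baseline view.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TraceObservation:
    strand: str                    # "F" (forward primer) or "R" (reverse primer)
    readily_visible: bool          # minor peak clear in the normal chromatogram view
    above_noise_at_baseline: bool  # minor peak well above background near the baseline


def call_php(observations: List[TraceObservation]) -> str:
    """Apply criteria (1) and (2) to all reads covering one position."""
    if not observations:
        return "no_data"
    visible = [o for o in observations if o.readily_visible]
    strands_present = {o.strand for o in observations}

    # Criterion (1): visible in every read, with forward and reverse reads present.
    if len(visible) == len(observations) and strands_present == {"F", "R"}:
        return "PHP"

    if visible:
        # Criterion (2): inspect the baseline view of each read.
        supported = [o for o in observations
                     if o.readily_visible or o.above_noise_at_baseline]
        in_majority = len(supported) > len(observations) / 2
        on_both_strands = {o.strand for o in supported} == {"F", "R"}
        if in_majority and on_both_strands:
            return "PHP"
        # Suspected but not confirmed: generate additional sequence data.
        return "generate_more_data"

    # Assumption: no readily visible minor peak means heteroplasmy is not suspected.
    return "no_PHP"


reads = [
    TraceObservation("F", True, True),
    TraceObservation("R", False, True),
    TraceObservation("F", False, False),
]
print(call_php(reads))
```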
      Once each sample haplotype was complete (i.e., every mtGenome position had at least two strands of high-resolution sequence coverage), a list of differences from the rCRS was prepared manually, and a variance report was electronically exported from Sequencher.
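      The completeness criterion above (at least two strands of coverage at every position) lends itself to a simple depth check. The following sketch assumes reads are summarized as inclusive (start, end) intervals in rCRS numbering and interprets "two strands" as at least two sequence reads; both are assumptions for illustration, and the check does not assess sequence quality.

```python
from typing import Iterable, List, Tuple

MTGENOME_LENGTH = 16569  # rCRS numbering, positions 1..16569


def undercovered_positions(reads: Iterable[Tuple[int, int]],
                           min_depth: int = 2) -> List[int]:
    """Return positions covered by fewer than min_depth reads."""
    depth = [0] * (MTGENOME_LENGTH + 1)  # index 0 is unused
    for start, end in reads:
        if not (1 <= start <= MTGENOME_LENGTH and 1 <= end <= MTGENOME_LENGTH):
            raise ValueError("read coordinates must lie within 1..16569")
        if start <= end:
            spans = [(start, end)]
        else:
            # The mtGenome is circular: a read may run past 16569 into position 1.
            spans = [(start, MTGENOME_LENGTH), (1, end)]
        for span_start, span_end in spans:
            for pos in range(span_start, span_end + 1):
                depth[pos] += 1
    return [pos for pos in range(1, MTGENOME_LENGTH + 1) if depth[pos] < min_depth]


# Hypothetical reads: one full-length pass plus two short reads near the origin.
gaps = undercovered_positions([(1, 16569), (1, 100), (16000, 50)])
print(len(gaps), gaps[0], gaps[-1])
```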

      2.3.2 AFDIL secondary analysis

      Each mtGenome haplotype contig generated during the primary analysis of the raw data was reviewed on a position-by-position basis by a second scientist. A list of differences from the rCRS was generated manually and compared to the list generated at the primary analysis stage, and any discrepancies were resolved to the satisfaction of both reviewers. A variance report was again exported from Sequencher, and compared to the manually-prepared lists of differences from the rCRS to ensure full agreement across all paper and electronic records. In addition, sequences present in the final sample contig were visually examined to confirm that all sequences had the same sample identifier (i.e., that no sequences from a different sample were mistakenly included).

      2.3.3 AFDIL database entry and initial review

      The Sequencher variance reports exported at the secondary analysis stage were electronically imported into the custom software Laboratory Information Systems Applications (LISA; Future Technologies Inc., Fairfax VA). For each sample, the imported record was compared to the handwritten list of differences from the rCRS prepared in the previous data analysis stage to confirm that the database record was consistent with the paper record. In addition, all coding region indels, PHPs and transversions in each electronic profile were visually confirmed by re-review of the raw data at the relevant positions.

      2.3.4 AFDIL database final review

      To confirm the database haplotypes, a second scientist again reviewed each electronic record in comparison to the previously-generated lists of differences from the rCRS, and checked that the correct sequence coverage range (1-16,569 base pairs) was associated with each profile.

      2.3.5 Phylogenetic check

      As described in Just et al. [Just R.S. et al., Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens], given the multi-amplicon PCR protocol used for data generation in this project, each mtGenome haplotype was evaluated for phylogenetic feasibility as a quality control measure. Haplotypes were first assigned a preliminary haplogroup, and subsequently compared to the then-current version of PhyloTree (Build 14 or 15, depending on the dates on which different subsets of the data were checked) [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation] to assess each difference from the rCRS. The raw data for each sample were re-reviewed to confirm (a) any expected mutations (based on the preliminary haplogroup) that were lacking, (b) all private mutations (mutations not part of the haplogroup definition), and (c) all PHPs and transversions.

      2.3.6 EMPOP review

      Sequencher project files, variance reports and all raw data for each sample were electronically transferred to EMPOP for tertiary review. At EMPOP, each mtGenome haplotype contig was again reviewed on a position-by-position basis, and edits to the project files were made as warranted. A variance report of differences from the rCRS was exported from Sequencher and imported into a local database.

      2.3.7 Concordance check

      EMPOP and AFDIL-generated variance reports for each haplotype were electronically compared in the local database at EMPOP. Any discrepancies between the haplotypes were reported to AFDIL; and for those samples with discrepancies, the raw data were re-examined by both laboratories for the positions in question. In a few cases, sample re-processing was performed at this stage to clarify the haplotypes. The sample haplotypes were considered finalized once both EMPOP and AFDIL were in agreement, and all relevant files had been corrected at AFDIL and re-sent to EMPOP.
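The electronic comparison described above can be pictured as a set comparison of the differences from the rCRS recorded by each laboratory for the same sample. The following is a minimal sketch only: the function name, the string encoding of the variants and the example haplotypes are illustrative assumptions, and the code does not represent the local EMPOP database or LISA.

```python
def compare_variance_reports(afdil, empop):
    """Return the variants reported by only one laboratory for one sample.

    Each argument is an iterable of differences from the rCRS written as
    strings (e.g. "263G", "315.1C"). This encoding is an assumption made
    for the sketch, not the format of the actual variance reports.
    """
    afdil_set, empop_set = set(afdil), set(empop)
    return {
        "afdil_only": sorted(afdil_set - empop_set),
        "empop_only": sorted(empop_set - afdil_set),
    }


# Hypothetical haplotypes, not taken from the study data.
afdil_report = ["73G", "263G", "315.1C", "16519C"]
empop_report = ["73G", "263G", "315.1C", "16519Y"]  # point heteroplasmy called

discrepancies = compare_variance_reports(afdil_report, empop_report)
is_concordant = not (discrepancies["afdil_only"] or discrepancies["empop_only"])
```

A comparison of this kind flags only disagreement between the two reports. An error made identically by both laboratories, such as the two indel misalignments discussed in Section 3.1, would pass unnoticed.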

      2.3.8 Haplogroup assignment

      Haplogroups were assigned to each mtGenome haplotype using EMMA [Rock A.W. et al., Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)] and Build 16 of PhyloTree [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation]. These automated assignments were then compared to the preliminary haplogroups assigned at the phylogenetic check stage, and any discrepancies were evaluated in detail. In all cases, the EMMA-estimated haplogroup was the final haplogroup assigned to the sample.

      2.3.9 Indel screening

      All indels relative to the rCRS in the completed haplotypes were reviewed to assess correct placement according to phylogenetic alignment rules [Scientific Working Group on DNA Analysis Methods (SWGDAM), Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories; Parson W. et al., DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing; Bandelt H.J. and Parson W., Consistent treatment of length variants in the human mtDNA control region: a reappraisal] and PhyloTree Build 16 [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation].

      2.3.10 Nuclear mitochondrial pseudogene (NUMT) screening

      All point heteroplasmies in the final haplotypes were compared to a list of positions at which two specific NUMTs (on Chromosomes 1 and 5, and possessing greater than 90% similarity to modern human mtDNA; see Table 3 in Lyons et al. [Lyons E.A. et al., A high-throughput Sanger strategy for human mitochondrial genome sequencing]) differ from the rCRS. Any haplotypes with point heteroplasmies that occurred at one of these positions were re-reviewed by careful examination of the raw data to ensure that the point heteroplasmy was not due to co-detection of a NUMT (which would be expected to present as multiple mixed positions within the amplicon in question [Lyons E.A. et al., A high-throughput Sanger strategy for human mitochondrial genome sequencing]).
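The screening step amounts to intersecting each sample's heteroplasmic positions with the list of NUMT-discriminating positions. The sketch below illustrates this under stated assumptions: the position values and sample names are placeholders (the actual positions are those given in Table 3 of Lyons et al.), and the function name is hypothetical.

```python
def flag_numt_candidates(point_heteroplasmies, numt_positions):
    """Return heteroplasmic positions that coincide with NUMT differences."""
    return sorted(set(point_heteroplasmies) & set(numt_positions))


# Placeholder positions only; the real list is in Table 3 of Lyons et al.
numt_positions = {1000, 2000, 3000}

# Hypothetical samples with the positions of their point heteroplasmies.
sample_phps = {"sample_A": [152, 2000], "sample_B": [16093]}

# Keep only the samples that need a second look at the raw data.
to_review = {}
for sample, positions in sample_phps.items():
    hits = flag_numt_candidates(positions, numt_positions)
    if hits:
        to_review[sample] = hits
```

A flagged sample is only a candidate for re-review. Whether a NUMT was actually co-detected still has to be judged from the raw data, for instance from the presence of multiple mixed positions within the same amplicon.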

      2.3.11 Data transfer and corrections

      All data transfer steps into internal databases and between laboratories were performed electronically. When changes were made to haplotypes at AFDIL after the initial transfer of sample files to EMPOP, all relevant sample files were re-sent to EMPOP for complete replacement (i.e., no manual changes were made to haplotypes at EMPOP).

      2.4 Data analyses

      Summary statistics (number of haplotypes, number of unique haplotypes, random match probability, haplotype diversity and power of discrimination) for multiple regions of the mtGenome (hypervariable region 1 (HV1) only; HV1 and hypervariable region 2 (HV2) in combination; the complete CR; and the full mtGenome) were based on pairwise comparisons of each of the three populations in the LISA custom software. Cytosine insertions at nucleotide positions 309, 573 and 16193 were ignored for the analyses, and point heteroplasmies were treated as differences.
      Estimations of broad scale maternal biogeographic ancestry (African, East Asian, West Eurasian or Native American) were based on the haplogroups assigned to each haplotype. For the few haplogroup M, N and U lineages which have overlapping present day distributions in certain geographic regions (North Africa, southern Europe and the Near East), assignment to one of the ancestry categories was made on the basis of the geographic distribution of the same or closely related lineages in global populations represented in a beta version of the EMPOP3 database [Parson W. and Rock A.W., EMPOP 3 NGS mitochondrial databasing].
      Pairwise comparisons of the haplotypes representing each population and biogeographic ancestry group were performed for (a) the full mtGenome, and (b) with comparisons restricted to the CR, in the LISA custom software. Cytosine insertions at nucleotide positions 309, 573 and 16193 were ignored for the analyses.
      Statistical calculations to assess significance were performed either in Microsoft Office Excel 2010, or, for Chi-Square tests of independence (for comparisons of differing proportions), using the calculator spreadsheet available for download from http://udel.edu/~mcdonald/statchiind.html [McDonald J.H., Handbook of Biological Statistics].
      Likelihood ratios (LRs) were developed using two methods: the “exact” method for confidence intervals (Clopper–Pearson) [Clopper C.J. and Pearson E.S., The use of confidence or fiducial limits illustrated in the case of the binomial] and the “kappa method” [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype]. Clopper–Pearson 95% confidence intervals were calculated using HaploCALc Version 1.8 by Steven Myers. LR calculations using the one-tailed confidence interval used the standard formula LR = x/y, where x represents the probability that the questioned and known haplotypes represent the same maternal lineage, and y is the probability that the questioned sample will match an unrelated (or only randomly related) haplotype in the database. The value used for x was always 1, and the value used for y was the one-tailed 95% confidence limit. LR calculations for the kappa method used Eq. (6) from Brenner [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype]: LRκ = n/(1 − κ), where κ represents the proportion of haplotypes in the population sample that are singletons (haplotypes observed only once), and n represents the size of the population sample.
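For a haplotype that has not been observed in the database, both LR calculations reduce to short closed-form expressions. The sketch below is an illustration rather than the HaploCALc implementation: the function names are our own, and the closed form 1 − α^(1/n) applies only to the zero-observation case of the one-tailed Clopper–Pearson limit.

```python
def cp_upper_limit_unobserved(n, alpha=0.05):
    """One-tailed upper Clopper-Pearson limit for 0 observations in n.

    With zero observations the binomial condition (1 - p)**n = alpha
    has the closed-form solution p = 1 - alpha**(1/n).
    """
    return 1.0 - alpha ** (1.0 / n)


def lr_clopper_pearson(n, alpha=0.05):
    """LR = x / y with x = 1 and y = the one-tailed upper confidence limit."""
    return 1.0 / cp_upper_limit_unobserved(n, alpha)


def lr_kappa(n, singletons):
    """LR = n / (1 - kappa), where kappa = singletons / n.

    Not defined when every haplotype is a singleton (kappa = 1); that
    case needs separate handling and is not covered by this sketch.
    """
    kappa = singletons / n
    return n / (1.0 - kappa)


# African American sample, full mtGenome: n = 170 with 168 singletons.
lr_cp = lr_clopper_pearson(170)
lr_k = lr_kappa(170, 168)
```

Rounded to whole numbers, these two values correspond to the African American entries of 57 and 14,450 in Table 2.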

      3. Results and discussion

      3.1 Data generation and review

      A variety of data processing metrics were previously detailed for a subset of the low template blood serum samples used for this study [Just R.S. et al., Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens].
      As described in Section 2.2, samples that exhibited a single PCR failure during the initial, automated processing were manually reamplified to obtain PCR product that could be carried through to sequencing, whereas samples for which more than one of the eight target mtGenome regions failed to amplify were typically abandoned and not processed beyond amplification. Out of a total of 625 samples that were attempted, 37 were dropped due to PCR failure in two or more of the eight mtGenome target regions. As we previously reported, among the first 242 quantified samples processed, all 12 samples dropped due to multiple PCR failures had PCR DNA input quantities less than 10 pg/μl [Just R.S. et al., Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens]. But, as PCR failures can occur due to primer binding site mutations, and those mutations may be haplogroup or lineage-specific, we explored the extent of PCR failure across all 588 completed haplotypes in relation to the PCR strategy employed.
      An examination of the incidence and pattern of PCR failure among samples with primer binding region mutations indicates that such mutations are unlikely to have biased the final datasets for any of the three population samples. A total of 52 polymorphisms, representing 34 distinct mutations, were found across the 16 primer binding regions. Primer binding region mutations were found in 46 of the 588 completed samples (7.8%), and overall had the potential to impact primer binding in 1.1% of the initial eight high-throughput PCR reactions performed per sample (a total of 4704 PCR reactions). Yet, manual reamplification (due to near or complete PCR failure) was required in only eight of the 52 instances in which a mutation was later found in a PCR primer binding region; and thus primer binding region polymorphisms potentially caused PCR failure in just 1.4% of samples and 0.2% of amplifications. Further, as Fig. S1 demonstrates, the position of the mutation relative to the 3′ end of the primer was highly variable in these eight instances of reamplification, and thus the mutation may not have been the reason for the PCR failure in all eight cases. Among the 46 samples which were carried through to sequencing and later found to have polymorphisms in primer binding regions, five (10.9%) exhibited a mutation in more than one of the 16 primer binding regions, yet only three PCR failures (of 10 potentially affected reactions) were observed among these five samples.
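The sample-level and reaction-level percentages quoted above follow directly from the reported counts. As a simple arithmetic check (the variable and function names are ours, and rounding to one decimal place is assumed):

```python
samples = 588                    # completed haplotypes
reactions_per_sample = 8         # initial high-throughput PCR reactions
total_reactions = samples * reactions_per_sample

polymorphisms = 52               # primer binding region polymorphisms found
samples_with_mutation = 46       # samples carrying at least one of them
reamplifications = 8             # reamplified reactions with such a mutation


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


pct_samples_with_mutation = percent(samples_with_mutation, samples)
pct_reactions_at_risk = percent(polymorphisms, total_reactions)
pct_samples_failed = percent(reamplifications, samples)
pct_reactions_failed = percent(reamplifications, total_reactions)
```

The sample-level failure figure treats each of the eight reamplifications as a separate sample, which is the reading implied by the 1.4% value.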
      Given the wide variety of mtDNA haplogroups represented by the 588 haplotypes reported in this study (see Section 3.2), and the low DNA quantities for the first twelve samples abandoned [Just R.S. et al., Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens], the very low overall incidence of reamplification among samples with known primer binding region mutations suggests that (1) PCR failure due to haplogroup-specific polymorphism when using the Lyons et al. [Lyons E.A. et al., A high-throughput Sanger strategy for human mitochondrial genome sequencing] primers is likely to be quite infrequent, and (2) few, if any, of the abandoned samples exhibited multiple PCR failures due to primer binding region mutations. It is therefore unlikely that the PCR or sample handling strategy introduced any particular bias into the datasets reported here.
      The formalized data review process employed for this study (see Section 2.3) included an electronic comparison of the haplotypes independently developed by AFDIL and EMPOP from the raw sequence data. Across the 588 haplotypes compared, 27 discrepancies in 23 samples were identified, a non-concordance rate of 4.6%. The majority of these discrepancies (70%) were due to missed or incorrectly identified heteroplasmies in either the AFDIL or EMPOP analysis; and for three of these samples manual reprocessing (reamplification and repeat sequencing) was performed to generate additional data to determine whether a low-level point heteroplasmy was or was not present. The remaining discrepancies were due either to raw data editing differences (two instances) or indel misalignments (six instances).
      In addition to the differences found upon cross-check of the haplotypes, two further indel misalignments were later identified during additional review of the datasets. In both instances the original alignment of the sequence data was inconsistent with phylogenetic alignment rules and the current mtDNA phylogeny [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation; Scientific Working Group on DNA Analysis Methods (SWGDAM), Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories; Parson W. et al., DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing; Bandelt H.J. and Parson W., Consistent treatment of length variants in the human mtDNA control region: a reappraisal]. In one case, a haplotype with 2885 2887del 2888del was incorrectly aligned as 2885del 2886del 2887; and in the second case, a haplotype with 292.1A 292.2T was incorrectly aligned as 291.1T 291.2A. For these two haplotypes the indels were misaligned by both AFDIL and EMPOP, and thus no discrepancy was identified as part of the concordance check. The identification of these two misalignments prompted a thorough review of all 2767 indels present in the 588 haplotypes, and no additional misalignments were found.
      Fig. S2 provides a breakdown of the 29 total data review issues identified in this study. The results of the concordance check and the two additional indel misalignments identified later both (1) underscore the need for multiple reviews of mtDNA sequence data to ensure correct haplotypes are reported, and (2) highlight a need for an automated method for checking regions of the mtGenome prone to indels prior to dataset publication and inclusion in a database. EMPOP includes a software tool that evaluates CR indel placement and is routinely employed to examine CR datasets prior to their inclusion in the database. Until a similar tool is developed to reliably assess complete mtGenome haplotypes, all indels in complete mtGenome datasets should be reviewed in relation to the current knowledge regarding the human mtDNA phylogeny prior to publication.

      3.2 Database composition and statistics

      In total, 588 complete mtGenome haplotypes were generated from three U.S. populations: African American (n = 170), U.S. Caucasian (n = 263) and U.S. Hispanic (n = 155). The number of samples per U.S. state/territory for each population is given in Table S1.
      The 580 distinct mtGenome haplotypes that were observed are presented in Tables S2–S4, and are available in GenBank (accession numbers KM101569–KM102156). Summary statistics for each population are given in Table 1. Across the entire mtGenome, 168 of 170 (98.8%) African American haplotypes, 255 of 263 (97.0%) U.S. Caucasian haplotypes, and 140 of 155 (90.3%) U.S. Hispanic haplotypes were unique in the respective datasets when cytosine insertions at positions 309, 573 and 16193 were ignored. With regard to the summary statistics, the additional value added by sequencing the complete mtGenome is most powerfully demonstrated by comparing the information gleaned from the subsets of the molecule historically targeted for forensic typing. For example, for the African American population sample, the increase in the number of unique haplotypes that would be detected by HV1 and HV2 sequencing compared to HV1 sequencing alone is 13.2%; and moving from HV1 and HV2 typing to complete CR sequencing would increase the number of unique haplotypes detected by 8.3%. In comparison to CR sequencing, complete mtGenome sequencing would increase the number of singletons by 29.2% for this population sample – well more than double the increase seen by moving either from HV1 alone to HV1/HV2, or from HV1/HV2 to the full CR. These improvements in lineage resolution are consistent with a recent examination of 283 mtGenome haplotypes from three Texas population samples [King J.L. et al., High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq]; however, the random match probabilities reported here are lower due to the larger sample sizes in our study.
      Table 1. Summary statistics. Summary statistics were calculated for each of the three U.S. populations for several regions historically targeted for forensic typing: HV1 alone, HV1 and HV2 in combination, the entire CR, and the full mtGenome. Haplotype diversity was calculated as (1 − Random Match Probability) × (n/(n − 1)). The percentage increase in the number of distinct haplotypes and the number of haplotypes unique in each population (observed for only a single individual) were calculated for each successively larger portion of the molecule.
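      | Statistic | HV1 | HV1/HV2 | CR | mtG | % increase, HV1 to HV1/HV2 | % increase, HV1/HV2 to CR | % increase, CR to mtG |
      |---|---|---|---|---|---|---|---|
      | African American (n = 170) | | | | | | | |
      | # Haplotypes | 124 | 140 | 148 | 169 | 12.9% | 5.7% | 14.2% |
      | # Unique haplotypes | 106 | 120 | 130 | 168 | 13.2% | 8.3% | 29.2% |
      | Random match probability | 1.38% | 0.92% | 0.78% | 0.60% | | | |
      | Haplotype diversity | 0.9920 | 0.9967 | 0.9981 | 0.9999 | | | |
      | Power of discrimination | 99.20% | 99.67% | 99.81% | 99.99% | | | |
      | U.S. Caucasian (n = 263) | | | | | | | |
      | # Haplotypes | 151 | 200 | 229 | 259 | 32.5% | 14.5% | 13.1% |
      | # Unique haplotypes | 122 | 170 | 211 | 255 | 39.3% | 24.1% | 20.9% |
      | Random match probability | 2.75% | 0.96% | 0.60% | 0.39% | | | |
      | Haplotype diversity | 0.9762 | 0.9942 | 0.9978 | 0.9999 | | | |
      | Power of discrimination | 97.62% | 99.42% | 99.78% | 99.99% | | | |
      | U.S. Hispanic (n = 155) | | | | | | | |
      | # Haplotypes | 119 | 134 | 141 | 147 | 12.6% | 5.2% | 4.3% |
      | # Unique haplotypes | 102 | 121 | 130 | 140 | 18.6% | 7.4% | 7.7% |
      | Random match probability | 1.27% | 0.90% | 0.79% | 0.72% | | | |
      | Haplotype diversity | 0.9937 | 0.9974 | 0.9986 | 0.9992 | | | |
      | Power of discrimination | 99.37% | 99.74% | 99.86% | 99.92% | | | |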
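The statistics in Table 1 can be reproduced from a list of haplotype labels. The sketch below is not the LISA implementation: it assumes that the random match probability is the sum of squared haplotype frequencies and that haplotype diversity applies the n/(n − 1) correction, and the function names and haplotype labels are our own. The example reconstructs the African American full mtGenome column, where 169 haplotypes with 168 singletons among 170 samples imply exactly one haplotype observed twice.

```python
from collections import Counter


def summary_statistics(haplotypes):
    """Counts, random match probability and haplotype diversity."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    # Assumed definition: sum of squared haplotype frequencies.
    rmp = sum((c / n) ** 2 for c in counts.values())
    return {
        "n": n,
        "haplotypes": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
        "rmp": rmp,
        "diversity": (1 - rmp) * n / (n - 1),
    }


def percent_increase(smaller_region, larger_region):
    """Percentage gain when moving to a larger portion of the molecule."""
    return 100 * (larger_region - smaller_region) / smaller_region


# Placeholder labels: 168 singletons plus one haplotype seen twice.
labels = [f"hap{i}" for i in range(168)] + ["shared", "shared"]
stats = summary_statistics(labels)

# Unique African American haplotypes: HV1 (106) to HV1/HV2 (120).
gain = percent_increase(106, 120)
```

Under these assumptions the example returns a random match probability of about 0.60% and a haplotype diversity of about 0.9999, in line with the tabulated values, and a gain of about 13.2%.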
      Given the substantially higher degree of haplotype resolution with full mtGenome sequences in comparison to smaller portions of the molecule, we investigated the LRs that would be calculated for previously unobserved haplotypes when considering HV1/HV2 alone, the CR and the complete mtGenome using two different methods: Clopper–Pearson [Clopper C.J. and Pearson E.S., The use of confidence or fiducial limits illustrated in the case of the binomial] and the “kappa method” published by Brenner [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype]. Confidence interval calculations with the Clopper–Pearson “exact” method use the cumulative probability from a binomial distribution given the number of observations of interest and a sample size; and thus for previously unobserved haplotypes in a database, Clopper–Pearson 95% confidence intervals (either one-tailed or two-tailed) and the resulting LRs will depend entirely on the size of the reference population sample. By contrast, as Brenner's kappa method uses the proportion of singletons (haplotypes observed only once) in a population sample to approximate the rarity of a new haplotype, the calculated LR for a previously unobserved mtDNA haplotype will depend both on database size and the portion of the molecule targeted (as Table 1 demonstrates that the proportion of singletons will be greater as the size of the targeted mtDNA region increases).
      In comparison to the Clopper–Pearson one-tailed method (currently recommended for use in U.S. laboratories [Scientific Working Group on DNA Analysis Methods (SWGDAM), Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories]), LRs developed using the kappa method ranged from 8- to 14-fold higher across our three population samples when only HV1 and HV2 were considered, and from 13- to 18-fold higher when the full CR was considered (Table 2). When the numbers of singletons across the entire mtGenome were used, LRs developed by the kappa method were 31- to 254-fold higher in comparison to the Clopper–Pearson method using a 1-tailed 95% upper confidence limit. Similar values were obtained for the full mtGenome haplotypes recently published by King et al. [King J.L. et al., High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq]. While the most conservative haplotype frequency estimate may be preferred for some purposes, it is clear from these results that LR calculations using the Clopper–Pearson method negate some of the benefits of the increased resolution achieved by typing the complete mtGenome. Until larger full mtGenome databases are available, Clopper–Pearson based LRs developed for previously unobserved mtGenome haplotypes will be reduced in comparison to even shared haplotypes based on smaller subsets of the molecule given the size of current CR databases (for example, 2823 African American CR haplotypes are presently available in EMPOP, Release 11 [Parson W. and Dur A., EMPOP – a forensic mtDNA database]). That is, despite the clearly smaller likelihood of encountering a matching mtGenome haplotype versus a matching CR haplotype (for example) among randomly-selected individuals (Table 1), Clopper–Pearson LRs for full mtGenome haplotypes will, for the time being, be smaller due to database size alone.
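The effect of database size on Clopper–Pearson LRs can be illustrated numerically. The sketch below solves for the one-tailed upper limit by bisection on the binomial cumulative probability; it is not the HaploCALc implementation, the function names are our own, and the case of a CR haplotype observed exactly once among 2823 database entries is a hypothetical scenario chosen for illustration.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def cp_upper_one_tailed(x, n, alpha=0.05, iterations=100):
    """Upper one-tailed Clopper-Pearson limit: the p with P(X <= x) = alpha."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid   # cumulative probability still too large: raise p
        else:
            hi = mid
    return (lo + hi) / 2


# Unobserved full mtGenome haplotype, African American sample (n = 170).
lr_mtgenome_unobserved = 1 / cp_upper_one_tailed(0, 170)

# CR database of 2823 haplotypes: unobserved, and (hypothetically) seen once.
lr_cr_unobserved = 1 / cp_upper_one_tailed(0, 2823)
lr_cr_seen_once = 1 / cp_upper_one_tailed(1, 2823)
```

In this example the CR haplotype that has already been seen once in the larger database still yields a larger LR than the unobserved mtGenome haplotype, which is the database-size effect described above.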
      Table 2. Likelihood ratios for unobserved haplotypes using two different methods. Clopper–Pearson 95% confidence intervals [Clopper C.J. and Pearson E.S., The use of confidence or fiducial limits illustrated in the case of the binomial] and Brenner's “kappa method” [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype] were used to calculate LRs for a haplotype not present in the database, for (a) both the three population samples reported in this study and the three population samples reported by King et al. [King J.L. et al., High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq], and (b) given different portions of the mtGenome. As defined by Brenner [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype], κ refers to the proportion of singletons (unique haplotypes) in the database. The numbers of singletons in the King et al. datasets were obtained from their Table 1 (no full CR values were reported).
      Clopper–Pearson columns: [Clopper C.J. and Pearson E.S., The use of confidence or fiducial limits illustrated in the case of the binomial]. Brenner kappa columns: [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype].
      | Population | Clopper–Pearson 1-tailed 95% CI | Clopper–Pearson 1-tailed LR | Clopper–Pearson 2-tailed upper 95% CI | Clopper–Pearson 2-tailed LR | HV1/HV2 singletons | HV1/HV2 κ | HV1/HV2 LRκ | CR singletons | CR κ | CR LRκ | Full mtGenome singletons | Full mtGenome κ | Full mtGenome LRκ |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|
      | This study | | | | | | | | | | | | | |
      | African American (n = 170) | 0.0175 | 57 | 0.0215 | 47 | 120 | 0.7059 | 578 | 130 | 0.7647 | 723 | 168 | 0.9882 | 14,450 |
      | U.S. Caucasian (n = 263) | 0.0113 | 88 | 0.0139 | 72 | 170 | 0.6464 | 744 | 211 | 0.8023 | 1330 | 255 | 0.9696 | 8646 |
      | U.S. Hispanic (n = 155) | 0.0191 | 52 | 0.0235 | 43 | 121 | 0.7806 | 707 | 130 | 0.8387 | 961 | 140 | 0.9032 | 1602 |
      | King et al. [King J.L. et al., High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq] | | | | | | | | | | | | | |
      | Texas African American (n = 87) | 0.0338 | 30 | 0.0415 | 24 | 76 | 0.8736 | 688 | | | | 85 | 0.9770 | 3785 |
      | Texas Caucasian (n = 83) | 0.0354 | 28 | 0.0435 | 23 | 77 | 0.9277 | 1148 | | | | 83 | >0.99 (a) | 9222 |
      | Texas Hispanic (n = 113) | 0.0262 | 38 | 0.0321 | 31 | 96 | 0.8496 | 751 | | | | 111 | 0.9823 | 6384 |
      (a) As modeled in Brenner [Brenner C.H., Fundamental problem of forensic mathematics – the evidential value of a rare haplotype] to avoid κ = 1.
      On the basis of the EMMA [Rock A.W. et al., Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)] analyses and comparisons to Build 16 of PhyloTree [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation], 393 distinct named haplogroups were assigned to the 588 haplotypes reported in this study (Tables S2–S4). Across the three population samples, all major haplogroups were represented except L4, L5, L6, O, P, Q, S and Z. The frequency of each major haplogroup by population is given in Table 3, and Table S5 details the specific haplogroups present in each population at greater than 5.0%. The level of phylogenetic resolution of the haplogroups in the latter table was selected to facilitate more direct comparison to previous, CR-based mtDNA studies; however, more highly resolved haplogroup categorizations are included where the frequencies also exceed 5%. These data provide a snapshot of the predominant lineages found in each of the population samples.
      Table 3. Haplogroup frequencies by population. Frequencies for each major haplogroup for each population are given in bold. Where more than one of the four biogeographic ancestries (African [AF], East Asian [EA], West Eurasian [WE], and Native American [NA]) are represented in the haplotypes assigned to each major haplogroup, subhaplogroup percentages (italicized) are also included. When more than one ancestry group could have been assigned due to overlapping geographic distributions, the ancestry group that was assigned is underlined. Percentage totals for each population group may not appear to equal 100.0% due to decimal place rounding for each haplogroup.
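      | Haplogroup | African American | U.S. Caucasian | U.S. Hispanic |
      |---|---|---|---|
      | A | 1.8% | 1.1% | 26.5% |
      | A2 (NA) | 1.2% | 0.8% | 26.5% |
      | A5, A10 (EA) | 0.6% | 0.4% | |
      | B | 0.6% | 1.5% | 16.1% |
      | B2 (NA) | 0.6% | 1.1% | 15.5% |
      | B4 (EA) | | 0.4% | 0.6% |
      | C | 0.6% | 0.8% | 12.3% |
      | C1b, C1c, C4c (NA) | 0.6% | 0.8% | 12.3% |
      | D | 0.6% | | 5.8% |
      | D1, D4h3 (NA) | | | 5.8% |
      | D4e (EA) | 0.6% | | |
      | E | | | 0.6% |
      | F | 0.6% | 0.4% | |
      | G | | 0.4% | |
      | H | 1.8% | 36.5% | 11.6% |
      | HV | | 2.3% | |
      | I | | 2.3% | 1.3% |
      | J | | 13.7% | 1.3% |
      | K | 1.2% | 8.0% | 3.9% |
      | L0 | 2.9% | | 0.6% |
      | L1 | 17.1% | | 2.6% |
      | L2 | 34.1% | 0.8% | 1.9% |
      | L3 | 34.7% | | 7.1% |
      | M | 1.2% | 0.4% | |
      | M1 (WE/AF) | 0.6% | | |
      | M7 (EA) | 0.6% | 0.4% | |
      | N | 0.6% | 0.4% | |
      | N1a (WE/AF) | | 0.4% | |
      | N1b (WE) | 0.6% | | |
      | T | | 9.9% | 2.6% |
      | U | 0.6% | 14.8% | 3.9% |
      | U2, U3, U4, U5 (WE) | | 13.7% | 3.9% |
      | U6a3c (WE/AF) | 0.6% | | |
      | U6a7a (WE/AF) | | 0.8% | |
      | U7a (WE) | | 0.4% | |
      | V | 1.2% | 3.0% | |
      | W | | 2.7% | 1.3% |
      | X | 0.6% | 1.1% | |
      | X2b, X2c, X2i (WE) | | 1.1% | |
      | X2a (NA) | 0.6% | | |
      | Y | | | 0.6% |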
      Based on the assigned haplogroups, the 588 mtGenome haplotypes were classified into one of four broad biogeographic ancestry categories: African, East Asian, West Eurasian and Native American (Fig. 1). As has been previously reported [Lao O. et al., Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA], self-identified ancestry was highly correlated with maternal biogeographic ancestry for the African American and U.S. Caucasian populations. For the African American dataset, the vast majority of haplotypes (90.0%) were assigned to haplogroups L0, L1, L2 and L3; whereas only 2.4%, 4.7% and 2.9% of the haplotypes represent East Asian, West Eurasian and Native American ancestry, respectively. Similarly, 94.7% of the U.S. Caucasian haplotypes in this population sample are of West Eurasian ancestry, with only minor contributions from African, East Asian and Native American lineages (0.8%, 1.9% and 2.7%, respectively). By contrast, while the majority (60.0%) of the U.S. Hispanic population sample was comprised of Native American lineages, West Eurasian and African maternal ancestries were represented in substantial proportions (25.8% and 12.3% of haplotypes, respectively).
      Fig. 1. Biogeographic ancestry proportions in each of the three U.S. population group samples. Haplotypes for each population were assigned to one of four broad biogeographic ancestry categories (African, East Asian, West Eurasian and Native American) on the basis of EMMA [Rock A.W. et al., Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)] estimated haplogroups using PhyloTree Build 16 [van Oven M. and Kayser M., Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation].
      Comparisons between the population samples reported here and previously published CR-based datasets were made on the basis of biogeographic ancestry proportions, as these can typically be ascertained for most haplotypes given CR data alone. Table 4 provides the ancestry percentages for the current study as well as for two previous studies for each of the three U.S. population groups [
      • Lao O.
      • Vallone P.M.
      • Coble M.D.
      • Diegoli T.M.
      • van Oven M.
      • van der Gaag K.J.
      • et al.
      Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.
      ,
      • Allard M.W.
      • Polanskey D.
      • Miller K.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Characterization of human control region sequences of the African American SWGDAM forensic mtDNA data set.
      ,
      • Allard M.W.
      • Polanskey D.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
      ,
      • Goncalves V.F.
      • Prosdocimi F.
      • Santos L.S.
      • Ortega J.M.
      • Pena S.D.
      Sex-biased gene flow in African Americans but not in American Caucasians.
      ,
      • Saunier J.L.
      • Irwin J.A.
      • Just R.S.
      • O’Callaghan J.
      • Parsons T.J.
      Mitochondrial control region sequences from a U.S. “Hispanic” population sample.
      ,
      • Diegoli T.M.
      • Irwin J.A.
      • Just R.S.
      • Saunier J.L.
      • O’Callaghan J.E.
      • Parsons T.J.
      Mitochondrial control region sequences from an African American population sample.
      ]. For the African American and U.S. Caucasian populations, the proportion of haplotypes reflecting the predominant ancestry is not statistically significantly different between this and previous studies. However, for the U.S. Hispanic population, the differing proportions of Native American haplotypes across three population samples (this study, [
      • Saunier J.L.
      • Irwin J.A.
      • Just R.S.
      • O’Callaghan J.
      • Parsons T.J.
      Mitochondrial control region sequences from a U.S. “Hispanic” population sample.
      ] and [
      • Allard M.W.
      • Polanskey D.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
      ]) are significant (p = 0.007). Specifically, the proportion of Native American haplotypes in the U.S. Hispanic population sample reported here differs significantly from that reported in the Allard et al. [
      • Allard M.W.
      • Polanskey D.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
      ] study (p = 0.008), even after Bonferroni correction for multiple tests. This is most likely due to differences in geographic sampling, which will reflect the substantial regional differences in the Native American component of a U.S. Hispanic population sample [
      • Irwin J.A.
      • Saunier J.L.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • Just R.S.
      • et al.
      Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation.
      ]. Along these lines, the proportion of haplotypes representing Native American maternal ancestry in a recently published Southwest Hispanic population sample from Texas (71.7%; [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ]) is highly similar to the frequency of Native American haplotypes (70.8%) in the Allard et al. study [
      • Allard M.W.
      • Polanskey D.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
      ].
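The significance values quoted above can be approximately reproduced with a Pearson chi-square test on Native American versus all other haplotype counts. The sketch below is illustrative only: the text does not name the test that was used, the counts are back-calculated from the rounded percentages and sample sizes in Table 4 (so they are assumptions), and the function name `chi_square_proportions` is hypothetical.

```python
import math

def chi_square_proportions(successes, totals):
    """Pearson chi-square test of homogeneity for 2 or 3 proportions.

    No continuity correction is applied. Returns (statistic, p_value).
    Closed-form tail probabilities are used: erfc for 1 degree of freedom
    (two groups) and exp for 2 degrees of freedom (three groups).
    """
    k = len(totals)
    if k not in (2, 3):
        raise ValueError("this sketch supports only 2 or 3 groups")
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        expected_yes = n * pooled
        expected_no = n * (1 - pooled)
        stat += (s - expected_yes) ** 2 / expected_yes
        stat += ((n - s) - expected_no) ** 2 / expected_no
    if k == 2:
        p_value = math.erfc(math.sqrt(stat / 2))
    else:
        p_value = math.exp(-stat / 2)
    return stat, p_value

# ASSUMED counts of Native American haplotypes, reconstructed from Table 4:
# 60.0% of 155, 60.9% of 128 and 70.8% of 686.
native_american = [93, 78, 486]
sample_sizes = [155, 128, 686]

_, p_three = chi_square_proportions(native_american, sample_sizes)
_, p_pair = chi_square_proportions([93, 486], [155, 686])  # this study vs Allard et al.

print(f"three samples: p = {p_three:.4f}")
print(f"this study vs Allard et al.: p = {p_pair:.4f}")
```

Under these assumptions the two p-values land close to the reported 0.007 and 0.008. A different test (for example Fisher's exact test) or slightly different underlying counts would shift them somewhat.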
      Table 4. Biogeographic ancestry proportions for each U.S. population from this study and previous CR-based studies. The maternal biogeographic ancestry proportions inferred for each of the three U.S. populations based on full mtGenome data (this study) and CR data (previous studies). When the proportions of haplotypes assigned to the predominant biogeographic ancestry for each population group (highlighted rows in the table) were compared, only the frequency of Native American haplotypes in the U.S. Hispanic population sample in our study versus the Allard et al. [Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.] data differed significantly (p = 0.007).

| African American | This study (n = 170) | Diegoli et al. (n = 248) | Allard et al. (n = 1148) |
|---|---|---|---|
| African | 90.0% | 93.1% | 91.6% |
| East Asian | 2.4% | 1.1% | a |
| West Eurasian | 4.7% | 4.3% | 5.1% |
| Native American | 2.9% | 0.7% | a |

      Sources: Diegoli T.M., Irwin J.A., Just R.S., Saunier J.L., O’Callaghan J.E., Parsons T.J. Mitochondrial control region sequences from an African American population sample. Allard M.W., Polanskey D., Miller K., Wilson M.R., Monson K.L., Budowle B. Characterization of human control region sequences of the African American SWGDAM forensic mtDNA data set.

| U.S. Caucasian | This study (n = 263) | Gonçalves et al. (n = 1387) | Lao et al. (n = 245) |
|---|---|---|---|
| African | 0.8% | 0.9% | b |
| East Asian | 1.9% | a | b |
| West Eurasian | 94.7% | 96.9% | 96.7% |
| Native American | 2.7% | a | b |

      Sources: Goncalves V.F., Prosdocimi F., Santos L.S., Ortega J.M., Pena S.D. Sex-biased gene flow in African Americans but not in American Caucasians. Lao O., Vallone P.M., Coble M.D., Diegoli T.M., van Oven M., van der Gaag K.J., et al. Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.

| U.S. Hispanic | This study (n = 155) | Saunier et al. (n = 128) | Allard et al. (n = 686) |
|---|---|---|---|
| African | 12.3% | 14.8% | 11.8% |
| East Asian | 1.9% | 1.6% | |
| West Eurasian | 25.8% | 22.7% | 17.8% |
| Native American | 60.0% | 60.9% | 70.8% c |

      Sources: Saunier J.L., Irwin J.A., Just R.S., O’Callaghan J., Parsons T.J. Mitochondrial control region sequences from a U.S. “Hispanic” population sample. Allard M.W., Polanskey D., Wilson M.R., Monson K.L., Budowle B. Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.

      a Cannot be adequately separated based on the data presented in the papers.
      b Not reported.
      c Significantly different from the proportion reported in this study.
      In addition to comparisons based on inferred maternal biogeographic ancestry, we also compared the haplotype distribution for the African American population sample reported in this study to that described by Salas et al. [
      • Salas A.
      • Carracedo A.
      • Richards M.
      • Macaulay V.
      Charting the ancestry of African Americans.
      ] in their analysis of an FBI dataset [
      • Monson K.L.
      • Miller K.W.P.
      • Wilson M.R.
      • Dizzino J.A.
      • Budowle B.
      The mtDNA population database: an integrated software and database resource for forensic comparisons.
      ]. When using the same haplogroup categories and level of phylogenetic resolution, the composition of our African American sample (Fig. S3) is nearly identical to Fig. 1 in Salas et al. [
      • Salas A.
      • Carracedo A.
      • Richards M.
      • Macaulay V.
      Charting the ancestry of African Americans.
      ], and reflects the predominantly West African, west-central African and southwestern African origins of the mtDNA lineages present in U.S. haplotypes of recent African descent reported by the authors and in other studies [
      • Salas A.
      • Richards M.
      • Lareu M.V.
      • Scozzari R.
      • Coppa A.
      • Torroni A.
      • et al.
      The African diaspora: mitochondrial DNA and the Atlantic slave trade.
      ,
      • Ely B.
      • Wilson J.L.
      • Jackson F.
      • Jackson B.A.
      African–American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups.
      ,
      • Stefflova K.
      • Dulik M.C.
      • Barnholtz-Sloan J.S.
      • Pai A.A.
      • Walker A.H.
      • Rebbeck T.R.
      Dissecting the within-Africa ancestry of populations of African descent in the Americas.
      ].
      The composition of the African American, U.S. Caucasian and U.S. Hispanic populations, and the extent of the diversity within each of the ancestry groups that contribute to them, are reflected in pairwise comparisons performed for (a) each population sample and (b) all samples ascribed to each of the four biogeographic ancestry categories. Fig. 2 displays histograms of pairwise comparisons for both the full mtGenome and the CR only, for each of the three populations and three of the four ancestry groups, plotted by the proportion of comparisons performed to normalize for the differing sample sizes. The average numbers of pairwise differences for each of these sets of comparisons are reported in Table S6. When the entire mtGenome was considered, the U.S. Caucasian population sample (Fig. 2b) and the haplotypes of West Eurasian ancestry (Fig. 2e) had asymmetrical bimodal pairwise distributions, with the first, smaller peak representing the comparisons between recently diverged lineages in the dataset, and the second, larger peak representing the comparisons between more distantly related haplotypes. When these same analyses were performed with the comparison restricted to the CR (Fig. 2h and k), the distributions were unimodal and Poisson-like (though still significantly different from a Poisson distribution; p < 0.0001 for both). For the U.S. Hispanic dataset, Fig. 2c displays an asymmetrical bimodal distribution similar to that of the U.S. Caucasian sample, but with a substantial tail to the right that represents comparisons to and between the African ancestry haplotypes present in the population sample. The Native American ancestry comparisons (Fig. 2f and l) are sharply bimodal and more symmetrical, reflecting the origins of Native Americans and the genetic distance between the haplotypes in this sample set (primarily, haplogroups A and B from macrohaplogroup N, and haplogroups C and D from macrohaplogroup M). The comparisons between these haplotypes based on the CR alone (Fig. 2l) produce the only CR pairwise distribution that closely mirrors the shape of the distribution based on the full mtGenome. In contrast to the other sample sets, comparisons of both the African American population sample and the African ancestry lineages for the complete mtGenome resulted in multimodal distributions (Fig. 2a and d) and high average pairwise numbers of differences (Table S6). In comparison to the U.S. Caucasian and U.S. Hispanic populations, fewer of the African American haplotypes are highly similar to one another across the entire mtGenome, and a much greater number are genetically very distant. Consistent with results from previous studies of African American population samples [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ,
      • Salas A.
      • Carracedo A.
      • Richards M.
      • Macaulay V.
      Charting the ancestry of African Americans.
      ,
      • Salas A.
      • Richards M.
      • Lareu M.V.
      • Scozzari R.
      • Coppa A.
      • Torroni A.
      • et al.
      The African diaspora: mitochondrial DNA and the Atlantic slave trade.
      ,
      • Ely B.
      • Wilson J.L.
      • Jackson F.
      • Jackson B.A.
      African–American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups.
      ,
      • Stefflova K.
      • Dulik M.C.
      • Barnholtz-Sloan J.S.
      • Pai A.A.
      • Walker A.H.
      • Rebbeck T.R.
      Dissecting the within-Africa ancestry of populations of African descent in the Americas.
      ], the distributions for these two comparisons underscore the extensive mtDNA diversity that exists within the African ancestry component of U.S. populations.
      Fig. 2. Haplotype pairwise comparisons. Pairwise comparisons of the haplotypes were performed for each of the three populations and three of the four biogeographic ancestry groups (African, West Eurasian and Native American). Comparisons for the biogeographic ancestry groups utilized all haplotypes assigned to the ancestry group, regardless of population. The y-axis indicates the proportion of comparisons performed (to normalize for differing sample sizes), and the x-axis represents the number of differences. Histograms on the left side of the figure (panels a through f) represent comparisons performed using the complete mtGenome; whereas for the comparisons on the right side of the figure (g through l), the data compared were restricted to the CR. For all analyses, length insertions at positions 309, 573 and 16193 were ignored.
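The pairwise comparison described for Fig. 2 can be sketched as follows. This is a minimal illustration rather than the authors' software: the haplotype representation (a dict of position label to variant base, relative to the reference), the function names and the three toy haplotypes are all assumptions made for the example. Only the normalization by the number of comparisons and the exclusion of length insertions at positions 309, 573 and 16193 come from the text.

```python
from collections import Counter
from itertools import combinations

# Length insertions at these positions are ignored, as stated in the Fig. 2 caption.
IGNORED_INSERTION_SITES = ("309.", "573.", "16193.")

def count_differences(hap_a, hap_b):
    """Count positions at which two haplotypes differ.

    Each haplotype maps a position label (e.g. "16189" or "315.1") to the
    variant base; positions absent from the dict are taken to match the
    reference sequence.
    """
    positions = set(hap_a) | set(hap_b)
    return sum(
        1
        for pos in positions
        if not pos.startswith(IGNORED_INSERTION_SITES)
        and hap_a.get(pos) != hap_b.get(pos)
    )

def pairwise_distribution(haplotypes):
    """Return ({n_differences: proportion of comparisons}, mean n_differences)."""
    diffs = [count_differences(a, b) for a, b in combinations(haplotypes, 2)]
    n_comparisons = len(diffs)
    proportions = {d: c / n_comparisons for d, c in sorted(Counter(diffs).items())}
    return proportions, sum(diffs) / n_comparisons

# Hypothetical toy haplotypes, for illustration only.
toy_haplotypes = [
    {"73": "G", "263": "G", "309.1": "C"},
    {"73": "G", "263": "G", "16189": "C"},
    {"263": "G", "16189": "C", "16223": "T"},
]

proportions, mean_diff = pairwise_distribution(toy_haplotypes)
print(proportions, mean_diff)
```

Plotting `proportions` as a histogram gives the kind of normalized distribution shown in each panel, and `mean_diff` corresponds to the average number of pairwise differences of the kind reported in Table S6.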

      3.3 Indels and heteroplasmy

      Length heteroplasmy (LHP) in the CR has been well-characterized in a previous study [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ] with a much larger sample size than we report here, and the observed incidence of LHP across the complete CR in our dataset is generally consistent with previous reports (see Table S7). However, a few observations from our data are worth noting. Overall, we observed LHP in hypervariable region 1 (HV1) in 17.5% of individuals. Consistent with earlier examinations [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ,
      • Lee H.Y.
      • Chung U.
      • Yoo J.E.
      • Park M.J.
      • Shin K.J.
      Quantitative and qualitative profiling of mitochondrial DNA length heteroplasmy.
      ,
      • Melton T.
      Mitochondrial DNA heteroplasmy.
      ], LHP in HV1 was observed in every sample in which a transition at position 16189 resulted in a homopolymer of nine or more cytosine residues, and no LHP was observed when seven or fewer cytosine residues were present. Among the 13 samples in which some combination of transitions and insertions in HV1 resulted in a homopolymer consisting of exactly eight cytosines, eight samples had detectable LHP. In the remaining five samples, LHP was either not present or was too minor to distinguish from sequence background/noise. The incidence of HV1 LHP across all 588 samples in this study is significantly higher (p = 0.001) than the 5.0% recently described for a set of 101 western European individuals [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ]. When our data were considered by population, though, the observed frequency of HV1 LHP varied significantly (p < 0.00001), with a high of 25.2% in the U.S. Hispanic population, and a low of 9.1% in the U.S. Caucasian population (Table S7). This latter value is relatively consistent with the data reported by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ]; and the differences we observed by population are largely explained by (a) the nucleotide state at position 16189 (C or T), and (b) the presence or absence of a homopolymer with at least eight cytosine residues, when these factors are considered by major haplogroup (see Fig. S4).
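The relationship between HV1 polycytosine tract length and LHP reported above can be expressed as a simple rule. The sketch below is a hedged illustration: the function names and the example sequence strings are hypothetical, and the three categories only restate the observations from this dataset (nine or more cytosines: LHP in every sample; exactly eight: LHP in 8 of 13 samples; seven or fewer: none observed).

```python
def longest_run(sequence, base="C"):
    """Length of the longest uninterrupted run of `base` in `sequence`."""
    longest = current = 0
    for nucleotide in sequence.upper():
        current = current + 1 if nucleotide == base else 0
        longest = max(longest, current)
    return longest

def hv1_lhp_expectation(sequence):
    """Summarize the LHP pattern observed in this study for a given tract."""
    run = longest_run(sequence)
    if run >= 9:
        return "LHP observed in every sample"
    if run == 8:
        return "LHP observed in some samples (8 of 13)"
    return "no LHP observed"

# Hypothetical example tracts (not taken from the study data).
examples = {
    "interrupted": "AAAACCCCCTCCCCAT",   # longest C run = 5
    "eight_c": "AACCCCCCCCAT",           # longest C run = 8
    "uninterrupted": "AAAACCCCCCCCCCAT", # longest C run = 10
}
for name, seq in examples.items():
    print(name, longest_run(seq), hv1_lhp_expectation(seq))
```

The rule is descriptive of this dataset only; it is not a predictive model, and the eight-cytosine class in particular behaved inconsistently.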
      LHP in the 523-524 AC repeat region was clearly apparent (readily observed above sequence background and/or noise upon initial inspection of the raw data) in 5.3% of the samples in our dataset. The majority (65%) of instances occurred in samples with at least six dinucleotide repeats, and all 13 haplotypes with seven or more AC repeats had clear LHP. This result is consistent with a previous report on LHP in the AC repeat region, which found “pronounced” AC repeat LHP in 4.3% of samples, and generally in individuals with six or more dinucleotide repeats [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ]. In addition to the LHP observed in this and the three other expected regions (in HV1 around position 16193, in HV2 around position 309, and in HV3 around position 573), a single sample exhibited one further LHP in the CR, at position 463. This haplotype has T to C transitions at positions 454, 455 and 460, resulting in a 10 bp cytosine homopolymer. Overall, across the 588 haplotypes, 374 individuals (63.6%) exhibited CR LHP, and 87 individuals (14.8%) possessed LHP in more than one portion of the CR.
      LHP associated with indels in the coding region was observed in eleven instances across our three datasets (1.9% of samples), at five of the 18 coding region positions at which indels were found (Table 5). In four individuals, a T to C transition at position 961 resulted in a 10 bp polycytosine tract, and all four of these haplotypes exhibited LHP at position 965. Similarly, a T to C transition at position 8277 resulted in a 7 bp polycytosine stretch in three individuals; and in two of these, cytosine insertions (two or three) and LHP were observed. In the third individual, no additional cytosines were present, and no LHP could be detected. LHP was also observed in one sample at position 8287, due to a T to C transition at 8286 and cytosine insertions that resulted in a 12 bp cytosine homopolymer. At position 5899, no LHP was detected when only a single cytosine was inserted, but LHP was observed in the three samples with six or more C insertions. And finally, one sample had LHP of the 8281-8289 9 bp insertion. In this individual at least two length variants were detected, and the majority molecule was two 9 bp insertions.
      Table 5. Coding region indels. Across all 588 haplotypes, indels were detected at 18 different positions in the coding region. At three of these 18 positions (960, 5899 and 8289), both insertions and deletions were observed. LHP was detected at five of the 18 positions, in eleven total instances. While observation of an indel in multiple individuals does not necessarily imply multiple occurrences of insertion or deletion at the position (as some indels are primarily or exclusively haplogroup-associated), the number of observations does provide some indication of how frequently each indel might be observed in a population sample.

| Indels relative to the rCRS (a) | Number of individuals | Instances of associated LHP |
|---|---|---|
| 595.1A | 1 | |
| 960 del | 1 | |
| 960.XC | 5 | |
| 965.XC | 4 | 4 |
| 2156.1A | 3 | |
| 2232.1A | 3 | |
| 2395 del | 13 | |
| 2887-2888 del | 1 | |
| 3307.1A | 1 | |
| 4317 del | 1 | |
| 5752 del | 1 | |
| 5899 del | 1 | |
| 5899.XC | 12 | 3 |
| 8278.XC | 2 | 2 |
| 8287.XC | 1 | 1 |
| 8281-8289 9 bp del | 39 | |
| 8289.X 9 bp ins | 7 | 1 |
| 12241 del | 1 | |
| 15944 del | 32 | |

      rCRS sources: Anderson S., Bankier A.T., Barrell B.G., de Bruijn M.H., Coulson A.R., Drouin J., et al. Sequence and organization of the human mitochondrial genome. Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
      a Excludes 3107 del.
      In addition to the LHP observed at coding region positions with indels relative to the rCRS, 88.8% of samples had detectable LHP around position 12425. Positions 12418-12425 form an 8 bp polyadenine tract, and a mixture of molecules in this region has been previously described (in a report on mtDNA heteroplasmy from MPS data [
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ], and in multiple cancer studies as reviewed in Lee et al. [
      • Lee H.C.
      • Huang K.H.
      • Yeh T.S.
      • Chi C.W.
      Somatic alterations in mitochondrial DNA and mitochondrial dysfunction in gastric cancer progression.
      ]). In our Sanger data, LHP in this region generally appeared as a mixture of two molecules consisting of seven or eight adenine residues (see Fig. S5 for an example). In all cases the majority molecule matched the rCRS (eight adenines; [
      • Anderson S.
      • Bankier A.T.
      • Barrell B.G.
      • de Bruijn M.H.
      • Coulson A.R.
      • Drouin J.
      • et al.
      Sequence and organization of the human mitochondrial genome.
      ,
      • Andrews R.M.
      • Kubacka I.
      • Chinnery P.F.
      • Lightowlers R.N.
      • Turnbull D.M.
      • Howell N.
      Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
      ]), and the LHP was generally minor enough that it did not impact sequence coverage (i.e. in most cases, sequences did not need to be trimmed). For most of the 66 individuals in which LHP at 12425 was not identified or could not be confidently called, nearly all sequences in the region had noise (i.e. background) to the extent that the very low level LHP typically observed at 12425 would be obscured or difficult to detect. However, for two of the samples, a transition at position 12425 appears to have prevented LHP.
      The frequency of point heteroplasmy (PHP) in the 588 haplotypes was also examined (findings are summarized in Tables 6 and 7). Across the entire mtGenome, a total of 166 PHPs were identified in 140 individuals (23.8%). Twenty-five samples (4.3%) exhibited more than one PHP (24 samples had two PHPs, and one had three PHPs); and of the individuals with PHP, 17.9% had multiple PHPs. The incidence of PHP across the entire mtGenome varied significantly between the three populations (p = 0.029). However, when pairwise comparisons of the populations were performed, only the comparison between the African American and U.S. Hispanic populations was significant after Bonferroni correction for multiple tests (p = 0.007992), and the differences between populations were not significant when the CR and coding region PHPs were considered separately. In a large study of more than 5000 individuals, Irwin et al. [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ] found significant variation in the incidence of CR PHP between multiple populations, and postulated the differences might be due to the differing mtDNA lineages comprising each of the populations. As Table 3 and Fig. 1 demonstrate, there is certainly extreme variation in the composition of each of the three U.S. populations described here. Consistent with a recent study of heteroplasmy in complete mtGenomes [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], though, no significant differences in the frequency of PHP by haplogroup across the entire mtGenome were observed in our data, even when statistical analysis was restricted to the eleven major haplogroups with greater than five PHPs (see Table S8 for the incidence of PHP by haplogroup). Similarly, no significant differences by haplogroup were observed when PHPs in the CR and the coding region were considered separately. In the case of the present study and the results reported by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], it may be that the numbers of samples with PHP on a per-haplogroup basis are simply too small to detect any non-random differences.
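The pairwise population comparisons of PHP incidence with Bonferroni correction can be sketched as below, using the numbers of individuals with PHP from Table 6. The choice of an uncorrected Pearson chi-square test, the family-wise alpha of 0.05 and the function name are assumptions for illustration, since the text does not state which test was applied.

```python
import math
from itertools import combinations

def chi_square_2x2_p(x1, n1, x2, n2):
    """Two-sided p-value for a 2x2 Pearson chi-square test.

    One degree of freedom, no continuity correction. x1 of n1 and x2 of n2
    are the counts of individuals with the trait in each group.
    """
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    n = n1 + n2
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))

# (individuals with at least one PHP, sample size), taken from Table 6.
php_counts = {
    "African American": (51, 170),
    "U.S. Caucasian": (62, 263),
    "U.S. Hispanic": (27, 155),
}

pairs = list(combinations(php_counts, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted threshold for 3 tests (assumed alpha = 0.05)

results = {
    (g1, g2): chi_square_2x2_p(*php_counts[g1], *php_counts[g2])
    for g1, g2 in pairs
}
significant = [pair for pair, p in results.items() if p < alpha]

for pair, p in results.items():
    print(pair, f"p = {p:.6f}", "significant" if p < alpha else "not significant")
```

With these inputs only the African American versus U.S. Hispanic comparison falls below the adjusted threshold, which agrees with the result described in the text.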
      Table 6. Point heteroplasmy statistics across all 588 samples and by population. PHP statistics were calculated for all 588 haplotypes and for each of the three U.S. populations. A total of 140 individuals (23.8%) had at least one PHP; and among those individuals with PHP, 25 (17.9%) had more than one PHP. Thus, across the entire mtGenome, multiple PHPs were seen within one individual 4.3% of the time. The highest number of PHPs observed within a single individual was three.

| | All haplotypes | African American | U.S. Caucasian | U.S. Hispanic |
|---|---|---|---|---|
| n (individuals) | 588 | 170 | 263 | 155 |
| # of PHP | 166 | 65 | 68 | 33 |
| # (%) of individuals with PHP | 140 (23.8%) | 51 (30.0%) | 62 (23.6%) | 27 (17.4%) |
| # (%) of individuals with >1 PHP | 25 (4.3%) | 13 (7.6%) | 6 (2.3%) | 6 (3.9%) |
| % of individuals with PHP that have >1 PHP | 17.9% | 25.5% | 9.7% | 22.2% |
| # (%) of individuals with 2 PHP | 24 (4.1%) | 12 (7.1%) | 6 (2.3%) | 6 (3.9%) |
| # (%) of individuals with 3 PHP | 1 (0.2%) | 1 (0.6%) | 0 (0.0%) | 0 (0.0%) |
| # (%) of individuals with CR PHP | 64 (9.9%) | 28 (13.5%) | 24 (8.8%) | 12 (7.7%) |
| # (%) of individuals with coding region PHP | 102 (15.8%) | 37 (19.4%) | 44 (15.6%) | 21 (12.3%) |
      Table 7. Point heteroplasmy statistics by region. PHP statistics were calculated for the CR and the coding region. The number, percentage and ratio of transitions (separated by type) and transversions are listed for each region of the molecule.

| | CR | Coding region |
|---|---|---|
| # of PHP | 64 | 102 |
| # (%) of individuals with PHP | 58 (9.9%) | 93 (15.8%) |
| # of positions at which PHP was observed | 44 | 102 |
| # of PHP observed in >1 individual | 10 (a) | 0 |
| % of individuals with >1 PHP in the region | 0.85% | 1.53% |
| # (%) of PHPs that represented transitions | 62 (96.9%) | 101 (99.0%) |
| # (%) of PHPs that were pyrimidine-pyrimidine | 38 (59.4%) | 41 (40.2%) |
| # (%) of PHPs that were purine-purine | 24 (37.5%) | 60 (58.8%) |
| Ratio of pyrimidine to purine PHPs | 1.6:1 | 0.7:1 |
| # (%) of PHPs that represented transversions | 2 (3.1%) | 1 (1.0%) |
| Ratio of transition to transversion PHPs | 31:1 | 101:1 |

      a Both 228K and 228R were observed; the total number of positions at which PHP was observed in >1 individual is 11.
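The transition and transversion categories in Table 7 follow from the IUPAC ambiguity code used to record each PHP (for example 228R versus 228K). The sketch below shows one way to classify such codes and to recompute the ratios from the counts in the table. The function names and the short list of example calls are assumptions for illustration; the counts passed to `summarize` are those given in Table 7.

```python
# Standard IUPAC two-base ambiguity codes.
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT"}

def classify_php(code):
    """Classify a two-base IUPAC heteroplasmy code by substitution type."""
    if code == "Y":
        return "pyrimidine transition"
    if code == "R":
        return "purine transition"
    if code in IUPAC_PAIRS:
        return "transversion"
    raise ValueError(f"not a two-base IUPAC code: {code!r}")

def summarize(pyrimidine, purine, transversions):
    """Recompute the transition total and the two ratios shown in Table 7."""
    transitions = pyrimidine + purine
    return {
        "transitions": transitions,
        "pyrimidine_to_purine": round(pyrimidine / purine, 1),
        "transition_to_transversion": round(transitions / transversions),
    }

# Example calls written as position + IUPAC code (illustrative subset only).
calls = ["189R", "228R", "228K"]
classified = {call: classify_php(call[-1]) for call in calls}

cr_summary = summarize(pyrimidine=38, purine=24, transversions=2)
coding_summary = summarize(pyrimidine=41, purine=60, transversions=1)
print(classified)
print(cr_summary)
print(coding_summary)
```

The recomputed values match the ratios in the table (1.6:1 and 31:1 for the CR, 0.7:1 and 101:1 for the coding region).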
      A complete list of the mtGenome positions at which PHP was detected is given in Table S9. The 64 PHPs observed in the CR were found in 58 of the 588 individuals (9.9%), at 44 different positions. For a majority of these positions (75%), PHP was observed in just one individual. Eight positions (18%) were heteroplasmic in two individuals (one of these positions, 228, was observed as both 228R and 228K); and three positions – 189, 152 and 16093 – were heteroplasmic in four, five and six individuals, respectively. Several previous examinations of PHP in the CR have indicated that both 16093 and 152 may be hotspots for heteroplasmy [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ,
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Santos C.
      • Sierra B.
      • Alvarez L.
      • Ramos A.
      • Fernandez E.
      • Nogues R.
      • et al.
      Frequency and pattern of heteroplasmy in the control region of human mitochondrial DNA.
      ,
      • Tully L.A.
      • Parsons T.J.
      • Steighner R.J.
      • Holland M.M.
      • Marino M.A.
      • Prenger V.L.
      A sensitive denaturing gradient-gel electrophoresis assay reveals a high frequency of heteroplasmy in hypervariable region 1 of the human mtDNA control region.
      ,
      • Melton T.
      • Nelson K.
      Forensic mitochondrial DNA analysis: two years of commercial casework experience in the United States.
      ]. However, to our knowledge a high observed incidence of PHP at position 189 has only been reported in muscle tissue samples associated with increased age [
      • Calloway C.D.
      • Reynolds R.L.
      • Herrin Jr., G.L.
      • Anderson W.W.
      The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age.
      ,
      • Theves C.
      • Keyser-Tracqui C.
      • Crubezy E.
      • Salles J.P.
      • Ludes B.
      • Telmon N.
      Detection and quantification of the age-related point mutation A189G in the human mitochondrial DNA.
      ], and in association with increased BMI and insulin resistance [
      • Andrew T.
      • Calloway C.D.
      • Stuart S.
      • Lee S.H.
      • Gill R.
      • Clement G.
      • et al.
      A twin study of mitochondrial DNA polymorphisms shows that heteroplasmy at multiple sites is associated with mtDNA variant 16093 but not with zygosity.
      ] (this excludes the data reported by He et al. [
      • He Y.
      • Wu J.
      • Dressman D.C.
      • Iacobuzio-Donahue C.
      • Markowitz S.D.
      • Velculescu V.E.
      • et al.
      Heteroplasmic mitochondrial DNA mutations in normal and tumour cells.
      ], which has been shown to be problematic [
      • Bandelt H.J.
      • Salas A.
      Current next generation sequencing technology may not meet forensic standards.
      ]), though position 189 is recognized as one of the faster mutating sites in the mtGenome [
      • Meyer S.
      • Weiss G.
      • von Haeseler A.
      Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA.
      ,
      • Stoneking M.
      Hypervariable sites in the mtDNA control region are mutational hotspots.
      ,
      • Strouss K.
      Relative evolutionary rate estimation for sites in the mtDNA control region.
      ,
      • Howell N.
      • Elson J.L.
      • Howell C.
      • Turnbull D.M.
      Relative rates of evolution in the coding and control regions of African mtDNAs.
      ,
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ]. In our data, PHP at 189 occurred on varied haplotypic backgrounds (haplogroups L3b1a4, U5a1d1, J1c3 and H1ag1), and in two of the three populations. Visually estimated percentages of the minor molecule across the four samples with 189 PHP ranged from 5% to 15%. In all four cases the variant nucleotide was most clearly apparent in the reverse sequences covering the position, but was confirmed by at least one (though typically more than one) forward sequence. In three of the four cases of PHP at 189, the majority molecule matched the rCRS. No age or health-related information was available for the anonymized blood serum specimens used for the current study.
      A total of 102 PHPs were observed in the coding region. Nine individuals exhibited more than one coding region PHP, and thus the total number of individuals with coding region PHP was 93 (15.8%). However, each PHP was unique in the dataset (observed in only a single individual). The absence of coding region PHPs detected in more than one individual is consistent with the recent analysis by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], which found 21 unique coding region PHPs among 101 individuals. Among the 24 coding region PHPs reported by Li et al. [
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ], one was shared by more than one individual; however, this PHP (3492M) is unlikely to be authentic in either individual, given (1) the very low incidence of transversion-type PHPs reported by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ] and observed in this study (see below), (2) the very low frequency of substitution at position 3492 (observed just once, and as a transition, among the more than 2000 mtGenomes analyzed by Soares et al. [
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ]), (3) the identification (by the authors themselves) of position 3492 as a sequencing error hot spot, and (4) the coverage dip observed in this region in multiple mtGenome sequencing studies ([
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ,
      • McElhoe J.A.
      • Holland M.M.
      • Makova K.D.
      • Su M.S.
      • Paul I.M.
      • Baker C.H.
      • et al.
      Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq.
      ,
      • Irwin J.
      • Moreno L.
      • Callaghan T.
      Evaluating next generation sequencing technologies for expanded mitochondrial DNA identification capabilities at the FBI lab.
      ]; R. Just, unpublished data; and W. Parson, unpublished data) using Illumina platforms (Illumina, Inc., San Diego, CA). In a slight departure from the absence of authentic shared PHPs in the datasets reported by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], Li et al. [
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ] and in this study, the haplotypes recently published by King et al. [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ] included three shared PHPs (at positions 1438, 2083, and 8994) among the 58 total coding region PHPs detected (using an 18% threshold) in 283 individuals.
      When 203 coding region PHPs (from the 1103 total mtGenomes published by Ramos et al. [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], Li et al. [
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ] (minus the 3492M PHPs), King et al. [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ] and reported in this study) were considered in combination, only five additional PHPs were observed in more than one individual (see Table S10). All five of these positions had low relative substitution rates (1–3) among the 2196 complete mtGenome sequences previously analyzed in a phylogenetic framework by Soares et al. [
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ]. In fact, of the 102 coding region PHPs in our data, only two occurred at positions among the 15 fastest evolving sites in the coding region (and only four among the 50 fastest sites), while nearly half (44%) occurred at positions invariant among the >2000 published mtGenomes included in the Soares et al. analysis [
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ] (see Table S9). In combination, these studies suggest that the distribution of heteroplasmy (which should more closely reflect mutation rates than does complete substitution) in the coding region is not consistent with the gamma-distributed relative substitution rates reported for the region [
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ]. This finding is in contrast to the general correlation (with a few exceptions) between heteroplasmic hotspots and mutation/substitution hotspots in the CR [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ]. The seeming difference between the observed relative heteroplasmy and substitution rates on a position-by-position basis in the coding region has several possible explanations, including selection (at multiple potential levels, e.g. individual, population, etc.), nucleotide state stability/mutability (that may be sequence context dependent), and genetic drift. These factors, alone and in combination, have been previously suggested to explain the difference between phylogenetic and pedigree substitution rates in the CR [
      • Parsons T.J.
      • Muniec D.S.
      • Sullivan K.
      • Woodyatt N.
      • Alliston-Greiner R.
      • Wilson M.R.
      • et al.
      A high observed substitution rate in the human mitochondrial DNA control region.
      ,
      • Howell N.
      • Smejkal C.B.
      • Mackey D.A.
      • Chinnery P.F.
      • Turnbull D.M.
      • Herrnstadt C.
      The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates.
      ,
      • Santos C.
      • Montiel R.
      • Sierra B.
      • Bettencourt C.
      • Fernandez E.
      • Alvarez L.
      • et al.
      Understanding differences between phylogenetic and pedigree-derived mtDNA mutation rate: a model using families from the Azores Islands (Portugal).
      ], departures from the correlation between observed relative substitution and heteroplasmy rates by position in the CR [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ,
      • Santos C.
      • Sierra B.
      • Alvarez L.
      • Ramos A.
      • Fernandez E.
      • Nogues R.
      • et al.
      Frequency and pattern of heteroplasmy in the control region of human mitochondrial DNA.
      ,
      • Tully L.A.
      • Parsons T.J.
      • Steighner R.J.
      • Holland M.M.
      • Marino M.A.
      • Prenger V.L.
      A sensitive denaturing gradient-Gel electrophoresis assay reveals a high frequency of heteroplasmy in hypervariable region 1 of the human mtDNA control region.
      ] and patterns of substitution ([
      • Howell N.
      • Elson J.L.
      • Howell C.
      • Turnbull D.M.
      Relative rates of evolution in the coding and control regions of African mtDNAs.
      ,
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ,
      • Elson J.L.
      • Turnbull D.M.
      • Howell N.
      Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection.
      ,
      • Kivisild T.
      • Shen P.
      • Wall D.P.
      • Do B.
      • Sung R.
      • Davis K.
      • et al.
      The role of selection in the evolution of human mitochondrial genomes.
      ], among others) and heteroplasmy [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Avital G.
      • Buchshtav M.
      • Zhidkov I.
      • Tuval Feder J.
      • Dadon S.
      • Rubin E.
      • et al.
      Mitochondrial DNA heteroplasmy in diabetes and normal adults: role of acquired and inherited mutational patterns in twins.
      ] in the coding region.
      In a substantial departure from the above-mentioned studies regarding heteroplasmy across the mtGenome, a very recent examination of mtDNA sequences from 1085 individuals using high coverage depth MPS data and an ∼1% heteroplasmy detection threshold found 4342 total PHPs at 2531 mtDNA positions (of 13,659 positions examined), of which only 69.42% were observed in just a single individual [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ]. Relying on the same relative substitution rates published by Soares et al. [
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ] referenced above, Ye et al. [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ] reported a positive correlation between relative substitution rates and heteroplasmy rates (R² = 0.3702). However, coding region heteroplasmies were not separated from CR heteroplasmies for that analysis, and an association between substitution and heteroplasmy hotspots has been previously described for the CR [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ]. When we applied the same analysis to all 166 PHPs detected in our study (64 and 102 PHPs in the CR and coding region, respectively), a similar positive correlation was observed (R² = 0.3003, r = 0.5480; see Fig. S6a) despite the clear lack of correlation between relative substitution rates and heteroplasmy rates among the coding region PHPs in this study. When the same regression analysis was performed using only the 3547 coding region PHPs reported by Ye et al. [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ], a much weaker positive correlation between relative substitution rates and heteroplasmy rates was observed (R² = 0.1076, r = 0.3280; see Fig. S6b).
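The relationship between the two statistics quoted for each regression can be illustrated with a minimal sketch. It assumes a simple least-squares regression with a single predictor, in which case the coefficient of determination equals the square of the Pearson correlation coefficient r. The function name and the toy inputs are hypothetical; they are not data from this study or from Ye et al.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy (hypothetical) values: relative substitution rate per position
# versus number of individuals with a PHP at that position.
toy_substitution_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_heteroplasmy_counts = [1.0, 1.0, 2.0, 2.0, 4.0]

r = pearson_r(toy_substitution_rates, toy_heteroplasmy_counts)
r_squared = r ** 2  # equals R^2 for a one-predictor linear regression
```

Under this assumption the reported pairs are internally consistent: 0.5480 squared is approximately 0.3003, and 0.3280 squared is approximately 0.1076.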
      Additionally, further examination of the PHPs reported by Ye et al. [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ] indicated that some may be due to mixtures between distinct individuals/samples, rather than true intraindividual mtDNA variation [
      • Just R.S.
      • Irwin J.A.
      • Parson W.
      Questioning the prevalence and reliability of mitochondrial DNA heteroplasmy from massively parallel sequencing data.
      ]. For example, among the 71 PHPs reported for sample HG00740, nearly all of the positions are diagnostic for two distinct mtDNA haplogroups (L1b1a1a and B2b3a; according to Build 16 of PhyloTree [
      • van Oven M.
      • Kayser M.
      Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.
      ]). Similar issues were observed among the PHPs described in another recent report on human mtGenome heteroplasmy [
      • Sosa M.X.
      • Sivakumar I.K.
      • Maragh S.
      • Veeramachaneni V.
      • Hariharan R.
      • Parulekar M.
      • et al.
      Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency.
      ]. In that paper, nearly all of the 20 PHPs given for sample NA12248 (for example) can be ascribed to one of two haplogroups (U5b2a2b or H1e), and few PHPs that would be expected from a mixture of two samples representing those haplogroups are absent. These findings cast some doubt on the veracity of the incidence and pattern of heteroplasmy reported in the Ye et al. [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ] and Sosa et al. [
      • Sosa M.X.
      • Sivakumar I.K.
      • Maragh S.
      • Veeramachaneni V.
      • Hariharan R.
      • Parulekar M.
      • et al.
      Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency.
      ] studies, and thus the conclusions those authors have drawn from the data.
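The kind of check described above (asking whether a sample's reported PHPs are better explained by a mixture of two haplotypes than by true heteroplasmy) can be sketched as a set comparison. This is only an illustration under stated assumptions: the function name, the marker lists and the position numbers are hypothetical placeholders, not PhyloTree data and not the positions reported for HG00740 or NA12248. Each haplotype is assumed to be given as the set of positions at which it differs from the reference, and a position carried by both haplotypes is assumed to carry the same variant in each.

```python
def mixture_support(php_positions, markers_a, markers_b):
    """Summarize how well a two-haplotype mixture explains reported PHPs."""
    php = set(php_positions)
    # Positions where the two haplotypes differ from each other; only these
    # would appear mixed if the two samples were combined.
    expected = set(markers_a) ^ set(markers_b)
    explained = php & expected
    return {
        "fraction_of_phps_explained": len(explained) / len(php) if php else 0.0,
        "fraction_of_expected_seen": len(explained) / len(expected) if expected else 0.0,
        "unexplained_phps": sorted(php - expected),
        "missing_expected": sorted(expected - php),
    }

# Hypothetical placeholder positions, for illustration only.
reported_phps = [101, 202, 303, 404, 909]
haplogroup_a_markers = [101, 202, 505]
haplogroup_b_markers = [303, 404, 505, 606]

summary = mixture_support(reported_phps, haplogroup_a_markers, haplogroup_b_markers)
```

When both fractions are high, that is, nearly all reported PHPs fall at positions separating the two haplogroups and few expected positions are missing, a mixture becomes a plausible alternative explanation that would need to be excluded.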
      Among the PHPs observed in the CR in our study, all but two (97%) were transition-type (purine to purine, or pyrimidine to pyrimidine) PHPs; and of these, approximately two-thirds were pyrimidine transitions while one-third were purine transitions (Table 7 and Fig. S9). The 1.6:1 pyrimidine to purine ratio for PHPs in the CR is consistent both with earlier analyses of CR heteroplasmy [
      • Irwin J.A.
      • Saunier J.L.
      • Niederstatter H.
      • Strouss K.M.
      • Sturk K.A.
      • Diegoli T.M.
      • et al.
      Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
      ,
      • Forster L.
      • Forster P.
      • Lutz-Bonengel S.
      • Willkomm H.
      • Brinkmann B.
      Natural radioactivity and human mitochondrial DNA mutations.
      ] and with the approximately 1.3:1 pyrimidine to purine ratio in the nucleotide composition for the region. Only one of the 102 PHPs in the coding region was a transversion-type change, indicating an even more extreme bias toward transition-type heteroplasmies than has been previously reported [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Avital G.
      • Buchshtav M.
      • Zhidkov I.
      • Tuval Feder J.
      • Dadon S.
      • Rubin E.
      • et al.
      Mitochondrial DNA heteroplasmy in diabetes and normal adults: role of acquired and inherited mutational patterns in twins.
      ]. And in contrast to the CR, more of the coding region PHPs were purine (59%) versus pyrimidine (41%) transitions, despite a pyrimidine to purine ratio (in terms of average overall nucleotide composition for the coding region) that is nearly identical to the CR. The same phenomenon has been observed in previous studies of both substitution and heteroplasmy in the coding region [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Pereira L.
      • Freitas F.
      • Fernandes V.
      • Pereira J.B.
      • Costa M.D.
      • Costa S.
      • et al.
      The diversity present in 5140 human mitochondrial genomes.
      ].
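The transition/transversion categories used above can be made concrete with a small sketch. It assumes that each PHP is recorded with a two-base IUPAC ambiguity code, as in the "3492M" notation used earlier in this section; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard two-base IUPAC ambiguity codes.
IUPAC_TWO_BASE = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "M": {"A", "C"},
    "K": {"G", "T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
}

def classify_php(code):
    """Classify a two-base IUPAC code as a transition type or a transversion."""
    bases = IUPAC_TWO_BASE[code.upper()]
    if bases <= PURINES:
        return "purine transition"
    if bases <= PYRIMIDINES:
        return "pyrimidine transition"
    return "transversion"
```

Under this scheme only R and Y denote transition-type PHPs, while a code such as M (A and C) denotes a transversion-type PHP.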
      Fig. 3 displays the proportion of PHPs observed by mtGenome region in our data; and Fig. 4 details both the proportion of positions within each coding region gene at which PHP was observed, and the portion of that variation that would lead to synonymous and nonsynonymous changes to the amino acid if the observed mutations were fixed. In our data, the highest rate of PHP was observed in ATP8 (four PHPs observed across 207 total positions). The lowest rate of PHP was seen in ND3, with heteroplasmy observed at just one of 346 possible positions, followed closely by 12S rRNA. Consistent with previous reports on coding region substitutions [
      • Elson J.L.
      • Turnbull D.M.
      • Howell N.
      Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection.
      ,
      • Pereira L.
      • Freitas F.
      • Fernandes V.
      • Pereira J.B.
      • Costa M.D.
      • Costa S.
      • et al.
      The diversity present in 5140 human mitochondrial genomes.
      ], the highest rate of nonsynonymous variation in our heteroplasmy data was observed in ATP6, where six of seven PHPs would result in amino acid changes if the mutations were to become fixed. This 1:0.17 nonsynonymous to synonymous ratio exceeds that of the gene with the next highest ratio (CYTB, 1:0.6) more than 3-fold. However, ATP8, with the highest overall rate of PHP in this study, and previously reported to have a high rate of nonsynonymous substitution [
      • Pereira L.
      • Freitas F.
      • Fernandes V.
      • Pereira J.B.
      • Costa M.D.
      • Costa S.
      • et al.
      The diversity present in 5140 human mitochondrial genomes.
      ], had one of the lowest nonsynonymous to synonymous heteroplasmy ratios at 1:3. With regard to codon position, 87% of the 76 PHPs in protein-coding genes were observed in first or third positions, whereas only 10 were observed in the second codon position (see Table S9). However, all first codon position PHPs we detected were nonsynonymous changes. Approximately twice as many PHPs occurred in third versus first codon positions, and the first to second to third position ratio for PHPs was 2.2:1:4.5.
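The per-gene figures quoted above follow from simple arithmetic, sketched here using only counts stated in the text. The function names are hypothetical, and the "1:x" convention (x being the number of synonymous changes per nonsynonymous change) is an assumption inferred from the ratios as written.

```python
def php_rate(n_php_positions, gene_length):
    """Fraction of positions in a gene at which a PHP was observed."""
    return n_php_positions / gene_length

def synonymous_per_nonsynonymous(n_nonsyn, n_syn):
    """The x in a '1:x' nonsynonymous to synonymous ratio."""
    return n_syn / n_nonsyn

atp8_rate = php_rate(4, 207)   # highest rate reported in the text
nd3_rate = php_rate(1, 346)    # lowest rate reported in the text

# ATP6: six of seven PHPs nonsynonymous, so one synonymous.
atp6_x = synonymous_per_nonsynonymous(6, 1)

# CYTB is quoted as 1:0.6; compare the two ratios.
fold_difference = 0.6 / atp6_x
```

With these inputs the ATP6 value rounds to 0.17 and the comparison with CYTB gives a value above 3, in line with the "more than 3-fold" statement.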
      Fig. 3. Point heteroplasmies by mtDNA region type. PHPs across all samples were categorized into four regions: non-coding, rRNAs, tRNAs, and protein-coding genes. All PHPs in non-coding regions were found in the CR (i.e., no PHPs were observed in the small intergenic non-coding regions).
      Fig. 4. Point heteroplasmy proportions by gene. PHPs across all protein-coding genes plus the two rRNAs and all tRNAs (combined) were plotted by the fraction of potential positions (size of the gene) at which PHPs were observed. Thus, the height of each bar in the histogram indicates the relative rate of mutation observed for each gene. The actual number of PHPs observed for each gene is indicated above the bars. The mutations in the 13 protein-coding genes were categorized according to whether they would produce synonymous or nonsynonymous amino acid changes if the mutations were to become fixed.
      Overall, the nonsynonymous to synonymous change ratio for the 76 PHPs detected in protein-coding genes in our study was 1:1.4, a value that is in close agreement with a recent report on coding region heteroplasmy [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ]. Our ratio is both closer to a neutral model of sequence evolution and significantly different from some previous examinations of patterns of coding region substitution in protein coding genes (1:2.32 from Elson et al. [
      • Elson J.L.
      • Turnbull D.M.
      • Howell N.
      Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection.
      ], p = 0.035; and 1:2.5 from Kivisild et al. [
      • Kivisild T.
      • Shen P.
      • Wall D.P.
      • Do B.
      • Sung R.
      • Davis K.
      • et al.
      The role of selection in the evolution of human mitochondrial genomes.
      ], p = 0.013), but is not significantly different from the overall ratio determined from an evaluation of >5000 published mtGenomes by Pereira et al. (1:1.97, [
      • Pereira L.
      • Freitas F.
      • Fernandes V.
      • Pereira J.B.
      • Costa M.D.
      • Costa S.
      • et al.
      The diversity present in 5140 human mitochondrial genomes.
      ]). However, the ratio from our data was significantly different from the nonsynonymous to synonymous ratio those authors reported for the substitutions with frequencies at 0.1% or greater in the dataset (1:2.69, p = 0.006).
      In addition to calculations of overall nonsynonymous to synonymous change ratios, examinations of protein-coding gene substitutions in previous studies have also found (1) a higher proportion of nonsynonymous variation and (2) higher pathogenicity scores for nonsynonymous substitutions in younger versus older branches in the human mtDNA phylogeny and other species ([
      • Soares P.
      • Ermini L.
      • Thomson N.
      • Mormina M.
      • Rito T.
      • Rohl A.
      • et al.
      Correcting for purifying selection: an improved human mitochondrial molecular clock.
      ,
      • Elson J.L.
      • Turnbull D.M.
      • Howell N.
      Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection.
      ,
      • Kivisild T.
      • Shen P.
      • Wall D.P.
      • Do B.
      • Sung R.
      • Davis K.
      • et al.
      The role of selection in the evolution of human mitochondrial genomes.
      ,
      • Soares P.
      • Abrantes D.
      • Rito T.
      • Thomson N.
      • Radivojac P.
      • Li B.
      • et al.
      Evaluating purifying selection in the mitochondrial DNA of various mammalian species.
      ,
      • Pereira L.
      • Soares P.
      • Radivojac P.
      • Li B.
      • Samuels D.C.
      Comparing phylogeny and the predicted pathogenicity of protein variations reveals equal purifying selection across the global human mtDNA diversity.
      ], among multiple others), both of which provide further evidence that selection is acting to remove deleterious mutations from the mtGenome over time. When we compared the average pathogenicity scores (based on MutPred values [
      • Li B.
      • Krishnan V.G.
      • Mort M.E.
      • Xin F.
      • Kamati K.K.
      • Cooper D.N.
      • et al.
      Automated inference of molecular mechanisms of disease from amino acid substitutions.
      ] reported by Pereira et al. in their Tables S1 and S3 [
      • Pereira L.
      • Soares P.
      • Radivojac P.
      • Li B.
      • Samuels D.C.
      Comparing phylogeny and the predicted pathogenicity of protein variations reveals equal purifying selection across the global human mtDNA diversity.
      ]) for (a) all possible nonsynonymous substitutions across the mtGenome, (b) the 60 nonsynonymous PHPs detected in our haplotypes and reported in three recent studies [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ,
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ], and (c) the nonsynonymous substitutions evaluated by Pereira et al. [
      • Pereira L.
      • Soares P.
      • Radivojac P.
      • Li B.
      • Samuels D.C.
      Comparing phylogeny and the predicted pathogenicity of protein variations reveals equal purifying selection across the global human mtDNA diversity.
      ] for mtDNA haplogroup L, M and N trees, the results again indicated that heteroplasmic changes appear closer to a neutral model of sequence evolution than do complete substitutions (Fig. S7). While the difference between the average pathogenicity scores for heteroplasmies versus all possible substitutions was statistically significant (p = 0.01), the average pathogenicity score for the PHPs was also significantly higher (p = 0.0001) than the average for the haplogroup L, M and N substitutions with rho values of zero (i.e., the mutations observed at the tips of the trees) reported by Pereira et al. In other words, the heteroplasmic variants in our study have greater potential for deleterious effect than the most recently acquired complete substitutions in the haplogroup L, M and N lineages analyzed by the authors. Given the relative evolutionary timescales for heteroplasmy versus the fixation of new mutations, these comparisons between heteroplasmic changes and complete substitutions in protein-coding genes across both close and distant human mtDNA lineages thus also appear to provide some further support for the role of purifying selection in the evolution of the mtDNA coding region.

      4. Conclusions

      The 588 complete mtGenome haplotypes that we have reported here were developed according to current best-practice guidelines in forensics for the generation and review of mtDNA population reference data [
      • Scientific Working Group on DNA Analysis Methods (SWGDAM)
      Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories.
      ,
      • Parson W.
      • Gusmao L.
      • Hares D.R.
      • Irwin J.A.
      • Mayr W.R.
      • Morling N.
      • et al.
      DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.
      ]. The use of a robust PCR and sequencing strategy, primarily robotic sample handling, electronic data transfer, adherence to phylogenetic alignment rules [
      • Scientific Working Group on DNA Analysis Methods (SWGDAM)
      Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories.
      ,
      • Parson W.
      • Gusmao L.
      • Hares D.R.
      • Irwin J.A.
      • Mayr W.R.
      • Morling N.
      • et al.
      DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.
      ,
      • Bandelt H.J.
      • Parson W.
      Consistent treatment of length variants in the human mtDNA control region: a reappraisal.
      ] with reference to the current mtDNA phylogeny [
      • van Oven M.
      • Kayser M.
      Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.
      ], repeated reviews of the raw data, and the inclusion of multiple quality control measures ensure that these haplotypes meet the highest data quality standards and are appropriate for forensic use. In terms of data review, though two laboratories highly accustomed to examining mtDNA sequence data were involved in this databasing effort (AFDIL and EMPOP), a small number of haplotype discrepancies (most regarding missed or misidentified heteroplasmies by one laboratory or the other) were encountered when the raw data reviews were compared. In addition, two alignments that did not adhere to the mtDNA phylogeny and were overlooked by both laboratories were later found upon screening all >2000 indels in the 588 haplotypes. While typically very easily resolved by re-review of the raw data, these discrepancies and misalignments (all fully corrected in the final haplotypes reported here) once again highlight the importance of incorporating multiple levels of quality control in the review of mtDNA population reference data generated for forensic purposes.
      The biogeographic ancestry proportions inferred from the full mtGenome haplotypes are consistent with previously-published mtDNA CR datasets for the same three U.S. populations, thus demonstrating that the population samples reported here are as representative as the reference population data on which current haplotype frequency estimates rely. The single exception was the Native American ancestry component of the U.S. Hispanic population sample, which differed significantly between this and one previous study [
      • Allard M.W.
      • Polanskey D.
      • Wilson M.R.
      • Monson K.L.
      • Budowle B.
      Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
      ]. This is likely explained by geographic sampling differences between the earlier study and the U.S.-wide population sample we report here.
      On average, full mtGenome sequencing increased the proportion of unique haplotypes in each population sample by 19.3% over what would have been achieved with CR sequencing, and by 35.2% over HV1/HV2 sequencing. Though these resolution improvements and the overall paucity of shared mtGenome haplotypes in each population sample (in both this and another recent study [
      • King J.L.
      • LaRue B.L.
      • Novroski N.
      • Stoljarova M.
      • Seo S.B.
      • Zeng X.
      • et al.
      High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
      ]) clearly reveal the discriminatory power of complete mtGenome typing among randomly-sampled individuals, the development of LRs using the currently-recommended [
      • Scientific Working Group on DNA Analysis Methods (SWGDAM)
      Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories.
      ] Clopper–Pearson method for 95% confidence interval calculations [
      • Clopper C.J.
      • Pearson E.S.
      The use of confidence or fiducial limits illustrated in the case of the binomial.
      ] will largely negate this advantage (in terms of describing the statistical weight of a match for a novel haplotype) until full mtGenome databases are substantially larger. Because of this, and the anticipated movement from CR-only sequencing to typing greater portions of the mtGenome in forensic practice, the question of how best to capture and convey this additional discriminatory information arises. For the specific scenarios presented here, there would seem to be some benefit in statistical approaches that take into account both database size and database composition.
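The effect described above can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation, of an exact (Clopper–Pearson) binomial confidence limit for a haplotype observed x times in a database of n profiles, with an LR taken as the reciprocal of the upper limit. The function names, the two-sided default, the simple 1/upper-limit LR, and the database sizes in the usage loop are all assumptions made for illustration; none of them are taken from the data reported here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for small k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95, two_sided=True):
    """Exact binomial confidence limits for x observations in n profiles.

    Assumption: whether the limit is applied one- or two-sided is left
    as a parameter, since this section does not state it.
    """
    alpha = 1 - conf
    tail = alpha / 2 if two_sided else alpha
    # Upper limit: the p at which P(X <= x) falls to `tail`.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), tail)
    # Lower limit: the p at which P(X >= x) rises to `tail`.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - tail)
    return lower, upper


def match_lr(x, n, **kwargs):
    """Hypothetical LR for a match: reciprocal of the upper limit."""
    _, upper = clopper_pearson(x, n, **kwargs)
    return 1 / upper


# Hypothetical database sizes, chosen only to show the trend.
for n in (200, 2000, 20000):
    print(n, round(match_lr(0, n), 1))
```

For a haplotype not observed in the database (x = 0), the upper limit depends only on n. The resulting LR is therefore the same whether the database was typed for HV1/HV2, the CR, or the full mtGenome, which is the sense in which the interval calculation masks the added resolution until the database itself grows.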
      As the haplotypes reported here are based on high-quality Sanger sequence data with minimal noise, these 588 profiles permit the most extensive insight to date into the heteroplasmy observed across a large set of randomly-sampled, population-based complete mtDNAs developed to forensic standards. The incidence of PHP across the entire mtGenome that we detected – 23.8% of individuals – is strikingly similar to the PHP frequency described in two previous analyses [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ,
      • Li M.
      • Schonberg A.
      • Schaefer M.
      • Schroeder R.
      • Nasidze I.
      • Stoneking M.
      Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
      ]. This PHP rate is substantially lower than the incidence of heteroplasmy reported in recent MPS studies using bioinformatics methods (and in one case, a detection threshold close to 1%) [
      • Ye K.
      • Lu J.
      • Ma F.
      • Keinan A.
      • Gu Z.
      Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals.
      ,
      • Sosa M.X.
      • Sivakumar I.K.
      • Maragh S.
      • Veeramachaneni V.
      • Hariharan R.
      • Parulekar M.
      • et al.
      Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency.
      ]; yet those higher heteroplasmy rates are questionable due to errors detected in at least some of the data. A far greater proportion of individuals exhibited LHP in our study than has been previously reported [
      • Ramos A.
      • Santos C.
      • Mateiu L.
      • Gonzalez Mdel M.
      • Alvarez L.
      • Azevedo L.
      • et al.
      Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
      ], largely due to (1) the LHP we detected in the 12418-12425 adenine homopolymer and (2) the differences between the populations examined. When PHP and LHP are considered in combination, nearly all individuals (96.4%) in this study were heteroplasmic. Though our data – even when considered in combination with previous studies – provide only a preliminary look at coding region heteroplasmy (versus the extent of information now available on mtDNA CR heteroplasmy), comparisons between coding region heteroplasmy and substitution patterns seem to provide additional support for selection as a mechanism of human mtGenome evolution.
      The complete mtGenome databases representing the African American, U.S. Caucasian and U.S. Hispanic populations that we have developed will be available for query using forensic tools and parameters in an upcoming version of EMPOP (EMPOP3, with expected release in late 2014 [
      • Parson W.
      • Rock A.W.
      EMPOP 3 NGS mitochondrial databasing.
      ]). In addition, the haplotypes are currently available in GenBank and in the electronic supplementary material included with this paper. These extensively vetted and thoroughly examined Sanger-based population reference data not only provide a solid foundation for the generation of haplotype frequency estimates, but can also serve as a benchmark for the evaluation of future mtGenome data developed for forensic purposes. This includes comparative examination of the features (e.g. variable positions, indels, and heteroplasmy) not only of datasets developed as additional population reference data, but also of single mtGenome haplotypes – especially those generated using MPS technologies and protocols new to forensics – from casework specimens.

      Acknowledgements

      The authors would like to thank Jon Norris (Future Technologies, Inc.), Richard Coughlin and James Ross (Armed Forces Medical Examiner System) for technical assistance; Minh Nguyen and Chad Ernst (National Institute of Justice) for grant management; Odile Loreille and Charla Marshall (American Registry of Pathology, Armed Forces DNA Identification Laboratory), Michael Cummings (University of Maryland), Lara Adams and Connie Fisher (Federal Bureau of Investigation), and Arne Dür (Institute of Mathematics, University of Innsbruck) for fruitful discussion and/or manuscript review; and Timothy McMahon, James Canik, and Cynthia Thomas (American Registry of Pathology), Shairose Lalani, Lanelle Chisholm, COL Louis Finelli, Lt Col Laura Regan and CAPT Edward Reedy (Armed Forces Medical Examiner System), and Richard Scheithauer (Institute of Legal Medicine, Innsbruck Medical University) for administrative and logistical support. We also thank two anonymous reviewers for their constructive feedback that improved this paper.
      This project was supported by Award No. 2011-MU-MU-K402, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The research leading to this publication was also funded in part by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 285487, and by the intramural funding program of the Medical University Innsbruck for young scientists MUI-START, Project 2013042025. The opinions or assertions presented herein are the private views of the authors and should not be construed as official or as reflecting the views of the Department of Justice, Department of Defense, its branches, the U.S. Army Medical Research and Materiel Command, the Armed Forces Medical Examiner System, the Federal Bureau of Investigation, the Michigan State Police or the U.S. Government. Commercial equipment, instruments and materials are identified to specify some experimental procedures. In no case does such identification imply a recommendation or endorsement by the U.S. Department of Justice, the U.S. Department of Defense, the U.S. Department of the Army, the Federal Bureau of Investigation, the Michigan State Police or the U.S. Government, nor does it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.

      Appendix A. Supplementary data

      References

        • Margulies M.
        • Egholm M.
        • Altman W.E.
        • Attiya S.
        • Bader J.S.
        • Bemben L.A.
        • et al.
        Genome sequencing in microfabricated high-density picolitre reactors.
        Nature. 2005; 437: 376-380
        • Knapp M.
        • Hofreiter M.
        Next generation sequencing of ancient DNA: requirements, strategies and perspectives.
        Genes (Basel). 2010; 1: 227-243
        • Ho S.Y.
        • Gilbert M.T.
        Ancient mitogenomics.
        Mitochondrion. 2010; 10: 1-11
        • Paijmans J.L.
        • Gilbert M.T.
        • Hofreiter M.
        Mitogenomic analyses from ancient DNA.
        Mol. Phylogenet. Evol. 2013; 69: 404-416
        • Loreille O.
        • Koshinsky H.
        • Fofanov V.Y.
        • Irwin J.A.
        Application of next generation sequencing technologies to the identification of highly degraded unknown soldiers’ remains.
        Forensic Sci. Int. Genet. Suppl. Ser. 2011; 3: e540-e541
        • Templeton J.E.
        • Brotherton P.M.
        • Llamas B.
        • Soubrier J.
        • Haak W.
        • Cooper A.
        • et al.
        DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification.
        Investig. Genet. 2013; 4: 26
        • King J.L.
        • LaRue B.L.
        • Novroski N.
        • Stoljarova M.
        • Seo S.B.
        • Zeng X.
        • et al.
        High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq.
        Forensic Sci. Int. Genet. 2014; (in press) https://doi.org/10.1016/j.fsigen.2014.06.001
        • Mikkelsen M.
        • Rockenbauer E.
        • Wächter A.
        • Fendt L.
        • Zimmermann B.
        • Parson W.
        • et al.
        Application of full mitochondrial genome sequencing using 454 GS FLX pyrosequencing.
        Forensic Sci. Int. Genet. Suppl. Ser. 2009; 2: 518-519
        • Holland M.M.
        • McQuillan M.R.
        • O’Hanlon K.A.
        Second generation sequencing allows for mtDNA mixture deconvolution and high resolution detection of heteroplasmy.
        Croat. Med. J. 2011; 52: 299-313
        • Irwin J.
        • Just R.
        • Scheible M.
        • Loreille O.
        Assessing the potential of next generation sequencing technologies for missing persons identification efforts.
        Forensic Sci. Int. Genet. Suppl. Ser. 2011; 3: e447-e448
        • Van Neste C.
        • Van Nieuwerburgh F.
        • Van Hoofstat D.
        • Deforce D.
        Forensic STR analysis using massive parallel sequencing.
        Forensic Sci. Int. Genet. 2012; 6: 810-818
        • Bornman D.M.
        • Hester M.E.
        • Schuetter J.M.
        • Kasoji M.D.
        • Minard-Smith A.
        • Barden C.A.
        • et al.
        Short-read, high-throughput sequencing technology for STR genotyping.
        BioTechniques. 2012; : 1-6
        • Parson W.
        • Strobl C.
        • Huber G.
        • Zimmermann B.
        • Gomes S.M.
        • Souto L.
        • et al.
        Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM).
        Forensic Sci. Int. Genet. 2013; 7: 543-549
        • Rockenbauer E.
        • Hansen S.
        • Mikkelsen M.
        • Borsting C.
        • Morling N.
        Characterization of mutations and sequence variants in the D21S11 locus by next generation sequencing.
        Forensic Sci. Int. Genet. 2014; 8: 68-72
        • Weber-Lehmann J.
        • Schilling E.
        • Gradl G.
        • Richter D.C.
        • Wiehler J.
        • Rolf B.
        Finding the needle in the haystack: differentiating “identical” twins in paternity testing and forensics by ultra-deep next generation sequencing.
        Forensic Sci. Int. Genet. 2014; 9: 42-46
        • Bintz B.J.
        • Dixon G.B.
        • Wilson M.R.
        Simultaneous detection of human mitochondrial DNA and nuclear-inserted mitochondrial-origin sequences (NumtS) using forensic mtDNA amplification strategies and pyrosequencing technology.
        J. Forensic Sci. 2014; 59: 1064-1073
        • Scheible M.K.
        • Loreille O.
        • Just R.S.
        • Irwin J.A.
        Short tandem repeat typing on the 454 platform: strategies and considerations for targeted sequencing of common forensic markers.
        Forensic Sci. Int. Genet. 2014; 12: 107-119
        • McElhoe J.A.
        • Holland M.M.
        • Makova K.D.
        • Su M.S.
        • Paul I.M.
        • Baker C.H.
        • et al.
        Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq.
        Forensic Sci. Int. Genet. 2014; 13C: 20-29
        • Mikkelsen M.
        • Frank-Hansen R.
        • Hansen A.J.
        • Morling N.
        Massively parallel pyrosequencing of the mitochondrial genome with the 454 methodology in forensic genetics.
        Forensic Sci. Int. Genet. 2014; 12C: 30-37
        • Irwin J.A.
        • Parson W.
        • Coble M.D.
        • Just R.S.
        mtGenome reference population databases and the future of forensic mtDNA analysis.
        Forensic Sci. Int. Genet. 2011; 5: 222-225
        • Parson W.
        • Bandelt H.J.
        Extended guidelines for mtDNA typing of population data in forensic science.
        Forensic Sci. Int. Genet. 2007; 1: 13-19
        • Irwin J.A.
        • Saunier J.L.
        • Strouss K.M.
        • Sturk K.A.
        • Diegoli T.M.
        • Just R.S.
        • et al.
        Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation.
        Forensic Sci. Int. Genet. 2007; 1: 154-157
        • Parson W.
        • Dur A.
        EMPOP – a forensic mtDNA database.
        Forensic Sci. Int. Genet. 2007; 1: 88-92
        • van Oven M.
        • Kayser M.
        Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.
        Hum. Mutat. 2009; 30: E386-E394
        • Scientific Working Group on DNA Analysis Methods (SWGDAM)
        Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories.
        2013
        • Parson W.
        • Gusmao L.
        • Hares D.R.
        • Irwin J.A.
        • Mayr W.R.
        • Morling N.
        • et al.
        DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.
        Forensic Sci. Int. Genet. 2014; 13C: 134-142
      1. Serum specimens from the Department of Defense Serum Repository: The Armed Forces Health Surveillance Center, U.S. Department of Defense, Silver Spring, MD [November 8, 2010; August 1, 2011; and October 20, 2011].

        • Lyons E.A.
        • Scheible M.K.
        • Sturk-Andreaggi K.
        • Irwin J.A.
        • Just R.S.
        A high-throughput Sanger strategy for human mitochondrial genome sequencing.
        BMC Genomics. 2013; 14: 881
        • Just R.S.
        • Scheible M.K.
        • Fast S.A.
        • Sturk-Andreaggi K.
        • Higginbotham J.L.
        • Lyons E.A.
        • et al.
        Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
        Forensic Sci. Int. Genet. 2014; 10: 73-79
        • Diegoli T.M.
        • Coble M.D.
        • Niederstatter H.
        • Loreille O.M.
        • Parsons T.J.
        The use of a mitochondrial DNA-specific qPCR Assay to assess degradation and inhibition.
        in: Presented at Mid-Atlantic Association of Forensic Scientists Annual Meeting, Washington, DC, May 2007
        • Niederstätter H.
        • Kochl S.
        • Grubwieser P.
        • Pavlic M.
        • Steinlechner M.
        • Parson W.
        A modular real-time PCR concept for determining the quantity and quality of human nuclear and mitochondrial DNA.
        Forensic Sci. Int. Genet. 2007; 1: 29-34
        • Anderson S.
        • Bankier A.T.
        • Barrell B.G.
        • de Bruijn M.H.
        • Coulson A.R.
        • Drouin J.
        • et al.
        Sequence and organization of the human mitochondrial genome.
        Nature. 1981; 290: 457-465
        • Andrews R.M.
        • Kubacka I.
        • Chinnery P.F.
        • Lightowlers R.N.
        • Turnbull D.M.
        • Howell N.
        Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
        Nat. Genet. 1999; 23: 147
        • Bandelt H.J.
        • Parson W.
        Consistent treatment of length variants in the human mtDNA control region: a reappraisal.
        Int. J. Legal Med. 2008; 122: 11-21
        • Rock A.W.
        • Dur A.
        • van Oven M.
        • Parson W.
        Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA).
        Forensic Sci. Int. Genet. 2013; 7: 601-609
        • Parson W.
        • Rock A.W.
        EMPOP 3 NGS mitochondrial databasing.
        in: Oral Presentation at the DNA in Forensics Conference, Brussels, Belgium, May 14-16, 2014
        • McDonald J.H.
        Handbook of Biological Statistics.
        2nd ed. Sparky House Publishing, Baltimore, MD, 2009: 57-63
        • Clopper C.J.
        • Pearson E.S.
        The use of confidence or fiducial limits illustrated in the case of the binomial.
        Biometrika. 1934; 26: 404-413
        • Brenner C.H.
        Fundamental problem of forensic mathematics – the evidential value of a rare haplotype.
        Forensic Sci. Int. Genet. 2010; 4: 281-291
        • Lao O.
        • Vallone P.M.
        • Coble M.D.
        • Diegoli T.M.
        • van Oven M.
        • van der Gaag K.J.
        • et al.
        Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.
        Hum. Mutat. 2010; 31: E1875-E1893
        • Allard M.W.
        • Polanskey D.
        • Miller K.
        • Wilson M.R.
        • Monson K.L.
        • Budowle B.
        Characterization of human control region sequences of the African American SWGDAM forensic mtDNA data set.
        Forensic Sci. Int. 2005; 148: 169-179
        • Allard M.W.
        • Polanskey D.
        • Wilson M.R.
        • Monson K.L.
        • Budowle B.
        Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.
        J. Forensic Sci. 2006; 51: 566-573
        • Goncalves V.F.
        • Prosdocimi F.
        • Santos L.S.
        • Ortega J.M.
        • Pena S.D.
        Sex-biased gene flow in African Americans but not in American Caucasians.
        Genet. Mol. Res. 2007; 6: 256-261
        • Saunier J.L.
        • Irwin J.A.
        • Just R.S.
        • O’Callaghan J.
        • Parsons T.J.
        Mitochondrial control region sequences from a U.S. “Hispanic” population sample.
        Forensic Sci. Int. Genet. 2008; 2: e19-e23
        • Diegoli T.M.
        • Irwin J.A.
        • Just R.S.
        • Saunier J.L.
        • O’Callaghan J.E.
        • Parsons T.J.
        Mitochondrial control region sequences from an African American population sample.
        Forensic Sci. Int. Genet. 2009; 4: e45-e52
        • Salas A.
        • Carracedo A.
        • Richards M.
        • Macaulay V.
        Charting the ancestry of African Americans.
        Am. J. Hum. Genet. 2005; 77: 676-680
        • Monson K.L.
        • Miller K.W.P.
        • Wilson M.R.
        • Dizzino J.A.
        • Budowle B.
        The mtDNA population database: an integrated software and database resource for forensic comparisons.
        Forensic Sci. Commun. 2002; 4: 2
        • Salas A.
        • Richards M.
        • Lareu M.V.
        • Scozzari R.
        • Coppa A.
        • Torroni A.
        • et al.
        The African diaspora: mitochondrial DNA and the Atlantic slave trade.
        Am. J. Hum. Genet. 2004; 74: 454-465
        • Ely B.
        • Wilson J.L.
        • Jackson F.
        • Jackson B.A.
        African–American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups.
        BMC Biol. 2006; 4: 34
        • Stefflova K.
        • Dulik M.C.
        • Barnholtz-Sloan J.S.
        • Pai A.A.
        • Walker A.H.
        • Rebbeck T.R.
        Dissecting the within-Africa ancestry of populations of African descent in the Americas.
        PLoS ONE. 2011; 6: e14495
        • Irwin J.A.
        • Saunier J.L.
        • Niederstatter H.
        • Strouss K.M.
        • Sturk K.A.
        • Diegoli T.M.
        • et al.
        Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples.
        J. Mol. Evol. 2009; 68: 516-527
        • Lee H.Y.
        • Chung U.
        • Yoo J.E.
        • Park M.J.
        • Shin K.J.
        Quantitative and qualitative profiling of mitochondrial DNA length heteroplasmy.
        Electrophoresis. 2004; 25: 28-34
        • Melton T.
        Mitochondrial DNA heteroplasmy.
        Forensic Sci. Rev. 2004; 16: 1-20
        • Ramos A.
        • Santos C.
        • Mateiu L.
        • Gonzalez Mdel M.
        • Alvarez L.
        • Azevedo L.
        • et al.
        Frequency and pattern of heteroplasmy in the complete human mitochondrial genome.
        PLoS ONE. 2013; 8: e74636
        • Li M.
        • Schonberg A.
        • Schaefer M.
        • Schroeder R.
        • Nasidze I.
        • Stoneking M.
        Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.
        Am. J. Hum. Genet. 2010; 87: 237-249
        • Lee H.C.
        • Huang K.H.
        • Yeh T.S.
        • Chi C.W.
        Somatic alterations in mitochondrial DNA and mitochondrial dysfunction in gastric cancer progression.
        World J. Gastroenterol. 2014; 20: 3950-3959
        • Santos C.
        • Sierra B.
        • Alvarez L.
        • Ramos A.
        • Fernandez E.
        • Nogues R.
        • et al.
        Frequency and pattern of heteroplasmy in the control region of human mitochondrial DNA.
        J. Mol. Evol. 2008; 67: 191-200
        • Tully L.A.
        • Parsons T.J.
        • Steighner R.J.
        • Holland M.M.
        • Marino M.A.
        • Prenger V.L.
        A sensitive denaturing gradient-gel electrophoresis assay reveals a high frequency of heteroplasmy in hypervariable region 1 of the human mtDNA control region.
        Am. J. Hum. Genet. 2000; 67: 432-443
        • Melton T.
        • Nelson K.
        Forensic mitochondrial DNA analysis: two years of commercial casework experience in the United States.
        Croat. Med. J. 2001; 42: 298-303
        • Calloway C.D.
        • Reynolds R.L.
        • Herrin Jr., G.L.
        • Anderson W.W.
        The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age.
        Am. J. Hum. Genet. 2000; 66: 1384-1397
        • Theves C.
        • Keyser-Tracqui C.
        • Crubezy E.
        • Salles J.P.
        • Ludes B.
        • Telmon N.
        Detection and quantification of the age-related point mutation A189G in the human mitochondrial DNA.
        J. Forensic Sci. 2006; 51: 865-873
        • Andrew T.
        • Calloway C.D.
        • Stuart S.
        • Lee S.H.
        • Gill R.
        • Clement G.
        • et al.
        A twin study of mitochondrial DNA polymorphisms shows that heteroplasmy at multiple sites is associated with mtDNA variant 16093 but not with zygosity.
        PLoS ONE. 2011; 6: e22332
        • He Y.
        • Wu J.
        • Dressman D.C.
        • Iacobuzio-Donahue C.
        • Markowitz S.D.
        • Velculescu V.E.
        • et al.
        Heteroplasmic mitochondrial DNA mutations in normal and tumour cells.
        Nature. 2010; 464: 610-614
        • Bandelt H.J.
        • Salas A.
        Current next generation sequencing technology may not meet forensic standards.
        Forensic Sci. Int. Genet. 2012; 6: 143-145
        • Meyer S.
        • Weiss G.
        • von Haeseler A.
        Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA.
        Genetics. 1999; 152: 1103-1110
        • Stoneking M.
        Hypervariable sites in the mtDNA control region are mutational hotspots.
        Am. J. Hum. Genet. 2000; 67: 1029-1032
        • Strouss K.
        Relative evolutionary rate estimation for sites in the mtDNA control region.
        (Master's Thesis) George Washington University, Washington, DC, 2006
        • Howell N.
        • Elson J.L.
        • Howell C.
        • Turnbull D.M.
        Relative rates of evolution in the coding and control regions of African mtDNAs.
        Mol. Biol. Evol. 2007; 24: 2213-2221
        • Soares P.
        • Ermini L.
        • Thomson N.
        • Mormina M.
        • Rito T.
        • Rohl A.
        • et al.
        Correcting for purifying selection: an improved human mitochondrial molecular clock.
        Am. J. Hum. Genet. 2009; 84: 740-759
        • Irwin J.
        • Moreno L.
        • Callaghan T.
        Evaluating next generation sequencing technologies for expanded mitochondrial DNA identification capabilities at the FBI lab.
        in: Oral Presentation at the DNA in Forensics Conference, Brussels, Belgium, May 14–16, 2014
        • Parsons T.J.
        • Muniec D.S.
        • Sullivan K.
        • Woodyatt N.
        • Alliston-Greiner R.
        • Wilson M.R.
        • et al.
        A high observed substitution rate in the human mitochondrial DNA control region.
        Nat. Genet. 1997; 15: 363-368