Research Article| Volume 7, ISSUE 1, P63-74, January 2013

Revision of the SNPforID 34-plex forensic ancestry test: Assay enhancements, standard reference sample genotypes and extended population studies

      Abstract

      A revision of an established 34 SNP forensic ancestry test has been made by swapping the under-performing rs727811 component SNP with the highly informative rs3827760 that shows a near-fixed East Asian specific allele. We collated SNP variability data for the revised SNP set in 66 reference populations from 1000 Genomes and HGDP-CEPH panels and used this as reference data to analyse four U.S. populations showing a range of admixture patterns. The U.S. Hispanics sample in particular displayed heterogeneous values of co-ancestry between European, Native American and African contributors, likely reflecting, in part, the way this disparate group is defined using cultural as well as population genetic parameters. The genotyping of over 700 U.S. population samples also provided the opportunity to thoroughly gauge peak mobility variation and peak height ratios observed from routine use of the single base extension chemistry of the 34-plex test. Finally, the genotyping of the widely used DNA profiling Standard Reference Material samples plus other control DNAs completes the audit of the 34-plex assay to allow forensic practitioners to apply this test more readily in their own laboratories.

      1. Introduction

      Single nucleotide polymorphism (SNP) typing is still relatively new to the field of DNA profiling. While short amplicon approaches based around SNP typing have proved their worth in cases of severely degraded DNA analysis [
      • Fondevila M.
      • Phillips C.
      • Naveran N.
      • Fernandez L.
      • Cerezo M.
      • Salas A.
      • Carracedo Á.
      • Lareu M.V.
      Case report: identification of skeletal remains using short-amplicon marker analysis of severely degraded DNA extracted from a decomposed and charred femur.
      ,
      • Freire-Aradas A.
      • Fondevila M.
      • Kriegel A.-K.
      • Phillips C.
      • Gill P.
      • Prieto L.
      • Schneider P.M.
      • Carracedo Á.
      • Lareu M.V.
      A new SNP assay for identification of highly degraded human DNA.
      ,
      • Romanini C.
      • Catelli M.L.
      • Borosky A.
      • Salado-Puerto M.
      • Pereira R.
      • Phillips C.
      • Fondevila M.
      • Freire A.
      • Santos C.
      • Carracedo Á.
      • Lareu M.V.
      • Gusmao L.
      • Vullo C.M.
      Typing short amplicon binary polymorphisms: supplementary SNP and Indel genetic information in the analysis of highly degraded skeletal remains.
      ], the inability of SNPs to provide informative links to STR databases has hindered widespread use beyond simple identification of missing persons with reference to their surviving relatives [
      • Gill P.
      • Werrett D.J.
      • Budowle B.
      • Guerrieri R.
      An assessment of whether SNPs will replace STRs in national DNA databases – joint considerations of the DNA working group of the European Network of Forensic Science Institutes (ENFSI) and the Scientific Working Group on DNA Analysis Methods (SWGDAM).
      ]. At the current time the future of forensic SNP analysis appears to centre on two approaches where SNPs can provide important supplementary information: (a) in complex relationship tests involving deficient pedigrees, very distant relationships [
      • Lareu M.V.
      • García-Magariños M.
      • Phillips C.
      • Quintela I.
      • Carracedo Á.
      • Salas A.
      Analysis of a claimed distant relationship in a deficient pedigree using high density SNP data.
      ] or a single second order STR exclusion that creates ambiguous likelihood ratios [
      • Phillips C.
      • Fondevila M.
      • García-Magariños M.
      • Rodriguez A.
      • Salas A.
      • Carracedo Á.
      • Lareu M.V.
      Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers.
      ]; and (b) helping to build an inference of the likely physical appearance of DNA donors when other information such as reliable eye witness or DNA database entries are unavailable to investigators [
      • Kayser M.
      • Schneider P.M.
      DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations.
      ,
      • Kayser M.
      • de Knijff M.P.
      Improving human forensics through advances in genetics, genomics and molecular biology.
      ]. In the latter category there is growing interest in establishing SNP tests able to predict hair and eye colour variation in European subjects [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ,
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser O.M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ,
      • Mengel-From J.
      • Børsting C.
      • Sanchez J.J.
      • Eiberg H.
      • Morling N.
      Human eye colour and HERC2, OCA2 and MATP.
      ], along with ancestry informative marker (AIM) SNP tests that equate the genotypes detected in an individual to their genetic ancestry, where in this case, genetic ancestry is a characteristic defined by broadly based continental population group SNP variability [
      • Phillips C.
      • Salas A.
      • Sánchez J.J.
      • Fondevila M.
      • Gómez-Tato A.
      • Álvarez-Dios J.
      • Calaza M.
      • Casares de Cal M.
      • Ballard D.
      • Lareu M.V.
      • Carracedo Á.
      The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
      ,
      • Lao O.
      • Vallone P.M.
      • Coble M.D.
      • Diegoli T.M.
      • van Oven M.
      • van der Gaag K.J.
      • Pijpe J.
      • de Knijff P.
      • Kayser M.
      Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.
      ,
      • Halder I.
      • Shriver M.
      • Thomas M.
      • Fernandez J.R.
      • Frudakis T.
      A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications.
      ].
      Five years ago we developed a 34 SNP forensic ancestry test [
      • Phillips C.
      • Salas A.
      • Sánchez J.J.
      • Fondevila M.
      • Gómez-Tato A.
      • Álvarez-Dios J.
      • Calaza M.
      • Casares de Cal M.
      • Ballard D.
      • Lareu M.V.
      • Carracedo Á.
      The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
      ] and this paper outlines an audit of the underlying multiplex assay bringing enhancements to the chemistry, swapping a single under-performing SNP rs727811 with a new, more informative replacement SNP rs3827760 and recording the genotypes of standard reference samples used as positive controls in forensic DNA laboratories [
      • Børsting C.
      • Tomas C.
      • Morling N.
      SNP typing of the reference materials SRM 2391b 1-10, K562, XY1, XX74, and 007 with the SNPforID multiplex.
      ,

      M.C. Kline, C.R. Hill, J.L. Almeida, E.L.R. Butts, M.D. Coble, J.M. Butler, The latest and greatest NIST PCR-based DNA profiling standard: updates and status of Standard Reference Material® (SRM) 2391c. Profiles-in-DNA, Promega Corporation Web site: http://www.promega.com/resources/articles/profiles-in-dna/2011/the-latest-and-greatest-nist-pcr-based-dna-profiling-standard/.

      ]. We also outline studies of Central and South American populations where levels of admixture are higher than in other parts of the world and we collate recently released genotype data from 1000 Genomes for the 34 component SNPs plus rs3827760. Finally, it is worth emphasising that in the five years since our 34-plex test was developed using Applied Biosystems (AB) SNaPshot primer extension chemistry, no other forensic SNP typing system has emerged as a viable alternative. In fact during this period two other SNP typing chemistries have been discontinued: a chip-based system used in a multiple-reaction ancestry test that typed 176 SNPs [
      • Halder I.
      • Shriver M.
      • Thomas M.
      • Fernandez J.R.
      • Frudakis T.
      A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications.
      ] (Beckman-Coulter GenomeLabSNPstream®) and an oligo-ligation system: AB Genplex®, that held much promise for a range of forensic SNP typing applications [
      • Phillips C.
      • Fang R.
      • Ballard D.
      • Fondevila M.
      • Harrison C.
      • Hyland F.
      • Musgrave-Brown E.
      • Proff C.
      • Ramos-Luis E.
      • Sobrino B.
      • Carracedo Á.
      • Furtado M.R.
      • Syndercombe Court D.
      • Schneider P.M.
      SNPforID consortium, evaluation of the Genplex SNP typing system and a 49plex forensic marker panel.
      ,
      • Tomas C.
      • Axler-DiPerte G.
      • Budimlija Z.M.
      • Børsting C.
      • Coble M.D.
      • Decker A.E.
      • Eisenberg A.
      • Fang R.
      • Fondevila M.
      • Fredslund S.F.
      • Gonzalez S.
      • Hansen A.J.
      • Hoff-Olsen P.
      • Haas C.
      • et al.
      Autosomal SNP typing of forensic samples with the GenPlex™ HID System: results of a collaborative study.
      ]. Since all current forensic physical characteristic and ancestry predictive tests use AB SNaPshot, more emphasis than ever should be placed on further optimisation of primer extension assay conditions in order to continue the successful application of SNPs to forensic analyses.

      2. Materials and methods

      2.1 Population samples

      A total of 709 unrelated male samples of self-declared ancestry from the population reference collection of NIST (National Institute of Standards and Technology, Gaithersburg, MD, USA) were genotyped with the enhanced 34-plex assay reaction conditions described below. This panel is made up of representative samples from the four main U.S. population groups, comprising: 261 Caucasians, 258 African Americans, 140 Hispanics and 50 East Asians. Additionally, the six components of the recently revised NIST PCR-based DNA profiling Standard Reference Material® (SRM) 2391c [
      • Børsting C.
      • Tomas C.
      • Morling N.
      SNP typing of the reference materials SRM 2391b 1-10, K562, XY1, XX74, and 007 with the SNPforID multiplex.
      ] were typed along with the current standard forensic positive control DNAs (AB/Promega 9947a, Qiagen XY5 and Promega 2800M), thereby complementing the reference genotypes for the SNPforID 52-plex ID-SNP assay published in 2010 [
      • Børsting C.
      • Tomas C.
      • Morling N.
      SNP typing of the reference materials SRM 2391b 1-10, K562, XY1, XX74, and 007 with the SNPforID multiplex.
      ]. NIST SRM 2391c components A, B and C are described as Caucasian, Mexican Hispanic and Melanesian in origin respectively (component D is a 3:1 mixture of A and C), while E and F do not have ancestry descriptions.
      Previously generated SNP genotypes from the HGDP-CEPH Human Genome Diversity Panel [
      • Phillips C.
      • Salas A.
      • Sánchez J.J.
      • Fondevila M.
      • Gómez-Tato A.
      • Álvarez-Dios J.
      • Calaza M.
      • Casares de Cal M.
      • Ballard D.
      • Lareu M.V.
      • Carracedo Á.
      The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
      ] were used as reference data for analyses of the NIST populations (CEPH genotype data available from the SPSmart SNPforID browser at: http://spsmart.cesga.es/snpforid.php), while 1000 Genomes data was collated directly from the ENGINES whole genome SNP browser [
      • Amigo J.
      • Salas A.
      • Phillips C.
      ENGINES: exploring single nucleotide variation in entire human genomes.
      ] available from: http://spsmart.cesga.es/engines.php?dataSet=engines. The HGDP-CEPH panel comprises: 105 Africans, 158 Europeans, 232 East Asians, 64 Native Americans and 28 Oceanians (South Asian and Middle East populations were excluded). The 1000 Genomes samples comprise: 246 Africans (including 61 African Americans with mixed ancestry), 380 Europeans, 286 East Asians and 181 Americans with mixed ancestry (Puerto Rican and Mexicans from Central America plus Colombians from South America). Using both panels enabled the most comprehensive geographic survey of SNP variability and ENGINES allows data to be collected for any SNP locus with minor allele frequencies above ∼1%.

      2.2 34-plex component SNP reconfiguration

      A single marker substitution was made to replace the consistently under-performing SNP rs727811 (internal code A11) with an East Asian-informative SNP rs3827760 (assigned internal code P28). Representative allele frequency distributions from 1000 Genomes and HapMap (GIH) samples comparing both SNPs are summarised in Fig. 1. Note that the SNaPshot extension primer of rs727811 interrogated the AC strand but HapMap and 1000 Genomes databases report the GT strand, while for rs3827760 SNaPshot interrogates the AG strand, as listed in the above SNP databases. The revised marker details for the complete assay are listed in Table 1. Components listed with internal codes prefixed with an ‘A’ are markers overlapping with the SNPforID 52-plex ID-SNP multiplex [
      • Sánchez J.J.
      • Phillips C.
      • Børsting C.
      • Balogh K.
      • Bogus M.
      • Fondevila M.
      • Harrison C.D.
      • Musgrave-Brown E.
      • Salas A.
      • Syndercombe-Court D.
      • Schneider P.
      • Carracedo Á.
      • Morling N.
      A multiplex assay with 52 single nucleotide polymorphisms for human identification.
      ], since they show informative allele frequency differences between populations but also serve as housekeeping markers when using both multiplexes together in an analysis. Lastly, it was necessary to design a redundancy in the extension primer of rs3827760, comprising C and T alternate bases corresponding to neighbour SNPs rs144939741 and rs121908454, 10 bp and 15 bp upstream of the target site.
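The redundancy designed into the rs3827760 extension primer can be illustrated with a short sketch. It expands the two degenerate Y (C/T) positions in the target-specific 3′ portion of the SBE primer listed in Table 1 into the four concrete sequences that an equimolar mix would contain. The function name `expand_degenerate` and the offset convention (counting the base immediately 3′ of the primer as the target site) are assumptions made for illustration only.

```python
from itertools import product

# IUPAC codes needed here: the four standard bases plus Y (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]

# Target-specific 3' portion of the rs3827760 SBE primer, as listed in Table 1.
PRIMER_3PRIME = "GGYGCCAYGTTTTCACA"

variants = expand_degenerate(PRIMER_3PRIME)

# Distance of each Y from the target site, assuming the target base lies
# immediately 3' of the last primer base.
y_offsets = [len(PRIMER_3PRIME) - i for i, base in enumerate(PRIMER_3PRIME) if base == "Y"]

for seq in variants:
    print(seq)
print("Y positions upstream of target (bp):", y_offsets)
```

Under these assumptions the two degenerate positions fall 15 and 10 bases upstream of the target site, which agrees with the neighbouring SNP positions described in the text.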
      Fig. 1. Allele frequency distributions in Africans, Europeans and East Asians from 1000 Genomes and HapMap populations for the replaced rs727811 SNP and new rs3827760. Africans (AFR) comprised YRI: Yoruba in Ibadan, Nigeria and LWK: Luhya in Webuye, Kenya; Europeans (EUR) are CEU: Utah residents with N & W European ancestry from the HGDP-CEPH collection, FIN: Finnish in Finland, GBR: British in England and Scotland, IBS: Iberian populations in Spain and TSI: Toscans in Italy; East Asians (E ASN) are CHB: Han Chinese in Beijing, CHS: China, Han Chinese South and JPT: Japanese in Tokyo, Japan. Certain populations are shown separately due to levels of admixture: ASW, African ancestry in Southwest USA; CLM, Colombian in Medellín, Colombia; MXL, Mexican ancestry in Los Angeles; PUR, Puerto Rican in Puerto Rico; GIH, Gujarati Indians from Houston.
      Table 1. 34-plex composite SNP data.
      Column groups: Markers | PCR size | Strand information | Primers | Primer mix ratios
      Columns: Internal code | dbSNP rs-number | PCR size | 1000 G bases | Assay bases | SNaPshot direction | PCR forward primer | PCR reverse primer | SNaPshot single base extension (SBE) primer | PCR (F + R primer stock at 25 μM) | SNaPshot SBE (primer stock at 50 μM)
      P02rs599700890ACACFGTCAACACTAGAGTATTTGCCCATCACAAACCCAAAGACTGTTCTGCgac[aactaggtgccacgtcgtgaaagtctgac]2aactctcaCAGGATCGATTGGTTCC1.91.3
      P01rs230492598ACGTRCCCATTAACTCATCAAAGTGGTGATCCCCACTCCACCGCTAAT[aaagtctgacaactaggtgccacgtcgtg]2aaagtctgacaaCCACTCCACCGCTAAT2.52.5
      A07rs91711887CTAGRGCCCTTTAGGGTCGGTTCGTAAGAGATGACTGAGGTCAACGAGt[ct]2TGACTGAGGTCAACGAGC32.1
      P03rs132133380AGCTRGTCAGTAAGACGGTAACTCCCTAACACAAGCCTAAATCCAGAAGACGGTAACTCCATGGCTG2.751.5
      P04rs2814778102CTCTFAACCTGATGGCCCTCATTAGTATGGCACCGTTTGGTTCAGagtctgacaactaggtgccacgtcgtgaaagtctgacaactaggtgccacgtcgtgaaagtctgacatCTCATTAGTCCTTGGCTCTTA1.52
      A29rs102411676CTAGRCCATGTGTTCTAATAAAAAGGATTGCTGGGAAGTGAGCAAAAGTAAATACActtGTTCTAATAAAAAGGATTGCTCAT12
      P05rs7897550100AGCTRCGATGTGTCTTACGGAATACTAGGTAGAGCTGACAGGCAAAAATGCTATt[ct]2TGTGCAGGATTGAAATATAATT21
      A21rs72209880AGAGFGGAAGTACACATCTGTTGACAGTAATGAGGGTAAAGAAATATTCAGCACATCCagtctgacaaTGACAGTAATGAAATATCCTTG2.34
      P06ars1084334487CTCTFTGTACAATGGTAGATGTGTGCTCAGGATAGCTCTGGTGTTGCATTATTGTt[ct]5AGTACTTTGCCAAAGAAACTAAA41.3
      P08rs1291383299AGAGFACGTTGGATGCGAGGCCAGTTTCATTTGAGACGTTGGATGAAAACAAAGAGAAGCCTCGG[ct]8CCAGTTTCATTTGAGCATTAA33
      P07rs23903170AGCTRTAGCTGTGAGATAGAAATCCTGGACACTACCCTAATCTCAGCTTCCACTCt[ct]8cAATCTCAGCTTCCACTC0.51
      P09ars197880694AGCTRAGAGTTTGACATGATGGTGCTCTATCTTGTTTCTAAGCAGGAAAGTTGt[ct]8cGCAGGAAAGTTGTATTCTGATA1.81
      A40rs204041161AGAGFTCTGGAATGCCAGTTCTTTTGTCAGAACGCCTATGAAAACCAGT[ct]7cCTCTGTATTTTCTTACTCTAAGTGC12
      P10rs77365895GCCGRACAAACGGAAAGTAGTATTGGACTGAGAAGGGGCACAGCAATTTAGTA[ct]11cGGGAAGAATAGAGTCAATCAA3.23
      P11rs1014176382TAATRACAGACTTGGTTCCCTGAAGTCTAGTAGATTGTAGGCAAGTCGTAAAGGt[ct]10cGTGTGAGTTGTGTGATAATCTA31.1
      P12rs182549117CTCTFAAGTACTGGGACAAAGGTGTGAGAGAAGTCAGAATACCCCTACCCTAT[ct]16cAGGTGTGAGCCACCG6.26.5
      P13rs157302081AGAGRCTATCTGCCACCTGAGAGAGTATTGAGGTGTCAGCTTCTTCTGACCAT[ct]14GAGTATTGCCAGCCTGATTC11.3
      P14rs89678878CTCTRGTAATGCCTCTGTGGCCCTATATTCCGTCCACATCTTCACTG[ct]17tctACAGTCACCAGCCAC1.51.3
      P15rs206516066AGAGFAAGAATGGCCTCTCGATGAGTAGATGATACCTACGCATAGTCTGTTTACTTCt[ct]14GCATAGTCTGTTTACTTCATTTG1.51.3
      P16ars257230763AGAGFGTGTAGCTATGCCATCATTCAATCATCCTTAGAAGGGTGCTAAACTGAGt[ct]15catcATTCAATCAATAGTCATAAAC2.251.1
      P17rs230379896AGCTRCCAGCCTGCACCACTGTCAGAGATGTGTTCAGGAAGAGGCTAt[ct]19cAGAAGAGGGACGTGGG4.33.5
      P18rs206598288CTAGRCTTGGGGCAGTCTTTAAGTCCTAGGAAGTGGTCAGTGCCAGTAG[ct]18GGAAAAAAAAGTCCTCTTTGGTAT21.3
      P19rs3785181156CTCTFCTCTGTTCAGTTTCAAAGTTCTGGTTGTGTTCAAAAATTTCAATTAGGTTt[ct]20AGGGCATCTTATCTTGAGC1.250.75
      P20rs88192993GTGTFAGCTACCTGGTGTCTAACTCTTGACCCAGTGGTTCTGAGC[ct]7aaactaggtgccacgtcgtgaaagtctgacaaGCCTCTTGCCAGCTCTG32.5
      P21rs1498444106GTACRGGCTATTACCACATTAAGAGAAACTGCCAGCCTCTCAATGCAAATGATt[ct]20cGGTTGTCAGAATATTTGCTACA53
      P22ars142665474AGCTRAATTCAGGAGCTGAACTGCCTGTTCAGCCCTTGGATTGTC[ct]24ttCGCTGCCATGAAAGTTG34
      P23rs2026721108CTAGRGAAGACTTTTTGCAAGCACGAGGGCAAATGCTGTAAGAATCCATt[ct]21AAATGATTGACATAGTAGGCTATTG3.21.75
      P24rs454005576AGT(a)ACTRTGTGCCTCTGATCACTTTTGAATACCCTAGCCAACTCCAGAGTTCAT[ct]27cGAAGCAGTGATCAGCAC1.21.1
      A52rs1335873110TAATRGTGGATGATATGGTTTCTCAAGGTTCAACAAACGTGTGATGCTCTt[ct]25AGGTACCTAGCTATGTACTCAGTAT1.21.75
      P25ars1689198278CGGCRGAATAAAGTGAGGAAAACACGGAGTGTTTCTCATCTACGAAAGAGGAGTC[ct]28GGTTGGATGTTGGGGCTT1.752.25
      P26rs73057092AGCTRCAGCACCCTGTAAAGTCCAGCAGCACTCACCTGCATCTCAt[ct]27tCATTAATCACACAAATTTTGCAT1.32.2
      A13rs188651086AGAGFGTCCTTGTCAATCTTTCTACCAGAGGGATTTTCACAACAACACTTGC[ct]27ACAAGATTTTCACAACAACACTTGC1.51
      P27rs503024081CGT(a)CGARCCAAAGTGCCAGGATCACAGTCCCTAGAAATCCTTCAGCC[ct]32cCACAGGAGTGAGCCACTGC22
      P28rs382776085AGAGFGCTCAGCTCCACGTACAACCTGTCATGCCCCCAATCTCtctgacaactaggtgccacgtggtgaaagtctgacaactaggtgccacgtcgtgaaagtctgacaactctcaGGYGCCAYGTTTTCACA(b)32.75
      A11rs72781178
      H2O | 104.65 | 5.62
      [200 μM] (NH4)2SO4 (SBE mix only) | 33.33
      Total | 179.45 | 108.8
      a dbSNP bases, 1000 Genomes list AC bases only for both of these tri-allelic SNPs.
      b Y bases denote equimolar primers used with C/T at two positions (rs144939741 and rs121908454).

      2.3 Amplification and single base extension primer modifications

      The PCR and single base extension (SBE) primers used in the modified assay are listed in Table 1. As well as novel primers for rs3827760, the SBE primers for component SNPs: rs2304925; rs5997008; rs2814778; rs239031; rs16891982 (internal codes P01, P02, P04, P07 and P25a respectively) have been modified to adjust peak positions for electrophoresis with AB POP-4™ capillary electrophoresis (CE) polymer. Some mobility modifying poly-CT tails were also changed to non-homologous sequences comprising all bases. POP-4™ is now in increasing use and these SBE primer rearrangements anticipate the discontinuation of AB POP-6™ CE polymer used in the original 34-plex assay development. In brief, SNPs rs2304925/rs5997008 were moved away from the fastest mobility size range where dye artefacts sometimes interfered with their peak recognition and SNPs rs2814778/rs239031/rs16891982 were size-adjusted to separate them from the co-incidental mobilities of neighbouring SNPs or closely sited non-specific peaks when using POP-4™. The rearranged SNP positions are listed in Table 2 based on average size estimates of ∼750 samples with multiple runs.
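      Table 2. SNP peak positions for electrophoresis with POP-4 polymer. SD: standard deviation. Observed mean allele sizes are listed under the G, A, C and T columns.
      dbSNP rs-number | Internal code | Amplified bases | Theoretical size | G | A | C | T | SD allele 1 size | SD allele 2 size | SD allele 3 size
      rs1321333 | P03 | CT | 22 |  |  | 26.89 | 28.63 | 0.05 | 0.06 |
      rs917118 | A07 | AG | 24 | 28.17 | 30.16 |  |  | 0.05 | 0.04 |
      rs1024116 | A29 | AG | 28 | 26.75 | 28.95 |  |  | 0.02 | 0.03 |
      rs7897550 | P05 | CT | 28 |  |  | 31.44 | 32.87 | 0.03 | 0.08 |
      rs722098 | A21 | AG | 33 | 33.88 | 36.28 |  |  | 0.04 | 0.00 |
      rs10843344 | P06a | CT | 35 |  |  | 37.05 | 38.31 | 0.04 | 0.07 |
      rs239031 | P07 | CT | 36 |  |  | 38.90 | 40.45 | 0.08 | 0.05 |
      rs12913832 | P08 | AG | 38 | 40.15 | 40.90 |  |  | 0.06 | 0.07 |
      rs2040411 | A40 | AG | 41 | 43.18 | 44.25 |  |  | 0.09 | 0.07 |
      rs1978806 | P09a | CT | 41 |  |  | 44.27 | 45.52 | 0.09 | 0.06 |
      rs773658 | P10 | GC | 45 | 46.25 |  | 46.73 |  | 0.01 | 0.07 |
      rs10141763 | P11 | TA | 45 |  | 48.74 |  | 49.32 | 0.09 | 0.05 |
      rs182549 | P12 | CT | 49 |  |  | 49.11 | 50.55 | 0.04 | 0.04 |
      rs1573020 | P13 | AG | 49 | 50.27 | 50.98 |  |  | 0.06 | 0.05 |
      rs896788 | P14 | CT | 53 |  |  | 52.52 | 53.26 | 0.06 | 0.07 |
      rs2065160 | P15 | AG | 53 | 54.11 | 54.82 |  |  | 0.07 | 0.04 |
      rs2572307 | P16a | AG | 57 | 55.41 | 56.02 |  |  | 0.06 | 0.01 |
      rs2303798 | P17 | CT | 57 |  |  | 57.14 | 57.73 | 0.03 | 0.02 |
      rs2065982 | P18 | AG | 61 | 59.17 | 59.90 |  |  | 0.04 | 0.03 |
      rs3785181 | P19 | CT | 61 |  |  | 60.27 | 61.49 | 0.05 | 0.06 |
      rs881929 | P20 | GT | 64 | 61.83 |  |  | 63.86 | 0.05 | 0.07 |
      rs1498444 | P21 | AC | 65 |  | 64.17 | 64.01 |  | 0.06 | 0.10 |
      rs1426654 | P22a | CT | 68 |  |  | 67.35 | 68.19 | 0.04 | 0.06 |
      rs2026721 | P23 | AG | 69 | 68.45 | 69.37 |  |  | 0.14 | 0.08 |
      rs4540055 | P24 | TCA | 73 |  | 72.15 | 71.68 | 72.48 | 0.08 | 0.10 | 0.06
      rs16891982 | P25a | GC | 75 | 76.03 |  | 76.44 |  | 0.13 | 0.10 |
      rs1335873 | A52 | TA | 77 |  | 78.16 |  | 78.31 | 0.09 | 0.13 |
      rs1886510 | A13 | AG | 80 | 78.71 | 79.69 |  |  | 0.10 | 0.07 |
      rs730570 | P26 | CT | 80 |  |  | 80.11 | 80.74 | 0.08 | 0.11 |
      rs5030240 | P27 | GCA | 85 | 83.75 | 84.91 | 84.51 |  | 0.03 | 0.04 | 0.05
      rs2304925 | P01 | GT | 87 | 86.61 |  |  | 87.93 | 0.09 | 0.11 |
      rs5997008 | P02 | AC | 87 |  | 88.20 | 87.86 |  | 0.05 | 0.03 |
      rs3827760 | P28 | AG | 90 | 89.64 | 90.09 |  |  | 0.04 | 0.04 |
      rs2814778 | P04 | CT | 90 |  |  | 91.04 | 91.93 | 0.07 | 0.04 |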

      2.4 Modified PCR and SBE reaction protocols

      Both amplification and extension reactions were re-optimised and represent significant modifications to the originally described 34-plex assay chemistry [
      • Phillips C.
      • Salas A.
      • Sánchez J.J.
      • Fondevila M.
      • Gómez-Tato A.
      • Álvarez-Dios J.
      • Calaza M.
      • Casares de Cal M.
      • Ballard D.
      • Lareu M.V.
      • Carracedo Á.
      The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
      ]. We reduced the cycling from the original 35 to 30–32 amplification cycles and from 30 to 28–30 extension cycles. Each of the new cycle number ranges was set according to the peak heights observed on identical 3130xl CE detectors in the two study laboratories, so an adjustment of up to two cycles compensates for differences in instrument sensitivity. Furthermore, cycling conditions were changed from the original protocol by increasing the initial 95 °C denaturation from 10 to 15 min and increasing the final 65 °C extension from 6 to 15 min. Amplification reactions used a total PCR volume of 6.9 μL, comprising: 1× AB AmpliTaq Gold PCR buffer; 25 mM MgCl2 to a final concentration of 5.9 mM; 10 mM dNTP mix to a final concentration of 0.58 mM; 3.2 μg/μL BSA to a final concentration of 0.29 μg/μL; 0.5 units of AB AmpliTaq Gold polymerase; 1 μL of premixed PCR primers; and 0.75 ng of target DNA. Individual primer pair ratios have also been changed since the original assay description and are detailed in Table 1. PCR cycling comprised: 15 min at 95 °C, 30–32 cycles of 95 °C for 30 s, 60 °C for 50 s, 65 °C for 40 s, then a final extension at 65 °C for 15 min.
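The stock and final concentrations quoted for the amplification mix imply the pipetting volume of each component through the usual dilution relation C1·V1 = C2·V2. The sketch below applies that relation to the three components for which both concentrations are given; the helper name `stock_volume` and the dictionary layout are illustrative assumptions, and the resulting volumes are derived values rather than figures reported in the protocol.

```python
TOTAL_VOLUME_UL = 6.9  # total PCR volume stated in the protocol

def stock_volume(final_conc, stock_conc, total_volume_ul):
    """Volume of stock (uL) giving final_conc in total_volume_ul, from C1*V1 = C2*V2."""
    return final_conc * total_volume_ul / stock_conc

# component: (final concentration, stock concentration), same units within each pair
components = {
    "MgCl2 (mM)": (5.9, 25.0),
    "dNTP mix (mM)": (0.58, 10.0),
    "BSA (ug/uL)": (0.29, 3.2),
}

volumes = {
    name: stock_volume(final, stock, TOTAL_VOLUME_UL)
    for name, (final, stock) in components.items()
}

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL of stock")
```

With the stated values this gives approximately 1.63 μL of MgCl2, 0.40 μL of dNTP mix and 0.63 μL of BSA per reaction.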
      PCR products were cleaned up prior to extension using combined exonuclease I and shrimp alkaline phosphatase by adding 1.3 μL ExoSAP-IT (USB Products, Affymetrix, Santa Clara, CA, U.S.) to a 2.5 μL aliquot of PCR product. Samples were incubated at 37 °C for 45 min, then enzymes were heat-inactivated at 85 °C for 15 min. SNaPshot primer extension reactions have also been modified from the original protocol to a total SBE volume of 3.0 μL, combining 1 μL of cleaned-up PCR product, 1 μL of SNaPshot ready reaction mix (previously 1.25 μL), 0.5 μL of pre-mixed SBE primers, as outlined in Table 1 (previously 0.75 μL), and 0.5 μL of water. SBE cycling used 28–30 cycles of 96 °C for 10 s, 55 °C for 5 s and 60 °C for 30 s. SBE products were cleaned up by adding 1.3 μL of SAP to the whole 3 μL reaction volume, incubating at 37 °C for 80 min and heat-inactivating at 85 °C for 15 min.
      For CE detection, SBE products were diluted 1 in 25 and 1 μL of diluted product was added to a mix of 8.9 μL AB HiDi™ formamide plus 0.3 μL AB LIZ-120 size standard. All separations were made with AB 3130xl Genetic Analyser, POP-4™ polymer and 36 cm capillary arrays. Standard SNaPshot electrophoresis conditions were used and results analysed with AB Genemapper ID-X software. The injection of 1 μL of SBE product represents a three-fold reduction in the quantity of DNA analysed compared to the previously published protocol.

      2.5 Population analysis

      Reference genotypes were compiled from the HGDP-CEPH SNPforID listings in SPSmart for the five continental population groups of Africa, Europe, East Asia, Oceania and America. This data was used to assess the predictive value of the modified 34-plex test with emphasis on assessing the enhancement of East Asian and closely related American population classifications that can be expected from the incorporation of rs3827760. Full HGDP-CEPH genotype data for rs3827760 has been added to the 34-plex tables of the SNPforID browser at: http://spsmart.cesga.es/snpforid.php?dataSet=snpforid34, while all rs727811 data is retained.
      Classification success was estimated using the verbose cross-validation option of the Snipper SNP classification portal (at: http://mathgene.usc.es/snipper/; choosing the ‘Thorough analysis of population data with a custom Excel file’ method), which removes each sample in turn and classifies it using allele frequency estimates from the remaining ‘n − 1’ training set. Population structure analysis was performed using Structure 2.3.3, applying three iterations for each K value from 2 to 8, with a burn-in period of 100,000 steps followed by 100,000 MCMC steps. We applied the admixture and correlated allele frequencies models and used prior specification of the population of origin of HGDP-CEPH reference samples (i.e. applying POPFLAG = 1) before analysing the population structure of individuals taken from 1000 Genomes or NIST population panels. Analyses of the NIST panel samples combined HGDP-CEPH reference samples with additional SNPforID populations taken from SPSmart: 60 each of Mozambican, Ugandan, Danish, NW Spanish, mainland Chinese and Taiwanese, plus 144 samples from three Colombian populations. Plots were constructed using CLUMPP 1.1.2 and distruct 1.1 software. After establishing the statistically optimum K values for the reference samples using the 34 SNPs (both original and new SNP combinations) we analysed 1000 Genomes individuals combined with the nine control DNAs in simultaneous runs by treating them as population samples of unknown ancestry (i.e. POPFLAG = 0) to test the efficacy of the 34-plex assay in characterising populations with significant admixture levels. The 709 NIST population reference samples were analysed in identical fashion, pre-empting expected high levels of admixture by also setting POPFLAG = 0. An additional set of Structure analyses examined just the NIST U.S. population panel samples by themselves, as an alternative to runs combining reference samples of known un-admixed ancestry (typically HGDP-CEPH panel individuals), in order to assess how closely admixture component ratios matched those obtained with runs combining NIST and reference samples.
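The leave-one-out cross-validation described above can be sketched in a few lines. This is not Snipper's implementation, whose internals are not described here. It is a minimal illustration that assumes genotypes coded as 0, 1 or 2 copies of one allele, independent SNPs, Hardy-Weinberg genotype probabilities and a pseudocount to avoid zero frequencies. All function names and the toy data are hypothetical.

```python
import math

def allele_freqs(samples, n_snps, pseudo=1.0):
    """Per-SNP frequency of the counted allele, smoothed with a pseudocount (assumption)."""
    return [
        (sum(g[j] for g in samples) + pseudo) / (2 * len(samples) + 2 * pseudo)
        for j in range(n_snps)
    ]

def log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype assuming Hardy-Weinberg proportions and independent SNPs."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        q = 1.0 - p
        total += math.log((q * q, 2 * p * q, p * p)[g])
    return total

def leave_one_out(data):
    """Remove each sample in turn, re-estimate frequencies from n - 1 samples, classify it."""
    n_snps = len(next(iter(data.values()))[0])
    correct = total = 0
    for pop, samples in data.items():
        for i, held_out in enumerate(samples):
            scores = {}
            for other, other_samples in data.items():
                training = samples[:i] + samples[i + 1:] if other == pop else other_samples
                scores[other] = log_likelihood(held_out, allele_freqs(training, n_snps))
            correct += max(scores, key=scores.get) == pop
            total += 1
    return correct / total

# Toy genotypes for two strongly differentiated hypothetical populations, three SNPs each.
toy = {
    "POP_A": [(2, 2, 0), (2, 1, 0), (2, 2, 1), (1, 2, 0)],
    "POP_B": [(0, 0, 2), (0, 1, 2), (1, 0, 2), (0, 0, 1)],
}
success_rate = leave_one_out(toy)
print(f"Cross-validation success: {success_rate:.0%}")
```

Removing the tested sample from its own population before estimating frequencies is the step that prevents each sample from contributing to its own classification.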

      3. Results and discussion

      3.1 Performance of the modified 34-plex SNaPshot assay: SBE product mobility variation and peak height ratios

      An example 34-plex electropherogram is shown in Fig. 2. This profile was chosen due to an above-average heterozygosity of 72% with 49 peaks from a total 68, so it illustrates the improved heterozygote balance we observed in comparison to patterns obtained from the previous assay conditions (see Fig. 1 of [
      • Phillips C.
      • Salas A.
      • Sánchez J.J.
      • Fondevila M.
      • Gómez-Tato A.
      • Álvarez-Dios J.
      • Calaza M.
      • Casares de Cal M.
      • Ballard D.
      • Lareu M.V.
      • Carracedo Á.
      The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
      ]). The migration positions of SBE products not present in this example profile are marked with the SNP and allele code in grey – note three positions are shown for tri-allelic SNPs P24 and P27.
      Fig. 2. Example electropherogram obtained from the revised 34-plex assay. SNP allele peaks marked with internal codes listed in .
      Unlike forensic STR typing, profiles generated by SNaPshot tend to give a range of non-specific peaks that are often high enough to mimic product peaks, typically in the green dye channel. Our experience with the previous 34-plex assay profiles consistently showed that non-specific signals fall outside the established allelic peak positions and can therefore be confidently excluded as allele extension products. This makes the recording of SBE product mobility variation an important step when validating forensic SNaPshot assays. We therefore used the genotyping of the 709 NIST reference samples to optimise the reconfigured assay and, with reference to multiple runs per sample, plotted all allele mobilities and peak height ratios. Results for these extended electrophoresis analyses are summarised in Tables 2 and 3. Mean base pair size estimates for each SNP allele were made using standardised conditions of POP-4™ in 36 cm capillary arrays with an AB 3130xl Genetic Analyser. The observed allele sizes with the above CE conditions are shown in Table 2. Allele size estimates were found to be very stable across multiple samples, runs, and reactions. Size values for all 34 SNPs in the modified assay gave an average standard deviation across 70 allele positions of ±0.06 bp, with values predominantly equal to or less than 0.1 bp (highest ±0.14 bp) and much smaller for most SNPs. Therefore any signal peaks falling outside the highly conservative ±0.5 bp size variation window routinely applied to allelic peak positions can be securely designated as artefactual.
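The ±0.5 bp window rule lends itself to a simple check. The sketch below compares an observed peak size against the mean allele sizes of three SNPs taken from Table 2 and reports which allele positions, if any, lie within the window. The function name and data layout are illustrative assumptions, and dye colour is deliberately ignored, so a real implementation would also need the dye channel to separate alleles whose windows overlap.

```python
WINDOW_BP = 0.5  # size variation window applied to allelic peak positions

# Mean observed allele sizes (bp) for a subset of SNPs, taken from Table 2.
MEAN_SIZES = {
    "P03": (26.89, 28.63),
    "A07": (28.17, 30.16),
    "P28": (89.64, 90.09),
}

def candidate_alleles(peak_size, mean_sizes=MEAN_SIZES, window=WINDOW_BP):
    """Return (internal code, allele number) pairs whose mean size lies within the window."""
    hits = []
    for code, sizes in mean_sizes.items():
        for allele_number, mean in enumerate(sizes, start=1):
            if abs(peak_size - mean) <= window:
                hits.append((code, allele_number))
    return hits

for size in (26.95, 27.55, 89.90):
    hits = candidate_alleles(size)
    label = hits if hits else "outside all windows, treat as artefact"
    print(f"{size:.2f} bp -> {label}")
```

The 89.90 bp example falls inside the windows of both P28 alleles, which lie only 0.45 bp apart, illustrating why size alone cannot assign the allele and the dye signal must be used as well.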
      Table 3. Mean heterozygote peak height ratios with SNPs in same order as Fig. 3.
      Heterozygote type | Marker | Mean | Standard deviation
      T/C | P24-CT | 1.2092 | 0.1692
      T/C | P04 | 0.9089 | 0.1153
      T/C | P26 | 1.6778 | 0.1455
      T/C | P22a | 1.2563 | 0.1349
      T/C | P19 | 1.5359 | 0.2772
      T/C | P17 | 2.1853 | 0.3817
      T/C | P14 | 1.2118 | 0.1916
      T/C | P12 | 1.0979 | 0.2036
      T/C | P09a | 1.2072 | 0.1850
      T/C | P07 | 1.1672 | 0.1331
      T/C | P06a | 1.3641 | 0.1858
      T/C | P05 | 1.1904 | 0.3136
      T/C | P03 | 1.3096 | 0.2822
      T/C | Total CT | 1.3324 | 0.2091
      G/A | P27-AG | 1.6372 | 0.1701
      G/A | P28 | 5.0806 | 1.4789
      G/A | A13 | 1.4742 | 0.2727
      G/A | P23 | 2.0429 | 0.5083
      G/A | P18 | 1.7116 | 0.5266
      G/A | P16 | 1.4132 | 0.3034
      G/A | P15 | 2.3777 | 0.2960
      G/A | P13 | 1.0978 | 0.2317
      G/A | A40 | 1.5639 | 0.3181
      G/A | P08 | 4.8865 | 0.6343
      G/A | A21 | 3.5275 | 1.0584
      G/A | A07 | 2.3412 | 0.9151
      G/A | A29 | 1.1750 | 0.0912
      G/A | Total AG | 2.3330 | 0.5234
      G/T | P20 | 2.0731 | 0.5997
      G/T | P01 | 2.3225 | 0.0398
      G/T | Total GT | 2.1978 | 0.5183
      A/C | P27-AC | 2.0240 | 0.3388
      A/C | P24-AC | 2.1612 | 0.5010
      A/C | P21 | 1.1858 | 0.1360
      A/C | P02 | 1.7830 | 0.2528
      A/C | Total AC | 1.7885 | 0.3071
      A/T | P24-AT | 1.7632 | 0.4127
      A/T | A52 | 0.9887 | 0.1182
      A/T | P11 | 1.1247 | 0.1829
      A/T | Total AT | 1.2922 | 0.2380
      C/G | P27-CG | 3.5695 | 0.7720
      C/G | A25a | 3.1853 | 0.6303
      C/G | F2 | 3.0982 | 0.42424
      C/G | Total CG | 3.3215 | 0.6089
      Heterozygote imbalance has a more disruptive effect on SNaPshot genotyping consistency than the occurrence of non-specific peaks. Large disparities in reporter dye signal strength, varying primer extension efficiencies between the four terminator dideoxy nucleotides and stochastic PCR effects can all contribute to imbalanced signals from SBE products. While unavoidable, SNaPshot heterozygote imbalance can usually be accommodated by assigning pre-set peak height ratio (PHR) limits. Previously published guidelines [
      • Sánchez J.J.
      • Phillips C.
      • Børsting C.
      • Balogh K.
      • Bogus M.
      • Fondevila M.
      • Harrison C.D.
      • Musgrave-Brown E.
      • Salas A.
      • Syndercombe-Court D.
      • Schneider P.
      • Carracedo Á.
      • Morling N.
      A multiplex assay with 52 single nucleotide polymorphisms for human identification.
      ,
      • Sánchez J.J.
      • Endicott P.
      Developing multiplexed SNP assays with special reference to degraded DNA templates.
      ] suggest that for equal amounts of dye molecules passing the detection window, recorded RFU peak heights for a G-terminating nucleotide (DR110 dye) are generally two times that of a corresponding A (DR6G) and four times that of a corresponding C (TAMRA) or T (ROX). To explore the widest range of heterozygote peak pairs we collected RFU values from 20 heterozygotes in 80 amplified samples of European, African American, Hispanic and East Asian origin, for each of the 34 SNPs, ensuring many more heterozygotes could be analysed than normally possible in any one population group with these SNPs. The distribution of peak height ratios from these samples is shown in Fig. 3 and the mean values for each PHR range are listed in Table 3.
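      The predicted PHR for any heterozygote pair follows directly from the guideline ratios quoted above (G twice A, and four times C or T). The short Python sketch below derives those predictions; the dictionary and function names are illustrative assumptions, and the relative signal units are arbitrary.

```python
from fractions import Fraction

# Relative signal per dye molecule implied by the cited guideline:
# G (DR110) = 2 x A (DR6G) = 4 x C (TAMRA) = 4 x T (ROX). Units are arbitrary.
RELATIVE_SIGNAL = {
    "G": Fraction(4),
    "A": Fraction(2),
    "C": Fraction(1),
    "T": Fraction(1),
}


def expected_phr(numerator, denominator):
    """Predicted peak height ratio numerator/denominator for a balanced heterozygote."""
    return RELATIVE_SIGNAL[numerator] / RELATIVE_SIGNAL[denominator]


# Predictions for the six heterozygote classes, stronger dye as numerator.
for pair in (("T", "C"), ("G", "A"), ("G", "T"), ("A", "T"), ("A", "C"), ("G", "C")):
    print(f"{pair[0]}/{pair[1]}: {float(expected_phr(*pair)):.1f}")
```

      These predicted values are the baseline against which the observed means in Table 3 can be compared.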
      Fig. 3. Distribution of peak height ratios recorded in the heterozygotes of 80 NIST population panel samples.
      Amplifications were performed with three separate PCR reactions but this was not observed to have a significant effect on peak height ratios (data not shown), indicating stochastic PCR effects represent a minor source of signal variation. CT heterozygotes showed a mean T/C PHR value of 1.33 (i.e. higher T peak) with a standard deviation of 0.21, with SNP P17 a single outlier at a mean T/C = 2.19. AC and AT heterozygotes showed similar minor variation around mean PHRs of A/C = 1.79 (±0.31) and A/T = 1.29 (±0.23) respectively. The DR110-linked G allele shows the highest deviations from predicted PHR values, particularly in AG heterozygotes, and it is worth noting that the DR110 dye displays the highest energy emission of the four dyes used in SNaPshot. Some AG SNPs display PHRs close to 1 with little variation, particularly SNPs A07 and A29, but these contrast with the new SNP P28 as well as A21 and P08 that have high between-run variation (up to 0.5) and PHR values greater than predicted. GT peak pairs were the most divergent from the predicted PHR of 4, being closer to a value of 2, likely due to the higher emission energy from the ROX dye of T compared to the TAMRA of C. The standard deviation from average peak height in all G-based heterozygotes was greater than 0.5: more than twice the value of the other heterozygotes and evident in the broader spread of G peak ratios in Fig. 3. Therefore the average PHRs collated in our study for each SNP and summarised in Table 3 should be kept in mind when interpreting 34-plex SNaPshot profiles, especially those generated from difficult or scant DNA sources where peaks are likely to be much lower than normal and show greater stochastic variation.
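      One way such per-SNP averages might be applied is to turn each mean and standard deviation from Table 3 into a pre-set PHR acceptance range. The sketch below assumes limits of mean ± 3 standard deviations; that multiplier, the function names and the example RFU values are assumptions for illustration and are not thresholds stated in this study.

```python
# (mean PHR, standard deviation) for three markers, taken from Table 3.
TABLE3 = {
    "P04": (0.9089, 0.1153),
    "P17": (2.1853, 0.3817),
    "P28": (5.0806, 1.4789),
}


def phr_limits(marker, k=3.0):
    """Lower and upper PHR limits as mean +/- k standard deviations (k is an assumption)."""
    mean, sd = TABLE3[marker]
    return max(mean - k * sd, 0.0), mean + k * sd


def is_balanced(marker, numerator_rfu, denominator_rfu, k=3.0):
    """True if the observed peak height ratio falls inside the marker's limits."""
    low, high = phr_limits(marker, k)
    return low <= numerator_rfu / denominator_rfu <= high


# Hypothetical RFU peak heights for a P04 heterozygote (T peak, C peak).
print(phr_limits("P04"), is_balanced("P04", 1500, 1400))
```

      A laboratory would need to choose and validate its own multiplier, particularly for low-template profiles where stochastic variation is greater.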

      3.2 Population analyses

      Structure analyses of HGDP-CEPH reference samples alone, using both the original and revised 34-plex SNP sets, are presented in Fig. 4A. The optimum K value corresponds to five population clusters in both cases, as shown by the L(K) mean value plots in Fig. 4B. Average ancestry membership proportions (from the HGDP-CEPH analysis) are listed in Table 4A and these show an increase in the values associated with the East Asian ancestry component using the new SNP combination, seen as an average value of 91.16% raised from a previous 87.64%. The revised SNP set creates rather cleaner Structure cluster plots for East Asians at K = 5 in the HGDP-CEPH samples shown in Fig. 4A and, to a lesser extent, in the CHB, CHS and JPT samples of the combined HGDP-CEPH-1000 Genomes analyses described below and shown in Fig. 4C.
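      Reading an optimum K from mean L(K) values can be sketched as follows. The log-likelihood values are invented for illustration (they are not the values behind Fig. 4B), and choosing the K with the highest mean L(K) is a simplifying assumption; other criteria exist and the plots themselves should be inspected.

```python
from statistics import mean


def best_k(log_likelihoods_by_k):
    """Return the K with the highest mean L(K) across replicate runs, plus all means."""
    means = {k: mean(values) for k, values in log_likelihoods_by_k.items()}
    return max(means, key=means.get), means


# Hypothetical L(K) values from three replicate runs per K (illustrative only).
runs = {
    3: [-31250.4, -31248.9, -31252.1],
    4: [-30110.2, -30108.7, -30112.5],
    5: [-29540.3, -29538.8, -29541.6],
    6: [-29720.9, -29650.4, -29801.2],
}

optimum, mean_lk = best_k(runs)
print(optimum)
```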
      Fig. 4. Cluster plots summarising five Structure analyses performed on combinations of reference (HGDP-CEPH plus SNPforID populations) and unknown populations (NIST SRM 2391/control DNAs, 1000 Genomes populations and NIST U.S. panel). Structure runs comprised: (A) HGDP-CEPH alone, old and new SNP sets, with the optimum K value of 5 indicated by the likelihood L(K) plot of (B); (C) HGDP-CEPH with unknown 1000 Genomes and SRM 2391c/control DNAs (expanded in D); (E) HGDP-CEPH plus SNPforID with unknown NIST U.S. population panel and new SNPs only, giving an optimum K:4. 1000 Genomes population descriptors as in .
      Table 4A. Average membership proportions to K:5 clusters from the Structure analysis of HGDP-CEPH populations plotted in Fig. 4A. Values in bold indicate improved performance with the revised SNP combination.
      Proportion of membership of each pre-defined population in each of the 5 inferred clusters.
      Previous 34-plex (admixture model)
      Given population | AFR | EUR | EAS | OCE | NAM | Number of individuals
      African | 0.9560 | 0.0060 | 0.0110 | 0.0170 | 0.0100 | 98
      European | 0.0060 | 0.9530 | 0.0120 | 0.0130 | 0.0160 | 158
      East Asian | 0.0070 | 0.0250 | 0.8764 | 0.0546 | 0.0370 | 227
      Oceanian | 0.0060 | 0.0050 | 0.0123 | 0.9687 | 0.0080 | 26
      Native American | 0.0060 | 0.0173 | 0.0190 | 0.0130 | 0.9446 | 63
      Revised 34-plex (admixture model)
      Given population | AFR | EUR | E ASN | OCE | AME | Number of individuals
      African | 0.9580 | 0.0067 | 0.0080 | 0.0190 | 0.0083 | 98
      European | 0.0057 | 0.9573 | 0.0100 | 0.0130 | 0.0140 | 158
      East Asian | 0.0070 | 0.0230 | 0.9116 | 0.0270 | 0.0313 | 227
      Oceanian | 0.0070 | 0.0060 | 0.0130 | 0.9630 | 0.0110 | 26
      Native American | 0.0060 | 0.0173 | 0.0190 | 0.0117 | 0.946 | 63
      Analysing 1000 Genomes individuals as study samples of ‘unknown’ ancestry (lower half of Fig. 4C) provides results assigning all samples to their expected population clusters. Four populations are arranged in a sequence to indicate rising second admixture component proportions showing the range as a gradient of two-cluster memberships in the following 1000 Genomes populations: ASW (African Americans from Southwest U.S.); CLM (Colombians from Medellín); MXL (Mexicans from LA district, CA); and PUR (Puerto Ricans). Individual sample membership proportions in K5 clusters for both original and new 34-plex SNPs (K4 for NIST populations) estimated from the six Structure analyses of Fig. 4A–E, are listed in supplementary Table S1. These indicate ASW samples have a range of 5–45% European co-ancestry with African the major component and PUR; CLM; MXL show increasing levels of American co-ancestry with European the major component plus indications of a significant African third component in many Colombians and Puerto Ricans. A large number of Mexicans (MXL) have comparable proportions of European to Native American ancestry across the population sample, while many PUR, CLM and MXL show their predominant ancestry is European, not Native American.
      The Structure analysis of forensic reference/control DNAs is presented in Fig. 4D. The patterns observed for the ancestries of SRM 2391c A, B and C components match those reported by NIST (cluster membership proportions are given in supplementary Table S2), while the other reference/control DNAs are consistent with being unadmixed European ancestries.
      An initial Structure analysis was made for the NIST population panel alone (i.e. excluding reference populations) to measure the genetic structure discernible in this sample of the U.S. The cluster plots of the optimum K = 4 (revised SNP set only, and with cluster colours modified to avoid overlap with Fig. 4 plots) are shown in supplementary Fig. S1. Four genetic clusters were detected despite the heterogeneity of the sample set and a largely cultural rather than a population genetics-based definition of Hispanics. The analysis therefore detects four clusters in the NIST panel that closely correspond to the population definitions given to the samples. Furthermore most Hispanics are differentiated by showing a distinct genetic component that does not correspond to the clusters defining the other three U.S. populations.
      With the observation that four clusters were detected in the NIST U.S. panel we then performed Structure analysis adding reference populations and the resulting K:4 cluster plot is shown in Fig. 4E (revised SNP set only). In this analysis reference populations were expanded to include several SNPforID populations with admixture. As with the admixed populations in Fig. 4C (ASW, CLM, MXL, PUR), NIST individuals have been ordered in ascending major co-ancestry component (except descending for U.S. African Americans). It is evident that each NIST group corresponds well with the expected population definitions. U.S. Caucasians and U.S. East Asians show largely unadmixed European and East Asian ancestry. In contrast, a large proportion of African Americans show significant levels of European co-ancestry as a major second component. U.S. Hispanics in particular show considerable heterogeneity and complex patterns of population admixture. The largest proportion of Hispanics show majority European co-ancestry; there is a wide range of Native American co-ancestry proportions, and we found detectable African co-ancestry as a third component in most samples. It is evident that U.S. Hispanic individuals have a higher Native American co-ancestry compared to other U.S. population groups, which helps to explain the differentiation pattern observed in supplementary Fig. S1.
      As it is difficult to characterise the overall pattern of admixture in any one population when there is a continuous range of detected second or third co-ancestry, we summarised the data from all samples analysed in the Structure run of Fig. 4E by compiling mean cluster membership proportions with 1st (25th percentile) and 3rd (75th percentile) quartile ranges for the eight population groups analysed, using the data of supplementary Table S1. These are shown in Fig. 5, divided into four ancestry proportion plots based on the range of membership values to the four clusters. The boxes denote 1st–3rd quartile ranges, each containing the mean value line, with whiskers indicating the full range. The lowest means and narrowest quartile ranges are found in the reference European, African and East Asian proportions for their alternative ancestries (e.g. Europeans have minimal African/East Asian ancestry, etc.) and near identical patterns describe the NIST U.S. European and U.S. East Asian samples. NIST Hispanics show the highest mean value for European ancestry of 57% and this has the broadest quartile range, while the mean American ancestry proportion is only 11.5% but with an equally broad quartile range. In the Hispanics the observed African ancestry has a mean of 6.8%, so on average this represents a quite minor third component of co-ancestry, but the quartile box shows a significant range reaching 18% at the 3rd quartile, so for many NIST Hispanics African ancestry is a detectably larger component. Therefore, as a group, the NIST Hispanics can be described as showing predominantly European ancestry with comparatively minor proportions of Native American co-ancestry plus detectable but generally much lower African co-ancestry. This pattern is in sharp contrast to the HGDP-CEPH and SNPforID Native American reference populations that have a mean 90% American ancestry and minimal European or African co-ancestry. NIST African Americans give a consistent pattern of African major ancestry ranging from 70 to 85% and European co-ancestry of about 5–20% in most samples.
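      The summary statistics plotted in Fig. 5 (mean, 1st and 3rd quartiles, full range) can be compiled with a few lines of Python. The membership proportions below are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the quartile convention used for Fig. 5 is not stated.

```python
from statistics import mean, quantiles


def summarise_membership(proportions):
    """Mean, 1st/3rd quartiles and full range for one group's cluster membership values."""
    q1, _median, q3 = quantiles(proportions, n=4, method="inclusive")
    return {
        "mean": mean(proportions),
        "q1": q1,
        "q3": q3,
        "min": min(proportions),
        "max": max(proportions),
    }


# Hypothetical European-cluster membership proportions for five individuals.
summary = summarise_membership([0.05, 0.10, 0.20, 0.40, 0.75])
print(summary)
```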
      Fig. 5. Distribution of cluster membership proportions (% proportions of membership to an optimum K:4 clusters) shown by NIST U.S. population panel samples and HGDP-CEPH/SNPforID reference populations. These values were obtained from the Structure run of Fig. 4E. Boxes indicate 1st and 3rd quartile ranges and contain the mean membership proportion for each population group and K ancestry. Whisker lines indicate the full range of values observed. Values placed next to certain selected populations show mean and quartile range values.
      The classification success rates from Snipper cross validation of the HGDP-CEPH panel samples are outlined in Table 4B, indicating enhanced performance of European, East Asian and Oceanian classifications, with 0.64%, 2.20% and 3.85% improvements in assignment success respectively. Although the 34-plex test was not designed to infer Native American or Oceanian ancestry, the enhanced differentiation of East Asians provided by rs3827760 brings improvements to the analysis of all five global groups.
      Table 4B. Classification success (bold % values) from cross validation of the 5 group training set listed in supplementary Table S3. Values in bold indicate improved performance with the new SNP combination.
      Estimation with cross-validation to compute the success ratio using the best 34 SNPs.
      Previous 34-plex, 5 group training set of supplementary Table S3, worksheet 2
      Population | Africa | Europe | East Asia | Oceania | America
      Population of African origin | 100% | 0.00% | 0.00% | 0.00% | 0.00%
      Population of European origin | 0.00% | 98.73% | 0.63% | 0.00% | 0.63%
      Population of East Asian origin | 0.00% | 0.00% | 92.51% | 0.88% | 6.61%
      Population of Oceanian origin | 0.00% | 0.00% | 3.85% | 96.15% | 0.00%
      Population of American origin | 0.00% | 0.00% | 0.00% | 0.00% | 100%
      Revised 34-plex, 5 group training set of supplementary Table S3, worksheet 22
      Population | Africa | Europe | East Asia | Oceania | America
      Population of AFRICA origin | 100% | 0.00% | 0.00% | 0.00% | 0.00%
      Population of EUROPE origin | 0.00% | 99.37% | 0.00% | 0.00% | 0.63%
      Population of EAST ASIA origin | 0.00% | 0.00% | 94.71% | 0.44% | 4.85%
      Population of OCEANIA origin | 0.00% | 0.00% | 0.00% | 100% | 0.00%
      Population of AMERICA origin | 0.00% | 0.00% | 0.00% | 0.00% | 100%
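      The improvements in assignment success quoted in the text can be recomputed from the diagonal (correct classification) values of Table 4B. The short Python check below uses only those table values; the variable names are illustrative.

```python
# Correct-classification percentages (diagonal of Table 4B).
previous = {"Africa": 100.00, "Europe": 98.73, "East Asia": 92.51,
            "Oceania": 96.15, "America": 100.00}
revised = {"Africa": 100.00, "Europe": 99.37, "East Asia": 94.71,
           "Oceania": 100.00, "America": 100.00}

# Percentage-point gain for each group where the revised set performs better.
improvements = {
    group: round(revised[group] - previous[group], 2)
    for group in previous
    if revised[group] > previous[group]
}
print(improvements)
```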

      3.3 Reference/control DNA profiles and AIM-SNP genotypes in seventy populations

      SNP profiles for the reference/control DNA samples were compiled through independent, triplicated typing in two laboratories with full consensus in all cases and these genotypes are listed in supplementary Table S2. Fig. 4D shows the Structure analysis of the nine reference DNAs detecting the Oceanian ancestry of SRM 2391c-C, with ∼93% membership to the Oceanian cluster identified from the HGDP-CEPH reference samples. The only other SRM 2391c component to show a significant non-European component (excluding the mixed sample D) was sample B, with 10.4% membership to the Native American cluster identified from the HGDP-CEPH reference samples. This division of European and American co-ancestry broadly matches values seen in the admixed American populations of CLM, MXL and PUR as well as many NIST Hispanics. The ancestry membership proportions (K5 = five population clusters) obtained from Structure analysis of the nine reference DNAs are listed in supplementary Table S2.
      Supplementary Table S2 also lists the likelihoods and ancestry assignments made for the reference DNAs based on the revised 34-plex SNP profiles submitted to Snipper. Assignments were made using a custom five group HGDP-CEPH training set (AFR-EUR-E ASN-AME-OCE). Two HGDP-CEPH based training sets are included ready to use as an uploadable Excel file in supplementary file S3, though note that one worksheet must be removed in use (revised or previous set) and that SNPs and therefore submitted profiles are ordered in the files by ascending rs-number, not SNaPshot mobility. The previous rs727811 component substitutes rs3827760 in the second training set worksheet. The Oceanian classification of SRM 2391c-C matches the Structure inferences described in Section 3.2 and we interpret this as an unequivocal ancestry assignment. Five reference/control DNAs (those except SRM 2391c-B to E) gave very strong European classifications of 1.E+15 to 1.E+18 times more likely European than the next closest assignment (American or East Asian). Two SRM controls, SRM 2391c-B and E gave reduced European assignments, although the non-European co-ancestry indicated by Snipper and measured by Structure as 10% and 5% American membership proportions respectively, is based on SNPs that have less power to differentiate this ancestry compared to EUR-AFR-E ASN.
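      Statements such as "times more likely European than the next closest assignment" are ratios of profile likelihoods between populations. The sketch below shows one generic way to compute such a ratio, assuming independent SNPs and Hardy-Weinberg genotype proportions. It is not Snipper's implementation, and the SNP names and allele frequencies are hypothetical.

```python
from math import prod


def genotype_likelihood(genotype, allele_freqs):
    """Hardy-Weinberg probability of a genotype given allele frequencies."""
    a, b = genotype
    p, q = allele_freqs[a], allele_freqs[b]
    return p * p if a == b else 2 * p * q


def profile_likelihoods(profile, freqs_by_population):
    """Likelihood of the whole profile in each population (SNPs treated as independent)."""
    return {
        population: prod(genotype_likelihood(profile[snp], snp_freqs[snp]) for snp in profile)
        for population, snp_freqs in freqs_by_population.items()
    }


def top_likelihood_ratio(likelihoods):
    """Best population, runner-up, and how many times more likely the best one is."""
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    (best, l_best), (second, l_second) = ranked[0], ranked[1]
    return best, second, l_best / l_second


# Hypothetical allele frequencies for two SNPs in three populations.
freqs = {
    "EUR": {"snp1": {"A": 0.9, "G": 0.1}, "snp2": {"C": 0.8, "T": 0.2}},
    "AFR": {"snp1": {"A": 0.1, "G": 0.9}, "snp2": {"C": 0.3, "T": 0.7}},
    "EAS": {"snp1": {"A": 0.5, "G": 0.5}, "snp2": {"C": 0.5, "T": 0.5}},
}
profile = {"snp1": ("A", "A"), "snp2": ("C", "T")}

best, second, ratio = top_likelihood_ratio(profile_likelihoods(profile, freqs))
print(best, second, ratio)
```

      With 34 well-differentiated SNPs the product of per-SNP likelihoods can differ between populations by many orders of magnitude, which is how ratios of the size reported above arise. A zero allele frequency in any population would need special handling that this sketch omits.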
      Full genotype data from the 52 HGDP-CEPH populations, fourteen 1000 Genomes populations and four NIST U.S. populations for 34 component SNPs plus rs3827760 are listed in supplementary Tables S4. As the Stanford/Michigan HGDP-CEPH SNP data accessible in SPSmart (http://spsmart.cesga.es/) do not include rs3827760, we have also listed separately the HGDP-CEPH genotypes for this SNP. The 1000 Genomes genotype tables also provide allele frequency estimates for each population, although it should be noted that the two tri-allelic SNPs rs4540055/rs5030240 (P24/P27) are both given as binary AC substitutions by 1000 Genomes and this discrepancy is discussed in more detail in Section 3.4.

      3.4 Tri-allelic AIM-SNPs rs4540055 and rs5030240

      Two of the most informative components of the 34-plex assay are tri-allelic SNPs: rs4540055 and rs5030240, both having three alleles recorded at a single variable nucleotide position (A/C/T and A/C/G respectively) in all population groups, albeit at low frequencies for the third allele in some populations (rs4540055-C and rs5030240-A). We made two noteworthy observations concerning these tri-allelic SNPs: (1) their evident success in detecting three alleles in the two mixture contributors of SRM 2391c-D; and (2) the disparity between the allele frequency distributions obtained from our SNaPshot typing compared to the data from the next-generation deep re-sequencing analysis of 1000 Genomes as listed in supplementary Table S4.
      The original motivation for identifying and characterising tri-allelic SNPs was to bring mixture detection properties into forensic SNP typing assays [
      • Phillips C.
      • Lareu M.V.
      • Salas A.
      • Carracedo Á.
      Nonbinary single-nucleotide polymorphism markers.
      ] and this approach has been more thoroughly explored in recent studies by Westen et al. [
      • Westen A.A.
      • Matai A.S.
      • Laros J.F.J.
      • Meiland H.C.
      • Jasper M.
      • de Leeuw W.J.F.
      • de Knijff P.
      • Sijen T.
      Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples.
      ]. Since tri-allelic SNP allele frequencies commonly show strong population differentiation it is also not surprising to see three alleles in both markers from a mixture of a European and an Oceanian donor. For instance, the rs4540055-C allele is almost eight times more frequent in Oceanians than Europeans and rs5030240-C more than eighteen times more frequent in Europeans. So while tri-allelic SNPs are much weaker than STRs at detecting multiple alleles in mixed donor profiles, the probability of finding more than two alleles is raised if donors are from different populations, potentially helping to identify the ancestry of components in simple mixtures. The three-peak patterns observed in SRM 2391c-D are shown in Fig. 6A.
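      The chance that a two-person mixture shows all three alleles at a tri-allelic SNP can be computed by enumerating donor genotypes. The sketch below assumes Hardy-Weinberg proportions and unrelated donors; the allele frequencies are hypothetical and are not the rs4540055 or rs5030240 estimates.

```python
from itertools import combinations_with_replacement


def genotype_probs(freqs):
    """Hardy-Weinberg probabilities for every unordered genotype at one SNP."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        probs[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return probs


def prob_three_alleles(freqs_donor1, freqs_donor2):
    """Probability that two unrelated donors together carry all three alleles."""
    total = 0.0
    for g1, p1 in genotype_probs(freqs_donor1).items():
        for g2, p2 in genotype_probs(freqs_donor2).items():
            if len(set(g1) | set(g2)) == 3:
                total += p1 * p2
    return total


# Hypothetical tri-allelic frequencies in two differentiated populations.
pop_x = {"A": 0.60, "C": 0.05, "T": 0.35}
pop_y = {"A": 0.30, "C": 0.40, "T": 0.30}

print(prob_three_alleles(pop_x, pop_x))  # both donors from population X
print(prob_three_alleles(pop_x, pop_y))  # one donor from each population
```

      Comparing the two printed values for a given pair of frequency sets shows how the three-allele probability depends on whether the donors share a population.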
      Fig. 6. (A) Observed tri-allelic SNP peak patterns in the mixed-source SRM 2391c-D sample. (B) Allele frequency estimates for both tri-allelic SNPs in 34-plex obtained from next generation sequencing by 1000 Genomes (top row) and SNaPshot typing of the HGDP-CEPH panel (middle row). The bottom row pie charts indicate the HGDP-CEPH allele frequencies found when the T allele is treated as silent (undetectable) in rs4540055 and the G allele in rs5030240, giving a close match to the estimates from 1000 Genomes and explaining the discrepant values.
      The second, immediately apparent characteristic of tri-allelic SNPs seen in the allele distribution comparisons in this study is their failure to be characterised as three-allele SNPs by both the HapMap and 1000 Genomes projects. Each project has reported the two 34-plex tri-allelic SNPs, as well as those additionally identified as tri-allelic SNPs by Westen et al. [
      • Westen A.A.
      • Matai A.S.
      • Laros J.F.J.
      • Meiland H.C.
      • Jasper M.
      • de Leeuw W.J.F.
      • de Knijff P.
      • Sijen T.
      Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples.
      ] as binary SNPs. There appear to be problems with the genotyping of tri-allelic SNPs with next-generation whole genome sequencing approaches, and it is significant that 1000 Genomes have changed the allele calls of rs4540055 from AT to AC and rs5030240 from AG to AC between published data revisions of December 2010 and August 2011. Comparisons of the allele frequency distributions of both tri-allelic SNPs from our typing with 1000 Genomes sequencing are summarised in Fig. 6B and these appear to be quite disparate. We have explained these differences by treating one of the alleles as silent: for example, when rs4540055-T is considered silent or undetectable, AT appears as an AA homozygote and CT as a CC homozygote. Therefore the A frequency can be estimated as A = (AA + AT + (AC × 0.5))/total genotypes and the C frequency as C = (CC + CT + (AC × 0.5))/total genotypes. When these adjustments are made to the HGDP-CEPH panel genotypes obtained by SNPforID SNaPshot typing, the allele frequencies are much better matched. Fig. 6B shows these HGDP-CEPH panel genotype adjustments made by assuming silent rs4540055-T and rs5030240-G alleles. However, it is discouraging that one of the benefits of such extensive sequencing initiatives, an enhanced ability to locate and identify SNPs with more than two alternative nucleotides, appears to be lost in progressing to next generation sequencing technologies. This is further exacerbated by 1000 Genomes using multiple sequencing centres/platforms but organising analyses to concentrate on samples from one population in any one centre.
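      The silent-allele adjustment described above can be written as a small Python function. It follows the stated formula, in which a heterozygote carrying the silent allele is counted as a homozygote for its visible allele and the denominator is the total number of genotypes. The genotype counts are hypothetical, and leaving silent-allele homozygotes in the denominator is an assumption taken from the wording "total genotypes".

```python
def apparent_frequencies(genotype_counts, silent):
    """Apparent allele frequencies when one allele of a tri-allelic SNP is undetected."""
    total = sum(genotype_counts.values())
    weights = {}
    for genotype, count in genotype_counts.items():
        visible = [allele for allele in genotype if allele != silent]
        for allele in visible:
            # A genotype's count is shared equally among its visible alleles:
            # AA or AT give the full count to A, AC gives half each to A and C.
            weights[allele] = weights.get(allele, 0.0) + count / len(visible)
    return {allele: weight / total for allele, weight in weights.items()}


# Hypothetical rs4540055 genotype counts (A/C/T), with T treated as silent.
counts = {"AA": 40, "AC": 20, "CC": 10, "AT": 20, "CT": 8, "TT": 2}
adjusted = apparent_frequencies(counts, silent="T")
print(adjusted)
```

      With these counts, A = (40 + 20 + 10)/100 and C = (10 + 8 + 10)/100; the two TT genotypes contribute no visible allele.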

      4. Concluding remarks

      Swapping a single component SNP at the same time as making adjustments to the chemistry and certain mobility positions in the 34-plex assay provided relatively little disruption to routine forensic SNP typing with this ancestry analysis test. The enhancement to the differentiation of East Asians is evident amongst the seventy populations characterised and the comparisons made in this study. As East Asians had the weakest population differentiation using the original 34 SNP test, the introduction of a near-fixed SNP in this population group represents a major enhancement of the test without upsetting its forensic performance. While SNaPshot primer extension assays remain the only viable multiplexed SNP typing option available to forensic analysis it is crucial to thoroughly gauge the mobility and signal variation entailed with using this chemistry as we have done with the 34 sets of peak pair data observed with the NIST U.S. population panel.
      The Hispanics that were studied as part of the NIST panel highlight the difficulties of analysing individuals with complex, potentially three-way, patterns of admixture. Not only is it difficult to make accurate assessments of co-ancestry when these are so varied within the group as a whole, but also component AIM-SNPs in small forensic multiplexes such as the 34-plex test cannot provide equally informative differentiations of all co-ancestry contributors. With this in mind we have continued development of additional AIM-SNP based ancestry tests, complementary to the 34-plex, that improve the differentiation of Native Americans from EUR-AFR-E ASN populations. It is also likely that an optimum strategy will be the addition of ancestry-informative indels in small multiplexes with similar properties to the 34-plex SNPs. We recently published details of an AIM-indel set [
      • Pereira R.
      • Phillips C.
      • Pinto N.
      • Santos C.
      • Dos Santos S.E.
      • Amorim A.
      • Carracedo Á.
      • Gusmão L.
      Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing.
      ] that is certain to improve the inferences made for complex admixture patterns when combined with the current 34-plex SNPs and bring improved differentiation of Native American ancestry. Therefore forensic ancestry inferences based on typing a range of short binary markers will show improved precision while keeping multiplexes small-scale, manageable and with sufficient sensitivity for challenging forensic material. Additionally, we continue to advocate the use of tri-allelic SNPs as highly differentiated ancestry markers. Regrettably, the opportunity to build a well-validated catalog of tri-allelic SNPs from the output of 1000 Genomes is not available to forensic genetics research at this moment and consequently their number remains small.
      Revisions to the 34-plex primers, chemistry and cycling conditions added since the original development of the assay have evolved gradually and therefore it is difficult to state how each of these changes has influenced the improved peak height balance we report. However the biggest advance in 34-plex electropherogram quality was obtained with the changes to primer ratios that accompanied changes to the positions of four SNPs. Specifically, the largest improvement resulted from increasing the PCR and EXT primers for rs2814778 (P04) and rs16891982 (P25a), while reducing those of rs2304925 (P01).
      Lastly, at the time of writing, a new study of the statistical limitations of familial searching strategies has highlighted the problem of using inappropriate STR allele frequencies when the ancestry of the profile donor is not known [
      • Rohlfs R.V.
      • Fullerton S.M.
      • Weir B.S.
      Familial identification: population structure and relationship distinguishability.
      ]. Using generalised or incorrect allele frequencies can lead to a large number of adventitious matches and the search loses specificity. Therefore it appears that a simple forensic ancestry test with sufficient power to distinguish the major population groups could be applied as an informative adjunct to conventional STR profiling and help enhance the power such tests already provide. This approach would not involve database-wide SNP typing as only the profile would require an inference of ancestry to select the appropriate STR allele frequencies in each search.

      Disclaimer

      This work was supported in part by funding from the U.S. FBI Biometric Center of Excellence: ‘Forensic DNA Typing as a Biometric Tool’. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of the U.S. Departments of Justice or Commerce. Commercial equipment, instruments, and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.

      Acknowledgements

      The work of MF at NIST has been supported by the Fundacion Barrie de la Maza Postgraduate Grant Program (2010). CS was awarded a PhD grant by the Portuguese Foundation for Science and Technology (SFRH/BD/75627/2010), co-financed by the European Social Fund (Human Potential Thematic Operational Program). AFA was supported by a María Barbeito grant from Xunta de Galicia. MVL was supported by funding from Xunta de Galicia INCITE 09 208163PR and Ministerio de Educación y Ciencia BIO2006-06178.

      Appendix A. Supplementary data

      The following are the supplementary data to this article:
      Supplementary Fig. S1. Structure cluster plot of NIST populations at K:4.

      References

        • Fondevila M.
        • Phillips C.
        • Naveran N.
        • Fernandez L.
        • Cerezo M.
        • Salas A.
        • Carracedo Á.
        • Lareu M.V.
        Case report: identification of skeletal remains using short-amplicon marker analysis of severely degraded DNA extracted from a decomposed and charred femur.
        Forensic Sci. Int. Genet. 2008; 2: 212-218
        • Freire-Aradas A.
        • Fondevila M.
        • Kriegel A.-K.
        • Phillips C.
        • Gill P.
        • Prieto L.
        • Schneider P.M.
        • Carracedo Á.
        • Lareu M.V.
        A new SNP assay for identification of highly degraded human DNA.
        Forensic Sci. Int. Genet. 2012; 6: 341-349
        • Romanini C.
        • Catelli M.L.
        • Borosky A.
        • Salado-Puerto M.
        • Pereira R.
        • Phillips C.
        • Fondevila M.
        • Freire A.
        • Santos C.
        • Carracedo Á.
        • Lareu M.V.
        • Gusmao L.
        • Vullo C.M.
        Typing short amplicon binary polymorphisms: supplementary SNP and Indel genetic information in the analysis of highly degraded skeletal remains.
        Forensic Sci. Int. Genet. 2012; 6: 469-476
        • Gill P.
        • Werrett D.J.
        • Budowle B.
        • Guerrieri R.
        An assessment of whether SNPs will replace STRs in national DNA databases – joint considerations of the DNA working group of the European Network of Forensic Science Institutes (ENFSI) and the Scientific Working Group on DNA Analysis Methods (SWGDAM).
        Sci. Justice. 2004; 44: 51-53
        • Lareu M.V.
        • García-Magariños M.
        • Phillips C.
        • Quintela I.
        • Carracedo Á.
        • Salas A.
        Analysis of a claimed distant relationship in a deficient pedigree using high density SNP data.
        Forensic Sci. Int. Genet. 2012; 6: 350-353
        • Phillips C.
        • Fondevila M.
        • García-Magariños M.
        • Rodriguez A.
        • Salas A.
        • Carracedo Á.
        • Lareu M.V.
        Resolving relationship tests that show ambiguous STR results using autosomal SNPs as supplementary markers.
        Forensic Sci. Int. Genet. 2008; 2: 198-204
        • Kayser M.
        • Schneider P.M.
        DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations.
        Forensic Sci. Int. Genet. 2009; 3: 154-161
        • Kayser M.
        • de Knijff M.P.
        Improving human forensics through advances in genetics, genomics and molecular biology.
        Nat. Rev. Genet. 2011; 12: 179-192
        • Branicki W.
        • Liu F.
        • van Duijn K.
        • Draus-Barini J.
        • Pośpiech E.
        • Walsh S.
        • Kupiec T.
        • Wojas-Pelc A.
        • Kayser M.
        Model-based prediction of human hair color using DNA variants.
        Hum. Genet. 2011; 129: 443-454
        • Walsh S.
        • Liu F.
        • Ballantyne K.N.
        • van Oven M.
        • Lao O.
        • Kayser M.
        IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
        Forensic Sci. Int. Genet. 2011; 5: 170-180
        • Mengel-From J.
        • Børsting C.
        • Sanchez J.J.
        • Eiberg H.
        • Morling N.
        Human eye colour and HERC2, OCA2 and MATP.
        Forensic Sci. Int. Genet. 2010; 4: 323-328
        • Phillips C.
        • Salas A.
        • Sánchez J.J.
        • Fondevila M.
        • Gómez-Tato A.
        • Álvarez-Dios J.
        • Calaza M.
        • Casares de Cal M.
        • Ballard D.
        • Lareu M.V.
        • Carracedo Á.
        The SNPforID consortium, inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs.
        Forensic Sci. Int. Genet. 2007; 1: 273-280
        • Lao O.
        • Vallone P.M.
        • Coble M.D.
        • Diegoli T.M.
        • van Oven M.
        • van der Gaag K.J.
        • Pijpe J.
        • de Knijff P.
        • Kayser M.
        Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.
        Hum. Mutat. 2010; 31: E1875-E1893
        • Halder I.
        • Shriver M.
        • Thomas M.
        • Fernandez J.R.
        • Frudakis T.
        A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications.
        Hum. Mutat. 2008; 29: 648-658
        • Børsting C.
        • Tomas C.
        • Morling N.
        SNP typing of the reference materials SRM 2391b 1-10, K562, XY1, XX74, and 007 with the SNPforID multiplex.
        Forensic Sci. Int. Genet. 2011; 5: e81-e82
        • Kline M.C.
        • Hill C.R.
        • Almeida J.L.
        • Butts E.L.R.
        • Coble M.D.
        • Butler J.M.
        The latest and greatest NIST PCR-based DNA profiling standard: updates and status of Standard Reference Material® (SRM) 2391c.
        Profiles-in-DNA, Promega Corporation Web site: http://www.promega.com/resources/articles/profiles-in-dna/2011/the-latest-and-greatest-nist-pcr-based-dna-profiling-standard/

        • Phillips C.
        • Fang R.
        • Ballard D.
        • Fondevila M.
        • Harrison C.
        • Hyland F.
        • Musgrave-Brown E.
        • Proff C.
        • Ramos-Luis E.
        • Sobrino B.
        • Carracedo Á.
        • Furtado M.R.
        • Syndercombe Court D.
        • Schneider P.M.
        SNPforID consortium, evaluation of the Genplex SNP typing system and a 49plex forensic marker panel.
        Forensic Sci. Int. Genet. 2007; 1: 180-185
        • Tomas C.
        • Axler-DiPerte G.
        • Budimlija Z.M.
        • Børsting C.
        • Coble M.D.
        • Decker A.E.
        • Eisenberg A.
        • Fang R.
        • Fondevila M.
        • Fredslund S.F.
        • Gonzalez S.
        • Hansen A.J.
        • Hoff-Olsen P.
        • Haas C.
        • et al.
        Autosomal SNP typing of forensic samples with the GenPlex™ HID System: results of a collaborative study.
        Forensic Sci. Int. Genet. 2011; 5: 369-375
        • Amigo J.
        • Salas A.
        • Phillips C.
        ENGINES: exploring single nucleotide variation in entire human genomes.
        BMC Bioinformatics. 2011; 12: 105
        • Sánchez J.J.
        • Phillips C.
        • Børsting C.
        • Balogh K.
        • Bogus M.
        • Fondevila M.
        • Harrison C.D.
        • Musgrave-Brown E.
        • Salas A.
        • Syndercombe-Court D.
        • Schneider P.
        • Carracedo Á.
        • Morling N.
        A multiplex assay with 52 single nucleotide polymorphisms for human identification.
        Electrophoresis. 2006; 27: 1713-1724
        • Sánchez J.J.
        • Endicott P.
        Developing multiplexed SNP assays with special reference to degraded DNA templates.
        Nat. Protoc. 2006; 1: 1370-1378
        • Phillips C.
        • Lareu M.V.
        • Salas A.
        • Carracedo Á.
        Nonbinary single-nucleotide polymorphism markers.
        Int. Congr. Ser. 2004; 1261: 27-29
        • Westen A.A.
        • Matai A.S.
        • Laros J.F.J.
        • Meiland H.C.
        • Jasper M.
        • de Leeuw W.J.F.
        • de Knijff P.
        • Sijen T.
        Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples.
        Forensic Sci. Int. Genet. 2009; 3: 233-241
        • Pereira R.
        • Phillips C.
        • Pinto N.
        • Santos C.
        • Dos Santos S.E.
        • Amorim A.
        • Carracedo Á.
        • Gusmão L.
        Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing.
        PLoS One. 2012; 7: e29684
        • Rohlfs R.V.
        • Fullerton S.M.
        • Weir B.S.
        Familial identification: population structure and relationship distinguishability.
        PLoS Genet. 2012; 8: e1002469