
The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA

Open Access. Published: August 22, 2012. DOI: https://doi.org/10.1016/j.fsigen.2012.07.005

      Abstract

      Recently, the prediction of externally visible characteristics (EVCs) from DNA genotypes, also referred to as Forensic DNA Phenotyping (FDP), has started to become established in forensic biology; its ultimate aim is to focus police investigations on finding persons who are completely unknown to the investigating authorities. We previously developed and forensically validated the IrisPlex system for accurate prediction of blue and brown eye colour from DNA, and recently showed that all major hair colour categories are predictable from carefully selected DNA markers. Here, we introduce the newly developed HIrisPlex system, which is capable of simultaneously predicting both hair and eye colour from DNA. HIrisPlex consists of a single multiplex assay targeting 24 eye and hair colour predictive DNA variants, including all 6 IrisPlex SNPs, as well as two prediction models: a newly developed model for hair colour categories and shade, and the previously developed IrisPlex model for eye colour. The HIrisPlex assay was designed to cope with low amounts of template DNA as well as degraded DNA, and preliminary sensitivity testing revealed full DNA profiles down to 63 pg of input DNA. The power of the HIrisPlex system to predict hair colour was assessed in 1551 individuals from three different parts of Europe showing different hair colour frequencies. Using 80% of the individuals for model building and the remaining 20% for testing, the average individual-based prediction accuracies obtained with a prediction-guided approach were 69.5% for blond, 78.5% for brown, 80% for red and 87.5% for black hair colour. Results from HIrisPlex analysis of worldwide DNA samples imply that HIrisPlex hair colour prediction is reliable independent of bio-geographic ancestry (similar to previous IrisPlex findings for eye colour). 
We furthermore demonstrate that it is possible to infer, with a prediction accuracy of >86%, whether a brown-eyed, black-haired individual is of non-European (excluding regions near Europe) or European (including nearby regions) bio-geographic origin solely from the strength of the HIrisPlex eye and hair colour probabilities, which can provide extra intelligence for future forensic applications. The HIrisPlex system introduced here, including a single multiplex assay, an interactive tool and prediction guide, and recommendations for reporting final outcomes, represents the first tool for simultaneously establishing the categorical eye and hair colour of a person from DNA. The practical forensic application of the HIrisPlex system is expected to benefit cases where other avenues of investigation, including STR profiling, provide no leads as to who the unknown crime scene sample donor or the unknown missing person might be.


      1. Introduction

      Over the last few years, the prediction of externally visible characteristics (EVCs) from DNA has been an interesting topic of study for many reasons, in particular, its anticipated use within forensic genetics [
      Tully G., Genotype versus phenotype: human pigmentation; Kayser M., Schneider P.M., DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations; Kayser M., de Knijff P., Improving human forensics through advances in genetics, genomics and molecular biology
      ] resulting in the term Forensic DNA Phenotyping (FDP). The ability to predict the physical appearance of an individual directly from crime scene material can in principle help police investigations by narrowing down a large pool of potential suspects in cases involving perpetrators unknown to the investigating authorities. These include cases where conventional STR profiling did not yield a hit in the forensic DNA (profile) database or a match with a suspect singled out by police investigation, and cases where no STR profile could be generated owing to the low quality and/or quantity of the available DNA. Using EVC information obtained from the crime scene material via FDP, police could then conduct more focused enquiries and finally request standard forensic STR profiling only for the reduced number of EVC-matching suspects, with the aim of DNA individualisation for courtroom use. Obviously, the more EVCs that are predictable from crime scene material, the better a person's appearance can be described, and in turn the smaller the number of appearance-matching potential suspects for subsequent forensic STR profiling. Also, in missing person cases where a decomposed body is found with no EVC information discernible from visual inspection, or where only body parts that provide no EVC information (such as bones) are found, FDP is expected to provide leads for finding the right antemortem samples or family members for final STR-based identification.
      The use of DNA (or other biomarkers) for investigative purposes, termed ‘DNA intelligence’, rather than for identification purposes in the courtroom as currently practised in forensics, marks a completely new application of DNA in forensics and is still at an early stage of development. At present, only one FDP tool has been developmentally validated for forensic use: the IrisPlex system, which predicts eye colour from DNA [
      Walsh S., Liu F., Ballantyne K.N., van Oven M., Lao O., Kayser M., IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information; Walsh S., Lindenbergh A., Zuniga S.B., Sijen T., de Knijff P., Kayser M., Ballantyne K.N., Developmental validation of the IrisPlex system: determination of blue and brown iris colour for forensic intelligence
      ]. Although other studies have suggested DNA markers and methods for predicting externally visible traits, most notably eye colour [
      Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Genetic determinants of hair, eye and skin pigmentation in Europeans; Duffy D.L., Montgomery G.W., Chen W., Zhao Z.Z., Le L., James M.R., Hayward N.K., Martin N.G., Sturm R.A., A three single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation; Han J., Kraft P., Nan H., et al., A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation; Branicki W., Brudnik U., Kupiec T., Wolańska-Nowak P., Szczerbińska A., Wojas-Pelc A., Association of polymorphic sites in the OCA2 gene with eye colour using the tree scanning method; Sturm R.A., Larsson M., Genetics of human iris colour and patterns; Eiberg H., Troelsen J., Nielsen M., Mikkelsen A., Mengel-From J., Kjaer K., Hansen L., Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression; Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Two newly identified genetic determinants of pigmentation in Europeans; Kayser M., Liu F., Janssens A.C.J.W., et al., Three genome-wide association studies and a linkage analysis identify HERC2 as a human iris color gene; Mengel-From J., Wong T., Morling N., Rees J., Jackson I., Genetic determinants of hair and eye colours in the Scottish and Danish populations; Spichenok O., Budimlija Z.M., Mitchell A.A., Jenny A., Kovacevic L., Marjanovic D., Caragine T., Prinz M., Wurmbach E., Prediction of eye and skin color in diverse populations using seven SNPs; Valenzuela R.K., Henderson M.S., Walsh M.H., et al., Predicting phenotype from genotype: normal pigmentation; Pospiech E., Draus-Barini J., Kupiec T., Wojas-Pelc A., Branicki W., Gene–gene interactions contribute to eye colour variation in humans; Ruiz Y., Phillips C., Gomez-Tato A., et al., Further development of forensic eye color predictive tests, Forensic Sci. Int. Genet., http://dx.doi.org/10.1016/j.fsigen.2012.05.009
      ], none of them has as yet introduced a tool that has undergone systematic forensic developmental validation. The IrisPlex system allows the prediction of eye colour from minute amounts of DNA (full profiles from as little as 31 pg of input DNA) and has proven to be 94% accurate in predicting blue and brown eye colour when tested on a European set of >3800 individuals [
      Walsh S., Wollstein A., Liu F., et al., DNA-based eye colour prediction across Europe with the IrisPlex system
      ]. However, work is still ongoing to identify the underlying genes and to develop predictive DNA markers for several other EVCs [
      Kayser M., de Knijff P., Improving human forensics through advances in genetics, genomics and molecular biology
      ] such as skin colour [
      Han J., Kraft P., Nan H., et al., A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation; Lao O., De Gruijter J.M., Van Duijn K., Navarro A., Kayser M., Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms; Myles S., Somel M., Tang K., Kelso J., Stoneking M., Identifying genes underlying skin pigmentation differences among human populations
      ], hair colour [
      Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Genetic determinants of hair, eye and skin pigmentation in Europeans; Han J., Kraft P., Nan H., et al., A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation
      ], body height [
      Estrada K., Krawczak M., Schreiber S., et al., A genome-wide association study of northwestern Europeans involves the C-type natriuretic peptide signaling pathway in the etiology of human height variation; Lango Allen H., Estrada K., Lettre G., et al., Hundreds of variants clustered in genomic loci and biological pathways affect human height
      ], male baldness [
      Hillmer A.M., Brockschmidt F.F., Hanneken S., et al., Susceptibility variants for male-pattern baldness on chromosome 20p11
      ], and hair morphology [
      Medland S.E., Nyholt D.R., Painter J.N., et al., Common variants in the trichohyalin gene are associated with straight hair in Europeans; Fujimoto A., Kimura R., Ohashi J., et al., A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness
      ].
      The previous progress on categorical eye colour prediction from DNA, together with the strong genetic and phenotypic relationship between eye and hair colour variation and the improved understanding of the genetic basis of hair colour, suggests that hair colour may be the next most promising candidate EVC for DNA prediction after eye colour. Hair colour (as well as eye colour) is generally known to be highly variable in people of (at least partial) European descent and in those from nearby regions such as the Middle East and parts of Western Asia [
      Beals R.L., Hoijer H., An Introduction to Anthropology
      ], with individuals displaying numerous variations in hair colour shade that are usually summarised into four main colour categories: red, blond, brown and black. In contrast, people from all other parts of the world (without European or nearby genetic admixture) usually display the ancestral black hair colour phenotype (together with the ancestral brown eye colour). Variation in hair (and eye) colour is assumed to be of European origin and is thought to have reached its currently observed frequencies via sexual selection (i.e. mate choice preferences) [
      Frost P., European hair and eye color: a case of frequency-dependent sexual selection?
      ]. The genetic basis of human hair colour variation has been studied extensively in the last few years. Recent studies employing either the candidate gene approach or genome-wide association and/or linkage analysis have identified genes and DNA variants likely to be involved in human hair colour variation [
      Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Genetic determinants of hair, eye and skin pigmentation in Europeans; Duffy D.L., Montgomery G.W., Chen W., Zhao Z.Z., Le L., James M.R., Hayward N.K., Martin N.G., Sturm R.A., A three single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation; Han J., Kraft P., Nan H., et al., A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation; Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Two newly identified genetic determinants of pigmentation in Europeans; Mengel-From J., Wong T., Morling N., Rees J., Jackson I., Genetic determinants of hair and eye colours in the Scottish and Danish populations; Valverde P., Healy E., Jackson I., Rees J.L., Thody A.J., Variants of the melanocyte-stimulating hormone receptor gene are associated with red hair and fair skin in humans; Kanetsky P.A., Swoyer J., Panossian S., Holmes R., Guerry D., Rebbeck T.R., A polymorphism in the agouti signaling protein gene is associated with human pigmentation; Graf J., Hodgson R., van Daal A., Single nucleotide polymorphisms in the MATP gene are associated with normal human pigmentation variation; Branicki W., Brudnik U., Draus-Barini J., Kupiec T., Wojas-Pelc A., Association of the SLC45A2 gene with physiological human hair colour variation; Shekar S.N., Duffy D.L., Frudakis T., Sturm R.A., Zhao Z.Z., Montgomery G.W., Martin N.G., Linkage and association analysis of spectrophotometrically quantified hair color in Australian adolescents: the effect of OCA2 and HERC2
      ]. Some preliminary attempts have already been made to predict hair colour from informative DNA variants. In fact, an early red hair prediction protocol, based on a combination of non-synonymous single nucleotide polymorphisms (SNPs) in the MC1R gene that confer the red hair phenotype, was developed for forensic use more than ten years ago [
      Grimes E.A., Noake P.J., Dixon L., Urquhart A., Sequence polymorphism in the human melanocortin 1 receptor gene as an indicator of the red hair phenotype
      ], with an accuracy of 84% in predicting red-haired individuals. Sulem et al. [
      Sulem P., Gudbjartsson D.F., Stacey S.N., et al., Genetic determinants of hair, eye and skin pigmentation in Europeans
      ], in their genome-wide association study of European pigmentation traits, developed a hair colour prediction tool that, for many of their individuals, could exclude red hair and either blond or brown hair. More recently, Valenzuela et al. [
      Valenzuela R.K., Henderson M.S., Walsh M.H., et al., Predicting phenotype from genotype: normal pigmentation
      ] assessed 75 SNPs from 24 genes previously implicated in hair, skin and eye colour in samples of various bio-geographic origins (Europe and elsewhere) and found that three of them, i.e. rs12913832 (HERC2), rs16891982 (SLC45A2) and rs1426654 (SLC24A5) combined gave the best prediction for light and dark hair colour.
      Building on previous knowledge of hair colour-associated DNA variants, and considering the most up-to-date list of DNA variants related to human hair colour variation available at the time, we recently evaluated 46 SNPs from 13 genes [
      Branicki W., Liu F., van Duijn K., Draus-Barini J., Pośpiech E., Walsh S., Kupiec T., Wojas-Pelc A., Kayser M., Model-based prediction of human hair color using DNA variants
      ] for model-based, population-wise hair colour prediction, aiming to identify the set of most hair colour predictive DNA variants. In this previous study we identified a set of 13 DNA markers (2 MC1R combined marker sets and 11 single DNA markers) from 11 genes (MC1R, HERC2, OCA2, SLC45A2 (MATP), KITLG, EXOC2, TYR, SLC24A4, IRF4, PIGU/ASIP and TYRP1) containing most of the hair colour predictive information; because the two MC1R marker sets together comprise 11 MC1R variants, these 13 markers correspond to 22 individual DNA variants. This DNA marker set provided a high degree of population-based, prevalence-adjusted overall prediction accuracy, expressed as the area under the receiver operating characteristic curve (AUC), with estimates of 0.93 for red, 0.87 for black, 0.82 for brown, and 0.81 for blond hair colour, where 1 means completely accurate prediction. However, the genotyping methodology used in this previous screening study did not allow simultaneous genotyping of all 22 hair colour predictive DNA variants in a single reaction, as is desirable in forensic DNA analysis where the amount of starting material may be limited. Furthermore, the previous study only included samples with hair colour genotypes and phenotypes from a single Eastern European country, Poland, whereas including individuals from other European regions, such as Western and Southern Europe, would be beneficial for enriching the sample with individuals displaying hair colours such as brown and black that are more common in these parts of Europe.
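The AUC values above summarise how well predicted probabilities separate one hair colour from the rest. As a minimal sketch, the function below computes a plain one-vs-rest AUC (for example red vs non-red) using the rank-based Mann–Whitney interpretation: the probability that a randomly chosen individual with the colour receives a higher predicted probability than one without it. This is an illustration only; it is not the prevalence-adjusted estimator reported in the cited study, and the probabilities and labels in the example are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """One-vs-rest AUC computed as the Mann-Whitney probability.

    Returns the probability that a randomly chosen positive individual
    (label 1) receives a higher predicted probability than a randomly
    chosen negative individual (label 0); ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative individual")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted P(red) for six individuals; 1 = observed red hair.
p_red = [0.91, 0.75, 0.40, 0.35, 0.10, 0.05]
is_red = [1, 1, 0, 1, 0, 0]
print(round(auc_mann_whitney(p_red, is_red), 3))  # 0.889
```

An AUC of 1 means every individual with the colour outscores every individual without it, and 0.5 corresponds to random ranking. The pairwise loop is quadratic, which is fine for illustration; a sort-based rank computation scales better for large cohorts.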
      In the present study, we developed a single-tube multiplex assay targeting the 22 previously recognised hair colour predictive DNA variants as well as the six eye colour predictive SNPs of our previously developed IrisPlex system (four of which overlap), and evaluated its sensitivity. We employed SNaPshot technology because it can easily be implemented in forensic DNA laboratories, as it requires no additional equipment or major changes to existing protocols. Furthermore, we assessed the power of the 22 DNA variants to predict hair colour categories, as well as hair colour shade, via model-based prediction studies using an expanded database of hair colour genotype and phenotype data for >1500 individuals from Eastern, Western and Southern Europe displaying varying degrees of hair colouration. Moreover, by analysing a worldwide set of individuals from 51 populations (HGDP-CEPH), we investigated whether the reliability of hair colour prediction with these 22 DNA variants depends on knowledge of bio-geographic ancestry. We present and make available for future use the first system for parallel prediction of hair and eye colour from DNA, which we termed HIrisPlex. It consists of a single multiplex assay for 24 eye and/or hair colour predictive DNA variants and two prediction models, i.e. a newly developed model for hair colour and shade prediction and the previously developed IrisPlex model for eye colour prediction. An interactive spreadsheet tool for obtaining individual hair colour, hair colour shade, and eye colour prediction probabilities from HIrisPlex genotypes, as well as a prediction guide for accurate interpretation of individual hair colour and shade probabilities, are made available to enhance the practical use of the HIrisPlex system in future applications such as forensics.
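The prediction models described above turn a genotype profile into one probability per colour category. The sketch below shows how that can work, assuming a multinomial logistic regression with genotypes coded additively as minor-allele counts (0, 1 or 2) and black hair as the reference category. All intercepts and coefficients are hypothetical placeholders for three unnamed variants; they are not the published HIrisPlex parameters, and the real spreadsheet tool should be used for actual predictions.

```python
import math
from typing import Dict, List

# HYPOTHETICAL parameters for illustration only; these are NOT the published
# HIrisPlex coefficients. "black" is the reference category (logit fixed at 0).
INTERCEPTS: Dict[str, float] = {"blond": 0.5, "brown": 1.0, "red": -1.5}
BETAS: Dict[str, List[float]] = {
    "blond": [1.2, 0.0, 0.4],
    "brown": [0.5, 0.0, 0.2],
    "red": [0.3, 2.0, 0.1],
}
N_VARIANTS = 3


def predict_hair_colour(minor_allele_counts: List[int]) -> Dict[str, float]:
    """Return category probabilities from a multinomial logistic (softmax) model.

    Genotypes are assumed to be coded additively as minor-allele counts (0, 1 or 2),
    one entry per variant, in the same order as the coefficient lists.
    """
    if len(minor_allele_counts) != N_VARIANTS:
        raise ValueError(f"expected {N_VARIANTS} genotypes")
    if any(x not in (0, 1, 2) for x in minor_allele_counts):
        raise ValueError("genotypes must be minor-allele counts 0, 1 or 2")
    logits = {"black": 0.0}
    for category, beta in BETAS.items():
        logits[category] = INTERCEPTS[category] + sum(
            b * x for b, x in zip(beta, minor_allele_counts)
        )
    top = max(logits.values())  # subtract the max logit for numerical stability
    weights = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


probs = predict_hair_colour([0, 2, 0])
print(max(probs, key=probs.get))  # red
```

The probabilities always sum to 1; reporting the most probable category, or applying thresholds to these probabilities, is a separate interpretation step.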

      2. Materials and methods

      2.1 Subjects, imagery and hair and eye colour classification

      DNA samples and hair colour information were collected from 1551 European subjects living in Poland (n = 1093), the Republic of Ireland (n = 339) and Greece (n = 119). All participants gave informed consent. The study was approved in part by the Ethics Committee of the Jagiellonian University (number KBET/17/B/2005) and the Commission on Bioethics of the Regional Board of Medical Doctors in Krakow (number 48 KBL/OIL/2008). Hair and eye colour phenotypes were obtained by a combination of self-assessment and professional single-observer grading (Polish data). The professional grader (AKK) for the Polish dataset is a medical doctor (dermatologist) who evaluated hair colour by observation, and by questioning individuals whose hair was dyed or grey. For hair colour self-assessment in the Irish collection, individuals were asked to record in the questionnaire the colour of their hair during their 20s and the age at which grey/white hairs started to appear; this avoided the effects of hair greying and whitening on phenotyping. Sample collection in Ireland included high-resolution eye and hair photographic imagery. Briefly, hair and eye images were taken using a Nikon D3100 with an AF-S Micro Nikkor 60 mm macro lens, with aperture, shutter speed and ISO fixed at f/22, 1/125 s and 200, respectively. A ring flash (model Speedlight SB-R200) and an average distance of 7 cm from the eye, and from the back of the head for hair imagery, were used. This ensured consistent sampling and regulated lighting conditions, including lens settings of a 0.2 and 0.23 fixed focal length. All individuals were asked to fill in a questionnaire that included basic information, such as gender and age, as well as data concerning eye and hair pigmentation phenotype. However, because many Irish individuals had dyed or grey hair, self-reported hair colour classifications were used for this set in model training. 
For the Greek collection, a buccal swab was taken from each individual and a self-reported questionnaire on hair and eye colour was completed. For both the Irish and Greek sets, hair colour was classified into 7 categories: blond (5.9%), light-brown (34%), dark-brown (45.2%), auburn (5.7%), blond-red (1.3%), red (2.2%), and black (5.7%). For the Polish dataset, these data were collected as previously reported [
      Branicki W., Liu F., van Duijn K., Draus-Barini J., Pośpiech E., Walsh S., Kupiec T., Wojas-Pelc A., Kayser M., Model-based prediction of human hair color using DNA variants
      ] and hair colour was classified into 7 categories: blond (13.7%), dark-blond (44.2%), brown (22.6%), auburn (1%), blond-red (3.9%), red (3.8%), and black (10.8%). For hair colour prediction analyses, we grouped blond and dark-blond into one blond category (42.6%); light brown, dark brown and (Polish) brown into one brown category (39.3%); and auburn, blond-red and red into one red category (8.8%), with black as a fourth category (9.3%). Eye colour was classified into 3 categories: blue, brown and intermediate (including green). The term category in this context refers to the grouping of similar phenotypic colours into one group to separate them from other colour groups, e.g. the blond category or the black category. Table 1 displays the numbers of hair and eye colour phenotypes, including sex, within all 3 sampled populations. Notably, red hair in the Polish population and green eye colour in the Irish population were intentionally enriched because of their rarity; the frequencies of these phenotypes therefore do not reflect natural population frequencies.
      Table 1. Phenotype frequencies according to hair and eye colour categories (including sex) for the full combined set of individuals from Poland, Ireland and Greece.

| Population | Blond | Dark blond*/light brown | Dark brown | Brown red/auburn | Blond red | Red | Black | Total (hair) | Eye: blue | Eye: intermediate (green, heterochromia) | Eye: brown | Total (eye) | Male | Female | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Poland | 150 | 483* | 247 | 11 | 43 | 41 | 118 | 1093 | 590 | 164 | 339 | 1093 | 449 | 644 | 1093 |
| Ireland | 16 | 111 | 158 | 23 | 6 | 10 | 15 | 339 | 172 | 90 | 77 | 339 | 77 | 262 | 339 |
| Greece | 11 | 45 | 49 | 3 | 0 | 0 | 11 | 119 | 13 | 15 | 91 | 119 | 51 | 68 | 119 |
| Total | 177 | 639 | 454 | 37 | 49 | 51 | 144 | 1551 | 775 | 269 | 507 | 1551 | 577 | 974 | 1551 |

      * Individuals reported as dark blond in the dark blond/light brown category.
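The grouping of the seven questionnaire categories into the four prediction categories can be checked directly against Table 1. The sketch below transcribes the per-population counts from the table and collapses them with a label mapping; the label strings are illustrative names chosen for this example. Polish "dark-blond" and Irish/Greek "light-brown" share one table column but map to different main categories, and Polish "brown" is counted in the dark brown column.

```python
from collections import Counter
from typing import Dict

# Hair colour counts per collection, transcribed from Table 1.
COUNTS: Dict[str, Dict[str, int]] = {
    "Poland": {"blond": 150, "dark-blond": 483, "brown": 247, "auburn": 11,
               "blond-red": 43, "red": 41, "black": 118},
    "Ireland": {"blond": 16, "light-brown": 111, "dark-brown": 158, "auburn": 23,
                "blond-red": 6, "red": 10, "black": 15},
    "Greece": {"blond": 11, "light-brown": 45, "dark-brown": 49, "auburn": 3,
               "blond-red": 0, "red": 0, "black": 11},
}

# Collapse the seven questionnaire categories into the four prediction categories.
TO_MAIN: Dict[str, str] = {
    "blond": "blond", "dark-blond": "blond",
    "light-brown": "brown", "dark-brown": "brown", "brown": "brown",
    "auburn": "red", "blond-red": "red", "red": "red",
    "black": "black",
}


def grouped_counts(counts: Dict[str, Dict[str, int]]) -> Counter:
    """Sum fine-grained counts over all populations into the four main categories."""
    totals: Counter = Counter()
    for per_population in counts.values():
        for label, n in per_population.items():
            totals[TO_MAIN[label]] += n
    return totals


totals = grouped_counts(COUNTS)
n_total = sum(totals.values())
for category in ("blond", "brown", "red", "black"):
    share = 100 * totals[category] / n_total
    print(f"{category}: {totals[category]} ({share:.1f}%)")
# blond: 660 (42.6%)
# brown: 610 (39.3%)
# red: 137 (8.8%)
# black: 144 (9.3%)
```

The recomputed shares match the percentages given in the text for the combined set of 1551 individuals.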

      2.2 DNA samples and HIrisPlex genotyping

      DNA from the Polish samples was extracted as described previously [
      Branicki W., Liu F., van Duijn K., Draus-Barini J., Pośpiech E., Walsh S., Kupiec T., Wojas-Pelc A., Kayser M., Model-based prediction of human hair color using DNA variants
      ]. DNA was extracted from saliva samples collected in Ireland using the Puregene DNA isolation kit (Qiagen, Hilden, Germany) and from buccal swabs collected in Greece using an in-house organic extraction protocol. DNA from the H952 subset of the HGDP-CEPH panel that represents 952 individuals from 51 worldwide populations [
      Rosenberg N.A., Pritchard J.K., Weber J.L., Cann H.M., Kidd K.K., Zhivotovsky L.A., Feldman M.W., Genetic structure of human populations; Rosenberg N.A., Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives
      ] was purchased from CEPH. Because of insufficient DNA, 7 individuals of the H952 set could not be genotyped with the HIrisPlex assay, so the final number of worldwide samples was 945.
      All samples were genotyped using the HIrisPlex assay. The assay includes 23 SNPs and 1 insertion/deletion (INDEL) polymorphism, altogether 24 DNA variants, from 11 genes: MC1R, HERC2, OCA2, SLC24A4, SLC45A2, IRF4, EXOC2, TYRP1, TYR, KITLG, and PIGU/ASIP. Further information on these 24 markers, including primer sequences, can be found in Table 2. The 17 PCR primer pairs (four of which amplify MC1R) were designed using the default parameters of the program Primer3Plus [
      Untergasser A., Nijveen H., Rao X., Bisseling T., Geurts R., Leunissen J.A., Primer3Plus, an enhanced web interface to Primer3
      ], a free web-based primer design tool. PCR fragments were designed to be as short as possible to cater for degraded DNA, so all amplicons are shorter than 160 bp. To reduce the possibility of primer pairs interacting with each other, the program AutoDimer [
      Vallone P.M., Butler J.M., AutoDimer: a screening tool for primer–dimer and hairpin structures
      ] was used to analyse primer sequences. Surrounding sequence regions were also searched with BLAST [
      Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
      ] against dbSNP [
      Sherry S.T., Ward M.H., Kholodov M., Baker J., Phan L., Smigielski E.M., Sirotkin K., dbSNP: the NCBI database of genetic variation
      ] to minimise the chance that a primer binding site overlaps a known SNP that could interfere with efficient primer binding.
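Two of the design constraints described above lend themselves to simple automated checks. The sketch below (a) verifies the amplicon size constraint using the product sizes transcribed from Table 2, and (b) applies a deliberately crude 3'-end complementarity test of the kind a primer–dimer screen looks for. The 3' test is a simplified heuristic chosen for illustration, not the scoring algorithm implemented in AutoDimer, and the toy primers in the example are hypothetical.

```python
from typing import Dict

# PCR product sizes (bp) for the 17 primer sets, transcribed from Table 2.
PRODUCT_SIZES_BP: Dict[str, int] = {
    "Set1": 117, "Set2": 158, "Set3": 147, "Set4": 106, "Set5": 150,
    "Set6": 128, "Set7": 118, "Set8": 140, "Set9": 126, "Set10": 124,
    "Set11": 124, "Set12": 150, "Set13": 150, "Set14": 136, "Set15": 125,
    "Set16": 124, "Set17": 138,
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_dimer(primer_a: str, primer_b: str, k: int = 4) -> bool:
    """Crude check: could the last k bases of primer_a anneal to primer_b?

    A 3' end can pair with primer_b if primer_b contains the reverse
    complement of that 3' k-mer. Simplified heuristic, NOT AutoDimer's scoring.
    """
    tail = primer_a[-k:].upper()
    return reverse_complement(tail) in primer_b.upper()


# Design constraint from the text: every amplicon is shorter than 160 bp.
too_long = [s for s, bp in PRODUCT_SIZES_BP.items() if bp >= 160]
print(too_long)  # []
print(max(PRODUCT_SIZES_BP.values()))  # 158

# Toy primers (hypothetical): the 3' end GGCC of the first can pair with the second.
print(three_prime_dimer("AAAAGGCC", "TTGGCCTT"))  # True
```

In practice such a check would be run over all pairs of primers in the multiplex, including each primer against itself; dedicated tools additionally score mismatches, hairpins and binding stability.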
      Table 2. Information about the 24 DNA variants of the HIrisPlex assay, including PCR and single base extension (SBE) primer sequences and concentrations.

| Assay position | Variant | Chr | Position | Region | Gene | Major allele | Minor allele | PCR primers (name, set, concentration: sequence) | Product size | SBE primer | SBE primer concentration |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | N29insA | 16 | 89985753 | Exonic | MC1R | C | insA | MC1Rset1F, Set 1, 0.55 μM: GCAGGGATCCCAGAGAAGAC | 117 bp | CCCCAGCTGGGGCTGGCTGCCAA | 1.3 μM |
| 2 | rs11547464 | 16 | 89986091 | Exonic | MC1R | G | A | MC1Rset1R, 0.55 μM: TCAGAGATGGACACCTCCAG | | ttttttttttttGCCATCGCCGTGGACC | 0.1 μM |
| 3 | rs885479 | 16 | 89986154 | Exonic | MC1R | C | T | MC1Rset2F, Set 2, 0.5 μM: CTGGTGAGCTTGGTGGAGA | 158 bp | ttttttttttttttttttGATGGCCGCAACGGCT | 1.25 μM |
| 4 | rs1805008 | 16 | 89986144 | Exonic | MC1R | C | T | MC1Rset2R, 0.5 μM: TCCAGCAGGAGGATGACG | | tttttttttttttACAGCATCGTGACCCTGCCG | 0.375 μM |
| 5 | rs1805005 | 16 | 89985844 | Exonic | MC1R | G | T | MC1Rset3F, Set 3, 0.5 μM: GTCCAGCCTCTGCTTCCTG | 147 bp | tttttttttttttttTGGTGGAGAACGCGCTGGTG | 0.75 μM |
| 6 | rs1805006 | 16 | 89985918 | Exonic | MC1R | C | A | MC1Rset3R, 0.5 μM: AGCGTGCTGAAGACGACAC | | ttttttttttttttttttttCTGCCTGGCCTTGTCGGA | 0.75 μM |
| 7 | rs1805007 | 16 | 89986117 | Exonic | MC1R | C | T | MC1Rset4F, Set 4, 0.4 μM: CAAGAACTTCAACCTCTTTCTCG | 106 bp | tttttttttttttttttttttttttCTCCATCTTCTACGCACTG | 1 μM |
| 8 | rs1805009 | 16 | 89986546 | Exonic | MC1R | G | C | MC1Rset4R, 0.4 μM: CACCTCCTTGAGCGTCCTG | | ttttttttttttttttttttttttttttttATCTGCAATGCCATCATC | 0.4 μM |
| 9 | Y152OCH | 16 | 89986122 | Exonic | MC1R | C | A | | | ttttttttttttttttttttttttttttttCATCTTCTACGCACTGCGCTA | 0.6 μM |
| 10 | rs2228479 | 16 | 89985940 | Exonic | MC1R | G | A | | | ttttttttttttttttttttttttttttttttttttCTGGTGAGCGGGAGCAAC | 0.375 μM |
| 11 | rs1110400 | 16 | 89986130 | Exonic | MC1R | T | C | | | ttttttttttttttttttttttttttttttCTTCTACGCACTGCGCTACCACAGCA | 0.3 μM |
| 12 | rs28777 | 5 | 33994716 | Intronic | SLC45A2 | A | C | rs28777_F, Set 5, 0.4 μM: TACTCGTGTGGGAGTTCCAT; rs28777_R, 0.4 μM: TCTTTGATGTCCCCTTCGAT | 150 bp | tttttttttttttttttttttttttttttttttttttttCATGTGATCCTCACAGCAG | 1.2 μM |
| 13 | rs16891982 | 5 | 33987450 | Exonic | SLC45A2 | G | C | rs16891982_F, Set 6, 0.4 μM: TCCAAGTTGTGCTAGACCAGA; rs16891982_R, 0.4 μM: CGAAAGAGGAGTCGAGGTTG | 128 bp | ttttttttttttttttttttttttttttttttttttttttttttttAAACACGGAGTTGATGCA | 1 μM |
| 14 | rs12821256 | 12 | 87852466 | Intergenic | KITLG | A | G | rs12821256_F, Set 7, 0.4 μM: ATGCCCAAAGGATAAGGAAT; rs12821256_R, 0.4 μM: GGAGCCAAGGGCATGTTACT | 118 bp | tttttttttttttttttttttttttttttttttttttttGGAGCCAAGGGCATGTTACTACGGCAC | 0.1 μM |
| 15 | rs4959270 | 6 | 402748 | Intergenic | EXOC2 | C | A | rs4959270_F, Set 8, 0.4 μM: TGAGAAATCTACCCCCACGA; rs4959270_R, 0.4 μM: GTGTTCTTACCCCCTGTGGA | 140 bp | tttttttttttttttttttttttttttttttttttttttttGGAACACATCCAAACTATGACACTATG | 0.375 μM |
| 16 | rs12203592 | 6 | 341321 | Intronic | IRF4 | C | T | rs12203592_F, Set 9, 0.4 μM: AGGGCAGCTGATCTCTTCAG; rs12203592_R, 0.4 μM: GCTTCGTCATATGGCTAAACCT | 126 bp | tttttttttttttttttttttttttttttttttttttttttttttTCCACTTTGGTGGGTAAAAGAAGG | 0.3 μM |
| 17 | rs1042602 | 11 | 88551344 | Exonic | TYR | G | T | rs1042602_F, Set 10, 0.4 μM: CAACACCCATGTTTAACGACA; rs1042602_R, 0.4 μM: GCTTCATGGGCAAAATCAAT | 124 bp | ttttttttttttttttttttttttttttttttttttttttttttttttttttTCAATGTCTCTCCAGATTTCA | 1.25 μM |
| 18 | rs1800407 | 15 | 25903913 | Exonic | OCA2 | G | A | rs1800407_F, Set 11, 0.4 μM: AAGGCTGCCTCTGTTCTACG; rs1800407_R, 0.4 μM: CGATGAGACAGAGCATGATGA | 124 bp | tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttGCATACCGGCTCTCCC | 0.1 μM |
| 19 | rs2402130 | 14 | 91870956 | Intronic | SLC24A4 | A | G | rs2402130_F, Set 12, 0.4 μM: ACCTGTCTCACAGTGCTGCT; rs2402130_R, 0.4 μM: TTCACCTCGATGACGATGAT | 150 bp | ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttTGAACCATACGGAGCCCGTG | 0.75 μM |
| 20 | rs12913832 | 15 | 26039213 | Intronic | HERC2 | C | T | rs12913832_F, Set 13, 0.4 μM: TCAACATCAGGGTAAAAATCATGT; rs12913832_R, 0.4 μM: GGCCCCTGATGATGATAGC | 150 bp | ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttTAGCGTGCAGAACTTGACA | 1.2 μM |
| 21 | rs2378249 | 20 | 32681751 | Intronic | ASIP/PIGU | T | C | rs2378249_F, Set 14, 0.4 μM: CGCATAACCCATCCCTCTAA; rs2378249_R, 0.4 μM: CATTGCTTTTCAGCCCACAC | 136 bp | ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttCCACACCTCTCCTCAGCCCA | 0.18 μM |
| 22 | rs12896399 | 14 | 91843416 | Intergenic | SLC24A4 | T | G | rs12896399_F, Set 15, 0.4 μM: CTGGCGATCCAATTCTTTGT; rs12896399_R, 0.4 μM: GACCCTGTGTGAGACCCAGT | 125 bp | tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttTCTTTAGGTCAGTATATTTTGGG | 1.125 μM |
| 23 | rs1393350 | 11 | 88650694 | Intronic | TYR | C | T | rs1393350_F, Set 16, 0.4 μM: TTTCTTTATCCCCCTGATGC; rs1393350_R, 0.4 μM: GGGAAGGTGAATGATAACACG | 124 bp | tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttCATTTGTAAAAGACCACACAGATTT | 1.1 μM |
| 24 | rs683 | 9 | 12699305 | Exonic | TYRP1 | T | G | rs683_F, Set 17, 0.4 μM: CACAAAACCACCTGGTTGAA; rs683_R, 0.4 μM: TGAAAGGGTCTTCCCAGCTT | 138 bp | ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttGCTTTGAAAAGTATGCCTAGAACTTTAAT | 0.175 μM |
      For the population genotyping, 300 pg to 3 ng of genomic DNA (in 1 μl) per individual was amplified in a 10 μl reaction containing 1× PCR buffer, 2.5 mM MgCl2, 220 μM of each dNTP, 1.75 U AmpliTaq Gold DNA polymerase (Applied Biosystems Inc., Foster City, CA) and the PCR primer concentrations given in Table 2. Thermocycling was performed on the 96-well GeneAmp® PCR System 9700 (Applied Biosystems) under the following conditions: (1) 95 °C for 10 min, (2) 33 cycles of 95 °C for 30 s and 61 °C for 30 s, (3) 5 min at 61 °C. PCR products were cleaned with ExoSAP-IT (USB Corp., Cleveland, OH), as recommended by the manufacturer. Following removal of unincorporated dNTPs and primers, the multiplex single base extension (SBE) assay was performed using 2 μl of product with 1 μl of ABI SNaPshot kit (Applied Biosystems, Foster City, CA) reaction mix in a total reaction volume of 5 μl. The SBE primer sequences and concentrations used in the assay are given in Table 2. Thermocycling conditions were 96 °C for 2 min followed by 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 30 s. Products were cleaned using SAP (USB Corp.) following the manufacturer's guidelines, and 1 μl of cleaned product was run on the ABI 3130xl Genetic Analyser (Applied Biosystems) with POP-7 on a 36 cm capillary array following the SNaPshot kit sample preparation guidelines; however, an injection of 2.5 kV for 10 s and a run time of 500 s at 60 °C were used to increase sensitivity.
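      The reaction composition above can be turned into pipetting volumes with the dilution relation C1V1 = C2V2. The following Python sketch does this for one 10 μl reaction. The final concentrations are those stated in the text, but the stock strengths (10× buffer, 25 mM MgCl2, 2.5 mM of each dNTP, 5 U/μl polymerase) are assumptions for illustration only, and primer volumes are left in the remainder because they depend on the primer stocks used for Table 2.

```python
"""Per-reaction volumes for the 10 ul HIrisPlex PCR (illustrative sketch).

Final concentrations follow the text; every stock concentration below is an
ASSUMPTION chosen for illustration, not a value from the paper.
"""

REACTION_UL = 10.0  # total PCR volume
TEMPLATE_UL = 1.0   # genomic DNA is added in a 1 ul format


def volume_to_add(final_conc: float, stock_conc: float, total_ul: float = REACTION_UL) -> float:
    """Solve C1 * V1 = C2 * V2 for V1; C1 and C2 must share the same unit."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * total_ul / stock_conc


def enzyme_volume(units: float, units_per_ul: float) -> float:
    """Volume of polymerase stock that delivers the requested number of units."""
    if units_per_ul <= 0:
        raise ValueError("enzyme stock strength must be positive")
    return units / units_per_ul


# (component, final concentration, ASSUMED stock concentration), same unit per row
COMPONENTS = [
    ("PCR buffer (x)", 1.0, 10.0),      # assumed 10x buffer
    ("MgCl2 (mM)", 2.5, 25.0),          # assumed 25 mM stock
    ("each dNTP (uM)", 220.0, 2500.0),  # assumed 2.5 mM of each dNTP
]
ASSUMED_TAQ_U_PER_UL = 5.0  # assumed AmpliTaq Gold stock strength


def reaction_plan() -> dict[str, float]:
    """Volumes (ul) per component, plus what remains for primers and water."""
    plan = {name: volume_to_add(final, stock) for name, final, stock in COMPONENTS}
    plan["AmpliTaq Gold (1.75 U)"] = enzyme_volume(1.75, ASSUMED_TAQ_U_PER_UL)
    plan["template DNA"] = TEMPLATE_UL
    # The right-hand side is evaluated before the new key is added.
    plan["primers + water"] = REACTION_UL - sum(plan.values())
    return plan


if __name__ == "__main__":
    for name, ul in reaction_plan().items():
        print(f"{name:<24}{ul:6.2f} ul")
```

      Under these assumed stocks, about 5.8 μl of the reaction would remain for the PCR primer mix and water.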
      For the assay sensitivity study, genotypes of two different individuals were assessed from serial dilutions of input DNA at 500 pg, 250 pg, 125 pg, 63 pg and 31 pg. Each result was examined for allelic drop-out, which includes peaks below the 50 rfu threshold that cannot be called. Sensitivity was determined as the lowest DNA input level at which every replicate produced a full profile.
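      The full-profile criterion used for the sensitivity study can be expressed as a small check over a dilution series. In this Python sketch the data layout (marker, allele, peak height in rfu) and the function names are hypothetical; the 50 rfu calling threshold is from the text, and the scan assumes that once an input level fails, lower levels need not be considered.

```python
"""Full-profile criterion for a dilution series (illustrative sketch).

Peaks below 50 rfu cannot be called, and an input level counts as giving a
full profile only if every replicate yields every expected allele. The data
layout (marker -> allele -> peak height) is a hypothetical choice.
"""
from __future__ import annotations

RFU_THRESHOLD = 50.0


def called_alleles(peaks: dict[str, float], threshold: float = RFU_THRESHOLD) -> set[str]:
    """Alleles whose peak height reaches the calling threshold."""
    return {allele for allele, height in peaks.items() if height >= threshold}


def is_full_profile(replicate: dict[str, dict[str, float]], expected: dict[str, set[str]]) -> bool:
    """True if no expected allele has dropped out at any marker."""
    return all(
        alleles <= called_alleles(replicate.get(marker, {}))
        for marker, alleles in expected.items()
    )


def lowest_full_profile_input(series: dict[float, list[dict]], expected: dict[str, set[str]]) -> float | None:
    """Walk from the highest to the lowest input (pg) and return the last level at
    which all replicates are full profiles; stop at the first level that fails."""
    lowest = None
    for pg in sorted(series, reverse=True):
        if all(is_full_profile(rep, expected) for rep in series[pg]):
            lowest = pg
        else:
            break
    return lowest
```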

      2.3 HIrisPlex DNA variants and their use for eye/hair colour prediction including in a worldwide sample

      The HIrisPlex assay consists of 24 DNA variants (23 SNPs and 1 INDEL). Six of these markers, rs12913832 (HERC2), rs1800407 (OCA2), rs12896399 (SLC24A4), rs16891982 (SLC45A2 (MATP)), rs1393350 (TYR) and rs12203592 (IRF4), are taken from the well-established IrisPlex system [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ,
      • Walsh S.
      • Lindenbergh A.
      • Zuniga S.B.
      • Sijen T.
      • de Knijff P.
      • Kayser M.
      • Ballantyne K.N.
      Developmental validation of the IrisPlex system: determination of blue and brown iris colour for forensic intelligence.
      ,
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ,
      • Liu F.
      • van Duijn K.
      • Vingerling J.R.
      • Hofman A.
      • Uitterlinden A.G.
      • Janssens A.C.J.W.
      • Kayser M.
      Eye color and the prediction of complex phenotypes from genotypes.
      ] and are used for the eye colour prediction part of the HIrisPlex system. The genotypes of these 6 SNPs, entered into the HIrisPlex prediction tool as minor allele counts, are used to predict the individual's eye colour with the previously published IrisPlex model [
      • Walsh S.
      • Lindenbergh A.
      • Zuniga S.B.
      • Sijen T.
      • de Knijff P.
      • Kayser M.
      • Ballantyne K.N.
      Developmental validation of the IrisPlex system: determination of blue and brown iris colour for forensic intelligence.
      ,
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ], with the category of highest probability among brown, blue and intermediate taken as the predicted eye colour.
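      As an illustration of how a multinomial logistic regression model of this kind turns minor allele counts into category probabilities, the sketch below computes a softmax with one reference category and then takes the highest-probability category. The coefficients in the usage example are hypothetical placeholders, not the published IrisPlex alphas and betas, and treating brown as the reference category is an assumption made only for illustration.

```python
"""Multinomial logistic regression (MLR) prediction with a reference category.

Illustrative sketch: coefficients passed in are HYPOTHETICAL, not the
published IrisPlex alphas and betas. Inputs are minor allele counts (0, 1, 2).
"""
import math


def mlr_probabilities(x, alphas, betas, reference):
    """Return {category: probability}.

    For each non-reference category k: logit_k = alpha_k + sum_j beta_kj * x_j;
    the reference category has logit 0. Probabilities are a softmax over the
    logits, computed with the maximum subtracted for numerical stability.
    """
    logits = {reference: 0.0}
    for category, alpha in alphas.items():
        if len(betas[category]) != len(x):
            raise ValueError("one beta per DNA variant is required")
        logits[category] = alpha + sum(b * xi for b, xi in zip(betas[category], x))
    top = max(logits.values())
    weights = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def predicted_category(probs):
    """Highest-probability category, used as the final colour call."""
    return max(probs, key=probs.get)
```

      For example, with all betas set to zero and a hypothetical alpha of 2 for blue, the sketch returns blue with probability e²/(e² + 2), about 0.79.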
      The 22 DNA variants used for hair colour prediction are Y152OCH, N29insA, rs1805006, rs11547464, rs1805007, rs1805008, rs1805009, rs1805005, rs2228479, rs1110400 and rs885479 from the MC1R gene, rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2 (MATP)), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs2378249 (PIGU/ASIP), rs12913832 (HERC2), rs1800407 (OCA2), rs16891982 (SLC45A2 (MATP)) and rs12203592 (IRF4), based on our previous publication on hair colour prediction [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ]. Entered into the HIrisPlex prediction tool as minor allele counts, these variants are used to predict the individual's hair colour with the HIrisPlex hair colour prediction model developed in this paper. Of the four hair colour categories (blond, brown, red and black), the one with the highest probability indicates the predicted hair colour, following the guidelines presented in this paper and described in the next section.
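      Because both prediction models take minor allele counts as input, each genotype call has to be converted into a count of 0, 1 or 2 in a fixed marker order. The following Python sketch shows one way to do this. The marker names and minor allele assignments in the usage test are hypothetical placeholders, and an INDEL such as N29insA would need its two alleles written as single-character labels for this function to apply.

```python
"""Encode genotype calls as minor allele counts (illustrative sketch).

Marker names and minor allele assignments are supplied by the caller; the
real minor alleles must be taken from the assay documentation.
"""


def minor_allele_count(genotype: str, minor_allele: str) -> int:
    """Copies (0, 1 or 2) of the minor allele in a diploid call such as 'CT' or 'C/T'."""
    alleles = genotype.replace("/", "")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid call, got {genotype!r}")
    return alleles.count(minor_allele)


def encode_profile(calls: dict[str, str], minor_alleles: dict[str, str], marker_order: list[str]) -> list[int]:
    """Model input vector in a fixed marker order.

    A missing call is an error, because silently dropping a marker would shift
    every later position of the input vector.
    """
    missing = [m for m in marker_order if m not in calls]
    if missing:
        raise KeyError(f"no genotype call for: {', '.join(missing)}")
    return [minor_allele_count(calls[m], minor_alleles[m]) for m in marker_order]
```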
      For worldwide hair colour prediction, we assessed the HIrisPlex assay performance on 945 samples from 51 populations of the HGDP-CEPH set. The MapViewer 7 (Golden Software, Inc., Golden, CO, USA) package was used to plot the predicted hair colour categories and the distribution of SNP genotypes on the world map. A non-metric multidimensional scaling (MDS) plot was produced to illustrate the pairwise FST distances [
      • Weir B.S.
      • Cockerham C.C.
      Estimating F-statistics for the analysis of population structure.
      ] between populations for the 24 eye and hair colour DNA variants, using SPSS 17.0.2 for Windows (SPSS Inc., Chicago, USA). Analysis of molecular variance (AMOVA) (Excoffier 1992) was performed using Arlequin v3.11 [
      • Excoffier L.
      • Laval G.
      • Schneider S.
      Arlequin (version 3.0): an integrated software package for population genetics data analysis.
      ]. We also assessed prediction probability thresholds for each hair colour category, including a combined eye and hair colour prediction probability threshold for inferring a non-European individual with black hair and brown eyes. To assess age-dependent hair colour change, a Pearson correlation was calculated and the graph plotted using SPSS 17.0.2 for Windows (SPSS Inc., Chicago, USA).
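      The Pearson correlation coefficient computed in SPSS can also be written out directly. This Python sketch implements the standard formula; which variables are paired for the age analysis is not detailed in this section, so any inputs are hypothetical.

```python
"""Pearson correlation coefficient (illustrative sketch)."""
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```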

      2.4 Prediction modelling for hair colour

      To develop a hair colour prediction model from samples collected at sites in central, western and southern Europe, which differ in hair colour frequencies owing to their position within Europe, we took a random 80% subset of the samples from each site: Poland (n = 875), Ireland (n = 272) and Greece (n = 96). This subset was used to train a model based on multinomial logistic regression (MLR), as previously published by Liu et al. [
      • Liu F.
      • van Duijn K.
      • Vingerling J.R.
      • Hofman A.
      • Uitterlinden A.G.
      • Janssens A.C.J.W.
      • Kayser M.
      Eye color and the prediction of complex phenotypes from genotypes.
      ]. In brief, individuals were categorised according to their hair phenotype into 4 categories: blond (n = 529), brown (n = 490), red (n = 109) and black (n = 115). For their genotypes, 22 of the 24 HIrisPlex DNA variants (as described above) were used to test for hair colour differentiation and for use in the prediction model. Applying MLR to the minor allele counts of each DNA variant together with the phenotypes generates the alpha and beta values that form the core of the prediction model. This model then allows the probabilistic prediction of an individual's hair colour category based solely on entering the minor allele counts of the 22 variants into the HIrisPlex hair colour prediction tool. To capture the light and dark shades of hair colour, contributed by blond and black respectively, a similar approach was used that contrasted a light category (blond, n = 529) with a dark category (black, n = 115). Red-haired individuals (n = 109) were omitted from this analysis because their colour results from cumulative MC1R mutation effects rather than from the continuous spectrum of light to dark (i.e. blond to black). Brown-haired individuals (n = 490) were omitted because only the light and dark extremes were required. Using this two-model approach, a predicted hair colour is therefore generated together with an approximate indication of whether the colour is light or dark (e.g. light brown, dark brown), reflecting the influence of the genotypes commonly associated with the light (blond) and dark (black) categories. The remaining 20% of the combined dataset (total n = 308), i.e. from Poland (n = 218), Ireland (n = 67) and Greece (n = 23), was used to assess the accuracy of the prediction model, in terms of whether the final hair colour prediction was correct based on colour category, shade and the hair colour prediction guide described in detail in Section 3, and to assess optimal category thresholds. The steps for obtaining a prediction based on colour and shade are outlined in a guide provided below.
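      The per-site 80/20 split can be sketched as follows. The site totals are the sums of the training and testing counts reported above (Poland 1093, Ireland 339, Greece 119). Rounding the 80% share up is an assumption that happens to reproduce the reported counts; the seed and the shuffling method are arbitrary choices, not the authors' procedure.

```python
"""Per-site random 80/20 split (illustrative sketch).

Rounding the training share up is an ASSUMPTION; the paper does not state how
the split was rounded or seeded.
"""
import math
import random
from fractions import Fraction


def split_by_site(ids_by_site, train_share=Fraction(4, 5), seed=1):
    """Shuffle each site separately and return (train, test) dicts of ID lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for site, ids in ids_by_site.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        # Exact rational arithmetic avoids floating-point surprises in ceil().
        n_train = math.ceil(train_share * len(shuffled))
        train[site], test[site] = shuffled[:n_train], shuffled[n_train:]
    return train, test


# Totals = training + testing counts reported in the text.
SITE_TOTALS = {"Poland": 875 + 218, "Ireland": 272 + 67, "Greece": 96 + 23}
```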

      3. Results and discussion

      3.1 HIrisPlex genotyping assay – design and sensitivity

      The HIrisPlex assay was designed to cope with low-template and degraded DNA, a standard concern when genotyping forensic casework samples. Therefore, care was taken to keep PCR amplicons below 160 bp for all 24 DNA variants. During optimisation of the single multiplex assay, we aimed to keep homozygote and heterozygote peak heights as consistent as possible across the combined set, to limit the chance of heterozygote drop-out at low DNA concentrations. For the INDEL variant N29insA (the first peak in the assay, Fig. 1), however, the peak height is lower than for the 23 SNPs in the multiplex, on average by a factor of 2 depending on the sample DNA input. This reflects design difficulties known to occur with INDELs. Nevertheless, this does not affect the assay except at the very low DNA input levels (<63 pg) examined in the sensitivity tests. Notably, N29insA is extremely rare even among red-haired individuals: only 4 of the 137 individuals with a red hair phenotype in our dataset carried this mutation. Hence, in most cases this technical issue is unlikely to affect the practical use of the HIrisPlex assay. If allelic drop-out at N29insA is observed in a case, N29insA should be genotyped using the more sensitive singleplex assay to take full advantage of the red hair colour prediction available with the marker set considered here.
      Fig. 1. Sensitivity of the HIrisPlex assay assessed in two individuals ascertained to have high numbers of heterozygous alleles (7 and 11, respectively), at quantified DNA inputs of 500 pg, 125 pg, 63 pg and 31 pg. Full profiles were observed down to 63 pg DNA input, with drop-out at 31 pg DNA input for insertion 1 (N29insA), SNP 15 (rs4959270), SNP 17 (rs1042602), SNP 18 (rs1800407) and SNP 23 (rs1393350), and a C allele drop-in at SNP 9 (Y152OCH).
      Our population studies revealed that DNA inputs of >500 pg usually yield a balanced profile with high relative fluorescence unit (rfu) levels, especially for homozygous SNP alleles. For a first investigation of the sensitivity threshold of the HIrisPlex assay, two individuals were genotyped in a duplicate dilution series of DNA input at 500 pg, 250 pg, 125 pg, 63 pg and 31 pg, starting from 500 pg as established by DNA quantification with the Quantifiler Human DNA Quantification kit (Applied Biosystems). These individuals were chosen to maximise the number of heterozygous genotypes among the 24 DNA variants, which is important because heterozygous allele signals are weaker than homozygous allele signals at the same marker. Fig. 1 shows that peak height imbalance occurs at 500 pg and lower, and this should be taken into account when assessing genotype calls at these lower DNA levels; however, genotype accuracy is not affected until very low DNA input levels. Peak imbalance can sometimes be mistaken for a DNA mixture from different individuals, but in most circumstances HIrisPlex will be used after an STR profile has been generated from the crime scene material (and found not to be informative), so the presence of a DNA mixture should be evident from the STR profile. The sensitivity of the 24-plex HIrisPlex assay is high, with full profiles observed at DNA input levels down to and including 63 pg, while allele drop-out occurs for some HIrisPlex DNA variants at the lowest examined level of 31 pg DNA input (Fig. 1). In this set of profiles, drop-out was observed in 5 instances, at N29insA, rs1042602, rs4959270, rs1800407 and rs1393350, and one drop-in, of a C allele at Y152OCH, occurred at 31 pg input DNA.
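      When assessing genotype calls at low DNA input, heterozygote peak balance and possible drop-out can be screened with a simple rule. This Python sketch is meant for samples of known genotype, such as the dilution series described above; the 50 rfu calling threshold is from the text, while the minimum peak-height ratio of 0.5 is a hypothetical value chosen for illustration.

```python
"""Flag heterozygote imbalance and possible drop-out (illustrative sketch).

The 50 rfu calling threshold is from the text; the 0.5 minimum peak-height
ratio is a HYPOTHETICAL default.
"""

RFU_THRESHOLD = 50.0


def peak_height_ratio(height_a: float, height_b: float) -> float:
    """Smaller over larger peak height; 1.0 means perfectly balanced."""
    low, high = sorted((height_a, height_b))
    if high <= 0:
        raise ValueError("at least one peak must have a positive height")
    return low / high


def assess_heterozygote(height_a: float, height_b: float, min_ratio: float = 0.5) -> str:
    """Classify an expected heterozygote as 'ok', 'imbalanced' or 'possible drop-out'."""
    if min(height_a, height_b) < RFU_THRESHOLD:
        return "possible drop-out"  # one allele falls below the calling threshold
    if peak_height_ratio(height_a, height_b) < min_ratio:
        return "imbalanced"
    return "ok"
```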
      Overall, according to the preliminary assessment done here, the sensitivity of the HIrisPlex assay is comparable to that of other complex SNaPshot™ assays, such as an 18-plex designed by Freire-Aradas et al. [
      • Freire-Aradas A.
      • Fondevila M.
      • Kriegel A.K.
      • Phillips C.
      • Gill P.
      • Prieto L.
      • Schneider P.M.
      • Carracedo Å.
      • Lareu M.V.
      A new SNP assay for identification of highly degraded human DNA.
      ] for human individual identification from highly degraded DNA using autosomal SNPs. For that assay, full profiles were observed down to 78 pg/μl DNA input, with partial profiles down to 31 pg DNA input, similar to the HIrisPlex assay. These minimal input levels are lower than those reported for other autosomal SNP assays, such as the two multiplex assays together covering 44 SNPs for individual identification by Lou et al. [
      • Lou C.
      • Cong B.
      • Li S.
      • et al.
      A SNaPshot assay for genotyping 44 individual identification single nucleotide polymorphisms.
      ], for which a DNA input of at least 125 pg is needed to obtain a full profile. Notably, our previously developed IrisPlex assay, which includes the same 6 eye colour predictive SNPs as the HIrisPlex assay, gave full profiles down to about 31 pg input DNA [
      • Walsh S.
      • Lindenbergh A.
      • Zuniga S.B.
      • Sijen T.
      • de Knijff P.
      • Kayser M.
      • Ballantyne K.N.
      Developmental validation of the IrisPlex system: determination of blue and brown iris colour for forensic intelligence.
      ], which is slightly more sensitive than the HIrisPlex assay presented here. This is at least partly explained by the fourfold larger number of DNA variants in the HIrisPlex assay relative to the previously developed IrisPlex assay. For practical applications, this may mean that if complete locus drop-out at any of the 6 HIrisPlex eye colour SNPs indicates allelic drop-out due to low-quality or low-quantity input DNA, the more sensitive IrisPlex assay may subsequently be applied and may provide a full 6-SNP profile for eye colour prediction from critical DNA samples.

      3.2 HIrisPlex model-based hair colour prediction

      MC1R polymorphisms are largely recessive when considered individually, but also interact with each other through a genetic mechanism known as “compound heterozygosity” [
      • Flanagan N.
      • Healy E.
      • Ray A.
      • Philips S.
      • Todd C.
      • Jackson I.J.
      • Birch-Machin M.A.
      • Rees J.L.
      Pleiotropic effects of the melanocortin 1 receptor (MC1R) gene on human pigmentation.
      ,
      • Sturm R.A.
      • Duffy D.L.
      • Box N.F.
      • Newton R.A.
      • Shepherd A.G.
      • Chen W.
      • Marks L.H.
      • Leonard J.H.
      • Martin N.G.
      Genetic association and cellular function of MC1R variant alleles in human pigmentation.
      ,
      • Liu F.
      • Struchalin M.V.
      • van Duijn K.
      • Hofman A.
      • Uitterlinden A.G.
      • van Duijn C.
      • Aulchenko Y.S.
      • Kayser M.
      Detecting low frequent loss-of-function alleles in genome wide association studies with red hair color as example.
      ]. In our previous population-based hair colour prediction study [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ], the MC1R variants Y152OCH, N29insA, rs1805006, rs11547464, rs1805007, rs1805008, rs1805009, rs1805005, rs2228479, rs1110400 and rs885479 were collapsed into two markers, MC1R-R (R/R, R/wt, wt/wt) and MC1R-r (r/r, r/wt, wt/wt), according to the penetrance of the mutant alleles. Thus, the 22 hair colour DNA variants were treated as 13 markers in our previous prediction analysis, namely MC1R-R, MC1R-r, rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2 (MATP)), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs2378249 (PIGU/ASIP), rs12913832 (HERC2), rs1800407 (OCA2), rs16891982 (SLC45A2 (MATP)) and rs12203592 (IRF4). In the current study, we had two main reasons for developing a new hair colour prediction model that uses all 22 DNA variants without collapsing the MC1R variants into MC1R-R and MC1R-r. First, we were able to assemble a larger dataset that better represents Europe and its highly variable hair colour regions. Notably, we not only increased the sample size relative to our previous study [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ] threefold, but, in addition to including more Eastern Europeans from Poland (also used before), we added individuals from Western Europe (Ireland) and Southern Europe (Greece). These three countries display very different hair colour phenotype frequencies (Table 1), which also affects the modelling. Using samples from three European countries increases the overall sample size and better represents the hair colour phenotype variation across Europe, but it also increases the number of different genotype combinations that can be observed. Second, some of the MC1R variants also contribute to hair colours other than red [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ] (as seen in Table 3). Individuals from Ireland show a particularly high frequency of MC1R mutations: up to 75% were carriers in a previous study, with 30% of these carrying double mutations [
      • Smith R.
      • Healy E.
      • Siddiqui S.
      • et al.
      Melanocortin 1 receptor variants in an Irish population.
      ], and 78% of individuals in our own set carry at least one MC1R mutation without displaying the red hair phenotype. Europeans from other regions to which our hair colour prediction tool may be applied in the future may show the same pattern. Therefore, a new hair colour prediction model was developed to examine the contribution of each single DNA variant to categorical hair colour prediction, including the individual impact of each MC1R variant.
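      For contrast with the new model, which keeps every MC1R variant separate, the collapsing into MC1R-R and MC1R-r classes used previously can be sketched as below. The grouping is an assumption based on commonly cited high-penetrance (R) and low-penetrance (r) MC1R alleles and covers only some of the 11 variants; check it against the cited study before use. Capping the summed count at 2 treats compound heterozygotes like homozygotes, which assumes the two variants lie on different chromosomes, something unphased genotypes cannot show.

```python
"""Collapse MC1R variant genotypes into R and r marker classes (illustrative sketch).

The variant grouping is an ASSUMPTION for illustration and is incomplete.
"""

HIGH_PENETRANCE = {"rs1805007", "rs1805008", "rs1805009"}  # assumed R alleles
LOW_PENETRANCE = {"rs1805005", "rs2228479", "rs885479"}    # assumed r alleles


def collapse(counts: dict[str, int], group: set[str], label: str) -> str:
    """Sum minor allele counts over the group, cap at 2, and name the class."""
    n = min(2, sum(counts.get(variant, 0) for variant in group))
    return {0: "wt/wt", 1: f"{label}/wt", 2: f"{label}/{label}"}[n]
```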
      Table 3. Contribution of each HIrisPlex DNA variant to hair colour prediction within the model, in terms of beta and probability (p) values. The values reflect binary category assessments of colour prediction, i.e. blond versus non-blond, brown versus non-brown, etc. The lowest (and thus most statistically significant) p values for each category are highlighted for the associated DNA variants.
      Fig. 2 shows a hypothesised tree model illustrating how each of the 22 DNA variants contributes to a categorical hair colour prediction, as inferred from our current data. This scenario represents the extreme case of an input of 2 minor alleles for each single DNA variant, and shows the largest single hair colour category effect on the model's prediction for that input. However, it is important to note (as further outlined below) that it is the combination of all 22 DNA variants in a single model that ultimately allows hair colour prediction as proposed in this study.
      Fig. 2. Hypothesised scenario of the effect of each HIrisPlex SNP's minor allele input on the hair colour prediction model, as a homozygous genotype (minor allele input of 2). The highest effect in terms of probability for a certain hair colour category is noted, and the SNP is named near that category within the figure.
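      The hypothesised scenario of Fig. 2 can be mimicked with a one-variant-at-a-time calculation: set one variant to 2 minor alleles, keep the others at 0, and report the category whose probability rises most relative to the all-zero baseline. This is one plausible reading of the figure rather than the authors' exact procedure; the coefficients in the usage test are hypothetical, and the choice of brown as reference category is an assumption.

```python
"""One-variant-at-a-time scenario in an MLR model (illustrative sketch)."""
import math


def category_probs(x, alphas, betas, reference):
    """Softmax over logits, with the reference category fixed at logit 0."""
    logits = {reference: 0.0}
    for category, alpha in alphas.items():
        logits[category] = alpha + sum(b * xi for b, xi in zip(betas[category], x))
    top = max(logits.values())
    weights = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def largest_effect_per_variant(n_variants, alphas, betas, reference):
    """For each variant, the category with the largest probability gain over baseline."""
    baseline = category_probs([0] * n_variants, alphas, betas, reference)
    winners = []
    for j in range(n_variants):
        x = [0] * n_variants
        x[j] = 2  # homozygous for the minor allele
        probs = category_probs(x, alphas, betas, reference)
        winners.append(max(probs, key=lambda c: probs[c] - baseline[c]))
    return winners
```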
      Table 3 measures the strength of each DNA variant's contribution to each hair colour category prediction using the beta values and p-values obtained from the MLR model. For red hair, the analysis is based on the combined 80% model-building subset of 1243 Polish, Irish and Greek individuals assigned to a red versus non-red category, which shows each DNA variant's contribution to red hair colour within the model. For the other categories (i.e. blond versus non-blond, brown versus non-brown and black versus non-black), we used a total of 1134 individuals from the 80% model-building subset, omitting the red-haired individuals because of their rare DNA variants and because red hair is not a continuous colour but rather a combined effect of MC1R mutations on colour [
      • Flanagan N.
      • Healy E.
      • Ray A.
      • Philips S.
      • Todd C.
      • Jackson I.J.
      • Birch-Machin M.A.
      • Rees J.L.
      Pleiotropic effects of the melanocortin 1 receptor (MC1R) gene on human pigmentation.
      ,
      • Branicki W.
      • Brudnik U.
      • Kupiec T.
      • Wolañska-Nowak P.
      • Wojas-Pelc A.
      Determination of phenotype associated SNPs in the MC1R gene.
      ]. As the p values shown suggest, the results for hair colour variation from blond via brown to black (without red) are consistent with our previous findings [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ] for several DNA variants, namely rs12913832 (HERC2) and rs12203592 (IRF4), with strong statistical support (p from 10−6 to 10−16) in the present enlarged dataset from Poland, Ireland and Greece. Although less powerful, additional DNA variants also show significant evidence (p < 0.05) for some hair colours: rs2402130 (SLC24A4), rs12821256 (KITLG), rs4959270 (EXOC2), rs1805006 (MC1R), rs1805007 (MC1R) and rs1805008 (MC1R) for blond; rs1805006 (MC1R) and rs2402130 (SLC24A4) for brown; and rs1805007 (MC1R) for black. As expected, red hair shows the strongest statistical support (p from 10−8 to 10−16) for several of the individually considered MC1R variants, namely rs1805008, rs1805007, rs1805009 and rs11547464, and somewhat less statistical strength (p < 0.05) for other MC1R variants, namely rs1805005, rs1805006 and rs1110400. However, because the generally rare MC1R variant alleles at N29insA (INDEL) and Y152OCH are very infrequent in our set of individuals, their contributions to red hair probabilities are particularly high (Table 3; red hair beta values of −22 and −19.4, respectively), i.e. the presence of an A allele at N29insA or Y152OCH produces a red hair prediction probability of 1. This effect is not seen for the other MC1R variants investigated, and reflects that these very rare alleles (in heterozygous or homozygous state) occurred at very low frequency (n = 6) in our model training set, and only in individuals displaying a red hair phenotype. Although this does not affect the final prediction of red hair, it is important to note the abnormally high probability values for red when these rare variants are present. Notably, some DNA variants outside the MC1R gene also show significant associations with red hair (p < 0.05), namely rs12913832 (HERC2) and rs2378249 (PIGU/ASIP).
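      The betas and p values discussed above rest on two standard calculations: a two-sided Wald test, p = 2(1 - Φ(|β/SE|)), and the logistic transformation of a logit into a probability. The Python sketch below, using hypothetical inputs, shows both, and also why a per-allele beta of magnitude around 20 yields predicted probabilities indistinguishable from 0 or 1. Such extreme estimates are a known symptom of (quasi-)complete separation, which arises when a rare allele is observed in only one outcome category.

```python
"""Two-sided Wald p-value and the effect of a very large beta (illustrative sketch)."""
import math


def wald_p_value(beta: float, se: float) -> float:
    """p = 2 * (1 - Phi(|beta / se|)), written as erfc(|z| / sqrt(2))."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def logistic(logit: float) -> float:
    """Probability from a binary logit, evaluated without overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```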
      Fig. 3 shows the results of HIrisPlex prediction for a subset of 44 Irish individuals for whom high-resolution images of non-dyed hair were available, to illustrate the model's performance. The images of the individuals' natural hair colour were ordered according to the predicted hair colour category probabilities from HIrisPlex analysis; the actual hair colour phenotypes were not considered in the ordering. From left to right and top to bottom, the images are ordered from the highest to lowest HIrisPlex prediction probabilities for black hair, and then from the lowest to highest prediction probabilities for brown, red and blond hair, respectively. As is evident, there is a high correlation between the hair colour category predicted by HIrisPlex and the hair colour phenotype observed by visual inspection of these images.
      Fig. 3. Illustrative example of the performance of the HIrisPlex hair colour prediction model in 44 individuals. The 44 individuals were taken from the Irish collection, for which hair images were noted as neither grey nor dyed. These individuals are only a visual example of how the model performs. Probability values are given for all four hair colour categories (black, brown, red and blond), with the highest probability value and the called category highlighted. Dark and light probability values are also indicated, showing the black and blond contribution to the final colour prediction using the guide in Fig. 4. The individual hair images are ordered from left to right, top to bottom, starting with the highest to lowest probabilities for black in column one; in the remaining columns, the order is from the lowest to highest probabilities for brown, red and blond, respectively.
      Table 4 shows the accuracy of hair colour prediction in the 20% model-testing subset of the Polish, Irish, and Greek individuals (n = 308). It is important to emphasise here that these individuals were not used for model building. The highest probability category approach (as opposed to the prediction-guide approach explained in the next paragraph) considers the colour category with the highest predicted probability as the final predicted colour and does not take other categories into account for the final prediction. Using this approach, we tested various probability thresholds, from no threshold, to p > 0.7, which we previously recommended for eye colour prediction using the IrisPlex system [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ,
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]. As seen in Table 4, using the p > 0.7 (B) threshold increases the percentage of correct calls relative to the value obtained without using any threshold (A) for some hair colours such as red hair by ∼10% (i.e. from 89.5 without threshold to 100% with threshold), and for blond hair by ∼6.5% (i.e. from 57.2 to 63.6%), whereas no difference was seen for brown hair at 75%, and for black we saw a decrease by ∼8.5% (i.e. from 28.6 to 20%). The low prediction accuracy obtained with this approach for black hair may reflect the difficulty of defining the true black hair colour phenotype relative to the dark brown phenotype within this European dataset, where black hair is rare. Notably, the low correct call rate of 28.6% for black (without using a threshold) is mainly caused by 30 individuals with non-black self-reported phenotypes that were predicted as black by the HIrisPlex model. Of these, almost all (i.e. 90%) had the brown–dark brown phenotype. We could speculate that at least some of them may have been self-categorised as black if black hair colour would be more frequent in the sampled populations and therefore easier to differentiate from dark brown in the phenotyping procedure. Although red hair is also rare in the European population (albeit in our Polish dataset it was enriched for) this problem is less expected for red hair as red is usually well differentiable from other hair colours, perhaps with the exception of the blond-red individuals. The prediction accuracy for blond hair, being lower than those for red and brown hair colour with and without threshold, is partly due to another phenomenon that will be discussed in detail in Section 3.3; age-dependent hair colour changes. As brown hair is the intermediary stage between blond and black, no prediction threshold for this category is required as can be seen in Table 4. Even at the 75% correct call rate, the incorrect 5/8 defined themselves as being dark blond. 
Since we know an overlap exists between light-brown and dark-blond in people's perception and definition of colour, it is best to consider dark blond the same colour as light brown. Therefore, brown hair colour may also be seen at black and blond category predictions <0.7 p depending on their light and dark shade predictions and this is where the use of the prediction guide (see next paragraph) is more informative. For the red hair category, as its occurrence is independent of the continuous spectrum of dark to light (black to blond), and mutations in the MC1R gene produce a prediction within the category of red hair, all (with >0.7 p threshold) or nearly all (89.5% without threshold) individuals for which the red hair category was the highest prediction probability were correctly predicted as seen in Table 4. Notably, the two individuals that were incorrectly predicted red without using a threshold defined themselves as blond and brown, respectively; upon inspection of a hair image of the latter individual that was available to us, it did in fact display light red hints of colour. This reflects another example of how the phenotyping procedure, particularly self-reported hair colour grading as done in our Irish and Greek datasets, influences DNA prediction accuracy. However, it is important to point out here that for 11(39%) individuals that had defined themselves as having red hair, the red hair probability was not the highest, relative to probabilities for non-red hair colour, and these individuals were therefore missed out with HIrisPlex using this highest-probability approach. Furthermore, for 8 (6%) of the phenotypic blond, 96 (80%) of the phenotypic brown, and 17 (59%) of the phenotypic black hair individuals the highest predicted hair colour category did not correspond to the phenotypic hair colour category and hence these individuals were missed using this highest-probability approach. 
This illustrates the limitation of the highest-probability approach, which we aimed to overcome by developing and applying a prediction-guide approach, as discussed next.
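The highest-probability category approach (A) and its thresholded variant (B) can be sketched as a small decision rule. This is a minimal Python illustration, assuming the four category probabilities have already been obtained from the prediction tool; the function name and the example probabilities are hypothetical and are not taken from the HIrisPlex tool.

```python
from typing import Dict, Optional


def highest_probability_call(probs: Dict[str, float],
                             threshold: Optional[float] = None) -> Optional[str]:
    """Return the hair colour category with the highest probability.

    Approach A: threshold=None, the top category is always called.
    Approach B: threshold=0.7, no call is made (None) unless the top
    probability exceeds the threshold.
    """
    category, p = max(probs.items(), key=lambda item: item[1])
    if threshold is not None and p <= threshold:
        return None
    return category


# Hypothetical probabilities for one individual (they sum to 1).
example = {"blond": 0.15, "brown": 0.20, "red": 0.05, "black": 0.60}
print(highest_probability_call(example))                 # black
print(highest_probability_call(example, threshold=0.7))  # None (no call)
```

Individuals without a call under approach B drop out of the denominator of the correct call rate, which is one way a threshold can raise the percentage of correct calls while leaving more individuals unclassified.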
      Table 4. HIrisPlex hair colour prediction accuracies obtained from a separate model-testing set of 308 individuals from Poland, Ireland and Greece (these individuals were not used for prediction model building, for which a different set of 1243 individuals was used), using two approaches: the highest-probability category approach (with and without thresholds) and the prediction guide approach (see Fig. 4 for the prediction guide).
      To take full advantage of the genotype–phenotype relationship for hair colour and the 22 hair colour predictive DNA variants included in the HIrisPlex system, we developed a hair colour prediction guide that considers categorical hair colour probabilities in combination with light/dark hair colour shade probabilities, both obtained from the HIrisPlex genotype data (Fig. 4; see also Section 3.5 for additional practical recommendations). The reason for considering light/dark shade prediction in addition to categorical hair colour prediction is that the 22 DNA variants influence not only the main hair colour categories but also more detailed hair colour information, which is difficult to measure and which we therefore express as light/dark prediction probabilities. For this, we took the individuals from the black category, now termed dark, and the individuals from the blond category, now termed light, and built an additional prediction model for light and dark colour shade. The HIrisPlex genotype input therefore yields the core predicted colour category with an added level of shade, i.e. light or dark. This part of the prediction should be useful as information additional to the predicted category, e.g. to differentiate light blond from dark blond (light brown), or light brown from dark brown/black. It is particularly beneficial at lower category prediction probabilities (i.e. category prediction p < 0.7 for non-red colours), where the categories are closer together and, for some genotype combinations, one category may be difficult to predict accurately over another. A threshold of p > 0.9 is used for light versus dark shade prediction. As seen in Table 4 (C), with the prediction guide approach the correct call percentages were considerably higher for all hair colours than with the highest-probability category approach, except for red hair. 
In fact, using the prediction guide approach we obtained on average 69.5% correct calls for blond, 78.5% for brown and 87.5% for black hair. Black hair prediction in particular was strongly improved by the prediction guide approach, with an increase of almost 60 percentage points on average relative to the highest-probability category approach without a threshold. For an explanation of why blond is the least accurately predictable hair colour with currently available DNA markers, even after applying the prediction guide, see Section 3.3. Although we saw an apparent decrease in accurate prediction for red hair with the prediction guide approach (80% versus 89.5% with the highest-probability approach without threshold), this can be explained by the total number of red predictions made by each approach and whether they were correct. In particular, with the highest-probability approach the model incorrectly predicted red only 2 times but missed 11 actual reds in our dataset. The prediction guide approach, although inaccurate for red hair prediction in 6 individuals, correctly predicted 24 of the 28 actual red hair phenotypes in our test set. In summary, the individuals in our 308-person model-test set who were missed by HIrisPlex hair colour prediction using the prediction guide approach were 4 (14%) of the phenotypically red, 8 (19.5%) of the phenotypically blond, 7 (6%) of the phenotypically dark blond/light brown, 28 (31%) of the phenotypically dark brown and 26 (90%) of the phenotypically black individuals, with an overall hair colour prediction accuracy of 76%. All of these numbers are considerably lower than those missed with the highest-probability category approach, apart from black hair, where we believe that phenotyping inaccuracy and colour perception play a role, as discussed above: 21 of those individuals were predicted as having dark brown hair and may in fact have displayed dark brown hair that was perceived as black within Europe. 
We therefore recommend using the prediction guide approach for properly interpreting HIrisPlex genotype data and the probability values derived from our prediction tool to infer the most likely hair colour phenotype in future practical applications.
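The apparent drop in red hair accuracy with the prediction guide can be checked from the counts given above. The correct call rate in Table 4 is the share of red predictions that were phenotypically red, whereas the number missed is the share of phenotypically red individuals without a red prediction. The sketch below assumes only the counts stated in the text (28 phenotypically red individuals; 11 missed and 2 wrong red calls with the highest-probability approach; 24 found and 6 wrong red calls with the prediction guide); the function names are illustrative.

```python
def correct_call_rate(true_positives: int, false_positives: int) -> float:
    """Share of red predictions that were phenotypically red."""
    return true_positives / (true_positives + false_positives)


def missed_fraction(true_positives: int, phenotype_total: int) -> float:
    """Share of phenotypically red individuals without a red prediction."""
    return (phenotype_total - true_positives) / phenotype_total


RED_TOTAL = 28  # phenotypically red individuals in the 308-person test set

# Highest-probability approach: 11 reds missed, 2 wrong red calls.
tp_highest = RED_TOTAL - 11  # 17 correctly predicted reds
# Prediction guide approach: 24 reds found, 6 wrong red calls.
tp_guide = 24

for label, tp, fp in [("highest probability", tp_highest, 2),
                      ("prediction guide", tp_guide, 6)]:
    print(f"{label}: correct calls {correct_call_rate(tp, fp):.1%}, "
          f"missed {missed_fraction(tp, RED_TOTAL):.1%}")
# highest probability: correct calls 89.5%, missed 39.3%
# prediction guide: correct calls 80.0%, missed 14.3%
```

Both lines reproduce the 89.5% and 80% correct call rates and the 11 (39%) and 4 (14%) missed reds reported above: the guide accepts a few more false red calls in exchange for far fewer missed reds.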
      Fig. 4. HIrisPlex prediction guide on how to interpret individual hair colour and hair shade probabilities as derived from the HIrisPlex prediction tool available via . d-Brown stands for dark brown and l-brown stands for light brown.
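The light/dark shade step described above can be illustrated in a similar way. The sketch assumes a binary logistic model on minor-allele counts (0, 1 or 2), which is our assumption about the model form; the intercept, the coefficients and the SNP key in the example are hypothetical placeholders, not HIrisPlex parameters. Only the p > 0.9 shade threshold comes from the text, and the sketch does not reproduce the full decision logic of Fig. 4.

```python
import math
from typing import Dict, Optional


def shade_probability_light(minor_allele_counts: Dict[str, int],
                            intercept: float,
                            coefficients: Dict[str, float]) -> float:
    """P(light) from a binary logistic model on minor-allele counts.

    The intercept and coefficients are HYPOTHETICAL placeholders; the real
    parameters are embedded in the authors' prediction tool.
    """
    for snp, n in minor_allele_counts.items():
        if n not in (0, 1, 2):
            raise ValueError(f"{snp}: minor-allele count must be 0, 1 or 2")
    z = intercept + sum(beta * minor_allele_counts.get(snp, 0)
                        for snp, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def shade_call(p_light: float, threshold: float = 0.9) -> Optional[str]:
    """Call a shade only when its probability exceeds the threshold."""
    if p_light > threshold:
        return "light"
    if 1.0 - p_light > threshold:
        return "dark"
    return None  # shade left undetermined


# Hypothetical model with a single variant carried twice: z = 0.5 + 2 * 1.2.
p = shade_probability_light({"snp_a": 2}, intercept=0.5,
                            coefficients={"snp_a": 1.2})
print(shade_call(p))  # light
```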
      There are several important differences between eye and hair colour, at both the phenotypic and the genotypic level, that may explain why some eye colours (i.e. blue and brown) currently appear to be predictable from DNA with higher accuracy than some hair colours (i.e. all non-red hair colours). Rs12913832 in the HERC2 gene plays a major role in the functional aspects of iris pigmentation [
      • Sturm R.A.
      • Larsson M.
      Genetics of human iris colour and patterns.
      ,
      • Visser M.
      • Kayser M.
      • Palstra R.-J.
      HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter.
      ], and its proposed model of action resembles an on/off switch: the absence of the T allele (i.e. homozygosity for the C allele) results in blue eye colour, whereas the presence of one or two T alleles results in brown eye colour [
      • Sturm R.A.
      • Larsson M.
      Genetics of human iris colour and patterns.
      ]. Indeed, it has been shown recently [
      • Visser M.
      • Kayser M.
      • Palstra R.-J.
      HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter.
      ] via a series of functional genetic experiments that the rs12913832 T allele leads to the binding of several transcription factors and to the formation of a chromatin loop with the promoter of the neighbouring pigmentation gene OCA2, resulting in elevated OCA2 expression and dark pigmentation. In contrast, when the rs12913832 C allele is present, transcription factor binding, loop formation and OCA2 expression are all reduced, leading to light pigmentation. Because of its strong functional involvement, HERC2 rs12913832 has the strongest predictive power for categorical eye colour, with AUCs of 0.877 for blue and 0.899 for brown for this SNP alone [
      • Liu F.
      • van Duijn K.
      • Vingerling J.R.
      • Hofman A.
      • Uitterlinden A.G.
      • Janssens A.C.J.W.
      • Kayser M.
      Eye color and the prediction of complex phenotypes from genotypes.
      ], and it also has a strong impact on quantitative eye colour variation, explaining on average 46% of the variation in the H (hue) and S (saturation) components [
      • Liu F.
      • Wollstein A.
      • Hysi P.G.
      • et al.
      Digital quantification of human eye color highlights genetic association of three new loci.
      ]. By comparison, the proportion of residual hair colour variation from black to blond explained by HERC2 rs12913832 was 10.7% in a previous study [
      • Han J.
      • Kraft P.
      • Nan H.
      • et al.
      A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation.
      ]. The effect of rs12913832 is thus considerably smaller on hair colour than on eye colour, for reasons yet to be unveiled, and no other high-impact hair colour SNP takes its place. For instance, in our full dataset of 1551 individuals, the correlation of rs12913832 with eye colour is nearly twice as high (Pearson correlation r² = 0.46, p < 2.2e−16) as its correlation with hair colour (Pearson correlation r² = 0.24, p < 2.2e−16). Furthermore, the colour distribution of European hair appears much wider than that of European eyes, requiring the combination of several similar gene effects [
      • Sturm R.A.
      A golden age of human pigmentation genetics.
      ]. Consequently, categorical hair colour prediction is expected to be more error-prone, especially when it involves factors such as shade and intensity, at least with the DNA markers known thus far. Additional effects, such as environmental contributions and, in particular, changes over the lifetime, which are much stronger for certain hair colours than for any eye colour, also influence hair colour prediction accuracy more than eye colour prediction accuracy; these are discussed in the following section (Section 3.3).
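The Pearson r² values used in this comparison can be computed directly from additively coded genotypes. The sketch below assumes that genotypes are coded as the number of copies of one allele (0, 1 or 2) and that phenotypes are coded on an ordinal numeric scale; this coding and the toy data are illustrative assumptions, not the exact procedure or data behind the reported values.

```python
from typing import Sequence


def pearson_r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between two numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy ** 2 / (s_xx * s_yy)


# Toy data (hypothetical): allele counts and ordinal colour codes
# (1 = lightest ... 3 = darkest) for eight individuals.
genotypes = [0, 0, 0, 1, 1, 2, 2, 2]
colour_codes = [1, 1, 2, 2, 3, 3, 3, 3]
print(round(pearson_r_squared(genotypes, colour_codes), 2))

# Ratio of the reported values for eye versus hair colour.
print(round(0.46 / 0.24, 2))  # 1.92, i.e. "nearly twice as high"
```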

      3.3 Age-dependent hair colour changes and consequences for hair colour prediction

      Age-dependent changes in hair colour are evident from anecdotal knowledge. The most frequently observed age-dependent change is from light blond during childhood towards dark blond/light brown in adulthood, but changes can also occur from light brown to dark brown/almost black. Hormonal changes during adolescence have been suggested as a possible explanation [
      • Rees J.L.
      Genetics of hair and skin colour.
      ], but the molecular basis is yet to be unveiled. To study the effect of age-dependent hair colour change on hair colour prediction from childhood to adulthood, we recorded, via questionnaires in the Irish sample set, hair colour during childhood and adulthood separately, including the approximate age of the hair colour change. Of the 339 Irish individuals, 157 provided current images in which the hair was neither dyed nor grey; of these, the 8 individuals classified as blond in adulthood were all (100%) correctly predicted by the HIrisPlex system following the prediction guide approach. However, for 14 individuals with light brown to black phenotypes, the HIrisPlex model failed and gave a high blond prediction probability (p > 0.7) with a high light shade probability (p > 0.9). On further examination of these incorrectly predicted individuals, 8 (57%) of them reported that their hair had darkened from blond to brown in their younger years, at ages ranging from 9 to 12 years. Furthermore, we found a high and statistically significant correlation (Pearson correlation r² = 0.81, p < 0.01) between the increase in brown (darkening of hair) and the time elapsed since the hair colour change occurred for those Irish individuals for whom such data were available to us (Fig. 7), which substantiates that the hair colour change observed in these individuals is age dependent. From these data we can see that our current HIrisPlex system predicts hair colour with a high degree of accuracy, but that processes (possibly molecular) may alter hair colour over an individual's lifetime without changing the hair colour predicted by HIrisPlex for that individual. For instance, an adult who had blond hair as a young child but now displays light to dark brown/black hair is likely to carry blond HIrisPlex genotypes and will therefore receive a blond hair colour prediction. 
This is because the hair colour SNPs included in the HIrisPlex system, as well as all other hair colour-associated DNA variants available today, were identified in studies of adults, and not in studies that specifically searched for bio-markers informative for age-dependent hair colour change, which have yet to be carried out. It is therefore important to note that the HIrisPlex model cannot distinguish between these change-affected individuals and blonds who remain blond from childhood to adulthood, and thus a HIrisPlex prediction of blond hair may be inaccurate to a certain degree (30% in our dataset; Supplementary Table 3). This limitation of DNA-based hair colour prediction will remain as long as bio-markers indicative of age-dependent hair colour change are not identified. Furthermore, this age-dependent analysis was conducted using images from a small Irish set only (childhood hair colour was not available for the Polish and Greek sets); this may reflect a trend in other European countries, but we do not have this information at present. More samples and further accuracy testing of the HIrisPlex system on a broader collection from across Europe would therefore be advantageous to better measure this phenomenon. In addition, efforts should be directed towards identifying the processes and genes responsible for age-dependent hair colour changes and developing corresponding bio-markers that may increase hair colour prediction accuracy in the future.
      A different aspect of age-dependent hair colour change is the loss of hair colour when turning grey and white at a more or less advanced age, which likely represents a different mechanism of action [
      • Commo S.
      • Wakamatsu K.
      • Lozano I.
      • Panhard S.
      • Loussouarn G.
      • Bernard B.A.
      • Ito S.
      Age-dependent changes in eumelanin composition in hairs of various ethnic origins.
      ] than changing from one hair colour to another. We examined the Irish population of 339 individuals, for whom we had questionnaire information on the age at which grey or white hairs had started to grow. As shown in Supplementary Fig. 1, after the age of 30 more individuals have started to produce grey or white hairs than have not, confirming anecdotal knowledge. However, we have no data on how long it takes for individuals who have started to produce grey hairs to turn grey to an obvious phenotypic degree. In practical terms, knowing the natural youthful hair colour of an individual who now, at a more advanced age, displays an obvious grey or white hair phenotype will not be directly useful in an investigative search; this information can still be of some, albeit lesser, use when asking about the natural hair colour of such individuals prior to greying during a police inquiry. To determine whether a crime scene sample donor still has his/her natural hair colour or has perhaps already turned grey or white, a molecular age estimation performed on crime scene samples such as blood would be useful in combination with HIrisPlex. Our group previously developed a DNA test for chronological age that allows accurate age-group estimation [
      • Zubakov D.
      • Liu F.
      • van Zelm M.C.
      • et al.
      Estimating human age from T-cell DNA rearrangements.
      ]. Obviously, any dyed hair colour that differs from the natural hair colour category would not be identifiable with HIrisPlex or any other DNA-based hair colour prediction tool. However, it is generally believed that many people who dye their hair because it has greyed, with the intention of hiding this, try to achieve their natural hair colour category, especially men, to avoid the stigma associated with hair colouring. In such cases, HIrisPlex hair colour prediction can still be useful even though the hair is dyed.

      3.4 HIrisPlex analysis on a worldwide scale

      Because the HIrisPlex hair prediction model was created using individuals solely from Europe, as is appropriate for a European trait, we verified its use outside Europe by performing HIrisPlex analysis on worldwide DNA samples from the H952 subset of the HGDP-CEPH panel, which represents 952 individuals from 51 populations [
      • Rosenberg N.A.
      • Pritchard J.K.
      • Weber J.L.
      • Cann H.M.
      • Kidd K.K.
      • Zhivotovsky L.A.
      • Feldman M.W.
      Genetic structure of human populations.
      ,
      • Rosenberg N.A.
      Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives.
      ]. Owing to a lack of DNA in some samples, a final number of 945 worldwide samples was used. Fig. 5 displays the prediction of the four hair colour categories blond, brown, red and black on a worldwide scale. This figure does not use any threshold parameters, and it is therefore worth noting that blond hair predictions in Europe (especially those with probability values <0.7 p) may reflect more of a brown hair colour prediction when the probability values are inspected with the prediction guide in Fig. 4, which should be used in practice. Although the actual hair colour of the HGDP-CEPH individuals is not known, we follow the general knowledge that individuals from regions distant from Europe and its neighbouring regions (i.e. the Middle East and parts of West Asia) display a black hair colour phenotype, as illustrated by proposed figures of hair colour distribution [
      • Beals R.L.
      • Hoijer H.
      An Introduction to Anthropology.
      ] (an image depiction can be found at http://cogweb.ucla.edu/ep/Frost_06.html). As seen from Fig. 5, for every individual originating from regions distant from Europe and its neighbouring regions, namely East Asia, Oceania, Sub-Saharan Africa and the Americas, where only black hair is assumed to be present, HIrisPlex indeed predicts black hair, without exception. Only in Europe, Russia, Israel and parts of Pakistan, the regions covered by HGDP-CEPH samples where hair colour variation is assumed to be present, does HIrisPlex predict individuals with red, blond and brown as well as black hair colour. This mirrors our earlier findings using the IrisPlex system for worldwide eye colour prediction, where only brown eye colour was predicted in East Asia, Oceania, Sub-Saharan Africa and the Americas (with the single exception of an individual who fell below the 0.7 p threshold but still received a brown eye colour prediction), i.e. the regions of the world where only brown eyes are assumed to exist. Likewise, in Europe, Russia, Israel and parts of Pakistan, where eye colour variation is assumed, IrisPlex indeed predicted blue, intermediate and brown eye colour [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]. These results suggest that HIrisPlex hair and eye colour prediction is reliable on a worldwide scale and can be applied independently of bio-geographic ancestry knowledge, without the need for additional DNA ancestry testing, in practical applications such as forensics.
      Fig. 5. Worldwide depiction of the performance of the HIrisPlex system for hair colour prediction on the HGDP-CEPH H952 set of 945 individuals from 51 populations. No threshold is applied; the category with the highest categorical p value (not taking shade into account) is deemed the predicted colour. More individuals are predicted as blond within Europe when predictions are based solely on category p values; when the guide from Fig. 4 is used, these numbers tend to decrease, with a corresponding increase in brown/light brown predictions (data not shown). For worldwide eye colour prediction analysis using the IrisPlex system, see the corresponding figure of our previous IrisPlex paper
      [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ]
      (note that IrisPlex represents the eye colour part of HIrisPlex with the very same SNPs).
      Furthermore, we examined the potential of the 24 DNA variants included in the HIrisPlex system to infer bio-geographic ancestry. It had been suggested before that SNPs from pigmentation genes are useful for genetic ancestry inference [
      • Pulker H.
      • Lareu M.V.
      • Phillips C.
      • Carracedo A.
      Finding genes that underlie physical traits of forensic interest using genetic tools.
      ]. We previously showed that the 6 SNPs of the IrisPlex system were able to separate Europeans from non-Europeans to a certain degree at the population (but not necessarily the individual) level [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ]. Fig. 6 shows a two-dimensional plot from a non-metric multidimensional scaling (MDS) analysis of pairwise FST values estimated between all pairs of the 51 HGDP-CEPH populations using the 24 DNA variants of the HIrisPlex system (S-stress value 0.04030). As evident, the first dimension separates the European populations (except Sardinians and Adygei) from all non-European populations, with all Middle-Eastern populations and the Kalash from Pakistan lying closest to the Europeans. Hence almost all groups with predicted hair colour variation cluster closer to the European groups, whereas the East Asian and American groups cluster farthest from the Europeans. The second dimension separates African groups on one side and Oceanian groups on the other from all other worldwide groups, which appear in the centre. We then performed an AMOVA test to determine how much of the total genetic variation at these 24 eye and hair colour predictive DNA variants is explained by geography when assigning the 51 populations to seven continental groups: Europe, Middle East, Africa, Central South Asia, East Asia, Oceania and America. A remarkably high variance proportion of 24.44% was estimated from 1100 permutations, which was highly statistically significant (p < 0.000005). When separating the 51 populations into two groups, i.e. Europeans and non-Europeans, we obtained a very similar variance proportion of 24.76% (p < 0.000005) from 1100 permutations. Grouping the 945 individuals according to their predicted hair colour categories (black, brown, red and blond) resulted in an only slightly higher variance proportion of 29.79% (p < 0.000005), as expected for a European trait such as hair colour variation.
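To make the pairwise FST step concrete, the sketch below computes a simple multi-locus, Nei-style FST (ratio of summed heterozygosity differences) for biallelic loci from population allele frequencies. This estimator is our illustrative choice and not necessarily the one used for Fig. 6; the population names and allele frequencies in the usage example are hypothetical. A matrix of such pairwise values is the kind of input a non-metric MDS routine takes.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_fst(freqs_a: List[float], freqs_b: List[float]) -> float:
    """Multi-locus Nei-style F_ST between two populations (ratio of sums).

    freqs_a[i] and freqs_b[i] are frequencies of the same allele at locus i.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same loci")
    num = den = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Mean expected heterozygosity within the two populations.
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        # Expected heterozygosity of the pooled population.
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def fst_matrix(pop_freqs: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise F_ST for every unordered pair of populations."""
    return {(a, b): pairwise_fst(pop_freqs[a], pop_freqs[b])
            for a, b in combinations(sorted(pop_freqs), 2)}


# Hypothetical allele frequencies at three loci.
pops = {"pop_A": [0.10, 0.80, 0.30],
        "pop_B": [0.15, 0.75, 0.35],
        "pop_C": [0.90, 0.20, 0.60]}
for pair, value in fst_matrix(pops).items():
    print(pair, round(value, 3))
```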
      Fig. 6. Two-dimensional plot from a non-metric multidimensional scaling (MDS) analysis of pairwise FST distances between the 51 worldwide HGDP-CEPH populations using the 24 HIrisPlex eye and hair colour DNA variants. Colour coding is according to geographic regions, as provided in the legend. Populations lying between two geographic clusters are labelled by name.
      Motivated by this finding, we investigated a combined eye and hair prediction threshold to test whether it is possible, simply from the strength of the HIrisPlex eye and hair colour probabilities, to determine whether a brown-eyed, black-haired individual originates from Europe or from a region distant from Europe. If successful, this would provide information beyond the eye/hair colour prediction itself, as it may alleviate the potential need for ancestry testing to learn more about an unknown crime scene sample donor or missing person. Obviously, a prediction with sufficiently high probability of blue or intermediate eye colour, or of brown, blond or red hair colour, would already allow the conclusion that the person is of at least partial European descent. However, this is different for individuals predicted to be brown-eyed and black-haired, as this phenotype combination occurs worldwide. The results of this non-European threshold assessment can be seen in Supplementary Fig. 2, with the breakdown of population numbers shown in Supplementary Table 1. Our data demonstrate that it is indeed possible to predict that a brown-eyed, black-haired individual is likely to have non-European ancestry (excluding the nearby regions of the Middle East and partly North Asia and America) using a threshold of p > 0.7 for black hair and p > 0.99 for brown eyes; the respective prediction accuracy based on our dataset is 86.5% (see Supplementary Table 1 for precise numbers).
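The combined threshold rule described above amounts to a conjunction of the two probabilities. A minimal sketch, using only the thresholds stated in the text (p > 0.7 for black hair and p > 0.99 for brown eyes); the function name is hypothetical, and a positive result indicates likely non-European ancestry only in the sense, and with the accuracy, reported here.

```python
def likely_non_european(p_black_hair: float, p_brown_eyes: float,
                        hair_threshold: float = 0.7,
                        eye_threshold: float = 0.99) -> bool:
    """True if both black hair and brown eye probabilities exceed their thresholds.

    Per the text, this rule does not separate Europeans from individuals of
    nearby regions (Middle East, and partly North Asia and America).
    """
    for p in (p_black_hair, p_brown_eyes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_black_hair > hair_threshold and p_brown_eyes > eye_threshold


print(likely_non_european(0.85, 0.995))  # True
print(likely_non_european(0.85, 0.98))   # False: brown eye probability too low
```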
      We also investigated the worldwide allelic distribution of each of the 24 HIrisPlex DNA variants in the HGDP-CEPH samples, as shown in Supplementary Figs. 3–5 (except for rs12913832 (HERC2), rs1800407 (OCA2), rs12896399 (SLC24A4), rs16891982 (SLC45A2 (MATP)), rs1393350 (TYR) and rs12203592 (IRF4), which can be found in Fig. 4 of our previous publication on worldwide IrisPlex analysis [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ]). Notably, none of the DNA variants provides as strong a separation of Europeans versus non-Europeans as HERC2 SNP rs12913832 (see Fig. 4 in [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ]), which is the highest-ranked SNP in eye colour prediction [
      • Liu F.
      • van Duijn K.
      • Vingerling J.R.
      • Hofman A.
      • Uitterlinden A.G.
      • Janssens A.C.J.W.
      • Kayser M.
      Eye color and the prediction of complex phenotypes from genotypes.
      ] and also shows high probability values for hair colour prediction, except for red (Table 3). The MC1R variants displayed in Supplementary Fig. 2 (A) N29insA, (B) rs11547464, (D) rs1805008 and (F) rs1805006, and in Supplementary Fig. 3 (G) rs1805007, (H) rs1805009 and (I) Y152OCH, which are all “high penetrance” MC1R variants, as well as (K) rs1110400, a “low penetrance” MC1R variant, all have a distribution restricted to Europe and surrounding areas, as expected given their role in red hair, which is normally observed in individuals of European and nearby ancestry; they are also all quite rare, especially N29insA and Y152OCH. In contrast, the remaining MC1R variants included in HIrisPlex (rs885479, rs1805005, rs2228479) show a variable distribution within Europe and its proximate areas as well as outside these regions, which may relate to the very rare occurrence of red-haired individuals outside Europe and surrounding areas [
      • Harding R.M.
      • Healy E.
      • Ray A.J.
      • et al.
      Evidence for variable selective pressures at MC1R.
      ], or may indicate that their effect size is rather minor. Notably, both rs1805005 (Supplementary Fig. 2(E)) and rs2228479 (Supplementary Fig. 3(J)) were grouped into the MC1R_r low-penetrance group for red hair prediction in our previous publication [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ] and require a combination of MC1R alleles before the red hair phenotype is displayed, owing to their minor contributions to red hair, which would explain their distribution outside Europe. Rs885479 (Supplementary Fig. 2(C)) was also deemed a “low penetrance” SNP for red hair colour, but it seems to contribute to other hair colours as well, as seen from its effect in the prediction model (Fig. 2), where the largest effect of its minor allele was towards darker hair colour (brown–black) rather than towards red hair colour prediction. This SNP has also been noted to contribute to skin colour, especially in relation to the evolution of lighter skin colour in East Asians [
      • Yuasa I.
      • Umetsu K.
      • Harihara S.
      • et al.
      Distribution of two Asian-related coding SNPs in the MC1R and OCA2 genes.
      ], which mirrors its worldwide allelic distribution as shown here. Another HIrisPlex SNP with a peculiar worldwide allele distribution is rs28777 in the SLC45A2 (MATP) gene (Supplementary Fig. 3(L)), which shows a pattern of European (and surrounding areas) versus non-European differentiation, owing to its effect on hair colour, in particular AA (black) versus CC (red), but also to its assumed association with skin colour [
      • Han J.
      • Kraft P.
      • Nan H.
      • et al.
      A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation.
      ]. Notably, its distribution is similar to that of rs1042602 in the TYR gene (Supplementary Fig. 4(O)), which has previously been associated with normal hair colour variation and freckles [
      • Sulem P.
      • Gudbjartsson D.F.
      • Stacey S.N.
      • et al.
      Genetic determinants of hair, eye and skin pigmentation in Europeans.
      ]. Although it did not show significant association with hair colour in our previous paper [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ], it did however provide an independent effect on hair colour prediction [
      • Branicki W.
      • Liu F.
      • van Duijn K.
      • Draus-Barini J.
      • Pośpiech E.
      • Walsh S.
      • Kupiec T.
      • Wojas-Pelc A.
      • Kayser M.
      Model-based prediction of human hair color using DNA variants.
      ], which reflects the important role of this non-synonymous SNP in pigmentation. Rs683 (TYRP1) (Supplementary Fig. 4(R)) also shows a slight European versus non-European pattern in terms of its TT genotype, which is present at a higher frequency within Europe and its surrounding areas than outside, where the opposite GG genotype predominates. For the remaining SNPs in Supplementary Fig. 4, (M) rs12821256 (KITLG), (N) rs4959270 (EXOC2), (P) rs2402130 (SLC24A4) and (Q) rs2378249 (PIGU/ASIP), there is no discernible pattern of worldwide allelic distribution, although they are associated with hair colour in Europeans.
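The MC1R penetrance classes discussed in this section can be summarised as a simple allele tally. The sketch only encodes the classes as described here (with rs885479 placed in the low-penetrance class as stated, despite its additional effect towards darker hair); the tally function is hypothetical bookkeeping, and this section does not state whether the HIrisPlex model itself uses such group counts.

```python
from typing import Dict, Tuple

# Penetrance classes for red hair as described in the text.
MC1R_HIGH_PENETRANCE = frozenset({
    "N29insA", "rs11547464", "rs1805006", "rs1805007",
    "rs1805008", "rs1805009", "Y152OCH",
})
MC1R_LOW_PENETRANCE = frozenset({
    "rs1110400", "rs1805005", "rs2228479", "rs885479",
})


def mc1r_allele_tally(minor_allele_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (high-penetrance, low-penetrance) MC1R minor-allele totals.

    Variants outside MC1R (e.g. rs12913832) are ignored.
    """
    high = sum(minor_allele_counts.get(v, 0) for v in MC1R_HIGH_PENETRANCE)
    low = sum(minor_allele_counts.get(v, 0) for v in MC1R_LOW_PENETRANCE)
    return high, low


# Hypothetical individual: one rs1805007 and one rs1805008 allele,
# two rs885479 alleles, and two HERC2 rs12913832 alleles (ignored here).
print(mc1r_allele_tally({"rs1805007": 1, "rs1805008": 1,
                         "rs885479": 2, "rs12913832": 2}))  # (2, 2)
```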

      3.5 Considerations on the practical use of the HIrisPlex system for hair and eye colour prediction

      The HIrisPlex system is capable of simultaneously predicting the hair and eye colour of an individual from DNA. Practical recommendations for eye colour prediction using the HIrisPlex system follow those previously published for the IrisPlex system [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ], as the very same 6 SNPs and the very same eye colour prediction model used in IrisPlex are also used in the HIrisPlex system for eye colour. To allow easy use of the HIrisPlex system in practical applications, and to take full advantage of our eye and hair colour genotype and phenotype database and its relevant parameters for model-based prediction, we provide with the present paper the HIrisPlex hair and eye colour prediction tool (Supplementary Table 2). This tool is a combined Excel macro specifically designed to handle both the eye colour and the hair colour prediction models in an easy-to-use, interactive fashion. Users simply enter the number of minor alleles (0, 1 or 2) for each of the 24 DNA variants included in the HIrisPlex assay, and the tool returns probability values for black, brown, red and blond hair colour based on the underlying hair colour prediction model, separately the probabilities of light and dark hair colour shade, and separately the eye colour probabilities of blue, intermediate and brown based on the underlying eye colour prediction model. This tool replaces our previously provided [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ] Excel spreadsheet for eye colour prediction based on the IrisPlex system, as it combines eye and hair colour prediction with the respective underlying database knowledge in one tool. For the most accurate interpretation of the categorical hair colour and hair shade probabilities obtained from the Excel macro prediction tool (Supplementary Table 2), we recommend following the hair colour prediction guide shown in Fig. 4 and described above.
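The Excel macro itself is not reproduced here. As a rough illustration of what a model-based tool of this kind computes, the Python sketch below turns minor-allele counts (0, 1 or 2 per variant) into one probability per category. It assumes a multinomial logistic regression with one reference category; the function name `predict_categories`, the toy two-variant model and all coefficient values are hypothetical and are not the parameters shipped in Supplementary Table 2. A hair colour model would take 24 counts and four categories, with hair shade handled as a separate model.

```python
import math
from typing import Dict, List, Sequence


def predict_categories(
    minor_allele_counts: Sequence[int],
    intercepts: Dict[str, float],
    betas: Dict[str, List[float]],
    reference: str,
) -> Dict[str, float]:
    """Multinomial logistic regression: minor-allele counts -> category probabilities.

    Each non-reference category k gets a linear predictor
    eta_k = alpha_k + sum_i beta_ki * x_i; the reference category has eta = 0.
    The probabilities are the softmax of these predictors.
    """
    if any(x not in (0, 1, 2) for x in minor_allele_counts):
        raise ValueError("genotypes must be coded as 0, 1 or 2 minor alleles")
    etas = {reference: 0.0}
    for category, alpha in intercepts.items():
        coefs = betas[category]
        if len(coefs) != len(minor_allele_counts):
            raise ValueError(f"{category}: one coefficient per variant is required")
        etas[category] = alpha + sum(b * x for b, x in zip(coefs, minor_allele_counts))
    shift = max(etas.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - shift) for k, v in etas.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Toy two-variant eye colour model; all numbers are HYPOTHETICAL placeholders,
# not the published IrisPlex/HIrisPlex parameters.
toy_intercepts = {"blue": -1.0, "intermediate": -1.5}
toy_betas = {"blue": [1.8, 0.2], "intermediate": [0.9, 0.1]}
probs = predict_categories([2, 0], toy_intercepts, toy_betas, reference="brown")
print({k: round(p, 3) for k, p in probs.items()})
```

Because the probabilities come from a softmax, they always sum to one across the categories of a trait, which is why the tool can report a probability for every category at once.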
      As a working example of the tool, in addition to assessing the 308 individuals used for model testing based solely on the highest-probability category, we also assessed their hair colour following the prediction guide set out in this paper (Fig. 4), and their eye colour following the guide set out in the pan-European IrisPlex paper we published previously [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]. This reflects how DNA prediction of both pigmentation traits would be performed in practice: a final hair colour prediction is reported to the case officer (e.g. “the most probable hair colour is light blond”) together with the accuracy at which the HIrisPlex system currently predicts that hair colour category (at present, based on our 308-individual test set), while the eye colour prediction follows our previously published guidelines [
      • Sulem P.
      • Gudbjartsson D.F.
      • Stacey S.N.
      • et al.
      Genetic determinants of hair, eye and skin pigmentation in Europeans.
      ], e.g. “the most probable eye colour is brown” (if its probability exceeds 0.7), at an accuracy of 94% based on a European dataset of over 3800 individuals. In Fig. 8 we show four illustrative examples, including eye and hair colour phenotypes from high-resolution photographs, the categorical eye colour, hair colour and hair shade probabilities derived from HIrisPlex genotyping, and a summarising statement of the prediction outcomes as may be used for reporting purposes (these individuals were not used in modelling).
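The >0.7 rule mentioned above can be expressed as a single decision step applied to the model output. This is a minimal sketch, assuming that no call is made when the top probability does not exceed 0.7; the function name `threshold_call` is hypothetical, and the published reporting guidelines may word or handle such cases differently.

```python
from typing import Dict, Optional


def threshold_call(probs: Dict[str, float], threshold: float = 0.7) -> Optional[str]:
    """Report the most probable category only if its probability exceeds the threshold.

    Returns None (no call) when the top probability does not exceed the threshold.
    """
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else None


print(threshold_call({"blue": 0.05, "intermediate": 0.10, "brown": 0.85}))  # brown
print(threshold_call({"blue": 0.45, "intermediate": 0.25, "brown": 0.30}))  # None
```

The comparison is strict, so a top probability of exactly 0.7 yields no call, matching the ">0.7" wording.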
      Fig. 7. A Pearson correlation plot illustrating the age-dependent hair colour change (in years) versus the darkening of hair colour. Individuals are from the Irish set and display a dark blond/light brown to dark brown phenotype depending on the number of years since the change occurred; all were noted as having light blond hair as a child.
      Fig. 8. Four diverse examples of European individuals (A–D) illustrating the application of the HIrisPlex system, including final outcome summaries as might be used for reporting purposes. For each individual, high-resolution hair and eye images show their actual eye and hair colour phenotypes. The eye colour, hair colour and hair shade categories are shown with their respective probabilities as derived from HIrisPlex genotyping and input into the eye and hair colour prediction models, with the highest-probability category for each trait highlighted. For hair colour, the written statement underneath is completed using the hair colour prediction guide described in this study, including currently known prediction accuracies for a given hair colour based on the test set of 308 individuals from 3 European populations used in this study. For eye colour, the statement is completed by following the guidelines for eye colour prediction described previously
      [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]
      using threshold assessment accuracies previously produced from a test set of >3800 individuals from 7 European populations
      [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]
      . None of the four individuals shown here was used for model building or model testing of eye and hair colour prediction.
      Supplementary Table 3 provides the actual hair and eye colour phenotype of each test-set individual, as assessed by a single grader (Polish) or self-reported (Irish and Greek), together with the final prediction that the HIrisPlex system would produce for hair and eye colour. In this 308-individual test set, 60% of individuals received a correct prediction for hair and eye colour together (an individual was counted as incorrect if either the hair colour or the eye colour prediction was wrong). As expected, a larger number of individuals would be beneficial for testing the accuracy of the HIrisPlex hair colour prediction model, especially from European countries other than Poland, Ireland and Greece (the countries involved in modelling), to rule out any possible bias; this should be targeted in future studies. However, the relatively low percentage of correct combined eye and hair colour predictions in this test set is influenced not only by sample size but also by the different accuracies achieved for eye colour on the one hand and hair colour on the other. For instance, in only 7% of the test individuals (all with intermediate eye and brown to black hair colours) were both pigmentation traits, eye and hair colour, predicted incorrectly.
      When the accuracies in this test set are split by trait, hair colour alone was correctly predicted in 76% of individuals using the prediction guide approach. Although different prediction accuracies were obtained for different hair colours, as described above, most errors consisted of predicting a colour lighter than the observed phenotype, which can be attributed to the darkening of hair colour with age. Without biomarkers informative for age-dependent hair colour change, we believe it will not be possible to reduce the prediction error in such individuals substantially. Consequently, basic research into the molecular biology of age-dependent hair colour change is required to investigate whether such biomarkers can indeed be developed for future applications such as forensics.
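The per-trait accuracies and the combined figure are linked by inclusion-exclusion: the share with both traits correct equals one minus the share with hair wrong, minus the share with eye colour wrong, plus the share with both wrong. Plugging in the rounded values from the text (76% for hair colour, 76% for eye colour without a threshold as reported in the next paragraph, and 7% with both traits wrong) gives about 59%, consistent with the reported 60% within rounding. This check assumes the combined figure uses the no-threshold eye colour calls over the same 308 individuals; `joint_accuracy` is a hypothetical helper name.

```python
def joint_accuracy(acc_hair: float, acc_eye: float, both_wrong: float) -> float:
    """Fraction with both traits correct, by inclusion-exclusion:
    P(both correct) = 1 - P(hair wrong) - P(eye wrong) + P(both wrong).
    """
    return 1.0 - (1.0 - acc_hair) - (1.0 - acc_eye) + both_wrong


# Rounded figures reported for the 308-individual test set.
print(round(joint_accuracy(acc_hair=0.76, acc_eye=0.76, both_wrong=0.07), 2))  # 0.59
```

The calculation makes the argument above concrete: even with both traits individually near 76%, the errors fall mostly on different individuals, so the joint accuracy drops well below either per-trait value.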
      For eye colour alone, 76% of individuals in this set were correctly predicted without applying a threshold, or 82% when applying the >0.7 probability threshold we advocated before [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]. This overall estimate of eye colour prediction accuracy is strongly influenced by the intermediate category, which, with the currently available SNPs, is known to be by far the least accurately predictable eye colour category compared with blue and brown. In fact, most (59%) of the individuals in this test set with an inaccurate eye colour prediction belonged to the intermediate category, and only 14% of the intermediate-eyed individuals (n = 50) were predicted correctly. In contrast, even without applying the previously suggested 0.7 probability threshold, and omitting the phenotypically intermediate individuals, the HIrisPlex system gave an incorrect prediction in only 8% of individuals with phenotypic blue eye colour and in only 18% of individuals with phenotypic brown eye colour in this set. This corresponds to an accuracy of 88% (n = 258) for blue and brown eye colours alone in this test set, or 94% (n = 194) when applying the >0.7 threshold. Our previous IrisPlex study on >3800 individuals from seven countries in different parts of Europe also yielded an overall eye colour prediction accuracy of 94% for blue and brown using the >0.7 threshold [
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ]. This indicates that eye colour prediction accuracy when considering only blue and brown is much higher than when predicting all three categories (blue, brown and intermediate), mainly because DNA markers that strongly predict non-blue and non-brown eye colours are currently lacking and need to be established in future basic research.
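The trade-off described above, higher accuracy on fewer called individuals (94% of n = 194 versus 88% of n = 258 for blue and brown), can be reproduced with a small evaluation routine. This is a minimal sketch, assuming each record holds the predicted probabilities and the observed category; the function `evaluate` and the toy records are hypothetical and are not study data.

```python
import math
from typing import Dict, List, Tuple

# (predicted category probabilities, observed eye colour)
Record = Tuple[Dict[str, float], str]


def evaluate(records: List[Record], threshold: float = 0.0) -> Tuple[float, int]:
    """Accuracy among individuals whose top probability exceeds `threshold`,
    and the number of such individuals (the ones that receive a call)."""
    called = correct = 0
    for probs, observed in records:
        best = max(probs, key=probs.get)
        if probs[best] > threshold:
            called += 1
            if best == observed:
                correct += 1
    accuracy = correct / called if called else math.nan
    return accuracy, called


# HYPOTHETICAL toy records, not study data.
records = [
    ({"blue": 0.90, "intermediate": 0.07, "brown": 0.03}, "blue"),
    ({"blue": 0.60, "intermediate": 0.25, "brown": 0.15}, "brown"),
    ({"blue": 0.05, "intermediate": 0.15, "brown": 0.80}, "brown"),
    ({"blue": 0.45, "intermediate": 0.20, "brown": 0.35}, "blue"),
    ({"blue": 0.25, "intermediate": 0.35, "brown": 0.40}, "intermediate"),
]
# Restrict to phenotypically blue- and brown-eyed individuals, as in the text.
blue_brown = [r for r in records if r[1] in ("blue", "brown")]
print(evaluate(blue_brown))                 # (0.75, 4)
print(evaluate(blue_brown, threshold=0.7))  # (1.0, 2)
```

Raising the threshold removes the least confident calls, so accuracy is reported over a smaller denominator; both numbers are needed to describe performance fairly.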
      Some of the individuals categorised here as having intermediate eye colour in fact have green eyes, and some DNA variants, such as OCA2 rs1800407, have previously been suggested to be informative for green eye prediction [
      • Duffy D.L.
      • Montgomery G.W.
      • Chen W.
      • Zhao Z.Z.
      • Le L.
      • James M.R.
      • Hayward N.K.
      • Martin N.G.
      • Sturm R.A.
      A three single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation.
      ,
      • Pospiech E.
      • Draus-Barini J.
      • Kupiec T.
      • Wojas-Pelc A.
      • Branicki W.
      Gene–gene interactions contribute to eye colour variation in humans.
      ]. Recently, Pneuman et al. [
      • Pneuman A.
      • Budimlija Z.M.
      • Caragine T.
      • Prinz M.
      • Wurmbach E.
      Verification of eye and skin color predictors in various populations.
      ] stated that green eye colour can be predicted with a high degree of accuracy using specific genotype combinations, i.e. A/G at rs12913832 plus T/T at rs12203592, designated combo 1, or G/G at rs12913832 plus C/C at rs16891982, designated combo 2. We examined whether this approach could improve green eye prediction in our test set, where HIrisPlex correctly predicted only 5 of the 27 phenotypically green-eyed individuals as intermediate/green (19%). Following their guidelines, combo 1 identified only 2 of the 27 green-eyed individuals (8%), fewer than half of those correctly predicted by HIrisPlex, and wrongly predicted 3 blue-eyed individuals as green. Combo 2 did not occur within this set of 308 individuals; hence, none of the remaining 25 green-eyed individuals could be identified with it. Therefore, applying the approach of Pneuman et al. [
      • Pneuman A.
      • Budimlija Z.M.
      • Caragine T.
      • Prinz M.
      • Wurmbach E.
      Verification of eye and skin color predictors in various populations.
      ] to this test set did not improve green eye colour prediction. We believe, as we have argued before [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ,
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ], that intensified basic research into the genetics underlying green eye colour is needed before better markers for green eye prediction in practical applications such as forensics can be provided.
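The two genotype combinations attributed above to Pneuman et al. amount to a simple rule-based check that can be applied to each individual. This is a minimal sketch, assuming genotypes are stored as unordered "allele/allele" strings on the same strand as written in the text; the function names and the input format are hypothetical.

```python
from typing import Dict, Optional


def has_genotype(genotypes: Dict[str, str], rsid: str, a: str, b: str) -> bool:
    """True if the unordered genotype at `rsid` (e.g. 'G/A') equals {a, b}."""
    call = genotypes.get(rsid)
    if call is None:
        return False  # missing genotype: the combination cannot be confirmed
    return sorted(call.split("/")) == sorted([a, b])


def pneuman_green_combo(genotypes: Dict[str, str]) -> Optional[str]:
    """Return which genotype combination (if any) flags green eye colour."""
    if has_genotype(genotypes, "rs12913832", "A", "G") and has_genotype(genotypes, "rs12203592", "T", "T"):
        return "combo 1"
    if has_genotype(genotypes, "rs12913832", "G", "G") and has_genotype(genotypes, "rs16891982", "C", "C"):
        return "combo 2"
    return None


print(pneuman_green_combo({"rs12913832": "G/A", "rs12203592": "T/T"}))  # combo 1
print(pneuman_green_combo({"rs12913832": "G/G", "rs16891982": "G/C"}))  # None
```

Unlike the probabilistic HIrisPlex model, such a rule only fires for exact genotype matches, so individuals outside both combinations receive no green call at all, which is consistent with combo 2 identifying no one in a set where it does not occur.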

      3.6 Conclusions

      The HIrisPlex system introduced here is capable of simultaneously predicting hair and eye colour phenotypes from DNA using a single 24-plex assay and a combined eye and hair colour prediction tool. The HIrisPlex genotyping assay is highly sensitive, allowing successful genotyping down to at least 63 pg of input DNA, and copes successfully with degraded DNA owing to fragment sizes of <160 bp. An ongoing developmental validation study of the HIrisPlex assay will deliver additional characteristics relevant for forensic applications. The HIrisPlex hair colour prediction model and prediction guide yielded average individual-based hair colour prediction accuracies of 69.5% for blond, 78.5% for brown, 80% for red and 87.5% for black hair. The HIrisPlex system provides reliable hair colour prediction independent of bio-geographic ancestry, as we previously showed for eye colour prediction with the IrisPlex system, which forms the eye colour prediction part of the new HIrisPlex system. HIrisPlex hair and eye colour prediction in practical applications is eased by a user-friendly Excel spreadsheet that requires no more than the number of minor alleles of the 24 assay DNA variants as input. It produces individual probabilities for four hair colour categories (red, blond, brown and black) and two hair colour shades (light and dark); used together, and following the prediction guide approach we provide here, these allow a more specific hair colour estimate than the categorical approach alone. The spreadsheet also delivers probabilities for three eye colour categories (blue, intermediate and brown) based on the previously developed and validated IrisPlex model.
      As an additional element of investigative value, we demonstrate here that it is possible to infer bio-geographic ancestry at the level of European (including nearby regions) versus non-European (excluding nearby regions) origin from the strength of the HIrisPlex hair and eye colour probabilities for brown-eyed and black-haired individuals distributed worldwide (whereas non-brown eye colour and non-black hair colour per se indicate an origin in Europe, including nearby regions).
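The threshold combination used for this inference (black hair probability >0.7 together with brown eye colour probability >0.99; see Supplementary Fig. S2 and Supplementary Table S1) can be written as a two-condition check. This is a minimal sketch under that reading; the function name and the returned labels are hypothetical, and the second label means only that the combined threshold was not met.

```python
def ancestry_indication(p_black_hair: float, p_brown_eye: float,
                        hair_threshold: float = 0.7,
                        eye_threshold: float = 0.99) -> str:
    """Flag possible non-European origin for a predicted black-haired, brown-eyed person."""
    if p_black_hair > hair_threshold and p_brown_eye > eye_threshold:
        return "non-European origin indicated"
    return "no non-European indication"


print(ancestry_indication(0.85, 0.995))  # non-European origin indicated
print(ancestry_indication(0.85, 0.95))   # no non-European indication
```

Requiring both conditions mirrors the "in conjunction with" wording of Supplementary Table S1: a high black hair probability alone does not trigger the flag.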
      Current limitations of the HIrisPlex system lie in accurately predicting hair colour in individuals who underwent age-dependent changes that caused category shifts (such as blond to brown), because biomarkers indicating such a colour change are currently unavailable, and in accurately predicting intermediate eye colours such as green, because good DNA predictors for these non-blue and non-brown eye colours are currently unavailable. Basic research to find more appropriate biomarkers for these aspects is needed to overcome the current limitations of DNA-based eye and hair colour prediction. Furthermore, future research is needed on the biology and genetics of hair greying and on the development of informative biomarkers for its molecular prediction. Last but not least, and similar to our previous statement on eye colour [
      • Walsh S.
      • Liu F.
      • Ballantyne K.N.
      • van Oven M.
      • Lao O.
      • Kayser M.
      IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
      ,
      • Walsh S.
      • Wollstein A.
      • Liu F.
      • et al.
      DNA-based eye colour prediction across Europe with the IrisPlex system.
      ], we would like to emphasise that only moving DNA-based hair (and eye) colour prediction from the current categorical level to a future continuous level, aiming to accurately predict all shades of hair (and eye) colour including age-dependent changes at early and advanced ages, will provide the highest level of accuracy, as investigating authorities may wish for forensic applications. Notably, such a continuous prediction approach will also avoid the current uncertainties arising from differences in how receiving investigators interpret hair and eye colour categories, since it would allow them to be given actual hair and eye colour charts or printouts for tracing an unknown person instead of the simplified colour category that is possible at present.

      Acknowledgements

      We are very grateful to the study participants for providing samples including eye and hair images. We would also like to thank Professor Tommie McCarthy of University College Cork (UCC), Ireland for helping with sample collection. This work was funded in part by the Netherlands Forensic Institute (NFI) and by a grant from the Netherlands Genomics Initiative (NGI) / Netherlands Organization for Scientific Research (NWO) within the framework of the Forensic Genomics Consortium Netherlands (FGCN), and was furthermore supported in part by a grant from the Ministry of Science and Higher Education in Poland no ON301115136 to W.B.

      Appendix A. Supplementary data

      Supplementary Fig. S1. The age range at which individuals first showed signs of grey/white hair within the Irish population set (n = 339). Black shows the percentage within each group that did not show signs of hair greying/whitening; grey shows the percentage of individuals within the group that first displayed grey/white hair.
      Supplementary Fig. S2. Worldwide survey on how to infer European (including nearby regions) versus non-European (excluding nearby regions) bio-geographic ancestry of a brown-eyed, black-haired person based on the threshold combination of black hair prediction probability >0.7 and brown eye colour prediction probability >0.99, using the HGDP-CEPH H952 set.
      Supplementary Fig. S3. Worldwide distribution of the alleles of the insertion N29insA and the SNPs rs11547464, rs885479, rs1805008, rs1805005 and rs1805006, all from the MC1R gene, using the HGDP-CEPH H952 set.
      Supplementary Fig. S4. Worldwide distribution of the alleles of the SNPs rs1805007, rs1805009, Y152OCH, rs2228479 and rs1110400, all from the MC1R gene, and of rs28777 from the SLC45A2 (MATP) gene, using the HGDP-CEPH H952 set.
      Supplementary Fig. S5. Worldwide distribution of the alleles of the SNPs rs12821256 (KITLG), rs4959270 (EXOC2), rs1042602 (TYR), rs2402130 (SLC24A4), rs2378249 (PIGU/ASIP) and rs683 (TYRP1), using the HGDP-CEPH H952 set.
      • Supplementary Table S1

        Frequency of individuals called non-European in the 51 populations from the HGDP-CEPH H952 set when using black hair colour probabilities >0.7 on their own and in conjunction with brown eye colour probabilities >0.99. Includes, per population, the percentage ability to differentiate a black-haired, brown-eyed European from a non-European with black hair and brown eyes.

      • Supplementary Table S2

        Interactive HIrisPlex prediction tool for hair and eye colour: an easy-to-use Excel macro into which the minor allele counts derived from the HIrisPlex genotypes are entered. The tool outputs the individual probabilities of the four hair colour categories (black, brown, red and blond), the two hair colour shade categories (light and dark), and the three eye colour categories (blue, intermediate and brown) given the HIrisPlex genotype, based on a prediction model obtained from 1243 Polish, Irish and Greek individuals. For accurate interpretation of hair colour and shade prediction probabilities and to derive the final most likely individual hair colour category, see the prediction guide in Fig. 4. For accurate interpretation of eye colour prediction probabilities and to derive the final most likely individual eye colour category, see the recommendations described in Walsh et al. [

        • Walsh S.
        • Wollstein A.
        • Liu F.
        • et al.
        DNA-based eye colour prediction across Europe with the IrisPlex system.
        ].

      • Supplementary Table S3

        Prediction calls for the 308 test-set individuals, including HIrisPlex probabilities for hair colour categories (including hair shade) and the final hair colour prediction call (considering colour and shade based on the guide in Fig. 4), as well as eye colour prediction accuracies based on our recommendations described in Walsh et al. [

        • Walsh S.
        • Wollstein A.
        • Liu F.
        • et al.
        DNA-based eye colour prediction across Europe with the IrisPlex system.
        ].

      References

        • Tully G.
        Genotype versus phenotype: human pigmentation.
        Forensic Sci. Int. Genet. 2007; 1: 105-110
        • Kayser M.
        • Schneider P.M.
        DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations.
        Forensic Sci. Int. Genet. 2009; 3: 154-161
        • Kayser M.
        • de Knijff P.
        Improving human forensics through advances in genetics, genomics and molecular biology.
        Nat. Rev. Genet. 2011; 12: 179-192
        • Walsh S.
        • Liu F.
        • Ballantyne K.N.
        • van Oven M.
        • Lao O.
        • Kayser M.
        IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
        Forensic Sci. Int. Genet. 2011; 5: 170-180
        • Walsh S.
        • Lindenbergh A.
        • Zuniga S.B.
        • Sijen T.
        • de Knijff P.
        • Kayser M.
        • Ballantyne K.N.
        Developmental validation of the IrisPlex system: determination of blue and brown iris colour for forensic intelligence.
        Forensic Sci. Int. Genet. 2011; 5: 464-471
        • Sulem P.
        • Gudbjartsson D.F.
        • Stacey S.N.
        • et al.
        Genetic determinants of hair, eye and skin pigmentation in Europeans.
        Nat. Genet. 2007; 39: 1443-1452
        • Duffy D.L.
        • Montgomery G.W.
        • Chen W.
        • Zhao Z.Z.
        • Le L.
        • James M.R.
        • Hayward N.K.
        • Martin N.G.
        • Sturm R.A.
        A three single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation.
        Am. J. Hum. Genet. 2007; 80: 241-252
        • Han J.
        • Kraft P.
        • Nan H.
        • et al.
        A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation.
        PLoS Genet. 2008; 4: e1000074
        • Branicki W.
        • Brudnik U.
        • Kupiec T.
        • Wolańska-Nowak P.
        • Szczerbińska A.
        • Wojas-Pelc A.
        Association of polymorphic sites in the OCA2 gene with eye colour using the tree scanning method.
        Ann. Hum. Genet. 2008; 72: 184-192
        • Sturm R.A.
        • Larsson M.
        Genetics of human iris colour and patterns.
        Pigment Cell Melanoma Res. 2009; 22: 544-562
        • Eiberg H.
        • Troelsen J.
        • Nielsen M.
        • Mikkelsen A.
        • Mengel-From J.
        • Kjaer K.
        • Hansen L.
        Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression.
        Hum. Genet. 2008; 123: 177-187
        • Sulem P.
        • Gudbjartsson D.F.
        • Stacey S.N.
        • et al.
        Two newly identified genetic determinants of pigmentation in Europeans.
        Nat. Genet. 2008; 40: 835-837
        • Kayser M.
        • Liu F.
        • Janssens A.C.J.W.
        • et al.
        Three genome-wide association studies and a linkage analysis identify HERC2 as a human iris color gene.
        Am. J. Hum. Genet. 2008; 82: 801
        • Mengel-From J.
        • Wong T.
        • Morling N.
        • Rees J.
        • Jackson I.
        Genetic determinants of hair and eye colours in the Scottish and Danish populations.
        BMC Genet. 2009; 10: 88
        • Spichenok O.
        • Budimlija Z.M.
        • Mitchell A.A.
        • Jenny A.
        • Kovacevic L.
        • Marjanovic D.
        • Caragine T.
        • Prinz M.
        • Wurmbach E.
        Prediction of eye and skin color in diverse populations using seven SNPs.
        Forensic Sci. Int. Genet. 2011; 5: 472-478
        • Valenzuela R.K.
        • Henderson M.S.
        • Walsh M.H.
        • et al.
        Predicting phenotype from genotype: normal pigmentation.
        J. Forensic Sci. 2010; 55: 315-322
        • Pospiech E.
        • Draus-Barini J.
        • Kupiec T.
        • Wojas-Pelc A.
        • Branicki W.
        Gene–gene interactions contribute to eye colour variation in humans.
        J. Hum. Genet. 2011; 56: 447-455
        • Ruiz Y.
        • Phillips C.
        • Gomez-Tato A.
        • et al.
        Further development of forensic eye color predictive tests.
        Forensic Sci. Int. Genet. http://dx.doi.org/10.1016/j.fsigen.2012.05.009

        • Walsh S.
        • Wollstein A.
        • Liu F.
        • et al.
        DNA-based eye colour prediction across Europe with the IrisPlex system.
        Forensic Sci. Int. Genet. 2012; 6: 330-340
        • Lao O.
        • De Gruijter J.M.
        • Van Duijn K.
        • Navarro A.
        • Kayser M.
        Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms.
        Ann. Hum. Genet. 2007; 71: 354-369
        • Myles S.
        • Somel M.
        • Tang K.
        • Kelso J.
        • Stoneking M.
        Identifying genes underlying skin pigmentation differences among human populations.
        Hum. Genet. 2007; 120: 613-621
        • Estrada K.
        • Krawczak M.
        • Schreiber S.
        • et al.
        A genome-wide association study of northwestern Europeans involves the C-type natriuretic peptide signaling pathway in the etiology of human height variation.
        Hum. Mol. Genet. 2009; 18: 3516-3524
        • Lango Allen H.
        • Estrada K.
        • Lettre G.
        • et al.
        Hundreds of variants clustered in genomic loci and biological pathways affect human height.
        Nature. 2010; 467: 832-838
        • Hillmer A.M.
        • Brockschmidt F.F.
        • Hanneken S.
        • et al.
        Susceptibility variants for male-pattern baldness on chromosome 20p11.
        Nat. Genet. 2008; 40: 1279-1281
        • Medland S.E.
        • Nyholt D.R.
        • Painter J.N.
        • et al.
        Common variants in the trichohyalin gene are associated with straight hair in Europeans.
        Am. J. Hum. Genet. 2009; 85: 750-755
        • Fujimoto A.
        • Kimura R.
        • Ohashi J.
        • et al.
        A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness.
        Hum. Mol. Genet. 2008; 17: 835-843
        • Beals R.L.
        • Hoijer H.
        An Introduction to Anthropology.
        MacMillan, New York, 1965
        • Frost P.
        European hair and eye color: a case of frequency-dependent sexual selection?.
        Evol. Hum. Behav. 2006; 27: 85-103
        • Valverde P.
        • Healy E.
        • Jackson I.
        • Rees J.L.
        • Thody A.J.
        Variants of the melanocyte-stimulating hormone receptor gene are associated with red hair and fair skin in humans.
        Nat. Genet. 1995; 11: 328-330
        • Kanetsky P.A.
        • Swoyer J.
        • Panossian S.
        • Holmes R.
        • Guerry D.
        • Rebbeck T.R.
        A polymorphism in the agouti signaling protein gene is associated with human pigmentation.
        Am. J. Hum. Genet. 2002; 70: 770-775
        • Graf J.
        • Hodgson R.
        • van Daal A.
        Single nucleotide polymorphisms in the MATP gene are associated with normal human pigmentation variation.
        Hum. Mutat. 2005; 25: 278-284
        • Branicki W.
        • Brudnik U.
        • Draus-Barini J.
        • Kupiec T.
        • Wojas-Pelc A.
        Association of the SLC45A2 gene with physiological human hair colour variation.
        J. Hum. Genet. 2008; 53: 966-971
        • Shekar S.N.
        • Duffy D.L.
        • Frudakis T.
        • Sturm R.A.
        • Zhao Z.Z.
        • Montgomery G.W.
        • Martin N.G.
        Linkage and association analysis of spectrophotometrically quantified hair color in Australian adolescents: the effect of OCA2 and HERC2.
        J. Invest. Dermatol. 2008; 128: 2807-2814
        • Grimes E.A.
        • Noake P.J.
        • Dixon L.
        • Urquhart A.
        Sequence polymorphism in the human melanocortin 1 receptor gene as an indicator of the red hair phenotype.
        Forensic Sci. Int. 2001; 122: 124-129
        • Branicki W.
        • Liu F.
        • van Duijn K.
        • Draus-Barini J.
        • Pośpiech E.
        • Walsh S.
        • Kupiec T.
        • Wojas-Pelc A.
        • Kayser M.
        Model-based prediction of human hair color using DNA variants.
        Hum. Genet. 2011; 129: 443-454
        • Rosenberg N.A.
        • Pritchard J.K.
        • Weber J.L.
        • Cann H.M.
        • Kidd K.K.
        • Zhivotovsky L.A.
        • Feldman M.W.
        Genetic structure of human populations.
        Science. 2002; 298: 2381-2385
        • Rosenberg N.A.
        Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives.
        Ann. Hum. Genet. 2006; 70: 841-847
        • Untergasser A.
        • Nijveen H.
        • Rao X.
        • Bisseling T.
        • Geurts R.
        • Leunissen J.A.
        Primer3Plus, an enhanced web interface to Primer3.
        Nucleic Acids Res. 2007; 35: W71-W74
        • Vallone P.M.
        • Butler J.M.
        AutoDimer: a screening tool for primer–dimer and hairpin structures.
        Biotechniques. 2004; 37: 226-231
        • Altschul S.F.
        • Madden T.L.
        • Schäffer A.A.
        • Zhang J.
        • Zhang Z.
        • Miller W.
        • Lipman D.J.
        Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
        Nucleic Acids Res. 1997; 25: 3389-3402
        • Sherry S.T.
        • Ward M.H.
        • Kholodov M.
        • Baker J.
        • Phan L.
        • Smigielski E.M.
        • Sirotkin K.
        DbSNP: the NCBI database of genetic variation.
        Nucleic Acids Res. 2001; 29: 308-311
        • Liu F.
        • van Duijn K.
        • Vingerling J.R.
        • Hofman A.
        • Uitterlinden A.G.
        • Janssens A.C.J.W.
        • Kayser M.
        Eye color and the prediction of complex phenotypes from genotypes.
        Curr. Biol. 2009; 19: R192-R193
        • Weir B.S.
        • Cockerham C.C.
        Estimating F-statistics for the analysis of population structure.
        Evolution. 1984; 38: 1358-1370
        • Excoffier L.
        • Laval G.
        • Schneider S.
        Arlequin (version 3.0): an integrated software package for population genetics data analysis.
        Evol. Bioinform. Online. 2005; 1: 47-50
        • Freire-Aradas A.
        • Fondevila M.
        • Kriegel A.K.
        • Phillips C.
        • Gill P.
        • Prieto L.
        • Schneider P.M.
        • Carracedo Á.
        • Lareu M.V.
        A new SNP assay for identification of highly degraded human DNA.
        Forensic Sci. Int. Genet. 2011;
        • Lou C.
        • Cong B.
        • Li S.
        • et al.
        A SNaPshot assay for genotyping 44 individual identification single nucleotide polymorphisms.
        Electrophoresis. 2011; 32: 368-378
        • Flanagan N.
        • Healy E.
        • Ray A.
        • Philips S.
        • Todd C.
        • Jackson I.J.
        • Birch-Machin M.A.
        • Rees J.L.
        Pleiotropic effects of the melanocortin 1 receptor (MC1R) gene on human pigmentation.
        Hum. Mol. Genet. 2000; 9: 2531-2537
        • Sturm R.A.
        • Duffy D.L.
        • Box N.F.
        • Newton R.A.
        • Shepherd A.G.
        • Chen W.
        • Marks L.H.
        • Leonard J.H.
        • Martin N.G.
        Genetic association and cellular function of MC1R variant alleles in human pigmentation.
        Ann. N. Y. Acad. Sci. 2003; 994: 348-358
        • Liu F.
        • Struchalin M.V.
        • van Duijn K.
        • Hofman A.
        • Uitterlinden A.G.
        • van Duijn C.
        • Aulchenko Y.S.
        • Kayser M.
        Detecting low frequent loss-of-function alleles in genome wide association studies with red hair color as example.
        PLoS One. 2011; 6: e28145
        • Smith R.
        • Healy E.
        • Siddiqui S.
        • et al.
        Melanocortin 1 receptor variants in an Irish population.
        J. Invest. Dermatol. 1998; : 119-122
        • Branicki W.
        • Brudnik U.
        • Kupiec T.
        • Wolańska-Nowak P.
        • Wojas-Pelc A.
        Determination of phenotype associated SNPs in the MC1R gene.
        J. Forensic Sci. 2007; 52: 349-354
        • Visser M.
        • Kayser M.
        • Palstra R.-J.
        HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter.
        Genome Res. 2012;
        • Liu F.
        • Wollstein A.
        • Hysi P.G.
        • et al.
        Digital quantification of human eye color highlights genetic association of three new loci.
        PLoS Genet. 2010; 6: e1000934
        • Sturm R.A.
        A golden age of human pigmentation genetics.
        Trends Genet. 2006; 22: 464-468
        • Rees J.L.
        Genetics of hair and skin colour.
        Annu. Rev. Genet. 2003; 37: 67-90
        • Commo S.
        • Wakamatsu K.
        • Lozano I.
        • Panhard S.
        • Loussouarn G.
        • Bernard B.A.
        • Ito S.
        Age-dependent changes in eumelanin composition in hairs of various ethnic origins.
        Int. J. Cosmet. Sci. 2011; 34: 102-107
        • Zubakov D.
        • Liu F.
        • van Zelm M.C.
        • et al.
        Estimating human age from T-cell DNA rearrangements.
        Curr. Biol. 2010; 20: R970-R971
        • Pulker H.
        • Lareu M.V.
        • Phillips C.
        • Carracedo A.
        Finding genes that underlie physical traits of forensic interest using genetic tools.
        Forensic Sci. Int. Genet. 2007; 1: 100-104
        • Harding R.M.
        • Healy E.
        • Ray A.J.
        • et al.
        Evidence for variable selective pressures at MC1R.
        Am. J. Hum. Genet. 2000; 66: 1351-1361
        • Yuasa I.
        • Umetsu K.
        • Harihara S.
        • et al.
        Distribution of two Asian-related coding SNPs in the MC1R and OCA2 genes.
        Biochem. Genet. 2007; 45: 535-542
        • Pneuman A.
        • Budimlija Z.M.
        • Caragine T.
        • Prinz M.
        • Wurmbach E.
        Verification of eye and skin color predictors in various populations.
        Leg. Med. 2012; 14: 78-83