1. Introduction
Prediction of externally visible characteristics (EVCs) of an individual solely based on genetic information, also referred to as DNA phenotyping or forensic DNA phenotyping (FDP), has become a focus in human genetic research and applications, such as in forensics, ancient DNA analysis and other areas. In forensic cases where conventional DNA-profiling methods, typically based on short tandem repeat (STR) markers, fail to identify the crime scene sample donor, because the evidential DNA-profile does not match the DNA-profile of any of the case suspects or anybody in the criminal offender DNA database, FDP may provide significant leads for police investigations to find unknown perpetrators [
[1]Forensic DNA Phenotyping: predicting human appearance from crime scene material for investigative purposes.
,
[2]Improving human forensics through advances in genetics, genomics and molecular biology.
,
[3]DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations.
]. In such cases, FDP can narrow down a potentially large number of putative sample donors to a smaller group of individuals who match the FDP-derived EVC information and on whom the police can then focus further investigation. Groups that do not match this information can be excluded from the investigation. Thus far, for eye, hair and skin color, various underlying genes and predictive DNA markers have been identified, and both DNA tests suitable for analyzing such markers in forensic DNA samples and statistical prediction models have been developed [
[4]Eye color and the prediction of complex phenotypes from genotypes.
,
[5]W. Branicki et al. Model-based prediction of human hair color using DNA variants. 129(4) (2011): p. 443-454.
,
[6]Global skin colour prediction from DNA.
,
[7]Eye color prediction using single nucleotide polymorphisms in Saudi population.
,
[8]Further development of forensic eye color predictive tests.
,
[9]IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
,
[10]The common occurrence of epistasis in the determination of human pigmentation and its impact on DNA-based pigmentation phenotype prediction.
], and some of these DNA test systems have been forensically validated [
[9]IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
,
[11]The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA.
,
]. For traits such as freckles and hair structure, some associated genetic markers and the first predictive models have already been published [
[13]- Kukla-Bartoszek M.
- et al.
DNA-based predictive models for the presence of freckles.
,
[14]Prediction of male-pattern baldness from genotypes.
,
[15]E. Pospiech, et al. Towards broadening Forensic DNA Phenotyping beyond pigmentation: Improving the prediction of head hair shape from DNA (2018). 37: p. 241–251.
,
[16]Genetic determinants of freckle occurrence in the Spanish population: towards ephelides prediction from human DNA samples.
]; however, no forensically validated tool has been established so far. Prediction models for some other EVCs are currently under investigation [
[17]Genetic prediction of male pattern baldness.
,
[18]Evaluation of DNA variants associated with Androgenetic Alopecia and their potential to predict male pattern baldness.
,
[19]F. Peng et al. Genome-Wide Association Studies Identify Multiple Genetic Loci Influencing Eyebrow Color Variation in Europeans (2019). 139: p. 1601–1605.
,
[20]Common DNA variants predict tall stature in Europeans.
].
Categorical prediction of eye, hair and skin color is often based on multinomial logistic regression (MLR) using established genetic marker panels. For instance, the IrisPlex test and model for eye color prediction consists of a set of 6 single-nucleotide polymorphisms (SNPs) [
[4]Eye color and the prediction of complex phenotypes from genotypes.
,
[9]IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
,
[21]DNA-based eye colour prediction across Europe with the IrisPlex system.
]. Its extension to eye and hair color, the HIrisPlex test and model, is based on 24 SNPs in total [
[11]The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA.
]. The latest extension is the HIrisPlex-S test and model, which consists of 41 SNPs and allows simultaneous prediction of eye, hair and skin color from a DNA sample [
]. All three prediction models are publicly available via
https://hirisplex.erasmusmc.nl/. An alternative statistical tool for the prediction of eye, hair and skin color from genotype data is offered by Snipper [
[8]Further development of forensic eye color predictive tests.
,
[22]Development of a forensic skin colour predictive test.
,
[23]Exploration of SNP variants affecting hair colour prediction in Europeans.
], which uses pairwise likelihood ratios to present prediction outcomes; other pigmentation prediction tools and models have also been developed (see [
[24]- Katsara M.A.
- Nothnagel M.
True colors: a literature review on the spatial distribution of eye and hair pigmentation.
] for a review). While some of these models show high prediction accuracies for some pigmentation categories, more research is currently under way in order to improve existing tools, either by including more SNP predictors after they have been identified in large-scale gene mapping studies, or by using alternative prediction methods.
Bayesian classification is a statistical approach that considers the data-independent probability of each category, or class, as well as the data-derived likelihood that a given subject or object belongs to a particular category, and bases the classification decision upon these probabilities. More specifically, the Bayesian approach combines a prior probability distribution on the different categories with the likelihoods (probability densities) obtained from the observed samples, yielding the posterior distribution used to predict category, or class, membership of an individual or object [
[25]Discriminant Analysis and Statistical Pattern Recognition.
]. Prior probabilities for parameters may reflect previous evidence, but also purely subjective assessment or other information available before any evidence from the sample set at hand is considered. Incorporating such prior knowledge in the data analysis may increase prediction accuracy, particularly in situations where the prediction model does not include all causal genetic factors or where the environment contributes significantly to the trait variance via non-genetic factors. In both situations, trait prevalence-informed priors may act as proxies for the as yet unknown causal genetic factors and non-genetic factors in a population, group or region. In the framework of DNA-based appearance prediction, including FDP, inferring the biogeographic ancestry of the unknown DNA sample from which EVCs are to be predicted, and using the trait class prevalence in that biogeographic ancestry group as prior in the EVC prediction model, may improve prediction accuracy. However, despite the existing approaches for EVC prediction, the impact of trait prevalence priors on EVC prediction accuracy has not been investigated thus far.
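As a minimal sketch of the Bayesian classification rule described above, the following Python function multiplies a prior probability vector by per-category likelihoods and normalises the result to obtain posterior probabilities. The function name, the three categories and all numerical values are illustrative assumptions and are not taken from the models evaluated in this study.

```python
def posterior(prior, likelihood):
    """Return posterior class probabilities: prior_k * likelihood_k, normalised over k."""
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must have the same length")
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all joint probabilities are zero")
    return [j / total for j in joint]


# Hypothetical three-category trait (e.g. blue, intermediate, brown eye color).
likelihood = [0.30, 0.20, 0.50]        # illustrative data-derived likelihoods
flat_prior = [1 / 3, 1 / 3, 1 / 3]     # uninformative prior
informed_prior = [0.70, 0.10, 0.20]    # illustrative prevalence-informed prior

flat_post = posterior(flat_prior, likelihood)
informed_post = posterior(informed_prior, likelihood)

# Index of the category with the highest posterior probability under each prior.
flat_call = max(range(3), key=lambda k: flat_post[k])
informed_call = max(range(3), key=lambda k: informed_post[k])
```

In this toy example the flat prior leaves the call with the third category, which has the highest likelihood, whereas the informed prior moves the call to the first category, because the likelihoods differ only moderately between categories.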
To put prior-based EVC prediction into practice within the concept of FDP, one would first carry out forensic DNA ancestry testing on the unknown crime scene DNA sample, use the ancestry outcome as guidance for allocating the appropriate trait class prevalence data for the EVC to be predicted, and finally use these data as priors in EVC prediction. Based on the DNA-identified geographic region of ancestry of the tested DNA donor, the allocated trait class prevalence data for different populations from that region would be averaged (or combined in another suitable way) so as to represent continental or sub-continental groups, and would then be used as priors for Bayesian EVC prediction on the same DNA sample previously used for ancestry testing. Alternatively, to avoid population averaging, DNA ancestry testing would need to be specific for a particular population, which requires not only the availability of trait prevalence data for that population but also the ability of forensic DNA ancestry testing to work at the population level.
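The averaging step in this workflow can be sketched as follows. This is only one possible way of combining population prevalence data (an optionally weighted arithmetic mean); the function name, the optional weights and the prevalence values are made up for illustration.

```python
def average_priors(prevalences, weights=None):
    """Combine per-population trait class prevalence vectors into one regional prior."""
    n_pop = len(prevalences)
    if weights is None:
        weights = [1.0] * n_pop          # unweighted mean by default
    w_total = sum(weights)
    n_cat = len(prevalences[0])
    combined = [
        sum(w * pop[k] for w, pop in zip(weights, prevalences)) / w_total
        for k in range(n_cat)
    ]
    total = sum(combined)
    return [c / total for c in combined]  # renormalise in case inputs were rounded


# Made-up prevalence vectors (blue, intermediate, brown) for two populations of one region.
population_a = [0.80, 0.10, 0.10]
population_b = [0.20, 0.10, 0.70]
regional_prior = average_priors([population_a, population_b])
```

With these assumed values the regional prior is approximately (0.5, 0.1, 0.4), which describes neither of the two populations well when prevalence differs strongly within the region.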
Here, we assess whether incorporating prior knowledge on EVC trait prevalence in a Bayesian setting improves the accuracy of DNA-based EVC prediction, and we examine potential pitfalls caused by misspecification of such prior probabilities. To this end, we consider EVCs such as eye, hair and skin color for which prior-free genetic prediction models have previously been established [
[9]IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
,
[11]The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA.
,
], but also traits such as hair structure and freckles for which the first prediction models were recently proposed without considering priors [
[13]- Kukla-Bartoszek M.
- et al.
DNA-based predictive models for the presence of freckles.
,
[15]E. Pospiech, et al. Towards broadening Forensic DNA Phenotyping beyond pigmentation: Improving the prediction of head hair shape from DNA (2018). 37: p. 241–251.
,
[16]Genetic determinants of freckle occurrence in the Spanish population: towards ephelides prediction from human DNA samples.
]. Given the sparsity or even lack of spatial or population-specific prevalence information available for each of these EVCs [
[24]- Katsara M.A.
- Nothnagel M.
True colors: a literature review on the spatial distribution of eye and hair pigmentation.
], we investigated the impact of prevalence-informed priors across a grid in the complete space of all possible values for each trait category, thereby emulating the (mis-)specification of the informative prior values. Prediction modelling was performed by applying previously proposed DNA predictors in datasets from different populations inside and outside of Europe. We report on standard prediction performance measures for each trait category separately and for all model measurements, and then compare prior-informed model-based prediction against prior-free models. Furthermore, we demonstrate the effect of priors on the overall prediction accuracy of the EVCs investigated.
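A grid over the complete space of prior values can be enumerated as in the following sketch, which lists all prior tuples whose entries are multiples of a fixed step and sum to one. The function name and the step size of 0.1 are assumptions chosen for illustration; the grid resolution used in the analysis is not restated here.

```python
from itertools import product


def prior_grid(n_categories, steps):
    """Enumerate all prior tuples on a regular grid over the probability simplex.

    Each tuple has n_categories entries that are multiples of 1/steps and sum to 1.
    """
    grid = []
    for counts in product(range(steps + 1), repeat=n_categories - 1):
        remainder = steps - sum(counts)
        if remainder >= 0:
            # The last entry is fixed by the constraint that all entries sum to 1.
            grid.append(tuple(c / steps for c in counts) + (remainder / steps,))
    return grid


grid3 = prior_grid(3, 10)   # three trait categories, step size 0.1
```

Note that this enumeration includes boundary tuples in which one or more categories receive a prior of zero; such tuples can be filtered out if they are not meaningful for a given trait.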
4. Discussion
In the present study, we aimed at assessing the impact of using trait prevalence-informed priors on the prediction accuracy of an expanded set of EVCs, including eye, hair and skin color as well as hair structure and freckles. Our study was motivated by the question of whether such prior information, possibly representing trait class prevalence in biogeographic ancestry groups, may improve the prediction accuracy of traits over prior-free models. For all EVCs except freckles, we used the same predictive markers in our models as applied in the previously established prediction models [
[9]IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information.
,
[11]The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA.
,
,
[15]E. Pospiech, et al. Towards broadening Forensic DNA Phenotyping beyond pigmentation: Improving the prediction of head hair shape from DNA (2018). 37: p. 241–251.
]. Although due to data availability issues the number of predictors was lower in our freckles prediction modeling than previously [
[13]- Kukla-Bartoszek M.
- et al.
DNA-based predictive models for the presence of freckles.
], this discrepancy should not affect our main outcomes for freckles significantly, since we applied the same reduced marker set to the models both with and without priors. Regarding the prior information, we noticed, to our surprise, that only limited spatial and population-specific trait prevalence information is available for hair, skin and eye color and hair structure [
[24]- Katsara M.A.
- Nothnagel M.
True colors: a literature review on the spatial distribution of eye and hair pigmentation.
] and that none exists for other traits such as freckles. We therefore exhaustively investigated the impact of the choice of prior values for the different trait categories on a fine-grained grid of all possible sets, or tuples, of values to obtain a general picture of the impact of priors on prediction performance. To this end, we trained and tested Bayesian versions of multinomial logistic regression (MLR) and binomial logistic regression (BLR) models and compared their performance to that of the respective prior-free versions, using different trait-specific data sets.
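One common way to introduce a prior into the output of a fitted logistic regression model is to divide the predicted class probabilities by the class proportions of the training data and multiply them by the new prior before renormalising. The sketch below illustrates this re-weighting; it is an assumption about how such a model could be set up and not necessarily the implementation used in this study, and all names and values are hypothetical.

```python
def reweight_probabilities(model_probs, train_proportions, new_prior):
    """Re-weight class probabilities from a fitted classifier to a new prior.

    Divides out the class proportions implicit in the training data,
    multiplies in the new prior and renormalises.
    """
    adjusted = [
        p * new / old
        for p, old, new in zip(model_probs, train_proportions, new_prior)
    ]
    total = sum(adjusted)
    return [a / total for a in adjusted]


# Illustrative values for a three-category trait (not from the study data).
model_probs = [0.50, 0.30, 0.20]          # prior-free MLR output for one individual
train_proportions = [0.60, 0.25, 0.15]    # category proportions in the training set

# Using the training proportions as prior leaves the prediction unchanged.
same = reweight_probabilities(model_probs, train_proportions, train_proportions)

# A strongly different prior moves the call from the first to the third category.
shifted = reweight_probabilities(model_probs, train_proportions, [0.10, 0.30, 0.60])
```

The first call shows that a prior-free model implicitly carries the category proportions of its training data, which is why a prior equal to those proportions has no effect.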
Our results showed that the use of trait prevalence-informed priors can have a strong impact on the performance of the prediction models for the 5 EVCs tested. Such use carries some potential to improve the prediction of most EVCs and some of their categories compared to a prior-free approach, as evidenced by a substantial proportion of prior tuples with better performance statistics. However, we also found large proportions of prior tuples that led to inferior prediction results, indicating the risk that misspecification of those priors may lead to a gross deterioration in model performance. This deterioration could be explained by the fact that the true prevalence values are unknown. The prior-free approach is influenced by the proportions of the categories in the data set. Random splitting into separate training (80 %) and test (20 %) datasets, as performed here for all EVCs, resulted in approximately equal proportions for each of the trait categories in these two data sets. In consequence, the trained model was well adapted to the category proportions in the test data set, possibly leading to some over-fitting of the model. This may have led to a slight overestimation of the performance of the prior-free models. Accurate trait prevalence specification is of utmost importance to obtain reliable and accurate predictions. However, given the lack of such information, the application of prior-incorporating Bayesian approaches for EVC prediction in forensic cases does not appear feasible at this stage.
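The effect of random splitting on category proportions can be checked with a few lines of code. The sketch below uses synthetic labels with assumed category frequencies and a fixed random seed; the function names are hypothetical and the data are not those analysed here.

```python
import random
from collections import Counter


def random_split(labels, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    return indices[:cut], indices[cut:]


def proportions(labels, indices):
    """Category proportions among the selected samples."""
    counts = Counter(labels[i] for i in indices)
    return {category: n / len(indices) for category, n in counts.items()}


# Synthetic labels with unequal category frequencies (illustrative only).
labels = ["brown"] * 600 + ["blue"] * 300 + ["intermediate"] * 100
train_idx, test_idx = random_split(labels)
train_props = proportions(labels, train_idx)
test_props = proportions(labels, test_idx)
```

Because both subsets are drawn at random from the same data, their category proportions agree up to sampling noise, so a model fitted on the training set is implicitly tuned to the proportions it will meet in the test set.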
Given the lack of spatial or population-specific prevalence information for the EVCs considered in this study, which represented a significant obstacle to our analysis, we were not able to compare the performance of prior-incorporating and prior-free approaches against a gold standard. Such a gold standard would require reliable, population-representative prior values for all EVCs and their categories, which are not available. Therefore, we explored the impact of priors across the whole space of possible tuples. Another possible interpretation of our approach, given the lack of knowledge about the underlying “truth” regarding trait prevalence over geographic space, is that the priors resemble differential costs for misclassification, which may also be an interesting future approach in forensic applications.
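The correspondence between priors and misclassification costs can be made explicit under a simple assumption, namely that the cost of a wrong call depends only on the true category. The notation is introduced here for illustration: \pi_k denotes the prior of category k, f_k(x) the likelihood of genotype x in category k, and c_k the cost of misclassifying a true member of category k.

```latex
% Minimum-error Bayes rule
\hat{k}(x) = \arg\max_{k} \; \pi_k \, f_k(x)

% Expected cost of calling category j, and the resulting minimum-cost rule
R(j \mid x) = \sum_{k \neq j} c_k \, P(k \mid x)
\;\;\Longrightarrow\;\;
\hat{k}_c(x) = \arg\min_{j} R(j \mid x) = \arg\max_{k} \; c_k \, \pi_k \, f_k(x)

% The same rule written as a minimum-error rule with re-scaled priors
\tilde{\pi}_k = \frac{c_k \, \pi_k}{\sum_{j} c_j \, \pi_j}
\;\;\Longrightarrow\;\;
\arg\max_{k} \; \tilde{\pi}_k \, f_k(x) = \hat{k}_c(x)
```

Under this assumption, every prior tuple on the grid can be read either as a statement about trait prevalence or as a set of relative misclassification costs applied to a fixed prevalence.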
Little susceptibility of the prediction outcome to the choice of prior values, represented by likelihood ratio values of large magnitude compared to those of the priors, likely reflects a large extent of genetic determination of a trait or a particular trait category and that a large proportion of the causal genetic variants determining this trait, or at least their strongly correlated proxies, are already included in the prediction model [
[5]W. Branicki et al. Model-based prediction of human hair color using DNA variants. 129(4) (2011): p. 443-454.
,
[37]- Schaffer J.V.
- Bolognia J.L.
The melanocortin-1 receptor: red hair and beyond.
,
[38]Genetic variations associated with red hair color and fear of dental pain, anxiety regarding dental care and avoidance of dental care.
,
[39]A. Siewierska-Gorska et al. Association of five SNPs with human hair colour in the Polish population 68(2) (2017): p. 134–144.
]. This agrees with the statement of Caliebe et al. [
[33]Likelihood ratio and posterior odds in forensic genetics: two sides of the same coin.
] that trait prevalence values provide no (or little) additional information if all (or almost all) genetic trait-determining variants are included as predictors in the model, i.e. that the prediction is independent of the population. Of all EVCs and their categories investigated here, red hair color prediction comes closest to this, as red hair is determined by only one gene, MC1R, from which multiple DNA variants, most of them non-synonymous and likely causal, are included in the HIrisPlex-based hair color prediction model used here. For complex traits or trait categories, however, dozens or even hundreds of genetic factors will contribute to the trait, and usually only a fraction of them is known and included in the prediction model. It is assumed that all EVCs and EVC categories, including those tested here, are, apart from red hair, complex traits or trait categories determined by large numbers of genes. This was already demonstrated for hair and skin color based on large-scale genome-wide association studies (GWAS) [
[40]Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability.
,
[41]Genome-wide association study in 176,678 Europeans reveals genetic loci for tanning response to sun exposure.
], and therefore is also expected for eye color, for which such a large-scale GWAS is currently pending. For hair shape and freckles, previous GWAS were not yet on such a large scale, but the multiple genes that were successfully identified showed mostly small effect sizes and explained only a fraction of the estimated heritability [
[42]Meta-analysis of genome-wide association studies identifies 8 novel loci involved in shape variation of human head hair.
,
[43]Two newly identified genetic determinants of pigmentation in Europeans.
]; only large-scale GWAS will be able to increase the explained heritability in the future. For complex phenotypes, use of prevalence values may actually increase prediction accuracy if specified correctly, because they contain information on, and can act as proxies for, those variants that also contribute but are not included in the model.
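The relation between likelihood ratio and prior discussed in this paragraph can be written in odds form for two categories A and B. The symbols are introduced here for illustration: g denotes the genotype data, \pi_A and \pi_B the priors, and \mathrm{LR}_{AB} the likelihood ratio.

```latex
\frac{P(A \mid g)}{P(B \mid g)}
  = \underbrace{\frac{P(g \mid A)}{P(g \mid B)}}_{\mathrm{LR}_{AB}(g)}
    \times \frac{\pi_A}{\pi_B}
\qquad\Longleftrightarrow\qquad
\log \frac{P(A \mid g)}{P(B \mid g)}
  = \log \mathrm{LR}_{AB}(g) + \log \frac{\pi_A}{\pi_B}
```

As a worked example with assumed values, prior odds of 1:20 against A combined with a likelihood ratio of 1000 give posterior odds of 50, so the call for A is unchanged, whereas the same prior odds combined with a likelihood ratio of 3 give posterior odds of 0.15, so the call switches to B. The prior only matters when the log likelihood ratio is of similar or smaller magnitude than the log prior odds.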
The strong dependency of prediction performance on priors for most traits and categories further reflects that many, if not most, predictions are based on only moderately different posterior probabilities: likelihoods do not differ strongly between the categories, because not all causal factors are yet known and therefore could not all be included in the prediction models. Use of priors may then easily shift classification decisions, thereby simply facilitating a trade-off between sensitivity and specificity as well as PPV and NPV in the absence of information on true trait prevalence values. Interestingly, the AUC appeared to be largely unaffected by changing prior tuples.
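For a binary trait, the observation on the AUC has a simple explanation: changing the prior multiplies the posterior odds of every individual by the same constant, which preserves the ranking of individuals and hence the AUC, while moving individuals across a fixed classification threshold. The following sketch demonstrates this with made-up scores; the function names and all values are illustrative assumptions, and the exact invariance shown here holds for the two-category case only.

```python
def shift_prior(p, old_prior, new_prior):
    """Re-express a binary posterior probability under a different prior (odds scaling)."""
    odds = p / (1 - p)
    factor = (new_prior / (1 - new_prior)) / (old_prior / (1 - old_prior))
    new_odds = odds * factor
    return new_odds / (1 + new_odds)


def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when calling the trait present at score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    n_cases = sum(labels)
    return tp / n_cases, tn / (len(labels) - n_cases)


# Illustrative posterior probabilities from a prior-free binary model (e.g. freckles yes/no).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.55, 0.40, 0.60, 0.45, 0.30, 0.10]

# Same individuals, re-scored under a lower assumed trait prevalence.
shifted = [shift_prior(p, old_prior=0.5, new_prior=0.2) for p in scores]
```

With these values the AUC is identical before and after the shift, while sensitivity at the 0.5 threshold drops and specificity rises, which is the trade-off described above.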
Both observations, the potential for prediction improvement by use of priors as well as the risk of inferior performance when those priors are mis-specified, motivate future studies. An important and preferable way forward would be to identify more causal genetic factors involved in EVC etiology, thereby obviating the need for proxies of those causal factors. However, given their likely small or at most moderate effects, identifying such genetic variants would require very large data sets. For instance, a recent GWAS on hair color tested more than 290,000 individuals in a European discovery dataset, which led to the identification of 124 associated independent genetic loci at genome-wide significance, of which 111 were novel [
[40]Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability.
]. However, most of these DNA variants will not be causal themselves, because commonly used SNP microarrays focus on markers that allow good imputation of other common markers (‘imputation backbone’), while providing only limited numbers of SNPs centered on gene regions or selected for phenotypic relevance (‘contents enrichment’).
Another area for future research is the collection of trait prevalence data, at the same or a higher level of detail (e.g. categories) as achievable by DNA-based EVC prediction, for as many populations and geographic regions as are relevant given the phenotypic variation of the EVCs to be predicted. However, even when such data are available, the use of forensic ancestry DNA testing to identify the geographic region for which EVC trait prevalence data are to be allocated as priors in EVC prediction will only be applicable if the prevalence values for different populations within such a DNA-identified region show little variation, and if the regional geographic ancestry can be inferred with high confidence from the crime scene DNA sample. While collection of prevalence data may be achievable in the future, provided such studies are carried out with suitable geographic coverage and EVC phenotypic detail, and given that regional, such as continental, ancestry inference based on enough DNA markers is already possible [
[44]The use of forensic DNA phenotyping in predicting appearance and biogeographic ancestry.
], the trait variation within DNA-identifiable geographic regions remains a problem. For instance, within Europe, which as a continental region is identifiable with forensic DNA ancestry testing [
[44]The use of forensic DNA phenotyping in predicting appearance and biogeographic ancestry.
], eye and hair color prevalence values vary widely between populations from different parts of Europe. Thus, averaging such population prevalence values, if available, will not result in suitable priors for any person originating from any European population. This could only be solved by increasing the level of detail of DNA-based ancestry testing to the sub-regional or even population level, which is currently not achievable and is not expected to become achievable in the near future. Identifying genetic geographic population substructure within continents, such as within Europe [
[45]Correlation between genetic and geographic structure in Europe.
], requires thousands of autosomal SNPs – a number that currently cannot be achieved given available technologies that are suitable for forensic DNA analysis. The simultaneous and targeted analysis of many thousands of SNPs in low-quantity and low-quality DNA typically available from crime scene stains requires the development of new DNA technology in the future.
In summary, our results provide a first assessment of the impact of trait prevalence-informed priors on the prediction model performance for several EVCs. Incorporation of priors, possibly informed by trait class prevalence values in biogeographic ancestry groups, can improve the performance of predicting appearance traits, but correct specification of those priors appears mandatory to protect against deteriorated performance. Future work is needed to obtain unbiased estimates of trait prevalence for the EVCs to be predicted in a large variety of populations, as long as mostly non-causal genetic markers continue to be used for trait prediction. This need will be reinforced by future GWAS, whose larger sample sizes will allow the detection of genetic markers with even smaller effect sizes, most of them likely non-causal. Finally, appearance trait research has to move beyond the accumulation of ever more associated, yet non-causal, genetic markers and, via experimental evidence, arrive at the identification of the actual causal genetic factors for EVCs. If successful, this will allow accurate EVC prediction in a population-independent way, eventually rendering the use of trait prevalence priors obsolete.
Appendix.
Centres and investigators of the VISible Attributes through GEnomics (VISAGE) Consortium
Jagiellonian University (Poland): Wojciech Branicki, Ewelina Pośpiech, Aleksandra Pisarek.
Universidade de Santiago de Compostela (Spain): Ángel Carracedo, Maria Victoria Lareu, Christopher Phillips, Ana Freire-Aradas, Ana Mosquera-Miguel, María de la Puente.
Medizinische Universität Innsbruck (Austria): Walther Parson, Catarina Xavier, Antonia Heidegger, Harald Niederstätter.
Universität zu Köln (Germany): Michael Nothnagel, Maria-Alexandra Katsara, Tarek Khellaf.
King’s College London (United Kingdom): Barbara Prainsack, Gabrielle Samuel.
Klinikum der Universität zu Köln (Germany): Peter M. Schneider, Theresa E. Gross, Jan Fleckhaus.
Bundeskriminalamt (Germany): Ingo Bastisch, Nathalie Schury, Jens Teodoridis, Martina Unterländer.
Institut National De Police Scientifique (France): François-Xavier Laurent, Caroline Bouakaze, Yann Chantrel, Anna Delest, Clémence Hollard, Ayhan Ulus, Julien Vannier.
Netherlands Forensic Institute (Netherlands): Titia Sijen, Kris van der Gaag, Marina Ventayol-Garcia.
National Forensic Centre, Swedish Police Authority (Sweden): Johannes Hedman, Klara Junker, Maja Sidstedt.
Metropolitan Police Service, London (United Kingdom): Shazia Khan, Carole E. Ames, Andrew Revoir.
Centralne Laboratorium Kryminalistyczne Policji (Poland): Magdalena Spólnicka, Ewa Kartasińska, Anna Woźniak.