1. Introduction
Epigenetics plays a key role in the control of gene expression [1]. Epigenetic signatures affecting this molecular process are reversible, act in cascades or networks and affect DNA regulation without altering the underlying DNA sequence [2]. Four main categories of epigenetic marks have been described: chromatin remodeling [3], post-translational histone modifications [4], non-coding RNAs [5] and DNA methylation [6], the last being the most widely studied so far. DNA methylation is the addition of a methyl group to the 5′ carbon of cytosine residues, predominantly those located in CpG dinucleotides, and it generally contributes to gene silencing [7,8]. A plethora of genome-wide studies have shed light on the DNA methylation process during the last ten years, many of them targeting CpG sites correlated with individual age [9–17]. Gradual age-correlated hyper- and hypomethylation patterns have been observed in the human genome [18]. Based on these observed correlations, a new concept termed “epigenetic age” emerged. Epigenetic age refers either to chronological or biological age depending on the marker set used. Additionally, depending on the individual’s lifestyle and/or presence of disease, chronological and biological age may match or differ. While chronological age has proved useful in a forensic context [19], biological age might also be used to monitor the progress of a person with an illness or undergoing treatment for a medical condition [20–22].
A universal epigenetic clock was proposed by Horvath in 2013 [13]. This age prediction model offered clear advantages: it covered multiple tissues in donors ranging in age from newborns to centenarians and was trained on more than 7000 control individuals. However, its reliance on more than 300 CpG sites hampered its application on platforms other than Illumina HumanMethylation BeadChips [23]. To apply epigenetic clocks using alternative DNA methylation technologies, a substantial reduction in the number of markers has been the strategy of choice for forensic applications [24].
In recent years, multiple age prediction models have been developed for forensic analysis using a reduced number of CpG sites. These epigenetic clocks target multiple forensic tissues: blood [25,26], saliva [27,28], semen [29], teeth [30] and bones [31]; use a variety of technologies: Pyrosequencing [24,32], EpiTYPER [26,33], SNaPshot [27,28] or Massively Parallel Sequencing [34–37]; apply different statistical models, including linear regression [25], quantile regression [38], support vector machine [39] or artificial neural networks [40]; and cover different age ranges: adults [41] and children [42]. Common to all of them is the use of a reduced number of markers, from 3 to 16 CpG sites.
Most age prediction models published to date mainly cover adult samples, with subjects below adult age consistently underrepresented. Differences have been observed between children and adults in terms of epigenetic changes. DNA methylation patterns for some CpG sites reveal a logarithmic dependence on age until adulthood that slows to a linear dependence later in life, as depicted by Horvath [13]. This increased variation of epigenetic states during the early stages of life could be explained by the rapid maturation of the immune system during this period [43]. Nevertheless, some CpG sites present a linear or quasi-linear pattern of gradual DNA methylation change from childhood to very old age. This makes them the most suitable epigenetic biomarkers for establishing a common age prediction model that includes all age ranges and can be treated statistically in a unified way.
To develop a common epigenetic clock covering the whole lifetime of a person, inter-individual variability should also be considered. Since epigenetics is the result of environmental interaction with genetics, individuals of similar chronological age can show diverse epigenetic states [44], including potential differences among populations [45]. Age prediction models reported so far have been trained using dozens to hundreds of volunteers, but no minimum sample size has been established to date.
In the present study, a common epigenetic clock for all human ages, from children to centenarians, was developed using seven CpG sites analyzed with EpiTYPER® technology. A training set of 895 Spanish blood DNA samples from donors aged 2 to 104 years was used to explore three statistical models. The models were validated by K-fold cross-validation and with an independent testing set composed of 152 Spanish individuals aged 3 to 69 years. Additionally, an optimal training set size was estimated by assessing stepwise-reduced training sets, from a total of 895 down to 99 individuals.
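As an illustration of the K-fold cross-validation mentioned above, the following minimal Python sketch partitions a training set into folds. It is not the authors' pipeline: the function name `k_fold_indices`, the choice of k = 10 and the fixed random seed are assumptions made for this example (the number of folds is not stated in this section); only the training set size of 895 is taken from the text.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 895 is the training set size reported in the text; k = 10 is an assumption.
splits = list(k_fold_indices(895, k=10))
```

Each donor appears in exactly one held-out fold, so every sample contributes once to the validation error while the model is always fitted on the remaining folds.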
4. Discussion
Age estimation is a DNA intelligence tool that aims to add information to the genetic profile in different scenarios, which can comprise: i) individual identification, ii) mass disaster screening, iii) forensic anthropology and iv) legal disputes about age. Consequently, epigenetic clocks are being implemented in forensic practice as a supplementary analysis for individual age prediction. In general terms, most forensic age prediction models to date have been based on adult samples [28,41], but minors, and the variation in methylation patterns they show, must also be taken into account in the development of universally applicable forensic analyses. Horvath’s clock, although including both adults and minors, applies a different statistical treatment to each age range: a logarithmic transformation was applied for ages below 20 years, whereas an untransformed linear model was used for ages above 20 years [13]. This logarithmic transformation is motivated by the rapid, non-linear change in DNA methylation levels at early stages of the individual’s life [43]. Despite the broad coverage of all ages used to develop Horvath’s model, it is based on an impractically large number of markers (353 CpG sites), which is a major drawback for forensic testing given the poor quality and/or quantity of DNA associated with most casework samples.
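The two-regime treatment of age in Horvath’s clock can be sketched as a single transformation function. The form below follows the transformation as commonly described for that clock [13], with the adult age threshold set to 20 years; the function and constant names are chosen for this example, and the code is an illustrative sketch rather than the reference implementation.

```python
import math

ADULT_AGE = 20  # threshold (years) separating the logarithmic and linear regimes


def age_transform(age, adult_age=ADULT_AGE):
    """Map chronological age to the scale on which the clock is fitted."""
    if age <= adult_age:
        # Logarithmic regime for children and adolescents.
        return math.log(age + 1) - math.log(adult_age + 1)
    # Linear regime for adults.
    return (age - adult_age) / (adult_age + 1)


def inverse_age_transform(value, adult_age=ADULT_AGE):
    """Map a transformed value back to age in years."""
    if value < 0:
        return (adult_age + 1) * math.exp(value) - 1
    return (adult_age + 1) * value + adult_age
```

Both branches equal zero at the threshold, so the transformation is continuous at 20 years; a model fitted on the transformed scale is converted back to years with `inverse_age_transform`.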
Minors have already been included in certain previous forensic epigenetic clocks, but these cover only sparse datapoints for ages under 18 years [24,25,34]. To improve this area of study, we developed a specific age prediction model for children and adolescents [42]. However, when a biological sample is found, there is generally no information about whether the donor is a minor or an adult. Therefore, the most useful epigenetic clock will be one unifying both age ranges into a single test model. A recent study by Woźniak et al. considered the whole range of ages from 1 to 75 years old (N = 112) to build a novel age prediction model [36], obtaining an MAE of ± 3.2 years for blood samples. In Woźniak’s model, minors were represented from 0 to 18 years old with about one individual per year. Following this study, we aimed to go a step further by capturing as much of the potential inter-individual epigenetic variability as possible. This was achieved by covering the fullest interval (2–104 years old) with approximately ten individuals per year of age (N = 895). By analyzing previously developed CpG sites in the seven genomic regions of ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 using EpiTYPER technology, we were able to retain a robust and efficient age prediction model [26], with four of these loci overlapping with the recently released VISAGE Enhanced Tool for age prediction [36]. Additional CpG sites that are especially informative in childhood and adolescence, such as KCNAB3 [42], were also considered for analysis. However, since the aim was to build a single epigenetic clock covering all ages from childhood to old age, KCNAB3 was discarded: DNA methylation levels in this marker do not maintain a linear correlation with chronological age across all ages, increasing exponentially during childhood but remaining much more stable across adulthood (see Fig. 1 in [42]). This extreme lack of linearity between both age groups prevents the inclusion of this marker in a common epigenetic clock.
In addition to marker selection, and although the age range and sample size of the training set were key factors in adapting our age prediction model, the underlying statistical model also plays an important role. To date, linear regression has been widely applied [24,25,28,36,52]. Since DNA methylation is quantitative in nature and changes gradually through the individual’s lifetime, linear regression models fit well with age estimation based on this epigenetic signature. Quadratic regression models or power transformations have also been applied in cases where the change in DNA methylation levels with chronological age follows non-linear patterns [30,36]. Recently, novel statistical tools based on machine learning have been introduced [34,37,41,53]. Common to all these models is that a single error is obtained, independent of the age of the sample donor, and applied to whatever age is predicted. Nevertheless, DNA methylation data consistently show that young ages are better predicted than old ages, and this has been observed in our dataset as well.
Fig. 2 depicts epigenetic age versus chronological age for all individuals from 2 to 104 years old. While the datapoints of the youngest subjects (under 20 years) lie very close together, dispersion gradually increases with age and is highest in the oldest samples (over 80 years). Inter-individual epigenetic variation is expected, since epigenetics derives from an interaction between genetics and environment. Consequently, the longer two age-matched individuals have been exposed to different external factors, the larger the epigenetic differences between them will be, in contrast to the earliest stages of life. To improve the accuracy of predictions, specific age-dependent errors can be applied by using statistical models based on quantile regression [26,38]. Inter-individual epigenetic variation could also occur among populations as a result of the different environmental conditions to which individuals are exposed. In our study, we used a Spanish cohort to build and validate the proposed age prediction model. Further validation will be needed to demonstrate that our model can be used in different worldwide population groups.
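Age-dependent errors arise naturally from quantile regression because each quantile is fitted by minimizing an asymmetric "pinball" loss rather than a squared error. The sketch below is a generic illustration of that loss, not the QR, QRNN or QRSVM implementations used in this study; the toy ages and the helper name `best_constant` are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(ages, q, candidates):
    """Return the candidate constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(ages, [c] * len(ages), q))


# Invented ages (years) for illustration only; not data from this study.
toy_ages = [30, 32, 35, 41, 60]
median_like = best_constant(toy_ages, 0.5, range(20, 81))
upper_like = best_constant(toy_ages, 0.9, range(20, 81))
```

For the toy data, the constant minimizing the loss at q = 0.5 is the median, while at q = 0.9 it moves to the upper end of the sample. Fitting separate models for q10 and q90 in the same way yields an interval that can widen wherever the data are more dispersed, for example at older ages.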
In the present study, the QR, QRNN and QRSVM statistical prediction models were tested for age estimation. The highest accuracy was obtained for QRNN in terms of MAE (± 3.36) and %CP ± PI (81.45%). Nevertheless, no statistically significant differences (p-value > 0.05) were found between QRNN and QRSVM. Therefore, QRNN and QRSVM were both selected as the most accurate age prediction models and subsequent analyses were constrained to these two methods. Validation of the models with an independent set of samples (N = 152) produced similar results (MAE: ± 3.32 and ± 3.45; %CP ± PI: 76.32% and 77.63%; for QRNN and QRSVM, respectively). Nevertheless, since the testing set was restricted to donors up to 69 years old, further analysis of older samples is required to validate these results in old age. Errors obtained in the present work were similar to those of previous models [25,28,34,41] (MAE: ± 3.9, ± 3.48, ± 4.1 and ± 3.24, respectively); nevertheless, these errors were fixed regardless of the predicted age. The main advantage of using age-dependent errors, such as those from the quantile regression models used here, is the ability to narrow the errors at early stages of life and to widen them at older ages, where inter-individual epigenetic variability plays an important role.
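The two accuracy metrics reported above can be computed as sketched below. This sketch assumes that MAE is the mean absolute difference between predicted and chronological age and that %CP ± PI is the percentage of donors whose chronological age falls inside the predicted interval (here taken as q10 to q90). Both that interpretation and all values in the example are illustrative assumptions, not data from this study.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Mean absolute difference between chronological and predicted age."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def percent_correct_within_interval(true_ages, lower, upper):
    """Percentage of donors whose chronological age lies in [lower, upper]."""
    hits = sum(1 for t, lo, hi in zip(true_ages, lower, upper) if lo <= t <= hi)
    return 100.0 * hits / len(true_ages)


# Invented example values (years); the interval widens with age by construction.
true_ages = [5, 18, 40, 75]
predicted = [6, 17, 44, 68]   # central prediction (assumed q50)
lower_q10 = [4, 14, 36, 60]   # assumed lower bound of the interval
upper_q90 = [8, 20, 49, 74]   # assumed upper bound of the interval

mae = mean_absolute_error(true_ages, predicted)
coverage = percent_correct_within_interval(true_ages, lower_q10, upper_q90)
```

With a fixed error, the same ± value would be attached to every prediction; with quantile-based intervals, each donor is judged against an interval specific to the predicted age.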
Finally, the sample size of the training set is considered a key factor in developing an accurate age prediction model: the more samples are included, the better inter-individual epigenetic variation can be gauged, resulting in lower errors and a higher number of correct classifications. When the q10 and q90 quantiles are applied, ~80% of samples should be correctly predicted. Taking this into account, the stepwise dimensionality reduction we performed indicated that Ntrain = 552 (6 individuals per year of age) gave an optimum balance between sample size and predictive accuracy. With six individuals per year, the training set gave an MAE of ± 3.82 for QRNN and ± 3.69 for QRSVM. Correct classification rates were 80.96% for QRNN and 79.51% for QRSVM. It is important to note that in this analysis QRSVM proved less susceptible than QRNN to shrinking of the training set; it could therefore be preferred when the number of samples is low. Metrics similar to those of the training set were obtained for the corresponding testing set (MAE: ± 3.03 and ± 3.56; %CP ± PI: 84.87% and 78.29%; for QRNN and QRSVM, respectively). However, the patterns displayed by the testing set under some of the stepwise-reduced training sets did not exactly follow those of the corresponding training sets themselves. This could be explained by the narrower age range of the testing set (3–69 years old) compared with the training sets (2–104 years old). In summary, to cover a maximum level of inter-individual variability, six individuals per year of age are recommended for the development of future epigenetic clocks that aim to cover the complete range of human ages.
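A stepwise-reduced training set with a fixed number of donors per year of age could be drawn as sketched below. The exact subsampling procedure used in this study is not described in this section, so the random per-year sampling, the function name `subsample_per_year` and the toy cohort are assumptions made for illustration.

```python
import random
from collections import defaultdict


def subsample_per_year(ages, per_year, seed=0):
    """Return sorted sample indices, keeping at most `per_year` donors per year of age."""
    by_year = defaultdict(list)
    for idx, age in enumerate(ages):
        by_year[int(age)].append(idx)
    rng = random.Random(seed)
    kept = []
    for year in sorted(by_year):
        members = by_year[year]
        if len(members) <= per_year:
            kept.extend(members)  # too few donors at this age: keep them all
        else:
            kept.extend(rng.sample(members, per_year))
    return sorted(kept)


# Hypothetical cohort: ten donors at each age from 2 to 11 years.
toy_ages = [year for year in range(2, 12) for _ in range(10)]
kept = subsample_per_year(toy_ages, per_year=6)
```

Repeating the call with decreasing `per_year` values gives a series of nested-size training sets on which MAE and correct classification rates can be compared.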
As a final remark, it should be taken into account that the data underlying the age prediction models developed in this study were generated using EpiTYPER technology, a system that requires high quantities of genomic DNA (300 ng). To apply these models directly to forensic specimens, which usually present low quality and/or quantity of DNA, a further step will be to implement these age predictors on forensic technologies such as SNaPshot or Massively Parallel Sequencing, systems able to handle small amounts of genomic DNA for methylation analyses of forensic casework.
Article info
Publication history
Published online: June 24, 2022
Accepted: June 23, 2022
Received in revised form: June 22, 2022
Received: December 21, 2021
Copyright
© 2022 The Author(s). Published by Elsevier B.V.