
A common epigenetic clock from childhood to old age

Open Access | Published: June 24, 2022 | DOI: https://doi.org/10.1016/j.fsigen.2022.102743

      Highlights

      • A new age prediction model covering a full spectrum of human ages from childhood to old age has been developed.
      • DNA methylation levels in seven CpG sites analyzed using EpiTYPER® technology on blood samples led to MAEs of ± 3.32 and ± 3.45 years using quantile regression neural network (QRNN) and quantile regression support vector machine (QRSVM).
      • Dimensionality-reduction analysis based on stepwise-reduced training sets indicated that a minimum of six individuals per year of age balances the highest age prediction accuracy with efficient coverage of the broadest inter-individual variability.

      Abstract

      Forensic age estimation is a DNA intelligence tool that forms an important part of Forensic DNA Phenotyping. Criminal cases with no suspects or with unsuccessful matches in searches on DNA databases, human identification analyses in mass disasters, and anthropological studies or legal disputes all benefit from age estimation to gain investigative leads. Several age prediction models have been developed to date based on DNA methylation. Although different DNA methylation technologies as well as diverse statistical methods have been proposed, most of them are based on blood samples and mainly restricted to adult age ranges. In the current study, we present an extended age prediction model based on 895 evenly distributed Spanish DNA blood samples from 2 to 104 years old. DNA methylation levels were detected using Agena Bioscience EpiTYPER® technology for a total of seven CpG sites located at seven genomic regions: ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 (GRCh38). The accuracy of the age prediction system was tested by comparing three statistical methods: quantile regression (QR), quantile regression neural network (QRNN) and quantile regression support vector machine (QRSVM). The most accurate predictions were obtained when using QRNN or QRSVM (mean absolute prediction error, MAE, of ± 3.36 and ± 3.41, respectively). Validation of the models with an independent Spanish testing set (N = 152) provided similar accuracies for both methods (MAE: ± 3.32 and ± 3.45, respectively). The main advantage of using quantile regression statistical tools lies in obtaining age-dependent prediction intervals, fitting the error to the estimated age. An additional dimensionality-reduction analysis showed that the error increases and the rate of correct classifications falls as the training sample size is reduced. Results indicated that a minimum sample size of six samples per year of age covered by the training set is recommended to efficiently capture the most inter-individual variability.

      1. Introduction

      Epigenetics plays a key role in the control of gene expression [
      • Gibney E.R.
      • Nolan C.M.
      Epigenetics and gene expression.
      ]. Epigenetic signatures affecting this molecular process are reversible, act in cascade or network and affect DNA regulation without altering the underlying DNA sequence [
      • Riggs A.
      • Russo V.
      • Martienssen R.
      Epigenetic Mechanisms Of Gene Regulation.
      ]. Four main categories of epigenetic marks have been described: chromatin remodeling [
      • Saha A.
      • Wittmeyer J.
      • Cairns B.R.
      Chromatin remodelling: the industrial revolution of DNA around histones.
      ], post-translational histone modifications [
      • Strahl B.D.
      • Allis C.D.
      The language of covalent histone modifications.
      ], non-coding RNAs [
      • Lee J.T.
      Epigenetic regulation by long noncoding RNAs.
      ] and DNA methylation [
      • Jones P.A.
      Functions of DNA methylation: islands, start sites, gene bodies and beyond.
      ]; the latter is the most widely studied so far. DNA methylation is the addition of a methyl group to the 5′ carbon of cytosine residues, predominantly those located in CpG dinucleotides, and generally contributes to gene silencing [
      • Smith Z.D.
      • Meissner A.
      DNA methylation: roles in mammalian development.
      ,
      • Schübeler D.
      Function and information content of DNA methylation.
      ]. A plethora of genome-wide studies have shed light on the DNA methylation process during the last ten years, many of them targeting CpG sites correlated with individual age [
      • Rakyan V.K.
      • Down T.A.
      • Maslau S.
      • Andrew T.
      • Yang T.P.
      • Beyan H.
      • et al.
      Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains.
      ,
      • Bocklandt S.
      • Lin W.
      • Sehl M.
      • Sánchez F.
      • Sinsheimer J.
      • Horvath S.
      • et al.
      Epigenetic Predictor of Age.
      ,
      • Garagnani P.
      • Bacalini M.G.
      • Pirazzini C.
      • Gori D.
      • Giuliani C.
      • Mari D.
      • et al.
      Methylation of ELOVL2 gene as a new epigenetic marker of age.
      ,
      • Hannum G.
      • Guinney J.
      • Zhao L.
      • Zhang L.
      • Hughes G.
      • Sadda S.
      • et al.
      Genome-wide methylation profiles reveal quantitative views of human aging rates.
      ,
      • Horvath S.
      DNA methylation age of human tissues and cell types.
      ,
      • Johansson Å.
      • Enroth S.
      • Gyllensten U.
      Continuous aging of the human DNA methylome throughout the human lifespan.
      ,
      • Florath I.
      • Butterbach K.
      • Müller H.
      • Bewerunge-Hudler M.
      • Brenner H.
      Cross-sectional and longitudinal changes in DNA methylation with age: An epigenome-wide analysis revealing over 60 novel age-associated CpG sites.
      ,
      • Alsaleh H.
      • Haddrill P.R.
      ,
      • Merid S.K.
      • Novoloaca A.
      • Sharp G.C.
      • Küpers L.K.
      • Kho A.T.
      • Roy R.
      • et al.
      Epigenome-wide meta-analysis of blood DNA methylation in newborns and children identifies numerous loci related to gestational age.
      ]. Gradual age-correlated hyper- and hypomethylation patterns have been observed in the human genome [
      • Jung M.
      • Pfeifer G.P.
      Aging and DNA methylation.
      ]. Based on these observed correlations, a new concept termed “epigenetic age” emerged. Epigenetic age refers to either chronological or biological age, depending on the marker set used. Depending on the individual’s lifestyle and/or the presence of disease, chronological and biological age may coincide or diverge. While chronological age has proved useful in a forensic context [
      • Freire-Aradas A.
      • Phillips C.
      • Lareu M.V.
      Forensic individual age estimation with DNA: From initial approaches to methylation tests.
      ], biological age might also be used to monitor the progress of a person with illness or undergoing treatment for a medical condition [
      • Levine M.E.
      • Lu A.T.
      • Quach A.
      • Chen B.H.
      • Assimes T.L.
      • Bandinelli S.
      • et al.
      An epigenetic biomarker of aging for lifespan and healthspan.
      ,
      • Lu A.T.
      • Quach A.
      • Wilson J.G.
      • Reiner A.P.
      • Aviv A.
      • Raj K.
      • et al.
      DNA methylation GrimAge strongly predicts lifespan and healthspan.
      ,
      • Noroozi R.
      • Ghafouri-Fard S.
      • Pisarek A.
      • Rudnicka J.
      • Spólnicka M.
      • Branicki W.
      • et al.
      DNA methylation-based age clocks: From age prediction to age reversion.
      ].
      A universal epigenetic clock was proposed by Horvath in 2013 [
      • Horvath S.
      DNA methylation age of human tissues and cell types.
      ]. In spite of the advantages that such an age prediction model presented, covering multiple tissues in donors that ranged in age from newborns to centenarians and trained on more than 7000 control individuals, the analysis of more than 300 CpG sites hampered its application in platforms apart from Illumina HumanMethylation Beadchips [
      • Bibikova M.
      • Barnes B.
      • Tsan C.
      • Ho V.
      • Klotzle B.
      • Le J.M.
      • et al.
      ]. To apply epigenetic clocks using alternative DNA methylation technologies, a substantial reduction of markers has been the strategy of choice for forensic applications [
      • Weidner C.I.
      • Lin Q.
      • Koch C.M.
      • Eisele L.
      • Beier F.
      • Ziegler P.
      • et al.
      Aging of blood can be tracked by DNA methylation changes at just three CpG sites.
      ].
      In recent years, multiple age prediction models have been developed for forensic analysis using a reduced number of CpG sites. These epigenetic clocks were designed targeting multiple forensic tissues: blood [
      • Zbieć-Piekarska R.
      • Spólnicka M.
      • Kupiec T.
      • Parys-Proszek A.
      • Makowska Z.
      • Pałeczka A.
      • et al.
      Development of a forensically useful age prediction method based on DNA methylation analysis.
      ,
      • Freire-Aradas A.
      • Phillips C.
      • Mosquera-Miguel A.
      • Girón-Santamaría L.
      • Gómez-Tato A.
      • Casares De Cal M.
      • et al.
      Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
      ], saliva [
      • Hong S.R.
      • Jung S.E.
      • Lee E.H.
      • Shin K.J.
      • Yang W.I.
      • Lee H.Y.
      DNA methylation-based age prediction from saliva: High age predictability by combination of 7 CpG markers.
      ,
      • Jung S.E.
      • Lim S.M.
      • Hong S.R.
      • Lee E.H.
      • Shin K.J.
      • Lee H.Y.
      ], semen [
      • Lee H.Y.
      • Jung S.E.
      • Oh Y.N.
      • Choi A.
      • Yang W.I.
      • Shin K.J.
      Epigenetic age signatures in the forensically relevant body fluid of semen: A preliminary study.
      ], teeth [
      • Bekaert B.
      • Kamalandua A.
      • Zapico S.C.
      • Van De Voorde W.
      • Decorte R.
      Improved age determination of blood and teeth samples using a selected set of DNA methylation markers.
      ] and bones [
      • Lee H.Y.
      • Hong S.R.
      • Lee J.E.
      • Hwang I.K.
      • Kim N.Y.
      • Lee J.M.
      • et al.
      Epigenetic age signatures in bones.
      ]; using a variety of technologies: Pyrosequencing [
      • Weidner C.I.
      • Lin Q.
      • Koch C.M.
      • Eisele L.
      • Beier F.
      • Ziegler P.
      • et al.
      Aging of blood can be tracked by DNA methylation changes at just three CpG sites.
      ,
      • Fleckhaus J.
      • Schneider P.M.
      Novel multiplex strategy for DNA methylation-based age prediction from small amounts of DNA via Pyrosequencing.
      ], EpiTYPER [
      • Freire-Aradas A.
      • Phillips C.
      • Mosquera-Miguel A.
      • Girón-Santamaría L.
      • Gómez-Tato A.
      • Casares De Cal M.
      • et al.
      Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
      ,
      • Zubakov D.
      • Liu F.
      • Kokmeijer I.
      • Choi Y.
      • van Meurs J.B.J.
      • van IJcken W.F.J.
      • et al.
      Human age estimation from blood using mRNA, DNA methylation, DNA rearrangement, and telomere length.
      ], SNaPshot [
      • Hong S.R.
      • Jung S.E.
      • Lee E.H.
      • Shin K.J.
      • Yang W.I.
      • Lee H.Y.
      DNA methylation-based age prediction from saliva: High age predictability by combination of 7 CpG markers.
      ,
      • Jung S.E.
      • Lim S.M.
      • Hong S.R.
      • Lee E.H.
      • Shin K.J.
      • Lee H.Y.
      ] or Massively Parallel Sequencing [
      • Aliferi A.
      • Ballard D.
      • Gallidabino M.D.
      • Thurtle H.
      • Barron L.
      • Syndercombe Court D.
      DNA methylation-based age prediction using massively parallel sequencing data and multiple machine learning models.
      ,
      • Heidegger A.
      • Xavier C.
      • Niederstätter H.
      • de la Puente M.
      • Pośpiech E.
      • Pisarek A.
      • et al.
      Development and optimization of the VISAGE basic prototype tool for forensic age estimation.
      ,
      • Woźniak A.
      • Heidegger A.
      • Piniewska-Róg D.
      • Pośpiech E.
      • Pisarek A.
      • Kartasińska E.
      • et al.
      Development of the VISAGE enhanced tool and statistical models for epigenetic age estimation in blood.
      ,
      • Aliferi A.
      • Sundaram S.
      • Ballard D.
      • Freire-Aradas A.
      • Phillips C.
      • Lareu M.V.
      • et al.
      Combining current knowledge on DNA methylation-based age estimation towards the development of a superior forensic DNA intelligence tool.
      ]; and applying different statistical models, including linear regression [
      • Zbieć-Piekarska R.
      • Spólnicka M.
      • Kupiec T.
      • Parys-Proszek A.
      • Makowska Z.
      • Pałeczka A.
      • et al.
      Development of a forensically useful age prediction method based on DNA methylation analysis.
      ], quantile regression [
      • Smeers I.
      • Decorte R.
      • Van de Voorde W.
      • Bekaert B.
      Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation.
      ], support vector machine [
      • Xu C.
      • Qu H.
      • Wang G.
      • Xie B.
      • Shi Y.
      • Yang Y.
      • et al.
      A novel strategy for forensic age prediction by DNA methylation and support vector regression model.
      ] or artificial neural networks [
      • Vidaki A.
      • Ballard D.
      • Aliferi A.
      • Miller T.H.
      • Barron L.P.
      • Syndercombe Court D.
      DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.
      ]; as well as covering different age ranges: adults [
      • Naue J.
      • Hoefsloot H.C.J.
      • Mook O.R.F.
      • Rijlaarsdam-Hoekstra L.
      • van der Zwalm M.C.H.
      • Henneman P.
      • et al.
      Chronological age prediction based on DNA methylation: Massive parallel sequencing and random forest regression.
      ] and children [
      • Freire-Aradas A.
      • Phillips C.
      • Girón-Santamaría L.
      • Mosquera-Miguel A.
      • Gómez-Tato A.
      • Casares de Cal M.Á.
      • et al.
      Tracking age-correlated DNA methylation markers in the young.
      ]. Common to all of them is the use of a reduced number of markers, from 3 to 16 CpG sites.
      Most age prediction models published to date mainly cover adult samples, with subjects below adult ages consistently underrepresented. Differences have been observed between children and adults in terms of epigenetic changes. DNA methylation patterns for some CpG sites reveal a logarithmic dependence on age until adulthood that slows to a linear dependence later in life, as depicted by Horvath [
      • Horvath S.
      DNA methylation age of human tissues and cell types.
      ]. This increased variation of epigenetic states during the early stages of life could be explained by the rapid maturation of the immune system at this period [
      • Alisch R.S.
      • Barwick B.G.
      • Chopra P.
      • Myrick L.K.
      • Satten G.A.
      • Conneely K.N.
      • et al.
      Age-associated DNA methylation in pediatric populations.
      ]. Nevertheless, some CpG sites present a linear or quasi-linear pattern of gradual DNA methylation changes from childhood to very old age, which makes them the most suitable epigenetic biomarkers for establishing a common age prediction model that includes all age ranges and can be treated statistically in a unified way.
      To develop a common epigenetic clock covering the whole lifetime of a person, inter-individual variability should be also considered. Since epigenetics is the result of environmental interaction with genetics, individuals presenting similar chronological ages can be represented by multiple scenarios [
      • Fraga M.F.
      • Ballestar E.
      • Paz M.F.
      • Ropero S.
      • Setien F.
      • Ballestar M.L.
      • et al.
      Epigenetic differences arise during the lifetime of monozygotic twins.
      ], including potential differences among populations [
      • Cho S.
      • Jung S.E.
      • Hong S.R.
      • Lee E.H.
      • Lee J.H.
      • Lee S.D.
      • et al.
      Independent validation of DNA-based approaches for age prediction in blood.
      ]. Age prediction models reported so far have been trained using dozens to hundreds of volunteers, but no minimum sample size has been established to date.
      In the present study, a common epigenetic clock for all human ages, from children to centenarians, was developed using seven CpG sites detected with EpiTYPER® technology. A total of 895 Spanish blood DNA samples from donors aged 2 to 104 years were used to train three statistical models. The models were validated by k-fold cross-validation and with an independent testing set composed of 152 Spanish individuals aged 3 to 69 years. Additionally, an optimal training set size was estimated by dimensionality reduction, using training sets reduced stepwise from 895 to 99 individuals.

      2. Materials and methods

      2.1 DNA samples and quantification

      A total of 1047 blood DNA samples from Spanish donors (collected across multiple regions) ranging from 2 to 104 years old (mean age: 44.51 years, 477 males and 570 females, approximately 10 individuals per age) were used for the development of an extended age prediction model – DNA methylation levels for all samples were collected within the scope of previous projects. From these samples, 895 (~85%) were used for establishing the training set and the remaining 152 (~15%) for validation purposes. DNA samples were obtained from the Spanish National DNA Bank Carlos III, University of Salamanca and from the BioBank IBSP-CV (PT13/0010/0064), integrated in the Spanish National Biobanks Network and in the Valencian Biobanking Network; and they were processed following standard operating procedures with the appropriate approval of the Ethical and Scientific Committees. Ethical approval for the present study was granted from the ethics committee of investigation in Galicia, Spain (CAEI: 2013/543). Additionally, two internal controls were included in all methylation analyses in order to confirm the reproducibility of results (blood DNA samples from a male and a female, 59 and 32 years old, respectively). All DNA samples were quantified by Qubit® dsDNA High Sensitivity (HS) Assay Kit (Thermo Fisher) and subsequently normalized to 10 ng/µL.

      2.2 CpG target sites and Agena Bioscience EpiTYPER® DNA methylation analysis

      The DNA methylation markers selected for this study were seven CpG sites from the genomic regions: ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 (GRCh38), included in a previous age prediction model initially created for adult samples [
      • Freire-Aradas A.
      • Phillips C.
      • Mosquera-Miguel A.
      • Girón-Santamaría L.
      • Gómez-Tato A.
      • Casares De Cal M.
      • et al.
      Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
      ]. The Agena Bioscience EpiTYPER® system (San Diego, CA, USA) is a bisulfite-treatment-based method for detection and quantification of DNA methylation using MassARRAY® mass spectrometry [

      Ehrich M., Correll D., Van Den Boom D. Introduction to EpiTYPER for quantitative DNA methylation analysis using the MassARRAY® System. Seq Appl Note [Internet]. 2006;Doc. No. 8(8876):1–8. Available from: www.sequenom.com.

      ]. The bisulfite conversion step was performed with the EZ DNA Methylation™ Kit (Zymo Research), with an input of 300 ng of genomic DNA, following the manufacturer’s guidelines, to produce 40 µL of converted DNA. Accordingly, 30 µL of each sample normalized to 10 ng/µL were used for bisulfite conversion. From the final 40 µL of converted DNA, 1 µL was used for subsequent EpiTYPER analyses. EpiTYPER DNA methylation data for the present study were obtained from two previous publications [
      • Freire-Aradas A.
      • Phillips C.
      • Mosquera-Miguel A.
      • Girón-Santamaría L.
      • Gómez-Tato A.
      • Casares De Cal M.
      • et al.
      Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
      ,
      • Freire-Aradas A.
      • Phillips C.
      • Girón-Santamaría L.
      • Mosquera-Miguel A.
      • Gómez-Tato A.
      • Casares de Cal M.Á.
      • et al.
      Tracking age-correlated DNA methylation markers in the young.
      ]. EpiTYPER detects methylation levels as CpG sets, comprising one or multiple CpG positions in the same cleavage fragment. Therefore, multiple CpGs in a set will be detected together when they are closely positioned on the targeted fragment. Herein, we use the term CpG site whether a single CpG or a cluster of CpGs is detected in a single short DNA fragment.

      2.3 Statistical analyses and establishment of online age prediction tools

      All calculations were performed using R software v.3.4.2. Chronological age refers to the actual self-declared age of the individual. Correlations between DNA methylation levels and chronological age were calculated using the Spearman correlation (rs). Inter-individual variability based on DNA methylation levels was assessed using standard deviation (threshold SD >0.05). To build an extended age prediction model, three statistical tools were explored using quantiles 0.1 and 0.9 (q10 and q90): quantile regression (QR), quantile regression neural network (QRNN) and quantile regression support vector machine (QRSVM); using the quantreg (rq function), qrnn (mcqrnn.fit function, 3 hidden layers) and liquidSVM (qtSVM function) R packages, respectively [

      Koenker R., Portnoy S., Ng P.T., Zeileis A., Grosjean P., Moler C., et al. Quantile Regression, Package “quantreg.” 2019.

      ,

      Cannon A. Package “qrnn.” 2019.

      ,

      Steinwart I., Thomann P., Farooq M. Package “liquidSVM.” 2017.

      ]. Validation of the prediction models was performed using k-fold cross-validation (k = 10) applying an R script developed in-house. The k-fold cross-validation randomly splits the input data (N = 895) into k folds of similar sample size. Random partitioning of the input data was performed using the cvTools R package [

      Alfons A. Package “cvTools”: Cross-validation tools for regression models. 2015.

      ]. In each of the k iterations, one fold was retained as the test set and the remaining folds were used as the training set, maintaining proportions of 10% and 90% of the input data for the test and training sets, respectively, per run. The corresponding predictive accuracy was measured with the following performance metrics: mean absolute prediction error (MAE), root-mean-square error (RMSE) and percent of correct classifications within the prediction intervals (%CP±PI). Although the MAE can be represented by the median instead of the mean when working with quantiles (QR, QRNN and QRSVM), the mean was used in the present study to allow comparison with other models, where the MAE is usually based on the mean. Correlation between epigenetic age and chronological age was tested using R². Predicted versus chronological age was plotted using the ggplot2 R package [

      Wickham H., Chang W. Create Elegant Data Visualisations Using the Grammar of Graphics, Package “ggplot2.” 2019.

      ]. To assess potential differences between statistical methods, a p-value < 0.05 was considered statistically significant. Potential sex differences were tested using the Wilcoxon–Mann–Whitney test. The final online age prediction tools (QRNN and QRSVM) developed in our study have been placed in the open-access Snipper forensic classification website and are freely available at: http://mathgene.usc.es/cgi-bin/snps/age_tools/processmethylation-blood_2–104.cgi.
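
      As an illustration of the validation scheme described above, the following Python sketch partitions 895 sample indices into ten folds and computes the three performance metrics for a held-out fold. It is not the in-house R script or the cvTools implementation used in the study; the function names, the interleaved fold assignment and the toy ages are assumptions made only for this example.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def performance_metrics(true_age, pred_age, lower, upper):
    """Return (MAE, RMSE, %CP±PI) for one held-out fold."""
    errors = [p - t for t, p in zip(true_age, pred_age)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # A prediction counts as correct when the true age lies inside [q10, q90].
    inside = sum(lo <= t <= hi for t, lo, hi in zip(true_age, lower, upper))
    return mae, rmse, 100.0 * inside / len(true_age)


# N = 895 and k = 10 as in the text: each fold holds roughly 10% of the data.
folds = kfold_indices(895, k=10)
fold_sizes = sorted(len(f) for f in folds)  # folds of 89 or 90 samples

# Toy ages and intervals (not study data) to exercise the metric definitions.
mae, rmse, cp = performance_metrics(
    true_age=[10, 40, 70, 90],
    pred_age=[12, 37, 70, 95],
    lower=[8, 33, 62, 91],
    upper=[15, 42, 79, 101],
)
```

      In a full run, each fold would serve once as the test set while a model is fitted on the other nine, and the reported metrics would be the averages over the ten runs.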

      3. Results

      3.1 Association of seven CpG sites with chronological age using Agena Bioscience EpiTYPER®

      Seven age-correlated CpG sites from the genomic regions ELOVL2 (CR_1_CpG_9: cg21572722), ASPA (CR_2_CpG_3: cg02228185), PDE4C (CR_4_CpG_27.28.29: no CpG ID), FHL2 (CR_12_1_CpG_3: cg06639320), CCDC102B (CR_13_CpG_2: cg19283806), MIR29B2CHG (CR_21_CpG_11: no CpG ID) and chr16:85395429 (CR_23_CpG_3: cg07082267) were selected according to a previous age prediction model initially created for application to adult samples (CpG details in Table 3 of [
      • Freire-Aradas A.
      • Phillips C.
      • Mosquera-Miguel A.
      • Girón-Santamaría L.
      • Gómez-Tato A.
      • Casares De Cal M.
      • et al.
      Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
      ]). These epigenetic markers were analyzed in the present study using EpiTYPER® in a total of 895 individuals ranging from 2 to 104 years old. Reproducibility of results was confirmed using two internal controls that were included in all analyses. Fig. 1 represents the corresponding DNA methylation values compared with the chronological age. While ELOVL2, PDE4C and FHL2 displayed hypermethylation with age, a decrease in the DNA methylation levels was observed for ASPA, CCDC102B, MIR29B2CHG and chr16:85395429. The strongest age-correlation was observed in ELOVL2 (rs: 0.97), followed by MIR29B2CHG (rs: −0.95), FHL2 (rs: 0.94), PDE4C (rs: 0.94), CCDC102B (rs: −0.93), chr16:85395429 (rs: −0.92) and lastly ASPA (rs: −0.86). Inter-individual variability was similar at all ages in most markers (average SD <0.05), except ASPA (average SD: 0.068) and MIR29B2CHG (average SD: 0.063), both of which displayed gradually increasing inter-individual dispersion with age, starting from the age of 40 and 30 years old, respectively.
      Fig. 1. Dispersion plots (DNA methylation levels compared with chronological age) for ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 markers using DNA blood samples from 895 Spanish individuals (2–104 years old).
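
      The Spearman correlation (rs) used to rank the markers is the Pearson correlation of the ranked values. The sketch below, in plain Python rather than the R software used in the study, illustrates the calculation for one hypermethylated and one hypomethylated marker. The methylation values are invented toy data and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values (not study data): one marker gaining and one losing methylation.
ages = [5, 20, 35, 50, 65, 80]
hyper = [0.20, 0.35, 0.48, 0.60, 0.71, 0.85]  # rises monotonically with age
hypo = [0.90, 0.80, 0.82, 0.60, 0.50, 0.40]   # falls, with one out-of-order pair
rs_hyper = spearman_rs(ages, hyper)
rs_hypo = spearman_rs(ages, hypo)
```

      A positive rs corresponds to hypermethylation with age and a negative rs to hypomethylation, matching the signs reported for the seven markers.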

      3.2 Development and validation of a full coverage age prediction model based on EpiTYPER®: from children to the elderly

      In view of the high age-correlation displayed by the markers assessed in Section 3.1, the seven CpG sites detected using Agena Bioscience EpiTYPER® were used to develop a common epigenetic clock from childhood to old age. Table 1 describes the performance metrics for the training set composed of 895 individuals ranging from 2 to 104 years old, comparing the QR, QRNN and QRSVM statistical models and based on k-fold cross-validation (average values representing the ten clusters). Cluster-specific cross-validations have been detailed in Supplementary Material S1. According to Table 1, QR provided a MAE: ± 3.75, RMSE: 5.23 and %CP±PI: 78.77%. Despite the accuracy of these results, the observed metrics were improved by applying additional models. Both QRNN and QRSVM displayed similar metrics: MAE: ± 3.36, ± 3.41; RMSE: 4.83, 4.78 and %CP±PI: 81.45%, 79.66%, respectively. Statistically significant differences between errors (p-value <0.05) were found between QR and QRNN, and between QR and QRSVM. However, when comparing errors obtained using QRNN and QRSVM, no statistically significant differences were found (p-value >0.05). Based on these results, subsequent analyses were constrained to QRNN and QRSVM. Fig. 2A and 2B show the epigenetic age versus the chronological age for QRNN and QRSVM models respectively. Continuous grey and black lines represent the perfect and fitted correlation, respectively; showing practical overlap between both lines in both models. A continuous correlation of the epigenetic age with the chronological age is evident from persons at early life stages to centenarians (R2: 0.9669 for QRNN and 0.9679 for QRSVM). 
However, inter-individual variation increases with age: the prediction intervals (minimum and maximum predicted ages, displayed by the discontinuous dark red lines) are narrow in children and young adults and gradually widen with age. For this reason, considering both the predicted age and the prediction interval improves the accuracy of the reported results and avoids a higher rate of misclassification. Sex was not detected as a confounding factor (p-value >0.05). The QRNN and QRSVM age prediction models are freely available from the open-access Snipper forensic classification website described in Materials and methods. Detailed information regarding the underlying data used for building the models implemented on the website can be found in Supplementary Material S2.
      Table 1. Performance metrics for the training set, comprising 895 individuals, 2–104 years old, and the testing set, comprising 152 individuals, 3–69 years old; analyzing seven CpG sites (ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429) detected with EpiTYPER®. The performance metrics for the training set are based on k-fold cross-validation (average values representing the ten clusters). MAE: mean absolute prediction error, RMSE: root-mean-square error, %CP±PI: percent of correct classifications within the prediction intervals, QR: quantile regression, QRNN: quantile regression neural network, QRSVM: quantile regression support vector machine.

      Sample set   Model   MAE      RMSE   %CP±PI
      Training     QR      ± 3.75   5.23   78.77%
      Training     QRNN    ± 3.36   4.83   81.45%
      Training     QRSVM   ± 3.41   4.78   79.66%
      Testing      QRNN    ± 3.32   4.51   76.32%
      Testing      QRSVM   ± 3.45   4.75   77.63%

      Fig. 2. Epigenetic age compared to chronological age for the training set of 895 Spanish individuals (2–104 years old) based on seven CpG sites in ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 regions. The black diagonal line represents the 0.5 quantile regression line between epigenetic age and chronological age and the discontinuous (dark red) lines, the corresponding 0.1 and 0.9 quantile regression limits. The grey line is the diagonal line representing perfect correlation. The underlying statistical model was based on either QRNN (A) or QRSVM (B).
      Further validation of the full coverage age prediction models was performed using a total of 152 independent blood samples ranging in age from 3 to 69 years old. Table 1 summarizes the corresponding performance metrics. Errors were similar to those of the corresponding training sets in QRNN and QRSVM models (MAE: ± 3.32, ± 3.45 and RMSE: 4.51, 4.75; respectively). The percentage of correct predictions, although similar, was slightly lower than the corresponding training set %CP ± PI values (76.32% and 77.63%, respectively).
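
      The q10 and q90 limits reported in this section are quantile estimates. Quantile regression methods are built around the pinball (check) loss, which penalizes under- and over-prediction asymmetrically so that its minimizer is the requested quantile. The Python sketch below illustrates this for the simplest case of a constant prediction. It is not the rq, mcqrnn.fit or qtSVM code used in the study, and the ages and function names are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (check) loss of predictions y_pred at quantile level tau."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


def best_constant(y_true, tau, candidates):
    """Candidate constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), tau))


# Toy ages (not study data): 1, 2, ..., 99.
ages = list(range(1, 100))
q10 = best_constant(ages, 0.1, ages)
q50 = best_constant(ages, 0.5, ages)
q90 = best_constant(ages, 0.9, ages)

# Share of values falling inside the [q10, q90] interval.
coverage = sum(q10 <= a <= q90 for a in ages) / len(ages)
```

      In the regression setting the constant is replaced by a function of the methylation levels, so the interval width can change with the predicted age, which is the behavior visible in Fig. 2.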

      3.3 Dimensionality reduction of the training set

      To develop the present epigenetic clocks, a total of 895 individuals were used for training the models. In order to capture a high variety of potential inter-individual differences, each year of age was represented by approximately 10 individuals. However, some accurate reported age prediction models have been developed using reduced training sets composed of an average of one individual per year (i.e., using N ≈ 100). Aiming to establish an optimal sample size for training an age prediction model, and in order to get the minimum error covering the highest inter-individual variation, an assessment of stepwise-reduced training sets was performed considering an approximate total number of individuals per year of 10, 8, 6, 4, 2 and 1 (ntrain), corresponding to a total number of samples in the training set of 895 (N = 441 males, N = 454 females), 707 (N = 334 males, N = 373 females), 552 (N = 253 males, N = 299 females), 379 (N = 171 males, N = 208 females), 195 (N = 89 males, N = 106 females) and 99 (N = 45 males, N = 54 females), respectively (Ntrain). Specific distributions per age and sex can be found for each analyzed training set in Supplementary Fig. S1-S6. Fig. 3 shows the MAE and correct classification rate within the prediction intervals using both QRNN and QRSVM models for the six explored Ntrain combinations. Table 2 summarizes the underlying performance metrics for the training set using the different Ntrain combinations based on k-fold cross-validation (average values representing the ten clusters). Cluster-specific cross-validations, as well as the corresponding learning curve plots, can be found in Supplementary Material S3. According to Table 2, the MAE values from QRNN showed a gradual increase when decreasing the sample size from ntrain = 10 to ntrain = 1 (from ± 3.36 to ± 4.82). 
Although similar variations were found for the MAE from QRSVM, these were smaller in scale (from ± 3.41 to ± 4.07), providing a more stable error and an improved sample size independence. A gradual decrease in the QRNN correct classification rate (%CP ± PI) was observed accordingly when constraining the training set (from 81.45% to 60.67%). Similarly, when analyzing the same data with the QRSVM model, although a decrease was also detected, it was more stable across different Ntrain combinations (from 79.66% to 74%). Despite these variations between QRNN and QRSVM, no statistical differences were detected between models and across ntrain numbers (p-value >0.05), except for ntrain= 1 (p-value: 0.0144).
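      One way to build stepwise-reduced training sets of this kind is to draw at most a fixed number of individuals from each year of age. The sketch below assumes simple random sampling within each age stratum; the text does not state how the reduced sets were drawn or how sex balance was handled, so the function name, the seed and the toy cohort (exactly ten individuals per age, unlike the real cohort of 895) are illustrative assumptions only.

```python
import random
from collections import defaultdict

def subsample_per_year(samples, n_per_year, seed=0):
    """Keep at most n_per_year randomly chosen samples for each age.

    samples: list of (sample_id, age_in_years) tuples.
    Returns a new list of (sample_id, age_in_years) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_age = defaultdict(list)
    for sample_id, age in samples:
        by_age[age].append(sample_id)

    selected = []
    for age in sorted(by_age):
        ids = by_age[age]
        k = min(n_per_year, len(ids))  # some ages may have fewer samples
        selected.extend((sid, age) for sid in rng.sample(ids, k))
    return selected

# Hypothetical cohort: ages 2 to 104 with exactly ten individuals each.
toy_cohort = [(f"S{age}_{i}", age) for age in range(2, 105) for i in range(10)]

for n in (10, 8, 6, 4, 2, 1):
    reduced = subsample_per_year(toy_cohort, n)
    print(n, len(reduced))
```

      Each reduced set would then be used to retrain and cross-validate the models, so that error and coverage can be compared across sample sizes.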
      Fig. 3. Mean absolute prediction error (MAE) and percentage of correct classifications within the prediction intervals (%CP ± PI) using both QRNN and QRSVM models for the six Ntrain combinations evaluated.
      Table 2. Stepwise assessment of six training sets, comprising samples of individuals in the age range 2–104 years old. The performance metrics for the training sets are based on k-fold cross-validation (average values representing the ten clusters). Ntrain: sample size of the training set, ntrain: individuals per year at the training set, MAE: mean absolute prediction error, RMSE: root-mean-square error, %CP ± PI: percent of correct classifications within the prediction intervals, QRNN: quantile regression neural network, QRSVM: quantile regression support vector machine.

      Model    Ntrain   ntrain   MAE       RMSE   %CP ± PI
      QRNN     895      10       ± 3.36    4.83   81.45%
      QRSVM    895      10       ± 3.41    4.78   79.66%
      QRNN     707      8        ± 3.64    5.23   81.47%
      QRSVM    707      8        ± 3.58    4.99   79.48%
      QRNN     552      6        ± 3.82    5.44   80.96%
      QRSVM    552      6        ± 3.69    5.11   79.51%
      QRNN     379      4        ± 4.04    5.43   78.33%
      QRSVM    379      4        ± 3.76    4.94   77.8%
      QRNN     195      2        ± 4.77    6.69   68.76%
      QRSVM    195      2        ± 4.01    5.45   75.29%
      QRNN     99       1        ± 4.82    6.41   60.67%
      QRSVM    99       1        ± 4.07    5.28   74%

      To establish an optimal minimum sample size, it is important to consider that the q10 and q90 quantiles are used in the QRNN and QRSVM models to calculate the prediction intervals, which determine the percentage of correct classifications (%CP ± PI). Since the interval between these quantiles spans 80% of the distribution, about 80% of the samples should be correctly predicted for the model to be considered as working properly. According to Fig. 3 and Table 2, ntrain = 6, in this case corresponding to Ntrain = 552, would be the most appropriate minimum sample size, retaining accurate age predictions (%CP ± PI ≈ 80%) while covering the broadest inter-individual variation. However, it is important to note that QRSVM is less susceptible than QRNN to shrinking of the training set (difference in MAE between Ntrain = 895 and Ntrain = 99: 1.46 and 0.66 years; difference in %CP ± PI between Ntrain = 895 and Ntrain = 99: 20.78 and 5.66 percentage points, for QRNN and QRSVM, respectively). The learning curve plots based on the MAE of each training set tested also show more stability when a minimum of ntrain = 6 is analyzed (Supplementary Material S3). Corresponding data for an independent testing set (N = 152) can be found in Supplementary Table S1. Although the patterns are not exactly as displayed in Table 2, ntrain = 6 continues to be the optimum sample size according to the criteria applied above.
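      The comparison above can be reproduced directly from the MAE and %CP ± PI columns of Table 2. In the sketch below, the dictionary layout and function names are illustrative, and the 79% cut-off used to express "approximately 80%" is an assumption chosen for this example rather than a criterion stated in the text.

```python
# (MAE, %CP ± PI) per model and ntrain, copied from Table 2.
TABLE2 = {
    "QRNN": {10: (3.36, 81.45), 8: (3.64, 81.47), 6: (3.82, 80.96),
             4: (4.04, 78.33), 2: (4.77, 68.76), 1: (4.82, 60.67)},
    "QRSVM": {10: (3.41, 79.66), 8: (3.58, 79.48), 6: (3.69, 79.51),
              4: (3.76, 77.80), 2: (4.01, 75.29), 1: (4.07, 74.00)},
}

def degradation(model):
    """Change in MAE and coverage when going from ntrain=10 to ntrain=1."""
    mae_full, cp_full = TABLE2[model][10]
    mae_min, cp_min = TABLE2[model][1]
    return round(mae_min - mae_full, 2), round(cp_full - cp_min, 2)

def smallest_ntrain_meeting(threshold=79.0):
    """Smallest ntrain at which both models reach the coverage threshold.

    The default threshold is a hypothetical reading of "about 80%".
    """
    for ntrain in sorted(TABLE2["QRNN"]):
        if all(TABLE2[m][ntrain][1] >= threshold for m in TABLE2):
            return ntrain
    return None

print(degradation("QRNN"))        # MAE increase, coverage loss
print(degradation("QRSVM"))
print(smallest_ntrain_meeting())
```

      Under this assumed cut-off the selection lands on six individuals per year, which matches the recommendation in the text; a stricter or looser cut-off would need to be justified separately.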

      4. Discussion

      Age estimation is a DNA intelligence tool that aims to add information to the genetic profile in different scenarios, including: i) individual identification, ii) mass disaster screening, iii) forensic anthropology and iv) legal disputes about age. Consequently, epigenetic clocks are being implemented in forensic practice as a supplementary analysis for individual age prediction. In general terms, most of the forensic age prediction models to date have been based on adult samples [Jung et al., 2019; Naue et al., 2017], but minors, and the variation in methylation patterns they show, must also be taken into account in the development of universally applicable forensic analyses. Horvath’s clock, although including both adults and minors, applies a different statistical treatment to each age range: a logarithmic transformation for ages below 20 years and an untransformed linear model for ages above 20 years [Horvath, 2013]. This logarithmic transformation reflects an exponential change in DNA methylation levels at early stages of the individual’s life [Alisch et al., 2012]. Despite the high level of coverage of all ages used to develop Horvath’s model, it is based on an impractically large number of markers (353 CpG sites), representing a major drawback for forensic testing, due to the poor quality and/or quantity of DNA associated with most casework samples.
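      The two-regime treatment of age described for Horvath’s clock can be sketched as a piecewise transformation that is logarithmic up to an adult-age threshold of 20 years and linear above it. The form below follows the transformation as it is generally given for that clock; it is not part of the models developed in the present study, and the function names are illustrative.

```python
import math

ADULT_AGE = 20  # threshold separating the logarithmic and linear regimes

def age_transform(age, adult_age=ADULT_AGE):
    """Map chronological age onto the scale used for regression."""
    if age <= adult_age:
        # Logarithmic regime: methylation changes fast early in life.
        return math.log(age + 1) - math.log(adult_age + 1)
    # Linear regime for adults.
    return (age - adult_age) / (adult_age + 1)

def inverse_age_transform(value, adult_age=ADULT_AGE):
    """Map a value on the transformed scale back to age in years."""
    if value < 0:
        return (adult_age + 1) * math.exp(value) - 1
    return (adult_age + 1) * value + adult_age

for age in (2, 10, 20, 41, 80):
    transformed = age_transform(age)
    print(age, round(transformed, 3), round(inverse_age_transform(transformed), 3))
```

      The two branches meet at the threshold, where the transformed value is zero, so the mapping is continuous and invertible across the whole age range.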
      Minors have already been included in certain previous forensic epigenetic clocks, but these cover dispersed datapoints for age ranges under 18 years [Weidner et al., 2014; Zbieć-Piekarska et al., 2015; Aliferi et al., 2018]. To improve this area of study, we developed a specific age prediction model for children and adolescents [Freire-Aradas et al., 2018]. However, when a biological sample is found, there is generally no information indicating whether the donor is a minor or an adult. Therefore, the most useful epigenetic clock will be one unifying both age ranges into a single test model. A recent study from Woźniak et al. considered the whole range of ages from 1 to 75 years old (N = 112) to build a novel age prediction model [Woźniak et al., 2021], obtaining an MAE of ± 3.2 years for blood samples. In Woźniak’s model, minors were represented from 0 to 18 years old with about one individual per year. Following this study, we aimed to go a step further by covering as much as possible of the potential inter-individual epigenetic variability. This was achieved by covering the fullest interval (2–104 years old) with approximately ten individuals per year of age (N = 895). By analyzing previously developed CpG sites in the seven genomic regions of ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429, using EpiTYPER technology, we were able to retain a robust and efficient age prediction model [Freire-Aradas et al., 2016], with four of these loci overlapping the recently released VISAGE Enhanced Tool for age prediction [Woźniak et al., 2021]. Additional CpG sites especially informative in childhood and adolescence, such as KCNAB3, had also been considered for analysis [Freire-Aradas et al., 2018]. However, since the aim was to build a single epigenetic clock covering all ages from childhood to old age, KCNAB3 was discarded because the correlation of its DNA methylation levels with chronological age is not linear across all ages: methylation increases exponentially during childhood but remains much more stable across adulthood (see Fig. 1 in [Freire-Aradas et al., 2018]). This extreme lack of linearity between both age groups prevents the inclusion of this marker in a common epigenetic clock.
      In addition to marker selection, and although the age range and sample size of the training set were key factors in adapting our age prediction model, the underlying statistical model also plays an important role. To date, the application of linear regression [Weidner et al., 2014; Zbieć-Piekarska et al., 2015; Jung et al., 2019; Woźniak et al., 2021; Eipel et al., 2016] has been widely accepted. Since DNA methylation is quantitative in nature and gradually changes through the individual’s lifetime, linear regression models fit well with age estimation based on this epigenetic signature. Quadratic regression models or power transformations have also been applied in cases where the change of DNA methylation levels with chronological age follows non-linear patterns [Bekaert et al., 2015; Woźniak et al., 2021]. Recently, novel statistical tools based on machine learning have been introduced [Aliferi et al., 2018; Aliferi et al., 2021; Naue et al., 2017; Spólnicka et al., 2018]. Common to all these models is the fact that a single error is obtained, independent of the age of the sample donor, and applied to whatever age is predicted. Nevertheless, DNA methylation data consistently show that young ages are better predicted than old ages, and this has been observed in our dataset as well. Fig. 2 depicts the epigenetic age versus the chronological age for all the individuals from 2 to 104 years old. While the youngest subjects (under 20 years) have datapoints very closely positioned together, this pattern gradually changes towards the oldest samples (over 80 years), which show the highest dispersion between datapoints. Inter-individual epigenetic variation is expected since epigenetics derives from an interaction between genetics and environment. Consequently, the longer two age-matched individuals have been exposed to different external factors, the larger the epigenetic differences expected between them, in contrast to the earliest stages of life. In order to improve the accuracy of predictions, specific age-dependent errors can be applied by using statistical models based on quantile regression [Freire-Aradas et al., 2016; Smeers et al., 2018]. Inter-individual epigenetic variation could also occur among populations affected by the different environmental conditions to which individuals are exposed. In our study, we used a Spanish cohort to build and validate the proposed age prediction model. Further validation will be needed to demonstrate that our model can be used in different worldwide population groups.
      In the present study, the QR, QRNN and QRSVM statistical prediction models have been tested for age estimation. The highest accuracy was obtained for QRNN in terms of MAE (± 3.36) and %CP ± PI (81.45%). Nevertheless, no statistically significant differences (p-value > 0.05) were found between QRNN and QRSVM. Therefore, QRNN and QRSVM were both selected as the most accurate age prediction models and subsequent analyses were constrained to these two methods. Validation of the models with an independent set of samples (N = 152) produced similar results (MAE: ± 3.32, ± 3.45 and %CP ± PI: 76.32%, 77.63%; for QRNN and QRSVM, respectively). Nevertheless, since the testing set was restricted to individuals up to 69 years old, further analysis of older samples is required in order to validate these results in old age. Errors obtained in the present work were similar to those of previous models [Zbieć-Piekarska et al., 2015; Jung et al., 2019; Aliferi et al., 2018; Naue et al., 2017] (MAE: ± 3.9, ± 3.48, ± 4.1 and ± 3.24, respectively); nevertheless, these errors were fixed regardless of the predicted age. The main advantage of using age-dependent errors, such as those from the quantile regression models used here, is the ability to narrow down the errors at early stages of life and to increase them at older ages, where inter-individual epigenetic variability plays an important role.
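      Age-dependent intervals of this kind arise because quantile regression fits each quantile separately by minimizing an asymmetric loss, commonly called the check or pinball loss. The sketch below shows only that standard loss for a chosen quantile level tau; it is not the fitting procedure of the QRNN or QRSVM implementations used in the study, which may rely on smoothed variants, and the function names and toy values are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Check (pinball) loss for one observation at quantile level tau.

    Under-prediction is weighted by tau, over-prediction by (1 - tau).
    """
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1) * diff)

def mean_pinball_loss(true_values, predicted_values, tau):
    """Average pinball loss over paired observations."""
    losses = [
        pinball_loss(t, p, tau)
        for t, p in zip(true_values, predicted_values)
    ]
    return sum(losses) / len(losses)

# For the upper bound (tau = 0.9), predicting too low costs more than
# predicting too high, which pushes the fitted curve towards the top of
# the data; the lower bound (tau = 0.1) behaves in the opposite way.
print(pinball_loss(10, 8, 0.9))   # true age above the predicted bound
print(pinball_loss(8, 10, 0.9))   # true age below the predicted bound
print(mean_pinball_loss([30, 60], [28, 66], 0.1))
```

      Because the q10 and q90 curves are fitted independently, the distance between them can widen wherever the data are more dispersed, which is what allows the prediction interval to grow with age.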
      Finally, the sample size of the training set is considered a key factor in developing an accurate age prediction model: the more samples are included, the more inter-individual epigenetic variation can be properly gauged, resulting in lower errors and a higher number of correct classifications. Given that ~80% of samples should be correctly predicted when the q10 and q90 quantiles are applied, the stepwise dimensionality reduction we performed indicated that Ntrain = 552 (6 individuals per year of age) gave an optimum balance between sample size and predictive accuracy. With six individuals per year, the training set yielded an MAE of ± 3.82 for QRNN and ± 3.69 for QRSVM. Correct classification rates were 80.96% for QRNN and 79.51% for QRSVM. In this analysis, QRSVM proved less susceptible than QRNN to shrinking of the training set and could therefore be preferred when only a low number of samples is available. Similar metrics to those of the training set were obtained for the corresponding testing set (MAE: ± 3.03, ± 3.56 and %CP ± PI: 84.87%, 78.29% for QRNN and QRSVM, respectively). However, the patterns displayed by the testing set under some of the stepwise-reduced training sets did not exactly follow those of the corresponding training sets. This could be explained by the reduced age range of the testing set (3–69 years old) in comparison to the training sets (2–104 years old). In summary, to cover a maximum level of inter-individual variability, six individuals per year of age are recommended for the development of future epigenetic clocks which aim to cover the complete range of human ages.
      As a final remark, it should be taken into account that the data underlying the age prediction models developed in this study were generated using EpiTYPER technology, a system that requires high quantities of genomic DNA (300 ng). In order to directly apply these models to forensic specimens, which usually present low quality and/or quantity of DNA, a further step will be to implement these age predictors on forensic technologies such as SNaPshot or Massively Parallel Sequencing, systems able to handle minor amounts of genomic DNA for methylation analyses of forensic casework.

      Acknowledgements

      AFA was supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (Modalidade B, ED481B 2018/010). The National DNA Bank Carlos III is supported by ISCIII, Ministry of Science and Innovation, Spain (PT13/0001/0037, PT13/0010/0067). The Murcia Twin Registry is supported by the Seneca Foundation, Regional Agency for Science and Technology, Murcia, Spain (15302/PHCS/10) and Ministry of Science and Innovation, Spain (PSI11560–2009). We gratefully acknowledge the sample volunteers and the BioBank IBSP-CV (PT13/0010/0064), integrated in the Spanish National Biobanks Network and Valencian Biobanking Network, for their collaboration.

      Appendix A. Supplementary material

      References

        • Gibney E.R.
        • Nolan C.M.
        Epigenetics and gene expression.
        Hered. (Edinb.). Nat. Publ. Group. 2010; 105: 4-13
        • Riggs A.
        • Russo V.
        • Martienssen R.
        Epigenetic Mechanisms Of Gene Regulation.
        Cold Spring Harbor Laboratory Press, Plainview, N.Y., 1996
        • Saha A.
        • Wittmeyer J.
        • Cairns B.R.
        Chromatin remodelling: the industrial revolution of DNA around histones.
        Nat. Rev. Mol. Cell Biol. 2006; 7: 437-447
        • Strahl B.D.
        • Allis C.D.
        The language of covalent histone modifications.
        Nature. 2000; 403: 41-45
        • Lee J.T.
        Epigenetic regulation by long noncoding RNAs.
        Science. 2012; 338: 1435-1439
        • Jones P.A.
        Functions of DNA methylation: islands, start sites, gene bodies and beyond.
        Nat. Rev. Genet. Nat. Publ. Group. 2012; 13: 484-492
        • Smith Z.D.
        • Meissner A.
        DNA methylation: roles in mammalian development.
        Nat. Rev. Genet. 2013; 14: 204-220
        • Schübeler D.
        Function and information content of DNA methylation.
        Nature. 2015; 517: 321-326
        • Rakyan V.K.
        • Down T.A.
        • Maslau S.
        • Andrew T.
        • Yang T.P.
        • Beyan H.
        • et al.
        Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains.
        Genome Res. 2010; 20: 434-439
        • Bocklandt S.
        • Lin W.
        • Sehl M.
        • Sánchez F.
        • Sinsheimer J.
        • Horvath S.
        • et al.
        Epigenetic Predictor of Age.
        PLoS One. 2011; 6: e14821
        • Garagnani P.
        • Bacalini M.G.
        • Pirazzini C.
        • Gori D.
        • Giuliani C.
        • Mari D.
        • et al.
        Methylation of ELOVL2 gene as a new epigenetic marker of age.
        Aging Cell. 2012; 11: 1132-1134
        • Hannum G.
        • Guinney J.
        • Zhao L.
        • Zhang L.
        • Hughes G.
        • Sadda S.
        • et al.
        Genome-wide methylation profiles reveal quantitative views of human aging rates.
        Mol. Cell. 2013; 49: 359-367
        • Horvath S.
        DNA methylation age of human tissues and cell types.
        Genome Biol. 2013; 14: R115
        • Johansson Å.
        • Enroth S.
        • Gyllensten U.
        Continuous aging of the human DNA methylome throughout the human lifespan.
        PLoS One. 2013; 8: e67378
        • Florath I.
        • Butterbach K.
        • Müller H.
        • Bewerunge-hudler M.
        • Brenner H.
        Cross-sectional and longitudinal changes in DNA methylation with age: An epigenome-wide analysis revealing over 60 novel age-associated CpG sites.
        Hum. Mol. Genet. 2014; 23: 1186-1201
        • Alsaleh H.
        • Haddrill P.R.
        Identifying blood-specific age-related DNA methylation markers on the Illumina MethylationEPIC® BeadChip.
        Forensic Sci. Int. Elsevier Ireland Ltd, 2019; 303: 109944
        • Merid S.K.
        • Novoloaca A.
        • Sharp G.C.
        • Küpers L.K.
        • Kho A.T.
        • Roy R.
        • et al.
        Epigenome-wide meta-analysis of blood DNA methylation in newborns and children identifies numerous loci related to gestational age.
        Genome Med. 2020; 12: 1-17
        • Jung M.
        • Pfeifer G.P.
        Aging and DNA methylation.
        BMC Biol. 2015; 13: 7
        • Freire-Aradas A.
        • Phillips C.
        • Lareu M.V.
        Forensic individual age estimation with DNA: From initial approaches to methylation tests.
        Forensic Sci. Rev. 2017; 29: 121-144
        • Levine M.E.
        • Lu A.T.
        • Quach A.
        • Chen B.H.
        • Assimes T.L.
        • Bandinelli S.
        • et al.
        An epigenetic biomarker of aging for lifespan and healthspan.
        Aging (Albany NY). 2018; 10: 573-591
        • Lu A.T.
        • Quach A.
        • Wilson J.G.
        • Reiner A.P.
        • Aviv A.
        • Raj K.
        • et al.
        DNA methylation GrimAge strongly predicts lifespan and healthspan.
        Aging (Albany NY). 2019; 11: 303-327
        • Noroozi R.
        • Ghafouri-Fard S.
        • Pisarek A.
        • Rudnicka J.
        • Spólnicka M.
        • Branicki W.
        • et al.
        DNA methylation-based age clocks: From age prediction to age reversion.
        Ageing Res. Rev. 2021; 68: 101314
        • Bibikova M.
        • Barnes B.
        • Tsan C.
        • Ho V.
        • Klotzle B.
        • Le J.M.
        • et al.
        High density DNA methylation array with single CpG site resolution.
        Genomics. Elsevier Inc, 2011; 98: 288-295
        • Weidner C.I.
        • Lin Q.
        • Koch C.M.
        • Eisele L.
        • Beier F.
        • Ziegler P.
        • et al.
        Aging of blood can be tracked by DNA methylation changes at just three CpG sites.
        Genome Biol. 2014; 15: R24
        • Zbieć-Piekarska R.
        • Spólnicka M.
        • Kupiec T.
        • Parys-Proszek A.
        • Makowska Z.
        • Pałeczka A.
        • et al.
        Development of a forensically useful age prediction method based on DNA methylation analysis.
        Forensic Sci. Int Genet. 2015; 17: 173-179
        • Freire-Aradas A.
        • Phillips C.
        • Mosquera-Miguel A.
        • Girón-Santamaría L.
        • Gómez-Tato A.
        Casares De Cal M, et al. Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system.
        Forensic Sci. Int Genet. 2016; 24: 65-74
        • Hong S.R.
        • Jung S.E.
        • Lee E.H.
        • Shin K.J.
        • Yang W.I.
        • Lee H.Y.
        DNA methylation-based age prediction from saliva: High age predictability by combination of 7 CpG markers.
        Forensic Sci. Int Genet. 2017; 29: 118-125
        • Jung S.E.
        • Lim S.M.
        • Hong S.R.
        • Lee E.H.
        • Shin K.J.
        • Lee H.Y.
        DNA methylation of the ELOVL2, FHL2, KLF14, C1orf132/MIR29B2C, and TRIM59 genes for age prediction from blood, saliva, and buccal swab samples.
        Forensic Sci. Int. Genet. Elsevier, 2019; 38: 1-8
        • Lee H.Y.
        • Jung S.E.
        • Oh Y.N.
        • Choi A.
        • Yang W.I.
        • Shin K.J.
        Epigenetic age signatures in the forensically relevant body fluid of semen: A preliminary study.
        Forensic Sci. Int Genet. 2015; 19: 28-34
        • Bekaert B.
        • Kamalandua A.
        • Zapico S.C.
        • Van De Voorde W.
        • Decorte R.
        Improved age determination of blood and teeth samples using a selected set of DNA methylation markers.
        Epigenetics. 2015; 10: 922-930
        • Lee H.Y.
        • Hong S.R.
        • Lee J.E.
        • Hwang I.K.
        • Kim N.Y.
        • Lee J.M.
        • et al.
        Epigenetic age signatures in bones.
        Forensic Sci. Int. Genet. 2020; 46
        • Fleckhaus J.
        • Schneider P.M.
        Novel multiplex strategy for DNA methylation-based age prediction from small amounts of DNA via Pyrosequencing.
        Forensic Sci. Int. Genet. 2020; 44: 102189
        • Zubakov D.
        • Liu F.
        • Kokmeijer I.
        • Choi Y.
        • van Meurs J.B.J.
        • van IJcken W.F.J.
        • et al.
        Human age estimation from blood using mRNA, DNA methylation, DNA rearrangement, and telomere length.
        Forensic Sci. Int. Genet. 2016; 24: 33-43
        • Aliferi A.
        • Ballard D.
        • Gallidabino M.D.
        • Thurtle H.
        • Barron L.
        • Syndercombe Court D.
        DNA methylation-based age prediction using massively parallel sequencing data and multiple machine learning models.
        Forensic Sci. Int. Genet. Elsevier, 2018: 215-226
        • Heidegger A.
        • Xavier C.
        • Niederstätter H.
        • de la Puente M.
        • Pośpiech E.
        • Pisarek A.
        • et al.
        Development and optimization of the VISAGE basic prototype tool for forensic age estimation.
        Forensic Sci. Int. Genet. Elsevier, 2020: 102322
        • Woźniak A.
        • Heidegger A.
        • Piniewska-róg D.
        • Pośpiech E.
        • Pisarek A.
        • Kartasińska E.
        • et al.
        Development of the VISAGE enhanced tool and statistical models for epigenetic age estimation in blood, buccal cells and bones.
        Aging (Albany NY). 2021; 13
        • Aliferi A.
        • Sundaram S.
        • Ballard D.
        • Freire-Aradas A.
        • Phillips C.
        • Lareu M.V.
        • et al.
        Combining current knowledge on DNA methylation-based age estimation towards the development of a superior forensic DNA intelligence tool.
        Forensic Sci. Int. Genet. 2021; 102637
        • Smeers I.
        • Decorte R.
        • Van de Voorde W.
        • Bekaert B.
        Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation.
        Forensic Sci. Int. Genet. Elsevier, 2018: 128-133
        • Xu C.
        • Qu H.
        • Wang G.
        • Xie B.
        • Shi Y.
        • Yang Y.
        • et al.
        A novel strategy for forensic age prediction by DNA methylation and support vector regression model.
        Sci. Rep. 2015; 5: 17788
        • Vidaki A.
        • Ballard D.
        • Aliferi A.
        • Miller T.H.
        • Barron L.P.
        • Syndercombe Court D.
        DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.
        Forensic Sci. Int Genet. 2017; 28: 225-236
        • Naue J.
        • Hoefsloot H.C.J.
        • Mook O.R.F.
        • Rijlaarsdam-Hoekstra L.
        • van der Zwalm M.C.H.
        • Henneman P.
        • et al.
        Chronological age prediction based on DNA methylation: Massive parallel sequencing and random forest regression.
        Forensic Sci. Int Genet. 2017; 31: 19-28
        • Freire-Aradas A.
        • Phillips C.
        • Girón-Santamaría L.
        • Mosquera-Miguel A.
        • Gómez-Tato A.
        • Casares de Cal M.Á.
        • et al.
        Tracking age-correlated DNA methylation markers in the young.
        Forensic Sci Int Genet. Elsevier, 2018: 50-59
        • Alisch R.S.
        • Barwick B.G.
        • Chopra P.
        • Myrick L.K.
        • Satten G.A.
        • Conneely K.N.
        • et al.
        Age-associated DNA methylation in pediatric populations.
        Genome Res. 2012; 22: 623-632
        • Fraga M.F.
        • Ballestar E.
        • Paz M.F.
        • Ropero S.
        • Setien F.
        • Ballestar M.L.
        • et al.
        Epigenetic differences arise during the lifetime of monozygotic twins.
        Proc. Natl. Acad. Sci. USA. 2005; 102: 10604-10609
        • Cho S.
        • Jung S.E.
        • Hong S.R.
        • Lee E.H.
        • Lee J.H.
        • Lee S.D.
        • et al.
        Independent validation of DNA-based approaches for age prediction in blood.
        Forensic Sci. Int Genet. 2017; 29: 250-256
      1. Ehrich M., Correll D., van den Boom D. Introduction to EpiTYPER for quantitative DNA methylation analysis using the MassARRAY® System. Seq Appl Note [Internet]. 2006;Doc. No. 8(8876):1–8. Available from: www.sequenom.com.

      2. Koenker R., Portnoy S., Ng P.T., Zeileis A., Grosjean P., Moler C., et al. Quantile Regression, Package “quantreg.” 2019.

      3. Cannon A. Package “qrnn.” 2019.

      4. Steinwart I., Thomann P., Farooq M. Package “liquidSVM.” 2017.

      5. Alfons A. Package “cvTools”: Cross-validation tools for regression models. 2015.

      6. Wickham H., Chang W. Create Elegant Data Visualisations Using the Grammar of Graphics, Package “ggplot2.” 2019.

        • Eipel M.
        • Mayer F.
        • Arent T.
        • Ferreira M.R.P.
        • Birkhofer C.
        • Gerstenmaier U.
        • et al.
        Epigenetic age predictions based on buccal swabs are more precise in combination with cell type-specific DNA methylation signatures.
        Aging (Albany NY). 2016; 8: 1034-1048
        • Spólnicka M.
        • Pośpiech E.
        • Pepłońska B.
        • Zbieć-Piekarska R.
        • Makowska
        • Pięta A.
        • et al.
        DNA methylation in ELOVL2 and C1orf132 correctly predicted chronological age of individuals from three disease groups.
        Int. J. Leg. Med. 2018; 132: 1-11