Highlights
- A bioinformatic pipeline is described to estimate genetically variable peptide profiles from whole genome sequencing data.
- The pipeline is designed to consider either short- or long-read massively parallel sequencing data.
- A semicontinuous likelihood that considers linkage and codon degeneracy is introduced.
- The likelihood formulation is applied to single-source samples and mixtures.
Journal: Forensic Science International: Genetics