Highlights
- We have optimized the recognition of body fluids from 16S sequence data.
- The new data handling workflow is based on PLS in combination with LDA.
- Large datasets were used to evaluate method performance.
- In a cross-validation, sensitivities were ≥0.99 for fecal and oral samples and 0.98 for vaginal samples.
- High method robustness was demonstrated by training and testing on different datasets.
Abstract
In forensics, the DNA profile is used to identify the person who left a biological
trace, but information on body fluid can also be essential in the evidence evaluation
process. Microbial composition data could potentially be used for body fluid recognition
as an improved alternative to the currently used presumptive tests. We have developed
a customized workflow for interpretation of bacterial 16S sequence data based on a
model composed of Partial Least Squares (PLS) in combination with Linear Discriminant
Analysis (LDA). Large data sets from the Human Microbiome Project (HMP) and the American
Gut Project (AGP) were used to test different settings in order to optimize performance.
From the initial cross-validation of body fluid recognition within the HMP data, the
optimal overall accuracy was close to 98%. Sensitivity values for the fecal and oral
samples were ≥0.99, followed by the vaginal samples with 0.98 and the skin and nasal
samples with 0.96 and 0.81 respectively. Specificity values were high for all 5 categories,
mostly >0.99. This optimal performance was achieved by using the following settings:
taxonomic profiles based on operational taxonomic units (OTUs) with 0.98 identity
(OTU98), Aitchison's simplex transform with a pseudo-count of C = 1, and no regularization (r = 1) in the PLS step. Variable selection did not improve the performance further.
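The transform step named above can be illustrated with a minimal sketch. It assumes that "Aitchison's simplex transform" refers to the centered log-ratio (CLR) applied after adding the pseudo-count C to every count; the abstract does not give the exact formula, and the function name `clr_transform` and the example counts are hypothetical. The PLS and LDA steps are not shown.

```python
import math

def clr_transform(counts, pseudo_count=1.0):
    """Centered log-ratio of one sample's taxon counts (assumed form of the transform)."""
    # Add the pseudo-count so that zero counts have a defined logarithm.
    shifted = [c + pseudo_count for c in counts]
    total = sum(shifted)
    # Log of the relative abundances (the sample as a point on the simplex).
    logs = [math.log(v / total) for v in shifted]
    # Centering: subtract the mean log, so the transformed values sum to zero.
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical read counts for four taxa in one sample.
sample = [120, 0, 30, 5]
clr = clr_transform(sample, pseudo_count=1.0)
```

The transformed values keep the rank order of the original counts, and a zero count maps to a finite negative value instead of being undefined.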
To test for robustness across sequencing platforms, we also trained the classifier
on HMP data and tested on the AGP data set. In this case, the standard OTU-based approach
showed a moderate decline in accuracy. However, by using taxonomic profiles made by
direct assignment of reads to a genus, we were able to nearly maintain the high accuracy
levels. The optimal combination of settings was still used, except that the taxonomic level
was genus instead of OTU98. The performance may be improved even further by using
higher resolution taxonomic bins.
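As a rough illustration of genus-level profiles, the sketch below counts reads per genus and lays the result out over a fixed genus list, so that samples from different datasets share the same columns. It assumes each read has already been assigned a genus label by some classifier, which is not shown; the function names `genus_profile` and `align`, the genus list, and the example reads are hypothetical and not taken from the article.

```python
from collections import Counter

def genus_profile(read_genera):
    """Relative abundance of each genus among the assigned reads."""
    counts = Counter(read_genera)
    total = sum(counts.values())
    return {genus: n / total for genus, n in counts.items()}

def align(profile, genera):
    """Order a profile by a shared genus list; genera absent from the sample get 0.0."""
    return [profile.get(genus, 0.0) for genus in genera]

# Hypothetical genus labels for four reads from one sample.
reads = ["Lactobacillus", "Lactobacillus", "Gardnerella", "Lactobacillus"]
profile = genus_profile(reads)

# Hypothetical shared column order used for both training and test data.
shared_genera = ["Gardnerella", "Lactobacillus", "Prevotella"]
vector = align(profile, shared_genera)
```

Fixing the column order up front means a classifier trained on one dataset can be applied to another without depending on dataset-specific OTU clusters.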
Article info
Publication history
Published online: July 15, 2018
Accepted: July 13, 2018
Received in revised form: July 12, 2018
Received: March 26, 2018
Copyright
© 2018 Elsevier B.V. All rights reserved.