Optimizing body fluid recognition from microbial taxonomic profiles

  • Eirik Nataas Hanssen
    Corresponding author at: Department of Forensic Biology, Oslo University Hospital, P.O. Box 4950 Nydalen, N-0424 Oslo, Norway.
    Department of Forensic Biology, Oslo University Hospital, P.O. Box 4950 Nydalen, N-0424 Oslo, Norway

    Department of Forensic Medicine, University of Oslo, P.O. Box 4950 Nydalen, N-0424 Oslo, Norway
  • Kristian Hovde Liland
    Faculty of Science and Technology, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway

    Faculty of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway
  • Peter Gill
    Department of Forensic Biology, Oslo University Hospital, P.O. Box 4950 Nydalen, N-0424 Oslo, Norway

    Department of Forensic Medicine, University of Oslo, P.O. Box 4950 Nydalen, N-0424 Oslo, Norway
  • Lars Snipen
    Principal corresponding author.
    Faculty of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway


      • We have optimized the recognition of body fluids from 16S sequence data.
      • The new data handling workflow is based on PLS in combination with LDA.
      • Large datasets were used to evaluate method performance.
      • In a cross-validation, sensitivities were ≥0.99 for fecal and oral samples and 0.98 for vaginal samples.
      • High method robustness was demonstrated by testing and training on different datasets.
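
      The per-category sensitivity values quoted above can be derived from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `per_class_rates` and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def per_class_rates(confusion, labels):
    """Return {label: (sensitivity, specificity)} for a square confusion matrix.

    confusion[i][j] counts samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                        # correctly recognized
        fn = sum(confusion[k]) - tp                 # true class k, predicted as another
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        rates[label] = (sensitivity, specificity)
    return rates


# Made-up counts for three categories; these are NOT results from the study.
labels = ["fecal", "oral", "vaginal"]
confusion = [
    [99, 1, 0],
    [0, 100, 0],
    [1, 1, 98],
]
rates = per_class_rates(confusion, labels)
```

      Sensitivity is computed one category against all others, which is why a category can have high specificity and still have lower sensitivity.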


      In forensics, the DNA profile is used to identify the person who left a biological trace, but information on the body fluid can also be essential in the evidence evaluation process. Microbial composition data could potentially be used for body fluid recognition as an improved alternative to the currently used presumptive tests. We have developed a customized workflow for interpretation of bacterial 16S sequence data based on a model composed of Partial Least Squares (PLS) in combination with Linear Discriminant Analysis (LDA). Large data sets from the Human Microbiome Project (HMP) and the American Gut Project (AGP) were used to test different settings in order to optimize performance. In the initial cross-validation of body fluid recognition within the HMP data, the optimal overall accuracy was close to 98%. Sensitivity values for the fecal and oral samples were ≥0.99, followed by the vaginal samples with 0.98 and the skin and nasal samples with 0.96 and 0.81, respectively. Specificity values were high for all five categories, mostly >0.99. This optimal performance was achieved with the following settings: taxonomic profiles based on operational taxonomic units (OTUs) with 0.98 identity (OTU98), Aitchison's simplex transform with a pseudo-count of C = 1, and no regularization (r = 1) in the PLS step. Variable selection did not improve the performance further. To test for robustness across sequencing platforms, we also trained the classifier on HMP data and tested it on the AGP data set. In this case, the standard OTU-based approach showed a moderate decline in accuracy. However, by using taxonomic profiles made by direct assignment of reads to a genus, we were able to nearly maintain the high accuracy levels. The optimal combination of settings was still used, except that the taxonomic level was genus instead of OTU98. The performance may be improved even further by using higher-resolution taxonomic bins.
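
      The transform step mentioned in the abstract can be illustrated with a short sketch. It assumes that "Aitchison's simplex transform" refers to the centered log-ratio (CLR) transform and that the pseudo-count C is added to every read count before taking logarithms; the abstract does not spell out either detail, so both are assumptions. The function name `clr_transform` and the example counts are hypothetical.

```python
import math


def clr_transform(counts, pseudo_count=1.0):
    """Centered log-ratio transform of one sample's read counts.

    Assumption: the pseudo-count is added to every count before the log,
    so taxa with zero reads do not produce log(0).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    logs = [math.log(c + pseudo_count) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in a single sample.
profile = [120, 0, 30, 5]
transformed = clr_transform(profile, pseudo_count=1.0)
```

      The transformed values sum to zero within each sample, so they express each taxon's abundance relative to the sample's geometric mean rather than as an absolute count. In a workflow like the one described, such transformed profiles would be the input to the PLS step.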



