• Highlights the ability of MPS to complement and enhance current forensic ecology and human identification tools.
• Provides an overview of key studies that explore the use of environmental DNA (eDNA) from high and low biomass samples in a forensic context.
• Describes recent key developments in evidence evaluation that are leading the integration of eDNA techniques into forensic casework.
• Discusses the current inconsistencies in eDNA reporting and the required validation criteria for eDNA to be fully accepted in court.
Abstract
Massively parallel sequencing (MPS) has revolutionised the field of genomics, enabling substantial advances in human DNA profiling. Further, the advent of MPS now allows biological signatures to be obtained from complex DNA mixtures and from trace amounts of low biomass samples. Environmental samples serve as ideal forms of contact trace evidence, as detection at a scene can establish a link between a suspect, location and victim. Many studies have applied MPS technology to characterise the biodiversity within high biomass environmental samples (such as soil and water) to address questions related to ecology, conservation, climate change and human health. However, translation of these tools to forensic science remains in its infancy, due in part to the merging of traditional forensic ecology practices with unfamiliar DNA technologies and complex datasets. In addition, people and objects carry low biomass environmental signals, which have recently been shown to reflect a specific individual or location. The sensitivity and falling cost of MPS are now unlocking the power of both high and low biomass environmental DNA (eDNA) samples as useful sources of genetic information in forensic science. This paper discusses the potential of eDNA for forensic science by reviewing the most explored applications that are leading the integration of this technology into the field. We introduce novel areas of forensic ecology that could also benefit from these tools, with a focus on linking a suspect to a scene or establishing the provenance of an unknown sample, and discuss the current limitations and the validation recommendations needed to achieve translation of eDNA into casework.
1. Introduction
Environmental trace evidence, often referred to as Forensic Ecology, is a well-recognised area of forensic science that utilises the interaction between people and the environment in relation to crime [
]. Environmental samples serve as ideal forms of contact trace evidence as the detection at a scene can provide a source of valuable forensic information to establish a link between a suspect, location and victim. Traditional forensic ecology requires the characterisation of physical specimens such as plant fragments, pollen, diatoms, fungi or insects, via morphology and thus requires a diverse range of expertise, e.g. a botanist, palynologist, mycologist and entomologist. Despite large numbers of unsolved crimes, and thus the need to develop additional tools to link suspects to crimes [
], forensic ecology remains under-utilised in casework due to a limited number of experts. However, the recent advent of massively parallel sequencing (MPS) has sparked interest in environmental trace evidence for three reasons: it offers a means to rapidly detect and characterise multiple species simultaneously without the need to isolate individual specimens from a complex mixture; it places less emphasis on specialist morphological expertise and on the availability of good quality samples; and it offers the opportunity to detect trace levels of environmental DNA (eDNA) from low biomass samples.
The most common technique used to characterise the complex genetic diversity within an environmental sample is termed ‘DNA metabarcoding’ and involves PCR-based enrichment of a taxonomically informative gene termed a ‘barcode’ using universal primers for a single taxonomic group [
]. While detection of the microbiome (bacteria, fungi, viruses and archaea) is often termed ‘Microbial Forensics’, and provides a strong foundation for the integration of DNA metabarcoding analysis to forensic science [
], eDNA is an extension of this concept which encompasses higher level taxonomic groups, including plants (including pollen and diatoms) and invertebrates, which are also useful indicators in forensic ecology. As most environmental sample types (such as plant fragments, soil and water) contain a high biomass, detection and characterisation of the eDNA within these samples is relatively straightforward using MPS. However, the sensitivity of this technology now offers the opportunity to push the boundaries of forensic ecology and explore the use of eDNA from low biomass samples such as dust, as well as the biological signals present on the surface of objects, which was not previously feasible.
Trace amounts of eDNA can be transferred to objects via air when exposed to a particular environment [
] or transferred from an individual by touch or when two items come into contact. As a result, surfaces possess an eDNA signal that reflects that particular interaction, though present at very low levels in comparison to high biomass samples such as soils. For example, geographically distinct microbiome signals have been detected from dust, air and surface samples across different households [
]. Such signals could prove useful in casework to identify an individual, identify the source of an unknown sample or track the movement of individuals. Although MPS has repeatedly demonstrated highly discriminative and informative signals from high biomass and low biomass environmental samples, and has been explored in a forensic context for soil discrimination [
], integration into casework is challenging. Environmental DNA is a novel tool that combines DNA profiling with forensic ecology, both of which apply different approaches to evaluate and report evidence. Here, we review recent key developments in the most explored areas of forensic eDNA that are leading the integration of this MPS tool into casework with a focus on the use of eDNA for linking a suspect to a scene or establishing provenance of an unknown sample. We introduce novel areas of forensic ecology that could also benefit from these tools and discuss the current limitations and the recommended validation framework outlined for microbial forensics which is equally applicable to all aspects of trace eDNA to achieve translation into casework.
2. Forensic soil science
Soils are the most explored area of eDNA in forensic science as the soil matrix hosts a high biomass of diverse organisms which can be easily detected and characterised [
]. Soils serve as an ideal form of contact trace evidence as they are highly individualistic, relatively easy to characterise, have a high transfer and retention probability (typically present on car tyres, footwear and clothing), and are often overlooked in attempts to conceal evidence [
]. Forensic studies have shown that the eDNA profiles in soils are characteristic of a particular environment thus offering a tool to establish a link between an unknown sample and reference samples [
]. In the early stages of MPS, microbial diversity (which includes bacteria, archaea and fungi) was the most commonly explored. However, eDNA enables the detection and characterisation of other taxonomic groups (e.g. plants, invertebrates) which can provide additional distinguishing features. As a result, the potential of eDNA for intelligence purposes to predict the origin of an unknown soil or dust sample [
eDNA within bulk soil samples has predominantly been explored as a means to discriminate between locations or establish a link between a suspect and a crime scene or victim. For such analysis to be accepted in forensic science, the technique should provide enough discriminatory power that similar ecological habitats can be differentiated, but not so much that within-site heterogeneity results in samples from the same location being deemed unique [
]. The concept of using soil microbiomes for this purpose is not new and has previously been examined using terminal restriction fragment length polymorphism (TRFLP) and denaturing gradient gel electrophoresis (DGGE) [
]. However, a comparison between MPS methods (454, Ion Torrent and Illumina), ribosomal intergenic spacer analysis (RISA), TRFLP and a microarray approach, as well as the n-alkane and fatty alcohol profiling currently accepted in UK courts, showed that targeting the 16S rRNA gene using RISA, TRFLP and MiSeq produced the most reliable and significantly discriminating profiles between adjacent, similar soil types [
]. Similarly, 88 % assignment accuracy was achieved when MPS was used to examine microbial profiles from 10 diverse habitats (expected to have different microbial communities) and 9 similar habitats (expected to have a more similar structure) [
]. Demaneche et al. 2017 performed blind tests to determine the origin of two questioned samples (one from a mock crime scene and one a 50:50 mix of the crime scene and alibi site) using three control samples [
]. Both RISA and 16S MPS discriminated single source samples sufficiently, but a combination of the two techniques was required to show that the origin was a mixture of soils. The MiSAFE project, a European initiative to standardise soil forensic DNA analysis involving 12 laboratories, also demonstrated that RISA and 16S MPS could both correctly affiliate a sample from a spade recovered from a suspect’s home to the crime scene in the scenario. Furthermore, a number of studies have also incorporated temporal variation to demonstrate that profiles generated from soils collected weeks or months after a crime will likely be representative of the location where the soil transfer took place [
]. This study showed that all shoe samples could be correctly classified to individuals despite temporal variation. It also indicated that floor-associated microbial taxa often increased in abundance on the shoe sole as the participant walked through that space, and that floor samples had significant predictive power despite being taken in areas with which the sole did not come into direct contact. While previous studies of soil and dust show promise for establishing associations between samples, the number of reference samples included in each study is low (Table 1) and thus the statistical power that can be applied to support an association is somewhat limited. For example, while figures such as 88 % assignment accuracy appear promising [
], they also imply that 12 % of predictions were inaccurate or false, which may be challenged in a court setting.
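To illustrate why a small number of samples limits the statistical support for a reported accuracy, the sketch below computes a Wilson score interval for a binomial proportion. It is an illustration only, not a method used in the studies cited: the function name `wilson_interval` and the test-set sizes (25, 100 and 400) are assumptions chosen for the example, and only the 88 % figure comes from the text.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical test-set sizes; 88 % of samples are correctly assigned in each case.
for n in (25, 100, 400):
    lower, upper = wilson_interval(round(0.88 * n), n)
    print(f"n = {n:3d}: interval {lower:.3f} to {upper:.3f}")
```

The same point estimate carries a much wider interval when it rests on few samples, which is one way to express the limited statistical power noted above.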
Table 1. Summary of studies that apply MPS to soils and dust in a forensic context. Where possible, details of the specific barcodes and primers used are provided. *unspecified.

Soil comparisons for evidential value

| Reference | Sample Type | Country | # Reference sites | Sampling Period | Barcode / Taxa Targeted | Distance Metric | Evaluation and Reporting |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Giampaoli et al 2014 | Soil | Italy | 2 sites, 150 km apart | 3 months | Eukaryotes, plants and bacteria | Chi Square | Abundance charts, PCA |
| Young et al 2014 | Soil | Australia | 2 sites, 14 km apart | na | 16S, ITS, 18S, trnL | Bray-Curtis | Hierarchical clustering, Dissimilarity |
| Khodakova et al 2014 | Soil | Australia | 2 Parklands, 3 km apart | na | Shotgun, WMS, AP-PCR | Bray-Curtis | Hierarchical clustering, NMDS, PERMANOVA, CAP |
| Young et al 2015 | Soil | Australia | 7 sites, up to 180 km apart | Six weeks | 18S | Bray-Curtis | NMDS, CAP |
| Jesmok et al 2016 | Soil | United States | 19, various distances | 1 year (every 3 months) | 16S V3-V4 (357F/806R) | Bray-Curtis, Sorensen-Dice | Abundance charts, NMDS, Machine Learning, ANOSIM |
| Demaneche et al 2017 | Soil | France | 3 control sites, 2 unknowns as blind study | na | RISA, 16S | Euclidean | Hierarchical clustering, PCA, ANOVA |
| Habtom et al 2017 | Soil | Israel | 3 sites (up to 100 m apart) | na | Non-MPS: RISA, rpoB TRFLP (rpoB and 16S), 16S microarray; IonTorrent, MiSeq: 16S V4 (515F-806R); 454: 16S V1-V3 (27F-519R) | Bray-Curtis | Dissimilarity, MRPP test |

Provenance prediction for forensic intelligence

| Reference | Sample Type | Country | # Reference sites | Sampling Period | Barcode / Taxa Targeted | Distance Metric | Evaluation and Reporting |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Damaso et al 2018 (not MPS: LH-PCR) | Soil | Miami-Dade, Florida | 18 sites (n = 1332); 3 spatial scales | 1 year (wet and dry season) | Bacteria, fungi, plant and archaea | Bray-Curtis | Hierarchical clustering, Machine Learning |
| Flojgaard et al 2019 | Soil | Denmark | 130 sites; 300 km East-West | na | Eukaryotes (18S), fungi (ITS2), plants (ITS2) and insects (mt16S) | Abundance* | NMDS, Machine Learning |
| Habtom et al 2019 | Soil | Israel | 5 sites | na | 16S | Bray-Curtis | Hierarchical clustering, NMDS, MRPP, Likelihood Ratio |
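Bray-Curtis is the distance metric most often listed in Table 1. As a minimal sketch of how it compares two community profiles, the example below computes the dissimilarity from per-taxon counts. The taxon names and read counts are hypothetical, and real workflows would normally normalise the counts (for example to relative abundance) before comparison, a step omitted here for brevity.

```python
def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i), between 0 and 1."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical read counts per taxon for one questioned and two reference soils.
questioned = {"taxon_A": 40, "taxon_B": 10}
reference_1 = {"taxon_A": 35, "taxon_B": 15}
reference_2 = {"taxon_A": 5, "taxon_B": 5, "taxon_C": 40}

print(bray_curtis(questioned, reference_1))  # small value: similar profiles
print(bray_curtis(questioned, reference_2))  # large value: dissimilar profiles
```

A value of 0 means the two profiles are identical and 1 means they share no taxa, so the questioned sample here sits much closer to the first reference than to the second.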
The cost of MPS has reduced over the past 10 years which has vastly increased sampling capacity. This has enabled large scale exploration of soil biota to assess and understand soil biogeographical variation at a continental scale [
] and the ability to target multiple taxonomic groups at a lower cost allowing a more comprehensive picture of diversity and thus increased discriminatory power of soil comparisons [
]. Such capability provides a framework for applying more robust statistics to soil comparisons of evidential value but also unlocks the potential of soil biota as a means to identify the provenance of an unknown sample for intelligence purposes, in the absence of case-specific reference samples.
2.2 Provenance prediction for forensic intelligence
2.2.1 Geographical variation at community level
Recent studies have utilised geographical variation at the community level to predict soil origin (Table 1). Flojgaard et al. 2019 used 130 sites representing 24 environmental strata across five geographical regions in Denmark to investigate the potential of eDNA to predict soil provenance in terms of environmental conditions, habitat type and geographical region [
]. The study determined that fungal taxa were best predictors of light, soil moisture and pH whereas plant taxa were best predictors of soil nutrients. Damaso et al. 2018 used LH-PCR to assess biogeographic patterns of four target groups (bacteria, fungi, plant and archaea) at multiple scales across Miami-Dade, Florida [
]. The study showed that soil biota is spatially autocorrelated with statistically similar biotic composition observed in geographically closer samples and indicates that all four taxa (bacteria, fungi, archaea and plant) were important for predicting provenance between soils at a local scale [
]. Using Random Forest classification, the provenance prediction accuracy decreased at smaller scales (>91 % at the soil type level, 98 % at transect level and 67 % at subplot level), although this was likely attributable to lower sampling intensity at smaller scales. Similarly, Habtom et al. 2019 reported geographical location to be more important than soil type for soil bacteria, using the 16S rRNA gene region, and that the influence of geographical location varied at different spatial scales [
]. At a local scale (25−1000 m), the farther apart two soil communities were, the more they differed; however, at a regional scale (1−260 km) differences between communities did not increase with geographic distance: instead, the dominating factors were annual precipitation and soil sodium and ammonium levels. Habtom et al. 2019 also report that only 12.3 % of the TRFs were specific to a single soil type at a specific site [
] and, in agreement with Damaso et al. 2018, indicate that finer scale resolution requires more ‘peaks’ to accurately classify soil origin and that ‘rare’ peaks are important to provide “uniqueness” to the sample. It has also been suggested that within-site variability influences prediction success, as samples collected from a homogeneous grassland were more similar than those from a shrubland [
], thus the prediction error was reduced in the latter. These studies also highlight that within-site variation over time can be influenced by human intervention [
]. For example, when multiple woodlot sites were analysed, one showed a high within-site heterogeneity over the 8 weeks as it was adjacent to a large gravel pit which had been recently converted to a park [
Whilst these studies demonstrate the potential of eDNA to determine the source of an unknown soil sample, often in casework soil particles are transferred to objects or clothing in the form of dust. Previous studies have indicated local scale variation in dust samples [
]; however, it is the large scale variation in dust biota that is valuable for forensic intelligence. Barberan et al. 2015 demonstrated that airborne microbial communities exhibit non-random geographical patterns on a continental scale and identified that dust-associated bacterial communities were primarily derived from bacteria likely to be found on plants and soils, regardless of the degree of urbanisation, while the imprint of human-associated bacteria (primarily skin) was low [
]. Despite the presumed low biomass nature of dust samples, the external surfaces of homes across the United States (as part of the Wild Life at Home Project) showed an average of ∼4700 bacterial and ∼1400 fungal phylotypes with a total of 112,000 and 57,000 bacterial and fungal phylotypes observed across all samples [
]. Most phylotypes were restricted to a small subset of samples with 88 % and 94 % of the bacteria and fungi phylotypes respectively only detected in 10 % of the collected samples. In agreement with Flojgaard et al. 2019 who suggest fungi as promising indicators for soil provenance, fungi also showed a stronger correlation with geographical origin than bacteria in dust: 72.4 % of fungal taxa were found in <10 % of the dust samples supporting the argument that fungi exhibit a higher degree of geographical endemism [
] and thus may be more useful indicators for provenance in a forensic context. The median prediction error of an unknown dust sample was 230 km with 5% of samples achieving better than 58 km and 5% worse than 1039 km [
]. The prediction accuracy was lower for samples taken at locations with low sampling intensity and those with low fungal richness and again was also dependent on the geographical region in question [
]. For example, urban and suburban (developed) land types had lower prediction errors and high prediction region coverage probabilities relative to less developed areas.
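A prediction error expressed in kilometres, such as the median of 230 km quoted above, is the distance between a sample's true and predicted location. The sketch below shows one common way to compute such a distance, the haversine great-circle formula, and then takes the median over several samples. It is not necessarily the calculation used in the cited study, and the coordinate pairs are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical (actual, predicted) coordinates for three dust samples.
pairs = [
    ((40.0, -105.0), (41.0, -104.0)),
    ((35.0, -80.0), (35.5, -82.0)),
    ((47.0, -122.0), (44.0, -120.0)),
]
errors_km = [haversine_km(*actual, *predicted) for actual, predicted in pairs]
print(f"median prediction error: {median(errors_km):.0f} km")
```

Reporting the median alongside the tails of the error distribution, as the study above does, gives a fuller picture than a single average.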
2.2.2 Identification of specific indicator taxa
Species identification aids an investigation by identifying a species with a restricted geographical distribution which can narrow the likely origin of an unknown sample. For example, Giampaoli et al. 2014 showed that soil samples from lake areas could only be separated by exclusively analysing plants rather than the whole eukaryote community [
]. A number of studies have indicated that identification of specific taxa from either dust or soil can be indicative of location based on their preferred environmental conditions. One study that examined plant, eukaryote and bacterial DNA in soils from a lake and a farm found that the farm samples had an increased Viridiplantae (i.e. green plant) component, and detected aquatic plants (Ceratophyllum), ferns (Dryopteridaceae), oaks (Quercus) and water-milfoil (Myriophyllum) in samples that were devoid of plants common in farms, country landscapes and houses, indicative of a woodland environment close to freshwater [
], for example Cellulomonas (an abundant actinobacterial genus) was reported as a predictor of low pH soil whereas Terriglobus (the most abundant of the acidobacteria) was described as a predictor of high pH, and the fungal genus Cladosporium was dominant in samples from humid areas with low pH soils [
]. For example, Eutypa lata, a common pathogen of grapevines, was not found in a single dust sample east of the Sierra Nevada mountain range but had reasonably high occurrence probabilities in samples collected in Northern California; thus, the detection of Eutypa lata suggests a dust sample is more likely to have originated from grape-growing regions in the western US than from the other regions tested [
]. Alternatively, Teratosphaeria microspora was much more ubiquitous, occurring most frequently around the Great Lakes and along the West Coast; therefore, the absence of this species suggests that a sample is unlikely to have originated from these regions. Flojgaard et al. identified 41 plants with distinct regional or locally confined distributions and 37 ‘rare’ plant species, with at least one of these 78 species detected at 98 of the 130 sites [
]. Using a mock case sample from an old growth beech forest in Eastern Denmark they used soil biota to predict an origin with intermediate soil pH and moisture, slightly fertile soil and low light conditions indicative of a forest, high forest and oak habitat types and specifically identified M. uniflora which is limited to Eastern parts of Denmark.
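The reasoning behind indicator taxa can be expressed as a likelihood ratio: how much more probable is the observation (taxon detected or not detected) if the sample came from one region rather than another? The sketch below is an illustration under stated assumptions, not the method of any study cited here. The function `likelihood_ratio` and both occurrence probabilities are hypothetical; in particular, a small non-zero probability is assumed for the second region so that the ratio stays finite.

```python
def likelihood_ratio(p_region_a: float, p_region_b: float, detected: bool) -> float:
    """LR = P(observation | sample from region A) / P(observation | sample from region B)."""
    for p in (p_region_a, p_region_b):
        if not 0.0 < p < 1.0:
            raise ValueError("occurrence probabilities must lie strictly between 0 and 1")
    if detected:
        return p_region_a / p_region_b
    return (1.0 - p_region_a) / (1.0 - p_region_b)


# Hypothetical occurrence probabilities for an indicator taxon in dust samples.
P_GRAPE_GROWING_REGION = 0.30  # assumed, not taken from the source
P_OTHER_REGION = 0.01          # assumed small non-zero value

lr_detected = likelihood_ratio(P_GRAPE_GROWING_REGION, P_OTHER_REGION, detected=True)
lr_absent = likelihood_ratio(P_GRAPE_GROWING_REGION, P_OTHER_REGION, detected=False)
print(f"LR if detected: {lr_detected:.1f}")
print(f"LR if not detected: {lr_absent:.2f}")
```

Under these assumed values, detection strongly favours the first region, whereas non-detection only weakly favours the second. This asymmetry is why the absence of a taxon is informative mainly when the taxon is common in the region being excluded.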
Both concepts of indicating the provenance of an unknown soil or dust sample show promise for forensic science and demonstrate the ability to detect site-specific eDNA signals from low-biomass samples. Soils offer a wealth of genetic information and thus integration of multiple taxonomic markers can greatly increase the power of soil analysis. However, a broad scale survey of the biogeographic patterns across a particular country or region, an understanding of the ecological distributions of specific taxa, and characterisation of multiple taxa such as plants (pollen), bacteria and animals (e.g. arthropods) are necessary for improved predictions. Often the availability of appropriate reference databases is limited, and therefore questioned samples are typically only compared to a small number of reference samples collected given the context of a particular case or scenario. While MPS is facilitating the generation of large scale soil surveys, primarily for ecological purposes, the feasibility of using soil-derived DNA reference datasets to predict the origin of airborne dust samples recovered from clothing or the surface of items remains unknown. Nevertheless, the success of DNA metabarcoding applied to soils and dust is encouraging for the application of similar methodology to other forms of environmental trace evidence.
3. Forensic botany
3.1 Pollen and diatom assemblages
Plant material, pollen and diatoms are widely recognised as powerful associative evidence used to link a person or object to a location. Such trace evidence can provide information on the source environment and likely origin of evidentiary material but can also be useful in determining whether a death was accidental, suicide or murder and determining if the crime scene is a primary or secondary scene [
]. Similarly, diatoms are eukaryotic, unicellular, golden-brown microalgae most often encountered in naturally occurring water bodies and thus have been studied for forensic scenarios associated with freshwater environments [
However, a single environmental sample will contain many microscopic pollen and diatom species that form a complex assemblage, similar to those observed in soil. For example, microscopic analysis of pollen uses a light microscope and requires at least 200–300 palynomorphs (pollen or spores) to be counted per sample [
]. Wiltshire et al. 2015 applied pollen and fungi analysis to a murder case and suggest that while rare palynological markers can confer high specificity for geolocations, whole assemblages and profiles of rare markers accentuate the ‘uniqueness’ of the profile [
] therefore it is important to obtain a complete picture of the taxa present. As such, detailed characterisation of a complex pollen or diatom assemblage using morphology will be highly time consuming. Furthermore, one of the greatest concerns in both forensic palynology and diatom analysis is the accuracy of identification as the subjective nature of taxonomic assignments could ultimately mislead an investigation [
]. Individuals must be highly skilled yet there are currently no dedicated facilities focussed on this training. While trace levels of plant DNA have previously been detected in illegal drugs [
], research into the use of MPS for pollen and diatom assemblages is lacking.
MPS enables multiple pollen grains in a mixed sample to be analysed simultaneously allowing rapid processing of multiple samples and a more objective taxonomic assignment based on DNA [
]. Further, Schield et al. 2016 found that pine (Pinus echinata) pollen could remain a viable source of DNA for criminal investigations after 14 days on cotton clothing [
], which suggests the possibility of similarly detecting trace levels of pollen on surfaces for forensic analysis, as previously explored for the airborne microbiome in dust [
]. While DNA metabarcoding has the potential to detect trace levels of plant DNA, this approach cannot directly differentiate at the source level (i.e. whether the DNA originated from pollen, a seedling or a root fragment), information that may provide additional activity level context in a particular case. However, the advantage of MPS lies in cases involving complex mixtures (e.g. soils) or those involving microscopic or trace material (e.g. pollen). Therefore, MPS would not be required in cases where a single root or seed could be identified and isolated; instead, standard DNA barcoding would be sufficient. Despite the potential of pollen DNA profiling to provide a complementary tool to morphological analysis, it has yet to be applied in forensic investigations.
While the MPS approach has been shown to detect a higher number of diatoms (270 compared to 104 identified initially by light microscopy [
]), this tool has not been widely explored in forensic ecology. Early forensic studies applied PCR-based approaches targeting plankton to assist with drowning cases, such as PCR-DGGE of the 16S rRNA gene in tissue samples [
]. The first application of MPS to drowning used 16S rRNA 454 pyrosequencing of blood, tissue and water samples to identify microbes associated with drowning [
] and a subsequent study examined bacterial succession of biofilms formed on submerged vertebrate remains to estimate post-mortem submersion interval [
]. The pyrosequencing approach used produced ‘fingerprints’ which do not allow individual taxa identification and had a limited sensitivity that required a two-step PCR to obtain analysable profiles from some water samples [
], the use of diatom DNA to link a suspect to a waterbody or identify geographical provenance is unexplored. Further, diatom DNA transfer and recovery, crucial elements for demonstrating the feasibility of this approach for forensic science, are also unexplored.
3.2 Plant phyllosphere
The signals observed between the airborne microbiome and surfaces within a particular environment present the opportunity to explore this concept to enhance forensic botany. Often, species identification of a plant fragment is not sufficient to indicate provenance or substantiate a link between a suspect, victim or crime scene, due to a high prevalence of that species within the region. While population genetics using microsatellites or SNPs can be used to distinguish between different populations of the same plant species, and has been successfully applied to the illegal timber trade [
], extensive assay development is required for each species of interest, which is costly and not ideal in forensic practice, where a broad range of species is encountered across different cases. Furthermore, plants in close proximity, at the scales encountered in casework, may be very similar genetically and thus cannot be resolved using such DNA techniques. In an ecological context, extensive research has investigated the microbiome of the phyllosphere (the area of a plant exposed to the open air and colonised by microbes), which suggests that airborne microbiomes transferred to plant surfaces are site-specific [
]. This could offer a valuable forensic tool in cases where further discrimination at a local scale is required to identify provenance.
4. Human identification
Establishment of the Human Microbiome Project (HMP) has exposed the forensic science community to the possibility of utilising the human-associated bacteria as a novel tool to assist investigations (Table 2). Characterisation of the microbiome has been explored for body tissue prediction and potential use in sexual assault. Tridico et al. 2014 used DNA metabarcoding of the 16S V4 region to demonstrate that individuals harbour unique microbiota on pubic hairs with significant differences observed between males and females, driven by presence of Lactobacillus in females. This study also showed that two co-habiting individuals shared similar microbiome signals indicating the potential of microbial communities to identify links in sexual assault cases [
]. Stahringer et al. 2012 used twins to suggest that salivary microbiome could provide a ‘microbial fingerprint’ specific for individuals as there was very little or no genetic influence on salivary microbiome composition, instead differences were mainly attributed to environmental factors such as diet, oral hygiene, antibiotics, smoking, alcohol and drug consumption [
]. Leake et al. 2016 showed that saliva from two individuals could be differentiated when sampled at four time points over one year using 16S rRNA V5 and rpoB (for deeper classification of streptococci species, one of the most abundant aerobic genera found in saliva) [
]. Another study used 16S V3 and V4 to explore body fluid prediction success when an individual has saliva (either neat or diluted) deposited from another donor on their hands [
]. Of the samples tested, 94 % were correctly predicted as skin or saliva, and a significant difference was observed between diluted and neat saliva, suggesting that low levels of saliva on skin will start to resemble the skin microbiome [
]. Benschop et al. 2012 used 16S rRNA V5 and V6 to explore the microbiota in 240 clinical vaginal samples and used this to develop a more targeted microarray for determining vaginal origin [
]. Similarly, Quaak et al. 2017 developed a targeted microbial microarray approach to differentiate faecal samples from multiple individuals and thus link a questioned sample to a particular donor [
], which has since been expanded and successfully applied in two cases (a sexual assault and a robbery) to determine the sub-source of questioned samples [
Table 2. Summary of the human-associated microbiome studies that apply MPS in a forensic context. ‘na’ indicates a single time point was examined or sampling times were not reported. # indicates that there was no replication per individual within each time point. ‘NR’ indicates details were not reported.
Saliva, body fluid, faecal matter and hair samples represent sources of relatively high microbial biomass, whereas skin-associated microbiomes transferred to objects via touch represent low biomass samples and are therefore more challenging. While still in its infancy, the concept of a ‘personalised microbiome’ present on the skin of an individual is an attractive concept as it offers an additional human identification tool to associate an individual to a particular event, or location.
4.1 Skin microbiome
Fierer et al. 2010 were the first to demonstrate the concept of a ‘personalised microbiome’ in a forensic context by showing that the microbial community similarity between individuals and their own computer mice was greater than the similarity between them and another individual’s computer mouse [
]. A number of studies have since expanded this concept to other substrates including mobile phones, shoes and fabrics which have shown similar promise for differentiating individuals (Table 2) [
]. Regardless of substrate type, all studies indicate that the ‘personalised microbiome’ transferred by touch is likely driven by low-abundance taxa. Meadow et al. 2014 used 17 participants to show that, on average, an individual shared only 5% more Operational Taxonomic Units (OTUs) with their own phone than with someone else’s phone. Similarly, Oh et al. found that a small number of low-abundance taxa (<5% relative abundance) typically had the highest resolving power between individuals [
] and Lee et al. 2016 reported that specific microbial genera present on the hand of an individual were present on fabric at low abundance (<3%) only after contact [
]. The idea that the low-abundance taxa are the discriminating taxa is important for establishing the robustness of this personalised microbiome in a practical setting and for implementing quality assurance measures.
The temporal stability of the microbiome is a crucial factor in determining the feasibility of skin-surface microbiome matching in a forensic context. A number of studies have demonstrated that the variability in skin microbiota community composition between healthy individuals exceeds the variability within an individual over time [
]. However, only 15 % of the phylotypes observed on the palm (excluding singletons) were observed at any other time point and temporal variation on skin surfaces was driven by the presence and absence of transient taxa [
]. Flores et al. suggest that individual samples collected closer in time did not have more similar communities whereas other studies suggest that long term similarity was lower than short term similarity at the species level [
]. The degree of temporal variability on the skin is also reported to be a personalised feature, where individuals with more diverse microbiotic communities show more stability over time [
] suggesting that samples collected at a given time point may not adequately characterise an individual’s microbiome; this is potentially problematic in a forensic setting. As a result, the rate of change of an individual’s microbiome should perhaps be incorporated into analysis adding further complexity [
], the microbiome present on a surface pre-contact (background signal), and the transfer and persistence of a personalised skin microbiome post-contact, present a complex dynamic that must be considered when evaluating skin microbiome signals. Some studies suggest that surface types prone to rapid turnover (e.g. mobile phones) are less stable in community structure due to low biomass and high volatility in hand-associated microbial communities, indicating that such surfaces may be useful for identifying an individual but less likely to be useful for tracking the recent movements of an individual [
]. While Wilkins et al. 2017 could match the correct occupant or occupants to a household with 67 % accuracy, the accuracy decreased substantially when skin and surface microbiome were collected in different seasons and as time since collection of skin samples increased with no accurate matches made after a delay of three seasons [
]. They showed that both skin and surface microbiota contain populations of stable OTUs that persisted for many seasons (comprising ∼30 % of the OTUs present at any time), as well as transient OTUs that persisted for one season or less, and that temporal changes select against low-abundance microorganisms, and thus against those most useful in identifying the individual who left the trace. As the stability of microbiota after contact is two-directional, this study also determined that the skin-after-surface delays yielded a higher accuracy than surface-to-skin delays of the same magnitude. While this study provides useful insight, it was conducted in a single indoor environment where occupants are continuously depositing microbiota; the persistence of human microbiota in public spaces and at time scales of hours or days, as relevant to forensic science, remains unknown.
4.2 Alternative approaches to skin microbiome
Whilst the majority of studies to date have utilised the 16S V4 region using the barcoded fusion primers 515F/806R (Table 2) as developed by the Human Microbiome Project (HMP), the literature suggests that this approach may not be optimal for characterising skin microbiota [
]. Although the current protocol is well-established, relatively inexpensive and provides useful insight into the potential of the skin-surface microbiome in forensic science, it was primarily developed to characterise microbiota of other habitat types (e.g. gastrointestinal) and therefore does not accurately characterise the skin microbiota [
]. Using the 16S rRNA V4 gene region, Wilkins et al. 2017 demonstrated that on average only 10 % of the surface microbiome could be attributed to human skin-associated families (e.g. Staphylococcaceae, Micrococcaceae, Corynebacteriaceae and Streptococcaceae), with other abundant families (Sphingomonadaceae, Methylobacteriaceae, Pseudomonadaceae, Rhodobacteraceae, and Xanthomonadaceae) likely derived from the environment (soil and vegetation) [
]. In particular, the authors indicate an underrepresentation of phylum Actinobacteria which includes the important human skin genus Propionibacterium and suggest that future skin-surface microbiota matching may benefit from additional or alternative primer sets better suited for human taxa. Meisel et al. 2016 also showed that the V4 region poorly captured skin microbiota, especially Propionibacterium, whereas whole metagenome sequencing (WMS) most accurately captured skin microbiome community composition, with the V1-V3 region providing highly similar results [
]. A number of studies have taken a different approach, examining the microbiota at strain level to identify single nucleotide variants (SNVs), which have been reported to be more robust to temporal variability [
]. As a result, a novel assay named hidSkinPLex has been developed for forensic identification which incorporates 286 bacterial (and phage) skin microbiome markers [
]. Using the hidSkinPLex, Schmedes et al. 2018 showed that samples (n = 72) could be classified to the individual with up to 94 % accuracy and to body site of origin with up to 86 % accuracy [
]. The hidSkinPLex offers a standardised assay for targeting the most discriminative microbial markers for forensic discrimination of body tissue and individuals. However, no such standards have yet been established for other eDNA applications, and there are no clearly defined thresholds for data analysis nor specified criteria for reporting and evaluation of eDNA data.
5. Forensic eDNA moving forward
There is currently a lack of continuity and standardisation in the data analysis and reporting of eDNA comparisons in forensic studies. Regardless of the sample type or taxonomic group targeted, a similar data analysis framework is required to convert sequencing data to a taxonomic assemblage and subsequently determine the similarity of an unknown to a pool of reference samples. While a global effort has been made to characterise and standardise DNA metabarcoding through the Human Microbiome Project and the Earth Microbiome Project, these initiatives are predominantly focussed on ecological or medical research. As such, they provide a framework for forensic applications, particularly from a laboratory perspective. However, standardised, specified criteria for evaluating and reporting eDNA results in the context of quality control must still be established.
5.1 Quality controls
Quality assurance is a major criterion for all forensic science, and similar measures must therefore be considered for eDNA analysis, both within the laboratory and during data analysis, to ensure that the sequences included are intrinsic to the sample and are not an artefact of data analysis or a background signal introduced by reagents or the analytical environment. This is particularly important when analysing microbial signals and low biomass eDNA. A previous study demonstrated that even when measures are taken to control the environment, a microbiome signal can always be detected, even from negative controls in an ancient DNA clean lab [
]. Despite the importance of monitoring background signals for quality control, many forensic eDNA studies either fail to include extraction blank controls (EBCs), do not report them or do not sequence EBCs and filter the sample data accordingly.
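The blank-based filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `filter_by_blanks`, the `min_ratio` threshold of 10 and the read counts are all hypothetical, and no published study is being reproduced. A taxon that also appears in the EBC is kept only if its count in the sample clearly exceeds its count in the blank.

```python
def filter_by_blanks(sample_counts, blank_counts, min_ratio=10.0):
    """Keep taxa whose sample read count is clearly above the extraction blank.

    sample_counts and blank_counts map taxon name -> read count.
    min_ratio is a hypothetical threshold (an assumption, not a guideline):
    a taxon detected in the blank is kept only if its sample count is at
    least min_ratio times its blank count.
    """
    kept = {}
    for taxon, n in sample_counts.items():
        blank_n = blank_counts.get(taxon, 0)
        if blank_n == 0 or n >= min_ratio * blank_n:
            kept[taxon] = n
    return kept


# Illustrative (made-up) counts for one swab and its extraction blank control.
sample = {"Staphylococcus": 1200, "Ralstonia": 40, "Corynebacterium": 300}
blank = {"Ralstonia": 35, "Staphylococcus": 5}

filtered = filter_by_blanks(sample, blank)
print(filtered)
```

In this toy case the taxon that is nearly as abundant in the blank as in the sample is removed, while a taxon with only trace reads in the blank is retained. The appropriate ratio would need to be set and validated per laboratory.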
DNA metabarcoding is regarded as semi-quantitative, as the number of sequences per taxon can only provide an indication of relative abundance due to inherent bias in the methodology. To successfully construct an MPS library from a DNA extract, three key components are required: (1) amplification of the barcode region, (2) incorporation of a unique molecular ‘index’ to separate sequences into samples during data analysis, and (3) incorporation of MPS platform-specific sequencing adapters. The optimal approach involves a single-step PCR using fusion primers (PCR primers with unique indexes and adapter sequences included), as this minimises biases in abundance and labels all molecules with the unique index at this initial step. This is the most widely utilised approach for 16S rRNA (bacteria), ITS (fungi) and 18S rRNA (eukaryotes) developed by the EMP and HMP. However, a two-step approach can also be used, where the first step amplifies the barcode region via PCR and the second step introduces the adapter sequences (either by PCR or adapter ligation); indexes can be incorporated at either step one or step two. This approach increases bias in abundance (favouring the most abundant taxa), and where indexes are introduced in the second step there is potential for undetectable cross-contamination, as the amplified sequences cannot be traced to the original source. The use of internal controls for more accurate quantitative measures and the inclusion of positive controls have been suggested [].
A number of additional data filtering approaches have been employed to further ensure confidence in the taxa (or operational taxonomic units, OTUs) included in the analysis. For example, Grantham et al. discarded sequences that were ≤75 % similar to a reference in the fungal UNITE database to restrict analysis to those that they were confident were fungal [
]. Another approach is to accept or reject taxa based on the occurrence and frequency across the sample set. Many microbial community characterisation studies exclude low abundance OTUs as a way to reduce noise without losing information, which has been suggested to enhance the detection of relationships between different samples [
]. However, as we have described in previous sections, the rare taxa are often those that drive the signal, particularly for human skin microbiome samples, and local scale soil and dust samples, therefore the optimal threshold for such filtering is unclear.
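To make the trade-off concrete, the following minimal Python sketch applies three relative-abundance cut-offs to a single made-up profile. The OTU names, counts and cut-off values are assumptions chosen for illustration only; the point is simply that each stricter threshold removes more of the rare taxa that may carry the discriminating signal.

```python
def relative_abundance(counts):
    """Convert read counts (taxon -> count) to fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def drop_rare(counts, min_fraction):
    """Keep only taxa at or above min_fraction relative abundance."""
    rel = relative_abundance(counts)
    return {taxon: n for taxon, n in counts.items() if rel[taxon] >= min_fraction}


# Hypothetical profile: one dominant OTU and three progressively rarer ones.
profile = {"OTU_1": 9000, "OTU_2": 700, "OTU_3": 250, "OTU_4": 50}

for cutoff in (0.001, 0.01, 0.05):
    print(cutoff, sorted(drop_rare(profile, cutoff)))
```

Reporting the cut-off alongside the results, and checking how conclusions change as it varies, would make such filtering easier to compare between studies.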
Accurate species identification is equally important, if not more important, when such information is used to identify indicators for specific geographical provenance of an unknown sample. As discussed previously for human skin microbiome, a single gene region is not always sufficient for an accurate species level identification and this issue is particularly problematic for plant identification [
]. Current eDNA approaches applied to forensic science typically target a broad taxonomic group using a single barcode approach, which results in limited taxonomic resolution and a high number of unclassified sequences. For example, trnL and rbcL are the most commonly used barcodes for plants, and both are known to have limited taxonomic resolution. As a result, multi-locus approaches to characterise plant assemblages have been explored for other applications [
]. Furthermore, the reference database used to assign taxonomy is also crucial. While reliable curated databases exist for the most widely used barcodes (e.g. Greengenes for bacterial 16S rRNA, SILVA for eukaryote 18S rRNA, UNITE for fungal ITS, and Rsyst for diatom rbcL and 18S), other taxonomic groups of interest, or alternative barcodes, require reference sequences to be sourced from NCBI or generated from known voucher specimens [
]. Furthermore, detection of robust indicator taxa should be supported by a minimum sequence coverage threshold, for which there are currently no guidelines.
5.2 Moving towards a likelihood ratio
The analysis techniques used to assess similarities or differences between profiles should possess a high level of objectivity, while being comprehensible to a trier of fact [
]. Comparisons between samples require rarefaction to standardise the sequencing depth across all samples; this depth can vary significantly across forensic studies and is sometimes not reported. Habtom et al. 2017 suggest that the discriminative power of MPS was correlated with sequencing depth ranging from 6,000 to 60,000 sequences per sample [
]. High biomass samples typically allow for a higher sequencing depth compared to low biomass samples especially if background sequences from EBCs are appropriately considered [
]. Currently, there are no recommended minimum thresholds for sequencing depth for samples to be accepted and subsequently included in determination of a “match”. Due to these inconsistencies, comparisons between multiple datasets become difficult. To resolve this, Allwood et al. 2020 have proposed a standardised approach for fungi analysis in dust [].
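Rarefaction itself is a simple operation, and a minimal Python sketch may help show where a minimum-depth rule would enter the workflow. The function `rarefy`, the depth of 1,000 reads and the counts are illustrative assumptions only. Reads are subsampled without replacement to a common depth, and a sample that is too shallow is rejected rather than compared.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a count profile.

    Returns a new taxon -> count dict, or None when the sample holds fewer
    than `depth` reads and so cannot be standardised to that depth.
    """
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    rarefied = {}
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


# Hypothetical profiles: one deep sample and one too shallow to include.
deep = {"taxon_a": 5000, "taxon_b": 900, "taxon_c": 100}
shallow = {"taxon_a": 40, "taxon_b": 10}

rarefied = rarefy(deep, depth=1000)
print(sum(rarefied.values()), rarefy(shallow, depth=1000))
```

Recording the chosen depth, the seed and the number of samples excluded at this step would address part of the reporting inconsistency noted above.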
Initial studies that explore the feasibility of eDNA for forensic comparisons report the similarity or differences between samples from different locations or individuals and display results using abundance charts, non-metric multidimensional scaling (NMDS) and hierarchical clustering analysis (Fig. 1), sometimes with associated statistical significance, e.g. using PERMANOVA. While useful visualisation tools, the latter two analyses rely on the selection of a dissimilarity measure to compare community structures between samples. In microbial ecology, UniFrac, which takes the phylogenetic distance of the taxa present into account, is recommended; however, many forensic studies apply Bray-Curtis dissimilarity (BC), which is based solely on the number and abundance of shared taxa between two profiles. Jesmok et al. 2016 suggest that for forensic soil comparisons, Sorensen-Dice dissimilarity outperformed BC, resulting in tighter location clustering and higher classification accuracy [
]. As Sorensen-Dice is calculated based only on the number of unique sequences shared across two profiles, the authors suggest that this measure is less sensitive to fluctuations that occur over space and time. While these visualisation tools present findings and relationships between samples in a simplistic manner (Fig. 1), they do not conform to the Bayesian frameworks to which human DNA profiling has become accustomed.
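The difference between the two measures can be seen in a short Python sketch using their standard textbook definitions. The two profiles are made up: they contain the same two taxa, but the dominant taxon switches between collections, as might happen across seasons.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (taxon -> count)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


def sorensen_dice(a, b):
    """Sorensen-Dice dissimilarity, using presence/absence of taxa only."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    return 1.0 - 2.0 * len(present_a & present_b) / (len(present_a) + len(present_b))


# Hypothetical profiles from one site: same taxa, shifted abundances.
spring = {"taxon_1": 90, "taxon_2": 10}
autumn = {"taxon_1": 10, "taxon_2": 90}

bc = bray_curtis(spring, autumn)
sd = sorensen_dice(spring, autumn)
print(bc, sd)
```

Here Bray-Curtis reports the two collections as largely dissimilar because the abundances have shifted, whereas Sorensen-Dice reports no dissimilarity because the same taxa are present in both. The same property also means Sorensen-Dice cannot distinguish sites that share taxa but differ in their proportions.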
Fig. 1. Simple example of the three most commonly used eDNA data visualisation tools. (A) Abundance charts showing the relative abundance of 10 taxa detected at three different sites. The similarity/dissimilarity of eDNA signals obtained from replicate samples across three sites (indicated by colour) using (B) non-metric Multidimensional Scaling (NMDS), and (C) Hierarchical cluster analysis. The full circles represent reference samples and the hollow circle represents an unknown sample originating from the blue site (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).
The decreasing cost of MPS has enabled high-throughput analysis of samples making it possible to apply more robust statistical analysis to eDNA for predicting the most likely origin of an unknown sample given a reference sample set [
]. One study evaluated five machine learning algorithms in R (K-Nearest Neighbour, Decision Trees, Random Forest and Neural Networks) for their respective ability to recognise biotic patterns in soils, and thus accurately classify samples at different spatial scales regardless of seasonal collection [
]. This study found that Random Forest significantly outperformed all other algorithms regardless of spatial level. Another study used discriminant analysis built on Bayes Theorem to classify a new observation by estimating the distribution of each species’ occurrence probability using available samples and inverting the probabilities to predict the spatial origin of a new sample [
]. This Bayes rule approach selects the spatial location that maximizes the log-likelihood of the new sample and provides a measure of certainty on each prediction. Only a single study has introduced a likelihood-ratio framework for quantitative evaluation of soil microbial DNA profile evidence, where two competing propositions are considered based on the probability of observing each Bray-Curtis score if each hypothesis is correct [
]. Quaak et al. 2018 introduced a similar decision model to predict the cellular source of a sample using microbiomes by displaying a histogram of Bray-Curtis distance between samples from the same individual and samples from different individuals [
]. This approach was subsequently applied in two cases, allowing the likelihood of results to be presented using a verbal scale, in line with standard practice in human DNA profiling [
]. Hanssen et al. 2017 have also suggested the idea of an ‘inconclusive’ category to avoid false negatives where intermediate posterior probabilities indicate uncertain predictions [
]. While these studies move towards a forensic framework for evaluating eDNA comparison evidence, Jesmok et al. 2016 suggest that a single analysis technique is unlikely to meet forensic needs, and thus that a combination of approaches is required: for example, taxonomic abundance charts and NMDS alongside a more objective statistical representation such as machine learning [].
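The likelihood-ratio idea described in this section can be illustrated with a rough Python sketch. It is not the method of any cited study: the histogram binning, the pseudo-count smoothing, the number of bins and all reference scores below are assumptions made purely for illustration. The ratio compares how probable an observed Bray-Curtis score is among same-source reference pairs versus different-source reference pairs.

```python
def binned_probability(scores, value, n_bins=10, pseudo=1.0):
    """Smoothed probability that a score in [0, 1] falls in the bin holding `value`.

    A pseudo-count is added to every bin (an assumption of this sketch) so
    that an empty bin never yields a probability of zero.
    """
    def bin_index(x):
        return min(int(x * n_bins), n_bins - 1)

    target = bin_index(value)
    in_bin = sum(1 for s in scores if bin_index(s) == target)
    return (in_bin + pseudo) / (len(scores) + pseudo * n_bins)


def likelihood_ratio(score, same_source, diff_source, n_bins=10):
    """P(score | same source) divided by P(score | different source)."""
    return (binned_probability(same_source, score, n_bins)
            / binned_probability(diff_source, score, n_bins))


# Hypothetical reference Bray-Curtis scores from pairwise comparisons.
same = [0.12, 0.18, 0.22, 0.25, 0.31, 0.15, 0.28, 0.20]
diff = [0.55, 0.62, 0.71, 0.48, 0.80, 0.66, 0.35, 0.90]

lr_close = likelihood_ratio(0.21, same, diff)  # low dissimilarity
lr_far = likelihood_ratio(0.68, same, diff)    # high dissimilarity
print(lr_close, lr_far)
```

A ratio above 1 supports the same-source proposition and a ratio below 1 supports the different-source proposition. With so few reference scores the result is dominated by the smoothing, which illustrates why the size and selection of the reference set matter for this kind of evaluation.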
5.3 Number of reference samples required for eDNA evaluation
Traditionally, forensic ecology comparisons of soils, pollen, or diatom assemblages do not rely on large reference databases or statistics to evaluate an association, yet with eDNA there is a common assumption/requirement that robust statistics be applied, such as those described in the previous section. This is a major barrier for translating eDNA analysis into casework and integrating this tool with existing forensic ecological approaches. For forensic DNA analysis of human STR profiles, allele frequency databases of different ethnicities from the local population are collected and assembled, from which robust statistics can be reported to provide an acceptably accurate measure of the rarity of a DNA profile [
]. In forensic ecology, by contrast, reference samples are typically collected only in the context of the specific case, such as crime scene sites, an alibi site and the suspect’s home, and are compared with the unknown sample. For example, a murder case involving pollen and fungi analysis only looked at the crime scene and four local reference points where the suspect was likely to have picked up pollen or fungi on their shoes [
]. In this case, no statistical analysis was used on the assemblages to evaluate the ‘strength’ of the match. Similarly, in a burglary case where a high abundance of Hypericum pollen grains was recovered from a suspect’s clothing, the value of the evidence was supported only by the lack of detection of Hypericum in forensic work over the past 30 years rather than by statistical evaluation [
]. The same is the case for soil evidence, where questioned samples are compared to a small number of control samples and reported as a verbal scale of ‘Degree of Comparability’ based on the number of physical and chemical characteristics in common between two samples [Guidelines for Conducting Criminal and Environmental Soil Forensic Investigations (version 10.1): Centre for Australian Forensic Soil Science, Report CAFSS_076].
As the statistical evaluation of eDNA in a forensic context improves, the number and appropriate selection of reference samples required for comparisons remains an open question. From an eDNA perspective it has been implied that careful selection of the pool of potential matches may be one of the most important determinants of prediction accuracy in a practical setting as the accuracy is highly sensitive to the number of individuals, or groups of individuals, to which a sample could be matched [
]. Similarly, Jesmok et al. 2016 showed that overlap among habitats often occurred in NMDS plots when many habitat types were examined simultaneously, but sites could be resolved when pairs or triads were compared separately [
]. This was due to NMDS forming clusters via the rank order of dissimilarity for all profiles being compared so highly similar profiles, or sets of profiles, will be forced together, resulting in misleading similarity between them. For soils, a systematic approach can be applied where DNA would only likely be applied to discriminate soils of similar soil type [
]. However, for samples such as water or surface swabs, no such matrix exists for analysis prior to eDNA to assist with this issue. Therefore, clear guidelines and minimum sampling requirements should be established for the evaluation of eDNA evidence from different sample types.
5.4 Validation
As with many areas of forensic science, eDNA originated in a research environment and requires validation for successful transition to operational casework. Validation of microbial forensics (which could more broadly be applied to eDNA analysis) has been described as incorporating technical (or developmental) validation, biological validation, and forensic validation [
]. Technical validation encompasses criteria such as detection limit, reproducibility, robustness, specificity, correct data handling, thresholds and bioinformatic analysis. Biological validation demonstrates the variability of the eDNA signal using databases of relevant samples for a probabilistic approach and development of probabilistic models and statistics. Forensic validation demonstrates the application of the technique to forensic samples given the limitations encountered, such as limited sample size, environmental exposure, mixing with other eDNA signals and time delays. While technical validation has largely been addressed for soil and human microbiome applications, the high-throughput capability offered by MPS is increasingly making it possible to address the biological validation criterion. Drawing from studies in broader ecology, we are gaining an understanding of the natural eDNA signal variation associated with environmental variables such as season, rainfall and temperature. However, from a forensic viewpoint, an understanding of sample mixing dynamics and the impact of reduced sample size (e.g. dust vs soil) will provide additional validation data to better understand the potential of this trace evidence and help to identify the boundaries within which this method can be robustly applied. The third criterion, forensic validation, is more challenging to address without court acceptance of the technique. To overcome this, studies typically introduce mock case samples, blind testing and inter-laboratory collaborations (e.g. MiSAFE or ENFSI Animal, Plant and Soil traces group http://enfsi.eu/about-enfsi/structure/working-groups/animal-plant-and-soil-traces/), or direct comparison to existing court-accepted methods, as in the case of soil [
]. Nevertheless, it has been suggested that the absence of forensic validation should not prevent the use of these techniques in casework, provided practitioners are prudent and analyse the limitations and benefits [
] in order to demonstrate the use of the technique in practice and ultimately obtain forensic validation.
eDNA encompasses many sample types and targets complex signals that, unlike human DNA profiles, do not arise from a discrete entity; it is therefore unlikely that a single developed and validated standard operating procedure (SOP) will be appropriate for all eDNA scenarios. To this end, Budowle et al. 2008 provide a detailed framework for developing a validation plan and a checklist for microbial forensics that encompasses all aspects from sample collection and preservation to DNA extraction, data analysis and interpretation, in an attempt to establish minimal acceptable validation criteria for microbial forensics [
]; such a framework is equally applicable to eDNA applications. For all new techniques, a detailed SOP should document all processes, including the appropriate controls, a list of appropriate literature references for each validation stage, a qualitative and quantitative statement about the analysis outcome, databases and analysis thresholds applied, as well as the conditions under which the standard interpretation is not effective or reliable. Such validation has been achieved in a number of instances, for example the identification of the body tissue of origin of an unknown sample by the NFI [].
As the methodology for obtaining complex biological signals from environmental samples is generally accepted by the scientific community, specific validation measures can be explored for the application to forensic science. While this approach shows promise in an experimental setting, in reality, there will be microbial interaction between the sample surface and different environments over time if that surface is moved from the original location. This presents a layer of complexity when applying eDNA as a forensic tool. Undoubtedly, there will be a time lapse between the removal of the item from the source environment and the recovery of the item by authorities. Furthermore, to obtain authentic microbial signals from low-biomass eDNA samples one must have an understanding of the additional quality controls required. Nevertheless, MPS has revolutionised the field of ecology and now paves the way for more sensitive and discriminatory techniques within forensic ecology.
References
Wiltshire P.E. Forensic ecology, botany, and palynology: some aspects of their role in criminal investigation. Criminal and Environmental Soil Forensics. Springer, 2009: 129-149
Spatial variability in airborne bacterial communities across land-use types and their relationship to the bacterial communities of potential source environments.
The potential to determine a postmortem submersion interval based on algal/diatom diversity on decomposing mammalian carcasses in brackish ponds in Delaware.
Combined toxicological and genetic auditing of traditional Chinese medicines provides a means of detecting adulterants and improving pharmacovigilance.
Guidelines for Conducting Criminal and Environmental Soil Forensic Investigations (version 10.1): Centre for Australian Forensic Soil Science, Report CAFSS_076.