Forensic Science SA, PO Box 2790, Adelaide, SA 5000, Australia
School of Biological Sciences, Flinders University, GPO Box 2100 Adelaide, SA, 5001, Australia
Faculty of Law, Criminal Justice and Public Administration, School of Criminal Justice, University of Lausanne, Lausanne-Dorigny, Switzerland
Faculty of Law, Criminal Justice and Public Administration, School of Criminal Justice and Fondation pour la formation continue UNIL-EPFL, University of Lausanne, Lausanne-Dorigny, Switzerland
•
We provide a Bayesian network construction methodology for forensic DNA results.
•
The methodology features evaluation given competing activity propositions.
•
Our modelling approach leads to network architectures containing standard elements.
•
We demonstrate the process on a mock case scenario.
Abstract
The hierarchy of propositions has been accepted amongst the forensic science community for some time. It is also accepted that the higher up the hierarchy the propositions are, against which the scientists are competent to evaluate their results, the more directly useful the testimony will be to the court. Because each case represents a unique set of circumstances and findings, it is difficult to come up with a standard structure for evaluation. One common tool that assists in this task is Bayesian networks (BNs). There is much diversity in the way that BNs can be constructed. In this work, we develop a template for BN construction that allows sufficient flexibility to address most cases, but enough commonality and structure that the flow of information in the BN is readily recognised at a glance. We provide seven steps that can be used to construct BNs within this structure and demonstrate how they can be applied, using a case example.
It is also accepted that the higher up the hierarchy the propositions are, against which the scientists are competent to evaluate their results, the more directly useful the testimony will be to the court, limiting the risk of misleading. A number of advisory bodies, and leading thinkers in the field of forensic science, advocate the evaluation of results in light of propositions regarding competing activities in preference to only considering questions of source (or sub-source, or even sub-sub-source) [The Logic of Forensic Proof: Inferential Reasoning in Criminal Evidence and Forensic Science: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses].
The cases where forensic results ought to be reported considering activity level propositions have been delineated in the ENFSI guideline for evaluative reporting. Such reporting is needed when the amount of collected trace material is low and when considerations of transfer, persistence and recovery require specialised forensic knowledge. There is widespread recognition that there is danger in leaving such assessments to non-forensic scientists and that it is the duty of the scientists to guide the court appropriately. Other typical cases where the findings lend themselves naturally to an interpretation considering activity level propositions are those in which the source of the trace material is not disputed, and only the mechanism by which the trace material was transferred is debated.
There have been numerous publications demonstrating the importance of considering activity level propositions when evaluating forensic results [Probabilistic evidential assessment of gunshot residue particle evidence (Part I): Likelihood ratio calculation and case pre-assessment using Bayesian networks]. These developments reflect a tendency towards a broader recognition and acceptance of evaluations of findings given activity level propositions amongst the forensic community (and specifically, as is the focus of our paper, the forensic biology community).
The primary author’s laboratory (Forensic Science SA) has recently begun providing reports considering activity level propositions and has received very positive feedback from stakeholders.
However, the transition from recommendations to widespread operational practice is not easy, for two reasons that are often raised by practitioners: (1) the paucity of data to support the reporting and (2) the lack of a standardised way to construct the inferential scheme. In this paper, we deal with the second aspect, with the objective of facilitating the transition.
Because each case represents a unique set of circumstances, allegations (regarding posited activities) and results, it is difficult to come up with a standard structure for evaluation. One common tool that assists in this task is Bayesian networks (BNs), which are a graphical way of displaying and conducting complex probability evaluations. There is much diversity in the way that BNs can be constructed.
Footnote: A BN is a representation of conditional independence assumptions made when evaluating a set of findings. Different BN architectures that have been constructed on the same set of assumptions will evaluate findings in the same way. When we state that there is 'diversity in the construction' we mean that different scientists may hold different conditional independence assumptions in the first place, and that this is one reason that leads to diversity in BN structures.
One consequence of this is that there are many different ways of probabilistically evaluating the same set of findings. The hope would be that a set of results, even when evaluated in BNs with different architectures, would yield a similar level of support for one proposition over the other. It should be noted, however, that different BN architectures may reflect different assumptions and underlying assessments, which may lead to differences in assigned values of evidence. The use of sensitivity analyses can help examine how robust the resulting strength of evidence assessment (here interpreted in terms of a likelihood ratio, LR for short) is to probability assignments underlying the nodes of the BN [Using sensitivity analyses in Bayesian Networks to highlight the impact of data paucity and direct future analyses: a contribution to the debate on measuring and reporting the precision of likelihood ratios]. There is, however, little guidance on the incorporation of biological forensic results into a BN that considers competing propositions regarding activities.
To address this issue, and coming out of a necessity of routine casework, we have developed a template for BN construction that allows flexibility across cases, but enough commonality and structure that the flow of information in the BN is readily recognised at a glance. Although the method that we provide here is by no means the only manner in which BNs can be constructed, we share our methodology so that others can benefit, and build on what we present. We explain our method by working through an example.
1.2 A practical focus on Bayesian network construction
This paper came as a result of one of the authors implementing an evidence evaluation service in their DNA laboratory, which considered competing activity level propositions. The standard up until that point in the laboratory had been evaluating the DNA results using sub-source level propositions and providing no guidance on their meaning in relation to activities, except if asked in court.
What the author found when implementing their evaluation system, with BNs as a supporting framework, was that there were several key difficulties in applying the theory in practice:
1)
As previously mentioned, there is a lack of simple step-by-step guides on constructing BNs in situations in which one wishes to consider competing activities for evaluating forensic biology findings. It is often difficult to see how to apply the general texts on BN construction to practical situations, which are characterised by the many complexities and deficiencies of the real world of a practising laboratory. The authors’ experience from speaking to individuals from laboratories, reporting scientists and managers, about BNs and activity level evaluation is that they feel that the field is too complex to break into. This complexity is a barrier to better science being done, so one of the aims of this paper is to try to break that barrier down with an accessible, semi-instructional publication.
2)
The pressures on laboratories are for high throughput, quick turnaround times, standardisation, ease of use, ease of explanation and comprehension. The method we report here has come from implementation in a practising forensic biology laboratory and so has been developed with laboratory pressures in mind. For example, as the reader reaches later sections of the paper, the BN construction we describe can lead to structures seemingly more detailed than necessary, with a possible decrease in efficiency of the BN. However, our experience is that a rich network structure brings a considerable increase in comprehension, and in the ability to explain to a lay person the pathways of transfer being considered. It is a step out from the usual construction of BNs, by using them not just as a means of assigning and computing probabilities, but as a graphical means of explaining the evaluation to others. The use of such network structures makes it possible, at a glance, to see what considerations have been made in the evaluation of results.
3)
Theory tells us that BN construction should be possible prior to the findings being obtained. This is the founding idea in triaging processes such as Case Assessment and Interpretation (CAI). The reality of constructing BNs for cases can make this difficult, and it will often not be practical to fine-tune construction of a BN until, at least, it is known what examinations are to be conducted. The reality of forensic science is that laboratories:
A.)
Work under conditions where they have limited and incomplete information about the alleged crime;
B.)
Have limited resources and time to examine exhibits in a case, or to rework sub-optimal samples;
C.)
Will often have one chance to examine an item and so will sample any biological material of potential importance (even if this does not have an explanation in amongst the information given to them about the crime);
D.)
Must choose which exhibits (out of potentially many that are of probative importance) will be examined, or which will be examined dependent on the findings of other examinations;
E.)
Work in a segmented manner, i.e. cases may be examined by individuals prior to being allocated to someone else who must evaluate the findings and report the results.
These factors make it impractical to develop BNs prior to findings being obtained.
4)
There is a strong desire for standardisation in operational forensic laboratories. This desire leads to the need for easy and quick, but still sound, BN construction, peer review, and the ability for a scientist to pick up another’s work to testify in court if required. Standardisation is also desired by accreditation bodies.
Our hope is that the method described below achieves the goal of providing an accessible way to think about constructing BNs that is sound in theory, but also meets the needs of an everyday forensic laboratory.
2. Methods
We construct BNs for cases using the software Hugin Expert (www.hugin.com). The probabilities used in the conditional probability tables are constantly updated for casework from the literature and in-house studies. While there is much contention around applying the results of one study to another, slightly different, situation, we do not touch on those issues here. For the reader who wishes to explore such topics we suggest [Using sensitivity analyses in Bayesian Networks to highlight the impact of data paucity and direct future analyses: a contribution to the debate on measuring and reporting the precision of likelihood ratios]. The work here is more focussed on the development of BN architecture to address case findings.
2.1 Steps of BN construction
We provide seven steps that describe the general process of BN construction when evaluating forensic biology results in light of competing propositions regarding activities. Some of these steps are stated very generally and there exist entire studies that could go into more detail. During the seven steps we will provide a worked example of their use.
The steps are:
Step 1: Define the main proposition node
Step 2: Define activity node(s)
Step 3: Group similar findings
Step 4: Define findings node(s)
Step 5: Define transfer and persistence node(s)
Step 6: Define root node(s)
Step 7: Check for absolute support within the BN
Note that in all examples we use in the paper we colour the nodes so that black refers to the propositional node, blue to the activity nodes, yellow to the transfer, persistence and accumulation nodes, red to the findings nodes and grey to the root nodes. This colouring has no other purpose than providing quick comprehension of the BN at a glance.
Footnote: An accumulation node fuses two or more variables into a single node. We term this ‘accumulation’ as these nodes typically refer to the accumulation of biological material. It is the opposite of a technique called parent divorcing, where the complexity of a node is broken down by reformulating it in one or several layers of intermediate, more basic nodes.
3. Worked example
We start with the following case scenario. A 24-year-old girl (C), who normally lives with her biological mother (M) and father (F), has stayed for a week at her older brother’s (D) house. A friend of the girl receives a phone call from the girl stating that her brother has bitten her on the vagina, over her underwear. The friend picks up the girl and they go immediately to the police, where the underwear is seized and a reference from the girl taken. The police then arrest the brother and take a reference DNA sample from him. We have the following:
Footnote: We assume for our calculation that the underwear was appropriately seized, packaged, transported and stored. We also do not consider aspects of degradation between seizing and sampling, or redistribution of DNA on the inner packaging, in our evaluation.
•
The prosecution case is that the brother has bitten the girl on the vagina, over the underwear.
•
The defence case is that the girl has been staying at the brother’s home, but no biting occurred.
The underwear is examined at the local forensic science centre and the following was found:
1.
Faecal staining was present on the inner and outer crotch of the underwear.
2.
An RSID test for saliva on the crotch of the underwear gave a positive reaction.
3.
A tapelift of the outer crotch of the underwear yielded a single source DNA profile that matched the complainant. There was no indication of a second contributor to the autosomal profile; however, the quantification result revealed the presence of a low level of male DNA. The presence of the complainant’s DNA in such high amounts meant that the male DNA could not be profiled using autosomal profiling systems. No other autosomal profiling was carried out.
4.
Y-STR profiling of the outer crotch tapelift DNA extract yielded a single source profile that matched the brother’s Y-STR reference.
5.
A tapelift of the outer front of the underwear yielded a single source DNA profile that matched the girl. The quantification result revealed no male DNA to be present. No further work was carried out on this item.
We now set out to develop the BN to assess these results in light of the activity level propositions. We also start using the single letter abbreviations for individuals, so that the descriptions align with the figures.
3.1 Step 1: define proposition node
Determine the competing propositions that reflect what each party is putting forward, and all activities associated with each. Remember that when one party is suggesting that an individual has not been involved in an activity, this may mean that they are stating the activity did not occur, or it may be that they are stating the activity occurred, but with someone other than the individual. These two options will result in different BN constructions.
In our example, the competing propositions that reflect, respectively, the prosecution and defence case information have been given as:
•
D has bitten C on the vagina, over her underwear.
•
C has been staying at D’s home, but no biting occurred.
Under the defence proposition there is no indication that anyone bit C, and so there is no need to consider an alternate offender in the BN. An alternate offender would be implied, for example, when it is not disputed that an assault took place.
3.2 Step 2: define activity node(s)
Draw a propositional node and one node for each activity (it is important that this should be a real activity and not an explanation of the phenomenon, such as saying ‘secondary transfer’). Make the propositional node the parent for all the activity nodes. We advocate using this structure even for activities that are not in question (if they are true under both propositions) i.e. the activities that are important to the evaluation of the findings, but are common in the description of events by both prosecution and defence (e.g. the victim and suspect had dinner together, prior to an alleged assault). This creates a more populated BN (i.e. if one specific state of a node is always true a BN could be constructed where the node is not present), but the added complexity is necessary to properly account for non-disputed activities that may impact on the forensic results.
For our example, the main propositions are broken into the three sub-activities defined as follows:
1.
D bit C
2.
C and D lived together
3.
C and F cohabitated
Footnote: Note that activities 2 and 3 in the list could be considered background information, and as such not dependent on the propositions. Indeed, in the BN we demonstrate, the nodes for activities 2 and 3 possess the same probabilities given both states of the propositional node, and we could simply omit the arcs drawn from proposition to these two activity nodes without any effect on the evaluation. We could, in fact, omit the nodes altogether and adjust probabilities in other nodes. While the inclusion of these nodes in the BN is a loss of efficiency, in the computational sense of BN construction, we have found the improvement in comprehensibility of the BN, and how it relates to the case, is substantial, and worth the trade-off. We draw in the arcs from proposition to activity nodes because, while the activities have occurred under both propositions, they do make up part of the version of events for each party, and could be considered as part of an extended form of the proposition, i.e. instead of the propositions given in step 1 an extended form may be:
•
C normally lives with F, but had been recently staying with D in his home, during which time D has bitten C on the vagina, over her underwear,
•
C normally lives with F, but had been recently staying with D in his home, during which no biting occurred.
In this case the third activity is required because the DNA profiling evidence is Y-STR and D is the son of F, so they will share a Y-STR profile. As C normally lived together with F, it may be that the male DNA detected on the underwear of C is from F. The components of the BN drawn so far are shown in Fig. 1.
Fig. 1. BN for worked example after first two steps of BN construction.
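The structure reached after the first two steps can also be written down as a plain parent map. The sketch below is illustrative only: the dictionary layout and the helper `activities_missing_proposition_parent` are hypothetical names introduced here (they are not part of Hugin Expert), and the node labels are taken from the list of sub-activities above.

```python
# Node kinds for the nodes present after steps 1 and 2 (hypothetical layout).
NODE_KIND = {
    "Proposition": "proposition",
    "D bit C": "activity",
    "C and D lived together": "activity",
    "C and F cohabitated": "activity",
}

# Parent map: step 2 makes the proposition node the parent of every activity node.
PARENTS = {
    "Proposition": [],
    "D bit C": ["Proposition"],
    "C and D lived together": ["Proposition"],
    "C and F cohabitated": ["Proposition"],
}


def activities_missing_proposition_parent(node_kind, parents):
    """Return the activity nodes that are not children of the proposition node."""
    return [
        node
        for node, kind in node_kind.items()
        if kind == "activity" and "Proposition" not in parents[node]
    ]


# An empty list means the step 2 convention is satisfied.
print(activities_missing_proposition_parent(NODE_KIND, PARENTS))
```

A check of this kind is only a convenience for peer review of the architecture; it says nothing about the probabilities assigned to the nodes.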
3.3 Step 3: group similar findings
Group findings from similar samples, e.g. palm swab/finger swab/fingernail swab or tapelift of inner front, crotch, and front waistband of underpants. The propensity of DNA to transfer and the sensitivity of DNA analysis mean that the collected trace material can effectively be considered a single item. Considering multiple items separately leads to a complex set of dependencies, the full consideration of which will not bring more insight into the evaluation of the findings in this case.
Footnote: We recognise the term ‘similar’ is vague. In this context, we mean samples that are taken from closely related areas of the same sample, where we might expect that, due to the propensity for DNA to transfer, a presence of DNA in one area and absence in an adjacent area is likely to have little evaluative difference to simply considering the presence of DNA in the general area. There is no general manner in which ‘similar’ can be defined. Whether findings are grouped is a decision that needs to be made on a case by case basis.
In our case, we combine the findings of the two tapelifts of the outer surface of the underwear. In both cases the results were that high levels of C’s DNA were present (this is expected as they are her underwear and would be assumed during DNA profile interpretation). On one of the tapelifts, low levels of male DNA were detected and the subsequent Y-chromosome profiling generated a Y-STR profile. We will consider the two tapelifts as being one large tapelift of the entire outer front and crotch of the underwear and consider that low levels of male DNA were present.
3.4 Step 4: define findings node(s)
Add ‘findings’ nodes (at this stage unlinked to the activity or proposition node) below the activity nodes. There will be one findings node for each group of findings relevant to the propositions. It is advisable not to ignore, or leave out, results from the BN, as they can have an impact on the relative support given to each proposition. This is true even for screening tests for body fluid that found no indication of that fluid, so that no samples were taken for DNA analysis. Also, the finer the resolution between node states that can be used in a findings node (for example, DNA amounts could be expressed in 1 ng brackets, or at a finer resolution in 100 pg brackets), the better the BN will be able to use the information in the findings. The trade-off is that sometimes the availability of data will be such that few delineations (or often only a binary presence/absence) are possible.
In our running example, we now add the findings nodes to the BN. There are two results, one is that examining the outer surface of the crotch of the underwear yielded a positive RSID result for saliva and the other is that a low level of male DNA was found that had a matching Y-STR DNA profile with D. Because D and F are father and son, they are expected to possess the same Y-STR profile and so we have called the result the presence of the ‘Family YSTR profile’. Adding these two findings nodes leads to the BN seen in Fig. 2.
Fig. 2. BN for worked example after first four steps of BN construction.
The fact that only a low quantity of male DNA had been detected has been incorporated into the YSTR finding (which can be high or low in terms of quantity). Doing so gives significance to the low quantity detected in this case. There are alternative valid ways to construct these findings, for example with three nodes: one for the RSID result, one for the quantity of male DNA and one for the YSTR profile without considering the quantity. Here, we decided that the quantity of DNA could be jointly considered with the YSTR findings.
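The effect of choosing a finer or coarser resolution for a findings node can be sketched as a simple binning of a measured DNA quantity into node states. This is a minimal illustration under stated assumptions: the function name `quantity_state`, the 5000 pg ceiling and the 350 pg measurement are all hypothetical and are not values from the worked example. Quantities are kept in whole picograms to avoid floating-point rounding at bracket edges.

```python
def quantity_state(quantity_pg, width_pg, ceiling_pg):
    """Map a DNA quantity (whole picograms) to the label of its bracket.

    width_pg   -- width of each bracket (1000 for 1 ng, 100 for 100 pg)
    ceiling_pg -- hypothetical cut-off above which all quantities share one state
    """
    if quantity_pg < 0:
        raise ValueError("quantity must be non-negative")
    if quantity_pg >= ceiling_pg:
        return f">={ceiling_pg} pg"
    lower = (quantity_pg // width_pg) * width_pg
    return f"{lower}-{lower + width_pg} pg"


# The same (hypothetical) measurement under two resolutions.
coarse = quantity_state(350, 1000, 5000)  # 1 ng brackets
fine = quantity_state(350, 100, 5000)     # 100 pg brackets
print(coarse, "|", fine)
```

The finer binning creates ten times as many states, each of which needs a probability under every parent configuration, which is where the availability of data becomes the limiting factor.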
3.5 Step 5: define transfer and persistence node(s)
Add transfer and persistence nodes that describe the mechanisms by which the activities would lead to the findings. The following should be considered:
•
There may be multiple activities that all contribute to a single result.
•
Some pathways will require multiple steps (nodes).
•
The order of activities, and therefore transfers, may be important and could affect the way in which transfers are mapped.
•
There may be nodes which are purely present as ‘accumulation nodes’ that combine the results of multiple transfers to the same object. These may not be strictly necessary (in that the BN could be constructed without them) but can help with comprehensibility.
Now we turn to our running example. In this step the transfer, persistence and accumulation nodes are added that link the activities to the findings. All three activities can lead to the detection of the family Y-STR profile on the outer surface of C’s underwear and, for simplicity, we will add a node to accumulate the two sources of D’s DNA before the YSTR findings node. Only the activity of biting would lead to the presence of D’s saliva on C’s underwear, and so only one path from this activity node to the RSID node will be present. We purposely left out the possibility of D’s saliva being present on his hands and being transferred onto C’s underwear by touching it. The aim of the BN is to reflect the alleged incident and not to cover all possible explanations that could lead to the observed outcomes. In the present case, D stated that C only stayed at his home and denied any activity with the underwear of C. Hence the explanation of a saliva-contaminated hand coming into contact with the underwear is not considered as a potential route to the RSID finding. The resulting BN after adding in the nodes for step 5 can be seen in Fig. 3.
Fig. 3. BN for worked example after first five steps of BN construction.
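An accumulation node can be pictured as a deterministic table over its parents. The sketch below assumes, for illustration, that the accumulated material is 'present' whenever at least one contributing transfer is present (a logical OR); that rule, the function name `accumulation_cpt` and the parent labels are assumptions made here, not a description of the tables in the Supplementary material.

```python
from itertools import product


def accumulation_cpt(parent_names):
    """Build P(accumulated = 'present' | parent states) as a deterministic OR.

    Keys are tuples of booleans, one per parent, in the order of parent_names.
    Every entry is 0.0 or 1.0 because no transfer or persistence is modelled here.
    """
    return {
        combo: 1.0 if any(combo) else 0.0
        for combo in product([True, False], repeat=len(parent_names))
    }


# Hypothetical parent labels for the two sources of D's DNA.
parents = ["D DNA from biting", "D DNA from living together"]
cpt = accumulation_cpt(parents)
for combo, p_present in cpt.items():
    print(dict(zip(parents, combo)), "->", p_present)
```

Because the table contains only 0 and 1, the node adds no new uncertainty; it simply makes the merging of pathways visible in the graph.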
Again, the level of complexity of the modelling used in the transfer and persistence nodes can vary depending on:
•
the need for finer scale considerations within the BN;
•
the ability of the user;
•
the availability of data for modelling.
For example, the ‘D saliva on C underwear from biting’ node in our example possesses three states, ‘high’, ‘low’ and ‘none’. Readers who view the Supplementary material will see how we assign probabilities to these states from available literature. Results from different studies were combined to assign these probabilities, as we could find no study that directly studied saliva transferred from biting clothes. An alternative to this modelling, done outside the BN in our example, would have been to construct nodes within the BN that represented the distributions for each element of calculation. This would have had the advantage that it would allow for any likelihood function, at the cost of additional complexity within the BN. Such modelling is beyond the scope of what we wish to demonstrate in this paper, and while we keep the BN quite simple, the interested reader could see Biedermann et al. [Implementing statistical learning methods through Bayesian networks (Part 2): Bayesian evaluations for results of black toner analyses in forensic document examination] for further information on modelling distributions within a BN.
3.6 Step 6: define root node(s)
Add in ‘root’ nodes. These are nodes that do not refer to any activities, but have a relevant parental relationship with either the transfer steps or the findings nodes. Examples of this type of node are: background levels of saliva on underpants, background levels of DNA on hands, or contamination of exhibit.
In our example, at this stage the BN is starting to take form, and clear pathways are present that connect the propositions about activities to the results. We need to consider root nodes that can also explain some of the findings. A balance must be struck between the number of these nodes that are added and the complexity of the BN. For example, we could add nodes to the YSTR findings node to consider the possibility of contamination, or of coincidentally matching background DNA. However, these additional nodes would add very little to the BN and have a negligible effect on the strength of the results (i.e., the likelihood ratio), for three reasons: F is already an alternate possible source of the DNA; a contamination event in this case, or a coincidental profile match, is rare in comparison to most of the other transfer and persistence probabilities; and both prosecution and defence are willing to accept that this result has arisen from either D (by innocent means or not) or F.
However, it is important to consider root nodes for the positive RSID saliva test on the outer crotch of the underwear, specifically given the presence of faeces on the underwear (a substance known to give positive RSID results). We add two root nodes, one for the effect that the presence of faeces will have and one for general background levels of saliva on underwear. Note that if a very compact BN was desired, and given that the faeces node will be instantiated, these root nodes could be omitted, with their probabilities being added directly to the RSID findings node. However, laying the BN out in the manner shown in Fig. 4 makes it clear at a glance (and potentially much more comprehensible to a lay person) what considerations have been taken into account.
Footnote: Any state within any node of the BN can be set as being true (with all other states within that node therefore being false). Information provided to a BN in this manner is called ‘instantiation’ (i.e. the user is instantiating nodes) and once done the laws of probability can be used to propagate the information throughout the BN and update the probabilities for states in non-instantiated nodes.
Fig. 4. BN for worked example after first six steps of BN construction.
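Instantiation and propagation can be illustrated, in the smallest possible case, with a proposition node and a single findings node. The sketch below is not how Hugin Expert is driven; it is a hand-rolled application of Bayes' theorem with placeholder numbers. The equal prior probabilities are assumed here purely as a computational device, and the function name `instantiate_finding` is hypothetical.

```python
def instantiate_finding(prior, likelihood):
    """Update P(proposition) after a finding state is instantiated.

    prior      -- P(H) for each state of the proposition node
    likelihood -- P(observed finding state | H) for each state of the proposition node
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


prior = {"Hp": 0.5, "Hd": 0.5}        # placeholder priors, assumed equal
likelihood = {"Hp": 0.8, "Hd": 0.2}   # placeholder values, not from the paper
posterior = instantiate_finding(prior, likelihood)

# The likelihood ratio is recovered as posterior odds divided by prior odds.
lr = (posterior["Hp"] / posterior["Hd"]) / (prior["Hp"] / prior["Hd"])
print(posterior, lr)
```

With equal priors the posterior odds on the proposition node equal the likelihood ratio, which is the quantity the scientist reports; the priors themselves are not part of the evaluation.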
3.7 Step 7: checking for absolute support within the BN
The final step relates partially to the architecture of the BN, but also the probabilities entered into conditional probability tables that underlie each node. Try to avoid specifying BNs where the instantiation of any single finding will lead to all the posterior probability being placed on a single proposition, because this would amount to a categorical conclusion.
Footnote: Note that a premise of our analysis is that various competing versions of the alleged events can lead to the same findings (DNA profiling results), though with different probabilities. Hence, given the findings, one cannot then arrive at a categorical conclusion about a single version of the event (alleged activity), to the exclusion of the other possibilities.
Absolute support of this kind can be avoided in two ways. Firstly, for the probabilities associated with transfer and persistence used in the conditional probability tables that underlie each node, do not use values of 0 or 1. For nodes whose probability tables rely on counts of experimental observations, probability assignment for node states is based on standard techniques for probability specification.
Footnote: For example, imagine the simple case in which a node contains only two states, such as ‘transfer’ and ‘no transfer’, as may be the case for a variable that considers whether or not material will be transferred as a result of the activity. Experiments that seek to replicate the activity are conducted and, from 20 experiments, 15 transfers and 5 absences of transfer were observed. In this example there are two states (I) that the node can take and so the probabilities entered into the node are (15 + 1)/(20 + 2) ≈ 0.73 for ‘transfer’ and (5 + 1)/(20 + 2) ≈ 0.27 for ‘no transfer’.
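The arithmetic in the example above (add one to each count, and add the number of states to the total) can be sketched as a small helper. The function name `state_probabilities` is hypothetical; the counts are those of the example, and the rule is simply the one the example's numbers follow.

```python
def state_probabilities(counts):
    """Assign a probability to each node state from experimental counts.

    Each state's count is increased by one and the total by the number of
    states, so no state receives a probability of exactly 0 or 1.
    """
    n_states = len(counts)
    total = sum(counts.values())
    return {state: (count + 1) / (total + n_states) for state, count in counts.items()}


probs = state_probabilities({"transfer": 15, "no transfer": 5})
for state, p in probs.items():
    print(f"{state}: {p:.2f}")
```

Even a state that was never observed in the experiments (a count of 0) is given a small non-zero probability by this rule, which is what keeps transfer and persistence nodes from producing absolute support.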
The exception to the above is when considering accumulation nodes (such as the ‘D DNA present on C underwear’ node in Fig. 4, which will contain only values of 0 or 1), where transfer and persistence is not being considered, but rather the combination of two or more findings.
Secondly, all results should be observable under either proposition. This will typically mean either that there are competing activities that can lead to the same finding (for example, the presence of the Family YSTR on the underwear of C can be explained by D having bitten C, D having cohabited with C, or C living with F), or that an activity and a ‘root node’ present alternate accounts of the findings (for example, the presence of saliva on the underwear of C could be from D biting C, or from the background presence of saliva expected on underwear). In some cases, the alternate account may be in the form of a coincidental DNA profile match or a contamination event. One way to test whether the BN has been constructed to satisfy this point is first to instantiate one state of the proposition node and check that none of the findings nodes have all their probability on a single state (the red nodes should not have a state equal to 100%). Then instantiate the other state in the propositional node and carry out the same check.
There may be an exception to the above and this could be when it is known ahead of time that only a negative result has been obtained. For example, a vaginal swab from an alleged rape may be examined for sperm and none found. In this case, it may be acceptable to only provide pathways for the two states of the propositional node to account for an absence of sperm and not worry about the presence side. While such a construction is easier and less complex, it is also specific to the case at hand and will not be broadly applicable to other sexual assault cases without the presence of sperm pathways filled.
In our example, the manner in which we have constructed the BN makes it clear that there are multiple routes that lead to the presence of the family YSTR on the underpants of C, some of which are innocent and one of which aligns with the alleged criminal activity. For the RSID test node we have the one pathway that aligns with the alleged criminal activity, but also two other pathways that are represented by the two root nodes. Note that instantiating either state of the propositional node does not result in all posterior probability being assigned to one state of the findings nodes (see Fig. 5). This means that the BN will not end up in a situation where all probability sits on one of the states of the propositional node when findings nodes are instantiated. Note that combinations of instantiations of root nodes may cause this effect, depending on how they are set up, and so root nodes should also be instantiated to the state they will take in the final evaluation for the step 7 check (as they are in Fig. 5).
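The check described above can be sketched on a toy network with one findings node that is positive if material arrives by the alleged activity or by a background route. The structure, the function names and all numerical values below are hypothetical and are not taken from the worked example; the sketch only shows how a root node offering an alternate account keeps the findings node away from 0% and 100%.

```python
def finding_distribution(p_transfer, p_background):
    """Distribution of a findings node that is positive if material arrives
    via the activity (p_transfer) or via a background root node."""
    p_positive = 1.0 - (1.0 - p_transfer) * (1.0 - p_background)
    return {"positive": p_positive, "negative": 1.0 - p_positive}


def passes_step7_check(p_transfer_by_proposition, p_background, tol=1e-12):
    """Instantiate each proposition in turn and confirm that no state of the
    findings node carries all (or none) of the probability."""
    for p_transfer in p_transfer_by_proposition.values():
        dist = finding_distribution(p_transfer, p_background)
        if any(p <= tol or p >= 1.0 - tol for p in dist.values()):
            return False
    return True


# Hypothetical values: under Hd the alleged activity did not occur,
# so that particular route contributes no transfer.
p_transfer_by_proposition = {"Hp": 16 / 22, "Hd": 0.0}

with_root = passes_step7_check(p_transfer_by_proposition, p_background=0.10)
without_root = passes_step7_check(p_transfer_by_proposition, p_background=0.0)
print(with_root, without_root)  # prints: True False
```

Without the background route, a positive finding would be impossible under Hd, and instantiating it would push all probability onto Hp; this is the situation the check is designed to detect.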
Fig. 5. BN for worked example completed, with Hp (upper) and then Hd (lower) instantiated in proposition node.
Note that the values shown for the probabilities of each state in Fig. 5 depend on the probabilities entered into conditional probability tables. We have not expanded on how we chose the delineations for each node or the data we have relied on to populate probabilities in the main text of the paper, as this work is about the qualitative BN architecture only. However, for the interested reader, the worked example we have presented here is provided as Supplementary material, which goes into how each probability was assigned.
4. Discussion
Having constructed the BN and filled the conditional probability tables for each node, the value of the evidence can now be evaluated using the BN. There are two ways in which this can be done that yield the same numerical value, but technically arise from considering different terms within Bayesian inference. We start with the well-known formula:
\[ \frac{\Pr(H_p \mid E, I)}{\Pr(H_d \mid E, I)} = \frac{\Pr(H_p \mid I)}{\Pr(H_d \mid I)} \times \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)} \]
where E denotes the findings (evidence) and I represents case information, which has been included to underline its importance for activity level propositions. Consider that the propositions in the above formula relate to the propositional node of the BN that has been constructed. When assigning probabilities to the propositional node it is common for equal values to be used (0.5 for both Hp and Hd in this instance). When this is the case we end up with:
\[ \frac{\Pr(H_p \mid E, I)}{\Pr(H_d \mid E, I)} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)} \]
as the ratio of the prior probabilities is one. Note that this is only in the context of the BN, as it is being used by the scientist and does not relate to the probabilities assigned to the propositions by the Court or jury. As scientists, we seek the likelihood ratio, which is the right-hand term in the equation above. Given the above relationship between posterior odds and likelihood ratio, there are two ways of obtaining the numerical value of the likelihood ratio. One is to calculate the ratio of the probabilities associated with the findings in the case (in a results node) when the proposition node is instantiated first in the Hp state and then in the Hd state. Doing this provides the probabilities of the evidence given each proposition and thus one can calculate the likelihood ratio. When doing this for a BN that has multiple results nodes it may be required to add an additional node to the BN that combines the findings (as seen in Fig. 6a).
Fig. 6. Two methods available for evaluating the strength of the findings, using the BN from Fig. 5: either by calculating the LR by way of a results node (A, left, showing the lower results nodes of the BN and the new case specific results node) or by calculating the posterior odds by way of a function node (B, right, showing the proposition node of the BN and the function node). The value obtained in the first case is 6.666/35.019 (about 0.19) and in the second case 15.991/84.009 (about 0.19). Therefore, the results are about 5 times more probable given the defence proposition than given the prosecution proposition.
Alternatively, the states of the results nodes that align with the case findings can be instantiated and the ratio taken of the two probabilities for the Hp and Hd state in the proposition node. Doing this provides the probabilities of the propositions given the evidence and so is calculating the posterior odds. But as per the equation above, as long as equal prior probabilities have been assigned to the two propositional states then this is also the value of the likelihood ratio. Given the mathematical equality, either method of calculating the LR is acceptable as long as the scientist realises what is being calculated when they instantiate. In this second BN, it can be useful to add a function node to the BN that automatically calculates the ratio of the propositional probabilities (as shown in Fig. 6b) [
We provide the Hugin file for the BN in Fig. 5 as Supplementary material, with both a case specific findings node and a value of evidence function node so that the interested reader can see the set-up and workings of both.
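The equivalence of the two routes can be checked numerically outside of any BN software. The sketch below takes the two values reported in the caption of Fig. 6 for the case specific results node, reads them as percentages (an assumption about how the software displays them), and recovers both the LR and the posterior odds under equal priors. The function names are illustrative only.

```python
def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """Route 1: ratio of the probabilities of the findings given each proposition."""
    return p_e_given_hp / p_e_given_hd


def posterior(p_e_given_hp, p_e_given_hd, prior_hp=0.5):
    """Route 2: posterior probabilities of Hp and Hd after instantiating the findings."""
    joint_hp = p_e_given_hp * prior_hp
    joint_hd = p_e_given_hd * (1.0 - prior_hp)
    norm = joint_hp + joint_hd
    return joint_hp / norm, joint_hd / norm


# Values from the Fig. 6 caption, read as percentages (assumption).
p_e_hp, p_e_hd = 0.06666, 0.35019

lr = likelihood_ratio(p_e_hp, p_e_hd)
post_hp, post_hd = posterior(p_e_hp, p_e_hd)          # equal priors
odds_equal_priors = post_hp / post_hd

# With unequal priors the posterior odds no longer equal the LR.
post_hp_u, post_hd_u = posterior(p_e_hp, p_e_hd, prior_hp=0.2)
odds_unequal_priors = post_hp_u / post_hd_u

print(f"LR = {lr:.4f}")
print(f"posterior Hp = {100 * post_hp:.3f}%, Hd = {100 * post_hd:.3f}%")
print(f"posterior odds (equal priors) = {odds_equal_priors:.4f}")
print(f"posterior odds (prior Hp = 0.2) = {odds_unequal_priors:.4f}")
```

With equal priors the posterior percentages reproduce the 15.991 and 84.009 quoted in the caption, and the posterior odds coincide with the LR; the last line shows that this coincidence disappears as soon as the priors in the proposition node are unequal.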
4.1 Considering an alternate offender
It was mentioned in step 1 that the claims of the defence may be that no offence has occurred (as in the example we have worked through in the paper), or that an offence has occurred, but was carried out by someone other than the POI. A recent work by Kokshoorn et al. [
] explored the effect that these different assumptions have on BN construction. In our worked example, imagine that the propositions had been:
•
D has bitten C on the vagina, over her underwear.
•
C has been staying at D’s home, where someone else bit her on the vagina, over her underwear.
The alternate proposition now requires that we consider the presence (or absence) of biological material from an alternate offender (AO). This requires an additional activity node ‘AO bit C’.
It may require more than one additional activity node, depending on who the alternate offender is claimed to be, and on whether the complainant has had recent contact with them. For our example, we will assume the AO is an unknown person who has had no prior contact with the complainant.
Fig. 7 shows the BN construction for the new set of propositions.
Fig. 7. BN constructed to consider an alternate offender.
Note that in the BN given in Fig. 4 there was no need to consider background DNA on the underwear as its presence or absence had no effect on the probability of the findings under either proposition. However, when an unprofiled alternate offender is considered, then the presence of background DNA must be included in the evaluation because the presence of DNA from an unknown source could be from background DNA, or from an alternate, unprofiled, offender. We also consider the chance match of the YSTR profile of the AO with the Family YSTR profile in the ‘AO YSTR matches Family YSTR’ node.
4.2 A question of generality and detail
When constructing BNs for case evaluation, a question of generality arises. The general structure of a BN, and the considerations of the findings that are expected given each of the propositions is the fundamental basis for Case Assessment and Interpretation (CAI) developed in the 1990s [
]. While evaluations should not be findings led, there needs to be a level of adaptability of the BN to changes in examination, testing and findings obtained as part of the investigative phase of the case. To deny this adaptability would be to decrease the ability to provide case-tailored advice. The authors' experience is that each case tends to require some level of specific consideration with BN construction, in particular when propositions relate to competing activities. However, carrying out BN construction for a number of cases reveals that there are some generic constructional steps (see Section 2.1) that tend to be applicable across a broad range of situations. There is no cut-off point between what makes a general model and what makes a case-specific model; it depends on the desired level of generality, as well as the issues in the case. Taken to the extreme, all cases could be generalised to a BN that possesses only two nodes, one for propositions and one for findings, with an arc drawn between them, although this is likely to be of little use in evaluating findings for most real cases.
4.3 Assumptions in BN modelling and underlying data
Besides questions of generality, practitioners also face questions regarding the assumptions encoded in particular network structures, such as “How can I ‘know’ whether this network structure is sound?”, or “How can I justify this particular network construction?”. Such questions may arise in connection with local elements of a network, or a network as a whole. Traditionally, network construction – i.e., the specification of relevance relationships – can be guided by likelihood ratio formulae for the evaluation of scientific evidence as provided in existing literature. For general examples of this approach see [
]. Pioneering examples illustrating methodology for more complex BN structures that agree with standard calculations in various kinship analysis cases are given in [
]. However, for many current casework evaluations given activity level propositions, such standard formulae and associated BN structures do not exist, making the construction of case-tailored BNs challenging. More generally, Dawid et al. [
]). While this does not imply that, for any given case, there exists a unique network, it would not be sensible to require all analysts to come up with the same BN. What can be required, however, is that all analysts maintain justified network constructions. This requires a detailed inspection of all structural elements of a BN. Operationally, this kind of network inspection can be supported by built-in functions of BN software, such as Hugin, that provide graphical illustrations of dependencies and independencies among variables of a BN (i.e., so-called d-separation properties).
The justification of BN structures also extends to the numerical specification of node probability tables. Although, as noted in Section 2, this paper concentrates mainly on qualitative network structures, it is relevant to emphasize that node probability specification is a crucial element in BN construction. Again, as noted also in [
], the notion of justification is important in this context: as there are no ‘true’ or ‘false’ probabilities, it becomes relevant to inquire about how particular probability values are assigned, and on what bases, so that a competent review can be conducted and open discussions about probability assignment are encouraged. This is in line with what the ENFSI guideline refers to as the “(…) body of knowledge that should be available for auditing and disclosure” ([
]). In addition to scrutinizing probability assignment, analysts should also examine BNs to detect those node probabilities that most critically impact on probative value. This can be achieved, for example, through sensitivity analyses (e.g., [Using sensitivity analyses in Bayesian Networks to highlight the impact of data paucity and direct future analyses: a contribution to the debate on measuring and reporting the precision of likelihood ratios]).
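A one-way sensitivity analysis of the kind referred to here can be sketched as follows. The toy model, in which a positive finding arises either from the alleged activity or from a background route, is an assumption made for illustration, as are the function names and all numerical values; it is not the network of the worked example.

```python
def lr_positive_finding(p_transfer, p_background):
    """LR for a positive finding in a toy model: under Hp the finding can
    arise from the activity or from background, under Hd from background only."""
    p_pos_hp = 1.0 - (1.0 - p_transfer) * (1.0 - p_background)
    p_pos_hd = p_background
    return p_pos_hp / p_pos_hd


def sweep_background(background_values, p_transfer=0.73):
    """Vary one node probability while holding the others fixed."""
    return [(b, lr_positive_finding(p_transfer, b)) for b in background_values]


# Hypothetical range for the background probability.
results = sweep_background([0.01, 0.05, 0.10, 0.30, 0.60])

for p_background, lr in results:
    print(f"P(background) = {p_background:.2f} -> LR = {lr:.2f}")
```

In this toy model the LR falls steeply as the background probability rises, which would flag that probability as one whose assignment and supporting data deserve particular scrutiny.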
There is also the need to recognise the limitations of the data being used to assign probabilities, the conditions under which the data were produced and differences that may exist between the experimentally derived data and the findings being evaluated. For a discussion on this topic we refer the reader to [
We have presented here a methodology for constructing BNs for the biology component of forensic evaluations when activity level propositions are being considered by the Court. The method is flexible enough that many cases with different circumstances can be evaluated in this way, but standard enough that an analyst looking at the BN (or starting to think about how to construct a BN for a case) should be able to progress quite quickly through the architecture and understand the flow of information. The seven proposed steps are relatively simple, although there can be much work that underlies each step, and we have demonstrated their use in a worked example that is based on a real case example. More generally, techniques for sound BN structure building extend to considerations of combining evidence, a topic that we consider inevitable for any attempt to derive more fine-grained analyses [
We emphasize that our template approach does not prescribe any pre-built network structures; it only recommends steps for structuring the thinking process. Indeed, we insist on the importance of the expert being in full control of the construction and use of the BN at any point, as BNs should be constructed on a case-by-case basis to ensure coherence and a faithful capturing of the case features. BNs are expert support systems, not intended to replace experts but to assist them in their critical thinking.
We deliberately chose the worked example as it highlights the importance of evaluating cases considering activity level propositions, rather than just source level propositions. In this case a typical ‘sub-source level’ report would state that an RSID test for human saliva carried out on the outer crotch of the underwear gave a positive result and that a tapelift of the outer underwear yielded an autosomal profile that matched the complainant and a Y-STR profile that matched the defendant (with a profile probability of, for example, 0.001). Even if the report went on to detail the possible causes of false positives (or coincidental matches) for the findings, if it did not place those in the context of activities, one can see how the results would appear to a lay jury to strongly support the prosecution version of events over the defence version.
In this case, however, the presence of faecal staining on the underwear means that the RSID saliva positive finding lends very little support to either one of the propositions over the other. Additionally, the activity of biting would lead to an expected high level of DNA transfer, whereas only a low level of matching DNA was detected, and this could be from the father of the complainant. When the RSID positive result, the presence of faecal material and the low level of the family YSTR are all instantiated (see Supplementary material for details on the findings and their interpretation), the posterior probabilities in the propositional node slightly support the defence proposition over the prosecution proposition. This result is contrary to how a lay jury is likely to have interpreted the results without any guidance from the scientist. Such a difference can have a major impact on the outcome of a trial.
Acknowledgements
Points of view in this document are those of the authors and do not necessarily represent the official position or policies of their organisations. Alex Biedermann gratefully acknowledges the support of the Swiss National Science Foundation through grant No. BSSGI0_155809 and the University of Lausanne.
Appendix A. Supplementary data
The following are Supplementary data to this article:
The Logic of Forensic Proof: Inferential Reasoning in Criminal Evidence and Forensic Science: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses.
Probabilistic evidential assessment of gunshot residue particle evidence (Part I): Likelihood ratio calculation and case pre-assessment using Bayesian networks.
Using sensitivity analyses in Bayesian Networks to highlight the impact of data paucity and direct future analyses: a contribution to the debate on measuring and reporting the precision of likelihood ratios.
Implementing statistical learning methods through Bayesian networks (Part 2): Bayesian evaluations for results of black toner analyses in forensic document examination.