Forensic Science International: Genetics
Volume 5, Issue 4 , Pages 308-315, August 2011

Validation of DNA-based identification software by computation of pedigree likelihood ratios

Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands

Received 17 February 2010; received in revised form 7 May 2010; accepted 21 June 2010; published online 23 August 2010.


Abstract 

Disaster victim identification (DVI) can be aided by DNA-evidence, by comparing the DNA-profiles of unidentified individuals with those of surviving relatives. The DNA-evidence is used optimally when such a comparison is done by calculating the appropriate likelihood ratios. Though conceptually simple, the calculations can be quite involved, especially with large pedigrees or precise mutation models. In this article we describe a series of test cases designed to check whether software that calculates such likelihood ratios computes them correctly. The cases include both simple and more complicated pedigrees, including inbred ones. We show how to calculate the likelihood ratio numerically and algebraically, including a general mutation model and the possibility of allelic dropout. In Appendix A we show how to derive such algebraic expressions mathematically.

We have set up these cases to validate new software, called Bonaparte, which performs pedigree likelihood ratio calculations in a DVI context. Bonaparte has been developed by SNN Nijmegen (The Netherlands) for the Netherlands Forensic Institute (NFI). It is available free of charge for non-commercial purposes (see www.dnadvi.nl for details). Commercial licenses can also be obtained. The software uses Bayesian networks and the junction tree algorithm to perform its calculations.

Keywords: Disaster victim identification, Likelihood ratio, Kinship, Bayesian networks, Software validation

 


1. Introduction 

In disaster situations, DNA is a useful tool in the identification of the victims. After such a disaster has occurred, surviving relatives of those presumed to have perished may be typed, thus giving rise to pedigrees which each contain one or more missing persons. Given the DNA-profile of an unidentified individual (UI), a pedigree $\mathcal{P}$ and a missing person MP in $\mathcal{P}$, one wishes to calculate the probability that UI=MP. With only DNA-profiles at our disposal and no further information, this is of course not possible; what we can do, however, is compute the likelihood ratio

$$LR = \frac{P(E\mid H_r)}{P(E\mid H_u)} \qquad (1.1)$$
where


P(X|Y) denotes the conditional probability of the realization of X, given the realization of Y;

Hr (r for related) states that missing person MP is equal to unidentified individual UI;

Hu (u for unrelated) states that missing person MP is not related to unidentified individual UI;

E consists of the DNA-profile of UI and the DNA-profiles of the typed members of a pedigree $\mathcal{P}$ of which MP is a member, with at least one DNA-profile in $\mathcal{P}$.

The LR is an interesting quantity since, given a priori odds P(Hr)/P(Hu), it allows one to obtain the a posteriori odds P(Hr|E)/P(Hu|E): these are equal to the LR times the a priori odds. In closed disaster victim identification (DVI) situations (where the list of deceased individuals is known), it is often possible to meaningfully define prior odds. In addition, in many cases Hr and Hu are the only scenarios with non-zero prior probability. This means that the DNA-evidence can be used to obtain the probability, given the DNA-profiles that are available, that UI=MP.

Usually, the DNA-profiles are those used in forensics, consisting of the genotypes at a number of (nearly) independent loci. The likelihood ratio (1.1) is therefore computed as the product of the single-locus LRs.
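The sketch below illustrates the two relations just described, assuming the loci are independent. It multiplies single-locus LRs by summing their logarithms. It then combines the result with prior odds, as in the previous paragraph, for the case where Hr and Hu are the only scenarios with non-zero prior probability. The function names and the example numbers (three loci, prior odds 1/99) are hypothetical and are not taken from Bonaparte.

```python
import math


def combined_lr(single_locus_lrs):
    """Multi-locus LR as the product of single-locus LRs (loci assumed independent)."""
    if any(lr == 0 for lr in single_locus_lrs):
        return 0.0  # an exclusion at one locus excludes overall
    # Summing log10 values avoids overflow/underflow when many loci are combined.
    return 10 ** sum(math.log10(lr) for lr in single_locus_lrs)


def posterior_probability(lr, prior_odds):
    """P(Hr | E), assuming Hr and Hu are the only scenarios with non-zero prior."""
    posterior_odds = lr * prior_odds  # posterior odds = LR times prior odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: three loci and prior odds 1/99.
lr_total = combined_lr([2.5, 4.0, 10.0])       # about 100
p_hr = posterior_probability(lr_total, 1 / 99)  # about 100/199
```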

In a DVI situation one wishes to compute the LRs (1.1) for every combination of UI and MP. For a closed case with n victims, this results in n² calculations. Hence, one needs an efficient algorithm in order to get the results in reasonable time. At present, there seems to be no software that can do so in an automated way and also allows for arbitrary pedigrees, including inbred ones. To resolve this, the Netherlands Forensic Institute (NFI) has commissioned the development of new software. The resulting program, called Bonaparte, has been developed by the Dutch Foundation of Neural Networks in Nijmegen (SNN Nijmegen). Bonaparte generates Bayesian networks from the pedigrees and then uses the junction tree algorithm (cf. [1]) to perform calculations in them. Its model is the Mendelian inheritance model with the possibility of mutation, where in addition allele frequencies are derived from reference databases in a way the user can determine. A live demonstration version can be accessed via http://www.dnadvi.nl. A more detailed presentation of the program is given in [2].

To our knowledge, there does not exist a generally accepted validation standard to verify whether the computation of (1.1) has been done correctly. Simple pedigrees are easy to test; in addition, one may check whether likelihood ratios are identical for certain relationships (such as half-sibling and uncle–nephew without a mutation model, or more generally the pedigrees described in [3]). For those latter cases, however, one checks equality (and only in the absence of mutation) rather than correctness.

Recently, Drábek (cf. [4]) has published an overview of software that computes (1.1) and has assessed performance, user-friendliness and documentation for two of these programs (Familias and Paternity Index). The validation of (1.1) was done by comparing the software’s output to that of DNA-View, a broadly used commercial program with many functionalities.

In our opinion, this is a somewhat unsatisfactory situation, since if a discrepancy is observed, it may be unclear which output is correct, and moreover only features incorporated into DNA-View can be tested. Therefore, we have defined a set of test cases for which we have calculated the LR numerically, and in many cases, also algebraically. There are several advantages of having the algebraic expressions: it allows for the verification of LR formulas when returned by the software to be tested, their evaluation is computationally much less demanding than performing large summations, and the algebraic expression allows one to see which allele frequencies play which role in the likelihood ratio.

The purposes of this paper (in no particular order) are (i) to present the software Bonaparte and to describe its model, (ii) to contribute to the discussion about the validation of such software, and (iii) to show how likelihood ratios for some more complicated pedigrees can be calculated by hand.

The outline is as follows. In Section 2, we give an overview of Bonaparte’s model and user-defined settings. Section 3 introduces notation that is used in Section 4, where we describe the test cases, indicating which feature each case primarily tests and how we have computed the algebraic expression for the LR. In Section 5 we describe Bonaparte’s performance on these cases. In Appendix A we show how the algebraic expressions for the LR have been obtained.


2. Bonaparte 

First, we describe some features that are specific to Bonaparte’s current model. It consists of the standard Mendelian inheritance model, enriched with the possibility of mutation and of allelic dropout in the profile. Subsequent versions are planned in which this model will be refined and extended. For example, the version currently under development allows for the association of Y-STR and mitochondrial DNA-profiles to individuals, and contains matching algorithms for these DNA-technologies. This forthcoming version will also allow direct matching of unidentified individuals against each other, to detect possible family relationships between them. Different mutation models and a correction for subpopulations (θ-correction) are being considered for future versions. This article focuses on the validation of the currently implemented model, which deals with (1.1) for autosomal DNA.

2.1. Pedigrees, founders 

A graphical interface allows the user to define pedigrees and associate individuals to them. The pedigrees may be inbred, but must be connected. We call a node a founder if it has no parental nodes in the pedigree. The alleles of the founders are called founder alleles.

2.2. Allele frequencies 

Likelihood ratios are functions of the population frequencies of the alleles exhibited by the typed individuals. These allele frequencies are derived from a population sample S. Bonaparte has, for each locus L in S, a set $A_L$ of alleles known to exist at the locus for the relevant population. Based on $A_L$ and S, there are three types of alleles for locus L:

(i) common alleles: the alleles in S (these alleles necessarily occur in $A_L$);

(ii) rare alleles: alleles not in S, but in $A_L$;

(iii) new alleles: alleles not in $A_L$ (hence not in S either).

When a new allele is encountered in a pedigree or UI, it will be registered in the allelic ladder for all computations involving that pedigree or UI.

When the computation of LRs is requested, the user must specify three positive numbers $\lambda_c$, $\lambda_r$, $\lambda_n$. Suppose that $A_L$ contains $k_c$ common allele types and $k_r$ rare ones. Furthermore, suppose that the pedigree $\mathcal{P}$ and UI have $k_n$ new allele types. For an allele a, let $\lambda_a$ be equal to $\lambda_c$, $\lambda_r$ or $\lambda_n$ according to whether a is common, rare or new.

It is assumed in the calculation of (1.1) that a randomly drawn founder allele is allele a with probability

$$p_a = \frac{n_a + \lambda_a}{N + \Lambda}, \qquad (2.1)$$

where N is the number of alleles counted in the database of the locus under consideration, $n_a$ is the number of alleles of type a in the database (so $n_a = 0$ for rare and new alleles), and $\Lambda = k_c\lambda_c + k_r\lambda_r + k_n\lambda_n$ is the sum of all the $\lambda_a$.

The expression (2.1) can be interpreted as the expectation of the posterior marginal distribution of the frequency of allele a. Here the posterior is with respect to the database, and the prior distribution of the vector of allele frequencies is a Dirichlet distribution with parameters $\lambda_a$. This is as far as the interpretation goes, however, since the probability of drawing multiple alleles is simply taken to be the product of the appropriate probabilities (2.1). The idea of using a Dirichlet distribution to take sampling variation into account is well known and has many incarnations, e.g. as the size-bias correction (cf. [5]), or to deal with population substructure (cf. [6]). Triggs and Curran have investigated the effect of the choice of prior on match probability calculations [7]. When data from several population groups are available, the Dirichlet parameters can be estimated empirically using Newton’s method, as described e.g. in [8, §3.7].
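The estimate (2.1) is easy to reproduce for checking purposes. The sketch below makes the following assumptions:

- the counts $n_a$ come from the population sample S;
- every allele at the locus (common, rare or new) is listed with its type.

The allele names, counts and λ values are hypothetical illustration values, not Bonaparte defaults.

```python
def founder_allele_probs(counts, allele_type, lambdas):
    """Founder allele probabilities p_a = (n_a + lambda_a) / (N + Lambda), as in (2.1).

    counts      -- database counts n_a (alleles missing from the dict count as 0)
    allele_type -- maps every allele at the locus to 'common', 'rare' or 'new'
    lambdas     -- maps each type to its user-chosen positive value
    """
    n_total = sum(counts.get(a, 0) for a in allele_type)          # N
    big_lambda = sum(lambdas[t] for t in allele_type.values())    # Lambda
    return {
        a: (counts.get(a, 0) + lambdas[t]) / (n_total + big_lambda)
        for a, t in allele_type.items()
    }


# Hypothetical locus: three common alleles, one rare allele and one new allele.
counts = {"12": 30, "13": 50, "14": 20}
types = {"12": "common", "13": "common", "14": "common", "15": "rare", "16.1": "new"}
lambdas = {"common": 1.0, "rare": 0.5, "new": 0.5}
p = founder_allele_probs(counts, types, lambdas)   # p["12"] = 31/104, p["15"] = 0.5/104
```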

2.3. Mutation model 

When a parent passes an allele to an offspring, there is a possibility that it mutates. Let

$$M_{a,b} = P(\text{the child receives allele } b \mid \text{the parental allele is } a)$$

denote the probability that parental allele a is transmitted to a child as allele b. Let M be the matrix with entries $M_{a,b}$.

Bonaparte uses a gender-independent uniform mutation model. According to this model,

$$M_{a,b} = \begin{cases} 1-\mu & \text{if } a=b,\\ \dfrac{\mu}{k-1} & \text{if } a\neq b,\end{cases} \qquad (2.2)$$

where $k=k_c+k_r+k_n$ is the total number of allele types present in the database, the pedigree and UI. The user may choose between different pre-set values of μ. The chosen value of μ is used for all loci and genders, except in the special case when μ=0 and the pedigree itself contains a mutation (see Example 4.2).

This uniform mutation model, although not a realistic one, has the advantage that it does not seriously underestimate the probability of any specific mutation, which would hinder identification in case such a mutation has occurred. It is also computationally attractive.
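The sketch below builds the uniform mutation matrix (2.2) as a list of rows, assuming alleles are indexed 0..k-1. The value μ=0.003 is an arbitrary illustration, not one of Bonaparte's pre-set values. Each row is a probability distribution over the transmitted allele, which gives a cheap sanity check for any mutation matrix.

```python
def uniform_mutation_matrix(k, mu):
    """Uniform model (2.2): M[a][b] = 1 - mu if a == b, else mu / (k - 1)."""
    if k < 2:
        raise ValueError("the uniform model needs at least two allele types")
    off_diagonal = mu / (k - 1)
    return [[1.0 - mu if a == b else off_diagonal for b in range(k)] for a in range(k)]


M = uniform_mutation_matrix(k=4, mu=0.003)   # hypothetical k and mu
# Every row should sum to 1.
row_sums = [sum(row) for row in M]
```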

2.4. F-allele 

Suppose that it is known that only one allele on a locus has been typed, and that nothing is known concerning the second allele. The missing allele is then considered to be a random allele, and denoted by F. Hence, if the typed allele is x, the genotype is denoted (x, F).


3. Algebraic preliminaries 

For many test cases, we will derive the algebraic form of the LR. In order to do so, we establish some notation in this section. We assume that we are working on one locus, which has $k=k_c+k_r+k_n$ different allele types. Further, we have chosen one UI and one MP, who belongs to pedigree $\mathcal{P}$.

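Let $G_{UI}$ be the genotype of the unidentified individual, and let $G_{\mathcal{P}}$ be the genotypes observed in pedigree $\mathcal{P}$. Then $E = (G_{UI}, G_{\mathcal{P}})$. The probability of $G_{\mathcal{P}}$ does not depend on the hypothesis, and under $H_u$ the genotype $G_{UI}$ is independent of $G_{\mathcal{P}}$. Hence (1.1) can be computed as

$$LR = \frac{P(G_{UI}\mid G_{\mathcal{P}}, H_r)}{P(G_{UI}\mid H_u)} = \frac{P(G_{MP}=G_{UI}\mid G_{\mathcal{P}})}{P(G_{UI}\mid H_u)}. \qquad (3.1)$$

The denominator of this last expression is straightforward; the difficulty lies in the numerator.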

For the computation of (3.1), we consider a genotype to be an ordered pair of alleles (g1, g2), i.e., (a, b) ≠ (b, a) unless a=b. This is more convenient in computations and does not alter the likelihood ratio (3.1), since the probability of obtaining (a, b) as the genotype of the MP is the same as that of obtaining (b, a). Hence, to evaluate (3.1) we note that

$$P(G_{UI}=(a,b)\mid H_u) = p_a p_b, \qquad (3.2)$$

since UI is considered as the founder of its own one-person pedigree under $H_u$.

3.1. Allele transmissions 

Let $p=(p_1, \dots, p_k)^t$ be the (column) vector of allele frequencies. Let

$$q = M^t p, \qquad \text{i.e.,} \qquad q_a = \sum_{i=1}^{k} p_i M_{i,a},$$

where $M^t$ denotes the transpose of M. The number $q_a$ is the probability that an unobserved founder allele is passed on to an offspring as allele a. Also, we define Q to be the matrix with entries

$$Q_{a,b} = \sum_{i=1}^{k} p_i M_{i,a} M_{i,b}, \qquad (3.3)$$

i.e., $Q = M^t\,\mathrm{diag}(p)\,M$. The number $Q_{a,b}$ is the probability that an unobserved founder allele is passed on as allele a to one offspring and as allele b to another. Notice that if mutation is impossible then $Q_{a,b}=\delta_{a,b}p_a$ (where δ is the Kronecker delta: $\delta_{a,b}=1$ if a=b and $\delta_{a,b}=0$ if a≠b).
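The quantities q and Q can be computed directly from their definitions. This sketch uses plain lists rather than a matrix library, with hypothetical frequencies and μ. Two checks follow from the definitions:

- each row of Q sums to the corresponding $q_a$;
- with μ=0, Q reduces to $\delta_{a,b}p_a$.

```python
def uniform_mutation_matrix(k, mu):
    """Uniform model (2.2)."""
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def founder_transmission(p, M):
    """q_a = sum_i p_i M[i][a]: an unobserved founder allele is passed on as allele a."""
    k = len(p)
    return [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]


def founder_pair_transmission(p, M):
    """Q[a][b] = sum_i p_i M[i][a] M[i][b], formula (3.3): the same founder allele
    is passed on as a to one offspring and as b to another."""
    k = len(p)
    return [[sum(p[i] * M[i][a] * M[i][b] for i in range(k)) for b in range(k)]
            for a in range(k)]


p = [0.1, 0.2, 0.3, 0.4]            # hypothetical allele frequencies
M = uniform_mutation_matrix(4, 0.01)
q = founder_transmission(p, M)
Q = founder_pair_transmission(p, M)
```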

3.2. Inheritance functions 

Next, we define some functions that compute the probability of having an offspring of a given genotype, given the genotypes of one or of both parents. For the case with two parents, we define

$$I_2(a_1,a_2,b_1,b_2,c_1,c_2) = \frac18\sum_{x=1}^{2}\left(M_{a_1,c_x}+M_{a_2,c_x}\right)\left(M_{b_1,c_{\bar x}}+M_{b_2,c_{\bar x}}\right), \qquad (3.4)$$

where $\bar x$ denotes the complement of x in {1, 2}. Then $I_2(a_1, a_2, b_1, b_2, c_1, c_2)$ is the probability that parents with (ordered) genotypes $(a_1, a_2)$ and $(b_1, b_2)$ have a child with (ordered) genotype $(c_1, c_2)$.

For the case where one parent has been typed and the other is a founder, we define

$$I_1(a_1,a_2,c_1,c_2) = \frac14\sum_{x=1}^{2}\left(M_{a_1,c_x}+M_{a_2,c_x}\right) q_{c_{\bar x}}, \qquad (3.5)$$

which is the probability of observing (ordered) genotype $(c_1, c_2)$ in an individual with one typed and one untyped founder parent, if the genotype of the typed parent is $(a_1, a_2)$.

Notice that I1 and I2 generalize immediately to the situation where the mutation probabilities depend on the gender.
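One way to implement $I_2$ and $I_1$ assumes a symmetrized form: the child's ordered genotype $(c_1, c_2)$ is equally likely to have its first allele from either parent. As a result, $I_2$ carries a factor 1/8 and $I_1$ a factor 1/4. The sketch reproduces Example 4.1 below (LR = 1/p_a when μ=0) and checks that both functions are normalized. Function names and allele indices are our own choices.

```python
def uniform_mutation_matrix(k, mu):
    """Uniform model (2.2)."""
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def founder_transmission(p, M):
    """q = M^t p."""
    k = len(p)
    return [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    """(3.4): P(child has ordered genotype (c1, c2) | parents (a1, a2) and (b1, b2))."""
    c = (c1, c2)
    # x indexes the child allele from the first parent; the other comes from the second.
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def I1(M, q, a1, a2, c1, c2):
    """(3.5): as I2, but the second parent is an untyped founder passing on c with prob q[c]."""
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


# Example 4.1: mu = 0, MP = father, G_UI = G_C = G_M = (a, a) with a = allele 0.
p = [0.1, 0.2, 0.3, 0.4]   # hypothetical frequencies
M = uniform_mutation_matrix(4, 0.0)
q = founder_transmission(p, M)
lr = I2(M, 0, 0, 0, 0, 0, 0) / I1(M, q, 0, 0, 0, 0)   # 1 / p[0], i.e. 10 up to rounding
```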


4. Test cases 

In this section we describe the test cases that we have defined, and how the gold standard has been established. As mentioned above, we consider a one-locus LR and consider all genotypes to be ordered; this does not affect the LR. In the examples that illustrate the computations we will use the uniform mutation model with mutation probability μ, but the expressions themselves are valid for an arbitrary mutation matrix M.

4.1. Standard trios 

In this case, we consider a pedigree consisting of father F, mother M and child C. Either a parent is missing and the child (and possibly the other parent) has been typed, or the child is missing and one or both parents have been typed. The LR for such cases can be computed using (3.2), (3.3), (3.4), (3.5).

4.1.1. Purpose 

The main purpose of these cases is to verify that paternity indices are computed correctly, including the motherless case.

Example 4.1.

Let μ=0 (no mutation), MP=F, and suppose that C and M are typed. We obtain the classical paternity case. If GUI=GC=GM=(aa), then LR=1/pa. This allows one to check if (2.1) has been calculated correctly.

Example 4.2.

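Let μ=0 (no mutation), MP=F, $G_M=(a,a)$, $G_C=(b,b)$, $G_{UI}=(b,c)$. Then

$$LR = \frac{I_2(b,c,a,a,b,b)}{I_1(a,a,b,b)} = \frac{0}{0},$$

since, there being a mutation from M to C, the pedigree is not consistent with the impossibility of mutation. On the other hand, let μ>0, and let LR(μ) denote the LR for the uniform model (2.2) with mutation parameter μ. Then

$$LR(\mu) = \frac{1-\mu+\mu/(k-1)}{2\,q_b}, \qquad q_b = (1-\mu)p_b + \frac{\mu(1-p_b)}{k-1},$$

which has a well-defined limit as μ→0:

$$\lim_{\mu\to 0} LR(\mu) = \frac{1}{2p_b}.$$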
For this reason, Bonaparte first does a pedigree check. If mutations are revealed in the pedigree itself, then μ=10⁻⁹ is used for those loci, even if the user has set μ=0. In this way the outcome 0/0 is avoided. A consequence, however, is that other mutations on these loci, needed when UI=MP, are also treated with μ=10⁻⁹.
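The limit in Example 4.2 can be checked numerically. The sketch uses the same conventions for (3.4) and (3.5) as above and the uniform model (2.2). It evaluates $LR(\mu) = I_2(b,c,a,a,b,b)/I_1(a,a,b,b)$ for shrinking μ. At μ=0 both terms vanish, and the function returns NaN to flag the 0/0 case. The allele frequencies are hypothetical.

```python
def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def founder_transmission(p, M):
    k = len(p)
    return [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def I1(M, q, a1, a2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


def example_4_2_lr(p, mu, a=0, b=1, c=2):
    """LR(mu) for MP = F with G_M = (a, a), G_C = (b, b), G_UI = (b, c)."""
    M = uniform_mutation_matrix(len(p), mu)
    q = founder_transmission(p, M)
    numerator = I2(M, b, c, a, a, b, b)   # UI in the father's position
    denominator = I1(M, q, a, a, b, b)    # father an untyped founder
    if denominator == 0.0:
        return float("nan")               # the 0/0 situation at mu = 0
    return numerator / denominator


p = [0.2, 0.3, 0.5]   # hypothetical frequencies
for mu in (1e-3, 1e-6, 1e-9):
    print(mu, example_4_2_lr(p, mu))   # tends to 1 / (2 * 0.3) as mu shrinks
```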

4.1.2. F-alleles 

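We have also defined input for these standard trios where profiles contained F-alleles (see Section 2.4); notice that the allele is denoted F, as is the father. In such cases, we can still use (3.4), (3.5) provided that we set

$$M_{x,F} = 1 \ \text{for every allele } x \text{ (including } F\text{)}, \qquad M_{F,x} = q_x, \qquad (4.1)$$

where in the second expression it is assumed that the F-allele is a founder allele; in particular $q_F = \sum_i p_i M_{i,F} = 1$. For example, suppose father F is missing, $G_C=(b,F)$, $G_M=(a,F)$, $G_{UI}=(b,F)$. Then with μ=0, $I_2(b,F,a,F,b,F)=(1+2p_b)/4$ and $I_1(a,F,b,F)=3p_b/4$, hence $LR=(1+2p_b)/(3p_b)$.

In general, for the uniform mutation model, (2.2), (3.4), (3.5), (4.1) yield

$$LR = \frac{I_2(b,F,a,F,b,F)}{I_1(a,F,b,F)} = \frac{1-\mu+\frac{\mu}{k-1}+2q_b}{\frac{\mu}{k-1}+3q_b}, \qquad q_b = (1-\mu)p_b + \frac{\mu(1-p_b)}{k-1}.$$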
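F-alleles can be handled by appending one extra allele index to the mutation matrix. The sketch assumes the substitution M[x][F] = 1 for every allele x and M[F][x] = q_x, with the F-allele treated as a founder allele (so q_F = 1). This is our reading of (4.1), not Bonaparte's code. It reproduces LR = (1+2p_b)/(3p_b) for μ=0 in the missing-father example above, and a closed form in q_b for μ>0.

```python
def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def founder_transmission(p, M):
    k = len(p)
    return [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def I1(M, q, a1, a2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


def extend_with_f_allele(M, q):
    """Append index F = k: M[x][F] = 1 for all x (F included), M[F][x] = q[x], q_F = 1."""
    M_ext = [row[:] + [1.0] for row in M]   # any allele is compatible with F
    M_ext.append(list(q) + [1.0])            # F assumed to be a founder allele
    q_ext = list(q) + [1.0]                  # q_F = sum_i p_i M[i][F] = 1
    return M_ext, q_ext


def lr_father_missing_with_f(p, mu, a, b):
    """Father missing, G_C = (b, F), G_M = (a, F), G_UI = (b, F)."""
    M = uniform_mutation_matrix(len(p), mu)
    M_ext, q_ext = extend_with_f_allele(M, founder_transmission(p, M))
    f = len(p)   # index of the F-allele
    return I2(M_ext, b, f, a, f, b, f) / I1(M_ext, q_ext, a, f, b, f)


p = [0.1, 0.25, 0.65]   # hypothetical frequencies
lr_no_mutation = lr_father_missing_with_f(p, 0.0, a=0, b=1)   # (1 + 2 p_b) / (3 p_b) = 2
```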

4.2. Incestuous case 

In this case we consider the pedigree described in Table 1.

Table 1. Pedigree for test-case incest.
Individual   Father   Mother   Typed
F            -        -        No
M            -        -        No
D            F        M        Yes
MP           F        D        No

That is, a child MP is missing. Its father F is also its mother D’s father, and the mother D is the only typed relative.

4.2.1. Calculation 

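Under $H_r$, writing $G_D=(m_1,m_2)$ and $G_{MP}=(u_1,u_2)$,

$$P(G_{MP}=(u_1,u_2)\mid G_D=(m_1,m_2)) = \frac12 I_1(m_1,m_2,u_1,u_2) + \frac1{16}\sum_{x,y=1}^{2}\frac{Q_{m_x,u_y}\,M_{m_x,u_{\bar y}}}{q_{m_x}} + \frac1{16}\sum_{x,y=1}^{2}\frac{Q_{m_x,u_y}\,M_{m_{\bar x},u_{\bar y}}}{q_{m_x}}. \qquad (4.2)$$

This formula, evaluated at $(u_1,u_2)=G_{UI}$, together with (3.1) and (3.2), yields the LR. Its proof is deferred to Appendix A.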

4.2.2. Purpose 

As in the previous case, the only typed relative of the MP is a parent. The main purpose of this case is to check if it is possible to define an incestuous pedigree, and if the system is able to take this into account properly, including mutation.
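Because the incestuous pedigree is small, a closed form for $P(G_{MP}\mid G_D)$ can be checked against a brute-force sum over the ordered genotypes of the untyped founders F and M, using $I_2$ for both D and MP. The closed form in this sketch is derived from the ibd argument of Appendix A.1:

- one ibd pair, with probability 1/2;
- two ibd pairs, cases (i) and (ii), with probability 1/4 each.

It is an illustration under the ordered-genotype conventions of Section 3, with hypothetical frequencies and μ.

```python
import itertools


def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def I1(M, q, a1, a2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


def incest_closed_form(p, M, m, u):
    """P(G_MP = u | G_D = m): one term for a single ibd pair, eight for two ibd pairs."""
    k = len(p)
    q = [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]

    def Q(a, b):
        return sum(p[i] * M[i][a] * M[i][b] for i in range(k))

    total = 0.5 * I1(M, q, m[0], m[1], u[0], u[1])   # MP received F's other allele
    for x in (0, 1):        # m[x] is the allele D received from F
        for y in (0, 1):    # u[y] is MP's allele from that same allele of F
            shared = Q(m[x], u[y]) / q[m[x]]
            total += shared * M[m[x]][u[1 - y]] / 16       # case (i): D passes on m[x]
            total += shared * M[m[1 - x]][u[1 - y]] / 16   # case (ii): D passes on m[1-x]
    return total


def incest_brute_force(p, M, m, u):
    """Sum over the ordered genotypes of the untyped founders F and M (Table 1)."""
    k = len(p)
    joint = marginal = 0.0
    for f1, f2, g1, g2 in itertools.product(range(k), repeat=4):
        w = p[f1] * p[f2] * p[g1] * p[g2] * I2(M, f1, f2, g1, g2, m[0], m[1])  # D
        marginal += w                                          # P(G_D)
        joint += w * I2(M, f1, f2, m[0], m[1], u[0], u[1])     # MP is a child of F and D
    return joint / marginal


p = [0.1, 0.2, 0.3, 0.4]   # hypothetical frequencies
M = uniform_mutation_matrix(4, 0.05)
pairs = [(0, 1), (2, 2), (3, 1), (0, 0)]
```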

4.3. Three generations 

In this case, we consider a pedigree in which a child C and a parent F of the missing person MP have been typed. Suppose that F, UI, C have genotypes (ab), (cd), (ef) respectively.

4.3.1. Full (unpruned) pedigree 

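If we add the untyped parents of MP and C to the pedigree, then the LR equals

$$LR = \frac{I_1(a,b,c,d)\,I_1(c,d,e,f)}{p_c p_d \sum_{g,h=1}^{k} I_1(a,b,g,h)\,I_1(g,h,e,f)}. \qquad (4.3)$$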

4.3.2. Pruned pedigree 

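If we consider the pedigree to consist of F, MP, C only, then the alleles that MP and C did not inherit within the pedigree are founder alleles. Then the LR is equal to

$$LR = \frac{I_1^{p}(a,b,c,d)\,I_1^{p}(c,d,e,f)}{p_c p_d \sum_{g,h=1}^{k} I_1^{p}(a,b,g,h)\,I_1^{p}(g,h,e,f)}, \qquad (4.4)$$

where $I_1^{p}$ denotes (3.5) with q replaced by p.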
The proofs are deferred to Appendix A.

4.3.3. Purpose 

This pedigree can be described in two seemingly equivalent ways, by choosing to add or not to add the untyped parents of MP and C. Depending on whether or not this is done, one obtains (a numerical specialization of) (4.3) or (4.4). It is also conceivable for the software to automatically ‘prune’ the pedigree by recursively removing untyped founders with at most one child. One can then use these formulas to test if pruning has been done correctly. The modification of these formulas to the case where only one untyped parent is included in the pedigree is straightforward.
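The sketch below evaluates the three-generation LR for both the full and the pruned pedigree. The only difference between them is how the non-inherited alleles enter:

- in the full pedigree, the untyped parents pass on founder alleles with probabilities q;
- in the pruned pedigree, the non-inherited alleles are drawn directly with p.

With μ=0 we have q=p and both versions coincide. For μ>0 they generally differ slightly, which is one thing a pruning test can detect. Genotype indices and frequencies are hypothetical.

```python
def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I1(M, t, a1, a2, c1, c2):
    """(3.5) with a vector t for the untyped parent: q (full) or p (pruned)."""
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * t[c[1 - x]] for x in (0, 1)) / 4


def three_generation_lr(p, mu, g_f, g_ui, g_c, pruned):
    k = len(p)
    M = uniform_mutation_matrix(k, mu)
    q = [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]
    # Full pedigree (4.3): untyped parents are founders that transmit with q.
    # Pruned pedigree (4.4): non-inherited alleles are founder alleles, drawn with p.
    t = p if pruned else q
    numerator = I1(M, t, *g_f, *g_ui) * I1(M, t, *g_ui, *g_c)
    # P(G_C | G_F) under Hu: sum over the ordered genotypes (g, h) of the untyped MP.
    grandchild = sum(I1(M, t, *g_f, g, h) * I1(M, t, g, h, *g_c)
                     for g in range(k) for h in range(k))
    return numerator / (p[g_ui[0]] * p[g_ui[1]] * grandchild)


p = [0.1, 0.2, 0.3, 0.4]                 # hypothetical frequencies
genotypes = ((0, 1), (1, 2), (2, 3))     # hypothetical G_F, G_UI, G_C
lr_full = three_generation_lr(p, 0.01, *genotypes, pruned=False)
lr_pruned = three_generation_lr(p, 0.01, *genotypes, pruned=True)
```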

4.4. Missing third sibling 

Let S1, S2 be siblings, and let MP be a sibling of S1, S2. The genotypes of S1, S2, UI are, respectively, (ab), (cd), (ef).

4.4.1. Gold standard 

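The most straightforward way to compute the LR is by evaluating

$$P(E\mid H_r) = \sum_{i_1,i_2,j_1,j_2} p_{i_1}p_{i_2}p_{j_1}p_{j_2}\, I_2(i_1,i_2,j_1,j_2,a,b)\, I_2(i_1,i_2,j_1,j_2,c,d)\, I_2(i_1,i_2,j_1,j_2,e,f)$$

(where each sum is from 1 to k) and

$$P(E\mid H_u) = p_e p_f \sum_{i_1,i_2,j_1,j_2} p_{i_1}p_{i_2}p_{j_1}p_{j_2}\, I_2(i_1,i_2,j_1,j_2,a,b)\, I_2(i_1,i_2,j_1,j_2,c,d).$$

In Appendix A we show how to derive the generic algebraic expression, including the mutation matrix.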

4.4.2. Purpose 

Given two siblings, many identical-by-descent configurations of the observed alleles are possible: when mutation is allowed, eight configurations are possible (see Table A.1 in Appendix A). This test case determines if the software can handle this correctly.
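A brute-force gold standard for this case follows the straightforward summation of Section 4.4.1: sum over the ordered genotypes of the two untyped parents. The sketch does this for a small number of alleles. Its normalization check (the conditional genotype distribution of the third sibling sums to 1) guards against indexing errors. Frequencies and μ are hypothetical.

```python
import itertools


def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def third_sibling_lr(p, M, g_s1, g_s2, g_ui):
    """Brute-force LR for MP = third sibling of S1 and S2 (parents untyped founders)."""
    k = len(p)
    numerator = denominator = 0.0
    for parents in itertools.product(range(k), repeat=4):   # (i1, i2, j1, j2)
        w = p[parents[0]] * p[parents[1]] * p[parents[2]] * p[parents[3]]
        w *= I2(M, *parents, *g_s1) * I2(M, *parents, *g_s2)
        denominator += w                                     # P(G_S1, G_S2)
        numerator += w * I2(M, *parents, *g_ui)              # ... and G_MP = G_UI
    return numerator / (denominator * p[g_ui[0]] * p[g_ui[1]])


p = [0.1, 0.2, 0.3, 0.4]   # hypothetical frequencies
M = uniform_mutation_matrix(4, 0.01)
lr = third_sibling_lr(p, M, (0, 1), (1, 2), (2, 3))
```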

4.5. Two typed same-sided aunts 

In this test case, MP has two typed aunts S1, S2, both sisters of the same parent S3 of MP. Suppose $G_{S_1}=(a,b)$, $G_{S_2}=(c,d)$ and $G_{UI}=(g,h)$. The pedigree consists of the parents of the three siblings, the siblings, the other parent of MP and MP.

4.5.1. Gold standard 

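The LR can be computed by brute force. The numerator, P(E|Hr), is given by

$$P(E\mid H_r) = \sum p_{i_1}p_{i_2}p_{j_1}p_{j_2}p_{r_1}p_{r_2}\; I_2(i_1,i_2,j_1,j_2,a,b)\, I_2(i_1,i_2,j_1,j_2,c,d)\, I_2(i_1,i_2,j_1,j_2,s_1,s_2)\, I_2(s_1,s_2,r_1,r_2,g,h).$$

In this sum, $(i_1,i_2)$ and $(j_1,j_2)$ are the genotypes of the siblings' parents, $(s_1,s_2)$ is that of S3 and $(r_1,r_2)$ that of MP's other parent; each index runs from 1 to k. The denominator, P(E|Hu), is given by

$$P(E\mid H_u) = p_g p_h \sum_{i_1,i_2,j_1,j_2} p_{i_1}p_{i_2}p_{j_1}p_{j_2}\; I_2(i_1,i_2,j_1,j_2,a,b)\, I_2(i_1,i_2,j_1,j_2,c,d).$$

Notice, however, that P(E|Hr) is a large sum (containing $k^8$ terms). Some simplifications would be easy to implement (see also (4.6)). Even so, it becomes tedious to verify the result of this test case when many such LRs must be computed: for example, if 10 UIs and 100 such MPs are defined on 15 loci, then 15,000 such calculations are done. These can be done very fast with the algebraic expression at hand, which we have developed in Appendix A.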

4.5.2. Purpose 

In addition to a purpose similar to the one for the previous case, one can use this case to see if pruning of the pedigree has been done correctly. There is one untyped founder that can be removed from the pedigree: the parent of MP that is unrelated to the typed siblings. Also, by comparing the computation time of the answer to that of the algebraic LR derived in Appendix A, one can assess the efficiency of the algorithm in this case.
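The brute-force sum with $k^8$ terms is feasible for a small number of alleles, which makes it usable for checking simplifications. The sketch compares it with a version in which MP's unrelated parent is summed out analytically. An untyped founder parent transmits with q, which turns $I_2$ into $I_1$, so the two versions must agree.

This is not the same as deleting that parent and treating its allele as a founder allele drawn with p (cf. Remark A.1). That approach in general gives a different LR when μ>0. Function names and inputs are hypothetical.

```python
import itertools


def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def I1(M, q, a1, a2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


def aunts_lr(p, M, g_s1, g_s2, g_ui, sum_out_other_parent):
    """LR for MP whose untyped parent S3 is a full sibling of the typed aunts S1, S2."""
    k = len(p)
    q = [sum(p[i] * M[i][a] for i in range(k)) for a in range(k)]
    numerator = denominator = 0.0
    for par in itertools.product(range(k), repeat=4):   # parents of S1, S2, S3
        w = p[par[0]] * p[par[1]] * p[par[2]] * p[par[3]]
        w *= I2(M, *par, *g_s1) * I2(M, *par, *g_s2)
        denominator += w
        for s1, s2 in itertools.product(range(k), repeat=2):   # untyped S3
            w_s = w * I2(M, *par, s1, s2)
            if sum_out_other_parent:
                # MP's other parent summed out analytically: it transmits with q.
                numerator += w_s * I1(M, q, s1, s2, *g_ui)
            else:
                # Explicit sum over the ordered genotype (r1, r2) of MP's other parent.
                for r1, r2 in itertools.product(range(k), repeat=2):
                    numerator += w_s * p[r1] * p[r2] * I2(M, s1, s2, r1, r2, *g_ui)
    return numerator / (denominator * p[g_ui[0]] * p[g_ui[1]])


p = [0.2, 0.3, 0.5]   # hypothetical frequencies
M = uniform_mutation_matrix(3, 0.02)
lr_brute = aunts_lr(p, M, (0, 1), (0, 2), (0, 1), sum_out_other_parent=False)
lr_summed = aunts_lr(p, M, (0, 1), (0, 2), (0, 1), sum_out_other_parent=True)
```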

4.6. Complicated inbred pedigree 

The pedigree for this case is given in the accompanying figure. In short, there is a marriage between persons 7 and MP, who share one grandparent. The barred individuals in the pedigree are those who do not have a DNA-profile.

In this case, it is too difficult and no longer very informative to write down the generic LR algebraically. Instead we have computed the gold standard by calculating P(E|Hr) and P(E|Hu) separately, summing over all possible allelic configurations of the untyped individuals. There are 9, resp. 10 untyped individuals in this pedigree under Hr, resp. Hu. A straightforward attempt to calculate P(E|Hr) and P(E|Hu) using ordered genotypes would result in a summation involving $k^{18}$, resp. $k^{20}$ terms (k being the number of alleles on the locus under consideration). This cannot be done on a standard computer in reasonable time. Therefore we have done some preliminary calculations to reduce the number of variables that need to be summed over: we use value abstraction (cf. [9]), switch to unordered genotype computation, and remove individuals from the pedigree by performing the necessary summations.

4.6.1. Value abstraction and unordered genotypes 

Suppose the typed relatives and the UI have t−1 different alleles on the locus we consider. Then we can replace the original set of k alleles by a new set of t alleles, consisting of all typed alleles and an auxiliary allele X, whose frequency is the sum of the frequencies of all the unseen alleles. We denote unordered genotypes by {a, b}, i.e., {a, b}={b, a} is the unordered genotype corresponding to genotypes (a, b) and (b, a). The number of unordered genotypes for the reduced allele set is t(t+1)/2. For genotype i={i1, i2} we let

$$\pi_i = \begin{cases} p_{i_1}^2 & \text{if } i_1=i_2,\\ 2p_{i_1}p_{i_2} & \text{if } i_1\neq i_2\end{cases}$$

denote the probability of observing genotype i in a founder.
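The sketch below shows a simple frequency-lumping step in the spirit of value abstraction. The observed alleles are kept, all other alleles are merged into X, and unordered founder genotype probabilities are listed. With 6 observed alleles this gives 7 alleles and 28 unordered genotypes, matching the example in Section 4.6.2.

This illustration assumes μ=0. With mutation, the mutation matrix would have to be reduced as well, which this sketch does not attempt. The allele names and frequencies are hypothetical.

```python
def value_abstraction(p, observed):
    """Keep the observed alleles and merge all unseen alleles into one allele 'X'."""
    reduced = {a: p[a] for a in observed}
    reduced["X"] = sum(freq for a, freq in p.items() if a not in observed)
    return reduced


def founder_genotype_probs(freqs):
    """Unordered founder genotypes {a, b}: p_a^2 if a == b, else 2 p_a p_b."""
    alleles = sorted(freqs)
    probs = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            probs[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return probs


# Hypothetical locus with 12 equally frequent alleles, 6 of which are observed.
p = {str(a): 1 / 12 for a in range(8, 20)}
observed = {"9", "11", "12", "14", "15", "17"}
reduced = value_abstraction(p, observed)      # 6 observed alleles plus X
genotypes = founder_genotype_probs(reduced)   # 7 * 8 / 2 = 28 unordered genotypes
```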

4.6.2. Summing out some untyped founders 

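To do so, we define the function

$$H(\{a_1,a_2\},\{b_1,b_2\}) = (2-\delta_{a_1,a_2})(2-\delta_{b_1,b_2})\sum_{s_1,s_2=1}^{k} p_{s_1}p_{s_2}\, I_1(s_1,s_2,a_1,a_2)\, I_1(s_1,s_2,b_1,b_2),$$

which is the probability of observing genotypes {a1, a2} and {b1, b2} for half-siblings whose parents are untyped founders.

Furthermore, we define

$$\tilde I_1(\{a_1,a_2\},\{c_1,c_2\}) = (2-\delta_{c_1,c_2})\, I_1(a_1,a_2,c_1,c_2),$$

which is the probability of observing (unordered) genotype {c1, c2} in an individual with one typed and one untyped founder parent, if the typed parent has (unordered) genotype {a1, a2}. Analogously, $\tilde I_2(\{a_1,a_2\},\{b_1,b_2\},\{c_1,c_2\}) = (2-\delta_{c_1,c_2})\, I_2(a_1,a_2,b_1,b_2,c_1,c_2)$ is the probability that parents with genotypes {a1, a2} and {b1, b2} have a child with genotype {c1, c2}.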
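Unordered functions can be obtained from their ordered counterparts, since a heterozygous unordered genotype collects the probabilities of both orderings. The sketch does this for the one-typed-parent case. It then uses the result for a half-sibling function that sums over the ordered genotype of the untyped shared parent, while the two other parents are untyped founders transmitting with q. This construction is for illustration and is not necessarily the form used in the paper; both functions are checked to be normalized.

```python
def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I1(M, q, a1, a2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * q[c[1 - x]] for x in (0, 1)) / 4


def unordered_I1(M, q, a, c):
    """Unordered version of (3.5): both orderings of a heterozygous child genotype count."""
    factor = 1 if c[0] == c[1] else 2
    return factor * I1(M, q, a[0], a[1], c[0], c[1])


def half_sib(p, M, a, b):
    """Probability of unordered genotypes a and b for half-siblings; parents untyped founders."""
    k = len(p)
    q = [sum(p[i] * M[i][x] for i in range(k)) for x in range(k)]
    fa = 1 if a[0] == a[1] else 2
    fb = 1 if b[0] == b[1] else 2
    # Sum over the ordered genotype (s1, s2) of the shared parent.
    return fa * fb * sum(p[s1] * p[s2] * I1(M, q, s1, s2, *a) * I1(M, q, s1, s2, *b)
                         for s1 in range(k) for s2 in range(k))


p = [0.1, 0.2, 0.3, 0.4]   # hypothetical frequencies
M = uniform_mutation_matrix(4, 0.01)
q = [sum(p[i] * M[i][x] for i in range(4)) for x in range(4)]
unordered = [(i, j) for i in range(4) for j in range(i, 4)]   # 10 unordered genotypes
```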

Then P(E|Hr) is given by

(4.5)
In this sum, the summation variables run over all genotypes, and the other gti denote the observed genotypes of family member i (see Figure). Naturally, gui is the genotype of the UI.

The expression for P(E|Hu) is similar: we need to replace gtui by gtmp and sum over that genotype as well. This yields

(4.6)
This is a summation involving $(t(t+1)/2)^5$ terms, which is within the capabilities of a modern computer. For example, if the profiles of relatives are such that t=7 (i.e., 6 distinct alleles observed in the pedigree), then the number of genotypes after value abstraction is 28. The sum P(E|Hu) then involves $28^5 \approx 17$ million terms.

4.6.3. Purpose 

The pedigree in this case is inbred, which presents a computational obstacle to some algorithms, e.g. the Elston–Stewart algorithm. In addition, there are many untyped relatives, including all founders. This case therefore serves well to check not only correctness of the computed likelihood ratio, but also performance.

4.7. Algebraically 

The generic LR for this pedigree is too complicated to be of any use, even with μ=0. However, for specific choices of allelic configurations one can use (4.5), (4.6) to obtain the algebraic expressions for P(E|Hr), P(E|Hu) and LR. A few examples (with μ=0) are listed in Table 2.

Table 2. Allele configurations.
Individual   LR1     LR2     LR3
FM4          (3,3)   (2,6)   (4,5)
FM5          (1,3)   (2,5)   (1,5)
UI           (3,3)   (3,4)   (2,4)
FM9          (2,3)   (3,3)   (4,4)
FM10         (3,4)   (1,4)   (2,3)

The corresponding LR’s are:

(4.7)
(4.8)
(4.9)


5. Validation report 

Based on the scenarios described above, we have defined test cases for Bonaparte by choosing specific DNA-profiles for the pedigrees mentioned in the previous section. The gold standard LR has been computed with the software Mathematica 6.0 up to machine precision (about 16 significant digits). No discrepancies at all were observed with Bonaparte’s output. We have used a standard desktop computer throughout (Windows XP, Intel Core Duo, 2.33 GHz, 2 GB RAM).

In addition, we have performed some of the same computations with the freely available program Familias (cf. [10]). This program cannot automatically compute all LRs between a list of MPs and a list of UIs, but it has many features incorporated into it, such as various mutation models and subpopulation correction. We have only tested the uniform mutation model for a few choices of profiles per test case. Familias performed well on these cases (the reported LR equals the LR given by our formulas up to at least the first seven decimal places), but was sometimes slower than Bonaparte, especially for test case 4.6.


Appendix A. 

In this section we derive the algebraic form of the likelihood ratios for some of the test cases.

Terminology. We say that two alleles are identical-by-descent (ibd) if they are descendants of the same ancestral allele. This terminology is somewhat abusive, since the alleles need not be identical in state (due to mutation). We write a≡b to denote that alleles a and b are ibd and a≢b to denote that they are not.

A.1. Incest 

Proof of (4.2): If UI=MP, then with probability 1/2 there is one pair of ibd alleles between $G_{UI}$ and $G_D$, and with probability 1/2 there are two. If there is one pair, then the situation is genetically non-incestuous, so we get (3.5), which corresponds to the term of (4.2) containing $I_1$. If there are two ibd pairs, then there are two equally likely possibilities. (i) Both u1 and u2 are descendants of the same allele f of F (of which either m1 or m2 is also a descendant), so u1 and u2 are ibd with the same maternal allele, say m1. (ii) The alleles u1, resp. u2 are ibd with m1, resp. m2, or the other way around. In both cases we use the following fact: if allele y is a descendant of founder allele x, then $P(x=i\mid y=j)=p_i M_{i,j}/q_j$. This gives the following probabilities. In case (i), suppose (without loss of generality) that u1 is inherited from paternal allele f and u2 from m1. Then $P = \sum_{f} \frac{p_f M_{f,m_1}}{q_{m_1}} M_{f,u_1} M_{m_1,u_2} = Q_{m_1,u_1}M_{m_1,u_2}/q_{m_1}$. Summing over all the possibilities gives the four middle terms in (4.2). In case (ii), suppose that u1 is ibd with m1 and u2 with m2. One of these ibd relations is through F; say this is the case for (u1, m1). Then $P = \sum_{f} \frac{p_f M_{f,m_1}}{q_{m_1}} M_{f,u_1} M_{m_2,u_2} = Q_{m_1,u_1}M_{m_2,u_2}/q_{m_1}$. Summing over all possible configurations gives the final four terms in (4.2).

A.2. Three generations 

Proof of (4.3), (4.4): The numerator is the expansion of $I_1(a,b,c,d)\cdot I_1(c,d,e,f)$, which is the probability that $G_{UI}=(c,d)$ and $G_C=(e,f)$ given that $G_F=(a,b)$ and $H_r$. In the denominator, the term $p_c p_d$ corresponds to the probability of observing $G_{UI}=(c,d)$ under $H_u$. The remaining term is the probability of observing the genotype $G_C=(e,f)$, given $G_F=(a,b)$, under $H_u$.

A.3. Three siblings 

We start with the probability Ps((ab), (cd)) of having two siblings with genotypes (a, b), (c, d). This probability can be computed from the a priori probabilities of all possible ibd configurations. These are given in Table A.1. We define an indicator variable I specifying the ibd-pairs, which we will use in the sequel.

Table A.1. Prior ibd probabilities for siblings with genotypes (ab), (cd).
I   ibd-pairs    Probability
1   a≡c, b≡d     1/8
2   a≡c, b≢d     1/8
3   a≡d, b≡c     1/8
4   a≡d, b≢c     1/8
5   a≢c, b≡d     1/8
6   a≢d, b≡c     1/8
7   None         1/4

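It is then easy to see that, with Q as in (3.3),

$$P_s((a,b),(c,d)) = \frac18\left(Q_{a,c}Q_{b,d} + Q_{a,c}q_bq_d + Q_{a,d}Q_{b,c} + Q_{a,d}q_bq_c + q_aq_cQ_{b,d} + q_aq_dQ_{b,c}\right) + \frac14\, q_aq_bq_cq_d. \qquad (A.1)$$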
Now we compute the probability that the third sibling has genotype (ef), given that the first two sibs have genotypes (ab), (cd). To do so, we consider each of the scenarios in Table A.1 separately. Conditional on the ibd-status of the alleles (ab), (cd), we can reconstruct all parental alleles which have given rise to a, b, c, d, and hence also compute the probability that these parents will have a third offspring of type (ef). Indeed,

$$P(G_{MP}=(e,f)\mid G_{S_1}=(a,b),G_{S_2}=(c,d)) = \sum_{i=1}^{7} P(I=i\mid G_{S_1}=(a,b),G_{S_2}=(c,d))\; P(G_{MP}=(e,f)\mid G_{S_1}=(a,b),G_{S_2}=(c,d),I=i), \qquad (A.2)$$

with I as in Table A.1.
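The Table A.1 weighting can be checked against a brute-force sum over the ordered genotypes of two untyped founder parents, $\sum p_{i_1}p_{i_2}p_{j_1}p_{j_2}\, I_2(i,j,a,b)\, I_2(i,j,c,d)$. The weighting works as follows:

- configurations with at least one ibd pair get weight 1/8, and the configuration with none gets 1/4;
- an ibd pair contributes a factor from Q, and a non-ibd allele a factor from q.

The sketch implements this weighting directly and compares it with the brute force for a few genotype pairs, using hypothetical frequencies and μ.

```python
import itertools


def uniform_mutation_matrix(k, mu):
    off = mu / (k - 1)
    return [[1.0 - mu if a == b else off for b in range(k)] for a in range(k)]


def I2(M, a1, a2, b1, b2, c1, c2):
    c = (c1, c2)
    return sum((M[a1][c[x]] + M[a2][c[x]]) * (M[b1][c[1 - x]] + M[b2][c[1 - x]])
               for x in (0, 1)) / 8


def sib_pair_table_a1(p, M, a, b, c, d):
    """P_s((a, b), (c, d)) by weighting the ibd configurations of Table A.1."""
    k = len(p)
    q = [sum(p[i] * M[i][x] for i in range(k)) for x in range(k)]

    def Q(x, y):
        return sum(p[i] * M[i][x] * M[i][y] for i in range(k))

    at_least_one_pair = (Q(a, c) * Q(b, d) + Q(a, c) * q[b] * q[d]         # I = 1, 2
                         + Q(a, d) * Q(b, c) + Q(a, d) * q[b] * q[c]       # I = 3, 4
                         + q[a] * q[c] * Q(b, d) + q[a] * q[d] * Q(b, c))  # I = 5, 6
    return at_least_one_pair / 8 + q[a] * q[b] * q[c] * q[d] / 4           # I = 7


def sib_pair_brute_force(p, M, a, b, c, d):
    """Sum over the ordered genotypes (i1, i2), (j1, j2) of two untyped founder parents."""
    k = len(p)
    return sum(p[i1] * p[i2] * p[j1] * p[j2]
               * I2(M, i1, i2, j1, j2, a, b) * I2(M, i1, i2, j1, j2, c, d)
               for i1, i2, j1, j2 in itertools.product(range(k), repeat=4))


p = [0.1, 0.2, 0.3, 0.4]   # hypothetical frequencies
M = uniform_mutation_matrix(4, 0.02)
cases = [(0, 1, 0, 1), (0, 1, 1, 2), (2, 2, 2, 3), (0, 1, 2, 3)]
```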

Let . We will calculate the Pi(a, b, c, d, e, f).

We denote

the probability that an unobserved allele is passed on as allele ai to offspring i (i=1, 2, 3).

With this notation the first term is equal to

Similarly,

From these formulas, we can also derive P3, …, P6:
(A.3)
(A.4)
It remains to compute P7(a, b, c, d, e, f). Here we face the additional complication that it is not possible to reconstruct the configuration of the parental alleles: these may be either or , where is the parental allele of which x is a (possibly mutated) copy. Both of these parental configurations have equal probability. For the first configuration, we get probability

Then

Thus can be computed. The denominator is straightforward (given by (3.2)), hence we can now compute the LR.

Example A.1.

Let μ=0 and suppose that $G_{S_1}=(x,y)$, $G_{S_2}=(x,z)$ and $G_{UI}=(y,z)$, with x, y, z distinct. Then we can apply the above with a=x, b=y, c=x, d=z, e=y, f=z. This yields

All Pi(x, y, x, z, y, z)=0 except for i=2 and i=7. These are

hence

Example A.2.

The above formulas apply directly in the case where the profiles contain F-alleles (i.e., in case of allele dropout). One needs to substitute (cf. (4.1))

For example, with and μ=0, we obtain

A.4. Two aunts 

We will encounter the situation where an unobserved founder allele has been passed on to an offspring as allele a and to a grandchild (through another offspring) as allele b. The probability of this happening can be summarized in matrix form: we define the matrix $R = M^t\,\mathrm{diag}(p)\,M^2$, which has entries

$$R_{a,b} = \sum_{i=1}^{k} p_i\, M_{i,a}\,(M^2)_{i,b}.$$

We will also need

This is the probability that an unobserved founder allele is transmitted to allele y in a first child, to allele z in another child, and to allele x to a grandchild, which is not a child of the first two children.

Analogous to (A.2), we have

where I describes the ibd-pairs among (ab), (cd) as in Table A.1.

Analogous to the analysis in A.3, we define

Then

The terms with one ibd pair can be obtained from

Indeed, (A.3), (A.4) hold here as well. The last term is given by

As in A.3, this allows us to compute the LR, since P(GUI=(gh)|Hu)=pgph.

Remark A.1.

The LR for the pruned pedigree, where MP’s second parent is left out of the pedigree, is obtained by replacing qg, qh with pg, ph in the Pi(a, b, c, d, g, h).

Example A.3.

Let μ=0 and take $G_{S_1}=G_{S_2}=(x,y)$ and $G_{UI}=(x,z)$, with x, y, z distinct. Then we can compute the LR by substituting a=x, b=y, c=x, d=y, g=x, h=z into the above formulas. This yields

The non-zero Pi(x, y, x, y, x, z) are P1, P2, P5, P7. These evaluate to
Therefore,

Notice that in this example pz does not play a role.


References 

  1. Lauritzen S, Spiegelhalter D. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological). 1988;50(2):157–224
  2. Bruijning-van Dongen C, Slooten K, Burgers W, Wiegerinck W. Bayesian networks for victim identification on the basis of DNA profiles. Forensic Science International: Genetics Supplement Series. 2009;2:466–468
  3. Skare O, Sheehan N, Egeland T. Identification of distant family relationships. Bioinformatics. 2009;25:2376–2382
  4. Drábek J. Validation of software for calculating the likelihood ratio for parentage and kinship. Forensic Science International: Genetics. 2009;3(2):112–118
  5. Balding D. Estimating products in forensic identification. Journal of the American Statistical Association. 1995;90(431):839–844
  6. Balding D, Nichols R. DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Science International. 1994;64:125–140
  7. Triggs C, Curran J. The sensitivity of the Bayesian HPD method to the choice of prior. Science & Justice. 2006;46(3):169–178
  8. Lange K. Mathematical and Statistical Methods for Genetic Analysis. 2nd ed. New York: Springer-Verlag; 2002
  9. Friedman N, Geiger D, Lotner N. Likelihood computations using value abstraction. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. 2000. p. 192–200
  10. Egeland T, Mostad P, Mevåg B, Stenersen M. Beyond traditional paternity and identification cases. Selecting the most probable pedigree. Forensic Science International. 2000;110(1):47–59

PII: S1872-4973(10)00109-2

doi:10.1016/j.fsigen.2010.06.005
