Full length article| Volume 61, 102771, November 01, 2022

# EFMrep: An extension of EuroForMix for improved combination of STR DNA mixture profiles

Open Access. Published: August 29, 2022

## Highlights

• EFMrep is a new software which extends the EuroForMix model.
• Possible to combine samples from different multiplexes.
• Increased information gain when samples are combined.
• Demonstration of real case with kinship.

## Abstract

The EuroForMix model has been extended to create a new open-source software called EFMrep which enables the combination of STR DNA mixture samples from different multiplexes. In addition to calculating combined likelihood ratios and carrying out deconvolution, the software also includes the capability to specify related unknown individuals. A graphical user interface has been implemented to ease the analysis for practitioners in real case work. The effect of combining multiple samples based on the PROVEDIt dataset was investigated, either from the same or different multiplexes. The information gain increases when more samples are combined. A head-to-head comparison against EuroForMix shows the benefit of a more general model. Guidelines are provided. A real case example was used to demonstrate how EFMrep could be used to combine multiple samples when a proposition includes kinship.

## 1. Introduction

Combining DNA STR profiles from different data sources to provide a single likelihood ratio (LR) is important to laboratories that use more than one multiplex system. Consensus methods of reporting have largely been replaced by probabilistic genotyping tools to evaluate mixtures. In this paper we define samples to be DNA profiles originating from different stains or different DNA extracts, whereas replicates originate from the same stain and same DNA extract but from different amplifications. Hence a DNA STR profile can either be a sample or a replicate. Steele et al. (2014) [Steele C.D., Greenhalgh M., Balding D.J. Verifying likelihoods for low template DNA profiles using multiple replicates.] demonstrated the advantage of calculating a multi-replicate LR.
The DNA interpretation software EuroForMix [Bleka Ø., Storvik G., Gill P. EuroForMix: An open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts.] was based on the gamma model proposed by Cowell et al. [Cowell R.G., Graversen T., Lauritzen S.L., Mortera J. Analysis of forensic DNA mixtures with artefacts.]. The authors of [Cowell R.G., Graversen T., Lauritzen S.L., Mortera J. Analysis of forensic DNA mixtures with artefacts.] also developed a model named DNAmixtures which enabled combination of DNA profiles from different crime-stain samples, where the contributors could have different mixture proportions in each of the extracts. Green and Mortera (2021) [Green P.J., Mortera J. Inference about complex relationships using peak height data from DNA mixtures.] implemented KinMix as an extension of DNAmixtures for modeling DNA mixtures with related contributors. STRmix™ (from version 2.5) and Statistefix (from version 4.0) have the capability to evaluate samples analyzed by different multiplexes [Schmidt M., Schiller R., Anslinger K., Wiegand P., Weirich V. Statistefix 4.0: A novel probabilistic software tool.; Taylor D., Bright J.-A., Kelly H., Lin M.-H., Buckleton J. A fully continuous system of DNA profile evidence evaluation that can utilise STR profile data produced under different conditions within a single analysis.].
The current version of EuroForMix (up to v3.4.0) allows the combination of multiple profiles. However, these must be replicates, which are restricted by the following assumptions: a) they are analysed with the same multiplex; b) they have the same mixture proportions, peak height distribution and amount of degradation (same analysis setting). Nevertheless, EuroForMix is widely adopted as a tool for efficient mixture interpretation, and is mostly used when one sample is analyzed at a time.
In serious crimes with low template DNA from the person of interest (POI), for example a minor contributor in a complex mixture, there is an urgent need to be able to combine different data sources from extracts of different case-stain materials. These may be analyzed with different multiplexes, to obtain as much information as possible about potential contributors. A recommended approach using EuroForMix would be to report the weight of evidence for the best profile, i.e., the highest quality profile. However, not utilizing all useful samples together in a joint weight of evidence calculation is sub-optimal.
In this paper we introduce EFMrep (EuroForMix with replicates) as an extension of the EuroForMix model (v3.x), where we have made it possible for the model parameters to be specified for individual samples. This increases the flexibility of the model, but also its complexity. In return, it becomes possible to combine any kind of samples; the only assumption is that the same individuals are contributors (individuals are allowed to be absent in some profiles).
The paper is structured as follows. In the first part of the method section we provide an overview of the EFMrep model. In the second part, the analysis used to evaluate the impact of combining multiple samples in the same model is described. In the third part, a demonstration is provided of a real case example with kinship. The findings are summarized in the discussion.

## 2. Materials and methods

### 2.1 Mathematical details

#### 2.1.1 Model specification

The peak height model is specified as follows: $S$ DNA profiles are assumed to be contributed by the same $K$ individuals (contributors), across $M$ markers in total. For sample $s$ in marker $m$, the peak height for allele $a$ follows a gamma distribution with shape and scale arguments
$\alpha_{m,a,s} = \omega_s^{-2}\,\tau_s^{(b_{m,a,s}-125)/100} \sum_{k=1}^{K} \pi_{s,k}\, n_{m,a,k}$

$\beta_{m,a,s} = \beta_s = \mu_s\,\omega_s^{2}$

Here $n_{m,a,k}$ is the {0,1,2} genotype contribution for individual $k$ at allele $a$ in marker $m$, and $b_{m,a,s}$ is the fragment length in base pairs of allele $a$ in marker $m$ for profile $s$. Note that $b_{m,a,s}$ would typically be different when the profiles are analyzed with different multiplexes. The remaining variables are the model parameters: $\mu_s$ and $\omega_s^2$ are the expectation and coefficient-of-variation of the peak height for a full heterozygous contribution, respectively for sample $s$; $\pi_{s,k}$ is the mixture proportion for individual $k$ at profile $s$, whereas $\tau_s$ is the degradation slope for profile $s$. Drop-out and drop-in are modeled as described for EuroForMix, and here we extend the hyperparameters to be profile specific (in addition to marker specific): $AT_{m,s}$ is the analytical threshold, $pC_{m,s}$ is the drop-in probability, and $\lambda_{m,s}$ is the rate parameter for the exponential drop-in model. Lastly, parameters to model n-1 and n+1 stutters can optionally be profile specific.
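The shape and scale arguments above can be illustrated with a minimal Python sketch. It is not taken from the EFMrep source code; the function name, argument names and numerical values are hypothetical, and drop-out, drop-in and stutter are ignored.

```python
def gamma_shape_scale(mu, omega, tau, frag_len_bp, mix_props, allele_counts):
    """Shape and scale of the gamma peak-height distribution for one allele in one sample.

    mu, omega, tau : sample-specific parameters (mu_s, omega_s, tau_s)
    frag_len_bp    : fragment length b_{m,a,s} in base pairs
    mix_props      : mixture proportions pi_{s,k}, one per contributor
    allele_counts  : genotype contributions n_{m,a,k} in {0, 1, 2}
    """
    # Sum over contributors of pi_{s,k} * n_{m,a,k}
    contribution = sum(p * n for p, n in zip(mix_props, allele_counts))
    # Degradation scales the shape by tau^((b - 125) / 100)
    shape = omega ** -2 * tau ** ((frag_len_bp - 125) / 100) * contribution
    scale = mu * omega ** 2
    return shape, scale


# Hypothetical two-person example: the allele is carried once by contributor 1 only.
shape, scale = gamma_shape_scale(
    mu=1000.0, omega=0.2, tau=0.8, frag_len_bp=225,
    mix_props=[0.7, 0.3], allele_counts=[1, 0],
)
expected_height = shape * scale  # mean of a gamma distribution is shape * scale
print(expected_height)
```

Because the scale does not depend on the allele, the expected peak height reduces to $\mu_s$ times the degradation factor times the summed contribution, so a sample analysed with a kit that places the same allele at a longer fragment length gets a lower expected height when $\tau_s < 1$.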

#### 2.1.2 Definition of the likelihood function, likelihood ratio and deconvolution

The likelihood function (a function of the unknown model parameters $\theta$) where the evidence $E=(E_1,\ldots,E_M)$ contains the peak heights, conditioned on the proposition $H$, is defined as follows:
$L(\theta \mid H) = \prod_{m=1}^{M} \sum_{g_m \in G_m^H} \Pr(E_m \mid \theta, g_m)\,\Pr(g_m \mid H)$

where the per-marker likelihood combines the information across all DNA profiles:
$\Pr(E_m \mid \theta, g_m) = \prod_{a=1}^{A_m} \prod_{s \in m} f(Y_{m,a,s} = y_{m,a,s} \mid g_m, \theta)$

where $f(Y_{m,a,s} \mid g_m, \theta)$ is the probability density function of the gamma distribution (with shape and scale arguments $\alpha_{m,a,s}$ and $\beta_{m,a,s}$). Here $g_m$ is the joint genotype combination of the $K$ contributors at marker $m$ which decides the contribution values of $n_{m,a,k}$, and this is traversed through the outcome $G_m^H$ which is decided by the proposition and allele outcome $A_m$. Notice that not all $S$ profiles need to have all $M$ markers, hence the likelihood of a profile $s$ is only computed if it is present (therefore $s \in m$). For instance, if the kit of a profile does not have a particular marker, the likelihood of that marker is not considered. The implementation needs to keep track of this information.
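The way the per-marker likelihood skips profiles whose kit lacks a marker can be sketched as follows. This is a simplified illustration under stated assumptions, not the EFMrep implementation: the data layout (`samples`, `peaks`, `counts`) is hypothetical, every allele of the genotype is assumed to have an observed peak with a positive expected contribution, and drop-out, drop-in and stutter are left out.

```python
import math


def gamma_pdf(y, shape, scale):
    """Density of a gamma distribution with the given shape and scale at y > 0."""
    log_pdf = ((shape - 1) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def genotype_likelihood(marker, counts, samples):
    """Pr(E_m | theta, g_m): product over alleles and over samples that have the marker.

    counts  : dict allele -> list of contributions n_{m,a,k}, one per contributor
    samples : list of dicts with keys 'mu', 'omega', 'tau', 'mix' and 'peaks',
              where peaks[marker][allele] = (peak_height, fragment_length_bp)
    """
    lik = 1.0
    for s in samples:
        peaks = s["peaks"].get(marker)
        if peaks is None:
            # The kit of this sample lacks the marker (s is not in m): skip it.
            continue
        for allele, n_per_contributor in counts.items():
            height, frag_len = peaks[allele]
            contrib = sum(p * n for p, n in zip(s["mix"], n_per_contributor))
            shape = s["omega"] ** -2 * s["tau"] ** ((frag_len - 125) / 100) * contrib
            scale = s["mu"] * s["omega"] ** 2
            lik *= gamma_pdf(height, shape, scale)
    return lik


def marker_likelihood(marker, genotypes, samples):
    """Sum over candidate joint genotypes of Pr(E_m | theta, g_m) * Pr(g_m | H).

    genotypes : list of (counts, prior) pairs, prior being Pr(g_m | H)
    """
    return sum(prior * genotype_likelihood(marker, counts, samples)
               for counts, prior in genotypes)
```

Each sample carries its own `mu`, `omega`, `tau` and `mix`, which mirrors the sample-specific parameters of the model; the full likelihood would be the product of `marker_likelihood` over all markers.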
A maximum likelihood approach was implemented to estimate the model parameters under two defined propositions, $H_1$ and $H_2$, as described for EuroForMix, giving the estimates $\hat{\theta}_1$ and $\hat{\theta}_2$, respectively.
For instance, the defined propositions could be stated as
• $H_1$: Person of interest (POI) is a contributor to the $S$ profiles
• $H_2$: An unknown contributor, not the POI, is a contributor to the $S$ profiles
The likelihood ratio is calculated as $LR = L(\hat{\theta}_1 \mid H_1) / L(\hat{\theta}_2 \mid H_2)$, where the numerator and denominator are the maximum likelihood values conditioned under each of their respective propositions.
We also implemented the same deconvolution framework provided in EuroForMix, where the joint unknown genotype is inferred using Bayes' rule: for a particular marker $m$ and proposition $H_i$, $i \in \{1,2\}$, the posterior probability is calculated as
$\Pr(g_m \mid E_m, H_i) \propto \Pr(E_m \mid \hat{\theta}_i, g_m)\,\Pr(g_m \mid H_i).$

The posterior genotype probability for each contributor is obtained through marginalization.
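As a rough illustration of the deconvolution step, the sketch below normalizes the product of likelihood and prior over joint genotypes and then marginalizes for one contributor. The function names, the genotype labels and all numbers are hypothetical and chosen only to show the arithmetic.

```python
from collections import defaultdict


def deconvolve(joint_likelihoods, priors):
    """Posterior over joint genotypes: Pr(g | E, H) proportional to Pr(E | theta_hat, g) * Pr(g | H)."""
    unnormalized = {g: joint_likelihoods[g] * priors[g] for g in joint_likelihoods}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


def marginal(posterior, k):
    """Marginal posterior genotype probabilities for contributor k (0-based index)."""
    out = defaultdict(float)
    for joint_genotype, prob in posterior.items():
        out[joint_genotype[k]] += prob
    return dict(out)


# Hypothetical marker with two contributors; keys are (genotype of C1, genotype of C2).
likelihoods = {("15/16", "15/15"): 2e-6, ("15/16", "15/16"): 1e-6, ("15/15", "15/16"): 1e-6}
priors = {("15/16", "15/15"): 0.2, ("15/16", "15/16"): 0.4, ("15/15", "15/16"): 0.4}

posterior = deconvolve(likelihoods, priors)
print(marginal(posterior, 0))  # marginal genotype probabilities for the first contributor
```

In this toy example the three joint genotypes end up equally probable, so the first contributor is assigned 15/16 with probability of about 2/3.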

### 2.2 EFMrep software

A new software called EFMrep was created; this extends the existing model of EuroForMix. The main difference between the two programs is that EFMrep can assume separate model parameters for each profile. This makes it possible to combine several samples with the same contributors involved. Another difference is that EFMrep can define that any of the unknowns under each of the propositions are related to a specific typed individual (pairwise relationship only), whereas the current version of EuroForMix only provides a single option under the alternative proposition (H2). However, all unknowns in EFMrep are assumed to be unrelated to each other (considered independently).
EFMrep contains a graphical user interface (GUI) to facilitate its use in casework. The model specification panel in EFMrep allows the user to decide whether different model parameters should be shared or not for different profiles. The software is open source and accessible at https://github.com/oyvble/EFMrep. A compiled version for Windows and a tutorial for using EFMrep with the real case example can be obtained from http://www.euroformix.com/EFMrep. Version 1.0.0 of EFMrep is used throughout the paper.

### 2.3 PROVEDIt

The PROVEDIt dataset [Alfonse L.E., Garrett A.D., Lun D.S., Duffy K.R., Grgicak C.M. A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt.] is a unique resource; it consists of 25,000 autosomal STR CE profiles divided into five datasets, each analysed with a different multiplex, except that two datasets were both analysed with IdentifilerPlus. The datasets were clustered into two sets (i.e., batches), with respect to those references that were contributors. Batch #1 consisted of two datasets, one with IdentifilerPlus (29 cycles) (Applied Biosystems) and the other with PowerPlex16 (Promega). Batch #2 consisted of three datasets, each with GlobalFiler (Thermo Fisher Scientific), Fusion 6C (Promega) and IdentifilerPlus (28 cycles), respectively. Different Applied Biosystems (ABI) CE instruments were used for the different datasets: ABI 3130 was used for the datasets with IdentifilerPlus (28 cycles) and PowerPlex16, whereas ABI 3500 was used for the others. Samples were run with several injection times, but for our study, the recommended injection times for each instrument were used: 5 s for 3130 and 15 s for 3500. The unfiltered version of the data (where all stutters are kept) was used (https://lftdi.camden.rutgers.edu/repository/PROVEDIt_1–5-Person%20CSVs%20UnFiltered.zip). All off-ladder (OL) alleles were removed. An overview of fragment length of each marker for the different multiplexes is provided in Fig. S1 (supplementary).

#### 2.3.1 Samples used

5694 single source profiles were used to tune the model hyperparameters (analytical threshold and drop-in model), whereas 2–4 person mixtures were used in LR calculations. Only mixtures with the same contributors run with either two (batch #1) or three (batch #2) different multiplexes were used. For batch #1 there were potentially 143, 89 and 69 samples for 2, 3 and 4 person mixtures, respectively, whereas for batch #2 there were potentially 380, 352 and 397 samples.
The following restrictions were then applied:
• Mixture proportion of POI must be less than 20%.
• The LR of the POI must be in the range $3 < \log_{10} LR < 9$.
• No more than four samples per multiplex were analysed in combination (randomly drawn).
• The mixture cannot fit better with an additional contributor (unknown) compared to that indicated as ground truth.
With these restrictions, the number of samples analysed was reduced to 5 two-person mixtures (one set), 25 three-person mixtures (three sets) and 50 four-person mixtures (see Table S1 and Table S2 in supplementary for full list). Ten were used at least twice when different POIs were evaluated (one three-person mixture and nine four-person mixtures). The dataset used for the mixture analysis is available in supplementary materials (see overview of samples in Table S1 and Table S2).

#### 2.3.2 Deciding thresholds and drop-in models

The single source profiles in the PROVEDIt dataset were used to determine the analytical threshold per multiplex. For each sample, a compilation was made of peak heights of all non-OL alleles that could not be attributed to known contributors or to backward or forward stutter. The threshold was designated as the 95% upper percentile of the compiled peak heights [Bregu J., Conklin D., Terrill M., Cotton R.W., Grgicak C.M. Analytical thresholds and sensitivity: establishing RFU thresholds for forensic DNA analysis.], calculated across all markers for each of the five datasets (different multiplexes). This percentile was chosen because it gave values close to what could be seen as the “elbow/knee point” of the empirical cumulative distribution, corresponding to a change in the distribution (see supplementary file). The drop-in model was estimated after the threshold had been decided. Here the drop-in probability parameter was calculated as the relative frequency of obtaining one drop-in (across all markers and samples). The lambda parameter was estimated using the maximum likelihood estimate.
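A minimal sketch of this hyperparameter tuning is given below. Several details are assumptions rather than the authors' procedure: the percentile uses linear interpolation between order statistics, the drop-in probability is taken as the number of drop-in peaks divided by a hypothetical count of marker-sample pairs, and lambda is the maximum likelihood rate of an exponential distribution shifted by the threshold.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def dropin_model(noise_heights, threshold, n_marker_sample_pairs):
    """Estimate drop-in probability and exponential rate from unexplained peaks.

    noise_heights         : heights of peaks not explained by contributors or stutter
    threshold             : analytical threshold (same unit as the heights)
    n_marker_sample_pairs : number of marker/sample combinations examined (assumed known)
    """
    dropins = [h for h in noise_heights if h >= threshold]
    p_c = len(dropins) / n_marker_sample_pairs
    # MLE of the rate of an exponential distribution shifted by the threshold;
    # assumes at least one drop-in peak lies strictly above the threshold.
    lam = len(dropins) / sum(h - threshold for h in dropins)
    return p_c, lam


# Hypothetical unexplained peak heights (RFU) from single source profiles.
noise = [5, 8, 10, 12, 15, 20, 22, 30, 41, 60]
at = percentile(noise, 95)
p_c, lam = dropin_model(noise, at, n_marker_sample_pairs=200)
print(at, p_c, lam)
```

With real data the compiled peak heights would be pooled per multiplex, so that each kit receives its own threshold and drop-in parameters.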

#### 2.3.3 Propositions and calculations

Likelihood ratios were calculated for alternative propositions as follows:
• $H_1$: The POI and $K-1$ known(s) are contributors to the combined samples
• $H_2$: One unrelated unknown and $K-1$ known(s) are contributors to the combined samples
where “known(s)” is the set of true contributing references where the POI is not included. The number of contributors, $K$, was defined as stated by the PROVEDIt dataset.
The LR was calculated for 1–4 combined samples with the same contributors. Different model parameters were assumed for each sample, and an Fst of 0.01 was applied for all calculations. All LRs were calculated based on the NIST1036 Caucasian allele frequency database [Hill C.R., Duewer D.L., Kline M.C., Coble M.D., Butler J.M. U.S. population data for 29 autosomal STR loci.]. Rare alleles were imputed with minimum frequency (1.39e-3) and frequencies were afterwards normalized.
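The imputation and normalization of allele frequencies can be sketched as follows. The function name and the toy frequencies are hypothetical, and the exact procedure used by the software may differ in detail; only the minimum frequency of 1.39e-3 is taken from the text.

```python
def impute_and_normalize(freqs, observed_alleles, min_freq=1.39e-3):
    """Add observed alleles missing from the database at min_freq, then rescale to sum to 1.

    freqs            : dict allele -> database frequency for one marker
    observed_alleles : alleles seen in the evidence or reference profiles
    """
    out = dict(freqs)
    for allele in observed_alleles:
        if allele not in out:
            out[allele] = min_freq  # rare allele not present in the database
    total = sum(out.values())
    return {allele: f / total for allele, f in out.items()}


# Hypothetical marker where allele "17" is observed but absent from the database.
print(impute_and_normalize({"15": 0.6, "16": 0.4}, ["15", "17"]))
```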
The choice of using degradation, backward- or forward stutter model options was based on the adjusted maximum likelihood value under H1, $\log L(\hat{\theta} \mid H_1) - |\theta|$ (the maximized log-likelihood minus the number of estimated parameters), by evaluating each sample alone. The model option with highest adjusted likelihood was selected, which is equivalent to selecting the model with smallest Akaike information criterion [Akaike H. A new look at the statistical model identification.].
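The equivalence between the adjusted log-likelihood and the Akaike information criterion can be checked with a short sketch. The model names and fitted values are hypothetical; the only facts used are that the penalty is the number of estimated parameters and that AIC equals twice that number minus twice the maximized log-likelihood.

```python
def adjusted_loglik(max_loglik, n_params):
    """Maximized log-likelihood penalized by the number of estimated parameters."""
    return max_loglik - n_params


def aic(max_loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * max_loglik


def select_model(fits):
    """fits: dict name -> (max_loglik, n_params); returns the name with highest adjusted log-likelihood."""
    return max(fits, key=lambda name: adjusted_loglik(*fits[name]))


# Hypothetical fits of one sample under H1 with different model options.
fits = {
    "no options": (-250.0, 3),
    "degradation": (-240.0, 4),
    "degradation + backward stutter": (-239.5, 5),
}
best = select_model(fits)
lowest_aic = min(fits, key=lambda name: aic(*fits[name]))
print(best, lowest_aic)
```

Since AIC is minus two times the adjusted log-likelihood, both rankings always pick the same model.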

#### 2.3.4 Analysis on information gain

We defined the information gain of combining several samples as
$IG = \log_{10} LR_{\mathrm{comb}} - \log_{10} LR_{\max}$

where $LR_{\max}$ is the largest individual likelihood ratio observed in the samples analysed, and $LR_{\mathrm{comb}}$ is the likelihood ratio obtained from the combination of the corresponding profiles into the same calculation. Since the identity of contributors is known, we know whether H1 or H2 is true. In this analysis we calculated the LR only for the situations where H1 was true. The information gain is positive if IG> 0 and negative if IG< 0.
Scripts for performing the calculations and analysis can be obtained by contacting the corresponding author.
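For orientation, the information gain can be computed as in the short sketch below. This is not one of the authors' scripts; the function names are hypothetical, and the numbers in the example are the two-sample case reported in Section 3.1.2.

```python
import math


def log10_lr(loglik_h1, loglik_h2):
    """log10 LR from maximized natural log-likelihoods under H1 and H2."""
    return (loglik_h1 - loglik_h2) / math.log(10)


def information_gain(log10_lr_individual, log10_lr_combined):
    """IG = log10 LR of the combined analysis minus the largest individual log10 LR."""
    return log10_lr_combined - max(log10_lr_individual)


# Individual log10 LRs of 7.38 and 4.71, combined log10 LR of 6.09 (Section 3.1.2).
ig = information_gain([7.38, 4.71], 6.09)
print(round(ig, 2))  # -1.29
```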

#### 2.3.5 A practical combination example from the PROVEDIt dataset

As a separate analysis we wanted to investigate how the combination of several samples would compare to the current implementation of EuroForMix. To this end, we studied the following samples from the contribution group “2‐3‐49_50_29” (3-person mixtures in batch #2) where Ref29 was considered as the POI:
• 1: F6C-49_50_29-F03-M4a-0.09F6C-Q0.5_06
• 2: GF-49_50_29-B08-M4a-0.09GF-Q0.5_02
• 3: GF-49_50_29-G06-M3U60-0.09GF-Q6.7_07
• 4: IDPlus28-49_50_29-A09-M3e-0.186IP-Q1.6_001
Since samples 2 and 3 were analysed with the same multiplex kit, we did a comparison where these were considered as replicates in EuroForMix. To this end, we ran a comparison with EFMrep where the parameters were either shared (as for EuroForMix) or unshared (only possible for EFMrep). For this example we also ran non-contributor tests based on all non-contributing references from the PROVEDIt dataset. This comprised a total of 100 references, which were considered in turn as the POI. The following hypotheses were set: H1: “POI, Ref49 and Ref50 are contributors to the samples” vs H2: “An unrelated unknown, Ref49 and Ref50 are contributors to the samples”.

### 2.4 A real case

#### 2.4.1 Case circumstance

In 2019, a juvenile was brutally raped, but because she did not immediately report the assault to police, vaginal samples were not taken by the pathologist. As a consequence of this sexual assault, the victim became pregnant and she finally reported the crime. Following abortion, a sample from the uterus of the victim was collected by the pathologist. The abortion was carried out before visible embryonic development had occurred. A jar containing bloody liquid with fragments of tissue of different characteristics and colors, and a reference sample from the victim were sent to a laboratory for DNA analysis.
Six samples from different tissue fragments in the bloody liquid were collected and labeled A to F. All were typed using the Globalfiler amplification kit, and additionally, sample C was amplified (sample C-Fus) with the Fusion kit.
The purpose of the test was to determine if the evidence supported the presence of an early stage embryo where alleles would be shared by the victim (mother) and the perpetrator (father). The expectation was that either a single profile of the embryo or a DNA mixture from the victim and the unborn would be detected. With this information, a familial search for the father, who would be the alleged rapist, can be performed on a national DNA database.

#### 2.4.2 Dataset

The DNA profiles from the seven amplifications (alleles and peak heights) and the victim (only alleles) are available in the supplementary; allelic identities were changed in order to maintain the anonymity of the individuals for legal reasons, but the essential features of allele sharing and peak heights are retained in order to reflect the original data structure as accurately as possible. Consequently we report results for both the modified and the original data. A file of allele frequencies used to calculate the LRs is also provided.
Two-person DNA mixtures were obtained in all samples except for sample E, where a single source profile attributed to the victim was obtained. Samples A, C and C-Fus showed a major male and a minor female component attributed to the victim. For samples B, D and F, the mixture proportions were the opposite: the major component was from the female and the minor from the male.

#### 2.4.3 Analysis

First, it is interesting to evaluate the data to support the proposition that the male component is the child of the victim. Accordingly, the LR was calculated based on the following alternatives:
• $H_1$: The victim and one unknown (who is a son of the victim) are contributors to the samples
• $H_2$: The victim and one unknown (unrelated to the victim) are contributors to the samples
In the H2 proposition it is assumed that the unknown is unrelated to the victim, for instance the male component could originate from a contamination.
Secondly, the most likely genotypes of the male component were assigned by deconvolution. This was carried out with respect to proposition H1, under the assumption that the male is the son of the conditioned victim. The uncertainty of the assignments was based on marginal genotype probabilities. Deconvolution results were compared to those calculated from the H2 proposition, under the assumption that the unknown male was unrelated to the victim.
Results obtained from all seven samples were evaluated together with EFMrep using the following settings: AT= 36 rfu for Globalfiler and AT= 54 rfu for Fusion. The default drop-in model was applied (lambda=0.01 and drop-in prob=0.05) with Fst= 0. Different model parameters were assumed for each sample (configured with EFMrep). No stutter model was applied for the final model since stutters had been removed by filtering. The degradation model was omitted for Sample C-Fus since a preliminary analysis estimated a degradation slope slightly greater than one. However, combining different kits is still possible with the degradation model turned on.

## 3. Results

### 3.1 PROVEDIt

#### 3.1.1 Deciding thresholds and drop-in models

Single source profiles were used to calculate analytical threshold and the drop-in model for each kit (Table 1).
Table 1. The calculated analytical threshold (AT), drop-in probability (pC) and drop-in peak height parameter (lambda) values used for the PROVEDIt analysis.

| Batch | Kit | AT | lambda | pC |
|---|---|---|---|---|
| 1 | Identifiler (29c) | 47 | 0.0258 | 0.032 |
| 1 | PowerPlex16 | 16 | 0.0962 | 0.063 |
| 2 | Identifiler (28c) | 11 | 0.407 | 0.066 |
| 2 | Fusion 6C | 45 | 0.034 | 0.058 |
| 2 | GlobalFiler | 29 | 0.0355 | 0.046 |

#### 3.1.2 Information gain

Detailed results of the 1931 combination calculations are provided in Table S2 (supplementary). The information gain increased with the number of combined samples (Fig. 1): there were approximately 2–4 orders of magnitude increase when two samples were combined, 3–6 orders of magnitude when three samples were combined and 4–7 when four samples were combined. Not much difference was detected in the information gain between the 3p and 4p mixtures. While the 2p mixtures did not reveal the same gain, only a few data points were available for evaluation. It is uncertain what effect adding more samples would have on the information gain for the 2p mixtures. Only one example was observed where the information gain was lower than ‐1 (i.e., combining samples performed considerably worse than each sample evaluated separately). IDPlus29‐4_5_1_2‐D03‐0.04717IP_04 (log10LR= 7.38) combined with IDPlus29‐4_5_1_2‐H05‐0.125IP_08 (log10LR= 4.71) gave a combined LR of log10LR= 6.09 for POI=Ref4, hence an information gain of IG=-1.29 was obtained. The reason for this lowered LR was that both samples had an allele dropout at marker D8S1179 which was an unlikely occurrence, and the genotype of POI at marker D3S1358 was also unlikely under H2 (only 0.6%).

#### 3.1.3 A practical example from the PROVEDIt dataset

We compared LRs obtained from the EFMrep GUI with the output using the EuroForMix GUI for four selected samples in the “2‐3‐49_50_29” contribution group (3-person mixtures in batch #2) where Ref29 is the POI. See details in supplementary. The results were concordant (approximately the same) when the same model was applied. Interestingly, the combination of the two samples from the same kit (both GlobalFiler) increased the LR using EuroForMix: the information gain was 1.85, even though model parameters differed between the two samples. Surprisingly, the model validation passed with no failures (significance level 1%). However, the peak height variation (PHvar) model parameter was estimated to be quite high: 0.56, an increase from 0.35 and 0.44 when the samples were evaluated separately. This increase in the estimated PHvar parameter indicated that the two samples were not suitable to be combined. By using EFMrep instead, the information gain was calculated as 3.15, and this time the estimated PHvar parameters were close to those observed when samples were evaluated separately. The supplementary also illustrates the output from EFMrep if either three or four of the samples are included in the same analysis.
A non-contributor analysis was conducted using EFMrep, where a combination of different samples was included in the analysis. Fig. 2 shows that combining multiple samples in the same model reduces the LR for non-contributors, in contrast to an increased LR for the true contributors.

### 3.2 A real case

See Section 2.4 for the case circumstances. Two sets of calculations are provided: a) original and b) modified. Results based on the latter are provided in the supplementary in lieu of the original data, whose disclosure is prevented for privacy reasons.

#### 3.2.1 LR calculations

LR values and corresponding mixture proportions for the male component of each sample were calculated under propositions H1: “the victim and the son are contributors to the samples” and H2: “the victim and an unknown unrelated are contributors to the samples” (Table 2). In sample E, only a single source profile attributed to the conditioned victim was detected, therefore LR= 1. As expected, highest LR values were obtained from samples where the male component was the major contributor (sample A, and both C samples); lowest LR values were obtained from samples where the victim was the major contributor.
Table 2. The table shows both the estimated mixture proportion (Mx) for the male component (in %, for both propositions) and the LR results calculated for each sample and combined (modified data). The last column shows the LR for the original data (not modified). Proposition H1 conditions that the unknown is a child of the victim, whereas H2 conditions that the unknown is unrelated to the victim.

| Sample | Mx H1/H2 Separate | Mx H1/H2 Combined | Log10LR (modified) | Log10LR (original) |
|---|---|---|---|---|
| Sample A | 69.9/68.4 | 70.1/70.1 | 9.33 | 7.83 |
| Sample B | 10.5/3.03 | 9.16/9.16 | 1.50 | 1.91 |
| Sample C | 73.0/67.0 | 73.1/73.1 | 7.52 | 5.80 |
| Sample C-Fus | 68.8/62.5 | 68.8/68.9 | 7.20 | 5.56 |
| Sample D | 4.23/1.81 | 5.30/5.30 | 0.39 | 0.39 |
| Sample E | 0/0 | 0/0 | 0 | 0 |
| Sample F | 15.5/6.10 | 18.0/18.0 | 1.05 | 0.95 |
| Combined | | | 10.14 | 8.46 |

Further analyses with EFMrep were carried out to test the influence of combining samples analysed with the same or different multiplexes (note that the modified analysis is detailed in the subsequent text). For example, sample A and sample C-Fus together (different multiplexes), resulted in a log10LR value of 10.05, whereas sample A and sample C together (same multiplex), resulted in a log10LR value of 9.46. When all three samples were combined together a log10LR of 10.14 was calculated. Finally, when all samples (A-F) were combined, the result was the same: log10LR= 10.14. This occurs because there is scant information provided by samples B, D, E and F. The LR values for the original data are provided in Table 2.
Estimated model parameters for the full model are included in Fig. S2 (supplementary). The goodness of fit validation PP plot (Fig. S3 in supplementary) showed that the model fitted well. From Table 2, the estimated mixture proportion parameters for the separate samples deviate between the H1 and H2 propositions. However, when all samples were combined, estimates under H1 and H2 were almost the same.

#### 3.2.2 Deconvolution

Deconvolution under H1, which assumes that the male component was the child of the victim, obtained probability of 1 for the most likely assigned genotype for all markers, except for Penta D and Penta E, since these were analyzed from only one of the samples; C-Fus (Table S4 in supplementary). Deconvolution under H2 gave the same probability results as for H1, except for the Penta markers (but the same top ranked genotypes were obtained). Probabilities of deconvolved markers were compared: either based on sample C-Fus alone or with all samples combined. The 11/19 genotype at Penta E gave a probability of 0.950 for H1 and 0.973 for H2 when the sample was evaluated alone (Table S5 in supplementary). The probabilities changed to 0.952 and 0.992 respectively when all samples were combined. For Penta D, the 9/9 genotype gave a probability of 0.951 for H1 and 0.893 for H2, whereas these values increased to 0.954 and 0.957 respectively when all samples were combined.

## 4. Discussion

This paper presents EFMrep as a new probabilistic genotyping software which extends the model of EuroForMix to improve analysis when multiple samples, either using the same or different multiplex, are combined.
In the first part of the paper samples from the PROVEDIt dataset were used to perform an exploratory study where 2–4 samples were combined to calculate an LR. Calculations were restricted to examples where the POI had less than 20% mixture proportion (minor contributor) and returned a moderate LR in a range of log10LR= (3,9). The purpose of this was to focus on challenging comparisons, instead of considering situations where the LR was already large; for instance, an increase of LR from log10LR= 3 to log10LR= 8 would be more useful than an increase from log10LR= 10 to log10LR= 15, since higher values above log10LR= 9 have little court reporting impact (due to saturation of information).
It was demonstrated that almost all combined calculations returned an LR greater than that derived from each of the individual samples. Importantly, this shows that combining multiple samples is expected to return a positive information gain: a change of 3–7 in log10LR would be expected using the multiplexes used in the examples analysed in Section 3.1.
Propositions were restricted to include all known references to be contributors (except that the POI was replaced with an unknown under the alternative proposition). The reason for this was a) to increase the speed of the computations and b) if the propositions contained unknown contributors instead of known(s) (conditional), we would expect a larger information gain than that observed. The latter will be investigated in a future paper.
For this exploratory study the choice of stutter types to be modeled in the analysis was based on the ground truth (i.e., the POI is known to be a contributor). Under real case circumstances, however, this is not possible, and the recommended approach is to make the model choice under the assumption that the POI is not a contributor to the evidence (see [Bleka Ø., Benschop C., Storvik G., Gill P. A comparative study of qualitative and quantitative models used to interpret complex STR DNA profiles.] for an example). We recommend that laboratories perform their own internal validation of EFMrep accordingly.
A subset of the PROVEDIt samples was used as a practical example to compare EFMrep with EuroForMix. Approximately the same values were obtained from the two programs when the same model was used for each. However, EFMrep is preferable when the DNA profiles exhibit different behavior (i.e., different model parameters are estimated for each profile). This manifested as an increased LR without a concurrent increase in the peak height variation. The non-contributor analysis demonstrated that increased information gain for true contributors corresponded to negative information gain for non-contributors.
In the second part of the paper (Section 3.2), a real case involving kinship mixture samples (a mother and her child) was presented. Samples providing scant information (samples B, D, E and F) did not contribute to the combined result – i.e., the best three samples together returned a log10LR= 10.14 which was the same as that achieved with all seven samples combined. The purpose of this case was to assign a genotype to the child so that a familial search for the father could be undertaken on a national DNA database. When deconvolution was carried out, it was observed that the benefit of combining multiple samples was greater under the “unknown unrelated” assumption than under the “unknown related” assumption.
In order to calculate the power of a database search with the deconvolved profile of the child, conditional simulations were run with the Familias software (Kling et al., 2017), making use of the R-package paramlink (Egeland et al., 2014). The original data were used. The following propositions were considered: H1: “the individual in the database is the father of the child” versus H2: “the individual in the database is unrelated to the child”. Both propositions conditioned on the victim being the mother of the child. When random father genotype profiles were simulated conditioning on the known genotypes in the family (child and mother), the median LR value [5% and 95% percentiles] was 1.452e09 [7.286e06, 2.771e11], which indicates that there is sufficient power to solve this case if the father is actually present in the database.
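To compare the simulated LR values with results reported on the log10 scale, the figures above can be converted directly. The snippet below is a minimal sketch and is not part of Familias or paramlink; the helper names `to_log10` and `summarise` are hypothetical, and `summarise` only illustrates how a median and 5%/95% percentiles could be extracted from a vector of simulated LRs.

```python
import math
import statistics

# LR values reported in the text (median with 5% and 95% percentiles).
reported = {"5%": 7.286e06, "median": 1.452e09, "95%": 2.771e11}


def to_log10(lr):
    """Convert a likelihood ratio to the log10 scale."""
    return math.log10(lr)


def summarise(lrs):
    """Return (5% percentile, median, 95% percentile) of simulated LRs.

    statistics.quantiles with n=20 returns 19 cut points in steps of 5%,
    so the first and last cut points are the 5% and 95% percentiles.
    """
    cuts = statistics.quantiles(lrs, n=20)
    return cuts[0], statistics.median(lrs), cuts[-1]


log10_reported = {name: to_log10(lr) for name, lr in reported.items()}
for name, value in log10_reported.items():
    print(f"{name}: log10LR = {value:.2f}")
```

On this scale the median corresponds to a log10LR of about 9.16, with the 5% and 95% percentiles at about 6.86 and 11.44.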
EFMrep contains a kinship module where unknown contributors specified in any of the hypotheses can be assigned as related to any typed individual. It is possible to specify several related unknowns, but interrelatedness between these individuals cannot be specified. For example, if it is assumed that two unknown contributors are siblings and the profile of the typed mother is available, the implementation will utilize the pairwise information between each sibling and the mother, but not between the siblings. A further extension, to be implemented in a future update, would be to utilize relationships specified as a pedigree. For example, suppose that there are three typed siblings and the proposition is that an unknown contributor is the alleged father of all siblings; the current implementation considers one sibling at a time.
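The pairwise use of a typed relative can be illustrated with a single-locus genotype prior. The sketch below is not EFMrep's implementation; it assumes Hardy-Weinberg proportions, no mutation, no sub-population correction, and hypothetical allele frequencies, and the function names are invented for illustration. It compares the genotype prior for an unknown who is the child of a typed mother with the prior for an unrelated unknown.

```python
from itertools import combinations_with_replacement

# Hypothetical allele frequencies at one locus (illustration only).
freq = {"A": 0.2, "B": 0.3, "C": 0.5}


def unrelated_prior(genotype, freq):
    """Hardy-Weinberg genotype probability for an unrelated unknown."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def child_given_parent(child, parent, freq):
    """P(child genotype | typed parent genotype), other parent unrelated.

    Each parental allele is transmitted with probability 1/2; the second
    allele is drawn from the population allele frequencies.
    """
    a, b = child
    prob = 0.0
    for transmitted in parent:
        if transmitted == a:
            prob += 0.5 * freq[b]
        elif transmitted == b:
            prob += 0.5 * freq[a]
    return prob


mother = ("A", "B")  # hypothetical typed mother
genotypes = list(combinations_with_replacement(sorted(freq), 2))
related = {g: child_given_parent(g, mother, freq) for g in genotypes}
unrelated = {g: unrelated_prior(g, freq) for g in genotypes}

for g in genotypes:
    print(g, f"related={related[g]:.3f}", f"unrelated={unrelated[g]:.3f}")
```

With the mother typed as A,B in this toy example, the child genotype C,C has prior probability zero, whereas it has probability 0.25 for an unrelated unknown. This narrowing of the genotype space is the kind of information a pairwise relationship to a typed individual contributes.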

## 5. Conclusion

EFMrep provides an extended model of EuroForMix by allowing different model parameters to be assigned to each DNA profile involved in the analysis, rather than applying global model parameters to all. The analyses conducted in this paper showed that there is an advantage in using EFMrep over EuroForMix when multiple samples, including those from different multiplexes, are combined. When DNA profiles originate from the same extract, but from different amplifications of the same multiplex run with the same analysis settings, the results from the two tools should coincide. Conversely, if DNA profiles are generated from different extracts or different multiplexes, it is preferable to use EFMrep. We demonstrated the advantage of EFMrep with a real case involving kinship. Here it was observed that evaluations based on limited conditioned information, i.e., propositions that involve unknown unrelated contributors, benefit especially from the combination of samples. The benefit was smaller when more conditioned information was available, for instance when a contributor is specified with a certain kinship or is a conditioned known individual (no longer unknown).

## Declaration of Competing Interest

We confirm that there is no conflict of interest.

## Appendix A. Supplementary material

• Supplementary material

## References

• Steele C.D.
• Greenhalgh M.
• Balding D.J.
Verifying likelihoods for low template DNA profiles using multiple replicates.
Forensic Sci. Int.: Genet. 2014; 13: 82-89. https://doi.org/10.1016/j.fsigen.2014.06.018
• Bleka Ø.
• Storvik G.
• Gill P.
EuroForMix: An open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts.
Forensic Sci. Int.: Genet. 2016; 21: 35-44. https://doi.org/10.1016/j.fsigen.2015.11.008
• Cowell R.G.
• Graversen T.
• Lauritzen S.L.
• Mortera J.
Analysis of forensic DNA mixtures with artefacts.
Appl. Stat. 2015; 64: 1-32
• Green P.J.
• Mortera J.
Inference about complex relationships using peak height data from DNA mixtures.
J. R. Stat. Soc.: Ser. C. 2021; 70: 1049-1082. https://doi.org/10.1111/rssc.12498
• Schmidt M.
• Schiller R.
• Anslinger K.
• Wiegand P.
• Weirich V.
Statistefix 4.0: A novel probabilistic software tool.
Forensic Sci. Int.: Genet. 2021; 55: 102570. https://doi.org/10.1016/j.fsigen.2021.102570
• Taylor D.
• Bright J.-A.
• Kelly H.
• Lin M.-H.
• Buckleton J.
A fully continuous system of DNA profile evidence evaluation that can utilise STR profile data produced under different conditions within a single analysis.
Forensic Sci. Int.: Genet. 2017; 31: 149-154. https://doi.org/10.1016/j.fsigen.2017.09.002
• Alfonse L.E.
• Garrett A.D.
• Lun D.S.
• Duffy K.R.
• Grgicak C.M.
A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt.
Forensic Sci. Int.: Genet. 2018; 32: 62-70. https://doi.org/10.1016/j.fsigen.2017.10.006
• Bregu J.
• Conklin D.
• Terrill M.
• Cotton R.W.
• Grgicak C.M.
Analytical thresholds and sensitivity: establishing RFU thresholds for forensic DNA analysis.
J. Forensic Sci. 2013; 58: 120-129. https://doi.org/10.1111/1556-4029.12008
• Hill C.R.
• Duewer D.L.
• Kline M.C.
• Coble M.D.
• Butler J.M.
U.S. population data for 29 autosomal STR loci.
Forensic Sci. Int.: Genet. 2013; 7: e82-e83. https://doi.org/10.1016/j.fsigen.2012.12.004
• Akaike H.
A new look at the statistical model identification.
IEEE Trans. Autom. Control. 1974; 19: 716-723. https://doi.org/10.1109/TAC.1974.1100705
• Bleka Ø.
• Benschop C.
• Storvik G.
• Gill P.
A comparative study of qualitative and quantitative models used to interpret complex STR DNA profiles.
Forensic Sci. Int.: Genet. 2016; 25: 85-96. https://doi.org/10.1016/j.fsigen.2016.07.016
• Kling D.
• Egeland T.
• Piñero M.H.
• Vigeland M.D.
Evaluating the statistical power of DNA-based identification, exemplified by “The missing grandchildren of Argentina”.
Forensic Sci. Int.: Genet. 2017; 31: 57-66. https://doi.org/10.1016/j.fsigen.2017.08.006
• Egeland T.
• Pinto N.
• Vigeland M.D.
A general approach to power calculation for relationship testing.
Forensic Sci. Int.: Genet. 2014; 9: 186-190. https://doi.org/10.1016/j.fsigen.2013.05.001