High-throughput analysis of buccal scrape reference samples using the Identifiler system on the Applied Biosystems 3500xl Genetic Analyser is described. This platform is much more sensitive than previous platforms, e.g. the 3130xl. Its range of signal detection is also much greater, which makes the system more tolerant of a wide range of input template concentrations. DNA quantity is not a limiting factor in the analysis of buccal scrapes, so the entire analytical procedure ('process') was designed around a target input of 1.5 ng, to minimise low-level DNA profiles and associated allele dropout events. A universal stochastic threshold/limit of detection (LOD) of 300 relative fluorescence units (rfu) was applied across all loci. This level is twice that of comparable systems such as SGM Plus analysed on 3130xl instruments. After analysis of dropout probabilities, heterozygous balance, drop-in and stutter characteristics, rule-sets were programmed into the FSS i-cubed software. The process has a success rate >95%. No discordances or mis-designations occurred when it was applied to a validation set of more than 1000 samples.
The United Arab Emirates (UAE) National DNA Database Centre, in conjunction with the Forensic Science Service (FSS), has recently implemented a high-throughput DNA profiling system that combines the Identifiler multiplex with the Applied Biosystems 3500xl Genetic Analyser system.
The 3500xl is a relatively new platform that behaves differently from its predecessor, the 3130xl. In particular, the sensitivity of the instrument is greatly enhanced, and its range of detection is also much greater [
]. This increased range offers an advantage that can be exploited: the entire analytical procedure or 'process', which begins when the swab is extracted and finishes when the electropherogram (epg) is generated, is much more tolerant of wide variation in input DNA concentration. The validation reported in this paper was designed to characterise the performance of the system, to ensure that risks of mistyping were minimised whilst maximising the success rate of the first analysis (poor-quality profiles were reprocessed, but the aim is to keep reprocessing to a bare minimum without compromising quality).
In order to characterise the DNA profiles generated by a process, it is necessary to evaluate heterozygote balance, stutter size, drop-in rates and associated peak heights [
(a)
The limit of detection (LOD), typically set to ca. 50 rfu in classical systems. Its purpose is to define a level at which alleles can be clearly separated from background signal noise (this is a user-defined analytical threshold that software uses to accept or reject allele peaks).
(b)
The stochastic threshold (T), which many laboratories set to 150 rfu. Its purpose is to limit the risk that heterozygote allele dropout results in a locus being misidentified as a homozygote. Gill et al. [
] demonstrated how this level may be determined probabilistically using logistic regression.
Once the parameters listed above have been evaluated, the next step is to program 'tolerances' into expert system software (FSS-i3®) to automate the allele designation process and to highlight to operators potential issues that may require re-analysis. The process was tested against a total of 1071 samples. The 'first analysis' success rate was >95% and the mis-designation rate was 0% (none recorded).
] of 140 rfu in relation to input levels of 0.1–1.0 ng DNA. Furthermore, it was suggested that a single user-defined analytical threshold applied across all dye-sets resulted in a significant loss of allele designations.
] did not distinguish between the requirements of crime-stain analysis and those of reference sample analysis. The criteria for reference samples and for crime-stain samples need to be evaluated separately, since they are effectively two different processes and the conditions of testing are quite different. Crime-stains are of 'unknown origin', are often compromised (e.g. mixtures or degraded samples), and the quantity of DNA is limited. With reference sample buccal scrapes, several nanograms are usually available for analysis. In this study we evaluate reference samples only; the guidelines for reference samples cannot be applied 'universally' to crime-stain analysis.
In conjunction with a target template input of 1.5 ng, we demonstrate that a universal (LOD/stochastic) threshold set to 300 rfu is 'fit for purpose' across all dye-sets. This threshold is twice that of the comparable 3130xl instrumentation. When this threshold is programmed into the FSS-i3® software as a 'tolerance', the 'first analysis' success rate is at least 95%. Reporting guidelines and thresholds for crime stains will be evaluated in a separate study.
2. Materials and methods
2.1 Sample preparation and analysis
2.1.1 SGM Plus
Single donor profiles were obtained from 14 Forensic Science Service (FSS) staff volunteers. DNA extraction was carried out using the Qiagen EZ1 Investigator (Qiagen, UK), and DNA was quantified using the PicoGreen® DNA assay (Invitrogen, Paisley, UK) following the manufacturers' instructions.
Extracts were diluted to provide a DNA target quantity of 1.5 ng (total PCR input). Samples were amplified in duplicate using the AmpFlSTR® SGM Plus multiplex (Applied Biosystems, UK) at 28 cycles on a Tetrad™ thermocycler (Genetic Technologies, Inc., Miami, FL), following the manufacturers' recommendations. Detection was performed in duplicate by capillary gel electrophoresis on a 3100xl Genetic Analyser, with injection parameters of 1.5 kV for 10 s. Results were analysed using FSS DNA Insight v1.2 to obtain allele designations, peak heights and peak areas for a total of 84 DNA profiles, comprising 924 loci for study.
2.1.2 Identifiler data set one
Three sets of buccal samples were collected from a total of 86 donors. DNA was extracted using the Qiagen Biorobot Universal and quantified using the PicoGreen® DNA assay and Tecan Infinite instrumentation. Samples were prepared for amplification using a Tecan EVO robot that was set to aliquot 1.5 ng (total input) template DNA. Samples were amplified multiple times to give a total of 246 profiles.
2.1.3 Identifiler dilution data set two
Buccal samples were obtained from eight donors; DNA was extracted using the Qiagen Biorobot Universal (Qiagen, UK) and quantified using a Tecan Infinite instrument (Tecan Group Ltd., Switzerland). Samples were manually set up using a dilution range with the following DNA template concentrations: 0.05, 0.25, 0.5, 0.1, 1.0, 1.5, 2.0, and 2.5 ng/μl for each donor in triplicate, producing 192 profiles.
Amplifications were performed for Identifiler data sets one and two using 28 cycles on a Dyad thermocycler (Genetic Technologies, Inc., USA) and the AmpFlSTR® Identifiler® Kit (Applied Biosystems, UK), with a reaction volume of 25 μl. Capillary gel electrophoresis plates were prepared with 1.5 μl of donor sample and 13.5 μl of a mix of GS 600 Liz® v2 size standard and HiDi formamide (1:40 ratio of size standard to HiDi formamide; both Applied Biosystems, UK). Capillary gel electrophoresis was performed on a 3500xl Genetic Analyser (Applied Biosystems, UK) with an injection of 1.2 kV for 24 s. Output data were analysed using GeneMapper® ID-X v1.2 (Applied Biosystems, UK) with the AmpFlSTR® Identifiler® panel settings and a lower limit of detection (LOD) analysis threshold of 50 rfu.
Following GeneMapper analysis, Identifiler data set one gave 246 full profiles (3936 loci) for study, and Identifiler data set two produced 192 full profiles; however, seven loci were removed because of excessive (outlier) split (A + 1) peaks, leaving a total of 3065 loci for study.
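The locus totals above can be cross-checked as number of profiles multiplied by loci per profile. SGM Plus types 10 STR loci and Identifiler 15, each plus the amelogenin sex marker; that amelogenin is counted as a locus in these totals is our assumption, which the arithmetic below supports. The 192 profiles of data set two correspond to 8 donors × 8 template amounts × triplicate.

```python
# Loci per profile: STR loci plus amelogenin (assumed to be counted as a locus).
LOCI_PER_PROFILE = {"SGM Plus": 10 + 1, "Identifiler": 15 + 1}


def total_loci(n_profiles: int, kit: str, removed: int = 0) -> int:
    """Loci available for study: profiles x loci per profile, minus exclusions."""
    return n_profiles * LOCI_PER_PROFILE[kit] - removed


sgm_plus = total_loci(84, "SGM Plus")                       # SGM Plus / 3100xl
set_one = total_loci(246, "Identifiler")                    # Identifiler data set one
set_two = total_loci(8 * 8 * 3, "Identifiler", removed=7)   # 8 donors x 8 amounts x 3
print(sgm_plus, set_one, set_two)  # 924 3936 3065
```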
3. Results and discussion
3.1 Comparison of peak heights obtained from 3500xl and 3100xl sequencers
For each system, allele peak heights were separated into heterozygotes and homozygotes. Gamma distributions were fitted using a MATLAB program (Fig. 1), and basic descriptive statistics were generated using Minitab (Electronic supplement: Tables 1 and 2).
Fig. 1. Histogram and superimposed Gamma distribution (solid line) of observed heterozygous allelic peak heights (a) and homozygous peak heights (b) using Identifiler 3500xl, together with the fitted Gamma distributions (dashed line) of SGM Plus 3100xl allelic peak heights.
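Fitting a gamma distribution to peak heights, as was done here in MATLAB, can be sketched in Python with SciPy's maximum-likelihood `stats.gamma.fit`, fixing the location at zero because peak heights are positive. The simulated data and their parameters are hypothetical, chosen only so that the fitted mode lands near the ca. 2500 rfu heterozygote mode reported below; for shape k ≥ 1 and scale θ, the mode of a gamma distribution is (k − 1)θ.

```python
import numpy as np
from scipy import stats


def fit_gamma(heights):
    """Maximum-likelihood gamma fit with location fixed at 0.

    Returns (shape, scale, mode); the mode (shape - 1) * scale applies for shape >= 1.
    """
    shape, _loc, scale = stats.gamma.fit(heights, floc=0)
    mode = (shape - 1.0) * scale if shape >= 1.0 else 0.0
    return shape, scale, mode


# Hypothetical heterozygous peak heights (rfu); real data would come from the epgs.
rng = np.random.default_rng(seed=1)
simulated = rng.gamma(shape=6.0, scale=500.0, size=3000)  # true mode = 2500 rfu

shape, scale, mode = fit_gamma(simulated)
print(f"shape={shape:.2f} scale={scale:.0f} mode={mode:.0f} rfu")
```

Fixing `floc=0` keeps the fit to the usual two-parameter gamma; leaving the location free would add a third parameter that has no physical meaning for peak heights.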
A comparison of Identifiler (data set one) and SGM Plus peak height distributions (Fig. 1) showed marked differences that can be primarily attributed to the different chemistries and sequencing platforms used. The SGM Plus/3100xl system gave lower peak heights with a small amount of variation. In contrast, the Identifiler/3500xl system gave much greater peak heights with a correspondingly greater amount of variation. All samples analysed gave full DNA profiles in both series of experiments.
3.1.1 Heterozygous allele peak heights
For heterozygotes (Fig. 1a), the SGM Plus 3100xl probability density mode was at ca. 1000 rfu, whereas for the Identifiler 3500xl system it was at ca. 2500 rfu. Significant inter-locus variation (with no association with dye colour) was also observed (Electronic supplement: Table 1): the minimum peak height was 326 rfu at D16S539 and the maximum 12,913 rfu at D8S1179; the largest standard deviation was 1509 rfu at D19S433 and the smallest 691 rfu at FGA. Mean peak heights ranged between 2095 and 6265 rfu, and median values between 2017 and 5982 rfu.
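The per-locus descriptive statistics (minimum, maximum, mean, median and standard deviation) were produced in Minitab; an equivalent summary can be sketched with Python's standard `statistics` module. The records are hypothetical and `per_locus_summary` is an illustrative helper name, not part of any tool used in the study.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def per_locus_summary(records):
    """records: iterable of (locus, peak_height_rfu) pairs; returns stats per locus."""
    by_locus = defaultdict(list)
    for locus, height in records:
        by_locus[locus].append(height)
    return {
        locus: {
            "n": len(h),
            "min": min(h),
            "max": max(h),
            "mean": mean(h),
            "median": median(h),
            "sd": stdev(h) if len(h) > 1 else 0.0,  # sample standard deviation
        }
        for locus, h in by_locus.items()
    }


# Hypothetical peak heights (rfu), for illustration only.
records = [("D16S539", 326), ("D16S539", 2100), ("D16S539", 2500),
           ("D8S1179", 6000), ("D8S1179", 12913),
           ("FGA", 2000), ("FGA", 2400)]
summary = per_locus_summary(records)
most_variable = max(summary, key=lambda locus: summary[locus]["sd"])
print(most_variable, round(summary[most_variable]["sd"]))
```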
3.1.2 Homozygous allele peak heights
A histogram with superimposed fitted gamma probability distribution was plotted for the homozygous allelic peak heights from Identifiler 3500xl, together with homozygous data from SGM Plus profiles (Fig. 1b).
The visual comparison of peak height distributions showed that homozygous peak heights were larger than those obtained from heterozygous loci. SGM Plus 3100xl probability density mode was ca. 2500 rfu, whereas Identifiler 3500xl probability density mode was ca. 6000 rfu.
For the Identifiler summary statistics, homozygous peak heights varied by locus (Electronic supplement: Table 2) in a similar manner to heterozygous peaks. The lowest minimum peak height was 1168 rfu at CSF1PO and the largest maximum 16,855 rfu at Amelogenin; the largest standard deviation was 4226 rfu at D8S1179 and the smallest 1091 rfu at FGA (FGA also gave the smallest standard deviation for heterozygous alleles). Mean peak heights ranged between 3538 and 12,456 rfu, and median values between 3521 and 11,957 rfu.
3.1.3 Comparison of heterozygous and homozygous allele peak heights
As the homozygote peak is the product of two alleles that are identical by state (ibs), a relative increase in the size of the observed DNA signal is expected. To quantify this relationship, gamma distributions were fitted to homozygote and heterozygote peak heights, together with the summed peak heights for each heterozygous allelic pair (Fig. 2). The distributions of homozygous and summed heterozygous peak heights were very similar, demonstrating that allelic contributions are additive and linear. There is therefore a simple and direct relationship between the expected height of a homozygote peak and the size of a heterozygote allelic peak at a locus. These data support the approach of Tvedebrink [
] to generate logistic models that compared dropout between heterozygotes and homozygotes – the latter were simulated by applying a covariate of 2H where H is the peak height of a heterozygote allele.
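The 2H covariate can be illustrated with a logistic dropout model on log peak height. This is a minimal sketch, not Tvedebrink's published implementation: the coefficients `B0` and `B1` are hypothetical, and the natural logarithm is an assumption. It shows why a homozygote of per-allele height H behaves like a heterozygote allele of height 2H.

```python
import math


def dropout_probability(height, b0, b1, homozygote=False):
    """Logistic dropout model: logit Pr(D) = b0 + b1 * ln(H').

    For a heterozygote allele H' = H; for a homozygote, whose peak sums two
    alleles, the covariate is taken as 2H. Coefficients are supplied by the caller.
    """
    covariate = 2.0 * height if homozygote else height
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(covariate))))


# Hypothetical coefficients; b1 < 0 so dropout becomes less likely as H grows.
B0, B1 = 20.0, -4.0
for h in (100.0, 150.0, 300.0):
    het = dropout_probability(h, B0, B1)
    hom = dropout_probability(h, B0, B1, homozygote=True)
    print(f"H={h:.0f} rfu  Pr(D) het={het:.3f}  hom={hom:.3f}")
```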
Fig. 2. Fitted Gamma probability density curves for Identifiler data showing the distribution of peak heights from homozygous and heterozygous alleles, together with a curve of the summed peak heights of heterozygous allele pairs.
There are two primary risks associated with the process of allele designation:
(a)
A heterozygote is inaccurately recorded as a homozygote
Reason: Allele drop-out has occurred; or heterozygote imbalance has resulted in an allele that may be interpreted as a stutter, thus leaving a single peak to be designated as allelic.
(b)
A homozygote is inaccurately recorded as a heterozygote.
Reasons:
i.
A large stutter band of a homozygote is within range to be designated allelic (i.e. within the 50% heterozygous balance guideline), resulting in a heterozygote designation.
ii.
A drop-in event occurs in conjunction with a homozygote, and the locus is designated as a heterozygote.
Risk assessments applied to reference sample profiles used in databases cannot be transposed to crime-stain profiles. The provenance of reference samples is known; mixtures can only result from contamination, and these are easily identified by rule-sets that detect more than two alleles at a locus. To filter out the primary risks listed above, it is necessary to understand the characteristics of DNA profiles in terms of heterozygote balance, dropout probabilities, drop-in probabilities and stutter size.
3.3 Investigation of the lower tail of the allele peak height distribution
In practice, risks of mis-designation are directly associated with low-level DNA profiles. Consequently, we are primarily interested in the lower tail of the distribution of allelic peak heights: the chance of allelic dropout (which could lead to a heterozygote being wrongly designated as a homozygote) increases as peak heights decrease.
Fig. 3 shows cumulative distribution curves for homozygote and heterozygote peak height data, based on the fitted gamma distributions of Fig. 1. The left tail of each distribution was investigated further to determine the probabilities of low peak heights; these are recorded in Electronic supplement: Table 3.
Fig. 3. Comparison of fitted Gamma cumulative distribution curves for heterozygote and homozygote peak height data.
Approximately 1.8% of heterozygous peaks were ≤1000 rfu; the same proportion of homozygote peaks were ≤2000 rfu (approximately). There was a low probability (1.3 × 10⁻⁴) that a heterozygote peak would fall below 300 rfu, and a remote chance (8.6 × 10⁻⁶) that a homozygote peak would fall below this level.
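Reading lower-tail probabilities off a fitted gamma curve amounts to evaluating its cumulative distribution function (CDF) at a threshold; the inverse (the percent-point function) gives the height below which a chosen proportion of peaks falls. A sketch with SciPy follows. The shape and scale values are hypothetical, not the fitted values behind Table 3, and modelling homozygotes by doubling the gamma shape at the same scale (the sum of two independent gamma variables with a common scale) is an assumption motivated by the additivity of allelic contributions reported above.

```python
from scipy import stats


def lower_tail_probability(threshold_rfu, shape, scale):
    """Pr(peak height <= threshold) under a fitted gamma distribution (loc = 0)."""
    return stats.gamma.cdf(threshold_rfu, shape, scale=scale)


def height_at_quantile(q, shape, scale):
    """Peak height below which a proportion q of peaks is expected to fall."""
    return stats.gamma.ppf(q, shape, scale=scale)


# Hypothetical heterozygote fit; the homozygote fit assumes a homozygote peak
# behaves like the sum of two independent heterozygote-like alleles, i.e. the
# gamma shape doubles while the scale stays the same.
HET_SHAPE, HET_SCALE = 6.0, 500.0
HOM_SHAPE, HOM_SCALE = 2 * HET_SHAPE, HET_SCALE

for label, k, theta in (("het", HET_SHAPE, HET_SCALE), ("hom", HOM_SHAPE, HOM_SCALE)):
    p300 = lower_tail_probability(300.0, k, theta)
    q018 = height_at_quantile(0.018, k, theta)
    print(f"{label}: Pr(H <= 300 rfu) = {p300:.2e}; 1.8% quantile = {q018:.0f} rfu")
```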
3.4 Heterozygous balance
The heterozygous balance (Hb) is the ratio of the two allele peak heights at a heterozygous locus. Several definitions exist, but for simplicity we divide the smaller peak height by the larger, so that Hb ≤ 1 [
]. If the Hb of a putative pair of alleles is below the chosen guideline value, the profile designation is rejected (i.e. the sample is reanalysed). Heterozygote balance was calculated and plotted against the corresponding heterozygote mean peak height for Identifiler (Electronic supplement: Fig. 1).
With 1.5 ng of input DNA, 99.63% (2999/3010) of heterozygous allele pairs had a heterozygous balance equal to or exceeding 0.60 (60%); this increased to 99.90% (3007/3010) for a heterozygous balance guideline of 0.55 (55%), and further to 99.97% (3009/3010) using the 0.50 (50%) guideline. Only one data point was below the 50% guideline, with a heterozygous balance ratio of 0.335 (33.5%) at locus D16S539. The FSS-i3® software had a tolerance of 0.55, so this profile was rejected and flagged for reanalysis.
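The heterozygous balance calculation and the guideline proportions can be sketched as follows. The allele pairs are hypothetical; only the 0.55 tolerance and the 2999/3010, 3007/3010 and 3009/3010 counts come from the text, and the last line simply confirms how those counts round to the quoted percentages.

```python
HB_TOLERANCE = 0.55  # tolerance used by the expert system in the text


def heterozygote_balance(h1, h2):
    """Smaller peak height divided by larger, so 0 < Hb <= 1."""
    return min(h1, h2) / max(h1, h2)


def proportion_at_or_above(pairs, guideline):
    """Fraction of heterozygous allele pairs whose Hb meets the guideline."""
    balances = [heterozygote_balance(a, b) for a, b in pairs]
    return sum(hb >= guideline for hb in balances) / len(balances)


# Hypothetical (allele 1, allele 2) peak heights in rfu.
pairs = [(2400, 2600), (1900, 3000), (2000, 3500), (1000, 2985)]
for guideline in (0.60, 0.55, 0.50):
    print(guideline, proportion_at_or_above(pairs, guideline))

rejected = [p for p in pairs if heterozygote_balance(*p) < HB_TOLERANCE]
print("sent for reanalysis:", rejected)

# The percentages quoted in the text follow from the raw counts:
print(f"{2999 / 3010:.2%} {3007 / 3010:.2%} {3009 / 3010:.2%}")  # 99.63% 99.90% 99.97%
```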
3.5 Stochastic threshold
The stochastic threshold is a guideline used to delineate the set of single banded loci that may be heterozygotes (if drop-out has occurred) [
]. Drop-out was observed at DNA template quantities between 0.05 and 0.5 ng; none was seen at larger quantities. The relationship between peak height and allele dropout was evaluated by logistic regression using the method of Gill et al. [
Fig. 4 shows the estimated logistic regression curve with approximate upper and lower 95% confidence intervals. The solid line gives the dropout probability corresponding to the peak height on the x-axis; the probability of dropout approaches zero as peak heights approach 400 rfu.
The 3500xl instrument is much more sensitive than the 3100xl. For comparison, logistic regression performed on SGM Plus/3100xl data (not shown) gave a peak height of 150 rfu for a dropout probability of Pr(D) = 0.01. The equivalent peak height from the Identifiler dilution data using the 3500xl CE system was 240 rfu.
A log10 transformation of the data was used to analyse the tail of the distribution (Table 1). For Identifiler/3500xl, a stochastic threshold of 300 rfu corresponded to a lower Pr(D) = 0.003 and was therefore demonstrably more conservative than the traditional SGM Plus/3130xl 'equivalent' at 150 rfu.
Table 1. Estimated dropout probabilities for Identifiler 3500xl obtained from the estimated logistic regression (Fig. 4) using 1.5 ng template DNA (data set 1), over a peak height range of 100–600 rfu.
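The logistic regression relating dropout to peak height can be sketched as a two-parameter model on log10 peak height, fitted here by Newton-Raphson with NumPy rather than with the software used in the study. Inverting the fitted curve gives the peak height at which Pr(D) reaches a chosen value, which is how a level such as Pr(D) = 0.01 maps to rfu. The data are simulated from hypothetical coefficients, and using the surviving partner allele's height as the covariate is an assumption about the set-up.

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Fit logit Pr(y = 1) = b0 + b1 * x by Newton-Raphson (IRLS); returns [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def height_for_dropout(beta, target_pd):
    """Invert the fitted curve: peak height (rfu) at which Pr(D) equals target_pd."""
    logit = np.log(target_pd / (1.0 - target_pd))
    return 10.0 ** ((logit - beta[0]) / beta[1])


# Hypothetical data: x = log10 peak height of the surviving partner allele,
# y = 1 if the other allele of the heterozygote dropped out, else 0.
rng = np.random.default_rng(seed=0)
log_h = rng.uniform(1.5, 3.0, size=2000)          # ~32 to 1000 rfu
true_beta = np.array([18.0, -9.5])                # assumed "true" curve
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * log_h)))
dropped = (rng.uniform(size=log_h.size) < p_true).astype(float)

beta = fit_logistic(log_h, dropped)
print("fitted coefficients:", beta)
print(f"Pr(D) = 0.01 at ~{height_for_dropout(beta, 0.01):.0f} rfu")
```

Because the slope b1 is negative, asking for a smaller Pr(D) returns a higher peak height, which is why a 300 rfu threshold sits at a lower dropout probability than a 240 rfu one.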
The stutter ratio is calculated by dividing the stutter peak height by its ‘parent’ allele peak height. The classical stutter ‘guideline’ of 0.15 is often used in the analysis of major/minor mixtures [
]. In addition, many laboratories use the locus-specific stutter filters that are provided with commercial kits. Whereas consideration of between-locus stutter variation is important for casework, it has limited relevance to reference sample analysis, since there is 'clear water' between the stutter guideline and the heterozygous balance guideline of 0.5. The largest stutter ratio observed using a target of 1.5 ng DNA template and the Identifiler/3500xl multiplex was ca. 0.2 (Electronic supplement: Fig. 2), i.e. well below the heterozygous balance guideline. The FSS-i3® software uses a universal stutter tolerance of 0.15: an allele in a stutter position whose ratio to its parent is greater than this level but less than the heterozygote balance tolerance of 0.55 is rejected and sent for reanalysis.
A total of 99.84% (5058/5066) of data points fall below the 0.15 guideline, with approximately 0.16% (8/5066) stutter proportions exceeding this guideline. FGA and D19S433 were found to be the origin of the elevated stutter peaks, with stutter proportion ranges for FGA being 0.161–0.198, and D19S433 being 0.151–0.159.
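The stutter ratio and the count against the 0.15 guideline can be sketched as below. The (stutter, parent) observations are hypothetical; the final line shows how the 5058/5066 and 8/5066 counts round to the quoted percentages.

```python
STUTTER_TOLERANCE = 0.15  # universal stutter tolerance quoted in the text


def stutter_ratio(stutter_height, parent_height):
    """Stutter peak height divided by the height of its parent allele."""
    return stutter_height / parent_height


# Hypothetical (stutter, parent) peak heights in rfu.
observations = [(210, 2500), (396, 2000), (300, 3000), (318, 2000)]
ratios = [stutter_ratio(s, p) for s, p in observations]
above = [r for r in ratios if r > STUTTER_TOLERANCE]
print(f"max ratio {max(ratios):.3f}; {len(above)}/{len(ratios)} above {STUTTER_TOLERANCE}")

# Percentages in the text from the raw counts:
print(f"{5058 / 5066:.2%} below, {8 / 5066:.2%} above")  # 99.84% below, 0.16% above
```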
3.7 Drop-in guideline
The definition of drop-in specifically refers to one or two allelic peaks that occur randomly and cannot be sourced [
]. This is distinct from 'gross contamination', where multiple alleles arise from a single individual, giving rise to a classic mixture. Negative controls are evaluated to estimate the drop-in rate. Most drop-in events are low-level and cannot be distinguished from background 'noise', the boundary of which is delineated by the limit of detection (LOD) threshold. Consequently, the drop-in rate must be considered relative to the size of the peaks.
To evaluate the Identifiler 3500xl process, two negative (water only) control amplification plates were analysed, consisting of 89 and 43 negative samples, respectively. The frequency of drop-in events was relatively high (between 13% and 19%; Table 2). Fig. 5 collates the peak heights of drop-in events; all were below 150 rfu, i.e. drop-in was unlikely to compromise a 300 rfu universal threshold.
Table 2. Occurrence of sporadic drop-in events (no more than two alleles observed per profile) and gross contamination events (three or more alleles observed per profile), determined from negative controls using water-only amplifications (Identifiler 3500xl).
Water batch | Number of negative samples | Number of sporadic drop-in contamination events (frequency)
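Classifying negative controls into sporadic drop-in and gross contamination, as in Table 2, can be sketched as below. The plate is hypothetical, and defining "frequency" as the proportion of negatives showing at least one drop-in peak is an assumption; the two-allele cut-off and the 300 rfu threshold come from the text.

```python
GROSS_CONTAMINATION_MIN = 3   # 3+ alleles in a negative: classic mixture
UNIVERSAL_THRESHOLD = 300.0   # combined LOD/stochastic threshold (rfu)


def summarise_negatives(controls):
    """controls: one list of observed peak heights (rfu) per water-only negative."""
    sporadic = [c for c in controls if 1 <= len(c) < GROSS_CONTAMINATION_MIN]
    gross = [c for c in controls if len(c) >= GROSS_CONTAMINATION_MIN]
    heights = [h for c in sporadic for h in c]
    return {
        "negatives": len(controls),
        "sporadic_drop_in": len(sporadic),
        "frequency": len(sporadic) / len(controls),
        "gross_contamination": len(gross),
        "max_drop_in_rfu": max(heights, default=0.0),
    }


# Hypothetical plate: eight clean negatives, two with drop-in peaks.
plate = [[] for _ in range(8)] + [[62.0], [85.0, 120.0]]
result = summarise_negatives(plate)
print(result)
print("drop-in below universal threshold:", result["max_drop_in_rfu"] < UNIVERSAL_THRESHOLD)
```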
3.8 A universal LOD/stochastic threshold for reference samples
We have implemented a 300 rfu combined LOD/stochastic threshold across all loci. There are certainly differences between loci in amplification efficiency (Electronic supplement: Tables 1 and 2), but provided that sufficient template is used, the 3500xl instrument has sufficient range to accommodate the wide variation between loci within the Identifiler multiplex (although this will clearly be an issue for crime-stain analysis). Drop-in events were below the universal threshold. Stutters were less than 0.2 the size of the parent allele and thus below the heterozygous balance threshold of 0.55 (FSS-i3® used a stutter tolerance of 0.15 to reject a designation).
Thresholds are easily implemented in the programmed rule-sets used by FSS-i3®, where they either reject samples or provide a 'flag' for an operator to carry out a visual investigation. Fig. 6 illustrates the critical thresholds.
Fig. 6. Diagrammatic representation of thresholds. The 300 rfu threshold acts as a combined stochastic/LOD threshold. No alleles are reported below this level. The drop-in (extra allele) threshold is 100 rfu.
The rule-sets were tested against 1071 samples in the validation exercise, and an overall 'first analysis' success rate of >95% was achieved with the Identifiler/3500xl process. The number of samples that can be analysed to validate any process is always strictly limited by available time and resources. This is why risk analysis that evaluates the extremes of distributions provides valuable additional predictive information that cannot be obtained by 'number crunching'. We showed that there was ca. a 1 in 10,000 chance of a heterozygote allele having a peak height below ca. 300 rfu; Table 1 showed a less than 1 in 300 chance of a dropout event at this height (so the chance of both events occurring simultaneously is conservatively estimated as 1 in 3,000,000 loci). There is an additional safeguard in that homozygote peaks are extremely rare at this low level (ca. 1 in 1 million events). Such a profile would very likely require reanalysis anyway: since heterozygote peaks are on average half the size of homozygote peaks, other loci in the profile would probably fall below the 300 rfu universal threshold. Although peaks below 300 rfu are not reported, they are still recorded and may 'fire' rules that result in rejection of a sample. For example, a peak in a stutter position that is >0.15 the size of the parent allele and less than 300 rfu will fail the profile. Similarly, an extra allele (e.g. a drop-in event) >100 rfu will result in a 'flag' for the operator.
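The combined estimate above multiplies the two rounded probabilities, which treats a low heterozygote peak and a dropout at that height as independent events (an assumption implicit in the calculation):

$$
\Pr(\text{both}) \approx \underbrace{\frac{1}{10\,000}}_{\text{het allele} \,\le\, 300\ \text{rfu}} \times \underbrace{\frac{1}{300}}_{\Pr(D)\ \text{at}\ 300\ \text{rfu}} = \frac{1}{3\,000\,000}
$$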
Conversely, overamplification can also lead to interpretation problems because it introduces artefacts such as increased n − 1 and n + 1 stutters, minus-A peaks and poor allele peak morphology. These events are captured by a 'high signal' threshold of 20,000 rfu set in the FSS-i3® software as a 'flag' for the operator to make a decision on quality (tested successfully with a series of samples comprising 5 ng). Note that the dynamic range of the 3500xl instrument is much greater than that of its predecessors.
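The rule-set behaviour described in this section can be sketched as a per-locus check. This is not the FSS-i3® implementation, whose internals are not described here: `evaluate_locus`, the encoding of peaks by integer repeat number (ignoring microvariants) and the treatment of a peak one repeat below a larger peak as a candidate n − 1 stutter are all hypothetical simplifications. The threshold values are those quoted in the text.

```python
REPORT_THRESHOLD = 300.0   # combined LOD/stochastic threshold (rfu)
DROP_IN_FLAG = 100.0       # extra-peak (possible drop-in) flag threshold (rfu)
STUTTER_TOLERANCE = 0.15   # max stutter/parent ratio accepted as stutter
HB_TOLERANCE = 0.55        # min heterozygote balance (smaller/larger peak)
HIGH_SIGNAL = 20_000.0     # over-amplification flag (rfu)


def evaluate_locus(peaks):
    """Apply the threshold rules to one locus.

    peaks maps an allele repeat number (int; hypothetical encoding) to its peak
    height in rfu, including sub-threshold peaks, which are recorded but never
    reported. Returns (designated_alleles, issues) with REJECT/FLAG messages.
    """
    issues = []
    candidates = {}
    for allele, height in sorted(peaks.items()):
        parent = peaks.get(allele + 1)  # n - 1 stutter sits one repeat below its parent
        if parent is not None and height < parent:
            ratio = height / parent
            if ratio <= STUTTER_TOLERANCE:
                continue  # accepted as stutter and filtered out
            if ratio < HB_TOLERANCE:
                issues.append(f"REJECT: ambiguous stutter-position peak at {allele} (ratio {ratio:.2f})")
                continue
        candidates[allele] = height

    alleles = sorted(a for a, h in candidates.items() if h >= REPORT_THRESHOLD)
    for allele, height in candidates.items():
        if DROP_IN_FLAG < height < REPORT_THRESHOLD:
            issues.append(f"FLAG: sub-threshold peak at {allele} ({height:.0f} rfu)")

    if len(alleles) > 2:
        issues.append("REJECT: more than two alleles (possible mixture)")
    elif len(alleles) == 2:
        heights = [candidates[a] for a in alleles]
        hb = min(heights) / max(heights)
        if hb < HB_TOLERANCE:
            issues.append(f"REJECT: heterozygote balance {hb:.2f}")

    if any(h > HIGH_SIGNAL for h in peaks.values()):
        issues.append("FLAG: high signal")
    return alleles, issues


# Usage with hypothetical loci:
print(evaluate_locus({14: 300.0, 15: 3000.0}))              # stutter filtered
print(evaluate_locus({14: 600.0, 15: 3000.0}))              # ambiguous stutter
print(evaluate_locus({12: 150.0, 15: 3000.0, 17: 2800.0}))  # drop-in flag
```

Keeping sub-threshold peaks in the input, rather than discarding them up front, is what lets them 'fire' the stutter and drop-in rules even though they are never reported as alleles.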
Risks associated with reference sample and crime-stain databases have different consequences and require separate consideration [
]. Analysis of crime-stains will require separate (per locus) consideration, since lower levels of target template are encountered and there are pronounced differences between loci that will require evaluation [
]. In particular, the high level of drop-in observed indicates that the standard system will detect classic low-template DNA, so due consideration of drop-in, dropout and heterozygote imbalance will be necessary [