Autosomal and Y-STRs, short amplicons, and sequence variation for more data for low input, degraded and mixed samples.
Abstract
For human identification purposes, forensic genetics has primarily relied upon a core set of autosomal (and to a lesser extent Y chromosome) short tandem repeat (STR) markers that are enriched by amplification using the polymerase chain reaction (PCR) and subsequently separated and detected using capillary electrophoresis (CE). While STR typing conducted in this manner is well developed and robust, advances in molecular biology over the last 15 years, in particular massively parallel sequencing (MPS) [
], offer certain advantages as compared to CE-based typing. First and foremost is the high throughput capacity of MPS. Current bench top high throughput sequencers enable larger batteries of markers to be multiplexed and multiple samples to be sequenced simultaneously (e.g., millions to billions of nucleotides can be sequenced in one run). Second, compared to the length-based CE approach, sequencing STRs increases discrimination power, enhances sensitivity of detection, reduces noise due to instrumentation, and improves mixture interpretation [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Compatibility of the ForenSeq™ DNA Signature Prep Kit with laser microdissected cells: An exploration of issues that arise with samples containing low cell numbers.
]. Third, since detection of STRs is based on sequence and not fluorescence, amplicons can be designed that are shorter in length and of similar lengths among loci, where possible, which can improve amplification efficiency and analysis of degraded samples. Lastly, MPS offers a single format approach that can be applied to analysis of a wide variety of genetic markers of forensic interest (e.g., STRs, mitochondrial DNA, single nucleotide polymorphisms, insertion/deletions). These features make MPS a desirable technology for casework [
Accurate, rapid and high-throughput detection of strain-specific polymorphisms in Bacillus anthracis and Yersinia pestis by next-generation sequencing.
Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in forensic testing.
Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs.
FDSTools: a software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.
]. The developmental validation of the ForenSeq MainstAY library preparation kit with the MiSeq FGx Sequencing System and ForenSeq Universal Software is reported here to assist with validation of this MPS system for casework [
Scientific Working Group on DNA Analysis Methods. Validation Guidelines for DNA Analysis Methods. 2016. 2016 [July 2019]. Available from: https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_813b241e8944497e99b9c45b163b76bd.pdf.
]. The results show that the system is sensitive, accurate and precise, specific, and performs well with mixtures and mock case-type samples.
To exploit the power of MPS for human identification, multiplex kits that contain all reagents necessary to target markers of interest and prepare libraries for sequencing have been developed. Commercially available MPS-based kits for library preparation of STRs, SNPs, and the mitochondrial genome include the ForenSeq DNA Signature Prep Kit (Verogen, San Diego, CA, USA), Precision ID GlobalFiler™ NGS STR Panel v2 (Thermo Fisher Scientific, Waltham, MA, USA), PowerSeq® 46GY System (Promega Corporation, Madison, WI, USA), Precision ID Identity Panel (Thermo Fisher Scientific), Precision ID Ancestry Panel (Thermo Fisher Scientific), ForenSeq Kintelligence Kit (Verogen), ForenSeq mtDNA Control Region Kit (Verogen), ForenSeq mtDNA Whole Genome Kit (Verogen), Precision ID mtDNA Control Region Panel (Thermo Fisher Scientific), Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific), and PowerSeq® CRM Nested System Custom (Promega Corporation). Developmental validation of these types of MPS-based kits has confirmed their utility for human identification purposes [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Compatibility of the ForenSeq™ DNA Signature Prep Kit with laser microdissected cells: An exploration of issues that arise with samples containing low cell numbers.
Cihlar J.A.-O.X., Amory C.A.-O.X., Lagacé R., Roth C., Parson W.A.-O., Budowle B. Developmental Validation of a MPS Workflow with a PCR-Based Short Amplicon Whole Mitochondrial Genome Panel. Genes. 2020;11(2073–4425 (Electronic)):1345. Epub 13 November 2020.
The ForenSeq DNA Signature Prep Kit contains a range of markers - 59 STRs, 94 identity informative SNPs, 56 ancestry informative SNPs, and 22 phenotypic informative SNPs [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
F.B.I. CODIS and NDIS Fact Sheet [cited 2022 June 7]. Available from: https://www.fbi.gov/services/laboratory/biometric-analysis/codis/codis-and-ndis-fact-sheet.
], a kit that targets a subset of these markers, focusing on autosomal and Y STRs can have additional practical value in forensic genetics. The ForenSeq MainstAY Kit (Verogen) was developed to meet forensic community needs by providing the reagents necessary for simultaneous amplification and library preparation of 52 STR markers (27 autosomal and 25 Y STRs) and Amelogenin with an optimum 1 ng of input DNA for sequencing on the MiSeq FGx system [
Verogen. ForenSeq MainstAY Kit: An affordable library prep for targeted sequencing of established autosomal and Y-STR markers in a single reaction. 2021 [cited 2021 June 2021]. Available from: https://verogen.com/wp-content/uploads/2021/07/ForenSeq-MainstAY-Kit-Datasheet-Document-VD2020055.pdf.
]. The primer sequences and amplicons for the markers used in this multiplex are the same as those of the ForenSeq DNA Signature Prep Kit, with the addition of one Y STR marker, DYS393 [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit: An affordable library prep for targeted sequencing of established autosomal and Y-STR markers in a single reaction. 2021 [cited 2021 June 2021]. Available from: https://verogen.com/wp-content/uploads/2021/07/ForenSeq-MainstAY-Kit-Datasheet-Document-VD2020055.pdf.
]. The MainstAY kit contains 96 unique dual indexes (UDIs) arrayed in one-time use wells of a pierceable 96-well plate. Reagents other than the primer mix, UDIs and control DNA are the same as those for the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
], the pool of amplified markers or libraries is sequenced on the MiSeq FGx Sequencing System. Studies reported here used the MiSeq FGx Reagent Micro Kit, and results were analyzed with the ForenSeq Universal Analysis Software v2.4 [
Verogen. Establishing robust thresholds and filters for the ForenSeq MainstAY Kit. VD2021030 2021 [cited 2021]. Available from: https://verogen.com/wp-content/uploads/2021/10/VerogenMainstAY-SWThresholds-VD2021030.pdf.
].
Reliability was evaluated in developmental validation studies, based on Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines [
Scientific Working Group on DNA Analysis Methods. Validation Guidelines for DNA Analysis Methods. 2016. 2016 [July 2019]. Available from: https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_813b241e8944497e99b9c45b163b76bd.pdf.
], on the system that includes ForenSeq MainstAY Kit, MiSeq FGx® Sequencing System and the ForenSeq® Universal Analysis Software v2.4 (UAS) with the MiSeq FGx® Reagent Micro Kit. The results from assessments of reagents, multiplex design, instrument performance, software, quality metrics, sensitivity, mixture resolution, species cross reactivity, and robustness are described herein. These studies may assist forensic laboratories with establishing processes for DNA genotyping and analysis using MPS and understanding limitations of the ForenSeq MainstAY system [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Scientific Working Group on DNA Analysis Methods. Validation Guidelines for DNA Analysis Methods. 2016. 2016 [July 2019]. Available from: https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_813b241e8944497e99b9c45b163b76bd.pdf.
Genomic DNA from 93 individual 1000 Genomes samples (44 males and 49 females) of several different ancestries was purchased from Coriell Institute for Medical Research (Camden, NJ, USA; see Supplemental Table 1 for details). CEPH families 1454, 1459, and 1463 are included in this sample set. Control DNAs were NA24385 (provided in the ForenSeq MainstAY kit) and 2800 M (Promega Corporation, Madison, WI, USA). A degraded DNA series was purchased from InnoGenomics Technologies (New Orleans, LA, USA, [
Optimizing DNA recovery and forensic typing of degraded blood and dental remains using a specialized extraction method, comprehensive qPCR sample characterization, and massively parallel sequencing.
]) and is described below. SRM 2391d samples A-C were obtained from NIST (Gaithersburg, MD, USA). Genomic DNA from non-human species included one Old World primate (rhesus monkey), five non-primate mammals (pig, cow, dog, cat, horse), and one avian species (domesticated chicken) (purchased from Zymogen Laboratories, San Diego, CA, USA), as well as one bacterial species, Escherichia coli (E. coli) (purchased from SIGMA-Aldrich, St. Louis, MO, USA).
Table 1. Summary of STR typing and total sample read count metrics for postmortem samples from six bones and five teeth from 11 individuals using ForenSeq MainstAY, the MiSeq FGx, and default Universal Analysis Software settings, in a 96-sample run. The degradation index was not measured for all samples (na, not available).
Four known PCR inhibitors of DNA amplification (indigo, hematin, tannic acid, and humic acid) were purchased from SIGMA-Aldrich (St. Louis, MO, USA). ForenSeq MainstAY kits (384 reaction size), ForenSeq DNA Signature Prep Kit (96 reaction size), MiSeq FGx Reagent Micro Kits, MiSeq FGx Reagent Kit (for ForenSeq DNA Signature libraries) and ForenSeq Universal Analysis Software v1.3 and v2.4 were supplied by Verogen (San Diego, CA, USA).
2.3 General methods
ForenSeq MainstAY enables simultaneous amplification of 52 STR markers (27 autosomal and 25 Y STRs) and Amelogenin [
Verogen. ForenSeq MainstAY Kit: An affordable library prep for targeted sequencing of established autosomal and Y-STR markers in a single reaction. 2021 [cited 2021 June 2021]. Available from: https://verogen.com/wp-content/uploads/2021/07/ForenSeq-MainstAY-Kit-Datasheet-Document-VD2020055.pdf.
]. The targeted loci, chromosomal locations and range of amplicon lengths are provided in Supplemental Table 2. The primer sequences and amplicons for the markers used in this multiplex are the same as those in the developmentally validated and NDIS-approved ForenSeq DNA Signature Prep Kit except for addition of the Y STR marker DYS393 [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit: An affordable library prep for targeted sequencing of established autosomal and Y-STR markers in a single reaction. 2021 [cited 2021 June 2021]. Available from: https://verogen.com/wp-content/uploads/2021/07/ForenSeq-MainstAY-Kit-Datasheet-Document-VD2020055.pdf.
]. The kit also contains 96 unique dual indexes (UDIs) arrayed in one-time use wells of a pierceable 96-well plate. The reagents other than the primer mix, UDIs and control DNA are the same as those in the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]; the pool of amplified markers/libraries was sequenced using the MiSeq FGx Reagent Micro Kit on the MiSeq FGx Sequencing System (Verogen, San Diego, CA, USA). Results were analyzed in the ForenSeq Universal Analysis Software (UAS) v2.4 [
Verogen. Establishing robust thresholds and filters for the ForenSeq MainstAY Kit. VD2021030 2021 [cited 2021]. Available from: https://verogen.com/wp-content/uploads/2021/10/VerogenMainstAY-SWThresholds-VD2021030.pdf.
] using default settings (see below) and without manual edits, unless otherwise noted.
The UAS automatically analyzes the results as soon as the sequencing run is completed [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. Sequencing files are converted to fastq files and aligned using the alignment algorithm for STRs developed for the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. First, the index reads are used to demultiplex the data from the sequencer. Second, for the library element to proceed to the alignment step, the forward and reverse target primer sequences that are part of reads 1 and 2 must both be of the correct sequence for the targeted amplicon. The flanking regions to the STR repeat region, or the sequence to either side of the repeat region on the amplicon, then are aligned. Both flanking regions must align, or the sequence is discarded. After aligning the flanking regions, the STR repeat sequence and repeat number are determined and reported in the UAS. The entire sequence of the amplicon is used to identify the locus, allele, and genotype. The length of the amplicon is not used to determine which locus was being interrogated [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
] was used in the ForenSeq MainstAY developmental validation study. In brief, low-level background signal was assessed on the MiSeq FGx using no-template controls (NTCs) generated with the ForenSeq DNA Signature Prep Kit and the MiSeq FGx Reagent Kit, with the default baseline value set to zero in the ForenSeq Universal Analysis Software. Ninety-six NTC reactions (water only) were prepared using ForenSeq DNA Primer Mix B (DPMB) with 230 STR and SNP loci (plus Amelogenin) and evaluated. Data indicated a mean number of aligned reads per locus in NTCs of 0.15, with a standard deviation of three reads across the 22,176 loci evaluated. A default baseline value of 10 reads per locus, calculated as three standard deviations above the mean read number per locus (9.15 rounded up to the next integer), is coded in the ForenSeq Universal Analysis Software v1.3. When an STR or SNP allele sequence is present at 10 or fewer total reads, the sequence is not reported because the baseline value was not exceeded; when a sequence is present at 11 or more total reads, the sequence is reported as a possible allele. The analytical threshold (AT) is implemented in the UAS v1.3 as a percentage of total reads per locus with a minimum set at 650 reads. Ten reads correspond to 1.5% of 650 reads (9.75 rounded up to the next integer). Therefore, the AT for the ForenSeq DNA Signature Prep Kit was set as > 1.5% of the total reads for loci with 650 or more total reads, unless otherwise noted. For loci with fewer than 650 reads, the AT was > 1.5% of 650 or > 10 reads. The default AT for ForenSeq MainstAY was determined with a similar study using the average read depth from negative control samples plus three standard deviations, which equated to 10.1 reads (see Determination of an Analytical Threshold in Results and Discussion).
The results of the study were consistent with results observed with ForenSeq DNA Signature Prep Kit, and, therefore, the AT was maintained as described above as > 1.5% of total reads per locus with the minimum possible total reads per locus set at 650 reads.
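The baseline and threshold arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the UAS implementation: the function names are hypothetical, and the rule that a sequence must exceed both 1.5% of the locus reads and the 10-read baseline is one reading of the text.

```python
import math


def baseline_reads(mean_reads, sd_reads, n_sd=3):
    """Baseline = mean + n_sd * SD of NTC reads, rounded up to the next integer."""
    return math.ceil(mean_reads + n_sd * sd_reads)


def exceeds_at(allele_reads, total_locus_reads, baseline=10):
    """Return True if a sequence exceeds the analytical threshold (AT).

    Assumption: the sequence must exceed both 1.5% of the total reads at the
    locus and the 10-read baseline (1.5% of the 650-read minimum, rounded up).
    Integer arithmetic is used for the 1.5% test to avoid rounding surprises.
    """
    above_fraction = allele_reads * 1000 > 15 * total_locus_reads
    above_baseline = allele_reads > baseline
    return above_fraction and above_baseline


# NTC statistics quoted in the text: mean 0.15 reads, SD 3 reads
print(baseline_reads(0.15, 3.0))
# A low-coverage locus (400 reads): 10 reads is not reported, 11 reads is
print(exceeds_at(10, 400), exceeds_at(11, 400))
```

With these assumptions, a locus with 1,000 total reads requires more than 15 reads for a sequence to be reported, whereas any locus below the 650-read minimum falls back to the 10-read baseline.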
Depending on the validation study, results were analyzed for read depth, repeat number and sequence of alleles, the normalized read depth for each locus (calculated as total reads for the locus divided by the total reads for the sample as a percent), intra-locus balance (or allele count ratio) of heterozygotes per locus by dividing the lower read depth by the higher read depth of the two alleles, inter-locus balance determined by comparing normalized read depth per locus, and concordance.
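As a minimal sketch of two of the metrics defined above, the following Python functions compute the normalized read depth of a locus and the intra-locus balance of a heterozygote. The function names and the read counts in the example are hypothetical and are not taken from the study data.

```python
def normalized_read_depth(locus_reads, sample_total_reads):
    """Total reads for the locus as a percentage of total reads for the sample."""
    if sample_total_reads <= 0:
        raise ValueError("sample_total_reads must be positive")
    return 100.0 * locus_reads / sample_total_reads


def intra_locus_balance(allele_a_reads, allele_b_reads):
    """Allele count ratio of a heterozygote: lower read depth / higher read depth."""
    lower, higher = sorted((allele_a_reads, allele_b_reads))
    if higher <= 0:
        raise ValueError("at least one allele must have reads")
    return lower / higher


# Hypothetical example: a locus with 1,500 reads in a 75,000-read sample,
# with heterozygous alleles at 800 and 1,000 reads
print(normalized_read_depth(1500, 75000))
print(intra_locus_balance(800, 1000))
```

Inter-locus balance can then be assessed by comparing the normalized read depths across all loci in a sample.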
2.4 Validation studies
Validation studies were performed following the recommendations of the Scientific Working Group on DNA Analysis Methods (SWGDAM) [
Scientific Working Group on DNA Analysis Methods. Validation Guidelines for DNA Analysis Methods. 2016. 2016 [July 2019]. Available from: https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_813b241e8944497e99b9c45b163b76bd.pdf.
].
2.5 Characterization of genetic markers
Though the STRs targeted by the ForenSeq MainstAY multiplex are well-characterized [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Compatibility of the ForenSeq™ DNA Signature Prep Kit with laser microdissected cells: An exploration of issues that arise with samples containing low cell numbers.
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
], a study was performed to demonstrate detection of loci and variants (i.e., the technological basis for identifying the genetic markers), the mode of inheritance (i.e., through family studies), and the type of genetic variation (e.g., sequence and/or length variants) observed. Libraries were prepared from 1 ng of each of 94 1000 Genomes samples obtained from the Coriell Institute for Medical Research, from Control DNA NA24385, and from a negative amplification control, and were sequenced at a total of 96 samples per run. Alleles per locus were called and compared with the known length-based types of the samples. Stutter reads exceeding the stutter filters in the default ForenSeq MainstAY Analysis method were removed from this analysis.
2.6 Control characterization
Control DNA NA24385 and negative amplification control libraries were analyzed to determine expected results and derive interpretation criteria. Control DNA NA24385 libraries (n = 384) were generated with 1 ng of input DNA and sequenced (32 samples per run, 12 sequencing runs). Negative amplification control libraries (n = 190) were generated and sequenced in two runs each with 96 samples. The negative amplification control runs were analyzed with 0% AT with an offline-generated fastq algorithm [
Illumina I. bcl2fastq2 Conversion Software v2.20 2019 [cited 2021]. Available from: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq2-v2–20-software-guide-15051736–03.pdf.
] and the STR analysis algorithm used in the UAS.
2.7 PCR conditions
The effects of multiplexing the specific primers in ForenSeq MainstAY were determined by generating libraries with three DNA samples: Control DNA NA24385, SRM 2391d B [
NIST. Certificate of Analysis: Standard Reference Material 2391d PCR-Based DNA Profiling Standard 2019 [cited 2021]. Available from: https://www-s.nist.gov/srmors/certificates/2391d.pdf.
Promega. 2800M Control DNA 2017 [cited 2017]. Available from: https://www.promega.com/products/forensic-dna-analysis-ce/str-amplification/2800m-control-dna/?catNum=DD7101.
] at 1 ng DNA input in triplicate with both the ForenSeq DNA Signature Prep Kit and MainstAY and sequenced following the recommended procedures for each kit [
Verogen. Universal Analysis Software MainstAY Product Line Module Version 2 Reference Guide. Revision A. 2022. 2022 [June 2022]. Available from: https://verogen.com/wp-content/uploads/2022/08/universal-analysis-software-v2-reference-guide-mainstAY-VD2022001-RevA.pdf.
]. Because the observed reads per sample differ between the two multiplexes, the reads were normalized as follows: the marker coverage was calculated for each locus for each library relative to the total reads for the library, and the percent marker coverage was multiplied by 75,000 for ForenSeq MainstAY libraries and 200,000 for the ForenSeq DNA Signature Prep Kit libraries. These normalization values were chosen empirically to scale the data on the plot for ease of comparison of results from the two library preparation kits.
The PCR reagent formulations are the same for the DNA Signature Prep Kit and MainstAY, except that the primer mixes differ based on the loci targeted by each kit and the MainstAY primer mix is more concentrated. The higher concentration allows a smaller volume of primer mix to be added to the PCR1 reaction (2 μL compared to 5 μL for the DNA Signature Prep Kit), which in turn allows a larger volume of template DNA to be added. The MainstAY primer mix was tested by increasing and decreasing the total primer concentration by 10%, 20% and 30%. Libraries were generated with 2800 M DNA at 1 ng DNA input or NA12878 DNA at 1 ng DNA input. Male DNA 2800 M libraries were sequenced at 56 samples per run, and female DNA NA12878 libraries were sequenced at 96 samples per run.
The critical reagents in the PCR1 and PCR2 buffer formulations, magnesium sulfate and potassium chloride, were tested by increasing and decreasing the concentrations by 10%, 20% and 30%. Controls also were formulated at the ‘100%’ concentration to compare to the control optimized buffer. All buffer conditions were tested by generating libraries in triplicate with Control DNA NA24385 at 1 ng DNA input and negative amplification controls, and sequencing was performed with 96 samples per run. Stutter reads exceeding the UAS stutter filters in the default ForenSeq MainstAY Analysis method were removed from this analysis.
2.8 Sensitivity studies
Libraries were generated using the MainstAY Positive Control DNA NA24385 at the following DNA inputs: 4 ng, 2 ng, 1 ng, 500 pg, 250 pg, 125 pg, 62 pg, 31 pg, 16 pg, 8 pg, and 0 pg in quadruplicate. The libraries were sequenced with other libraries to achieve 96 samples per run. The total reads per sample, the reads per marker, the reads per allele and alleles typed for each marker were compiled. The calls were analyzed for concordance of detected alleles, alleles below the AT (those considered not detected) and alleles observed that were not expected (i.e., discordant) to include stutter reads exceeding the stutter filters in the default ForenSeq MainstAY Analysis Method in the UAS.
2.9 Stability and PCR inhibitor studies
Four known inhibitors were added at various concentrations to 6.25 ng Control DNA NA24385 in 50 μL total volume (0.125 ng/μL) as follows: indigo (133 µM), hematin (20–50 µM), tannic acid (0.5–4 µM), and humic acid (0.125–2.50 ng/μL). The treated Control DNA NA24385 samples were incubated with the inhibitors at the concentrations indicated, at room temperature for 30 min, before addition of 8 μL to the PCR1 reactions (1 ng DNA input). The libraries were prepared in triplicate and sequenced at 96 samples per run.
The degraded DNA series (purchased from InnoGenomics Technologies as described above) was prepared as follows at InnoGenomics Technologies. DNA extracted from blood was sheared by sonication at 50 °C for times ranging from 0 to 16 h. The InnoQuant HY real-time qPCR kit (InnoGenomics Technologies, New Orleans, LA, USA) was used according to manufacturer’s instructions to assess total human DNA and degradation state [
Optimizing DNA recovery and forensic typing of degraded blood and dental remains using a specialized extraction method, comprehensive qPCR sample characterization, and massively parallel sequencing.
InnoGenomics. InnoQuant HY Human and male DNA quantification and degration assessment kit using 7500 real-time PCR system user guide v1.5 2016 [cited 2020]. Available from: https://innogenomics.com/wp-content/uploads/files/InnoQuant_HY_Using_7500_Real_Time_PCR_System_User_Guide_v1_5.pdf.
]. Libraries were generated in triplicate and sequenced at 96 samples per run. The 0 h sample had a degradation index of 1.0 and was used for comparison to the degraded samples.
2.10 Precision and accuracy
Precision for MPS systems can be defined as the repeatability and reproducibility of detecting the same calls for a sample tested multiple times by the same and by multiple operators, respectively. Accuracy can be defined as obtaining concordant calls for known samples typed with orthogonal methods, such as capillary electrophoresis (CE) [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. The algorithm for typing STR sequences and repeat numbers for ForenSeq MainstAY is the same as that used in the ForenSeq UAS v1.3 for the DNA Signature Prep Kit, which was previously shown to have high accuracy and precision [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. CE data were collected using the Applied Biosystems 3130xl Genetic Analyzer Data Collection Software 3.0 and analyzed with GeneMapper ID software v3.2.1 (Thermo Fisher, Waltham, MA, USA). The call rate was calculated as the number of expected alleles detected as a percentage of total alleles detected.
Concordance studies were performed by generating libraries from 14 DNA samples previously typed using CE-based kits. SRM 2391d samples A, B and C (NIST, Gaithersburg, MD, USA), Control DNA 2800 M [
Promega. 2800M Control DNA 2017 [cited 2017]. Available from: https://www.promega.com/products/forensic-dna-analysis-ce/str-amplification/2800m-control-dna/?catNum=DD7101.
], Control DNA NA24385 and the nine 1000 Genomes Project samples were amplified and sequenced in triplicate using 1 ng input DNA per sample. The libraries were sequenced at 96 samples per run.
Repeatability was measured by one operator, generating 96 libraries of Control DNA NA24385 at 1 ng input DNA twice. Reproducibility was measured by two operators each generating 96 libraries. The libraries were sequenced at 32 samples per run across three sequencers/sequencing runs for a total of 12 sequencing runs and 384 total libraries. Precision was measured as the number of correct, expected alleles typed for the Control DNA NA24385 as a percentage of the total alleles typed. Accuracy was calculated by counting the number of expected typed alleles detected as a percentage of the total expected alleles for these samples. Call-rate was calculated for the accuracy samples as the number of expected alleles detected as a percentage of all the alleles detected for those samples.
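The percentage definitions above can be illustrated with a short Python sketch. It assumes, for illustration only, that alleles are represented as (locus, allele) pairs and compared as sets; the function name and the example profile are hypothetical and do not come from the study data.

```python
def allele_metrics(expected, observed):
    """Compare observed allele calls with the expected (known) profile.

    accuracy_pct:  expected alleles detected / total expected alleles
    call_rate_pct: expected alleles detected / all alleles detected
    (Precision uses the same ratio as call rate, applied to the replicate
    Control DNA libraries rather than to the accuracy samples.)
    """
    expected, observed = set(expected), set(observed)
    if not expected or not observed:
        raise ValueError("both allele sets must be non-empty")
    detected_expected = expected & observed
    return {
        "accuracy_pct": 100.0 * len(detected_expected) / len(expected),
        "call_rate_pct": 100.0 * len(detected_expected) / len(observed),
    }


# Hypothetical profile: four expected alleles, all detected, plus one
# unexpected call (e.g., a stutter product exceeding the stutter filter)
expected = [("D3S1358", "15"), ("D3S1358", "16"), ("TH01", "6"), ("TH01", "9.3")]
observed = expected + [("D3S1358", "14")]
print(allele_metrics(expected, observed))
```

In this example all expected alleles are detected, so accuracy is 100%, while the single unexpected call lowers the call rate to 80%.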
2.11 Mixture studies
DNA samples (see below) were mixed at different ratios and subjected to library preparation. All libraries were generated with 1 ng of total input DNA, for mixtures and single source samples, in triplicate. Libraries were pooled and sequenced at 96 samples per run.
Male:male mixtures were generated using the Control DNAs NA24385 and 2800 M at ratios of 1:1, 3:1, 9:1, 19:1, 39:1 and 99:1, with 2800 M as the minor contributor. Libraries were generated with the following amounts of NA24385:2800 M in pg: 500:500 (1:1), 750:250 (3:1), 900:100 (9:1), 950:50 (19:1), 975:25 (39:1), and 990:10 (99:1). Libraries also were generated with Control DNA NA24385 and 2800 M individually as single source controls.
Female:male and male:female mixtures were generated using NA12878 female DNA and NA18507 male DNA (see Supplemental Table 1) and mixed at 1:1, 3:1, 9:1, 19:1, 39:1, 99:1, 199:1, 249:1, 499:1, 1:3, 1:9, 1:19, 1:39, and 1:99 ratios. Libraries were generated with the following DNA amounts of NA12878:NA18507 in pg: 500:500 (1:1), 750:250 (3:1), 900:100 (9:1), 950:50 (19:1), 975:25 (39:1), 990:10 (99:1), 995:5 (199:1), 996:4 (249:1), 998:2 (499:1), 250:750 (1:3), 100:900 (1:9), 50:950 (1:19), 25:975 (1:39), and 10:990 (1:99). Libraries were also generated with NA12878 and NA18507 individually as single source controls.
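The contributor amounts listed above follow directly from splitting the 1 ng (1000 pg) total input according to each ratio. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical.

```python
def mixture_amounts_pg(ratio_a, ratio_b, total_pg=1000):
    """Split a total DNA input (pg) between two contributors at ratio_a:ratio_b."""
    parts = ratio_a + ratio_b
    if parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    amount_a = total_pg * ratio_a / parts
    return amount_a, total_pg - amount_a


# Reproduce several of the amounts listed in the text for 1 ng total input
for ratio in [(1, 1), (3, 1), (19, 1), (99, 1), (249, 1), (499, 1), (1, 39)]:
    a, b = mixture_amounts_pg(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {a:g}:{b:g} pg")
```

At the most extreme ratio tested (499:1), the minor contributor is present at only 2 pg of the 1 ng total input.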
2.12 MPS-specific studies
Data from libraries generated in “Characterization of Genetic Markers” section were used to assess the effects of barcoding/indexing of MainstAY libraries. In that section, libraries were prepared with 1 ng input utilizing all 96 UDI adapters provided in the MainstAY kit. Libraries were sequenced at 96 samples per run including the positive control NA24385, a total of 46 male samples and 49 female samples and a negative amplification control.
Additionally, data from the repeatability and reproducibility studies were used to assess the effects of barcoding/indexing libraries and signal cross talk during sequencing. Briefly, one operator (Operator 3) generated 96 libraries using all 96 UDI adapters with 1 ng of Control DNA NA24385. The libraries were sequenced at 32 samples per run across three sequencers/sequencing runs. All 96 samples were added to the runs during run creation. The sequencing runs also were analyzed using software from Illumina for generation of fastq files [
Illumina I. bcl2fastq2 Conversion Software v2.20 2019 [cited 2021]. Available from: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq2-v2–20-software-guide-15051736–03.pdf.
] with a sample sheet containing every permutation of misaligned adapter sequences possible from the 96 UDI sequences (i.e., 9216 combinations). The reads for each fastq file generated were compared to the fastq files generated for the expected index combinations, and the percentage of reads calculated were compared to expected types.
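The size of the index combination space can be checked with a short Python sketch. The index labels (i7_01, i5_01, and so on) are hypothetical placeholders rather than the kit's actual adapter names, and the read counts in the example are invented for illustration. Note that 96 × 96 = 9216 includes the 96 expected pairings, leaving 9120 unexpected combinations.

```python
from itertools import product


def index_combinations(n_udis=96):
    """All pairings of n_udis i7 indexes with n_udis i5 indexes."""
    i7 = [f"i7_{k:02d}" for k in range(1, n_udis + 1)]
    i5 = [f"i5_{k:02d}" for k in range(1, n_udis + 1)]
    return list(product(i7, i5))


def is_expected(pair):
    """A pairing is expected when both indexes belong to the same UDI well."""
    return pair[0].split("_")[1] == pair[1].split("_")[1]


def misassigned_read_pct(read_counts):
    """Percentage of reads found in unexpected index combinations."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    unexpected = sum(n for pair, n in read_counts.items() if not is_expected(pair))
    return 100.0 * unexpected / total


combos = index_combinations()
print(len(combos), sum(is_expected(p) for p in combos))

# Invented counts: 990 reads in an expected pairing, 10 in an unexpected one
toy_counts = {("i7_01", "i5_01"): 990, ("i7_01", "i5_02"): 10}
print(misassigned_read_pct(toy_counts))
```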
The libraries generated for the sensitivity study were also used to assess the effects of multiplexing libraries and potential run-to-run carryover on the sequencing instrument. Briefly, 44 libraries were generated using the Control DNA NA24385 at the following DNA inputs: 4 ng, 2 ng, 1 ng, 500 pg, 250 pg, 125 pg, 62 pg, 31 pg, 16 pg, 8 pg, and 0 pg in quadruplicate with MainstAY. The libraries were sequenced with other libraries at 96 samples per run. After performing the post-run wash as recommended in the MiSeq FGx Reference guide [
Verogen. MiSeq FGx Sequencing System Reference Guide. Document # VD2018006. Revision F. 2021. 2019 [February 2021]. Available from: https://verogen.com/wp-content/uploads/2021/02/miseq-fgx-system-reference-guide-vd2018006-f.pdf.
], the 44 libraries were re-pooled and sequenced on the same instrument with other libraries at 66 samples per run. Then, after performing the post-run wash, three replicates of each DNA input of the sensitivity study libraries were re-pooled and sequenced on the same instrument at 33 samples per run. All 96 libraries were included during run creation for all three sequencing runs so that if a sample was carried into the subsequent run, it would be detected.
2.13 Case-type (or Mock) samples
Human buccal samples, blood stains, blood spots, and hair samples (shed and plucked) were collected from volunteers who each signed an informed consent form authorizing the use of the de-identified samples for research use and publication. DNA from blood stains and control buccal samples, both collected on sterile cotton swabs (Fisher Scientific, Hampton, NH, USA), was extracted with Chelex-100® (Bio-Rad, Richmond, CA, USA) followed by ethanol precipitation [
]. The blood stain extracts showed no, light, moderate or heavy heme carryover (Supplemental Table 3). The Chelex-extracted DNA was quantified using the Plexor® HY System ([
Promega. Plexor® HY System for the Applied Biosystems 7500 and 7500 FAST Real-Time PCR Systems 2017 [cited 2018]. Available from: https://www.promega.com/-/media/files/resources/protocols/technical-manuals/0/plexor-hy-system-for-the-applied-biosystems-7500-and-7500-fast-real-time-pcr-systems-protocol.pdf?la=en.
], Promega, Madison, WI, USA) with the 7500 Real-Time PCR System (Thermo Fisher, Waltham, MA, USA) following the manufacturers’ recommended protocols. Sets of DNA from buccal samples collected on sterile cotton swabs (Fisher Scientific, Hampton, NH, USA) and plucked (with roots) or shed hairs (no visible roots, 2 – 8 cm in length), from each of the four individuals (Supplemental Table 3), were extracted with the QIAamp DNA Investigator kit following the recommended protocols ([
Qiagen. QIAamp® DNA Investigator Handbook 2020 [cited 2021]. Available from: file:///Users/kstephens/Downloads/HB-0355–005_HB_QA_DNA_Investigator_0120_WW.pdf.
], Qiagen, Germantown, MD, USA) (Supplemental Table 3). The buccal and hair DNA extracts were quantified with the QuantiFluor® ONE dsDNA System following the recommended procedure ([
Promega. QuantiFluor® ONE dsDNA System 2019 [cited 2021]. Available from: file:///Users/kstephens/Downloads/QuantiFluor%20ONE%20dsDNA%20System%20Technical%20Manual.pdf.
], Promega, Madison, WI, USA). Human blood collected on Whatman® FTA™ cards (SIGMA-Aldrich, St. Louis, MO, USA) and buccal samples collected on sterile cotton swabs or Bode Buccal DNA Collector™ filter paper (Bode Technology, Lorton, VA, USA) were processed without extraction following the ForenSeq MainstAY reference guide for six individuals (B, C, E, F, G and H in Supplemental Table 3) [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]. Briefly, buccal swabs and filter papers were allowed to dry overnight at room temperature. Cell lysates were prepared with QuickExtract™ DNA Extraction Solution (VWR International, Radnor, PA, USA): paper punches or swab cuttings were incubated in 500 μL extraction solution for 1 min at 65 °C, inverted five times, incubated for 2 min at 98 °C, and stored at −20 °C [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
].
Six contemporary bone DNA extracts were provided by Dr. Rachel Houston (Department of Forensic Science, Sam Houston State University (SHSU), Huntsville, TX, USA) from the Willed Body Donation Program at the Southeast Texas Applied Forensic Science Facility. The bones were recovered from cadavers (postmortem intervals range from approximately one to eight years), placed at SHSU’s Applied Anatomical Research Center (AARC) Outdoor Research Facility, subjected to burning, embalming or cremation (Supplemental Table 3) and extracted for DNA using PrepFiler™ Forensic DNA Extraction kit (Thermo Fisher, Waltham, MA, USA) [
Stray J, Holt A, Brevnov M, Calandro LM, Furtado MR, Shewale JG. Extraction of high quality DNA from biological materials and calcified tissues. Forensic Science International: Genetics Supplement Series. 2009;2(1):159–60.
]. Five contemporary tooth samples were purchased from InnoGenomics Technologies (New Orleans, LA, USA), and DNAs were extracted with the dental forensic kit (DFK®) (InnoGenomics Technologies, New Orleans, LA, USA) (Supplemental Table 3) [
Optimizing DNA recovery and forensic typing of degraded blood and dental remains using a specialized extraction method, comprehensive qPCR sample characterization, and massively parallel sequencing.
]. Degradation status was measured for a subset of the samples tested using the Quantifiler® Human DNA Quantification Kit (Thermo Fisher, Waltham, MA, USA).
Libraries were prepared using 1 ng of input DNA, unless otherwise noted, and sequenced with other libraries at 96 samples per run (unless otherwise indicated).
2.14 Species specificity
The impact of species specificity can be assessed using the online basic local alignment search tool (BLAST) [
Boratyn GM, Schäffer AA, Agarwala R, Altschul SF, Lipman DJ, Madden TL. Domain enhanced lookup time accelerated BLAST. 2012(1745–6150 (Electronic)).
] and searching the forensic primer sequences against known genomic sequences. Empirical studies using this approach with the STR primers that are in MainstAY but in a larger multiplex (ForenSeq DNA Signature Prep Kit) already have been described [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. Here, to attempt to detect cross-reactivity with non-human DNA, a wet-lab study was performed using the reagents of ForenSeq MainstAY with DNA from animals and bacteria at 10-fold (i.e., 10 ng) the recommended input for human DNA for MainstAY. Libraries were prepared from the non-human DNA samples, 2800 M, and a negative amplification control in triplicate. The libraries were sequenced at 96 samples per run. If reads were observed at a MainstAY locus for non-human samples, the primer sequences were searched against that species’ genome using BLAST [
Population studies have been performed and published previously for the loci in ForenSeq MainstAY, including DYS393, the one locus in ForenSeq MainstAY that is not in the ForenSeq DNA Signature Prep Kit [
]. The ForenSeq DNA Signature Prep Kit and ForenSeq MainstAY primers are the same for the markers in common.
3. Results and discussion
ForenSeq MainstAY flanking region reports were used in order to analyze the majority of the sequences for the amplicons. The flanking region report can be generated in the UAS from the Sample Results page or the Reports tab in a Project. The flanking region report gives information about potential variants observed in the STR flanking region as well as illustrating isometric heterozygotes and variants within the repeat regions. A screenshot of a flanking region report is shown in Fig. 1.
Fig. 1. Screenshot of ForenSeq MainstAY Flanking Region Report (Excel) for SRM 2391d A sample, as generated within the Universal Analysis Software. The Autosomal STR Coverage tab is shown; two other tabs are included in the report: Y STR Coverage and the settings tab.txt Format which includes table for each unique flank sequence per locus per STR length
Metrics and specifications within the MiSeq FGx Control Software ensure that the sequencing chemistry performed within expected parameters. If the metric(s) fall outside the recommended specification, the software flags the metric, and the sample results can still be assessed to ascertain the run’s potential value. The metrics such as cluster density, percentage of clusters passing filter, quality scores (Q) for each of the reads, phasing and pre-phasing, and performance of the sequencing control were used to assess performance under the various validation studies [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. The cluster density metric measures the number of clusters (K) per square millimeter in the flow cell. The optimal range is 400–1650 K/mm². Runs with cluster densities outside of this range will have lower numbers of usable reads but can still produce results that are sufficient for analysis. Clusters passing filter measures the number of clusters passing the Illumina chastity filter as a percentage of total clusters and indicates quality of the clustering. The setting in the software for clusters passing filter is 80%. Runs with clusters passing filter less than 80% will have lower numbers of usable reads but can produce results that are sufficient for analysis. The phasing and pre-phasing metrics measure the percentages of molecules in a cluster that fall behind the current cycle or run ahead of the current cycle, respectively. Phasing values of 0.25% and pre-phasing values of 0.15% are recommended. Runs with values for phasing, pre-phasing or both outside of these values can still produce results that are sufficient for analysis; however, the positive controls should be carefully analyzed to ensure the quality of the results. Runs analyzed in these studies passed these quality metrics.
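The run-level checks described above can be sketched as a small function that flags metrics outside the stated specifications. This is an illustrative sketch, not the MiSeq FGx Control Software logic. The function and constant names are hypothetical, and treating the phasing and pre-phasing values as upper limits is an assumption.

```python
# Specifications quoted in the text. Treating the phasing and pre-phasing
# values as upper limits is an assumption of this sketch.
CLUSTER_DENSITY_RANGE = (400.0, 1650.0)  # K/mm^2
MIN_CLUSTERS_PF = 80.0                   # percent of clusters passing filter
MAX_PHASING = 0.25                       # percent (assumed upper limit)
MAX_PREPHASING = 0.15                    # percent (assumed upper limit)


def flag_run_metrics(cluster_density, clusters_pf, phasing, prephasing):
    """Return the names of the metrics that fall outside the specification."""
    flags = []
    low, high = CLUSTER_DENSITY_RANGE
    if not (low <= cluster_density <= high):
        flags.append("cluster_density")
    if clusters_pf < MIN_CLUSTERS_PF:
        flags.append("clusters_passing_filter")
    if phasing > MAX_PHASING:
        flags.append("phasing")
    if prephasing > MAX_PREPHASING:
        flags.append("prephasing")
    return flags


# Hypothetical run values, for illustration only.
print(flag_run_metrics(1200, 92.0, 0.18, 0.05))  # prints: []
print(flag_run_metrics(1800, 75.0, 0.30, 0.10))
# prints: ['cluster_density', 'clusters_passing_filter', 'phasing']
```

A flagged metric does not by itself invalidate a run; as noted above, the sample results and positive controls can still be assessed.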
3.1 Characterization of genetic markers
STR markers used for human identification have been highly characterized and variation has been determined based on length differences using CE methodologies [
Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples.
]. With the advent of MPS, sequence information as well as repeat numbers (which are related to amplicon length) are also used for genetic characterization [
Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs.
FDSTools: a software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
]. There are several advantages to using MPS generated data over CE generated data, including increased discrimination power, reduction in noise, potential increased resolution for mixture deconvolution, and since the actual sequence of the amplicon is used to analyze the marker of interest, incomplete A nucleotide addition is not an issue, and an allelic ladder, size standard and matrix standard dye sets are no longer required [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Compatibility of the ForenSeq™ DNA Signature Prep Kit with laser microdissected cells: An exploration of issues that arise with samples containing low cell numbers.
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
], except for the addition of two primers that target an additional marker, the DYS393 locus. To illustrate the detection and performance of the markers in ForenSeq MainstAY, the balance of coverage (reads) of the markers (inter-locus) and balance of heterozygous alleles (intra-locus) were determined for the 94 1000 Genomes Project samples (including the Control DNA NA24385) and 2800 M using 1 ng DNA input for each sample. The results for the inter-locus balance for these 95 samples are shown in Fig. 2A and 2B. All 53 loci were detected in all male samples with the one exception of male sample HG02982, which had no reads at DYS19 (Fig. 2A). The region on the Y chromosome containing DYS19 is deleted in HG02982 [
]; thus the results are consistent with the genetic makeup of the sample. The expected average coverage per locus as a percentage of total reads for a male sample is 2.41% for markers with two copies (autosomal, Amelogenin and Y-STRs DYS385ab and DYF387S1) and 1.2% for markers with one copy (the remaining 23 Y-STRs) for the multiplex system. All 28 loci were detected in all female samples (Fig. 2B). The expected average coverage per locus as a percentage of total reads for a female sample is 3.57%. The observed average coverage for two-copy markers in 46 male samples in this study was 2.17% ± 1.02% and 1.56% ± 1.06% for single-copy markers, showing slightly higher coverage (on an allele basis) of the Y-STRs relative to the autosomal markers. The observed average coverage per locus for 49 female samples in this study was 3.57% ± 1.68%.
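The expected percentages quoted above can be reproduced with a short calculation. The sketch below assumes that, in a perfectly balanced multiplex, every allele copy receives an equal share of the reads; that assumption is inferred from the stated values rather than spelled out in the text. The locus counts are taken from the text: 30 two-copy markers (27 autosomal STRs, Amelogenin, DYS385a-b and DYF387S1) plus 23 single-copy Y-STRs for a male sample, and 28 two-copy loci for a female sample.

```python
def expected_coverage(n_two_copy, n_one_copy):
    """Expected percent of total reads per locus, assuming every allele
    copy receives an equal share of the reads.

    Returns (percent for a two-copy locus, percent for a one-copy locus);
    the second value is None when there are no one-copy loci.
    """
    total_copies = 2 * n_two_copy + n_one_copy
    two_copy_pct = 100.0 * 2 / total_copies
    one_copy_pct = 100.0 / total_copies if n_one_copy else None
    return two_copy_pct, one_copy_pct


# Male sample: 30 two-copy markers and 23 single-copy Y-STRs (83 copies).
male_two, male_one = expected_coverage(30, 23)

# Female sample: 28 loci, all present in two copies (56 copies).
female_two, _ = expected_coverage(28, 0)

print(round(male_two, 2), round(male_one, 2), round(female_two, 2))
# prints: 2.41 1.2 3.57
```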
Fig. 2. Box and whisker plots for inter- and intra-locus balance of the 53 ForenSeq MainstAY markers analyzed with the ForenSeq Universal Analysis Software. Inter-locus balance (normalized reads) is calculated as reads per locus for a sample as a percentage of total reads for the sample. Upper and lower whiskers depict maximum and minimum values, respectively. The line dividing the two whisker boxes represents the median value (i.e., 50% of values occur above the median). The bottom of the lower box (red) is the lower quartile value, indicating that 75% of values occur above this value and the top of the upper box (blue) is the third quartile value, indicating that 25% of the values occur above this value. A. Inter-locus balance of MainstAY markers in 45 male 1000 Genome samples and 2800 M. B. Inter-locus balance of markers in 49 female 1000 Genome samples. C. Intra-locus balance (allele count ratio) for autosomal STRs, DYF387S1 and DYS385a-b markers for the 93 1000 Genome samples, Control DNA NA24385 and 2800 M. Triallelic loci were not included in the analysis.
The intra-locus balance (or allele count ratio) was measured for heterozygous loci for each DNA sample excluding triallelic loci detected in these 94 cell-line DNA samples. The average intra-locus balance was > 80% for most loci with the exceptions of Amelogenin (76%), D19S433 (78%), D22S1045 (74%), D2S1338 (77%) and Penta E (77%) (Fig. 2C). These five loci have primers that cover a known SNP (recognized in the dbSNP database) with population frequencies > 1% [
]. The ForenSeq primer mixes contain degenerate primers to reduce null alleles with these known variants, though the variants may still affect intra-locus balance to some degree in some individuals.
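For reference, the allele count ratio used here is the read count of the lower-coverage allele divided by that of the higher-coverage allele (Min/Max), expressed as a percentage. A minimal sketch follows; the function name and the read counts are hypothetical.

```python
def allele_count_ratio(reads_a, reads_b):
    """Intra-locus balance of a heterozygous locus: the lower allele read
    count divided by the higher one (Min/Max), as a percentage."""
    high = max(reads_a, reads_b)
    if high == 0:
        raise ValueError("no reads observed for either allele")
    return 100.0 * min(reads_a, reads_b) / high


# Hypothetical read counts for the two alleles of one heterozygous locus.
print(allele_count_ratio(760, 1000))  # prints: 76.0
```

A ratio of 100% indicates perfectly balanced alleles, and the order in which the two alleles are passed does not matter.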
The mode of inheritance for the markers in ForenSeq MainstAY is well established and also was demonstrated previously in the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. The Y-STR DYS393, which is not targeted by the ForenSeq DNA Signature Prep Kit, has been studied previously and shows the expected segregation of sex-linked loci and uniparental inheritance [
]. The alleles observed in 17 of 3-generational family pedigrees are consistent with Mendelian and lineage-based inheritance (most data not shown). An example of inheritance at the D21S11 locus for the CEPH 1463 family is shown in Supplemental Fig. 1. Sample 12878 illustrates the increased resolution gained by sequencing D21S11: it is homozygous for 30 repeats as analyzed by CE but has two alleles when nucleotide-level data are analyzed. These two isometric alleles have the same repeat length but differ in sequence. The 11 offspring of 12878 (mother) show a distribution of these two isometric alleles, resulting in four different genotypes for this locus when each of the 30 repeat isometric alleles is combined with either the 28 or 29 repeat allele from the father (Supplemental Fig. 1).
Polymorphisms in the repeat number for the loci in MainstAY for the 94 samples (1000 Genome samples and 2800 M [
Promega. 2800M Control DNA 2017 [cited 2017]. Available from: https://www.promega.com/products/forensic-dna-analysis-ce/str-amplification/2800m-control-dna/?catNum=DD7101.
]) of different biogeographical ancestry were analyzed. The distribution of alleles for the loci in the multiplex for this small sample set is shown in Fig. 3A and B. The number of different alleles for each locus is also shown in Supplemental Table 4. The most highly polymorphic loci by repeat number include D1S1656, D2S1338, D6S1043, D12S391, D18S51, D19S433, D21S11, and DYS385a-b, and these results agree with those previously published [
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
Fig. 3. Distribution of repeat numbers in alleles for the 52 STR loci in ForenSeq MainstAY, in male and female samples for the experiment shown in Fig. 2. The number of instances (displayed on the y-axis) that a repeat number (displayed on the x-axis) was detected for a given locus (indicated by the different colors, see legend) is plotted for the 94 1000 Genome samples and 2800 M. A. Autosomal STR loci allele and repeat numbers for 95 samples. B. Y-STR loci allele and repeat numbers for 46 male samples.
The sequences of the alleles were also examined to determine the variation in repeat sequence (isometric alleles) as well as length for these 95 samples. The loci with the largest increase in number of distinct alleles when sequence information is utilized include D3S1358, D12S391, D21S11, D2S1338, vWA, D8S1179, DYS389II, DYF387S1, and DYS448 (Supplemental Table 4). This study did not detect sequence variation within the repeat regions for Penta E, FGA, D19S433, CSF1PO, D10S1248 and DYS612, which have been shown previously to have less improvement in discrimination power by MPS [
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
], likely due to the small sample set used in this study. The largest increase in number of alleles was observed for DYS389II. The number of distinct alleles increased 5-fold, from 4 to 20 alleles. The most polymorphic locus in this study was D12S391 with 39 distinct alleles (Supplemental Table 5). Diversity observed for this locus is consistent with larger population studies supporting an increase of heterozygosity with sequence data [
Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit.
]. Overall, these results are consistent with other studies and support the fundamental performance of MainstAY.
3.3 Determination of a default analytical threshold
Analytical thresholds (ATs) are established to assist in distinguishing alleles from background noise. Noise from sequencing differs from that of CE. CE uses fluorescence detection to locate amplicons; loci are distinguished by amplicon length and designated relative to an allelic ladder. For sequencing, the instrument cannot generate a sequence de novo. Noise can be generated by a contamination event such as cross-well contamination during library preparation, contamination of a workspace or reagents, or contamination of the instrument (see MPS-Specific Studies section). Noise can also arise from errors generated during PCR amplification or sequencing. The AT (%) for the default ForenSeq MainstAY Analysis Method in the UAS was confirmed to be similar to that of the ForenSeq DNA Signature Prep Kit in the UAS v1.3 from a negative amplification study (Materials and Methods). For MainstAY, two sequencing runs with a total of 190 negative amplification control libraries were analyzed with a 0% AT. Some reads that aligned to MainstAY locus amplicons were observed in 39 of the 190 libraries (21%). An average of 0.11 reads per locus with a standard deviation of 3.3 was observed across all loci in the 190 libraries. The average plus three standard deviations, 10.1 reads, is similar to the 9.15 (average + 3 SD) observed for the ForenSeq DNA Signature Prep Kit.
Analytical thresholds are implemented in the UAS as a percentage of total reads per locus, with a minimum of 650 reads per locus and a minimum of 10 reads for an AT setting. Greater than 10 reads corresponds to > 1.5% of 650 reads. Therefore, the MainstAY AT is set at > 1.5% of the total reads for loci with 650 or more total reads. For loci with fewer than 650 reads, the AT is > 1.5% of 650 or > 10 reads, with a minimum AT of > 10 reads in UAS v2.4.
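The threshold rule described above can be sketched as follows. This is one reading of the text and not the UAS implementation: the sketch assumes that loci with fewer than 650 reads are treated as having 650 reads and that the 10-read minimum is applied as a floor. The function and constant names are hypothetical.

```python
AT_FRACTION = 0.015    # 1.5% of the total reads at the locus
MIN_LOCUS_READS = 650  # loci with fewer reads are treated as having 650
MIN_AT_READS = 10      # minimum AT, in reads


def analytical_threshold(total_locus_reads):
    """Read count that an allele must exceed to be above the AT."""
    effective_total = max(total_locus_reads, MIN_LOCUS_READS)
    return max(AT_FRACTION * effective_total, MIN_AT_READS)


def above_at(allele_reads, total_locus_reads):
    """True if the allele read count exceeds the analytical threshold."""
    return allele_reads > analytical_threshold(total_locus_reads)


# Hypothetical loci: one with 2000 total reads, one with only 300.
print(above_at(31, 2000), above_at(29, 2000))  # prints: True False
print(above_at(11, 300), above_at(10, 300))    # prints: True False
```

Under these assumptions the 10-read floor governs any locus for which 1.5% of the total reads is below 10, including all loci with fewer than 650 reads (1.5% of 650 is 9.75).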
3.4 PCR conditions
A two-step PCR is used in library preparation of ForenSeq MainstAY [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]. The first PCR (PCR1) amplifies target STR loci and Amelogenin from a DNA sample and adds tag sequences to amplicon ends (Fig. 4). The second PCR (PCR2) adds adapters to the amplicons (Fig. 4). The adapters contain the sequences for attaching libraries to the flow cell for sequencing, the primer binding sites for initiating the sequencing of the targets, and the primer binding sites and index sequences for the identification of the indices. The PCR steps are the same chemistry (except for different primer mix and adapter indices) as those in the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Altering the concentrations of reagents in the PCR and the temperatures during cycling can determine the tolerance of performance windows around the manufacturer’s established conditions and what, if any, consequences arise in data when those conditions are altered. The following studies were designed to stress MainstAY PCR1 and PCR2 to determine the robustness of the library preparation process and detect potential limitations.
The MainstAY primer mix was tested with 10% concentration intervals up and down, relative to the control concentration with male and female DNA. The primer mix buffer was held at a constant concentration and only the total primer concentration was varied. The total reads were variable across the different primer concentrations, but the total reads showed no observed trends across the primer titrations for either male or female DNA libraries (Supplemental Fig. 2). More reads are detected for the male DNA libraries as these samples were sequenced at 56 samples per run and female DNA libraries were sequenced at 96 samples per run.
The primer concentration testing was performed on male and female DNA to assess the impact of varying primer concentration on a DNA sample as well as impact on templates that do not contain 25 of the 53 amplicons (Y-STR primers with female DNA). Intra-locus balance was not affected by changing primer concentration in the PCR1 (Fig. 5). There was no appreciable difference on intra-locus balance when varying the primer concentrations for male (Fig. 5A) and female (Fig. 5B) DNA.
Fig. 5. Impact of MainstAY primer concentration (PCR1) on intra-locus balance (allele count ratio). Total primer concentration was varied in 10% increments relative to the control levels (red, 100). Intra-locus balance was measured for the heterozygous loci (x-axis) in 2800 M male DNA (A) and NA12878 female DNA (B). The ratio of the reads for the two alleles (Min/Max) is plotted as a percentage on the y-axis.
All expected calls were generated for male and female DNAs across the varying levels of primer concentrations tested in the PCR1 (Supplemental Fig. 3). Some STR stutter reads exceeded the default stutter filters in the MainstAY Analysis Method in the UAS indicated as “Discordant (Stutter)” (Supplemental Fig. 3), with no discernable trend observed (data not shown).
The concentration of critical reagents in the PCR buffer formulations and thermal cycling conditions can have an impact on amplicon yield. Magnesium sulfate, potassium chloride, BSA, heat-stable DNA polymerases and dNTPs were each tested by increasing and decreasing the concentration 10%, 20% and 30% in both PCR1 and PCR2 (while holding all other components constant at control amounts) in the ForenSeq DNA Signature Prep Kit (unpublished results, [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]). Other components such as dimethyl sulfoxide, glycerol, or polyethylene glycol were tested at 0% or 100% levels. Of these reagents, only magnesium sulfate, potassium chloride and dNTPs influenced results (data not shown). There was no discernable effect modifying the concentrations of BSA or heat-stable DNA polymerases or the absence of DMSO, glycerol or PEG. The effects of modifying the dNTPs concentration were inversely correlated to the effects of modifying the concentration of magnesium sulfate but to a more modest degree. This observation is likely due to the interaction between dNTPs and magnesium ions with the result that at low dNTP concentration, free magnesium ion concentration is higher and at high dNTP concentration, free magnesium ion concentration is lower. PCR1 and PCR2 formulations are the same in the ForenSeq DNA Signature Prep Kit and ForenSeq MainstAY. Therefore, only magnesium sulfate and potassium chloride were tested with ForenSeq MainstAY.
Magnesium sulfate and potassium chloride concentrations were tested by increasing and decreasing the concentration 10%, 20% and 30% in both the PCR1 and PCR2 buffer formulations. Total reads for the libraries across all concentrations tested were higher than the recommended total per sample read number of 15,000 (Supplemental Fig. 4) with the exception of 70% potassium chloride concentration (Supplemental Fig. 4, orange bar). The total reads per sample increased as potassium chloride concentration was raised from 70% to 90%. Magnesium sulfate concentration had no impact on total reads per sample, nor did changes in either potassium chloride or magnesium sulfate in the PCR2 formulation (Supplemental Fig. 4). The negative amplification controls for all formulations of PCR1 or PCR2 tested had no detected reads.
The intra-locus balance was not affected by changing magnesium sulfate concentration in the PCR1 formulation (Fig. 6A), the PCR2 formulation (Fig. 6C) or potassium chloride concentration in PCR2 formulation (Fig. 6D). The intra-locus balance was affected by changing the potassium chloride concentration in PCR1 at the higher levels of potassium chloride (Fig. 6B) due to an observed size bias that impacted the longer alleles. Larger alleles or loci displayed lower reads at the 120–130% potassium chloride relative concentrations (Fig. 6B) (see Supplemental Table 2 for amplicon lengths).
Fig. 6. Impact of varying concentrations of potassium chloride or magnesium sulfate on intra-locus balance. Libraries were generated with 1 ng of NA24385 Control DNA with PCR1 or PCR2 buffers formulated with varying concentrations of magnesium sulfate (A and C) or potassium chloride (B and D). Concentrations were varied in 10% increments relative to the control levels in the PCR1 (A and B) and the PCR2 (C and D) buffer formulations.
All expected allele calls for all loci were observed for all conditions tested except for the highest level of potassium chloride in the PCR1 reaction buffer (Supplemental Fig. 5A and B). Both autosomal and Y-STR loci dropped below the AT with the 130% potassium chloride concentration (Supplemental Fig. 5A and B). The loci that were not detected in the libraries generated with PCR1 buffer at the 130% potassium chloride concentration were Penta D, Penta E, D22S1045, DYS522, DYS19, DYS389II, DYS390, DYS385ab, DYS460, DYS392, and DYS448, all of which are amplicons with a mean length greater than 250 bp. The inter-locus balance was therefore impacted by the potassium chloride concentration in the PCR1 formulation, as shown in Fig. 7. Inter-locus balance was not impacted by varying magnesium sulfate in PCR1 or magnesium sulfate and potassium chloride in PCR2 (data not shown). Stutter was slightly higher at the lowest concentrations of magnesium sulfate and potassium chloride for the PCR1 formulation, as shown by an increase in discordance due to stutter exceeding the stutter filters in the default MainstAY analysis method (Supplemental Fig. 5). Overall, the formulations are robust to varying amounts of magnesium sulfate and potassium chloride with the exception of high potassium chloride concentration in PCR1. The established concentrations in ForenSeq MainstAY are within a window of performance that supports practical forensic application.
Fig. 7. Inter-locus balance (normalized reads) for MainstAY NA24385 Control DNA libraries prepared using PCR1 buffer formulated with potassium chloride at varying concentrations for (A) autosomal STRs and Amelogenin or (B) Y-STRs. The average relative coverage is plotted for three replicates; error bars represent standard deviations.
In previous studies, the thermal cycling conditions were tested for ForenSeq DNA Signature Prep Kit by testing annealing and extension temperatures up and down two levels in 2-degree increments, denaturing temperatures up and down one level in 2-degree increments, and cycle numbers up and down two levels in 2-cycle increments (unpublished results, [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]). The only cycling parameter that impacted any of the metrics described above was lowering the extension temperature for PCR2. Lowering this temperature increased reads of the AT-rich loci D22S1045, DYS19 and DYS392 and reduced stutter (data not shown). The loci D22S1045 and DYS392 demonstrated specific amplification issues with ForenSeq DNA Signature Prep Kit such as heterozygote imbalance for large alleles and allele drop out, respectively ([
Verogen. ForenSeq DNA Signature Prep Reference Guide. Document # VD2018005. Rev. D 2022 [cited 2021]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-dna-signature-prep-reference-guide-PCR1-vd2018005-d.pdf.
]). However, the higher read depth of these loci observed with the MainstAY PCR2 cycling conditions reduces these issues. Therefore, ForenSeq MainstAY utilizes a PCR2 thermal cycling condition with the extension temperature lowered two degrees, relative to ForenSeq DNA Signature Prep Kit, for improved read depth and reduced stutter for these two loci. The modification to the PCR2 thermal cycler conditions had no impact on the total sample reads, or inter- and intra-locus balance and stutter for the other loci in the kit (data not shown).
The MainstAY primer sequences are the same as those in the ForenSeq DNA Signature Prep Kit except for the addition of primers for DYS393. Also note that DNA Primer Mix B (DPMB) in the ForenSeq DNA Signature Prep Kit contains primers for seven X-STRs, 94 identity-informative SNPs and 78 ancestry- and phenotypic-informative SNPs, none of which are targeted by MainstAY. The effects of multiplexing on the 52 primer pairs common to the two kits can therefore be assessed by comparing the 53 primer pairs in DNA Primer Mix C (DPMC) of ForenSeq MainstAY with the 237 primer pairs in DPMB. Libraries were generated with both ForenSeq MainstAY and the ForenSeq DNA Signature Prep Kit for three DNA samples (Control DNA NA24385 of the ForenSeq MainstAY kit, the SRM 2391d B NIST PCR standard and Control DNA 2800M of the ForenSeq DNA Signature Prep Kit) and sequenced following the recommendations in the reference guides. The allele calls for the 52 loci in common were identical for the three DNA samples tested. The reads per locus were compared between DPMB and DPMC for the three DNA samples, with better inter-locus balance across the multiplex generated using MainstAY (DPMC, Supplemental Fig. 6). Nevertheless, sequencing the libraries at the recommended sample plexities resulted in coverage at or above the recommended minimum total reads per sample and accurate allele calls for each multiplex and primer set.
3.5 Sensitivity studies
PCR-based enrichment assays are able to detect alleles from sub-nanogram quantities of DNA [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples.
Zavala E.A.-O., Rajagopal S., Perry G.H., Kruzic I., Bašić Ž., Parsons T.J., et al. Impact of DNA degradation on massively parallel sequencing-based autosomal STR, iiSNP, and mitochondrial DNA typing systems. 2019(1437–1596 (Electronic)).
]. Dilution (or titration) series can allow assessment of the dynamic range for DNA input for the ForenSeq MainstAY multiplex on the MiSeq FGx sequencer.
To assess the limits of sensitivity of detection and, concomitantly, potential stochastic effects, DNA was serially diluted twofold (1:2) from 500 pg/μL down to ∼1 pg/μL, and eight microliters of each dilution were added to PCR1 for library generation, with four replicates at each DNA input [
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]. Libraries were sequenced at 96-plexity, and total reads per sample for each replicate were averaged and plotted to measure the impact of DNA input on overall library generation (Supplemental Fig. 7). While the number of reads per sample decreased slightly for DNA inputs of 4 ng down to 500 pg, inputs in this range performed similarly in MainstAY library preparation. The bead-based normalization in ForenSeq MainstAY achieved optimal numbers of clusters on the MiSeq FGx for each of the libraries generated from DNA amounts in this range [
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]. As input DNA decreased below 500 pg, so did reads per sample and per amplicon. The average intensity or reads per marker by marker type were calculated for four replicate libraries at each DNA input showing that the average reads per marker track with DNA input (Supplemental Fig. 8).
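The dilution series described above can be sketched in a few lines. This is only an illustrative calculation, not part of the published workflow: the starting concentration (500 pg/μL) and the volume added to PCR1 (8 μL) come from the text, while the number of levels (10) is an assumption chosen so that the series ends near 1 pg/μL. The function name is hypothetical.

```python
# Twofold serial dilution: each level has half the concentration of the previous one.
START_PG_PER_UL = 500.0  # starting concentration, from the text
VOLUME_UL = 8.0          # volume of DNA added to PCR1, from the text
N_LEVELS = 10            # assumption: 10 levels takes 500 pg/uL down to ~1 pg/uL


def dilution_series(start_pg_per_ul, n_levels, volume_ul):
    """Return a list of (concentration in pg/uL, DNA input in pg) per dilution level."""
    series = []
    conc = start_pg_per_ul
    for _ in range(n_levels):
        series.append((conc, conc * volume_ul))
        conc /= 2.0  # twofold dilution step
    return series


if __name__ == "__main__":
    for conc, input_pg in dilution_series(START_PG_PER_UL, N_LEVELS, VOLUME_UL):
        print(f"{conc:8.2f} pg/uL -> {input_pg:8.1f} pg input")
```

Under these assumptions the inputs run from 4000 pg (4 ng) down to roughly 8 pg, which matches the range of inputs reported for the sensitivity study.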
Despite the lower reads per sample and per amplicon below 500 pg, all 78 expected alleles were detected in all replicates from 4 ng down to 62 pg (Fig. 8). At 62 pg, there were three, three, one and two discordant alleles for each of the four replicates, respectively (Supplemental Table 6). All of these discordant calls were in the stutter positions of the expected alleles. The reads for the expected alleles at these loci ranged from 82 to 692, with reads for the called stutter from 22 to 124 (Supplemental Table 6). The called stutter fell into the three categories of n-1, n+1 or n-2 at 13–32%, 8% or 4% of the parent allele, respectively (Supplemental Table 6). Increased stutter has been observed with low input DNA in CE analyses [
Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples.
]. Below 62 pg, some expected alleles and loci fell below the AT (Fig. 8). At 8 pg across the four replicates, on average 32 autosomal STR loci (61% of all expected autosomal alleles) and 20 Y-STR loci (70% of all expected Y alleles) still were detected (Fig. 8).
Fig. 8. Sensitivity Study. Four replicate libraries were generated with NA24385 Control DNA serially diluted for inputs from 4 ng to 8 pg. The average outcomes for the genotype (auSTRs) and haplotype results, designated as “Concordant” (blue), “Below AT” (green) or “Discordant (Stutter)” (orange), were calculated as a percentage of the total outcomes (y-axis) and plotted for each DNA input (x-axis). Error bars designate the standard deviation for the four replicate measurements. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The average coverage of the autosomal markers in MainstAY for the 1 ng input libraries was 2.33% ± 0.29% of the total reads and the average coverage of the Y markers was 1.38% ± 0.19% of the total reads (Fig. 9). The expected coverage (assumes consistent coverage across the markers for each general category) is 2.48% and 1.24% for autosomal markers and Y markers, respectively. The inter-locus balance of the markers, calculated as total reads per marker as a percentage of the total reads per sample, was maintained for inputs from 4 ng down to 62 pg (Fig. 8, Fig. 9). Below 62 pg, stochastic effects on inter-locus balance were exacerbated with the average coverage of autosomal markers dropping to 2.0% ± 1.99% and average coverage of the Y markers increasing to 1.76% ± 1.80% at 8 pg of DNA input (Fig. 8, Fig. 10).
Fig. 9. Inter-locus balance (normalized reads) among the markers in the MainstAY multiplex for Control DNA NA24385 inputs from 62 pg to 4 ng. (A) autosomal STRs and Amelogenin; (B) Y-STRs. DNA input amounts: 4 ng (green diamonds), 2 ng (light blue squares), 1 ng (dark blue circles), 500 pg (orange triangles), 250 pg (pink diamonds), 125 pg (teal squares) and 62 pg (red circles). The standard deviations for four replicates are indicated by the error bars. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. Inter-locus balance (normalized reads) among the markers in the MainstAY multiplex for Control DNA NA24385 inputs of 8, 16 and 31 pg and 1 ng. (A) autosomal STRs and Amelogenin; (B) Y-STRs. DNA input amounts: 31 pg (green triangles), 16 pg (pink diamonds) and 8 pg (teal squares) of Control DNA NA24385 input compared to the 1 ng (blue circles). The standard deviations for four replicates are indicated by the error bars. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Inter-locus variability among the four replicates from 62 pg to 4 ng was consistent (Fig. 9). Below 62 pg, the inter-locus balance shifted such that some reads were observed at counts less than the AT at some loci and/or alleles (Fig. 10). The variability for the relative coverage of the loci also increased at inputs below 62 pg, with the standard deviations increasing for 31, 16 and 8 pg compared to the higher input ranges. Fig. 9 and Fig. 10 were plotted on the same scale for the y-axis to illustrate the larger error bars at the lower inputs (8–31 pg) compared to 1 ng in Fig. 10 and the higher inputs (62 pg – 4 ng) in Fig. 9.
Overall, complete profiles could be obtained across a broad range of DNA inputs (62 pg to 4 ng) for Control DNA NA24385 (Fig. 8, Fig. 9). At DNA inputs of 8, 16 and 31 pg, stochastic effects were observed and partial profiles for Control DNA NA24385 were obtained (Fig. 8, Fig. 10). However, even at 8 pg many loci/alleles still were detected. Stutter filters could be adjusted upward for low input samples (e.g., less than 250 pg), as may be considered during internal validation studies by each laboratory.
3.6 Stability studies
Resilience to known PCR inhibitors can vary among commercial kits for CE and for MPS-based detection. Modifications to components of the enrichment process, such as formulation of the ForenSeq Enhanced PCR1 Buffer System, are likely to further increase inhibitor resistance, as they have for CE-based systems [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
]. Complete profiles were obtained with MainstAY for DNA that was incubated with inhibitors at 20 μM hematin and 0.5 μM tannic acid, 80% of the alleles were detected with 0.63 ng/μL humic acid (Supplemental Fig. 9), and a complete profile was detected with 0.125 ng/μL humic acid (data not shown); the DNA was incubated with each inhibitor prior to addition to the PCR1 reaction (see Materials and Methods). In one of the four replicate libraries generated in the presence of 40 μM hematin, stutter was called at locus D8S1179 (Supplemental Fig. 9). The expected alleles of 13 and 16 repeats for Control DNA NA24385 were called; however, stutter was also called at 12 and 15 repeats at 28.3% and 25% of the parent alleles, respectively. The stutter filter is set such that stutter is called if it is ≥ 25% of the parent allele. The amplification was resistant to DNA incubated with indigo at 133 μM (Supplemental Fig. 9). Purification of DNA generally can remove these PCR inhibitors. The ForenSeq Enhanced PCR1 Buffer System is also available [
Verogen. ForenSeq MainstAY Kit with the Enhanced PCR1 Buffer System Reference Guide. Document # VD2021049. Rev. B 2022 [cited 2022]. Available from: https://verogen.com/wp-content/uploads/2022/04/forenseq-mainstay-reference-guide-ePCR1-vd2021049-b.pdf.
], but this system was not used in this study.
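The stutter-filter rule quoted above for D8S1179 (stutter is called when it reaches 25% or more of the parent allele) amounts to a simple ratio test. The sketch below is a hypothetical illustration, not the UAS implementation: the function names and the read counts in the example are invented, and only the 25% threshold and the "greater than or equal" comparison come from the text.

```python
def stutter_ratio(stutter_reads, parent_reads):
    """Reads in the stutter position as a fraction of the parent allele reads."""
    if parent_reads <= 0:
        raise ValueError("parent_reads must be positive")
    return stutter_reads / parent_reads


def stutter_is_called(stutter_reads, parent_reads, filter_fraction=0.25):
    """True when the stutter product meets or exceeds the filter and is therefore called."""
    return stutter_ratio(stutter_reads, parent_reads) >= filter_fraction


if __name__ == "__main__":
    # Hypothetical read counts chosen only to show behavior at the threshold.
    print(stutter_is_called(100, 400))  # ratio 0.25, at the filter
    print(stutter_is_called(99, 400))   # ratio just under the filter
```

A product exactly at the filter value is called under this reading of the rule, which is consistent with the 25% stutter at 15 repeats being reported as a called allele.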
The performance of ForenSeq MainstAY was assessed with degraded DNA, prepared as described in the Methods section. For this study, the allele calls for a control sample with a degradation index of 1 (no treatment) were considered the correct allele calls, and the degradation series (sonicated samples) were compared to this control. As would be expected for degraded samples, larger alleles and loci were not detected in the samples with higher degradation indices, especially those amplicons greater than 270 bp in length, which include D12S391, Penta E, DYS612, DYS19, DYS389II, DYS390, DYS522, DYS392, DYS460, and DYS448 (Supplemental Fig. 10; see Supplemental Table 2 for amplicon lengths). Even for the sample with the highest degradation index (56), 37, 37 and 39 of the 78 expected alleles were detected in the three replicates (48% on average) in the sequencing run with 96 samples, and 39, 45 and 47 of the 78 expected alleles were detected in the three replicates (56% on average) in the run with 32 samples (Supplemental Fig. 10). Comparing the higher and lower number of samples in sequencing runs, the allele call rate was similar, but more alleles were detected with the lower plexity run (Supplemental Fig. 10). The average reads per amplicon, however, showed a bigger difference between the two runs, with approximately 4.5-fold higher reads per amplicon for the 32-sample run compared to the 96-sample run (Supplemental Fig. 10). The expectation from the sample numbers alone is a 3-fold difference in reads per amplicon, but the reads also depend on the total cluster density of the sequencing run. The cluster density was 508 clusters per mm2 for the 96-sample sequencing run and 763 clusters per mm2 for the 32-sample sequencing run, which results in a 1.5-fold increase in data recovered for the 32-sample run. Together, the larger number of samples and the lower cluster density of the 96-sample sequencing run result in a total 4.5-fold (1.5 × 3 = 4.5) reduction in reads relative to the 32-sample run. For highly degraded samples, a lower number of samples per sequencing run may be considered to achieve higher reads per sample. Overall, ForenSeq MainstAY performed well on degraded DNA and similarly to the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Optimizing DNA recovery and forensic typing of degraded blood and dental remains using a specialized extraction method, comprehensive qPCR sample characterization, and massively parallel sequencing.
].
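The fold difference in reads between the 96-sample and 32-sample runs in the degraded DNA study can be reproduced with a short calculation. This sketch assumes, as the text implies, that reads per sample scale inversely with the number of samples on the run and linearly with cluster density; the function name is hypothetical, and the sample numbers and cluster densities are the values reported above.

```python
def expected_read_fold(samples_ref, samples_new, density_ref, density_new):
    """Expected fold increase in reads per sample for the new run relative to the reference run.

    Assumes reads per sample are proportional to cluster density and
    inversely proportional to the number of samples sharing the flow cell.
    """
    plexity_factor = samples_ref / samples_new   # 96 / 32 = 3
    density_factor = density_new / density_ref   # 763 / 508, about 1.5
    return plexity_factor * density_factor


if __name__ == "__main__":
    fold = expected_read_fold(samples_ref=96, samples_new=32,
                              density_ref=508, density_new=763)
    print(f"Expected fold increase for the 32-sample run: {fold:.2f}")
```

With the reported values the product comes out close to 4.5, in line with the observed difference in reads per amplicon.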
Concordance studies with ForenSeq MainstAY were conducted by comparing the allele calls from this study with previously published results for 2800M and the three NIST SRM 2391d DNAs [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
NIST. Certificate of Analysis: Standard Reference Material 2391d PCR-Based DNA Profiling Standard 2019 [cited 2021]. Available from: https://www-s.nist.gov/srmors/certificates/2391d.pdf.
Promega. 2800M Control DNA 2017 [cited 2017]. Available from: https://www.promega.com/products/forensic-dna-analysis-ce/str-amplification/2800m-control-dna/?catNum=DD7101.
]. Results for the nine 1000 Genomes Project samples and Control DNA NA24385 from the UAS also were compared to their respective CE results. Accuracy was calculated as the number of expected alleles detected as a percentage of the total expected alleles. Call rate was calculated as the number of expected alleles detected as a percentage of all alleles detected. All expected calls were detected in the UAS for the 14 DNA samples in all replicates for 100% accuracy (Supplemental Table 7). Some samples also had some stutter products that exceeded the stutter filters in the default ForenSeq MainstAY Analysis Method, especially the cell line 1000 Genomes Project samples, resulting in call rates of 99.7% and 99.56% for autosomal and Y STRs, respectively (Supplemental Table 7).
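The two metrics defined above differ only in their denominators, which a small example makes concrete. The sketch below is illustrative: the function names and the toy genotype are hypothetical, and alleles are represented as (locus, allele) pairs purely for convenience.

```python
def accuracy(expected, detected):
    """Expected alleles detected, as a percentage of all expected alleles."""
    expected, detected = set(expected), set(detected)
    if not expected:
        raise ValueError("expected allele set is empty")
    return 100.0 * len(expected & detected) / len(expected)


def call_rate(expected, detected):
    """Expected alleles detected, as a percentage of all alleles detected."""
    expected, detected = set(expected), set(detected)
    if not detected:
        raise ValueError("detected allele set is empty")
    return 100.0 * len(expected & detected) / len(detected)


if __name__ == "__main__":
    # Hypothetical two-locus profile with one extra call in a stutter position.
    expected = {("D8S1179", "13"), ("D8S1179", "16"), ("TH01", "6"), ("TH01", "9.3")}
    detected = expected | {("D8S1179", "12")}
    print(accuracy(expected, detected))   # all expected alleles found
    print(call_rate(expected, detected))  # lowered by the extra stutter call
```

In this toy case accuracy stays at 100% while the call rate falls to 80%, which is the same pattern reported above: stutter above the filter lowers the call rate without affecting accuracy.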
Repeatability was measured for one operator who prepared two sets of 96 libraries of Control DNA NA24385 at 1 ng DNA input for each sample. All expected alleles were detected for the loci in the multiplex for the 192 libraries sequenced resulting in accuracy of 100%. There were two and five stutter products that exceeded the default stutter filters in the ForenSeq MainstAY Analysis Method for the first and second set of 96-sample runs, respectively (Supplemental Table 8). The four loci with stutter exceeding the stutter filter were D10S1248, Penta E, D16S539 and DYS385a-b (Supplemental Table 8). The amount by which stutter exceeded the filters was modest (0.1 – 1.3%) (Supplemental Table 8).
Reproducibility was measured using the runs described above with two additional operators, who were not part of the repeatability study, generating 96 libraries of Control DNA NA24385 at 1 ng DNA input for each sample. All expected alleles were detected for the loci in the multiplex for the 384 libraries sequenced resulting in an accuracy of 100%. There were three stutter products that exceeded the default stutter filter for one of the two sets of 96 libraries for the two operators at the loci DYS612 and DYS385a-b (Supplemental Table 8). Again, the amount by which stutter exceeded the filters was modest (0.2 – 3%) (Supplemental Table 8).
Precision was calculated as the number of expected alleles detected as a percentage of all alleles detected for all replicates of Control DNA NA24385. Compiling the results for the 384 libraries, a precision of 99.97% and 99.94% was calculated for autosomal STRs and Y-STRs, respectively (Supplemental Table 7).
ForenSeq MainstAY, like the ForenSeq DNA Signature Prep Kit, exhibits high accuracy and precision. In this study, some stutter exceeded the stutter filters in the MainstAY default analysis method. Laboratories are recommended to consider performing a stutter study as part of internal validation for ForenSeq MainstAY and to implement default or adjusted stutter filter thresholds that are applicable for the intended use(s).
3.8 DNA mixture studies
Two-person DNA mixture studies were performed to determine the ability of ForenSeq MainstAY with the MiSeq FGx and ForenSeq UAS to detect mixtures and accurately type distinct (i.e., discernable and unshared) alleles from major and minor contributors. Male:male, male:female and female:male mixtures were assessed to determine the ability to detect different types of mixtures and to detect Y-STRs in female:male mixtures. Under this system the UAS designates a sample as a ‘Mixture’ if three or more loci have more alleles typed than are possible for a single source sample and ‘Inconclusive’ if one or two loci have more alleles typed than are possible for a single source sample (note: ‘Inconclusive’ in this context does not mean uninterpretable). Single source samples have no loci with allele numbers that exceed the expected number of alleles, prior to interpretation and editing (e.g., if elevated stutter or a triallelic locus is observed).
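The designation rule described above can be written as a short decision function. This is a sketch of the rule as stated in the text, not the UAS source code: the function names, the example loci and the per-locus maxima are hypothetical.

```python
def count_exceeding_loci(typed_allele_counts, max_single_source):
    """Number of loci with more typed alleles than a single source sample allows."""
    return sum(
        1
        for locus, n_typed in typed_allele_counts.items()
        if n_typed > max_single_source[locus]
    )


def designate_sample(n_exceeding_loci):
    """Apply the rule from the text: 3+ loci -> Mixture, 1-2 loci -> Inconclusive."""
    if n_exceeding_loci >= 3:
        return "Mixture"
    if n_exceeding_loci >= 1:
        return "Inconclusive"
    return "Single source"


if __name__ == "__main__":
    # Hypothetical maxima: two alleles per autosomal STR, one per single-copy Y-STR.
    max_alleles = {"D8S1179": 2, "TH01": 2, "FGA": 2, "DYS19": 1}
    typed = {"D8S1179": 3, "TH01": 4, "FGA": 2, "DYS19": 2}
    print(designate_sample(count_exceeding_loci(typed, max_alleles)))
```

Because the rule is applied before interpretation and editing, a single source sample with elevated stutter or a triallelic locus could be flagged ‘Inconclusive’ by this logic.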
The default AT in the UAS (>1.5%) was used for the majority of the loci, meaning that detecting autosomal alleles from minor contributors at a 99:1 ratio and higher would be unlikely in most cases (under an assumption of extremely accurate pipetting and the absence of stochastic sampling). Further, the default UAS setting in the MainstAY Analysis Method is 20% for an isometric allele such that mixtures higher than a 3:1 ratio are automatically called if those reads are present at 20% or higher of the total reads for that repeat allele length. Finally, minor contributor alleles that fall into stutter positions were not typed if the reads fell below the stutter filters in the default ForenSeq MainstAY Analysis Method in the UAS. No alleles were manually typed in this analysis. In laboratory use these alleles may be called manually in the UAS at user discretion or as defined in operating procedures.
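A rough calculation shows why the 1.5% AT limits detection of minor contributor alleles at high mixture ratios. The sketch below rests on idealized assumptions that are not stated in the source: the AT is treated as a fraction of the total reads at a locus, both contributors are diploid at the locus, reads are exactly proportional to template copies, and stutter and stochastic effects are ignored. The function names are hypothetical.

```python
AT_FRACTION = 0.015  # default AT of 1.5%, assumed here to apply to the reads at a locus


def expected_allele_fraction(minor_proportion, copies):
    """Idealized share of locus reads for an unshared minor contributor allele.

    copies is 1 for a heterozygous allele and 2 for a homozygous allele.
    """
    if copies not in (1, 2):
        raise ValueError("copies must be 1 or 2")
    return minor_proportion * copies / 2.0


def exceeds_at(fraction, at_fraction=AT_FRACTION):
    """True when the expected read fraction is above the analytical threshold."""
    return fraction > at_fraction


if __name__ == "__main__":
    for label, minor in [("19:1", 0.05), ("39:1", 0.025), ("99:1", 0.01)]:
        het = expected_allele_fraction(minor, 1)
        hom = expected_allele_fraction(minor, 2)
        print(f"{label}: heterozygous {het:.4f} (above AT: {exceeds_at(het)}), "
              f"homozygous {hom:.4f} (above AT: {exceeds_at(hom)})")
```

Under these assumptions, a 99:1 mixture leaves even a homozygous minor allele below the AT, whereas a 19:1 mixture puts a heterozygous minor allele above it.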
As expected, as the minor contributor decreased in proportion, allele non-detection increased (Fig. 11 and Supplemental Fig. 11). Those minor contributor autosomal alleles observed were reliably typed at a minor contributor level of 5% or higher for male:male, female:male and male:female mixtures. Approximately 20 of 32 expected, 16 of 36 expected and 10 of 31 expected distinct (unshared) autosomal alleles were detected for the minor contributor at the 5% proportion for female:male, male:female and male:male mixtures, respectively (Fig. 11 and Supplemental Fig. 11 A). At 2.5% minor contributor proportion, the observed male minor contributor autosomal alleles (10 distinct typed alleles of the 32 expected) were reliably typed in the female:male mixture samples (Fig. 11 and Supplemental Fig. 11 A). For the male:male and male:female mixture samples, the minor alleles were often below the AT of 1.5% or fell in stutter positions of the major contributor and therefore were not automatically called in this validation study for minor contributor proportions of 2.5% or less; these mixtures had approximately two to four distinct alleles detected at a 2.5% minor contributor proportion of the expected 31 or 36 alleles (Fig. 11 and Supplemental Fig. 11 A).
Fig. 11. Detection of distinct, unshared minor alleles in three types of two-person mixture samples at varying mixture ratios with 1 ng total DNA input per sample for MainstAY. The average detection and standard deviations of the distinct, unshared minor contributor alleles in two-person DNA mixtures are plotted versus the percentage of the minor contributor for autosomal and Y-STRs at each mixture ratio tested in triplicate. The 0.2%, 0.4% and 0.5% minor contributor levels were not tested for the Male:Male mixture. Note there are no Y-STR results displayed for the Male:Female mixture because the female is the minor contributor. The standard deviations for three replicates are indicated by the error bars. The maximum number of distinct, unshared alleles is indicated by the lines (green for Female:Male, blue for Male:Female, and orange for Male:Male). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The same general trends described above for the minor contributor with autosomal alleles were observed with Y-STR alleles. For the male minor contributor in the female:male mixtures 7–12 Y-STR alleles were detected from the replicate libraries with the male minor contributor proportion at 1% with increasing numbers of alleles typed as the proportion of minor contributor increased (Fig. 11 and Supplemental Fig. 11B). One to four alleles were typed in female:male mixtures less than the 1% minor contributor proportion. These results are similar to those obtained with the ForenSeq DNA Signature Prep Kit [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
].
To quantitatively determine contributor ratios from ForenSeq MainstAY sequence results, the proportion of minor contributor in the mixture samples can be calculated from loci with discernable and unshared (i.e., distinct) minor contributor genotypes. The proportion for each locus was calculated by dividing the sum of the reads for these minor contributor genotypes by the total reads for the locus. The proportions for all loci with discernable and unshared minor contributor genotypes were averaged for each sample. These individual proportions were averaged, and standard deviations calculated for the three replicates for the three sets of mixture types at each mixture ratio (Fig. 12). For the 1:1 mixture, technically there is no minor contributor as the proportion is expected to be 50% (but a minor was designated based on the status for the other mixture ratios in a series). The calculated values ranged from 27% to 73% across all loci for the replicates of the three different mixture types. The 90% confidence interval for all loci ranged from 39% to 67%. The 90% confidence intervals for female:male mixtures were 22 – 33%, 7 – 16%, 0 – 10%, 0 – 6%, and 0 – 3%, for 3:1, 9:1, 19:1, 39:1, and 99:1, respectively (Fig. 12). The results are consistent across autosomal and Y-STRs for the male:male mixtures (Fig. 12). Proportions of the distinct alleles from the minor contributor were consistent with the mixture ratios. For example, distinct autosomal alleles for the female:male mixtures at a ratio of 19:1 (or 5% male contribution), ranged from 0% to 12% of the total reads per locus with an average of 6% ± 3% of the reads. It is important to note that the sequence information for the repeats reveals more distinct alleles than could be detected by CE. However, calling these alleles at the lower minor contributor proportions using default UAS analysis settings may require manual editing.
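The calculation of the minor contributor proportion described above can be sketched as follows. The read counts are invented for illustration and the function names are hypothetical; only the procedure (per-locus proportion, average across loci within a sample, then mean and standard deviation across replicates) follows the text.

```python
from statistics import mean, stdev


def locus_minor_proportion(minor_reads, locus_total_reads):
    """Sum of reads for the distinct minor contributor alleles over total locus reads."""
    if locus_total_reads <= 0:
        raise ValueError("locus_total_reads must be positive")
    return sum(minor_reads) / locus_total_reads


def sample_minor_proportion(loci):
    """Average the per-locus proportions across all informative loci in one sample."""
    return mean([locus_minor_proportion(m, t) for m, t in loci])


if __name__ == "__main__":
    # Hypothetical replicates; each locus is (reads of distinct minor alleles, total locus reads).
    replicates = [
        [([30, 20], 1000), ([70], 1000)],
        [([40], 1000), ([60], 1000)],
        [([25, 25], 1000), ([90], 1000)],
    ]
    per_replicate = [sample_minor_proportion(r) for r in replicates]
    print(f"mean = {mean(per_replicate):.3f}, sd = {stdev(per_replicate):.3f}")
```

Only loci where the minor contributor genotype is fully discernable and unshared should be passed in; loci with shared alleles would bias the estimate.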
Fig. 12. Quantitative determination of mixture ratios in DNA Mixture Study. The average calculated (observed) proportions of reads for alleles from the minor contributor with standard deviations were plotted versus the proportions that were targeted in the study design for two-person mixtures. MainstAY Y-STR reads were measured for the Female:Male and Male:Female mixture samples though proportions were not calculated. The standard deviations for three replicates are indicated by the error bars.
To assess MPS features that are distinct from CE systems, studies were performed that tested the indexed adapters that allow libraries to be multiplexed together for sequencing, the effects of multiplexing libraries at different numbers of samples (plexity) per sequencing run, and run-to-run carry-over of samples on the MiSeq FGx sequencer.
Results from sequencing of 93 Coriell DNAs, 2800M and MainstAY Control DNA NA24385 described in the Genetic Characterization section demonstrated that demultiplexing of libraries using the 96 UDI adapters in the MainstAY kit performed as intended. The negative amplification control had no aligned reads, and the 49 female samples sequenced showed no reads for the Y-STRs, indicating no detectable cross-contamination of the samples or issues with demultiplexing (which would have similar hallmarks as cross-contamination). All libraries exhibited total aligned reads greater than the 15,000 sample read guideline in the UAS, except the negative amplification control which had no reads.
In a second study, 96 libraries were prepared with 1 ng of Control DNA NA24385 and sequenced as three 32-sample sequencing runs. Information for all samples was included in the run creation (sample name, index information, assay type, sample type, project and analysis method) for the three runs, even when the library was not physically present on the flow cell (Supplemental Fig. 12). If there were issues with demultiplexing of the samples, reads could be assigned to some of the 64 samples included for each run creation with no corresponding library. No reads were detected for the libraries that were not physically included in the specific MiSeq FGx run (Supplemental Fig. 12). These runs intentionally contained only 32 samples each to yield a high number of reads for the actual libraries (from approximately 100,000 reads to greater than 200,000 reads per library) to increase the chances of detecting an issue. Given that no aligned reads were detected above the 10-read minimum threshold in the UAS for the other index combinations, the errors in demultiplexing (sequencing error, PCR error, contamination, or signal crosstalk) were estimated to be less than 0.011% (11 reads, the lowest number that can be detected, divided by 100,000 reads, the lowest total sample reads detected in these experiments).
Unlike CE instrumentation, MPS systems for STR analysis do not exhibit saturation of signal with high DNA inputs, though for MPS the concept of an “over-clustered flow cell” exists. The more amplified material that goes onto the sequencer, the more reads per sample. There is an optimal range of total library concentration in the pool that is loaded into the sequencing cartridge. A low library pool concentration can result in low cluster density and low reads per sample. Conversely, if the library pool concentration exceeds the optimal amount and the flow cell is overloaded, the number of clusters exceeds the recommended density, which in turn can reduce the number of clusters passing the quality filter. The result of an over-clustered flow cell is a loss of data due to the filter. When the library pool concentration is in or near the optimal range (6 – 20 pM) [
Illumina I. MiSeq System Denature and Dilute Libraries Guide. Document # 15039740. v10. 2019.
], the potential clusters on the flow cell are distributed across the samples in the pool. Therefore, the fewer samples in a pool, the higher the number of reads per sample. This outcome is illustrated in Supplemental Figure 13 which compares the average reads per library for replicate libraries generated with varying inputs of DNA at three different sample plexities (numbers of samples per run). Cluster densities for the three sequencing runs were 1149 K/mm2, 1084 K/mm2, and 1438 K/mm2 for the 96-, 66- and 33-sample plexities, respectively. The fold differences observed among the sample reads per library were as expected: 1.5-fold higher average reads per DNA input for 66 samples compared to 96 samples, 2-fold higher average reads per DNA input for 33 samples compared to 66 samples, and 3-fold higher average reads per DNA input for 33 samples compared to 96 samples.
The average reads per marker type (autosomal STRs and Amelogenin or Y-STRs) also are impacted by the sample plexity of the sequencing run (Supplemental Figure 14). Again, the expected fold difference in average reads per marker type (autosomal STR, Y-STR) was observed when the different total sample numbers per sequencing run were compared: 1.5-fold higher average reads per marker type for 66 samples compared to 96 samples, 2-fold higher average reads per marker type for 33 samples compared to 66 samples, and 3-fold higher average reads per marker type for 33 samples compared to 96 samples (Supplemental Figure 14).
In the sensitivity study described above for the 96-sample sequencing run, detection of all expected alleles dropped below 100% at DNA template inputs less than 62 pg (MainstAY Control DNA NA24385). In contrast, 100% detection of all expected alleles was observed at the 31 pg input level on the 33- and 66-sample sequencing runs, due to the higher reads per sample generated on these runs (Supplemental Figure 15). Increasing the reads per sample (by decreasing the number of samples per sequencing run) recovered alleles for 31 pg samples such that full profiles were typed. For the two lowest DNA inputs tested (16 and 8 pg), the numbers of alleles with reads fewer than the AT were not statistically different (two-tailed Student’s t test with p values ranging from 0.08 to 0.99) for the three different sample number runs (Supplemental Figure 15 and Supplemental Table 9). However, the reads per amplicon or locus were higher with the lower sample plexities, and the same trends were observed as for the reads per marker type (Supplemental Figure 14) and per sample (Supplemental Figure 13), as shown in Supplemental Figures 16 and 17. At the lowest DNA inputs, decreasing the sample plexity per run brought the reads per locus from just higher than the 10 read minimum AT (i.e., 16–230 reads at 8 pg) to greater than 100 reads for 64% of the loci (i.e., 57 – 1072 reads at 8 pg) (Supplemental Figure 17). Though reducing sample numbers per run increases per locus reads, allele non-detection may still be observed if stochastic effects are exacerbated, as described below.
Data indicated that the number of reads per locus also impacted the inter-locus balance of the MainstAY multiplex. At the lower DNA inputs (less than 62 pg), the inter-locus balance showed higher variability due to stochastic effects (see above, Sensitivity Study). Reducing the sample number per run and increasing the reads per sample did not overcome the stochastic effects observed with the 96-sample sequencing run (Supplemental Figure 18). Comparing the inter-locus balance at the three sample plexities during sequencing (33, 66, or 96 samples per run) for 4 ng, 1 ng, 125 pg and 16 pg, the standard deviations for the relative marker coverage were highest for all number of samples per sequencing run for the 16 pg input (Supplemental Figure 18).
Overall, reducing the number of samples on a sequencing run increased the total reads for a sample. At some low DNA input levels, reducing the sample plexity from 96 to 66 or 33 yielded an overall increase in loci with reads that exceeded the default AT in the UAS. Operators may therefore consider sequencing challenging samples, such as but not limited to low input samples and/or mixtures, at fewer than 32 samples per run (see Degraded DNA study in Stability Studies, above). Though the cost of a run may be increased, in certain scenarios the added cost may be justified by the improved data recovery.
ForenSeq MainstAY utilizes UDI adapters arrayed in a one-time use plate. The 96 adapters have unique index sequences for both the i7 and i5 index reads [
Illumina I. Understanding unique dual indexes (UDI) and associated library prep kits 2021 [cited 2021]. Available from: https://support.illumina.com/bulletins/2018/08/understanding-unique-dual-indexes--udi--and-associated-library-p.html.
]. To assess the level of cross talk with these UDIs, the three 32-sample sequencing run studies described above (Supplemental Figure 12) were analyzed for misaligned reads by determining the reads in the fastq files for every possible index combination (96 × 96 = 9216 combinations). Observed reads per library for the expected index combinations ranged from ∼100,000 reads to more than 250,000 reads across the three sequencing runs. The highest misaligned index combination had 106 reads, which is 0.03% of the summed total reads for the two libraries with the correct index pairs. The 99th percentile for misaligned index combinations was 0.003%, and a maximum of 25 of the possible 9216 combinations (0.27%) had reads above 10. For a typical 96-sample run of 1 ng DNA input per sample, the average reads per library were approximately 50,000 reads; 0.03% of 50,000 reads is 15 reads and 0.003% is 1.5 reads. The occurrence of misalignment with the UDI adapters was thus extremely low. Misaligned indices could be generated if signal cross talk occurs during sequencing on the MiSeq FGx instrument, but such reads are not demultiplexed and are discarded as ‘undetermined’. This type of misalignment is not captured in the UAS results because only perfect adapter pairs are recognized and demultiplexed, which eliminates this type of crosstalk from the analyzed results.
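The index cross-talk figures above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only numbers stated in the text; the function and variable names are hypothetical and do not correspond to any UAS or instrument software.

```python
N_I7_INDEXES = 96
N_I5_INDEXES = 96

# Every possible i7/i5 pairing, whether or not it was intentionally used.
total_combinations = N_I7_INDEXES * N_I5_INDEXES  # 9216


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def expected_misassigned_reads(reads_per_library: float, rate_percent: float) -> float:
    """Reads expected in a library at a given misassignment rate (in percent)."""
    return reads_per_library * rate_percent / 100.0


# Share of index combinations that had more than 10 reads (25 of 9216).
share_above_10 = percent(25, total_combinations)
print(f"{share_above_10:.2f}% of combinations exceeded 10 reads")

# Scale the observed rates to a typical 96-sample run (~50,000 reads per library).
for rate in (0.03, 0.003):
    reads = expected_misassigned_reads(50_000, rate)
    print(f"{rate}% of 50,000 reads = {reads:.1f} reads")
```

Scaled this way, the highest observed rate (0.03%) corresponds to 15 reads, slightly above the 10-read AT, while the 99th percentile rate (0.003%) corresponds to 1.5 reads, well below it.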
The post-run wash procedure recommended after every MiSeq FGx run mitigates the potential for carryover of reads from one sequencing run to the next. The three sensitivity sequencing runs (sample plexities) described above were sequenced one after the other on the same instrument following post-run washes. Information for all 96 MainstAY libraries was included in the data input for run creation for all three sequencing runs (even for libraries not loaded into the sequencing reagent cartridge and thus not present on the flow cell). If reads from any of the 33 libraries from the 96-sample sequencing run that were not included on the 66-sample sequencing run had contaminated the 66-sample run, reads assigned to these libraries would be observed. Likewise, if reads from any of the 33 libraries from the 66-sample sequencing run that were not included on the 33-sample sequencing run had contaminated the 33-sample run, reads assigned to these libraries would be observed. The sample representation plots from the UAS for the three runs are shown in Supplemental Figure 19. No aligned reads from the 33 libraries from the 96-sample run (Supplemental Figure 19A) were detected in the 66-sample run (Supplemental Figure 19B), and no reads from the 33 libraries from the 66-sample run (Supplemental Figure 19B) were detected in the 33-sample run (Supplemental Figure 19C). Similar results were observed with the MiSeq FGx for the ForenSeq mtDNA Control Region and ForenSeq mtDNA Whole Genome kits [
Evaluation of sample types similar to those analyzed by forensic testing laboratories can demonstrate a technology’s reliability without consuming evidentiary materials or database samples [
]. Analysis of challenging mock case-type samples assists with determining and describing the limitations of the ForenSeq MainstAY kit and accompanying software.
To test amplification robustness of MainstAY STR loci, DNA was extracted from a blood stain using Chelex-100 with varying amounts of heme retained in the extract (Supplemental Table 3). Chelex-100 was selected as the purification procedure because it may remove fewer inhibitors, or remove them less completely, than other widely used extraction procedures. DNA (1 ng per sample) was added to MainstAY library preparation and processed as described in the Methods section; libraries were sequenced in a run with 72 samples. All samples had similar total reads that exceeded the sample read count guideline in the UAS and were similar to that of the Control DNA NA24385 on that sequencing run. The alleles called from the extracted blood samples were consistent with those from DNA from the paired Chelex-extracted buccal sample (Control, Supplemental Figure 20), and reads for all alleles exceeded the default AT in the UAS (Supplemental Figure 20). Some stutter exceeded the default stutter filters in the ForenSeq MainstAY analysis method (Supplemental Figure 20). There was one unexpected discordant call that was not attributed to stutter in this study. In addition to detection of the expected 23 allele at DYS635 with 1290 reads, one unexpected DYS635 allele call (20 repeats, 31 reads) was detected in the ‘heavy heme carryover’ blood sample.
Hairs were selected from female (samples A (one rooted, plucked hair) and C (one rooted, plucked hair and one shed with no visible root)) and male (samples B and D (each with one rooted, plucked hair and one shed with no visible root)) individuals. Extracted DNAs from the hair and paired buccal samples were used to generate MainstAY libraries. One ng of input DNA was targeted for the hair and control samples, but some hair samples were too dilute to obtain 1 ng in the maximum MainstAY sample volume of 8 μL, and the amount used was 440 pg for Individual C shed hair and 559 pg for Individual D shed hair (Supplemental Table 3). Similar total reads were observed for most of the libraries and exceeded the sample read count guideline in the UAS, with the exceptions of the libraries generated from hair DNA extracts for individual B and the library generated from the shed hair DNA extract for individual D which produced 315, 248, and 24 total reads, respectively. The plucked hair sample from individual A was 100% concordant with the paired buccal sample; observed read numbers were similar to the buccal sample (Supplemental Figure 21). No shed hairs were processed for individual A. Extracted hair DNA samples from individual B showed 12 and 9 loci detected (out of 53 total loci) for the plucked and shed hair (Supplemental Figure 21), respectively, even though the total reads for these samples were < 500 (Supplemental Figure 21). The shed and plucked hair DNA extract libraries generated 100% concordant calls compared to the buccal control sample for individual C with no stutter exceeding the stutter filters in the default ForenSeq MainstAY analysis method (Supplemental Figure 21). For individual D, the library from the shed hair DNA extract had very low reads (24 total reads), and only 2 loci were automatically called using the default MainstAY analysis method (Supplemental Figure 21). 
However, automated allele calls from the library generated from the plucked hair DNA extract were 100% concordant with the buccal sample library, with high total reads (19,708 total reads), and no stutter exceeded the stutter filters in the default ForenSeq MainstAY analysis method (Supplemental Figure 21).
Direct amplification performance with ForenSeq MainstAY was evaluated by comparing the allele detection in libraries generated from Chelex-extracted DNA from buccal cells to libraries generated from lysates from buccal cells collected on cotton swabs or Bode Buccal DNA Collector filter paper [
Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.
Verogen. ForenSeq MainstAY Kit Reference Guide 2021 [cited 2022 September 2022]. Available from: https://verogen.com/wp-content/uploads/2022/01/forenseq-mainstay-reference-guide-PCR1-vd2020050-c.pdf.
]. Similar total reads that exceeded the sample read count guideline in the UAS were observed for all libraries. All expected alleles were detected in the directly amplified libraries with the exception of the library generated from the Bode Collector filter paper punch lysate for individual G, in which two Y-STR loci dropped below the AT (10 reads) in the UAS (Supplemental Figure 22). All three direct amplification libraries for individual G had some stutter exceeding the stutter filters in the default ForenSeq MainstAY analysis method (Supplemental Figure 22).
Human remains samples were tested with MainstAY as they may contain different or higher levels of PCR inhibitors than blood, buccal or hair samples. Six skeletal remains and five dental remains were analyzed with 96 samples per sequencing run (Supplemental Table 3). The bone samples from Sam Houston State University’s AARC Outdoor Research Facility and the Willed Body Donation Program at the Southeast Texas Applied Forensic Science Facility were exposed to substantial environmental insults: commercial cremation (from a funeral home), burning or embalming. The teeth were commercially obtained (Supplemental Table 3). Total reads from each library exceeded the 15,000 sample read count guideline in the UAS, with the exception of Tooth 1665 with 11,227 reads (Table 1).
No reads were detected for one locus from the Tooth 1662 sample and for six loci from the Cremated Bone 1 sample, even though total sample reads for each sample exceeded the 15,000 sample read count guideline in the UAS (Table 1). The six missing loci in the Cremated Bone 1 sample were DYS19, DYS389II, DYS522, DYS392, DYS460 and DYS448; the mean amplicon length for these loci is greater than 280 bp (see Supplemental Table 2 for amplicon lengths). Penta E and DYS390, with MainstAY mean amplicon lengths greater than 280 bp, had 20 and 14 locus reads in this library, respectively. Penta E, the longest amplicon in the MainstAY multiplex (422 bp mean amplicon length), was not detected in the Tooth 1662 sample. All other samples yielded full profiles at all 28 or 53 loci (female or male samples, Table 1). The reads per locus for the six bone samples are provided in Supplemental Figure 23 and for the five tooth samples in Supplemental Figure 24. Each bone sample exhibited some degree of degradation, as illustrated in the plots of locus read depth relative to average amplicon length (Supplemental Figure 23). Locus reads were much higher (2–4-fold higher reads) for the bone samples for the shorter amplicons (e.g., TPOX, CSF1PO) relative to MainstAY Control DNA NA24385 (Supplemental Figure 23). Conversely, the reads were much lower (2–14-fold lower) for the longer amplicons (e.g., Penta E, DYS448; see Supplemental Table 2 for amplicon lengths) for the bone samples relative to the Control DNA NA24385 (Supplemental Figure 23).
Fewer total sample reads were observed for the five tooth sample libraries as compared to the six bone sample libraries (Table 1). The method used to quantify the extracted skeletal samples was not the same as the method used to quantify the extracted dental samples, which may account for the differences in library reads. Also, the tooth samples appeared less degraded than the bone samples when locus read depth was plotted relative to average amplicon length (Supplemental Figures 23 and 24). Locus-specific reads were fewer for the longer amplicons for the tooth libraries relative to the Control DNA NA24385 library, but the shorter amplicons had similar locus read depth for the tooth libraries relative to the Control DNA NA24385 library (Supplemental Figure 24).
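The comparison of locus read depth between a degraded sample and the control can be expressed as a per-locus fold change. The sketch below is one possible way to compute it, assuming that each locus is first normalized to the total reads of its library; the study does not state whether such normalization was applied. The read counts are hypothetical values chosen to show the pattern and are not data from this study.

```python
def normalized_fold_change(sample_reads: dict, control_reads: dict) -> dict:
    """Per-locus ratio of read fractions (sample library vs. control library)."""
    sample_total = sum(sample_reads.values())
    control_total = sum(control_reads.values())
    if sample_total == 0 or control_total == 0:
        raise ValueError("both libraries need at least one read")
    folds = {}
    for locus, control_count in control_reads.items():
        if control_count == 0:
            continue  # ratio is undefined without control reads
        sample_fraction = sample_reads.get(locus, 0) / sample_total
        control_fraction = control_count / control_total
        folds[locus] = sample_fraction / control_fraction
    return folds


# Hypothetical read counts (NOT study data): two short and two long amplicons.
control = {"TPOX": 1000, "CSF1PO": 1000, "Penta E": 1000, "DYS448": 1000}
bone = {"TPOX": 3000, "CSF1PO": 2500, "Penta E": 200, "DYS448": 300}

for locus, fold in normalized_fold_change(bone, control).items():
    print(f"{locus}: {fold:.2f}-fold relative to control")
```

Values above 1 for the short amplicons together with values below 1 for the long amplicons give the read-depth pattern described above for partially degraded DNA.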
ForenSeq MainstAY can yield full profiles with high quality DNA in 96-sample runs on the MiSeq FGx, and often at reduced DNA input amounts that mimic case-type samples, including down to 62 pg in a 96-sample run. As expected, locus read depth was affected when DNA was partially degraded. Even though lower read depth of the longer amplicons was observed with some challenging samples, a large number of amplicons (47–53 for male samples, all 28 for female samples) were detected from the six bone and five tooth DNA samples. Plucked hairs were successfully typed for three of the four individuals examined. Shed hairs, in general, were less successful for STR typing than plucked hairs, most likely due to a lower quantity of MainstAY-accessible nuclear DNA.
3.11 Species specificity
Noting that detection of genetic information from non-human species does not necessarily invalidate forensic use of a typing system, the potential of species cross reactivity with PCR primers in ForenSeq MainstAY was studied to