Technical in-depth comparison of two massive parallel DNA-sequencing methods for formalin-fixed paraffin-embedded tissue from victims of sudden cardiac death

Sudden cardiac death (SCD) is a tragic and traumatic event. SCD is often associated with hereditary genetic disease, and in such cases sequencing of stored formalin-fixed paraffin-embedded (FFPE) tissue is often crucial in trying to find a causal genetic variant. This study was designed to compare two massive parallel sequencing assays for differences in sensitivity and precision regarding variants related to SCD in FFPE material. From eight cases of SCD where DNA from blood had been sequenced using HaloPlex, corresponding FFPE samples were collected six years later. DNA from the FFPE samples was amplified using HaloPlex HS and sequenced on MiSeq, representing the first method, and amplified using modified Twist and sequenced on NextSeq, representing the second method. Molecular barcodes were included to distinguish artefacts from true variants. In both approaches, read coverage, uniformity and variant detection were compared using genomic DNA isolated from blood and corresponding FFPE tissue, respectively. In terms of coverage uniformity, Twist performed better than HaloPlex HS for FFPE samples. Despite higher overall coverage, amplicon-based HaloPlex technologies, both for blood and FFPE tissue, suffered from design and/or performance issues resulting in genes lacking complete coverage. Although Twist had considerably lower overall mean coverage, high uniformity resulted in an equal or higher fraction of genes covered at ≥ 20X. By comparing variants found in the matched samples in a pre-defined cardiodiagnostic gene panel, HaloPlex HS for FFPE material resulted in high sensitivity, 98.0% (range 96.6 – 100%), and high precision, 99.9% (range 99.5 – 100%) for moderately fragmented samples, but suffered from reduced sensitivity (range 74.2 – 91.1%) in more severely fragmented samples due to lack of coverage. Twist had high sensitivity, 97.8% (range 96.8 – 98.7%) and high precision, 99.9% (range 99.3 – 100%) in all analyzed samples, including the severely fragmented samples.


Introduction
Sudden cardiac death (SCD) is an event defined as a sudden and unexpected death occurring within an hour of the onset of symptoms, or occurring in people found dead within 24 h of being asymptomatic, who presumably died due to a cardiac arrhythmia or hemodynamic catastrophe [1]. The overall incidence is unknown, but in North America and Europe the annual incidence of SCD ranges between 50 and 100 per 100,000 individuals in the general population [2], and for the younger population of 1-40 years the incidence is estimated at 1.3-8.5 per 100,000 per year [3,4]. Standard post-mortem autopsies to determine cause of death may be complemented by molecular autopsies to determine a probable cause of death through DNA analyses. Commonly, the molecular investigation is performed on blood or fresh frozen tissue, where high quality DNA can easily be extracted. For suspected SCD, molecular autopsy from blood samples may determine the underlying cause of death in ~ 30% of cases [5].
SCD causes include inherited structural, functional and/or cardiac abnormalities [6,7]. The most common cause of SCD in the young is ventricular tachyarrhythmia. The disorders have common features in regard to genetics: they are familial rather than sporadic; they are autosomal dominant diseases, and the vast majority show incomplete penetrance. Furthermore, they show marked genetic heterogeneity with multiple types of variants [8]. Other causes of SCD are vascular disorders, especially aortic diseases and familial hypercholesterolemia (FH). To date, more than 100 genes have been associated with inherited cardiac disease and SCD [9]. Previous studies on the genetics of SCD, using targeted sequencing with gene panels of 70-192 genes associated with cardiac disease, showed that potentially pathogenic variants were found in 13-50% of the cases [3,10,11]. Clinical screening of relatives identifies an inherited genetic condition in 22-53% of the families, and targeted genetic DNA screening identifies pathological variants in genes coding for cardiac ion channels in up to 35% [12-14].
Genetic testing may be delayed due to a traumatized family and/or lack of information on the possibility of a genetic cause of the SCD. In yet other cases, patients with relatives who died of SCD, often several years ago, seek genetic counseling. Long-term storage of blood samples is generally not practiced in forensic and pathology departments. If a DNA analysis is requested, the only source of DNA may therefore be FFPE material, which is usually saved after histopathological examination of tissues.
FFPE tissue is challenging as a source for DNA extraction and subsequent sequencing. Fixation of the tissue in buffered formalin preserves morphology and enables storage at ambient room temperature. However, the formalin induces damage to the DNA molecules and the fixation process results in shearing of nucleic acids. In turn, the fragmented DNA directly influences the amount of template available for the PCR reactions used in subsequent library generation [15]. Formalin also causes crosslinks between DNA, proteins and histones, as well as deamination of cytosine, resulting in sequencing errors such as C>T/G>A or A>G/T>C variants [16]. All of this leads to sequence artefacts that can be falsely interpreted as clinically important disease-causing variants.
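The substitution classes named above can be turned into a simple screening rule. The following is a minimal sketch, not a method used in this study: the function name and the 5% allele-fraction cut-off are assumptions chosen for illustration, and a flagged variant would still need manual review.

```python
# Substitution classes the text associates with formalin damage,
# each listed together with its reverse-complement equivalent.
FFPE_ARTEFACT_CLASSES = {
    ("C", "T"), ("G", "A"),  # cytosine deamination seen on either strand
    ("A", "G"), ("T", "C"),  # second class named in the text
}


def is_candidate_ffpe_artefact(ref, alt, allele_fraction, max_fraction=0.05):
    """Flag an SNV whose substitution type matches a formalin-damage class
    and whose alternative allele fraction is low.

    max_fraction is a hypothetical cut-off, not a validated threshold.
    """
    in_class = (ref.upper(), alt.upper()) in FFPE_ARTEFACT_CLASSES
    return in_class and allele_fraction < max_fraction


if __name__ == "__main__":
    print(is_candidate_ffpe_artefact("C", "T", 0.02))  # low-fraction C>T
    print(is_candidate_ffpe_artefact("C", "T", 0.48))  # heterozygous-like C>T
```

A rule of this kind only prioritizes variants for inspection; a germline heterozygous C>T variant at roughly 50% allele fraction is deliberately not flagged.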
In this study, the aim was to compare two different massive parallel sequencing assays for SCD using FFPE tissue as starting material. The first protocol employs an improved amplicon-based HaloPlex High Sensitivity (HS) design for targeted amplification of selected genes, while the second protocol, based on Twist library preparation, utilizes hybridization capture-based amplification of the whole exome and subsequent bioinformatic filtering with virtual ad hoc gene panels. To avoid false positive variants, unique molecular identifiers (UMI) were used to tag individual DNA templates, permitting sequencing and PCR errors in high coverage NGS data to be accounted for. Twist's double stranded probes enable sequencing of both strands, which is an advantage. The methods were validated and compared against HaloPlex amplification of genomic DNA extracted from blood samples.
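The idea behind UMI tagging can be illustrated with a small sketch. Reads sharing a UMI derive from the same template molecule, so a base seen in only a minority of the reads in one UMI family is more likely a PCR or sequencing error. The function names and the 90% agreement threshold below are assumptions for illustration and do not describe the pipelines used in this study.

```python
from collections import Counter, defaultdict


def umi_consensus(observations, min_agreement=0.9):
    """Collapse per-read base calls at one position into one call per UMI.

    observations: iterable of (umi, base) pairs for a single genomic position.
    Returns {umi: consensus_base}, with None where the reads in a UMI family
    disagree too much (min_agreement is a hypothetical threshold).
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        consensus[umi] = base if count / len(bases) >= min_agreement else None
    return consensus


def supporting_molecules(observations, alt_base, min_agreement=0.9):
    """Count unique template molecules whose consensus supports alt_base."""
    calls = umi_consensus(observations, min_agreement)
    return sum(1 for base in calls.values() if base == alt_base)


if __name__ == "__main__":
    reads = [("AAC", "T"), ("AAC", "T"), ("AAC", "T"),
             ("GGT", "C"), ("GGT", "C"),
             ("TTA", "T"), ("TTA", "C")]  # discordant family
    print(umi_consensus(reads))
    print(supporting_molecules(reads, "T"))
```

Counting molecules rather than reads is what allows a variant supported by many PCR duplicates of a single damaged template to be treated with suspicion.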
The current clinical protocol at the forensic department in Linköping, Sweden, uses HaloPlex and a cardiodiagnostic gene panel (hereafter referred to as Cardio Diagnostic Gene Panel, CDGP) containing 81 genes linked to 10 diseases, see Fig. 1. This method is based on sequencing of relatively long amplicons, which is rarely compatible with the fragmented DNA extracted from FFPE material. Moreover, since the sequencing reads at each location represent only one of the two strands in the DNA double helix and UMIs are not utilized, the FFPE artefacts cannot be removed bioinformatically. Its application is therefore limited to cases where blood or fresh tissue samples are available, leaving a clinical need for a genetic analysis compatible with DNA from FFPE.

Materials and methods
The work described herein has been carried out in accordance with the code of ethics of the World Medical Association (Declaration of Helsinki). The research was approved by the regional ethics committee in Linköping, ethical permit no. 2016-389/61.

Subjects/specimen selection
Eight consecutive molecular autopsy cases were selected from two forensic medical departments. FFPE heart tissue was chosen for this study. The blood samples from these cases had been analyzed six years earlier in the forensic routine with HaloPlex using the CDGP containing 81 genes, as previously described for ARVC genes [17]. The study plan was to isolate DNA from the FFPE cardiac tissue samples for sequencing with HaloPlex HS and Twist, respectively, and to compare the results of the two methods using the results from the previous blood sequencing as the true result. Genomic standards NA12877 and NA12878 [18] were included as validation controls.

HaloPlex HS for FFPE
HaloPlex HS for FFPE experiments were performed at the Department of Clinical Genetics, Linköping University Hospital. A HaloPlex design for the CDGP was made with Agilent's SureDesign (Agilent, Santa Clara, United States), using High Sensitivity and FFPE settings for inclusion of shorter fragments to avoid dropouts, with targeting of both strands and incorporation of 10-16 bp UMIs (see Supplementary Table 1 for genes and transcripts).

Modified Twist with UMIs and Westburg library preparation
Twist whole exome sequencing of FFPE samples was performed at the Department of Laboratory Medicine, Örebro University Hospital. This modified Twist protocol uses 120-nucleotide double-stranded DNA probes targeting a 33 Mb protein-coding region of the whole exome. Capture was performed with the Twist core exome based on CCDS accessions, with the addition of spiked-in RefSeq areas. To account for variation in input DNA quality, the protocol was adjusted using library preparation kits with options to include or exclude further fragmentation of DNA. UMIs were included to enhance the suitability for FFPE samples. After whole exome library preparation and sequencing, the CDGP, comprising the exons of the 81 genes with 10 bp padding, was applied as an ad hoc bioinformatic filter to detect genetic variants.

DNA extraction and quality control
DNA was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen, Hilden, Germany). Concentration was determined using Qubit 1.8 (Invitrogen, Carlsbad, USA) and the Quant-iT dsDNA BR Assay Kit (Life Technologies, Carlsbad, USA). Extent of fragmentation was assessed on Agilent's 4200 TapeStation system using Genomic DNA ScreenTape (Agilent, Santa Clara, United States) to obtain average fragment length and DNA integrity number (DIN).

Library preparation

HaloPlex HS for FFPE tissue
Library preparations were performed according to the manufacturer's protocol. DNA input was determined based on the extent of fragmentation in the FFPE samples. 250 ng of DNA was used for samples with average fragment length > 2000 bp. 500 ng of DNA was used for samples with average fragment length 1000-2000 bp. All available DNA was used (656-2125 ng) for samples with fragment length < 1000 bp. For the genomic standards, 50 ng of DNA was used.
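The input rule above is a three-way decision on average fragment length, which can be written down as a small helper. This is a sketch of the stated thresholds only; the function name is hypothetical, and assigning the exact boundary values of 1000 bp and 2000 bp to the middle tier is an assumption, since the text does not say where they belong.

```python
def haloplex_hs_input_ng(avg_fragment_bp, available_ng):
    """Return the DNA input (ng) for an FFPE sample, following the tiers
    described in the text.

    Boundary handling (exactly 1000 bp or 2000 bp goes to the 500 ng tier)
    is an assumption made for this sketch.
    """
    if avg_fragment_bp > 2000:
        return 250.0
    if avg_fragment_bp >= 1000:
        return 500.0
    # Severely fragmented samples: use everything that is available.
    return float(available_ng)


if __name__ == "__main__":
    for length, available in [(3000, 1200), (1500, 1200), (300, 656)]:
        print(length, haloplex_hs_input_ng(length, available))
```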
All incubation steps and PCR reactions were performed on a 2720 Thermal Cycler (Applied Biosystems, Foster City, United States). Briefly, DNA from all samples was cleaved by eight different enzyme mixes at 37 °C for 30 min. Correct restriction patterns were assured by analyzing samples and the ECD control using the 2100 Bioanalyzer with the High Sensitivity DNA Kit (Agilent, Santa Clara, United States) according to the manufacturer's instructions. After fragmentation, HaloPlex gene panel probes and sample indexes were hybridized to pooled fragmented DNA by incubating at 95 °C for 5 min, followed by 16 h hybridization at 58 °C. The hybridized pool was purified using AMPure XP beads (Beckman Coulter, Brea, United States).
Fragments were circularized at 55 °C for 10 min, then captured using Dynabeads™ MyOne™ Streptavidin T1 (Life Technologies, Carlsbad, United States). PCR amplification was performed for 22 cycles (moderately fragmented samples) or 24 cycles (severely fragmented samples), followed by library purification using AMPure XP beads. Library length was verified using the Bioanalyzer and the High Sensitivity DNA assay. If artefact peaks were discovered, an additional washing step was performed (one or two times as required). Finally, library concentrations were determined using Qubit.

Twist
For FFPE samples and for genomic standards, 250 ng and 50 ng of DNA were used to generate libraries, respectively. Degree of fragmentation was the decision point for choosing to include or exclude a fragmentation step in the library preparation method. All incubation steps and PCR reactions were performed in a Veriti™ 96-Well Thermal Cycler (Applied Biosystems, Foster City, USA).

Westburg DNA library preparation kits
FFPE samples with average fragment length > 1000 bp were selected for library preparation using the Westburg library preparation kit. Briefly, DNA was enzymatically fragmented at 32 °C for 4 min, followed by DNA end-repair and dA-tailing at 65 °C for 30 min using the Westburg NGS DNA Library Prep Kit (Westburg, Leusden, the Netherlands).
FFPE samples with average fragment length < 1000 bp were selected for library preparation using the Westburg NoFrag Library Prep Kit. DNA was end-repaired and dA-tailed by incubating the samples for 30 min at 20 °C, followed by 30 min at 65 °C.

Adapter ligation using UMIs
Duplex adapters containing UMIs (xGEN Duplex Seq Adapters, Integrated DNA Technologies, Inc., Coralville, USA) were ligated to the fragments by incubating at 20 °C for 30 min. Ligated fragments were amplified in a pre-hybridization PCR (10-12 cycles) using KAPA HiFi HotStart Ready Mix (KAPA Biosystems, Wilmington, USA) and 8 nt long IDT duplex indexing primers (Integrated DNA Technologies, Inc., Coralville, USA). Libraries were purified using DNA purification beads (Twist Bioscience, San Francisco, USA). Library concentration was measured using the Qubit 2.0 Fluorometer and Qubit™ dsDNA HS Assay Kit, and fragment length using the 4200 TapeStation System and D1000 ScreenTape.

Twist whole exome library preparation
Exome libraries were generated using the Twist Human Core Exome Multiplex Hybridization Kit (Twist Bioscience, San Francisco, USA) in a modified protocol. Briefly, eight individual libraries (1500 ng of DNA) were pooled. The pool was hybridized at 70 °C for 16 h using whole exome probes with the addition of the spiked-in RefSeq Human Panel (Twist Bioscience, San Francisco, USA). The exome library was amplified for 10 PCR cycles using KAPA HiFi HotStart Ready Mix and xGEN Library Amplification Primer (xGEN Duplex Seq Adapters, Integrated DNA Technologies Inc., Coralville, USA). DNA concentration of the pool was determined using the Qubit 2.0 Fluorometer and Qubit™ dsDNA HS Assay Kit, and the pool was analyzed on the TapeStation using D1000 ScreenTape.
E. Adolfsson et al.

Sequencing
HaloPlex HS sequencing was done using MiSeq (Illumina, San Diego, USA). Sample sheets were edited to include HaloPlex indexes and molecular barcodes. Either four pooled libraries were sequenced at 6 pM on a MiSeq Reagent v2 300 cycle kit, or six pooled libraries at 8 pM on a MiSeq Reagent v3 600 cycle kit. The MiSeq instrument was configured to collect fastq files for two index reads, as well as 151 cycles of paired-end sequencing.
Twist sequencing of the eight pooled samples was performed on NextSeq (Illumina, San Diego, USA) on a high-output kit (300 cycles) using 1 pM of the whole exome pool as input, following the manufacturer's instructions.

Bioinformatic pipelines
All genomic positions and variants denoted in this manuscript use Genome Reference Consortium Human Build 37 (GRCh37/hg19) coordinates.

HaloPlex HS
An in-house bioinformatic pipeline was designed. The procedure is stepwise: I) Adapters and barcodes were trimmed from the fastq files and reads were synchronized for paired-end alignment. II) All low-quality base calls (Phred score ≤ 14) were substituted with 'N'. III) Reads were aligned to the hg19 reference genome using bwa mem [19]; properly mapped read pairs were combined, and barcodes restituted. IV) Bam files were created and indexed, and reads were tabulated with samtools mpileup [20]. V) All relevant regions (CDGP +/- 10 bp flanking sequences) were extracted. Variants were detected using an in-house script detailed in a previous publication [17], and vcf and coverage files were created.
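Step II of the pipeline above, the masking of low-quality base calls, can be sketched in a few lines. This is an illustration rather than the in-house script itself: the function name is hypothetical, and the Phred+33 quality encoding is an assumption (it is the encoding used by current Illumina fastq files).

```python
def mask_low_quality(sequence, quality, max_masked_phred=14, phred_offset=33):
    """Replace every base whose Phred score is <= max_masked_phred with 'N'.

    sequence and quality are the sequence and quality lines of one fastq
    record. phred_offset=33 assumes Phred+33 encoded quality strings.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        "N" if ord(q) - phred_offset <= max_masked_phred else base
        for base, q in zip(sequence, quality)
    )


if __name__ == "__main__":
    # Quality characters 'I', '/', '0', '#' encode Phred 40, 14, 15 and 2.
    print(mask_low_quality("ACGT", "I/0#"))
```

Masking instead of trimming keeps read length and alignment coordinates intact while preventing the masked positions from contributing to a variant call.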

Twist
Fastq files were generated from the NextSeq run using bcl2fastq2 Conversion Software v2.20, set to trim TruSeq adapters, to include UMIs in the read header, and to trim the first five cycles in each read to exclude UMIs from the read sequences. Reads were also trimmed at the 3′ end by 5 bp to remove UMI remnants. Fastq files were analyzed using Bcbio, with the aligner bwa-mem [19] and the callers GATK version v4.1.8.1 [21], Samtools version 1.9 [20] and Freebayes version v1.1.0-46-g8d2b3a0-dirty [22]. Duplicates were marked. Ensemble variant files combining the three callers were used for variant comparison, with hg38 as reference. Multisample vcf files were split into individual sample vcf files using bcftools. These variant and bam files were lifted from hg38 coordinates to hg19 coordinates using CrossMap [23] to enable variant comparisons. Variants had to be called by at least two of the three variant callers in order to be considered "true" variants.
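The two-out-of-three rule stated above amounts to a vote over the call sets. The sketch below shows the logic on variants represented as (chromosome, position, ref, alt) tuples; the representation and function name are assumptions for illustration, and in the study the consensus was produced within the Bcbio workflow rather than by code like this.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants reported by at least min_callers variant callers.

    calls_by_caller: dict mapping a caller name to an iterable of variants,
    each a hashable (chrom, pos, ref, alt) tuple.
    """
    support = Counter()
    for calls in calls_by_caller.values():
        support.update(set(calls))  # each caller votes once per variant
    return {variant for variant, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    # Toy call sets, not data from the study.
    calls = {
        "gatk": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
        "samtools": [("chr1", 100, "A", "G")],
        "freebayes": [("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")],
    }
    print(sorted(consensus_calls(calls)))
```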

Sequencing metrics
Sequencing data were analyzed using MultiQC [24] to obtain reads per sample, mapped reads, reads on target, and duplicates. Sequencing data were also analyzed in R using custom scripts to generate summary statistics such as sequence length, GC-content and nucleotide balance. Particular focus was given to the CDGP subset regions, where data from both the Twist and HaloPlex protocols are available.

Read coverage and uniformity in the CDGP
Sequencing data were analyzed using in-house scripts to determine coverage and uniformity in each gene of the CDGP. Measures of coverage included overall coverage per gene, percentage of bases covered > 20X per gene, percentage of bases covered 0X per gene, and exceedance coverage, expressed as percentage of bases covered 100% between 0 and 50X. Measures of uniformity included fold-80/fold-90 and percentage of bases covered at > 0.2X of the calculated mean coverage.
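Several of the per-gene measures listed above follow directly from a vector of per-base depths. The following is a minimal sketch under stated assumptions: the function and key names are hypothetical, the depth threshold is applied as "at least 20X", and the in-house scripts may differ in detail.

```python
def coverage_summary(depths, min_depth=20, mean_fraction=0.2):
    """Summarize per-base depths for one gene.

    depths: list of read depths, one value per targeted base.
    Returns the mean depth, the percentage of bases at or above min_depth,
    the percentage with zero coverage, and the percentage covered above
    mean_fraction times the mean depth.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    mean = sum(depths) / n
    return {
        "mean_depth": mean,
        "pct_at_least_min_depth": 100 * sum(d >= min_depth for d in depths) / n,
        "pct_zero": 100 * sum(d == 0 for d in depths) / n,
        "pct_above_fraction_of_mean": 100 * sum(d > mean_fraction * mean for d in depths) / n,
    }


if __name__ == "__main__":
    # Toy depths for a five-base region, not data from the study.
    for key, value in coverage_summary([0, 10, 20, 30, 40]).items():
        print(key, value)
```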

Variant comparison - validation samples
CLC Genomics Workbench 20.0 (https://digitalinsights.qiagen.com) was used for all variant comparisons, employing the hg19 build as reference genome. Platinum genomes NA12877/NA12878 were used to verify the setup of HaloPlex, HaloPlex HS and Twist by comparing the consensus variants found after amplification/sequencing/bioinformatics and filtering to the confident regions of the CDGP. Variants not overlapping between assays for corresponding samples were analyzed in Integrative Genomics Viewer [25], where coverage in the variant position/s, allele frequencies and the number of UMIs present in the variant were evaluated.
For intra-sample comparisons, variants obtained using HaloPlex amplification of blood samples were considered true, since this method has previously been validated using Sanger data [17]. Variants from each blood sample amplified using HaloPlex were compared to the variants detected using HaloPlex HS and Twist, respectively, of the corresponding FFPE DNA. Additional variants found in FFPE reads were denoted extra variants, and variants not found in the FFPE reads were denoted missed variants.
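The definitions of missed and extra variants above map onto set operations, from which sensitivity and precision follow. The sketch below is illustrative only: the function name is hypothetical, precision is taken as shared / (shared + extra), and the comparisons in this study were made in CLC Genomics Workbench, whose exact calculation is not reproduced here.

```python
def compare_to_reference(reference_variants, test_variants):
    """Compare FFPE calls (test) against matched blood calls (reference).

    Variants are hashable items, for example (chrom, pos, ref, alt) tuples.
    """
    reference, test = set(reference_variants), set(test_variants)
    shared = reference & test
    missed = reference - test   # in blood, absent from FFPE
    extra = test - reference    # in FFPE, absent from blood
    sensitivity = len(shared) / len(reference) if reference else float("nan")
    precision = len(shared) / len(test) if test else float("nan")
    return {
        "shared": shared,
        "missed": missed,
        "extra": extra,
        "sensitivity": sensitivity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Toy variants, not data from the study.
    blood = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
             ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")]
    ffpe = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
            ("chr3", 300, "G", "A"), ("chr5", 500, "C", "A")]
    result = compare_to_reference(blood, ffpe)
    print(result["missed"], result["extra"])
    print(result["sensitivity"], result["precision"])
```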

Sequencing metrics

HaloPlex HS
Libraries were successfully generated and sequenced on MiSeq from all FFPE samples. The average read length ranged from 73 to 101 bp, with shorter read lengths from poorer quality samples. Reads per sample ranged from 2.8 to 8.6 Mb in CDGP, despite equal input of all libraries in the sequencing reaction, with a tendency for more reads from higher quality samples. The percentage of duplicates was high, ranging from 14.4% to 49.2%, without correlation to sample quality. The overall 20X coverage was higher for moderately fragmented samples (98.55-99.44%) compared to severely fragmented FFPE samples (76.19-90.44%). The same pattern was seen for average depth: 749-1565X vs. 408-771X, respectively (see Table 1 and Fig. 2). Skewed nucleotide balance and increased GC-content were observed in severely fragmented samples, see Supplementary Fig. 1 and Supplementary Fig. 2.

Modified Twist
Generating libraries from FFPE samples using Twist required optimization of the protocol for each individual sample. Based on degree of fragmentation and DIN-score, the protocol was adjusted to include or exclude fragmentation, i.e. to use the Westburg NGS kit for moderately fragmented samples or the Westburg NoFrag kit for severely fragmented samples with average fragment length < 1000 bp. Fragmentation time was reduced from the proposed 22 min to 4 min for FFPE samples to achieve libraries of the target length, 375-425 bp. Using this strategy, libraries were successfully generated and sequenced on NextSeq.
Reads per sample ranged from 0.4 to 0.8 million in the CDGP. Percentage of duplicates ranged from 5.6% to 10%. Sequencing metrics showed that > 99.7% of reads were mapped, with > 60% of reads on target (data not shown). Sequence reads were significantly longer than for HaloPlex HS (131 ± 8 bp vs 91 ± 10 bp, p < 0.05). Average sequence depth ranged from 57 to 171X, and coverage at 20X ranged from 85.8% to 99.5%. Severely fragmented samples remained covered to a high degree. See Table 1 and Fig. 2.
Detailed data analysis did not show correlation between number of reads and initial DNA fragment length, DIN-score or mean insert size (data not shown). For severely fragmented samples, GC-content increased and nucleotide imbalances with an increasing percentage of G/C in relation to A/T were observed compared to samples with moderately fragmented DNA, but to a lesser degree than for HaloPlex HS. See Supplementary Fig. 1 and Supplementary Fig. 2.

Read coverage and uniformity in the CDGP
Average sequence depth was higher in HaloPlex HS than in Twist (408-1565X vs 57-171X, p < 0.001). Using Pr > 20X as threshold for sufficient coverage for hereditary disease [26], Twist had a higher fraction of bases covered above threshold compared to HaloPlex HS for FFPE, while HaloPlex HS had a higher fraction of bases lacking coverage (Pr0X). Read coverage uniformity was assessed using fold-80 and fold-90 base penalty metrics, defined as the fold change of non-zero read coverage needed to bring 80% or 90% of the bases to the observed mean coverage [27]. For Twist, lower values were obtained, indicating less variability and hence higher uniformity. Fold-90 showed that HaloPlex for blood samples had difficulties with uniformity in certain regions. Percentage of bases covered at > 0.2X of the calculated mean coverage was higher and more uniform for Twist. HaloPlex for blood was included for comparisons. See Fig. 3.
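The fold-80 and fold-90 penalties defined above can be computed from per-base depths as the mean non-zero depth divided by the depth that 80% (or 90%) of the non-zero bases reach. The sketch below uses this common formulation, which matches the definition of Picard's FOLD_80_BASE_PENALTY; whether the in-house scripts of this study used exactly the same quantile convention is an assumption, and the function name is hypothetical.

```python
import math


def fold_penalty(depths, target_fraction=0.80):
    """Fold over-coverage needed to raise target_fraction of the non-zero
    bases to the mean non-zero depth.

    A perfectly uniform region gives 1.0; larger values mean less uniform
    coverage. Positions with zero coverage are ignored by definition.
    """
    nonzero = sorted(d for d in depths if d > 0)
    if not nonzero:
        raise ValueError("no covered positions")
    n = len(nonzero)
    mean = sum(nonzero) / n
    # Number of bases that must reach the cut-off depth (rounded to guard
    # against floating point noise before taking the ceiling).
    needed = math.ceil(round(target_fraction * n, 9))
    cutoff_depth = nonzero[max(n - needed, 0)]
    return mean / cutoff_depth


if __name__ == "__main__":
    # Toy depths, not data from the study.
    uneven = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(fold_penalty(uneven, 0.80))
    print(fold_penalty(uneven, 0.90))
    print(fold_penalty([50] * 10, 0.80))
```

Because zero-coverage positions are dropped, a gene with a large uncovered exon can still obtain a good fold-80 value, which is why a complementary measure is needed.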

Table 1
HaloPlex HS and Twist sequencing metrics. The table displays library length, average read length, reads in the cardiodiagnostic gene panel (CDGP), duplicates, average sequence depth and overall coverage in CDGP.

Since fold-80/fold-90 ignores base positions with zero coverage, another measure of uniformity was applied: the fraction of bases in each gene covered at a depth of at least 20% of the observed mean coverage for that gene. A similar pattern as with fold-80/fold-90 was observed, where Twist displayed a less variable and more uniform coverage, whereas HaloPlex technologies resulted in variable coverage, with specific genes having less uniformity within a gene. Detailed investigation of genes not achieving ≥ 20X coverage in all bases within the gene was performed. See Fig. 4.

Variant comparisons - validation samples
Initially, HaloPlex HS and Twist were validated using platinum genomes samples NA12877 and NA12878 with known reference variants as well as defined regions of high confidence [18]. Variant detection using both technologies showed high concordance with reference variants in NA12877 and NA12878 (data not shown).
Using IGV [25], the missed and/or extra variants detected with HaloPlex HS were analyzed. Two false positives and four false negatives were identified. The false positives were a nine bp deletion located in a repeat region in chr2:21266775, and a SNV (G/C) located in chr2:17946330. The false negative variants were SNVs located in chr1:237957161 (A/G), chr1:2377754200 (C/T) and chr2:179598034 (C/T). These variants displayed skewed allele ratios in HaloPlex HS and were hence not called. The skewed allele ratios were only noted with HaloPlex HS, not with Twist (see Supplementary Table 5 for further details).
Using IGV, three unique false negative variants were discovered for Twist; these were not present in the FFPE samples, only in the corresponding blood. One variant was a SNV located in chr2:179458591 (C/T), present in only one blood sample. One variant was a 2 bp deletion located in chr14:23858272-23858273 (GG/-), present in two samples. The last variant was a SNV in chr21:35821821 (T/C), present in five of the eight samples. This variant was present in the bam files for Twist, but only in reads with MAPQ = 0, indicating that there may be mapping ambiguities, possibly due to a pseudogene. We found only one false positive variant with Twist. This was a 3 bp deletion (GCT/-) located in chr9:137534099-137534101, which was present in only one of the matched samples (see Supplementary Table 6 for further details).

Methodological differences and their attribution to the outcome
We suggest that the observed differences in read coverage and uniformity in the present study can be explained in terms of the performance of the HaloPlex and Twist technologies. In general, amplicon methods have been described as having higher on-target rates, whereas hybridization capture-based approaches have been shown to have lower on-target rates but better uniformity [28]. Limited data indicate that use of the Twist protocol results in more uniform coverage and may be suitable for highly degraded DNA [29]. Indeed, this is observed in this study. Although the average sequence depth is higher for HaloPlex technologies than for Twist (Fig. 4), the fold-80/fold-90 measurements show a more uniform coverage for Twist (Fig. 3), resulting in higher 20X coverage in the genes, and higher sensitivity in the more fragmented samples compared to HaloPlex HS (Fig. 6).
An advantage of the Twist protocol is the possibility to exclude enzymatic fragmentation and only do end repair and dA-tailing, making it easier to generate libraries from severely fragmented samples, compared to HaloPlex HS. In this study, the DNA extracted from FFPE tissue ranged from moderately fragmented, ~ 3000 bp in average length, to severely fragmented samples as short as ~ 300 bp (Table 2). One major observation was the considerable reduction in sensitivity for severely fragmented DNA samples using the HaloPlex HS protocol. The same was not observed using the Twist protocol (Table 2, Fig. 6). This clearly gives the Twist technology an edge, since the clinical laboratory must rely on a robust method for all types of FFPE samples, regardless of the degree of fragmentation. However, some of the lost coverage using the HaloPlex HS method on extensively fragmented samples might be mitigated by increasing library input, and thereby the number of reads.
Twist requires less DNA input to generate reliable results. The ability to use lower amounts of DNA is an advantage, since extracting large amounts of DNA from FFPE tissue is difficult. A potential disadvantage of Twist is that samples are pooled at the hybridization step, prohibiting re-sequencing of a single sample without repeating the hybridization step. Twist protocols are also labor intensive and time consuming.
Focusing on coverage data, Twist lacked coverage in the PKP2 gene. Per design, probes were lacking in exon 6 of the gene. This resulted in complete absence of coverage for this particular region in all samples (see Supplementary Fig. 3). Pathogenic/VUS variants correlated to cardiomyopathy are located in this region according to ClinVar [30]. In comparison, using HaloPlex technologies, we observed lack of coverage in several regions of interest (Supplementary Table 3). For example, neither HaloPlex nor HaloPlex HS completely covers KCNQ1, a clinically relevant gene containing several pathogenic variants linked to LQTS and other cardio-related diseases.

Variant detection - comparisons
For both technologies, the concordance between variants found in whole blood and the variants found in the corresponding FFPE samples was high (>99.999%). However, for HaloPlex HS, lack of coverage resulted in an increasing number of missed variants as the quality of DNA dropped (Table 2). The most fragmented sample contained only 116 variants, as opposed to the matched blood, which contained 155 variants. In this sample, 40 variants were undetected in the FFPE DNA, resulting in a false negative rate of 25.8% and a sensitivity of just 74.2%. This phenomenon was not seen with Twist, where fewer variants were undetected, and hence sensitivity for the severely fragmented samples was higher (97.7-98.7% for Twist compared to 74.2-91.1% for HaloPlex HS).
Lack of coverage in the HaloPlex for blood gold standard method resulted in variants assigned as extra variants in Twist (n = 22) or HaloPlex HS (n = 6). This is reflected in the false positive rate and PPV, which would give the impression of Twist being less specific than HaloPlex HS. We therefore excluded the extra variants that were due to lack of coverage in the HaloPlex reference method when calculating PPV (see Supplementary Table 5 and Table 6).

FFPE artefacts - our findings in relation to others
A study by Lin et al. [31] compared variants found in FFPE cardiac tissue samples against paired whole blood samples, using HaloPlex amplification and MiSeq sequencing technology. They found both false negative and false positive variants compared to blood, but also intra-sample variation in the FFPE material, which they attributed to depth of sampling in the tissue block. The most common sequencing errors were A:T > G:C nucleotide changes. In comparison, Baudhuin et al. [32] achieved 100% concordance between FFPE material, dried blood spots and whole blood using SureSelect for amplification and MiSeq sequencing technology. Bhagwate et al. [33] investigated concordance between FFPE and fresh frozen breast tissue material. By adding UMIs and bioinformatic variant detection and filtering strategies, such as a simple filtering criterion to exclude variants below 5% alternative allele frequency, they achieved sensitive and confident variant calling from FFPE material, although the number of variants was higher compared to the fresh frozen samples.
In this study, no false positive variants were attributed to FFPE artefacts. We employed UMIs as a strategy to avoid this, by inclusion of molecular barcodes in both HaloPlex HS and Twist. Also, with Twist, three separate calling tools were used, and a rule was applied stating that a variant should only be called if detected by at least two separate callers. We conclude that these strategies were successful, but careful examination of each detected variant is needed in clinical practice to ensure that the variant is present in several amplicons (HaloPlex) and originates from several unique DNA molecules, in order to discriminate a true variant from an FFPE artefact when performing variant interpretation from FFPE DNA.

Clinical considerations
The choice between a targeted sequencing approach using a gene panel and a whole exome sequencing approach with post-sequence ad

Table 2
Comparison between HaloPlex HS and Twist, variant detection (n = 8). Both methods are compared against the gold standard, HaloPlex for matched blood samples. Twist has a lower fraction of missed variants (2.2% vs 6.5%), a slightly higher fraction of extra variants (1.8% vs 0.8%), and higher sensitivity (97.8% vs 93.5%) and specificity/PPV (99.9% vs 99.8%).

hoc filtering may be a cost-dependent decision. The considerations range from coverage and cost to the possibility of including more genes as new knowledge emerges. Each clinical laboratory must make a choice here. The strategy of using gene panels has several advantages over WES. The most common genes associated with SCD can be investigated at a lower cost, with possibly shorter turnaround time. HaloPlex has the great advantage that even a single sample can be prepared and sequenced at a time at comparatively low expense, by using smaller flow cells. Twist, on the other hand, has the advantage of requiring less sequencing for optimal coverage but the disadvantage of requiring eight samples simultaneously for the lowest cost of library preparation.

Another option is to take advantage of the uniform coverage obtained with Twist technology, but use a customized Twist panel based on, for example, the CDGP panel used in this study, excluding all other probes in the Twist design used here. Twist yielded roughly 1/10 the number of reads in the CDGP region compared to HaloPlex HS, while still providing equal or better results in terms of coverage statistics. Assuming the same on-target rate and duplicate percentage for the customized probe design as for the core exome, an estimated 1.5 million reads per sample would be needed to obtain the same coverage as seen with Twist WES. As a consequence, instead of sequencing 8 exome FFPE samples on a NextSeq high-output kit (theoretically yielding 800 M paired-end reads), ~500 custom panel FFPE samples could be sequenced. This corresponds to a reduction in sequencing cost of ~98%. Compared to HaloPlex HS, 16-20 samples could be sequenced on a MiSeq v2 300 cycle kit (theoretically yielding 24-30 million paired-end reads) using the customized Twist panel, i.e. 4-5 times the number of samples sequenced when using HaloPlex HS. Sequencing cost compared to HaloPlex HS would therefore be reduced by 75-80%.
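
The capacity and cost estimates above follow from simple division, which can be sketched as below. This is only an illustration under stated assumptions: the run cost is treated as fixed and shared equally between samples, library preparation cost is ignored, and the figure of 4 HaloPlex HS samples per MiSeq run is not stated directly but inferred from the "4-5 times" comparison. The function and constant names are hypothetical. Note that 800 M / 1.5 M gives about 533 samples, which the text rounds down to ~500.

```python
# Figures quoted in the text (reads in millions).
READS_PER_PANEL_SAMPLE_M = 1.5    # estimated reads per custom-panel sample
NEXTSEQ_HIGH_OUTPUT_M = 800       # theoretical paired-end reads per NextSeq run
MISEQ_V2_M = (24, 30)             # theoretical range of reads per MiSeq v2 run
EXOME_SAMPLES_PER_NEXTSEQ = 8     # exome FFPE samples per NextSeq run

# Assumption: inferred from "4-5 times the number of samples", not stated directly.
HALOPLEX_HS_SAMPLES_PER_MISEQ = 4


def samples_per_run(run_reads_m, reads_per_sample_m=READS_PER_PANEL_SAMPLE_M):
    """Number of samples that fit in one run for a given per-sample read budget."""
    return run_reads_m / reads_per_sample_m


def cost_reduction(samples_before, samples_after):
    """Fractional drop in per-sample sequencing cost when a fixed run cost
    is shared by more samples."""
    return 1 - samples_before / samples_after


if __name__ == "__main__":
    nextseq_panel = samples_per_run(NEXTSEQ_HIGH_OUTPUT_M)
    print(f"NextSeq: about {nextseq_panel:.0f} custom-panel samples per run")
    print(f"  cost reduction vs exome: "
          f"{cost_reduction(EXOME_SAMPLES_PER_NEXTSEQ, nextseq_panel):.0%}")

    for run_reads in MISEQ_V2_M:
        n = samples_per_run(run_reads)
        saving = cost_reduction(HALOPLEX_HS_SAMPLES_PER_MISEQ, n)
        print(f"MiSeq ({run_reads} M reads): {n:.0f} samples, "
              f"cost reduction vs HaloPlex HS: {saving:.0%}")
```

Under these assumptions the sketch reproduces the 16-20 samples per MiSeq run and the 75-80% and ~98% reductions given in the text. Real costs would also depend on library preparation and on the on-target rate actually achieved by a custom design.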

Limitations of the study
The massive parallel sequencing experiments described in this study were performed over a period of six years at two different laboratories, using different sequencing instruments and different pipelines with different callers for the analysis, but the same DNA. At each site, the pipelines used for bioinformatic analysis were optimized for their respective protocol. Although the bioinformatics has not been uniform, the high concordance of variant calls means that the pipelines themselves are not expected to impact the validity of the observations or the conclusions of this study.

Conclusion
SCD is a traumatic event for relatives of the victim, and finding a potential underlying genetic cause is very important for the relatives, since genetic testing of the family may prevent further deaths [34]. Since the only remaining material may be FFPE samples from autopsy, genetic testing must be compatible with this material. In this study, two approaches to DNA sequencing for detecting genetic variants causing SCD were contrasted. Despite significant challenges with NGS data from FFPE, a combination of strategies (inclusion of UMIs, optimization of library preparation, bioinformatic variant detection and filtering strategies) allowed us to achieve sensitive and confident variant calling with both NGS approaches. However, it should be added that, in our own experience, we have encountered FFPE samples so degraded that they could not be sequenced with either the Twist or the HaloPlex HS method described here.
In conclusion, this study provides methodological insights into sequencing of FFPE samples for detection of clinical variants relevant to SCD using Twist technology and HaloPlex HS. We show that, despite a lower median coverage, the capture-based Twist technology yields more uniform coverage, a higher fraction of the target region above 20X, high sensitivity and maintained specificity for moderately and severely fragmented FFPE samples, as compared to HaloPlex HS. In comparison, the amplicon-based HaloPlex HS method resulted in a higher number of missed variants, i.e. lowered sensitivity, due to insufficient coverage, especially in the more severely fragmented samples. In summary, the Twist technology resulted in more variants being identified in the FFPE samples (also compared to the HaloPlex sequencing of DNA from fresh tissue), and genetic variants could be identified in cases with severely fragmented DNA, making it a suitable technology for sequencing of FFPE samples.

Fig. 2. Raw sequencing metrics. A: Sequence length distribution, given as the percentage of adapter-trimmed reads that reach a specified length. The graph illustrates results from Twist (blue), HaloPlex HS (yellow) and HaloPlex blood (red). Left: FFPE sample with moderate fragmentation. Middle: FFPE sample with severe fragmentation. Right: average and one SD confidence bands for all eight validation samples in this study (ranging from moderate to severe fragmentation). B: Quality per read position, given as the Phred quality score of adapter-trimmed reads. Twist (blue), HaloPlex HS (yellow) and HaloPlex blood (red). Left: FFPE sample with moderate fragmentation. Middle: FFPE sample with severe fragmentation. Right: average and one SD confidence bands for all eight validation samples in this study. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Summary statistics for each gene in the cardiodiagnostic gene panel (CDGP) (x-axis). From top to bottom: A) aligned average coverage with a one SD confidence band, B) fraction of bases covered to at least 20X, C) fraction of bases not covered (0X), D) fold80, a measure of uniformity calculated as the fold change to raise 80% of the bases to the mean coverage, E) fold90, calculated as the fold change to raise 90% of the bases to the mean coverage, F) fraction of bases above 0.2 times the mean coverage.
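
The per-gene statistics in panels B to F can be derived from a list of per-base depths. The following is a minimal sketch, not the pipeline used in the study. It assumes the common convention in which fold80 is the mean coverage divided by the depth that at least 80% of the bases reach (and likewise for fold90 with 90%); the exact implementation behind the figure is not given in this section. The function names and dictionary keys are hypothetical.

```python
def depth_reached_by(sorted_desc, percent):
    """Largest depth d such that at least `percent` % of bases have depth >= d.

    `sorted_desc` must be sorted from highest to lowest depth.
    """
    n = len(sorted_desc)
    rank = (percent * n + 99) // 100      # integer ceil(percent * n / 100)
    return sorted_desc[rank - 1]


def coverage_metrics(depths, min_depth=20):
    """Summary statistics for one gene, given the depth at each base."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    mean = sum(depths) / n
    sorted_desc = sorted(depths, reverse=True)

    def fold(percent):
        # Fold increase in sequencing needed to lift `percent` % of the bases
        # to the current mean; infinite if that depth is zero.
        d = depth_reached_by(sorted_desc, percent)
        return mean / d if d > 0 else float("inf")

    return {
        "mean": mean,
        "frac_ge_min_depth": sum(d >= min_depth for d in depths) / n,
        "frac_0x": sum(d == 0 for d in depths) / n,
        "fold80": fold(80),
        "fold90": fold(90),
        "frac_above_0.2_mean": sum(d > 0.2 * mean for d in depths) / n,
    }


if __name__ == "__main__":
    toy_gene = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]   # made-up depths
    for name, value in coverage_metrics(toy_gene).items():
        print(f"{name}: {value}")
```

A perfectly uniform gene gives fold80 = fold90 = 1, and larger values indicate that more additional sequencing would be needed to bring the poorly covered bases up to the mean.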

Fig. 4. Coverage graphs. A. Per gene coverage (based on aligned reads) in the cardiodiagnostic gene panel (CDGP) for each arbitrary base covered in the panel (x-axis). For comparative reasons, the y-axis (coverage) is on log scale. 20X coverage is marked by a dashed line. For each gene, the average coverage for HaloPlex HS (yellow line), HaloPlex blood (red line) and Twist (blue line) is shown. Values are based on all samples, n = 8. HaloPlex and HaloPlex HS for FFPE result in much higher average coverage. However, as shown in the figure, HaloPlex HS has several regions where coverage fails to exceed the 20X threshold, indicating design and/or performance issues. This also happens with HaloPlex and Twist, but to a lesser extent. B. Per gene exceedance probability graphs for the cardiodiagnostic gene panel (CDGP). The graphs show the fraction (y-axis) of bases covered 0-50X (x-axis). Values are based on all samples, n = 8. Ideally, each gene is covered 100% at least up to the clinically important threshold value 20X, marked by a dashed line. Twist, in blue, displays a rapid drop in the fraction of bases covered after 20X, compared to HaloPlex and HaloPlex HS. There are also genes where coverage drops before 20X, for example EMD and SNTA1. With HaloPlex HS for FFPE, on the other hand, several genes never reach 100% coverage, even at 0-20X. Examples include FHL2, JUP, MYL2 and TGFBR1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
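
An exceedance curve of the kind shown in panel B can be sketched as follows. This assumes that each point on the curve is the fraction of bases in a gene with depth of at least x, for x from 0 to 50; the plotting code and data used for the figure are not described here, and the function name and toy depths are hypothetical.

```python
def exceedance_curve(depths, max_depth=50):
    """Fraction of bases with depth >= x, for x = 0 .. max_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    return [sum(d >= x for d in depths) / n for x in range(max_depth + 1)]


if __name__ == "__main__":
    toy_gene = [0, 5, 20, 25, 60]      # made-up per-base depths
    curve = exceedance_curve(toy_gene)
    # A gene that is fully covered at the 20X threshold has curve[20] == 1.0.
    print(f"fraction of bases at >= 20X: {curve[20]}")
```

In this representation, a gene that never reaches 100% shows a curve that starts below 1 at x = 1, while a gene with low but uniform coverage stays at 1 up to some depth and then drops sharply.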