Monozygotic (MZ) twins are considered to be genetically identical; therefore, they cannot be differentiated using standard forensic DNA testing. Here we describe how identification of extremely rare mutations by ultra-deep next generation sequencing can solve such cases. We sequenced DNA from sperm samples of two twins and from a blood sample of the child of one twin. Bioinformatics analysis revealed five single nucleotide polymorphisms (SNPs) present in the twin father and the child, but not in the twin uncle. The SNPs were confirmed by classical Sanger sequencing. Our results give experimental evidence for the hypothesis that rare mutations will occur early after the human blastocyst has split into two, the origin of twins, and that such mutations will be carried on into somatic tissue and the germline. The method provides a solution for paternity and forensic cases involving monozygotic twins as alleged fathers or originators of DNA traces.
]. However, monozygotic “identical” twins have identical microsatellite profiles. Thus, they cannot be distinguished, e.g. as alleged fathers or as sources of DNA traces at crime scenes, using the current technology. This inability causes problems in criminal or paternity cases with an MZ twin as suspect or as alleged father. With a probability for MZ twins of about 3 in 1000 births [
], around 6 of 1000 males are identical twins. Therefore, crime or paternity cases with MZ twins are not infrequent and sometimes receive a high level of attention. Small genetic or epigenetic differences between twins have been described [
] states “that >80% of the offspring of one twin brother would carry at least one germline mutation that would be detectable in the sperm of their father, but not in that of the other twin”. The authors strongly suggest conducting paternity testing in the context of MZ twins by whole genome sequencing and identification of rare de novo mutations. Studies to differentiate twins genetically have been carried out earlier, e.g. Bruder et al., but these studies had no forensic objective [
] suggest utilizing next generation sequencing of Y-chromosomal DNA in forensic cases to detect rare differences between closely related men. However, looking for these differences is equivalent to chasing a tiny needle in a huge haystack.
In some recent articles, the application of next generation sequencing to forensic mtDNA testing has been described [
] have shown that shotgun next generation sequencing of DNA from stain samples can give insight into the metagenomic composition of the stain.
2. Methods
We recruited a pair of identical male twins as well as the wife and the child of one twin as volunteers. Informed consent according to the requirements of the German Gene Diagnostic Act was obtained from all participants. To avoid any intentional or unintentional bias, the laboratory team was not informed which of the two twins was the real father until they had solved the case analytically. DNA was extracted from blood samples (mother and child) and from blood, buccal mucosa and sperm samples (twins) using the QIAsymphony DNA Investigator Kit (Qiagen, Hilden, Germany) and a QIAsymphony extraction robot. Paternity and monozygosity of the twins were confirmed by typing all individuals with the PowerPlex 21 PCR Kit (Promega, Mannheim, Germany). PCR products were separated on an ABI3130xl capillary sequencer and evaluated using Genemapper 3.7 (Applied Biosystems Division of Life Technologies GmbH, Darmstadt, Germany). NGS libraries were prepared from blood of the child and sperm samples of the twins according to the common guidelines for shotgun library preparation. Briefly, genomic DNA (from sperm of the twins and blood of the child) was fragmented to an average size of 300 bp using a Covaris ultrasonication device (Covaris, Woburn, MA, USA). Subsequently, the shotgun libraries were prepared using commercially available chemistry from NEB (New England Biolabs, Ipswich, MA, USA). Sequencing was performed on the Illumina HiSeq 2000 with chemistry v3.0, using the 2 × 100 bp paired-end read mode and original chemistry from Illumina according to the manufacturer's instructions. The initial data analysis was started directly on the HiSeq 2000 System during the run. The HiSeq Control Software 2.0.5 in combination with RTA 1.17.20.0 (real time analysis) performed the initial image analysis and base calling.
In addition, CASAVA-1.8.2 generated and reported run statistics and the final FASTQ files comprising the sequence information which was used for all subsequent bioinformatics analyses. Sequences were de-multiplexed according to the 6 bp index code with 1 mismatch allowed.
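The de-multiplexing rule described above (6 bp index, one mismatch allowed) can be illustrated with a minimal sketch. This is not the CASAVA implementation; the function names and the index sequences below are hypothetical, since the actual index codes are not given in the text. A read is assigned to a sample only if its index lies within one mismatch of exactly one sample index.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str,
                  sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample whose index is within max_mismatches.

    Returns None when no sample, or more than one sample, qualifies.
    """
    hits = [name for name, idx in sample_indexes.items()
            if hamming(read_index, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6 bp index codes (assumption, not taken from the study).
SAMPLE_INDEXES = {"twin_A": "ATCACG", "twin_B": "CGATGT", "child": "TTAGGC"}

print(assign_sample("ATCACT", SAMPLE_INDEXES))  # one mismatch to twin_A's index
print(assign_sample("GGGGGG", SAMPLE_INDEXES))  # too far from every index
```

Allowing one mismatch is only safe when all sample indexes differ from each other in at least three positions; otherwise a single sequencing error could make a read match two samples, which is why ambiguous reads are left unassigned here.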
For both of the twins, as well as for the child, all corresponding Illumina read data were mapped to the human reference genome sequence (GRCh37.p10). The mapping was conducted using the Eurofins in-house mapping pipeline based on the Convey FPGA hardware architecture (http://www.conveycomputer.com) and Convey software tools that mimic a standard mapping using the Burrows-Wheeler Alignment software (BWA [
] the mapping results were sorted according to reference sequences and coordinates and filtered by applying a mapping quality threshold of 20. After this procedure a single BAM file containing only good quality unique mapping reads was obtained for each of the twins and the child. The mapping result for each individual was separated according to chromosomes. Duplicates were removed from the chromosome specific BAM files by applying Picard v1.87 [
] was used to identify somatic mutations. Positions with the highest VarScan scores were compared to the inherited mutations and visually inspected using IGV [
]. Primers located 50–100 bp upstream and downstream were used to amplify the regions of interest. All sequences were generated using BigDye terminator chemistry (version 3.1) of Applied Biosystems (Foster City, CA, USA) following standard protocols. For sequencing reactions, Primus 96 HPL Thermal Cyclers (MWG AG, Ebersberg, Germany), peqStar 96 HPL (PEQLAB Biotechnologie GmbH, Erlangen, Germany) or DNA engine Tetrad 2 cyclers (Bio-Rad, Munich, Germany) were used. Sequencing reaction clean-up was done on a Hamilton Starlet robotic workstation (Hamilton Robotics GmbH, Martinsried, Germany) by gel filtration through a hydrated Sephadex matrix filled into appropriate 96 well filter plates, followed by a centrifugation step. Finally, all reactions were run on ABI3730xl capillary sequencers equipped with 50 cm capillaries and POP7 polymer.
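The sorting and mapping-quality filtering step of the Methods (threshold of 20) can be sketched on plain SAM text records as follows. This is an illustrative stand-in, not the in-house pipeline: the function name and the example records are hypothetical, and reference names are sorted lexicographically here rather than in the order of the reference sequence dictionary that real tools use.

```python
def filter_and_sort(sam_lines, min_mapq=20):
    """Keep alignments with MAPQ >= min_mapq, sorted by reference and position.

    Expects tab-separated SAM records: column 3 is the reference name,
    column 4 the 1-based position and column 5 the mapping quality.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos, mapq = fields[2], int(fields[3]), int(fields[4])
        if mapq >= min_mapq:
            kept.append((rname, pos, line))
    kept.sort(key=lambda rec: (rec[0], rec[1]))
    return [line for _, _, line in kept]


# Hypothetical records for illustration only.
EXAMPLE = [
    "@HD\tVN:1.0",
    "r1\t0\tchr4\t500\t60\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tchr4\t100\t5\t100M\t*\t0\t0\t*\t*",   # MAPQ 5: discarded
    "r3\t0\tchr11\t42\t20\t100M\t*\t0\t0\t*\t*",  # MAPQ 20: kept
]

for record in filter_and_sort(EXAMPLE):
    print(record.split("\t")[0])
```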
3. Results
The paternity and the monozygosity of the twins were confirmed using standard forensic STR typing with PowerPlex 21 PCR Kit (data not shown). Next, DNA obtained from sperm samples of the twins and from the child's blood was used for ultra-deep next generation sequencing to identify inherited germline/somatic mutation events that occurred after twinning and are therefore only present in the twin father and not in the twin uncle.
Samples were sequenced using Illumina HiSeq 2000 technology. In total, 600 Giga-bytes of raw sequencing data were generated. For twin A, 283 Giga-base-pairs were sequenced, which corresponds to a mean genome coverage of 91-fold. For twin B, 292 Giga-base-pairs were generated, which corresponds to a mean genome coverage of 94-fold. For the child, 175 Giga-base-pairs were sequenced (mean genome coverage of 56-fold). Fig. 1 shows the number of non-redundant and uniquely mapped reads for each chromosome. Electronic supplement Table 1 lists the number of reference bases covered for each chromosome and for each sample.
Fig. 1. Reads per chromosome for twins A and B and the child. For each chromosome, the number of non-redundant and uniquely mapped reads is shown.
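The mean coverage values reported above follow from dividing the sequenced bases by the genome size. The short sketch below reproduces them under the assumption of a genome size of roughly 3.1 Gbp; this value is our assumption for illustration and is not stated in the text.

```python
# Assumed approximate size of the human reference genome in base pairs.
GENOME_SIZE_BP = 3.1e9

# Sequenced Giga-base-pairs per sample, as reported in the text.
SEQUENCED_GBP = {"twin A": 283, "twin B": 292, "child": 175}


def mean_coverage(sequenced_bp, genome_size_bp=GENOME_SIZE_BP):
    """Mean genome coverage = total sequenced bases / genome size."""
    return sequenced_bp / genome_size_bp


for sample, gbp in SEQUENCED_GBP.items():
    print(f"{sample}: {mean_coverage(gbp * 1e9):.0f}-fold")
```

With this assumed genome size the rounded results agree with the reported 91-, 94- and 56-fold coverage.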
The production of the overall raw sequence data required approximately three to four weeks. This includes DNA isolation, library preparation, sequencing and initial data analysis. The time for all subsequent analysis steps strongly depends on the available computational IT infrastructure. Using a highly parallelized mapping server (Convey FPGA hardware architecture), the mapping step for all three samples took 72.5 h of computing time, which is equivalent to a mean of 28,751 mapped reads per second. However, it is noteworthy that this step represents only a small proportion of the entire analysis pipeline. Due to numerous additional manual investigations (including the validation of the findings using independent approaches), the whole procedure can take several weeks.
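As a plausibility check, the reported mapping time and throughput can be converted back into a total amount of sequence and compared with the sum of the sequenced Giga-base-pairs of the three samples. The sketch assumes that every read is 100 bp long (from the 2 × 100 bp read mode) and that all sequenced reads went through the mapping step; both are simplifying assumptions.

```python
# Figures reported in the text.
MAPPING_HOURS = 72.5
READS_PER_SECOND = 28_751
SEQUENCED_GBP = {"twin A": 283, "twin B": 292, "child": 175}

# Assumption: each read contributes exactly 100 bp.
READ_LENGTH_BP = 100


def implied_gbp(hours, reads_per_second, read_length_bp):
    """Giga-base-pairs implied by a mapping duration and a mean read rate."""
    total_reads = hours * 3600 * reads_per_second
    return total_reads * read_length_bp / 1e9


implied = implied_gbp(MAPPING_HOURS, READS_PER_SECOND, READ_LENGTH_BP)
reported = sum(SEQUENCED_GBP.values())
print(f"implied by mapping rate: {implied:.0f} Gbp, reported: {reported} Gbp")
```

Both values come out at roughly 750 Gbp, so the throughput figure is consistent with the sequencing yields given in the Results.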
Storing of the raw and analysis data is another aspect. Approximately 2.8 Tera-bytes of disk space was utilized for storing all mapping and analysis data files.
] was used to determine potential mutations. VarScan2 includes the ability to identify somatic mutations in tumors, but also supports the detection of germline mutations. Paternally inheritable de novo mutations appear like somatic mutations when comparing both twins. Therefore, we assigned twin A to be “normal” and twin B to be “tumor” and vice versa and ascertained a set of potential somatic mutations. Mutations passed on to the offspring were detected by presuming twin A or twin B to be “normal” and the child to be “tumor”. Using the integrative genomics viewer (IGV) [
] the potential somatic mutations were visually inspected and compared with the potential inherited mutations in the child, starting with the highest scores provided by the VarScan2 scoring scheme for somatic mutations. With this approach, we identified 12 potential somatic SNP candidates present in both the twin father and the child, but not in the twin uncle. The SNPs, including 100 bp of downstream and upstream flanking sequence, were compared with the human genome reference (using BLASTN). Seven SNP candidates were discarded after the BLAST search because the surrounding sequence showed more than one significant hit on the genome. The remaining five SNP candidates are located on chromosome 4 (pos 188,267,982, snp C/T), 6 (pos 41,885,722, snp A/G), 11 (pos 68,781,324, snp C/T), 14 (pos 103,545,720, snp G/A) and 15 (pos 57,884,799, snp G/A) (see Table 1). Fig. 2 shows one of the SNPs observed in the NGS data visualized by IGV [
]. None of the five SNP candidates is related to SNPs annotated in dbSNP. The NGS findings were clearly confirmed by PCR and double-stranded Sanger sequencing of the respective positions for the five SNP candidates in the twins’ sperm-derived DNA. PCR and Sanger sequencing were also used to investigate the respective positions in the mother's DNA as well as in blood and buccal mucosa DNA of the twins, together with the child's blood DNA. Corresponding Sanger results for the SNP NGS data from Fig. 2 are shown in Fig. 3, excluding the theoretical possibility that the mother passed the SNPs on to the child. Interestingly, four of the five mutations seen in the sperm DNA of the twin father are also present in his buccal mucosa DNA. Only one of those four mutations is also seen in the blood DNA. One mutation is exclusively present in the sperm sample (Table 1). The ratio between the original base and the mutated base ranged from 50:50 to 80:20, with identical ratios in sperm and buccal mucosa DNA, and a deviation in the SNP found in the blood-derived DNA (Table 1). The mosaicism is therefore not only present in the sperm of the twin father, where we initially looked for it, but also in other tissues, with strong similarity between sperm and buccal mucosa.
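The candidate selection logic described above can be summarized as set operations: a position qualifies if it is called when comparing the twins and also when comparing the twin uncle with the child, and if its flanking sequence has exactly one significant hit on the genome. The following sketch is a simplified illustration, not the actual VarScan2 or BLASTN workflow; the function name is hypothetical, the five retained positions are taken from the text, and the two extra positions are invented to show the filtering.

```python
def father_specific_candidates(twin_vs_twin, twin_vs_child, genome_hits):
    """Select candidate SNPs present in the twin father and the child only.

    twin_vs_twin:  positions called with the uncle as "normal" and the
                   father as "tumor"
    twin_vs_child: positions called with the uncle as "normal" and the
                   child as "tumor"
    genome_hits:   position -> number of significant hits of the flanking
                   sequence on the reference genome
    """
    shared = twin_vs_twin & twin_vs_child
    # Keep only positions whose surrounding sequence maps uniquely.
    return sorted(p for p in shared if genome_hits.get(p, 0) == 1)


# (chromosome, position) pairs reported in the text.
REPORTED = {(4, 188267982), (6, 41885722), (11, 68781324),
            (14, 103545720), (15, 57884799)}

# Hypothetical extra calls, for illustration only.
REPETITIVE = (1, 1000)   # shared, but flanking sequence hits several loci
TWIN_ONLY = (2, 2000)    # differs between twins, not found in the child

twin_vs_twin = REPORTED | {REPETITIVE, TWIN_ONLY}
twin_vs_child = REPORTED | {REPETITIVE}
genome_hits = {p: 1 for p in REPORTED}
genome_hits[REPETITIVE] = 3

for chrom, pos in father_specific_candidates(twin_vs_twin, twin_vs_child,
                                             genome_hits):
    print(f"chr{chrom}:{pos}")
```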
Table 1. Genotypes of the identified SNPs in mother, child and twins. The SNP number refers to an alignment to the human genome sequence build GRCh37.p10. The figures given in brackets next to the genotypes are the mixture ratios in % and were estimated from the Sanger sequencing results.
The differentiation between MZ twins has been a limitation in forensic genetics, since “identical” twins were found to exhibit identical STR profiles. We have developed a new method to identify SNPs caused by de novo mutations that occurred in only one twin using ultra-deep next generation sequencing. Such SNPs allow distinguishing MZ twins. The approach can be used to shed light on so far unsolvable forensic paternity and criminal cases that involve identical twins.
], presumably because the coverage during NGS was too low to distinguish real SNPs from sequencing artifacts or because they screened with SNP chips. Our approach was to overcome such problems and reveal mosaic de novo mutations by using a very high coverage of more than 90-fold for the twins and more than 50-fold for the child, followed by adjusted bioinformatics filters. Generally, without involving identical twins, the estimated SNP substitution rate per generation ranges from 1 to 3 × 10⁻⁸ per base pair, equal to approximately 10–40 expected SNPs per paternal generation [
]. The number of five inherited SNPs we found from post-split of the early embryonic stage corresponds reasonably well with such estimations. Our initial comparison of the twins revealed more SNPs, but we focused on those present in the child as well. Krawczak et al. [
] expect lesions (deletions, insertions, indels) to occur at a ratio of approximately one in three relative to SNPs. We have not been able to clearly verify such mutations in our NGS sequence datasets. We consider this to be a stochastic effect based on the relative infrequency of these events, or reflecting that such mutated DNA fragments do not fulfill the alignment criteria, or both.
Monozygotic twins are the result of a separation of the morula during early embryonic development. In one third of all cases the morula divides before day 5 after fertilization in the 16–32 cell stage. Approximately two thirds of the monozygotic twinnings occur after the separation of the morula between day 5 and day 9 (40–150 cells). Rare exceptions are twins that are formed after day 9, sharing one placenta. Mutations which can be used to identify a specific twin must have occurred after separation of the morula, or shortly before separation with exclusive presence in the cells that later belong to only one of the twins. In general, only mutations present in the germline are inheritable. Mutation events occurring later in the somatic lineages will not be transmitted to any potential offspring [
]. During embryogenesis, germ cells enter a complex series of events that ends with the formation of ova and sperm. Due to the inaccessibility of the human embryo to experimental investigations at these early stages, there is still little knowledge about the precise development of human primordial germ cells (PGCs). According to the summary of De Felici [
] the combination of data from human studies and the most recent results obtained in mouse supports the following scenario. Briefly, PGCs are committed and specified early in the epiblast. Prior to gastrulation they rapidly move into an extra-embryonic region. Subsequently, PGCs are determined and re-enter the embryo proper during early gastrulation to reach the developing gonads [
] and help to narrow down the history of the somatic mutation event: mutations that are present in one twin's sperm and buccal mucosa, but not in blood, will have occurred after gastrulation, but before the separation of the buccal mucosa and the precursors of the sperm. A mutation event present in all three tested tissues must have occurred earlier, before the separation of the germ layers. Therefore, the mutation that is exclusively present in the sperm is the “latest” one.
The fact that most of the mutations we identified are present in the two ectodermal tissues, buccal mucosa and sperm, suggests the use of buccal mucosa as starting material for identification of mutations between MZ twins. Sampling of buccal swabs in paternity or crime scene cases is legally and ethically easier than sampling sperm. The findings show that the described method for distinguishing MZ twins is not only applicable to paternity cases, but also to forensic cases with ectodermal traces such as contact stains, skin scales, hair, buccal mucosa or semen stains found at crime scenes, and very likely will also work with blood stains. Our analytical methodology also allows re-analyzing cold cases of sufficient relevance and approaching new cases involving MZ twins as donors of stain material. In all cases, only the reference samples of the MZ twin suspects need to be analyzed by NGS. The DNA from the stain itself, which might be of lower quantity and quality, can be analyzed using sensitive and specific PCR-based standard SNP detection assays developed according to the NGS findings.
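The reasoning above, which orders mutation events by the tissues in which they are detected, can be written down as a small decision rule. The sketch below only encodes the three patterns discussed in the text; the function name and the wording of the returned labels are ours, and the counts reflect the tissue distribution reported in the Results (one mutation in all three tissues, three in sperm and buccal mucosa only, one in sperm only).

```python
from collections import Counter


def relative_timing(sperm: bool, buccal: bool, blood: bool) -> str:
    """Classify a mutation by the tissues in which it was detected."""
    if sperm and buccal and blood:
        return "earliest: before separation of the germ layers"
    if sperm and buccal and not blood:
        return "intermediate: after gastrulation, before buccal/sperm split"
    if sperm and not buccal and not blood:
        return "latest: sperm lineage only"
    return "pattern not covered by the text"


# Tissue patterns (sperm, buccal mucosa, blood) of the five mutations.
OBSERVED = [
    (True, True, True),
    (True, True, False),
    (True, True, False),
    (True, True, False),
    (True, False, False),
]

counts = Counter(relative_timing(*pattern) for pattern in OBSERVED)
for label, n in counts.items():
    print(f"{n} x {label}")
```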
Acknowledgements
We are grateful to Jennifer Meyer, Cornelia Ellendt and David Kobias for technical assistance.
References
Butler J.M.
Short tandem repeat typing technologies used in human identity testing.
Monozygotic twins with neurofibromatosis type 1 (NF1) display differences in methylation of NF1 gene promoter elements, 5′ untranslated region, exon and intron 1.
☆This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.