Medical Genomics Research Center, KRIBB, Department of Functional Genomics, Republic of Korea; University of Science of Technology, 111 Gwahang-no, Yuseong-gu, Daejeon, Republic of Korea
Corresponding author at: University of Science of Technology, 111 Gwahang-no, Yuseong-gu, Daejeon, Republic of Korea. Tel.: +82 042 879 8116; fax: +82 042 879 8119.
In forensic science, identifying the tissue from which a forensic specimen originated is one of the principal challenges. Because messenger RNA (mRNA) profiles clearly reveal tissue-specific gene expression patterns, many attempts have been made to use RNA for forensic tissue identification. To systematically investigate the body-fluid-specific expression of mRNAs and find novel mRNA markers for forensic body fluid identification, we performed DNA microarray experiments with 24 Korean body fluid samples. Shannon entropy and Q-values were calculated for each gene, and 137 body-fluid-specific candidate genes were selected. By applying more stringent criteria, we further selected 28 candidate genes and validated them by RT-PCR and qRT-PCR. As a result, we suggest a novel combination of four body-fluid-specific mRNA markers: PPBP for blood, FDCSP for saliva, MSMB for semen and MSLN for vaginal secretion. A multiplex qRT-PCR assay was designed using the four mRNA markers, and a DNA/RNA co-extraction method was tested for forensic use. This study provides a thorough examination of body-fluid-specifically expressed mRNAs, which will broaden the practical use of RNA for forensic purposes.
Identifying the tissue of origin of a specimen is crucial in some criminal cases, such as sexual assaults. Until now, forensic body fluid identification has relied mainly on immunologic detection of specific protein markers: prostate-specific antigen (PSA) for semen and hemoglobin for blood [
]. However, immunologic tests occasionally cross-react with other species or tissues. Moreover, most immunologic tests rely on a simple color change, which is hard to quantify and sometimes hard to recognize, especially when only a very small amount of forensic sample is available. The use of RNA for forensic body fluid identification has been discussed because of its reliable tissue specificity and the availability of co-extraction methods that obtain RNA along with genomic DNA (gDNA) [
]. While gDNA is used for standard STR analysis, RNA can give tissue-specific information which DNA does not have. Furthermore, when used with multiplex PCR, numerous tissue-specific genes can be analyzed simultaneously to identify multiple tissue types and thus conserve the limited amount of forensic sample. RNA analysis is expected to be adopted smoothly because it relies on reverse transcription with standard end-point PCR (RT-PCR) and reverse transcription with quantitative PCR (qRT-PCR), assays that are already widely used in forensic laboratories.
The major obstacle to forensic RNA analysis is the instability of the RNA molecule, which is easily degraded by ubiquitous ribonuclease (RNase) enzymes. However, recent studies have shown that mRNA samples from crime scenes can be sufficiently stable for forensic use [
]. These studies show that body fluid types can be distinguished using forensic RNA samples by measuring the expression levels of specific mRNA markers.
Several tissue-specific mRNA markers have been discovered for identification of frequently encountered forensic body fluids including blood, saliva, semen, vaginal secretion and menstrual blood [
]. In previous studies, many body-fluid-specific mRNA markers have been reported through single-gene-based assays. For example, SPTB, PBGD and HBA1 are reported as blood-specific mRNA markers; MMP7 and MMP11 as menstrual blood-specific markers; STATH and HTN3 as saliva-specific markers; PRM1, PRM2 and KLK3 as semen-specific markers; and HBD1 and MUC4 as vaginal-secretion-specific markers [
]. These methods have the advantage of sensitive analysis of mRNA expression, but only a few genes can be examined at a time. To understand the gene expression signature of forensic body fluid samples and to search for novel mRNA markers, a systematic approach that analyzes the whole transcriptome is needed.
The DNA microarray is a useful tool for analyzing the whole transcriptome and discovering new biomarkers. Several researchers have applied DNA microarrays in their search for tissue-specific genes [
]. However, genome-wide profiling to identify specific mRNA markers of four body fluids (blood, saliva, semen, and vaginal secretion) has not yet been performed.
In this study, we performed genome-wide expression profiling with total RNA isolated from four kinds of body fluid samples (blood, saliva, semen and vaginal secretion) obtained from Korean volunteers. We then calculated Shannon's entropy and Q-values for each gene and selected 137 body-fluid-specific genes. The body-fluid-specific signature of the 137 genes was validated in an independent gene expression dataset assembled from a public gene expression database. We also validated 18 genes by RT-PCR and qRT-PCR. As a result, we obtained novel specific mRNA markers for forensic body fluid identification.
2. Materials and methods
2.1 Sample collection and RNA preparation
Four kinds of body fluid samples, blood (18), saliva (18), semen (43) and vaginal secretion (157) were collected from healthy Korean volunteers, and treated anonymously following local ethical restrictions. To prepare intact RNA for microarray, aliquots of whole blood, saliva and semen were collected in microcentrifuge tubes and vaginal secretion samples were collected by immediate soaking of vaginal secretions in lysis buffer. Total RNA for microarray was isolated using Qiagen RNeasy Mini Kit according to the manufacturer's protocol. For saliva, several RNA samples were isolated using Qiagen RNeasy Protect Saliva Mini Kit. For DNA and RNA co-extraction, Qiagen AllPrep DNA/RNA Mini Kit was used to prepare genomic DNA and total RNA simultaneously. Extracted total RNA was analyzed with Experion™ RNA StdSens (BIO-RAD) to check its quality and quantity.
2.2 DNA microarray experiment
Total RNA (1–5 μg) was amplified with Illumina TotalPrep™ RNA Amplification Kit (Ambion). RNA amplification process is composed of cRNA (RNA derived from cDNA through standard RNA synthesis) synthesis, biotin labeling and in vitro transcription steps. Through these steps, RNA (500 ng) was converted into biotin-labeled cRNA, and then 750 ng of cRNA was hybridized with Illumina BeadChip according to manufacturer's protocol. After hybridization, BeadChip was scanned with Illumina BeadArray Reader™ laser scanner. Scanned images were converted into intensity values using Illumina BeadStudio™ 3.4 program.
2.3 Analysis of DNA microarray data
Intensity values from Illumina BeadStudio™ 3.4 program were globally normalized with Quantile method [
]. Given expression levels of a gene (g = 1, 2, …, G) in N tissues (t = 1, 2, …, N), the relative expression of a gene g in a tissue t was defined as
$$p_{t|g} = \frac{w_{g,t}}{\sum_{t'=1}^{N} w_{g,t'}}$$
where $w_{g,t}$ is the expression level of the gene in the tissue. The entropy of a gene's expression distribution was calculated as
$$H_g = -\sum_{t=1}^{N} p_{t|g} \log_2 p_{t|g}$$
A low entropy value represents a specific gene expression pattern. However, while Shannon entropy provides a single metric for assessing the complete gene expression profile, it does not provide any information on the tissues in which a gene may be specifically expressed. A statistic, $Q_{g|t}$, was introduced to measure the degree of specificity of gene expression in each tissue type:
$$Q_{g|t} = H_g - \log_2 p_{t|g}$$
A low $Q_{g|t}$ represents a specific expression pattern for that tissue [ ].
Candidate genes were selected using a Q-value cut-off of 1.8. Genes co-expressed in two or more body fluids were removed manually. Unsupervised hierarchical clustering and visualization were performed with the MEV 4.0 program (http://www.tm4.org/). DNA microarray data for the four body fluids generated on the Affymetrix U133 Plus 2 (GPL570) platform were collected from the Gene Expression Omnibus (GEO) database at NCBI (http://www.ncbi.nlm.nih.gov/geo/). The GEO accession numbers of the collected datasets were GSE6872, GSE7307, GSE7451, GSE8764, GSE11622, GSE12446, GSE13494, GSE14245, GSE14642, GSE17340, GSE20266, GSE22331 and GSE25518. The collected microarray data were globally normalized with the MAS5 method using the affy package [
]. The two DNA microarray datasets for the four body fluids are visualized on our own web server (http://medical-genome.kribb.re.kr/forensic). All primary data are deposited in the Gene Expression Omnibus (GEO) at NCBI under accession number GSE34844.
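The entropy and Q-statistic selection described in this section can be illustrated with a minimal Python sketch. It assumes one summarized expression value per body fluid (for example, a mean over replicates) and base-2 logarithms; the function names and the example intensities are hypothetical and are not taken from the study's data.

```python
import math


def relative_expression(levels):
    """Return p_{t|g}: each tissue's share of a gene's total expression."""
    total = sum(levels)
    if total <= 0:
        raise ValueError("expression levels must sum to a positive value")
    return [w / total for w in levels]


def shannon_entropy(levels):
    """Return H_g in bits; low values indicate tissue-restricted expression."""
    p = relative_expression(levels)
    # Tissues with p == 0 contribute nothing (limit of p * log p as p -> 0).
    return -sum(x * math.log2(x) for x in p if x > 0)


def q_statistic(levels):
    """Return Q_{g|t} = H_g - log2(p_{t|g}) for each tissue t."""
    p = relative_expression(levels)
    h = shannon_entropy(levels)
    # A tissue with zero expression can never be the specific one.
    return [h - math.log2(x) if x > 0 else math.inf for x in p]


def specific_tissues(levels, tissues, cutoff=1.8):
    """List tissues whose Q value is at or below the cut-off."""
    return [t for t, q in zip(tissues, q_statistic(levels)) if q <= cutoff]


# Hypothetical intensities, ordered as in FLUIDS.
FLUIDS = ["blood", "saliva", "semen", "vaginal secretion"]
blood_like_gene = [1000.0, 10.0, 10.0, 10.0]
housekeeping_like_gene = [5.0, 5.0, 5.0, 5.0]

print(specific_tissues(blood_like_gene, FLUIDS))
print(specific_tissues(housekeeping_like_gene, FLUIDS))
```

A gene expressed almost exclusively in one fluid yields a low Q for that fluid only, whereas a uniformly expressed gene reaches the maximum entropy of 2 bits for four fluids and passes the cut-off nowhere.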
2.4 RT-PCR and qRT-PCR
Using total RNA (1–5 μg) as a template, reverse transcription was carried out with iScript™ cDNA Synthesis Kits (BIO-RAD). Then, 1/150 of synthesized cDNA was used for RT-PCR and qRT-PCR. The RT-PCR primers were designed with Primer3 software. We designed primers and probes for RT-PCR and qRT-PCR based on sequences on exon–exon junctions to avoid possible false positives from genomic DNA contamination. RT-PCR was performed with Novelzyme™ Taq Plus Premix using the following parameters: initial denaturation at 98 °C for 5 min, followed by 35 cycles of denaturation at 98 °C for 30 s, annealing at 60 °C for 30 s, elongation at 72 °C for 30 s, and a final elongation at 72 °C for 5 min. β-Actin gene was used as a housekeeping control for normalization. PCR product was detected using agarose gel electrophoresis and ethidium bromide staining. The sequences of oligonucleotide primers are listed in Supplementary Table 1.
Commercial dual-labeled fluorogenic probes (Metabion) were used for qRT-PCR. The qRT-PCR reaction was carried out with THUNDERBIRD™ Probe qRT-PCR Mix on a CFX96™ Real-Time PCR machine (C1000™ Thermal Cycler, Bio-Rad) using the following parameters: initial denaturation at 94 °C for 1 min, followed by 45 cycles of denaturation at 94 °C for 15 s and annealing/elongation at 56 °C for 1 min. The GAPDH gene was used as a housekeeping control for normalization. The sequences of the oligonucleotide primers used in qRT-PCR are listed in Supplementary Table 2. Expression levels were quantified using the delta Ct (ΔCt) method.
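The ΔCt normalization against GAPDH can be written out as a short Python sketch. The conversion to a relative expression value as 2^(-ΔCt) assumes roughly 100% amplification efficiency, which the text does not state; the function names and Ct values below are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Return ΔCt: target Ct minus the reference (housekeeping) gene Ct."""
    return ct_target - ct_reference


def relative_to_reference(ct_target, ct_reference):
    """Return expression relative to the reference gene as 2^(-ΔCt).

    Assumes the product doubles every cycle (about 100% efficiency).
    """
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


# Hypothetical Ct values for one sample: a marker gene and GAPDH.
marker_ct = 24.0
gapdh_ct = 22.0

print(delta_ct(marker_ct, gapdh_ct))
print(relative_to_reference(marker_ct, gapdh_ct))
```

A lower ΔCt means the marker crosses the detection threshold earlier relative to GAPDH, which corresponds to higher expression.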
For multiplex qRT-PCR, iQ™ Multiplex Powermix (Bio-Rad) was used. The qRT-PCR reaction was carried out on a CFX96™ Real-Time PCR machine (C1000™ Thermal Cycler, Bio-Rad) using the following parameters: initial denaturation at 95 °C for 1 min, followed by 45 cycles of denaturation at 95 °C for 15 s and annealing/elongation at 56 °C for 1 min. The GAPDH gene was used as a housekeeping control for normalization. Probes for multiplex qRT-PCR were labeled with five dyes: FAM, Yakima Yellow, IRD700, Cy5 and CAL Fluor Red 610. The intensities were detected using five filters: FAM, HEX, Quasar705, Cy5 and CAL Fluor Red 610. The sequences of the oligonucleotide primers used in multiplex qRT-PCR are listed in Supplementary Table 3. Probes and primers were added at the following concentrations for multiplex qRT-PCR: PPBP probe, 0.33 μM; FDCSP probe, 0.67 μM; MSMB probe, 0.33 μM; MSLN probe, 0.33 μM; GAPDH probe, 0.33 μM; PPBP forward primer, 0.33 μM; PPBP reverse primer, 0.33 μM; FDCSP forward primer, 0.67 μM; FDCSP reverse primer, 0.67 μM; MSMB forward primer, 0.33 μM; MSMB reverse primer, 0.33 μM; MSLN forward primer, 0.67 μM; MSLN reverse primer, 0.67 μM; GAPDH forward primer, 0.33 μM; GAPDH reverse primer, 0.33 μM. We selected GAPDH as an endogenous control based on previous reports [
]. Additionally, its expression was relatively invariable among samples and different tissues in our microarray experiment.
2.5 STR analysis
Four kinds of body fluid samples, blood (6), saliva (8), semen (7) and vaginal secretion (7) were collected from healthy Korean volunteers and used for DNA and RNA co-extraction. gDNA was prepared with Qiagen AllPrep DNA/RNA Mini Kit simultaneously with total RNA according to the manufacturer's protocol. gDNA was quantified with Quantifiler™ Human DNA Quantification Kit (Applied Biosystems) on 7500 Real-Time PCR System (Applied Biosystems) according to the manufacturer's protocol. Autosomal STR amplification was carried out using AmpFlSTR® Identifiler® PCR Amplification Kit (Applied Biosystems) according to the manufacturer's protocol. PCR product was resolved and detected with 3730 DNA Analyzer (Applied Biosystems) capillary electrophoresis system and STR profile was analyzed using GeneMapper ID software (Applied Biosystems).
3. Results
3.1 Gene expression profiling
We performed gene expression profiling experiments to find specific mRNA markers for four types of body fluids: blood, saliva, semen, and vaginal secretion. Using commercial RNA preparation kits, we prepared total RNA from each body fluid sample and checked its quality using Experion™ (Supplementary Figure 1). Not surprisingly, the total RNA samples, especially those from saliva and vaginal secretion, were of poor quality and extensively degraded, possibly because of the abundant RNase in some of the body fluid samples (Supplementary Figure 1).
We then checked whether DNA microarray experiments were possible with these poor-quality RNA samples. The first step of the Illumina BeadChip Array™, a widely used DNA microarray platform, is RNA amplification, which comprises cRNA synthesis, biotin labeling and in vitro transcription. The success of Illumina BeadChip Array™ experiments depends largely on the success of RNA amplification. We selected 24 samples, 6 per body fluid, out of the 44 samples, based on the gel electrophoresis image rather than the RQI value, because we assumed that the RQI value might be less useful for highly degraded samples. With the selected samples, we performed RNA amplification and checked whether cRNA was synthesized properly for the subsequent steps (Supplementary Table 4). The quality of some of the cRNA samples was not good, but most of them were amplified sufficiently for hybridization to the Illumina BeadChip. We therefore hybridized them to the Illumina BeadChip and scanned the chips with a laser scanner. During scanning, we checked the overall intensities of the 24 cRNA hybridization images (Supplementary Figure 2) and found that 6 of the 24 images had low overall intensity values. However, as we show later, even those samples were informative in selecting body-fluid-specific genes. These results show that the Illumina BeadChip DNA microarray platform is robust enough to be applied to forensic body fluid investigation.
3.2 Gene expression data analysis
We used Shannon entropy (H) and the Q-statistic (Q) to select specific genes for each body fluid. Using the normalized intensity values of the 24 samples, we calculated H and Q values for all genes on the DNA microarray. Based on these Q-values, we selected candidate genes whose Q-values were lower than 1.8 in at least one body fluid. Initially, 158 genes were selected, but some of them were not specific to a single body fluid. In particular, 21 genes were expressed in both saliva and vaginal secretion. After removing those 21 genes, we obtained 137 potential body-fluid-specific genes (Supplementary Table 5). These candidate genes consist of 40 blood-specific, 80 semen-specific, 4 saliva-specific and 13 vaginal-secretion-specific genes. When we performed unsupervised hierarchical clustering with the 137 candidate genes, we found that those genes distinguish the four body fluids (Fig. 1). This result indicates that the 137 candidate genes selected from the 24 Korean samples using the Illumina BeadChip Array™ platform show a body-fluid-specific signature.
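As an illustration of how unsupervised hierarchical clustering groups samples by their expression profiles, the following Python sketch implements a small agglomerative clustering. The study used the MEV 4.0 program, and the distance metric and linkage rule are not stated in the text, so the choices here (1 minus Pearson correlation, average linkage) are assumptions; the sample names and values are hypothetical.

```python
import math


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the closest clusters until n_clusters remain.

    profiles maps a sample name to its list of gene expression values.
    """
    names = list(profiles)
    dist = {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical profiles over four genes (two blood-high, two semen-high).
samples = {
    "blood_1": [10.0, 9.0, 1.0, 1.0],
    "blood_2": [9.0, 10.0, 1.0, 2.0],
    "semen_1": [1.0, 1.0, 10.0, 9.0],
    "semen_2": [2.0, 1.0, 9.0, 10.0],
}

print(average_linkage_clusters(samples, n_clusters=2))
```

Stopping at a fixed number of clusters corresponds to cutting the dendrogram at one level; a full dendrogram would record every merge instead.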
Fig. 1. Selected body-fluid-specific candidate genes. DNA microarray experiments were performed for 24 Korean body fluid samples, 6 samples per body fluid, using the Illumina BeadChip Array platform. Based on the Q-value (Q ≤ 1.8) for each body fluid, 137 body-fluid-specific candidate genes were selected. Unsupervised hierarchical clustering of the 137 candidate genes was performed (red: blood, blue: saliva, green: semen, yellow: vaginal secretion). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
To validate the reliability of the 137 candidate genes in another dataset, we analyzed gene expression datasets publicly available from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/). We collected data from samples of the four body fluid types profiled using the Affymetrix U133 Plus2 platform. A total of 113 samples were collected, comprising 40 blood, 37 saliva, 13 semen and 23 vaginal samples. After global normalization of the intensity values summarized with the MAS5 algorithm, we extracted the expression values of the 137 candidate genes and performed unsupervised hierarchical clustering. The 137 genes clearly separated the four body fluids in the 113 samples (Supplementary Figure 3). We therefore conclude that the 137 body-fluid-specific genes can be applied to various datasets and microarray platforms as a general body-fluid-specific signature. The expression patterns of well-known specific mRNA markers of the four body fluids, blood (HBA1 and PPBP), saliva (STATH and HTN3), semen (PRM1 and PRM2), and vaginal secretion (HBD1), are shown in Supplementary Figure 4. A simple web interface is also provided to help examine the expression pattern of each gene in the two datasets (http://medical-genome.kribb.re.kr/forensic).
3.4 Further selection of 41 out of 137 candidate genes
For practical application, using all 137 candidate genes for body fluid identification would be expensive and time-consuming, so we selected representative genes from the 137 candidates. Three criteria were used: (1) the rank of the Q-value in each body fluid, (2) specificity in both datasets, and (3) the absolute expression level. Using these three criteria, we selected 22 blood-specific, 4 saliva-specific and 8 semen-specific representative candidate genes (Table 1 and Supplementary Figure 5). Unfortunately, for vaginal secretion samples, we could not obtain any suitably specific genes fulfilling the three criteria. For vaginal secretion, we therefore used a higher Q-value cut-off and selected other candidate genes (Table 1). Many well-known body-fluid-specific markers, including PPBP (blood), HBA1 (blood), HTN1 (saliva), HTN3 (saliva), PRM1 (semen) and PRM2 (semen), were successfully selected from our dataset (Table 1), which suggests that our approach is effective in identifying specific markers for the four body fluids.
Table 1. 41 candidate genes for the identification of 4 body fluids.
3.5 Validation of body-fluid-specific markers using RT-PCR and qRT-PCR
We initially tested primer pairs for each of the 41 candidate genes but found that only 28 primer pairs worked under our RT-PCR conditions (data not shown); those 28 genes were therefore selected for validation by RT-PCR (Supplementary Table 1). First, we performed the RT-PCR assay with two samples for each body fluid (Fig. 2 and Supplementary Figure 6). Eighteen of the 28 genes were successfully validated as body-fluid-specific genes (Fig. 2), while 10 genes were not specific (Supplementary Figure 6). The successfully validated mRNA markers included six blood-specific (PPBP, NKG7, CCL5, NRGN, GZMH, PRF1), five saliva-specific (FDCSP, MUC7, KLK4, HTN1, HTN3), four semen-specific (MSMB, NKX3-1, SEMG1, PRM2) and three vaginal-secretion-specific (SERPINB3, MMP7, MSLN) genes. Several candidate genes, including HBA1 (blood) and PRM1 (semen), are already known as body-fluid-specific mRNA markers, but in our RT-PCR data their expression patterns were not body-fluid-specific in Korean samples (Supplementary Figure 6).
Fig. 2. Validation of selected body-fluid-specific mRNA markers using RT-PCR. The RT-PCR assay was carried out with two total RNA samples per body fluid. Body-fluid-specific expression patterns of 18 representative candidate genes were validated. PCR product was detected using agarose gel electrophoresis and ethidium bromide staining.
We tested qRT-PCR probes for the 18 genes, but only ten probes worked in a preliminary qRT-PCR test (data not shown). Therefore, 10 mRNA markers were validated again by qRT-PCR using TaqMan® probes (Supplementary Table 2). We first performed the qRT-PCR assay with two samples per body fluid (Supplementary Figure 7). Most mRNA markers, except KLK4, showed the same body-fluid-specific expression patterns as in the RT-PCR results. For multiplex qRT-PCR, we selected four markers, one mRNA marker per body fluid (PPBP for blood, FDCSP for saliva, MSMB for semen and MSLN for vaginal secretion), and their expression patterns were further validated in a larger number of samples using qRT-PCR. Fig. 3 shows the body-fluid-specific expression of the four representative genes, PPBP, FDCSP, MSMB and MSLN.
Fig. 3. Validation of selected body-fluid-specific mRNA markers using qRT-PCR. The qRT-PCR assay was carried out with 18 total RNA samples per body fluid. Expression patterns of four mRNA markers, PPBP (blood), FDCSP (saliva), MSMB (semen) and MSLN (vaginal secretion), were measured by qRT-PCR and represented as box plots. (A) Blood, (B) saliva, (C) semen, (D) vaginal secretion.
For the practical application in forensic body fluid identification, we designed multiplex qRT-PCR probes of the four mRNA markers (Supplementary Table 3). Multiplex qRT-PCR was carried out using those probes labeled with 5 different dyes (Fig. 4). Four body fluids were successfully distinguished in 5-dye multiplex qRT-PCR assay.
Fig. 4. Validation of multiplex qRT-PCR probes. Multiplex qRT-PCR was performed with probes labeled with five different dyes. GAPDH gene expression was measured together and used as an endogenous control (bl: blood, sa: saliva, se: semen, va: vaginal secretion). (A) Blood, (B) saliva, (C) semen, (D) vaginal secretion.
We then evaluated the four mRNA markers. Two datasets were used for the evaluation: the microarray data of the 113 samples from the public microarray database, and our additional qRT-PCR data of 18 samples per body fluid. First, we performed receiver operating characteristic (ROC) analysis and calculated the area under the curve (AUC) (Table 2, Supplementary Figures 8 and 9). Second, we calculated the sensitivity and specificity of the four mRNA markers (Table 2). All selected markers except MSLN showed good sensitivity, specificity and AUC values in both the microarray and the qRT-PCR datasets. These results suggest that our mRNAs are efficient body-fluid-specific markers, although the vaginal secretion marker, MSLN, was less than optimal in its sensitivity, as we had expected.
Table 2. Evaluation of four body-fluid-specific mRNA markers.
AUC (area under curve) means the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
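The probabilistic definition of AUC given in this footnote, together with sensitivity and specificity at a fixed threshold, can be computed directly, as in the Python sketch below. The scores are hypothetical marker expression values in which higher means more marker signal; the function names and the threshold are assumptions for illustration, not values from Table 2.

```python
def auc_by_ranking(positive_scores, negative_scores):
    """Return the probability that a random positive outscores a random negative.

    Ties count as half, following the usual rank-based convention.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sensitivity_specificity(positive_scores, negative_scores, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_pos = sum(1 for s in positive_scores if s >= threshold)
    true_neg = sum(1 for s in negative_scores if s < threshold)
    return true_pos / len(positive_scores), true_neg / len(negative_scores)


# Hypothetical marker scores: target body fluid versus all other fluids.
target_fluid = [5.0, 4.0, 3.0]
other_fluids = [1.0, 2.0, 3.5]

print(auc_by_ranking(target_fluid, other_fluids))
print(sensitivity_specificity(target_fluid, other_fluids, threshold=2.5))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of the size discussed here; rank-based formulas give the same value more efficiently for large inputs.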
3.6 DNA/RNA co-extraction and simultaneous body fluid identification and STR analysis
In most forensic cases, human identification by gDNA purification followed by STR amplification and fragment analysis is indispensable. Simultaneous extraction of DNA and RNA from the same spot of a sample reduces sample consumption, and it still allows gDNA extraction even when the RNA is badly degraded. A commercially available DNA/RNA co-extraction kit (AllPrep DNA/RNA Mini Kit, Qiagen) was tested on independent forensic body fluid samples: blood, semen, saliva and vaginal secretion. Multiplex qRT-PCR was then performed using the RNA fraction, while gDNA was used for autosomal STR amplification according to the standard protocol.
In all body fluid samples tested for DNA/RNA co-extraction, the 15 Identifiler autosomal STR loci, along with Amelogenin locus, were successfully analyzed. The representative STR analysis results of each body fluid are shown in Supplementary Figure 10. RNA prepared simultaneously with gDNA was tested for body fluid identification with multiplex qRT-PCR method. Our multiplex qRT-PCR system using newly selected four mRNA markers succeeded in distinguishing the body fluid types by showing the specific expression of its own body fluid marker (Supplementary Figure 11). These results indicate that gDNA extracted by DNA/RNA co-extraction method is of sufficiently good quality for STR analysis, and also RNA prepared together is suitable for RT-PCR assay. This would open up the possibility of forensic use of mRNA markers for body fluid identification.
4. Discussion
We collected 24 body fluid samples from Korean individuals, and selected 137 specific genes from microarray gene expression profiling data. Those candidate genes were validated in another dataset prepared from public gene expression datasets. We then further validated 18 genes by RT-PCR and qRT-PCR. We also developed multiplex qRT-PCR system using the newly selected four representative mRNA markers and its usability was demonstrated with the DNA/RNA co-extraction method for practical forensic application.
One challenge for gene expression profiling experiment with body fluid samples is the quality and quantity of RNA. Not surprisingly, we found that many RNA samples from body fluids were degraded and of poor quality. However, even with the degradation, we could obtain enough cRNA samples after RNA amplification and hybridize them to Illumina BeadChip. These results suggest that Illumina BeadChip Array platform is sensitive enough to be applicable to highly degraded RNA samples from crime scenes.
The aim of this study was to investigate new body-fluid-specific mRNA markers using DNA microarray technology to distinguish the four body fluids frequently obtained from crime scenes. We performed gene expression profiling experiments on 24 body fluid samples from Korean individuals and, starting from 29,377 genes, selected 137 body-fluid-specific genes by applying Shannon's entropy and the Q-statistic. We then validated the 137 genes in another dataset of 113 non-Korean individuals profiled on the Affymetrix platform. As a result, we showed that the 137 candidate genes display a body-fluid-specific gene expression signature that is not restricted to Koreans or to a single microarray platform. Previous studies have tried to find body-fluid-specific markers using DNA microarrays, but only two kinds of body fluids, blood and saliva, have been studied so far [
]. As our study systematically investigated four body fluids frequently encountered in crime scenes, it is expected to contribute to discovering more efficient combination of mRNA markers for body fluid identification. We provide a simple web interface to examine our data for other forensic scientists at http://medical-genome.kribb.re.kr/forensic.
Using three stringent criteria, we selected 41 candidate genes out of 137 genes for further validation (Table 1). 18 of them were successfully validated by RT-PCR and qRT-PCR. Well-known body-fluid-specific mRNA markers which were reported in previous studies were included in the 41 selected candidate genes: HBA1, MMP7, HTN3, PRM1 and PRM2 [
]. However, some genes showed discrepancies in expression patterns. For example, HBA1 is known as a blood-specific mRNA marker and PRM1 as a semen-specific mRNA marker, yet in our RT-PCR data neither HBA1 nor PRM1 showed body-fluid-specific expression in Korean samples. We also observed that HBA1 is expressed in some vaginal samples in the non-Korean GEO dataset (Supplementary Figure 5). Meanwhile, our selected representative blood-specific mRNA marker, PPBP, showed specific expression patterns in both our own and the GEO datasets (Supplementary Figure 5). This blood-specific expression of PPBP was validated using RT-PCR and qRT-PCR assays (Fig. 2, Fig. 3). These results suggest that some previously reported body-fluid-specific mRNA markers should not be applied to Korean samples without thorough examination of their expression patterns in broad datasets.
Among the four body fluid types, vaginal secretion samples showed the most heterogeneous gene expression patterns, so we had to apply a higher Q-value cut-off to select candidate genes for vaginal secretion. A higher Q-value cut-off means lower fidelity of the vaginal secretion candidate genes and, consequently, lower accuracy in validation. The RT-PCR assay revealed that SERPINB3, MMP7 and MSLN were good mRNA markers for vaginal secretion (Fig. 2); however, large sample-to-sample variations were observed, as expected (Fig. 2, Fig. 3). These large variations could cause false negative results in vaginal secretion samples, so more specific markers may be required for the identification of vaginal secretion. Meanwhile, when discussing the inconsistent expression patterns of MMP7 in vaginal secretion samples, it should be considered that MMP7 is also reported as a marker for menstrual blood [
]. Because we had no information about the menstrual cycles of sample donors, variations of MMP7 might be a result of hormonal effects.
In some cases, inconsistencies were observed among the results of RT-PCR, qRT-PCR, and the Illumina and Affymetrix microarrays, possibly caused by differences in the sequences used as primers and probes. For example, KLK4 was selected as a semen-specific mRNA marker on the Illumina microarray platform, but on the Affymetrix microarray platform it showed higher expression in saliva samples (data not shown). Moreover, KLK4 appeared to be a saliva-specific marker in the RT-PCR result (Fig. 2), but its expression was much higher in semen samples in the qRT-PCR result (Supplementary Figure 7). As another example, NKX3-1 was selected as both a saliva-specific and a semen-specific marker on the Affymetrix microarray platform because that platform carries two different probes for the gene (data not shown). Therefore, to select genuine representative body-fluid-specific markers, it is important to examine gene expression across different assay platforms and across the different regions where probes or primers bind.
The DNA/RNA co-extraction method tested in this study is especially useful for forensic scientists because it yields gDNA and RNA simultaneously from the same spot of a sample. gDNA is used for standard human identification based on STR analysis, while RNA gives tissue-specific information which cannot be gained from DNA (Supplementary Figure 8 and 9). In spite of the great tissue specificity of RNA and all the efforts to use it for forensic purposes, it is hard to obtain intact, undegraded RNA from body fluid samples (Supplementary Figure 1). RNase is such a ubiquitous and stable enzyme that it is crucial to keep the sample in an environment that reduces RNase activity. However, forensic samples are not under our control before they arrive at the laboratory. Considering the massive degradation and limited amount of forensic samples, continued experimental efforts will be needed before RNA can be used practically in criminal cases. The multiplex qRT-PCR designed in this study is intended for body fluid identification with a limited amount of forensic sample (Fig. 4). Previous studies have explored the possibility of using microRNAs for forensic purposes because of their short length [
]. Next-generation sequencing would be another technological alternative, providing more sensitive detection of a much larger number of genes at once [ ].
We successfully obtained four body-fluid-specific mRNA markers: PPBP for blood, FDCSP for saliva, MSMB for semen and MSLN for vaginal secretion. Additionally, many putative mRNA markers were investigated through systematic mRNA profiling. In this study, we validated expression patterns of several previously known mRNA markers, like PPBP, and also found novel mRNA markers for forensic body fluid identification. Finally, multiplex qRT-PCR was designed using the selected mRNA markers and tested in RNA samples simultaneously extracted with gDNA for forensic use. It is expected that our extensive gene expression study and novel body-fluid-specific mRNA markers will advance the use of RNA in forensic field.
Acknowledgement
This work was supported by the research project for practical use and advancement of forensic DNA analysis (2012) of Supreme Prosecutors’ Office, Republic of Korea.
Appendix A. Supplementary data