::  A N C H A   L A B  ::

In silico search for natural antisense transcripts in the human genome and analysis of their expression patterns

PhD Student: Andre Marakhonov

This is a collaborative project with the lab of Dr. M. Skoblov (Russian Center for Medical Genetics)


Both mRNA expression in a eukaryotic cell and the efficiency of its translation into protein are controlled at many regulatory levels subsequent to transcription initiation. As mRNA is a single-stranded molecule, the expression of a complementary antisense strand may alter transcription, elongation, processing, stability, and translation of the sense RNA. Functional antisense RNAs were first identified in bacteria and were later shown to be involved in gene regulation and differentiation in several eukaryotic organisms, including mammals. Natural antisense transcripts (NATs) usually arise via independent transcription initiation on the opposite DNA strand at the same genomic locus as the sense strand. Computational analysis of data from large-scale sequencing projects has revealed a surprising abundance of antisense transcripts in several eukaryotic genomes. As some antisense transcripts have been shown to regulate gene expression, antisense transcription might be a common mechanism of regulating gene expression in eukaryotic cells.

We created an algorithm that allows high-throughput mapping of NATs. We used the exact coordinates and orientations of transcripts on the plus and minus strands of the human genome, as archived on the NCBI server (NCBI http://www.ncbi.nlm.nih.gov/). The in-house software “Antisense Searcher” was written in C++ and SQL. This program performs the following tasks: 1) grouping EST and mRNA transcripts into clusters on each DNA strand; 2) retrieving all overlapping pairs of transcripts that are located on different DNA strands and overlap by more than 20 nucleotides; 3) retrieving the intersection of the two previous sequence sets. EST clusters that contain only one or two ESTs were filtered out at the subsequent stage of analysis. By this method we mapped approximately 13,500 NATs.
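The mapping steps described above can be illustrated with a short sketch. It is written in Python for brevity and is not the code of “Antisense Searcher”; the record layout (`Transcript`, `Cluster`), the function names, and the 1-based inclusive coordinates are assumptions made for this example. It also simplifies the procedure: it applies the minimum-size filter to every cluster and looks for overlaps between clusters rather than between individual transcripts.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical record layouts; coordinates are assumed 1-based and inclusive.
Transcript = namedtuple("Transcript", "name chrom strand start end")
Cluster = namedtuple("Cluster", "chrom strand start end members")

MIN_OVERLAP = 20  # a pair must overlap by more than this many nucleotides
MIN_MEMBERS = 3   # clusters with only one or two transcripts are dropped


def cluster_transcripts(transcripts):
    """Merge transcripts that overlap on the same chromosome and strand."""
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start))
    clusters = []
    for (chrom, strand), group in groupby(ordered, key=lambda t: (t.chrom, t.strand)):
        start = end = None
        members = []
        for t in group:
            if members and t.start <= end:
                end = max(end, t.end)  # extend the current cluster
            else:
                if members:  # close the previous cluster
                    clusters.append(Cluster(chrom, strand, start, end, tuple(members)))
                start, end, members = t.start, t.end, []
            members.append(t.name)
        clusters.append(Cluster(chrom, strand, start, end, tuple(members)))
    return clusters


def overlap_length(a, b):
    """Number of shared positions between two intervals (<= 0 if disjoint)."""
    return min(a.end, b.end) - max(a.start, b.start) + 1


def find_nat_pairs(clusters, min_overlap=MIN_OVERLAP, min_members=MIN_MEMBERS):
    """Return (plus, minus) cluster pairs that overlap on opposite strands."""
    kept = [c for c in clusters if len(c.members) >= min_members]
    plus = [c for c in kept if c.strand == "+"]
    minus = [c for c in kept if c.strand == "-"]
    return [
        (p, m)
        for p in plus
        for m in minus
        if p.chrom == m.chrom and overlap_length(p, m) > min_overlap
    ]


# Made-up coordinates, for illustration only.
demo = [
    Transcript("est1", "chr1", "+", 100, 200),
    Transcript("est2", "chr1", "+", 150, 300),
    Transcript("est3", "chr1", "+", 250, 400),
    Transcript("est4", "chr1", "-", 350, 500),
    Transcript("est5", "chr1", "-", 360, 520),
    Transcript("est6", "chr1", "-", 450, 600),
]
for sense, antisense in find_nat_pairs(cluster_transcripts(demo)):
    print(sense.members, antisense.members, overlap_length(sense, antisense))
```

The all-against-all comparison is quadratic in the number of clusters, which is acceptable for a sketch; at genome scale a sorted sweep or an SQL join on indexed coordinates would be a more natural choice.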

To study the expression patterns of natural antisense pairs we created the C++-based software “Antisense Cluster Filter”. This software allowed us to retrieve the tissue expression field for all the transcripts in the lists of NATs. We used cDNA library descriptions available from the CGAP website (CGAP http://cgap.nci.nih.gov/) and other sources. In this way, our data describing NATs were supplemented with information on the expression patterns of the transcripts. NATs were sorted by two criteria: 1) prevalence of expression in tumor or in normal cells; 2) tissue specificity. In both cases we found 108 NATs in which only one member of a pair is expressed only in tumor cells or in a specific tissue. These pairs are currently being studied experimentally.
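The two sorting criteria can be sketched as follows. This is an illustrative Python example, not the code of “Antisense Cluster Filter”; the `Library` record with its `tissue` and `histology` fields is a hypothetical simplification of a cDNA library description, and the transcript names and annotations are made up.

```python
from collections import namedtuple

# Hypothetical, simplified cDNA library annotation.
Library = namedtuple("Library", "tissue histology")  # histology: "tumor" or "normal"


def is_tumor_only(libraries):
    """True if the transcript was seen only in tumor-derived libraries."""
    return bool(libraries) and all(lib.histology == "tumor" for lib in libraries)


def is_tissue_specific(libraries):
    """True if all libraries containing the transcript come from one tissue."""
    return len({lib.tissue for lib in libraries}) == 1


def classify_pairs(pairs, expression):
    """Select pairs in which exactly one member meets a criterion.

    pairs: iterable of (sense_name, antisense_name)
    expression: dict mapping a transcript name to its list of Library records
    """
    selected = {"tumor": [], "tissue": []}
    for sense, antisense in pairs:
        s_libs = expression.get(sense, [])
        a_libs = expression.get(antisense, [])
        # "!=" on booleans acts as exclusive or: one member qualifies, the other does not.
        if is_tumor_only(s_libs) != is_tumor_only(a_libs):
            selected["tumor"].append((sense, antisense))
        if is_tissue_specific(s_libs) != is_tissue_specific(a_libs):
            selected["tissue"].append((sense, antisense))
    return selected


# Made-up annotations, for illustration only.
expression = {
    "geneA": [Library("cervix", "normal"), Library("brain", "normal")],
    "asGeneA": [Library("cervix", "tumor"), Library("colon", "tumor")],
}
print(classify_pairs([("geneA", "asGeneA")], expression))
```

A pair in which both members are tumor-only, or both are restricted to a single tissue, is deliberately not selected here, following the wording "only one member of a pair".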

In particular, we experimentally characterized an antisense mRNA, asAFAP, overlapping the human AFAP1 gene. AFAP1 encodes an actin filament binding protein that serves as a modifier of actin filament structure and integrity and relays a signal from receptor tyrosine kinases through PKCα to Src protein kinase. To study the intriguing phenomenon of tumor-specific asAFAP antisense expression we performed a detailed in silico analysis of asAFAP sequences and experimentally quantified this transcript in normal and tumor human tissues. We also studied an antisense mRNA, asLZK, overlapping the human MAP3K13/LZK gene, which is involved in the mitogenesis-related JNK/SAPK signal transduction pathway. According to the functional annotation of the human genome, the asLZK transcript (LOC647276) is expressed at a relatively high level and is overrepresented in tumor samples. To our surprise, experimental study of human asLZK revealed that this sequence is not expressed, but represents a silent pseudogene of the ribosomal protein L4 gene RPL4. This pseudogene resulted from a relatively recent retroposition of RPL4 mRNA into the first intron of the MAP3K13 gene and does not participate in the regulation of MAP3K13 expression. This study stresses that, after initial in silico mapping efforts, experimental verification of the expression landscape is warranted.

Results related to MAP3K13 have been published in Mol Biol (Mosk). 2008 Jul-Aug;42(4):581-7.



Figure 1. Detailed map of the genomic locus MAP3K13/LZK and asLZK. Non-coding exons are shown in grey. Coding exons of the sense LZK transcript are shown in black. Numbers under the boxes representing exons give exon lengths in bp. Numbers above the introns give intron lengths in bp.

Figure 2. Multiple alignment of loci paralogous to human RPL4 and localization of the PCR primers. Non-differentiating PCR primers are shown as dotted arrows. asLZK-specific primers are shown as solid arrows. Additional nucleotide substitutions inserted close to the 3′ ends of asLZK-specific primers are underlined.

Figure 3. Results of the PCR analysis.
A. 1. Normal cervix, sample 1; 2. Cervical carcinoma, sample 1; 3. Normal cervix, sample 2; 4. Cervical carcinoma, sample 2; 5. Normal cervix, sample 3; 6. Cervical carcinoma, sample 3; 7. pGEM-RPL4; 8. pGEM-asLZK; 9. Human genomic DNA; 10. No template control; M. 1-kb ladder. PCR with GAPDH-specific primers was performed for the normal cervix and cervical carcinoma samples and the no template control. In this gel, the no template control corresponds to lane 7 and the 1-kb ladder to lane 8. B. 1. Endometrial carcinoma; 2. Pancreatic carcinoma; 3. Renal cell carcinoma; 4. Ovarian carcinoma; 5. Colon carcinoma; 6. Normal brain; 7. Normal skeletal muscle; 8. No template control; M. 1-kb ladder.

Figure 4. A. Structure of the site of retroposition into the MAP3K13 gene. The nucleotide sequences of the 5′ and 3′ ends of the retroposed RPL4 mRNA and of the genomic flanking regions are indicative of retroposition. Target site duplications are boxed. B. Phylogenetic tree of prealigned RPL4-like pseudogenes reconstructed by the neighbor-joining method.