The Drosophila melanogaster genome includes ∼300 genes encoding translation factors or proteins with well-characterized RNA binding motifs. Essentially all of the canonical eukaryotic translation factors were found among their predicted products. The four most numerous classes of RNA binding proteins are RNA recognition motif (RRM) proteins, DEAD/DExH-box helicases, KH domain proteins, and double-stranded RNA binding domain (DSRBD) proteins. Many of these correspond readily to yeast or mammalian orthologs, and can therefore be predicted to have specific functions in pre-mRNA and pre-rRNA processing, translation initiation, and nuclear export of RNA. The genes encoding cytosolic translation factors, and those encoding each class of RNA binding protein, are discussed in turn below.
Most genes encoding general translation factors that have been characterized in other species are present in the Drosophila melanogaster genome, and their products are similar to their mammalian counterparts (Table). One possible exception is eIF4B, as the closest Drosophila homologue (CG10837) exhibits only modest BLAST similarity to the mammalian and yeast translation factor (1e-13 and 5e-03, respectively). Genes encoding several translation factors are present in two similar copies; these are eIF3-S4, eIF3-S5, eIF3-S7, eIF4A, eIF4H, eEF1α, and eEF2. Mutant alleles of a single eIF4A gene (CG9075) or of a single eEF1α gene (CG8280) are recessive lethal, indicating that, at least in these two cases, the related genes are not functionally redundant.
The Drosophila genome contains six genes encoding proteins highly similar to the mRNA cap-binding protein, eIF4E (Table; Fig. 1). One of these, CG10716, is the probable ortholog of mammalian 4EHP (Rom et al. 1998). cDNAs have been identified for three of the genes encoding eIF4E-related proteins (CG8023, CG10124, and CG10716), indicating that at least these genes are expressed. I examined the five genes encoding proteins most closely related to eIF4E to determine the extent of conservation of eight residues critical in murine eIF4EI for binding the 7-methylguanosine cap (Marcotrigiano et al. 1997). All eight of these are invariant in the two alternative splicing products of the Drosophila eIF4E gene and in the five cognates (CG1442, CG8023, CG8277, CG10124, and CG11392), but not in mammalian 4EHP or CG10716. Another translation initiation factor, eIF4G, competes with an inhibitor protein, 4EBP, for the same binding site on eIF4E. I therefore also investigated whether the eIF4E cognates contained seven amino acids essential for eIF4G or 4EBP binding, and whether they possessed a serine residue that can be phosphorylated (Ser-251 in Drosophila eIF4EI) and a lysine residue (Lys-201 in Drosophila eIF4EI) with which P-Ser-251 is predicted to form a salt bridge (Marcotrigiano et al. 1999; Pyronnet et al. 1999). Of the five cognates, only CG8023 possesses all seven of the residues required for eIF4G or 4EBP binding. CG8277, CG10124, and CG11392 each have a single conservative substitution among these seven residues, whereas CG1442 diverges at two residues. This analysis suggests that the eIF4E cognates can all bind 7-methylguanosine caps, but may interact with eIF4G and the inhibitor protein 4EBP in different ways, perhaps adding complexity to the regulation of cap-dependent translation. Ser-251 and Lys-201 are retained in CG8277, CG10124, and CG11392, but not in the other two, again suggesting differential regulation.
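A residue-by-residue comparison of this kind can be sketched in a few lines of code. The following is a minimal illustration only, assuming the sequences have already been aligned so that the critical positions share the same columns. The function name, the toy sequences, the column indices, and the grouping of amino acids treated as conservative substitutions are all hypothetical and are not taken from the analysis reported here.

```python
# Hypothetical grouping of amino acids treated as "conservative" swaps.
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
]


def classify_positions(reference, candidate, positions):
    """Compare two pre-aligned sequences at the given 0-based columns.

    Returns a dict counting identical, conservative, and divergent positions.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"identical": 0, "conservative": 0, "divergent": 0}
    for pos in positions:
        ref_aa, cand_aa = reference[pos], candidate[pos]
        if ref_aa == cand_aa:
            counts["identical"] += 1
        elif any(ref_aa in g and cand_aa in g for g in CONSERVATIVE_GROUPS):
            counts["conservative"] += 1
        else:
            counts["divergent"] += 1
    return counts


# Toy aligned fragments, invented for illustration (not real eIF4E sequence).
reference = "MKWDEYLRS"
candidate = "MRWDEFLGS"
critical = [1, 2, 5, 7]  # hypothetical alignment columns

print(classify_positions(reference, candidate, critical))
```

A cognate with every critical position scored as identical would correspond to the "invariant" case described above. One with a single conservative count and no divergent counts would correspond to the "single conservative substitution" case.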
In the Caenorhabditis elegans genome, there are three eIF4E cognates in addition to a 4EHP ortholog (Keiper et al. 2000). Like the fly cognates, the nematode cognates all are invariant in the eight residues essential for 7-methylguanosine cap binding, but are somewhat variable in the eIF4G-interacting and phosphorylation residues. Whereas the C. elegans cognates have been implicated in binding the unusual trimethylated cap structure found in SL1 trans-spliced mRNAs, our results indicate that a diversity of eIF4E-like proteins exists in an organism which does not produce mRNAs with trimethylguanosine caps. In contrast to eIF4E, and unlike mammals (which have at least three), the Drosophila genome contains only a single gene encoding a recognizable ortholog of 4EBP.
RNA Binding Proteins
I analyzed >200 genes encoding proteins of the RRM, DEAD/DExH-box, KH domain, and DSRBD classes. These structural motifs are not absolutely predictive of an RNA binding function, particularly as experimental evidence for several DExH-box helicases indicates a function in DNA binding rather than, or in addition to, RNA binding. Overlap between DNA and RNA binding functions is also evident from recent work on the Drosophila homeodomain-containing protein, Bicoid, which was found to bind and regulate translation of caudal mRNA. Bicoid, like other homeodomain-containing proteins, has a well-established function in DNA binding and transcriptional regulation. The RNA binding function of Bicoid is not mediated by its homeodomain but, surprisingly, by a PEST motif (Niessing et al. 1999), which is usually associated with protein degradation. It is possible that further experimental work will prove that dual functionality of proteins, and unexpected functions for conserved motifs, are relatively common. Conversely, the Drosophila genome contains an unknown number of other RNA binding proteins, not listed here, that lack the canonical domains that allow such a function to be predicted. Apontic (Lie and Macdonald 1999) is an example of such a protein, for which experimental evidence of an RNA binding function has been obtained despite the absence of known RNA binding domains (RBDs). Similarly, no zinc finger proteins are included here, although some C2H2 zinc finger proteins bind RNA. It is not possible using presently available algorithms to predict an RNA binding function rather than a DNA binding function for a zinc finger protein based on sequence comparison.
RNA Recognition Motif Proteins
The Drosophila genome encodes 117 different RRM-containing proteins, making this the largest class of RNA binding proteins. Fig. 2 presents a phylogenetic tree showing the relationships among these proteins. The RRM contains the shorter RNP-1 and RBD domains (Birney et al. 1993); in some RRM proteins, one or both of these shorter domains are the only ones easily recognized. However, the RNP-1 domain alone is not always predictive of an RNA binding function, as it is present in several proteins that clearly have a different function, for example, Pk34A, a protein kinase, and laminin A, a cell adhesion molecule. In comparison to the RNP-1 motif, the RBD domain is more closely associated with an RNA binding function. Proteins with only an RNP-1 or RBD domain were used in BLASTP searches against the Drosophila genome, and were only included in Fig. 2 if several RRM proteins were recovered among the hits (P < 0.01). Many RNP-1-containing proteins, but only two RBD proteins (CG6997 and CG8919), were excluded from Fig. 2 by this process.
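The inclusion rule described above can be expressed as a simple filter over search hits. The sketch below is illustrative only: the `Hit` record, the function name, and the identifiers in the toy data are hypothetical, and the threshold of three RRM hits is an assumed stand-in for "several", which the text does not quantify. Only the P < 0.01 cutoff comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str  # identifier of the protein recovered by the search
    p_value: float   # significance reported for the hit


def passes_rrm_filter(hits, known_rrm_ids, p_cutoff=0.01, min_rrm_hits=3):
    """Return True if enough significant hits are known RRM proteins.

    min_rrm_hits=3 is an assumption standing in for "several".
    """
    significant_rrm = {
        h.subject_id
        for h in hits
        if h.p_value < p_cutoff and h.subject_id in known_rrm_ids
    }
    return len(significant_rrm) >= min_rrm_hits


# Toy data with invented identifiers, for illustration only.
known_rrm = {"rrmA", "rrmB", "rrmC", "rrmD"}
hits = [
    Hit("rrmA", 1e-20),
    Hit("rrmB", 1e-5),
    Hit("rrmC", 0.003),
    Hit("kinaseX", 1e-30),  # significant, but not an RRM protein
    Hit("rrmD", 0.2),       # RRM protein, but not significant
]

print(passes_rrm_filter(hits, known_rrm))
```

Collecting hits into a set counts each recovered RRM protein once, even if a search reports it more than once.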
Of the RRM proteins encoded by the Drosophila genome, 74 have not been described previously. Particularly interesting among these is CG6049, which encodes a protein highly similar (BLASTP < 1e-80) to human Tat stimulatory factor-1 (Tat-SF1; Zhou and Sharp 1996). Tat increases HIV-1 transcription at the level of elongation by binding to TAR, which forms a stem-loop structure at the 5′ end of the nascent viral transcript. Tat-SF1 is a cofactor for Tat, and has more recently been identified as a general transcription elongation factor (Li and Green 1998). CG5251 encodes a probable ortholog (BLASTP < 1e-102) of human NOT4, a likely negative regulator of transcription (Albert et al. 2000). CG4887 is highly related (BLASTP < 7e-72) to LUCA15, a putative tumor suppressor gene in humans (Drabkin et al. 1999). CG4787 is very similar (BLASTP < 2e-40) to human TIA-1, an RNA binding protein that acts downstream of eIF2α phosphorylation in the assembly of mammalian stress granules, in which untranslated mRNAs accumulate during translation arrest consequent to environmental stress (Kedersha et al. 1999). Other RRM proteins involved in RNA processing are discussed in an accompanying review (Mount and Salz 2000, this issue).
DEAD/DExH-box Putative Helicases
Sixty-three putative nucleic acid helicases of the DEAD, DEAH, and DExH families have been grouped into a phylogenetic tree (Fig. 3). Many of these proteins are extremely similar to yeast and mammalian counterparts. Numerous yeast helicases have been implicated in processing of pre-mRNAs or pre-rRNAs, in RNA export, or, in the cases of TIF1/2 (eIF4A) and DED1, in translation initiation. A subset of DExH-box proteins function as DNA helicases, having roles in DNA replication and repair. The striking degree of conservation between individual yeast and Drosophila helicases often allowed a one-to-one correspondence to be drawn (see also Mount and Salz 2000, this issue); in these cases, the name of the yeast ortholog has been added in parentheses to Fig. 3. This led to the assignment of probable functions to many of the Drosophila helicase proteins. CG9748, as a DED1 ortholog, is likely to be involved in translation initiation, and is close phylogenetically to Vasa, which has been implicated experimentally in the same process (Carrera et al. 2000). The Xenopus ortholog of CG9748, An3, encodes an mRNA that is asymmetrically localized to the animal pole of the oocyte (Gururajan et al. 1991). Both An3 and DED1 have been implicated in export of RNPs from the nucleus. An3 protein binds to and is exported from the nucleus by the CRM1 nuclear export receptor (Askjaer et al. 2000), and in yeast DED1 genetically interacts with SRM1, which encodes the Ran-associated guanine nucleotide exchange factor, which in turn binds CRM1 (Hayashi et al. 1996; Taura et al. 1998). These data suggest a similar role in Drosophila for CG9748.
Comparison of the yeast and Drosophila helicase-encoding genes allowed many of the latter to be assigned functions in preribosomal RNA processing based on yeast orthologs. These are: CG2173 (DRS1; Ripmaster et al. 1992); CG5589 (ROK1; Venema et al. 1997); CG5800 (DBP4; Liang et al. 1996); CG7843 (FAL1; Kressler et al. 1997); CG8611 (DBP7; Daugeron and Linder 1998); CG9253 (RRP3; O'Day et al. 1996); and CG9630 (SPB4; de la Cruz et al. 1998). Similarly, CG4152 is implicated in mRNA export based on its yeast ortholog (DOB1; Liang et al. 1996). Several other Drosophila helicases can be assigned functions in pre-mRNA processing using similar reasoning (Mount and Salz 2000; Fig. 3). Finally, CG11403 encodes an ortholog of yeast CHL1, a DNA helicase involved in chromosome transmission and progression through the G2/M cell cycle transition (Gerring et al. 1990).
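Orthology-based assignment of this kind amounts to a two-step lookup: from a Drosophila gene to its yeast ortholog, and from that ortholog to its reported function. The sketch below simply encodes the assignments listed in this paragraph. The dictionary and function names are hypothetical, the function labels are shorthand for the descriptions in the text, and the sketch is not meant to represent how the assignments were originally derived.

```python
# Drosophila gene -> yeast ortholog, as listed in the text above.
YEAST_ORTHOLOG = {
    "CG2173": "DRS1",
    "CG5589": "ROK1",
    "CG5800": "DBP4",
    "CG7843": "FAL1",
    "CG8611": "DBP7",
    "CG9253": "RRP3",
    "CG9630": "SPB4",
    "CG4152": "DOB1",
    "CG11403": "CHL1",
}

# Yeast gene -> function, using shorthand labels for the text's descriptions.
PRE_RRNA_GENES = ("DRS1", "ROK1", "DBP4", "FAL1", "DBP7", "RRP3", "SPB4")
YEAST_FUNCTION = {name: "pre-rRNA processing" for name in PRE_RRNA_GENES}
YEAST_FUNCTION["DOB1"] = "mRNA export"
YEAST_FUNCTION["CHL1"] = "DNA helicase; chromosome transmission"


def predicted_function(gene_id):
    """Return the function inferred from the yeast ortholog, or None."""
    ortholog = YEAST_ORTHOLOG.get(gene_id)
    if ortholog is None:
        return None  # no one-to-one yeast ortholog recorded for this gene
    return YEAST_FUNCTION.get(ortholog)


for gene in ("CG5589", "CG4152", "CG11403"):
    print(gene, "->", predicted_function(gene))
```

Returning `None` for genes without a recorded ortholog keeps the absence of a prediction distinct from a prediction of no function.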
KH Domain Proteins
The held-out wings (how) gene encodes a 382-amino acid protein with a single KH domain at its COOH terminus. how is essential for tendon cell differentiation, and a nuclear isoform of the How protein has been shown to bind a specific mRNA (stripe) and prevent its export (Nabel-Rosen et al. 1999). how is related to the murine quaking gene, which is required for Schwann cells to mature into myelin-forming cells in the peripheral nervous system (Ebersole et al. 1996). The Drosophila genome encodes a total of ten proteins highly related to quaking. Four of these (qkr58E-1, qkr58E-2, qkr58E-3, and CG10384; Di Fruscio et al. 1998; Fyrberg et al. 1998) are tightly linked in a gene cluster in cytological region 58E, and two others are nearby in 58F (CG3875) and 58A (CG4021). The final four members of this gene family (how, Sam50, CG9337, and CG18078) are unlinked.
A phylogenetic tree illustrating relationships among the 27 KH domain proteins predicted from the Drosophila genome is presented in Fig. 4. 18 of the 27 proteins possess only a single KH domain, but the others have multiple copies, as many as 13 (Dp1). One of the single KH-domain proteins (CG7878) also has a DEAD-box helicase domain. CG1691 encodes a product very similar (BLASTP < 8e-88) to zipcode binding protein (ZBP-1), a protein with four KH domains that binds to a specific site on β-actin mRNA and mediates its localization to the leading edge of chick embryo fibroblasts (Ross et al. 1997). A similar protein in Xenopus, Vera, is involved in localization of Vg1 mRNA to the vegetal pole of the oocyte (Deshler et al. 1998; Havin et al. 1998). This is of interest because, whereas asymmetric localization of specific mRNAs has been extensively studied in Drosophila oocytes, no ZBP-1 counterpart had previously been identified in flies. In addition to the four KH domains, ZBP-1 and Vera contain a single RRM; this is apparently absent from CG1691.
DSRBD Proteins
A phylogenetic tree illustrating relationships among the 12 DSRBD proteins predicted from the Drosophila genome is presented in Fig. 5. Two new genes for which possible functions can be assigned are CG6866, which encodes a protein highly similar (BLASTP < 2e-36) to the human TAR RNA binding protein, and CG12598, which encodes a putative deaminase involved in RNA editing (BLASTP < 7e-53 to human homologue). Mice bearing a targeted disruption of the CG6866 homologue (Tarbp2) are male-sterile, and the protein product of this gene, Prbp, has been proposed to have a role in the assembly of translationally regulated RNPs (Zhong et al. 1999).
Homologues of RNA Binding Proteins Implicated in Human Disease
Among the genes encoding KH-domain proteins, CG6203 encodes an excellent homologue (BLASTP < 6e-81) of the human fragile-X associated protein (FMRP; Ashley et al. 1993). Fragile X syndrome is the most common inherited mental retardation disorder in humans (Ashley and Warren 1995). FMRP is believed to shuttle between the nucleus and the cytoplasm, and is associated with large mRNP complexes and ribosomes (Feng et al. 1997; Ceman et al. 1999). CG8144 encodes a KH-domain protein highly similar (BLASTP < 6e-63) to the human paraneoplastic antigen Nova-2, an RNA binding protein implicated in paraneoplastic opsoclonus-myoclonus ataxia (POMA), a neurologic disorder with associated dementia (Yang et al. 1998). The structure of a KH domain from Nova-2 has recently been determined by X-ray crystallographic methods, and the domain has also been shown to bind RNA in a sequence-specific manner (Lewis et al. 2000).
The importance of posttranscriptional mechanisms of gene regulation, particularly at the level of translational control, is becoming increasingly apparent. Research in Drosophila has already provided many key insights into this field. Now that dozens of novel RNA binding proteins have been identified in the Drosophila genome, it is likely that our level of understanding of translational control will increase dramatically in the months and years to come.
I am grateful to Oona Johnstone and an anonymous reviewer for helpful comments on the manuscript.
I am grateful to the Medical Research Council of Canada for a genomics operating grant (to P. Lasko and B. Suter).
Abbreviations used in this paper: DSRBD, double-stranded RNA binding domain; RBD, RNA binding domain; RRM, RNA recognition motif; Tat-SF1, Tat stimulatory factor-1.