The well-developed genetics of Drosophila make it an excellent system for understanding mechanisms of cytoskeletal function and organization and how the cytoskeleton is coupled to development and differentiation. Many years of work have revealed that Drosophila has a set of cytoskeletal structural proteins and protein motors qualitatively similar to those of most other eukaryotes. The full toolbox of cytoskeletal elements, and the extent to which they resemble cytoskeletal proteins in other organisms, has, however, remained unknown owing to the lack of a complete genome sequence. With the recent arrival of the complete sequence of the euchromatic genome of Drosophila (Adams et al. 2000; Rubin et al. 2000), we can begin to compile a complete list of cytoskeletal components. We can also answer once and for all the question of which cytoskeletal proteins discovered in mammals and other eukaryotes are present in the fly, and what variations on these known proteins exist. This article provides a brief review of the cytoskeletal genes of Drosophila, and highlights several interesting features of their nature and organization not previously known.
As described (Adams et al. 2000), several different methods were used to predict the set of proteins encoded in the Drosophila genome. These methods relied on a combination of coupling the sequence to known genetic loci, EST sequences, and gene prediction programs. The predicted proteins were searched using more than 1,000 cytoskeletal query sequences (see Table SI at http://www.jcb.org/cgi/content/full/150/2/F63/DC1) from vertebrates, C. elegans, and S. cerevisiae. Queries were chosen by selecting representative sequences from each class of known cytoskeletal proteins described in Kreis and Vale 1999, as well as selections of proteins classified as cytoskeletal on the C. elegans and S. cerevisiae proteome Web sites. Consensus sequence elements from the BLOCKS database were also used (Henikoff et al. 1999a, Henikoff et al. 1999b). Searches were done primarily using BLASTP against the set of predicted proteins. In a few cases, TBLASTN was used to search the entire nucleotide sequence when an exhaustive classification of protein families was needed, or where a rigorous case for the absence of a particular protein or protein class was being established. G proteins and kinases were not analyzed since other workers were analyzing these proteins separately. Therefore, attention was effectively restricted to cytoskeletal structural proteins, non-enzymatic regulatory proteins, or proteins that moved, bound, cross-linked, or severed cytoskeletal polymers. Search results were inspected and evaluated manually; many false positive hits were found and eliminated from the classification as cytoskeletal.
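The screening step described above (BLASTP hits collected for many queries, then inspected by hand) can be illustrated with a minimal sketch. It is not the pipeline used for this analysis: it assumes hits are available in the 12-column tab-separated layout produced by current BLAST+ releases (`-outfmt 6`), and the E-value cutoff of 1e-5, the helper names, and the example identifiers are hypothetical choices made only for illustration. The sketch keeps the best-scoring query for each predicted protein, so that a short candidate list can be passed on to manual evaluation.

```python
import csv
import io
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str       # cytoskeletal query sequence
    subject: str     # predicted Drosophila protein
    evalue: float
    bitscore: float


def parse_hits(text):
    """Parse 12-column tab-separated BLAST output into Hit records."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"],
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def candidates(hits, max_evalue=1e-5):
    """Best-scoring hit per subject protein that passes the E-value cutoff.

    The cutoff is an illustrative assumption; the result is a list for
    manual review, not a final classification.
    """
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        current = best.get(h.subject)
        if current is None or h.bitscore > current.bitscore:
            best[h.subject] = h
    return sorted(best.values(), key=lambda h: -h.bitscore)


if __name__ == "__main__":
    # Hypothetical query and subject identifiers, invented for this example.
    example = (
        "kinesin_q\tCG0001\t45.0\t300\t150\t5\t1\t300\t10\t310\t1e-50\t200\n"
        "kinesin_q\tCG0002\t25.0\t80\t60\t2\t400\t480\t5\t85\t0.5\t30\n"
        "myosin_q\tCG0001\t30.0\t100\t70\t1\t1\t100\t1\t100\t1e-8\t60\n"
    )
    for hit in candidates(parse_hits(example)):
        print(hit.subject, hit.query, hit.evalue, hit.bitscore)
```

Keeping only the best query per subject is a convenience for review; a protein matched by several unrelated query classes would still need its alignments examined individually.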
Overall, 262 genes (see Table SII at http://www.jcb.org/cgi/content/full/150/2/F63/DC1) were found to give moderate to completely convincing homology with some member of the query set. This result corresponds to ∼2% of the estimated 13,000 proteins encoded in the Drosophila genome. This figure is probably an underestimate since there are no doubt many proteins not yet recognized to have cytoskeletal functions, and it is also possible that some weakly homologous proteins were incorrectly eliminated from consideration. It is about half the fraction reported for C. elegans (Chervitz et al. 1998), although the precise source of the estimate in that case is unclear. A few systematic sources of potential error were also identified in these analyses. In particular, proteins having coiled-coil, WD-40, or ankyrin repeat regions often gave significant BLAST scores with a variety of cytoskeletal queries, but examination of the actual sequence alignments revealed that they were not true homologues or relatives. For example, searches with the myosin II tail or intermediate filament proteins (see below) yielded apparently significant hits with clearly unrelated α-helical coiled-coil domains of myosins, kinesins, lamins, restin, and other unrelated α-helical coiled-coil containing proteins.
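The repeat-driven false positives described above were identified here by examining alignments manually. One way such hits could be flagged automatically before inspection is sketched below, under stated assumptions: the coordinates of coiled-coil or other repeat regions in each query are taken to be known from a separate annotation, the 0.8 threshold is arbitrary, and the function names and example coordinates are hypothetical. A hit is marked as suspect when most of its aligned query span falls inside an annotated repeat region rather than in a diagnostic domain such as a motor domain.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive, 1-based intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def repeat_fraction(q_start, q_end, repeat_regions):
    """Fraction of the aligned query span lying in annotated repeat regions.

    Assumes the annotated regions do not overlap one another.
    """
    span = q_end - q_start + 1
    covered = sum(overlap(q_start, q_end, s, e) for s, e in repeat_regions)
    return covered / span


def is_suspect(q_start, q_end, repeat_regions, threshold=0.8):
    """True when the alignment is dominated by repeat sequence."""
    return repeat_fraction(q_start, q_end, repeat_regions) >= threshold


if __name__ == "__main__":
    # Hypothetical annotation: a myosin II-like query whose coiled-coil
    # tail is taken to span residues 850-1900 (illustrative numbers only).
    tail = [(850, 1900)]
    print(is_suspect(900, 1400, tail))  # alignment entirely within the tail
    print(is_suspect(50, 700, tail))    # alignment within the head region
```

A flag of this kind only prioritizes hits for closer examination; it does not replace inspection of the alignment itself.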
Among these 262 genes encoding cytoskeletal structural or motor proteins, 95 encode polypeptides belonging to the kinesin, dynein, or myosin motor superfamilies, or accessory/regulatory polypeptides known to interact with the motor polypeptide subunits. Approximately 80 genes encode actin-binding proteins of various types, including proteins belonging to the spectrin/α-actinin/dystrophin superfamily of membrane cytoskeletal and actin–cross-linking proteins. Twenty-six genes encode proteins likely to bind microtubules, including relatives of microtubule-binding proteins found in other organisms. Fourteen genes encode members of the actin superfamily, 12 encode members of the tubulin superfamily, and 5 encode septins.
Dystrophin, Spectrin, and Related Cross-linking Proteins: Links to Human Disease
Most multicellular eukaryotes encode multiple proteins related to spectrin and other proteins of the membrane cytoskeleton. In humans, these proteins are not only important for cell shape; muscular dystrophy and various hemolytic anemias also appear to be caused by cellular fragility owing to a weakened or disrupted membrane cytoskeleton in specific cell types. For example, hemolytic anemias are often caused by mutations in genes encoding spectrins, ankyrins, or interacting proteins, while Duchenne muscular dystrophy is caused by mutations in a gene encoding a protein related to spectrin called dystrophin.
The possible existence of dystrophin and its associated proteins in Drosophila has been discussed in the literature for many years, and, apart from one recent report (Roberts and Bobrow 1998), these proteins have been thought not to exist in the fly. The sequence of Drosophila has revealed the clear existence of a dystrophin homologue (Fig. 1 A; unfortunately spread over adjacent gene pieces in the database of predicted genes). In addition, a clear homologue of dystrobrevin (Fig. 1 B) and reasonable matches to the dystrophin-binding syntrophins were also found (see Table SII at http://www.jcb.org/cgi/content/full/150/2/F63/DC1). Thus, Drosophila may prove a good model for understanding the normal functions of dystrophin, the proteins it interacts with, and the role of this system in human disease.
A class of proteins that have spectrin repeats and actin-binding motifs, and that appear to cross-link various classes of filaments, has recently been found. In mammals, this class includes BPAG (Yang et al. 1999) and MACF (Leung et al. 1999). In Drosophila, other than the spectrins themselves, kakapo (CG18076) and MSP-300 (CG18252) are the clearest members of this group. Strikingly, even after analysis of the complete genomic sequence, there is still only a single α-actinin gene. This result is puzzling since mutants lacking α-actinin function have a relatively muscle-specific phenotype (Fyrberg et al. 1998), yet α-actinin is thought to be an important component of adhesion plaques (Hynes and Zhao 2000). Either the view that α-actinin is an important component of adhesion plaques is incorrect, or other proteins can substitute for this function of α-actinin. Finally, several band 4.1 relatives were found, as were two clear homologues of ankyrin.
Tubulin and Actin Polymers and Their Binding Proteins
Work in Drosophila has played a major role in our understanding of the actin and tubulin multi-gene families, the regulation of microtubule or microfilament polymerization, and the interactions of microtubules and microfilaments with other cellular components. Many genes have been identified by mutation in Drosophila, and found to encode important components of these systems previously discovered by biochemical or other approaches in various organisms.
Before the completion of the genome sequence, Drosophila was known to have four conventional α-, four conventional β-, and two γ-tubulin genes. The complete sequence revealed the presence of two additional, well-conserved conventional tubulin genes. One encodes a protein most like α-tubulin, the other encodes a protein most like β-tubulin (Fig. 2). Surprisingly, no proteins similar to the recently discovered δ- or ε-tubulins (Dutcher and Trabuco 1998; Chang and Stearns 2000) were found (both BLASTP and TBLASTN were used to ensure that the genes are truly missing, although it is formally possible that one or both could still be lurking in the few small sequence gaps, or heterochromatic DNA). The lack of δ is particularly surprising given that δ-tubulin has been found both in humans and algae. In view of the proposal that δ-, and perhaps ε-tubulins, are important components of centrioles and centrosomes respectively, these data suggest that other tubulins, or other non-tubulin proteins can substitute for these functions.
A long-standing question about microtubule-binding proteins in the fly has been whether homologues of the repeat-containing MAPs typical of mammals are present (Cambiazo et al. 1995). Previously, a 205K MAP lacking such a motif was found (Irminger-Finger et al. 1990). The sequence now reveals a protein (CG5606) that is a bit larger than mammalian tau protein, but has excellent sequence similarity to the repeat domains (Fig. 3). Whether it is a true homologue of tau, or whether it is a smaller version of MAP2, remains to be determined. It is intriguing, however, that the fly may have a tau-like, but not a MAP2-like, protein since tau is known to be enriched in mammalian axons while MAP2 is enriched in mammalian dendrites. This feature could be correlated with the general rarity of typical dendrites in fly neurons (Kandel et al. 1991). C. elegans also has a protein like tau and CG5606, but no MAP2-like molecule has been reported in that organism either.
Drosophila also has a variety of other well-known microtubule or microfilament binding proteins that were previously known or can now be inferred based on sequence homology (see Table SII at http://www.jcb.org/cgi/content/full/150/2/F63/DC1). The most striking new additions are homologues of the microtubule-severing protein called katanin, which is composed of two subunits, p60 and p80 (Fig. 4). Drosophila has genes encoding both subunits but has the added wrinkle of two genes encoding p60. Whether this allows novel regulatory properties awaits functional studies. Finally, Drosophila has well-conserved homologues of all of the recently identified components of the ARP2/3 complex as well as proteins homologous to WASP and Scar and most other known actin binding or bundling proteins.
Three major superfamilies of cytoskeletal motor proteins and associated proteins have been identified in all organisms: kinesins, dyneins, and myosins. Among the 95 genes encoding components of these systems are 24 genes with sequence similarity to the motor domain of kinesins. Many of these were previously described, although eight are new. Among these eight are two with unusual sequence features. One has a domain in its tail region with sequence similarity to Gγ, while another has similarity to RAS-GAP domains. Two are closely related to the so-called chromokinesins. Intriguingly, there are three members of the central motor domain class of microtubule-catastrophe kinesins exemplified by XKCM1 and KIF2 (Desai et al. 1999). Two of these in the fly appear to be present as a local duplication. No new COOH-terminal kinesins other than NCD were found, although there are a few predicted kinesins whose sequence is still partial. Several homologues of kinesin-II raft components were also found, as was the kinesin-II KAP subunit. Finally, a clear homologue of the mammalian protein PAT1 (Zheng et al. 1998), which has sequence similarity to kinesin light chain and binds to the amyloid precursor protein and microtubules, was found. Strikingly absent was any gene with significant sequence similarity to kinectin (Kumar et al. 1998), the proposed kinesin receptor for ER or ER-derived membranes. The kinectin sequence was searched exhaustively using both BLASTP and TBLASTN, and nothing other than unrelated α-helical coiled-coil proteins was found among the top hits.
Of the 12 dynein heavy chains, 3 appear to be new. No new cytoplasmic heavy chains were found; most appear to be axonemal, although further work will be needed to categorize them rigorously. In addition to the gene already known to encode the p150Glued component of dynactin, a new relative was found (Fig. 5). Finally, all previously identified dynactin subunits were also found.
Among the 13 myosin genes, two were found with significant sequence similarity to myosin VIIA, the myosin defective in the human hereditary deafness and blindness disease Usher syndrome type IB. One gene was previously known; the other is new (Fig. 6). There are also several genes with homology to myosin light chains, including several not previously known.
A great deal is now known about the composition of centrosomes or spindle poles. Work from Drosophila has revealed a group of proteins called GRIPs that interact with γ-tubulin to form a nucleation complex (Oegema et al. 1999). Other proteins called CP60/CP190 were also previously found. Work in vertebrates has identified two large α-helical coiled-coil proteins called pericentrin and NuMA that are important structural components of centrosomes. Extensive searching revealed no true homologue of either protein recognizable by simple sequence similarity. However, a protein called centrosomin (cnn), previously identified by mutation, has a predicted structure similar to that of pericentrin and NuMA, and the phenotype of cnn mutants (Megraw et al. 1999) is consistent with the notion that cnn encodes the functional homologue of NuMA or pericentrin. Further work is needed to test this hypothesis.
Intermediate filaments exist in virtually every well-studied eukaryotic organism. Drosophila, however, has long been argued to lack such proteins on morphological grounds (reviewed in Fyrberg and Goldstein 1990). In addition, the lack of any genes encoding homologous proteins has bedeviled and perplexed many of us. Searches of the complete genomic sequence with many different intermediate filament probes (both full-length sequences and non-helical fragments), using both BLASTP and TBLASTN, failed to yield any convincing matches other than nuclear lamins. Comparable searches of the C. elegans genomic sequence with the same query set easily identified C. elegans intermediate filament proteins with scores better than those garnered with nuclear lamins. In addition, the Drosophila kakapo protein, which is a relative of mammalian proteins that can link actin filaments and intermediate filaments (Gregory and Brown 1998), appears to lack an intermediate filament binding domain. How Drosophila lives without intermediate filaments remains a puzzle since the organism clearly has epithelia and other tissues under mechanical stress. While part of the answer may lie in the hard chitinous exoskeleton of the fly, other cytoskeletal systems may have acquired the mechanical functions played by cytoplasmic intermediate filaments in other organisms (discussed in Gregory and Brown 1998). Further work is needed to understand this feature of Drosophila cell biology.
In conclusion, for studies of cytoskeletal function, and the role of the cytoskeleton in development and differentiation, the sequence of the major components provides an invaluable starting point. These data, coupled to the ever-expanding array of P-element mutations and other genetic manipulations, will enable a complete elucidation of cytoskeletal biology and the beginnings of cytoskeletal genomics. Given the inherent redundancy of some cytoskeletal subsystems, the ability to manipulate genetically the dosage or content of several cytoskeletal genes simultaneously will be informative. Similarly, new genetic screens based on enhancement or suppression of cytoskeletal mutant phenotypes, or overexpression screens using EP collections and cytoskeletal phenotypes, will catapult us to a new era of understanding of cytoskeletal biology. Finally, cytoskeletal chips that allow monitoring of cytoskeletal and other gene function will be an invaluable tool for relating the overt phenotype of cytoskeletal mutants to cytoskeletal expression patterns in various developmental mutants. Because the cytoskeleton is the ultimate phenotypic readout of determinative decisions, an understanding of global patterns of cytoskeletal gene expression will be essential to elucidating the program and principles of eukaryotic development.
The authors would like to thank Celera Genomics for their kind assistance in the analysis of the data discussed in this paper. In particular, we are grateful to Wenyan Zhong for her excellent and tireless help in the early phases of this work. Lawrence S.B. Goldstein is an Investigator of the Howard Hughes Medical Institute.
The online version of this article contains supplemental material.