Both genetic and biochemical approaches have been used to study the molecular mechanisms by which damaged DNA is repaired in a number of species. The fundamental DNA repair pathways have been functionally conserved for the most part among prokaryotes, lower eukaryotes, and higher eukaryotes, and the proteins and protein families involved in these repair processes show high degrees of amino acid sequence conservation. However, there are also a number of cases in which lack of conservation of particular polypeptides may reveal interesting species-specific differences in how certain repair functions are performed.
The recent completion of the Drosophila genome sequence makes possible a comprehensive determination of the presence or absence of Drosophila proteins with significant sequence similarity to proteins implicated in DNA repair studies carried out in other species. In this analysis, we will focus on specific insights into the Drosophila responses to DNA damage that have come to light through inspection of the genomic sequence, considering in turn various groups of proteins involved in different pathways of repair. A comprehensive list of Drosophila DNA repair genes can be found at http://www.dmrepair.ucdavis.edu.
The strand-exchange protein RecA is a central component of the Escherichia coli DNA repair and recombination machinery. A family of RecA-related proteins exists in eukaryotes (for review see Thacker 1999), named for Saccharomyces cerevisiae Rad51, the first member discovered. There are three additional members of this family in S. cerevisiae. Dmc1 is highly similar to Rad51 in both sequence and function (both possess strand-exchange activity), but is specifically required in meiotic recombination. Rad51 and Dmc1 are highly conserved; mammalian orthologs of each are ∼80% similar over their entire lengths (∼340 residues).
The other two Rad51-related proteins in S. cerevisiae, Rad55 and Rad57, are more divergent in sequence, with similarity being restricted to a region of ∼170 residues encompassing the ATPase domain. Although these may not possess strand-exchange activity, they, like Rad51, are important in both recombinational DNA repair and meiotic recombination. Humans have two members of this divergent class (Xrcc2 and Rad51D) and three that are intermediate between this class and Rad51/Dmc1 (Xrcc3, Rad51B, and Rad51C).
Two Drosophila Rad51 proteins have been described previously. An ortholog of Rad51 was found through sequence similarity (McKee et al. 1996), and a protein closely related to human Xrcc3 was found to be the product of the spindle-B meiotic recombination gene (Ghabrial et al. 1998). The genome sequence reveals two additional genes, CG6318 and CG2412, that encode polypeptides belonging to the divergent class of Rad51-related proteins (Fig. 1). CG6318 is extremely divergent; the most similar sequence in the database is human Xrcc2 (24% identity and 41% similarity within the core region), but it is unclear whether these are true orthologs. CG2412 is most similar to human Rad51D (30% identity and 53% similarity), but again these may not be true orthologs.
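The identity and similarity percentages quoted above come from pairwise alignments. As a minimal sketch of how such figures are derived, the following Python function scores an already-aligned pair of sequences. The residue groups used to define "similar", the choice to skip gap columns, and the toy sequences are illustrative assumptions, not the parameters used in this analysis; alignment programs normally score similarity with a substitution matrix such as BLOSUM62.

```python
# Hypothetical conservative-substitution groups, for illustration only.
# Real tools usually define similarity via a substitution matrix.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both inputs must have the same length, with '-' marking gaps.
    Columns containing a gap are skipped (an assumption; some tools
    count them in the denominator instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy aligned fragments (not real Rad51-family sequences).
identity, similarity = identity_and_similarity("MKVLA-DE", "MRVIAGDD")
print(f"{identity:.1f}% identity, {similarity:.1f}% similarity")
```

Because similarity always includes the identical positions, it can never be lower than identity, which is why each pair of values above is reported with similarity as the larger number.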
Notably lacking from the Drosophila genome is a Dmc1 ortholog. A hallmark of meiotic recombination that distinguishes it from DNA repair is that interhomologue events are much more frequent than intersister events. Dmc1 plays a key role in enforcing this bias, presumably by directing strand exchange specifically to a nonsister chromatid, or by preventing strand exchange between sisters (Bishop et al. 1992). The absence of a Dmc1 ortholog raises the question of how this bias is achieved in Drosophila. One possibility is that another family member, such as SPN-B, fulfills this function. Mutations in spn-B cause defects in meiotic recombination, but do not result in hypersensitivity to methyl methanesulfonate or nitrogen mustard, suggesting that the gene may function specifically in meiosis (Ghabrial et al. 1998; Burtis, K.C., unpublished data).
An alternative is that the interhomologue bias is enforced through another mechanism. Structural constraints imposed by the synaptonemal complex (SC) may direct recombination toward homologues or away from sisters. In Saccharomyces, SC assembly is not completed until after Dmc1 functions (Bishop et al. 1992). However, in Drosophila SC assembly is believed to be completed before recombination begins (McKim et al. 1998), so its presence could influence subsequent events in the recombination process. Consistent with this idea, Caenorhabditis elegans also lacks a meiosis-specific Rad51 gene (only a Rad51 ortholog is found in C. elegans), and it shows the same temporal relationship between SC assembly and recombination as Drosophila (Dernburg et al. 1998).
S. cerevisiae RAD51 is a member of the genetically defined RAD52 epistasis group. Most members of this group are found in Drosophila, but no homologue of Rad52 can be identified. Rad52, which plays a key role in the Rad51-dependent repair and recombination pathways in S. cerevisiae, is represented by conserved homologues in Schizosaccharomyces pombe and vertebrates, and a paralog (Rad59) in S. cerevisiae. Disruption of the mouse RAD52 homologue does not cause defects in viability, fertility, or sensitivity to ionizing radiation (Rijkers et al. 1998), suggesting a less crucial role for Rad52 in mammals than in yeast, although it remains possible that other RAD52-related genes exist in mammals.
Mismatch Repair Proteins
The E. coli mismatch repair system plays a crucial role in correcting replication errors that generate single base-pair mismatches and small insertion/deletion (I/D) loops, such as might occur within microsatellite tracts (for review see Modrich and Lahue 1996). The early events in the pathway involve MutS, which binds to mismatches and I/D loops, and MutL, which couples this recognition stage to downstream repair events. These proteins have been conserved in eukaryotes, where each has diverged into a family of polypeptides whose products are involved not only in postreplication mismatch repair, but in other DNA repair pathways and in meiotic recombination (for review see Buermeyer et al. 1999). The S. cerevisiae genome encodes six MutS homologues, Msh1 through Msh6; and four MutL homologues, Mlh1 through Mlh3 and Pms1.
Msh1 is involved in repair of mitochondrial DNA; Msh2, Msh3, and Msh6 perform nuclear DNA repair functions; Msh4 and Msh5 function specifically during meiosis. Although orthologs of each of these except Msh1 have been identified in mammals, efforts in several laboratories to clone Drosophila Msh genes identified only the Msh2 ortholog, encoded by spellchecker1 (Flores and Engels 1999). The complete Drosophila genome sequence reveals a single additional family member, an ortholog of Msh6; there are no orthologs of Msh3, Msh4, or Msh5 (Table).
The Msh proteins function as heterodimers. Msh2 and Msh6 form MutSα, and Msh2 and Msh3 form MutSβ (Marsischky et al. 1996). These two heterodimers have partially overlapping repair functions. MutSα can direct the repair of base-base mismatches and small I/D loops, whereas MutSβ directs the repair of 2–8 nucleotide I/D loops. Thus, Msh3 is partially redundant with Msh6 with regard to postreplication mismatch repair. However, some S. cerevisiae pathways for the repair of double-strand breaks (DSBs) require Msh2 and Msh3, but not Msh6 (Sugawara et al. 1997).
This raises the question of whether the Drosophila Msh6 ortholog fulfills functions that are specific to Msh3 in yeast. The DSB repair function in yeast requires the nucleotide excision repair endonuclease Rad1/Rad10. The Drosophila homologue of Rad1, MEI-9, is involved in some DSB repair pathways (Baker et al. 1976); perhaps an Msh2/Msh6 heterodimer or some other protein has acquired the ability to participate in this process. If so, it will be interesting to determine what properties of Drosophila Msh6 have allowed this substitution, or what properties of Msh6 and Msh3 have prevented it in Saccharomyces or humans.
The meiosis-specific paralogs Msh4 and Msh5, which are also believed to function as a heterodimer, promote crossing over among a subset of meiotic recombination events. Orthologs of these proteins are present in the C. elegans, mouse, and human genomes, where they are also expressed highly or specifically during meiosis. In C. elegans, these proteins have also been found to promote crossing over (Zalevsky et al. 1999), but Msh4−/− or Msh5−/− mice show defects at an earlier stage of meiosis (Kneitz et al. 2000).
At least some of the underlying enzymology of meiotic recombination is conserved between S. cerevisiae, C. elegans, and Drosophila (McKim and Hayashi-Hagihara 1998; Dernburg et al. 1998). Thus, the absence of Msh4 and Msh5 orthologs in Drosophila does not reflect the implementation of a fundamentally different recombination pathway. Rather, Drosophila females must have adopted an alternative mechanism for promoting crossing over at later stages in the pathway. A clue to this mechanism is found in the observation that the nucleotide excision repair protein MEI-9 is required to generate crossovers in Drosophila, a function that the yeast homologue, Rad1, does not have (Sekelsky et al. 1995). Although it is unlikely that MEI-9 functionally substitutes for Msh4 and Msh5, it is clear that different proteins have been co-opted to carry out the later steps in the recombination pathway in Drosophila. Msh4 and Msh5 orthologs have not yet been identified in the genomes of Schizosaccharomyces pombe or Arabidopsis thaliana, although orthologs of Msh2, Msh3, and Msh6 are present in these species (Table). These findings reinforce an emerging theme in comparative studies of meiotic recombination: although some of the underlying enzymology appears to be highly conserved (e.g., initiation via a DSB), other aspects of the pathway differ among different organisms, in either the methods or the molecules. Determination of how different species have mixed and matched different basic DNA repair functions to carry out meiotic recombination should prove to be informative.
MutL homologues also form heterodimers. Three MutL homologues have been identified in humans: Mlh1 and Pms2 constitute MutLα, and Mlh1 and Pms1 constitute MutLβ. As with the MutS homologues, Drosophila has just two members of this family, orthologs of Mlh1 and Pms2. Thus, the major heterodimer, MutLα, is present, but the minor one, MutLβ, is not. It would seem that Drosophila has retained a minimal set of eukaryotic mismatch repair genes, but whether these are capable of carrying out all the functions of the more extensive set of genes in yeast and mammals remains to be seen.
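The reasoning about which heterodimers Drosophila can assemble is a simple subset test: a complex is possible only if every subunit has an ortholog in the genome. The sketch below encodes the subunit compositions and the Drosophila gene complement exactly as described in the text, using the human names for the MutL homologues. The dictionary and function names are hypothetical, and sequence-level presence of both subunits does not by itself demonstrate that a functional complex forms.

```python
# Subunit composition of each heterodimer, as described in the text.
COMPLEXES = {
    "MutSα": {"Msh2", "Msh6"},
    "MutSβ": {"Msh2", "Msh3"},
    "Msh4/Msh5": {"Msh4", "Msh5"},
    "MutLα": {"Mlh1", "Pms2"},
    "MutLβ": {"Mlh1", "Pms1"},
}

# MutS and MutL family orthologs reported for the Drosophila genome.
DROSOPHILA_GENES = {"Msh2", "Msh6", "Mlh1", "Pms2"}


def formable_complexes(genes):
    """Return, in sorted order, the complexes whose subunits are all present."""
    return sorted(
        name for name, subunits in COMPLEXES.items() if subunits <= genes
    )


print(formable_complexes(DROSOPHILA_GENES))
```

With the gene set above, only MutLα and MutSα pass the test, matching the conclusion that Drosophila retains the major heterodimer of each family and lacks the minor ones.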
The E. coli recQ gene encodes a DNA helicase involved in certain recombination pathways. There are five known human genes encoding RecQ-related proteins: RECQ1, BLM, WRN, RECQ4, and RECQ5. The importance of the divergent functions of these genes is evidenced by the finding that at least three of them when mutated produce genetic disorders that lead to increased genomic instability and cancer (reviewed in Karow et al. 2000). Mutations in BLM cause Bloom syndrome, mutations in WRN cause Werner syndrome, and mutations in RECQ4 cause Rothmund-Thomson syndrome. The predicted proteins all share extensive sequence similarity across a region of ∼400 residues that includes the seven helicase motifs as well as additional sequences conserved among members of the RecQ family. In most family members, this region is flanked by sequences that are similar to one another only in composition, being rich in charged residues.
The complete genome sequence of S. cerevisiae encodes only a single RecQ-related protein, whereas C. elegans has four family members. Comparison of the sequences within the core region indicates that these four are orthologs of four human RecQ helicases (RecQ4 is not represented in C. elegans), suggesting that gene duplication and divergence occurred early in metazoan evolution (Kusano et al. 1999). In the Drosophila sequence, there are three RecQ helicase genes, orthologs of Blm, RecQ4, and RecQ5 (Table). Thus, an ortholog of human RecQ4 is found in Drosophila, but not in Caenorhabditis, whereas orthologs of RecQ1 and Wrn are found in Caenorhabditis, but not in Drosophila. Only Blm and RecQ5 are found in all three species.
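The presence and absence pattern described above can be restated as set operations. The following sketch uses only the ortholog assignments given in the text for human, C. elegans, and Drosophila; the variable and function names are hypothetical.

```python
# RecQ family members per species, as stated in the text.
RECQ_ORTHOLOGS = {
    "human": {"RecQ1", "Blm", "Wrn", "RecQ4", "RecQ5"},
    "C. elegans": {"RecQ1", "Blm", "Wrn", "RecQ5"},
    "Drosophila": {"Blm", "RecQ4", "RecQ5"},
}


def shared_by_all(table):
    """Family members found in every species in the table."""
    return set.intersection(*table.values())


def missing_from(table, species, reference="human"):
    """Members of the reference set with no ortholog in the given species."""
    return table[reference] - table[species]


print(sorted(shared_by_all(RECQ_ORTHOLOGS)))
for species in ("C. elegans", "Drosophila"):
    print(species, "lacks", sorted(missing_from(RECQ_ORTHOLOGS, species)))
```

The intersection contains only Blm and RecQ5, while the differences recover the reciprocal losses: RecQ4 in C. elegans, and RecQ1 and Wrn in Drosophila.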
The case of Wrn is particularly interesting. In mammalian Wrn, the region NH2-terminal to the RecQ helicase core has a domain with 3′ to 5′ exonuclease activity (Huang et al. 1998) that has sequence similarity to the exonuclease domains of RNaseD and E. coli PolA. Although there is no detectable ortholog of the Wrn helicase region in Drosophila, CG7670 is strikingly similar to the exonuclease region of Wrn (Fig. 2). The sequence similarity extends throughout the conserved exonuclease motifs, and includes the putative ion-chelating and catalytic residues. The region of similarity within the predicted CG7670 protein encompasses residues 113–305 of 346, with the remainder being rich in charged residues (25% basic and 18% acidic), which is also a feature of eukaryotic RecQ helicases.
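Compositional figures such as the fraction of basic and acidic residues can be obtained by simple counting. The sketch below assumes that "basic" means lysine and arginine and "acidic" means aspartate and glutamate; whether histidine was counted as basic in the original analysis is not stated, so that choice is an assumption. The example sequence is a toy string, not the CG7670 flanking region.

```python
BASIC = set("KR")   # assumption: histidine is not counted as basic
ACIDIC = set("DE")


def charged_fractions(sequence):
    """Return (basic fraction, acidic fraction) for a protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    basic = sum(residue in BASIC for residue in sequence)
    acidic = sum(residue in ACIDIC for residue in sequence)
    return basic / len(sequence), acidic / len(sequence)


# Toy sequence for illustration only.
basic, acidic = charged_fractions("KKRDEAAAAA")
print(f"{basic:.0%} basic, {acidic:.0%} acidic")
```

To reproduce a figure for a flanking region, the function would be applied to the residues outside the conserved domain rather than to the full-length protein.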
The linkage of helicase and exonuclease functions in the Wrn protein presumably reflects a coupling of these two activities that is important to the function of the protein. Perhaps in Drosophila a different helicase interacts with CG7670 to form a functional homologue of Wrn. A similar situation may occur in other organisms. For example, although there is not yet an ortholog of the Wrn helicase identified in Arabidopsis, there is a predicted polypeptide with sequence and structural similarity to Drosophila CG7670. Furthermore, although there is an ortholog of the Wrn helicase in C. elegans, F18C5.2, the predicted protein is truncated at the NH2 terminus, and the exonuclease region is absent. However, there is a putative Wrn-related exonuclease within the predicted polypeptide ZK1098.8 (mut-7). Thus, at present, the only RecQ helicase known to contain an exonuclease domain is vertebrate Wrn, suggesting that coupling of exonuclease and helicase activities in the same polypeptide occurred later in evolution. The determination of the functional relationships between the exonuclease and the helicase may provide a new avenue toward understanding the cellular functions of Wrn and the clinical manifestations of Werner syndrome.
Base excision repair (BER) and nucleotide excision repair (NER) are different pathways involved in the removal of many common DNA lesions. Oxidatively damaged bases are primarily corrected via the BER pathway, whereas helix-distorting lesions caused by exposure to chemical mutagens or ultraviolet light are removed via NER. Both pathways can remove damage throughout the genome, but in both humans and yeast damage on the template strand of actively transcribed genes is removed much more rapidly than other damage. Efforts to verify that this transcription-coupled repair (TCR) occurs in Drosophila, however, failed to find evidence for its existence, at least for the two main classes of UV-induced damage (de Cock et al. 1992; van der Helm et al. 1997).
Cockayne syndrome is a hereditary disorder that results from disruption of TCR (for review see van Gool et al. 1997). This disease can result from mutations of any of several genes, including the NER genes XPB, XPD, and XPG, as well as the non-NER genes CS-A and CS-B. CS-B encodes a polypeptide with an ATPase domain similar to that of Swi/Snf-type chromatin remodeling proteins. CS-A encodes a WD-repeat protein that interacts with CSB and with the RNA polymerase II basal transcription factor TFIIH. The S. cerevisiae protein Rad26 is highly similar to CSB, showing ∼50% amino acid identity over 500 residues; Rad28 is the yeast homologue of CSA. The Drosophila genome does not encode apparent homologues of either CSA or CSB. Thus, both the absence of sequence conservation and, more directly, experimental results indicate that TCR does not occur in Drosophila.
What are the consequences of lack of TCR? The importance of this pathway in humans is evidenced by the severity of Cockayne syndrome, whose features include neurological abnormalities, prenatal growth defects, and severe postnatal developmental failure, resulting in early death. A model for the mechanism of TCR has emerged recently (Le Page et al. 2000). The types of damage repaired by BER and NER often block RNA polymerase, resulting in a stalled polymerase complex that remains stably associated with the damaged template strand. A critical function of the TCR machinery is to remove this stalled complex to allow repair proteins to access and correct the damage. At least some of the repair machinery is recruited to the site by the TCR complex.
Drosophila cells apparently lack at least the process that allows rapid removal of DNA damage from template strands of transcribed genes. How do these cells deal with stalled RNA polymerase complexes? One possibility is that they don't. Most cell growth and division in Drosophila occurs during larval development. Larval tissues are typically polyploid or polytene during this time, so the absence of one template for transcription may not be detrimental in many cases. Cells that contribute to the adult are diploid, but the animal can suffer the loss or slow growth of a substantial fraction of these cells without apparent consequences. Nonetheless, if stalled transcription complexes were not removed, one would expect to observe the reverse of TCR: slower repair of template strands. For the genes analyzed, both strands were repaired with similar time courses for both transcribed and nontranscribed genes (de Cock et al. 1992; van der Helm et al. 1997). It is likely that some other mechanism is used to remove stalled RNA polymerase complexes in Drosophila. One possibility is that these stalled complexes have a much shorter half-life in Drosophila than in mammalian cells. Alternatively, there may be an active mechanism to remove stalled complexes without recruitment of repair proteins.
It is formally possible that TCR of oxidative damage does exist in Drosophila, since only repair of UV-induced damage has been measured. However, the failure to identify homologues of CSA and CSB in the genome sequence suggests that this is not the case.
DNA-dependent DNA Polymerases
The family of identified sequences in eukaryotic genomes encoding unique DNA-dependent DNA polymerases has grown rapidly in recent years (for reviews see Friedberg et al. 2000; Hübscher et al. 2000). Polymerase activity has been demonstrated for most of the polypeptides encoded by these genes, and most if not all are likely to play a role in some aspect of DNA repair. In Drosophila, evidence for the presence of nine DNA polymerases (in some cases including multiple subunits) is obtained through BLAST-based sequence comparisons, including polymerases α, δ, ε, η, θ, ι, γ, ζ, and deoxycytidyl transferase (yeast REV1 homologue).
One notable finding from perusal of the Drosophila genome is the apparent absence of sequences encoding DNA polymerase β, which plays a critical role in the “single nucleotide” pathway of base excision repair (Sobol et al. 1996). Evidence suggests an important role for Pol β in the long-patch pathway as well (Dianov et al. 1999), although the alternative PCNA-dependent pathway using Pol δ or Pol ε may play the major role in BER in vivo, at least for radiation-induced DNA damage (Miura et al. 2000). The absence of Pol β in the Drosophila proteome suggests that the PCNA-dependent pathway may represent the only mechanism for BER in Drosophila. It further suggests that initial removal of the 5′-deoxyribose phosphate remaining after strand cleavage by AP endonuclease action at abasic sites is likely carried out by a FEN1-like flap endonuclease activity (DeMott et al. 1996), as opposed to β-elimination catalyzed by Pol β. Two FEN1 homologues are present in the Drosophila genome (CG8648 and CG10670).
It should be noted that although Pol β–like proteins have been found in mammals, yeast, and several protozoans, the other completely sequenced metazoan invertebrate genome, that of C. elegans, does not encode Pol β. The Drosophila and worm genomes likewise lack homologues for any of the other members of the Family X group of DNA polymerases, which include terminal deoxynucleotidyl transferase and the recently reported mammalian polymerases Pol μ and Pol λ (Dominguez et al. 2000).
DNA Damage Checkpoints
After exposure to DNA-damaging agents, eukaryotic cells activate checkpoint pathways that delay cell cycle progression, allowing additional time for repair of DNA damage. Checkpoint sensors appear to interact directly with damaged DNA (Weinert 1997; Kitazono and Matsumoto 1998). The four sensors identified in yeast that have clear homologues in mammals (Hus1, Rad1, Rad9, and Rad17) also have homologues in flies. Checkpoint transducers, which include the Atm, Chk1, and Chk2/Cds1 serine/threonine kinases, as well as the 14-3-3 proteins, amplify and distribute the checkpoint signal to cell cycle regulators. Drosophila homologues of the Chk1 and Cds1/Chk2 kinases have been reported as the products of the grapes and maternal nuclear kinase genes (Fogarty et al. 1997; Oishi et al. 1998). Drosophila has two members of the 14-3-3 family, 14-3-3ε and 14-3-3ζ (Chang and Rubin 1997; Kockel et al. 1997). grapes, 14-3-3ε, and mei-41 (see below) are all required for a DNA damage checkpoint (Hari et al. 1995; Brodsky et al. 2000b).
One Drosophila member of the Atm kinase family has been described, the product of the mei-41 gene (Hari et al. 1995). The genome sequence reveals that flies, like S. cerevisiae and mammals, have a second Atm-like gene, CG6535. Based on sequence comparisons, CG6535 is most similar to Atm, whereas MEI-41 is more similar to the related checkpoint kinase Atr. No ortholog of the DNA-dependent protein kinase (DNA-PK) catalytic subunit can be found, although the DNA-binding subunits, Ku70 (Beall and Rio 1996) and Ku80, are present, and Drosophila cells have a strong nonhomologous end-joining activity for the repair of DNA DSBs.
In many metazoan cells, DNA damage can also induce apoptosis. A critical regulator of this response in mammals is the transcription factor p53 (for review see May and May 1999). In addition to p53, mammals have two closely related proteins, p73 and p63, that are required for normal development and may also play a role in transducing DNA damage signals. Drosophila contains a single p53 homologue, and expression of dominant negative forms of this protein blocks DNA damage-induced apoptosis (Brodsky et al. 2000a; Ollmann et al. 2000). At this time, squid and clam genes resembling p63 are the only other invertebrate p53 homologues described. DNA damage-induced death in S. cerevisiae does not use the apoptotic machinery described in animal cells, and C. elegans exhibits DNA damage-inducible cell death in the germline, but not in the soma (Gartner et al. 2000). These differences in cell death regulation may explain why no p53 homologue has been reported in the sequenced genomes of these organisms.
Studies in mammals have identified a large number of DNA damage-induced regulators and targets of p53. In mammals and Drosophila, the four conserved checkpoint kinases described above are strong candidates to directly activate p53 after DNA damage. In mammals, the Mdm2 protein is a transcriptional target of p53 and functions in a negative feedback loop by binding p53 and promoting its degradation and subcellular relocalization. Sequence similarity searches do not reveal an obvious Mdm2 homologue in Drosophila, suggesting that other mechanisms may act to limit p53 activity.
Abbreviations used in this paper: BER, base excision repair; DSB, double-strand break; I/D, insertion/deletion; NER, nucleotide excision repair; SC, synaptonemal complex; TCR, transcription-coupled repair.