The use of EM and image analysis techniques, particularly the single particle methodology, has opened the door to the structural characterization of large protein complexes in the megadalton range. This review covers the basic principles of single particle EM and recent examples of its application to the study of large molecular machines in the eukaryotic cell. These examples will give the reader a perspective on how this emerging technique can be applied to the study of large protein complexes to determine their overall shape, or the location of subunits and domains within them, and to follow conformational changes accompanying intermediates in their functional cycle.
Structural Studies of Molecular Machines
We are entering a new and exciting time in biology. The advent of full genome information for several organisms, including humans, provides a new tool to probe cellular structure and function. Structural genomics is now on its way and promises to accelerate the production of structural models at atomic resolution for many proteins and nucleic acids. However, even the most optimistic structural biologist realizes that extrapolating from structure to function will be difficult. Numerous approaches, most importantly microchip technology for genomic expression, will work in parallel with the structural effort on the task of linking genes to cellular function.
In this postgenomic era, the structure determination of large, complex protein ensembles will pose a particularly difficult problem for structural biologists. It is now clear that most functions in the cell are not carried out by single protein enzymes, colliding randomly within the cellular jungle, but by macromolecular complexes containing multiple subunits with specific functions (Alberts 1998). Many of these complexes are described as “molecular machines.” Indeed, this designation captures many of the aspects characterizing these biological complexes: modularity, complexity, cyclic function, and, in most cases, the consumption of energy. Examples of such molecular machines are the replisome, the transcriptional machinery, the spliceosome, and the ribosome. The size of these macromolecular complexes, alone, often makes them inaccessible to x-ray crystallography, since crystals have to be obtained first and phasing such large structures is difficult. In most cases, with the clear exception of the ribosome, these complexes are relatively sparse in the cell, and purification yields only small amounts of sample, usually insufficient for crystallization trials. Reconstitution of large complexes from recombinant proteins creates other difficulties, such as maintaining correct stoichiometry and order of assembly, and often, the composition of a complex is simply not known. As a result, many efforts have concentrated on determining the structures of individual subunits and domains within the machines. Consequently, information on the organization of the assembly subunits, their interactions, and sometimes their precise function within the context of the fully functional complex is lost. The most promising technique for the structural characterization of large molecular machines is EM and image reconstruction.
In particular, single particle techniques overcome the need for crystals, making this method generally applicable to the structure determination of molecular machines. Furthermore, several reconstructions of complexes differing only slightly in composition or conformation can be obtained relatively quickly, and even conformational changes (on a millisecond time scale) can be trapped by rapid freezing. In this review, we summarize some of the principles underlying the potential of this methodology. We discuss examples of present and future applications to the characterization of molecular machines. Finally, we consider the prospects of improving these methods to reach high resolution and high throughput.
Single Particle Reconstruction
In diffraction-based methods, the interaction of the probe (x-rays or electrons) with biological matter results in a scattering process that carries information on the atomic organization in the sample. Unfortunately, biological macromolecules are very sensitive to radiation damage during this process. For EM, this means the number of electrons passing through the sample has to be kept small. The resulting low signal-to-noise ratio must be overcome by averaging many images of protein particles of identical composition and conformation. Crystallographic methods (using x-rays or electrons) take advantage of the molecular alignment provided by the crystal to obtain information averaged over many molecules. The principle of single particle averaging can be used to replace the crystallographic ordering with computational alignment of imaged molecules (Frank et al. 1988), sometimes referred to as crystallization “in silico.” Images contain both amplitude and phase information. A 3-dimensional (D) structure can be calculated using basic mathematical principles to determine the relative orientations of different 2-D views of one object. The order and size of a protein crystal determine the power and resolution of the diffraction data. Similarly, the accuracy of the alignment (the angular relationship between different views of the molecule) and the number of images averaged determine the reliability and resolution of the EM reconstruction from single particles. Fig. 1 shows a schematic of image analysis procedures in 2-D electron crystallography and in single particle reconstructions.
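The benefit of averaging described above can be illustrated with a small numerical sketch. It is not taken from any of the cited reconstruction packages; the 1-D sine "particle," the noise level, and all function names are assumptions chosen only for illustration. For independent noise, the residual noise in an average of N aligned images is expected to fall roughly as 1/√N.

```python
import math
import random


def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to each sample."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def average(copies):
    """Average a list of equally long, already aligned 1-D 'images'."""
    n = len(copies)
    return [sum(column) / n for column in zip(*copies)]


def residual_noise(estimate, signal):
    """Root-mean-square deviation of an estimate from the noise-free signal."""
    squared = [(e - s) ** 2 for e, s in zip(estimate, signal)]
    return math.sqrt(sum(squared) / len(squared))


def demo_averaging(n_particles=400, sigma=5.0, seed=0):
    """Compare the noise in one image with the noise in the average of many.

    The sine profile stands in for a particle projection (an assumption);
    sigma is much larger than the signal amplitude to mimic a low-dose image.
    """
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * i / 64) for i in range(64)]
    copies = [noisy_copy(signal, sigma, rng) for _ in range(n_particles)]
    single = residual_noise(copies[0], signal)
    averaged = residual_noise(average(copies), signal)
    return single, averaged
```

Calling `demo_averaging()` returns the noise level of a single image and of the 400-image average; the second value should be smaller by a factor of roughly √400 = 20. The sketch assumes the images are already perfectly aligned, which is exactly the step that is difficult in practice.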
The single particle method is especially well suited for studying large biological complexes. In fact, the larger the complex, the easier it is to align different particles to achieve higher resolution. In addition, only a small amount of protein is required, on the order of a few picomoles for a 25-Å reconstruction. At this resolution, the overall shape of a complex can be determined and domains may become visible.
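The alignment step itself can be sketched in one dimension. The example below is a toy under stated assumptions: real single particle alignment involves 2-D images, rotations as well as translations, and noisy data, and the function names and the two-bump test profile are hypothetical. It shows only the core idea of finding the shift that maximizes the cross-correlation between a reference and an image.

```python
import math


def make_profile(n=64):
    """A hypothetical asymmetric 1-D 'particle' made of two Gaussian bumps."""
    return [
        math.exp(-((i - 20) / 3.0) ** 2) + 0.5 * math.exp(-((i - 40) / 5.0) ** 2)
        for i in range(n)
    ]


def circular_shift(signal, shift):
    """Shift a signal to the right by `shift` samples with wrap-around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def cross_correlation(reference, image, lag):
    """Circular cross-correlation of reference and image at a single lag."""
    n = len(reference)
    return sum(reference[i] * image[(i + lag) % n] for i in range(n))


def best_shift(reference, image):
    """Return the lag at which the image best matches the reference."""
    n = len(reference)
    return max(range(n), key=lambda lag: cross_correlation(reference, image, lag))


def align(reference, image):
    """Undo the estimated shift so the image superimposes on the reference."""
    return circular_shift(image, -best_shift(reference, image))
```

For a noise-free image produced by `circular_shift(make_profile(), 11)`, `best_shift` recovers the shift of 11 and `align` returns the original profile. With noise added, the correlation peak becomes less distinct, and a particle that contributes more signal gives a more reliable peak, which is one way to read the statement that larger complexes are easier to align.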
Single particle reconstruction is not without problems, the most obvious being its limited resolution (currently, a resolution better than 15 Å is considered state of the art for particles without symmetry). Whereas theoretical calculations show that atomic resolution can in principle be obtained from single particles (Henderson 1995; Glaeser 1999), the resolution has been limited by technical problems. These include image contrast lower than predicted by theory, limited accuracy of alignment procedures, and the sheer magnitude of data and computational time required to reach 3 Å. Furthermore, certain conformations of domains or loops in a protein might be stabilized in a crystal if they are involved in crystal contacts. Such contacts are absent in isolated molecules and complexes, giving them a larger degree of conformational freedom and further limiting the resolution obtained using single particle averaging. Of all EM techniques, only electron crystallography of 2-D crystals has yielded atomic resolution for a handful of proteins (Henderson et al. 1990; Kühlbrandt et al. 1994; Kimura et al. 1997; Nogales et al. 1998; Murata et al. 2000).
What Can Be Learned from Single Particle Reconstructions?
Within the present range of attainable resolution, image reconstruction can provide not only an overall shape of the protein complex under study, but also information on the assembly process of the full molecular machinery, the distribution of subunits within it, and the overall conformational changes underlying its function. We have chosen examples from recent literature to illustrate how this methodology is used to elucidate the structure–function relationship in a variety of molecular machines in a cell. We will consider: (a) the determination of 3-D structures, (b) the mapping of subunits and active sites within a complex, (c) the visualization of different assembly stages of a complex, (d) conformational changes accompanying biological activity, and (e) a movie-like series of snapshots of functional states.
3-D Structure Determination
The structure of a large protein assembly resolved at 20-Å resolution provides the overall shape and domain architecture of the complex. The outline of the complex can be useful to correlate biochemical and functional information with the coarse structural features visible at the 20 Å level, even if the location of individual subunits cannot yet be determined.
Mitochondrial Complex I.
Complex I is the entry point for electrons in the electron transport chain of mitochondria. The bovine complex consists of 43 different subunits with a total molecular mass of ∼900 kD. It can be dissociated into two subcomplexes, Iα and Iβ, containing 23 and 17 subunits, respectively. Iα contains mostly hydrophilic subunits, including those that bind NADH, FMN, and all the Fe-S clusters identified by electron paramagnetic resonance. Iβ is part of the membrane domain of complex I, since it contains at least two of the hydrophobic subunits. The 3-D structures of bovine complex I (Grigorieff 1998) (Fig. 2 a) and of complex I from Neurospora crassa (Guénebaut et al. 1997) and Escherichia coli (Guénebaut et al. 1998) have been studied by single particle EM. At 17-Å resolution (Fig. 2 a), the bovine complex revealed two separate domains linked by a thin stalk. The domains were identified as membrane and matrix domains by comparing their shape to other complex I structures determined earlier at a lower resolution, and by correlating their volume with the approximate volumes calculated for subcomplexes Iα and Iβ. The structure implies that the thin stalk contains part of the electron transfer pathway through complex I.
Clathrin is the major coat protein for vesicles involved in endocytosis and in the latest stage of the biosynthetic pathway. The clathrin coat is capable of deforming the membrane into an invaginated bud in which selected recycling receptors bearing appropriate signals are concentrated. The 3-D structure of a clathrin cage at 21-Å resolution reveals a hexagonal barrel detailing the packing of entire clathrin molecules as they interact to form the cage (Smith et al. 1998) (Fig. 2 b). This structure provides a structural basis for the assembly and disassembly of clathrin coats on a membrane. Each triskelion contributes to two connecting edges of the polyhedral cage to form a tightly packed structure that maximizes the area of contact between the legs. The terminal domains continue inward towards the adaptor complex, forming “hook-like” structures and a network with other legs. The terminal domain density could be interpreted using the x-ray structure of the 55-kD NH2-terminal fragment (Musacchio et al. 1999). The structure surrounding the threefold vertices is fairly open and may be accessible to other proteins.
Nuclear Pore Complex.
The nuclear pore complex (NPC) creates an aqueous channel across the nuclear envelope through which macromolecular transport between nucleus and cytoplasm occurs. Nucleocytoplasmic traffic is bidirectional and involves diverse substrates, including protein and RNA. The NPC comprises multiple copies of ∼50 different nucleoporins, which form a scaffold for proteins involved in binding and translocation of macromolecules across the nuclear membrane. Low resolution models of vertebrate and yeast NPCs were derived using single particle methods. EM revealed important differences, within the overall similarity of the structures, that have arisen during evolution and may be correlated with differences in nuclear transport regulation (Akey 1995; Yang et al. 1998) (Fig. 2 c). The NPC has eightfold rotational symmetry and a multidomain-spoke complex surrounding a central transporter framed by cytoplasmic and nucleoplasmic coaxial rings. Eight thin fibers involved in protein import (Rutherford et al. 1997) project from the NPC cytoplasmic ring, and a basket structure that participates in mRNP export (Kiseleva et al. 1998) extends from the nucleoplasmic ring.
Mapping Subunits and Active Sites
Ideally, the 3-D envelope of a multisubunit complex is combined with information about the location of individual subunits within it. Such information could be obtained, for example, by labeling subunits using subunit-specific antibodies. If the complex is obtained from a recombinant source, it is sometimes also possible to introduce cysteines at specific sites that can be labeled with gold clusters. These labels produce additional density in the 3-D structure and allow a mapping of subunit locations. Often, atomic models obtained by x-ray crystallography or nuclear magnetic resonance are available for some of the subunits. These can be used to interpret the EM map by fitting the models into the low resolution density, a process referred to as “docking.” Thus, even if a high resolution structure of the entire complex cannot be obtained due to the lack of suitable crystals, an approximate atomic structure can still be derived by fitting together components. However, an atomic model derived this way is likely to contain some degree of error, since some subunits may assume different conformations in isolation and in states bound to other subunits in the complex.
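The idea behind docking can be shown with a deliberately simplified sketch. It is not the algorithm of any published docking program: the 2-D grid, the restriction to 90° rotations, the overlap score, and every name in the code are assumptions made for illustration. A rigid model is tried in every allowed orientation and position, and the pose with the highest summed density under the model is kept.

```python
from itertools import product

# Hypothetical rigid "subunit": an L-shaped set of grid points with no rotational symmetry.
MODEL = [(0, 0), (1, 0), (2, 0), (0, 1)]


def rotate90(points, k):
    """Rotate integer (x, y) points by k * 90 degrees about the origin."""
    pts = list(points)
    for _ in range(k % 4):
        pts = [(-y, x) for x, y in pts]
    return pts


def make_map(points, width, height):
    """Build a toy density map with value 1.0 at the given points and 0.0 elsewhere."""
    density = [[0.0] * width for _ in range(height)]
    for x, y in points:
        density[y][x] = 1.0
    return density


def score(density, points, dx, dy):
    """Sum the map density under the translated model; off-map points add nothing."""
    height, width = len(density), len(density[0])
    total = 0.0
    for x, y in points:
        u, v = x + dx, y + dy
        if 0 <= u < width and 0 <= v < height:
            total += density[v][u]
    return total


def dock(density, model):
    """Exhaustive search over four rotations and all non-negative grid translations.

    Returns the best pose as (score, k, dx, dy).
    """
    height, width = len(density), len(density[0])
    best = None
    for k in range(4):
        pts = rotate90(model, k)
        for dy, dx in product(range(height), range(width)):
            s = score(density, pts, dx, dy)
            if best is None or s > best[0]:
                best = (s, k, dx, dy)
    return best
```

If the map is built from the model rotated once and translated by (5, 4), `dock` recovers that pose. Practical docking works in three dimensions with finely sampled rotations, accounts for the resolution of the map, and, as the paragraph above cautions, cannot correct for conformational differences between the isolated subunit and the subunit in the complex.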
Most eukaryotic proteins are degraded in the ubiquitin pathway by the 26S proteasome (2–3 MD) in an ATP-dependent process. Although the proteasome 20S core is well characterized (Fig. 3 a, orange), little is known about the regulatory complexes (RCs) that associate at both ends of the core and are involved in recognizing and unfolding proteins before degradation (Fig. 3 a, blue). The RCs contain hexameric AAA ATPases and up to 12 other subunits. One of these subunits is the deubiquitinylating enzyme p37A. Using a nonhydrolyzable substrate analogue, Hölzl and colleagues (2000) recently mapped its location. This is the first time that one of the RC's specific functions, deubiquitinylating activity, has been localized, thus providing new insights into the sequence of events leading to substrate degradation. Gold-conjugated inhibitor was mapped to the interface between the two RC subcomplexes, the base and the lid (Fig. 3 a, bottom).
The insulin receptor is a constitutively dimeric receptor of the tyrosine kinase family. The 3-D structure of the insulin receptor bound to its ligand was determined by cryoEM and image reconstruction (Fig. 3 b) (Luo et al. 1999). Gold-labeled insulin served to locate the insulin-binding domain, whereas available high resolution domain substructures were used to obtain a detailed model of this heterotetrameric receptor. This study showed that both subunits participate in insulin binding and that the kinase domains are in a juxtaposition that permits autophosphorylation in the first step of receptor activation.
Transcription Factor TFIIH.
TFIIH is a complex of 12 different subunits shared by the transcription initiation and the transcription-coupled DNA repair machinery. Studies of 2-D crystals of a yeast core TFIIH containing five of the subunits have rendered an 18-Å resolution view of this part of the complex (Chang 2000). The structure of the complete human TFIIH was obtained at a resolution of 38 Å using the single particle methodology (Schultz et al. 2000). The TFIIH is organized into a ring-like structure from which a large protein domain protrudes (Fig. 3 c). Immunolabeling experiments have shown the relative positions of some of the subunits within the complex. The helicases XPB and XPD are located roughly on opposite sides of the cyclic part of the structure, which surrounds a central cavity that could accommodate double-stranded DNA. The protrusion accommodates the kinase involved in phosphorylation of the COOH-terminal domain of Pol II.
The use of EM and image reconstruction has been fruitful in the study of the actin cytoskeleton, particularly in the structural characterization of the binding interfaces of actin-binding proteins and their effect on the actin filament structure. These studies have generally used helical reconstruction to make use of the natural symmetry present in these cytoskeletal fibers to visualize F-actin decorated with different interacting proteins. This approach, combined with the docking of crystallographic structures, was most successfully used to determine acto-myosin interactions (Rayment et al. 1993). Since then, it has been used to describe the interactions of actin with α-actinin (McGough et al. 1994) (Fig. 4 d), cofilin (McGough et al. 1997) (Fig. 4 b), gelsolin (McGough et al. 1998) (Fig. 4 c), fimbrin (Hanein et al. 1997, Hanein et al. 1998), tropomyosin, calponin (Hodgkinson et al. 1997), and scruin (Schmid et al. 1995), to cite some examples (for a review see McGough 1998). These studies have provided invaluable information for understanding how these proteins cap, sever, cross-link, or destabilize the actin filament. Although helical reconstruction differs from single particle methods, both techniques deal with naturally occurring, physiologically relevant protein assemblies. Moreover, single particle methods are becoming increasingly useful to compensate for irregularities in helical structures.
CryoEM and helical reconstruction have also been used extensively to obtain intermediate resolution structures of kinesin motors bound to microtubules. The crystal structures of the motors and tubulin have been docked into cryoEM envelopes to obtain detailed information on the contacts between these proteins (for review see Amos 2000). A case of special interest is the study of microtubules decorated with KIF1A, a monomeric, yet processive, motor that differs from conventional kinesin by an insertion of a lysine-rich segment in loop L11 (Fig. 4 e). Docking of the tubulin and kinesin structures into the 15-Å reconstruction shows that the position of an extra density coincides with the position of the L11 loop and faces the acidic COOH-terminal end of the tubulin monomer (Kikkawa et al. 2000). This strongly suggests that an ionic interaction is responsible for the processivity of this motor. The interaction creates a tethering force during the motor step and avoids its diffusion away from the microtubule wall, even in the absence of a second head.
H+-ATPases of the F0F1-type catalyze ATP synthesis coupled with proton transport in the membrane of mitochondria, chloroplasts, and bacteria. They have a transmembrane component, F0, involved in proton transport and a soluble part, F1, that contains the ATP-binding sites. Single particle image processing showed that F1 is connected by a prominent stalk to a more peripheral part of F0 (Böttcher et al. 2000). The 3-D maps of the enzyme both in the absence and presence of AMP-PNP showed that the nucleotide does not induce any significant changes in F0, whereas major conformational changes were observed in F1: decrease in diameter, appearance of a pointed cap, and a significant change in its inner cavity.
Chaperonin complexes are required for the efficient folding of many proteins in a cellular context. Bacterial chaperonin GroEL and its coactivator GroES have been extensively characterized by x-ray crystallography and cryoEM (for review see Saibil 2000). The latter has been particularly useful in the study of the chaperonin complex bound to substrate or in different nucleotide states (Chen et al. 1994; Rye et al. 1999).
The eukaryotic chaperonin CCT is formed by two stacked rings, each with eight different subunits, in contrast with the simpler, homooligomer GroEL. Valpuesta and coworkers have used single particle methods with images of frozen–hydrated samples of CCT to obtain 3-D reconstructions of both the apo and ATP-bound complexes (Llorca et al. 1999b). Whereas the apo-CCT has a symmetric, barrel-like structure, the binding of ATP generates an asymmetric particle. One of the rings in the ATP-CCT undergoes substantial movement in the apical domains that rotate and point towards the cylinder axis. The same authors have characterized the binding to CCT of α-actin, one of the few specific substrates of this chaperonin (Llorca et al. 1999a) (Fig. 5, a–d). Using antibodies specific to two of the eight subunits, they have shown that the small domain of actin binds to CCTd and the large domain to CCTb or CCTe, clearly indicating that the binding of actin to CCT is subunit specific.
Some complexes go through distinct stages of assembly during their biochemical cycle. A classic example is the spliceosome, with at least four biochemically distinct assembly intermediates needed to facilitate the two chemical steps of splicing: lariat formation and exon ligation. The enzymatic activity of such complexes can be studied by trapping the intermediates, either biochemically or by rapid freezing. The 3-D structure of each intermediate can then be determined by EM, producing many snapshots to visualize the entire pathway. In some cases, the assembly pathway of a large supracomplex can be followed by sequentially adding components to a nucleating unit. The position of the additional elements can then be followed together with any conformational changes that occur.
The general transcription factor TFIID contains about 12 different subunits involved in core promoter recognition and interaction with gene-specific activators. The binding of TFIID to the DNA promoter is followed by the recruitment of the rest of the general transcriptional machinery and RNA Pol II. The 3-D structure of this complex shows a horseshoe shape formed by three lobes, with a central cavity that could accommodate dsDNA or even a whole nucleosome (Andel et al. 1999; Brand et al. 1999) (Fig. 5 e). The lobes are connected by narrow regions of density, which suggests that relative movement of the lobes (that would change the size and shape of the central cavity) can be easily accomplished, perhaps upon binding of the complex to DNA or upon interaction with activators or other GTFs (general transcription factors). Reconstructions of multicomplexes that recapitulate the first steps in transcriptional machinery assembly have shown the binding sites of TFIIB and TFIIA on the large TFIID (Andel et al. 1999) (Fig. 5 e). The positions of these sites, together with the existing knowledge of the interaction between different TFIID components, suggest a hypothetical distribution of subunits on the trilobal TFIID structure.
Moving Machines: The Ribosome
Structural analysis of the ribosome and the translational cycle has driven the development of single particle EM, and the ribosome is the most successful example of this methodology. Joachim Frank and coworkers have obtained reconstructions of the ribosome in different stages of the translation cycle with the ribosome bound to different cofactors (tRNAs) (Agrawal et al. 1996), elongation factors (Agrawal et al. 1998), or the translocon channel (Beckmann et al. 1997). Similar efforts have been carried out by van Heel and coworkers (Stark et al. 1995, Stark et al. 1997, Stark et al. 2000). The EM efforts to elucidate the structure of the ribosome have developed in parallel with those of x-ray crystallography and have proven how the two methodologies are highly complementary. EM reconstructions of the ribosome at ∼20-Å resolution were initially used to phase x-ray data (Ban et al. 1998; Harms et al. 1999), whereas high resolution crystal structures of ribosomal components have been “docked” into the EM reconstructions of the full ribosome to gain atomic detail of some aspects of the translation cycle (Malhotra et al. 1998). The entire translation cycle has been visualized by obtaining a complete series of intermediate structures of the ribosome using single particle EM (Frank et al. 1999) (Fig. 6 a).
Recent articles have been dedicated to the detailed analysis of 3-D reconstructions of the ribosome intermediate states throughout translocation. Stark and co-workers have shown that elongation factor G (EF-G) has different conformations before and after translocation, resulting in different conformations of the ribosome in these two states (Stark et al. 2000). The authors docked the crystal structure of EF-G–GDP to the density attributed to the factor in both states and found that the movement of domains 1 and 2 with respect to domains 3, 4, and 5 is required to obtain a good fit (Fig. 6 b).
Frank and Agrawal have most recently shown that both binding of EF-G and GTP hydrolysis cause rotations of the 30S subunit with respect to the 50S subunit, in addition to various conformational changes in both subunits (Frank and Agrawal 2000). The binding of EF-G and an initial anti-clockwise rotation are accompanied by a widening of the mRNA channel. Upon hydrolysis, the tRNAs translocate, and then a clockwise rotation occurs that narrows the mRNA channel again.
Future Samples and Perspectives
The capabilities of single particle EM will certainly increase within the next few years. Only a few years ago, a resolution of ∼7 Å for hepatitis B virus particles (Böttcher et al. 1997; Conway et al. 1997) was celebrated as a breakthrough for this technique. The high resolution was greatly facilitated by the 60-fold symmetry of the virus. However, since then, work on asymmetrical particles, such as the ribosome, has progressed to a resolution of ∼10 Å (Gabashvili et al. 2000), impressive indeed, considering that only five years ago such structures yielded resolutions between 20 and 30 Å. This jump in resolution is most dramatically illustrated by realizing that at 10-Å resolution secondary structure (α-helices) becomes visible, whereas at 20–30 Å a complex could only be described as an arrangement of blobs. The improvement is mainly due to technical advances in EM, namely the introduction of the field emission electron source that generates a highly coherent beam to enhance image contrast. Further improvements in instrumentation are also likely to increase image quality. Instruments are now commercially available that combine field emission with increased voltage up to 300 kV (higher voltage reduces image distortions due to specimen charging), more stable specimen stages (some are cooled to liquid helium temperature for better sample cryoprotection), and better vacuum to reduce or avoid sample contamination. In the next 5 to 10 yr, current developments will also make it easier to collect large data sets. Microscopes will be equipped with fast sample loading mechanisms, automated data collection (Potter et al. 1999), and CCD cameras, avoiding cumbersome film development and scanning. Automation of data collection is particularly promising since this is often a rate-limiting step, especially when data sets containing a few hundred thousand particle images must be collected for high resolution. Progress is also being made in data processing. 
The computing power of workstations is increasing every year, and better algorithms for particle classification, alignment, and reconstruction (Grigorieff 1998; Sigworth 1998; Lanzavecchia et al. 1999; Ludtke et al. 1999; Li et al. 2000; Pascual et al. 2000), using correct statistical weighting schemes, are key to high resolution. Automation and robust, easy to use image processing software will increase the throughput of structures determined by single particle EM. New algorithms will also be developed to dock atomic models of subunits into a lower resolution map (Volkmann and Hanein 1999; Wriggers et al. 1999). This procedure will become more important as the number of available atomic models increases, developing into a standard technique linking x-ray crystallography, nuclear magnetic resonance, and EM. Low resolution structures determined by EM will also be used to derive phases for x-ray diffraction data (Prasad et al. 1999; Reinisch et al. 2000). Hybrid techniques like docking and low resolution phasing combine the strengths of all structural techniques in common use today, making them the most powerful tools in the structural analysis of large complexes in the cell.
The number of macromolecular complexes of great biological relevance that await structural characterization is still very large. In many cases, such complexes are available in quantities too small for crystallization trials, or are too unstable or perhaps too flexible to form crystals at all. The complexes may be modular with different components joining the assembly at different stages in its functional cycle. Examples of such complexes are the replisome machinery involved in DNA replication, the spliceosome complexes needed for RNA processing, or chromatin remodeling complexes required to allow gene replication. The big unknown among the molecular motors, dynein, and its interaction with dynactin are on the minds of many biologists interested in the cytoskeleton. The development of cryoEM techniques and of single particle image processing methods will undoubtedly parallel the discovery and biochemical purification of many new and essential molecular machines. Our expectations for higher throughput and higher resolution should make possible the simultaneous biochemical and structural characterization of these fascinating macromolecular assemblies in our effort to elucidate their mechanisms of function.
We would like to thank Kenneth H. Downing, Robert M. Glaeser, and Maria Gianferrari for their comments on the manuscript.
Abbreviations used in this paper: D, dimensional; EF-G, elongation factor G; NPC, nuclear pore complex; RC, regulatory complex.