Thrombospondin-1 (TSP-1) contains three type 1 repeats (TSRs), which mediate cell attachment, glycosaminoglycan binding, inhibition of angiogenesis, activation of TGFβ, and inhibition of matrix metalloproteinases. The crystal structure of the TSRs reported in this article reveals a novel, antiparallel, three-stranded fold that consists of alternating stacked layers of tryptophan and arginine residues from respective strands, capped by disulfide bonds on each end. The front face of the TSR contains a right-handed spiral, positively charged groove that might be the “recognition” face, mediating interactions with various ligands. This is the first high-resolution crystal structure of a TSR domain that provides a prototypic architecture for structural and functional exploration of the diverse members of the TSR superfamily.
The thrombospondins (TSPs) are a family of five matricellular glycoproteins that regulate extracellular matrix structure and cellular phenotype (for reviews see Bornstein, 1995; Chen et al., 2000). These modular proteins act by bringing together cytokines, growth factors, other matrix components, membrane receptors, and extracellular proteases. TSP-1, for instance, regulates cell proliferation, migration, and apoptosis in a variety of physiological and pathological settings, such as wound healing, inflammation, angiogenesis, and neoplasia (Chen et al., 2000). The inhibition of angiogenesis involves the inhibition of endothelial cell migration and the induction of endothelial cell apoptosis (Tolsma et al., 1993; Jimenez et al., 2000). In addition, TSP-1 inhibits the mobilization of VEGF from the extracellular matrix by matrix metalloproteinase 9 (Rodriguez-Manzaneque et al., 2001). TSP-1 can also suppress the growth of TGFβ-responsive tumor cells through activation of this cytokine (Miao et al., 2001). The 420-kD TSP-1 molecule is trimeric and each protomer is composed of multiple domains: NH2- and COOH-terminal globular domains; a procollagen-like domain; and three types of repeated sequence motifs, designated type 1, 2, and 3 repeats (Lawler and Hynes, 1986). The type 1 repeats (designated TSP-1 domain in the pfam database or thrombospondin repeat [TSR] elsewhere) were initially identified in human endothelial cell TSP-1 (Lawler and Hynes, 1986). Database searches at that time revealed that a similar structural motif was present in the complement factors C8 and C9 and in the circumsporozoite protein of the Plasmodium falciparum parasite. Since then, the TSR domain has been identified in many different protein families (for review see Adams and Tucker, 2000). The human genome has 41 proteins that contain TSRs and the Caenorhabditis elegans and Drosophila genomes have 27 and 14 proteins with TSRs, respectively (Venter et al., 2001).
The copy number of the TSR in these proteins varies from 1 to 18. All of the TSRs are in secreted proteins or in the extracellular portion of transmembrane proteins. They appear to be involved in cellular migration, communication, and tissue remodeling in complex tissues. The TSR domain consists of ∼60 amino acids, of which 12 are highly conserved. Most notably, the NH2-terminal portion of the TSR contains two or three tryptophan residues separated by two to four amino acids each (Klar et al., 1992; Adams and Tucker, 2000; Kilpelainen et al., 2000). The majority of TSRs have six cysteine residues; however, those found in some complement factors and in the malaria proteins contain fewer cysteines. In addition, two arginines and two glycines are also highly conserved.
The functions of the three TSRs that are present in TSP-1 have been extensively studied. They reportedly function as (a) attachment sites for many cell types, (b) inhibitors of angiogenesis, (c) protein binding sites, and (d) glycosaminoglycan (GAG) binding sites (for review see Chen et al., 2000). Synthetic peptides have been used to identify amino acid residues within the TSRs of TSP-1 that are important for the anti-angiogenic activity. Tolsma et al. (1993) found that peptides from the second and third TSRs that included the CSVTCG sequence inhibit angiogenesis. This sequence is reportedly involved in the binding of TSP-1 to CD36, and CD36 on endothelial cells reportedly mediates the anti-angiogenic effect of TSP-1 (Dawson et al., 1997). However, two groups have shown that synthetic peptides that flank either side of the CSVTCG sequence, but do not include it, also inhibit angiogenesis (Dawson et al., 1999; Iruela-Arispe et al., 1999). One of these includes the W**W**W (the asterisk indicates positions that are occupied by various amino acids) sequence that has been proposed to be a GAG binding site (Guo et al., 1992). The other sequence, GVITRIR, contains the highly conserved arginine residues (Dawson et al., 1999).
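The sequence motifs named above can be located in any amino acid sequence with a simple pattern search. The following is a minimal Python sketch, not part of the original analysis: the function name `find_motifs` and the toy sequence are hypothetical and serve only to illustrate how the W**W**W notation (where each asterisk is any residue) translates into a search pattern.

```python
import re

# Motifs named in the text. In W**W**W each "*" stands for any residue,
# which corresponds to "." in a regular expression.
MOTIFS = {
    "W**W**W": "W..W..W",
    "CSVTCG": "CSVTCG",
    "GVITRIR": "GVITRIR",
}

def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one sequence."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping matches as well, which can
        # occur for the degenerate W..W..W pattern.
        overlapping = re.compile("(?=" + pattern + ")")
        hits[name] = [m.start() for m in overlapping.finditer(sequence)]
    return hits

# Hypothetical toy sequence for illustration only; it is NOT the
# TSP-1 sequence and was assembled by hand around the three motifs.
toy = "AAWSAWSAWSSCSVTCGAAGVITRIRAA"
print(find_motifs(toy))
```

Positions are reported as 0-based offsets into the input string, so they would need to be shifted to match the residue numbering of a real protein.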
A unique function has been mapped to the sequence between the first and second TSRs of TSP-1. This region binds and activates TGFβ (for review see Murphy-Ullrich and Poczatek, 2000). This activity is mediated by the tripeptide RFK of TSP-1 and is conserved in all TSP-1 sequences determined to date. By contrast, this sequence is not found in the TSRs of most other proteins, including the closely related TSP-2 protein. An RFK sequence is found in the sixth TSR of F-spondin but it falls in a different location within the sequence and, to date, it has not been shown to activate TGFβ (Klar et al., 1992).
Taken together, the data indicate that the TSR-containing proteins participate in development, angiogenesis, tumor progression, axon guidance, activation of TGFβ, and wound healing. Some of the amino acid sequences that mediate TSR functions have been identified. However, several nonoverlapping synthetic peptides have been reported to have similar functions. To better understand the biological function of the TSRs, we have determined the three-dimensional structure of the TSRs of human TSP-1 by X-ray crystallography.
Crystal structure determination of the h3TSR
A soluble recombinant protein containing all three TSRs of human TSP-1 (designated h3TSR) was expressed in Drosophila S2 cells. We have previously shown that h3TSR is active: it activates TGFβ and inhibits endothelial cell migration, angiogenesis, and tumor growth (Miao et al., 2001). The h3TSR contains 186 amino acid residues, including Asp358–Ile530 of human TSP-1, six vector-derived amino acids (RSPWPG) at the NH2 terminus, and two vector-derived amino acids (TG) followed by a His tag at the COOH terminus. Crystals were obtained that diffracted to 1.9 Å resolution, and the structure was solved by the multiple-wavelength anomalous diffraction (MAD) method (Hendrickson, 1991) using a Hg-derivatized crystal (Table I). Only residues from TSR2 and TSR3 are seen in the electron density maps. SDS-PAGE of crystal samples revealed a single band with a molecular mass of ∼17 kD, corresponding to a protein with two TSR domains (unpublished data). Although the three TSRs in the intact TSP-1 protein are resistant to protease digestion, the high concentration of h3TSR in the crystallization droplet apparently exposes the highly basic KRFK sequence between TSR1 and TSR2 to degradation by a trypsin-like protease. The final model therefore includes only the second and third TSRs, arranged in tandem, with a total of 117 residues. In the following discussion, we refer to this model as hTSR[2,3].
The overall hTSR[2,3] structure
Fig. 1 A is a ribbon diagram of the molecular structure of hTSR[2,3]. Each TSR folds into a long, thin, spiraling, antiparallel, three-stranded domain with dimensions of ∼15 × 20 × 55 Å. The core of each domain is a novel eight-layered structure that is described in detail below. The tilt and twist angles between TSR2 and TSR3 are 40° and 180°, respectively. The 180° twist places the TSR2 and TSR3 domains facing in opposite directions. A DALI search with the TSR domain (using the website http://www2.ebi.ac.uk/dali/) failed to identify any homologous structures in the database.
The TSR2 and TSR3 domains are the products of exons 9 and 10 of the human TSP-1 gene, respectively. The exception is Ile473, which is encoded by exon 9 but does not belong to TSR2. Rather, Ile473 forms two main chain hydrogen bonds to Gln507, joining the top portions of the TSR3 domain. The three residues at the COOH terminus of exon 10 are not visible in the electron density map of the current crystal structure. Pro472 appears to be a linker between the TSR2 and TSR3 domains. Immediately before Pro472 is the disulfide bridge of Cys471–Cys433, which stabilizes the conformation of the bottom portion of the TSR2 domain. Pro472 and its spatially neighboring residues Ala470, Cys471, Ile473, Phe508, and Gly509 create a hydrophobic interface between the two domains. These data suggest that there is a relatively rigid linkage between the TSR2 and TSR3 domains. Based on the amino acid sequence alignment shown in Fig. 1 B, the TSR1 domain should have a fold very similar to that of the TSR2 and TSR3 domains. A distinct feature of the linkage between the TSR1 and TSR2 domains is that there is a five-residue linker (D411KRFK), as opposed to the single residue between TSR2 and TSR3. Presumably, the linkage between TSR1 and TSR2 is very flexible. The structures of TSR2 and TSR3 are so similar that the rmsd value for the superposition of TSR2 and TSR3 is only 0.62 Å (two loops are excluded). Therefore, unless otherwise indicated, only TSR2 is used for the discussion below.
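For readers unfamiliar with the rmsd statistic quoted above, the following minimal Python sketch shows how it is computed once two structures have been superposed and their atoms paired. It is an illustration under stated assumptions, not the procedure used for this structure: the text does not say which atoms or which program were used, the optimal superposition step itself is not shown, and the coordinates below are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superposed and that
    atom i in coords_a is paired with atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates (in Å) for illustration: every atom in b is
# displaced from its partner in a by exactly 1 Å along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(a, b))  # 1.0
```

Excluding flexible regions, as was done here for two loops, amounts to leaving those atom pairs out of both lists before the calculation.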
Two types of glycosylation have been identified biochemically in the TSRs (Hofsteenge et al., 2001). Hofsteenge et al. (2001) have demonstrated that a fucose is O-linked to the threonine in the CSVTCG sequence motif on each TSR. Our electron density maps clearly show that one fucose moiety can be built onto each of the TSR2 and TSR3 domains, and no electron density beyond this sugar unit is visible. The modeled glycan is located on the AB loop at the bottom of each domain (Fig. 1 A). Another potential glycosylation is the C-mannosylation of the first tryptophan of the W**W motif (Hofsteenge et al., 1999, 2001). In platelet TSP-1, for instance, Trp367, Trp420, Trp423, and Trp480 are C-mannosylated (Hofsteenge et al., 2001). The S2 expression system used in this work, however, does not allow for this posttranslational glycosylation (Hofsteenge et al., 2001). Indeed, there is no electron density for any C-mannosylation in our calculated maps. However, the side chains of all three tryptophans on each TSR are oriented such that the Cδ1 atoms are exposed, ready to accommodate mannoses (Hartmann and Hofsteenge, 2000) (Fig. 2). The function of this glycosylation is not well studied.
The CWR-layered core structure of the TSR fold
Small domains like the TSR (<100 amino acid residues) very often have special folds that lack regular secondary structure and are maintained by the rich presence of disulfide bonds or metal ions (Richardson, 1981). The exceptions are some recently identified small modules, such as SH3 (Pawson, 1995) and WW (Macias et al., 1996) domains, that are present in a number of signal transduction proteins and that do have regular secondary structures as basic folding elements. The TSR domain described in this article, however, represents a novel fold, for which the interdigitating side chain stacking of cysteine, tryptophan, and arginine in layers (the CWR layer) from three strands comprises the core of the structure (Fig. 2). Only the B and C strands form limited regular β structures (Fig. 1). By contrast, the A strand assumes a unique rippled conformation (Fig. 1 A; Fig. 2) and has a conserved sequence motif of W**W**W (Fig. 1 B). As shown in Fig. 2 A, the side chains of the three tryptophans (Trp420, Trp423, and Trp426 of TSR2) make up three tryptophan layers (the W layers) that play a central role in the fold. Sandwiched between each of the two adjacent W layers is one guanidinium group of an arginine from strand B. A comparison of the amino acid sequences of the TSRs from various proteins reveals that the sequence R*R*R is commonly found in strand B (Fig. 3). These arginines comprise the R layers. The alternate stacking of the planar cationic guanidinium groups of the arginines and the aromatic side chains of the tryptophans forms multiple cation-π interactions (Flocco and Mowbray, 1994; Gallivan and Dougherty, 1999), which may provide vital stabilization energy for TSR folding. The spacings between the R and W layers are observed to be around 4.6 Å throughout TSR2 and TSR3. A similar structural element is found in the class I cytokine receptor family. The so-called “WSXWS box” is observed in this family, but its function is controversial (Somers et al., 1994). 
In the human growth hormone receptor, for instance, the stacked tryptophans and arginines come from four different strands on the same face of a membrane-proximal Ig-like domain. This face is involved in neither hormone binding nor receptor dimerization (de Vos et al., 1992). The erythropoietin receptor has the WSXWS sequence motif in a β bulge, and the pyrrole rings of two Trps sandwich a guanidinium group of an Arg from a neighboring β strand (Livnah et al., 1996). The uniqueness of the TSR domain is that, to our knowledge, it is the first example in which the alternately stacked W and R layers capped by C layers constitute the major structural feature of the domain. In the TSRs of other proteins, a glutamine or lysine sometimes replaces the first arginine in the sequence (the R3 layer in Fig. 3). In TSP-1, TSR1 and TSR3 have a glutamine in the R3 layer, whereas TSR2 has an isoleucine. The amino group from a lysine and the amide group from a glutamine can also participate in cation-π interaction with a tryptophan residue (Flocco and Mowbray, 1994; Gallivan and Dougherty, 1999).
The arginines in the R layers also always interact with a residue from the C strand laterally, either in a salt bridge like the Arg440–Glu462 pair, or in a hydrogen bond to the carbonyl group like the Arg442–Glu459 pair. In TSR2, Ile438 in the R3 layer makes hydrophobic contacts with Lys464 of the C strand between the aliphatic portions of their side chains. The sequence alignment shown in Fig. 3 demonstrates that the residues in the R layers from the B strand and their interacting partners in the C strand frequently have large side chains. This suggests that the contacts between hydrophobic portions of the paired side chains may also play a role in stabilizing the TSR domain. In TSR3 of TSP-1, Gln495 in the R3 layer and Gln521 from strand C form a typical hydrogen bond between the NE2 atom and OE1 atom of their respective amide groups. Again, we see a spectrum of variation in the R3 layer among different TSR domains.
The six alternating W and R layers are capped by disulfide bonds at the top and bottom, which form the two C layers. These are Cys444–Cys456 and Cys429–Cys466 in TSR2 (Fig. 2), Cys501–Cys513 and Cys486–Cys523 in TSR3, and their conserved counterparts in other TSRs (Fig. 3). Fig. 2 B depicts the eight-layered CWR fold of the TSR2 domain in a schematic manner. Within the eight layers (in the order of C1, W1, R1, W2, R2, W3, R3, and C2 from top to bottom), all of the side chains that form the CWR-layered structure are in a planar and parallel conformation. Even the side chain of Ile438 in the R3 layer of TSR2 assumes the best possible orientation for stacking. Due to the inherent strand twist, the whole CWR-layered structure is in a right-handed spiral conformation, reminiscent of a DNA structure.
The rippled A strand with its multiple bulges is another interesting feature of the eight-layered TSR domain. As mentioned previously, the A strand is not a regular β strand. It nevertheless has a unique hydrogen bonding network to the B strand. Fig. 4 shows how a serine after each of the tryptophans in the W layers and after the COOH-terminal Cys429 on the A strand forms two hydrogen bonds to the backbone of the B strand: one via its hydroxyl side chain and the other with its amide group. The formation of side chain to main chain hydrogen bonds between the two antiparallel strands creates even larger bulges in comparison to a well-defined wide bulge (Richardson, 1981). A threonine or a glycine sometimes replaces the serine in these positions in other TSRs (Fig. 3). The former can act like a serine, and the latter allows a sharp turn. They all favor the bulged conformation for the A strand. This hydrogen bonding network between the A and B strands, the regular β structure of the B and C strands, the disulfide bonds, and the side chain stacking of the CWR-layered structure ensure a stable folding unit for this small domain.
The “jar handle” structures
The CWR-layered structure is the core of the TSR domain. At the bottom of both the TSR2 and TSR3 domains, a cysteine on the AB loop forms the third disulfide bond with a cysteine in the COOH terminus of strand C (Fig. 1 A; Fig. 4 A). This disulfide bond consolidates the bottom of the domains. At the top of the domains, there is a relatively long BC loop that tends to be enriched in proline, glycine, or charged amino acids in the TSRs of various proteins. In the TSR structure of TSP-1, the loop features a regular β turn in the middle and two “jar handle”–like structures at the beginning (the dark red–colored smaller one in Fig. 1 A and Fig. 4 A) and the end (the purple-colored larger one in Fig. 1 A and Fig. 4 A). The small jar handle is at the beginning of the BC loop and above the C1 layer (the Cys444–Cys456 pair in TSR2; Fig. 4 A). It appears somewhat like a conventional β turn composed of four residues (Asn445–Ser448), but the characteristic hydrogen bond within the turn is missing. Instead, the amide group of the second residue (Ser446) and the carbonyl group of the third residue (Pro447) turn to form hydrogen bonds to the main chain of the NH2 terminus of the A strand (Fig. 4 A). This unique local structure brings the beginning of the A strand closer to the BC loop, while the nearby disulfide bond in the C1 layer cross-links the B and C strands. Together, the interactions stabilize the top of the domain. The third residue is a cis-proline in both TSR2 and TSR3, which may favor the formation of a jar handle. In TSR1 and the TSRs of some other proteins, this proline is absent (Fig. 1 B; Fig. 3). It is difficult to predict convincingly whether this small jar handle exists in these proteins. The unusual handle structure acts just like the serine hydrogen bonding network to integrate the A strand into the domain (Fig. 4 A). As discussed later, a disulfide bond may serve this function in the TSRs of some other proteins.
The large handle (purple in Fig. 1 A and Fig. 4 A) is at the end of the BC loop, and immediately below the C1 layer. It consists of five residues (Cys456–Ala460). The interesting features of this handle are that the third position is a conserved Gly458, and the carbonyl group of the second residue, Glu457, forms a hydrogen bond to the NE1 atom of the tryptophan in the W1 layer (Fig. 4 A). This allows this tryptophan's bulky side chain to be accommodated within the handle. By contrast, the tryptophans in the W2 and W3 layers have their polar NE1 atoms exposed and available for potential ligand binding.
A positively charged groove on the “front” face of the TSR: the proposed “recognition” face
The above-mentioned exposed tryptophans from the W layers along with the exposed arginines from the R layers define the “front” face of the domain. This is best seen in Fig. 2 A, where the three strands run along the “back,” providing a framework to support a continuous positively charged front face containing side chains from the W and R layers. Other residues that are located at the edge of the A and C strands also help to create a groove-like structure within this positively charged region. Fig. 5 A is a molecular surface representation of TSR2 with a few residues labeled on the surface. This figure clearly depicts this DNA-like right-handed spiral groove. We propose that this is likely to be the “recognition face” of the domain. The molecular orientation in Fig. 1 A sets the TSR2 domain's recognition face away from the reader, whereas the TSR3 domain's recognition face is toward the reader.
Having the structure of TSP-1 TSRs at hand enables us to interpret some functional data that are largely based on peptide studies. In general, a short linear peptide cannot easily mimic its counterpart in a folded TSR domain. Nevertheless, some peptide investigations are still suggestive, as discussed below.
GAG binding properties of TSRs
GAGs and proteoglycans constitute one major component of extracellular matrix. GAGs are long, unbranched polysaccharide molecules composed of repeating disaccharide units. One sugar of each disaccharide is always a uronic acid, whereas the second is an N-acetylglucosamine or N-acetylgalactosamine, which is usually sulfated, rendering the molecule highly negatively charged. The ability to bind GAGs is important to the function of many extracellular and membrane proteins. TSRs have been reported to mediate GAG binding in several proteins. Crystal structures of heparin–protein complexes indicate that a cluster of positively charged residues on a protein's surface can interact with the negatively charged carboxylate and sulfate groups of heparin (Pellegrini et al., 2000; Lietha et al., 2001). Heparin molecules are in a right-handed helical conformation, and the disaccharide repeat of heparin spans ∼9 Å (Mulloy et al., 1993). The distance between the W1 layer and R3 layer in the TSR is close to 20 Å. It is conceivable that two spiral disaccharide units may fit well into the right-handed groove in the recognition face of the TSR. The C-mannosylation on some tryptophans (Hofsteenge et al., 2001) of TSP-1 TSRs does not significantly change their surface electrostatic potential, but creates protrusions along the edge of the grooves (Fig. 5 B). The mannose groups may help to define the groove in the recognition face. Because the separation of two arginines in the groove is also ∼9 Å, we speculate that the negatively charged groups of GAGs can form electrostatic interactions with these arginines. TSRs of TSP-1 have a relatively low affinity for heparin that may not be detectable at physiological salt concentrations (Panetti et al., 1999), suggesting that residues other than the tryptophans and arginines, in particular residues from the C strand, may contribute significantly to GAG binding.
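The estimate that about two disaccharide units fit into the groove follows directly from the two distances quoted above. A minimal Python sketch of that arithmetic, using only the approximate values from the text (the constant and function names are hypothetical):

```python
# Approximate values quoted in the text.
DISACCHARIDE_SPAN_A = 9.0   # span of one heparin disaccharide repeat, in Å
GROOVE_LENGTH_A = 20.0      # distance from the W1 layer to the R3 layer, in Å

def disaccharides_spanned(groove_length, repeat_span):
    """Number of disaccharide repeats covered by a groove of given length."""
    if repeat_span <= 0:
        raise ValueError("repeat span must be positive")
    return groove_length / repeat_span

n = disaccharides_spanned(GROOVE_LENGTH_A, DISACCHARIDE_SPAN_A)
print(f"{n:.1f} repeats, i.e. about {round(n)} disaccharide units")
```

This is a length comparison only. It does not model the helical path of the groove or of heparin, so it supports the plausibility of the fit rather than demonstrating it.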
The protein ADAMTS-4, or aggrecanase-1, is a newly identified cartilage-degrading enzyme. It contains one TSR, which is highly homologous to the TSRs of TSP-1. The strong binding of the TSR of ADAMTS-4 to the GAG side chains of aggrecan (the major sulfated proteoglycan in cartilage) appears to be necessary for the enzymatic cleavage of aggrecan that occurs during disease-related cartilage destruction (Tortorella et al., 2000). It has been reported that synthetic peptides that include sequences found in the A and C strands of the TSR of ADAMTS-4 have weak affinities for aggrecan. The C strand of the ADAMTS-4 TSR contains more arginine residues than the TSRs of TSP-1 (Fig. 3). Fig. 5 C shows the surface representation of the ADAMTS-4 TSR domain, which has been modeled based on the TSR structure of TSP-1 reported here. This figure demonstrates that positive charges cover nearly the entire front face of this TSR domain due to the extra positively charged residues from the C strand. This model may explain the high binding affinity of ADAMTS-4 for the GAG chains of aggrecan.
F-spondin is a matrix-associated protein that is expressed in the floor plate and involved in patterning the axonal trajectory of commissural and motor neurons. F-spondin has multiple TSRs in its COOH terminus as well as reelin and mindin homologous regions in the NH2 terminus (Klar et al., 1992; Tzarfaty-Majar et al., 2001). The interaction of the F-spondin TSRs with extracellular matrix and neuronal growth cones has been reported to involve binding to GAGs (Klar et al., 1992; Tzarfaty-Majar et al., 2001). This binding is mediated by the fifth and sixth TSRs, whereas the first four TSRs are not involved. Comparing the sequence of the TSRs of F-spondin with those of TSP-1 reveals that the fifth and sixth TSRs of F-spondin have more basic residues on the C strand and BC loop (Fig. 3). These basic residues contribute positive charges to the front face of these TSRs as they do in the ADAMTS-4 TSR. For example, the residues of Glu457 and Glu459 (Fig. 5 A) in TSR2 of TSP-1 are substituted by Lys794 and Lys796 in TSR6 of F-spondin, respectively (Fig. 3). The only two acidic residues (an aspartic and a glutamic acid) on the C strand of TSR6 in F-spondin point away from the front surface. The front face of TSR6 is therefore dominated by a cluster of positively charged residues, as shown in Fig. 5 D.
The interaction of TSRs with GAGs is also involved in cellular invasion by apicomplexan parasites. This process is central to the life cycle of P. falciparum and to the etiology of malaria. Host cell invasion involves the binding of circumsporozoite protein and thrombospondin-related anonymous protein (TRAP) on the parasite surface to GAGs on the surface of salivary glands in the mosquito and liver hepatocytes in the vertebrate host (Matuschewski et al., 2002). Mutations in the TSR sequence of TRAP result in a decrease in the ability of the parasite to invade salivary glands and liver cells (Wengelnik et al., 1999; Matuschewski et al., 2002). Mutation of the tryptophan in the W3 layer to alanine or substitution of the positively charged amino acids in and around the R layers with alanine residues decreased sporozoite infectivity of the salivary gland (Matuschewski et al., 2002). Because these residues are integral components of the CWR-layered structure, our data suggest that these mutations would disrupt the overall folding of the TSR and the formation of the recognition face.
Taken together, these data suggest that positive charges across the front face of a TSR are essential for high-affinity binding to GAGs in many important biological processes. More basic residues in the C strand and elsewhere around the front face probably contribute to stronger binding.
The binding of TSP-1 to CD36
TSP-1 is one of several naturally occurring inhibitors of angiogenesis. The inhibition of angiogenesis by TSP-1 is reportedly due in part to CD36-mediated inhibition of endothelial cell migration and induction of apoptosis (Dawson et al., 1997; Jimenez et al., 2000). Peptides having sequences from strand A and the sequence of GVITRIR from strand B of the second TSR have been shown to have the inhibitory activities (Tolsma et al., 1993; Dawson et al., 1999; Iruela-Arispe et al., 1999; Jimenez et al., 2000). The major portions of these peptides can now be mapped on the front face of the TSR structure presented here. On the other hand, the region between amino acids 90 and 120 of CD36 has been implicated in TSP-1 binding (Leung et al., 1992; Pearce et al., 1995). Secondary structure predictions (using the website http://www.embl-heidelberg.de/predictprotein/predictprotein.html) suggest that the sequence T(103)QDAEDN of CD36 falls into a negatively charged loop. This sequence is conserved in human and mouse CD36. We hypothesize that this negatively charged loop of CD36 may interact with the positively charged groove of the TSR's front face. The CSVTCG sequence has also been implicated in the interaction of TSP-1 with CD36. Upon careful examination of the folded TSR, one sees that the two cysteines in the CSVTCG sequence are involved in two separate disulfide bonds and that the threonine is O-glycosylated (Hofsteenge et al., 2001). Thus, the activities of synthetic peptides may not correlate with the function of the CSVTCG sequence in the context of the three-dimensional structure of the TSR.
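The charge complementarity proposed here can be illustrated by counting formal side chain charges in the short sequences quoted in the text. The sketch below is a deliberately crude approximation, not an electrostatic calculation: it assumes +1 for Lys and Arg, −1 for Asp and Glu, and ignores histidine, chain termini, pH, glycosylation, and three-dimensional context. The function name is hypothetical.

```python
# Formal side chain charges at roughly neutral pH (simplifying assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def net_formal_charge(peptide):
    """Sum of formal side chain charges for a one-letter-code sequence."""
    return sum(CHARGE.get(residue, 0) for residue in peptide)

# Sequences quoted in the text.
peptides = {
    "KRFK (between TSR1 and TSR2 of TSP-1)": "KRFK",
    "GVITRIR (strand B of TSR2)": "GVITRIR",
    "TQDAEDN (CD36, from Thr103)": "TQDAEDN",
}
for label, seq in peptides.items():
    print(f"{label}: {net_formal_charge(seq):+d}")
```

Under these assumptions the two TSP-1 sequences carry net positive charge (+3 and +2) and the CD36 sequence carries net negative charge (−3), which is consistent with, but does not prove, the proposed interaction.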
Disulfide bond pattern and TSR grouping
A close scrutiny of the TSRs in the database reveals considerable variability (Fig. 3). Based on the disulfide bond patterns and multiple sequence alignment algorithms, we tentatively divide TSRs into two major groups represented by the TSRs of TSP-1 (group 1) and F-spondin (group 2). The major difference is the disulfide bond at the top of TSR's layered structure (the C1 layer), as illustrated at the left margin of Fig. 3.
The TSRs of TSP-1 are the prototype of group 1. Within this group (Fig. 3, top), the C1 layer at the top of the domain is formed by cysteines from the end of the B strand and the beginning of the C strand. In some of these TSRs, such as TSRs found in TSP-2, BAI-1, -2, and -3, ADAMTS family members, properdin, and others, there is a third disulfide bond at the bottom of the domain that helps to consolidate the bottom of the domain as described before. There are also TSRs found in the COOH terminus of the complement factors C6, C7, C8α, C8β, and C9, in which this third disulfide bond is absent. Instead, there appears an unpaired cysteine residue between the first and second tryptophans on the A strand (Fig. 3).
The structure of group 2 can be predicted for the TSRs of F-spondin. In these TSRs, the cysteines that form the C1 layer are predicted to be from the A and C strands, rather than from the B and C strands (Fig. 3, bottom). This disulfide bond pattern is found in midkine (Iwasaki et al., 1997). Thus, the beginning of the A strand is linked to the top portion of the domain through this disulfide bond. Note that in group 1, it is a jar handle–like element that integrates the A strand into the top of the domain (represented by two hydrogen bonds in red in Fig. 3, top). Like those TSRs in TSP-1, the TSRs of F-spondin also have a third disulfide bond at the bottom of the domain. TSRs of M-spondin, mindin-1 and -2, the NH2 terminus of complement factors, and TRAP/CTRP (circumsporozoite protein from P. falciparum) may fall into the same category. In the TRAP TSR and CTRP TSR1, 2, 3, 5, 6, and 7, the first tryptophan on the A strand is absent, presumably resulting in a seven-layered TSR core structure. An extra disulfide bond may exist in a few TSRs, like TSR2 of properdin, in which the NH2 terminus of the A strand is probably linked to the beginning of the C strand (see two cysteines in red for TSR2 of properdin in Fig. 3).
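The grouping criterion described above reduces to a single question: which two strands contribute the cysteines of the C1 layer. A minimal Python sketch of that rule follows; the function name and the strand-label encoding are hypothetical, and the rule covers only the two patterns described in the text.

```python
def tsr_group(c1_layer_strands):
    """Assign a TSR to group 1 or 2 from the strands forming its C1 layer.

    c1_layer_strands: the two strand labels ("A", "B", or "C") that
    contribute the cysteines of the top disulfide bond (the C1 layer).
    Returns 1, 2, or None if the pattern matches neither group.
    """
    strands = frozenset(c1_layer_strands)
    if strands == frozenset({"B", "C"}):
        return 1  # TSP-1-like: cysteines from the B and C strands
    if strands == frozenset({"A", "C"}):
        return 2  # F-spondin-like: cysteines from the A and C strands
    return None   # not covered by the two groups described in the text

print(tsr_group(("B", "C")))  # 1
print(tsr_group(("A", "C")))  # 2
```

In practice the strand assignment would itself have to come from a sequence alignment or a structure, which this sketch takes as given.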
Midkine and HB-GAM have two TSR-like domains (Iwasaki et al., 1997; Kilpelainen et al., 2000). Sequence alignment demonstrates their large deviation from TSRs of other proteins (Fig. 3, bottom). The NMR structure of midkine's TSR-like domains (Iwasaki et al., 1997) showed that although their overall topology is reminiscent of that of TSP-1's TSRs reported here, they depart substantially in the CWR-layered structure of a TSR domain. Not only are the layers significantly fewer, but they can hardly be structurally compared with that found in TSP-1's TSRs. Fig. 6 is the superposition of the NH2-terminal TSR-like domain of midkine from the NMR structure onto TSR2 of the TSP-1. The layered positioning of tryptophan and arginine residues to form multiple cation-π interactions is a key feature of the TSP-1 TSRs. This does not seem to exist in midkine where there is only one potential cation-π interaction. A DALI search with TSP-1's TSR domain did not detect similarity to the structure of midkine, indicating that the structures are distinct.
The disulfide bond–based grouping might provide a framework to aid functional studies. For example, some proteins in groups 1 and 2 seem to differ in their effects on angiogenesis. The TSRs that are present in TSP-1, TSP-2, BAI-1, and ADAMTS-1 in group 1 have been shown to inhibit angiogenesis (Nishiomori et al., 1997; Vázquez et al., 1999). By contrast, some of the proteins in group 2, including two members of the CCN family and HB-GAM, reportedly stimulate angiogenesis (for review see Lau and Lam, 1999; Papadimitriou et al., 2001).
Materials and methods
Protein production, crystallization, and X-ray data collection
A recombinant version of all three TSRs of human TSP-1, designated h3TSR, was prepared by PCR using the full-length cDNA of human TSP-1, as described previously (Miao et al., 2001). For crystallization, the purified protein was dialyzed against 20 mM Na3PO4 (pH 7.8) containing 500 mM NaCl, and was reapplied to a column of ProBond resin. The protein eluted with 500 mM imidazole was dialyzed against 20 mM Na3PO4 (pH 7.0) and 500 mM NaCl, and 1% sucrose was added before storage at −80°C.
Single crystals of h3TSR were grown from a crystallization buffer containing 0.5 M sodium potassium tartrate and 0.1 M sodium acetate at pH 5.5 using the hanging drop vapor diffusion method. For data collection at cryogenic temperature, the crystals were treated with a cryoprotectant solution (25% glycerol, 0.5 M sodium potassium tartrate, 0.1 M sodium acetate, pH 5.5), and then frozen and stored in liquid nitrogen. Mercury derivatives were prepared by soaking the crystals in the same cryoprotectant solution containing 1 mM ethylmercury chloride for 4 h. X-ray diffraction data were collected from prefrozen crystals at APS SBC 19ID at a temperature of 100 K. A native crystal diffracted to a resolution of 1.9 Å, with one molecule per asymmetric unit. A MAD dataset of the mercury derivative was obtained to a resolution of 2.1 Å. All the raw data were indexed and reduced with HKL2000 (Table I).
Structure determination and refinement
Using programs in the CCP4 suite (CCP4, 1994), we located one major Hg binding site per asymmetric unit in both isomorphous and anomalous difference Patterson maps. Because of possible nonisomorphism between the native and Hg derivative crystals, only the MAD data from the Hg derivative were used for heavy atom parameter refinement and phasing, with the high-energy dataset as reference. After the refinement and initial phasing at 3 Å resolution with the program MLPHARE in the CCP4 suite, an additional minor Hg binding site was identified and the two sites were refined together. Phase extension was subsequently performed using the native data to 2.5 Å by solvent flattening and histogram matching with DM (Cowtan, 1994). An improved Fourier map was obtained and used for initial model building with the program O (Jones et al., 1991). After each step of model building, the experimental phases were combined with phases calculated from the partial model, and the combined phases were used for density modification. The model building cycle was repeated until ∼90% of the residues from TSR2 and TSR3 were built into the model.
The native dataset was eventually used for the completion of model building and refinement. The final model was refined at 1.9 Å resolution to an Rfree of 28.2% and an Rwork of 23.8% (Table I) using X-PLOR (Brunger, 1992). At a 1.5 sigma contour level in the 2Fo-Fc map, there was continuous density for the main chain backbone except for part of the glycine-rich AB loop and the COOH terminus of TSR3. The final model contains 113 residues (from Gln416 to Cys528) with two O-linked fucoses associated with residues Thr432 and Thr489. As mentioned earlier, the electron density maps gave no indication of the first domain. The current model also includes a total of 154 water molecules. The coordinates have been deposited in the Protein Data Bank under the accession code 1LSL.
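For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R factor is R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, with Rwork summed over the reflections used in refinement and Rfree over a test set held out from it. The Python sketch below computes both on toy amplitudes. It is not the X-PLOR calculation: the function names, the amplitudes, and the test-set flags are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(
    f_obs: Sequence[float], f_calc: Sequence[float], free_flags: Sequence[bool]
) -> Tuple[float, float]:
    """Split reflections by the test-set flag and return (R_work, R_free)."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    if not work or not free:
        raise ValueError("both working and test sets must be non-empty")
    return r_factor(*zip(*work)), r_factor(*zip(*free))


# Toy amplitudes (hypothetical); the last reflection is the held-out test set.
f_obs = [100.0, 200.0, 300.0, 400.0]
f_calc = [90.0, 220.0, 310.0, 360.0]
free_flags = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")  # R_work = 0.067, R_free = 0.100
```

Because the test-set reflections never enter refinement, Rfree is normally higher than Rwork, as in the values reported here.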
We thank Richard Hynes (Massachusetts Institute of Technology, Cambridge, MA) for his critical reading. We are grateful to Alexis Bywater (Beth Israel Deaconess Medical Center, Boston, MA) for preparation of the manuscript.
This work was supported by grants HL68003, HL49081, and CA92644 from the National Institutes of Health and also in part by GM56008 to J.-h. Wang.
Abbreviations used in this paper: GAG, glycosaminoglycan; MAD, multiple-wavelength anomalous diffraction; TRAP, thrombospondin-related anonymous protein; TSP, thrombospondin; TSR, type 1 repeat.