The Ras superfamily comprises at least four large families of regulatory guanosine triphosphate–binding proteins, including the Arfs. The Arf family includes three different groups of proteins: the Arfs, Arf-like (Arls), and SARs. Several Arf family members have been very highly conserved throughout eukaryotic evolution and have orthologues in evolutionarily diverse species. The different means by which Arf family members have been identified have resulted in an inconsistent and confusing array of names. This confusion is further compounded by differences in nomenclature between different species. We propose a more consistent nomenclature for the human members of the Arf family that may also serve as a guide for nomenclature in other species.
Arf family history: Arfs, Arls, SARs, and other members
Arf was first discovered, purified, and functionally defined as the protein cofactor required for cholera toxin–catalyzed ADP ribosylation of the stimulatory regulatory subunit (Gs) of adenylyl cyclase (Enomoto and Gill, 1980; Kahn and Gilman, 1984) and, shortly thereafter, was shown to be a GTP-binding protein (Kahn and Gilman, 1986). Use of the acronym Arf is currently preferred to ADP ribosylation factor, as only Arf1–6 share the cofactor activity for cholera toxin and because ADP ribosylation does not appear to be involved in any aspect of the normal cellular actions of any member of the family. The use of all capital letters (e.g., ARF1) refers specifically to the human gene or protein, whereas when only the first letter is capitalized (e.g., Arf1), it may refer to the protein from more than one species, an activity, or a group of proteins. Since their discovery, Arfs have been found to be ubiquitous regulators of membrane traffic and phospholipid metabolism in eukaryotic cells (for reviews and discussion of Arf actions see Nie et al., 2003; Burd et al., 2004; Kahn, 2004). Arfs are soluble proteins that translocate onto membranes in concert with their activation, or GTP binding. The biological actions of Arfs are thought to occur on membranes and to result from their specific interactions with a large number of effectors that include coat complexes (COPI, AP-1, and AP-3), adaptor proteins (GGA1-3 and MINT1-3/X11α-γ/APBA1-3), lipid-modifying enzymes (PLD1, phosphatidylinositol (4,5)-kinase, and phosphatidylinositol (4)-kinase), and others. Arf proteins are activated by guanosine diphosphate (GDP) to GTP exchange, which is stimulated by the Sec7 domain of Arf guanine nucleotide exchange factors, and their activity is terminated upon the hydrolysis of GTP, which is stimulated by interaction with an Arf GTPase-activating protein.
Cloning and sequencing of the first Arf family member (Sewell and Kahn, 1988) led directly to the realization that Arfs are closely related to both the Ras and heterotrimeric G protein α subunit families of GTPases, and all are thought to have arisen from a common ancestor. The very high degree of conservation of Arf sequences in eukaryotes (74% between human and yeast) was also noted early on and has allowed the ready identification of orthologues in every examined eukaryote, including Giardia lamblia, which lacks Ras and G protein α subunits (Murtagh et al., 1992).
Cloning by low stringency hybridization and chance led to the identification of additional members of the Arf family in a wide array of eukaryotic species. The number of mammalian Arfs grew to six by 1992 (Tsuchiya et al., 1991), and they were named in their order of discovery (Price et al., 1988; Bobak et al., 1989; Kahn et al., 1991; Lee et al., 1992). The first confusion in the nomenclature was that the current human ARF4 was originally published with the name ARF2 (Kahn et al., 1991). In fact, humans appear to have lost the ARF2 orthologue, which is present in other mammals (including rats, mice, and cows). The combination of protein sequence comparisons and intron/exon boundaries of Arf genes led to further classification of the six mammalian Arfs into three classes: class I (ARF1–3 are >96% identical), class II (ARF4 and ARF5 are 90% identical to each other and 80% identical to the other Arfs), and class III (ARF6 is 64–69% identical to the other Arfs). Phylogenetic analyses support the conclusion that the three classes of Arf diverged early, as flies and worms have single representatives of each of the three classes, and the number of genes/proteins in classes I and II was later expanded in vertebrates.
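The percent identities quoted above come from pairwise comparison of aligned protein sequences. The following minimal sketch shows one common way such a figure can be computed. It assumes that the two sequences have already been aligned and that identity is counted over columns without gaps. The text does not state which convention was used, and the function name and example strings are hypothetical illustrations, not real Arf sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment.

    Assumption: identity = identical columns / columns where neither
    sequence has a gap. Other conventions (e.g., dividing by the length
    of the shorter sequence) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (made up for illustration)
print(percent_identity("MGNIF", "MGNLF"))
print(percent_identity("MG-IF", "MGNIF"))
```

Because the denominator convention changes the result, identities reported by different groups are only comparable when the same convention and alignment method are used.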
The initial criteria for naming new Arfs were functional, and only those proteins that could (1) serve as cofactors for cholera toxin, (2) rescue the lethal arf1−arf2− deletion in Saccharomyces cerevisiae, and (3) directly activate PLD were given the name Arf. Thus, with the chance cloning of an essential gene in Drosophila melanogaster that encoded a protein closely related to the Arfs (50–60% identity) but lacking in these activities, it was named arflike (Tamkun et al., 1991). When orthologues were found in several other species, the name was changed to Arf-like 1 (ARL1) in those species (Kahn et al., 1992; Breiner et al., 1996; Lowe et al., 1996). Note that although the name Arf still denotes a protein with one or more specific functions or activities, the term Arl does not. The term Arl indicates only that the protein is structurally related to Arfs. Thus, the Arls are not a coherent group either functionally or phylogenetically.
PCR amplification with degenerate oligonucleotide primers (Clark et al., 1993; Schurmann et al., 1994) revealed the existence of a large number of mammalian cDNAs encoding closely related proteins. The next to be cloned and sequenced were ARL2 (Clark et al., 1993), ARL3 (Cavenagh et al., 1994), ARL4 (Schurmann et al., 1994), and ARL5 (Breiner et al., 1996). Each of the encoded proteins has a glycine at position 2, the site of N-myristoylation in all Arf proteins. Note that although ARL2 and ARL3 have the NH2-terminal glycine, they appear not to be substrates for N-myristoyltransferases.
Around this time, a protein with similar percent identities to the Arfs and Arls was found, but it lacked the NH2-terminal glycine, was membrane associated, and displayed distinctive nucleotide handling properties (Schurmann et al., 1995). Thus, it was given the name Arf-related protein 1 (ARFRP1) to distinguish it from the Arls and Arfs. We realize today that this was unfortunate, as several of the more recently identified Arls also have functions and biochemical properties that are quite divergent from Arfs.
SAR1 was among the earliest members of the Arf family sequenced, and it came out of genetic screens in the yeast S. cerevisiae as a suppressor of sec12(ts) (Nakano and Muramatsu, 1989). Its name is derived from its identification as a secretion-associated and Ras-related protein. Cloning of the mammalian orthologues revealed the presence of two closely related (90% identity) proteins/genes (Kuge et al., 1994). With <30% identity to Arfs or Arls, the SAR proteins are only slightly closer in sequence to Arfs than to other families of GTPases, but they also share considerable functional relatedness to Arfs in that they act through the recruitment of coat proteins or complexes to initiate vesicle budding. SARs lack the other aforementioned Arf activities.
An interesting variation is found in ARD1/tripartite motif 23 (TRIM23), a 64-kD protein that possesses a ∼20-kD domain at its COOH terminus with 60% identity to Arfs (Mishima et al., 1993). Originally named based on the presence of the Arf domain, ARD1 is also a member of the TRIM family, from which it obtained its current name, TRIM23. A large extension is also seen in ARL13B, a protein of 428 residues that contains an Arl domain at its NH2 terminus (Chiang et al., 2004; Fan et al., 2004). Although the NH2-terminal portion of TRIM23 may possess GTPase-activating protein activity toward its own Arf domain (Vitale et al., 1996) and E3 ubiquitin ligase activity (Vichi et al., 2005), the COOH-terminal portion of ARL13B has no defined domains or functions to date.
Defining the Arf family
As the discussion above suggests, there are no shared functions or activities that justify grouping Arf, Arl, and SAR proteins into a family with a common nomenclature. Similarities in protein sequences within the Arf family were first identified by alignment and phylogenetic analyses and were shown to provide distinct signatures that allowed differentiation from Ras, G protein α subunits, and other GTPases. These include an NH2-terminal extension, a glycine acceptor for myristate at position 2, an aspartate at position 26 (in contrast to the glycine 12 of Ras that carries oncogenic potential), and other residues that are very highly conserved within the family. These early observations were put on more solid functional footing when they were found to map to unique elements in their three-dimensional structures, which allow for the GDP/GTP switch to be coupled with interaction signals opposite to the nucleotide-binding site (for review see Pasqualato et al., 2002). The prominent feature of this unique nucleotide switch is a nonconventional GDP-bound form in which the two β strands that connect the nucleotide-sensitive switch 1 and 2 regions (also called the interswitch) are retracted in the protein core and must undergo a two-residue shift to reach the active conformation (Fig. 1). However, the interswitch cannot do so unless the NH2-terminal helical extension, which caps the interswitch and locks it in the retracted conformation, has been displaced. In the case of ARF1, biochemical studies have established that this requires the interaction of the NH2 terminus with membranes, thus allowing the nucleotide-binding site to detect and respond to remote protein–membrane interactions (Antonny et al., 1997). Like Arf proteins, each Sar has an NH2-terminal amphipathic helix that functions as a structural GDP/GTP switch to anchor the GTP-bound form to membranes of the endoplasmic reticulum (Huang et al., 2001; Bi et al., 2002). 
Furthermore, membrane insertion of this NH2-terminal helix was recently shown to initiate membrane bending at the early stages of COPII coat assembly and to be subsequently required for the completion of COPII vesicle fission (Lee et al., 2005).
Structural analysis of ARF1 and ARF6 GDP/GTP cycles and their comparison with those of small GTP-binding proteins whose interswitch does not toggle identified three structural determinants for this movement: a helical NH2-terminal extension that fastens the retracted, GDP-bound interswitch; a shorter interswitch that can retract completely; and a sequence signature (wDvGGqXXXRxxW) that provides both flexibility for the movement (GG) and hydrogen bonds for stabilization of the active conformation (R/W). These characteristics are present in all Arf and most Arl sequences, which, therefore, are predicted to have the ability to undergo the interswitch toggle to detect interactions opposite to the nucleotide-binding site, whatever their nature, and propagate them to this site (Pasqualato et al., 2002).
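As an illustration of how the sequence signature wDvGGqXXXRxxW might be used to screen a candidate sequence, the following minimal sketch converts the signature into a regular expression. It rests on two assumptions that the text does not state: that X/x denote any residue, and that lowercase letters mark positions that are less strictly conserved than uppercase ones. The function names and the toy sequence are hypothetical and are not taken from any database record.

```python
import re

SIGNATURE = "wDvGGqXXXRxxW"  # interswitch signature quoted in the text


def signature_to_regex(signature, strict=False):
    """Build a regex from the signature.

    Assumptions: 'X'/'x' match any residue; uppercase letters are required;
    lowercase letters are required only when strict=True and otherwise
    match any residue.
    """
    parts = []
    for ch in signature:
        if ch in "Xx":
            parts.append("[A-Z]")
        elif ch.isupper():
            parts.append(ch)
        else:
            parts.append(ch.upper() if strict else "[A-Z]")
    return re.compile("".join(parts))


def find_signature(sequence, strict=False):
    """Return (1-based start position, matched text) for each hit."""
    pattern = signature_to_regex(SIGNATURE, strict)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Toy sequence (made up) that embeds a signature-like stretch
toy = "MKTAYSFTVWDVGGQDKIRPLWRHYFQ"
print(find_signature(toy))               # relaxed matching
print(find_signature(toy, strict=True))  # lowercase positions enforced
```

A match of this kind indicates only that the signature is present. Whether a given protein actually undergoes the interswitch toggle also depends on the NH2-terminal extension and on interswitch length, as described above.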
These structural criteria for unifying Arf and Arl proteins as a family have since been supported by various structures of GDP-bound Arf and Arl proteins (Table S1). It should be noted, however, that one subgroup, ARL4, has a long interswitch that may have lost the ability to toggle, whereas structures of NH2-terminally truncated ARL8A and ARL8B bound to GDP have a GTP-like conformation. This suggests that truncation of the NH2 terminus is sufficient in this family to destabilize the retracted interswitch or that these proteins have lost their ability to undergo the interswitch toggle. Recent work on ARL3 suggests that proteins interacting with the NH2 terminus could also work as the displacing factor as an alternative to membranes (Behnia et al., 2004; Setty et al., 2004).
Arf family nomenclature
Table I contains information on proposed and previous names as well as other information on the human ARF family members. EST and genomic sequencing resulted in the identification of subsequent Arf-like proteins, and these proteins/genes were often misnamed or named multiple times by different research groups. Some of these names suggest relationships that are misleading, and some are called Arfs despite (presumably) lacking any Arf activities. In many cases, no functional data are yet available for the most recently identified Arf family members. One protein has been referred to by four different names, and some proteins/genes were named by curators of databases responding to specific requests in a manner that disagreed with common usage by researchers in the field. The confusion is magnified when species differences are considered (e.g., yeast Arl3 is the orthologue of ARFRP1).
The need for a generally agreed upon nomenclature for the ARF family has become acute as a result of increasing confusion and interest in their study. It is not possible today to propose a completely consistent nomenclature, as there are simply too many studies with some of the earlier discovered proteins (e.g., ARFRP1 should be an ARL).
The nomenclature developed and described in this article builds on previous efforts to describe phylogenetic relationships and bring consistency to nomenclature (Pasqualato et al., 2002; Li et al., 2004; Logsdon and Kahn, 2004). It is the result of many discussions between researchers in the field and with the HUGO Genome Nomenclature Committee (HGNC) and has been widely circulated to Arf family researchers. We describe the presence in the human proteome of 29 members of the Arf family and a system for naming newly identified proteins in human or other species. The use of letter suffixes is reserved for those groups of proteins within the family that share higher percent identities and are, therefore, likely to share some level of functional redundancy. One exception to this is the ARL13A and ARL13B proteins, which have been given a common number based upon phylogenetic evidence. The consensus nomenclature for the Arf family is shown in Table I along with previous names and unique gene/protein identifying information. Note that in three cases (ARL5C, ARL9, and ARL16), the intron/exon boundary predictions in the database are thought to be incorrect (based upon comparisons with sequences in other species), resulting in differences in the predicted protein sequences. In these cases, we use our corrected sequences for comparisons and provide the predicted protein sequences of the human proteins (see supplemental material). In addition, there is one case (ARL9) in which it appears that alternative splicing yields two different proteins, one of which is truncated and predicted to be unable to bind nucleotides, so both are provided in the supplemental protein sequence material.
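The letter-suffix convention can be illustrated with a short sketch that splits a symbol into a root, a number, and an optional letter suffix and then groups symbols that share a number. This is only an illustration under stated assumptions: the pattern below (roots ARF, ARL, or SAR followed by digits and at most one letter) is a simplification introduced here, not an HGNC rule, and symbols such as ARFRP1 or TRIM23 deliberately fall outside it.

```python
import re
from collections import defaultdict

# Hypothetical simplified pattern: root, number, optional single-letter suffix
SYMBOL_RE = re.compile(r"^(ARF|ARL|SAR)(\d+)([A-Z]?)$")


def parse_symbol(symbol):
    """Return (root, number, suffix), or None if the symbol does not fit."""
    m = SYMBOL_RE.match(symbol)
    if m is None:
        return None
    root, number, suffix = m.groups()
    return root, int(number), suffix


def group_by_number(symbols):
    """Group symbols sharing a root and number (e.g., ARL8A with ARL8B)."""
    groups = defaultdict(list)
    for symbol in symbols:
        parsed = parse_symbol(symbol)
        if parsed is None:
            continue  # symbol does not follow the simplified pattern
        root, number, _ = parsed
        groups[f"{root}{number}"].append(symbol)
    return dict(groups)


print(parse_symbol("ARL13B"))
print(group_by_number(["ARL8A", "ARL8B", "ARL9", "ARFRP1"]))
```

Grouping by shared number reflects the intent that lettered paralogues are likely to share some functional redundancy. It says nothing about relationships between differently numbered symbols.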
We also identify several gene sequences that have questionable EST/mRNA support and are likely pseudogenes derived from members of the Arf family. These genes, which are annotated by the HGNC, are therefore not included as Arf family members and are listed, along with their identifiers, in Table II. It is expected that additional pseudogenes will be found and added to this list over time. We also note some uncertainty as to whether ARL5C in Table I is a transcribed gene, as it may lack part of the consensus GTP-binding signature depending on which predicted protein sequence is used.
Finally, we note that although the large majority of Arf family members appear to have very broad and perhaps ubiquitous tissue expression patterns, a few are far more restricted in their expression. Thus, it is expected that further additions and perhaps even deletions will be needed to keep the nomenclature of this family current and as consistent as possible. To ensure that new family members are assigned unique symbols, we strongly encourage authors to consult the HGNC before publishing any new names for members of this gene/protein family. This is a confidential service provided by the HGNC that will help prevent future confusion from arising. We also suggest that curators and researchers focusing on other organisms use the information provided in this article as much as possible to simplify and clarify the nomenclature across species.
Other researchers supporting the use of this nomenclature include: Bruno Antonny, Bill Balch, Vytas Bankaitis, Gary Bokoch, Juan Bonifacino, Chris Burd, Jim Casanova, Tamara Caspary, Dany Cassel, Rick Cerione, Pierre Chardin, Philippe Chavrier, Shamshad Cockcroft, Peter Cullen, Ivan de Curtis, Maria Antonella De Matteis, Julie Donaldson, Cryslin D'Souza-Schorey, John Exton, Victor Faundez, Jim Goldenring, Jean Gruenberg, Alan Hall, Fuchu He, Wangjin Hong, Victor Hsu, Mary Hunzicker-Dunn, Trevor Jackson, Cathy Jackson, Hans Joost, Toshi Katada, Fang-jen Lee, Michel Leroux, Jennifer Lippincott-Schwartz, John Logsdon, Alberto Luini, Vivek Malhotra, Ed Manser, Tobias Meyer, Paul Melancon, Joel Moss, Aki Nakano, Kazu Nakayama, Tommy Nilsson, Susanne Pfeffer, Richard Premont, Paul Randazzo, Anne Ridley, Scotty Robinson, Anne Rosenwald, Craig Roy, Hisataka Sabe, Randy Schekman, Nava Segev, Val Sheffield, Phil Stahl, Elizabeth Sztul, Chris Turner, Anne Theibert, Martha Vaughan, Kanamarlapudi Venkateswarlu, Fred Wittinghofer, Keqiang Ye, and Marino Zerial.
This work was supported by grants from the National Institutes of Health (GM68029 and GM67226 to R.A. Kahn), the Association pour la Recherche contre le Cancer (to J. Cherfils), and the French Research Ministry (ACI-BCMS to J. Cherfils).
Abbreviations used in this paper: GDP, guanosine diphosphate; TRIM, tripartite motif.