Analysis of human and Drosophila genomes demonstrates an ancient origin of innate immunity and the diversity of the mechanisms of innate immune recognition.
Recognition of and defense against microbial infections are universal adaptations of multicellular organisms. Many gene products and entire pathways involved in host defense appear to be of ancient origin and are found in organisms as evolutionarily distant as humans and flies, and to some extent, even in plants. In this review, we will describe some of the best characterized protein components of immune recognition in mammals and insects and survey the evolutionary distribution of their homologues. This analysis demonstrates the considerable structural and functional diversity of innate immune systems in mammals and insects and provides examples of several trends in protein evolution such as domain accretion, displacement of orthologues, and lateral gene transfer.
Innate immune recognition
The mammalian innate immune system uses two distinct strategies for recognition of invading microorganisms: recognition of “microbial nonself” and recognition of “missing self.” The first strategy is based on recognition of pathogen-associated molecular patterns (PAMPs),* which are conserved products of microbial metabolism (Janeway, 1989). PAMPs are distributed broadly among pathogens (for example, the molecular pattern of lipopolysaccharide [LPS] is common to all gram-negative bacteria) but are not produced by the host. Receptors of the innate immune system that recognize PAMPs are called pattern recognition receptors (Janeway, 1989). Pattern recognition receptors signal to induce expression of inflammatory cytokines and chemokines and activate antimicrobial host defense mechanisms such as the production of reactive nitrogen and oxygen radicals and antimicrobial peptides. Recognition of PAMPs also leads to the induction of the costimulatory molecules CD80 and CD86 on antigen-presenting cells. Induction of costimulators along with presentation of antigenic peptides on antigen-presenting cells couples innate immune recognition of pathogens with the activation of adaptive immune responses (Medzhitov and Janeway, 1997).
The second strategy of innate immune recognition is based on recognition of molecular markers specific for self. These markers are gene products expressed only on the surface of normal uninfected cells of the host but not on microbial cells. Recognition of these signals by the innate immune system is coupled to inhibitory signals that prevent activation of the immune response against self. Lack of these markers on microbial cells allows the immune response to be directed specifically against microbial pathogens. In most cases, the markers of self are recognized by the so-called inhibitory receptors, which either belong to the Ig superfamily or contain a C-type lectin (CTL) domain. A common feature of the inhibitory receptors is the immunoreceptor tyrosine inhibitory motif, which upon tyrosine phosphorylation activates inhibitory tyrosine phosphatases SHP-1 and SHP-2. The best characterized example of recognition of “missing self” is the recognition of MHC class-I molecules by various inhibitory receptors expressed on natural killer cells (Lanier, 1998).
Completely sequenced genomes of multicellular organisms provide an opportunity to trace the evolution of innate immunity by comparing the distribution of the protein components of this system across the phylogeny of life. Recognition of “microbial nonself” appears to be a universal strategy of innate immunity, since it is found in all studied multicellular organisms. In contrast, many of the key components involved in the recognition of “missing self” in mammals, including CTL and Ig type inhibitory receptors with immunoreceptor tyrosine inhibitory motifs, are absent in the Drosophila genome, suggesting that this mechanism of innate immune recognition may have appeared later in evolution. In this paper, we survey the components of innate immunity in the completely sequenced genomes in an attempt to detect the milestones in protein domain accretion and functional specialization of the system toward its mature state, as exemplified by mammals.
The Toll-like receptors (TLRs) comprise a family of transmembrane proteins that play an essential role in host defense in both mammals and flies. The extracellular domain of the TLRs consists of a varying number of leucine-rich repeats and a cysteine-rich region immediately preceding the transmembrane domain. The cytoplasmic domain is called the Toll/interleukin-1 receptor (TIR) domain, named after the two groups of proteins in which it was first found.
There are nine genes encoding Toll proteins in Drosophila, at least ten in humans, and a single gene in Caenorhabditis elegans. The best characterized member of the family, Drosophila Toll-1, plays essential roles in both dorsoventral patterning in fly embryos and in antifungal defense in adult flies (Lemaitre et al., 1996; Anderson, 2000). In both cases, Toll-1 is activated by spatzle, a secreted protein thought to be the Toll-1 ligand (Anderson, 2000). To activate Toll-1, spatzle must first be cleaved by serine proteases induced in response to either developmental signals or fungal infection. The signaling pathway induced by Toll-1 has been defined by genetic analysis and consists of an adaptor protein tube, a protein kinase pelle, the NF-κB family transcription factors Dorsal and Dif, and an IκB homologue cactus (Anderson, 2000). Analysis of Drosophila loss-of-function mutations in these genes has demonstrated that both developmental patterning and antifungal immunity require the entire pathway from spatzle to cactus (Lemaitre et al., 1996). However, spatzle activation requires an upstream serine protease cascade, indicating that Toll-1 itself is not a pattern recognition receptor (Levashina et al., 1999); rather, the protease cascade may be triggered upon binding of an unknown protein to a fungal PAMP. Another striking feature of the Drosophila Toll-1 pathway is that its function in immunity is restricted to defense against fungal and gram-positive bacterial infections (Lemaitre et al., 1996). The response to gram-negative bacterial infections is dependent on a distinct pathway defined by a loss-of-function mutation in the imd gene, which has not been molecularly characterized (Lemaitre et al., 1996). 
The imd pathway also includes the Drosophila homologue of mammalian kinase TAK1 (dTAK1), the caspase Dredd, the IκB kinase homologue (dIKK), the homologue of the IKK regulator NEMO, and the third Drosophila NF-κB homologue Relish (Khush et al., 2001; Vidal et al., 2001).
Surprisingly, in vitro studies suggest that Toll-5 activates the same signaling pathway as Toll-1 (Tauszig et al., 2000). Yet another Toll family member, 18-Wheeler (also known as Toll-2), when mutated, causes defects in both development and immunity (Williams et al., 1997). The functions of the other seven Drosophila Tolls are currently unknown. Notably, we detected six spatzle-like proteins or domains in Drosophila, namely Serrate and putative products CG9196, CG9972, CG14533, CG14928, and CG18318 (unpublished data). These proteins or their derivatives might comprise a set of ligands of Toll-like proteins in Drosophila.
The main function of mammalian TLR proteins appears to be the control of inflammatory and immune responses, as demonstrated by analyses of TLR knockout mice (Akira et al., 2001). Like other pattern recognition receptors, TLRs mediate recognition of a variety of microbial PAMPs (Aderem and Ulevitch, 2000). In particular, TLR2 is responsible for recognition of bacterial lipoproteins and peptidoglycan, TLR4 is essential for responses to LPS, TLR5 controls responses to bacterial flagellin, and TLR9 is required for recognition of unmethylated CpG DNA motifs characteristic of bacterial DNA (Aderem and Ulevitch, 2000; Akira et al., 2001). It is likely that the other six TLRs are also involved in recognition of specific subsets of PAMPs derived from various microbial pathogens.
As in the case of Drosophila Tolls, little is known about the mechanism of PAMP recognition by mammalian TLRs. No spatzle homologues have been detected in the human genome. Mammalian TLRs studied so far do not exhibit developmental functions and might in fact recognize microbial PAMPs directly (Aderem and Ulevitch, 2000; Akira et al., 2001).
Toll systems in fruit flies and mammals may also differ in that the latter recruit accessory proteins. In particular, recognition of LPS by mammalian cells requires, in addition to TLR4, at least three proteins: LPS binding protein, CD14, and MD2 (Ulevitch and Tobias, 1995; Shimazu et al., 1999). LPS binding protein is a serum protein that binds and transfers LPS monomers to CD14, a high affinity GPI-anchored LPS receptor (Ulevitch and Tobias, 1995). MD2 is a small protein that lacks a transmembrane region, but is associated with the extracellular domain of TLR4 (Shimazu et al., 1999) and has been shown to be required for LPS recognition by TLR4 (Schromm et al., 2001). Homologues of these accessory proteins are absent in the Drosophila genome, suggesting that the molecular mechanism of LPS recognition by insect cells may be fundamentally different from that of mammalian cells.
The first known downstream component of the mammalian TLR signaling pathway is the adaptor protein MyD88 (Muzio et al., 1997; Medzhitov et al., 1998). MyD88 consists of an NH2-terminal death domain and a COOH-terminal TIR domain. The TIR domain of MyD88 interacts with the TIR domain of activated TLRs, whereas the death domain of MyD88 interacts with the death domain of IRAK, a serine/threonine protein kinase homologous to the Drosophila kinase Pelle (Cao et al., 1996a). In addition to MyD88, human IL-1R and TLRs also interact with another adaptor protein called Tollip, which is composed of an NH2-terminal C2 domain and a COOH-terminal CUE domain. Tollip also appears to be involved in IRAK recruitment (Burns et al., 2000). Recruitment of IRAK to the receptor complex leads to IRAK activation and phosphorylation, which in turn results in IRAK interaction with TRAF6 (Cao et al., 1996b). TRAF6 is an E3 ligase that undergoes stimulus-dependent autoubiquitination (Deng et al., 2000). This ubiquitination event is necessary for activation of the kinase TAK1, which then phosphorylates and activates the IKK complex, leading ultimately to IκB degradation and NF-κB activation (Deng et al., 2000).
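The order of events in this pathway can be written down as a simple chain. The following Python sketch is purely illustrative and is not a model of the biology: the names `TLR_CASCADE` and `downstream_of` are hypothetical, the ASCII labels `IkB` and `NF-kB` stand for IκB and NF-κB, and Tollip and all quantitative detail are deliberately omitted.

```python
# Illustrative only: each key signals to the value listed for it,
# following the order of events given in the text. Names are hypothetical.
TLR_CASCADE = {
    "TLR": "MyD88",        # TIR domain to TIR domain interaction
    "MyD88": "IRAK",       # death domain to death domain interaction
    "IRAK": "TRAF6",       # follows IRAK activation and phosphorylation
    "TRAF6": "TAK1",       # requires TRAF6 autoubiquitination
    "TAK1": "IKK",         # TAK1 phosphorylates the IKK complex
    "IKK": "IkB degradation",
    "IkB degradation": "NF-kB activation",
}


def downstream_of(component, cascade=TLR_CASCADE):
    """Return every step that follows `component`, in order."""
    steps = []
    while component in cascade:
        component = cascade[component]
        steps.append(component)
    return steps


print(downstream_of("IRAK"))
```

Writing the pathway as a mapping makes it easy to ask which steps would be lost if a given component were missing, which is the logic behind the loss-of-function analyses cited above.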
The mammalian TLR signaling pathway is in many ways homologous to the Toll-1/antifungal pathway of flies. Toll-1 signaling activates the IRAK homologue Pelle, and nuclear translocation of Dif, the NF-κB factor, requires degradation of its inhibitor, the IκB homologue cactus. Recently, a Drosophila homologue of MyD88, dMyD88, has also been identified and shown to function in Pelle recruitment downstream of Toll-1 activation (Horng and Medzhitov, 2001). Previous studies have shown that Toll-1 signaling also requires another adaptor, Tube, which contains an NH2-terminal death domain that mediates its interaction with Pelle but lacks a TIR domain (Anderson, 2000). Why Toll-1 should signal through two adaptors, or how dMyD88 and Tube differ with respect to Pelle recruitment to the receptor, is not yet understood. Another notable difference between fly and mammalian Toll signaling is the lack of a Tollip homologue in the fly genome. Finally, although there are three TRAF homologues in the Drosophila genome, it is not known whether any of them plays a role in Toll signaling; intriguingly, dTAK1 and dIKK function in the Imd pathway but not in the Toll pathway in flies (Khush et al., 2001; Vidal et al., 2001). One of the Drosophila TRAFs is known to activate a mitogen-activated protein kinase pathway (Liu et al., 1999).
The genome of C. elegans encodes homologues of the Drosophila Toll, Pelle, TRAF, and cactus proteins, and a homologue of mammalian Tollip. Although the function of the Tollip homologue has not been characterized, mutational analysis demonstrated that the homologues of Toll, Pelle, TRAF, and cactus do not appear to function in the antibacterial response in worms. Surprisingly, however, the Toll gene in the nematode was shown to be involved in chemosensory perception of pathogenic bacteria, thus contributing indirectly to host defense in nematodes (Pujol et al., 2001).
Other pattern recognition and signaling molecules
CTLs function in a variety of carbohydrate recognition systems including cell adhesion and phagocytosis. At least 35 CTLs are encoded by the Drosophila genome (Adams et al., 2000). Most of them lack transmembrane regions and appear to be secreted molecules. Some of these lectins may selectively recognize terminal mannose residues, and sequence patterns required for this interaction have been characterized. Since terminal mannosyl residues are abundant in microbes, it is likely that the CTLs specific for mannose play a role in pathogen recognition. However, the Drosophila genome appears to lack the orthologues of the macrophage mannose receptor and the mannan-binding lectin, two of the best characterized CTLs that function in mammalian host defense. The macrophage mannose receptor has been shown to have broad specificity toward many ligands including bacterial, fungal, and viral pathogens (Fraser et al., 1998). The mannan-binding lectin is involved in the initiation of the lectin pathway of complement activation and is a member of the collectin family of secreted lectins (Fraser et al., 1998). In general, collectins contain a CTL domain connected to a collagen-like domain. Collectins are involved in pathogen recognition and clearance in the serum and tissue fluids of mammals (Holmskov, 2000). No collectin orthologues could be found in the Drosophila genome.
The macrophage scavenger receptor is another prototypic pattern recognition receptor that plays an important role in the clearance of LPS and gram-negative bacteria in mammalian species (Suzuki et al., 1997; Thomas et al., 2000). Although there is no orthologue of the macrophage scavenger receptor in Drosophila, there are several genes encoding secreted proteins with scavenger receptor domains. In addition, the fly genome encodes twelve proteins of the peptidoglycan recognition protein family and three gram-negative binding proteins (GNBPs) (Adams et al., 2000; Kim et al., 2000; Werner et al., 2000). The peptidoglycan recognition protein family also exists in the human and mouse genomes, although its function there remains uncharacterized. GNBP homologues appear to be absent from the human genome.
The Drosophila genome also contains homologues of the complement genes, suggesting an ancient origin of the complement system (Lagueux et al., 2000). C-reactive protein and serum amyloid protein are members of the pentraxin family, which are produced during the acute phase response to infection in mammals. These proteins bind to bacterial cell surfaces and activate complement through the classical pathway and thus in effect replace the function of antibodies in this pathway (Du Clos, 2000). Although there appears to be no orthologue of C-reactive protein in Drosophila, members of the pentraxin family are present in flies, and it is possible that they may function in complement activation pathways. Some candidate pattern recognition molecules with unknown functions, such as peptidoglycan binding protein and GNBP, may also function by triggering the complement cascade in Drosophila in a manner similar to the lectin pathway in mammals.
In addition to extracellular recognition molecules, the human innate immune system employs several cytoplasmic receptors for the detection of viral and intracellular bacterial infections. The best characterized intracellular receptor of this category and a key component of the mammalian antiviral defense is the double-stranded RNA-specific protein kinase PKR (Williams, 1999). No PKR orthologue is present in the Drosophila genome. Drosophila also lacks orthologues of the human antiviral proteins Mx and guanylate binding protein. These proteins are related to the endocytosis regulator dynamin and are inducible by interferons in human cells, although their mechanism of action is unknown (Landis et al., 1998; Anderson et al., 1999). The absence of PKR, Mx, and guanylate binding proteins in Drosophila suggests that flies use different mechanisms of viral recognition and antiviral defense.
A family of proteins that could be involved in recognition of intracellular infection has been identified recently in humans; its members are referred to as NOD or CARD proteins (Bertin et al., 1999; Inohara et al., 1999). NOD/CARD proteins typically contain NH2-terminal interaction domains, such as CARD domains, followed by intermediate nucleotide binding domains and COOH-terminal leucine-rich repeat domains. This domain arrangement is reminiscent of many plant disease resistance genes and the apoptosis regulator APAF-1, except that the former lack NOD/CARD domains (in dicot plants, this domain is substituted by the TIR domain) and in the latter the COOH-terminal region contains repeats of a different structure, namely, WD40 repeats. Although Drosophila has an APAF-1 homologue that likewise functions in the control of apoptosis, we did not find any NOD/CARD-like proteins in the fly genome. Since NOD-like genes are found in the nematode, their absence in Drosophila may reflect lineage-specific gene loss.
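The comparison of domain arrangements described here can be made explicit with a small lookup table. The Python sketch below is an illustration under stated assumptions: the dictionary `ARCHITECTURES`, the helper `shared_positions`, and the short domain labels (`NBD` for the nucleotide binding domain, `LRR` for leucine-rich repeats) are hypothetical, and each entry is a simplified three-part summary (NH2 terminus to COOH terminus) rather than a full annotation. The NH2-terminal CARD domain listed for APAF-1 is implied rather than stated in the text.

```python
# Simplified domain order, NH2 -> COOH, as summarized in the text.
# Labels are informal and the table is illustrative only.
ARCHITECTURES = {
    "NOD/CARD protein": ("CARD", "NBD", "LRR"),
    "plant resistance protein (dicot)": ("TIR", "NBD", "LRR"),
    "APAF-1": ("CARD", "NBD", "WD40"),
}


def shared_positions(a, b, table=ARCHITECTURES):
    """Return the domains that two architectures share at the same position."""
    return [x for x, y in zip(table[a], table[b]) if x == y]


reference = "NOD/CARD protein"
for name in ARCHITECTURES:
    if name != reference:
        print(name, shared_positions(reference, name))
```

In this simplified view, the plant proteins differ from NOD/CARD proteins only at the NH2 terminus and APAF-1 only at the COOH terminus, while the central nucleotide binding domain is common to all three.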
Cytokines have multiple essential functions in mammalian immunity. With the exception of a TNF-α homologue (Aravind et al., 2001), the Drosophila genome does not contain orthologues of mammalian cytokines or their receptors. However, it is possible that unrelated molecules in flies play roles similar to inflammatory cytokines in mammals. Interestingly, the JAK-STAT pathway that plays a critical role in cytokine signal transduction in mammals is also present in Drosophila, where it plays multiple roles in development (Zeidler et al., 2000). Although the JAK-STAT pathway has not yet been implicated in the antimicrobial response of flies, it has been shown to be activated in mosquitoes upon bacterial infection (Barillas-Mury et al., 1999).
Evolution of a signaling system: trends in domain repertoire of innate immunity proteins
The mammalian-type system of innate immunity appears to have been built up from several ancient and widespread protein components and some recent molecular innovations. Comprehensive “part lists,” that is, families of related proteins, protein domains, and their three-dimensional folds can be defined for any species with a completely sequenced genome (Qian et al., 2001; Tatusov et al., 2001). To characterize the trends in domain emergence and assembly that produced the innate immunity system, we compiled a set of 28 human and mouse representatives of protein families essential for this function, extracted the 65 discrete globular domains within these sequences, and analyzed their distribution in various species using an exhaustive database search (Aravind and Koonin, 1999).
On average, at least 60–70% of predicted proteins in completely sequenced genomes are similar to proteins from evolutionarily remote species (Bork, 2000; Lander et al., 2001). The distinction between two types of homologous genes, namely orthologues (genes in two lineages related by descent from a common ancestor) and paralogues (genes arising by duplication within a lineage), is crucial (Fitch, 2000). In the case of the mammalian proteins involved in innate immunity, only ∼50% had homologous proteins in Drosophila, and an even lower fraction was shared with nematodes (Fig. 1; a more detailed comparison is available at http://www.jcb.org/cgi/content/full/jcb.200107040/DC1). Only some of these interspecies similarities represent pairs of orthologues, whereas in other cases the related genes were paralogues, with the picture being further complicated by domain shuffling.
The 65 discrete domains in our dataset belong to ∼50 domain families. Several of these families are ubiquitous and likely to predate the advent of multicellularity or eukaryotic cell organization. These include three protein interaction domains (macroglobulin, FN3, and ankyrin), one protein-RNA interaction domain, one lectin-like carbohydrate interaction module, and the catalytic domain of a Ser/Thr/Tyr-type protein kinase superfamily (details available at http://www.jcb.org/cgi/content/full/jcb.200107040/DC1). Several additional domains are shared by all completely sequenced eukaryotes but not by bacteria, such as the C2 domain, the low density lipoprotein domain, and a putative nuclease fused to a protein kinase domain.
To obtain a more quantitative picture of domain complexity in the evolution of innate immunity, we calculated the fraction of human proteins that have orthologues in each of the distant lineages of life and, separately, those human proteins that shared similarity with other species only in a subset of domains. The resulting picture (Fig. 1) reveals a contrast between the genuine orthologous relationship and the overall domain repertoire. The percentage of human proteins with orthologues in a given lineage remains low (21%) even in arthropods, although these orthologues in Drosophila are assembled into a pathway that is functionally complete. On the other hand, a steady increase in domain availability is evident in which almost half of the building blocks that would lead ultimately to the makeup of the human innate immune system are present already in both fly and worm proteomes, albeit mostly in nonorthologous domain arrangements.
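The two quantities contrasted here, the fraction of proteins with a genuine orthologue and the fraction of domains available in another lineage, can be illustrated with a short calculation. The Python sketch below uses toy input invented solely for illustration; it does not reproduce the data or the search procedure of this survey, and the names `human_proteins`, `other_species_domains`, `orthologue_fraction`, and `domain_fraction` are hypothetical.

```python
# Toy input, invented for illustration; these are NOT the survey's data.
# For each protein: its set of domains, and whether a full-length
# orthologue was found in the other species.
human_proteins = {
    "protein_A": {"domains": {"TIR", "LRR"}, "has_orthologue": True},
    "protein_B": {"domains": {"death", "kinase"}, "has_orthologue": False},
    "protein_C": {"domains": {"CARD", "NBD", "LRR"}, "has_orthologue": False},
    "protein_D": {"domains": {"C2", "CUE"}, "has_orthologue": False},
}

# Domains detected anywhere in the other species' proteome (also toy data).
other_species_domains = {"TIR", "LRR", "kinase", "death", "NBD"}


def orthologue_fraction(proteins):
    """Fraction of proteins that have an orthologue in the other species."""
    return sum(p["has_orthologue"] for p in proteins.values()) / len(proteins)


def domain_fraction(proteins, available):
    """Fraction of distinct domains that also occur in the other species."""
    all_domains = set().union(*(p["domains"] for p in proteins.values()))
    return len(all_domains & available) / len(all_domains)


print(orthologue_fraction(human_proteins))
print(domain_fraction(human_proteins, other_species_domains))
```

With this toy input, only one protein in four has an orthologue, yet five of the eight distinct domains are available in the other species. This is the kind of gap between orthologue count and domain repertoire that the analysis describes.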
In addition to gene duplication and domain accretion, other processes contribute to the complex picture of orthologue and domain distribution in different lineages, in particular gene loss and displacement (e.g., Tube in the Drosophila Toll pathway) and horizontal transfer. A likely case of horizontal gene transfer in the evolution of innate immunity involves the perforin domain, found in all eukaryotes and, in addition, in a single bacterial lineage, Chlamydia (Ponting, 1999). Although more of these events may be detected based on the unusual topology of individual phylogenetic trees, the losses of some genes may never be accounted for, especially if multiple lineages are involved.
Several conclusions can be drawn from a comparative analysis of the proteins involved in innate immune recognition in model organisms.
(a) Pattern recognition (or recognition of “microbial nonself”) is a universal strategy of innate immune recognition. This mechanism of recognition is found in mammals, insects, and plants, suggesting that it evolved at very early stages of evolution. This mechanism of recognition does not require multicellularity and may have existed already in protozoa. Another strategy of innate immune recognition, recognition of “missing self,” presumably exists only in multicellular and perhaps colonial organisms, since it requires cooperation of cells with identical or closely related genomes. Interestingly, protein families (for example, inhibitory receptors) that play key roles in recognition of “missing self” in vertebrate animals do not have orthologues in Drosophila or plants. This could be either because the function of inhibitory receptors is played by structurally unrelated proteins in invertebrates or because recognition of “missing self” evolved only in vertebrate animals.
(b) Diversity of recognition systems. Although the pattern recognition strategy is ancient in origin, there appear to be more differences than similarities in recognition systems used by mammals, insects, nematodes, and plants. Even in the case of the Toll system, where receptors and signaling pathways are conserved in flies and humans, the mechanism of recognition is fundamentally different. This diversity of recognition systems may reflect multiple independent origins of pattern recognition receptors in different lineages (e.g., GNBP found in flies but not in humans) or, in the case of the Toll pathway, functional diversification of the ancestral Toll-like system followed by nonorthologous gene displacement and lineage-specific gene loss.
(c) Conserved domains and novel protein architectures. Several conserved protein domains are found in different arrangements in proteins that play key roles in innate immune recognition in plants, insects, and mammals. For example, TIR and leucine-rich repeat domains are found in transmembrane and cytoplasmic proteins along with Ig, kinase, NBD, and death domains. This evolutionary trend of reusing the same protein modules in novel configurations is also conspicuous in the proteins that function in the pathways that control apoptosis (Aravind et al., 2001). The domain repertoire grows faster in evolution than the number of orthologous components assembled from the preexisting modules.
Finally, as we learn more about innate immune recognition, we will need to address an important question: what is the complete repertoire of specificities of the innate immune system, and how is it shaped by the pathogenic environment in evolution?
The online version of this article contains supplemental material.
Abbreviations used in this paper: CTL, C-type lectin; GNBP, gram-negative binding protein; LPS, lipopolysaccharide; PAMP, pathogen-associated molecular pattern; TIR, Toll/interleukin-1 receptor; TLR, Toll-like receptor.