A relatively small cadre of lineage-restricted transcription factors largely orchestrates erythropoiesis, but how these nuclear factors interact to regulate this complex biology is still largely unknown. However, recent technological advances, such as chromatin immunoprecipitation (ChIP) paired with massively parallel sequencing (ChIP-seq), gene expression profiling, and comprehensive bioinformatic analyses, offer new insights into the intricacies of red cell molecular circuits.
Red blood cells (RBCs) fulfill the essential functions of transporting oxygen to tissues and facilitating gas exchange in the lungs. They are continuously produced throughout life in a tightly controlled growth process termed erythropoiesis. Erythroid differentiation is accompanied by temporally regulated changes in cell surface protein expression, a reduction in cell size, progressive hemoglobinization, and nuclear condensation, which culminates in extrusion of the nucleus, RNA, and mitochondria (Richmond et al., 2005).
Erythropoiesis is largely mediated by a relatively small number of lineage-restricted transcription factors, including GATA-1, SCL/TAL1, LMO2, LDB1, and KLF1 (Cantor and Orkin, 2002). The importance of these transcription factors in erythropoiesis has been demonstrated unequivocally by cell-based ex vivo assays, as well as in knockout mouse models and rare patients with anemias. The critical transcription factors are present in diverse multiprotein complexes. However, how distinct multiprotein complexes activate or repress transcription, and thereby regulate the erythroid maturation program, remains incompletely understood. New techniques, including ChIP coupled with massively parallel sequencing (ChIP-seq), gene expression profiling, and bioinformatic analyses, provide new information about the regulatory networks that coordinate erythroid cell maturation and function. This minireview will summarize recent findings relevant to the understanding of gene expression regulation in red blood cells.
The transcription factor GATA-1 recognizes the DNA consensus sequence (A/T)GATA(A/G) through two Cys-X2-Cys-X17-Cys-X2-Cys zinc fingers that are characteristic of the GATA family (Wall et al., 1988; Evans and Felsenfeld, 1989). Annotation of GATA consensus sites, even those that are phylogenetically conserved, is a poor predictor of in vivo GATA-1 chromatin binding (Bresnick et al., 2005). Hence, several groups generated whole-genome occupancy maps for GATA-1 by using ChIP-seq in erythroid cell lines (Cheng et al., 2009; Fujiwara et al., 2009; Yu et al., 2009; Soler et al., 2010). Although three studies identified ∼4,000–6,000 in vivo binding sites for GATA-1 in mouse erythroleukemia (MEL) cells expressing a tagged form of GATA-1 (Yu et al., 2009; Soler et al., 2010) or human K562 erythroleukemia cells (Fujiwara et al., 2009), a fourth study identified >15,000 sites occupied by GATA-1 in G1E-ER4 cells, which were derived from GATA-1 knockout mouse embryonic stem cells and express an estrogen-inducible GATA-1 construct (Cheng et al., 2009). Careful assessment of the data may help explain discrepancies in the number of GATA-1–occupied sites. These may have arisen from the use of different cell lines, peak-calling algorithms, or ChIP protocols, or simply from different choices of statistical cutoffs.
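To make the consensus concrete, the following minimal sketch scans a DNA string for (A/T)GATA(A/G) on both strands. It is an illustration only, not the procedure used in any of the cited studies; the function name `find_gata_sites` and the example sequence are invented here. Because such a short motif matches very frequently in a genome, a scan of this kind also hints at why consensus annotation alone predicts in vivo occupancy poorly.

```python
import re

# Reading of the consensus (A/T)GATA(A/G) quoted in the text.
# Lookaheads are used so that overlapping sites are all reported.
GATA_FORWARD = re.compile(r"(?=([AT]GATA[AG]))")
# Reverse complement of the consensus: (C/T)TATC(A/T).
GATA_REVERSE = re.compile(r"(?=([CT]TATC[AT]))")


def find_gata_sites(sequence):
    """Return sorted (position, strand, match) tuples for consensus GATA sites."""
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in GATA_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in GATA_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical sequence, invented for illustration only.
example = "CCAGATAAGGTTTATCTCC"
sites = find_gata_sites(example)
```

Positions are 0-based offsets into the input string; a minus-strand hit is reported as it reads on the plus strand.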
All studies demonstrated that a minority of GATA-1 binding sites (∼10–15%) are located at proximal promoter regions close to the transcription start site (TSS). The bulk of GATA-1 binding (∼85%) occurs at distal regulatory elements with equal distribution between intra- and intergenic regions (Fujiwara et al., 2009; Yu et al., 2009). High-level H3K4 monomethylation (H3K4me1), a histone mark strongly enriched at functional enhancer regions (Heintzman et al., 2007), was observed at nearly all GATA-1–occupied DNA segments, further supporting the notion that GATA-1 principally binds enhancer regions (Cheng et al., 2009). To identify direct GATA-1 target genes, microarray gene expression profiling was performed (Yu et al., 2009) using G1E-ER4 cells (Weiss et al., 1997). G1E cells are arrested at the proerythroblast stage of differentiation, but undergo synchronous terminal maturation upon restoration of GATA-1 function (Weiss et al., 1997). Reexpression of GATA-1 triggers an extensive program of gene activation and repression (Weiss et al., 1997). Superimposition of GATA-1 whole-genome occupancy and gene expression data permitted identification of putative, direct GATA-1 targets. Although up to 5,000 genes were found to be differentially expressed upon GATA-1 activation (Cheng et al., 2009; Fujiwara et al., 2009; Yu et al., 2009), a surprisingly small fraction (∼300–700) of genes could be identified as direct GATA-1 target genes (Fujiwara et al., 2009; Yu et al., 2009). It should also be noted that within those genes identified as direct GATA-1 targets, 40–57% were up-regulated and 41–60% were down-regulated (Cheng et al., 2009; Fujiwara et al., 2009; Yu et al., 2009), demonstrating that GATA-1 activates or represses nearly equivalent numbers of genes.
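The superimposition of occupancy and expression data described above can be sketched as a set intersection. This is a simplified illustration under stated assumptions: the 1-kb promoter window, the function names, the gene names, and all values are hypothetical and are not taken from the cited studies, which each used their own thresholds and statistics.

```python
def classify_peak(distance_to_tss, promoter_window=1000):
    """Label a peak 'proximal' if it lies within the window around the TSS.

    The 1,000-bp default is an assumed value for illustration.
    """
    return "proximal" if abs(distance_to_tss) <= promoter_window else "distal"


def direct_targets(bound_genes, expression_changes):
    """Split genes that are both bound and differentially expressed.

    expression_changes maps gene -> signed change after factor activation
    (positive = up-regulated). Returns (activated, repressed) as sorted lists.
    """
    activated, repressed = [], []
    for gene in sorted(set(bound_genes) & set(expression_changes)):
        if expression_changes[gene] > 0:
            activated.append(gene)
        elif expression_changes[gene] < 0:
            repressed.append(gene)
    return activated, repressed


# Toy data, invented for illustration; gene names are placeholders.
# peaks maps gene -> signed distance (bp) from its TSS to the assigned peak.
peaks = {"geneA": -250, "geneB": 42000, "geneC": 15000, "geneD": 800}
changes = {"geneA": 2.1, "geneC": -1.4, "geneE": 3.0}

labels = {gene: classify_peak(dist) for gene, dist in peaks.items()}
activated, repressed = direct_targets(peaks, changes)
```

In this toy case only geneA and geneC are both bound and differentially expressed, which mirrors on a small scale how the set of putative direct targets can be much smaller than either input list.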
Bioinformatic analysis of transcription factor motifs further revealed that among activated genes, binding sites for SCL/TAL1 were highly enriched (Cheng et al., 2009; Fujiwara et al., 2009; Tripic et al., 2009; Yu et al., 2009; Kassouf et al., 2010). Based on this finding, one may infer that GATA-1 activates gene expression specifically in concert with SCL/TAL1 (Fig. 1). However, partners for GATA-1 in gene repression are less clear. GATA-1 is thought to facilitate gene repression via interaction with the NuRD complex; this may be mediated through a direct interaction between GATA-1 and FOG-1 (Hong et al., 2005; Rodriguez et al., 2005), as well as via the transcriptional repressor Gfi-1b in concert with the LSD1–CoREST corepressor complex (Fig. 1; Rodriguez et al., 2005; Saleque et al., 2007). Interestingly, the genome-wide occupancy maps revealed an additional level of complexity, as a subset of GATA-1–repressed genes was also found to carry the repressive H3K27me3 histone mark (Cheng et al., 2009; Yu et al., 2009). This mark is catalyzed by the polycomb repressive complex 2 (PRC2), a multiprotein complex containing EED, Ezh1/2, and Suz12 (Müller et al., 2002; Schuettengruber et al., 2007). Erythroid differentiation is impaired in mice with erythroid-specific loss of EED (Yu et al., 2009). Thus, the PRC2 complex participates in GATA-1–mediated gene repression during erythroid differentiation. Whether GATA-1 recruits PRC2 directly, or indirectly, will be of interest in future studies.
It should be recognized that these chromatin occupancy studies do not account for posttranslational modifications of GATA-1. For example, GATA-1 is acetylated (Boyes et al., 1998), and this modification appears to be important for erythroid differentiation (Hung et al., 1999; Lamonica et al., 2006). A study recently published in JEM revealed the importance of GATA-1 SUMOylation (Yu et al., 2010b). Genetic ablation of the SUMO-specific protease SENP1 resulted in severe anemia and embryonic lethality in mice at embryonic day 13.5. Accumulation of a SUMOylated form of GATA-1 was observed and coincided with down-regulation of GATA-1 target genes (Yu et al., 2010b). SUMOylation may modulate aspects of GATA-1 function beyond DNA binding, as suggested by Yu et al. (2010b), given that SUMOylation of FOG-1 affects its interaction with other proteins (Snow et al., 2010). Further work is needed to interrogate protein–protein and protein–DNA interactions of SUMOylated GATA-1.
In recent years, microRNAs (miRNAs) have emerged as additional regulators of overall gene expression, representing yet another layer of control. Indeed, recent work demonstrates that the miR-144/451 locus is a direct target of GATA-1 and that mice lacking miR-144/451 or miR-451 alone show impaired erythropoiesis, particularly under conditions of stress (Dore et al., 2008; Rasmussen et al., 2010; Patrick et al., 2010; Yu et al., 2010a).
The basic helix–loop–helix (bHLH) transcription factor SCL/TAL1 recognizes a short consensus DNA motif (CANNTG), the E-box. SCL/TAL1 expression largely parallels that of GATA-1, as it is expressed in erythroid cells, megakaryocytes, and mast cells (Cantor and Orkin, 2002). In erythroid cells, SCL/TAL1 forms a complex with the ubiquitous bHLH protein E2A, and also with the LIM domain–containing cofactors LMO2 and LDB1 (Cantor and Orkin, 2002). These proteins interact with GATA-1 to form a pentameric complex (Fig. 1) that binds to composite E-box/GATA-1 DNA motifs spaced 9–11 nt apart (Wadman et al., 1997; Cohen-Kaminsky et al., 1998). LMO2, GATA-1, and SCL/TAL1 are all required for erythropoiesis in mice (Cantor and Orkin, 2002), and a conditional knockout mouse model of SCL/TAL1 is available (Mikkola et al., 2003). In this issue, Li et al. present the first conditional knockout of LDB1. They find that embryos lacking LDB1 show defective primitive erythropoiesis and that Mx-Cre–driven deletion of LDB1 in adult mice results in a persistent drop in hematocrit and, ultimately, death, demonstrating that LDB1 is continuously required for definitive erythropoiesis.
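A composite E-box/GATA motif of the kind described above can be expressed as a single pattern. The sketch below is illustrative and rests on assumptions that the text does not specify: the 9–11 nt spacing is measured here from the end of the E-box to the start of the GATA site, only one orientation (E-box upstream of GATA, both on the plus strand) is searched, and the function names and sequences are invented.

```python
import re


def composite_motif(min_gap=9, max_gap=11):
    """Compile a pattern for E-box (CANNTG), a gap, then (A/T)GATA(A/G).

    The gap is counted between the two motifs, which is an assumption
    about how the spacing is defined.
    """
    return re.compile(
        r"(?=(CA[ACGT]{2}TG)([ACGT]{%d,%d})([AT]GATA[AG]))" % (min_gap, max_gap)
    )


def find_composites(sequence, min_gap=9, max_gap=11):
    """Return (start, e_box, gap_length, gata_site) for each match."""
    pattern = composite_motif(min_gap, max_gap)
    return [
        (m.start(), m.group(1), len(m.group(2)), m.group(3))
        for m in pattern.finditer(sequence.upper())
    ]


# Hypothetical sequences, invented for illustration only.
with_gap_10 = "GG" + "CAGCTG" + "CCCCCCCCCC" + "AGATAA" + "GG"
with_gap_5 = "GG" + "CAGCTG" + "CCCCC" + "AGATAA" + "GG"

hits_10 = find_composites(with_gap_10)
hits_5 = find_composites(with_gap_5)
```

The first sequence contains the two motifs 10 nt apart and yields one hit; the second, with a 5-nt gap, yields none. A fuller search would also consider the reverse strand and the opposite motif order.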
For some time, only a handful of red cell–specific direct target genes of this complex had been identified. Two recent studies mapped whole-genome occupancy of this complex by performing ChIP-seq for endogenous SCL/TAL1 in primary mouse proerythroblasts (Kassouf et al., 2010) or for tagged LDB1 and SCL/TAL1 in MEL cells (Soler et al., 2010). A third group generated an occupancy map of SCL/TAL1 in G1E-ER4 cells, performing ChIP-on-chip analysis using a tiling array covering mouse chromosome 7 (Tripic et al., 2009). Approximately 3,000–4,000 and 5,000 genome-wide binding sites were identified for SCL/TAL1 and LDB1, respectively. Approximately 30% of all SCL/TAL1 binding sites were located at proximal promoter regions (in this study defined as ±3 kb of the TSS), whereas the bulk of SCL/TAL1 binding (∼70%) resided at distal regulatory elements, with 40% and 25% in intragenic and intergenic regions, respectively (Kassouf et al., 2010). To identify putative direct SCL/TAL1 target genes, microarray gene expression profiling was used to compare wild-type primary proerythroblasts with proerythroblasts derived from mice carrying a mutation in the DNA-binding domain of SCL/TAL1 (SCL/TAL1RER; Kassouf et al., 2008). A total of 511 differentially expressed genes were identified, with 51% up-regulated and 49% down-regulated. The intersection of SCL/TAL1 occupancy and gene expression data resulted in an overlap of only 83 genes, which may be considered direct SCL/TAL1 targets. Strikingly, 75% of these genes were down-regulated in the mutant as compared with wild-type cells, indicating that SCL/TAL1 largely activates gene expression. Analysis of motifs revealed enrichment of GATA binding sites close to SCL/TAL1 binding sites at genes activated by SCL/TAL1, in accordance with the reciprocal findings for GATA-1 (see above). Gene repression mediated by the SCL/TAL1 complex may be performed via recruitment of the corepressors ETO2 and Mtgr1 (Fig. 1; Fujiwara et al., 2009; Tripic et al., 2009; Soler et al., 2010). This conclusion is supported by cooccupancy of ETO2/Mtgr1 at a subset of SCL/TAL1 target genes (Soler et al., 2010), as well as de-repression of some SCL/TAL1 target genes upon depletion of ETO2 in erythroid cells (Tripic et al., 2009). The observation that SCL/TAL1 and LDB1 were found to bind far from their closest repressed gene prompted Soler et al. (2010) to perform chromosome conformation capture sequencing (3C-seq). Combination of the LDB1 ChIP-seq and 3C-seq data revealed direct binding of LDB1 to DNase-hypersensitive sites HS2, HS3, and HS4 of the β-globin locus control region (LCR) and long-range interactions with the β-globin promoter, despite the absence of a functional LDB1 binding site at the β-globin promoter (Soler et al., 2010). It would be of interest to study the nature of these long-range interactions in the absence of SCL/TAL1 or LDB1.
KLF1 (formerly called EKLF), a zinc finger transcription factor with three highly similar C-terminal C2H2-type Krüppel zinc fingers, recognizes a subset of CACC box motifs (Miller and Bieker, 1993). Expression of KLF1 is remarkably restricted to erythroid cells and their precursors (Miller and Bieker, 1993). Although its essential role in erythropoiesis has been known for quite some time (Cantor and Orkin, 2002), few direct transcriptional targets have been identified.
Tallack et al. (2010) generated a whole-genome occupancy map for KLF1 in primary erythroid cells. In two independent ChIP-seq runs, KLF1 occupied between 940 and 1,400 binding sites in erythroid cells. Of these binding events, 16% occurred within 1 kb of the TSS, whereas the majority of sites were located at distances of >10 kb away from TSSs (Tallack et al., 2010). To identify new direct KLF1 target genes, the authors compared ChIP-seq data with gene expression profiles of wild-type and Klf1−/− fetal liver cells (Hodge et al., 2006). A total of 1,099 genes were differentially expressed in the absence of KLF1 in erythroid cells; 730 genes were down-regulated and 369 genes were up-regulated (Hodge et al., 2006). Only ∼19% of these genes were occupied by KLF1. The bulk of binding events occurred at genes that were down-regulated in the absence of KLF1, suggesting that KLF1 acts primarily as a transcriptional activator (Tallack et al., 2010). The authors investigated a potential functional DNA-dependent interaction between KLF1 and GATA-1. Comparing KLF1 ChIP-seq data with results from GATA-1 whole-genome occupancy maps, Tallack et al. determined the distances between the nearest GATA-1 peak and all KLF1 peaks. Approximately 48% of KLF1 peaks are located within 1 kb of GATA-1 peaks, strongly supporting an in vivo cooperation of the two factors (Fig. 1). Finally, the authors compared GATA-1/SCL (Cheng et al., 2009; Wilson et al., 2009) and GATA-1/KLF1–cooccupied regions and found minimal overlap (Tallack et al., 2010). This finding was surprising, given studies implicating GATA-1 in gene activation almost exclusively in complex with SCL/TAL1 (see above), but it suggests that GATA-1 may exist in two mutually exclusive activating complexes (Fig. 1).
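The peak-to-peak distance comparison described above can be sketched as a nearest-neighbor search over sorted coordinates. This is a minimal illustration, not the authors' pipeline: peaks are reduced to single midpoint coordinates, and the function names and all coordinates are invented. Only the 1-kb threshold is taken from the text.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_positions):
    """Distance from position to the closest entry in a non-empty sorted list."""
    i = bisect_left(sorted_positions, position)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)


def fraction_within(query_peaks, reference_peaks, max_distance=1000):
    """Fraction of query peaks lying within max_distance of a reference peak.

    Both arguments map chromosome name -> list of peak midpoints (bp).
    """
    total = 0
    close = 0
    for chrom, positions in query_peaks.items():
        reference = sorted(reference_peaks.get(chrom, []))
        for pos in positions:
            total += 1
            if reference and nearest_distance(pos, reference) <= max_distance:
                close += 1
    return close / total if total else 0.0


# Toy coordinates, invented for illustration only.
klf1 = {"chr7": [10_500, 52_000, 90_000], "chr11": [5_000]}
gata1 = {"chr7": [10_000, 60_000, 90_800]}

share = fraction_within(klf1, gata1)
```

In this toy example two of the four KLF1 midpoints fall within 1 kb of a GATA-1 midpoint. Real analyses would work with peak intervals rather than midpoints and would compare the observed fraction against a background expectation.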
In considering genome-wide occupancy data, one must be cognizant of potential methodological pitfalls. For example, although the widely applied “nearest-neighbor approach” (Kent et al., 2002; Pepke et al., 2009) provides a convenient way to assign transcription factor–binding peaks to nearby genes, it may oversimplify the situation, as it does not take into account long-range cis or trans interactions that frequently occur between promoter and enhancer elements. This limitation may account, in part, for the relatively small overlap between gene expression and transcription factor occupancy data.
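The limitation of nearest-neighbor assignment can be seen in a small sketch. The code below is illustrative only; the function name, gene names, and coordinates are hypothetical, and it considers a single chromosome with one TSS per gene.

```python
def assign_nearest_gene(peak_position, tss_by_gene):
    """Return (gene, distance) for the TSS closest to the peak.

    tss_by_gene maps gene name -> TSS coordinate on the same chromosome.
    Ties are broken alphabetically by gene name.
    """
    return min(
        ((gene, abs(peak_position - tss)) for gene, tss in tss_by_gene.items()),
        key=lambda pair: (pair[1], pair[0]),
    )


# Toy coordinates, invented for illustration; gene names are placeholders.
tss = {"geneA": 100_000, "geneB": 160_000}
assignment = assign_nearest_gene(125_000, tss)
```

Here the peak is assigned to geneA purely because its TSS is 25 kb away rather than 35 kb. If the bound element in fact loops to the geneB promoter, the assignment is wrong, and no amount of linear distance information can reveal that.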
Nevertheless, whole-genome mapping of transcription factor occupancy is a relatively new technology that is providing prodigious datasets for computational and functional analyses. The integration of such data with profiling of mRNA and miRNA expression, coupled with sensitive proteomics, will lead to an enhanced view of the transcriptional networks orchestrating erythroid differentiation. The basic principles elucidated in these studies will inform transcriptional biology more broadly.