Studies of allelic variation underlying genetic blood disorders have provided important insights into human hematopoiesis. Most often, the identified pathogenic mutations result in loss-of-function or missense changes. However, assessing the pathogenicity of noncoding variants can be challenging. Here, we characterize two unrelated patients with a distinct presentation of dyserythropoietic anemia and other impairments in hematopoiesis associated with an intronic mutation in GATA1 that is 24 nucleotides upstream of the canonical splice acceptor site. Functional studies demonstrate that this single-nucleotide alteration leads to reduced canonical splicing and increased use of an alternative splice acceptor site that causes a partial intron retention event. The resultant altered GATA1 contains a five–amino acid insertion at the C-terminus of the C-terminal zinc finger and has no observable activity. Collectively, our results demonstrate how altered splicing of GATA1, which reduces levels of the normal form of this master transcription factor, can result in distinct changes in human hematopoiesis.
Introduction
While hematopoiesis is arguably one of the best-understood paradigms of cellular differentiation in physiology, many facets of this process remain to be characterized. Genetic blood disorders provide an opportunity to learn more about hematopoiesis, even in cases where the mutated gene has been previously well studied, since allelic variation can provide novel biological insights (Casanova et al., 2014; Basak et al., 2015; Kuehn et al., 2016; Polfus et al., 2016; Kim et al., 2017; Zmajkovic et al., 2018). Advances in sequencing technologies have enabled rapid and efficient mutation identification, but deciphering the pathogenicity of noncoding variation can be immensely challenging. Noncoding transcriptional regulatory elements can be interrogated using a number of functional approaches, including the use of genome editing and exogenous assays of regulatory function (Ulirsch et al., 2016; Wakabayashi et al., 2016; Gasperini et al., 2017). However, deciphering the pathogenicity of cryptic splicing mutations can present substantial challenges (Rosenberg et al., 2015; Cummings et al., 2017).
Here we have identified two unrelated patients with a unique form of dyserythropoietic anemia that is associated with other abnormalities in hematopoiesis. Both of the patients harbor identical intronic mutations in GATA1. Through functional studies, we show that this single-nucleotide change leads to altered splice acceptor usage, along with a decrease in canonical GATA1 splicing and expression, thereby resulting in the disease phenotype. This mutation disrupts the activity of the U2 splicing complex in the last intron of GATA1. While this complex is also mutated in more commonly observed cases of myelodysplastic syndrome (MDS), which can be attributable to somatic mutations in this splicing regulatory complex itself (Yoshida et al., 2011; Sperling et al., 2017), we were not able to identify a connection between the altered splicing observed in these two cases of dyserythropoietic anemia and more common cases of MDS. Nonetheless, our study illustrates how decreasing the splicing efficiency of the hematopoietic master regulator GATA1 can impair selective aspects of human hematopoiesis and lead to distinct phenotypes in comparison with other pathogenic mutations affecting the same gene (Sankaran et al., 2012; Campbell et al., 2013; Crispino and Horwitz, 2017).
Results and discussion
In the course of studying a cohort of patients with rare genetic blood disorders, we encountered two patients who both had a distinct form of dyserythropoietic anemia that necessitated transfusions in the intrauterine period and/or in early infancy with subsequent evolution into a milder anemia that was noted to worsen in the setting of intercurrent illnesses. Bone marrow evaluation of the patients revealed moderate dyserythropoiesis, along with frequent, small hypolobated megakaryocytes and occasional dysplastic myeloid cells (Figs. 1 A and S1 A). Platelet function testing in patient 1, who was noted to have clinical bleeding, revealed specific impairments, including defective aggregation and α-/δ-granule secretion (as indicated by decreased expression of P-selectin/CD62 and LAMP-3/CD63), which are suggestive of defective thrombopoiesis (Figs. 1 B and S1 B). Patient 2 was also noted to have abnormal platelet function testing. Further clinical details of the two patients are described in Materials and methods. To assess the etiology of the impaired hematopoiesis in these patients, whole-exome sequencing (WES) and targeted mutation analysis were performed. No known blood disorder–associated mutations were identified (Materials and methods), but both patients were noted to have a unique mutation in the fifth intron of GATA1 (chrX:48,652,176 C>T in hg19) that was carried in the mothers of these patients (Figs. 1 C and S1 C). This mutation was absent from the 123,136 exomes and 15,496 genomes in the gnomAD database, despite excellent coverage of this region (Materials and methods).
The identified mutation was located 24 nucleotides upstream of the canonical splice acceptor site in the fifth intron of GATA1. While the mutation was not predicted to disrupt known splicing elements, we hypothesized that it may impair appropriate splicing of this gene. To directly interrogate this, we used a minigene reporter assay (Fig. 2, A and B; Kishore et al., 2008). We found that the mutation reduced normal splicing of this region of GATA1 and promoted an intron retention event of 15 nucleotides involving an alternative splice acceptor site (Fig. 2 B). Semiquantitative RT-PCR analysis revealed a reduction of canonical GATA1 splicing to 42% of normal levels, with the mutant mRNA at 36% of normal levels (Fig. 2 C). Importantly, we could confirm that this altered splicing was present in the patient samples but was found at significantly lower levels in healthy controls (Figs. 2 D and 3, A and B). By sequencing, we confirmed that the patients had an identical intron retention event of the same 15 nucleotides noted in the minigene assay. We did not identify any major variation in GATA1 splicing over the course of erythroid differentiation of primary adult CD34+ hematopoietic stem and progenitor cells (HSPCs; Hu et al., 2013; Ulirsch et al., 2016; Figs. 2 E and 3), suggesting that the significantly altered splicing observed in these cases was attributable to the distinctive pathogenic variants. Importantly, we note that this altered splice variant is found in only 1.9–4.6% of GATA1 transcripts during normal human erythropoiesis, emphasizing its minor contribution to total GATA1 mRNA levels (Fig. 3 B).
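The semiquantitative percentages reported above can be illustrated with a minimal sketch. It assumes that each band is expressed relative to the canonical band in the wild-type lane, which is our reading of "of normal levels" rather than a stated detail of the analysis. The intensity values and the helper name `percent_of_normal` are hypothetical and were chosen only to reproduce the reported 42% and 36%.

```python
def percent_of_normal(band_intensity, reference_intensity):
    """Express a band intensity as a percentage of a reference band."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return 100.0 * band_intensity / reference_intensity


# Hypothetical background-subtracted band intensities (arbitrary units).
# These are illustrative numbers, not measurements from the study.
wild_type_lane = {"canonical": 1000.0, "alternative": 20.0}
mutant_lane = {"canonical": 420.0, "alternative": 360.0}

# Assumption: the canonical band of the wild-type lane defines "normal".
reference = wild_type_lane["canonical"]
canonical_pct = percent_of_normal(mutant_lane["canonical"], reference)
alternative_pct = percent_of_normal(mutant_lane["alternative"], reference)

print(f"canonical transcript: {canonical_pct:.0f}% of normal")
print(f"alternative transcript: {alternative_pct:.0f}% of normal")
```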
We next sought to investigate the mechanisms underlying the altered splicing caused by this pathogenic mutation. Interestingly, the observed alternative splice acceptor site usage with resultant intron retention (Fig. 2 F) was similar to what is frequently observed in SF3B1-mutated MDS, where there is alternative splice acceptor site usage in a number of transcripts between ∼15 and 25 nucleotides upstream of the canonical splice acceptor site (DeBoever et al., 2015; Obeng et al., 2016). Given the phenotypic similarity between MDS and the disordered hematopoiesis observed in these patients, as well as the likely involvement of the U2 splicing complex in the region where this mutation resided, we tested whether the commonly observed SF3B1 K700E mutant, which alters U2 splicing activity, would further perturb the altered splicing caused by this intronic GATA1 mutation. Using the minigene assay described above, we found that expression of the SF3B1 K700E mutant markedly impaired canonical splicing of the mutant GATA1 and led to near-complete alternative splice acceptor usage (Fig. 2 G). This finding suggests that the observed germline mutation in these patients specifically alters the activity of the U2 splicing complex in this region of GATA1. Given the involvement of this complex in MDS, we examined whether SF3B1-mutated MDS cases (with a diagnosis of refractory anemia with ringed sideroblasts) may have altered splicing or levels of GATA1, but we did not observe consistent defects suggestive of altered splicing or reduced levels of GATA1 mRNA, even in sorted stage-matched bone marrow populations (Fig. S2). This finding is not surprising, given the minimal effect of the SF3B1 K700E mutation on splicing of the wild-type form of GATA1 in the minigene assay discussed above (Fig. 2 G). Nonetheless, given the phenotypic and molecular connections, we felt it worthwhile to explore possible connections between these conditions.
To gain further insight into the mechanism by which the GATA1 intronic mutant disrupts human hematopoiesis, we examined whether the protein produced from the intron retention event, which would add five additional amino acids at the C-terminus of the C-terminal zinc finger of GATA1, may have altered activity (Fig. 4 A). In human embryonic kidney 293T cells, we found that both RNA levels (as inferred through the expression of a linked GFP molecule that is translated from an internal ribosome entry site on the same transcript and through direct measurement) and protein levels (as measured directly) of this mutated cDNA were stable and similar to what is observed with wild-type GATA1 cDNA (Fig. 4, B and C). Moreover, in the mouse Gata1-null erythroid cell line, G1E, we found similar expression of the mutated and wild-type proteins (Fig. 4 D). Exogenous expression of the mutated cDNA in primary HSPCs induced to undergo erythroid differentiation revealed that the mutated form had little effect on promoting precocious erythroid differentiation, as occurs with the wild-type cDNA, and importantly there was no dominant-negative activity observed in this setting where wild-type GATA1 protein is present (Fig. 4 E; Ludwig et al., 2014; Giani et al., 2016; Wakabayashi et al., 2016). To directly assess whether the protein formed by this mutation is inactive or hypomorphic, we complemented the Gata1-null G1E cell line with either the wild-type or mutant cDNA (Rylski et al., 2003; Campbell et al., 2013). While the wild-type cDNA robustly promoted erythroid differentiation in this setting based on Ter119 marker induction, the mutant form failed to do so, suggesting that the mutated protein produced from the intron retention event results in loss of function (Figs. 4 F and S3).
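The five-amino acid insertion follows from reading-frame arithmetic on the 15 retained nucleotides. The sketch below makes that step explicit. The function name `insertion_effect` is hypothetical, and the calculation assumes that the retained sequence contains no in-frame stop codon and is inserted at a codon boundary.

```python
def insertion_effect(inserted_nucleotides):
    """Classify an insertion into coding sequence by its reading-frame effect.

    Assumes the inserted sequence has no in-frame stop codon and sits at a
    codon boundary, so a length divisible by three adds whole codons only.
    """
    if inserted_nucleotides < 0:
        raise ValueError("insertion length cannot be negative")
    codons, remainder = divmod(inserted_nucleotides, 3)
    if remainder == 0:
        return {"in_frame": True, "added_amino_acids": codons}
    # A length not divisible by three shifts the downstream reading frame.
    return {"in_frame": False, "added_amino_acids": None}


retained_nucleotides = 15  # size of the intron retention event described above
effect = insertion_effect(retained_nucleotides)
print(effect)
```

Because 15 is a multiple of 3, the downstream reading frame is preserved, which is consistent with a stable protein carrying a short insertion rather than a truncated product.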
To globally assess the potentially altered transcriptional activity of the mutant splice variant of GATA1, we infected G1E cells with either the wild-type or mutant cDNA and conducted RNA sequencing (RNA-seq) 72 h after infection. While wild-type GATA1 either activated or repressed a large number of canonical target genes (Fig. 5, A, E, and F), expression of the mutant form caused little change compared with the control lentivirus (Fig. 5, B, E, and F). Importantly, expression of the cDNAs was at comparable levels (Fig. 5, C and D). Interestingly, we did note some subtle repression of select GATA1-repressed genes by the mutant form (cluster K3, Fig. 5 F). To investigate this further, we assessed GATA1 chromatin occupancy near the transcription start sites (TSSs) of these genes as measured by chromatin immunoprecipitation sequencing (ChIP-seq; ENCODE Project Consortium, 2012) but noted very little occupancy in this set of genes, in contrast to the majority of GATA1-activated genes (Fig. 5, F and G, cluster K1 vs. K3). Together with the data shown above from the interrogation of splicing, these results demonstrate that the impaired hematopoiesis in these cases emerges due to reduced expression of wild-type GATA1 to ∼40% of normal levels with the production of an inactive mutant protein due to an intron retention event.
Our findings have several important and broad implications. While determining the functional consequences of cryptic splicing mutations can be challenging, we illustrate how, by using a series of functional assays, the mechanisms underlying such mutations can be more fully understood. Exactly how the activity of the U2 splicing complex is altered in these cases remains uncertain, and given the lack of consensus binding sequences, no clear-cut mechanism is apparent. Given the finding of interactions between a mutant form of SF3B1 commonly observed in MDS cases and the germline mutation we identified, as well as the phenotypic similarity between these disorders, it is interesting to speculate about potential connections between these observations. We have not been able to demonstrate aberrant GATA1 mRNA splicing or expression in MDS cases in our analyses. Nonetheless, given the phenotypic similarities between this form of dyserythropoietic anemia and MDS, the reported perturbation of GATA1 protein levels in MDS patient samples (Frisan et al., 2012), and the myelodysplasia seen in mouse models with reduced Gata1 levels (McDevitt et al., 1997; Takahashi et al., 1998), this will be an important area for further investigation.
Our findings also demonstrate how by reducing the overall levels of GATA1, a distinct defect in human hematopoiesis can emerge. Importantly, the observed hematologic phenotype in the two patients described here is different from other cases of GATA1-mutated blood phenotypes. Missense mutations in GATA1 can result in other forms of dyserythropoiesis, thalassemia, and a variety of thrombopoietic defects (Crispino and Horwitz, 2017). Lack of the full-length form of GATA1 with continued production of the short isoform can cause Diamond-Blackfan anemia (DBA; Sankaran et al., 2012). We have recently found that impaired translation of GATA1 in early hematopoietic progenitors underlies the most commonly observed cases of DBA due to ribosomal protein mutations (Ludwig et al., 2014; Khajuria et al., 2018). Moreover, such aberrant translation of GATA1 may also occur in other blood disorders, such as myelofibrosis (Gilles et al., 2017). The results we describe in this paper extend the spectrum of GATA1-related disease and show how distinct alleles in a single key regulator of hematopoiesis can cause pleiotropic phenotypes, illuminating the numerous functions of GATA1 in human hematopoiesis.
Materials and methods
Patients and family
Patient 1 was a boy born at full term to healthy unrelated parents from Bulgaria (mother) and Togo (father). On the first day of life, anemia with a hemoglobin (Hb) of 9.4 g/dL and concomitant jaundice with elevated lactate dehydrogenase were noted. Besides second-degree hypospadias, no syndromic features were present. The patient required transfusions at 6 wk and 3 mo of age because of Hb decreases to the 5-g/dL range with an inadequate reticulocyte response and lower gastrointestinal bleeding. Subsequently, the child maintained a stable Hb of 8–10 g/dL with persistent red cell macrocytosis and a few giant platelets. At the age of 3.5 yr, a respiratory tract infection with febrile seizure occurred with transiently worsened anemia (Hb 6.3 g/dL, reticulocytes at 18%) and a mild thrombocytopenia (133,000 cells/µl). The patient had an elevated lactate dehydrogenase with no signs of active hemolysis. Extensive evaluations performed during and between the acute episodes for nonimmune hemolytic anemias; infectious, metabolic, and autoimmune disorders; and bone marrow failure syndromes (including mitomycin C–induced DNA damage testing and telomere length measurements) were unrevealing. An erythrocyte adenosine deaminase level was elevated at 2.79 U/g Hb. Multiple bone marrow examinations revealed normal cellularity with signs of dyserythropoiesis with macrocytic and occasional megaloblastic differentiation, mild dysplasia of the megakaryocytic lineage with hypolobated forms, and hypogranulated neutrophils. During infection-free periods, the Hb remained stable at levels of 9–10 g/dL with macrocytosis (MCV ∼100 fl) and a persistently elevated fetal Hb (∼20–23%), while the platelet counts remained in the low normal range. At the age of 7 yr, when presenting with Mycoplasma pneumonia, there was severe epistaxis and moderate thrombocytopenia (45,000 cells/µl), as well as platelet function abnormalities that persisted even after the platelet counts normalized.
Platelet agglutination/aggregation was severely impaired after stimulation with ristocetin, collagen, ADP, and epinephrine. Severely decreased expression of CD62 and CD63 was consistent with α-granule and δ-granule secretion defects, respectively. The family medical history was unremarkable with the exception of a maternal first cousin with a chronic anemia of unclear etiology. When examined, the mother was noted to have normal peripheral blood counts. Only limited samples from the healthy mother could be obtained.
Patient 2 was a boy born to unrelated parents of African and Asian ethnicity. The child was diagnosed antenatally with severe anemia (Hb 2.4 g/dL) leading to hydrops fetalis, which was treated with intrauterine transfusion at 22 wk gestation with good response. The child was born by normal vaginal delivery at 40 wk gestation with no obvious dysmorphic features. The family history was unremarkable with the exception of a maternal first cousin with thalassemia trait. The patient required two further red cell transfusions at week 4 and week 8 of life for anemia (Hb 6–7 g/dL with normal WBC and platelet counts). Subsequently, he maintained a stable Hb of 8–10 g/dL and a raised MCV (∼100 fl), with transient worsening of anemia associated with febrile illnesses. A bone marrow examination performed at 18 mo of age showed normal cellularity with mild trilineage dysplasia with signs of dyserythropoiesis and small dysplastic megakaryocytes. Investigations performed for hemolytic anemias, infections, and bone marrow failure syndromes (diepoxybutane chromosome fragility test, RPS19 mutation analysis, MDS and juvenile myelomonocytic leukemia mutation/cytogenetic screen) were unremarkable. He had marginally elevated HbF (6.5%) and erythrocyte adenosine deaminase (123) levels. His platelet counts remained in the low normal range with large platelets seen on the blood smear, and because of a history of prolonged bleeding after circumcision and recurrent mild epistaxis associated with respiratory infections, platelet function tests were done at the age of 6 yr. These showed abnormal platelet function with reduced aggregation after stimulation with low-dose collagen (1 µg/ml) and ADP (2 µM and 5 µM). ADP release was reduced on nucleotide studies, hinting at a δ-granule secretion defect. A repeat bone marrow examination done at the age of 7 yr because of falling platelet counts (lowest 113,000 cells/µl) showed no new changes.
Subsequently his Hb and platelet counts have remained stable at 9–10 g/dL and 115,000–160,000 cells/µl, respectively. He remains clinically well with occasional mild epistaxis and bruising after injury. The mother of the patient had a reportedly normal peripheral blood count, but we were unable to obtain samples from this individual for further study.
Study approval
All family members provided written informed consent to participate in this study. The institutional review boards of Boston Children’s Hospital, Massachusetts Institute of Technology, the University of Michigan, and the University of Freiburg approved the study protocols.
Platelet aggregometry and flow cytometry
Platelet agglutination/aggregation was analyzed using the following agonists: ristocetin (1.2 mg/ml), collagen (2.0 µg/ml), ADP (4.0 μmol/l), and epinephrine (8 μmol/l), as described previously (Born and Cross, 1963). For flow cytometric analyses, diluted platelet-rich plasma (5 × 10⁷ platelets/ml) was stimulated with different concentrations of thrombin (0.025–1.0 U/ml) in the presence of 1.25 mM Gly-Pro-Arg-Pro. Platelets were stained with monoclonal anti-CD62P (CLB-thromb/6-FITC) and anti-CD63 (CLB-gran/12-FITC) antibodies and analyzed by flow cytometry as previously described (Lahav et al., 2002).
WES and related genetic analyses
The patients described in this paper are part of a rare blood disorder cohort that has been studied through the use of WES. WES was performed as previously described (Polfus et al., 2016; Kim et al., 2017; Khajuria et al., 2018; Ulirsch et al., 2018). WES in these cases was performed using genomic DNA obtained from peripheral blood samples of the patients. The resultant variant call file (in hg19 coordinates) was annotated with VEP v89 (McLaren et al., 2016) and rare variants (based on ExAC v0.3.1 and gnomAD r2.0.2; Lek et al., 2016; http://gnomad.broadinstitute.org/) were identified using a combination of the Genome Analysis Toolkit, Bcftools, and Gemini (McKenna et al., 2010; Li, 2011; Paila et al., 2013). No rare (<0.01% allele frequency in ExAC and gnomAD) loss-of-function or missense variants were identified in any known red blood cell disorder genes (ANK1, SPTB, SPTA1, SLC4A1, EPB42, EPB41, PIEZO1, KCNN4, GLUT1, G6PD, PKLR, NT5C3A, HK1, GPI, PGK1, ALDOA, TPI1, PFKM, ALAS2, FECH, UROS, CDAN1, SEC23B, KIF23, KLF1, GATA1, HBB, HBA1, HBA2, RPS7, RPS10, RPS15A, RPS17, RPS19, RPS20, RPS24, RPS26, RPS27, RPS28, RPS29, RPS31, RPL5, RPL11, RPL15, RPL18, RPL26, RPL27, RPL35, RPL35A, TSR2, and EPO), including those known to cause dyserythropoietic anemia. We next expanded our search to include all rare variants in these genes and noted chrX:48,652,176 C>T in the fifth intron of GATA1, which was harbored by these two patients but absent from ExAC and gnomAD. The variant was hemizygous in both patients and carried in both mothers, suggesting a model of complete penetrance (the mother of patient 1 was studied by exome sequencing, while the mother of patient 2 had sequence analysis by Sanger sequencing). All mutations were confirmed from genomic DNA samples of the patients or family members by Sanger sequencing.
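The rare-variant filter described above can be sketched in plain Python. This is an illustration of the logic only and is not the Genome Analysis Toolkit, Bcftools, or Gemini workflow used in the study. The record fields (`gene`, `exac_af`, `gnomad_af`), the function names, and the two placeholder variants are hypothetical; only the GATA1 position and the 0.01% threshold come from the text.

```python
RARE_AF_THRESHOLD = 0.0001  # 0.01% allele frequency, written as a fraction


def is_rare(variant, threshold=RARE_AF_THRESHOLD):
    """Return True if the variant is below threshold in both databases.

    A frequency of None is treated as "absent from that database".
    """
    frequencies = (variant.get("exac_af"), variant.get("gnomad_af"))
    return all(f is None or f < threshold for f in frequencies)


def candidate_variants(variants, genes):
    """Keep rare variants that fall in the gene list of interest."""
    return [v for v in variants if v["gene"] in genes and is_rare(v)]


# Illustrative records: the first is the variant reported in the text,
# the other two are placeholders invented for this example.
variants = [
    {"gene": "GATA1", "pos": "chrX:48652176", "ref": "C", "alt": "T",
     "exac_af": None, "gnomad_af": None},
    {"gene": "HBB", "pos": "placeholder-1", "ref": "A", "alt": "G",
     "exac_af": 0.05, "gnomad_af": 0.04},
    {"gene": "TTN", "pos": "placeholder-2", "ref": "G", "alt": "A",
     "exac_af": None, "gnomad_af": None},
]

hits = candidate_variants(variants, genes={"GATA1", "HBB"})
for hit in hits:
    print(hit["gene"], hit["pos"], hit["ref"] + ">" + hit["alt"])
```

Dropping the restriction to loss-of-function and missense consequences, as in the expanded search described above, is what allows an intronic variant to pass a filter of this kind.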
Minigene assay and splicing analysis
The GATA1 region flanking exon 5 and exon 6 was PCR amplified from genomic DNA from patient 1 and from a healthy unrelated individual with the addition of XhoI and NotI restriction enzyme sites (primers are as follows: GATA1 minigene forward, 5′-ATCATCCTCGAGTCTTGGGTCCTCCTGACATC-3′ and GATA1 minigene reverse, 5′-ATCATCGCGGCCGCCACATGGTCACACATTGCAG-3′) for cloning into the pSpliceExpress vector (Addgene; Kishore et al., 2008). The constructs were transfected into human embryonic kidney 293T cells (ATCC) with FuGENE (Promega). RNA was isolated 48 h after transfection using an RNeasy Mini kit (Qiagen) and reverse transcribed. RT-PCR was then performed, and amplified fragments were cloned using the TOPO TA Kit (Thermo Fisher Scientific) to enable confirmation of sequences through Sanger sequencing. For RT-PCR analyses, the segment of interest in GATA1 was amplified with the following primers (GATA1 Exon 5, 5′-AGTGGGGATCCCGTGTG-3′ and GATA1 Exon 6, 5′-ATCCTTCCGCATGGTCAGT-3′). Amplified products were linearized with 1× TBE-Urea (Bio-Rad), incubated for 10 min at 95°C, run on a 10% TBE-Urea Gel (Bio-Rad), and subsequently visualized on a GelDoc instrument (Bio-Rad) after 15 min of staining in ethidium bromide/1× TBE buffer. For semiquantitative analyses, bands were quantified from scanned images using ImageJ software (National Institutes of Health).
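As a quick consistency check on the minigene cloning primers listed above, the following sketch splits each primer into its 5′ overhang, restriction site, and gene-specific portion. The recognition sequences for XhoI (CTCGAG) and NotI (GCGGCCGC) are standard, and the primer sequences are copied from the text. The helper `split_primer` is a hypothetical name introduced for illustration.

```python
XHOI_SITE = "CTCGAG"    # XhoI recognition sequence
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence

# Minigene primers as listed in the text (5' to 3').
FORWARD_PRIMER = "ATCATCCTCGAGTCTTGGGTCCTCCTGACATC"
REVERSE_PRIMER = "ATCATCGCGGCCGCCACATGGTCACACATTGCAG"


def split_primer(primer, site):
    """Split a primer into (5' overhang, restriction site, gene-specific part)."""
    start = primer.find(site)
    if start < 0:
        raise ValueError(f"site {site} not found in primer")
    end = start + len(site)
    return primer[:start], primer[start:end], primer[end:]


forward_parts = split_primer(FORWARD_PRIMER, XHOI_SITE)
reverse_parts = split_primer(REVERSE_PRIMER, NOTI_SITE)

for name, parts in (("forward", forward_parts), ("reverse", reverse_parts)):
    overhang, site, specific = parts
    print(f"{name}: overhang={overhang} site={site} "
          f"gene-specific={specific} ({len(specific)} nt)")
```

Both primers carry the same six-nucleotide 5′ overhang ahead of the restriction site, followed by a 20-nucleotide gene-specific sequence.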
Cell culture and lentiviral transduction
Human primary adult bone marrow–derived CD34+ HSPCs were obtained from Fred Hutchinson Cancer Research Center and cultured as previously described (Hu et al., 2013; Ludwig et al., 2014). The wild-type or mutated (involving a five–amino acid insertion corresponding with the observed intron retention event) GATA1 cDNA constructs were cloned into the HMD vector and transfected into 293T cells for lentiviral production, along with helper plasmids, as described previously (Ludwig et al., 2014). The primary HSPCs undergoing erythroid differentiation were infected with lentiviruses on day 2 of differentiation. Multiple donors were used in different experiments, which all yielded similar results. On day 5 of differentiation, cells were sorted for GFP and maintained in culture for further terminal erythroid differentiation (Ulirsch et al., 2016). Differentiating cells were collected on days 7, 9, and 12 for flow cytometric analysis with erythroid markers CD235a (HIR2; BD Biosciences), CD71 (OKT9; Invitrogen), and CD49d (9F10 or MZ18-24A9; BioLegend; Miltenyi) using a Fortessa Instrument (BD Biosciences). Cells were gated on live events using PI. G1E cells were kindly provided by the laboratory of Dr. Mitchell Weiss (St. Jude Children's Research Hospital, Memphis, TN) and cultured with 15% FBS, 1-thioglycerol, murine stem cell factor (Peprotech), and human erythropoietin (Amgen). G1E cells were infected with the lentiviruses discussed above and were either subjected to flow cytometric analysis with the mouse erythroid marker Ter119 and GFP using an Accuri cytometer (BD Biosciences) or sorted and subjected to downstream analyses. Lentiviruses expressing SF3B1 wild-type or K700E mutants, as described (Obeng et al., 2016), were used to infect 293T cells 1 d before transfection, as above for the minigene assay. Cytocentrifugation was performed with ∼100,000–200,000 cells using a Shandon Cytospin 4 onto polylysine-coated slides at 300 rpm for 4 min.
After drying, samples were stained with May-Grünwald for 5 min and then with Giemsa for 15 min, as previously described (Fiorini et al., 2017). Images were taken using a Nikon Eclipse E800 microscope and Nikon Elements Software.
Western blotting
Approximately 1 million G1E cells were collected 72 h after infection and lysed in radioimmunoprecipitation assay buffer supplemented with protease and phosphatase inhibitor cocktails (Santa Cruz). Proteins were quantified using the DC Protein assay (Bio-Rad), run using a Mini-Protean TGX gel system (Bio-Rad), and transferred onto a polyvinylidene fluoride membrane. Western blotting was performed using GATA1 (M20) and GAPDH (6C5) primary antibodies at 1:1,000 dilution (Santa Cruz). An HRP-conjugated goat anti-mouse antibody was used at 1:20,000 dilution as a secondary antibody (Bio-Rad). Signal was detected by ECL on Amersham Hyperfilm (GE Healthcare). For the 293T cells, a GAPDH antibody (0411; Santa Cruz) was used.
G1E RNA-seq
Cells were lysed in RLT lysis buffer (Qiagen) supplemented with β-mercaptoethanol, and RNA was isolated using an RNeasy Mini Plus kit (Qiagen) according to the manufacturer’s instructions. RNA was quantified using a NanoDrop OneC (Thermo Fisher Scientific). 30 ng of RNA was used as input to a modified SMART-seq2 (Picelli et al., 2014) protocol, and after reverse transcription, eight cycles of PCR were used to amplify the transcriptome library. The quality of whole-transcriptome libraries was validated using a High Sensitivity DNA Chip run on a Bioanalyzer 2100 system (Agilent), followed by library preparation using the Nextera XT kit (Illumina) and custom index primers according to the manufacturer’s instructions. Final libraries were quantified using a Qubit dsDNA HS Assay kit (Invitrogen) and a High Sensitivity DNA chip run on a Bioanalyzer 2100 system.
Sequencing
All libraries were sequenced using Nextseq High Output Cartridge kits and a Nextseq 500 sequencer (Illumina). Libraries were consistently sequenced paired end (2 × 38 bp) and demultiplexed using the bcl2fastq program.
Alignment and differential expression
For each library, raw fastq reads were aligned to the GRCm38.p4 reference genome using STAR version 2.5.1b (Dobin et al., 2013) in 2-Pass mode, and gene expression counts were generated using quantMode in STAR. The human GATA1 cDNA sequences (wild type and mutant) in the HMD vector, as described above, were appended as an additional chromosome. Differential expression analysis was performed using DESeq2 (Love et al., 2014), and gene-wise statistical significance was determined using independent hypothesis weighting with a significance threshold of 0.01 (Ignatiadis et al., 2016).
Clustering
A consistent set of features for the RNA-seq data was established by choosing the top differentially expressed genes between control (HMD) and GATA1 wild-type G1E cells (p.adj < 0.01, log2FC > 2, n = 223). A top-genes-by-sample matrix of log2CPM values was created and transformed to Z scores. The optimal cluster number (3) was determined by NbClust, clusters were computed by K-means with three centers, and Z scores were plotted using the ComplexHeatmap R package (Gu et al., 2016).
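The feature selection and Z-score transformation described above can be sketched as follows. This is a simplified Python illustration and not the R code used for the analysis. The gene names, statistics, and expression values are hypothetical, the cutoffs are applied literally as written (p.adj < 0.01, log2FC > 2), and the use of the sample standard deviation for row scaling is an assumption. The NbClust and K-means steps are not reproduced here.

```python
from statistics import mean, stdev


def select_top_genes(results, padj_cutoff=0.01, lfc_cutoff=2.0):
    """Return genes passing the adjusted P value and log2 fold-change cutoffs."""
    return sorted(
        gene for gene, stats in results.items()
        if stats["padj"] < padj_cutoff and stats["log2fc"] > lfc_cutoff
    )


def row_z_scores(values):
    """Scale one gene's expression values to mean 0 and unit sample SD."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical differential expression results (control vs. wild-type GATA1).
de_results = {
    "geneA": {"padj": 0.001, "log2fc": 3.5},
    "geneB": {"padj": 0.200, "log2fc": 4.0},   # fails the P value cutoff
    "geneC": {"padj": 0.005, "log2fc": 1.2},   # fails the fold-change cutoff
}

# Hypothetical log2CPM values, one column per sample.
log2_cpm = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [5.0, 5.0, 5.0],
    "geneC": [2.0, 4.0, 6.0],
}

top_genes = select_top_genes(de_results)
z_matrix = {gene: row_z_scores(log2_cpm[gene]) for gene in top_genes}
print(top_genes)
print(z_matrix)
```

Scaling each gene across samples puts strongly and weakly expressed genes on a common scale, so clustering reflects the pattern of change across samples and not absolute expression level.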
Principal components analysis
Principal components of the G1E RNA-seq experiment were computed using the rlog transformation of the gene count matrix and plotted using the plotPCA function of the DESeq2 package.
ChIP-seq analysis
ChIP-seq data were obtained from the NCBI Gene Expression Omnibus (GEO; accession no. GSE36029) and aligned using Bowtie2 version 2.3.3 (Langmead and Salzberg, 2012). The 30-h 17β-estradiol–treated G1E-ER4 cell sample (GSM995448) was used, as it is most representative of our experiment. The GATA1 binding profiles were determined for the top genes identified in the clustering analysis in a ±3-kb window around the canonical isoform TSSs, and Z scores were plotted as a heatmap using ComplexHeatmap.
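The ±3-kb window around each TSS can be illustrated with a short sketch. The function names, the TSS coordinate, and the read positions below are hypothetical, and counting read positions inside the window is a simplification of how a binding profile would be built from aligned ChIP-seq data.

```python
def tss_window(tss, flank=3000, chrom_length=None):
    """Return (start, end) of a symmetric window around a TSS.

    The window is clipped at position 0 and, if provided, at chrom_length.
    """
    start = max(0, tss - flank)
    end = tss + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def count_reads_in_window(read_positions, window):
    """Count read positions that fall inside the window (inclusive bounds)."""
    start, end = window
    return sum(1 for position in read_positions if start <= position <= end)


# Hypothetical TSS coordinate and read positions on one chromosome.
example_tss = 10_000
example_reads = [6_500, 7_000, 9_990, 10_010, 13_000, 13_001]

window = tss_window(example_tss)
reads_in_window = count_reads_in_window(example_reads, window)
print(window, reads_in_window)
```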
Human erythropoiesis RNA-seq analysis
Single-end RNA-seq datasets for five distinct maturation stages of primary human erythroid cells in biological triplicates were obtained from the NCBI GEO database (accession no. GSE53983; An et al., 2014). These datasets were aligned to the Ensembl GRCh37 r75 genome and gene annotations using 2-Pass STAR alignment. Resultant alignments were imported into R and visualized with Gviz.
Analysis of MDS patient samples
Cryopreserved bone marrow mononuclear cells from MDS patients with refractory anemia with ringed sideroblasts and healthy controls were thawed, washed, and recovered before staining with fluorescently conjugated antibodies targeting CD235a (HIR2), CD71 (OKT9), and CD34 (4H11). SF3B1 mutation status had been determined and confirmed in other investigations on these samples and was not reassessed specifically for this study. Flow cytometric cell sorting for CD235a+CD71+ and CD34+ was performed using a FACSAria II Cell Sorter (BD Biosciences). After sorting, the cells were washed, and RNA was isolated using the Ambion small RNA extraction kit (Zymo). cDNA was synthesized from the recovered RNA samples (20–40 ng) using the iScript cDNA Kit (Bio-Rad). RT-PCR was performed with GATA1 primers and normalization controls (primers GATA1 Exon 5, 5′-AGTGGGGATCCCGTGTG-3′, GATA1 Exon 6, 5′-ATCCTTCCGCATGGTCAGT-3′, GAPDH Exon 3, 5′-CACCAGGGCTGCTTTTAACT-3′, and GAPDH Exon 4, 5′-GACAAGCTTCCCGTTCTCAG-3′). Bands were visualized on denaturing gels and quantified as noted above. Samples were also analyzed with quantitative RT-PCR for GATA1 using a CFX96 Real Time Thermocycler (Bio-Rad) with SYBR Green (Bio-Rad; primers GATA1 Exon 2, 5′-CCCCAGTTTGTGGATCCT-3′ and GATA1 Exon 3, 5′-CACAGTTGAGGCAGGGTAGAG-3′, human ACTB Exon 3, 5′-AGAAAATCTGGCACCACACC-3′, and human ACTB Exon 4, 5′-GGGGTGTTGAAGGTCTCAAA-3′).
Statistical analysis
All pairwise comparisons were performed using the two-tailed Student’s t test, unless otherwise indicated. Differences were considered significant if the P value was <0.05.
Data availability
The WES data are available in the dbGaP database (http://www.ncbi.nlm.nih.gov/gap) under the accession no. phs000474. RNA-seq data are available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) under accession no. GSE124830.
Online supplemental material
Fig. S1 shows clinical phenotypes and WES data for affected patients. Fig. S2 shows the GATA1 mRNA levels of SF3B1 MDS patients in sorted marrow populations. Fig. S3 shows cytocentrifuge analysis of G1E cells expressing wild-type or mutated GATA1.
Acknowledgments
We are grateful to the patients and families for their interest and willingness to participate in this study. We thank members of the Sankaran and Wlodarski laboratories, as well as S. Orkin and D. Nathan, for valuable comments and assistance. We thank the ENCODE Consortium and specifically the Hardison laboratory for generating the GATA1 ChIP-seq data in G1E-ER4 cells and providing it to the community as a public resource.
This work was supported by National Institutes of Health grants R01 DK103794 (to V.G. Sankaran), R33 HL120791 (to V.G. Sankaran), R01 HL107558 (to H.T. Gazda), and K02 HL111156 (to H.T. Gazda), as well as the New York Stem Cell Foundation (to V.G. Sankaran). V.G. Sankaran is a New York Stem Cell Foundation-Robertson Investigator.
The authors declare no competing financial interests.
Author contributions: N.J. Abdulhay, C. Fiorini, J.M. Verboon, and L.S. Ludwig performed the research, analyzed data, and wrote the manuscript. J.C. Ulirsch, C.A. Lareau, and X. Mi performed the research and analyzed data. B. Zieger, A. Roy, E.A. Obeng, M. Erlacher, N. Gupta, S.B. Gabriel, B.L. Ebert, C.M. Niemeyer, R.N. Khoriaty, P. Ancliff, H.T. Gazda, and M.W. Wlodarski provided clinical data and experimental reagents. V.G. Sankaran supervised the research, analyzed data, and wrote the manuscript. All authors edited the manuscript.
References
Author notes
N.J. Abdulhay, C. Fiorini, J.M. Verboon, and L.S. Ludwig contributed equally to this paper.