Recent advances in single-cell, transcriptomic profiling have provided unprecedented access to investigate cell heterogeneity during tissue and organ development. In this study, we used massively parallel, single-cell RNA sequencing to define cell heterogeneity within the zebrafish kidney marrow, constructing a comprehensive molecular atlas of definitive hematopoiesis and functionally distinct renal cells found in adult zebrafish. Because our method analyzed blood and kidney cells in an unbiased manner, our approach was useful in characterizing immune-cell deficiencies within DNA–protein kinase catalytic subunit (prkdc), interleukin-2 receptor γ a (il2rga), and double-homozygous–mutant fish, identifying blood cell losses in T, B, and natural killer cells within specific genetic mutants. Our analysis also uncovered novel cell types, including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells. In total, our work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish.
A central challenge in developmental biology is to dissect the hierarchy of stem cells and to understand the functional diversity imparted to their differentiated progeny. Hematopoiesis is a beautiful example of how stem and progenitor cells give rise to mature cell types that nourish tissues by delivery of oxygen, protect the host from viral and bacterial pathogens, and provide clotting factors to stop blood cell loss after wounding (Copley et al., 2012). Zebrafish have emerged as a powerful, experimental tool for uncovering this functional heterogeneity, with fish sharing conserved cell types and underlying molecular programs that regulate hematopoiesis similar to those found in mammals (Dooley and Zon, 2000). Despite clear similarities between zebrafish and mammalian blood cell development, definitive hematopoiesis occurs in a stromal compartment of the adult zebrafish kidney (Davidson and Zon, 2004), as opposed to the bone marrow in the mouse and human. Nonetheless, it is likely that hematopoietic stem and progenitor niche-supporting cells have conserved function and underlying molecular pathways responsible for driving continued blood production in adult fish. To date, the functional and transcriptional gene expression diversity of blood and their surrounding kidney cells has not been defined unbiasedly and at single-cell resolution in zebrafish.
Hematopoietic cell lineages can be commonly identified in mouse and human using cell-surface antibodies. However, that method has not been broadly applied to the zebrafish because of the lack of fish-specific antibodies. Rather, investigators have developed a wide array of fluorescent, transgenic, zebrafish lines that label specific blood cell types (Langenau et al., 2004; Mathias et al., 2006; Renshaw et al., 2006; Ellett et al., 2011; Page et al., 2013). Despite the great utility of fluorescent reporter lines for dynamic, in vivo cell imaging, those approaches often fail to identify the total heterogeneity of blood cell types because of the lack of sufficient transgenic lines and distinguishing fluorophores that are capable of labeling all the major blood cell types simultaneously within the same animal. Building on known transcriptional differences observed among blood cell types, we have previously developed a Fluidigm quantitative PCR platform using 96 PCR primers that can distinguish the major blood-cell lineages (Moore et al., 2016a). Developing that platform required prior knowledge of lineage-restricted marker genes, optimized PCR primer sets, and access to the expensive Fluidigm quantitative RT-PCR equipment. In contrast, Cvejic and colleagues have recently applied Smart-seq RNA, single-cell transcriptional profiling to identify subsets of cd41+ hematopoietic stem and progenitor cells (HSPCs) and thrombocytes (Macaulay et al., 2016) and lck:GFP+ lymphocytes (Carmona et al., 2017). However, to date, unbiased analysis of whole kidney marrow and large-scale RNA sequencing performed on thousands of individual cells from the same fish have not been reported, precluding unbiased identification of cell populations and quantitative assessment of how genetic mutations alter blood cell types within a given animal. In this study, we used droplet-based, single-cell RNA sequencing (Klein et al., 2015; Zilionis et al., 2017) to transcriptionally profile zebrafish blood and kidney cells. Using that approach, we identified novel blood and kidney cell types for which transgenic reporter lines are not yet available, discovered new marker genes for specified cell lineages in an unbiased manner, and characterized immune deficiencies within mutant zebrafish. Finally, we report a data set that can be queried to identify transcript differences within kidney marrow cells (Lareau et al., 2017). Those data will be useful in developing new transgenic reporter lines and in identifying novel markers of specific cell lineages.
Results and discussion
Single-cell RNA sequencing of blood cells from defined transgenic lines
In an effort to resolve hematopoietic cell lineages in adult zebrafish at single-cell resolution, we initially performed single-cell, transcriptomic profiling of cells isolated from fluorescent transgenic lines that label distinct blood-cell types. Specifically, whole kidney marrow and thymic T cells were isolated from adult, transgenic zebrafish using FACS, in which single cells were sorted into 96-well plates. RNA was isolated and made into bar-coded cDNA libraries using the Smart-seq2 method (Picelli et al., 2013). Transgenic lines analyzed by RNA sequencing included Tg(Runx1+23:GFP), which marks HSPCs (Tamplin et al., 2015); Tg(cd41:GFP), which labels both HSPCs and thrombocytes (Lin et al., 2005); Tg(mpx:GFP), which identifies neutrophil/myeloid lineages (Mathias et al., 2006; Renshaw et al., 2006); Tg(rag2:GFP), which marks marrow-derived B cells (Page et al., 2013); and Tg(lck:GFP), which labels thymic T cells (Langenau et al., 2004). We also analyzed cells isolated from the whole kidney marrow of Tg(lck:GFP), rag1-deficient fish, which would be expected to lack mature T cells and yet retain GFP+ NK cells (Shinkai et al., 1992; Wang et al., 1996; Moore et al., 2016a). In total, we profiled 246 single cells from those six transgenic lines, with a median of 1,059 genes detected per cell.
Differential gene-expression analysis across transgenic lines identified transcriptional programs consistent with identification of predicted blood-cell populations labeled within each transgenic line (Fig. 1 A). From that analysis, we identified known lineage-specific genes that were both highly expressed and found in a large fraction of isolated blood cells from each transgenic line (Fig. 1 B and Table S1). t distributed stochastic neighbor embedding (tSNE) analysis (van der Maaten and Hinton, 2008) was then used to visualize transcriptional differences among single cells. As expected, single cells from the same transgenic lineage largely clustered together. For example, tSNE visualization identified distinct clusters for mpx:GFP+ neutrophils, lck:GFP+ T and NK cells, and rag2:GFP+ B cells (Fig. 1 C). In contrast, cells from cd41:GFPlow and Runx1+23:GFP HSPCs failed to cluster into a single, defined group (Fig. 1 C), likely reflecting extensive and unpredicted heterogeneity within those transgenic, labeled cells.
Reconstruction of gene expression trajectories using Monocle and ARIADNE revealed expected delineation of T, NK, and B cells and neutrophil lineages (Fig. 1 D and Fig. S1 A). ARIADNE pseudo-time visualization also identified that Runx1+23:GFP and cd41:GFPlow cells were confined to two major trajectories; both of which expressed the stem and progenitor cell markers for T cell acute lymphocytic leukemia 1 (tal1), GATA binding protein 1a (gata1a), and v-myb avian myeloblastosis viral oncogene homologue (myb; Fig. 1 D and Fig. S1, B–E). One trajectory comprised classically defined HSPCs and expressed Meis homeobox 1 b (meis1b), pre–B cell leukemia homeobox 1a (pbx1a), zinc finger protein zfpm1 (fog1, friend of gata1), and friend leukemia integration 1 transcription factor (fli1a; Fig. 1 D and Fig. S1). In contrast, a second trajectory characterized cells that had lineage skewing toward the erythroid lineage and expressed hemoglobin, α adult 1 (hbaa1), uroporphyrinogen decarboxylase (urod), spectrin β chain (sptd), and aminolevulinate, δ-, synthase 2 (alas2; Fig. 1 D and Fig. S1). These lineage assignments were confirmed using principle components analysis and unbiased hierarchical clustering using the top 20 differentially regulated genes identified in each ARIADNE trajectory (Fig. S1, B and C). Our results suggest a high degree of evolutionary conservation in blood cell hierarchies that are shared with mouse (Nestorowa et al., 2016) and human (Velten et al., 2017). Our results are also consistent with recent paradigm-shifting findings from humans that HSPCs can be lineage committed and that a subset of hematopoietic stem cells in adult marrow can directly specify erythroid cell types (Notta et al., 2016).
Unbiased analysis of whole-kidney marrow cells using InDrops RNA sequencing
Having defined many of the major hematopoietic cell lineages by sequencing highly purified, transgenic-labeled blood cells, we next profiled a larger number of unlabeled single cells isolated from the kidney of WT fish. These experiments sought to construct a cell lineage atlas of blood cells found within the adult zebrafish kidney marrow. To that end, we used a massively parallel transcriptomic method using indexing droplets (InDrops) single-cell RNA sequencing (Klein et al., 2015; Zilionis et al., 2017), which allowed high-throughput processing of thousands of cells from each individual kidney marrow. Specifically, single cells were harvested from the kidney of WT casper-strain zebrafish, reverse-transcribed, and uniquely bar-coded within individual droplets. cDNA from single cells was collectively processed for library construction and sequenced on a next-generation sequencing platform. Using this method, we profiled 3,782 cells from three WT, casper-strain animals, with a median of 811 genes detected per cell.
tSNE analysis was then used to unbiasedly identify molecularly defined clusters of marrow cells. Unbiased hierarchical clustering separated cells into the major hematopoietic cell lineages and also identified seven unique kidney stromal-cell types (Fig. 2, A and B). That analysis effectively made lineage assignments based on expression of well-known hematopoietic marker genes differentially enriched in each cluster. For example, high macrophage receptor with collagenous structure (marco) and macrophage expressed 1, tandem duplicate 2 (mpeg1.2) expression was indicative of the macrophage population, whereas cells in the erythroid cluster expressed hemoglobin subunit β-1 (ba1), hbaa1, and δ-aminolevulinate synthase 2 (alas2).
To confirm cluster-identity assignments, we also used gene signatures unbiasedly discovered from our SmartSeq2 single-cell RNA sequencing of fluorescent, transgenic hematopoietic lineages, projecting the combined expression of the top 20 most highly and frequently expressed genes found within each transgenically defined cell lineage (Table S2). From that analysis, we uncovered well-defined cell clusters that expressed signatures derived from Runx1+23:GFP+ HSPCs, lck:GFP+ T and NK cells, rag2:GFP+ B cells, and mpx:GFP+ neutrophils (Fig. 2, C and D; and Fig. S2). In analyzing genes identified in the overlap of the most highly and frequently expressed genes from the Runx1+23:GFP HSPCs (Table S2), we recognized that many of those genes included markers of the erythroid cell lineage, consistent with our SmartSeq2 RNA sequencing and pseudo-time trajectory analysis. Using the refined gene list identified from the trajectory analysis of classically defined HSPCs, we confirmed that the same tSNE cluster was identified using this signature (Fig. S2, A–C). Additionally, we also analyzed a thrombocyte lineage signature generated from single-cell transcriptional profiling of cd41:GFPhigh cells by Macaulay et al. (2016). Remarkably, that gene signature was also enriched within the classically defined HSPCs and the erythroid-primed HSPCs cluster (Fig. 2 D and Fig. S2 D). Those data raised the intriguing possibility that closely related transcriptional programs exist among classically defined HSPCs, erythroid lineage-committed progenitors, and thrombocytes and may be indicative of direct stem cell commitment into the megakaryocyte/erythroid cell lineage. Importantly, gene expression analysis of cell cycle–regulated genes confirmed that only a small fraction of HSPCs were cycling and identified further heterogeneity within the erythroid and neutrophil cells, likely identifying distinct progenitor and precursor cell states (Fig. S2 I; Tirosh et al., 2016).
Because lck:GFP+ T cell and NK cell signature genes were both expressed within the same tSNE cluster (Fig. 2, C and D), we next performed additional tSNE analysis and visualization within that cell population, which revealed the existence of three molecularly distinct subclusters. Using gene expression signatures discovered from analysis of lck:GFP+ cells from WT thymus or the marrow of rag1-deficient animals, we identified that the largest subcluster comprising 321 cells expressed T cell signature genes, which included CD8a molecule (cd8a), CD4-1 molecule (cd4-1), and ζ-chain (TCR)–associated protein kinase (zap70; Fig. 2 E and Table S3). Furthermore, a smaller subcluster comprised 142 cells and expressed both T and NK cell genes, including thymocyte selection associated (themis), T cell receptor β constant 1 (trbc1), and NK–lysin tandem duplicate 2 (nkl.2), and IL-2 receptor, β (il2rb; Fig. 2 F). Those data suggest that this subcluster comprised NK cells derived from the T cell lineage. Finally, we observed a cluster of 29 molecularly defined cells that failed to express either T or NK cell signature genes identified from sorted lck:GFP+ cells, yet those cells expressed multiple chemokine (C-C motif) ligands (including ccl33.3), perforins (prf1.2 and 1.7), and NK–lysin 3 and 4 (nkl.3 and nkl.4; Fig. 2, E and F; and Table S3). NK–lysins are analogous to mammalian granulysins, which are antimicrobial proteins made by both cytotoxic T lymphocytes and NK cells (Pereiro et al., 2015; Dotiwala et al., 2016). Those data suggest the existence of a second and novel NK-like cell population (NKL cells) that are not derived from lck:GFP+ lymphocytes and yet retain genes expressed in cytotoxic and lytic cell types (Fig. 2 G). In total, our unbiased analysis of single-cell gene expression identified a wide array of expected and novel blood cell types within the marrow of the adult zebrafish.
Characterizing hematopoietic cell deficiencies in mutant zebrafish using InDrops sequencing
Next, we wanted to validate our proposed InDrops hematopoietic lineage identification by comparing single-cell transcriptional profiles isolated from marrow cells of WT and mutant animals. We have previously reported the characterization of DNA–protein kinase catalytic subunit (prkdc) homozygous mutant zebrafish (Moore et al., 2016b). These homozygous prkdcD3612fs zebrafish have deficiencies in non-homologous end joining repair and thus fail to efficiently recombine T and B cell receptors, demonstrating striking diminution of B cells with only a modest reduction in T cell number when assessed by both quantitative real-time PCR analysis and RNA sequencing performed on bulk kidney marrow (Moore et al., 2016b). In this study, we profiled 3,201 single cells harvested from the kidney marrow of three homozygous prkdcD3612fs mutant fish. We observed a 20-fold reduction in B cells in the homozygous prkdcD3612fs mutant fish, whereas the percentage for T cells decreased by only one half (Fig. 3, A and B). prkdc deficiency specifically reduced the number of mature T cells and NK cells, whereas NKL cells were retained in homozygous prkdcD3612fs mutant fish (Fig. 3, E and F).
To assess whether T and NK cell dysfunction could also be assessed using high-throughput single cell RNA sequencing methods, we created zebrafish with truncating mutations in the IL-2 receptor γ a (il2rga) using transcription activator-like effector nuclease (TALEN)-mediated mutagenesis (Fig. S3). In mammals, the IL-2 receptor common γ chain dimerizes with a wide array of IL receptors to drive lymphocyte and NK cell maturation (Kondo et al., 1993, 1994; Russell et al., 1993; Matthews et al., 1995). Thus, deficiencies in this gene result in perturbed T and NK cell development. Indeed, histologic analysis of homozygous il2rgaY91fs zebrafish revealed a dramatic loss of thymic T cells and a decrease in T and NK cell markers in the whole kidney marrow when assessed by quantitative PCR and bulk RNA sequencing (Fig. S3). As would be expected based on mouse and human deficiencies (Puck et al., 1997; Ito et al., 2002), B cells were unaffected in mutant fish (Fig. S3, D and E). Indeed, InDrops sequencing of homozygous il2rgaY91fs mutant zebrafish also revealed a striking reduction in T and NK cell lineages with no overt reduction in B cells (Fig. 3, C and G; n = 2,068 single cells, two fish analyzed). In fact, the percentage of B cells increased relative to other hematopoietic groups in homozygous il2rgaY91fs mutant fish, likely resulting from lineage compensation and shunting of lymphoid precursors into the B cell lineage. Lastly, generation of compound prkdcD3612fs, il2rgaY91fs double-homozygous mutant zebrafish resulted in losses in T, NK, and B cell populations (Fig. 3, D and H; n = 2,276 cells, two fish analyzed). In total, our experiments provide a robust and efficient methodology to unbiasedly identify hematopoietic cell deficiencies in mutant zebrafish, a method likely to be useful for characterizing a wider array of mutant lines in the future.
Dissecting kidney cells at single-cell resolution
The vertebrate kidney has two main evolutionarily conserved functions. One is to remove waste substances from circulation, and the second is to balance osmolarity within a physiologic range (Vize et al., 1997). These functions are performed by highly conserved structures, including the glomerulus, segmented nephron tubules, and collecting duct (Vize et al., 1997). The simplicity of the kidney structure in the zebrafish embryo has propelled the model forward as one of the best for studying kidney development and function (Drummond et al., 1998; Morales and Wingert, 2017). Studies of the adult zebrafish kidney have also uncovered remarkable regenerative capacity and the identification of stem cell populations that generate new nephrons in response to injury (Diep et al., 2011). Remarkably, zebrafish hematopoiesis also takes place in the kidney, where blood cells are anatomically interspersed among a wide variety of kidney cell types; however, to date, molecularly defined kidney cells that maintain HSPC growth, self-renewal, and maturation have not been fully defined. Because we manually dissected whole kidney marrow, large numbers of kidney cells were also included in our single-cell InDrops sequencing, permitting a more-detailed characterization of those cell types in adult fish.
Our initial analysis of zebrafish kidney marrow samples identified distinct tSNE clusters that comprised seven different kidney cell types (Fig. 4 A). As may be expected, blood cell mutants had no overt differences in kidney cell composition when compared with those isolated from WT fish (Fig. 4 A, right). Similar to the analysis of hematopoietic cell lineages, we identified unique gene expression signatures that defined each of these kidney cell clusters (n = 1,699 total kidney cells analyzed; Table S3). The identities of four clusters could be readily assigned by expression of known genes. For example, three kidney clusters comprised epithelial cell types that are highly conserved with mammals (Lee et al., 2015), including (a) the proximal tubule defined by the expression of solute carrier family (slc) genes, (b) the distal tubule defined by expression of chloride channel K (clcnk), homeobox B8a (hoxb8a), and claudin h (cldnh), and (c) the nephron epithelial cells that share gene signatures with both proximal and distal tubules and may represent an intermediate tubule segment (Table S3). We also identified the long-postulated, yet molecularly undefined, mucin-secreting kidney cells. These cells expressed well-known markers of mucin-secreting cells, including oligomeric mucus/gel-forming (muc2.4), anterior gradient 2 (agr2), and intelectin 1 (itln1; Fig. 4, B and C; and Table S3). These cells are likely responsible for secreting a mucin matrix to create immune barriers that curb infection, sharing a similar function to the mucous cells that line the epithelium of the mammalian urogenital tract (Brunskill and Potter, 2012). This is the first transcriptional analysis of this unique cell type in fish, opening new opportunities to further study their biological functions and ontogeny with human kidney and other mucinous cell types.
The identities for the remaining three kidney cell clusters were identified using gene set enrichment analysis (GSEA), from which we uncovered prominent gene signatures for putative kidney stem/progenitor cells, multiciliated cells, and vascular endothelium (Fig. 4 B and Tables S4 and S5). All of the above identified kidney cell types also expressed the expected cell lineage–defining genes (Fig. 4 C). For example, the kidney progenitor cell population expressed known kidney stem/progenitor cell markers shared with the mouse metanephric cap mesenchyme (Davidson, 2008; Kobayashi et al., 2008), including SIX homeobox 2a (six2a), EYA transcriptional coactivator and phosphatase 2 (eya2), odd-skipped related transcription factor 1 (osr1), and transforming growth factor, β receptor II (tgfbr2). These kidney precursor cells were also enriched for genes highly expressed in human CD31− stromal stem cells, yet also contained a large fraction of collagen matrix-associated genes (Fig. 4 B). Our analysis also identified the existence of molecularly distinct, multiciliated kidney cells that reside in the renal tubules to drive fluid movement through the kidney. These cells expressed well-known motor proteins required for cilia motility including tubulin α 7 like (tuba7l), dynein axonemal intermediate chain 1 paralog 1 (dnai1.1), and kinesin family member 17 (kif17; Fig. 4 C). Finally, GSEA MSigDB analysis allowed us to identify vascular endothelial cells that would be a predicted component of the kidney, supporting both healthy kidney cell function and blood cell trafficking from HSPC niches (Fig. 4, B and C). These cells expressed vascular endothelial growth factor receptor kdr-like (kdrl) and cadherin 5 (cdh5; vascular endothelial cadherin), with a subset of cells expressing cxcl8a/8b, a known chemokine required for sustaining HSPC maintenance and function (Blaser et al., 2017). In total, our work uncovered new kidney cell lineages not previously profiled transcriptionally and identified new marker genes for well-known cell types, opening new and exciting opportunities to study cell-type–specific roles in regulating kidney growth, regeneration, and niche support of hematopoietic stem cells.
Our work defined the single-cell heterogeneity of the zebrafish adult blood marrow and kidney cells at an unprecedented scale using both Smart-Seq2 and InDrops RNA sequencing approaches, providing a much-needed resource for the community to identify novel cell types and gene expression signatures that can be used as markers that define specific cell lineages in zebrafish. These data can be queried using a user-friendly website, defining InDrops gene expression within defined whole-kidney marrow cell clusters (see Lareau et al., 2017). When coupled with subcellular locational information using immunohistochemistry, fluorescent in situ hybridization (FISH), and/or transgenic approaches, the lineage-restricted genes identified by our studies will aid in spatially mapping the organization of blood and kidney cell types in the adult. This will be particularly useful in mapping vascular epithelial cell types that interact with HSPCs. Finally, as the cost of sequencing continues to drop and new methods are developed that allow transcriptional profiling at greater depth within single cells, we envision that future studies will define blood and kidney cell heterogeneity at higher resolutions and likely allow assessment of intertwined relationships between zebrafish blood and kidney cells.
Materials and methods
Transgenic reporter fish used in this work included Tg(Runx1+23:GFP; Tamplin et al., 2015), Tg(cd41:GFP; Lin et al., 2005), Tg(mpx:GFP; Mathias et al., 2006; Renshaw et al., 2006), Tg(rag2:GFP; Jessen et al., 2001), and Tg(lck:GFP; Langenau et al., 2004). Casper strain zebrafish (White et al., 2008), prkdcD3612fs (ZFIN identifier, prkdcfb103) mutant fish (Moore et al., 2016b), and rag1 mutant fish (Wienholds et al., 2002) have been reported previously. il2rgaY91fs mutants were created in casper-strain zebrafish (ZFIN identifier, il2rgafb104) using TALEN-mediated mutagenesis, as previously described (Moore et al., 2012). TALEN targeting sequences were 5′‑TACATTGAAAACAAGCCT‑3′ and 5′‑AGTTTGGTGACGGAACAGGA‑3′. prkdcD3612fs/D3612fs, il2rgaY91f/+ mutant fish were in-crossed to produce double-homozygous animals, which were identified by scale genotyping at 2–3 mo old (Moore et al., 2016b) and were generated at the expected Mendelian ratios. All zebrafish experiments were approved by the Massachusetts General Hospital Institutional Animal Care and Use Committee under protocol 2011N000127.
Isolation of single cells from the zebrafish kidney marrow
Zebrafish kidney marrow and thymus were dissected from 3–9-mo-old transgenic fish, made into single-cell suspensions, and individual cells were FACS sorted into 96-well plates, as previously described (Moore et al., 2016a). Each well contained 5 µl TCL buffer (QIAGEN) + 1% β-mercaptoethanol (Bio-Rad Laboratories). 96-well plates were centrifuged at 1,000 g for 3 min at 4°C, snap-frozen on dry ice, and stored at −80°C. For InDrops sequencing approaches, marrow was isolated from zebrafish as single-cell suspensions and used in microfluidic sorting.
Smart-Seq2 single-cell RNA sequencing and processing
Whole-transcriptome amplification and library construction were performed on single cells using a modified Smart-Seq2 (Illumina) protocol adapted by (Venteicher et al., 2017). Libraries of uniquely bar-coded cells were combined and sequenced with a Nextseq 500 High Output V2 kit (75 cycles; Illumina) on a NextSeq 500 platform (Illumina). Using this method, we profiled n = 246 single cells from six transgenic lines.
The STAR program (Dobin et al., 2013) was used to align reads to the GRCz10 genome and Ensembl genome browser (version 85; EMBL-EML) was used for gene annotation. PCR duplicates were removed using Picard (Broad Institute), and reads were aligning to ribosomal RNA were removed using RSeqC split_bam.py. We then used featureCounts (Liao et al., 2014) to assign reads with alignment quality of 10 or more to genes and generated a matrix of fragment counts in each cell. All further processing was performed using Seurat (GitHub). We filtered for cells that expressed a minimum of 500 genes and for genes that were expressed in ≥1 cell, with a gene detection threshold of five reads. To find markers of each transgenic population of cells, we combined expression read counts in each population and queried the differential expression of each gene in a foreground transgenic population versus all other cells. We used two metrics to calculate differential expression: (1) log-fold change of expression, and (2) difference in the percentage of cells expressing the gene in the foreground population versus the background population. Markers of cd41:GFPhigh cells were identified as the top 20 genes ranked by correlation with high GFP expression from the data set produced by Macaulay et al. (2016). Human orthologues of zebrafish genes were found using the Beagle database (Bernards, 2017).
Indexing droplets (InDrops) sequencing and analysis
Microfluidic sorting and library construction were performed by the Harvard ICCB Single Cell Core using the method outlined in Klein et al. (2015) and Zilionis et al. (2017). In brief, kidney cell suspensions were resuspended in 1.0× PBS + 1% BSA at 0.2–1 × 106 cells/ml, treated with 15% Optiprep (Sigma-Aldrich), and loaded onto the microfluidic sorting device. During microfluidic sorting, individual cells were captured in droplets, lysed, and RNA reverse transcribed using unique bar-codes, allowing measurement of the absolute transcript level by unique molecular identifier quantification within individual cells (Klein et al., 2015). Doublet rate was estimated at 4%, similar to that reported previously (Klein et al., 2015).
RNA from 1,500 cells of each sample kidney were pooled and subsequently processed together. cDNA products were preamplified to produce a cDNA library compatible with the Illumina Nextseq platform. Libraries were quantified using the KAPA Library Quantification kit (Kapa Biosystems). Three sequencing runs were completed using the Nextseq 500 High Output V2 kit (75 cycles; Illumina) on a NextSeq 500 platform (Illumina).
We used the InDrops pipeline (GitHub), modified to process zebrafish immune-cell data for processing InDrop sequencing reads. Reads were aligned to GRCz10 (Ensembl version 85) transcriptome using Bowtie. All further processing was performed using Seurat. We filtered for cells that expressed a minimum of 200 genes and filtering for genes that were expressed in a minimum of 10 cells, using a gene-detection threshold of one read. We combined expression data from all experiments (n = 11,327 cells), unbiasedly clustered cells by the first 25 principal components, and then visualized the data using tSNE. This allowed us to ensure that tSNE plots were uniform across experimental conditions and permitted us to digitally subtract samples to garner the renderings presented throughout this article. Further characterization the T/NK cell cluster identified three subclusters. Markers of each subcluster were identified using the same method described here and are shown in Table S3.
Cell cycle profiles were defined by first calculating the geometric mean expression of G1/S and G2/M genes identified by Tirosh et al. (2016) and then labeling the cell state as G1/S or G2/M if the geometric means of the respective genes in each phase were in the 95th percentile or higher.
RNA-seq on bulk zebrafish kidney marrow
Whole kidney marrow from WT, prkdcD3612fs, il2rgaY91fs, and double-mutant animals (n = 3 for each genotype) were surgically dissected and placed into 350 µl of QIAGEN RLT buffer containing 1% 2-mercaptoethanol, followed by RNA isolation using the RNAeasy Mini kit (QIAGEN). 100 ng of RNA isolated from each animal was subjected to mRNA selection by the RNA NEBNext Poly(A) mRNA magnetic isolation module (New England Biolabs, Inc.), followed by next-generation sequencing library construction using NEBNext Ultra Directional RNA Library Prep kit (New England Biolabs, Inc.) for Illumina using 15 cycles of PCR amplification. Libraries were sequenced on the Illumina HiSeq 2000 platform by the high-output, paired-end, 50-bp method with 1% PhiX spike-in.
The STAR program (Dobin et al., 2013) was used to align reads to GRCz10 genome and Ensembl (version 85) was used for gene annotation. PCR duplicates were removed using Picard, and reads aligning to ribosomal RNA were removed using RSeqC split_bam.py. We then used featureCounts (Liao et al., 2014) to assign reads with alignment quality of 10 or more to genes and generated a matrix of fragment counts in each cell. We used DESeq2 (Love et al., 2014) to detect differential expression between mutant and WT samples. We then identified genes that were significantly down-regulated [DESeq2 log(fc) less than −1 and adjusted P < 0.05) in each comparison. Human orthologues of each of these gene sets were obtained using Beagle and used for enrichment analysis with GSEA MSigDB.
To reconstruct cellular pseudo-time and to reorder cells into potential developmental trajectories, we used the ARIADNE software pipeline and Monocle2 obtained from Bioconductor (R package ‘monocle’ version 2.4.0; Trapnell et al., 2014; Qiu et al., 2017 Preprint). The ARIADNE method first represents single cells in a low-dimensional space based on transcriptomic similarities. Cellular trajectories are then modeled as a tree composed of curves inferred by principal curve fitting. After selecting a root state from the tree structure, ARIADNE generates a 2D representation with multiple branches, in which cells are plotted as dots, and cellular developmental trajectories are represented as straight line. Along each trajectory, cells are ordered based on reconstructed pseudo-time. The lengths of trajectories and distances between cells and their projections on the trajectories are obtained from the low-dimensional space. PCA and hierarchical clustering were also completed on Tg(Runx1+23:GFP) cells using 40 differentially expressed genes automatically detected by ARIADNE (20 highly expressed only in the classically defined HSPCs trajectory and 20 highly expressed only in the erythroid-primed HSPCs trajectory).
For Monocle analysis, parameters were set to expressionFamily = ‘tobit()’ and default parameters for the Smart-seq platform. For the comparison, we used the same gene-expression matrix used for the ARIADNE analysis.
Online supplemental information
Fig. S1 shows Monocle and ARIADNE pseudo-time analyses for transgenic-labeled blood cells profiled using SMARTseq2. Fig. S2 shows the verification of InDrops cluster assignments using transgenically defined gene signatures. Fig. S3 presents the creation and characterization of the il2rga−/− and prkdc−/−, il2rga−/− immune compromised zebrafish. Table S1 lists the genes denoted in the heat map provided in Fig. 1 A. Table S2 provides the Smart-Seq2 lineage signatures used for identification of InDrop groups in Fig. 2 (C–F). Table S3 provides InDrops RNA sequencing gene expression for each identified kidney and blood cell population. Table S4 lists the humanized genes used for GSEA signature analysis shown in Fig. 4 B. Table S5 provides an analysis of the differentially regulated genes by cluster using GSEAsig analysis. Table S6 provides the real-time quantitative PCR primers used in Fig. S3. Tables S1–S6 are provided as Excel files.
We thank Dr. Felix Ellett, Lindsay Theodore, Dr. Finola Moore, and Elaine Garcia for animal husbandry and maintenance of transgenic zebrafish lines. We thank Christina C. Luo and Rachel Servis from Massachusetts General Hospital Flow Cytometry Core for help with single-cell sorting; Na Qu and Dr. Toshi Shioda for help with next-generation sequencing; Dr. Sarah A. Boswell, Alex Ratner, and Samuel L. Wolock from Harvard Medical School ICCB-L Single Cell Core for consultation on InDrops experimental design and technical assistance; Dr. Rory Kirchner for guidance on InDrops sequencing data analysis; and Dr. Ruslan Sadreyev, Anthony Anselmo, Ulandt Kim, and Yevgeniya Li from Massachusetts General Hospital Bioinformatics Core for their assistance with bulk RNA sequencing. We thank members of the Langenau laboratory for constructive discussion.
This work was supported by the National Institutes of Health, grant R24OD016761 to D.M. Langenau. Q. Tang was funded by the China Scholarship Council.
The authors declare no competing financial interests.
Author contributions: Q. Tang performed experiments, data analysis, and wrote the manuscript; S. Iyer performed bioinformatics analysis and wrote the manuscript; R. Lobbardi, J.C. Moore, C. Hebert, M.L. Shaw, and C. Neftel performed experiments, provided expert technical advise, and helped with experimental design; H. Chen and C. Lareau performed bioinformatics analysis and website design; A. Bernards helped with mapping zebrafish homologues with human genes and created the Beagle online portal; M.L. Suva, C.J. Ceol, M. Aryee, L. Pinello, I.A. Drummond, and D.M. Langenau provided help with experimental design, funding, project oversight, expert advice, and help with writing/editing the manuscript.
fluorescent in situ hybridization
gene set enrichment analysis
hematopoietic stem and progenitor cell
indexing droplets single-cell RNA sequencing
lymphocyte-specific protein tyrosine kinase
DNA–protein kinase catalytic subunit
transcription activator-like effector nuclease
t distributed stochastic neighbor embedding
zebrafish identifier nomenclature
Q. Tang and S. Iyer contributed equally to this paper.
R. Lobbardi’s present address is Blueprint Medicines, Cambridge, MA.
J.C. Moore’s present address is Bluebird Bio, Cambridge, MA.