CIZ1 is part of the RNA-dependent supramolecular assemblies that form around the inactive X chromosome (Xi) in female cells, and of smaller assemblies throughout the nucleus in both sexes. Here, we show that transcripts encoding the CIZ1 C-terminal anchor domain (AD) are elevated in human breast tumor transcriptomes, even at stage I. Elevation correlates with deprotection of chromatin and upregulation of lncRNA-containing gene clusters in ∼10 Mb regions enriched in cancer-associated genes. We modeled the effect of AD on endogenous CIZ1–Xi assemblies and observed dominant-negative interference with their reformation after mitosis, leading to abnormal assemblies similar to those in breast cancer cells, and depletion of H2AK119ub1, H3K27me3, and Xist. Consistent alterations in gene expression were evident across the genome, showing that AD-mediated interference has a destabilizing effect, likely by unscheduled exposure of underlying chromatin to modifying enzymes. The data argue for a dominant, potent, and rapid effect of CIZ1 AD that can deprogram gene expression patterns and that may predispose incipient tumors to epigenetic instability.
Introduction
Selection and packaging of chromatin into transcriptionally repressed states underlie cell specialization and development. Weakened repression of heterochromatin can result in pro-oncogenic changes and has the potential to give rise to all the classic hallmarks of cancer, even in the absence of genetic change (Flavahan et al., 2017; Hanahan, 2022; Parreno et al., 2024). The inactive X chromosome (Xi) is the most intensely studied model of facultative heterochromatin formation, revealing how the cis-acting lncRNA Xist (Brockdorff et al., 1992; Brown et al., 1992) directs the formation of large RNA-dependent supramolecular assembly complexes (SMACs) populated by chromatin-modifying enzymes (Markaki et al., 2021). Aggregation of SMAC proteins, mediated by their intrinsically disordered regions (IDRs), creates a functional nuclear compartment that partitions regulatory factors to establish local gene silencing early in development.
Cip1-interacting zinc finger protein 1 (CIZ1) is one of several proteins that populate Xi SMACs, recruited via its interaction with the repeat E element of Xist (Ridings-Figueroa et al., 2017; Sunwoo et al., 2017). Several observations set CIZ1 apart from other SMAC components. First, it is not required for Xist recruitment, Xi silencing, or embryonic development, and the impact of its loss only becomes apparent in somatic cells in which repressed chromatin is already established but must be faithfully maintained. A requirement for CIZ1 is apparent in differentiated fibroblasts from CIZ1 null mice, in which local retention of Xist around Xi chromatin is compromised (Ridings-Figueroa et al., 2017; Sunwoo et al., 2017), repressive histone posttranslational modifications (PTMs) are lost, and genome-wide changes are apparent in the expression of genes under the regulation of polycomb repressive complexes 1 and 2 (PRC1 and PRC2) (Stewart et al., 2019). Second, the stability of CIZ1 within Xi SMACs, even those that form during the initiation stages of X-inactivation, is unusually high. Compared with other SMAC components, the residency time of CIZ1 is estimated to be 2–10-fold longer, similar to that of Xist (Markaki et al., 2021). Thus, CIZ1 appears to exchange less readily than other protein components and might therefore exert a stabilizing influence on Xist and Xi SMACs.
Some of the sequence determinants required for the assembly of CIZ1 within Xi SMACs are known, including two alternatively spliced, low-complexity prion-like domains (PLD1 and PLD2) that modulate interaction with Xist, and a second RNA interaction domain in the C-terminus (Sofi et al., 2022). Neither RNA interaction is sufficient to support the assembly of CIZ1 into Xi SMACs on its own, but together, they drive both the assembly and the de novo enrichment of H2AK119ub1 and H3K27me3, added by PRC1 and 2, respectively, in the underlying chromatin. These experiments directly link CIZ1 SMAC formation with the modification of chromatin and implicate its bivalent interaction with RNA (Sofi et al., 2022).
Disappearance of the Barr body (Xi) has been known for decades and is considered a hallmark of cancer (Moore and Barr, 1957). Erosion of the Xi in breast tumors and cell lines was originally ascribed to genetic instability, though epigenetic instability is also apparent, evident as abnormal subnuclear organization, aberrant promoter DNA methylation, and perturbations of chromatin, including H3K27me3 (Chaligné et al., 2015). Transcriptional reactivation of X-linked genes has been implicated in both breast and ovarian cancers (Sirchia et al., 2009), though this is likely to be indicative of wider, and possibly earlier, epigenetic erosion. In fact, widespread erosion of the DNA methylation landscape can give rise to the transcriptional changes common in tumors (Batra et al., 2021), and for breast cancers in particular, the progression from progenitor cell to premalignant lesion has been shown to involve changes in the DNA methylome that precede genetic instability (Locke and Clark, 2012; Locke et al., 2015). From data such as these, a model is emerging in which breast cancer could be initiated primarily through epigenetic disruption.
Here, we describe the aberrant expression of CIZ1 in human cancers and model the effects of destabilizing protein fragments on RNA–protein assemblies and underlying chromatin. The data lead to the conclusion that disease-associated dominant-negative CIZ1 fragments (DNFs) contribute to epigenetic instability by deprotecting loci that are normally buffered by surrounding SMACs, and that this plays an early role in tumor etiology.
Results
CIZ1 assemblies are disrupted in breast cancer cells
In primary epithelial cells derived from normal human female mammary tissue (HMECs), a single large CIZ1 assembly is visible in ∼80% of cells in a cycling population (Fig. 1, A and B). This coincides with local enrichment of H2AK119ub1, identifying the assembly as the Xi, as reported for human (Dixon-McDougall and Brown, 2022; Ridings-Figueroa et al., 2017; Valledor et al., 2023) and murine cells (Markaki et al., 2021; Ridings-Figueroa et al., 2017; Sunwoo et al., 2017). Human and murine CIZ1 possess the same conserved domains encoded by the same exons in the same order (Fig. S1 A), and so far no differences in behavior or function have been uncovered. CIZ1–Xi assemblies depend on multivalent interaction with RNAs including Xist (Fig. 1 C [Sofi et al., 2022]) and are normally observed with similar frequency regardless of whether epitopes in the N-terminal DNA replication domain (RD) (Coverley et al., 2005) or the C-terminal nuclear matrix anchor domain (AD) (Ainscough et al., 2007) are detected (Fig. 1 B).
However, in breast cancer–derived cell lines (Fig. S1 B), the same anti-CIZ1 RD and anti-CIZ1 AD antibodies reveal considerable heterogeneity. CIZ1–Xi assemblies are absent or less compact and coherent, or RD and AD epitopes are differentially susceptible to extraction from the nucleus (Fig. 1, B and D). This indicates that CIZ1 RD and AD epitopes are not always part of the same polypeptide and that CIZ1 is compromised in its ability to form stable assemblies around Xi chromatin. We conclude that CIZ1 protein and CIZ1–Xi assemblies are commonly disrupted in breast cancer cell lines. This is consistent with the reported wider destabilization of the inactive X chromosome in breast cancer cells and tissues, and specifically with the reported dispersal of Xist (Chaligné et al., 2015).
Alignment of transcriptomes from four breast cancer–derived cell lines and a control cell line to CIZ1’s translated exons (2–17) revealed over-representation of AD-encoding exons relative to RD-encoding exons in the tumor-derived lines (Fig. 1 E and Data S1). We also noted a transition in transcript coverage within exon 10, which coincides with an internal transcription start site (TSS) annotated in Ensembl (Cunningham et al., 2022) from the FANTOM5 project (Lizio et al., 2015), and with enrichment of indicators of active chromatin in cancer cell lines but not in normal HMECs (Fig. S1 C). Thus, archival data suggest that transcription can begin from an internal site in the CIZ1 gene.
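A transition in coverage of the kind described within exon 10 can be located with a simple single-change-point scan. The sketch below is a hypothetical illustration, not the method used to produce Fig. 1 E or Fig. S1 C: it assumes per-base read depth across the region is already available as a list of numbers, and it reports the split that maximizes the ratio of mean downstream to mean upstream coverage. The helper name `coverage_step` and the synthetic coverage profile are our own.

```python
def coverage_step(coverage, min_len=10):
    """Find where mean coverage rises most sharply (hypothetical helper).

    coverage: per-base read depth across a region, ordered 5' to 3'.
    min_len:  minimum number of bases on each side of a candidate split,
              so that a few noisy bases at either end cannot dominate.
    Returns (split_index, fold), where fold = mean(downstream) / mean(upstream).
    """
    best_index, best_fold = None, 0.0
    for i in range(min_len, len(coverage) - min_len + 1):
        upstream = coverage[:i]
        downstream = coverage[i:]
        mean_up = sum(upstream) / len(upstream)
        mean_down = sum(downstream) / len(downstream)
        if mean_up == 0:
            continue  # skip splits where the upstream segment has no reads
        fold = mean_down / mean_up
        if fold > best_fold:
            best_index, best_fold = i, fold
    return best_index, best_fold


# Synthetic example: coverage quadruples half-way along a 100-base region.
toy = [10] * 50 + [40] * 50
print(coverage_step(toy))  # (50, 4.0)
```

The scan is quadratic in region length because each split re-sums both segments; prefix sums would make it linear, but for a single exon the simpler form is easier to read.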
Elevation of CIZ1 AD-encoding transcript in early-stage primary breast tumors
To measure CIZ1 transcript expression in common primary solid tumors, we first used quantitative RT-PCR to compare the 5′ end with the 3′ end (which contribute coding sequence to RD and AD, respectively) by detecting amplicons unaffected by alternative splicing (Rahman et al., 2010) (Fig. 2 A). In cDNAs from 46 tissue samples, the correlation between two RD amplicons (in exons 5 and 7) or between two AD amplicons (in exons 14 and 16) was strong; however, RD and AD did not correlate with each other. This confirms that expression of RD and AD is commonly uncoupled at the transcript level and shows that the differential can be sampled by comparing sequences in the region of exons 5–7 with sequences in the region of exons 14–16.
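The logic of the amplicon comparison is that two amplicons from the same transcript should rise and fall together across samples, whereas amplicons from independently regulated transcripts need not. The sketch below computes a Pearson correlation coefficient. The choice of Pearson (rather than, for example, Spearman) correlation and the five-sample values are assumptions made for illustration; they are not data from this study.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical relative expression values for five samples (not study data).
exon5 = [1.0, 2.0, 3.0, 4.0, 5.0]   # RD amplicon
exon7 = [1.1, 2.1, 2.9, 4.2, 5.0]   # second RD amplicon, tracks exon 5
exon14 = [3.0, 1.0, 5.0, 1.0, 3.0]  # AD amplicon, varies independently

print(round(pearson(exon5, exon7), 3))   # RD vs RD: expected to be high
print(round(pearson(exon5, exon14), 3))  # RD vs AD: expected to be near zero
```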
Domain disparity was striking and consistent in breast tumors across all stages (Fig. 2 B). It was also significant in bladder cancer at stage III and melanoma at stages III and IV (Fig. 2 C) and observed sporadically in other tumors of different etiology (Fig. S1 E). In addition, in some colon, lung, and thyroid tumors both RD and AD domains of CIZ1 were elevated compared with histologically normal tissue (Fig. 2 C, Fig. S1 E, and Data S2).
Focusing on breast cancer, we analyzed CIZ1 expression in 1,095 transcriptomes submitted to The Cancer Genome Atlas (TCGA). While transcripts that map to the whole CIZ1 gene revealed no overall difference in expression between tumors and normal tissue (Fig. S1 D), the same raw data, when mapped to individual CIZ1 exons, showed that AD (exon 14) is significantly over-represented relative to RD (exon 5) at all stages (Fig. 2 D and Data S1) and that AD elevation becomes notable from around exon 10. Similar elevation of C-terminal transcript was not evident for the cancer-associated genes ESR1 and TP53 in a subset of the same transcriptomes (Fig. 2 E). Together, these data show that C-terminal CIZ1 exons are over-represented in the majority of breast cancers and that sequences encoded by C-terminal exons are uncoupled from N-terminal exons, raising the question of whether inappropriate AD protein is functionally relevant.
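Because exons differ in length, comparing raw read counts between exon 14 and exon 5 would be biased toward the longer exon. A minimal sketch of a length-normalized AD:RD ratio follows. It assumes per-exon read counts and exon lengths have already been extracted from alignments; the function names, exon lengths, and counts are hypothetical. Because both exons come from the same library, sequencing depth cancels in the ratio.

```python
def reads_per_kb(read_count, exon_length_bp):
    """Read count normalized to exon length (reads per kilobase)."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    return read_count / (exon_length_bp / 1000)


def ad_rd_ratio(counts, lengths, ad_exon=14, rd_exon=5):
    """Length-normalized ratio of an AD-encoding exon to an RD-encoding exon.

    counts, lengths: dicts keyed by exon number (hypothetical inputs).
    A ratio of 1 means even coverage; >1 means the C-terminal exon is
    over-represented relative to the N-terminal exon.
    """
    ad = reads_per_kb(counts[ad_exon], lengths[ad_exon])
    rd = reads_per_kb(counts[rd_exon], lengths[rd_exon])
    return ad / rd


# Hypothetical sample: exon 14 is twice as long and has four times the reads.
counts = {5: 100, 14: 400}
lengths = {5: 1000, 14: 2000}
print(ad_rd_ratio(counts, lengths))  # 2.0
```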
In vitro modeling of the effect of AD on CIZ1–Xi assemblies
We previously showed that ectopic full-length CIZ1 accumulates within CIZ1–Xi assemblies in WT cells and can in fact build new assemblies de novo in CIZ1 null cells, provided both RD and AD are present (Sofi et al., 2022). The multivalent nature of CIZ1’s interaction with RNA and the requirement for both domains for assembly into SMACs (Sofi et al., 2022) led us to hypothesize that fragments of CIZ1 containing only one of its RNA interaction interfaces might have a destabilizing effect. Moreover, based on what we know of CIZ1 genetic deletion and the co-dependency of CIZ1 and Xist (Markaki et al., 2021; Ridings-Figueroa et al., 2017; Rodermund et al., 2021; Sunwoo et al., 2017), we hypothesized that interference with CIZ1 assemblies at the Xi would affect Xi chromatin. We modeled this in short-term (one cell cycle) transfection experiments after ectopic expression of GFP-tagged C-terminal protein fragments (Fig. 3 A) in murine cells.
Endogenous CIZ1–Xi assemblies were categorized into three phenotypes: cells with a discrete normal assembly, cells with no assembly, or cells with intermediate, dispersed, or diminished assemblies (Fig. 3 B). The C-terminal 275 amino acids of murine CIZ1 (Coverley et al., 2005), here referred to as C275, caused loss or reduction of normal (type 1) assemblies but, as reported previously (Sofi et al., 2022), did not itself accumulate at the Xi. In untransfected cells in the same populations, or in parallel populations expressing empty GFP vector, CIZ1–Xi assemblies were unaffected; in all cases, assemblies were detected via the CIZ1 RD epitope, which is not present in C275. Deletion of two zinc fingers from C275 to produce the smaller C181 fragment did not abolish the disruptive effect on assembly frequency (Fig. 3 B), as confirmed by measuring maximum fluorescence intensity per nucleus as a surrogate for CIZ1 assembly density (Fig. 3 C, left). A concomitant effect on Xist was confirmed by RNA FISH for both C275 and C181 (Fig. S2 A), by quantifying either the area occupied by Xist assemblies (Fig. 3 D) or the maximum fluorescence intensity per nucleus (Fig. 3 E). Notably, for both CIZ1 (Fig. 3 C, right) and Xist (Fig. 3 E), the mean intensity per nucleus remains unaffected, suggesting that while their ability to accumulate in Xi-associated assemblies is impaired, their overall levels in the nucleus remain the same. Thus, C-terminal fragments of CIZ1 have the capacity to interfere with endogenous CIZ1–Xi assemblies, driving dispersal of both endogenous CIZ1 and Xist lncRNA; we refer to them hereafter as CIZ1 DNFs (dominant-negative fragments).
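Why maximum and mean intensity per nucleus report different things can be seen with a toy example. The sketch below uses NumPy on synthetic 10 × 10 "nuclei" (not data from this study) and assumes a boolean nuclear mask is already available from segmentation; the image-analysis software actually used is not described in this section, and `nuclear_intensity_stats` is a hypothetical helper. The same total signal packed into a compact focus or spread evenly gives the same mean but very different maxima, which is the pattern expected when assemblies disperse without a change in overall CIZ1 or Xist levels.

```python
import numpy as np


def nuclear_intensity_stats(image, nucleus_mask):
    """Return (max, mean) pixel intensity inside a nucleus.

    image:        2D array of fluorescence intensities.
    nucleus_mask: boolean array of the same shape, True inside the nucleus.
    """
    pixels = image[nucleus_mask]
    return float(pixels.max()), float(pixels.mean())


mask = np.ones((10, 10), dtype=bool)  # whole field treated as one nucleus

# Compact assembly: 100 intensity units concentrated in a 2 x 2 focus.
compact = np.zeros((10, 10))
compact[4:6, 4:6] = 25.0

# Dispersed: the same 100 units spread evenly over all 100 pixels.
dispersed = np.ones((10, 10))

print(nuclear_intensity_stats(compact, mask))    # (25.0, 1.0)
print(nuclear_intensity_stats(dispersed, mask))  # (1.0, 1.0)
```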
CIZ1 assembly dispersal is cell cycle-dependent
Not all cells expressing CIZ1 DNFs are depleted of endogenous CIZ1–Xi assemblies. At 24 h, typically 30–40% remain refractory (Fig. 3 B), and in those that respond, the extent of dispersal is variable. To test whether cell cycle stage contributes to this heterogeneous response, we first examined contact-inhibited (arrested) cells (Fig. 3 F). Under these conditions, CIZ1–Xi assemblies were refractory to the dominant-negative effects of C181 (Fig. 3 B), suggesting that passage through the cell cycle is required to expose assemblies to a window in which DNFs can exert their effect.
Normally, around 80% of female cells (cycling, mouse or human, primary or established non-cancer lines) contain a discrete compact CIZ1–Xi assembly. Since we know that, like Xist (Hall et al., 2009), CIZ1–Xi assemblies are lost in mitosis (Ridings-Figueroa et al., 2017), we postulated that cells in which they are not evident have yet to rebuild them and are in early G1 phase. We confirmed this in cells synchronized in mitosis using nocodazole and found that maximal CIZ1–Xi assembly frequency was reached by 4 h after mitotic exit (Fig. S2 B). Expression of C181 significantly delayed SMAC reformation during this window, and the assemblies that did form had reduced CIZ1 maximum fluorescence intensity (Fig. 3 G). Thus, the dispersive effect of CIZ1 DNFs is potent during the SMAC assembly window in early G1 phase (Fig. 3 H).
Role of the MH3 homology domain
To refine the sequence requirements for SMAC dispersal by DNFs, we evaluated a set of six deletion constructs based on C181 (Fig. S2 C). All fragments were expressed and became incorporated into detergent-resistant nuclear structures (Fig. S2 D) and retained similar capability to interfere with endogenous CIZ1–Xi SMACs, with the exception of one. C181 lacking the Matrin 3 homology domain (ΔMH3) had a small but consistent reduction in potency based on SMAC frequency (Fig. S2 E), confirmed by measuring maximum fluorescence intensity (Fig. S2 F). This implicates the MH3 CIZ1:CIZ1 dimerization interface (Turvey et al., 2023, Preprint) in the integrity of endogenous CIZ1 SMACs.
Consequences of dispersal of CIZ1–Xi assemblies on Xi chromatin
We postulated that the dispersal of CIZ1–Xi assemblies by DNFs might mimic the effect on Xi chromatin seen in genetically CIZ1 null primary embryonic fibroblasts (PEFs). In these cells, H3K27me3 and H2AK119ub1 are both depleted, and control over PRC target genes, both X-linked and elsewhere in the genome, is relaxed (Stewart et al., 2019). In experiments spanning a single cell cycle, in two cell types, C181 caused a marked reduction in H2AK119ub1-enriched Xis but did not affect H3K27me3 (Fig. 4, A–C), whereas in longer-term experiments using lentiviral transduction of C181 (Fig. 4 D), both H3K27me3 and H2AK119ub1 were depleted, whether quantified by enriched Xi frequency or by fluorescence intensity (Fig. 4 E). The survival of H3K27me3 over a single cell cycle, when H2AK119ub1 is already depleted, is consistent with H3K27me3 being lost more slowly, through replication-linked dilution (Coleman and Struhl, 2017; Jadhav et al., 2020; Stewart et al., 2019). Together, these data show that DNFs impact histone PTMs.
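The expectation behind replication-linked dilution can be stated as simple arithmetic. Assuming, as a deliberate simplification, that parental modified histones are shared equally between the two daughter chromatids at each S phase and that no new mark is deposited (in unperturbed cells PRC2 restores H3K27me3 after replication), the marked fraction of a region's nucleosomes after n rounds of replication, f_n, starting from f_0, is:

```latex
f_n = \frac{f_0}{2^{n}}, \qquad f_1 = 0.5\,f_0, \qquad f_3 = 0.125\,f_0
```

On this simplified model, dilution-driven loss of H3K27me3 accumulates with each division, so it would be expected to become more pronounced in longer-term experiments than within a single cell cycle.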
Enrichment of H3K27me3 and H2AK119ub1 is sometimes taken as evidence that PRCs were specifically recruited by lncRNAs to the same sites, based in part on extensive but controversial evidence of interaction between PRC subunits and Xist (Cech et al., 2024; Guo et al., 2024a, 2024b; Lee and Lee, 2024). However, enrichment of histone PTMs could also arise from a local shift in the balance between addition and removal. In our experiments, disruption of CIZ1–Xi assemblies by DNFs could deplete H2AK119ub1 in Xi chromatin by reducing recruitment of PRC1 or, conversely, by deprotecting chromatin and allowing access to deubiquitinating enzymes. BAP1 is the catalytic subunit of the deubiquitinating enzyme (DUB) complex that removes H2AK119ub1, acting to restrict its deposition to specific locations (Conway et al., 2021). To begin to distinguish between recruitment and protective functions (Fig. 5 A), we used PR-619, a broad-spectrum reversible inhibitor of DUBs, including the PR-DUB BAP1 (Altun et al., 2011). In one-cell-cycle experiments, the immediate (within 24 h) loss of H2AK119ub1 was significantly blocked by PR-619 (Fig. 4 B), and in longer (3-day) transduction experiments, the same trend was observed (Fig. 5 B). Moreover, even in genetically CIZ1 null primary cells, in which H2AK119ub1 is absent from Xi chromatin (Stewart et al., 2019), its enrichment (but not that of H3K27me3) was restored within 24 h of exposure to PR-619 (Fig. 5 C). Thus, loss of CIZ1–Xi assemblies, whether by genetic deletion or by dispersal by DNFs, suppresses the accumulation of H2AK119ub1 in Xi chromatin in a manner dependent on DUB activity. We therefore suggest that CIZ1 assemblies perform a shield function that can protect chromatin from enzymatic attack.
Effect on gene expression
To confirm that CIZ1 DNFs have the potential to affect gene expression, we analyzed transcriptomes of three independent populations of PEFs transduced with C181 or empty lentiviral vector for 3 days (Fig. 4 D). To shift the focus of the analysis away from the Xi, we used primary cells isolated from two female and one male murine embryo. This returned expression changes across all chromosomes (Fig. 5 D; and Fig. S3, A and B), including 471 downregulated (DN) and 558 upregulated (UP) genes (FDR q < 0.05, |log2FC| > 1; Data S3). The 19 UP- and 13 DN-regulated X-linked genes do not argue for a disproportionate effect on the X chromosome. Gene set enrichment analysis of the named coding genes returned highly significant molecular signatures derived by chemical or genetic perturbation in murine cells (GSEA MSig. mCGP), including sets linked with the developmental regulator TGFβ and the mammary stem cell phenotype (Fig. S3 C). When UP and DN genes are considered separately, sets related to developing breast tissue and the mammary stem cell phenotype are returned primarily by UP genes (Fig. 5 E). For the mammary stem cell phenotype set M2573 (Lim et al., 2010), 25% of all genes in the set are significantly changed by expression of C181 (FDR q < 0.05, Fig. 5 F), and 75% of those are UP (Data S3). This shows that, similar to germ-line deletion of CIZ1 (Ridings-Figueroa et al., 2017), interference with CIZ1 assemblies in an acute setting can significantly alter gene expression across the genome (Fig. S3 D), including genes linked with cellular plasticity and cancer. Moreover, as in our previous experiments in which CIZ1 was reintroduced into a CIZ1 null background (Ridings-Figueroa et al., 2017), the effect is rapid (within days) and coincides with changes to the epigenetic landscape. Together, the data argue for a potent and rapid effect of CIZ1 DNFs that can change established patterns of gene expression.
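The thresholds quoted above combine a false-discovery-rate cut-off with a minimum absolute log2 fold change. The differential-expression tool is not named in this section, so the sketch below assumes per-gene p-values and log2 fold changes are already available and that q-values come from the Benjamini–Hochberg procedure, a common default that we assume here. It applies the fold-change threshold as a magnitude in both directions, since both UP and DN genes are reported. Gene names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_degs(genes, log2fc, pvalues, q_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into UP and DN lists using FDR and |log2FC| thresholds."""
    qvalues = benjamini_hochberg(pvalues)
    up, dn = [], []
    for gene, lfc, q in zip(genes, log2fc, qvalues):
        if q < q_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else dn).append(gene)
    return up, dn


# Hypothetical genes with log2 fold changes and raw p-values.
genes = ["g1", "g2", "g3", "g4"]
log2fc = [2.0, -1.5, 0.5, 3.0]
pvalues = [0.001, 0.01, 0.001, 0.5]
print(call_degs(genes, log2fc, pvalues))  # (['g1'], ['g2'])
```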
Gene expression in human breast cancers
To test whether the disruption of gene expression observed in DNF modeling experiments might be at play in primary human breast cancers, we segmented TCGA breast cancer transcriptomes into four groups, A–D (Fig. 6 A and Data S4), based on the extent of elevation of AD over RD. Comparison of gene expression in group A tumors, in which the exon 14:5 ratio is >2, with control group C, in which RD and AD are within 10% of parity, revealed extensive differences between their transcriptomes (1,608 differentially expressed genes [DEGs], FDR q < 0.05, |log2FC| > 1; Fig. 5 B and Data S5).
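The two groups compared here can be expressed as a rule on the exon 14:5 ratio. In this sketch, group A is a ratio above 2, and group C is taken to be a ratio between 0.9 and 1.1, which is our interpretation of "within 10% of parity." The cut-offs for groups B and D are given in Data S4 and are not reproduced here, so such tumors are labeled "other." Sample identifiers and ratios are hypothetical.

```python
from collections import Counter


def assign_group(ratio_14_to_5):
    """Assign a tumor to group A or C from its exon 14:5 ratio (sketch)."""
    if ratio_14_to_5 > 2:
        return "A"  # AD strongly over-represented
    if 0.9 <= ratio_14_to_5 <= 1.1:
        return "C"  # RD and AD within 10% of parity (assumed reading)
    return "other"  # groups B and D; thresholds not reproduced here


# Hypothetical tumors and their length-normalized exon 14:5 ratios.
tumors = {"T1": 2.6, "T2": 1.04, "T3": 1.5, "T4": 0.6, "T5": 3.1}
groups = {name: assign_group(r) for name, r in tumors.items()}
print(groups)
print(Counter(groups.values()))
```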
No significant differences in the proportion of tumors in groups A–D were evident across breast cancer subtypes or ER/PR/HER2 receptor status subsets (Thennavan et al., 2021) (Fig. S4, A–C). Similarly, across tumor stages I, II, and III (all subtypes), group A–D profile is close to the cohort profile (Fig. 6 C and Fig. S4 D), but shifts at stage IV where a greater proportion are group D (AD:RD ratio favors RD). This mirrors a trend observed in stage IV lung, thyroid, and kidney tumors by PCR (Fig. S1 E) in which RD is more likely to exceed AD. In both contexts, however, sample size is too low to draw strong conclusions.
The DEGs between groups A and C behave remarkably similarly across stages I, II, and III (Fig. 6 D), but at stage IV a minority (predominantly enzymes) switch from DN to UP (Fig. 6 D, segment). Overall, the main conclusion to be drawn from this analysis relates to early-stage disease. Not only is C-terminal elevation evident very early in the course of the disease (Fig. 2), but its effects are also felt early (stage I), and those effects persist through to later stages.
Affected chromosomal domains
Of the 1,608 genes that are differentially expressed when CIZ1 AD is over-represented, 15% are UP and 85% are DN. When analyzed by location, DN genes are distributed more uniformly than UP genes, which are clustered (e.g., on chromosomes 1 and 9 in Fig. 6 E), are entirely absent from chromosome 18, and are over-represented on gene-dense chromosome 19 (Grimwood et al., 2004) (Fig. 7, A and B). In six gene clusters of 10 Mb in length (Fig. 7 C, circled in Fig. 7 A, Fig. 6 E, and Fig. S5), UP-regulated protein-coding genes are 4–14-fold denser than the chromosomal average and are also 2–6-fold more frequent than expected from local gene density. In contrast, the frequency of DN-regulated genes reflects local gene density (Fig. S5 B). This spatially concentrated upregulation is consistent with a CIZ1-related mechanism that normally represses gene expression across large chromosomal domains.
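The two enrichment figures in this paragraph answer different questions: whether a window holds more UP genes per megabase than its chromosome does, and whether it holds more UP genes than its own gene count would predict. A minimal sketch of both ratios follows. The exact windowing and statistics behind Fig. 7 C and Fig. S5 are not described in this section, the function name is hypothetical, and all counts in the example are invented for illustration.

```python
def cluster_enrichment(up_in_window, genes_in_window, window_mb,
                       up_on_chrom, genes_on_chrom, chrom_mb):
    """Return (fold over chromosomal UP density, fold over expected UP count).

    fold_over_chrom:    UP genes per Mb in the window / UP genes per Mb on
                        the whole chromosome.
    fold_over_expected: observed UP genes in the window / the number expected
                        if the window's genes were UP at the chromosome-wide
                        rate, i.e. corrected for local gene density.
    """
    window_density = up_in_window / window_mb
    chrom_density = up_on_chrom / chrom_mb
    fold_over_chrom = window_density / chrom_density

    expected_up = genes_in_window * (up_on_chrom / genes_on_chrom)
    fold_over_expected = up_in_window / expected_up
    return fold_over_chrom, fold_over_expected


# Hypothetical 10 Mb window on a 200 Mb chromosome.
print(cluster_enrichment(up_in_window=12, genes_in_window=150, window_mb=10,
                         up_on_chrom=30, genes_on_chrom=1500, chrom_mb=200))
```

A window can score highly on the first ratio simply because it is gene-dense; the second ratio separates genuine concentration of UP genes from that density effect.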
For the six UP gene clusters, we asked whether syntenic regions were similarly affected in our mouse model. In fact, all were among the regions encoding UP genes in cultured mouse fibroblasts expressing ectopic AD (Fig. S3 D). Thus, despite differences in species, cell type, and the duration and level of AD expression, similarities were observed, arguing for a degree of mechanistic conservation.
Notably, 38% of UP genes encode lncRNAs, compared with only 4% of DN genes (Fig. 6 B, listed in Data S5, tab 9). These lncRNA genes are concentrated within clusters of UP-regulated protein-coding genes at a density greatly in excess of that expected (Fig. 7 C and Fig. S5 B), pointing to a relationship between CIZ1 and lncRNA expression. Xist is not among the significantly affected lncRNAs (log2FC 0.16, FDR q = 0.0506, Data S5, tab 8).
Interestingly, the differentially expressed lncRNAs that are concentrated in cluster regions include both UP- and DN-regulated genes (Data S5), suggesting functional specialization. By analogy with the CIZ1–Xist complexes that form at the Xi, we suggest that CIZ1 normally sequesters lncRNA molecules into RNA–protein assemblies (protecting some), which then modulate access to the locus as a whole (repressing others). Excess CIZ1 AD expression would be expected to dissolve the assembly and so release the locus.
Because exposure of underlying chromatin by DNF-mediated assembly dissolution alters access by deubiquitylases, it might also be expected to increase accessibility to transposases such as the Tn5 enzyme used in ATAC-seq. ATAC-seq data are available for a subset of TCGA tumors in group A (n = 8) and group C (n = 15), reporting chromatin accessibility across the genome. Using stringent criteria (|log2FC| > 1, FDR q < 0.05), over 400 sites are significantly more exposed in group A than in group C (Fig. 7 D), showing that elevated AD is associated with increased chromatin accessibility. Exposed sites are located within cluster regions but are also evident in locations that do not host DEGs (Fig. 6 E, Fig. S5 A, and Data S6).
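Sorting accessible sites into those inside and outside the UP-gene clusters is an interval-membership problem. The sketch below is a hypothetical illustration, not the analysis behind Data S6: it assumes each site is summarized by a single position (for example, a peak summit), that cluster intervals are sorted, non-overlapping, and half-open, and that all coordinates are invented for the example.

```python
from bisect import bisect_right


def in_any_interval(position, intervals):
    """True if position lies in one of the sorted, non-overlapping
    half-open (start, end) intervals for a single chromosome."""
    starts = [start for start, _ in intervals]  # rebuilt per call for clarity
    k = bisect_right(starts, position) - 1  # last interval starting <= position
    return k >= 0 and position < intervals[k][1]


def split_sites(sites, clusters):
    """Partition (chrom, position) sites into those inside and outside clusters.

    clusters: dict mapping chromosome name to a sorted list of (start, end).
    """
    inside, outside = [], []
    for chrom, pos in sites:
        if in_any_interval(pos, clusters.get(chrom, [])):
            inside.append((chrom, pos))
        else:
            outside.append((chrom, pos))
    return inside, outside


# Hypothetical 10 Mb cluster and three accessible sites.
clusters = {"chr1": [(10_000_000, 20_000_000)]}
sites = [("chr1", 15_000_000), ("chr1", 25_000_000), ("chr9", 5_000_000)]
inside, outside = split_sites(sites, clusters)
print(len(inside), len(outside))  # 1 2
```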
Affected genes
The many genes (4% of the transcriptome) whose expression is reduced in tumors with elevated AD are distributed across all chromosomes and are not enriched in lncRNAs, and their mean fold change is smaller in magnitude than that of UP genes (−1.19 compared with +1.58). Together, this suggests that a mechanism different from that affecting UP genes is at play, and at present it is difficult to form a strong hypothesis about the process. Analyzed alone, DN genes return highly significant enrichment scores for gene sets linked with the cellular response to DNA-damaging agents (Fig. S4 E), and when they are combined with the UP gene set, their more than fivefold higher abundance dominates the results.
In contrast, the UP set of 240 spatially regulated genes was highly enriched in breast cancer–related curated gene sets (6 of the top 20 significant overlaps, FDR q < 0.05; Fig. S4 F), despite cell-type signatures identifying primarily lung tissue of fetal origin (Fig. S4 G). The UP genes also returned five sets associated with other types of cancer, one describing genes under the regulation of EZH2 (the catalytic subunit of PRC2 responsible for H3K27me3) and one describing mammary stem cells. Together, these analyses support the conclusion that excess expression of CIZ1 AD promotes the expression of genes linked with breast cancer.
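Overlap statistics of the kind reported for curated gene sets are commonly computed from the hypergeometric distribution: given a universe of N genes, a gene set of size K, and a query list of n genes, the p-value is the probability of observing at least k shared genes by chance. We assume this model here. The sketch uses only the Python standard library, `overlap_pvalue` is a hypothetical helper, and the small example numbers are invented. In practice, p-values across many gene sets would also be corrected for multiple testing to give FDR q-values.

```python
from math import comb


def overlap_pvalue(k, set_size, query_size, universe):
    """One-sided hypergeometric p-value, P(X >= k).

    universe:   number of genes that could have been selected (N)
    set_size:   genes in the curated gene set (K)
    query_size: genes in the query list, e.g. UP genes (n)
    k:          observed overlap between the set and the query
    """
    total = comb(universe, query_size)
    upper = min(set_size, query_size)
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(
        comb(set_size, i) * comb(universe - set_size, query_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical: 10-gene universe, 3-gene set, 3-gene query, all 3 overlap.
print(overlap_pvalue(3, 3, 3, 10))  # 1/120, about 0.00833
```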
Notably, the CIZ1 gene itself is not returned as UP- or DN-regulated, despite the very different domain expression on which groups A and C were defined. This highlights an important deficiency in the way gene expression analysis is typically carried out, in which all transcripts for a given gene are amalgamated into one indicator. For CIZ1, the common alteration observed here in breast cancers is not evident from overall expression level data, so it has not yet been captured by large-scale transcriptome studies. Furthermore, there are no recurrent mutations in CIZ1 in adult cancers across 46,014 unique samples in COSMIC (Tate et al., 2019), so despite its apparently profound effects on breast cancer gene expression, CIZ1 is not yet recognized as a “cancer” gene.
Discussion
The purpose of heterochromatin formation during development is to protect and reinforce cell fate decisions by restricting access to genes. Thus, potential stabilizers of the chromatin state, whose mis-expression may lead to heterochromatin instability, are important to understand in relation to the degeneration of cellular identity, human disease, and aging. Transcript variants of CIZ1 have been reported in a range of adult and pediatric cancers, as well as in neurological disorders including dystonias (Xiao et al., 2014a) and Alzheimer’s disease (Dahmcke et al., 2008), all of which could be affected via the same primary mechanism of weakened heterochromatin.
Our data suggest that RNA-dependent CIZ1 assemblies, exemplified by the Xist- and PLD-dependent CIZ1 SMACs that surround the inactive X chromosome in differentiated cells, normally act as a molecular shield that helps protect heterochromatin from the action of PR-DUBs, and possibly other enzymatic modifiers. “Molecular shield” is one of eight functional classes of phase-separating proteins proposed in PhaSePro (Mészáros et al., 2020), defined as membraneless organelles that inactivate reactions by sequestering some of the required components while keeping others outside. A CIZ1 shield would sequester chromatin while excluding PR-DUBs. Questions remain about the structure and influence of such a shield and whether some molecules penetrate it more freely than others.
Shield loss
We have exploited the easily visualized Xi-associated CIZ1 assemblies as an indicator of dysfunction in breast cancer cells and as a readout of the solubilizing action of CIZ1 DNFs in a murine model system. Experimentally, exclusion of the N-terminal RD, which encodes the two PLDs that confer the ability to coalesce inside the nucleus (Sofi et al., 2022), converts CIZ1 from a SMAC participant into a molecule with the ability to disperse SMACs: a SMAC buster. We suggest that a shift toward SMAC buster expression interferes with normal CIZ1 function in heterochromatin protection and so contributes to epigenetic deprogramming. Importantly, both elevation of SMAC buster sequences in breast cancer cells and experimental DNF transgenes alter the transcriptome and, as with deletion of CIZ1 (Ridings-Figueroa et al., 2017), effects are felt across the nucleus, with X-linked genes and other chromosomes similarly affected. Thus, while Xi-associated CIZ1 SMACs offer an important model for visual studies, smaller assemblies associated with other chromosomes are likely also disrupted.
The lack of difference in Xist expression between breast cancers with and without elevated AD, and the lack of enrichment of DEGs on the X chromosome, have a number of possible explanations: (1) lack of homogeneity in response between active and inactive X chromosomes, leading to failure to meet significance thresholds; (2) cancer-associated changes that are independent of CIZ1 expression; or (3) lack of Xi sensitivity to loss of CIZ1 assemblies, possibly buffered by other repressive mechanisms. Notwithstanding this apparent lack of effect on Xi gene expression, we cannot rule out the possibility that changes in autosomal gene expression are indirect, exerted via subthreshold disruption of X-linked genes (Topa et al., 2024).
Susceptible loci
Discrete chromosomal domains are susceptible to SMAC busters at the transcript level, suggesting that the protective effect of CIZ1 assemblies is spatially restricted but broad, extending over domains in excess of 10 Mb. Within affected domains, protein-coding genes are over-represented but also disproportionately UP-regulated, implying both domain-wide derepression and concentration of genes within CIZ1-protected clusters.
The behavior of lncRNAs within the same domains is not consistent. While lncRNA genes are also enriched and also more likely to be affected than the chromosomal average, they can be either UP- or DN-regulated. Their heterogeneous relationship with AD elevation could reflect more than one mechanism. While UP genes may be subject to the same locus derepression as protein-coding genes, the role of lncRNAs in the formation of spatial compartments in the nucleus (Quinodoz et al., 2021) suggests that others might experience transcript preservation upon incorporation into stable locus-specific RNA–protein SMACs.
Taken together, these data argue that the chromatin deprotection observed in DNF modeling experiments is at play in breast cancers and influences gene expression within specific chromosomal domains, possibly by locally altering the balance between ubiquitination of H2AK119 by PRC1 and its removal by PR-DUBs. Crucially, domain deprotection is evident in early-stage cancers but also persists in later stages, raising the possibility that it is a predisposing influence involved in cancer etiology. At present the question of what drives DNF expression in early-stage breast cancers is unanswered. Lack of mutations in CIZ1 raises the possibility that DNF expression is itself controlled primarily epigenetically and that a normal biological context is yet to be found. If DNFs normally confer fluidity on SMACs, for example, as cells pass through natural transition states, delays imposed by extrinsic conditions might prolong residency and exposure to the destabilizing effect of DNFs.
Epigenetic origins of cancer
There remain fundamental questions about the relationship between genetic and epigenetic models of cancer, and the question of which comes first is likely to have a range of context-specific answers. Mutations in chromatin proteins and their modifiers occur in approximately half of all tumors (You and Jones, 2012), implying that epigenetic instability is a consequence of mutation, yet for some types of tumor no genetic driver mutations are detected (Mack et al., 2014). Indeed, it has been shown convincingly that transient depletion of polycomb proteins during Drosophila larval development is sufficient to initiate cancer phenotypes without genetic change (Parreno et al., 2024). Our proposal is that expression of CIZ1 DNFs drives disruption of chromatin state in the early stages of tumor development, possibly before acquisition of driver mutations, and certainly before widespread genetic instability. While the TCGA breast cancer analysis suggests this, direct modeling of the impact of DNFs by their introduction into normal cells shows unequivocally their ability to drive widespread changes in gene expression.
Materials and methods
Materials availability and contacts
Further information and requests for resources and reagents should be directed to the lead contacts, G.L. Turvey and D. Coverley.
Human primary cells
Primary human mammary epithelial cells (HMECs) were cultured at 37°C with 5% CO2 in MEBM basal medium (Lonza) supplemented with MEGM SingleQuots (Lonza) on culture dishes coated in collagen (Thistle Scientific) and sampled at passages 1–2. HMECs were acquired with informed consent from three donors by the Breast Cancer Now Tissue bank under NHS ethical approval, and accessed under local approval from the University of York Department of Biology Research Ethics Committee.
Human cell lines
All cell lines used are of female origin and were authenticated for this study by the Eurofins Genomics human cell line authentication service (Eurofins Medigenomix Forensik GmbH), which returned the expected identities with 92–100% confidence in all cases. MCF-10A is a non-tumorigenic epithelial cell line established from the human mammary gland with fibrocystic disease. MCF7 is a poorly aggressive, non-invasive, triple receptor–positive human breast cancer cell line established from epithelial cells isolated from a metastatic mammary adenocarcinoma. BT-474 is a human breast cancer cell line established from a malignant ductal carcinoma of the breast that overexpresses human epidermal growth factor receptor 2 (HER-2) and estrogen receptor (ER). SK-BR-3 was established from a malignant adenocarcinoma of the breast that overexpresses HER-2. MDA-MB-231 is a human breast epithelial cancer cell line established from a metastatic, poorly differentiated, triple-negative mammary adenocarcinoma. Information on receptor status is derived from cell bank annotations and was not independently verified in this study. Cells were cultured in the following media: MCF-10A, MEGM with 5% horse serum, 10 µg/ml hydrocortisone, 20 ng/ml EGF, 500 ng/ml insulin, 100 ng/ml cholera toxin, and 1% penicillin–streptomycin–glutamine (PSG; Gibco); MCF7, EMEM with 10% fetal bovine serum (FBS) and 1% PSG; BT-474 and SK-BR-3, DMEM with 10% FBS and 1% PSG; MDA-MB-231, DMEM with 5% FBS and 1% PSG.
Mouse primary cells
All mouse PEF strains (WT 13.24, 13.31, 13.32, 13.33, 13.27, 45.1fc, and CIZ1 null 13.17, 41.2fa) were derived from day 13 embryos from C57BL/6 mice as previously described (Ridings-Figueroa et al., 2017; Stewart et al., 2019). CIZ1 null mice were generated from C57BL/6 ES clone IST13830B6 (TIGM) harboring a neomycin resistance gene trap inserted downstream of exon 1. The absence of Ciz1/CIZ1 in homozygous progeny was confirmed by qPCR, immunofluorescence, and immunoblot. Breeding of mice and all work with animal models was carried out under a UK Home Office license and with the approval of the Animal Welfare and Ethical Review Body at the University of York. PEFs were cultured in 4.5 g/l glucose DMEM containing 10% FBS and 1% PSG up to a maximum of passage 3. After passage 4, these cells are referred to as MEFs and were not used here.
Mouse cell line
The female D3T3 cell line was cultured as described (Stewart et al., 2019) in DMEM, 10% FBS, 1% PSG (Gibco).
Site-directed mutagenesis
Mutagenic primers created for this study to introduce additions, substitutions, or deletions into murine CIZ1 by PCR mutagenesis are listed in Table 1. All plasmids were sequence verified to confirm mutations (Eurofins TubeSeq Service).
Transient transfection
For analysis in cycling cells, cells were seeded on 13-mm glass coverslips at ∼30% confluency 1 day prior to transfection to produce populations at ∼60% confluency at the time of transfection. Coverslips were transferred to individual wells in a 24-well plate in 500 μl media prior to transfection. For each coverslip 50 μl Opti-MEM Medium (Gibco) was mixed with 1.5 μl X2 Transfection Reagent (Mirus) and 200 ng plasmid DNA (pEGFP-C2 with or without inserts derived from CIZ1), incubated for 30 min, and then applied to cells dropwise. Coverslips were fixed and processed for immunofluorescence typically 24 h later. For contact-inhibited cells, cells were plated across a range of densities by serial dilution 2 days prior to transfection. Coverslips at >90% confluency were selected for transfection and processed as above.
Cell synchrony
D3T3 cells were arrested in mitosis or S phase using 50 ng/ml nocodazole (Sigma-Aldrich) for 16–24 h or 2.5 mM thymidine (Sigma-Aldrich) for 24 h, respectively. Cells arrested in M phase were isolated by mitotic shake-off and replated onto glass coverslips for analysis after release. S phase–arrested cells grown on glass coverslips were released by washing twice with PBS and replacing the medium with fresh medium. In transduced cell populations, cells were arrested ∼48 h after transduction for 16–24 h, then released and analyzed. To facilitate the retention of mitotic cells, coverslips were fixed prior to permeabilization.
Flow cytometry
Cells were isolated from 9-cm culture plates by trypsinization, resuspended in 100 μl cold PBS to obtain a single-cell suspension, and stored at −20°C after the addition of 1.5 ml cold 70% ethanol. For analysis, cells were pelleted and resuspended in PBS (500,000 cells/ml), and 55 μl 10× FACS mix (1 mg/ml propidium iodide, 4% vol/vol Triton X-100, 10× PBS) was added per 500 μl of cell suspension. DNA content was measured using a CytoFLEX (Beckman Coulter) at excitation 561 nm/emission 610/20 for detection of the DNA-binding dye propidium iodide (Dean and Jett, 1974). A minimum of 5,000 single cells per sample were recorded and analyzed using the cell cycle algorithm in FCS Express V7 (Dotmatics).
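As a quick check on the staining conditions, the dilution arithmetic (C1V1 = C2V2) implied by adding 55 μl of 10× FACS mix to 500 μl of cells can be sketched in Python. The helper name is illustrative; the stock concentrations are those listed above. PBS strength is deliberately not computed, because the cells are already suspended in PBS.

```python
def final_concentration(stock_conc, stock_vol_ul, sample_vol_ul):
    """Concentration of a stock component after it is added to a sample (C1*V1 = C2*V2)."""
    total_vol_ul = stock_vol_ul + sample_vol_ul
    return stock_conc * stock_vol_ul / total_vol_ul

STOCK_UL = 55    # volume of 10x FACS mix added
SAMPLE_UL = 500  # volume of cell suspension in PBS

# Propidium iodide stock: 1 mg/ml = 1000 ug/ml; Triton X-100 stock: 4% vol/vol.
pi_ug_per_ml = final_concentration(1000.0, STOCK_UL, SAMPLE_UL)
triton_pct = final_concentration(4.0, STOCK_UL, SAMPLE_UL)
dilution_factor = (STOCK_UL + SAMPLE_UL) / STOCK_UL

print(f"dilution 1:{dilution_factor:.1f}; PI {pi_ug_per_ml:.1f} ug/ml; Triton X-100 {triton_pct:.2f}%")
```

The roughly 1:10 dilution brings the mix close to 1×, that is, about 0.1 mg/ml propidium iodide and 0.4% Triton X-100 in the final suspension.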
Inhibitors
To measure the impact of inhibition of PR-DUBs, 5 µM PR-619 (Bio-Techne) was applied to PEFs 16 h after transduction for 32 h, to collect cells 48 h after transduction. In transient transfection experiments, PR-619 was used at 5 µM for 24 h throughout the transfection window.
Lentivirus transduction
Bicistronic ZsGreen/C181-bearing virus and virus bearing ZsGreen alone were produced in the Lenti-X 293T subclone of human embryonic kidney (HEK) cells. 8 × 10^5 HEK cells were seeded per well in a 6-well plate prior to transfection with plasmids. For transfection of each well, 1 µg transfer vector, 0.75 µg packaging plasmid, and 0.25 µg envelope plasmid diluted in 100 μl Opti-MEM (Gibco) were mixed with 20 μl of PolyFect transfection reagent (Qiagen) and incubated for 5–10 min at room temperature to allow complex formation. 0.6 ml of cell growth medium was added, gently mixed, and transferred to one well. Cells were incubated for 16 h, and then the medium was replaced with fresh growth medium supplemented with HEPES to a final concentration of 20 mM. At 48 h after transfection, the virus-containing supernatant was harvested and filtered through a low-protein-binding filter (0.45 µm; Sarstedt) to remove HEK cell debris. Viral supernatant was supplemented with 4 μg/ml polybrene (Sigma-Aldrich) and transferred to recipient PEFs or D3T3 cells. Transduction was monitored by the emergence of cytoplasmic ZsGreen, which showed that close to 100% of cells were transduced after ∼48 h.
Immunofluorescence
Cells grown on coverslips were washed in cytoskeletal buffer (10 mM PIPES/KOH pH 6.8, 100 mM NaCl, 300 mM sucrose, 1 mM EGTA, and 1 mM MgCl2) with 0.1% vol/vol Triton X-100 and fixed in 4% wt/vol paraformaldehyde (PFA). Where indicated, Triton X-100 was left out of CSK (unextracted cells) or an additional 400 mM NaCl was added (high-salt extraction). After fixation, all coverslips were blocked in antibody buffer (AB) (1xPBS, 10 mg/ml BSA, 0.02% wt/vol SDS, and 0.1% vol/vol Triton X-100) for 30 min, incubated with primary antibodies for 1 h at 37°C, washed three times with AB, incubated with secondary antibodies for 1 h at 37°C, washed three times with AB, and mounted on glass slides with Vectashield containing DAPI (Vector Labs). Primary antibodies are detailed in Table 1. Anti-human CIZ1 monoclonal antibody 87 was generated by Fujirebio Diagnostic Antibodies (FDAB). Anti-species antibodies (Thermo Fisher Scientific) labeled with Alexa Fluor 568 (red) or 488 (green) were used for detection in all cases. Fluorescence images were captured using a Zeiss Axiovert 200M fitted with a 63×/1.40 Plan-Apochromat objective and Zeiss filter sets 2, 10, and 15 (G365 FT395 LP420, BP450-490 FT510 BP515-565, and BP546/12 FT580 LP590) using Axiocam 506 mono and Axiovision image acquisition software (SE64 release 4.9.1) through Zeiss Immersol 518F. For each antibody, constant image capture parameters were used to generate image sets within an experiment, on which quantitative analysis was performed, in all cases from unmodified raw images.
Phenotype scoring in dispersal assays
Cells were classified by eye across replicate experiments and across two to three replicate coverslips per condition within an experiment. To avoid bias, classifications were verified by independent workers in all cases and made blind in some cases. In the three-tier scoring system, cells were categorized as having a compact CIZ1 Xi assembly (type 1), lacking one (type 3), or intermediate (type 2), in which CIZ1 assemblies were reduced or diffuse, or made up of locally dispersed particles. Examples are shown. Empty vectors (EV) are used as a negative control and WT-C181 as a positive control in experiments testing the effect of mutants. In transient transfection experiments, untransfected (non-green) cells within test populations serve as internal controls on each coverslip.
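Per-coverslip scores from the three-tier system can be converted into phenotype frequencies with a standard error across replicate counts. The sketch below is a minimal illustration in Python; the function names are hypothetical and the counts are invented placeholders, not data from this study.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical counts of scored cells per coverslip, keyed by phenotype:
# 1 = compact CIZ1 Xi assembly, 2 = intermediate, 3 = no compact assembly.
coverslips = [
    {1: 40, 2: 35, 3: 25},
    {1: 38, 2: 30, 3: 32},
    {1: 45, 2: 28, 3: 27},
]

def percent_by_type(counts):
    """Convert raw counts on one coverslip to percentages of scored cells."""
    total = sum(counts.values())
    return {phenotype: 100.0 * n / total for phenotype, n in counts.items()}

def mean_sem(coverslips, phenotype):
    """Mean percentage of a phenotype across coverslips, with the standard error of the mean."""
    values = [percent_by_type(c)[phenotype] for c in coverslips]
    return mean(values), stdev(values) / sqrt(len(values))

for phenotype in (1, 2, 3):
    m, sem = mean_sem(coverslips, phenotype)
    print(f"type {phenotype}: {m:.1f}% ± {sem:.1f} SEM")
```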
Image analysis in dispersal assays
For measurement of the effect of CIZ1 fragments on endogenous CIZ1 or histone PTMs, sets of images including test and control samples were processed in parallel and imaged with identical parameters in one sitting. All intensity measurements were conducted on unedited, unenhanced raw image sets. In FIJI, regions of interest (ROIs) were identified within DAPI-stained fields of nuclei by autothresholding with the Otsu setting to create a binary mask that defined nuclear perimeters. ROIs were applied to antibody-detected fluorescence image layers to generate the mean, minimum, and maximum nuclear intensity per ROI and the area of each nucleus. In female nuclei, overall nuclear maxima were used as a surrogate estimate of Xi-assembly intensity. Where two or more data sets were combined (for example, two C181/vector control pairs) from experiments performed on different days or with different PEF populations, data were amalgamated after normalization of values to the average of the control set in each case. For reproduction, images were digitally enhanced using FIJI to remove background fluorescence or increase brightness. Identical manipulations were applied within an experiment so that, for example, the intensity of staining with and without transfection, or before and after extraction, is accurately represented.
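The normalization step used to amalgamate data sets from different days or PEF populations can be sketched as follows, assuming each experiment's per-nucleus values (for example, nuclear maxima) have already been exported from FIJI. The dictionary layout, function names, and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical per-nucleus maximum intensities (arbitrary units) from two
# experiments imaged on different days, each with its own vector control set.
experiments = [
    {"control": [100.0, 120.0, 110.0], "C181": [60.0, 70.0, 50.0]},
    {"control": [200.0, 180.0, 220.0], "C181": [90.0, 130.0, 110.0]},
]

def normalise_to_control(experiment):
    """Divide every value in one experiment by the mean of that experiment's control set."""
    ref = mean(experiment["control"])
    return {cond: [v / ref for v in vals] for cond, vals in experiment.items()}

def amalgamate(experiments):
    """Pool control-normalized values per condition across experiments."""
    pooled = {}
    for exp in experiments:
        for cond, vals in normalise_to_control(exp).items():
            pooled.setdefault(cond, []).extend(vals)
    return pooled

pooled = amalgamate(experiments)
print({cond: round(mean(vals), 3) for cond, vals in pooled.items()})
```

After normalization, each control set averages 1.0, so the two experiments can be pooled even though their raw intensities differ about twofold.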
RNA FISH
Female D3T3 cells were transfected with C181 or C275 for 24 h and then processed for detection of Xist transcript by RNA FISH under RNase-free conditions as described previously (Sofi et al., 2022). Briefly, cells grown on coverslips were fixed with 4% PFA on ice for 10 min, rinsed three times in PBS, and then incubated for 10 min in PBS supplemented with Triton X-100 (0.5%), BSA (0.5%), and vanadyl ribonucleoside complex (VRC, 2 mM). An 11-kb SpeI–SalI Xist fragment isolated from the full-length mouse Xist clone pCMV-Xist-PA (26760; Addgene) was fluorescently tagged using the BioPrime labeling kit (18094-011; Invitrogen), replacing the biotin with Chromatide 594-5 dUTP (C11400; Invitrogen). Following overnight labeling, the reaction was supplemented with Cot-1 and salmon sperm DNA to compete for repetitive elements. The mix was precipitated twice and then resuspended in 80 µl hybridization buffer comprising 50% formamide in 2X SSC with BSA (2 mg/ml), dextran sulfate (10%), and VRC (10 mM). Prior to use, the probe (10 µl/coverslip) was denatured at 74°C for 10 min and then annealed at 37°C for 20 min. Coverslips were dehydrated through an ethanol series and air-dried. The probe was spotted onto a clean RNase-free slide, overlaid with the coverslip, sealed with rubber cement, and incubated overnight at 37°C in the dark. The coverslips were then carefully removed in 4X SSC and washed three times in 2X SSC with 50% formamide at 39°C, three times in 2X SSC at 39°C, once in 1X SSC at room temperature, and then once in 4X SSC at room temperature. All washes were for 5 min. Coverslips were briefly dipped in water and mounted in Vectashield with DAPI.
Quantitative RT-PCR
To quantify the relative expression of CIZ1 amplicons in a wide range of primary tumors, TissueScan Tumour cDNA arrays from OriGene Technologies, Inc. containing 2–3 ng of cDNA were analyzed by qPCR. The RNA was collected under IRB-approved protocols, and array details with tumor classification are given in Data S2. Tumor classifications and abstracted pathology reports are given at: http://www.origene.com/qPCR/Tissue-qPCR-Arrays.aspx. cDNA was normalized using β-actin by the supplier, and we used our own amplification of β-actin where indicated. In most cases, CIZ1 amplicon expression was expressed relative to other CIZ1 amplicons, rather than to another gene, and normalized to control samples so that changes in ratio across cancer samples are apparent. Reactions were carried out in 25 μl volumes with 12.5 μl TaqMan master mix (Applied Biosystems), 1 μl of each 10 µM primer, and 1 μl of 10 µM probe. Primers (Sigma-Aldrich) and probes (MWG) are specified in Table 1. Primers 9 and 10 were combined with a probe in exon 5 to generate detection tool set DT5, primers 13 and 14 with probe 7 (DT7), primers 1 and 2 with probe 14 (DT14), and primers 6 and 7 with probe 16 (DT16). Primer efficiencies were >90% in all cases, and the relative amplification efficiencies of RD and AD tools were routinely checked using a plasmid template with coupled and equal levels of RD and AD. Data were generated using an ABI 7000 with SDS v1.2 (Applied Biosystems) using 50°C (2 min) and 95°C (10 min), followed by 50 cycles of 95°C (15 s) and 60°C (1 min). Relative expression was calculated by the comparative Ct method using the formula 2^(−ΔΔCt) (Livak and Schmittgen, 2001), and results were expressed relative to the mean of normal cells or tissue in each array, or to the lowest-stage tumor in the array, as indicated.
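For reference, the comparative Ct calculation can be written out as a short Python function. This is a minimal sketch of the Livak and Schmittgen formula, not the software used here, and it assumes near-equal, ~100% amplification efficiencies for the two assays (the text reports >90%). Argument names and Ct values are illustrative. For the AD-relative-to-RD comparison described above, the AD amplicon would take the "target" role and the RD amplicon the "reference" role, with the calibrator being, for example, normal tissue.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the comparative Ct (2^-ddCt) method.

    ct_target, ct_reference: Ct values of the target and reference assays in the test sample.
    ct_target_cal, ct_reference_cal: the same assays in the calibrator (e.g. normal tissue).
    Assumes both assays amplify with ~100% efficiency (doubling per cycle).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: AD amplicon as target, RD amplicon as reference.
tumor_fold = relative_expression(26.0, 24.0, 27.0, 23.0)
print(tumor_fold)  # 4.0
```

Here ΔCt is 2 in the tumor and 4 in the calibrator, so ΔΔCt = −2 and the AD:RD balance is shifted fourfold toward AD relative to the calibrator.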
Mouse transcriptomics
Primary murine embryonic fibroblasts 13.31, 13.32 (WT female), and 13.33 (WT male) were transduced with virus bearing either the empty pLVX-EF1a-IRES-ZsGreen1 vector (Takara) or the same plasmid expressing the coding sequence of C181, as described in Lentivirus transduction. After 72 h, RNA was extracted with TRIzol (15596-026; Ambion) following the manufacturer’s instructions, and RNA pellets were resuspended in nuclease-free water. Isolated RNA was treated with DNase (04716728001; Roche) before quality analysis by agarose gel, NanoDrop spectrophotometer, and Agilent 2100 Bioanalyzer. Library preparation and sequencing were undertaken by Biomarker Technologies (BMKGene) using the NEBNext Ultra™ RNA Library Prep Kit for Illumina (NEB), with enrichment for mRNA using oligo(dT) magnetic beads, followed by random fragmentation of enriched mRNA in fragmentation buffer. cDNA was synthesized using random primers, followed by purification with AMPure XP beads, end repair, dA-tailing, adaptor ligation, PCR enrichment, and further AMPure XP purification to select fragments within the size range of 300–400 bp. Library quality was assessed using the Agilent 2100 Bioanalyzer system, and libraries were sequenced on an Illumina platform using paired-end sequencing to generate at least 15.31 Gb of clean data per sample, with at least 93.16% of clean data at a quality score of Q30 or above. Low-quality reads and adaptor sequences were removed, and the resulting high-quality reads were aligned to version GRCm38 of the mouse genome using HISAT2 (Kim et al., 2015). Transcriptomes were assembled and gene expression was quantified using StringTie (Pertea et al., 2015) against the Ensembl release 79 annotations. Differential gene expression analysis was performed by DESeq2 (Subramanian et al., 2005).
Plots were generated in Spyder (v.5.3.3), accessed through Anaconda Distribution (v.2.3.2), from the average transcripts per million plus one (TPM+1) values for each treatment condition, to exclude very lowly expressed transcripts. Scatter plots were generated using the pandas, numpy, and matplotlib modules. Volcano plots were generated using the pandas and bioinfokit modules. Principal component analysis (PCA) plots were generated using the pandas, sklearn, seaborn, and matplotlib modules. Gene set enrichment analysis was performed using the GSEAPY module, with genes preranked by a π value (Xiao et al., 2014b), calculated by multiplying the log2FC of the average TPM by −log10(q value); the analysis was also performed separately for UP and DN DEGs.
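The π value ranking can be sketched as below. This assumes log2FC is computed from condition-averaged TPM with the +1 pseudocount applied before taking logs, and that q values come from DESeq2. Gene names, TPMs, and q values are hypothetical. The resulting ranked list is the kind of input a preranked GSEA (for example, GSEApy's `prerank` function) expects.

```python
import math
from statistics import mean

# Hypothetical per-replicate TPMs (control, treated) and DESeq2 q values.
genes = {
    "GeneA": ([3.0, 3.0, 3.0], [15.0, 15.0, 15.0], 1e-4),
    "GeneB": ([7.0, 7.0, 7.0], [1.0, 1.0, 1.0], 1e-3),
    "GeneC": ([10.0, 10.0, 10.0], [10.0, 12.0, 14.0], 0.5),
    "GeneD": ([1.0, 1.0, 1.0], [3.0, 3.0, 3.0], 0.01),
}

def log2fc_avg_tpm(control, treated):
    """log2 fold change of condition-averaged TPM, with +1 added to avoid log(0)."""
    return math.log2(mean(treated) + 1) - math.log2(mean(control) + 1)

def pi_value(log2fc, q):
    """pi = log2FC x -log10(q): sign gives direction, magnitude combines effect and significance."""
    return log2fc * -math.log10(q)

def prerank(genes):
    """Genes sorted from most UP (largest pi) to most DN (smallest pi)."""
    scored = [(g, pi_value(log2fc_avg_tpm(c, t), q)) for g, (c, t, q) in genes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

for gene, score in prerank(genes):
    print(f"{gene}\t{score:.2f}")
```

A strongly significant fourfold change (GeneA) outranks a modest but nonsignificant one (GeneC), which is the point of combining effect size and significance in one score.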
Patient and cell line bioinformatics
Aligned RNA sequencing data for 1,095 primary breast cancer samples from The Cancer Genome Atlas were accessed under dbGaP project 25297. Secondary metastatic breast samples were excluded from the analysis. Data were downloaded using the Genomic Data Commons command line client v1.5.0. FASTQ files were regenerated from sample BAM files by excluding secondary and supplementary alignments with samtools v1.10 (Li et al., 2009) and then converting with BEDTools v2.27.1 bamToFastq (Quinlan and Hall, 2010). No additional quality control steps were performed on the extracted read files. Reads were aligned to the GRCh38 Gencode primary assembly and to individual CIZ1 transcripts (from Gencode v38 and [Veiga et al., 2022]) using HISAT2 v2.2.0 (Kim et al., 2015). Reads were also pseudoaligned to the Gencode v38 full annotation transcriptome file with kallisto v0.46.0 (Bray et al., 2016), then quantified and aggregated to gene-level transcripts per million (TPM) expression values using tximport v1.24.0 (Soneson et al., 2015). The same expression analysis pipeline was also applied to publicly available RNA sequencing data from four breast cancer cell lines, MCF7 (SRR8615758), BT-474 (SRR8616195), SK-BR-3 (SRR8615677), and MDA-MB-231 (SRR8615767), and from the non-tumorigenic breast epithelial cell line MCF-10A (SRR12877369).
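Gene-level TPMs in the pipeline above come from tximport; our understanding is that, for abundances, this amounts to summing transcript TPMs within each gene. A minimal Python sketch of that aggregation step, with a hypothetical helper name and placeholder identifiers and values, is:

```python
from collections import defaultdict

# Hypothetical transcript-level TPMs (as kallisto would report) and a
# transcript-to-gene map; the identifiers are placeholders.
tx_tpm = {"TX1": 12.0, "TX2": 3.5, "TX3": 40.0}
tx2gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}

def aggregate_to_gene(tx_tpm, tx2gene):
    """Sum transcript TPMs within each gene to give gene-level TPM."""
    gene_tpm = defaultdict(float)
    for tx, tpm in tx_tpm.items():
        gene_tpm[tx2gene[tx]] += tpm
    return dict(gene_tpm)

print(aggregate_to_gene(tx_tpm, tx2gene))  # {'GENE_A': 15.5, 'GENE_B': 40.0}
```

Because TPMs are already length-normalized and sum to one million across transcripts, summing them within genes keeps the gene-level values on the same scale.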
Analysis of domain imbalance
CIZ1 encodes 16 translated exons (2–17), plus at least three alternative untranslated exon 1s, which were excluded from the present analysis. CIZ1 exon transcript read coverage in TCGA breast tumors was generated by alignment to both the CIZ1 transcripts and the full genome, and inspected manually for novel, well-supported splice junctions in IGV Desktop for Windows v2.8.2 (Soneson et al., 2015). Outputs were normalized to the canonical (ENST00000372938.10) exon 7 coverage and stratified by tumor stage. To develop an index of the degree of CIZ1 AD and RD domain imbalance, the ratio of exon 14 to exon 5 coverage was calculated for individual tumors after mapping to the CIZ1 transcript. The 10 stage II patients with the most imbalanced AD:RD ratio were used to rule out a broader imbalance in 3′ coverage within the libraries; these (TCGA-E9-A54Y, TCGA-LL-A6FR, TCGA-E9-A3X8, TCGA-AQ-A54O, TCGA-AQ-A54N, TCGA-WT-AB41, TCGA-A2-A3XV, TCGA-LL-A5YL, TCGA-GM-A2DB, and TCGA-AO-A03N) showed similar coverage across ESR1 and TP53 exons. To compare gene expression in TCGA tumors with and without domain imbalance, a subgroup A (n = 98), in which the AD (exon 14):RD (exon 5) normalized TPM ratio exceeds 2, and a subgroup C (n = 201), in which the ratio is within 0.9–1.1 (10% variance), were identified (Data S4). Intermediate group B (ratio 1.1–1.99, n = 773) and group D (ratio < 0.9, n = 15) are also shown. DEGs (absolute log2 fold change > 1, FDR q < 0.05) between groups A and C reveal those whose expression correlates with elevated AD. The representation of groups A–D across breast cancer subtypes, based on histological classifications (Thennavan et al., 2021), and across tumor stages, based on the stage classifications associated with the accessed transcriptomes (Thennavan et al., 2021) (listed in Data S4), was compared with that of the cohort as a whole.
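The grouping rule can be sketched as follows. Boundary handling at exactly 0.9, 1.1, and 2 is an assumption (the text gives ranges of 1.1–1.99 for B and 0.9–1.1 for C), and the function names, tumor IDs, and coverage values are hypothetical. Because exon 7 normalization divides exon 14 and exon 5 coverage by the same per-tumor value, it does not change an individual tumor's ratio; it matters only when comparing coverage across tumors.

```python
def ad_rd_ratio(exon14, exon5, exon7):
    """AD:RD index for one tumor: exon 14 over exon 5 coverage, each normalized to exon 7."""
    if exon5 == 0 or exon7 == 0:
        return None  # index undefined without RD (or exon 7) coverage
    return (exon14 / exon7) / (exon5 / exon7)

def assign_group(ratio):
    """Assign groups A-D; treatment of exact boundary values is an assumption."""
    if ratio is None:
        return None
    if ratio > 2.0:
        return "A"  # AD exceeds RD more than twofold
    if ratio > 1.1:
        return "B"  # intermediate
    if ratio >= 0.9:
        return "C"  # balanced, within 10%
    return "D"      # RD in excess

# Hypothetical per-tumor coverage values: (exon 14, exon 5, exon 7).
tumors = {
    "T1": (50.0, 20.0, 25.0),
    "T2": (30.0, 25.0, 20.0),
    "T3": (21.0, 20.0, 40.0),
    "T4": (8.0, 10.0, 30.0),
}

groups = {tid: assign_group(ad_rd_ratio(*cov)) for tid, cov in tumors.items()}
print(groups)
```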
ATACseq analysis
Aligned ATAC sequencing data for all available subgroup A (n = 8) and subgroup C (n = 15) primary breast cancer samples from The Cancer Genome Atlas were accessed under dbGaP project 25297. BAM files were filtered to remove unmapped reads and sorted by read name using SAMtools before conversion to FASTQ format using BEDTools bamToFastq. FASTQ files were processed and analyzed using the nf-core/atacseq v2.1.2 workflow with default parameters. DESeq2 was used to identify differentially accessible regions based on read counts within consensus peaks, stringently filtered for absolute log2 fold change > 1 and FDR q < 0.05.
Positional analysis
Chromosome maps were generated by segregating DEGs into UP and DN sets and then by chromosome, and plotting the start position of each mapped gene in Excel. Centromere positions were extracted for mouse genome build GRCm39/mm39 and human GRCh38.p14 using the UCSC genome browser. The plotted murine gene cohort includes 1,029 C181 DEGs (FDR q < 0.05, |log2FC| > 1), and the human gene cohort comprises 1,126 DEGs related to twofold elevated AD in TCGA patients (FDR q < 0.05, |log2FC| > 1). Average UP and DN gene density was calculated based on sequenced human chromosome lengths given in NCBI Genes and Disease, and locations enriched for UP genes were selected based on increased density across 10 Mb domains. For the six human clusters analyzed, UP DEG enrichment exceeded gene enrichment by 2–12-fold. Syntenic regions in the murine genome for human UP gene clusters were identified using the Ensembl Chromosome view synteny tab.
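The 10 Mb density comparison can be sketched as follows: gene density in each window is divided by the chromosome-wide average density. This is an illustrative reimplementation, not the Excel workflow used here; the function name, window anchoring at position 0, and example coordinates are assumptions.

```python
from collections import Counter

WINDOW_BP = 10_000_000  # 10 Mb domains

def window_enrichment(gene_starts, chrom_length_bp, window_bp=WINDOW_BP):
    """Fold enrichment of gene density in each window over the chromosome-wide average.

    Returns {window_start_bp: enrichment} for windows containing at least one gene.
    """
    if not gene_starts:
        return {}
    avg_density = len(gene_starts) / chrom_length_bp  # genes per bp, whole chromosome
    counts = Counter(start // window_bp for start in gene_starts)
    enrichment = {}
    for idx, n in counts.items():
        win_start = idx * window_bp
        win_len = min(window_bp, chrom_length_bp - win_start)  # last window may be shorter
        enrichment[win_start] = (n / win_len) / avg_density
    return dict(sorted(enrichment.items()))

# Hypothetical UP-gene start positions on a 50 Mb chromosome.
up_starts = [1_000_000, 2_500_000, 5_000_000, 8_000_000, 9_900_000,
             25_000_000, 41_000_000, 45_000_000]
for start, fold in window_enrichment(up_starts, 50_000_000).items():
    print(f"{start / 1e6:.0f}-{(start + WINDOW_BP) / 1e6:.0f} Mb: {fold:.2f}x")
```

Applying the same function to the start positions of all annotated genes and dividing the two enrichments gives the kind of UP-over-gene comparison reported for the six human clusters.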
Quantification and statistical analysis
For analysis of CIZ1 SMAC frequencies, a variable number (>3) of technical replicates and independent counts (N value) were conducted per experiment as indicated, allowing the generation of ±SEM. The number of cells scored is stated individually for each experiment (n value). Wherever possible, at least two independently isolated PEF lines were used across the experiments relating to each question. Statistical analysis was carried out using SPSS release 2021 ver. 28.0 for Mac (IBM) or Microsoft Excel for Mac ver. 16.73.3. Parametric or non-parametric tests were used where appropriate: for comparison between two data sets, a two-sample unpaired t test or a Mann–Whitney U test was used, and for comparison between three or more data sets, a one-way ANOVA was followed by an appropriate post-hoc test. Statistical tests used in each analysis are stated in the figure legends with P values, where asterisks indicate *P < 0.05, **P < 0.01, and ***P < 0.001. For qRT-PCR data, the Pearson correlation coefficient was used to compare CIZ1 RD and AD amplicon expression, and linear regression trendlines were applied using Excel. Graphs were generated using Excel.
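Pearson's r and the ordinary least-squares trendline that Excel fits to paired RD and AD amplicon values can be reproduced in a few lines of plain Python, which may help when checking values outside Excel. This is a generic sketch; the function names and example values are hypothetical.

```python
from math import sqrt

def _centered_sums(x, y):
    """Return Sxy, Sxx, Syy and the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy, mx, my

def pearson_r(x, y):
    """Pearson correlation coefficient of paired observations."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)

def linear_trendline(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical relative expression of RD and AD amplicons across samples.
rd = [1.0, 1.4, 2.1, 2.9, 3.5]
ad = [1.2, 1.5, 2.6, 3.1, 4.0]
r = pearson_r(rd, ad)
slope, intercept = linear_trendline(rd, ad)
print(f"r = {r:.3f}; AD = {slope:.3f} x RD + {intercept:.3f}")
```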
Online supplemental material
Fig. S1 shows CIZ1 domains, transcript levels, and domain expression in common solid tumors. Fig. S2 shows cell cycle analysis and anchor domain mutagenesis. Fig. S3 shows further analysis of murine genes changed by ectopic expression of C181. Fig. S4 shows the segmentation of TCGA transcriptomes and GSEA outputs. Fig. S5 shows a focus on UP gene clusters. Data S1 shows CIZ1 exon expression (cell lines and tumor summary). Data S2 shows data and primary tumor designations for qRT-PCR array analysis. Data S3 shows the DNF effect on murine transcriptomics, GSEA analysis, and heat maps. Data S4 shows TCGA BRCA transcriptomics, segmentation, and clinical metadata. Data S5 shows gene expression in BRCA group A compared with C. Data S6 shows ATACseq in group A compared with C.
Data availability
The data underlying Fig. 2, A–C, Fig. 5, D–F, Fig. 6, and Fig. 7 are available in the published article or its online supplemental material. The source data underlying Fig. 2 D, Fig. 6, and Fig. 7 are openly available in The Cancer Genome Atlas (TCGA) at https://www.cancer.gov/ccg/research/genome-sequencing/tcga. All other data are available from the corresponding author upon reasonable request.
Acknowledgments
We thank Will Brackenbury and Jonathan Godwin for cells, Christian Fermer of FDAB for anti-CIZ1 antibodies, and former colleagues Faisal Abdel Rahman, Jennifer Munkley, Louisa Williamson, Gillian Higgins, Julie Tucker, Karen Clegg, and Matt Dowson. We acknowledge the role of the Breast Cancer Now Tissue Bank in collecting and making available the primary normal breast epithelial cells used here and the patients who donated their tissues.
Protein domain analysis was funded by the Georgina Gatenby PhD scholarship to G.L. Turvey, human CIZ1 gene expression array work by Cizzle Biotech, reanalysis of TCGA data by York Against Cancer, and remaining work by Medical Research Council grant MR/V029088 and a Royal Society Leverhulme Trust Fellowship to D. Coverley. Open Access funding provided by the University of York.
Author contributions: G.L. Turvey: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing; E. López de Alba: Investigation, Writing - review & editing; E. Stewart: Formal analysis, Investigation, Methodology, Resources, Validation, Visualization, Writing - review & editing; H. Cook: Investigation; A. Alalti: Resources; R.T. Gawne: Formal analysis; J.F.-X. Ainscough: Conceptualization, Investigation, Methodology, Resources, Supervision, Visualization, Writing - review & editing; A.S. Mason: Data curation, Formal analysis, Investigation, Software, Supervision, Writing - review & editing; D. Coverley: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing.
References
Author notes
Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. J.F.-X. Ainscough reported “J.F.-X. Ainscough is a co-founder and shareholder of Cizzle Biotech.” D. Coverley reported other from Cizzle Biotechnology Ltd. during the conduct of the study; personal fees from Cizzle Biotechnology Ltd. and other from Cizzle Biotechnology Ltd. outside the submitted work; in addition, D. Coverley had a patent to Cizzle Biotechnology Ltd. licensed “Cizzle Bio Inc.” and a patent to Cizzle Biotechnology Ltd. pending “Cizzle Bio Inc.”; and “D. Coverley is a founder, CSO, and shareholder in Cizzle Biotech and reports receiving institutional research support from Cizzle Biotech, which supported the analysis shown in Fig. 2, A and B. Cizzle commercial interests are in the CIZ1B variant which is not the subject of this paper.” No other disclosures were reported.
E. López de Alba’s current affiliation is Topology and DNA breaks Group, Spanish National Cancer Centre (CNIO), Madrid, Spain.
A. Alalti’s current affiliation is Nuffield Department of Medicine, University of Oxford, Oxford, UK.