In humans, copy number variations (CNVs) are a common source of phenotypic diversity and disease susceptibility. Facioscapulohumeral muscular dystrophy (FSHD) is an important genetic disease caused by CNVs. It is an autosomal-dominant myopathy caused by a reduction in the copy number of the D4Z4 macrosatellite repeat located at chromosome 4q35. Interestingly, the reduction of D4Z4 copy number is not sufficient by itself to cause FSHD. A number of epigenetic events appear to affect the severity of the disease, its rate of progression, and the distribution of muscle weakness. Indeed, recent findings suggest that virtually all levels of epigenetic regulation, from DNA methylation to higher order chromosomal architecture, are altered at the disease locus, causing the de-regulation of 4q35 gene expression and ultimately FSHD.
Copy number variations are an important source of human genetic diversity
Genetic association studies generally evaluate single-nucleotide polymorphisms (SNPs), which are single nucleotides at specific genomic locations that vary between individuals of the same species. Recent results indicate that the human genome contains another frequent type of polymorphism: copy number variations (CNVs; Conrad et al., 2010). A CNV is a segment of DNA that can be found in various copy numbers in the genomes of different individuals (Fig. 1). CNVs range in size from a few hundred nucleotides to several megabases. Compared with SNPs, CNVs affect a more significant fraction of the genome and arise more frequently. Hence, CNVs significantly contribute to human evolution, genetic diversity, and an increasing number of phenotypic traits (Stankiewicz and Lupski, 2010).
Depending on the genomic context, CNVs can have varying effects (Fig. 1). For example, recent data indicate that CNVs directly alter the structure of 12.5% of protein-coding genes (Conrad et al., 2010), and there is increasing evidence to suggest that CNVs play an important role in a number of Mendelian diseases and common complex disorders (Stankiewicz and Lupski, 2010). For instance, Charcot-Marie-Tooth type 1A disease is caused by duplications in the PMP22 gene, which encodes an integral membrane protein that is a major component of compact myelin in the peripheral nervous system (Chance et al., 1994). Also, susceptibility to acquired immune deficiency syndrome is affected by segmental duplications encompassing the CCL3L1 gene, which encodes a chemokine that is a ligand for CCR5, the human immunodeficiency virus coreceptor (Gonzalez et al., 2005).
CNVs can also affect gene dosage and expression (Stranger et al., 2007). Recent data indicate that non–B-DNA forming sequences, which are usually enriched in promoter regions, are also enriched in CNV breakpoints. Thus, the same features that are involved in transcriptional regulation may also be involved in the formation of CNVs. As a consequence, CNVs might shape the evolution of gene regulation (Conrad et al., 2010).
More than half of the human genome is composed of repetitive sequences (Neguembor and Gabellini, 2010). Because repetitive sequences can act as substrates for homologous recombination, their presence promotes the instability of our genome (Gu et al., 2008). As a result, repetitive sequences account for a significant number of human CNVs (Warburton et al., 2008).
In this review we will focus on an important human genetic disease, facioscapulohumeral muscular dystrophy (FSHD), which is caused by the presence of CNVs of a repetitive sequence that regulates gene expression.
Clinical features of FSHD
FSHD (MIM #158900) is characterized by the progressive weakness and atrophy of a specific subset of skeletal muscles. As the name implies, FSHD mostly affects the muscles of the face, scapula, and upper arms (Tawil et al., 1998). The peculiar involvement of specific muscles is such a striking feature of FSHD that it is often used in the clinic to distinguish FSHD from the other forms of muscular dystrophy (Padberg et al., 1991). FSHD onset generally involves the wasting of facial muscles, such as orbicularis oris and orbicularis oculi, whereas others, like the pharyngeal and lingual muscles, are unaffected. As the disease progresses, limb girdle muscles, such as scapula fixator and trapezius, are also affected. Abdominal muscle weakness is another feature of FSHD, causing a characteristic lordotic posture (an abnormal curvature of the lumbar spine) associated with a protuberant abdomen. In the most severe cases, the muscular degeneration can extend to the pelvic girdle and foot dorsiflexor muscles, thereby affecting the ability of the patient to walk. Approximately 20% of FSHD patients become wheelchair bound (Pandya et al., 2008).
FSHD is associated with retinal vasculopathy, a blood vessel disorder of the retina, in 60% of cases (Tawil and Van Der Maarel, 2006) and sensorineural hearing loss in 75% of affected individuals (Trevisan et al., 2008). Mental retardation, epilepsy, and cardiac involvement are also present in FSHD patients more frequently than in healthy people (Faustmann et al., 1996; Funakoshi et al., 1998; Trevisan et al., 2006, 2008; Saito et al., 2007).
Most FSHD patients report their first symptoms during the second or third decade of their life; however, the age of onset can vary from infancy to age 50 (van der Maarel et al., 2007). Early-onset cases are generally associated with more severe phenotypes (Miura et al., 1998; Klinge et al., 2006). Interestingly, the FSHD phenotype is gender dependent. Typically, males are more severely affected, whereas females can develop a milder or asymptomatic form of the disease (Padberg, 1982; Zatz et al., 1998; Ricci et al., 1999; Tonini et al., 2004).
Muscle impairment in FSHD is often asymmetric: muscles on one side of the body appear much more compromised than on the other side (Kilmer et al., 1995). Various hypotheses have been proposed to explain this phenomenon, including over-work weakness and handedness, but the mechanism underlying this asymmetric phenotype in FSHD remains unknown (Pandya et al., 2008).
The rate of FSHD progression and the distribution of muscle weakness are highly variable, even between close family relatives. Indeed, these features were noted in the first study conducted on the disease in the late 1800s (Landouzy and Dejerine, 1886). A number of monozygotic twins discordant for the penetrance of FSHD have been described, pointing to a strong epigenetic component in the disease (Tawil et al., 1993; Griggs et al., 1995; Hsu et al., 1997; Tupler et al., 1998).
Although the genetic defect underlying FSHD has been identified, the molecular mechanism causing the disease remains unclear. Recent results suggest that complex genetic events contribute to FSHD, as discussed below.
FSHD genetics
FSHD is the third most common myopathy, with an incidence in the general population of 1:15,000 (Flanigan et al., 2001). The disease is transmitted as an autosomal-dominant trait, although its genetics are very complex. Up to 30% of cases are due to de novo mutations (Zatz et al., 1995). Approximately half of the de novo cases result from a post-zygotic mutation that leads to mosaicism (Griggs et al., 1993; Weiffenbach et al., 1993; Upadhyaya et al., 1995; Bakker et al., 1996; van der Maarel et al., 2000). In more than 95% of cases, the disease maps to the subtelomeric region of the long arm of chromosome 4 at 4q35.2 (FSHD1; Wijmenga et al., 1990, 1991). A small percentage of FSHD cases are not genetically linked to chromosome 4 (FSHD2; Wijmenga et al., 1991; Gilbert et al., 1992; Bakker et al., 1995), although no other putative genetic locus has been identified.
In 4q-associated FSHD cases, the disease is caused by a molecular rearrangement that alters the copy number of a 3.3-kb tandemly repeated macrosatellite called D4Z4 (Wijmenga et al., 1992; van Deutekom et al., 1993). D4Z4 is extremely polymorphic in the general population (Hewitt et al., 1994; Winokur et al., 1994), ranging from 11 to 150 copies. FSHD patients carry only 1 to 10 units (Fig. 2; Wijmenga et al., 1992; van Deutekom et al., 1993).
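To give a sense of scale, the repeat counts above can be converted into approximate array lengths. The following is a minimal sketch that assumes each D4Z4 unit is exactly 3,300 bp (the text gives 3.3 kb) and ignores all flanking sequence; the constant and function names are illustrative only.

```python
# Approximate size of one D4Z4 unit in base pairs.
# Assumption: the 3.3-kb figure from the text is taken as exactly 3,300 bp.
D4Z4_UNIT_BP = 3300


def array_length_bp(units: int) -> int:
    """Return the approximate length in bp of a D4Z4 array with `units` repeats."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * D4Z4_UNIT_BP


if __name__ == "__main__":
    # Boundaries of the ranges quoted in the text: 1-10 units (FSHD), 11-150 units (general population).
    for units in (1, 10, 11, 150):
        print(f"{units} units: about {array_length_bp(units) / 1000} kb")
```

Under this assumption, the 1 to 10 units found in FSHD patients correspond to arrays of roughly 3.3 to 33 kb, whereas the 11 to 150 units found in the general population correspond to roughly 36 to 495 kb.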
Although FSHD is highly variable, there is a general correlation between the number of residual D4Z4 repeats, the age of onset, and the severity of the disease (Goto et al., 1995; Lunt et al., 1995; Zatz et al., 1995; Tawil et al., 1996; Hsu et al., 1997; Ricci et al., 1999). In particular, larger deletions tend to be associated with earlier onset and a more rapid progression of FSHD (Goto et al., 1995; Lunt et al., 1995; Zatz et al., 1995; Tawil et al., 1996; Hsu et al., 1997; Ricci et al., 1999). Importantly, it has been reported that at least one copy of D4Z4 is required to cause FSHD, as individuals with deletions of the entire repeat array do not display signs of muscular dystrophy (Fig. 2; Goto et al., 1995; Tupler et al., 1996; Rossi et al., 2007), suggesting that the repeat itself plays a critical role in the disease.
A detailed genomic characterization of the 4q35 region led to the identification of different haplotypes (Fig. 3; van Geel et al., 2002; Lemmers et al., 2007, 2010a). A simple sequence length polymorphism is localized 3.5 kb proximal to D4Z4. D4F104S1 (p13E-11), a region located immediately proximal to D4Z4, contains 15 SNPs. The most proximal unit of the D4Z4 repeat array contains several SNPs. Finally, a large region of sequence variation (alleles A, B, or C) has been detected distal to D4Z4 (Fig. 3). Considering these various features, 4q alleles were subdivided into 18 haplotype variants (Lemmers et al., 2010a). Importantly, D4Z4 deletions are pathogenic only in a few of these haplotype backgrounds (4qA161, 4qA159, and 4qA168; Lemmers et al., 2007, 2010b). However, D4Z4 deletions in the presence of these haplotypes are not sufficient to cause FSHD, because asymptomatic 4qA161 carriers have been described (Arashiro et al., 2009). These haplotypes therefore appear to represent a permissive condition for FSHD rather than the causative event. Notably, FSHD2 patients were observed to carry at least one 4qA161 allele (de Greef et al., 2009), further supporting the role of this permissive haplotype in the disease.
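The genetic criteria described so far (repeat number plus haplotype background) can be summarized as a simple decision rule. The sketch below is only a schematic restatement of the ranges and haplotypes quoted in the text, not a diagnostic procedure; the function name and the returned labels are hypothetical.

```python
# Haplotypes described in the text as permissive for FSHD (Lemmers et al., 2007, 2010b).
PERMISSIVE_HAPLOTYPES = {"4qA161", "4qA159", "4qA168"}


def classify_4q_allele(d4z4_units: int, haplotype: str) -> str:
    """Schematic summary of the criteria in the text; labels are illustrative."""
    if d4z4_units < 0:
        raise ValueError("d4z4_units must be non-negative")
    if d4z4_units == 0:
        # Deletion of the entire array is not associated with muscular dystrophy.
        return "no_repeat"
    if d4z4_units > 10:
        # The text reports 11 to 150 units in the general population.
        return "not_contracted"
    if haplotype in PERMISSIVE_HAPLOTYPES:
        # Contraction on a permissive background: necessary but not sufficient,
        # because asymptomatic carriers have been described.
        return "contracted_permissive"
    # Contraction on any other background has not been reported as pathogenic.
    return "contracted_nonpermissive"


if __name__ == "__main__":
    for units, hap in [(0, "4qA161"), (5, "4qA161"), (5, "4qB163"), (30, "4qA161")]:
        print(units, hap, classify_4q_allele(units, hap))
```

Note that even the "contracted_permissive" outcome marks only a permissive genetic condition; as discussed in the text, additional epigenetic factors appear to determine whether and how severely the disease develops.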
There are sequences homologous to D4Z4 on several human chromosomes (Lyle et al., 1995). On chromosome 10q26 there is a repeat array that shares 98% identity with the D4Z4 repeat array at 4q35 (Cacurri et al., 1998). Additionally, high homology extends to 45 kb proximal of D4Z4 and 15–25 kb distal (van Geel et al., 2002). The 10q and 4q D4Z4 repeats are equally polymorphic, and some individuals have nonstandard, hybrid alleles containing 4q-derived repeats on chromosome 10 and 10q repeats on chromosome 4 (van Deutekom et al., 1996a; van Overveld et al., 2000; Lemmers et al., 2010a). Several studies have reported that the D4Z4 repeats (even if 4q derived) on 10q are not pathogenic, suggesting that 4q-specific sequences proximal to D4Z4 are required for FSHD (Bakker et al., 1995; Deidda et al., 1996; van Deutekom et al., 1996a; Lemmers et al., 1998; van Overveld et al., 2000). However, an exception to this rule was recently described (Lemmers et al., 2010b). In this unusual case, only the distal end of the D4Z4 repeat array was transferred to chromosome 10. Thus, the 4q35 FSHD candidate genes located proximal to the D4Z4 repeat array were not present on chromosome 10. This finding suggested that proximal 4q genes are not required for the pathogenesis of FSHD (Lemmers et al., 2010b). It should be noted, however, that this patient is a very unusual case, carrying a rare haplotype with deleted hybrid repeats on chromosome 10q and a permissive 4qA161 allele on chromosome 4q (Lemmers et al., 2010b). Hence, in this case the disease could still be linked to chromosome 4 through an in trans effect of the hybrid 10q repeats on the permissive chromosome 4q.
Although the exact molecular mechanism responsible for the disease is unknown, it is agreed in the field that the D4Z4 deletion causes an epigenetic gain-of-function alteration leading to the up-regulation of candidate gene(s) (Neguembor and Gabellini, 2010).
Epigenetic features associated with FSHD
Several of the clinical features outlined above, such as the gender bias in severity, the asymmetric muscle wasting, and the discordance in monozygotic twins, suggest that FSHD development involves epigenetic factors (Neguembor and Gabellini, 2010).
Epigenetic changes do not affect the primary DNA sequence; rather, gene expression is altered by changing the conformation of chromatin. Local chromatin structure is regulated by at least three processes: DNA methylation (Suzuki and Bird, 2008), histone modifications (Ruthenburg et al., 2007), and ATP-dependent chromatin remodeling (Ho and Crabtree, 2010). In addition, a number of elements (chromatin boundaries, insulators, etc.) affect higher order chromatin structure by regulating long distance interactions and chromatin loop domain organization (Maeda and Karch, 2007). Here, we summarize the studies that have investigated the epigenetic factors involved in FSHD.
DNA methylation in FSHD.
D4Z4 belongs to a family of human tandem repeats termed macrosatellites that are noncentromerically located (Chadwick, 2009). Together with other members of the family, such as DXZ4 on chromosome X (Giacalone et al., 1992) and RS447 on 4p (Kogi et al., 1997), D4Z4 is extremely GC rich.
DNA methylation is a chemical mark added to cytosine residues by DNA methyltransferases: DNMT1, DNMT3A, and DNMT3B (Chen and Li, 2004). Mammalian genomes are globally methylated, with the notable exception of short nonmethylated regions called CpG islands. Current data indicate that promoter methylation leads to stable gene silencing, whereas intragenic methylation helps to reduce transcriptional noise (Suzuki and Bird, 2008). In addition, because several transcription factors and chromatin-binding proteins, such as CTCF and YY1, are methylation sensitive (Hark et al., 2000; Kim et al., 2003), it is clear that DNA methylation can significantly affect the occupancy of a specific genomic region.
It has been shown that although D4Z4 is highly methylated in healthy subjects, FSHD patients have a specific hypomethylation of the D4Z4 contracted allele (Fig. 4; van Overveld et al., 2003; de Greef et al., 2009). Recent findings indicate that D4Z4 contraction is always associated with hypomethylation, irrespective of the chromosome or the haplotype, as deletions on chromosome 10 and on chromosome 4 in asymptomatic carriers are also associated with a reduction in DNA methylation of the contracted locus (de Greef et al., 2009). Moreover, D4Z4 is hypomethylated in patients affected by immunodeficiency, centromeric instability, and facial anomalies (ICF) syndrome (Kondo et al., 2000). Interestingly, FSHD and ICF syndrome share no common traits. These findings suggest that D4Z4 hypomethylation is not responsible for FSHD by itself. Nevertheless, it is interesting to note that non–4q-associated FSHD patients (FSHD2), who lack D4Z4 contractions, display general D4Z4 hypomethylation on both chromosome 4 alleles and on the two chromosome 10 alleles (de Greef et al., 2009), pointing to a general defect in methylation of D4Z4 repeats. Collectively, these results suggest that D4Z4 hypomethylation might represent a permissive condition required for FSHD onset or that it might be a consequence of the primary cause of FSHD.
Histone modifications in FSHD.
Most cellular DNA is compacted into nucleosomes, in which 146 bp of DNA are wrapped around a protein octamer composed of two copies of each core histone H2A, H2B, H3, and H4 (Campos and Reinberg, 2009). Nucleosomes are linked by a variable length of DNA associated with linker histone H1 (Campos and Reinberg, 2009). Histone proteins are subjected to different posttranslational covalent modifications, including acetylation, methylation, ubiquitination, and SUMOylation of lysine (K) residues, phosphorylation of serine (S) and threonine (T) residues, methylation of arginines (R), and ADP-ribosylation of glutamic acid (Bernstein et al., 2007). Combinations of posttranslational modifications of single histones, single nucleosomes, and nucleosomal domains establish local and global patterns of chromatin modifications and recruit nuclear factors that mediate downstream functions (Ruthenburg et al., 2007). These patterns can be altered by multiple extracellular and intracellular stimuli, and chromatin itself functions as a genomic integrator of various signaling pathways, ultimately affecting cellular processes such as replication and transcription (Cheung et al., 2000; Nightingale et al., 2006).
The D4Z4 repeat array appears to be organized in distinct domains, some characterized by transcriptionally repressive heterochromatin and others by transcriptionally permissive euchromatin (Zeng et al., 2009). In particular, on both chromosome 4 and 10, the repressive marks of histone H3 lysine 9 tri-methylation (H3K9me3) and histone H3 lysine 27 tri-methylation (H3K27me3) are both present on some D4Z4 units, but the permissive mark histone H3 lysine 4 di-methylation is present on different units (Zeng et al., 2009). Consistent with previous studies, the authors reported euchromatin histone marks in the first proximal D4Z4 unit of the array (Jiang et al., 2003; Zeng et al., 2009).
The H3K9me3 modification on D4Z4 is mediated by the histone methyltransferase SUV39H1 (Zeng et al., 2009). Interestingly, H3K9me3 is lost in FSHD patients, preventing the recruitment of the heterochromatin-binding protein HP1γ and the sister chromatid cohesion complex, cohesin, to D4Z4 (Fig. 4). This loss could lead to the de-repression of 4q35 genes and muscular dystrophy (Zeng et al., 2009).
In a study aimed at characterizing the chromatin status of the FSHD region, both D4Z4 and the promoter of the 4q35 gene FRG1 (see below) were reported to be bound by the transcription factor YY1 and the Polycomb Group protein EZH2 (Bodega et al., 2009). Polycomb Group proteins are chromatin modifiers that implement transcriptional silencing in higher eukaryotes (Simon and Kingston, 2009). In particular, YY1 and EZH2 binding is reduced at both D4Z4 and the FRG1 promoter in myotubes compared with myoblasts (Bodega et al., 2009). As a consequence, the EZH2-mediated histone repressive mark H3K27me3 is also reduced in myotubes compared with myoblasts. Accordingly, FRG1 expression is increased in myotubes compared with myoblasts. Notably, the H3K27me3 modification at D4Z4 was found by 3D FISH to be less abundant in FSHD cells compared with controls (Bodega et al., 2009).
In summary, a number of repressive histone marks are present on D4Z4 in healthy subjects and their loss in FSHD might lead to de-repression of 4q35 genes.
A repressor complex binds to D4Z4.
A few years ago, a 27-bp sequence located inside each D4Z4 unit, termed the D4Z4 binding element, was identified (Gabellini et al., 2002). This element is specifically bound by a D4Z4 repressor complex (DRC) composed of YY1, HMGB2, and nucleolin (Gabellini et al., 2002). These factors interact with proteins that mediate gene silencing and heterochromatin formation, such as DNA methyltransferases, histone deacetylases, and HP1 (Ko et al., 2008; Wu et al., 2009). DRC binds to the 4q35 located D4Z4 in vivo and mediates the transcriptional repression of 4q35 genes (Gabellini et al., 2002). Thus, the loss of D4Z4 repeats in FSHD may result in reduced DRC binding to the region and, consequently, reduced silencing of the 4q35 genes (Fig. 4; Gabellini et al., 2004). Importantly, of the factors that have been identified to bind D4Z4, YY1 is the only one with sequence specificity. Thus, it would be interesting to investigate whether the recruitment of factors like SUV39H1, EZH2, HP1γ, and cohesin to D4Z4 is mediated by YY1 (Fig. 4).
Subnuclear localization of 4q35.
Most nuclear events do not occur randomly throughout the nucleoplasm; rather, they are usually limited to specific and spatially defined sites (Ferrai et al., 2010). Accordingly, the particular intranuclear positioning of a given chromosomal region plays an important role in several cellular processes, such as transcription and replication (Spector, 2001).
Although mammalian telomeres in somatic cells are evenly dispersed in the inner part of the nucleus (Ludérus et al., 1996; Nagele et al., 2001; Amrichová et al., 2003; Weierich et al., 2003), the 4q telomere is located near the nuclear periphery (Masny et al., 2004; Tam et al., 2004). Interestingly, a sequence 215 kb proximal to the repeat array shows a stronger localization to the nuclear rim than D4Z4 in healthy subjects, suggesting that a region proximal to D4Z4, and not the repeat array itself, directs the 4q telomere to the periphery (Fig. 4; Masny et al., 2004). Recently, Ottaviani et al. (2009b) identified an 80-bp sequence inside the D4Z4 unit that can trigger perinuclear positioning of artificial telomeres in a CTCF- and lamin A–dependent manner (see below). This property is lost upon D4Z4 multimerization. Thus, it appears that in healthy subjects, multiple copies of D4Z4 are located near the nuclear periphery due to a 4q-specific signal proximal to D4Z4, whereas in FSHD patients the perinuclear location is mediated by D4Z4 (Fig. 4; Ottaviani et al., 2009b). Although FISH analyses indicate that the peripheral localization of 4q is maintained in different cell types and is apparently unaltered in FSHD patients compared with controls, the peripheral environment of the FSHD 4q35 allele may be altered, and thereby contribute to the aberrant 4q35 gene expression reported in FSHD (Masny et al., 2004; Tam et al., 2004; Ottaviani et al., 2009b).
In metazoans, the nuclear lamina coats the inner surface of the nuclear envelope (Hetzer, 2010). Using the DNA adenine methyltransferase identification approach, lamin-associated domains that correlate with silenced regions have been identified (Guelen et al., 2008). Intriguingly, a lamin-associated domain has been mapped to a locus that is 50 kb proximal to D4Z4, which is consistent with the previous finding that this region has a role in maintaining gene repression (Guelen et al., 2008). The peripheral location of 4q seems to be strictly dependent on lamin A, given that chromosome 4 telomeres are dispersed in cells lacking the lamin A gene (Masny et al., 2004). Furthermore, chromatin immunoprecipitation assays revealed that lamin A is associated with D4Z4 in vivo (Ottaviani et al., 2009a).
Altered chromatin organization at 4q35 in FSHD.
As mentioned above, there is no macroscopic relocalization of 4q due to D4Z4 deletion in FSHD. Nonetheless, more subtle alterations may occur. For example, the 4q35 locus could be repositioned to a different peripheral subdomain, leading to inappropriate 4q35 gene regulation (Masny et al., 2004; Ottaviani et al., 2009b). Alternatively, the higher order chromatin structure of the 4q35 locus might be affected. There is growing evidence to indicate that the three-dimensional organization of the FSHD region significantly contributes to the regulation of gene expression at 4q35 (Petrov et al., 2006; Pirozhkova et al., 2008; Bodega et al., 2009).
Recently, it has been proposed that the area immediately proximal to D4Z4 could play a role in FSHD (Lemmers et al., 2007; Tsumagari et al., 2008). Interestingly, this area has been suggested to function as a nuclear matrix attachment region (MAR; Petrov et al., 2006). Matrix attachment was shown to be weakened on the contracted chromosome 4 in FSHD-derived myoblasts compared with controls, leading to a drastic alteration in chromatin loop domain organization (Fig. 4; Petrov et al., 2006). In particular, whereas a high number of D4Z4 repeats maintain the organization of the repeat array and 4q35 genes in two distinct chromatin loops, loosening of the MAR in FSHD patients would bring the contracted repeats and 4q35 genes into the same chromatin loop (Petrov et al., 2006). Ultimately, the presence of an enhancer at the 5′ end of the D4Z4 unit could cause inappropriate 4q35 gene de-repression in FSHD (Petrov et al., 2008).
Chromosome conformation capture (3C) is a technique that identifies long distance intra- and inter-chromosomal interactions (Dekker et al., 2002). Using 3C, two groups have independently investigated the higher order chromatin organization of the 4q35 locus (Pirozhkova et al., 2008; Bodega et al., 2009). Pirozhkova et al. (2008) showed that the telomeric 4qA allele is in close proximity to the promoters of the 4q35 genes FRG1 and ANT1 in FSHD myoblasts and not in control myoblasts. The 4qA allele is immediately distal to the D4Z4 repeat array (Lemmers et al., 2002). Interestingly, an enhancer element was detected in the 4qA allele that could be involved in the reported up-regulation of FRG1 and ANT1 in FSHD (Gabellini et al., 2002; Laoudj-Chenivesse et al., 2005; Pirozhkova et al., 2008). In the other 3C study, an interaction between D4Z4 and the FRG1 promoter was identified in human primary myoblasts that appears to be highly reduced upon myogenic differentiation (Bodega et al., 2009). Consistent with the observed mis-regulation of FRG1, a small but statistically significant reduction in the D4Z4–FRG1 promoter interaction was observed in FSHD myoblasts compared with controls (Bodega et al., 2009). Altogether, it appears that in healthy subjects, the FRG1 promoter is in close proximity with the D4Z4 repeat and the gene is repressed, whereas in FSHD patients the promoter of FRG1 is in close proximity with the 4qA marker and the gene is up-regulated (Gabellini et al., 2002; Pirozhkova et al., 2008; Bodega et al., 2009).
CTCF is a multifunctional DNA-binding protein that is important for transcriptional regulation, chromatin insulation, and chromatin organization (Filippova, 2008). The same 80-bp D4Z4 element mediating the perinuclear positioning of the 4q telomere is also responsible for the CTCF and A-type lamin-dependent transcriptional insulator function of the repeat (Ottaviani et al., 2009a). The CTCF binding and insulation activity are lost upon multimerization of the repeats (Ottaviani et al., 2009a). As such, it has been proposed that FSHD patients have a CTCF gain-of-function phenotype that “protects” certain genes from the influence of nearby repressive chromatin, ultimately generating a 4q35 de-repressed state (Fig. 4; Ottaviani et al., 2009a).
Clearly, the FSHD locus is organized into a higher order chromatin structure that undergoes dynamic remodeling. The 3D architecture of the region appears to play a fundamental role in regulating 4q35 chromatin status and gene expression, suggesting that defects in the organization of the epigenome of the FSHD region could underlie this disease.
FSHD candidate genes
The studies aimed at understanding the molecular basis of FSHD indicate that an in cis alteration likely leads to the de-repression of target gene(s). This model provides a valid explanation for the 4q specificity and the autosomal-dominant transmission of the disease (Tupler and Gabellini, 2004).
The 4q35 locus is a relatively gene-poor region (van Geel et al., 1999; Blair et al., 2002). Of the genes that have been identified at 4q35, those of particular interest are ANT1, FRG1, DUX4c, FRG2, and DUX4 (Li et al., 1989; van Deutekom et al., 1996b; Gabriëls et al., 1999; Rijkers et al., 2004; Bosnakovski et al., 2008b). We will focus here on the two main candidate genes, DUX4 and FRG1. We will not discuss the other genes in detail because they are less attractive candidates. For example, DUX4c and FRG2 were found to be deleted in some FSHD families (Fig. 2; Lemmers et al., 2003), suggesting that they are not necessary for disease onset. Nevertheless, these genes are present in most of the affected families and it is possible that they could contribute to the penetrance and severity of the disease. There is some experimental support for this idea for DUX4c (Bosnakovski et al., 2008b; Ansseau et al., 2009).
DUX4.
Although the DUX4 gene contains an ORF encoding a putative double homeobox protein named DUX4, the D4Z4 repeat was initially considered to be a nonprotein coding sequence due to lack of evidence of transcription and protein synthesis. Nonetheless, both DUX4 mRNA and protein were recently detected in FSHD-derived primary myoblasts but not in controls, suggesting that D4Z4 may directly affect disease progression through the aberrant production of DUX4 (Dixit et al., 2007). Because the D4Z4 repeat does not contain a canonical polyadenylation signal, the mRNA is generated exclusively by transcription of the last, most distal, unit of the array that extends to a region named pLAM, which contains a polyadenylation signal (Dixit et al., 2007). It was recently shown that the pLAM polyadenylation signal can stabilize DUX4 transcripts that are ectopically expressed in transfected C2C12 cells (Lemmers et al., 2010b). This pLAM sequence also contains a polymorphism that could affect the polyadenylation of the distal DUX4 transcript (Lemmers et al., 2010b). Because a group of analyzed FSHD1 patients were all found to carry the same SNP in this region, it was suggested that this polymorphism contributes to the selective stabilization of DUX4 transcripts in FSHD (Lemmers et al., 2010b). It should be noted, however, that in this study the permissive 4qA161 allele was compared only to nonpermissive 4qB alleles or 10qA chromosomes (Lemmers et al., 2010b). Non-permissive 4qA variants, such as the previously described 4qA166 (Lemmers et al., 2007; de Greef et al., 2009), were not analyzed. Hence, the available data do not allow us to exclude the possibility that the described variations reflect 4qA/B or 4q/10q differences.
The DUX4 pre-mRNA can be alternatively spliced (Snider et al., 2009). Interestingly, it was recently reported that muscles of healthy subjects express low levels of a DUX4 splicing isoform encoding a truncated protein, whereas muscles of FSHD patients express a splicing isoform encoding full-length DUX4 (Snider et al., 2010). It has also been reported that the repeat array displays a complex transcriptional profile that includes sense and antisense transcripts and RNA processing (Snider et al., 2009). Thus, there may be multiple D4Z4-derived RNA players in FSHD and future work will be required to determine their functions.
Recently, an isogenetic screen was used to assess the effect on cell viability of overexpressing the FSHD candidate genes ANT1, FRG1, FRG2, DUX4c, and DUX4 (Bosnakovski et al., 2008a). Previous work demonstrated a pro-apoptotic function for DUX4 (Kowaljow et al., 2007), and DUX4 overexpression was found to have a dramatically toxic effect. As a consequence of increased DUX4 levels, the expression of oxidative stress response genes and of myogenic factors (such as MyoD and Myf5) was altered. Interestingly, a myogenic defect has been reported in FSHD patients (Winokur et al., 2003a; Celegato et al., 2006).
DUX4 homeodomains are similar to those of Pax3 and Pax7, which are transcription factors that are pivotal to muscle function (Buckingham, 2007). Additionally, the phenotype of DUX4 overexpression can be rescued by overexpression of Pax3 or Pax7 (Bosnakovski et al., 2008a).
Recently, DUX4 overexpression (even at extremely low levels) was reported to cause massive apoptosis and severely abnormal development in Xenopus laevis, a model for vertebrate development (Wuebbles et al., 2010). Note that an increase in apoptosis is not generally considered to be a phenotype of FSHD (Winokur et al., 2003b).
In theory, DUX4 represents an attractive FSHD candidate gene, as it would easily explain the requirement for at least one D4Z4 for the development of the disease. Due to its extreme toxicity, however, DUX4 could only function in FSHD if it is absent under normal conditions and overexpressed exclusively in the muscle cell precursors of FSHD patients (Wuebbles et al., 2010).
FRG1.
FSHD region gene 1 (FRG1) is highly conserved in both vertebrates and invertebrates (Grewal et al., 1998). FRG1 is considered a likely candidate gene for mediating FSHD due to its selective chromosomal location at 4q35, its overexpression in FSHD samples (Gabellini et al., 2002; Bodega et al., 2009), and the development of FSHD-like phenotypes following its overexpression in mice, Xenopus laevis, and Caenorhabditis elegans (Gabellini et al., 2006; Hanel et al., 2009; Wuebbles et al., 2009; Liu et al., 2010). Transgenic mice overexpressing FRG1 selectively in the skeletal muscle develop pathologies with physiological, histological, ultrastructural, and molecular features that resemble those of FSHD patients (Gabellini et al., 2006). Nevertheless, FRG1 overexpression in FSHD patients is currently controversial, and studies have reported inconsistent results (Gabellini et al., 2002, 2006; Dixit et al., 2007; Osborne et al., 2007; Bodega et al., 2009; Klooster et al., 2009). The idea that FRG1 plays an important role in FSHD was very recently challenged by the identification of an unusual FSHD patient with deletion of D4Z4 only on chromosome 10 (Lemmers et al., 2010b). However, as stated above, this patient requires further characterization before a causative role for FRG1 in FSHD can be discarded. As discussed below, FRG1 is crucial for proper muscle function and vascular development (Gabellini et al., 2006; Hanel et al., 2009; Wuebbles et al., 2009). Hence, FRG1 could at minimum affect the severity of the disease.
To better understand the role of FRG1 in FSHD, efforts have been made to characterize its biological function. The human endogenous FRG1 protein copurifies with the spliceosome, the protein–RNA macromolecular complex responsible for pre-mRNA splicing (Kim et al., 2001; Rappsilber et al., 2002; Bessonov et al., 2008). When ectopically overexpressed, FRG1 localizes to nucleoli, Cajal bodies, and speckles (van Koningsbruggen et al., 2004), and colocalizes and/or interacts with proteins involved in RNA biogenesis, such as SMN, PABPN1, and FAM71B (van Koningsbruggen et al., 2007). Interestingly, mutations in SMN and PABPN1 cause myopathies (Calado et al., 2000; Briese et al., 2005). Together, these results suggest that FRG1 functions in RNA processing. Indeed, in FSHD and in transgenic mice or cells overexpressing FRG1, altered splicing has been observed for a number of genes (Gabellini et al., 2006; van Koningsbruggen et al., 2007; Davidovic et al., 2008).
Specific muscle-related functions for FRG1 have been identified in different animal models (Hanel et al., 2009; Liu et al., 2010). Interestingly, overexpression of Xenopus frg1 or C. elegans FRG-1 causes a muscle defect (Hanel et al., 2009; Liu et al., 2010).
FRG1 contains a single fascin-like domain, a motif that is associated with actin-bundling properties (Edwards and Bryan, 1995), and it was recently shown that FRG1 can bind F-actin and promote its bundling (Liu et al., 2010). In agreement with this finding, in different organisms the endogenous FRG1 in muscle is not only nuclear but also sarcomeric, suggesting that FRG1 might perform a muscle-specific function (Hanel et al., 2010; Liu et al., 2010).
FSHD pathology is most prominent in the musculature; however, up to 75% of FSHD patients display retinal vasculopathy (Fitzsimons et al., 1987; Padberg et al., 1995). Intriguingly, it was recently reported that in human tissues the endogenous FRG1 is strongly expressed in arteries, veins, and capillaries (Hanel et al., 2010). Moreover, in Xenopus, FRG1 levels are crucial for proper vascular development, and up-regulation of FRG1 leads to a disrupted vascular phenotype (Wuebbles et al., 2009).
Collectively, these studies indicate that FRG1 overexpression in different animal models is associated with aberrant muscle structure and vasculature, the two most prominent features of FSHD pathology.
Conclusions
Recent studies suggest that copy number variations (CNVs) are important for human phenotypic diversity and disease susceptibility. DNA repeats account for 55% of the human genome and a significant fraction of CNVs.
FSHD is an important pathology caused by CNVs of D4Z4 repeats. It is an extremely complicated and fascinating disease, and research into this topic is revealing much about the functional organization of our genome.
An increasing amount of evidence suggests that the 4q35 macrosatellite repeat D4Z4 plays a crucial role in the chromosomal organization of the FSHD region. There is a general consensus that the D4Z4 deletion in FSHD leads to epigenetic alterations that affect the expression profiles of genes within the FSHD region. Unfortunately, despite considerable effort, almost 20 years after the identification of the genetic defect underlying the disease, the causative FSHD gene(s) remains unknown, and no effective treatments for FSHD are currently available.
The heterogeneity in disease manifestation probably reflects heterogeneity in gene expression in FSHD. An interesting possibility, therefore, is that the complexity of FSHD could be explained by considering it to be a contiguous gene syndrome, in which epigenetic alterations of DUX4, FRG1, and other potential genes collaborate to determine the final phenotype. Finally, because DUX4 behaves as a transcriptional activator (Dixit et al., 2007), it could play a direct role in transcriptional overexpression of the other 4q35 genes, providing a unifying model for the molecular mechanism of the disease.
Acknowledgments
We apologize to our colleagues for the vastly incomplete documentation, due to space limitations, of their important contributions to the topics described in this review. We thank Drs. F. Jeffrey Dilworth, Claudia Ghigna, Michael R. Green, and Lawrence G. Wrabetz for critical reading of the manuscript.
Research in the Gabellini laboratory is made possible by support provided by the European Research Council (ERC), the Muscular Dystrophy Association of the USA (MDA), the European Alternative Splicing Network of Excellence (EURASNET), the Association Francaise contre les Myopathies (AFM), the FSHD Global Research Foundation, a Jaya Motta private donation, and the FSH Society, Inc. Davide Gabellini is a Dulbecco Telethon Institute Assistant Scientist.
References
Abbreviations used in this paper:
- 3C: chromosome conformation capture
- CNV: copy number variation
- DRC: D4Z4 repressor complex
- FSHD: facioscapulohumeral muscular dystrophy
- H3K9me3: histone H3 lysine 9 tri-methylation
- H3K27me3: histone H3 lysine 27 tri-methylation
- MAR: matrix attachment region
- SNP: single-nucleotide polymorphism