Primary immunodeficiencies (PIDs), more recently renamed inborn errors of immunity (IEIs), are a diverse group of over 550 genetic disorders. They cause clinically apparent immune dysregulation, leading to infections, autoinflammation, autoimmunity, and cancer. Initially, most IEIs were described as Mendelian disorders with complete penetrance, but the community has now shown that, in most IEIs, some individuals harboring disease-causing genotypes display only partial clinical disease, or no disease at all. Thus, most IEIs are actually Mendelian disorders with incomplete penetrance. Despite the frequency of incomplete penetrance in IEIs, the conceptual framework for systematically categorizing and explaining these occurrences remains limited. Here, I expand on four recurrent themes of incomplete penetrance that we have recently proposed: genetic variant quality, epigenetic and genetic modification, environment, and mosaicism. For each of these principles, I review what is known and unknown and propose future experimental approaches to fill the gaps in our knowledge. I focus on IEIs, but these concepts can be generalized to all genetic diseases.
Introduction
Primary immunodeficiencies (PIDs) or inborn errors of immunity (IEIs) are a heterogeneous group of monogenic lesions, resulting in severe infections, disorders of immune hyperactivation, or cancers. Since the first descriptions of inherited immunodeficiency in the 1930s–1950s (1, 2, 3, 4, 5), IEIs have, by and large, been considered to be Mendelian disorders. Since the 2010s, with the fall in the cost of next-generation sequencing, the number of genetic errors identified as causing IEIs has grown exponentially, now exceeding 550 unique entities (6). These discoveries have often improved patient treatment (7, 8, 9) and have significantly advanced our understanding of basic and clinical immunology. However, despite unprecedented successes in this field, there is an “elephant in the room”: these disorders are widely held to be Mendelian, but they mostly display an imperfect segregation of gene variants with disease traits.
In genetics generally, the term incomplete penetrance is used to describe the absence of clinical disease in individuals harboring a known disease-causing genotype. This term makes it possible to get around the problem of our lack of precise understanding of incomplete penetrance for the moment, while allowing us to continue to describe genes as Mendelian, albeit with incomplete penetrance.
Before going into the details, we need to establish with precision the language and terminology used. Incomplete penetrance and reduced penetrance are considered here to be synonymous. As defined above, penetrance is the binary presence or absence of the disease trait in the presence of the causal genotype. However, a genetic defect may also present on a scale of disease severity or with different clinical phenotypes—a concept known as variable expressivity. Here, I will consider variable expressivity under the umbrella term incomplete penetrance, as the two phenomena often have largely overlapping origins. The terms fully penetrant and monogenic are not synonymous, as either may occur in the absence of the other. However, both these features are often considered necessary for a trait to be considered Mendelian.
Incomplete penetrance is common (10), but its exact incidence is difficult to determine from published studies. When considered, penetrance is typically assessed in the relatives of affected patients, with segregation of the disease traced from the proband. The reported rates of penetrance of specific IEIs, calculated in this manner, range from extremely low, at about 5–10% (11, 12), through moderate, at 30% (13), to almost 100% (14, 15). A recent study evaluating 453 patients from 193 families determined that the highest form of variable disease expressivity existed in familial lymphoproliferation, autoimmunity and malignancy STK4 deficiency, DNMT3B deficiency, and ATM deficiency, while immunological differences were prominent in syndromic and non-syndromic combined immunodeficiencies (10). Composite estimates across all IEIs indicate that ∼9% of families display some degree of incomplete penetrance (16), although this frequency is probably closer to 20–30% across IEIs, given the likely limitations of these studies. We suggest that there are at least two inherent biases underlying a pronounced underreporting of incomplete penetrance. These biases are: (1) reporting bias: new variants with highly reduced penetrance are often not studied, or the results of such studies are not published, and already reported variants are rarely reinvestigated, because they are already considered to be Mendelian with full penetrance, which reduces the impact of such studies and creates a disincentive for authors; and (2) ascertainment bias: asymptomatic individuals carrying variants in the general population are difficult to detect, and this problem is compounded by the healthy-volunteer bias of databases taken to represent the general population.
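The proband-based calculation described above can be sketched in a few lines. This is a minimal illustration, not a method taken from the cited studies: the `Carrier` record, the two families, and all counts are hypothetical, and excluding the proband is only one simple, commonly used way of reducing the upward bias introduced when families are ascertained through an affected individual.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    """One individual carrying the disease-causing genotype (hypothetical record)."""
    family: str
    affected: bool
    is_proband: bool = False


def penetrance(carriers, exclude_probands=False):
    """Return the fraction of genotype carriers who are clinically affected."""
    pool = [c for c in carriers if not (exclude_probands and c.is_proband)]
    if not pool:
        raise ValueError("no carriers left to evaluate")
    return sum(c.affected for c in pool) / len(pool)


# Hypothetical cohort: two families, each ascertained through one affected proband.
cohort = [
    Carrier("F1", affected=True, is_proband=True),
    Carrier("F1", affected=True),
    Carrier("F1", affected=False),
    Carrier("F1", affected=False),
    Carrier("F2", affected=True, is_proband=True),
    Carrier("F2", affected=False),
    Carrier("F2", affected=False),
]

naive = penetrance(cohort)                               # probands counted
corrected = penetrance(cohort, exclude_probands=True)    # relatives only

print(f"naive estimate:     {naive:.2f}")
print(f"proband-excluded:   {corrected:.2f}")
```

In this toy cohort, the estimate falls when probands are removed, because every proband is affected by definition. Neither figure addresses families that were never ascertained because no carrier became sick.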
In population-scale genetic studies generally, beyond the domain of IEIs, it has been noted that an average individual possesses ∼200 rare variants (17), 50 of which have been reported to drive disease, and yet these individuals remain healthy (18). Similarly, one study pointed out that about 1 in 4,000 adults carries a variant for a severe Mendelian condition but remains healthy (19). These studies suggest that the phenomenon of incomplete penetrance is both widespread and underappreciated.
Due to these biases, more remains unknown than known about incomplete penetrance across genetics (20, 21, 22, 23, 24). However, the study of IEIs is blossoming, leading to the reporting of ever increasing numbers of cases of incomplete penetrance and the continual emergence of new patterns. Here, I will review incomplete penetrance across IEIs and wider genetic fields and further develop the four principles of incomplete penetrance we proposed 5 years ago (25), with the aim of building a conceptual framework of incomplete penetrance in IEIs and beyond. For each principle, as before, I will document what is known and then what is unknown, proposing testable hypotheses that could be used to advance our current understanding. This evidence, these principles, and new ideas will help to establish a blueprint for future studies focusing specifically on incomplete penetrance. Despite our focus here on IEIs, these concepts can readily be generalized to other genetic diseases.
Principle I: Genetic variant quality
What is known
The quality of the genetic variant, a term which I use as synonymous to variant severity, can have different effects on its biochemical, cellular, and clinical penetrance. We, therefore, need to define these different types of penetrance. Biochemical penetrance refers to assessments of genetic variation in a test tube rather than in vivo, in an isogenic system with a readout. A complete absence of the protein typically results in more serious biological defects than hypomorphic variants, although there are exceptions (7). Such biochemical assays may suggest functional deficits, but such deficits may not be observed in the patient’s cells. Cellular penetrance refers to assessments of the biological pathways or processes in which the gene product with a variant is involved, in cells isolated from the patients themselves, often measured when cells are studied in isolation with a specific assay (e.g., the signal transduction pathway of a mutated receptor) in a non-isogenic system. The cellular phenotype mostly depends directly on the severity of the genetic defect and is usually strongly correlated with biochemical results. Furthermore, clinical penetrance mostly tracks biochemical and cellular penetrance, although this is not always the case. We can, therefore, propose a simple model: the severity of the genetic defect determines biochemical dysfunction, which governs the degree of perturbation in immune cells and, therefore, the propensity for clinical manifestations.
This model holds for a few genetic disorders. One of the principal examples is IFNGR1 deficiency, the first specific genetic etiology of Mendelian susceptibility to mycobacterial disease due to either environmental mycobacteria (EM) or bacille Calmette–Guerin (BCG) immunization (26, 27) to be described. Complete deficiency due to autosomal recessive (AR) IFNGR1 defects invariably results in BCG or EM infections by the age of 5 years, and clinical penetrance is complete (28, 29). By contrast, subjects with partial IFNGR1 deficiency, which is typically autosomal dominant (AD), often remain asymptomatic for longer periods of time, have milder disease, or may, in some cases, never develop disease (30, 31, 32). By inference, with subsequent experimental demonstration, these AD forms are characterized by the retention of some activity when IFN-γ signaling is assayed in patient cells in vitro (30, 31). Some activity is better than none, and the retention of higher levels of activity is associated with a lower observed penetrance.
A similar situation has been observed for STAT1 loss-of-function (LOF) defects. AR complete deficiency with a complete absence of type I–III interferon signaling leads to the complete penetrance of lethal intracellular bacterial and viral infections (33, 34, 35). By contrast, AR partial deficiency, with low levels of signaling, leads to penetrant but milder intracellular bacterial disease (36, 37, 38), and AD forms with LOF point variants retaining some function cause predominantly mycobacterial disease, but with incomplete penetrance (33, 34, 39, 40, 41). These findings demonstrate that incomplete penetrance can be potentiated by partial deficiencies of essential genes but also by complete deficiencies of nonessential genes of type I–III IFN signaling, leaving some signaling still intact or allowing another related pathway to keep some common transcriptional programs active. This situation can readily be observed in deficiencies of STAT2, TYK2, IFNAR1, and IFNAR2, in which severe viral disease affects some but not all individuals, with a greater penetrance observed for infections with live-attenuated viral vaccines than for common viral infections of childhood (42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52).
Allele-penetrance associations have also been noted in other IEIs with an infectious phenotype, such as CARD9 deficiency (53, 54), and in other IEIs less clearly associated with infection, including variants of STAT3 (55), PRF1 (55, 56, 57, 58), and AIRE (59, 60). The functional impact of each variation should be studied in isolation but, at times, genetics alone can lead the way. For example, a study of a cohort of patients with congenital asplenia due to RPSA variants revealed marked incomplete penetrance, but with no predicted functional differences between incompletely and fully penetrant variants. However, all the missense variants with incomplete penetrance were located close together in the structure, as were those with complete penetrance. Similarly, a structural defect of the noncoding RPSA mRNA resulted in incomplete penetrance, whereas a noncoding variant resulting in complete transcript decay conferred complete penetrance (61). These findings suggest that milder hypomorphic variants probably retain some residual function, which may be sufficient for normal spleen development in some individuals. Thus, even in the absence of a full understanding of the effect of variants, the severity of the defect can be seen to be associated with its penetrance.
Autoimmune lymphoproliferative syndrome (ALPS) initially appeared to fit this mold, in that penetrance seemed to be a function of the location of the FAS variant, the most common cause of ALPS. Homozygous or compound heterozygous forms of ALPS-FAS are fully penetrant and particularly severe, with an early onset and an often lethal outcome (62, 63, 64, 65). Heterozygous forms are less penetrant. There is also an additional hierarchy among AD variants: missense variants of the intracellular domain are more highly penetrant (63–90%) than those in the extracellular domain (30–52%). The dominant-negative (DN) mechanism of intracellular domain variants therefore leads to a more severe defect of apoptosis than the haploinsufficiency mechanism of extracellular domain variants (66). There is clearly a relationship between the nature of the variant and the probability of disease. However, some level of defective Fas-mediated apoptosis can be identified in almost all affected individuals, for all variants. Finally, some “asymptomatic” individuals even display lymphocyte expansions or have autoantibodies without clinical autoimmunity or true lymphoproliferation (67, 68, 69). One recent study evaluated over 165 cases of ALPS and not only identified associations between the domain in which the variant occurred and penetrance but also suggested additional mechanisms contributing to this extensive clinical variability, particularly in cases with no correlation between genotype and phenotype (70). Be that as it may, each variant requires careful assessment, and determination of the threshold beyond which subclinical cellular defects become clinically apparent disease is vital for our understanding of this model and to expand the spectrum of molecular events, which are rarely mutually exclusive.
What remains unknown and future avenues for research
Despite these examples, the association between the degree of pathogenicity of a variant and its penetrance has not been clearly documented. Indeed, this model does not hold if a genetic defect is complete (deletions, frameshifts, etc.) but has variable penetrance. The best studied example of this is provided by CTLA4 haploinsufficiency. Intensive studies in large cohorts have reported no association between genotype and penetrance (71, 72, 73). For example, in a recent analysis, only 90 of 133 subjects from 54 unrelated families carrying 45 different CTLA4 variants in the heterozygous state presented features of disease. The missense, nonsense, or frameshift nature of the pathogenic variant had no apparent bearing on penetrance. Furthermore, immunologic phenotyping and in vitro CTLA4 dysfunction results were similar for both affected and unaffected carriers of the variants, suggesting complete cellular penetrance (73). However, cellular penetrance is fully dependent on the phenotype in question and the sensitivity of the assay used. Thus, on closer examination, the loss of surface CTLA4 expression was found to be less severe in unaffected carriers (73). The degree of CTLA4 perturbation in cells is, therefore, correlated with disease presentation even though the CTLA4 genotype cannot explain disease segregation. Along similar lines, a more recent study suggested that LOF variants of CLEC7A can act as gene modifiers, accounting for penetrance in some cases (74). Interestingly, a related defect of T cell regulation constituting a more severe phenocopy of CTLA4 haploinsufficiency (LRBA deficiency, IPEX) has been shown to have almost complete penetrance (75, 76). There are, therefore, probably other disease modifiers, in addition to CLEC7A variants, that can affect the degree of T reg cell dysfunction.
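The CTLA4 figures quoted above (90 affected among 133 heterozygous carriers) can be turned into a point estimate of penetrance with an uncertainty range. The sketch below uses the Wilson score interval, which is my choice for illustration and not necessarily the method used in the cited analysis. It also assumes that carriers are independent observations, which is not strictly true for relatives from the same 54 families, so the interval is probably too narrow.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the text: 90 of 133 CTLA4 variant carriers had features of disease.
affected, carriers = 90, 133

point = affected / carriers
low, high = wilson_interval(affected, carriers)

print(f"observed penetrance: {point:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

Under these assumptions, the observed penetrance is about 68%, with an interval running from roughly 59% to 75%, which makes clear that about one third of carriers of a pathogenic CTLA4 variant in this cohort had no recognized disease.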
Recent work also highlighted that individuals with pathogenic SPI1 (encoding PU.1) variants leading to haploinsufficiency also display highly reduced penetrance (77). Why this is the case is still not known. Interestingly, recent work has also explained, at least in part, why sexual dimorphism exists in patients with NFKB1 variants causing common variable immunodeficiency (CVID). Notably, the authors concluded that autoimmunity in NFKB1-haploinsufficient females is secondary to defective XIST-dependent X chromosome inactivation in T cells (78). Why this happens remains to be molecularly and biochemically documented.
These and other cases (79, 80, 81) demonstrate that the most severe genetic defects are not always associated with the greatest propensity for disease. Alternative hypotheses are, therefore, required. Complete TBK1 deficiency was initially thought to result in more severe disease than DN variants. However, it was shown experimentally that DN variants resulted in a more profound defect of IFN-I induction as, unlike the complete absence of TBK1 protein, they prevented the recruitment of IKKe, which could partly rescue the phenotype (7). Indeed, as shown in previous studies, the most severe genetic variants may lead to more robust compensatory responses (82, 83). Transcriptional adaptation—the process by which frameshifts/nonsense variants activate the transcription of homologous genes—may rescue these complete deficiencies (82, 83). In such cases, whether due to experimental knockout or disease variants, nonsense-mediated decay triggers an upregulation of genes with a similar sequence predicted to have a partially overlapping function (82, 83). This phenomenon suggests an enticing and testable hypothesis to account for incomplete penetrance in asymptomatic carriers of disease-causing nonsense variants. The ability of this “genomic compensation” to rescue disease phenotypes should be a key focus of future studies.
Principle II: Epigenetic and genetic modifiers can affect the penetrance of a variant
What is known
Despite the very sparse experimental evidence, the mechanisms most commonly proposed to account for incomplete penetrance are epigenetic regulation and/or potential modifier genes. As next-generation sequencing (84) becomes more commonplace and epigenetic techniques are introduced into the study of IEIs, evidence is finally being obtained to substantiate these reasonable presumptions.
We have proposed the concept of autosomal random monoallelic expression (aRMAE) (25, 85). Unlike imprinting, in which one allele (maternal or paternal) is completely silenced throughout the organism, aRMAE results from a somatic but mitotically stable commitment to biased expression in favor of one allele rather than the other. De facto, this results in transcriptional mosaicism: at the DNA level, each cell is heterozygous for the variant concerned, but some cells (a lineage or sublineage) are committed to biased expression of one allele rather than the other, whereas other cells or lineages continue to express both alleles. This transcriptional bias may result in different proportions (e.g., 99%, 80%, or 60%) of the transcripts obtained originating from the paternal or maternal allele, suggesting that the system displays plasticity. We have, thus, proposed the term “transcriptotype” for this situation, which may differ from predictions based on genotype. In one family with a JAK1 gain-of-function variant, we documented suppression of the mutated allele in one healthy relative carrying the variant, across all cell types, whereas the proband had biallelic expression in T cells (85). Similarly, a complete suppression of the mutated allele was documented in a family carrying a DN STAT1 variant (85). The healthy father was heterozygous for the dominant allele at the DNA level, but his transcriptotype revealed the presence of the WT mRNA only in all cell types tested, contrasting with his sick child, who was heterozygous for the dominant allele at the DNA level and had similar levels of both transcripts in a monocyte subset (but with a suppression of the mutated allele in T cells). H3K27 methylation and DNA methylation were suggested as possible mechanisms governing these processes (85). Screening in healthy individuals suggested that 4% of all IEI genes can display aRMAE in healthy donors.
However, it should be stressed that disease-causing variant alleles may confer a homeostatic advantage or disadvantage on the host, and as such, it is likely that significantly more genes causing IEIs display aRMAE when mutated, resulting in discordant genotypes and transcriptotypes.
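The notion of a transcriptotype that differs from the genotype can be made concrete with allele-specific read counts at a heterozygous site. The sketch below is illustrative only: the read counts and sample names are hypothetical, the exact binomial test against a 50:50 expectation is a generic choice and not the pipeline used in the cited work, and real analyses must also deal with mapping bias, overdispersion, and multiple testing.

```python
from math import comb


def mutant_allele_fraction(mut_reads, wt_reads):
    """Fraction of transcripts at a heterozygous site that carry the mutant allele."""
    total = mut_reads + wt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return mut_reads / total


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that symmetric outcomes are counted despite rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical (mutant, wild-type) read counts in T cells of two heterozygous carriers.
samples = {
    "affected_proband": (48, 52),   # roughly biallelic expression
    "healthy_carrier": (2, 98),     # mutant allele almost silent
}

results = {}
for name, (mut, wt) in samples.items():
    fraction = mutant_allele_fraction(mut, wt)
    p_value = binomial_two_sided_p(mut, mut + wt)
    results[name] = (fraction, p_value)
    print(f"{name}: mutant fraction {fraction:.2f}, p = {p_value:.3g}")
```

Both individuals have the same genotype at the DNA level, but only the second shows a strong departure from balanced expression, which is the kind of discordance between genotype and transcriptotype described above.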
CVID, the most common form of immune deficiency, is an ideal model for studies of incomplete penetrance. A report on CVID-discordant monozygotic twins suggested that the twin with CVID displayed higher levels of DNA methylation in critical B-cell genes (PIK3CD, BCL2L1, RPS6KB2, transcription factor 3 [TCF3], and KCNN4) (86). Similarly, a follow-up analysis of 23 CVID patients revealed defective demethylation of selected CpG sites during the transition from naive to switched memory B cells (87). More recently, a single-cell epigenomics and transcriptomics census of naive-to-memory B cell differentiation in CVID-discordant monozygotic twins also suggested a role for epigenetic signatures, such as DNA methylation and chromatin accessibility (88). Additional concrete and diagnostic evidence is largely lacking for CVID, but the hypothesis of a role for epigenetic markers is particularly attractive given the variable disease expressivity in CVID.
Epigenetics and the genetic control of epigenetics may, therefore, play a particularly important role in penetrance.
Modifier genes
COPA syndrome is an IEI caused by variants of the COPA gene. It displays AD inheritance with incomplete penetrance. COPA patients present with interstitial lung disease and pulmonary hemorrhage, with the subsequent development of arthritis (89). Interestingly, there is a considerable clinical overlap between STING-associated vasculopathy with onset in infancy and COPA syndrome in terms of lung inflammation (90). In situations such as this, the genetic and biochemical interactions should perhaps be examined closely if they are not already obvious, as they may have functional consequences leading to different disease outcomes. Indeed, one common STING allele has been shown to prevent clinical penetrance for the rare COPA syndrome. Carriers of the deleterious COPA allele were not affected by the disease if they also carried a fairly common STING allele, which silenced the biochemical activity of the deleterious COPA, neatly explaining the observed penetrance (91).
Monogenic variants causing CVID continue to be identified but account for only a fraction of cases. Not infrequently, in inherited CVID (∼10–20%), a polygenic etiology is suggested (92). Specifically, deleterious variants of TNFRSF13B (TACI) are present in 1% of healthy individuals in public databases but in 10% of those with CVID, suggesting that this genetic background contributes to CVID but cannot drive the CVID phenotype alone (93). An enrichment has been observed in variants of other genes, such as TNFRSF13B, MSH5 and BAFFR, in cohorts of CVID patients, but these variants are also present in healthy populations and are, therefore, not sufficient to drive disease on their own (94, 95).
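The argument that TNFRSF13B variants contribute to CVID without being sufficient to cause it follows directly from the two frequencies quoted above (about 1% of healthy individuals, about 10% of patients). The sketch below computes the corresponding odds ratio and, via Bayes' rule, the disease risk in carriers. The CVID prevalence used (1 in 25,000) is an illustrative assumption that does not come from this text, and the 1% database frequency is treated as the frequency in unaffected individuals.

```python
def odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying the variant in cases versus controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def risk_in_carriers(freq_cases, freq_controls, prevalence):
    """P(disease | variant) by Bayes' rule, given carrier frequencies and prevalence."""
    p_variant = freq_cases * prevalence + freq_controls * (1 - prevalence)
    return freq_cases * prevalence / p_variant


FREQ_IN_CVID = 0.10            # from the text: ~10% of CVID patients
FREQ_IN_HEALTHY = 0.01         # from the text: ~1% of healthy individuals
ASSUMED_PREVALENCE = 1 / 25_000  # ASSUMPTION for illustration, not from the text

tnfrsf13b_or = odds_ratio(FREQ_IN_CVID, FREQ_IN_HEALTHY)
carrier_risk = risk_in_carriers(FREQ_IN_CVID, FREQ_IN_HEALTHY, ASSUMED_PREVALENCE)

print(f"odds ratio: {tnfrsf13b_or:.1f}")
print(f"estimated risk of CVID in carriers: {carrier_risk:.4%}")
```

With these inputs, the variant is strongly enriched in patients (an odds ratio of about 11), yet well under 1% of carriers would be expected to develop CVID. Changing the assumed prevalence shifts the absolute figure but not the conclusion that the variant alone is very weakly penetrant.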
Digenic and, ultimately, polygenic inheritance can help us to analyze variable disease expressivity and incomplete penetrance. The idea is that a variant in cis or in trans disrupts biochemical epistasis, tipping the system toward disease. One recent study by Nomani et al. (96) did not address the issue of penetrance directly but suggested that disease inheritance is digenic in 66% of patients with adult-onset systemic autoinflammatory diseases. The combinations of genes involved in this digenic inheritance were NOD2/MEFV, NOD2/NLRP12, NOD2/NLRP3, and NOD2/TNFRSF1A. This discovery paves the way for detailed penetrance analyses in the context of both digenic and polygenic contributions to disease manifestations in families with these genetic variants. In another study, Massaad et al. reported that homozygosity for NEIL3 variants caused a uniformly fatal immune disease (recurrent infections and severe autoimmunity) in one family, but clinically silent immune dysfunction in an unrelated healthy individual. As an explanation for this incomplete penetrance, they cited the presence of a cryptic duplicated homozygous variant of LRBA (defects of which are known to cause systemic autoimmunity, recurrent infections, and hypogammaglobulinemia) exclusively in the affected family. They tested this “double-hit hypothesis” by generating Neil3-deficient mice, which, like their human counterparts, displayed no overt signs of autoimmunity until faced with a second environmental challenge, suggesting that environmental effects can potentiate a genotype. Disruption of the genetic epistasis between NEIL3 and LRBA remains a very attractive hypothetical mechanism, contributing to differences in disease penetrance (97).
Elegant experimental evidence for epistasis in CVID was obtained with the discovery of a de novo TCF3 variant in a family already carrying a variant of the CVID-associated TNFRSF13B gene. The sick individual with both variants presented a severe CVID-like disorder and systemic lupus erythematosus. Family members with the TNFRSF13B variant only were asymptomatic or displayed only mild disease, and the son of the proband, who carried only the TCF3 variant, displayed a partial clinical phenotype (98). The effect of having two variants, disrupting epistasis, was documented by clinical scoring for disease severity and by in vitro studies documenting the biological phenotype. The effects of these genes converged on immunoglobulin class-switching pathways, resulting in severe disease.
Similar epistatic regulation was documented in ALPS patients with variants of both FAS and PRF1 (99) or FAS and CASP10 (100), patients with hyperimmunoglobulinemia D and periodic fever syndrome with MVK and TNFRSF1A variants (101), patients with broad susceptibility to infections associated with IFNAR1 and IFNGR2 variants (102), patients with X-linked immunodeficiency caused by an XIAP variant and a CD40LG polymorphism (103), and pediatric patients with inflammatory bowel disease, in which a known NOD2 variant probably interacts with variants of GSDMB, ERAP2, or SEC16A (104).
In most of these cases, one of the two hits had previously been reported as the causal variant in isolation. This raises an obvious question: if synergistic interactions between two or more genetic loci are required, how can one mutated locus be responsible? It is possible that these isolated cases due to TCF3 (105) or LRBA alone (75, 76, 106) corresponded to milder forms of disease or that these “isolated” cases actually involved a second, unknown gene. Alternatively, these combinatorial genetic defects may result in blended phenotypes due to overlapping clinical disease resulting from the co-occurrence of two independent monogenic defects. Recently, unusual features of Williams–Beuren syndrome (WBS), including recurrent infections and skin abscesses in a child, were shown to be due to heterozygosity for a 0.53-Mb deletion on chromosome 7q11.23, corresponding to the known cause of WBS, together with a biallelic loss of NCF1, leading to AR chronic granulomatous disease (107). Blended phenotypes are common in clinical genetics (∼5% of rare disease diagnoses) (108, 109) and can affect IEIs (107, 110, 111). Whether through blended phenotypes or epistasis, digenic inheritance is increasingly recognized as a determinant of expressivity and penetrance.
What remains unknown and future avenues for research
Over the 5 years that have elapsed since our initial review, the numbers of epigenetic and combinatorial genetic hits, each probably surprisingly rare, have increased substantially. What remains largely unknown is the frequency with which common variants affect the incomplete penetrance of rare diseases. It also remains largely unknown how common the epigenetic control of WT vs. mutated allele transcription is in the incomplete penetrance of rare diseases. In epistasis, the modifier gene could plausibly be a common variant with a purely protective (or pathogenic) role acting in combination with a rare variant. This has already been shown for COPA, as described above, but is probably also the case in a few other monogenic forms of autoimmunity (e.g., APS1, IPEX, and CTLA4), in which relatively common autoimmunity-associated HLA alleles probably modify the risk of developing autoimmunity to specific autoantigens (59). It has also been suggested that X-linked variable immunodeficiency segregates with relatively common variants of CD40LG (103) and that susceptibility to familial Mediterranean fever is modified by interactions of MEFV variants with polymorphisms of SAA1 (112). We should organize large registries of detailed, well-curated phenotypes, inspired by large studies in other genetic disciplines in which such occurrences are well documented, such as those on non-syndromic midline craniosynostosis caused by rare SMAD6 variants (113). This would make it possible to document and, more importantly, to act on more subtle clinical signs and symptoms that are probably often missed. Studies of aggregate mutational burden in IEIs, in which the composite effects of many minor deleterious variants regulate disease risk, may indeed be as revealing as such studies have been for other types of rare diseases (114, 115, 116).
Databases, such as UK Biobank, All of Us, and BioMe, should be leveraged in addition to IEI registries, as the combination of these resources can tell us much about the degree to which clinical phenotypes are determined by particular genetic variants, their combined effects, and the situations in which protective alleles may have a particularly strong effect.
Clinical phenotypes form a spectrum with no clear distinguishing line between rare and common phenotypes, and the same is true for genetics. For example, rare variants leading to complete TYK2 deficiency result in monogenic susceptibility to Mycobacterium tuberculosis (TB) and Mendelian susceptibility to mycobacterial disease (MSMD) with relatively high penetrance (∼80%) (117, 118). Conversely, it was recently demonstrated that a common TYK2 variant (allele frequency of 4.2% in Europeans) confers a predisposition to TB (odds ratio [OR] 89.3) and MSMD (OR 23.5) in homozygous individuals living in endemic regions. The estimated penetrance was ∼80% for TB and 0.05% for MSMD (119). This same allele was also shown to protect against autoimmune diseases (120, 121). Homozygous carriers of this allele are not yet considered to have an IEI, but these studies suggest that susceptibility to common infections can be caused by relatively frequent AR disorders in a proportion of patients and that this outcome comes with the upside of protection from autoimmunity. Increasing numbers of disorders on the borderline between rare and common or that between monogenic and polygenic are likely to be identified as sequencing databases expand.
Most of the genetic lesions underlying IEIs were discovered by whole-exome sequencing (WES). This technique is limited to assessment of the coding part of the genome. As whole-genome sequencing (WGS) gradually becomes cheaper, we are poised to discover noncoding variants with strong effects on characterized disease-causing genes. To date, no more than a few tens of IEIs have been shown to be associated with pathogenic variants in the noncoding genome, and most of these variants are located proximal to exons (122, 123, 124). Compound heterozygosity in which a coding sequence variant interacts with a noncoding cis regulatory variant to cause an IEI has been documented, and this remains an underexplored concept (123). Further analyses of whole-genome sequence databases should not only identify increasing numbers of causal variants in regulatory regions but should also reveal noncoding modifier alleles in cis and trans that modify the expression of just the WT or other known pathological variants, thereby modifying the transcriptotype and regulating penetrance. Finally, we hypothesize that there are also protective noncoding variants that can rescue aberrant biological features when present in cis or in trans. Given the complexity of these interactions, their discovery is likely to prove difficult, but not impossible.
Studies of copy number variants (CNVs) have fallen out of fashion with the replacement of single nucleotide polymorphism (SNP) arrays by WES. Only with WGS advances allowing better CNV calls will this line of research return to the fore. A few global and site-specific CNVs have already been linked to IEIs (125, 126, 127, 128, 129). However, the impact of CNVs on penetrance remains unexplored and will require cutting-edge studies technically equipped to capture large structural variations.
We think that these mechanisms are probably only the tip of the iceberg, but they will nevertheless help us to unravel the truth about incomplete penetrance.
Principle III: Environmental exposures
What is known
Differences in environmental exposure are frequently highlighted as putative explanations for incomplete penetrance, albeit with limited evidence. The combined effects of the environment—often referred to as the “exposome”—constitute an area of active research extending well beyond the issue of penetrance. The exposome encompasses many factors relevant to the immune system, including infections, resident microbes, diet/metabolism, irradiation, air quality, injury, and sun exposure. Many of these environmental factors are sufficient to trigger a secondary immunodeficiency in previously healthy individuals (130). However, our knowledge of the effects of environmental modifiers on penetrance in IEIs remains very limited.
Environmental factors are most easily understood in the case of susceptibility to infection. Put simply, individuals harboring variants that confer susceptibility to specific pathogens do not present disease if they are never exposed to the pathogen concerned. This is most readily appreciated in individuals with variants linked to BCG disease who do not develop disease if they are not vaccinated with BCG (131). By definition, if a susceptible individual does not encounter the infectious agent to which they are susceptible, they cannot become sick.
One very nice example in which the environment may have contributed to incomplete penetrance, although causality is difficult to prove, is deficiency of TIRAP, a critical adapter in TLR-based sensing. Despite having a complete innate immune defect, only one of the eight TIRAP-deficient homozygotes studied presented staphylococcal disease. In the other seven, acquired antibodies against staphylococcal lipoteichoic acid (LTA Abs) rescued TLR-dependent susceptibility to Staphylococcus (132). The idea that adaptive immune responses can compensate for innate immune defects is also well documented for invasive pneumococcal disease due to deficiencies of IRAK4 and MyD88. Immunity changes with age, and age is known to be a major determinant of disease. Penetrance is highest at the age of 10 years, but rates of invasive pneumococcal disease recurrence and mortality fall with age, presumably due to acquired antipneumococcal immunity (133). Paradoxically, this example suggests that the very environmental exposures thought to trigger clinical presentations can also be protective.
Infection may also worsen immune dysregulation after the acute infection phase. It is now generally accepted that infections are often the event triggering autoimmune and autoinflammatory disorders. This notion is illustrated well by familial hemophagocytic lymphohistiocytosis (HLH), a disease characterized by excessive macrophage and lymphocyte activity that used to be fatal. Individuals with disease-causing variants display this hallmark cellular dysfunction early in life, before the development of clinical disease (56). Furthermore, upper respiratory or gastrointestinal tract infections tend to occur at about the onset of HLH (134). This suggests that an infectious trigger may be required for the disease to occur. Variable disease presentations may thus be, at least in part, a function of exposure to an infectious agent. Further detailed documentation of the type of infection and the exact time between infection and disease onset will probably unravel certain aspects of incomplete penetrance.
Other environmental factors, such as irradiation and chemotherapy, can also modulate penetrance in IEIs. Individuals with LIG4 variants, who have DNA repair defects leading to lymphocyte deficiencies and nonimmune features, are often healthy until treated with chemotherapy and radiotherapy. Asymptomatic carriers of LIG4 variants have therefore probably accumulated too few double-strand breaks to cross the threshold for the development of disease (135). Studies in animals are beginning to provide experimental documentation of such effects, as shown in Neil3-deficient mice (97). Another example is provided by Schimke immune-osseous dysplasia, in which reduced penetrance occurs and is not sufficiently accounted for by biallelic variants of SMARCAL1, a conserved chromatin regulator (136, 137, 138). Studies of Drosophila and murine models of SMARCAL1 deficiency have suggested that an additional environmental or genetic trigger is required for full disease development (139).
What remains unknown and future avenues for research
Immunization with live vaccines both provides answers and raises questions about the role of environmental exposures in variable penetrance. BCG vaccination is a particularly good example, as all individuals are inoculated with an identical pathogen at a very similar age, but we still observe incomplete penetrance, which is estimated at about 70% in individuals with IL12RB1 deficiency, suggesting that simple environmental differences alone cannot entirely account for this incomplete penetrance (32). Likewise, some IEI-specific pathogens are almost ubiquitous. This is the case for herpes simplex encephalitis (HSE), a sporadic disease with known monogenic etiologies (140, 141). Despite having almost complete cellular defects of TLR3-dependent IFN immunity, 4/6 TRIF-deficient, 2/3 UNC-93B-deficient, 3/8 TLR3-deficient, 3/4 IRF3-mutant, and 2/3 TBK1-hypomorphic individuals have been reported to have developed HSE (142, 143, 144, 145, 146, 147, 148, 149).
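The HSE counts quoted above can be turned into crude penetrance estimates. The sketch below computes the observed penetrance per genotype and a pooled figure, with a Wilson score interval to show how wide the uncertainty is at these sample sizes. The choice of the Wilson interval, the pooling across genes, and all names in the code are assumptions made for illustration; the counts are those reported in the text, and ascertainment bias is ignored.

```python
from math import sqrt

# (individuals with HSE, individuals carrying the genotype), as quoted in the text.
hse_counts = {
    "TRIF": (4, 6),
    "UNC-93B": (2, 3),
    "TLR3": (3, 8),
    "IRF3": (3, 4),
    "TBK1": (2, 3),
}


def wilson_interval(affected: int, carriers: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion (illustrative choice)."""
    if carriers <= 0 or not 0 <= affected <= carriers:
        raise ValueError("need 0 <= affected <= carriers and carriers > 0")
    p = affected / carriers
    denom = 1 + z ** 2 / carriers
    centre = (p + z ** 2 / (2 * carriers)) / denom
    half = z * sqrt(p * (1 - p) / carriers + z ** 2 / (4 * carriers ** 2)) / denom
    return centre - half, centre + half


for gene, (affected, carriers) in hse_counts.items():
    low, high = wilson_interval(affected, carriers)
    print(f"{gene:8s} {affected}/{carriers} = {affected / carriers:.0%} "
          f"(interval {low:.0%} to {high:.0%})")

# Pooling across genes is a simplification: it treats all genotypes as equivalent.
affected_total = sum(a for a, _ in hse_counts.values())
carriers_total = sum(n for _, n in hse_counts.values())
pooled = affected_total / carriers_total
pooled_low, pooled_high = wilson_interval(affected_total, carriers_total)
print(f"pooled   {affected_total}/{carriers_total} = {pooled:.0%}")
```

Pooled, 14 of 24 reported individuals (about 58%) developed HSE, but the intervals for the individual genotypes are very wide, which underlines how few carriers have been documented.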
In these cases, incomplete penetrance may instead be a function of other factors, including age at exposure. In support of a major role for age in determining penetrance, HSE patients are mostly young and recurrence is rare (140). In this context, age is essentially simply a reflection of the time to first infection. What if the tonic type I IFN self-maintenance of neurons shown to be a hallmark of TLR3 deficiency were not constant but instead increased while oscillating in a sinusoidal manner with development? If this were proven to be the case, it would provide an explanation for differences in penetrance at an early age, when HSE manifests if HSV-1 infection occurs at an upward or downward point in sinusoidal type I IFN production, but not at the peak. This paradigm of increasing sinusoidal type I IFN production would also help explain waning penetrance with age (143, 147, 150). This mechanism would be independent of the adaptive immune system, as HSE is not a phenotype of individuals born without adaptive immunity. In other instances of susceptibility to viruses and bacteria that are not neurotropic, prior exposure may, to a greater extent, effectively immunize the individual and regulate disease penetrance. We suggest that asymptomatic carriers of variants may have previously been exposed to noninfectious or exceedingly low doses of a pathogen, insufficient for productive infection but sufficient to induce adaptive immune responses capable of neutralizing future challenges that are truly infectious. A similar effect may occur in IL12RB1 deficiency, as the patients that develop BCG disease and those with environmental mycobacteriosis tend to form two mutually exclusive groups, suggesting that exposure to one pathogen may immunize against the other (79, 80). Similar mechanisms may operate in other susceptibilities to infection, but additional experimental evidence is required to demonstrate this.
Commensal organisms, such as the bacteria, fungi, and viruses that naturally colonize our tissues, may be of the utmost importance. The microbiome is our most abundant source of exposure to microbes, which makes our symbiotic relationship with it particularly relevant. In early experiments, elimination of the gut bacterial microbiome with a cocktail of antibiotics ultimately resulted in higher levels of inflammation than in untreated animals, suggesting a true homeostatic function. In the last 2 decades, the microbiome has proved a major determinant of immune function and disease (151, 152). Despite these strong associations, the relevance of the microbiome in IEIs, the most extreme immune system diseases, remains unknown. Recent studies demonstrating changes to the bacterial microbiota in CVID and their correlation with immune activation and certain symptoms have begun to scratch the surface, but the direction of causality remains unclear (153, 154, 155). We suggest that the bacterial, viral, and fungal biota regulate the penetrance of IEIs by shaping relative innate and/or adaptive tolerance and reactivity. The divergence of the microbiome with geography and diet may underlie the variability of IEI phenotypes across populations with similar monogenic lesions. Detailed future studies are required to document the types and quantities of microbiota in IEI cohorts.
Our environment, which is continuing to change, is very different from that in which our ancestral immune system evolved. Six years ago, SARS-CoV-2 was not present, and just a century ago, 30–50% of us would not have lived beyond early childhood, as death from infection was 200 times more frequent (156). It seems likely that a study on the genetics of infectious disease a century ago, with the tools of today, would have identified far more common alleles as causal. However, today, these genetic susceptibilities, which we refer to as common variants, are probably masked by the protective effects of good sanitation, vaccination, and antibiotic use. Perhaps we should consider all these potential susceptibility alleles in isolated systems (as we have mapped the key genes), as this would undoubtedly be informative and improve our understanding of incomplete penetrance. Conversely, the recent development of immunosuppressant use in transplantation, immunology, rheumatology, dermatology, and neurology may reveal new and old genetic susceptibilities with surprising frequencies and pathogen specificities.
Principle IV: The mosaicism of disease-causing alleles reduces clinical penetrance
What is known
But it is even more complicated than that. Up to this point in the discussion, we have assumed that all the cells in an affected individual carry the same variant. However, genetic differences between cells occur at a surprisingly high frequency within individuals. Genetic mosaicism originates from post-zygotic (de novo) variants that arise during the embryonic or postnatal period. The occurrence of such mosaicism in IEIs was initially thought to be rare, but it has since been found to be rather common. A recent systematic analysis across IEIs by targeted deep sequencing in 128 families estimated the rate of mosaicism at 23.4% (157). In the last 5 years alone, the number of mutated genes with mosaicism shown to cause IEIs has almost doubled, bringing the count to over 20 (158, 159). Interestingly, some of these variants cause disease only in the mosaic state, rather than in both the mosaic and germline states, presumably because their impact in the germline is too severe to be tolerated.
Disease onset and/or severity are variable in cases of mosaicism, as a direct consequence of gene dosage, the tissue affected, and time since somatic variant generation. Incomplete penetrance in mosaic IEIs was first documented during the 1980s and 1990s, in an extraordinary case of delayed-onset ADA deficiency, which is typically a severe form of SCID. ADA mosaicism was observed directly in peripheral blood cells, and ADA-normal populations gradually came to predominate over time, with the resolution of clinical disease (160, 161, 162). After this discovery, several other documented cases of disease-associated mosaicism were reported (163, 164), some presenting as mild or atypical disease phenotypes, including variants of NLRP3 (165, 166), STAT3 (167, 168), FAS (169, 170), CYBB (171), TNFAIP3 (172), TRAPS (173), IL6ST (174), TLR8 (175), and RAP1B (176). It should be noted that these cases predominantly concern disorders of immune hyperactivation rather than deficiency.
As the number of such cases has grown, the evidence for mosaicism and gene dosage as a mechanism underlying reduced penetrance has also increased, with an apparently good correlation. In an analysis of 10 families in which one member carried a postzygotic IEI gene variant, 80% of the mosaic individuals were asymptomatic. The remaining mosaic individuals presented with only partial clinical disease, whereas their progeny with an inherited germline variant displayed full disease development (157). An evaluation of variant read frequencies in a family with PIK3CD variants revealed that affected siblings harbored more mutant cells than their mildly affected father, with allele fractions of 37–54% and 15%, respectively (16). Conversely, if mutated cells predominate in the relevant cell compartment, the clinical features of germline and somatic variants are more similar. ALPS patients harboring FAS variants in ∼100% of their DN T cells display complete disease development despite having undetectable levels of the variant in whole blood (169, 170).
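The link between variant read frequencies and the proportion of mutant cells can be made explicit with a small calculation. The sketch below converts a variant allele fraction into an approximate fraction of cells carrying the variant, assuming a heterozygous variant on an autosome in diploid cells, with no copy number change and no allelic bias in sequencing. These assumptions and the function name `mutant_cell_fraction` are illustrative and are not taken from the cited study.

```python
def mutant_cell_fraction(variant_allele_fraction: float) -> float:
    """Approximate fraction of cells carrying a heterozygous variant.

    ASSUMPTION: each mutant diploid cell contributes one variant allele out
    of two, so the cell fraction is about twice the allele fraction, capped
    at 1.0 to absorb sampling noise around the germline value of 50%.
    """
    if not 0.0 <= variant_allele_fraction <= 1.0:
        raise ValueError("allele fraction must lie between 0 and 1")
    return min(1.0, 2.0 * variant_allele_fraction)


# Allele fractions quoted in the text for the PIK3CD family.
observations = {
    "father (mildly affected)": 0.15,
    "siblings, lower bound": 0.37,
    "siblings, upper bound": 0.54,
}

for label, vaf in observations.items():
    cells = mutant_cell_fraction(vaf)
    print(f"{label}: allele fraction {vaf:.0%} -> about {cells:.0%} of cells")
```

Under these assumptions, an allele fraction of 15% corresponds to roughly 30% of cells carrying the variant, whereas fractions of 37–54% correspond to most or all cells, as expected for a variant that is germline or nearly so.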
In the realm of new somatic variants of genes causing IEIs that have never been described in the germline, the best example is probably that of UBA1 variants causing vacuoles, E1-ubiquitin-activating enzyme, X-linked, autoinflammatory, somatic (VEXAS) syndrome (177). The clinical signs of VEXAS syndrome overlap strongly with those of giant cell arteritis, relapsing polychondritis, systemic lupus erythematosus (SLE), and rheumatoid arthritis (RA). Since its discovery only 5 years ago, hundreds of patients have been identified, with extremely diverse clinical disease penetrance, despite the presence of exactly the same variants in most of these patients (178).
Somatic variants may act as modifiers and, thus, as “second hits” leading to the manifestation of clinical disease. ALPS patients have been documented to carry both an inherited heterozygous FAS variant and a somatic event in the second FAS allele, such as a missense variant, nonsense variant, or loss of heterozygosity (179, 180, 181). Alternatively, the second hit may occur at a different locus, as in a recent report of a somatic FAS variant occurring together with an existing CASP10 variant (182). Relatives who did not acquire a second variant post-zygotically remained asymptomatic or were only partially affected, suggesting an effect of second hit mosaicism on incomplete penetrance.
Conversely, the acquisition of a somatic variant can also rescue disease. Such events, often referred to as somatic reversions, underlie milder clinical disease or the absence of clinical disease. A good example is provided by the reversion of DOCK-8 deficiency, which is common, occurring in about half of all affected patients; this reversion is associated with longer survival and less severe allergic disease, although preliminary reports have suggested that infectious disease susceptibility remains the same (183). Full recovery from disease, including infectious phenotypes, was reported in a more recent study (184). There are several other examples of reversions underlying incompletely penetrant clinical disease, for ADA (162, 185), XLA (186, 187), WASP (188, 189), leukocyte adhesion deficiency (190), X-linked immunodeficiency with ectodermal dysplasia due to variants in NEMO (191), Omenn syndrome with CARD11 deficiency (192), IKBKG-associated immunodeficiency (16), and GATA2 deficiency (193). Interestingly, these reversions may even involve second-site variants in the initially mutated gene, creating altered non-WT, but still functional, gene products (194). Reversion may also occur via chromosomal and segmental chromosomal deletions, as shown for the SAMD9 and SAMD9L alleles, for which this reversion occurs via monosomy 7 (195). Reversions, thus, represent a common and complex component of incomplete penetrance.
Most mosaic IEIs appear to remain stable over time (157, 166). However, somatic reversions conferring a fitness advantage may enable the selective expansion of the reverted cell population to reestablish healthy immune cell populations. Documented reversions of variants of JAK3, an essential mediator of lymphocyte development, can repair immune cell proliferation and differentiation. In one family with JAK3 hypomorphic variants, the asymptomatic sibling displayed CD4+ T cell reversion, whereas a brother without this reversion suffered recurrent respiratory tract infections (196). One fascinating case was reported in a warts, hypogammaglobulinemia, infections, and myelokathexis (WHIM) syndrome patient cured by a process known as chromothripsis, or “chromosome shattering,” in which the chromosomes undergo massive deletion and rearrangement. Fortuitously, this event deleted the mutated CXCR4 allele in a single hematopoietic stem cell, which then took over the bone marrow and reconstituted immune function (197). By contrast, if there is no selective pressure due to treatment, as in enzyme replacement therapy in ADA deficiency with reversions or allogeneic stem cell therapy, the WT cells appear to lose their selective advantage and their proportions decline (188). This example raises the question as to how best to help patients to help themselves.
What remains unknown and future avenues for research
Mosaicism usually tempers the penetrance of its germline counterpart, but there are cases in which mosaic variants lead to an equally or even more severe disease (8, 157, 195, 198, 199). Of course, the same could be said of the first and only reports of patients with mosaic variants for which there is no germline counterpart, suggesting that germline defects may be lethal at the embryonic or perinatal stages (158, 168, 200, 201, 202, 203). The application of more recent technologies to larger cohorts and improvements in tissue sampling will be essential to address the remaining questions. Low-frequency somatic variants probably still escape most detection approaches. It is important to solve this problem as mosaic variants present even at a frequency of 0.5% of tissue can cause disease. This frequency is not simply a function of total mosaic fractions but also depends on the tissue analyzed. Many somatic variants are only detectable in specific immune cell types (158, 165, 168, 170, 178), and many more such cell type–specific variants than are currently known are likely to exist. Perhaps the most underexplored variants are extrahematopoietic variants, probably due to difficulties with tissue sampling.
As discussed above, going beyond the genotype, mosaicism can also exist at the transcript level across genetically identical cells in which one autosome is more transcriptionally active than the other due to RMAE (204, 205, 206, 207, 208, 209). This de facto transcriptional mosaicism can occur on top of genetic mosaicism, which we initially demonstrated in 2020 (210). Remarkably, up to 10% of the autosomal genome displays this phenomenon (204).
For these genes, allelic bias (whether for a germline or a somatic variant) is established during lineage differentiation via a unique chromatin signature and DNA methylation, and it persists during subsequent cell divisions (85, 211, 212). In contrast with the situation only 5 years ago, we are now increasingly able to understand the nature of this epigenetic phenomenon, which can also occur on a mosaic background. We are beginning to grasp the functional consequences, especially in light of genetic disease and penetrance. Computational predictions have suggested that there is an enrichment in monoallelic expression (MAE) among genes for which gain-of-function variants with AD inheritance have been linked to neuropsychiatric disease (213). Disease-related genes have been shown, experimentally, to undergo MAE (214) and, in 2020, the first gene variant with an allelic bias was documented in a mosaic patient with a JAK1 variant (8). Earlier this year, we showed that MAE can account for disease penetrance in the members of families with JAK1, STAT1, CARD11, or PLCG2 variants (85). Beyond JAK1, it remains to be seen whether MAE occurs in other mosaic patients.
Heterozygosity for variants of genes displaying MAE thus creates a mixture of WT- and mutant-expressing cells with divergent phenotypes in affected individuals. We have now shown that, by creating this mosaic transcriptotype, MAE can modulate the functional impact of disease-causing variants in various ways and proportions (8, 85). MAE, no longer a hypothesis, can actually help explain phenotypic variation in genetic disease. In AD disease, mosaicism reduces the penetrance of disease phenotypes in patients. In AR disease, this phenomenon is predicted to occur in affected carriers but has not yet been experimentally demonstrated. We have shown that up to 4% of IEI genes can undergo MAE in healthy individuals. It remains unknown whether a variant can itself drive MAE, but it may increase the proportion of genes capable of displaying MAE, perhaps to 30–50% of all IEI genes. It remains unclear whether MAE accounts for only a minority of cases with incomplete penetrance or whether the documented cases are just the tip of the iceberg; documenting this further, including on a mosaic background, will be very exciting.
Conclusions
Our limited understanding of penetrance in IEIs, and indeed in all genetic diseases, has hindered advances in human genetics. By documenting and classifying the cases of variable penetrance in IEIs, this review, like its predecessor (25), aims to shed light on the existing connections and the persistent gaps in our knowledge. It is clear that four major influences continually reduce penetrance (genetic variant quality, epigenetic and genetic modifiers, environmental influences, and mosaicism), whereas many aspects of these four principles, such as genomic compensation, protective variants, subinfectious inoculations, monoallelic expression, and peripheral tissue mosaics, remain unexplored (Fig. 1). We mostly discuss these principles separately here, but they work in tandem and interact, as biology and medicine do not self-classify. We impose classifications to ensure clarity. The key breakthroughs in these domains do not come from single sources, but from the combined efforts of large cohorts, intense studies of single patients, model organisms, and even cell lines. Many more natural laws remain to be discovered, and there are undoubtedly surprises hiding in plain sight. Furthering our understanding of penetrance will therefore continue to require both an open mind and rigorous studies.
Acknowledgments
I would like to thank Jean-Laurent Casanova, Conor Gruber, Luigi Notarangelo, Neil Romberg, Steve Holland, Michalis Lionakis, Ivona Aksentijevich, and Megan Cooper for helpful suggestions and discussions.
This research was supported by the National Institute of Allergy and Infectious Diseases Grants P01AI186771, R24AI167802, R01AI127372, and R01AI148963.
Author contributions: Dusan Bogunovic: conceptualization, funding acquisition, project administration, resources, software, visualization, and writing—original draft, review, and editing.
References
Author notes
Disclosures: D. Bogunovic reported being a founder of Lab11 Therapeutics Inc.
