Can genetic and clinical findings made in a single patient be considered sufficient to establish a causal relationship between genotype and phenotype? We report that up to 49 of the 232 monogenic etiologies (21%) of human primary immunodeficiencies (PIDs) were initially reported in single patients. The ability to incriminate single-gene inborn errors in immunodeficient patients results from the relative ease in validating the disease-causing role of the genotype by in-depth mechanistic studies demonstrating the structural and functional consequences of the mutations using blood samples. The candidate genotype can be causally connected to a clinical phenotype using cellular (leukocytes) or molecular (plasma) substrates. The recent advent of next generation sequencing (NGS), with whole exome and whole genome sequencing, induced pluripotent stem cell (iPSC) technology, and gene editing technologies, in particular the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technology, offers new and exciting possibilities for the genetic exploration of single patients not only in hematology and immunology but also in other fields. We propose three criteria for deciding if the clinical and experimental data suffice to establish a causal relationship based on only one case. The patient’s candidate genotype must not occur in individuals without the clinical phenotype. Experimental studies must indicate that the genetic variant impairs, destroys, or alters the expression or function of the gene product (or two genetic variants for compound heterozygosity). The causal relationship between the candidate genotype and the clinical phenotype must be confirmed via a relevant cellular phenotype, or by default via a relevant animal phenotype.
When supported by satisfaction of rigorous criteria, the report of single patient–based discovery of Mendelian disorders should be encouraged, as it can provide the first step in the understanding of a group of human diseases, thereby revealing crucial pathways underlying physiological and pathological processes.
In the history of clinical genetics, the delineation of novel Mendelian phenotypes often started with the description of single cases, which prompted recognition of additional patients with the same condition, defining a clinical entity and suggesting a mode of inheritance (Speicher et al., 2010). In the last few decades, genome-wide linkage analysis and candidate gene approaches have enabled the molecular genetic dissection of over 4,000 single-gene inborn errors (http://www.omim.org/statistics/entry; Antonarakis and Beckmann, 2006). Most of these advances were based on the genetic study of multiplex families, or groups of unrelated sporadic cases, or both, and progress was often accelerated by the investigation of consanguineous families. Yet, for many patients with a well-defined clinical phenotype, no disease-causing mutations can be found in any of the known disease-associated genes. Further, at least 1,500 Mendelian conditions lack a defined genetic etiology. Purely sporadic conditions in non-consanguineous families may also be caused by familial single-gene defects (of incomplete penetrance) or by de novo mutations (of complete penetrance) causing disease by various mechanisms (dominant-negative effect, haploinsufficiency, gain of function). Finally, many patients have a distinctive, very unusual, and possibly Mendelian phenotype that has not been described in other patients. In some cases, the discovery of the causal gene in a single patient (defined as a single patient from a single kindred) can pave the way for its confirmation in other patients. Indeed, if a first patient is not reported because it is only a single patient, a second patient may also not be reported because it would be once again the “first” patient.
Single-patient reports are common in the fields of human genetics with a tradition of mechanistic and experimental studies rooted in the availability of blood samples, such as hematology and immunology, or inborn errors of metabolism. Some investigators in other areas of medical genetics, in statistical genetics, and in experimental biology, however, question the value of genetic studies in a single patient. This point of view is illustrated in recently published guidelines (MacArthur et al., 2014) that emphasize “the critical primacy of robust statistical genetic support for the implication of new genes.” Herein, by contrast, we would like to emphasize the critical primacy of robust experimental support for the implication of new genes. To meet their statistical standard, the guidelines require that multiple confirmations of causality be obtained in multiple unrelated patients. Thus, multiple cases are required for the satisfaction of two of the three guidelines in “Assessment of evidence for candidate disease genes” and, in several of the other guidelines, statistical support from multiple cases is strongly encouraged (MacArthur et al., 2014). At first consideration, the logic of these recommendations is appropriate. Obviously, no reasonable person should object to the accumulation of more data to support an experimental finding. And we do not. We do agree that stringent criteria are required to avoid what statisticians call a type 1 error, the acceptance of a false hypothesis; in the current situation, this would be the false attribution of causality based on single patients. However, insistence on too stringent an accumulation of data may result in a type 2 error, the rejection of a valid hypothesis.
We recognize that studies of single patients have limitations. First, when compared with studies of multiplex or multiple families, they do not benefit from the power of genetic homogeneity; in other words, several affected patients with the same clinical phenotype certainly provide added confidence that the altered gene is responsible for the phenotype. Second, a single affected patient does not permit one to draw firm conclusions if the candidate genotype does not display full clinical penetrance, even in the presence of a fully penetrant and relevant intermediate phenotype; in that case, the age and past medical history of the patient, as well as modifying genetic factors, may contribute to the phenotype. However, single-patient studies can be conclusive, provided there is rigorous selection of variations in silico followed by in-depth experimental validation in vitro via the dual characterization of the mutant alleles and a cellular or animal phenotype, which establishes a causal bridge between a candidate genotype and a clinical phenotype. With the notable exceptions of hematological and immunological patients (Orkin and Nathan, 2009; Ochs et al., 2014), and to a lesser extent patients with inborn errors of metabolism (Scriver et al., 2001), the description of novel gene defects in single individuals has rarely been reported. In the former patients, a blood sample is fortunately often sufficient to conduct in-depth mechanistic studies and discover relevant cellular phenotypes in erythrocytes, platelets, and any of the numerous leukocyte subsets.
The requirement of statistical support with an accumulation of cases, set out in the recently published guidelines of MacArthur et al. (2014), is not met by the 49 of 232 (21%) monogenic primary immunodeficiencies (PIDs) first reported on the basis of a single case (Table 1). What then should be the criteria for a report based on a single patient? Based on our assessment of the 49 cases, reasonable requirements include (see Text box): in all cases, (1) population studies must indicate that the candidate genotype does not occur in healthy individuals and must have a frequency less than or equal to that predicted based on the frequency of the phenotype; (2) the genetic variants must impair, destroy, or alter the function of the gene product. In addition, for disorders that affect the function of a cell present in the patient: (3A) a patient-specific relevant cellular phenotype should be caused by the mutant allele (with its correction by complementation with the normal gene product and/or its replication by knockdown, knockout, or knock-in in relevant cells). Alternatively, for disorders that affect the development of a cell lacking in the patient: (3B) presentation of an animal model that recapitulates both the cellular and whole-organism phenotypes may replace the characterization of a relevant cellular phenotype. In either case, the third step is facilitated by the previous demonstration of genetic etiologies affecting the same physiological circuit. Together, these three steps establish a causal relationship between the candidate genotype and the clinical phenotype.
1. Family studies and population studies must indicate that the patient’s candidate genotype is monogenic and does not occur in individuals without the clinical phenotype (complete penetrance).
a. The clinical phenotype must be rare and distinctive and the candidate genotype must be monogenic.
b. Family studies must demonstrate that the candidate genotype of the patient (which includes both alleles at the locus for autosomal genes or X-linked genes in females) is not shared by other family members. In other words, there must be complete clinical penetrance, with a Mendelian mode of inheritance (AR, XR, AD, or XD).
c. Population studies, including but not restricted to the same ethnic group, must indicate that the candidate genotype does not occur in healthy individuals tested, and that the frequencies of the candidate variants and genotype are not higher than that predicted by the frequency of the clinical phenotype.
d. If the variant leads to a premature stop codon (nonsense, frameshift, or essential splice variants), other variants giving rise to premature stop codons must not be more frequent in the general population than predicted by the frequency of the clinical phenotype.
2. In-depth experimental and mechanistic studies must indicate that the genetic variant destroys or markedly impairs or alters the expression or function of the gene product (or two genetic variants in the case of compound heterozygosity).
a. A variant in a protein-coding gene can be nonsynonymous (change the amino acid sequence) or, if synonymous, have a proven impact on mRNA structure or amount (e.g., create an abnormal splicing site). A variant in an RNA gene must affect its function (if its expression is detectable).
b. Studies should document whether the variant changes the amount or molecular weight of the gene transcript and of the encoded protein. Ideally, this should be done in control primary cells or iPSC-derived cell lines, and not only in control immortalized cell lines.
c. Computer programs that predict whether a missense variant is damaging are helpful but not conclusive. A variation that is not conservative and that occurs in a region or at a residue of the encoded protein that is highly conserved in evolution provides support for the hypothesis that the amino acid is functionally important.
d. The variants must be loss or gain of function for at least one biological activity. For variants that result in an amino acid substitution, insertion, or deletion, in vitro studies should document a functional change that reveals the mechanism by which the variant causes disease. For example, the protein may be unstable, it may not bind essential cofactors, or it may not localize appropriately.
3. The causal relationship between the candidate genotype and the clinical phenotype must be established via a relevant cellular or animal phenotype.
a. In all cases, the candidate gene should be known or shown to be normally expressed in cell types relevant to the disease process. These may be cells affected by the disease process, cells which produce factors needed by the affected cells or progenitors of the cell lineage affected by the disease. Some genes are broadly expressed but have a narrow clinical phenotype.
b. For disorders that affect the function of a cell (present in the patient), experimental studies in vitro must indicate that there is a cellular phenotype explained by the candidate genotype (see c). This cellular phenotype should reasonably account for the clinical phenotype because the cell type is known to be involved in the disease process and the clinical phenotype is consistent with it. For example, if the candidate gene can be connected to a known disease-causing gene via a common cellular phenotype (e.g., mutations in a second chain of a receptor), causality is thereby established between the genotype and the clinical phenotype.
c. The patient-specific cell type can include a convenient cell line (EBV-B cell, SV40 fibroblasts) but should also ideally include a more relevant leukocyte subset or a primary or iPSC-derived nonhematopoietic cell. This cellular phenotype must be rescued by a wild-type allele or for dominant-negative mutations by knockdown, knockout, or correction of the mutant allele. Negative dominance must be established by co-transfecting the mutant and wild-type alleles into cells deficient for the gene product. These experiments have become easier with new transfection approaches, siRNA and shRNA, and CRISPR/Cas9 editing. Alternatively or additionally, knockdown or knockout of the wild-type gene, or introduction of knock-in mutations in control cells, should reproduce the cellular phenotype.
d. For disorders that affect the development of a cell (lacking in the patient), a cellular phenotype is difficult to establish. The candidate gene can be connected to a known disease-causing gene via a common mechanism. Causality is thereby suggested between the genotype and the clinical phenotype. In this and other cases, when the candidate gene governs a novel circuit, an animal model in vivo must, however, indicate that there are causally related phenotypes that mimic the patient’s phenotypes (molecular, cellular, and clinical) and are explained by the candidate genotype. A biological phenotype underlying the patient’s clinical phenotype must be replicated in the mutant animal (e.g., IgA deficiency underlying a specific infection).
Addressing the significance and limitations of gene discovery in single patients is timely, as the NGS revolution, with whole exome sequencing (WES) and whole genome sequencing (WGS), is rapidly providing candidate variations in an increasing number of genetically undefined cases (Ng et al., 2009, 2010; Goldstein et al., 2013; Koboldt et al., 2013; Kircher et al., 2014). Although these methods may facilitate the recognition of the same genetic defects in unrelated patients, the number of single patients left without candidate genes shared by other patients will also grow. The NGS-based discovery of genetic disorders in single patients appears to have great promise in various fields of medicine, beyond hematology and immunology. Indeed, not only gradual improvements in techniques that permit transfection and knockdown of genes in primary cells and cell lines but also recent path-breaking approaches, such as iPSC (Takahashi et al., 2007; Takahashi and Yamanaka, 2013) and gene editing, especially with CRISPR/Cas9 (Marraffini and Sontheimer, 2010; Wiedenheft et al., 2012; Cong et al., 2013; Charpentier and Marraffini, 2014), offer new and almost unlimited possibilities to provide mechanistic insights into the significance of candidate mutations in the cell types that are the most relevant to the phenotype under study. What was common practice in hematology and immunology can now be widely applied.
Single-patient discoveries in the field of inborn errors of immunity
The field of PIDs illustrates the power of single-patient genetic studies. At first glance, up to 51 of 234 PIDs (22%) were first described in single patients (with 53 patients and papers because two disorders were each reported in two unrelated patients simultaneously; Table 1). In our view, however, two single-patient studies failed to convincingly establish causality between a germline genotype and a clinical phenotype (Table 1). A disease-causing mutation in NRAS that was initially reported to be germline was subsequently found to be somatic (Oliveira et al., 2007; Niemela et al., 2011). A mutation in UNC119 that was initially reported to be rare and disease-causing is in fact a common polymorphism that does not cause disease (Gorska and Alam, 2012). Most reports were published in journals that emphasize the importance of in-depth mechanistic studies; they were highly cited. There is a trend toward an increased number of such publications over the years, with two peaks in 1995–1999 and 2010–2013, the latter explained in part by the advent of WES (Fig. 1). Most of the reports of single patients described rare disorders caused by uncommon or private genetic variations, with two exceptions (MASP2 and ficolin 3 deficiencies). In 44 of 49 conditions, the inheritance is autosomal recessive (AR), and in the remaining five it is autosomal dominant (AD). In the patients with AD disease, the mutation was proven to be de novo in the four patients whose parents could be tested (Ikaros, Rac2, IκBα, and TRAF3; Ambruso et al., 2000; Courtois et al., 2003; Pérez de Diego et al., 2010; Goldman et al., 2012). Parental consanguinity in patients with AR inheritance was high, with up to 18 of 46 kindreds (2 disorders were reported concurrently in 2 kindreds). Homozygous lesions in the absence of known consanguinity were found in 16 patients, with the remaining 12 patients being compound heterozygous.
For 36 of 49 conditions, the subsequent description of other patients corroborated the initial discovery; most of the 13 exceptions were published recently (11 after 2010), suggesting that the findings may still be confirmed. In all cases, in-depth mechanistic studies in the patients’ myeloid or lymphoid cells proved the deleterious nature of the mutations found. In turn, the pathological nature of most alleles was documented on immunological grounds. For some genes, a connection with a known morbid gene was established by the characterization of a cellular phenotype. For most genes, the known role of the mouse ortholog provided evidence that the mutation was responsible for the clinical and immunological phenotype observed (Table 1). In these and other instances, the gene was often connected with known human pathways, and the morbid nature of the mutation was rarely established on the sole grounds of genetic and experimental data.
What did we learn from these 49 PIDs?
The discovery of novel monogenic defects in single patients has often offered novel biological and pathological insights. For example, there was no previous evidence in mice or humans pointing to the role of RNF168 in DNA repair and immunity (Table 1). It is its defective biological function in human mutated cells that established both its morbid nature and physiological role, with RNF168 deficiency underlying RIDDLE syndrome (Stewart et al., 2009). For four components of the complement pathway, the human defect was documented before the development of the corresponding mutant mouse, and its disease-causing nature was established thanks to biochemical studies using human plasma (McAdam et al., 1988; Botto et al., 1990; Petry et al., 1995; Ault et al., 1997). In three instances, the mouse mutant has not been generated (Witzel-Schlömp et al., 1997; Inoue et al., 1998; Stengaard-Pedersen et al., 2003). In the case of APOL1 (Vanhollebeke et al., 2006) and ficolin 3 (Munthe-Fog et al., 2009), there is no mouse ortholog (Table 1). In two instances, Coronin and BLNK deficiency, the mouse and human defects were reported jointly (Minegishi et al., 1999b; Shiow et al., 2008). The other cases were no less interesting, as the associated phenotypes were often surprising when compared with the corresponding mutant mice, sometimes in terms of the impact of the mutation on the immune response and more often in terms of its impact on clinical phenotypes, particularly susceptibility to specific infections. For example, the role of OX40 in T cells had been characterized in mice but its role in human immunity against HHV-8 was only established by the demonstration of OX40 deficiency in a child with Kaposi sarcoma (Byun et al., 2013). Monogenic predisposition to other specific infectious diseases is increasingly documented (Alcaïs et al., 2010; Casanova and Abel, 2013). 
Hypomorphic (DNA-PK, CD3ε, Igβ, and Ikaros; Soudais et al., 1993; Dobbs et al., 2007; van der Burg et al., 2009; Goldman et al., 2012), gain-of-function (IκBα; Courtois et al., 2003), and/or heterozygous (TRAF3, Ikaros, Fas-ligand, Rac2, and IκBα; Wu et al., 1996; Ambruso et al., 2000; Courtois et al., 2003; Pérez de Diego et al., 2010; Goldman et al., 2012) human mutations also revealed phenotypes not seen in mice bearing two null alleles. Overall, an important added value of the genetic dissection of human PIDs, in single patients or multiple patients, besides its direct medical impact, is that it enables an analysis of immunology in natural as opposed to experimental conditions (Casanova and Abel, 2004, 2007; Quintana-Murci et al., 2007; Casanova et al., 2013).
Fully penetrant autosomal and X-linked recessive traits
What general lessons can we draw from the study of single patients with PIDs? First of all, what are the Mendelian modes of inheritance (in the sense of full penetrance) that are most appropriate for single-patient studies? Autosomal recessive (AR) inheritance is appropriate, especially in consanguineous families (Lander and Botstein, 1987), as one can benefit from linkage information and focus on homozygous mutations. The larger the number of healthy siblings, the easier it is to select candidate variations. The sequencing data can be filtered quickly to identify these mutations. Information increases with the level of the consanguinity loop; i.e., the more distant the parental relationship, the more accurate the linkage mapping. However, the large fraction of homozygosity in the patient (e.g., expected value of 6.25% if born to first cousins) increases the background noise of homozygosity for other, non–disease-causing, genomic variants. Other caveats should be kept in mind. Consanguineous families are not protected from X-linked diseases, compound heterozygous alterations, or de novo mutations. Moreover, if the family is highly consanguineous, the patient may be homozygous for uncommon, albeit non-rare, genetic variants that modify the phenotype. Conversely, non-consanguineous families reduce the background but leave the mode of inheritance uncertain. Homozygosity in a patient born to parents not known to be related is suggestive of AR inheritance due to cryptic consanguinity. Compound heterozygosity is even more suggestive of AR inheritance, as it is rare and deserves special attention. In that regard, the trio design, NGS analysis of the patient and both parents, supports the search for compound heterozygous mutations. Finally, X-linked recessive (XR) inheritance is possible with a focus on hemizygous mutations in males.
In the latter two cases, the existence of a de novo mutation provides further evidence given the relatively small number of coding de novo mutations per genome (Sanders et al., 2012; Veltman and Brunner, 2012; Ku et al., 2013). Obviously, incomplete penetrance and the impact of modifiers clearly cannot be studied in single patients (Cooper et al., 2013).
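The 6.25% expected homozygosity quoted above for the child of first cousins can be reproduced with Wright's path-counting formula for the inbreeding coefficient. The following is a minimal sketch, not part of the original analysis: the function name and the input convention (one pair of generation counts per shared ancestor) are illustrative assumptions, and the shared ancestors are assumed not to be inbred themselves.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Expected fraction of the genome that is autozygous in the child.

    `paths` holds one (n1, n2) pair per common ancestor of the parents,
    where n1 and n2 are the numbers of generations separating each parent
    from that ancestor. Common ancestors are assumed to be non-inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# First cousins share two grandparents; each parent is 2 generations away.
first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])

# Second cousins share two great-grandparents; 3 generations away each.
second_cousins = inbreeding_coefficient([(3, 3), (3, 3)])

print(f"first cousins:  {float(first_cousins):.4%}")
print(f"second cousins: {float(second_cousins):.4%}")
```

More distant parental relationships give a smaller expected homozygous fraction, which is one way to see why the background of homozygous non-disease-causing variants shrinks as the consanguinity loop lengthens.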
De novo mutations underlying dominant or XR traits
The trio design enables one to detect de novo mutations, which are more likely to be disease-causing and certainly easier to incriminate in single patients than inherited variations. A de novo mutation provides strong but not conclusive evidence that the variant is related to the clinical phenotype. Highly deleterious, distinctive phenotypes that occur in outbred populations may be due to heterozygous or hemizygous de novo mutations (Boisson et al., 2013). In such instances, the clinical penetrance is typically complete. It is estimated that 50–100 new sequence variants can be found in the genome of every individual. However, most of these genome variants have no functional consequences because they do not change the amino acid sequence of coding regions or they occur in noncoding or nonregulatory regions. On average, only 1 or 2 de novo mutations can be found in each exome (Sanders et al., 2012; Veltman and Brunner, 2012; Ku et al., 2013). These new mutations may cause a phenotype that is so severe that it is rarely passed on to the next generation. Variants that are found in the patient but not the parents are excellent candidate disease-causing mutations, especially if they are in a plausible gene (see section below). The de novo mutation alone may underlie an AD or XR disorder (or as discussed above an AR disorder if coupled with an inherited mutation on the other allele). It is important to test multiple cell types, hematopoietic and nonhematopoietic, as the apparently de novo mutation detected may not be germline but somatic, in which case it may remain disease-causing, albeit not underlying a monogenic trait, as shown for NRAS (Table 1). The search for de novo mutations alone justifies the strategy of sequencing trios, which is quite powerful and fits well with single patient investigation (Veltman and Brunner, 2012).
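The trio comparison described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production pipeline: genotypes are represented as counts of the alternate allele (0, 1, or 2) keyed by a hypothetical variant identifier, and a variant is kept only when both parents were explicitly genotyped as carrying no alternate allele. Real analyses would also need to account for read depth, genotype quality, and the possibility that an apparent de novo variant is somatic rather than germline.

```python
def de_novo_candidates(child, mother, father):
    """Return variants carried by the child and absent from both parents.

    Each argument maps a variant identifier to the number of alternate
    alleles called (0, 1, or 2). A variant missing from a parent's calls
    is treated as untested, not as absent, and is therefore excluded.
    """
    return sorted(
        variant
        for variant, n_alt in child.items()
        if n_alt > 0 and mother.get(variant) == 0 and father.get(variant) == 0
    )


# Hypothetical calls, for illustration only.
child = {"chr1:1000:A>G": 1, "chr2:2000:C>T": 1, "chr3:3000:G>A": 2, "chr4:4000:T>C": 1}
mother = {"chr1:1000:A>G": 0, "chr2:2000:C>T": 1, "chr3:3000:G>A": 1, "chr4:4000:T>C": 0}
father = {"chr1:1000:A>G": 0, "chr2:2000:C>T": 0, "chr3:3000:G>A": 1}

candidates = de_novo_candidates(child, mother, father)
print(candidates)
```

In this toy example the fourth variant is not reported, because the father has no call at that position; treating missing data as "absent" would inflate the list of apparent de novo variants.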
Focus on rare phenotypes and rare genotypes
The genotype and allele frequencies are important factors to consider when selecting candidate variants (Kircher et al., 2014). It is not too difficult to select candidate disease-causing genetic lesions in a single patient because the most common reason for a study to be restricted to a single patient is precisely that the disease is extremely rare. A rare phenotype is likely to be due to heterozygosity or homozygosity for a very rare or private allele. In current practice, one can search for the variant in multiple online databases. Advances in NGS have made it possible to collect information on the frequency of genetic variations in a much larger number of individuals of various ethnicities. Public databases of variants (dbSNP, HapMap, 1000 Genomes, and NHLBI “Grand Opportunity” Exome Sequencing Project [GO-ESP, https://esp.gs.washington.edu/]; Altshuler et al., 2010; Abecasis et al., 2012; NCBI Resource Coordinators, 2014) and disease-causing variants (HGMD; Stenson et al., 2014) typically include data on the genetic variability of between 10,000 and 100,000 individuals. This is valuable but occasionally not sufficient. The patient may be from an ethnic group that is under-represented in available databases. Further, some DNA sequences were not well enough covered using older approaches to catalog variants. An in-house database that includes data from over 500 ethnically matched DNA samples analyzed using the same technology is valuable. Moreover, private databases of neutral datasets are being developed at institutions that are taking advantage of NGS to search for disease-causing gene variants. With very rare phenotypes, it is helpful to select private variations in single patients. However, except in the specific context of de novo mutations, there are still a substantial number of private (or very rare) variants in any exome data. One should consider the genotype frequency, which is more relevant than the allele frequency. 
A very rare AR condition may be caused by a mutant allele that is rare, but not private to the family studied, and found in control heterozygotes. The next step is to prioritize these variants by further in silico studies to select the most plausible one.
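The distinction between allele frequency and genotype frequency can be made concrete with a short calculation. The sketch below assumes Hardy-Weinberg proportions, complete penetrance, and a single allele accounting for the whole phenotype; these are simplifying assumptions introduced for illustration, and the function names and example frequencies are hypothetical.

```python
def genotype_frequency(allele_freq, mode):
    """Expected frequency of the disease genotype under Hardy-Weinberg.

    mode "AR": homozygotes for the allele, q^2.
    mode "AD": carriers of at least one copy, 2q(1 - q) + q^2.
    """
    q = allele_freq
    if mode == "AR":
        return q * q
    if mode == "AD":
        return 2 * q * (1 - q) + q * q
    raise ValueError(f"unsupported mode of inheritance: {mode}")


def compatible(allele_freq, phenotype_prevalence, mode):
    """True if the genotype is no more frequent than the phenotype."""
    return genotype_frequency(allele_freq, mode) <= phenotype_prevalence


prevalence = 1e-6  # hypothetical phenotype seen in 1 per million individuals

print(compatible(0.0005, prevalence, "AR"))  # rare allele, recessive model
print(compatible(0.0005, prevalence, "AD"))  # same allele, dominant model
print(compatible(0.01, prevalence, "AR"))    # allele too common even for AR
```

Under these assumptions, an allele seen in roughly 1 in 1,000 control chromosomes can still be compatible with a very rare AR condition, whereas the same allele would be far too common to explain a fully penetrant AD condition of the same prevalence.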
Searching for plausible genes
Information about the mutated genes is also crucial in selecting variations and filtering out others. There are at least three ways by which a mutated gene can be selected as a candidate disease-causing gene in a single patient. First, the gene may encode a protein that belongs to a pathway already implicated in patients with the same phenotype. The relationship between these genes may be distant and indirect, and tools such as the human gene connectome can be helpful in revealing their connectivity (Itan et al., 2013). In other words, there can be physiological homogeneity behind genetic heterogeneity. Second, there is increasingly detailed information in various databases regarding the expression of human genes in a wide array of human cell lines, cell types, tissues, and organs (GTEx Consortium, 2013; Rung and Brazma, 2013). Transcripts for the gene of interest are likely to be expressed in the tissue affected by the phenotype or tissues that are known or could reasonably be expected to influence the phenotype. However, mutations in broadly expressed genes occasionally result in a phenotype that is highly tissue specific (Boisson et al., 2013). Third, genes crippled with deleterious mutations in the general population are unlikely to be causative of any rare phenotype with complete penetrance, as assessed for example with the gene damage index (unpublished data). Knowing the degree of purifying selection operating on the genes carrying variations in the patient under study is also helpful (Barreiro and Quintana-Murci, 2010; Quintana-Murci and Clark, 2013). The extent of purifying selection is now known for most human genes (Barreiro and Quintana-Murci, 2010; Quintana-Murci and Clark, 2013). Mutations in genes under tight purifying selection represent more likely culprits for rare diseases, especially for diseases that are life-threatening in childhood, and more so for AD than AR modes of inheritance (X-linked recessive being probably in between). 
To further incriminate haploinsufficiency for AD and XD traits that are life-threatening before reproductive age, the gene must be under purifying selection.
Searching for plausible mutations
The predicted impact of the mutation itself is also important, as some lesions are predicted to be more damaging than others (Kircher et al., 2014). As of now, protein-coding gene exonic mutations are more easily implicated than regulatory mutations or mutations in RNA-coding genes or in intergenic regions. Upcoming progress may facilitate the study of such mutations in single patients in the future. In protein-coding genes, UTR and synonymous variations are difficult to incriminate in single patients, even though they can interfere with splicing and other regulatory processes. In contrast, nonsense mutations, mutations affecting canonical splice site residues, in-frame and out-of-frame insertions and deletions, and mutations of the stop codon (stop loss) are most likely to be deleterious, although some can be hypomorphic, for example due to reinitiation of translation. The impact of missense mutations, which form more than 90% of rare and common nonsynonymous variations (and a smaller proportion of pathogenic mutations; Tennessen et al., 2012), is less easily predictable and has therefore received much attention. Some missense mutations are intrinsically more disruptive than others. Moreover, a missense mutation at a residue or in a domain that is highly conserved throughout evolution is probably damaging. Several software programs, such as Sorting Intolerant From Tolerant (SIFT; Kumar et al., 2009) and PolyPhen2 (Adzhubei et al., 2010), have been developed to predict the pathogenicity of missense mutations based on a combination of biochemical and evolutionary data (Li et al., 2013). Although mutations predicted to be loss-of-expression and/or loss-of-function are more readily convincing, amino acid substitutions may be more interesting because they can provide insights into function of a particular domain or the effects of decreased but not absent protein function, or even into the impact of a gain of function.
Overall, the predicted impact of any given mutation influences its selection as a candidate lesion to be functionally investigated in single patients (Kircher et al., 2014).
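The qualitative ordering described in this section can be expressed as a simple sort key for a patient's candidate variants. The sketch below is only an illustration of that ordering: the tier numbers, the consequence labels, and the `conserved_residue` flag are hypothetical conventions chosen here, not a scoring scheme taken from the text or from any prediction software.

```python
# Lower tier = examined first. Labels are illustrative, not a standard vocabulary.
CONSEQUENCE_TIER = {
    "nonsense": 0,
    "canonical_splice": 0,
    "frameshift_indel": 0,
    "inframe_indel": 0,
    "stop_loss": 0,
    "missense": 1,
    "synonymous": 2,
    "utr": 2,
}


def prioritize(variants):
    """Order variants by predicted consequence, then by residue conservation.

    Each variant is a dict with an "id", a "consequence" label, and an
    optional boolean "conserved_residue". Unknown consequences sort last.
    """
    def sort_key(variant):
        tier = CONSEQUENCE_TIER.get(variant["consequence"], 3)
        conservation_rank = 0 if variant.get("conserved_residue") else 1
        return (tier, conservation_rank)

    return sorted(variants, key=sort_key)


# Hypothetical candidate list, for illustration only.
variants = [
    {"id": "v1", "consequence": "synonymous"},
    {"id": "v2", "consequence": "missense", "conserved_residue": False},
    {"id": "v3", "consequence": "nonsense"},
    {"id": "v4", "consequence": "missense", "conserved_residue": True},
]

ranked_ids = [v["id"] for v in prioritize(variants)]
print(ranked_ids)
```

Such a ranking only decides which candidates to test first; as the surrounding text stresses, predictions of this kind are helpful but not conclusive, and each candidate still has to be validated experimentally.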
Biochemically deleterious alleles support a causal relationship
In the PID field, the ability to experimentally validate the mutation in leukocytes can document its causal role based on a single patient observation. The same principle can now be applied to other fields, with equally high expectations. A mutation that fulfills the in silico criteria listed above is likely to be disease-causing, but this must be confirmed by in-depth, functional, and mechanistic studies. The first step consists of testing the impact of the mutation on the expression of the gene product. This is easily done with protein-coding genes, even if there is no antibody to the protein of interest, using N- or C-terminal tags and a variety of easily transfectable recipient cells, which may not necessarily be relevant to the clinical phenotype. A biochemically deleterious allele, in terms of protein expression, is often the first experimental evidence that the variation is disease-causing. The subcellular trafficking, distribution, or location of the mutant proteins is often informative, although overexpression studies in irrelevant cell lines may be misleading. Expression studies can be performed regardless of the nature of the gene and even in the absence of the patient’s primary cells or cell lines. In contrast, functional studies by gene transfer can be more difficult to conduct, as they require at least some knowledge regarding the function of the mutated gene. This biochemical step, which does not necessitate any cells from the patient, is important for the validation of candidate variations. It is currently more difficult to study the expression of RNA genes by gene transfer. This topic has been poorly investigated because such pathological lesions have been rare to date (Batista and Chang, 2013), and it certainly deserves more effort in the future. Biochemical studies of the candidate mutant allele should ideally include a comparison with other mutants, rare or common, that are found in individuals without the phenotype under study and serve as negative controls.
Relevant cellular or animal phenotypes support a causal relationship
Relevant cells from the patient should demonstrate a functional abnormality that is caused by the mutant allele or alleles and that can explain the clinical phenotype. For loss-of-function mutations, introduction of a wild-type copy of the gene of interest should correct the cellular phenotype, unless the mutation is dominant-negative. Similarly, validation of the disease-causing role of gain-of-function or dominant-negative mutations may be obtained by introducing a mutated copy of the gene into wild-type cells. If the mutant allele is dominant by haploinsufficiency, the cellular phenotype can be rescued by overexpression of the wild-type allele, and a knockdown of the wild-type allele in control cells can also be informative. However, these approaches, albeit valid, suffer from the limitation that it is hard to maintain endogenous regulation and full control of the level of expression of the transfected/transduced gene. Recently, the CRISPR/Cas9 technology has opened new perspectives, as it allows the knock-in of the appropriate mutation in one or both alleles of the gene in control cells to mimic the abnormal phenotype, or the correction of the mutant alleles in the patient’s cells (Marraffini and Sontheimer, 2010; Wiedenheft et al., 2012; Cong et al., 2013; Charpentier and Marraffini, 2014). This approach adds much confidence to the proposed causative relationship. These mechanistic studies have traditionally been easier with a blood sample or EBV-B cell lines, which accounts for the prominent role of hematology and immunology in the development of human molecular genetics (Speicher et al., 2010). Dermal fibroblasts have been used in various fields. More recently, the iPSC technology (Takahashi et al., 2007; Takahashi and Yamanaka, 2013), which enables the mechanistic study of many cell types that can be affected by disease, has opened up new perspectives in various fields.
It has become possible to study patient-specific disease phenotypes in iPSC-derived neurons (Ming et al., 2011), hepatocytes (Schwartz et al., 2014), cardiomyocytes (Josowitz et al., 2011), or respiratory epithelial cells (Huang et al., 2014). This approach even enabled the study of nonhematopoietic PIDs, with the demonstration of impaired intrinsic immunity and enhanced HSV-1 growth in patient-specific, iPSC-derived TLR3-deficient neurons and oligodendrocytes (Lafaille et al., 2012). Along with biochemical studies, investigation of the specific functional consequences of a given mutation in the appropriate cell type (and correction thereof) represents the most important step in the experimental validation of candidate variations. In turn, the cellular phenotype can be causally related to the clinical phenotype in at least two ways. It can be shared by patients bearing mutations in a known disease-causing gene (e.g., mutations in another chain of the same receptor or molecular complex, or in another molecule along the same signaling pathway). Alternatively, it may be novel yet provide a plausible molecular and cellular mechanism of disease. If there is no relevant cellular phenotype, animal models can also validate the disease-causing effect of a genotype, if they recapitulate the human whole-organism phenotypes. The animal models can also serve to connect a cellular and a whole-organism phenotype. Overall, with the stringent in silico and in vitro criteria defined above, we argue that single-patient studies can be illuminating for the study of rare, Mendelian disorders in hematology, immunology, and beyond.
We have highlighted some criteria that facilitate the genetic study of single patients, particularly in families with fully penetrant AR traits and de novo mutations. In this paper, the concept was illustrated by a review of discoveries in the field of PIDs. The study of patients with unique conditions is important, both for purely clinical reasons and for increasing our understanding of physiology. Patients with a unique set of unusual findings need a genetic diagnosis. They cannot be ignored (Mnookin, 2014). Publication of these cases is also an efficient way to find and help other patients with the same phenotype and the same disease-causing gene. Discovering and reporting single-gene disorders in single patients is also important for biological reasons that go beyond the specific patient. It is not often recognized that murine knockout models are genetically more questionable than single patient studies. The murine phenotype is typically associated with and tested in a single genetic background under defined (and for this reason also unrepresentative and potentially misleading) experimental conditions (Andrews et al., 2013). The animals are therefore more likely to be homozygous for modifying genetic factors. Single patients are not 100% homozygous, making their phenotype more robust. On average, only 6.25% of the genome of patients born to first-cousin parents is expected to be homozygous by descent. Interpretation of data from knockout mice is further complicated by the fact that the mice are reared in an environment that is tightly controlled. The patients’ phenotype occurs in natura, as opposed to experimental conditions (Casanova and Abel, 2004). The phenotype of patients with bi-allelic null mutations therefore often differs in informative ways from that of mice with null mutations in the same gene.
Moreover, genetic variations that are found in patients are also more diverse and thus more illuminating than mutations created experimentally in the laboratory, although ENU mutagenesis also generates hypomorphic and hypermorphic variants (Andrews et al., 2013).
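The 6.25% figure quoted above for the offspring of first cousins follows from Wright's path-counting formula for the inbreeding coefficient, F = Σ (1/2)^(n1 + n2 + 1), summed over each ancestor shared by the two parents. The short sketch below reproduces the arithmetic. It assumes that the shared ancestors are themselves not inbred, and the function name and input format are illustrative choices rather than part of the original text.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Expected fraction of the genome homozygous by descent (Wright's F).

    `paths` holds one (n1, n2) pair per ancestor shared by the two parents:
    n1 = generations from one parent up to that ancestor,
    n2 = generations from the other parent up to the same ancestor.
    Assumes the shared ancestors are not themselves inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# First-cousin parents share two grandparents; each parent is two
# generations below each shared grandparent.
first_cousins = [(2, 2), (2, 2)]
f = inbreeding_coefficient(first_cousins)
print(f, f"{float(f):.2%}")  # 1/16 6.25%
```

Each shared grandparent contributes (1/2)^5 = 1/32, and the two together give 1/16. This is an expectation across many such patients; the homozygous fraction of any individual genome varies around it.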
As shown with PIDs, it may take weeks, months, years, or even decades to find a second patient with the same phenotype caused by a mutation in the same gene. Of course, both international collaborations and the publication of a first patient may hasten this process. What is more challenging, rare, and informative is paradoxically a patient with a different phenotype, caused by the same mutation at the disease locus. An example can be found in the realm of infectious diseases because the patient must be exposed to the organism to be at risk. Ascertainment bias, i.e., the risk of diagnosing a given genotype only in patients sharing the phenotype of the first case, is probably a more serious problem than the (obviously nonexclusive) uncertainty of causal relationship between a genotype and a clinical phenotype in single patients. In studying single patients, one should keep in mind that there are no truly Mendelian disorders because humans, like other species, are not single-gene organisms and because environment affects the phenotype. One should consider the study of single patients in a Darwinian perspective, with Mayr’s population thinking, as opposed to essentialism (Mayr, 1988; Mayr, 1991) and consistent with Garrod’s concept of chemical individuality of man (Garrod, 1931; Bearn, 1993). One should attempt to establish a causal relationship between a genotype and a phenotype in a unique individual, being aware that the same genotype may cause another phenotype in another patient and that the same phenotype in another patient may be caused by another genotype. The three causal relationships would all be correct. In the vast majority of cases, however, replication in other patients has been observed, and single-patient discoveries, especially in the fields of hematology and immunology, in PIDs in particular, have stood the test of time (Table 1).
Genetic heterogeneity in human populations is such that some newly discovered single-gene inborn errors may be reported in only a single patient for years or even decades. The dissection of sporadic but genetically homogeneous traits is difficult, but that of sporadic heterogeneous traits is even more difficult. Needless to say, the search for lesions found in multiple patients and multiple kindreds does greatly facilitate the genetic analysis of human conditions. However, the extreme diversity of human genetic variation is not only reflected in genetic heterogeneity of well-defined and homogeneous phenotypes. In fact, each patient is unique because of the heterogeneity of genetic variants and unique environmental history. Again, the concepts of population thinking and chemical individuality are essential when reflecting on this question (Garrod, 1931; Mayr, 1988, 1991; Bearn, 1993). Because of variability in clinical presentations, there are only patients; there are no diseases. Or, there are as many diseases as patients. Illnesses are designated by specific terms by default. NGS is pushing medicine further in that direction: we increasingly discover unique conditions and, decreasingly, universal diseases. Single patients and their families deserve this attention to unmet needs (Mnookin, 2014), and the construction of the biomedical edifice at large also deserves this effort. A single patient is a fragile bridge between two worlds, between basic scientists and practitioners. It would be unfortunate if the guidelines proposed by MacArthur et al. (2014) were interpreted as insisting on multiple cases with similar defects, thereby inhibiting reports of novel monogenic inborn errors documented with only a single case. If none of the single cases are published, the only way a new entity would be reported is if at least two cases came to the attention of investigators at the same time.
Consistent with a long tradition in hematology and immunology, in-depth mechanistic studies of appropriate cellular phenotypes in a single patient in relevant cell types can provide a causal bridge between a genotype and a clinical phenotype.
We thank Dusan Bogunovic, Alexandre Bolze, Jacinta Bustamante, David Cooper, Bruce Gelb, Joe Gleeson, Janet Markle, Robert Nussbaum, and Agata Smogorzewska for helpful suggestions, Bertrand Boisson and Capucine Picard for drawing our attention to the UNC119 and NRAS mutations, respectively, and Stéphanie Boisson-Dupuis for drawing the figure.