Systems cell biology melds high-throughput experimentation with quantitative analysis and modeling to understand many critical processes that contribute to cellular organization and dynamics. Recently, there have been several advances in technology and in the application of modeling approaches that enable the exploration of the dynamic properties of cells. Merging technology and computation offers an opportunity to objectively address unsolved cellular mechanisms, and has revealed emergent properties and helped to gain a more comprehensive and fundamental understanding of cell biology.
Systems cell biology: What it is and what it is not
Systems cell biology is the study of the emergent properties of a cell and its component parts using comprehensive and quantitative experimental methods that are interpreted by predictive mathematical and statistical models. Emergent properties result from “the whole being greater than the sum of its parts.” The progression to studying the cell as a system is a natural one for cell biologists who have always sought to meld the biochemical processes of molecules and modules with the spatial and structural features of cells (Alberts, 1998; Hartwell et al., 1999). Thus, understanding cell biology is inherently a multiscale problem, with many levels and hierarchies of cellular organization, compartmentalization, and temporal regulation (Fig. 1). Emergent properties within a cell derive from the interplay of system components arranged in complex motifs such as logic gates, feedback and feed-forward loops, and combinations thereof (Alon, 2007; Tyson and Novák, 2010). This complex interplay leads to behaviors that include switch-like functionality, filtering, signal amplification, oscillations, and multistability. This gives rise to systems-level properties of cells including robustness, hysteresis, modularity, and population heterogeneity. The goal of systems cell biology is therefore to achieve more than a description of the individual components and component properties. It is to achieve an understanding of how information is transmitted and interpreted by the cell. Systems cell biology is also more than simply the acquisition of large amounts of data, or the assembly and visualization of that data into networks, heat maps, and diagrams. It is also not an unbiased replacement for intuition, as many cellular processes can be intuitively explored from a systems perspective. 
A prime example is the eukaryotic cell cycle, where a rich history of applying nonlinear dynamical systems models that rely, in part, on an intuition about the interactions of key cell cycle regulators has dramatically advanced our understanding of this process (see Ferrell et al., 2011 and the references within).
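To make the link between a feedback motif and switch-like, hysteretic behavior concrete, the following is a minimal sketch rather than any published cell cycle model. It assumes a single species x that activates its own production through a Hill function, is driven by a basal stimulus s, and is lost by first-order decay. The function names and every parameter value (b, K, n, d, and the stimulus range) are hypothetical and chosen only so that two stable states coexist.

```python
# Toy positive-feedback switch; all names and parameter values are hypothetical.

def rate(x, s, b=1.0, K=0.5, n=4, d=1.0):
    """dx/dt = basal stimulus + Hill-type self-activation - first-order loss."""
    return s + b * x**n / (K**n + x**n) - d * x


def relax(x, s, dt=0.01, steps=5000):
    """Integrate with forward Euler for a fixed time (50 time units here)."""
    for _ in range(steps):
        x += dt * rate(x, s)
    return x


def sweep(stimuli, x0):
    """Track the state while the stimulus is changed step by step."""
    x, states = x0, []
    for s in stimuli:
        x = relax(x, s)
        states.append(x)
    return states


up = [0.05 * i for i in range(9)]    # stimulus raised from 0.0 to 0.4
down = up[::-1]                      # then lowered back to 0.0
x_up = sweep(up, x0=0.0)             # history: starts in the "off" state
x_down = sweep(down, x0=x_up[-1])    # history: starts in the "on" state

# Print both branches side by side for each stimulus value.
for s, rising, falling in zip(up, x_up, x_down[::-1]):
    print(f"s={s:.2f}  rising: x={rising:.2f}  falling: x={falling:.2f}")
```

At intermediate stimulus values the state depends on whether the stimulus was approached from below or from above, which is the history dependence referred to as hysteresis. With these particular parameters the system stays in the "on" state even after the stimulus returns to zero, so the toy switch is effectively irreversible.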
Systems biology: A glove for every hand?
Systems biology is broadly defined as a framework for conducting quantitative and comprehensive scientific enquiry. This framework facilitates a rigorous analysis of the complexity of biological systems at all levels of cellular organization that contribute to a behavior or phenotype of interest (Kitano, 2002). However, this common definition is rather vague, and this has encouraged skepticism with regard to the ability of systems biology research to achieve the lofty goal of understanding complex biology (Brenner, 2010).
Irrespective of the focus of a study, a systems biology approach often includes several common elements: exploratory data acquisition and visualization, data integration and the formulation of quantitative models, and the testing of these models, along with the hypotheses they generate, with further experimentation (Ideker et al., 2001; Aitchison and Galitski, 2003). These results can then be used to guide iterative cycles of the systems approach that serve to refine the model in question (Fig. 1). Another way to consider this is that systems biology enables the identification of the many ways information can flow and be processed within a biological system (Ideker et al., 2001; Nurse, 2008). To function in this capacity, systems biology requires systems-level data collection.
The omics of systems biology: Exploratory data acquisition and visualization
Systems biology is commonly associated with large-scale “-omics” technologies such as genomics, proteomics, and functional genetics that are used to explore the state of a system under investigation (Short, 2009). However, it is a misapprehension to think that systems biology is only the acquisition of such large-scale datasets. The inclusion of an omic discovery component to the analysis of biological complexity assists in identifying situations where a phenotype is caused by an emergent or unanticipated property of the system (Aitchison and Galitski, 2003). This does not imply that emergent properties are naturally revealed through omics approaches, but rather that through the acquisition of a comprehensive and quantitative dataset such properties can be revealed through mathematical modeling and computational analysis. To be of practical use in discovering these essential components, omics technologies must be quantitative and amenable to high throughput approaches, comprehensible visualization, and statistical approaches. Even when properly executed, experiments often fall short of this goal, and, ideally, computational interpretation of the results can assist in identifying missing variables and influences or measurements that would be more informative. Moreover, computational analyses and modeling strategies can assist in revealing the underlying mechanisms of the system (Fig. 1). From this vantage point, proximal causes and effects can be separated from distal ones and be analyzed further with a cycle of modeling and experimentation.
Next-generation sequencing (NGS): Understanding genetic determinants in cellular systems
Many other systems approaches, such as proteomics and functional genetics, depend on a reference genome as an essential starting point. NGS, also known as “deep sequencing,” is helping to redefine our understanding of chromatin structure and organization (Yen et al., 2013), as well as the regulation of transcription (Rhee and Pugh, 2011, 2012) and translation (Ingolia et al., 2009, 2011; Guttman et al., 2013). Although we will not give a thorough overview of the technology and the various NGS platforms (which can be found in Koboldt et al., 2013; Mardis, 2013), we highlight recent discoveries that illustrate the need to view the cell from a systems perspective.
A current trend from the rapid rise in genomic sequencing is the inclusion of phylogenetic and comparative genomic analyses in considering mechanistic models of cell biology (Liti et al., 2009; Finnigan et al., 2012; Mast et al., 2014). Addressing the challenge of assigning cell components to a particular function is thus partly alleviated by considering the idiosyncratic origins of the components of a system. Evolutionary analyses of cell biology on a systems scale have benefitted from increased taxon sampling and are enabling tests of the implicit assumptions of molecular cell biology as well as the study of cellular phenomena in model systems. These evolutionary comparisons on a systems level are helping to place the findings from one cellular system within the context of all cellular systems (Elias et al., 2012; Koonin and Mulkidjanian, 2013). Furthermore, it enables the exploration of the origins of cellular complexity and helps restrict the search space of causal mechanisms to those that are congruent with evolutionary theory (Koonin, 2011; Doolittle, 2012; Koumandou et al., 2013). See the JCB review series on evolution (http://jcb.rupress.org/cgi/collection/7).
NGS has also resulted in an increase in the number of individual genomes from a single species (Liti et al., 2009; 1000 Genomes Project Consortium et al., 2010; Koboldt et al., 2013). Phenotypic analysis of related strains of budding yeast showed remarkable differences in response to a variety of stimuli including acclimation to temperature and tolerance to drugs (Liti et al., 2009). From a medical perspective, the intraorganismal comparisons of genome-wide association studies (GWAS) are identifying allelic heterogeneity that has important implications for organismal development and disease diagnosis, progression, and prognosis (Welch et al., 2012). For example, polymorphisms in the FOXO3 locus, a member of the forkhead family of transcription factors with roles in diverse cellular processes (Litvak et al., 2012; Eijkelenboom and Burgering, 2013), are prognostic for the outcome of patients diagnosed with Crohn’s disease (Lee et al., 2013). Importantly, the polymorphisms in FOXO3 are not diagnostic for the disease, and susceptibility is therefore contingent on other factors. In addition to Crohn’s disease, FOXO3 may also affect the severity of prognosis for other autoimmune diseases including rheumatoid arthritis. These results highlight the importance of allelic diversity to cellular function, an underexplored topic that will only be understood from the context of a systems perspective.
The role of NGS in providing high-resolution data on the transcriptome of cells also has mechanistic relevance for cell biology. Deep sequencing of RNA (RNA-seq) provides absolute transcription levels of both annotated and unannotated regions of the genome. Its application has revealed a wealth of unanticipated complexity in transcript heterogeneity, including novel splice variants, alternative start and stop sites, variation in the lengths of 5′ and 3′ untranslated regions, and the dynamic expression of bicistronic transcripts (Pelechano et al., 2013; Gupta et al., 2014; Pelechano et al., 2014). Transcription of the genome is also much more pervasive than previously thought (Djebali et al., 2012). Use of NGS technology in combination with novel processing steps is just beginning to redefine our understanding of transcriptional regulation and complexity (Mudge et al., 2013). For example, a recent survey of the yeast transcriptome identified 1.88 million unique mRNA transcript reads (Pelechano et al., 2013). For an organism originally characterized as having 5,885 genes (Goffeau et al., 1996), this is a staggering amount of diversity at the mRNA level. To avoid the biases of isoform analysis that result from enrichment strategies to sequence only the 5′ or 3′ end of individual mature mRNA molecules, a novel intramolecular ligation step after mRNA isolation allowed joint sequencing of both ends of a single mRNA isoform (Pelechano et al., 2013). Isoform variation was shown to respond to changes in growth conditions, which is consistent with at least some of this diversity being functionally relevant rather than a result of stochastic transcription initiation or termination. These new layers of transcriptional complexity will be of use in refining our understanding of the regulation and plasticity of a cell’s transcriptional response to environmental perturbations.
The ability to precisely map protein–nucleic acid interactions in a quantitative way is perhaps the most demonstrative example of the advancement and refinement of NGS technology. Protein–DNA or protein–RNA purification strategies in combination with exonuclease treatment before deep sequencing of the protected fragments provides the ability to map such interactions at a genome-wide scale to within single nucleotide resolution (Ingolia et al., 2009; Rhee and Pugh, 2011). These high-resolution genome-wide studies enable the comprehensive study and direct visualization of chromatin remodeling dynamics (Yen et al., 2013), identification of transcription factor binding sites (Rhee and Pugh, 2011), assembly of RNA polymerase pre-initiation complexes (Rhee and Pugh, 2012), and the profiling of ribosome occupancy of mRNA (Ingolia et al., 2009; Ingolia et al., 2011; Guttman et al., 2013). When visualized globally, the noisy signals from individual genes are smoothed and universal mechanisms are revealed. Aligning DNA sequences bound to RNA polymerase II pre-initiation complexes and viewing them at a genome scale has provided a unifying view of many regulatory mechanisms governing transcription. One striking revelation was the presence of degenerate TATA-like elements at previously characterized “TATA-less” promoters in yeast (Rhee and Pugh, 2012). Assembling genome-wide maps for an ensemble of RNA polymerase II–associated general transcription factors has also revealed consequences for deviations from the TATA consensus sequence, including increased reliance on nucleosome positioning for proper assembly (Rhee and Pugh, 2012). The fate of both coding and noncoding RNA has been mapped by immunoprecipitations of mRNA-binding proteins (Tuck and Tollervey, 2013). 
Sorting the ribonucleoprotein complexes using clustering approaches allowed for the identification and classification of several mRNP subclasses with implications for the importance of 3′ processing events in biogenesis, localization, and turnover (Tuck and Tollervey, 2013).
From genomics to proteomics
Despite the advancements in NGS, gene expression and mRNA levels are not very good proxies for protein levels or function in cells. Regulatory mechanisms exist at each stage of a protein’s life cycle: synthesis, folding, targeting, integration into distinct compartments and complexes, activity, stability, and degradation (Vogel and Marcotte, 2012). Measuring the half-life of proteins on a global scale has revealed complexity in protein turnover in a cell type–dependent manner (Claydon and Beynon, 2012). The constituents of protein complexes measured typically have similar turnover rates, although there are exceptions (for examples see Price et al., 2010). In addition, translation and proteolysis not only regulate the synthesis and degradation of proteins, but also serve to buffer intracellular amino acids levels, and must therefore receive regulatory inputs from several sources (Vogel and Marcotte, 2012). Profiling the association of ribosomes with mRNA provides one measure of the rate of protein synthesis (Ingolia et al., 2011). This technique was recently complemented with a proteomic analysis of protein longevity using isotope pulse labeling combined with shotgun tandem mass spectrometry (MS) to measure both the translation of new protein and the longevity of old protein in rat liver and brain cells (Toyama et al., 2013). In addition to discovering 37 long-lived proteins, the combined approach of ribosome profiling and semiquantitative MS revealed that despite the longevity of these proteins, all were pervasively translated. In several cases, discrepancies in the longevity of members of histone and nuclear pore complexes suggest mechanisms regulating the turnover and assembly of these complexes (Toyama et al., 2013).
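As a worked illustration of how a turnover measurement can be reduced to a half-life, the sketch below assumes simple first-order loss of pre-existing ("old") protein after a label switch, so that the old fraction follows exp(-k t) and the half-life is ln 2 / k. This kinetic model, the function names, and the synthetic data points are assumptions made for illustration; they are not taken from the studies cited above.

```python
import math


def fit_decay_rate(times, old_fraction):
    """Least-squares slope of ln(fraction) versus time, forced through the origin.

    Assumes first-order decay: fraction(t) = exp(-k * t).
    """
    num = sum(t * math.log(f) for t, f in zip(times, old_fraction))
    den = sum(t * t for t in times)
    return -num / den


def half_life(k):
    """Half-life of a first-order process with rate constant k."""
    return math.log(2) / k


# Synthetic, noise-free data generated with a hypothetical k = 0.1 per day.
times = [1.0, 2.0, 4.0, 8.0]
old_fraction = [math.exp(-0.1 * t) for t in times]

k = fit_decay_rate(times, old_fraction)
t_half = half_life(k)
print(f"k = {k:.3f} per day, half-life = {t_half:.2f} days")
```

With real isotope-labeling data the fractions are noisy and a protein may not follow a single exponential, so a fit of this kind is only a starting point.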
Information on the pathways and function of a protein is often derived from knowledge of the proteins with which it interacts. Protein–protein interactions (PPIs) range from stable molecular machines of defined stoichiometries and functions to transient interactions whose mechanisms of dynamics are poorly defined. Areas of outstanding interest in proteomics research therefore concern the composition and stoichiometry of protein complexes, the interconnectivity and presence of shared components of different protein complexes, and identification of sites of posttranslational modifications. While detectable at the genomic and transcriptional level, the functional consequence of variation from alternative splicing, allelic variations, and point mutations often plays out in the altered activity or binding capacity of the encoded proteins. For example, one possible phenotype caused by the reduced expression or enhanced turnover of a protein may have more to do with the effect this has on that protein’s binding partners.
Quantitative, sensitive, and reproducible proteomics approaches
Alternative operating modes for certain types of mass spectrometers have recently made MS-based proteomics studies quantitative and reproducible, with attomole sensitivity (Doerr, 2013; Marx, 2013; Picotti et al., 2013a). Accumulated data on the fragmentation properties and chromatographic behavior of peptides have enabled the development of targeted and data-independent proteomics approaches (Farrah et al., 2012). In targeted proteomics, e.g., selected reaction monitoring (SRM), the mass spectrometer is tuned to selectively monitor predefined pairs of precursor and product ion masses for specific proteins of interest. This approach has been greatly enabled by the availability of genomic data, inexpensive de novo peptide synthesis techniques, and large-scale peptide reference maps (Ackermann et al., 2008; Farrah et al., 2012; Holman et al., 2012; Picotti et al., 2013b). Multiplexing the assay by retuning the filter allows one to keep a quantitative tally for several hundred proteins in a single experiment.
Targeted proteomics enables the interrogation of the dynamics of PPI networks. Importantly, the focus of proteomic studies can move beyond the technicalities of coverage depth or reproducibility toward interrogating the kinetic properties of protein complexes. For example, by adapting an affinity purification strategy to SRM-MS, Bisson et al. (2011) identified 90 reproducible interactors of GRB2, an important hub in growth factor signaling, and mapped the binding site of each protein to one of three characterized protein binding domains within GRB2. Thus, with a single experiment, detailed and quantitative data for 90 PPIs were collected. The dynamics of GRB2 signaling hub complexes and their association with different receptor tyrosine kinases (RTKs) to form signaling scaffolds were then measured against a battery of different growth factor receptor stimulants (Bisson et al., 2011). These experiments revealed stimulation-specific GRB2 complexes that displayed unique temporal kinetics of assembly and disassembly (Bisson et al., 2011). Similarly, the consequences of the temporal kinetics of RTK scaffold assembly were explored with the epidermal growth factor receptor (EGFR) signaling scaffold protein Shc1 (Zheng et al., 2013). The dynamics of Shc1 phosphorylation at six residues and its association with 41 binding partners were followed over multiple time points after activation of EGFR by EGF. Analysis of the results revealed a dynamic network of phosphorylation-dependent regulated recruitment and assembly of three distinct signaling complexes (Zheng et al., 2013). Therefore, SRM-MS offers the ability to explore the dynamic properties of protein networks that are essential for mechanistic understanding of biological function. 
In addition to studying signaling cascade kinetics, SRM-MS has also been successfully applied to the interrogation of 464 known and putative RNA polymerase II–associated general transcription factors and used to probe them for DNA binding capacity (Mirzaei et al., 2013).
Refinements of MS methodologies have also allowed for the development of quantitative data-independent approaches. For example, the systematic fragmentation of precursor ions independently of ion count was applied to a proteomic analysis of the principle of polydispersity (Jung et al., 2013). Polydispersity is a population phenomenon of proteins owing to their localization to one or more organelles that have nonuniform properties leading, for example, to a collection of different sedimentation coefficients (De Duve et al., 1960; de Duve, 1964). The cosedimentation profile of proteins from cytosolic and organellar fractions of yeast grown under different nutrient conditions enabled a comprehensive look at the dynamics of protein movement between the cytosol and organelles such as mitochondria and peroxisomes. This data-independent acquisition protocol improved the dynamic range of protein identification by over an order of magnitude from the classic shotgun MS/MS approach (Yi et al., 2002; Marelli et al., 2004; Jung et al., 2013). Remarkably, this approach revealed that as many as ∼1,200 proteins, a substantial portion of the yeast proteome, shift their relative distributions between the cytosol and an organellar fraction in response to changes in nutrient conditions (Jung et al., 2013).
The goal of an unbiased data-independent approach is to combine the benefits of increased sensitivity and quantitative capacity of targeted approaches with the discovery component found in data-dependent approaches (Gillet et al., 2012). As with targeted SRM-MS, a priori information from preassembled spectral libraries can be used by targeted data-mining algorithms to identify protein-specific peptide fragment ion traces in complex fragment ion spectra (Gillet et al., 2012). With specialized mass spectrometers, a complete record of the proteins contained in a sample can be recorded by implementing comprehensive and systematic acquisition protocols that produce time-resolved and mass-segmented complex spectral ion maps. One such promising approach is sequential window acquisition of all theoretical spectra (SWATH-MS), which refers to the way the mass spectrometer is operated to collect these comprehensive proteomics data (Collins et al., 2013). In a proof-of-principle experiment, SWATH-MS of affinity purified 14-3-3β, an abundant cytosolic scaffold protein, consistently identified 1,967 interacting proteins and quantified the dynamic changes of 567 members of the promiscuous 14-3-3β scaffold interactome after stimulation of the insulin–PI3K–AKT pathway (Collins et al., 2013). In a complementary study, the interactome data generated by SWATH-MS was used to track changes to PPI networks induced by chemical inhibitors or allelic variations linked to disease pathologies (Lambert et al., 2013). Retaining the discovery component of traditional MS, experiments conducted with data-independent approaches, such as SWATH-MS, ensure an accurate measurement of the effect of biological perturbations on the study of cellular mechanisms. Also, the comprehensive spectral data generated serve as a reliable digital record of a protein sample, and ensure data integrity. 
These spectral maps can assist in experiment optimization or in comparing protocols or results between laboratories, or can be used for reassessment of samples to look for features that might have been initially missed or deemed unimportant. Excitingly, targeted and data-independent MS combined with cross-linking agents is an emerging approach to improve the detection and measurement of transient PPIs and for discovering the dynamic rearrangements within protein complexes (Gingras et al., 2007; Politis et al., 2014).
Systematically deciphering the genotype-to-phenotype paradigm
Functional genomic studies pursue a mechanistic explanation for the cause and effect relationship between genotype and phenotype (Fig. 2). At a systems level, the cause and effect of genetic perturbations are typically considered from a network perspective. In organisms that are easy to manipulate genetically, e.g., Saccharomyces cerevisiae, functional genetics has been automated using robotics-assisted synthetic genetic array (SGA) methodology, with measurements of colony size serving as a proxy for cellular fitness (Tong et al., 2001; Schuldiner et al., 2005; Tong and Boone, 2006; Roguev et al., 2008). The first compilation of a global genetic map was composed of genetic interaction profiles that covered 75% of all genes in yeast (Costanzo et al., 2010). These initial studies have revealed that the genetic interaction profile of one allele, against a genomic collection of other alleles, comprises a unique phenotypic signature that can be used to deduce uncharacterized functions and to order sets of genes within novel functional pathways (Beltrao et al., 2010; Costanzo et al., 2010; Baryshnikova et al., 2013). Such global genetic interaction networks are assembled by systematically measuring the degree of epistasis that pairs of genetic alleles impart on each other. The strength of epistasis of one allele against another cannot be assumed to scale linearly across a systematic array of all alleles in a genome. However, the systematic assembly of epistatic interactions between an allele of one gene and alleles of all other genes has successfully revealed the modularity of protein complexes as well as the cooperativity and redundancy that exist between known biological pathways and processes (Baryshnikova et al., 2013). 
For example, comparing the genetic interaction network profiles with networks identified by chemical–genetic perturbations can help predict the cellular targets of chemical compounds (Hillenmeyer et al., 2008, 2010; Costanzo et al., 2010; Lee et al., 2014). These functional genetics studies also highlight the challenge of pleiotropy for determining gene function with a reductionist approach. The unbiased, systematic, and quantitative characterization of genetic interaction networks has inverted the reductionist paradigm in defining a process-centric model of gene function to a component-centric model (Weissman, 2010). For example, a compilation of 53 point mutation alleles of yeast RNA polymerase II was used to assemble and systematically interrogate the functional characteristics of each of its subdomains (Braberg et al., 2013). This detailed analysis allowed a high-resolution dissection of coordinated RNA polymerase II activities in transcriptional regulation, including the rate of transcription, splicing events, and start site selection (Braberg et al., 2013). Phenotypic screening is also not limited to cellular growth or fitness. For example, SGA technology has been coupled to an automated microscopy platform to allow systematic interrogation of spindle pole body assembly and microtubule dynamics in yeast (Vizeacoumar et al., 2010; Breker et al., 2013; Fig. 2). The combination of high-content screening and SGA technology has also been used to study peroxisome dynamics (Saleem et al., 2010; Cohen et al., 2014).
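The idea of quantifying epistasis and comparing interaction profiles can be sketched as follows. The text above does not specify a scoring scheme; this example assumes the commonly used multiplicative model, in which the interaction score is the observed double-mutant fitness minus the product of the two single-mutant fitnesses, and it compares profiles by Pearson correlation. The function names and all fitness values and profiles are hypothetical.

```python
import math


def interaction_score(w_a, w_b, w_ab):
    """Epistasis under an assumed multiplicative model: observed - expected."""
    return w_ab - w_a * w_b


def pearson(u, v):
    """Pearson correlation between two interaction profiles of equal length."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Hypothetical relative fitness values (wild type = 1.0).
eps = interaction_score(w_a=0.8, w_b=0.9, w_ab=0.3)
print(f"interaction score = {eps:.2f}")  # negative: double mutant is sicker than expected

# Hypothetical interaction profiles of three query alleles against four array alleles.
profile_1 = [-0.40, 0.00, 0.20, -0.10]
profile_2 = [-0.35, 0.05, 0.15, -0.10]
profile_3 = [0.20, -0.30, 0.00, 0.10]

r_12 = pearson(profile_1, profile_2)
r_13 = pearson(profile_1, profile_3)
print(f"r(1,2) = {r_12:.2f}, r(1,3) = {r_13:.2f}")
```

In this toy case alleles 1 and 2 have highly correlated profiles, which is the kind of signature used to propose that two genes act in the same pathway or complex.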
One exciting application of functional genetics is identifying novel drug candidates for cancer (Kuiken and Beijersbergen, 2010). Here, the idea is to search for pathways and genetic interactions that are relevant in the context of a particular cancer or infection and target these pathways and genes for therapeutic intervention. For example, synthetic lethal interactors of oncogenic MYC have been identified through systematic siRNA screens of “druggable” genes, a collection of the human genome whose protein products are known or considered likely to bind with high affinity to known small molecules (Cheng et al., 2007; Toyoshima et al., 2012). This strategy ensures that sensitivity to the drug only occurs in the presence of oncogenic MYC and therefore is applicable in cases where targeting the oncogene itself is not practical or feasible. It also greatly expands the number of druggable targets for a given disease. Similar strategies for certain infectious diseases, where a virus or bacterial pathogen usurps the role of host cellular machinery, seem possible and are another potential application of functional genomics and systems cell biology.
Systems analysis using public databases: Modeling guides experimentation
Public databases of gene expression data, proteomics data, functional genomic screens, and automated microscopy data provide the inputs needed to initiate large-scale, systems-level analyses. In many cases, hypotheses formed from the evaluation of a systems dataset are easily addressed with targeted and more traditional approaches to cell biology. They may also serve as a guide for choosing the type of systems approach in which to invest for further study.
This approach was recently validated for a globally predictive environmental and gene regulatory influence network (EGRIN) model of peroxisome biogenesis in yeast (Danziger et al., 2014). The predictive capacity of the model was subsequently verified in a gene-by-gene focused study of the top candidates to more accurately assess activator or repressor function. This layered and iterative approach added an additional regulatory circuit composed of genes previously not associated with regulating peroxisome biogenesis and integrated them into a model containing a well-studied regulatory circuit. The virtuous cycle of model refinement and the explanatory power of the mechanistically predictive model aptly demonstrate the promise of systems biology to improving our understanding of cellular mechanisms.
An important outcome and aim of systems cell biology will need to be the continued creation and curation of high-quality repositories for systems-level data that ensure accessibility and ease of use for the entire biological community (Hakenberg et al., 2004; Stark et al., 2006; Kowald and Schmeier, 2011; Chatr-aryamontri et al., 2013).
Modeling cellular systems
The spectrum of modeling approaches spans from conceptual to mechanistic and from focused to broad (Aldridge et al., 2006). It is beyond the scope of this review to cover the plethora of modeling approaches that exist for cell biology (see Meier-Schellersheim et al., 2009; Chen et al., 2010; Ferrell et al., 2011; Ratushny et al., 2011a; Mogilner et al., 2012; Lander, 2013). However, within a systems biology paradigm, modeling forms a central part of a cycle that includes the interpretation and integration of existing and new data, the formation of new hypotheses, and the exploration of relevant parameters that aid in designing new experiments to test the model (Fig. 1). Modeling brings objectivity and minimizes pareidolia, the misperception of a vague or obscure stimulus as clear and distinct, in the complex patterns found in systems biology data (Fig. 3).
The goal of modeling is not merely to imitate biological behavior but to simulate perturbations to the system in order to provide quantitative and reliable predictions of function. However, the relationship between any particular model and a set of observations is rarely unique; the number of possible models for a given system is too large without a theory to focus the search space (Brenner, 2010). Therefore, the pairing of a modeling approach with a biological system is important, as each modeling method has individual requirements, limitations, and predictive power. The utility of any given model lies in its ability to focus experiments on those predicted to be most informative for the biological area of interest. This is critical given the vast space of potential solutions imparted by evolution. Models sharpen questions (Matessi and Karlin, 1984).
Combining modeling with experimentation often leads to new insights synergistically. For example, global monitoring of the GINS complex combined with a very simple model of its movements revealed a surprisingly uniform progression of replication across the genome (Sekedat et al., 2010). The GINS complex is essential for establishing the DNA replication fork that is central to chromosome replication (Labib and Gambus, 2007). Time-resolved chromatin immunoprecipitation (ChIP)-chip experiments were compared with simulations that recapitulated the observed dynamics using an iterative model that relies on reliable assumptions of the distribution of start times, replication velocity, efficiency of initiation, and pausing. The combination of systems data acquisition and accurate models that simulated the data was then used to study firing efficiencies at several replication origins and to study the effect of highly transcribed transfer RNA (tRNA) genes on replication fork arrest (Sekedat et al., 2010).
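To illustrate the kind of quantities such a replication model works with, here is a deliberately simplified sketch and not the model of Sekedat et al. (2010). It assumes that every origin fires at a fixed time with full efficiency and that forks move at a constant speed without pausing, so that each position is replicated by whichever fork arrives first. The origin positions, firing times, and fork speed are hypothetical.

```python
def replication_time(position, origins, fork_speed):
    """Earliest arrival time of any fork at a position.

    origins: list of (site_kb, firing_time_min) pairs; fork_speed in kb/min.
    An origin that is passively replicated before its own firing time never
    contributes, because another fork already gives a smaller arrival time.
    """
    return min(fire + abs(position - site) / fork_speed for site, fire in origins)


# Hypothetical chromosome with two origins: (position in kb, firing time in min).
origins = [(100.0, 10.0), (300.0, 20.0)]
fork_speed = 2.0  # kb per minute, assumed constant

# Replication timing profile sampled every 50 kb along a 400-kb region.
profile = {x: replication_time(x, origins, fork_speed) for x in range(0, 401, 50)}
for x, t in profile.items():
    print(f"{x:3d} kb replicated at {t:5.1f} min")
```

A model closer to the one described above would draw firing times from a distribution, let each origin fire with a probability reflecting its efficiency, and allow forks to pause, then average the resulting profiles over many simulated cells for comparison with time-resolved ChIP-chip data.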
Qualitative models that use pictures and diagrams with connecting arrows to propose mechanisms are likely the most common and most familiar type of models used by cell biologists. Challenges arise when such models are too abstract, when they depict mechanisms that operate outside of the scale of study, or when an attempt is made to incorporate many different types of experimental observations made under different time frames, conditions, or scales. Formalizing these qualitative models into more mechanistic and multiscale models is an essential step in systems cell biology. For example, we have studied the mechanisms of peroxisome regulation and biogenesis by integrating various global systems datasets to build both kinetic models and genome-wide statistical models (Smith et al., 2007, 2011a,b; Ratushny et al., 2008, 2012; Danziger et al., 2014). These and other studies have revealed the coordination of peroxisome dynamics with other cellular processes. Focusing on peroxisome biogenesis, transcription was shown to control peroxisomal metabolism and peroxisome import and fission machineries, but not components of de novo peroxisome biogenesis. This suggests the utility of transcriptional regulatory data in informing models of regulated peroxisome biogenesis (for review see Smith and Aitchison, 2013).
Models also help to explore the features and topologies of large networks that are useful for studying emergent systems properties. Research into the universality of network structure has revealed several shared characteristics, including the small-world phenomenon: in molecular networks, as in social networks, any two nodes are typically separated by only a handful of connections (Milgram, 1967; Watts and Strogatz, 1998; Barzel and Barabási, 2013). However, at this level many networks fall prey to the “hair-ball” syndrome and can become unintelligible. Furthermore, the ontological assignments provided in these large-scale networks are often myopic because they are based on partially characterized phenomena and ignore the unknowns. One solution to this problem is to systematically infer ontological features from the data themselves (Dutkowski et al., 2013). By repeating this process as new data are integrated into repositories, we can refine the ontologies that reflect the system characteristics of individual cellular components.
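The small-world property mentioned above can be made concrete with a short sketch in the spirit of Watts and Strogatz (1998). It builds a ring lattice, rewires a fraction of its edges at random, and compares mean path length and clustering before and after. This is a simplified variant written for illustration; the function names, network size, and rewiring probability are assumptions, and no biological network data are used.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """Move one end of each original edge to a random node with probability p."""
    nodes = sorted(adj)
    new = {i: set(nb) for i, nb in adj.items()}
    edges = [(i, j) for i in nodes for j in sorted(adj[i]) if i < j]
    for i, j in edges:
        if rng.random() < p:
            # Avoid self-loops and duplicate edges.
            candidates = [t for t in nodes if t != i and t not in new[i]]
            if not candidates:
                continue
            t = rng.choice(candidates)
            new[i].discard(j)
            new[j].discard(i)
            new[i].add(t)
            new[t].add(i)
    return new

def mean_path_length(adj):
    """Average shortest-path length over all reachable ordered node pairs."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Mean fraction of a node's neighbour pairs that are themselves linked."""
    values = []
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

lattice = ring_lattice(200, 6)
small_world = rewire(lattice, 0.1, random.Random(0))
for name, graph in (("ring lattice", lattice), ("rewired", small_world)):
    print(f"{name}: mean path length {mean_path_length(graph):.2f}, "
          f"clustering {clustering(graph):.2f}")
```

A few random shortcuts are enough to shorten typical paths substantially while most local clustering is retained, which is the signature of a small-world network.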
Bringing the leverage of systems biology tools from the level of large networks to the level of the molecules and macromolecular complexes populating these networks is a frontier where both progress and challenges exist. Here, the central challenge is to equate the structural elements of a protein encoded in the genome with the functional capabilities that are phenotypically observed. This is confounded by the modularity evidenced in cells as well as the fact that many proteins participate in multiple different complexes. Efforts to map the structure–function relationships of the subcomponents of the nuclear pore complex (NPC) help to illustrate this point (Rout et al., 2000; Hetzer and Wente, 2009; Aitchison and Rout, 2012; Fig. 4). Structural, biochemical, and genetic evidence has revealed a modular NPC with eightfold symmetry (Alber et al., 2007; Hoelz et al., 2011). Forming the outer rings of the NPC is the Nup84 complex, a heptameric modular structure composed of Nup133, Nup120, Nup145C, Nup85, Nup84, Seh1, and Sec13. Seh1 and Sec13 are also components of the Seh1-associated complex and the COPII vesicle-coating complex (Barlowe et al., 1994; Stagg et al., 2006), which complicates the assignment of specific functions to these proteins within the NPC. To determine the subunit arrangement and morphology of the Nup84 complex, an extensive domain-mapping proteomics approach was used to identify contact points within the subcomplex, as well as between the Nup84 complex and the rest of the NPC (Fernandez-Martinez et al., 2012). In addition, negative stain electron microscopy was used to obtain structural information on the different truncated forms of the complex. These data were then translated into spatial restraints and integrated with existing structural data for individual components to build a density map for the Nup84 complex and its arrangement within the NPC (Fig. 4).
This process of data integration combined with modeling, iteration, and refinement, although applied here to only one portion of the much larger NPC, is a concrete example of how to systematically explore cellular function.
Notwithstanding our expanded potential to map and quantify molecular components, processes, and functions of biological systems with advanced technologies, our understanding of many parts of these systems is far from complete. There are several major challenges that remain to be addressed in order to effectively model, systematically explore, and predictably control biological systems.
First, high-throughput experimental measurements often uncover intricate relationships between hundreds or thousands of molecular components. This dramatically increases the number of parameters that need to be included in corresponding models, which in turn necessitates a deluge of new experiments and system interrogations to validate these parameters. It is important to develop modeling and analytical approaches for rational formulation and parameterization of mathematical models and for optimal experimental design. This is especially critical for the analysis of combinatorial regulation, where algorithms for rational reduction of model complexity can prevent an explosion in the number of parameters (Bongard and Lipson, 2007; Likhoshvai and Ratushny, 2007). It is also important to develop methods for linking genome-scale models (Bonneau et al., 2007; Danziger et al., 2014) with meso- (Martin et al., 1990; Hoffmann et al., 2002; Ratushny et al., 2008; Ashall et al., 2009; Pang et al., 2013) and small-scale (Brandman et al., 2005; Tsai et al., 2008; Ratushny et al., 2012; Wurtmann et al., 2014) models of relevant subsystems for effective and systematic exploration of the underlying detailed mechanisms. Such methods and models (Karr et al., 2012) are essential for investigating the transient responses of biological systems, when their complexity is maximally exposed by nonlinear cooperative and synergistic effects, along with feedback and feed-forward regulatory mechanisms (Alon, 2007; Bennett et al., 2008; Ratushny et al., 2008, 2011b; Ashall et al., 2009; Litvak et al., 2009).
This issue is compounded by the fact that many systems techniques measure populations of cells and molecules rather than single cells and molecules. Averaging a signal over the population can obscure important phenomena such as phase variation and the processing of, and response to, stochasticity. Techniques that enable the measurement of the genome, transcriptome, metabolome, or proteome of single cells are being developed (Rubakhin et al., 2011; Giesen et al., 2014; Grün et al., 2014; Klemm et al., 2014). The ability to make repeated quantitative measurements of many different molecules in the same cell over time, and to scale this process to many cells, would pave the way for understanding biological variability.
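How population averaging can hide single-cell behavior is easy to show with synthetic numbers. The sketch below assumes a hypothetical population in which each cell is either in a low or a high expression state; the expression levels, noise, and fraction of cells in each state are invented for illustration and do not come from any dataset.

```python
import random
import statistics

def simulate_population(n_cells, frac_on, low, high, noise_sd, seed=0):
    """Expression level per cell for a two-state (off/on) population."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        # Each cell is independently "on" with probability frac_on.
        level = high if rng.random() < frac_on else low
        cells.append(max(0.0, rng.gauss(level, noise_sd)))
    return cells

cells = simulate_population(10_000, frac_on=0.5, low=10.0, high=100.0, noise_sd=5.0)
population_mean = statistics.fmean(cells)

# Fraction of individual cells whose level lies within 15 units of the mean.
near_mean = sum(abs(c - population_mean) < 15.0 for c in cells) / len(cells)

print(f"population mean: {population_mean:.1f}")
print(f"fraction of cells near the mean: {near_mean:.4f}")
```

The bulk measurement reports an intermediate value that almost no individual cell actually displays, which is why single-cell measurements are needed to detect such bimodality.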
Second, the heterogeneity of experimental data and the difficulty of obtaining relevant high-quality data for particular conditions challenge the cycle of building predictive models. Often, available experimental data are not sufficient to fully inform the molecular mechanisms of the vast majority of biological systems, and researchers struggle with “black” or “gray” box problems as they try to investigate partially known or poorly understood molecular subsystems. Thus, it is important to develop modeling approaches that allow rational selection of the most appropriate level of detail in the model and match its complexity with the complexity of available experimental data and prior knowledge about the biological system (Bongard and Lipson, 2007; Likhoshvai and Ratushny, 2007; Schmidt and Lipson, 2009; Ratushny et al., 2011a).
Third, molecular processes of many biological systems inherently occur on multiple scales in time and space. Temporally, they span from very fast processes (e.g., formation of molecular complexes and signaling) to relatively slow processes (e.g., organelle biogenesis and cell division). Spatially, biological systems are multicompartmental structures, and frequently many biomolecular processes within any given compartment are inhomogeneous (Mogilner et al., 2012). Integrating and exploring biological processes at multiple levels simultaneously or, in other words, constructing multiscale models, is a fundamental challenge. Furthermore, the nature of biological processes is very diverse: some are best viewed and modeled as discrete and stochastic, whereas others are better treated as relatively continuous and deterministic. Therefore, it is crucial to develop flexible hybrid modeling approaches that simultaneously and effectively span the various processes in cells, from the molecular to the morphological. These approaches should integrate temporal and spatial multiscale properties of biological systems and allow a reversible cross-scale flow of information within a single model.
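The contrast between the continuous, deterministic view and the discrete, stochastic view can be illustrated with a single toy process. The sketch below treats constitutive production and first-order degradation of one molecular species in both ways: as an ordinary differential equation and as a stochastic simulation of individual reaction events (the Gillespie direct method). It is not a hybrid multiscale method, only a side-by-side comparison; the rate constants and function names are assumptions made for illustration.

```python
import math
import random
import statistics

def deterministic(n0, k, g, t):
    """Analytic solution of dn/dt = k - g*n with n(0) = n0."""
    return k / g + (n0 - k / g) * math.exp(-g * t)

def gillespie(n0, k, g, t_end, rng):
    """Copy number at t_end for one stochastic birth-death trajectory."""
    t, n = 0.0, n0
    while True:
        birth, death = k, g * n          # reaction propensities
        total = birth + death
        t += rng.expovariate(total)      # waiting time to the next event
        if t > t_end:
            return n
        if rng.random() * total < birth:
            n += 1                       # production event
        else:
            n -= 1                       # degradation event

# Hypothetical rates: production 10 molecules/min, degradation 0.1 per min.
k, g, t_end = 10.0, 0.1, 100.0
rng = random.Random(0)
finals = [gillespie(0, k, g, t_end, rng) for _ in range(200)]

print(f"deterministic copy number: {deterministic(0, k, g, t_end):.1f}")
print(f"stochastic mean: {statistics.fmean(finals):.1f}, "
      f"variance: {statistics.pvariance(finals):.1f}")
```

The two descriptions agree on the average, but only the stochastic one captures cell-to-cell variation. A hybrid approach would, in principle, apply the cheaper deterministic treatment to abundant species and reserve event-by-event simulation for species present in low copy numbers.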
How the cell manages the processing, storage, and transmission of information across multiple scales remains an exceptional challenge (Nurse and Hayles, 2011). In particular, discovering the extent to which different systems mechanisms are responsible for cellular function, and how systems motifs can be combined to bring about new phenotypes, are exciting avenues of pursuit. The future of cell biology as outlined here will increasingly come to rely on systems approaches. It is an exciting time to be pursuing the new biology of the 21st century.
The authors thank members of the Aitchison laboratory for critical comments on the manuscript and for discussion. We also thank the anonymous reviewers, even Reviewer 3, who in this case was phenomenal.
Research in the Aitchison laboratory is supported by grants P50 GM076547, U54 GM103511, and U01 GM098256 from the National Institutes of Health.
The authors declare no competing financial interests.