Introduction and Perspectives
UWE SAUER
[SECTION EDITOR: FREDERICK C. NEIDHARDT]
Posted August 14, 2007
Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
Mailing address: Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland. Phone: 41-44-6333672, Fax: 41-44-6331051.
During the heyday of fundamental metabolic research in the 1940s to 1960s, most of our now familiar pathways were assembled from in vitro data on enzymes catalyzing individual reactions and from auxotrophic mutant analyses and isotope-tracking experiments. Perhaps the most important conclusion from this period is now a basic tenet of modern biochemistry: despite extremely different lifestyles and physiology, the underlying chemistry of metabolism is remarkably similar in all domains of life. From Escherichia coli to humans, intermediary metabolism, many anabolic reactions, and respiratory energy generation are almost identical. The omnipresent metabolic pathway chart of Roche Applied Science (40) on laboratory and office walls is one manifestation of the breadth of this accumulated knowledge. Once the general mechanisms and most key reactions appeared to be known, molecular research shifted to other areas, where completely new mechanisms were waiting to be discovered using the evolving genetic techniques. Likewise, many of the remaining metabolic topics were quantitative in nature, requiring techniques that lacked the simplicity and beauty of an agarose gel (20).
Metabolic research in microbes became focused on genetic regulation, atomic resolution of reaction mechanisms, and the elucidation of more distal or exotic reactions, e.g., mechanisms of scavenging energy in extreme ecological niches such as methanogenesis, biosynthesis of biotechnologically relevant molecules, or degradation routes of rare substrates and xenobiotics. Based on the perception of a fairly well understood metabolism, metabolic engineering was proposed in the early 1990s as a framework for rational improvement of metabolic networks (and other cellular functions) in an iterative cycle of quantitative analysis, theory-driven design, and genetically implemented synthesis (5). The capacity to reprogram the expression of virtually any gene by recombinant DNA technology led to some success stories, but truly rational engineering of metabolism remains an illusion to this day. Although metabolic engineers appreciated right from the beginning the highly organized and regulated network nature of biochemical reactions, the sheer complexity of these interactions hampered rational improvements beyond simple reaction sequences.
What had been lacking in the early metabolic research? Classically, metabolism was investigated by dissociation; i.e., molecular characteristics of enzymes and their regulators were studied in isolation, and the pathway organization of the modules in this section reflects this mode of thinking. This reductionistic approach successfully established mechanistic relationships with the immediate interacting neighbors and allowed one to reconstruct the network structure from many individual reactions. However, it did not provide a quantitative understanding of the network. This statement should not be misinterpreted as a criticism, for early researchers on metabolism developed a superb ability to integrate new results with huge bodies of heterogeneous information into a coherent whole, thereby extracting qualitative understanding from a flood of disparate and sometimes vague observations. This ability certainly rivals today’s computational methods. In some instances contemporary theoretical approaches simply rediscover much of what was known and understood before.
Increasingly as the years went on, it became apparent what was missing: the ability to make precise predictions about the integrated operation of pathways and networks was limited by the absence of tools to cope with nonlinear and complex interactions. The burden of metabolic engineering is a consequence of this fact: one cannot yet predict with any certainty precisely what needs to be engineered to produce a more complex phenotype. The recent metabolic engineering of 1,3-propanediol production in E. coli by a large industrial team illustrates this point. Although the effort was eventually successful, more than 100 man-years of largely trial-and-error endeavors were necessary to develop a commercially viable process for a compound that is only four reactions away from a glycolytic intermediate in the best understood microbe (41). Missing then, and still missing now, are concepts, methods, and algorithms to integrate data and information into a quantitatively coherent whole, as well as theoretical concepts to reliably predict the consequence of environmental stimuli or genetic interventions.
This chapter starts with a brief overview of the panoply of novel and global measurement technologies that herald the dawning of systems biology and whose impact on metabolic research is apparent throughout EcoSal Domain 3, Metabolism and Metabolic Fluxes. In the middle section, applications to E. coli are used to illustrate general concepts of computational methods that approach metabolism as a network of interacting elements. The final section highlights prospective focus areas for metabolic research.
The availability of the E. coli genome sequence multiplies the capabilities of classical genetics through systematically constructed, more or less comprehensive libraries of in-frame knockout mutants (3), cloned open reading frames (ORFs) under inducible promoters (with and without histidine tags) (31), fluorescent transcriptional (60) or protein reporter fusions (31). These genome-based tools enabled many important applications to metabolism such as assessment of conditional essentiality of genes coding for products with a function in metabolism (27). Another example is dynamic measurement of promoter activities with transcriptional reporters. With fluorescence recording every few minutes over hours of growth in separately cultivated, large mutant sets, extremely rich data sets have been produced, and these have provided novel insights into the temporal expression program of amino acid biosynthesis (61) and the diauxic shift (60) in E. coli.
A large impact on metabolism has come from microarray-based transcriptomics. This method enables data-driven research by surveying global patterns of gene transcription (Fig. 1). In contrast to previous hypothesis-driven approaches, comprehensive identification of genes that are part of a regulon and respond to a stimulus became feasible. Transcriptomic data are considered in almost every module of this Web resource, and the methods are discussed in more detail in Chapter The Escherichia coli Proteome: Past, Present, and Future Prospects. Transcription measurements provide, for example, a foundation for computational mapping of the transcriptional network that regulates metabolism (6, 38). Caution is necessary, however, because expression changes are only indirect responses to a stimulus; hence, cause and effect are difficult to discern.
An important advance is provided by high-density, oligonucleotide-based tiling microarrays that contain the entire genome sequence without prior consultation of existing gene annotation (7). An obvious application is unbiased interrogation of mRNA expression levels, leading to discovery of novel transcribed sequences and regulatory elements. Moreover, the tiling arrays also enable high-throughput identification of protein-DNA binding, for example, by chromatin immunoprecipitation-on-chip (ChIP-chip) analysis (9). From such direct physical evidence, unbiased networks of transcriptional regulation are being assembled (23). It is hoped that these data will contribute to the long-range goal of mapping the entire network of transcriptional regulation of E. coli metabolism.
Current proteomics methods are limited compared with the more mature microarray-based expression analysis (see also Chapter The Escherichia coli Proteome: Past, Present, and Future Prospects), but proteomics has also made important contributions to metabolic research (reviewed in reference 22). The most widely used proteomic approach continues to be based on two-dimensional gel electrophoresis, and most of the typically abundant metabolic enzymes can be quantified from such gels. More recently, advanced mass spectrometry (MS) has enabled gel-free approaches, by which complex protein mixtures are protease digested and subjected to shotgun MS peptide detection. Both gel and nongel approaches are continuously being improved for quantification and coverage of protein species and their covalent modifications (e.g., phosphorylation), but there is a long way to go. Despite impressive progress in proteomics, reliable and reproducible quantification of all the proteins in a cell, or at least a defined subset of the proteome, remains elusive because of technical limitations.
A potential solution to the incomplete coverage might be targeted detection of predefined, proteotypic peptides through non-gel-based proteomics (35). In a first step, iterative cycles of shotgun MS are used to generate a PeptideAtlas database (currently about 60 to 70% of all genome-encoded proteins). In the second step, a minimal peptide set is defined that represents the proteome or a functional subproteome (e.g., all metabolic enzymes). Targeted detection of this defined subset of proteotypic peptides would then enable comprehensive detection of all proteins in the defined subproteome. It is likely that the first application of such comprehensive approaches will be to the subset of metabolic enzymes because everything seems to be in place: the sequence of most enzymes and their location in the network are known, and genome-scale models can predict which subset ought to be active under a particular condition (44).
In contrast to proteomics, physical interactome studies would seem to be less relevant for metabolism because the structural organization (topology) is comparatively well known and most of the interactions are not on the protein-protein but metabolite-protein level. Nevertheless, new protein interactions within metabolism, such as unexpectedly large complexes in the tricarboxylic acid cycle of E. coli, are continuously being unraveled (2).
The limitation of transcript and protein data, global or not, is that they do not represent a functional readout on metabolism. Expression levels and their changes cannot be directly, and rarely ever quantitatively, linked to a metabolic phenotype because metabolic activity is regulated at multiple levels including small molecule-protein interactions. To understand metabolism, it is necessary to monitor its functional key variables: concentrations of pathway intermediates and metabolite fluxes through pathways and reactions (Fig. 1). About 45% of the annotated and putative genes in the E. coli genome encode enzymes or transport proteins (47); thus, their functional activity is, in principle, accessible through what is now known as metabolomics and flux analysis. While such measurements have a long tradition in biochemistry, in the past they necessarily focused on one or a few compounds or fluxes (e.g., see reference 20). This situation is changing; the past 5 years have witnessed major breakthroughs in technology development toward more global analyses.
Vital as they are to kinetic modeling of metabolism, the in vivo concentrations of most metabolites in E. coli are not known, not to mention reliable dynamic data on the variation of these concentrations under different conditions. This information gap is largely due to the enormous technical challenges of reliably determining intracellular metabolite concentrations; hence, metabolomics has lagged behind its "omics" cousins (43). The first problem is how to separate, identify, and quantify large numbers of metabolites from complex mixtures at widely different concentration ranges. Traditionally enzymatic methods have been the prime choice because highly specific enzymes do this job also in vivo, but the tedium of setting up one assay per compound limits the usefulness of this approach. Hence, NMR and MS detection methods have been developed for monitoring wide ranges of intra- or extracellular metabolite levels (the metabolome). In contrast to proteomics, where the analytes are combinations of amino acids with known chemical structures, metabolomics has to cope with thousands of chemically rather heterogeneous compounds (e.g., sugar phosphates, acids, nucleotides, lipids, etc.), some of unknown structure. Other key metabolites such as glucose 6-phosphate and fructose 6-phosphate are extremely similar, presenting analytical challenges for separation. Advances in NMR and MS detection sensitivity and resolution, separation technology, and bioinformatics have the capacity to address these problems, and basically all current methods for metabolomics are based on NMR or MS (25). In general, the current methods fall into two categories: (i) semiquantitative metabolite profiling that aims for relative data on as many compounds as possible and (ii) targeted quantitative analysis of predefined subsets of metabolites (25, 43).
Beyond the analytical chemistry, other problems remain to be solved individually for intracellular measurements. A key issue is rapid quenching of metabolic activity because enzymes remain active in extracts and many metabolite pools are turned over in a millisecond to second range. Thus, artifacts due to sampling and sample processing are constant nuisances. Additional problems include (but are not limited to) instability of many metabolite species, concentration ranges that differ by orders of magnitude, cellular compartmentalization, effective intracellular concentrations, and cell-to-cell variability, to mention but a few. Consequently, metabolomics consists of a collection of methods that are tailored toward particular questions. Although some quantitative metabolite concentrations are being published for E. coli, it is not surprising to see that the coverage, data quality, and quantity differ quite a bit. Initial theoretical analyses indicate that some of the measured data are thermodynamically inconsistent with the reported fluxes through the pathways (34), highlighting a problem that will remain paramount to quantitative metabolomics: how to reduce systematic and unnoticed errors? Both improved experimental methods and theoretical concepts for validation will be needed, but there is little doubt that reliable and large-scale metabolite data will advance metabolic research at many different levels, e.g., functional genomics, guidance for metabolic engineering, and key data for modeling in systems biology.
Conceptually different from the concentration-based "omics," metabolic fluxes are the time-dependent motion (i.e., in vivo reaction rates) of metabolites through the network that result from the interaction of genes, proteins, and metabolites across multiple metabolic and regulatory layers within the cell. Some of the processes that affect fluxes are not directly under the control of the cell (but, in part, subject to evolution), such as the laws of mass action that are governed by stoichiometry, reaction thermodynamics, or the kinetic characteristics of the enzymes (e.g., Vmax and Km). Beyond such processes that are hard-wired into the reaction network, cells respond to environmental challenges by active regulation processes that can modulate fluxes through altered enzyme level and in vivo activity, e.g., gene expression and translation, protein stability, covalent protein modification, and allosteric regulation through binding of effector molecules. The integrated consequence of these interactions controls the rate and distribution of metabolic fluxes, but the relevance of any one such regulation process for the control of flux is far from obvious and is poorly understood at present.
Since the interactions of components are typically nonlinear, have varying strength, and sometimes even move in opposite directions, many properties of metabolism cannot be understood from just monitoring the concentration of transcripts, proteins, and metabolites. The distribution of fluxes, for example, is not obvious even if one knows all the concentrations of metabolites within a cell. In contrast to snapshot-like component concentrations, intracellular fluxes cannot be determined directly. Instead, they must be mathematically inferred from measurable quantities such as uptake and production rates and/or 13C labeling data (Fig. 1). Sparked by metabolic engineering, metabolic flux analysis is a collection of experimental approaches that allow computing intracellular behavior from such data (51).
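The inference step can be illustrated with a minimal sketch of metabolite balancing, the simplest form of flux analysis. It assumes a hypothetical four-reaction toy network (not an E. coli model), invented rate values, and a steady state in which the stoichiometric matrix S and the flux vector v satisfy S v = 0; the function name and the network are illustrative assumptions only. The measured exchange rates are moved to the right-hand side and the unknown intracellular fluxes are obtained by least squares.

```python
import numpy as np

def infer_unknown_fluxes(S, measured):
    """Solve S v = 0 for the unmeasured fluxes, given measured ones.

    S        : stoichiometric matrix (metabolites x reactions)
    measured : dict {reaction index: measured rate}
    """
    n_reactions = S.shape[1]
    m_idx = sorted(measured)
    u_idx = [j for j in range(n_reactions) if j not in measured]
    v_m = np.array([measured[j] for j in m_idx], dtype=float)
    # Steady state: S_u v_u + S_m v_m = 0  ->  S_u v_u = -S_m v_m
    rhs = -S[:, m_idx] @ v_m
    v_u, *_ = np.linalg.lstsq(S[:, u_idx], rhs, rcond=None)
    v = np.zeros(n_reactions)
    v[m_idx] = v_m
    v[u_idx] = v_u
    return v

# Hypothetical toy network (assumption, for illustration only):
#   r0: (medium) -> A      uptake, measured
#   r1: A -> B             intracellular, unknown
#   r2: A -> (excreted)    intracellular branch, unknown
#   r3: B -> (excreted)    secretion, measured
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # balance on A
    [0.0,  1.0,  0.0, -1.0],   # balance on B
])
fluxes = infer_unknown_fluxes(S, {0: 10.0, 3: 6.0})  # invented rates
```

In this toy case the two balances determine the two unknown fluxes exactly. In realistic networks, parallel pathways and cycles typically leave the system underdetermined, which is where the additional constraints from 13C labeling data come in.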
From its early days when material fluxes were balanced within assumed reaction networks, flux methods have matured to identify novel reactions or unexpected modes of network operation. The pivotal advances were elaborate 13C labeling techniques and appropriate computational methods that go well beyond the traditional stable isotope-labeling studies (59). While the earlier focus was on in vivo identification of particular reactions or, more rarely, the quantification of flux through a single pathway from few selected isotope data (20, 58), it is now possible to quantify large sets of fluxes through many metabolic subsystems. Hence, experimental identification of intracellular carbon fluxes has become a major research interest in metabolic research, metabolic engineering, and more recently systems biology (51).
Since central metabolism is the backbone of catabolic and anabolic processes under all growth conditions, most flux research has focused on the core network. Some of the renaissance of interest in central metabolism was triggered by metabolic engineering applications that attempted to redirect the major carbon flows toward biotechnological products (50). An important conclusion from these studies was that the traditional concept of pathways with distinct metabolic functions is an oversimplification. Often reactions or pathways were found to be active under conditions when they would be expected to be inactive or even detrimental, e.g., the frequent occurrence of ATP-dissipating futile cycles in E. coli and other bacteria (53). In an extreme case, unexpected enzyme operation has been found to generate entirely novel reaction sequences in central metabolism such as the recently discovered PEP-glyoxylate cycle (16) that is further discussed in Chapter Tricarboxylic Acid Cycle and Glyoxylate Bypass. The network perspective is even more relevant for redox and energy metabolism because some of their components participate in literally hundreds of reactions (Chapter Energy Generation/Redox Control), and many unanswered questions remain on how the balance between anabolism and catabolism is achieved and controlled. For NADPH metabolism, flux data have been instrumental in quantifying the various NADPH-generating mechanisms, revealing the divergent roles of the two E. coli transhydrogenases in balancing the demands for this anabolic redox cofactor (52).
Most published data sets were obtained from (quasi) steady-state growth in glucose media, while other substrates remained largely unexplored although they are, in principle, amenable to the current methods. Necessary further technological developments include extending flux methods beyond central metabolism, improving the accuracy, developing standards to assess the quality of the estimates, and analyzing dynamic conditions when cells are not in steady state, including situations in which cells do not grow at all or in media with multiple carbon sources (51).
Knowing the flux distribution within the cell, however, does not mean that we understand how the network is regulated by the multiple layers of metabolic and regulatory interactions. While flux analysis identifies the flux state, it does not directly reveal the underlying control mechanisms. To do so, we need to integrate data sets from different cellular levels. This subject is addressed in the following section.
Determination of concentration changes for large numbers of network components is feasible. Although impressive, in itself this is reductionism at a very high level unless the individual data points are connected. Statistical inference or data-mining methods can group subsets of components into classes of similar behavior and thus extract hidden patterns from huge quantities of data. These patterns, however, do not directly lead to mechanistic insight. The missing element in the transition from component measurements to systems organization and operation is the ability to connect the measured components and to integrate the data across functional levels. Both can be achieved through the combination of formal methods (algorithms) and abstract representations of known biological connection circuits (models) (Fig. 2). In constructing models, we assemble a higher hierarchical level (a system) from the components in a rigorous manner. Since systems exhibit new properties that could not have been predicted from knowledge of the constituents, reductionistic approaches alone will not get us there (32).
To illustrate the additional benefit gained from connecting components, let us continue this line of thought on cofactor metabolism. Consider the E. coli enzymes glucose-6-phosphate dehydrogenase and the energy-translocating transhydrogenase (PntAB). Situated in far-apart pathways (and textbook pages), they do not have much in common at first sight. Hence, increased expression of the former and decreased expression of the latter might well be unrelated events, as might be changes in the metabolite concentrations around these reactions. The readily available reaction stoichiometry, however, tells us that both reactions produce the anabolic redox cofactor NADPH and thus are connected with each other and with many other reactions through their function in balancing the cofactor levels (52). Negatively correlated expression changes are therefore a first indication that the one reaction takes over the function of the other. While this may sound simple, it quickly becomes complicated when one attempts to extract meaning from thousands of measured mRNA, protein, or metabolite changes.
From increasing numbers of connections between components, more complex system properties emerge for networks, cells, and multicellular assemblies. For example, we almost take it for granted that biological networks, metabolism in particular, are robust, i.e., that they maintain performance in the face of many different types of genetic or environmental perturbations. This higher-level property of metabolism is not an inherent property, but the consequence of different types of metabolic and regulatory connections, and we are just starting to understand the underlying key principles such as network redundancy, modularity, hierarchy, and feedback control (56). Mathematical models are now necessary to understand the mechanistic interplay, identify key mechanisms through testable hypotheses, and quantify their relative contribution to systems properties such as robustness.
Purely intuitive or statistical assaults on large data sets do not provide molecular insights and understanding of systems properties, but the combination of experimental data and computational modeling could (32). Computational approaches can be separated into those that infer functional network interactions from data and those where a priori knowledge on the biological system is used to generate a first, best-guess model (bottom-up) that can be used for in silico experiments to generate testable hypotheses. While the current focus of systems biological research on signaling and protein-protein interaction networks is more discovery-oriented toward the identification of the involved components and the network topology, the bottom-up approach can be taken directly to metabolism and places metabolic networks right in the front row of systems biology.
In general, metabolic models represent biology at different levels of abstraction (54), which is well illustrated in the case study of glycolysis in Chapter Glycolysis and Flux Control. In the simplest form, only static interactions between components are used to generate topological models (Fig. 2). In such graph models, the interactions between genes, proteins, or metabolites and proteins are represented in a nondirectional, unspecified, and time-independent manner. Even at this high level of abstraction, relevant insights may be derived, for example, by decomposing the complex transcriptional regulation network of E. coli and others into network motifs, whose dynamics and function can be understood individually (1).
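As a minimal sketch of what motif decomposition involves, the following code searches a small regulatory graph for feed-forward loops, one commonly studied motif type. It assumes, for illustration, that regulatory edges are directed (regulator to target); the node names X, Y, Z, and W, the edge list, and the function name are hypothetical and do not describe real E. coli genes.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return all triples (x, y, z) with edges x->y, y->z, and x->z."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    nodes = set(targets) | {d for dsts in targets.values() for d in dsts}
    loops = []
    # Brute-force scan of ordered triples; adequate for small graphs only.
    for x, y, z in permutations(sorted(nodes), 3):
        if (y in targets.get(x, ())
                and z in targets.get(y, ())
                and z in targets.get(x, ())):
            loops.append((x, y, z))
    return loops

# Hypothetical regulatory edges (regulator, target), for illustration only.
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffls = feed_forward_loops(edges)
```

The scan is cubic in the number of nodes, which is acceptable for a sketch. Actual motif analyses additionally compare the observed counts against randomized networks to judge whether a pattern is overrepresented.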
Beyond simple interactions, the next level of abstraction exploits our knowledge of enzymatic reactions by adding physicochemical invariants such as the reaction stoichiometry and thermodynamics (reaction directionality) to the network topology (Fig. 2). Consequently, the potential operation of the network is constrained to a much smaller but biologically more meaningful range. For E. coli, and many other microbes, this approach has been taken to a full genome-scale model of metabolism (44, 46), where many of the biological principles are nicely rationalized in the still outstanding textbook on the molecular physiology of the cell (42). The purpose of such constraint-based models is not so much to make specific predictions as it is to systematically exclude unlikely/impossible phenotypes. Since the mathematically encoded material balances are consistently considered, these models are more rigorous representations of metabolism than are databases such as EcoCyc (29), and they can now be interrogated with different types of structural computational methods. One possibility is elementary flux mode analysis that decomposes large networks into basic (elementary) units without further assumptions (for a general introduction, see Chapter Glycolysis and Flux Control). Statistical analysis of such flux modes has been demonstrated to determine key aspects of functionality and regulation of E. coli metabolism (55). Although constrained, the models still describe a vast space of mathematically feasible solutions. A frequently used method to extract specific solutions from this space is to assume that the network operates according to optimality principles (so-called flux balance analysis). Thereby specific predictions of often surprising accuracy are possible, for example, the physiological end point of E. coli adaptive evolution on specific substrates (26) or phenotypes of deletion mutants (44). 
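The optimality-based selection of one flux distribution from the constrained solution space can be sketched as a small linear program. The example below is a toy illustration under stated assumptions, not a genome-scale model: the four reactions, the uptake limit of 10 units, and the choice of the last reaction as a "biomass" objective are all hypothetical. It uses `scipy.optimize.linprog`, which minimizes its objective, so the objective coefficient is negated to maximize.

```python
import numpy as np
from scipy.optimize import linprog

def flux_balance_analysis(S, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and lower/upper flux bounds."""
    n_metabolites, n_reactions = S.shape
    c = np.zeros(n_reactions)
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    result = linprog(c, A_eq=S, b_eq=np.zeros(n_metabolites),
                     bounds=bounds, method="highs")
    if not result.success:
        raise ValueError(result.message)
    return result.x

# Hypothetical toy network (assumption, for illustration only):
#   r0: (medium) -> A        uptake, capped at 10 units
#   r1: A -> B
#   r2: A -> (by-product)
#   r3: B -> (biomass)       objective
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # balance on A
    [0.0,  1.0,  0.0, -1.0],   # balance on B
])
# Lower bounds of zero encode irreversibility (reaction directionality).
bounds = [(0, 10), (0, None), (0, None), (0, None)]
fba_fluxes = flux_balance_analysis(S, bounds, objective_index=3)
```

Here the stoichiometry and bounds define the feasible space, and the assumed objective picks the solution in which all carbon flows to biomass and none to the by-product. Whether an optimality assumption of this kind holds for a given organism and condition is an empirical question, as the text notes.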
Since these stoichiometric approaches are past the immediate development phase, their qualitative and quantitative predictive capacity must now be rigorously assessed for various aspects of metabolism such as evolution, extracellular phenotypes, and intracellular fluxes. The model structure must also be continuously updated with new biological insights (compare with the functional genomics section below).
These topological and stoichiometric models do not afford quantitative predictions of specific components, for instance, the concentration of a metabolite or a protein and how it might affect the overall network behavior. Such predictions may be achieved with mechanism-based models that also represent the dynamic aspects of reaction biochemistry and regulation (54) (Fig. 2). While dynamic models are closer to reality and would ultimately be more useful in making precise, experimentally testable hypotheses, they are much more difficult to develop and handle. In sharp contrast to topological and stoichiometric models, dynamic models require mechanistic detail and kinetic parameters that are rarely available. Hence, such models are focused in general on a concise set of reactions around important nodes in metabolism. Although it is not a modeling approach in the above sense that enables in silico experiments, metabolic control analysis is related to kinetic models as a mathematically rigorous framework to quantify the sensitivity of steady-state variables (i.e., pathway flux and metabolite concentrations) to small changes in parameters (typically enzyme concentrations) (15). An important contribution from such control analyses was to reveal the misleading nature of the concept of a rate-limiting step in a metabolic pathway. Control is typically shared between enzymes and shifts quickly from one enzyme to another, for example, upon subtle overexpression.
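The notion of shared control can be made concrete with a minimal numerical sketch. It assumes a hypothetical two-step pathway X0 -> S -> X1 with simple linear kinetics (a reversible first step and an irreversible second step) and invented rate constants; none of these values come from the cited work. The flux control coefficient of enzyme i is taken as the scaled sensitivity d ln J / d ln e_i, estimated here by a central finite difference.

```python
import math

def steady_state_flux(e1, e2, k1=2.0, k2=1.0, keq=1.0, x0=1.0):
    """Steady-state flux J of X0 -> S -> X1 (hypothetical kinetics).

    v1 = e1*k1*(x0 - s/keq)   reversible first step
    v2 = e2*k2*s              irreversible second step
    Setting v1 = v2 gives the intermediate level s analytically.
    """
    a, b = e1 * k1, e2 * k2
    s = a * x0 / (a / keq + b)
    return b * s

def flux_control_coefficient(which, e1=1.0, e2=1.0, rel_step=1e-4):
    """Estimate d ln J / d ln e_i (which = 0 or 1) by central difference."""
    up, down = [e1, e2], [e1, e2]
    up[which] *= 1.0 + rel_step
    down[which] *= 1.0 - rel_step
    dln_j = math.log(steady_state_flux(*up)) - math.log(steady_state_flux(*down))
    dln_e = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return dln_j / dln_e

c1 = flux_control_coefficient(0)
c2 = flux_control_coefficient(1)
# Same pathway after a fourfold increase of the first enzyme.
c1_overexpressed = flux_control_coefficient(0, e1=4.0)
```

With these assumed parameters neither enzyme is "rate limiting": the two coefficients are both nonzero and sum to one, and raising the level of the first enzyme shifts control toward the second.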
In principle, models at any level can reveal interesting aspects of network structure and evolvability by computer simulations. Very importantly, they also provide a unique basis to integrate qualitatively and quantitatively diverse data sets within the known framework of biological mechanisms. An impressive example is integration of large-scale E. coli microarray data with a static topological model of the database- and literature-derived regulation network, by which the relative contribution of various transcription factors could be deconvoluted (28). A biologically tangible outcome of this integration is the demonstration that the gluconeogenic genes of E. coli, possibly through the phosphoenolpyruvate (PEP) level, provide a feedback loop to cAMP-dependent catabolite repression. Similarly, metabolite data can be integrated with topological models to reveal reactions around which significantly coordinated changes occur (10). By additionally considering thermodynamic principles in a stoichiometric model, an approach was developed that allows one to quantitatively integrate flux and metabolite concentration data (34). This integration allows the systematic generation of hypotheses on putative sites of active regulation in E. coli metabolism.
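The thermodynamic part of this integration rests on a simple condition: a net flux through a reaction is feasible only in the direction in which the Gibbs energy of reaction is negative. The following sketch checks that condition for a single reaction. It is an illustration of the principle rather than the published method; the standard Gibbs energy, the concentrations, the temperature of 310.15 K, and the function names are all assumptions.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def reaction_gibbs_energy(dg0, substrates, products, temperature_k=310.15):
    """dG = dG0 + R*T*ln(Q) for unit stoichiometry, concentrations in M.

    dg0 is a standard Gibbs energy of reaction in kJ/mol (user supplied).
    """
    ln_q = (sum(math.log(c) for c in products)
            - sum(math.log(c) for c in substrates))
    return dg0 + R_KJ_PER_MOL_K * temperature_k * ln_q

def flux_is_consistent(net_flux, dg):
    """A nonzero net flux must run 'downhill': flux and dG differ in sign."""
    return net_flux * dg < 0

# Invented values for a hypothetical reaction S -> P with dG0 = +5 kJ/mol.
dg_low_product = reaction_gibbs_energy(5.0, substrates=[1e-3], products=[1e-4])
dg_high_product = reaction_gibbs_energy(5.0, substrates=[1e-3], products=[1e-3])
ok_low = flux_is_consistent(+1.0, dg_low_product)
ok_high = flux_is_consistent(+1.0, dg_high_product)
```

In the first case the low product concentration pulls the reaction forward despite an unfavorable standard Gibbs energy, so a measured forward flux is consistent. In the second case the same forward flux would contradict the concentration data, flagging either a measurement error or a missing piece of the model.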
Obviously, computational systems biology must advance further to become more directly useful to biology. One key element is to replace the current practice of first collecting large data sets, from which a model is eventually constructed, with a conjoint process in which experimental planning depends at least in part on the identification of sensitive model parameters or model structure from simulations. Often it is much more important to obtain quantitative, time-resolved data on a particular gene or protein than to perform entire "omics" experiments.
Many reactions, entire pathways, and their regulation still wait to be discovered, and specific open ends of E. coli metabolism are mentioned throughout the relevant modules in EcoSal Domain 3. However, several general aspects of metabolism either are not understood or are poorly understood. Without claiming comprehensiveness, the following section compiles such topics that will require intense and primarily systems biological research.
Even for E. coli we still have no functional annotation for about 30% of all genes (47). While the percentage might be better for genes encoding metabolic enzymes, the current genome-scale metabolic network models commonly include many enzymatic functions that cannot be attributed to any gene, so-called orphan activities. The most promising approach to identifying genes for such orphan activities is to use computational strategies that suggest the most likely candidate genes for experimental testing. Promising results were obtained for the test case of all known genes of the E. coli metabolic network by combining knowledge on the local network structure with multiple types of functional association evidence, including genome position, similarity of phylogeny, gene expression, and others (30). Such bioinformatics methods will undoubtedly contribute to completing the functional assignment of genes with metabolic functions in E. coli and beyond.
Besides orphan activities, for which we can start with an expected function, the currently unassigned ORFs also encode enzymes and regulators of metabolism that are of completely unknown function. Useful alternatives to empirical case-by-case functional annotation are genomewide enzymatic screens, which are starting to produce many new enzyme annotations for E. coli (36). Other systematic approaches include large-scale phenotypic, metabolome, or flux-profiling studies with mutant or overexpression libraries and automated model extension and prediction of missing reactions by iterative comparison to large phenotypic databases (45).
The number of known enzymes that catalyze more than one reaction underestimates the metabolic reality, as is illustrated by recent genomewide screens (36). This has potentially far-reaching implications for model construction because more reactions might occur than are listed in current databases. The potential contribution of such multifunctional enzymes to the evolutionary plasticity of metabolism is also important for our understanding of how new pathways evolve and how the present networks emerged. Another issue that is not fully settled is whether all reactions can be considered independent of each other or whether special mechanisms result in channeling of metabolites preferentially along certain pathways.
The flood of data and their computational analysis are starting to reveal general ("design") principles on the organization, regulation, and operation of metabolic networks. An illustrative example obtained with a large green fluorescent protein fusion library is the just-in-time program of transcriptional regulation of biosynthetic pathways in E. coli, i.e., a wavelike temporal expression that ensures that genes early on in the pathway are transcribed from promoters with shorter response times and higher maximal activity than those further down the pathway (61). At the functional level of fluxes, accumulating evidence from 13C labeling experiments indicates that the distribution of intracellular fluxes within the network is surprisingly robust with respect to random genetic perturbations, but responds very flexibly to environmental stimuli (17). An as yet unanswered question is just how the network achieves such stability and how metabolite levels respond. Metabolism and its regulatory network will be key fields for deriving further design principles because comparatively mature experimental methods and structural understanding of the network topology are available for organisms like E. coli.
Obviously, design principles can only be understood in the light of evolution, which is now becoming experimentally tractable. Real-time evolution experiments follow microbes over hundreds and even thousands of generations to address issues like the dynamics of evolutionary adaptation, the genetic bases of adaptation, and the mechanisms that increase fitness, trade-offs, and the environmental specificity of adaptation (Chapter Genetic Variation in Laboratory-Evolved Populations of Escherichia coli) (13). Metabolism, of E. coli in particular, is becoming a focal point, because possible outcomes of adaptive evolution can be computationally evaluated for hypothesis-driven design of experiments, for example, by subjecting knockout mutants to a particular environmental selection (19). Within a few hundred generations, E. coli evolves to computationally predicted growth rates on substrates where its metabolism was not perfectly adapted (26). First applications of transcriptomics (4) and flux analysis (18) are starting to identify the molecular events and general principles by which the network adapts. A major leap in enabling technology is comparative genome sequencing for monitoring acquisition and fixation of mutations that confer a selective growth advantage, as demonstrated for adaptation of E. coli to growth on glycerol (24). More generally, combining network models with genome bioinformatics is providing insights into how structure and function of the network influence important evolutionary processes, such as the fixation of single-nucleotide mutations, gene duplications, and gene deletions (57).
An important ramification of laboratory evolution is the potential oversimplification of bacteria as static entities in common laboratory environments. While major genome rearrangements are known in individuals of natural Salmonella and E. coli populations (EcoSal Domain 6, Chromosomes, Genomics, and Evolution), the apparent plasticity of phenotypes in adapted laboratory strains is another matter. The surprisingly rapid occurrence and radiation of adaptive mutations in the constant environment of a continuous culture to which E. coli is well adapted raise concerns about population average analyses (39). Within just 26 days under standard glucose limitation, E. coli evolved into a plethora of distinct phenotypes.
Even without the evolutionary dimension, advances in quantitative single-cell analysis through fluorescent imaging techniques give us a first glimpse that the common assumption of populations as homogeneous mixtures of approximately uniform entities is too simplistic for many aspects of microbial behavior. Clonal populations can exhibit substantial phenotypic variation that is essential for many biological processes, thus opening up a new dimension of complexity. The overall variability within cell populations is based on intrinsic noise arising from the stochasticity of biochemical processes and extrinsic noise from fluctuations in other cellular components or the environment, as was shown, for example, by following gene expression in individual E. coli cells (14). Bistability is one emerging mechanistic explanation for many types of microbial behavior; i.e., phenotypic heterogeneity is manifested by bifurcation into distinct subpopulations (12). For the time being, the relevance of population heterogeneity has been described for some regulatory phenomena in peripheral cellular systems, but some evidence indicates that such events are also relevant for metabolism and its regulation.
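The distinction between intrinsic and extrinsic noise can be made concrete with a small calculation. The following is a minimal sketch, assuming a dual-reporter design of the kind used in reference 14, in which two equivalent reporters are measured in each individual cell. The function name `noise_components` and the intensity values are hypothetical illustrations, not data from that study.

```python
from statistics import fmean


def noise_components(reporter_a, reporter_b):
    """Split expression noise into intrinsic and extrinsic parts.

    reporter_a, reporter_b: per-cell intensities of two equivalent
    reporters, listed in the same cell order (assumed input format).
    Returns (intrinsic, extrinsic, total) as squared noise values,
    i.e., variance-like terms normalized by the product of the means.
    """
    if len(reporter_a) != len(reporter_b) or len(reporter_a) == 0:
        raise ValueError("need two non-empty series of equal length")

    pairs = list(zip(reporter_a, reporter_b))
    norm = fmean(reporter_a) * fmean(reporter_b)

    # Intrinsic noise: the two reporters differ within the same cell.
    intrinsic = fmean([(a - b) ** 2 for a, b in pairs]) / (2.0 * norm)
    # Extrinsic noise: the two reporters vary together from cell to cell.
    extrinsic = (fmean([a * b for a, b in pairs]) - norm) / norm
    # Total noise; algebraically equal to intrinsic + extrinsic.
    total = (fmean([a * a + b * b for a, b in pairs]) - 2.0 * norm) / (2.0 * norm)
    return intrinsic, extrinsic, total


# Hypothetical example: reporters that track each other perfectly
# show cell-to-cell (extrinsic) variation but no intrinsic noise.
intrinsic, extrinsic, total = noise_components([10.0, 20.0, 30.0],
                                               [10.0, 20.0, 30.0])
print(intrinsic, extrinsic, total)
```

In this formulation the total squared noise is the sum of the two components, so either one can be checked against the other two.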
The electronically encoded information on the regulatory network is currently most comprehensive for E. coli (49), with the RegulonDB database as an enormously important and well-curated resource (http://regulondb.ccg.unam.mx). Nevertheless, our understanding of the signaling and regulation networks that govern metabolism is far less advanced than our knowledge of stoichiometry and thermodynamics of the metabolic network itself (Fig. 1). Much of contemporary research focuses on transcriptional regulation because such responses can be mapped out with DNA arrays, in particular, for metabolism, but more attention must be paid to posttranslational and allosteric regulation (48). Evidence is accumulating that lack of direct correlation between mRNA and protein levels and metabolic flux is the rule rather than the exception. While transcriptional repression is the main control mechanism for expression of many biosynthetic genes, shutting down the pathway flux requires mainly allosteric feedback inhibition (11). Compared with the more straightforward regulation of peripheral biosynthetic pathways, regulation of intermediary metabolism is even more complex because these reactions must be fine-tuned to the demands under many different conditions, and many key regulatory interactions remain to be discovered.
Much of molecular biology has been focused (very successfully) on identifying novel regulation mechanisms for gene expression. Whether regulation at this level controls metabolic activity and thus flux (to what extent? under which conditions?) is a separate question that has received considerably less attention. Knowing that a particular enzyme is expressed at a fivefold higher level does not necessarily mean that the flux it catalyzes increases at all, and the flux almost never increases by the same factor. Quantitative understanding of how phenotypes and behavior of cells are controlled is one of the major challenges in biological research, and metabolism is one of the best areas to address the question of control because the functional output in terms of fluxes can be quantified experimentally.
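A toy calculation illustrates why a fivefold change in enzyme level need not translate into a fivefold change in flux. The sketch below assumes a hypothetical two-step pathway S <-> X -> P with a reversible first step and simple mass-action kinetics. The rate constants, the fixed substrate level, and the function names are illustrative assumptions and are not taken from any measured E. coli pathway.

```python
def steady_state_flux(e1, e2, s=1.0, k1=1.0, k_minus1=1.0, k2=1.0):
    """Steady-state flux through the hypothetical pathway S <-> X -> P.

    Assumed rate laws (mass action, enzyme level as a multiplier):
        v1 = e1 * (k1 * s - k_minus1 * x)
        v2 = e2 * k2 * x
    Setting v1 == v2 gives the intermediate level x; the flux is v2.
    """
    x = e1 * k1 * s / (e1 * k_minus1 + e2 * k2)
    return e2 * k2 * x


def flux_fold_change(enzyme_fold, e1=1.0, e2=1.0):
    """Fold change in flux when only the first enzyme is raised."""
    return steady_state_flux(enzyme_fold * e1, e2) / steady_state_flux(e1, e2)


# Raise the first enzyme fivefold and compare with the flux response.
print(flux_fold_change(5.0))
```

With these arbitrary parameters, a fivefold increase in the first enzyme raises the flux by less than twofold, because the second step and the back reaction share control of the pathway.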
A key challenge for future metabolic research is to understand how the regulatory network (genetic as well as enzyme activity regulation through allosteric effectors) actually controls metabolism. First computational attempts include rigorous control analysis (48), the aforementioned thermodynamic analysis to identify active regulation sites (34), or simply digitizing (on/off) the available regulatory information (6). Experimental advances and open problems are nicely summarized for the tricarboxylic acid (TCA) cycle example of E. coli in Chapter Tricarboxylic Acid Cycle and Glyoxylate Bypass. The preceding addresses the question of how flux is controlled; why it is controlled is another question. Biosynthesis pathways are primarily controlled to ensure sufficient supply of the required end product, and the glycolytic flux in E. coli appears to be controlled by the intracellular demand for ATP (33), but why particular fluxes are realized through other intermediary pathways or respiration is less obvious. We clearly need to go further, and this will require new data, most importantly in vivo fluxes, but also new computational methods to integrate them.
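How much of an observed flux change is attributable to gene expression can be expressed with a simple bookkeeping scheme in the spirit of the regulation analysis of reference 48. The sketch below assumes the common formulation in which the hierarchical regulation coefficient is the change in log enzyme capacity (Vmax) divided by the change in log flux, and the metabolic regulation coefficient is the remainder to 1. The function name and the numbers are hypothetical.

```python
import math


def regulation_coefficients(vmax_before, vmax_after, flux_before, flux_after):
    """Return (rho_h, rho_m) for one enzyme between two conditions.

    rho_h: share of the flux change explained by the change in enzyme
           capacity (hierarchical regulation, e.g., gene expression).
    rho_m: share explained by changes in metabolite levels and
           effectors (metabolic regulation); rho_h + rho_m == 1.
    """
    delta_log_flux = math.log(flux_after / flux_before)
    if delta_log_flux == 0.0:
        raise ValueError("flux did not change; coefficients are undefined")
    rho_h = math.log(vmax_after / vmax_before) / delta_log_flux
    rho_m = 1.0 - rho_h
    return rho_h, rho_m


# Hypothetical case: capacity rises fivefold, flux rises only 5/3-fold.
rho_h, rho_m = regulation_coefficients(1.0, 5.0, 0.5, 0.5 * 5.0 / 3.0)
print(rho_h, rho_m)
```

A hierarchical coefficient above 1 combined with a negative metabolic coefficient, as in this made-up case, would indicate that metabolic interactions counteract part of the change in enzyme capacity.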
A route not yet greatly explored that will undoubtedly gain relevance is designing and constructing artificial regulation circuits that might give us a better handle on understanding operating principles of regulation and control, because such approaches allow one to delineate, to some extent, defined subsystems from the otherwise complex interactions within cells. For glycolytic flux in E. coli, the design of a synthetic gene-metabolic oscillator demonstrated precisely how flux through a pathway can be a control factor that might be relevant for many behavioral traits (21).
Although not (yet) a hallmark of biological research, a certain level of standardized planning, execution, and reporting of experiments and sharing the data in community repositories is becoming increasingly important to ensure broad usability of large data sets. Transcriptomics is again the most advanced in this respect (8), but the issue has been recognized and is being addressed in metabolomics (M. J. van der Werf, R. Takors, J. Smedsgaard, et al., Metabolomics, in press) and modeling (37).
Whatever level of understanding has been achieved by new methods and approaches, one key challenge to our capabilities is to translate it into useful products and processes. Metabolic engineering will continue to benefit from, but also drive, many of these advances toward a truly rational engineering. Whether (re)engineering existing cell factories, or de novo engineering in the sense of synthetic biology, or a combination of both, will be more useful is currently a question of belief and (in part) semantics, but E. coli will remain an important host for biotechnology.
Although we understand metabolism better than most other cellular systems, many open issues remain, for instance, on the function of pathways, on unassigned enzyme activities, and on many unassigned genes. Perhaps the largest gaps in our present structural understanding are within the regulation processes that control metabolism, and we might have to descend to the level of individual cells to fully appreciate this intricate coordination network. These topics are most appropriately addressed by combining classical biochemistry, molecular biology, and more systems-oriented approaches.
Generation of more data, at higher quality, and on as yet unstudied properties is a prerequisite for deeper insights. However, such efforts will not lead us directly, and certainly not automatically, to a quantitative understanding of metabolism. The grand challenge is to integrate the complex and highly diverse data into a coherent whole from which testable hypotheses on general principles, relevant control mechanisms, or evolution can be derived that will lead to a holistic, quantitative, and predictive understanding of the system under investigation.
The frequently raised expectation that systems biology will lead to a comprehensive mathematical model (of metabolism in our case) is perhaps overly ambitious. Models will remain abstract simplifications of reality; otherwise they would be just as complicated as what they model. The purpose of the model largely defines the level of abstraction; e.g., dynamic responses to a stimulus, network evolution, or structure must be assessed with different models that can leave out different parts of reality. For many current applications, perfect predictions are neither feasible nor mandatory. Much of a model’s value is in the rigorous identification of pivotal experiments to validate a particular hypothesis or to discriminate between different mechanisms that are feasible in principle. A key element of the integrated computational-experimental approach is that one does not need to measure everything at the highest quality and resolution, but can focus on the pivotal aspects. As biologists and theoreticians get more acquainted with each other, a paradigm shift in biological research might occur where experiments will increasingly be designed by computational methods, and metabolic research will likely be among the first to benefit from it.
References
1. Alon, U. 2006. An Introduction to Systems Biology: Design Principles of Biological Circuits, vol. 10. CRC Press, London, United Kingdom.
2. Arifuzzaman, M., M. Maeda, A. Itoh, K. Nishikata, C. Takita, R. Saito, T. Ara, K. Nakahigashi, H. C. Huang, A. Hirai, K. Tsuzuki, S. Nakamura, M. Altaf-Ul-Amin, T. Oshima, T. Baba, N. Yamamoto, T. Kawamura, T. Ioka-Nakamichi, M. Kitagawa, M. Tomita, S. Kanaya, C. Wada, and H. Mori. 2006. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 16:686–691.[PubMed] [CrossRef]
3. Baba, T., T. Ara, M. Hasegawa, Y. Takai, Y. Okumura, M. Baba, K. A. Datsenko, M. Tomita, B. L. Wanner, and H. Mori. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. Article no. 2006.0008. [Online.] doi:10.1038/msb4100050. http://www.nature.com/msb/journal/v2/n1/full/msb4100050.html [CrossRef]
4. Babu, M. M., and L. Aravind. 2006. Adaptive evolution by optimizing expression levels in different environments. Trends Microbiol. 14:11–14.[PubMed] [CrossRef]
5. Bailey, J. E. 1991. Toward a science of metabolic engineering. Science 252:1668–1675.[PubMed] [CrossRef]
6. Barrett, C. L., C. D. Herring, J. L. Reed, and B. O. Palsson. 2005. The global transcriptional regulatory network for metabolism in Escherichia coli exhibits few dominant functional states. Proc. Natl. Acad. Sci. USA 102:19103–19108.[PubMed] [CrossRef]
7. Bertone, P., M. Gerstein, and M. Snyder. 2005. Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery. Chromosome Res. 13:259–274.[PubMed] [CrossRef]
8. Brazma, A., P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C. Ball, H. C. Causton, T. Gaasterland, P. Glenisson, F. C. Holstege, I. Kim, V. Markowitz, J. C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo, and M. Vingron. 2001. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29:365–371.[PubMed] [CrossRef]
9. Bulyk, M. L. 2006. DNA microarray technologies for measuring protein-DNA interactions. Curr. Opin. Biotechnol. 17:422–430.[PubMed] [CrossRef]
10. Çakir, T., K. R. Patil, Z. İ. Önsan, K. O. Ülgen, B. Kırdar, and J. Nielsen. 2006. Integration of metabolome data with metabolic networks reveals reporter reactions. Mol. Syst. Biol. Article no. 2:50. [Online.] doi:10.1038/msb4100085. http://www.nature.com/msb/journal/v2/n1/full/msb4100085.html
11. Caldara, M., D. Charlier, and R. Cunin. 2006. The arginine regulon of Escherichia coli: whole-system transcriptome analysis discovers new genes and provides an integrated view of arginine regulation. Microbiology 152:3343–3354.[PubMed] [CrossRef]
12. Dubnau, D., and R. Losick. 2006. Bistability in bacteria. Mol. Microbiol. 61:564–572.[PubMed] [CrossRef]
13. Elena, S. F., and R. E. Lenski. 2003. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat. Rev. Genet. 4:457–469.[PubMed] [CrossRef]
14. Elowitz, M. B., A. J. Levine, E. D. Siggia, and P. S. Swain. 2002. Stochastic gene expression in a single cell. Science 297:1183–1186.[PubMed] [CrossRef]
15. Fell, D. A. 1997. Understanding the Control of Metabolism. Portland Press, London, United Kingdom.
16. Fischer, E., and U. Sauer. 2003. A novel metabolic cycle catalyzes glucose oxidation and anaplerosis in hungry Escherichia coli. J. Biol. Chem. 278:46446–46451.[PubMed] [CrossRef]
17. Fischer, E., and U. Sauer. 2005. Large-scale in vivo flux analysis shows rigidity and sub-optimal performance of Bacillus subtilis metabolism. Nat. Genet. 37:636–640.[PubMed] [CrossRef]
18. Fong, S. S., A. Nanchen, B. O. Palsson, and U. Sauer. 2006. Latent pathway activation and increased pathway capacity enable Escherichia coli adaptation to loss of key metabolic enzymes. J. Biol. Chem. 281:8024–8033.[PubMed] [CrossRef]
19. Fong, S. S., and B. O. Palsson. 2004. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat. Genet. 36:1056–1058.[PubMed] [CrossRef]
20. Fraenkel, D. G. 1992. Genetics and intermediary metabolism. Annu. Rev. Genet. 26:159–177.[PubMed] [CrossRef]
21. Fung, E., W. W. Wong, J. K. Suen, T. Bulter, S. G. Lee, and J. C. Liao. 2005. A synthetic gene-metabolic oscillator. Nature 435:118–122.[PubMed] [CrossRef]
22. Han, M.-J., and S. Y. Lee. 2006. The Escherichia coli proteome: past, present, and future prospects. Microbiol. Mol. Biol. Rev. 70:362–439.[PubMed] [CrossRef]
23. Herring, C. D., M. Raffaelle, T. E. Allen, E. I. Kanin, R. Landick, A. Z. Ansari, and B. O. Palsson. 2005. Immobilization of Escherichia coli RNA polymerase and location of binding sites by use of chromatin immunoprecipitation and microarrays. J. Bacteriol. 187:6166–6174.[PubMed] [CrossRef]
24. Herring, C. D., A. Raghunathan, C. Honisch, T. Patel, M. K. Applebee, A. R. Joyce, T. J. Albert, F. R. Blattner, D. van den Boom, C. R. Cantor, and B. O. Palsson. 2006. Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat. Genet. 38:1406–1412.[PubMed] [CrossRef]
25. Hollywood, K., D. R. Brison, and R. Goodacre. 2006. Metabolomics: current technologies and future trends. Proteomics 6:4716–4723.[PubMed] [CrossRef]
26. Ibarra, R. U., J. S. Edwards, and B. O. Palsson. 2002. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420:186–189.[PubMed] [CrossRef]
27. Joyce, A. R., J. L. Reed, A. White, R. Edwards, A. Osterman, T. Baba, H. Mori, S. A. Lesely, B. O. Palsson, and S. Agarwalla. 2006. Experimental and computational assessment of conditionally essential genes in Escherichia coli. J. Bacteriol. 188:8259–8271.[PubMed] [CrossRef]
28. Kao, K. C., L. M. Tran, and J. C. Liao. 2005. A global regulatory role of gluconeogenic genes in Escherichia coli revealed by transcriptome network analysis. J. Biol. Chem. 280:36079–36087.[PubMed] [CrossRef]
29. Keseler, I. M., J. Collado-Vides, S. Gama-Castro, J. Ingraham, S. Paley, I. T. Paulsen, M. Peralta-Gil, and P. D. Karp. 2005. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res. 33:D334–D337.[PubMed] [CrossRef]
30. Kharchenko, P., L. Chen, Y. Freund, D. Vitkup, and G. M. Church. 2006. Identifying metabolic enzymes with multiple types of association evidence. BMC Bioinformatics 7:177. [Online.] http://www.biomedcentral.com/1471-2105/7/177 [CrossRef]
31. Kitagawa, M., T. Ara, M. Arifuzzaman, T. Ioka-Nakamichi, E. Inamoto, H. Toyonaga, and H. Mori. 2005. Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res. 12:291–299.[PubMed] [CrossRef]
32. Kitano, H. 2002. Computational systems biology. Nature 420:206–210.[PubMed] [CrossRef]
33. Koebmann, B. J., H. V. Westerhoff, J. L. Snoep, D. Nilsson, and P. R. Jensen. 2002. The glycolytic flux in Escherichia coli is controlled by the intracellular demand for ATP. J. Bacteriol. 184:3909–3916.[PubMed] [CrossRef]
34. Kümmel, A., S. Panke, and M. Heinemann. 2006. Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data. Mol. Syst. Biol. Article no. 2006.0034. [Online.] http://www.nature.com/msb/journal/v2/n1/full/msb4100074.html
35. Kuster, B., M. Schirle, P. Mallick, and R. Aebersold. 2005. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell. Biol. 6:577–583.[PubMed] [CrossRef]
36. Kuznetsova, E., M. Proudfoot, S. A. Sanders, J. Reinking, A. Savchenko, C. H. Arrowsmith, A. M. Edwards, and A. F. Yakunin. 2005. Enzyme genomics: application of general enzymatic screens to discover new enzymes. FEMS Microbiol. Rev. 29:263–279.[PubMed] [CrossRef]
37. Le Novere, N., A. Finney, M. Hucka, U. S. Bhalla, F. Campagne, J. Collado-Vides, E. J. Crampin, M. Halstead, E. Klipp, P. Mendes, P. Nielsen, H. Sauro, B. Shapiro, J. L. Snoep, H. D. Spence, and B. L. Wanner. 2005. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat. Biotechnol. 23:1509–1515.[PubMed] [CrossRef]
38. Liao, J. C., R. Boscolo, Y. L. Yang, L. M. Tran, C. Sabatti, and V. P. Roychowdhury. 2003. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl. Acad. Sci. USA 100:15522–15527.[PubMed] [CrossRef]
39. Maharjan, R., S. Seeto, L. Notley-McRobb, and T. Ferenci. 2006. Clonal radiation in a constant environment. Science 313:514–517.[PubMed] [CrossRef]
40. Michal, H. 2006. Roche Applied Science "Biochemical Pathways." ExPASy. [Online.] http://www.expasy.ch/cgi-bin/search-biochem-index
41. Nakamura, C. E., and G. M. Whited. 2003. Metabolic engineering for the microbial production of 1,3-propanediol. Curr. Opin. Biotechnol. 14:454–459.[PubMed] [CrossRef]
42. Neidhardt, F. C., J. L. Ingraham, and M. Schaechter. 1990. Physiology of the Bacterial Cell: a Molecular Approach. Sinauer Associates, Inc., Sunderland, MA.
43. Nielsen, J., and S. G. Oliver. 2005. The next wave in metabolome analysis. Trends Biotechnol. 23:544–546.[PubMed] [CrossRef]
44. Price, N. D., J. L. Reed, and B. O. Palsson. 2004. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nature Rev. Microbiol. 2:886–897. [CrossRef]
45. Reed, J. L., T. R. Patel, K. H. Chen, A. R. Joyce, M. K. Applebee, C. D. Herring, O. T. Bui, E. M. Knight, S. S. Fong, and B. O. Palsson. 2006. Systems approach to refining genome annotation. Proc. Natl. Acad. Sci. USA 103:17480–17484.[PubMed] [CrossRef]
46. Reed, J. L., T. D. Vo, C. H. Schilling, and B. O. Palsson. 2003. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4:R54. [CrossRef]
47. Riley, M., and M. H. Serres. 2000. Interim report on genomics of Escherichia coli. Annu. Rev. Microbiol. 54:341–411.[PubMed] [CrossRef]
48. Rossell, S., C. C. van der Weijden, A. Lindenbergh, A. van Tuijl, C. Francke, B. M. Bakker, and H. V. Westerhoff. 2006. Unraveling the complexity of flux regulation: a new method demonstrated for nutrient starvation in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 103:2166–2171.[PubMed] [CrossRef]
49. Salgado, H., A. Santos-Zavaleta, S. Gama-Castro, M. Peralta-Gil, M. I. Penaloza-Spinola, A. Martinez-Antonio, P. D. Karp, and J. Collado-Vides. 2006. The comprehensive updated regulatory network of Escherichia coli K-12. BMC Bioinformatics 7:5. doi:10.1186/1471-2105-7-5. [Online.] http://www.biomedcentral.com
50. Sanford, K., P. Soucaille, G. Whited, and G. Chotani. 2002. Genomics to fluxomics and physiomics—pathway engineering. Curr. Opin. Microbiol. 5:318–322.[PubMed] [CrossRef]
51. Sauer, U. 2006. Metabolic networks in motion: 13C-based flux analysis. Mol. Syst. Biol. 2:62. doi: 10.1038/msb4100109. [Online.] http://www.pubmedcentral.nih.gov [CrossRef]
52. Sauer, U., F. Canonaco, S. Heri, A. Perrenoud, and E. Fischer. 2004. The soluble and membrane-bound transhydrogenases UdhA and PntAB have divergent functions in NADPH metabolism of Escherichia coli. J. Biol. Chem. 279:6613–6619.[PubMed] [CrossRef]
53. Sauer, U., and B. Eikmanns. 2005. C3-carboxylation and C4-decarboxylation reactions: the anaplerotic node as a switchpoint for C-flux distribution. FEMS Microbiol. Rev. 29:765–794.[PubMed] [CrossRef]
54. Stelling, J. 2004. Mathematical models in microbial systems biology. Curr. Opin. Microbiol. 7:513–518.[PubMed] [CrossRef]
55. Stelling, J., S. Klamt, K. Bettenbrock, S. Schuster, and E. D. Gilles. 2002. Metabolic network structure determines key aspects of functionality and regulation. Nature 420:190–193.[PubMed] [CrossRef]
56. Stelling, J., U. Sauer, Z. Szallasi, F. J. Doyle III, and J. Doyle. 2004. Robustness of cellular functions. Cell 118:675–685. [PubMed] [CrossRef]
57. Vitkup, D., P. Kharchenko, and A. Wagner. 2006. Influence of metabolic network structure and function on enzyme evolution. Genome Biol. 7:R39. doi:10.1186/gb-2006-7-5-r39. [Online.] http://genomebiology.com [CrossRef]
58. Walsh, K., and D. E. Koshland, Jr. 1984. Determination of flux through the branch point of two metabolic cycles. J. Biol. Chem. 259:9646–9654.[PubMed]
59. Wiechert, W. 2001. 13C metabolic flux analysis. Metab. Eng. 3:195–206.[PubMed] [CrossRef]
60. Zaslaver, A., A. Bren, M. Ronen, S. Itzkovitz, I. Kikoin, S. Shavit, W. Liebermeister, M. G. Surette, and U. Alon. 2006. A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nat. Methods 3:623–628.[PubMed] [CrossRef]
61. Zaslaver, A., A. E. Mayo, R. Rosenberg, P. Bashkin, H. Sberro, M. Tsalyuk, M. G. Surette, and U. Alon. 2004. Just-in-time transcription program in metabolic pathways. Nat. Genet. 36:486–491.[PubMed] [CrossRef]