Organization and Function of Transcription Regulatory Elements
Chapter
79
JAY D. GRALLA and JULIO COLLADO-VIDES
There are few more fundamental biological questions than why genomes are organized as they are. This question is especially intriguing with respect to bacteria, in which evolutionary pressures apparently work to keep the genomes relatively compact and free from extraneous sequences. The primary purpose of this chapter is to analyze and review the organization of transcription regulatory elements in the Escherichia coli genome. At this stage of development, a large number of promoters have had their nucleotide sequences determined and have also been analyzed in terms of function. Our analysis will consider only those promoters that have known interaction sites for regulatory proteins. They constitute a large database for analyzing the types of patterns that exist in promoters and for understanding how these patterns contribute to the overall regulatory scheme of the organism. We will both present these patterns and attempt to understand their functions in terms of how they direct the types of macromolecular interactions that control transcription.
This chapter grew out of a previous analysis presented several years ago when the database was smaller (8). Many of the conclusions reached in the present review are unchanged from that time. In such cases, we will simply present the conclusions, in terms of both patterns and mechanisms, and refer the reader to the previous report for detailed arguments. Even in these cases, however, the larger database allows conclusions to be stated with a higher level of certainty. In other cases, conclusions will need to be modified, and we will present the arguments in greater detail. Finally, the new perspectives that inevitably emerge with time have led to the identification of new patterns.
The primary focus of this chapter will be 132 promoters from E. coli (and Salmonella typhimurium [official designation, Salmonella enterica serovar Typhimurium]) that are recognized by the σ 70 transcription system, by far the major one in the organism. Because the number of promoters recognized by other members of the σ 70 family is much smaller, these promoters will not be analyzed for general patterns. Mechanistic studies have suggested that there probably are not fundamental differences within this family (31), which, of course, does not preclude there being interesting differences. Our analysis does include members of the σ 54 family of promoters, which, our last review showed, direct transcription by a fundamentally different mechanism. In this case, the analysis also includes promoters from Klebsiella, in which σ 54 promoters have been studied intensively because of their central role in the nitrogen-fixing apparatus, which is absent from E. coli. There is not yet a large enough database for Klebsiella species or other bacteria to justify inclusion of their σ 70-type promoters.
Using the Medline database, we have identified 132 σ 70 promoters that are suitable for inclusion. Each promoter is of known nucleotide sequence and has proposed sites for the binding of regulatory proteins. The criteria for inclusion of a promoter include some level of functional information, not simply nucleotide sequence, as elaborated previously. Because the analysis concerns the E. coli genome, virus and plasmid promoters are not included. This database is approximately 25% larger than that used previously.
In addition, the database includes updated information on a number of promoters that were included previously. Tables 1 and 2 list the 25 new cases included here as well as 24 cases that were included previously and have now been updated. These modifications are primarily of two types. In some cases, new information has emerged about the regulation of particular promoters. This has led to modifications in the number and locations of regulatory binding sites. In other cases, the binding sites have now been located more precisely. In these cases, the positions were slightly modified to reflect the more detailed information.
Table 1. New promoters
Table 2. Modified promoters
Figure 1 displays the regulatory sites associated with each of these 132 promoters. The promoters are arranged alphabetically in groups, according to the most specific regulator that binds within them. For example, the first group includes two promoters, each controlled by the Ada regulator protein. The promoters are aligned by their transcription initiation points, designated position +1. Because promoters often contain multiple regulatory sites, the analysis described below actually uses several hundred protein-binding sites.
Figure 1 was analyzed for a number of interesting patterns, the statistical basis of which is presented in Table 3. A few simple definitions are necessary to explain the listings. "Repeated sites" refers to two or more sites within a single promoter that bind the same protein. "Remote sites" refers to those that appear in positions that are wholly upstream from position –70 or wholly downstream from position +40. In the sections that follow, the patterns and analyses in Fig. 1 and Tables 1 and 2 will be discussed in relation to transcriptional control.
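The two definitions above can be written down as a minimal sketch. The `Site` record, the function names, and the example coordinates are assumptions made for illustration; they are not taken from Fig. 1 or Table 3. "Wholly" is read here as a strict inequality on the relevant edge of the site, which is also an assumption.

```python
from collections import Counter
from typing import NamedTuple


class Site(NamedTuple):
    protein: str
    start: int  # upstream edge of the site, relative to +1
    end: int    # downstream edge of the site, relative to +1


def is_remote(site: Site) -> bool:
    """Wholly upstream of -70, or wholly downstream of +40."""
    return site.end < -70 or site.start > 40


def repeated_proteins(sites: list[Site]) -> set[str]:
    """Proteins that have two or more sites within one promoter."""
    counts = Counter(site.protein for site in sites)
    return {protein for protein, n in counts.items() if n >= 2}


# Hypothetical promoter; names and coordinates are invented.
example = [
    Site("RegA", -110, -90),  # wholly upstream of -70: remote
    Site("RegA", -75, -55),   # straddles -70: not remote
    Site("RegB", -10, 10),    # overlaps the start site: not remote
]
remote = [s for s in example if is_remote(s)]
```

In this invented promoter, RegA is a repeated-site regulator and only its upstream site counts as remote.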
Table 3. Statistics for the collection of σ 70 promoters
The data indicate that simple repression is the most common means of regulation of E. coli σ 70 promoters. Two-thirds of the listed promoters are controlled by the binding of repressors. The large majority of these are in fact controlled by a single type of repressor with no assistance from another regulator. That is, 69% of all promoters are repressible; 17% also contain an activator site, and 3% also contain a site for a second repressor. Thus, much of E. coli macromolecular regulation (the remaining 49% of promoters) can be seen as being accomplished by a collection of repressors, each dedicated to a simple purpose: mediating regulation in response to the level of a specific signal metabolite.
Activation is only slightly less common as a means of regulation, with half of promoters (49%) being regulated by the binding of activators. Activation too is mainly a dedicated type of regulation, as it is relatively rare for a promoter to use two different types of activators (only 7% do so). This rather simple view of regulation is further supported by the observation that only one promoter in six uses both an activator and a repressor. In fact, this latter statistic overstates the complexity of regulation, since it is almost always either Crp or Fnr that is used as the activator in a repressible promoter (Fig. 1).
Another view of the data is that approximately 75% of the promoters are controlled by a single type of regulator (49% by a single repressor and 25% by a single activator). Although these data point to a simple view of regulation via a collection of specifically dedicated regulators, the analysis does not imply that promoters are generally controlled in isolation from one another. That is, the data in Fig. 1 indicate that most promoters are part of regulons in which they are coregulated by independent binding of common regulators. For instance, there are six promoters controlled solely by the ArgR regulator, but these six are part of a single regulon. Indeed, the regulon can probably be viewed as the fundamental transcriptional regulatory unit in E. coli. We can identify only 2 (the aceB and kdpA promoters) of the 132 promoters analyzed as not being part of currently known regulons, and future investigation could conceivably find regulons that encompass even these two promoters. Thus, truly isolated promoters are exceedingly rare.
In summary, the regulatory complexity of the typical promoter can be seen as follows. It will be part of a regulon (at least 98%) and will be controlled by a repressor (most likely) or an activator (somewhat less likely) but rarely by both. Approximately 75% of the promoters in the database conform to this simple model of regulation.
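The percentages quoted in this section fit together as follows. This is only a worked check using figures already stated in the text; the row labels are informal.

```latex
\begin{align*}
\text{single repressor only} &= 69\% - 17\% - 3\% = 49\% \\
\text{single type of regulator} &= 49\% + 25\% = 74\% \approx 75\% \\
\text{part of a known regulon} &= \frac{132 - 2}{132} = \frac{130}{132} \approx 98.5\%
\end{align*}
```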
We have used the data in Fig. 1 to analyze the distribution of locations of regulatory sites. In our previous report, this type of analysis was very revealing in terms of transcription mechanism. One significant conclusion, which is not altered in the current analysis, is that promoters rarely, if ever, contain remote binding sites in the absence of an additional proximal site. This will be further discussed below. Another major conclusion was that the location of a regulatory site is critically related to its function. This will be discussed here and in the following section.
The locations of regulatory sites were analyzed in a way that differed from that presented previously. For each promoter, the center of each binding site for each regulatory protein was located. The proximal promoter region was then broken down into intervals of 5 or 10 bp. Whenever a binding-site center appeared within a particular interval, its protein was counted as binding to that interval. However, when the same protein was found to bind to an interval more than once (in different promoters), it was counted as a single occurrence attributable to that interval. The number of occurrences within each interval was then compiled. Thus, we are analyzing the number of different types of proteins that bind in each 5- or 10-bp region of the promoter. This analysis reveals whether regulators in general have strongly preferred binding locations, whereas our previous analysis simply indicated which region of a promoter is most likely to overlap any regulator binding site.
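The counting procedure just described can be sketched as follows. The function name, the tuple layout, and the example input are assumptions for illustration; the positions are not the Fig. 1 data. Intervals are taken as half-open, so a 5-bp interval starting at –80 covers –80 through –76.

```python
import math
from collections import defaultdict


def distinct_proteins_per_interval(sites, width=5):
    """Count distinct proteins whose site centers fall in each interval.

    sites: iterable of (promoter, protein, center) tuples, with center
    given relative to the +1 start site.
    Returns {interval_start: count}, where each interval covers
    interval_start <= center < interval_start + width.
    """
    proteins_by_interval = defaultdict(set)
    for _promoter, protein, center in sites:
        interval_start = math.floor(center / width) * width
        # A set records each protein once per interval, however many
        # promoters place it there.
        proteins_by_interval[interval_start].add(protein)
    return {start: len(names)
            for start, names in sorted(proteins_by_interval.items())}


# Illustrative input only; promoter names are placeholders.
sites = [
    ("promoter1", "Crp", -42),
    ("promoter2", "Crp", -41),   # same protein, same interval: counted once
    ("promoter3", "Fnr", -42),
    ("promoter4", "MetR", -56),
]
counts = distinct_proteins_per_interval(sites, width=5)
```

Using a set per interval is what separates this count from a simple tally of sites: many Crp sites in one interval still contribute a single occurrence.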
The results of this analysis are presented in Fig. 2. Figure 2A shows the distribution of repressor binding sites, and Fig. 2B shows the distribution of activator binding sites. The most notable result, confirming the earlier analysis, is that the two distributions are very different. The difference is that there is a virtually exclusive zone of repression. That is, in positions downstream from –30, activator sites essentially do not appear, and thus this region is not represented in Fig. 2B. Repressor binding sites are by contrast very common within this zone (Fig. 2A). Thus, we consider the region downstream of –30 to be a virtually exclusive zone of repression. The database indicates that when a regulator binds to this region, it is virtually certain that it will act as a repressor.
We rationalize the exclusion of activators from this "forbidden" zone as being related to the mechanistic inference discussed previously, namely, that activators work primarily by touching polymerase. So far the existing biochemical data have indicated that activators may touch either the alpha or the sigma subunit of polymerase (23, 29). In the few cases studied thus far, these interactions are suggested to occur with that part of the polymerase that lies on the DNA near or upstream of the –35 recognition region (40). Activators would need to bind the DNA in a way that would allow them to touch this part of the polymerase. Thus, the forbidden zone for activation downstream from –30 probably reflects the difficulty in establishing such specific contacts from this downstream region. As discussed previously, the forbidden zone may also reflect the fact that activators recognize double-stranded DNA and so may be unable to remain bound and functional while the region downstream from –12 becomes single stranded as an obligatory part of the opening that accompanies promoter activation.
Our view of the distribution of binding sites for activators differs somewhat from our previous analysis. Figure 2B shows that activators bind predominantly to positions between –80 and –30, but within this 50-bp region there are no strongly preferred positions. That is, the frequency of use of particular positions ranges from a low of two activator types in several intervals to a high of six different activators that bind from positions –76 to –80. In this analysis, all activators are given equal weight; our previous inference that position –40 is preferred derived primarily from the large number of Fnr and Crp sites that overlap this location; the different manner in which the data were analyzed previously led all of these sites to be counted independently.
The analysis might imply that polymerase can be touched by activators that bind to many different locations between –80 and –30. However, a further analysis, presented in Table 4, suggests that this might overstate the ability of polymerase to be touched from the further upstream positions. The derivation of this table differs from that of Fig. 2B in how it considers promoters that contain multiple activation sites. Table 4 lists only the proteins that bind nearer the promoter in cases in which there are multiple sites. The analysis shows that it is quite rare for an activator to have the center of its binding site far upstream and work without the assistance of a protein bound to a closer site. That is, of the 15 proteins that have binding sites centered upstream of –75 (Fig. 2B), only 2 work there without the assistance of another protein bound to a further downstream position. One of these, CysB, in fact has an exceptionally large site, and its downstream edge reaches well into the proximal region to position –50. TyrR is the sole protein that can apparently act alone from a long distance, as its footprint reaches no further downstream than –66. This analysis strongly suggests that an activator must closely approach the bound polymerase in order to activate in the absence of auxiliary sites.
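The selection rule behind Table 4 can be sketched as follows. The function names, the –75 cutoff argument, and the example promoters and activators are hypothetical; none of the values come from Table 4. The sketch assumes that all activator sites lie upstream of the start site, so the least negative center is the one nearest the polymerase.

```python
def most_proximal_sites(activator_sites):
    """Keep, for each promoter, only the activator site nearest the polymerase.

    activator_sites: iterable of (promoter, protein, center) tuples.
    Returns {promoter: (protein, center)}.
    """
    best = {}
    for promoter, protein, center in activator_sites:
        # Larger (less negative) center means closer to the polymerase.
        if promoter not in best or center > best[promoter][1]:
            best[promoter] = (protein, center)
    return best


def unassisted_far_upstream(best, cutoff=-75):
    """Promoters whose nearest activator site is still centered upstream of cutoff."""
    return sorted(p for p, (_protein, center) in best.items() if center < cutoff)


# Invented example: promoterA pairs a far-upstream site with a proximal
# one; promoterB has only a far-upstream site.
sites = [
    ("promoterA", "ActX", -92),
    ("promoterA", "ActY", -41),
    ("promoterB", "ActZ", -85),
]
best = most_proximal_sites(sites)
alone = unassisted_far_upstream(best)
```

Only promoters like the invented promoterB, with no closer site, would count as cases of an activator working alone from far upstream.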
Table 4. Proximal activator sites
When only the most proximal binding sites are considered (Table 4), it is clear that some positions are used more than others. As discussed previously, the common use of the region near –42 is derived largely, but not exclusively, from the large number of Crp and Fnr sites. However, this analysis does not imply that an individual activator can work from anywhere within the proximal region. The data in Table 4 suggest that individual proteins have preferred binding positions. As discussed previously, the proteins appear to have limited flexibility and function optimally from positions that are generally separated by a turn of the DNA helix. This observation implies that they must lie on a particular face of the DNA to contact a particular site on the bound polymerase.
Nonetheless, the analysis implies that different activators can accomplish this contact from quite different positions. We consider two simple models to account for this. In one, the activation surfaces of different proteins have different spatial relationships to their DNA-binding domains. That is, the activation surfaces of Crp lying at –42 and of MetR lying at –56 could conceivably be in the same spatial location. This location would be such that the activation surface was poised to touch the appropriate site on polymerase. Alternatively, there could be different sites on polymerase contacted by different activators; each such contact might be best accomplished from a unique promoter position. The current data cannot distinguish between these extreme models, but elements of both would be expected to have validity.
Regardless of the mechanistic details, it is clear that the location of a binding site is critically important to function. As discussed previously, it is known that if activator binding sites are artificially moved outside of the –30 to –80 zone identified here, their activation function is largely lost. In fact, activators can be converted to use as repressors by moving their binding sites to within the exclusive zone of repression. In one natural example, CysB, working from sites near –50, activates the cysJ and cysK promoters. However, CysB also represses its own promoter when its binding site is centered near position +5. This binding site is within the exclusive zone of repression where the binding of virtually any macromolecule has the potential to interfere with the function of polymerase, which itself covers the entire exclusive zone of repression.
There are a large number of naturally occurring examples of activators that function as repressors when the locations of their binding sites are changed; these are listed as dual regulators in Table 5. These cases typically involve the type of autoregulation just illustrated in the CysB example. Thus, it appears that promoters that control the production of regulators can keep the amount of regulator within fixed limits by using this strategy. The conversion of an activator to a repressor of its own synthesis allows production of the regulator to be reduced when it is in excess over need.
Table 5. Roles of regulators in σ 70 promoters
The listings in Table 5 also point up a remarkable property of E. coli regulators that was not observed previously. Basically, the data indicate that the rarest type of regulatory protein is the pure activator. Only 5 of the 21 different activator proteins analyzed have not yet been found to have an additional role as a repressor. By contrast, Table 5 shows that large numbers of repressors are apparently solely dedicated to their tasks. This observation suggests that it is quite easy to evolve a repressor function from an activator but more difficult to evolve an activator function from a repressor. This makes good sense in terms of what is known about the biochemical mechanisms involved in regulation. An activator can be converted to a repressor easily by simply introducing an appropriately placed binding site within what we have termed the exclusive zone of repression. To convert a repressor to an activator would require the more difficult task of changing the protein structure in a way that allowed an appropriate stereospecific contact to be made with the receptor sites on the polymerase.
Figure 2A shows that there do not seem to be highly preferred positions for the binding sites for repressors, as one might expect from this discussion. As we have noted, the overall zone of repression is essentially coincident with the DNA binding site for polymerase. The flexibility in operator location may reflect an ability to interfere with the function of polymerase from almost any position within its binding site. However, even for a single repressor, the position of binding is not constant when different promoters in a regulon are compared. As discussed previously, the location of the operator can determine how efficiently the repressor functions, and these differences may be used to advantage in designing into regulons a differential capacity to regulate their constituent operons. As with activation, the operating principle is not to maximize the quantitative extent of regulation but rather to make it appropriate to physiology.
The use of repetitive elements is fairly common in the database. Approximately half of all repressible promoters have two or more binding sites for the same repressor (corresponding to one-third of all promoters; Table 3). Activators use repeated elements somewhat less frequently, with approximately one-fourth having a repetitive element. The mechanistic basis for the need for repetitive elements in proximal positions is not known, but the data suggest possibilities. We note that the use of repetitive elements is primarily associated with a subset of regulators. Examples include duplicated operators in all six promoters repressed by ArgR and all five promoters repressed by MetJ. By contrast, the eight promoters that have proximal sites for the PurR regulator have no duplications. Although the database does contain examples of regulators appearing in both single and repetitive site contexts, an individual regulator generally shows a strong preference for one context or the other. This finding suggests that some regulators are simply designed to function optimally from duplicate binding sites. This could be for any number of reasons, including weak DNA binding or weak function of the bound protein. In such cases, occupancy of two sites could have evolved to be required for proper function.
Duplications could also be associated with specialized needs associated with individual promoters. Thus, they could also occur for proteins that have the theoretical capacity to function well from a single site. If the repetitive sites have a varying affinity for the regulator, then one can build in a gradual regulatory response. For example, the location of the tightest binding site could have evolved to direct weak function. Stronger function could be directed only when the regulator concentration rises to levels that allow the lower-affinity sites to be bound.
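The idea of a graded response from sites of differing affinity can be illustrated with a minimal sketch. It assumes simple, independent, noncooperative binding, with fractional occupancy equal to [R] / ([R] + Kd). That model, the two dissociation constants, and the concentration units are illustrative assumptions, not values from the database.

```python
def occupancy(regulator_conc, kd):
    """Fractional occupancy of one site under simple noncooperative binding."""
    return regulator_conc / (regulator_conc + kd)


# Hypothetical dissociation constants, in arbitrary concentration units.
SITE_KD = {"high_affinity": 1.0, "low_affinity": 100.0}


def occupancies(regulator_conc):
    """Occupancy of each hypothetical site at a given regulator concentration."""
    return {name: occupancy(regulator_conc, kd) for name, kd in SITE_KD.items()}


low = occupancies(1.0)     # high-affinity site half filled, low-affinity site nearly empty
high = occupancies(100.0)  # high-affinity site nearly full, low-affinity site half filled
```

Under these assumptions the tight site responds at low regulator levels, and the weak site is filled only after a roughly hundredfold rise in concentration, which is the kind of stepwise response the paragraph describes.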
There are two key observations about remote regulatory sites that are suggested by the analysis. First, regulator binding to remote elements occurs in a minority (approximately one-third) of promoters. Second, such sites appear to be obligatorily paired with another site that occurs in a proximal position (see above for activators). These considerations also apply to repression; for example, although the lac operon has operators in remote positions, it also has one in a proximal position. Similarly, although OmpR can act as an activator from remote sites, these sites are coupled to additional proximal sites. In several cases, it has been shown that inactivation of the proximal site nearly eliminates the function of the remote site, even though the remote site retains its original DNA sequence.
The mechanistic inference here is that regulation requires that there be a site located very close to the region where polymerase lies. Thus, as discussed above, virtually all σ 70 promoters appear to be controlled by a regulator touching the polymerase or its binding site and altering the function of polymerase. In a few cases, an upstream repressor site overlaps a proximal activator site and presumably represses by altering the properties or binding of the activator. In general, the function of the remote sites appears to be to modulate the primary interaction, which occurs between the polymerase and the regulator in the proximal region.
If the remote site is far from the proximal region, then this interaction may require looping out the intervening DNA. Presumably the considerable energy cost involved is compensated for by stabilizing protein-protein interactions at the base of the loop. Although one-third of σ 70 promoters have sites in remote positions, there are not yet clear examples of remote regulators acting independently of the proximal regulator. Such observations would challenge this simple mechanistic view of regulation from remote sites.
The pairing of remote and proximal binding sites typically, but not exclusively, involves the same protein binding to the proximal and remote positions. As discussed previously, in several of these typical cases, the protein appears to bind cooperatively to the two sites. This can be easily rationalized in terms of protein quaternary structure: the Lac repressor tetramer contains two complete DNA-binding domains that can each recognize a complete operator, and the Deo repressor octamer contains four complete DNA-binding domains, giving it the potential to bind an even greater number of operators simultaneously. Even if the protein occurs predominantly as a dimer, loops might form if the protein has the potential to oligomerize into higher multimers, as in the case of AraC protein. In general, it has been argued that remote elements may occur more frequently when the proximal region is so crowded with elements that additional ones are most easily incorporated into remote positions.
The database also contains a small number of examples of remote and proximal sites that neither overlap nor bind the same protein. These cases with well-separated sites are intriguing because their function cannot be explained in terms of the quaternary structure of a single protein. Although the number of examples is too small to permit generalization, four of the six cases (mrp and nrdA are the exceptions) involve the global regulators Crp and Fnr. It is interesting that when Crp is involved, it is as a remote partner, and when Fnr is involved, it is as a proximal partner. It is not yet known if these two global regulators have evolved the unusual capacity to interact with other bound regulators as well as with polymerase. In the case of Crp, quite a diversity of activation mechanisms have been proposed, including preventing formation of competing repression complexes and changing DNA structure to a form preferred by polymerase or by other activators. However, the large majority of cases can still be accommodated with a model in which Crp touches polymerase, as has been shown directly in one case (6). In the cases discussed here, in which Crp works from a remote site, loop formation would be predicted to be involved. This could be facilitated in these special cases by the proximal activator which partly stabilizes the binding of polymerase, thereby providing an accessible target for Crp. The process is similar to the mechanism of activation described in the next section, whereby the polymerase is partly stabilized in the proximal region as a result of its association with σ 54, thus allowing activation from remotely bound activators.
Sigma factors are required for promoter recognition, and E. coli basically contains two families of sigma factors. The preceding section tabulates and analyzes the function of promoters recognized by the form of polymerase that contains the major sigma factor σ 70. With one exception all other sigma factors are members of the σ 70 family of proteins. Each member of the σ 70 family specifies binding to a type of promoter containing a different recognition element. In a sense, each sigma factor within this family can be thought of as a means of controlling a superregulon.
Although the amount of sequence and mechanistic information is still somewhat sparse, there is no indication yet that members of the σ 70 family use fundamentally different mechanisms to control promoter function. This is expected because, as discussed above, activators appear to work primarily through the alpha and sigma subunits of the polymerase. Polymerases within the σ 70 family contain identical alpha subunits and similar sigma subunits and thus are in principle subject to similar mechanisms of activation. For example, Crp appears to activate a promoter recognized by a stationary-phase sigma factor from a location that is common for σ 70 promoters (20). At least one promoter can be recognized by both types of holoenzymes (46), suggesting that it is bound in both cases in a way that would be compatible with contact between polymerase and activator. Although the differences within the sigma factors of this family could in principle allow for differences in regulation, this has not yet been evaluated.
By contrast, σ 54 is not a member of the σ 70 family of proteins. Instead, it has certain features that resemble somewhat those that are commonly associated with eukaryotic transcription factors (43). In the one case studied, σ 54 holoenzyme is not activated through the alpha subunit (28). Because its sigma subunit is so different, there is an opportunity for quite a different type of regulation, which appears to be the case. In fact, there is not yet an example of an activator stimulating both a σ 54 and a σ 70 promoter. Previously we discussed how the arrangement of regulatory elements in promoters recognized by σ 54-associated polymerase differed fundamentally from those recognized by the σ 70 form of polymerase. In this section, we discuss these arrangements and extend the comparison. The flow of the section will follow that used for σ 70 so that each topic can be compared for the two types of promoters.
Properties of known σ 54-dependent promoters of E. coli and Klebsiella aerogenes are summarized in Fig. 3. The small data set includes 14 promoters that use the three activators whose DNA binding sites have been localized. Even a cursory examination of Fig. 3 shows that the binding sites are in locations that differ from the locations of σ 70 activators. Almost all of the binding sites are in positions well upstream from the proximal zone that contains the σ 70 activation sites. The center of the distribution of activator sites for σ 54 promoters lies near position –120. As discussed above, this location appears to be too far from the polymerase-binding site to allow polymerase to be touched without looping out the intervening DNA. Indeed, a number of studies have confirmed the role of DNA looping in σ 54-dependent transcription (45).
Even for a particular activator, the location of its binding site appears to be quite variable when different promoters are compared. The preference for particular locations exhibited by certain σ 70 activators (e.g., Fnr binding near –40) is not apparent for the three known σ 54 activators. Thus, NRI/NtrC-binding sites appear in a number of positions between –90 and –160. Binding sites for FhlA and NifA also are common within these positions but also occur at much greater distances. A functional FhlA site was recently found more than 700 bp upstream (22), and a NifA site exists at a position 250 bp upstream. In addition, studies have shown that several of these σ 54 activation sites can be moved to even more remote positions and still retain residual function, as discussed previously. This flexibility of location has led to the designation of activators of σ 54 as enhancer proteins, based on the term that describes remote activators in mammalian cells. As discussed above, this flexibility is also in strong contrast to activators of σ 70-dependent transcription, which have tight restrictions on their locations for activation.
The enhancer-type mechanism of σ 54 promoters is easy to understand in terms of the involvement of DNA looping. Because loops are involved, the precise distance between the activator and the polymerase becomes less important as the intervening DNA is simply looped out. The lack of activator sites quite close to the polymerase-binding site is probably related to the difficulty in forming a short DNA loop, which has quite a high energy cost due to the stiffness of DNA. This DNA stiffness can cause the location to become more important when the distances are short.
Figure 3 shows that the regulatory choices of σ 54-dependent promoters are remarkably different from the choices evident for σ 70-dependent promoters. The primary difference is that no repressors of σ 54-dependent transcription have yet been discovered. All of the regulation appears to be effected by controlling the activity of activator proteins. This property too is reminiscent of eukaryotic systems, in which DNA-binding repressors are much more rare than activators. Obviously the concept of zones of repression and activation, developed for the σ 70 system, has little relevance to the σ 54 system. In this case, there are no repressors yet identified and an activator apparently can bind within any distance compatible with its ability to loop out the intervening DNA and contact the polymerase.
By the definitions used for σ 70 promoters, essentially all of the σ 54 promoters have activator sites in remote positions. However, the organization of σ 54 promoters differs fundamentally from that of even the minority of σ 70 promoters that contain remote sites. In the case of σ 70, remote sites are obligatorily coupled to other activator-binding sites that reside in proximal positions. This does not generally occur for σ 54 promoters. These organizational arrangements of σ 54 promoters are simple to rationalize in terms of the ability of remotely bound activators to contact and activate melting by the polymerase directly via looping mechanisms. This occurs easily because holoenzymes containing σ 54 form unusually stable closed complexes at their promoters. The remote σ 70 activators may be largely restricted to contacting another activator protein, itself bound close to the polymerase-binding site.
Because the E. coli genome is so compact, the common use of the σ 54-type remote-control mechanism would probably lead to difficulties in preventing undesirable cross-regulation. In the compact bacterial genome, promoters are sometimes located in close proximity. Thus, an activator capable of working from a variety of locations and distances could conceivably activate a nearby promoter connected to a different regulon. The problem is largely avoided in E. coli because σ 54 promoters are rare and σ 70 promoters respond poorly to long-range activation.
There are two known cases in which such problems could potentially arise as a result of closely arrayed σ 54 promoters. In these cases, the problem is avoided via the inclusion of a binding site for the general chromosomal protein integration host factor (IHF) (5, 22). IHF sites associated with regulatory regions of nearby promoters cause them to act as independent units of regulation; each promoter responds selectively to an activator bound to a different DNA site. This is proposed to occur by use of the known ability of IHF to bend DNA. It may bend the DNA into a specific form that brings certain activator sites into proximity with certain promoters. That is, in the presence of IHF bound to a specifically located site, the chromosome may assume a superstructure which is compatible with the formation of some loops but not others.
Approximately half of the σ 54 promoters contain a binding site for IHF. These include promoters that are not closely arrayed, suggesting another role for IHF. Evidence suggests that the role of IHF in these cases is to stabilize DNA loop formation when transcription might otherwise be weak because of poor binding of the polymerase to its binding site at the base of the loop. When mutations that strengthen closed complex formation by the polymerase are made, the stimulation by IHF is much decreased. IHF stimulation does not appear to be due to a role stabilizing closed complexes (41). Thus, in these cases, IHF sites appear to compensate for the weak promoter elements by facilitating the interaction of polymerase and activator via promoting an appropriately positioned loop. In principle, IHF could also compensate for weak activator-binding sites, although this has not yet been observed.
These considerations raise interesting questions about the global organization of regulatory elements in genomes. For a weak σ 54 promoter, it has been found that there is a strict spatial organization of the regulatory region. The activator, IHF, and polymerase sites must be strictly in phase; otherwise, IHF not only fails to activate but actually represses (7). This strongly restricts the ability of the regulatory regions of weak promoters to evolve, in a sense imposing restrictions like those in place for σ 70 promoters. By contrast, σ 54 promoters with strong basal elements do not rely on IHF, and thus activator-binding sites can be located at somewhat more variable positions. Thus, strong promoters may evolve more rapidly when required, but they must simultaneously evolve mechanisms to prevent nonspecific cross-activation. Eukaryotes have apparently solved this problem by evolving large genomes with large distances between promoters. In many bacteria, the use of σ 54 appears to be more common than in E. coli, and it is not yet clear how their genomes have evolved to deal with these issues.
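The phasing requirement mentioned above can be illustrated with a small calculation. This is only a sketch under stated assumptions: it takes a helical repeat of about 10.5 bp per turn for B-form DNA (a standard approximate value, not a figure from this chapter), treats two sites as "in phase" when their center-to-center spacing is close to an integral number of turns, and uses an arbitrary tolerance of 60 degrees. The function names and the tolerance are hypothetical choices for illustration.

```python
def helical_phase_deg(spacing_bp: float, helical_repeat_bp: float = 10.5) -> float:
    """Rotational offset, in degrees, between two sites `spacing_bp` apart.

    Assumes a uniform helical repeat (about 10.5 bp per turn is an
    assumed value for B-form DNA); 0 means the sites lie on the same
    face of the helix and 180 means they lie on opposite faces.
    """
    return (spacing_bp % helical_repeat_bp) / helical_repeat_bp * 360.0


def on_same_face(spacing_bp: float,
                 tolerance_deg: float = 60.0,
                 helical_repeat_bp: float = 10.5) -> bool:
    """True if the two sites are within `tolerance_deg` of being in phase."""
    phase = helical_phase_deg(spacing_bp, helical_repeat_bp)
    # Angular distance from perfect phasing, measured the short way round.
    return min(phase, 360.0 - phase) <= tolerance_deg


# Shifting a site by about half a turn moves it to the opposite face.
for spacing in (21.0, 26.25, 31.5):
    print(spacing, helical_phase_deg(spacing), on_same_face(spacing))
```

In this picture, an insertion or deletion of roughly half a helical turn between the IHF site and its neighbors would rotate the bend to the wrong face, which is one way to think about how a mispositioned IHF site might hinder loop formation instead of assisting it.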
In general, it seems that in almost every aspect, the arrangement of σ 54 promoters differs from that of σ 70 promoters (the exception is the use of repetitive elements, which is not very different). The σ 54 system is designed to respond to activators rather than repressors, and these activators work from remote positions. This ability to respond to remote activators is conferred on the polymerase by σ 54 itself because it occurs via the use of three different σ 54-specific activators. It has been argued that the key aspect of the mechanism is the ability of σ 54 to bring polymerase to the promoter in the form of a closed complex where its melting ability is held in check (49); this provides a target for the remote activator. Such targets are absent in σ 70 systems because the promoters are not occupied by closed complexes containing polymerase. The organizational design of σ 54 promoters thus may be driven by the ability of σ 54 to deliver the polymerase but keep its melting function in check until activator is delivered.
There are not yet clues as to why this alternative system exists in E. coli and other bacteria or as to when it evolved compared with the apparently analogous systems in higher cells. Several aspects of the σ 54 system, not discussed here, also resemble aspects of eukaryotic transcription (15). Further biochemical studies of bacterial and mammalian systems and evolutionary studies of genomic organization will be required to approach answers to such questions.
Approximately 150 E. coli promoters have been identified, and the locations of their regulatory sites have been cataloged and analyzed. A number of new perspectives emerge concerning how bacterial genomes and regulatory networks are organized. Conclusions include the following. (i) The most typical type of promoter regulation involves a single dedicated regulator, more often a repressor than an activator. Three-fourths of promoters are regulated in this manner. However, even these promoters are generally under regulon control, which encompasses at least 98% of promoters. (ii) Activators typically bind in a 50-bp region between –30 and –80. Most activators have evolved to also act as repressors by binding to certain promoters at positions downstream from –30. These examples often involve autoregulation. (iii) Repressors bind primarily to locations that overlap the polymerase binding site and thus are concentrated in an approximately 100-bp region. (iv) Binding sites for repressors and activators that occur outside these typical zones appear to act in conjunction with an additional site within these zones. (v) The exceptions are promoters recognized by σ 54 holoenzyme, which have a different organization and mechanism of control. These are regulated solely by activation, and the activators contact the polymerase by looping from far locations. Looping is often assisted by the general chromosomal protein IHF. (vi) Duplicated regulator binding sites appear in approximately half of promoters. Individual regulators have a strong tendency to appear in either a duplicated or a single-site context. (vii) All of these conclusions can be rationalized in terms of existing biochemical information about the mechanism of transcriptional regulation.
We are indebted to Boris Magasanik for his very helpful suggestions and advice. J.C.-V. thanks Heladia Salgado for her work in preparing the figures. Preparation of this chapter was supported by Public Health Service grant GM35754 to J.D.G. and by grants from CONACYT and DGAPA-UNAM to J.C.-V.
References
1. Andrews, A. E., B. Lawley, and A. J. Pittard. 1991. Mutational analysis of repression and activation of the tyrP gene in Escherichia coli. J. Bacteriol. 173:5068–5078.
2. Augustin, L. B., B. A. Jacobson, and J. A. Fuchs. 1994. Escherichia coli Fis and DnaA proteins bind specifically to the nrd promoter region and affect expression of an nrd-lac fusion. J. Bacteriol. 176:378–387.
3. Bonnefoy, V., and J. A. DeMoss. 1992. Identification of functional cis-acting sequences involved in regulation of narK gene expression in Escherichia coli. Mol. Microbiol. 6:3595–3602.
4. Bremer, E., A. Middendorf, J. Martinussen, and P. Valentin-Hansen. 1990. Analysis of the tsx gene, which encodes a nucleoside-specific channel-forming protein (Tsx) in the outer membrane of Escherichia coli. Gene 96:59–65.
5. Charlton, W., W. Cannon, and M. Buck. 1993. The Klebsiella pneumoniae nifJ promoter: analysis of promoter elements regulating activation by the NifA protein. Mol. Microbiol. 7:1007–1021.
6. Chen, Y., Y. W. Ebright, and R. H. Ebright. 1994. Identification of the target of a transcription activator protein by protein-protein photocrosslinking. Science 265:90–92.
7. Claverie-Martin, F., and B. Magasanik. 1992. Positive and negative effects of DNA bending on activation of transcription from a distant site. J. Mol. Biol. 227:996–1008.
8. Collado-Vides, J., B. Magasanik, and J. D. Gralla. 1991. Control site location and transcriptional regulation in Escherichia coli. Microbiol. Rev. 55:371–394.
9. Cortay, J.-C., D. Negre, A. Galinier, B. Duclos, G. Perrière, and A. J. Cozzone. 1991. Regulation of the acetate operon in Escherichia coli: purification and functional characterization of the IclR repressor. EMBO J. 10:675–679.
10. de Lorenzo, V., S. Wee, M. Herrero, and J. B. Neilands. 1987. Operator sequences of the aerobactin operon of plasmid ColV-K30 binding the ferric uptake regulation (Fur) repressor. J. Bacteriol. 169:2624–2630.
11. De Reuse, H., A. Kolb, and A. Danchin. 1992. Positive regulation of the expression of the Escherichia coli pts operon: identification of the regulatory regions. J. Mol. Biol. 226:623–635.
12. DiRusso, C. C., T. L. Heimert, and A. K. Metzger. 1992. Characterization of FadR, a global transcriptional regulator of fatty acid metabolism in Escherichia coli. J. Biol. Chem. 267:8685–8691.
13. DiRusso, C. C., A. K. Metzger, and T. L. Heimert. 1993. Regulation of transcription of genes required for fatty acid transport and unsaturated fatty acid biosynthesis in Escherichia coli by FadR. Mol. Microbiol. 7:311–322.
14. Gerlach, P., L. Sogaard-Andersen, H. Pedersen, J. Martinussen, P. Valentin-Hansen, and E. Bremer. 1991. The cyclic AMP (cAMP)-cAMP receptor protein complex functions both as an activator and as a corepressor at the tsx-p2 promoter of Escherichia coli K-12. J. Bacteriol. 173:5419–5430.
15. Gralla, J. D. 1991. Transcriptional control—lessons from an E. coli promoter database. Cell 66:415–418.
16. Hamilton, E. P., and N. Lee. 1988. Three binding sites for AraC protein are required for autoregulation of araC in Escherichia coli. Proc. Natl. Acad. Sci. USA 85:1749–1753.
17. He, B., K. Y. Choi, and H. Zalkin. 1993. Regulation of Escherichia coli glnB, prsA, and speA by the purine repressor. J. Bacteriol. 175:3598–3606.
18. He, B., J. M. Smith, and H. Zalkin. 1992. Escherichia coli purB gene: cloning, nucleotide sequence, and regulation by purR. J. Bacteriol. 174:130–136.
19. Heatwole, V. M., and R. L. Somerville. 1992. Synergism between the Trp repressor and Tyr repressor in repression of the aroL promoter of Escherichia coli K-12. J. Bacteriol. 174:331–335.
20. Hengge-Aronis, R., R. Lange, N. Henneberg, and D. Fischer. 1993. Osmotic regulation of rpoS-dependent genes in Escherichia coli. J. Bacteriol. 175:259–265.
21. Henry, M. F., and J. E. Cronan, Jr. 1992. A new mechanism of transcriptional regulation: release of an activator triggered by small molecule binding. Cell 70:671–679.
22. Hopper, S., M. Babst, V. Schlensog, H. M. Fischer, H. Hennecke, and A. Bock. 1994. Regulated expression in vitro of genes coding for formate hydrogenlyase components of Escherichia coli. J. Biol. Chem. 269:19597–19604.
23. Ishihama, A. 1992. Role of the RNA polymerase α subunit in transcriptional activation. Mol. Microbiol. 6:3283–3288.
24. Jennings, M. P., and I. R. Beacham. 1993. Co-dependent positive regulation of the ansB promoter of Escherichia coli by CRP and the FNR protein: a molecular analysis. Mol. Microbiol. 9:155–164.
25. Kasahara, M., K. Makino, M. Amemura, A. Nakata, and H. Shinagawa. 1991. Dual regulation of the ugp operon by phosphate and carbon starvation at two interspaced promoters. J. Bacteriol. 173:549–558.
26. Kukolj, G., and M. S. DuBow. 1992. Integration host factor activates the Ner-repressed early promoter of transposable Mu-like phage D108. J. Biol. Chem. 267:17827–17835.
27. Larson, T. J., J. S. Cantwell, and A. T. van Loo Bhattacharya. 1992. Interaction at a distance between multiple operators controls the adjacent, divergently transcribed glpTQ-glpACB operons of Escherichia coli K-12. J. Biol. Chem. 267:6114–6121.
28. Lee, H.-S., A. Ishihama, and S. Kustu. 1993. The C terminus of the α subunit of RNA polymerase is not essential for transcriptional activation of σ54 holoenzyme. J. Bacteriol. 175:2479–2482.
29. Li, M., H. Moyle, and M. M. Susskind. 1994. Target of the transcriptional activation function of phage lambda cI protein. Science 263:75–77.
30. Lobell, R. B., and R. F. Schleif. 1990. DNA looping and unlooping by AraC protein. Science 250:528–532.
31. Lonetto, M., M. Gribskov, and C. A. Gross. 1992. The sigma 70 family: sequence conservation and evolutionary relationships. J. Bacteriol. 174:3843–3849.
32. Monroe, R. S., J. Ostrowski, M. M. Hryniewicz, and N. M. Kredich. 1990. In vitro interactions of CysB protein with the cysK and cysJIH promoter regions of Salmonella typhimurium. J. Bacteriol. 172:6919–6929.
33. Ninnemann, O., C. Koch, and R. Kahmann. 1992. The E. coli fis promoter is subject to stringent control and autoregulation. EMBO J. 11:1075–1083.
34. Ostrovski de Spicer, P., K. O’Brien, and S. Maloy. 1991. Regulation of proline utilization in Salmonella typhimurium: a membrane-associated dehydrogenase binds DNA in vitro. J. Bacteriol. 173:211–219.
35. Ostrowski, J., and N. M. Kredich. 1991. Negative autoregulation of cysB in Salmonella typhimurium: in vitro interactions of CysB protein with the cysB promoter. J. Bacteriol. 173:2212–2218.
36. Phillips, S. E. V., I. Mansfield, I. Parsons, B. E. Davidson, J. B. Rafferty, W. S. Somers, D. Margarita, G. N. Cohen, I. Saint-Girons, and P. G. Stockley. 1989. Cooperative tandem binding of met repressor of Escherichia coli. Nature (London) 341:711–715.
37. Rabin, R. S., L. A. Collins, and V. Stewart. 1992. In vivo requirement of integration host factor for nar (nitrate reductase) operon expression in Escherichia coli K-12. Proc. Natl. Acad. Sci. USA 89:8701–8705.
38. Ramani, N., L. Hu, and M. Freundlich. 1992. In vitro interactions of integration host factor with the ompF promoter-regulatory region of Escherichia coli. Mol. Gen. Genet. 231:248–255.
39. Rex, J. H., B. D. Aronson, and R. L. Somerville. 1991. The tdh and serA operons of Escherichia coli: mutational analysis of the regulatory elements of leucine-responsive genes. J. Bacteriol. 173:5944–5953.
40. Ross, W., K. K. Gosink, J. Salomon, K. Igarashi, C. Zou, A. Ishihama, K. Severinov, and R. L. Gourse. 1993. A third recognition element in bacterial promoters: DNA binding by the alpha subunit of RNA polymerase. Science 262:1407–1413.
41. Santero, E., T. R. Hoover, A. K. North, D. K. Berger, S. C. Porter, and S. Kustu. 1992. Role of integration host factor in stimulating transcription from the σ54-dependent nifH promoter. J. Mol. Biol. 227:602–620.
42. Sarsero, J. P., P. J. Wookey, and A. J. Pittard. 1991. Regulation of expression of the Escherichia coli K-12 mtr gene by TyrR protein and Trp repressor. J. Bacteriol. 173:4133–4143.
43. Sasse-Dwight, S., and J. D. Gralla. 1990. Role of eukaryotic-type functional domains found in the prokaryotic enhancer receptor factor σ54. Cell 62:945–954.
44. Slany, R. K., and H. Kersten. 1992. The promoter of the tgt/sec operon in Escherichia coli is preceded by an upstream activation sequence that contains a high affinity FIS binding site. Nucleic Acids Res. 20:4193–4198.
45. Su, W., S. Porter, S. Kustu, and H. Echols. 1990. DNA-looping and enhancer activity: association between DNA-bound NTRC activator and RNA polymerase at the bacterial glnA promoter. Proc. Natl. Acad. Sci. USA 87:5504–5508.
46. Tanaka, K., Y. Takayanagi, N. Fujita, A. Ishihama, and H. Takahashi. 1993. Heterogeneity of the principal sigma factor in Escherichia coli: the rpoS gene product, sigma 38, is a second principal sigma factor of RNA polymerase in stationary-phase Escherichia coli. Proc. Natl. Acad. Sci. USA 90:3511–3515.
47. Tardat, B., and D. Touati. 1993. Iron and oxygen regulation of Escherichia coli MnSOD expression: competition between the global regulators Fur and ArcA for binding to DNA. Mol. Microbiol. 9:53–63.
48. Teo, I., B. Sedgwick, M. W. Kilpatrick, T. V. McCarthy, and T. Lindahl. 1986. The intracellular signal for induction of resistance to alkylating agents in E. coli. Cell 45:315–324.
49. Tintut, Y., C. Wong, Y. Jiang, M. Hsieh, and J. D. Gralla. 1994. Core RNA polymerase binding mediated by the strongly acidic hydrophobic repeat region of Sigma-54. Proc. Natl. Acad. Sci. USA 91:2120–2124.
50. Tyson, K. L., A. I. Bell, J. A. Cole, and S. J. Busby. 1993. Definition of nitrite and nitrate response elements at the anaerobically inducible Escherichia coli nirB promoter: interactions between FNR and NarL. Mol. Microbiol. 7:151–157.
51. Vidal-Ingigliardi, D., and O. Raibaud. 1991. Three adjacent binding sites for cAMP receptor protein are involved in the activation of the divergent malEp-malKp promoters. Proc. Natl. Acad. Sci. USA 88:229–233.
52. Wang, Q., and J. M. Calvo. 1993. Lrp, a global regulatory protein of Escherichia coli, binds cooperatively to multiple sites and activates transcription of ilvIH. J. Mol. Biol. 229:306–318.
53. Weickert, M. J., and S. Adhya. 1993. Control of transcription of gal repressor and isorepressor genes in Escherichia coli. J. Bacteriol. 175:251–258.