Recombinational Exchange among Clonal Populations
Chapter 145
ROGER MILKMAN
The structure of genetic variation in a bacterial species is the result of recombination superimposed on the repeated formation and spread of clones. Understanding the dynamics of this variation requires an understanding of both clonality and the processes of recombination, together with an interpretation of the resulting patterns of replacement, i.e., the mosaic patterns of sequences seen when genomes within the species are compared. Clonality is not difficult to define or to describe in the necessary quantitative terms. Recombination processes, molecular and otherwise, have proved over the years to be both subtle and multifarious, yet experimental progress toward their definitive description has been impressive. In fact, it is now practical to apply what we know about recombination in defined systems to the more loosely controlled complexities of recombination in nature: we have a small but substantial body of comparative sequence data that must be explained in terms of clonality and recombination.
Qualitatively, a clone consists of an exclusive common ancestor and all of its descendants. Exclusive means that all the genetic material in all the descendants has come from the common ancestor. Descent is the property by which clonality is defined (2). Genetic identity has in the past been part of the definition (2), but the vastly increased resolving power of sequencing makes it clear that classic genetic markers, identified by morphological and physiological character states, are rather sparsely scattered on the chromosome, so that identity at these markers says little about identity of the genome as a whole. Moreover, the mutant forms of these markers are not likely to be structurally uniform, and they are not even necessarily uniform in their specific functional differences from a standard allele. This brings to mind Walter Fitch’s familiar statement: "Similarity is an observation; homology is a conclusion" (W. M. Fitch, personal communication, 1970; see also reference 73). Further, it is now time, especially in the present context, to recognize the quantitative aspect of clonality. As illustrated in Fig. 1, clonality is a hierarchical relationship, subject in principle to limitless subdivision. Cell A and all its descendants constitute a clone (clone A). Cell B is a member of that clone and is also the ancestor of clone B. Because cell B is more recent than cell A, clone B has a more recent ancestry than clone A, in which it is nested. Any clone therefore has an age, which is the time that has elapsed since its most recent common ancestor existed. In the present discussion, the age of a particular clone, hypothetical or estimated, is often of central importance.
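To make the nesting concrete, here is a minimal Python sketch built on a toy lineage with invented cell names and times. `Cell`, `clone`, `is_nested`, and `clone_age` are hypothetical helpers, and clone age is approximated here as the time elapsed since the founding cell arose.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: cells compare by identity, avoiding recursive field comparison
class Cell:
    name: str
    birth_time: float                      # arbitrary, hypothetical time units
    parent: Optional["Cell"] = None
    children: List["Cell"] = field(default_factory=list)

    def divide(self, name: str, time: float) -> "Cell":
        """Record a descendant cell founded at `time`."""
        child = Cell(name, time, parent=self)
        self.children.append(child)
        return child


def clone(ancestor: Cell) -> List[Cell]:
    """The clone founded by `ancestor`: the ancestor plus all its descendants (depth-first)."""
    members = [ancestor]
    for child in ancestor.children:
        members.extend(clone(child))
    return members


def is_nested(inner: Cell, outer: Cell) -> bool:
    """True if clone(inner) lies within clone(outer), i.e. outer is inner or one of its ancestors."""
    node: Optional[Cell] = inner
    while node is not None:
        if node is outer:
            return True
        node = node.parent
    return False


def clone_age(ancestor: Cell, now: float) -> float:
    """Age of a clone, taken here as the time elapsed since its founding cell arose."""
    return now - ancestor.birth_time


# Toy lineage echoing the A/B nesting described in the text (times are invented).
a = Cell("A", 0.0)
x = a.divide("X", 1.0)
b = x.divide("B", 2.0)
b.divide("C", 3.0)
a.divide("Y", 1.5)

print([c.name for c in clone(a)])               # ['A', 'X', 'B', 'C', 'Y']
print([c.name for c in clone(b)])               # ['B', 'C']
print(is_nested(b, a), is_nested(a, b))         # True False
print(clone_age(a, 10.0), clone_age(b, 10.0))   # 10.0 8.0
```

Walking parent links, rather than searching the descendant list, keeps the nesting test proportional to the depth of the lineage.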
Clonality is closely tied to two other ancestry-based relationships, homology and phylogeny. Homology is the relationship of two biological entities sharing a common ancestry. Phylogeny is the grand scheme of ancestry and descent in the living world. Actually, we are prepared to recognize many subsidiary and occasionally distinct phylogenies. We tentatively expect that all organisms share a common ultimate ancestry, and thus entire chromosomes can be traced to earlier times. In addition, though, we speak separately of chromosome parts: gene phylogeny (20), gene trees (5), (clonal) frameworks (56), clonal frames and clonal segments (49, 51), and families of genes, proteins, and domains (26). (See also references 42 and 90.)
Evolution recognizes two kinds of divergence. One follows the separation of lines of organismal descent, and the other follows the duplication of a gene within a single line of descent. The resultant homologous relationships are called, respectively, orthology and paralogy (20). In addition, we know that some genetic material is transferred across the boundaries of organismal phylogeny. In its new setting, the horizontally transferred DNA is called xenologous (22), and its particular phylogeny is quite distinct. It is thus clear that phylogeny, homology, and clonality all can be described for parts of genomes as well as for entire genomes.
The present paradigm of bacterial evolution is based on the experiments and the resulting periodic selection model of Atwood, Schneider, and Ryan (4). Batch cultures of Escherichia coli were started with a low frequency of a histidineless mutant allele (h−) that reverted to wild type (h+) at a rate near $10^{-8}$ per cell generation; the forward rate of mutation (h+ to h−) was about $10^{-6}$. In the medium used, h+ and h− were selectively neutral. h+ accumulated, but not to the expected equilibrium level of $10^{-2}$; instead, its frequency stabilized near $10^{-6}$. A series of additional experiments confirmed the explanation that somewhere in the genome, a mutation occasionally arose that was favorable in the culture conditions (which, of course, were not identical to the conditions encountered over the evolutionary history of the bacteria used). The favorable mutation occurred almost always in the h− component of the population, which constituted the vast majority. The favorable mutation would then rise in frequency, carrying with it the entire "hitchhiking" genome in these nonrecombining cultures. This resulted in periodic subcloning of the cultures, which could be inferred from fluctuations in the neutral marker frequencies. In the original experiments, waves of "periodic selection" drove the frequency of h+ down. It was suggested that the h− population need not always win; rather, in the long run, a rare h+ cell might occasionally be the lucky recipient of a favorable mutation. (Various subsequent specialized studies confirm this interpretation. For example, Chao and Cox [12] showed that a certain mutator strain would usually replace a normal strain in a chemostat if the mutator strain started at a minimum frequency of $10^{-5}$.) The consequence of periodic selection is the repeated elimination of genetic variation that has accumulated by mutation and random genetic drift. As applied to the worldwide population of the species as a whole, the model suggests that E. coli consists of a set of clones of various ages whose respective common ancestors date back, as a consequence of periodic selection, to times much more recent than the origin of the species. The diversity of the environments in which E. coli lives has not been quantified in a way that allows us to predict precisely the degree and dynamics of genotypic diversity, but it is expected that no single genotype is best for all individuals of the worldwide population. In other words, the clones that arise and spread will ordinarily not reach fixation but will fail to replace certain existing clones completely (74). It was not obvious from the model that any clone would be large enough to identify, but in fact some clones have turned out to be very large indeed. Incidentally, they do not appear to be associated with particular animal hosts (44, 81).
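As a check on the expected level of $10^{-2}$, the following sketch computes the frequency h+ would approach under recurrent two-way mutation alone, using the two rates quoted above and ignoring drift and selection. The function names are hypothetical. The closed form also suggests why the equilibrium was never reached: approaching it takes on the order of $10^6$ generations, so recurrent purges on a shorter time scale can hold h+ far below it.

```python
def equilibrium_frequency(forward_rate: float, reverse_rate: float) -> float:
    """Mutation-only equilibrium frequency of h+ when h+ -> h- occurs at
    `forward_rate` and h- -> h+ (reversion) at `reverse_rate` per generation."""
    return reverse_rate / (forward_rate + reverse_rate)


def frequency_after(p0: float, forward_rate: float, reverse_rate: float, generations: int) -> float:
    """Closed-form result of iterating p' = p(1 - u) + (1 - p)v for `generations` steps."""
    p_eq = equilibrium_frequency(forward_rate, reverse_rate)
    decay = (1.0 - forward_rate - reverse_rate) ** generations
    return p_eq + (p0 - p_eq) * decay


U_FORWARD = 1e-6   # h+ -> h-, per cell generation (from the text)
V_REVERSE = 1e-8   # h- -> h+, per cell generation (from the text)

print(f"{equilibrium_frequency(U_FORWARD, V_REVERSE):.4f}")  # 0.0099
for gens in (10**4, 10**6, 5 * 10**6):
    # starting from no revertants at all
    print(gens, f"{frequency_after(0.0, U_FORWARD, V_REVERSE, gens):.2e}")
```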
The neutralist-selectionist controversy was at its peak in the years immediately following Kimura’s assertion (28, 30) that the observed polymorphism in the amino acid sequences of proteins must be largely neutral, because otherwise the selection necessary to maintain the polymorphisms would cause more deaths than any reasonable reproductive rate could overcome. Selection theory has advanced since then (14, 29, 30, 31, 45, 46), and the known degree of polymorphism has increased vastly with the advent of DNA sequencing, but 25 years ago, as now, an empirical resolution was sought. Many attempts were made to decide the issue by testing an equation put forward by the neutralists, as follows. In general terms, a steady state of any quantity is achieved when the output equals the input. A good measure of the level of genetic variation in a population is the probability of finding different alleles in two successive random draws; we designate this probability $P$. $\Delta P$ is the change in $P$ per generation, and $k_1$ and $k_2$ are first-order rate constants for the interconversion of the two probabilities, $P$ and $1 - P$. Thus,
$$\Delta P = -k_1 P + k_2 (1 - P) \qquad (1)$$
Here $P = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele, and so $1 - P = \sum_i p_i^2$. Also, $k_1 = 1/N_e$, where $N_e$ is the number of genomes behaving independently with respect to extinction (13, 30), and $k_2 = 2u_n$, twice the neutral mutation rate $u_n$, because a new mutation in either of the two successively drawn alleles will make them differ. In the steady state, where $\Delta P = 0$,

$$k_1 P = k_2 (1 - P), \quad \text{so that} \quad \frac{P}{1 - P} = \frac{k_2}{k_1} = 2 u_n N_e \qquad (2)$$

In the parlance of population genetics, the "effective number of alleles," $n_e$, is defined as $1/\sum_i p_i^2$, so that $P/(1 - P) = n_e - 1$, and the familiar form of the equation is thus

$$n_e - 1 = 2 u_n N_e \qquad (3)$$
Since there is always at least one allele at a locus, the proper expression of neutral genetic variation in this equation is indeed $n_e - 1$. $N_e$ can represent various population properties and should be defined with explicit reference to a specific one, as extinction was specified above. In diploids, where $N_e$ represents the number of organisms rather than of haploid genomes, the expression familiar to population geneticists becomes $n_e - 1 = 4N_e u_n$ [from $n_e - 1 = 2u_n(2N_e)$], which again explicitly contains parameters of importance to the issue.
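A minimal sketch of the quantities defined above, using a hypothetical three-allele locus; the helper names are illustrative. It confirms numerically that $P/(1-P)$ equals $n_e - 1$, and `neutral_prediction` returns the steady-state neutral expectation of Eq. (3) for haploids, or $4N_e u_n$ for diploids.

```python
from typing import Sequence


def prob_different(freqs: Sequence[float]) -> float:
    """P = 1 - sum(p_i^2): probability that two random draws carry different alleles."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs: Sequence[float]) -> float:
    """n_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def neutral_prediction(u_n: float, n_eff: float, ploidy: int = 1) -> float:
    """Steady-state neutral expectation for n_e - 1:
    2 u_n N_e for haploids, 4 N_e u_n for diploids."""
    return 2.0 * ploidy * u_n * n_eff


freqs = [0.5, 0.3, 0.2]            # hypothetical allele frequencies at one locus
P = prob_different(freqs)
n_e = effective_number_of_alleles(freqs)
print(round(P, 4), round(n_e - 1, 4), round(P / (1 - P), 4))  # 0.62 1.6316 1.6316
```

An observed $n_e - 1$ computed this way is what the MLEE studies compared against the neutral prediction.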
Attempts to resolve the neutral versus selection controversy by experimental observation initially centered on the use of multilocus enzyme electrophoresis (MLEE) in animals and plants to estimate $n_e - 1$ and determine whether it equaled $4N_e u_n$. The steady-state requirement was initially ignored: for the equation to be valid, the population had to have been at its effective size for (in diploids) at least $4N_e$ generations. (If a smaller value for the effective population size would suffice to demonstrate a clear deficiency of variation, a correspondingly smaller number of generations could be used.)
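The steady-state caveat can be illustrated by iterating Eq. (1) from a population with no variation. This sketch assumes the haploid constants $k_1 = 1/N_e$ and $k_2 = 2u_n$ and hypothetical values $N_e = 1000$ and $u_n = 10^{-4}$, chosen only to keep the loop short. $P$ approaches its steady value with a relaxation time of $1/(k_1 + k_2)$ generations, which is of order $N_e$; with the diploid $k_1 = 1/(2N_e)$ the time scale roughly doubles.

```python
def iterate_eq1(p0: float, n_eff: float, u_n: float, generations: int) -> float:
    """Iterate Eq. (1), P <- P + dP with dP = -k1 P + k2 (1 - P),
    using the haploid constants k1 = 1/N_e and k2 = 2 u_n."""
    k1, k2 = 1.0 / n_eff, 2.0 * u_n
    p = p0
    for _ in range(generations):
        p += -k1 * p + k2 * (1.0 - p)
    return p


N_E, U_N = 1_000, 1e-4             # hypothetical values chosen to keep the loop short
k1, k2 = 1.0 / N_E, 2.0 * U_N
p_steady = k2 / (k1 + k2)          # from k1 P = k2 (1 - P)

for t in (N_E // 2, N_E, 2 * N_E, 4 * N_E):
    fraction = iterate_eq1(0.0, N_E, U_N, t) / p_steady
    print(f"after {t:>5} generations: {fraction:.3f} of steady-state P")
```

Only after several multiples of $N_e$ generations does $P$ come within about 1% of the steady state, which is why a population must have been at its effective size that long before the equation applies.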
E. coli was proposed as an organism that could satisfy the steady-state condition (43); however, Atwood (43) immediately objected on the grounds that the regular recloning in nature inferred from the periodic selection model invalidated the steady-state equation for E. coli. According to this view, if variation is periodically eliminated by clonal selection, the required steady state is never reached (see reference 43 for the discussion that followed). Atwood’s view was subsequently supported by others (32, 36, 37, 40). Meanwhile, an MLEE study of E. coli was undertaken (44). Experiments on five loci in some 800 strains of diverse geographic origins and host sources produced an $n_e - 1$ value 2 or 3 orders of magnitude lower than $2N_e u_n$ (originally given, in error, as $4N_e u_n$), and neutrality was rejected on the grounds that insufficient variation had been observed in a population believed to be at a steady state after many millions of years of evolution (44). However, subsequent experiments by Selander and Levin (81), who used many of the same strains along with others, initially with 20 loci, demonstrated the recurrence of complex genotypes to an extent that revealed sufficient clonal structure to support the idea of recent genome-wide hitchhiking and to invalidate Milkman’s premise of a long-established steady state. Extensive MLEE studies of E. coli strains followed (reviewed in reference 80 and chapter 148). Also, Milkman and Crawford (48) found striking similarities among sequences (900 to 1,600 bp long) compared in the trpCBA region, supporting the existence of very large and widely distributed clones in the species. Three sequences were identical to the K-12 sequence, five differed by 1 nucleotide (each unique), three differed by a uniform set of 10 nucleotides, and one differed by 44 nucleotides. For a set of five properties, four electrophoretic and one of thermostability, no two of the wild K-12-like strains were alike.
Then in 1986, Dykhuizen and Green (19), using the very same strains, reported that sequence variation at the gnd locus contrasted dramatically, in both degree and clustering, with the variation found in trpCBA by Milkman and Crawford (48). At this point, recombination in E. coli in nature at evolutionarily significant frequencies became recognized as a fact of life. Soon after, in 1988, comparative sequencing of phoA revealed short discontinuities within the gene, which were concluded to result from intragenic recombination. Some replacements appeared to come from outside the set of strains being studied, and a contrast in size was noted between the short discontinuous segments and the larger molecules that are transmitted by conjugation or transduction. A possible role was suggested for restriction-modification polymorphism in cutting up the entrant molecules and so producing numerous recombinogenic ends (18, 69). The paradigm of periodic selection thus had to be modified.
In its original form, periodic selection was proposed for "non-sexual bacteria," whose "immediate source of genetic variability resides in the capacity of the existing genotype to mutate, and not in the emergence of recombinant types" (4). The sequence studies show recombinational activity, but not enough to obliterate all evidence of clonality. Thus, the rate and extent of recombinational replacement, together with the rate of recloning, must be consistent with the persistence of sizable regions of uniform or nearly uniform DNA sequence. The origin and massive spread of a clone must occasionally take place quickly enough that perhaps 10% of the individuals in the species contain large stretches of nearly uniform DNA. Within such a clone (Fig. 2), members begin with genome-wide identity. Later, recombinational replacements from outside the clone appear (the size of each replacement is greatly exaggerated in the figure for visual clarity). The replacements, which have their own clonal origins, are called clonal segments. Still later, early replacements may themselves be replaced, entirely or in part, by other segments; the remnants of the ancestral chromosome, while still the major portion, constitute the clonal frame. Eventually, the clonal frame may no longer be recognizable. This process in turn requires both an extension of the clonal concept and a reconciliation of the values of several parameters.
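The erosion of a clonal frame can be pictured with a toy simulation. This sketch assumes replacements of fixed length placed independently and uniformly on a circular chromosome, with hypothetical sizes far smaller than a real E. coli genome; it is not a model of any particular recombination mechanism, and the function names are illustrative. Under these assumptions the surviving fraction of the frame should be about $(1 - L/G)^K$, roughly $e^{-KL/G}$.

```python
import random


def surviving_frame_fraction(genome_len: int, seg_len: int, n_replacements: int,
                             rng: random.Random) -> float:
    """Fraction of a circular chromosome never overwritten by any of
    `n_replacements` segments of length `seg_len` placed uniformly at random."""
    replaced = bytearray(genome_len)              # 0 = still clonal frame
    for _ in range(n_replacements):
        start = rng.randrange(genome_len)
        for offset in range(seg_len):
            replaced[(start + offset) % genome_len] = 1
    return 1.0 - sum(replaced) / genome_len


def expected_frame_fraction(genome_len: int, seg_len: int, n_replacements: int) -> float:
    """Each position escapes one randomly placed segment with probability 1 - L/G."""
    return (1.0 - seg_len / genome_len) ** n_replacements


G, L, K = 20_000, 200, 100                        # hypothetical toy sizes, not E. coli values
rng = random.Random(1)
runs = [surviving_frame_fraction(G, L, K, rng) for _ in range(40)]
print(f"simulated {sum(runs) / len(runs):.3f}, expected {expected_frame_fraction(G, L, K):.3f}")
```

Raising `K` while holding `L` fixed shows the frame shrinking steadily until, as the text puts it, it may no longer be recognizable.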
The initial genome-wide clones, as well as sets of clonal segments covering a given stretch of the chromosome, can be described comfortably in existing terms. As noted previously, a group of entities that share, exclusively and in their entirety, a common ancestor constitutes a clone. A problem arises, however, with the characterization of a group of genomes that merely share a clonal frame. While the clonal frame covers the majority of the chromosome in each genome, it will not be coextensive over the set of genomes and indeed is not likely to be coextensive over any two genomes compared at random from the set. The common clonal frame indicates that these genomes are in cells sharing a recent common cellular ancestor, but their genomic clonality is compromised by the presence of contributions from less closely related cells. It is therefore proposed to call such a set of genomes a meroclone (cf. "merozygote" and "merodiploid"). In the strains we have sequenced comparatively (50), the "red" group, which shares a K-12-like clonal frame, can now be referred to as the K-12 meroclone; the "purple" group, as the ECOR 70 meroclone; and the "green" group, as the ECOR 51 meroclone. A meroclone is thus a group of entities in each of which a majority of the genetic makeup is derived from a given most recent common ancestor. The term may also be useful with regard to relationships inferred from MLEE similarities.
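The majority criterion in the definition of a meroclone can be written as a simple rule. In this sketch each genome is assumed to be already summarized as a string of per-block ancestry labels (how such labels are assigned is outside its scope), and the strains and labels are invented rather than taken from Fig. 5. `frame_label` and `group_meroclones` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, Optional


def frame_label(blocks: List[str]) -> Optional[str]:
    """Return the ancestry label covering a strict majority of blocks, else None."""
    label, count = Counter(blocks).most_common(1)[0]
    return label if count > len(blocks) / 2 else None


def group_meroclones(genomes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group genomes by the ancestry that makes up the majority of each one."""
    groups: Dict[str, List[str]] = {}
    for name, blocks in genomes.items():
        label = frame_label(blocks)
        if label is not None:
            groups.setdefault(label, []).append(name)
    return groups


# Invented per-block ancestry strings (one letter per block), not data from Fig. 5.
toy = {
    "s1": list("KKKKKXKKKK"),
    "s2": list("KKQQKKKKKK"),
    "s3": list("5555K55555"),
    "s4": list("KK55XX7777"),   # no majority ancestry, so no meroclone
}
print(group_meroclones(toy))  # {'K': ['s1', 's2'], '5': ['s3']}
```

Note that s1 and s2 fall in the same group even though their non-frame blocks lie in different places, which is exactly the non-coextensive clonal frame the term is meant to accommodate.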
Reconstruction of the recent microevolution of E. coli requires a quantitative expression of the periodic selection-recombination paradigm, including reconciliation of the following parameter values.
(i) The rate of mutation to a broadly favorable allele (a motivating allele) that carries an entire chromosome to high frequency and thus produces a large new clone. This rate must be very small even compared with the rate of a specific nucleotide substitution (e.g., A→C) at any given site, which is estimated at $10^{-10}$ per cell generation, as will be described shortly. The mutation to a favorable allele must therefore be a coincidence of two or three substitutions or else some other very improbable event. A higher rate would produce many tiny and therefore unrecognizable clones.
(ii) The rate of retained nucleotide substitution. The mutation rate covers insertions, deletions, and small inversions as well as nucleotide substitutions. Drake estimates the mutation rate in E. coli as $4 \times 10^{-10}$ per nucleotide per cell generation (17); the nucleotide substitution rate is taken to be $3 \times 10^{-10}$ per nucleotide (from data in reference 17 and from J. W. Drake, personal communication, 1990). It is tentatively assumed that on average, over the time period considered, each codon has about one neutral alternative (mainly in the third position), with other substitutions being deleterious. Thus, the rate of retained nucleotide substitution is $10^{-10}$ per codon per generation. This rate is the basis of the clock to which all other events are referred.
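The arithmetic behind the figure of $10^{-10}$ per codon can be laid out step by step. The rates are those given above; the assumption that a substitution is equally likely to produce each of the three alternative bases is an added one, used here to make the per-site and per-codon rates explicit. Under it, the same calculation also recovers the $10^{-10}$ figure for a specific change such as A→C.

```python
# Rates from the text; the equal split among the three alternative bases is an added assumption.
SUBST_RATE_PER_NT = 3e-10             # nucleotide substitutions per nucleotide per generation
ALTERNATIVE_BASES = 3                 # each site can change to any of 3 other bases
SITES_PER_CODON = 3
NEUTRAL_ALTERNATIVES_PER_CODON = 1    # "about one neutral alternative" per codon

rate_specific_change = SUBST_RATE_PER_NT / ALTERNATIVE_BASES        # e.g. A -> C at one site
single_step_changes = SITES_PER_CODON * ALTERNATIVE_BASES           # 9 possible one-step codon changes
total_codon_rate = SUBST_RATE_PER_NT * SITES_PER_CODON              # all substitutions in a codon
retained_per_codon = total_codon_rate * NEUTRAL_ALTERNATIVES_PER_CODON / single_step_changes

print(f"{rate_specific_change:.1e} {retained_per_codon:.1e}")  # 1.0e-10 1.0e-10
```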
(iii) The mean selective advantage of a motivating allele. This advantage determines the number of generations required for a single new allele to reach a frequency of, say, 0.1 in the population sampled. Assuming that the world E. coli population consists of $10^{20}$ cells, all susceptible to inclusion in a sample, the motivating allele would thus reach a number of $10^{19}$. A selective advantage $s$ of $10^{-5}$, equivalent to a fitness $W = 1 + s$ of 1.00001, would produce $10^{19}$ cells in some $4.4 \times 10^6$ generations, according to the simple formula

$$N = W^g$$

where $N$ is the final number of individuals, here $10^{19}$; $W$ is fitness (1.00001); and $g$ is the number of generations ($4.4 \times 10^6$). Fitness is taken to be the value averaged over all the experiences along the way, that is, a realized fitness, rather than an a priori value that would still be vulnerable to varying selective conditions.
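The generation count follows from solving $N = W^g$ for $g$, that is, $g = \ln N / \ln W$. A minimal sketch (the function name is hypothetical) reproduces the figure of about $4.4 \times 10^6$; like the formula itself, it treats growth from a single cell as deterministic.

```python
import math


def generations_needed(n_final: float, fitness: float) -> float:
    """Solve N = W**g for g, starting from a single cell: g = ln N / ln W."""
    return math.log(n_final) / math.log(fitness)


g = generations_needed(1e19, 1.00001)
print(f"about {g:.2e} generations")   # about 4.37e+06 generations
```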
This formula assumes that selection operates effectively from the moment the motivating allele arises. In fact, that is not true: any new favorable allele with a selective advantage of less than 0.5 is more likely to be lost than saved. Random genetic drift governs the fate of any new favorable allele until it either becomes extinct or reaches the safe number, which is on the order of $1/s$ and at which the allele is extremely likely to reach fixation (or some stable equilibrium frequency). Of course, the takeover by selective forces is gradual, not abrupt. In the present case, the safe number is about 100,000. The probability of reaching this number by drift alone is on the order of $s$, or 0.00001; the gradually increasing effect of selection makes this probability somewhat greater. Interestingly, the lucky allele’s trajectory is unusual not only in direction but in speed: it reaches the safe number faster than it would if selection were able to determine its rate of increase in the (impossible) absence of drift. The rate of drift is determined here by the number of copies of the motivating allele; the effect of the size of the population is negligible.
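The loss of most new favorable alleles can be illustrated with a standard branching-process idealization, which is an assumption here and not the model used in the text: each carrier leaves a Poisson-distributed number of offspring with mean $1 + s$. The probability that a single copy escapes loss is then the nonzero root of $1 - x = e^{-(1+s)x}$, approximately $2s$ for small $s$, the same order as the value $s$ quoted above. The function name is hypothetical.

```python
import math


def escape_probability(s: float, iterations: int = 200) -> float:
    """Probability that a single new allele is never lost, assuming each
    carrier leaves Poisson(1 + s) offspring: the nonzero root of
    1 - x = exp(-(1 + s) x), found by bisection."""
    m = 1.0 + s

    def f(x: float) -> float:
        # 1 - x - exp(-m x), written with expm1 to avoid cancellation at small x
        return -x - math.expm1(-m * x)

    lo, hi = s / (m * m), 1.0            # f(lo) > 0 > f(hi) brackets the nonzero root
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s = 1e-5
print(f"escape ~ {escape_probability(s):.2e} (2s = {2 * s:.0e}); safe number ~ 1/s = {1 / s:,.0f}")
```

Bisection is used rather than simple fixed-point iteration because the latter converges very slowly when $s$ is as small as $10^{-5}$.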
The foregoing information is necessary for understanding the events surrounding the formation of new large clones by periodic selection. The following parameter value is needed for conversion to real time.
(iv) The mean number of generations per year in the history of living E. coli cells. This number has been estimated at 200 (79; see also chapter 142 in this volume), which means that a clone with fitness 1.00001 could increase to a size of $10^{19}$ individuals in 22,000 years ($4.4 \times 10^6$ generations divided by 200).
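Converting to years makes the sensitivity to $s$ explicit. This sketch uses the 200 generations per year and the target of $10^{19}$ cells given above and, like the formula $N = W^g$, ignores the initial drift phase; the function name is hypothetical. Since $g \approx \ln N / s$, the time scales roughly as $1/s$.

```python
import math

GENERATIONS_PER_YEAR = 200    # estimate quoted in the text
N_FINAL = 1e19                # 10% of an assumed world population of 10**20 cells


def years_to_reach(s: float, n_final: float = N_FINAL,
                   gens_per_year: float = GENERATIONS_PER_YEAR) -> float:
    """Deterministic time for one cell to grow to n_final at fitness 1 + s."""
    generations = math.log(n_final) / math.log1p(s)
    return generations / gens_per_year


for s in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f"s = {s:.0e}: about {years_to_reach(s):,.0f} years")
```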
So far, we have considered parameter values relating to the formation of pure, genome-wide clones in the absence of recombination. In fact, as we have seen, comparative sequencing reveals an advanced state of mosaicism in chromosomes. Both the discontinuous distribution of clonal segments and the results of experiments on recombination demonstrate that no single effective rate of recombination or of recombinational replacement is yet within our grasp. In principle, however, it should be possible to reconcile two measures of the age of a clonal frame: (i) its sequence diversity and (ii) the number of recombinational replacements it contains that belong to one or more categories (classified on the basis of mechanism and/or source), which is discussed in the next section.
In addition to what we might call "normal" sequence variation in the chromosome, there appears to be at least one local "bastion of polymorphism," a region of extraordinarily high variation in and near the gnd region. The rfb gene complex is close to gnd, and selection for polymorphism of the O antigens (57, 74), which rfb determines, may be responsible for the extreme variation observed at gnd (8, 19, 44; see also chapter 148 of this volume) and apparently also (unpublished data; P. R. Reeves, unpublished data) in the his operon, which ends about 2 kb from gnd (76; chapter 110). During successive clonal sweeps, recombinant replacements bearing rare rfb alleles could retain ancient variants of gnd and/or his that would otherwise have been removed. In this case, his and gnd are thought to have been part of genomes whose rfb alleles, ancient or recent in origin, enabled them to compete successfully. They thus maintain sequences that differ far more from one another than the relatively recent variants characteristic of most of the genome. The size of the initial recombinant replacements will have influenced the chromosomal extent of the unusual level of polymorphism in the region, adding another element of interest to the recombination studies to be discussed below.
On another note, local polymorphism is evidently not increased near all targets of enemy fire: tonB, which acts as a receptor for phage T1, phage φ80, and colicin VB (66), does have structural variants (see Fig. 4), but the variation in the surrounding region is nothing like that seen at gnd.
In 1986, the 72-strain ECOR collection of naturally occurring variants of E. coli (55) was established, encouraging experiments on common ground. A number of these strains have been used in the restriction fragment length polymorphism and sequence studies to be described here (8, 19, 23, 49, 50, 53, 54).
A 12.7-kb stretch of DNA has been sequenced, beginning at the very end of trpD and running through trpC, trpB, trpA, the trp terminator region, seven open reading frames (65, 87), tonB (64), attB φ80 (66), another open reading frame, kch (which encodes a homolog of eukaryotic potassium channel proteins [47]), and cls (87a). Figure 3 illustrates the region in which the comparative sequencing has been done, placed on the physical coordinates of Kohara et al. (33) by Rudd (76; chapter 109), with approximate, normalized locations taken from the genetic map (6; chapter 133). Other illustrated elements, to be considered later, include a set of 1.5-kb PCR fragments that were digested with restriction enzymes before sequencing and were also used in the analysis of recombinational replacement in transductants. PCR fragments are amplified locally from genomic DNA (for example) by using oligonucleotide primers complementary to the two respective strands at a desired distance apart (77). Figure 4 lists the polymorphisms revealed by sequencing K-12 W3110 (see chapter 133) (specifically, strain 414 of the Irving Crawford collection, currently maintained by David Essar at Winona State University, Winona, Minn.) and 40 of the 72 ECOR strains. The polymorphic sites are listed vertically and the strains horizontally, so that the nucleotide each strain carries at a given site can be compared. Two sets of GenBank files now exist, covering positions 3629-8888 (accession numbers U23489-U23500, U25417-U25423, and U25425-U25429) and positions 8889-16350 (accession numbers U24195-U24206).
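Extracting the polymorphic sites for a Fig. 4-style listing from aligned sequences is straightforward. The sketch below assumes the sequences are already aligned to equal length and treats every character, including any gap symbol, as a state; the alignment is an invented example, and `polymorphic_sites` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def polymorphic_sites(alignment: Dict[str, str], first_position: int = 1
                      ) -> List[Tuple[int, Dict[str, str]]]:
    """Return (position, {strain: base}) for every aligned column that varies."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    sites = []
    for i in range(lengths.pop()):
        column = {strain: seq[i] for strain, seq in alignment.items()}
        if len(set(column.values())) > 1:
            sites.append((first_position + i, column))
    return sites


# Invented 10-bp alignment; positions numbered from 3629 only to echo the text's coordinates.
toy = {"K-12": "ACGTACGTAC", "strainA": "ACGTTCGTAC", "strainB": "ACCTTCGTAA"}
sites = polymorphic_sites(toy, first_position=3629)
print("pos  " + " ".join(toy))
for pos, column in sites:            # one row per site, strains side by side
    print(pos, " ".join(column[s] for s in toy))
```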
As noted, there are three major meroclones (K-12, ECOR 70, and ECOR 51), each unified both by a clonal frame characterized in the DNA sequences and by the chromosomally more widely distributed restriction fragment length polymorphism studies (49). These three meroclones correspond to MLEE groups A, B1, and B2, respectively (25; chapter 148). Of the other sequenced strains, ECOR 49 and ECOR 50 are nearly identical, and ECOR 40 is nearly identical, in both restriction digests and MLEE, to ECOR 38, ECOR 39, and ECOR 41, which have not been sequenced. The remaining sequenced strains do not show extensive affinities but frequently share individual clonal segments and occasionally share intermittent sets of them. The data for the first 10.4 kb in Fig. 4 are summarized horizontally in Fig. 5 as follows. Each symbol stands for 50 bp of a given sequence type, and identical sequences share the same symbol. To be identified as different from a given sequence type, a particular stretch of DNA must differ from it at three or more nucleotides and at more than 1% of its nucleotides.
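The rule for calling a stretch different from a sequence type can be expressed as a predicate. This is a sketch of the stated thresholds only; how stretches are delimited for Fig. 5 is not reproduced, and `differs_from_type` and the toy `variant` helper are hypothetical. For short stretches the three-nucleotide minimum is the binding condition, while for stretches of about 300 bp or more the 1% condition is.

```python
def differs_from_type(stretch: str, seq_type: str,
                      min_differences: int = 3, min_fraction: float = 0.01) -> bool:
    """Apply the stated rule: at least three differing nucleotides AND
    more than 1% of the stretch (both thresholds must be met)."""
    if len(stretch) != len(seq_type):
        raise ValueError("compare aligned stretches of equal length")
    diffs = sum(a != b for a, b in zip(stretch, seq_type))
    return diffs >= min_differences and diffs > min_fraction * len(stretch)


def variant(length: int, n_diffs: int) -> str:
    """Toy stretch differing from 'A' * length at its first n_diffs positions."""
    return "C" * n_diffs + "A" * (length - n_diffs)


for length, n in ((50, 2), (50, 3), (1000, 5), (1000, 11)):
    print(length, n, differs_from_type(variant(length, n), "A" * length))
```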
The detailed comparisons in Fig. 4 illustrate both the clustering of sequence types and the mosaic pattern of variation. For example, between positions 4674 and 4947, the third large group contains 19 unanimous deviations from consensus. Polymorphic sites in sequences of 10 strains are illustrated (the first column describes four sequences that are almost identical over the entire 12.7-kb range). These 10 strains are part of the ECOR 51 meroclone, whose clonal frame is seen in ECOR strains 51, 52, 54, and 56 over the entire illustrated sequence. In Fig. 5, the clonal frame sequence is given the symbol 5. In Fig. 4, the leftmost large group shares a different sequence with K-12, which is given a separate column because it is the main reference strain for the species. Nevertheless, both K-12 and the leftmost large group belong to the K-12 meroclone, whose clonal frame (K in Fig. 5) is seen over the entire sequence in eight strains. The ECOR 70 meroclone is displayed between K-12 and ECOR 51. The first five columns of this group (plus the last column) differ from the two other groups described, but in this region the next four columns appear to have K-12 DNA, which is interpreted as a clonal segment. The ECOR 70 clonal frame is shown in Fig. 5 as 7, and the K-12-like clonal segment as K. Moving past the ECOR 51 meroclone, one notices that ECOR 40 and ECOR 35 both resemble ECOR 51 in great detail over a short distance. ECOR 40 diverges first, and ECOR 35 shortly thereafter; the two then become identical for a few hundred bases. Finally, returning to the ECOR 51 meroclone and moving upward to lower position numbers, one sees that strains 61 and 62 differ strikingly from the rest of the meroclone between positions 4419 and 4517 and less dramatically a little further in both directions. This clonal segment, shared between these two very similar strains, is obvious in Fig. 5 as a string of Xs extending back to the beginning of the sequence, reflecting differences that can also be noted in Fig. 4.
Several points can be made. First, K-12 happens to have a fairly common sequence. Given, as we now see, that there are common sequences, it is not surprising that a strain chosen at random for experiments should have one. Second, K-12, like every other strain, occasionally deviates from the consensus. Indeed, it is to be expected that the K-12 strain will in some part of the chromosome carry a replacement from a different clone. Third, ECOR 70 (together with the very similar ECOR 71) does not display a unique clonal frame extending over the entire sequenced region, although K-12 and ECOR 51 do. There is no reason that it necessarily should. The perspective of a broader sample of chromosomal regions, afforded by restriction analysis, shows ECOR 70 and ECOR 71 as well as some frequently similar strains to be quite different from K-12 in some places (49). In any event, it appears that members of the K-12 meroclone recombine frequently with the ECOR 70 meroclone. Moreover, K-12 appears likely to be the more frequent donor. This is evidenced by the distribution of sequence types in Fig. 5; by the fact that in groups classified by restriction analysis (49; unpublished data), K-12 is generally in a larger group than ECOR 70 and ECOR 71; and by the larger size of the A MLEE group of 25 ECOR strains compared with the 16-strain B1 group (25; see chapter 148 of this volume). This apparent recombination polarity may be due to the K-12 meroclone’s greater numbers (now or historically) or to an asymmetry in the sets of restriction-modification systems present in the two groups (i.e., whatever the variation within each group, K-12 may match and exceed the usual restriction-modification systems of ECOR 70 and thus transfer more than it accepts; see Analysis below).
The foregoing examples illustrate some of the rewards and limitations of comparative sequencing: the rewards are in the detail; the limitations are in the chromosomal extent and the number of isolates that can be studied. Restriction mapping and MLEE can go far to alleviate these limitations; clearly, certain questions require the detail of sequencing, but where to sequence can often be determined from information at lower levels of resolution. It is likely, too, that an expanded ECOR set, containing some new strains deliberately chosen for possibly more distant phylogenetic positions, would extend our view of the variety contained in the species and shed some light on the provenance of clonal segments that differ from the others by as much as 24% over a couple of hundred nucleotides (see the ECOR 64 sequence starting at position 8893).
The obvious pattern of variation among these sequences bears out the paradigm of periodic selection as modified by recombination. Three groups of strains are characterized by their respective sequence types, which constitute either the entire 12.7-kb stretch or a major part of it. Each group thus has a common clonal frame that is descended from a common ancestor. During its descent, the clonal frame acquired and retained nucleotide substitutions, which do not compromise its clonal relationship. Also, the frame may have been replaced locally by segments of DNA with a different clonal affinity. These clonal segments may be identified with corresponding regions of other clonal frames, to whose clone they therefore belong, or they may bear no immediate clonal relationship to other sequences in the sample. Given the hierarchical nature of clonal relationships, it is not surprising that (with a very few local exceptions) all the sequences in the sample appear to be part of an older, larger clone: that is, they do not differ from one another beyond recognition. They are clearly homologous.
Within a meroclone, it is of interest to compare the rates of divergence by nucleotide substitution and by recombinational replacement. For example, ECOR strains 70 and 45 can be compared. Starting with Fig. 5, one finds that a minimum of four recombinational replacements in ECOR 45 are needed to explain the segmental differences over the 10.4-kb stretch. The first might have run from position ∼4200 (a relatively sharp border) to roughly position 6000; it replaces three clonal segments in ECOR 70 with K-12 DNA. The complexity in the replaced DNA is assumed here to predate the formation of the meroclone. The three additional replacements would run from approximately positions 10600 to 11100, 11800 to 12150, and 12900 to 13100. Nucleotide substitutions are pertinent only in the unrecombined region, which can be roughly estimated at 6,750 bp in aggregate. Twelve nucleotide substitutions are found there, which corresponds to 18 in 10,400 bp. The ratio of 4 recombinational replacements to 18 nucleotide substitutions is remarkably (and doubtless fortuitously) close to the estimate of 0.2 made previously (49). A further comparison, by the same procedure, of K-12 W3110 strain 414 with each other member of the K-12 meroclone gave a ratio of 1 recombinational replacement to 4 nucleotide substitutions. (The comparisons exclude cryptic recombinational replacements between elements of the clonal frame that might affect the sequence subtly.)
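The scaling and ratio in this paragraph can be checked in a few lines. The sketch assumes, as the calculation implicitly does, that substitutions accrue uniformly along the sequence, so that the count in the unrecombined region can be scaled to the full compared length; the constant names are illustrative.

```python
SUBSTITUTIONS_FOUND = 12       # in the unrecombined region
UNRECOMBINED_BP = 6_750        # rough aggregate length of that region
COMPARED_BP = 10_400           # length of the compared stretch
REPLACEMENTS = 4               # minimum number of recombinational replacements

scaled = SUBSTITUTIONS_FOUND * COMPARED_BP / UNRECOMBINED_BP   # about 18.5, quoted as 18
ratio = REPLACEMENTS / round(scaled)
print(f"{scaled:.1f} substitutions per {COMPARED_BP} bp; ratio {ratio:.2f}")  # 18.5 substitutions per 10400 bp; ratio 0.22
print(f"K-12 meroclone comparison: {1 / 4:.2f}")               # 1 replacement per 4 substitutions
```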
These estimates are offered merely as an entrée to consideration of the role of recombination in the evolution of the chromosome. In particular, the present example raises the issue of whether the several replacements (and others further along the chromosome) are all independent or whether they might involve a cascade of discrete chromosomal incorporations from a single entrant molecule. Could they actually be parts of a single, already mosaic replacement? Each of these questions is the subject of an initial study described below (in Analysis of Transductants), one experimental and the other a simulation. To return briefly to Fig. 5, the symbols other than those representing clonal frames (K, 7, 5) are to some extent arbitrary. For example, we cannot know whether the clonal segments P and Q in ECOR strains 15, 16, and 24 come from a common source, so they have separate symbols. Similarly, it is not known whether all sequences represented by B are derived from a common sequence type; here, it seemed useful to emphasize a common set of affinities with ECOR 50. Finally, C is used to represent the local consensus sequence, which does not correspond over a long stretch to that of any particular strain.
The nucleotide substitutions and amino acid replacements listed in Fig. 4 are summarized in Table 1. Other variations are listed in Table 2, which describes small and large insertion/deletions (indels); the insertion of a partial-to-complete version of Atlas, a lambdoid phage (11; A. Stoltzfus, Ph.D. thesis, The University of Iowa, Iowa City, 1990; see also chapter 113 of this volume); and the nonhomologous substitution of IS1k (88) adjacent to Atlas.
Table 1. Substitutions between positions 3629 and 14031, inclusive
Table 2. Deletions, insertions, and nonhomologous replacements
These data are best understood in the context of the several major techniques for the analysis of genomic variation in E. coli. The term genomic is used rather than the term genotypic to emphasize two evolutionarily important aspects of genetic variation that are now accessible to investigation: (i) the fine structure of the chromosome (the nucleotide level) and (ii) its relation to chromosomal position (34). Precise chromosomal locations are now afforded by physical maps (76; see also chapter 109 of this volume) and data banks (see references 21 and 35 among others), which are now rapidly being enriched by the impressively advancing systematic sequencing of the entire genome (10, 16, 62, 91). For example, MLEE, whose massive results initially contained a high proportion of spatially anonymous loci, can now be referred to an increasingly detailed chromosomal map. Only the random amplified polymorphic DNA technique remains unanchored at present, though it can claim greater resolution than MLEE (9, 89) via the restriction analysis of blindly generated PCR fragments. The number of character states distinguished per locus by MLEE is not great, but the product of this number and the number of loci is easily sufficient to identify significant phenotypic groupings.
The purposes of these several techniques are, first, to compare and classify isolates on a large scale. This remains the province of MLEE (80, 81; see also chapter 148 in this volume) in E. coli as well as many other bacterial species, in which serotyping (1, 57, 58) also plays a substantial role. The considerable agreement between classifications of ECOR strains based on MLEE and sequencing makes sense in terms of meroclones, which represent real biological groupings, though phylogenetically compromised. A second purpose is to infer evolutionary dynamics and to establish the physical dimensions of the structural participants in these dynamics. Comparative DNA sequencing offers the greatest information density and estimates the parameter values that are directly related to mutation rates, permitting the development of paradigms based on local observations, but it is unlikely that comparative sequencing will be able to cover a large number of sizable chromosome regions in the near future. This bears on the third purpose, which is to establish a coherent genome-wide picture of the evolutionary process. That there is a genome-wide picture to be discerned and understood is clear (34). Restriction analysis, as a means of surveying a small proportion of sequence variation, therefore represents a useful intermediate (for investigations of 50 to 100 isolates) that requires less effort than sequencing but provides more detail, and in this context more applicable detail, than MLEE or random amplified polymorphic DNA.
Two likely natural recombination processes in E. coli are transduction (75) and conjugation (88). The next step in the reconstruction of the recent evolution of the E. coli chromosome is the definition of the role of recombination in determining the mosaic structure of the DNA sequence variation in E. coli. Clearly, any observed mosaic structure is simply a snapshot of an ongoing process in which a chromosome is carried to high frequency and undergoes a succession of overlapping replacements that eventually reduce the extent of the original sequence type and obscure its ancestral position, even as a new composite chromosome is rising to such prominence that its jumble of sequence types is redefined (in our eyes) as a single sequence type and clonal frame.
The details (rate, physical extent, and pattern) of the recombinational replacements are critical to the description of this process. For example, in the unreplaced regions of ECOR 70 and ECOR 45 referred to above, the 12 nucleotide substitutions observed in some 6,750 bp indicate a divergence time of about 9 × 10⁶ generations. The four (minimum) observed recombinational replacements then suggest at least two replacements per line per 10 kb in 9 × 10⁶ generations. This means about 1,000 replacements per genome in this time, or roughly 100 per 10⁶ generations. From one perspective, that is a minimal estimate: some replacements (those from close relatives) might be invisible. From another perspective, the replacements might be accounted for by a far smaller number of entrant DNA molecules, and thus a far smaller number of basic recombinational events. A cascade of small, discontinuous replacements from one large donor fragment of a single sequence type would result in a simple series of segments of that type. On the other hand, the incorporation of an already mosaic large fragment (itself the result of a progression of overlapping replacements) could produce a variety of small segments, far more numerous than the successive events that produced the mosaic. The range of possibilities is great because of the number of important causal variables, including the rates of conjugation and transduction, the proportion of transmissions complicated by restriction (that is, the incorporation of many small fragments of a single entrant DNA molecule due to digestion by nucleases), the relative recombinational accessibility of various strains to one another, and the size distribution of the entrant molecules. It is therefore of great interest to try to estimate the values of these variables.
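The scaling from a single 10-kb window to the whole chromosome can be written out as a back-of-envelope calculation. In this Python sketch, the genome size of about 4,600 kb is an assumption (an approximate E. coli chromosome length, not a value given in the text); the other inputs are the figures above. Following the text, the four replacements are divided between the two diverging lines of descent and the 10.4-kb stretch is treated as roughly 10 kb.

```python
# Back-of-envelope scaling of the replacement rate for ECOR 70 vs ECOR 45.
GENOME_KB = 4_600             # ASSUMPTION: approximate E. coli genome size, not from the text
WINDOW_KB = 10                # the 10.4-kb stretch, treated as ~10 kb as in the text
DIVERGENCE_GENERATIONS = 9e6  # divergence time inferred from the substitutions
REPLACEMENTS_MIN = 4          # minimum replacements observed in the window
LINEAGES = 2                  # the pair diverged along two lines of descent

# About two replacements per line per 10 kb.
per_line_per_window = REPLACEMENTS_MIN / LINEAGES
# Extrapolate from one window to the whole chromosome (assumes a uniform rate).
per_genome = per_line_per_window * GENOME_KB / WINDOW_KB
# Express the genome-wide total per million generations.
per_million_generations = per_genome / (DIVERGENCE_GENERATIONS / 1e6)

print(f"replacements per genome in the divergence time: {per_genome:.0f}")
print(f"replacements per 10^6 generations: {per_million_generations:.0f}")
```

The result, roughly 900 to 1,000 replacements per genome and about 100 per 10⁶ generations, matches the text's estimate. The uniform-rate extrapolation is itself an assumption, since local variation in recombination is discussed later in the chapter.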
In trying to estimate these values, the analysis of individual P1 transductants involving ordinary ECOR donors and K-12 recipients (42a, 48a) is informative. The recipient strain, K-12 W3110 trpA33, also happens to be a λ lysogen: λ evidently infected λ⁻ W3110 at the same time trpA33 was transduced via P1 from Ymel (C. Yanofsky, personal communication, 1994), which is λ⁺ (chapter 133). This is of interest because it is representative of the situation in wild E. coli strains, of which 98% (405 of 414) were found to be resistant to λ (78). The transductants to be described were selected for Trp⁺ and for no other property.
Eighteen ECOR 47 → K-12 transductants were analyzed by restriction digests of PCR fragments over a discontinuous 40-kb range (Figs. 3 and 6), and eight of them contained more than one discrete replacement (from two to five). Most replacements were well below 10 kb in length.
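Counting discrete replacements from restriction data amounts to finding runs of donor-type sites along the chromosome. The Python sketch below is a minimal illustration of that bookkeeping, not the procedure used in the studies cited. It assumes the input has already been restricted to informative sites (where donor and recipient differ), sorted by position; the positions and calls in the usage example are hypothetical. Because a replacement's true borders lie somewhere between its outermost donor-type site and the adjacent recipient-type site, each run is only a minimal extent.

```python
from typing import List, Tuple


def donor_runs(positions: List[int], is_donor: List[bool]) -> List[Tuple[int, int]]:
    """Return (first, last) site positions for each maximal run of donor-type sites.

    Assumes only informative sites are included and that they are sorted by
    chromosomal position. Each run is a minimal estimate of one replacement.
    """
    if len(positions) != len(is_donor):
        raise ValueError("positions and is_donor must have the same length")
    runs: List[Tuple[int, int]] = []
    start = None  # position of the first donor-type site in the current run
    prev = None   # position of the previous site examined
    for pos, donor in zip(positions, is_donor):
        if donor and start is None:
            start = pos                  # a new donor-type run begins
        elif not donor and start is not None:
            runs.append((start, prev))   # run ended at the previous site
            start = None
        prev = pos
    if start is not None:
        runs.append((start, prev))       # run extends to the last site examined
    return runs


# Hypothetical transductant scored at eight informative sites.
sites = [-20000, -15000, -9000, -4000, 1000, 6000, 12000, 18000]
calls = [False, True, True, False, True, False, False, True]
replacements = donor_runs(sites, calls)
print(len(replacements), "discrete replacements:", replacements)
```

A transductant with more than one run would count as a case of multiple incorporation in the sense used here.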
A striking difference was observed when two sets of 15 (ECOR 47 → K-12) → K-12 backcrosses (back-transductants) were analyzed. All 15 back-transductants of 47K9 were identical to the donor at all restriction sites from –26000 to 5114 shown in Fig. 6 (note that 47K9 and K-12 are alike at site 5114). In the other set, 47K4 was the donor, and 13 progeny were identical to 47K4 from +10877 (where 47K4 and K-12 are alike) to –14388. One was donor type from at least +10238 to –8811, and one was donor type from +9543 to +578. The likely explanation of the contrast between transductants and back-transductants is that the donor DNA has been restricted in the transductants but not in the back-transductants. The locations of all the restriction-modification genes in K-12 are believed to be known (27), and they are out of range of cotransduction with the wild type of trpA33, the recipient’s marker. The hsd and mcrBC loci are some 29 min away, and mcrA is associated with an excisable element, e14, ordinarily about 107 kb away from trpA33. Thus, the transductants presumably have the recipient’s entire complement of restriction-modification genes, and the back-transductants should therefore be unrestricted. (Crosses to a restrictionless derivative of the recipient made by Elisabeth Raleigh of New England BioLabs support this view.)
Two other donor strains, ECOR 27 and ECOR 65, also show some evidence of restriction, with one and four cases of multiple incorporation, respectively, in sets of 15 transductants each; all the replacements are once again relatively small compared, for example, with those in the ECOR 47 back-transductants. The immediate implication is that restriction-modification systems are extensively polymorphic in natural populations of E. coli, and there is already independent evidence of considerable natural polymorphism in these systems (7a). Indeed, the following strains all differ in their restriction-modification systems: K-12, B, ECOR 1 (cited in reference 15 as RM74A), E166, 15T⁻, and A58 (15, 82). The first three of these strains cluster with the A MLEE group of ECOR strains (25). Also, E. coli C has no known restriction-modification system (15). Such polymorphism predicts that transductional replacements will often (perhaps most of the time) be small and variable in extent and that they will not infrequently occur in discrete sets originating in a single event. Interestingly, some early transduction experiments foretold a similar role for restriction endonucleases, though in different terms (3, 24, 50, 61).
These experiments were motivated by the general observation of sequence mosaicism (Fig. 5) and particularly by the presence of a set of discrete K-12-like segments in ECOR 37, the ECOR strain most distant from K-12 and the strain most like ECOR 2 (25; see also chapter 142 in this volume). It seemed unlikely that two so distant strains would have recombined so frequently over the entire genome; an alternative possibility is that the set of replacements arose in a single cascade resulting from a relatively infrequent recombination event. This scenario would predict that K-12-like segments would not be found in most other regions of the ECOR 37 chromosome.
Another approach to the recombinational basis of the observed mosaic sequence pattern is the simulation of repeated randomly placed (and therefore frequently overlapping) replacements. A computer simulation program that displays a 100-unit window of a given sequence type and modifies it by successive replacement has been developed. A population of up to several hundred "individuals," starting with up to six different uniform sequence types, undergoes recombination as follows: an individual can be a recipient at one time and a donor at another; the size of the donor segment is determined for a given run by specifying the probability that it will completely cover the window; a donor and a different recipient are chosen at random; and the donor segment replaces the recipient segment either totally or partially (in the latter case either a left end or a right end is placed randomly in the window). If there are 120 individuals (the six sequence types can be in any frequency) and 2,400 events, each individual will have acted as a potential donor an average of 20 times and as a potential recipient 20 times. The likelihood of any donor-recipient pair actually recombining can be specified by a transmission coefficient between 0 and 1. This process suffices, over a broad range of conditions, to produce a mosaic pattern like that observed in compared sequences (47a). A unit may be taken to represent any number of base pairs (thus, a single run can be used to simulate recombination in stretches of various lengths). Although the patterns become complex, some are often found in several individuals at any given point in a run, as would be expected when complete replacements duplicate a particular stretch of donor DNA.
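The replacement scheme just described can be sketched in a few lines of Python. This is not the author's program (47a); it is a re-creation under stated assumptions. The parameter names, the default probability of full coverage (p_full = 0.3), the round-robin starting frequencies, and the uniform placement of a partial segment's end within the window are all illustrative choices. The population size, number of sequence types, number of events, and transmission coefficient follow the example in the text.

```python
import random
from collections import Counter
from typing import List

WINDOW = 100  # units in the window; a unit may stand for any number of base pairs


def simulate(n_individuals: int = 120, n_types: int = 6, n_events: int = 2_400,
             p_full: float = 0.3, transmission: float = 1.0,
             seed: int = 1) -> List[List[int]]:
    """Return the final population as lists of per-unit sequence-type labels."""
    rng = random.Random(seed)
    # Each individual starts as one uniform sequence type (round-robin assignment).
    pop = [[i % n_types] * WINDOW for i in range(n_individuals)]
    for _ in range(n_events):
        # Choose a donor and a different recipient at random.
        donor, recipient = rng.sample(range(n_individuals), 2)
        if rng.random() >= transmission:
            continue  # this pair does not actually recombine
        if rng.random() < p_full:
            lo, hi = 0, WINDOW                        # donor segment covers the whole window
        elif rng.random() < 0.5:
            lo, hi = rng.randrange(WINDOW), WINDOW    # segment's left end falls in the window
        else:
            lo, hi = 0, rng.randrange(1, WINDOW + 1)  # segment's right end falls in the window
        pop[recipient][lo:hi] = pop[donor][lo:hi]     # slicing copies, so no aliasing
    return pop


def segments(ind: List[int]) -> int:
    """Number of contiguous runs of a single sequence type (1 means unmixed)."""
    return 1 + sum(1 for a, b in zip(ind, ind[1:]) if a != b)


if __name__ == "__main__":
    pop = simulate()
    print("mean segments per individual:", sum(map(segments, pop)) / len(pop))
    carriers = Counter(map(tuple, pop)).most_common(1)[0][1]
    print("most common full-window pattern is carried by", carriers, "individuals")
```

Counting identical full-window patterns shows the effect noted in the text: when a donor's whole window is copied, the same mosaic appears in several individuals. Setting transmission to 0 leaves every individual as a single uniform segment.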
A small number of comparative sequencing studies have been undertaken over the past few years, motivated at least in part by Dykhuizen and Green’s demonstration in 1991 that phylogenies vary between chromosomal positions (19). One of the earliest studies, already referred to, stated in clear detail the questions that lay ahead (18). Table 3 summarizes several studies that show that the degree of sequence polymorphism varies locally; that synonymous substitutions predominate; and that, in agreement with the current paradigm, the MLEE phenogram is frequently but not always matched locally. Where the amassed information in databases suffices (enough variation, enough strains, enough sequence length), evidence for recombination emerges. The data sets are grouped by common investigators, since the strains and approaches used are generally the same.
Table 3. Comparative sequencing studies
The large discrepancy between the two gnd samples is due largely to the presence, in the extremely polymorphic case, of ECOR 4 and ECOR 16. These strains differ from K-12 and from one another as much as either of them differs from Salmonella sp. strain LT-2. The strains used in the two studies are mutually exclusive, but both sets show more polymorphism than do any of the other studies. Because the ECOR strains do fall into some fairly closely related groups, the comparison of mean pairwise differences between studies is not always informative. Nevertheless, both in the mosaic character of some of the sequences and in the occasional local variations in phylogeny, these studies support the general view of clonality modified, though not beyond recognition, by recombination. This view is expressed in each of the papers cited in Table 3.
Although comparative sequence data for more than two strains are not yet available for any region within a continuous stretch covering about half the chromosome, a number of comparisons of E. coli K-12 and B have been analyzed by Sharp (summarized in Fig. 2 of reference 82), and these comparisons include some from this otherwise unsampled half of the chromosome. A portion of the data (Sharp, personal communication, 1994) is presented in Table 4. These data are consistent with expectations, including some abrupt changes in similarity within the phn operon. Both strain B and strain K-12 are classified with the A group of ECOR strains (25) and thus presumably with the K-12 meroclone. Eighteen of the sequences differ by less than 1%; five more differ by less than 2%; eight more differ by 2 to 3%; and the remaining two differ by 4 to 6%. This distribution of differences is not unlike that seen in restriction analyses (49) of K-12 and ECOR 24, which are comparably distant in the A MLEE group.
Table 4. Sequence comparisons for E. coli K-12 versus B/r
While genome-wide sampling by MLEE and local sequence comparisons in the 1- to 10-kb sequence range have told us a great deal about the genetic architecture of E. coli, intraspecific sequence variation on an intermediate chromosomal scale has yet to be studied systematically. This study requires numerous sequence analyses over a considerable distance, and even though they need not be continuous, such analyses will take a great effort. The few indications of nonuniformity at present include, as mentioned earlier, extreme polymorphism in the vicinity of gnd; evidence of greatly increased recombination near the terminus of replication (38), which is likely to increase local sequence variation; and possibly increased polymorphism with distance from the origin of replication, suggested by interspecific variation patterns (83).
Another comparative study of considerable importance is the analysis of four genomic cleavage maps of six E. coli K-12 strains (60), namely, MG1655, two close relatives (W1485 and W3110), and three others. Numerous indels are described for these strains, "some of which are separated by numerous steps of mutagenic treatment" (60). It is hard to extrapolate these findings to ECOR strains, whose clonal frames have been separated for perhaps 100 × 10⁶ generations (as opposed to fewer than 10⁵ generations) and which may have lived under a greater diversity of selection than have the respective laboratory strains, though perhaps without extreme mutagenesis. Nevertheless, overall-length polymorphism among wild genomes is well known, and substantial chromosomally local variation should be anticipated. Conversely, comparative sequencing should be extended in moderate amounts to several K-12 strains to determine whether their typical pairwise variation is really ≤0.1%, as is often assumed.
Up to now, the mechanistic analysis of bacterial recombination has exploited strains thought to differ only in a set of genetic markers or occasionally in a gene rearrangement designed to reveal and localize processes such as crossing over and branch migration. Physical analysis using radioactive and density isotopes has also centered on a given genetically uniform strain. K-12 strains have been prominent in this work (59, 78, 85). The vast complexity of recombinational mechanisms, consisting of sets of participating proteins often having more than one biochemical function (39), has been enough to handle without the addition of supervening factors of strain differences that have nothing to do with the intrinsic process of recombination at the molecular level but can modify its results dramatically.
There is as yet no clear picture of the physical distribution of donor and recipient DNA in recombinant E. coli chromosomes. Experiments directed at understanding the formation and structure of recombinant DNA have used physical labels (heavy or radioactive isotopes). With respect to both transduction and conjugation, however, the results of physical and genetic studies are not yet definitively reconciled (63). The experiments presented in this chapter did not use isotopes but instead exploited sequence differences between donor and recipient to provide very high resolution of parental DNA segments in the recombinants. The effectiveness of this approach, however, depends upon controlling for two supervening effects that come with crossing different strains. One is restriction, which can cut up incoming DNA, and the other is sequence difference, which (at least at high mismatch levels) creates a barrier to recombination; the barrier is removed by the elimination of mismatch repair (41, 52, 70, 71, 72). The design of experiments involving the relief of the mismatch barrier to recombination between E. coli and Salmonella typhimurium (official designation, Salmonella enterica serovar Typhimurium) has completely (72) or largely (41) excluded restriction. For example, Matic et al. (41) report that exconjugants from crosses of an E. coli Hfr donor with mismatch-repair-deficient and restriction-deficient Salmonella recipients appeared (with very few exceptions) to incorporate donor DNA in a continuous stretch, in agreement with previous conclusions based on the analysis of marker genotypes and with Smith’s (86) estimate that in cases he reviewed (mainly within K-12), 80% of E. coli exconjugants incorporated donor DNA in one "long chunk." The Salmonella recipient was known to lack two of the three described restriction-modification systems.
Finally, in the conjugal transmission of hsdK in E. coli, it has been reported that restriction endonuclease activity coded in donor DNA did not appear until 15 generations after conjugation, but that modification was seen at once (67, 68). In conjugation experiments between three K-12 Hfr donors and three ECOR recipient strains, restriction fragment length polymorphism analysis of PCR fragments amplified from exconjugant genomic DNA indicates considerable restriction activity (unpublished data). Restriction barriers to bacterial recombination are reviewed by Barcus and Murray (7).
The mismatch frequency studied by Radman and colleagues (71, 72) was over 20%, in contrast to the 1 to 4% most frequently observed among ECOR strains, a level that may have no substantial effect on recombination. In this context, local variation in recombination frequencies might be explored. As mentioned above, some important local properties of the genome may vary with position. There may in fact be an important and possibly complex interplay between recombination and mismatch frequency, to which further examination of the gnd and his regions, for example, may provide clues.
Population genetics seeks to interpret and explain the biogeographic distribution of the allele frequencies at each of many individual loci, and it deals with the loci in a basically combinatorial sense. In this effort, its ability to screen very large population samples is critical. To a lesser extent, population genetics explores chromosomally local variations in genetic properties. Molecular evolution, which finds more detail in fewer examples, seeks to interpret and explain the phylogenetic distribution of genes and chromosomes, examining and comparing continuous tracts of nucleotides with characteristic sequences and varying lengths, called clonal segments. The inheritance of the clonal segments is as much a part of eukaryotic evolution as of bacterial (and presumably archaeal) evolution, and the paradigm discussed here should apply to eukaryotes.
This work was supported in part by National Institutes of Health Grant GM-33518 and (together with the preparation of this manuscript) by National Science Foundation Grant BSR-9020173.
References
1. Achtman, M., and G. Pluschke. 1986. Clonal analysis of descent and virulence among selected Escherichia coli. Annu. Rev. Microbiol. 40:185–210.
2. Allaby, M. (ed.). 1985. The Oxford Dictionary of Natural History. Oxford University Press, New York.
3. Arber, W., and M. L. Morse. 1965. Host specificity of DNA produced by Escherichia coli. VI. Effects on bacterial conjugation. Genetics 51:137–148.
4. Atwood, K. C., L. K. Schneider, and F. J. Ryan. 1951. Selective mechanisms in bacteria. Cold Spring Harbor Symp. Quant. Biol. 16:345–355.
5. Avise, J. 1989. Gene trees and organismal histories: a phylogenetic approach to population biology. Evolution 43:1192–1208.
6. Bachmann, B. J. 1987. Linkage map of Escherichia coli K-12, edition 7, p. 807–876. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.
7. Barcus, V. A., and N. E. Murray. 1995. Barriers to recombination: restriction, p. 31–58. In R. Bishop (ed.), Population Genetics of Bacteria. Cambridge University Press, Cambridge.
7a. Barcus, V. A., A. J. B. Titheradge, and N. E. Murray. 1995. The diversity of alleles at the hsd locus in natural populations of Escherichia coli. Genetics 140:1187–1197.
8. Biserić, M., J. Y. Feutrier, and P. R. Reeves. 1991. Nucleotide sequences of the gnd genes from nine natural isolates of Escherichia coli: evidence of intragenic recombination as a contributing factor in the evolution of the polymorphic gnd locus. J. Bacteriol. 173:3894–3900.
9. Brikun, I., K. Suziedelis, and D. E. Berg. 1994. DNA sequence diversity among historic strains of Escherichia coli K-12 detected by arbitrary primer PCR (random amplified polymorphic DNA) fingerprinting. J. Bacteriol. 176:1673–1682.
10. Burland, V., G. Plunkett III, H. J. Sofia, D. L. Daniels, and F. R. Blattner. 1995. Analysis of the Escherichia coli genome. VI. DNA sequence of the region from 92.8 through 100 minutes. Nucleic Acids Res. 23:2105–2119.
11. Campbell, A., S. J. Schneider, and B. Song. 1992. Lambdoid phages as elements of bacterial genomes. Genetica 86:259–267.
12. Chao, L., and E. C. Cox. 1983. Competition between high and low mutating strains of Escherichia coli. Evolution 37:125–134.
13. Crow, J. F. 1986. Basic Concepts in Population, Quantitative and Evolutionary Genetics. W. H. Freeman & Co., New York.
14. Crow, J. F., and M. Kimura. 1979. Efficiency of truncation selection. Proc. Natl. Acad. Sci. USA 76:396–399.
15. Daniel, A. S., F. V. Fuller-Pace, D. M. Legge, and N. E. Murray. 1988. Distribution and diversity of hsd genes in Escherichia coli and other enteric bacteria. J. Bacteriol. 170:1775–1782.
16. Daniels, D. L., G. Plunkett III, V. Burland, and F. R. Blattner. 1992. Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes. Science 257:771–778.
17. Drake, J. W. 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88:7160–7164.
18. DuBose, R. F., D. E. Dykhuizen, and D. L. Hartl. 1988. Genetic exchange among natural isolates of bacteria: recombination within the phoA gene of Escherichia coli. Proc. Natl. Acad. Sci. USA 85:7036–7040.
19. Dykhuizen, D. E., and L. Green. 1991. Recombination in Escherichia coli and the definition of biological species. J. Bacteriol. 173:7257–7268.
20. Fitch, W. M., and E. Margoliash. 1970. The usefulness of amino acid and nucleotide sequences in evolutionary studies, p. 67–109. In T. Dobzhansky, M. K. Hecht, and W. C. Steere (ed.), Evolutionary Biology, vol. 4. Appleton-Century-Crofts, New York.
21. Genetics Computer Group. 1993. Sequence Analysis Software Package Version 7.3. Genetics Computer Group, Madison, Wisc.
22. Gray, G. S., and W. M. Fitch. 1983. Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol. Biol. Evol. 1:57–66.
23. Hall, B. G., and P. M. Sharp. 1992. Molecular population genetics of Escherichia coli: DNA sequence diversity at the celC, crr and gutB loci of natural isolates. Mol. Biol. Evol. 9:654–665.
24. Harris, D. J., and J. R. Christensen. 1966. P1 lysogeny and bacterial conjugation. J. Bacteriol. 91:898.
25. Herzer, P. J., S. Inouye, M. Inouye, and T. S. Whittam. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175–6181.
26. Hood, L., J. H. Campbell, and S. C. R. Elgin. 1975. The organization, expression, and evolution of antibody genes and multigene families. Annu. Rev. Genet. 9:305–353.
27. Kelleher, J. E., and E. A. Raleigh. 1991. A novel activity in Escherichia coli K-12 that directs restriction of DNA modified at CG dinucleotides. J. Bacteriol. 173:5220–5223.
28. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature (London) 217:624–626.
29. Kimura, M. 1981. Possibility of extensive neutral evolution under stabilizing selection with special reference to non-random usage of synonymous codons. Proc. Natl. Acad. Sci. USA 78:5773–5777.
30. Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
31. Kimura, M., and J. F. Crow. 1978. Effect of overall phenotypic selection on genetic change at individual loci. Proc. Natl. Acad. Sci. USA 75:6168–6171.
32. Koch, A. L. 1974. The pertinence of the periodic selection phenomenon to prokaryote evolution. Genetics 77:127–142.
33. Kohara, Y., K. Akiyama, and K. Isono. 1987. The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50:495–508.
34. Krawiec, S., and M. Riley. 1990. Organization of the bacterial chromosome. Microbiol. Rev. 54:502–539.
35. Kröger, M., R. Wahl, and P. Rice. 1993. Compilation of DNA sequences of Escherichia coli (update 1993). Nucleic Acids Res. 21:2973–3000.
36. Kubitschek, H. E. 1974. Operation of selection pressure on microbial populations, p. 105–130. In M. J. Carlile and J. J. Skehel (ed.), Evolution in the Microbial World. Cambridge University Press, Cambridge.
37. Levin, B. R. 1981. Periodic selection, infectious gene exchange, and the genetic structure of E. coli populations. Genetics 99:1–23.
38. Louarn, J. M., J. Louarn, V. Francois, and J. Patte. 1991. Analysis and possible role of hyperrecombination in the termination region of the Escherichia coli chromosome. J. Bacteriol. 173:5097–5104.
39. Mahajan, S. K. 1988. Pathways of homologous recombination in Escherichia coli, p. 87–140. In R. Kucherlapati and G. R. Smith (ed.), Genetic Recombination. American Society for Microbiology, Washington, D.C.
40. Maruyama, T., and M. Kimura. 1980. Genetic variability and effective population size when local extinction and recolonization of subpopulations are frequent. Proc. Natl. Acad. Sci. USA 77:6710–6714.
41. Matic, I., M. Radman, and C. Rayssiguier. 1994. Structure of recombinants from conjugational crosses between Escherichia coli donor and mismatch-repair deficient Salmonella typhimurium. Genetics 136:17–26.
42. Maynard Smith, J., N. H. Smith, M. O’Rourke, and B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384–4388.
42a. McKane, M., and R. Milkman. 1995. Transduction, restriction and recombination patterns in Escherichia coli. Genetics 139:35–43.
43. Milkman, R. 1971. How much room is left for non-Darwinian evolution? Brookhaven Symp. Biol. 23:217–229.
44. Milkman, R. 1973. Electrophoretic variation in Escherichia coli from natural sources. Science 182:1024–1026.
45. Milkman, R. 1978. Selection differentials and selection coefficients. Genetics 88:391–403.
46. Milkman, R. 1982. Towards a unified selection theory, p. 105–118. In R. Milkman (ed.), Perspectives on Evolution. Sinauer Associates, Sunderland, Mass.
47. Milkman, R. 1994. An E. coli homologue of eukaryotic potassium channel proteins. Proc. Natl. Acad. Sci. USA 91:3510–3514.
48. Milkman, R., and I. P. Crawford. 1983. Clustered third-base substitutions among wild strains of Escherichia coli. Science 221:378–380.
48a. Milkman, R., and M. McKane. 1995. DNA sequence variation and recombination in E. coli. Symp. Soc. Gen. Microbiol. 52:127–142.
49. Milkman, R., and M. McKane Bridges. 1990. Molecular evolution of the E. coli chromosome. III. Clonal frames. Genetics 126:505–517. (Erratum 126:1139.)
50. Milkman, R., and M. McKane Bridges. 1993. Molecular evolution of the E. coli chromosome. IV. Sequence comparisons. Genetics 133:455–468.
51. Milkman, R., and A. Stoltzfus. 1988. Molecular evolution of the E. coli chromosome. II. Clonal segments. Genetics 120:359–366.
52. Modrich, P. 1991. Mechanism and biological effects of mismatch repair. Annu. Rev. Genet. 25:225–253.
53. Nelson, K., T. S. Whittam, and R. K. Selander. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667–6671.
54. Nelson, K., T. S. Whittam, and R. K. Selander. 1992. Evolutionary genetics of the proline permease gene (putP) and the control of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886–6895.
55. Ochman, H., and R. K. Selander. 1984. Standard reference strains of E. coli from natural populations. J. Bacteriol. 157:690–693.
56. Orkin, S. H., and H. H. Kazazian. 1984. The mutation and polymorphism of the human β-globin gene and its surrounding DNA. Annu. Rev. Genet. 18:131–171.
57. Ørskov, F., and I. Ørskov. 1992. Escherichia coli serotyping and disease in man and animals. Can. J. Microbiol. 38:699–704.
58. Ott, M., L. Bender, G. Blum, M. Schmittroth, M. Achtman, H. Tschäpe, and J. Hacker. 1991. Virulence patterns and long-range genetic mapping of extraintestinal Escherichia coli K1, K5, and K100 isolates: use of pulsed-field gel electrophoresis. Infect. Immun. 59:2664–2672.
59. Paul, A. V., and M. Riley. 1974. Joint molecule formation following conjugation in wild type and mutant Escherichia coli recipients. J. Mol. Biol. 82:35–56.
60. Perkins, J. D., J. D. Heath, B. R. Sharma, and G. M. Weinstock. 1993. XbaI and BlnI genomic cleavage maps of Escherichia coli K-12 strain MG1655 and comparative analysis of other strains. J. Mol. Biol. 232:419–445.
61. Pittard, J. 1964. Effect of phage-controlled restriction on genetic linkage in bacterial crosses. J. Bacteriol. 87:1256–1257.
62. Plunkett, G., III, V. Burland, D. L. Daniels, and F. R. Blattner. 1993. Analysis of the Escherichia coli genome. III. DNA sequence of the region from 87.2 to 89.2 minutes. Nucleic Acids Res. 21:3391–3398.
63. Porter, R. D. 1988. Modes of gene transfer, p. 1–42. In R. Kucherlapati and G. R. Smith (ed.), Genetic Recombination. American Society for Microbiology, Washington, D.C.
64. Postle, K., and R. F. Good. 1983. DNA sequence of the Escherichia coli tonB gene. Proc. Natl. Acad. Sci. USA 80:5235–5239.
65. Postle, K., and R. F. Good. 1985. A bidirectional rho-independent transcription terminator between the E. coli tonB gene and an opposing gene. Cell 41:577–585.
66. Postle, K., and W. S. Reznikoff. 1978. HindII and HindIII restriction maps of the attφ80-tonB-trp region of the Escherichia coli genome, and location of the tonB gene. J. Bacteriol. 136:1165–1173.
67. Prakash-Cheng, A., S. S. Chung, and J. Ryu. 1993. The expression and regulation of hsdK genes after conjugal transfer. Mol. Gen. Genet. 241:491–496.
68. Prakash-Cheng, A., and J. Ryu. 1993. Delayed expression of in vivo restriction activity following conjugal transfer of Escherichia coli hsdK (restriction-modification) genes. J. Bacteriol. 175:4905–4906.
69. Price, C., and T. A. Bickle. 1986. A possible role for DNA restriction in bacterial evolution. Microbiol. Sci. 3:296–299.
70. Radman, M. 1991. Avoidance of inter-repeat recombination by sequence divergence and a mechanism of neutral evolution. Biochimie 73:357–361.
71. Rayssiguier, C., C. Dohet, and M. Radman. 1991. Interspecific recombination between Escherichia coli and Salmonella typhimurium occurs by the recABCD pathway. Biochimie 73:371–374.
72. Rayssiguier, C., D. S. Thaler, and M. Radman. 1989. The barrier to recombination between Escherichia coli and Salmonella typhimurium is disrupted in mismatch-repair mutants. Nature (London) 342:396–401.
73. Reeck, G. R., C. de Haën, D. C. Teller, R. F. Doolittle, W. M. Fitch, R. E. Dickerson, P. Chambon, A. D. McLachlan, E. Margoliash, T. H. Jukes, and E. Zuckerkandl. 1987. "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell 50:667.
74. Reeves, P. R. 1992. Variation in O-antigens, niche-specific selection and bacterial populations. FEMS Microbiol. Lett. 79:509–516.
75. Robeson, J. P., R. M. Goldschmidt, and R. Curtiss III. 1980. Potential of Escherichia coli isolated from nature to propagate cloning vectors. Nature (London) 283:104–106.
76. Rudd, K. 1992. Alignment of E. coli DNA sequences to a revised, integrated genomic restriction map, p. 2.3–2.43. In J. H. Miller (ed.), A Short Course in Bacterial Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
77. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular Cloning: a Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
78. Sandri, R. M., and H. Berger. 1980. Bacteriophage P1-mediated generalized transduction in Escherichia coli: fate of transduced DNA in Rec+ and RecA– recipients. Virology 106:14–29.
79. Savageau, M. A. 1983. Escherichia coli habitats, cell types, and mechanisms of gene control. Am. Nat. 122:732–744.
80. Selander, R. K., D. A. Caugant, and T. S. Whittam. 1987. Genetic structure and variation in natural populations of Escherichia coli, p. 1625–1648. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.
81. Selander, R. K., and B. R. Levin. 1980. Genetic diversity and structure in Escherichia coli populations. Science 210:545–547.
82. Sharp, P. M., J. E. Kelleher, A. S. Daniel, G. M. Cowan, and N. E. Murray. 1992. Roles of selection and recombination in the evolution of type I restriction-modification systems in enterobacteria. Proc. Natl. Acad. Sci. USA 89:9836–9840.
83. Sharp, P. M., D. C. Shields, K. H. Wolfe, and W.-H. Li. 1989. Chromosomal location and evolutionary rate variation in enterobacterial genes. Science 246:808–810.
84. Shen, P., and H. V. Huang. 1986. Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112:441–457.
85. Siddiqi, O., and M. S. Fox. 1973. Integration of donor DNA in bacterial conjugation. J. Mol. Biol. 77:101–123.
86. Smith, G. 1991. Conjugational recombination in Escherichia coli: myths and mechanisms. Cell 64:19–27.
87. Stoltzfus, A., J. F. Leslie, and R. Milkman. 1988. Molecular evolution of the E. coli chromosome. I. Analysis of structure and natural variation in a previously uncharacterized region between trp and tonB. Genetics 120:345–358.
87a. Tropp, B. E., L. Ragolia, W. Xia, W. Dowhan, R. Milkman, K. E. Rudd, R. Ivanisevic, and D. J. Savic. 1995. Identity of the Escherichia coli cls and nov genes. J. Bacteriol. 177:5155–5157.
88. Umeda, M., and E. Ohtsubo. 1989. Mapping of insertion elements IS1, IS2 and IS3 on the Escherichia coli K12 chromosome. Role of the insertion elements in the formation of Hfrs and F' factors and in rearrangements of bacterial chromosomes. J. Mol. Biol. 208:601–614.
88a. Wahl, R., and M. Kröger. 1995. ECDC—a totally integrated and interactively usable genetic map of Escherichia coli. Microbiol. Res. 150:7–61.
89. Wang, G., T. S. Whittam, C. M. Berg, and D. E. Berg. 1993. RAPD (arbitrary primer) PCR is more sensitive than multilocus enzyme electrophoresis for distinguishing related bacterial strains. Nucleic Acids Res. 21:5930–5933.
90. Wanntorp, H.-E. 1983. Reticulated cladograms and the identification of hybrid taxa, p. 81–88. In N. I. Platnick and V. A. Funk (ed.), Advances in Cladistics, vol. 2. Columbia University Press, New York.
91. Yura, T., H. Mori, H. Nagai, T. Nagata, A. Nishihama, N. Fujita, K. Isono, K. Mizobuchi, and A. Nakata. 1992. Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res. 20:3305–3308.