Genetic Variation and Evolutionary Processes in Natural Populations of Escherichia coli
Chapter 148
THOMAS S. WHITTAM
Bacterial population genetics is a young field that is addressing fundamental issues concerning the operation of genetic, ecological, and evolutionary processes in populations of bacteria in natural environments. Escherichia coli has occupied a central role in this growing field because of the deep knowledge of its molecular genetics, its importance as a pathogen of humans and domesticated animals, and its versatile ecology. The serious study of the genetics of natural E. coli populations began in 1973 (55), and in the past 2 decades, inquiry has proceeded along two complementary avenues. First, there are basic questions about how the processes of mutation, gene flow, natural selection, and genetic drift influence genetic variation and population structure, genomic organization, and rates of evolutionary change. Second, there are historical questions about the timing of divergence of particular populations and the emergence and geographic spread of pathogenic strains. Progress relevant to an understanding of the population genetics of E. coli has been summarized at various stages in reviews by Hartl and Dykhuizen (27), Achtman and Pluschke (2), Selander et al. (86), and Young (110).
The objective of this chapter is to highlight recent findings from the study of genetic variation in natural E. coli, with special emphasis on clonal structure of populations and the magnitude of the parameters of the evolutionary processes of mutation, recombination, and genetic drift in nature. The results come from investigations of protein polymorphism in housekeeping enzymes and DNA polymorphism in genes encoding a variety of structural proteins, which have begun to elucidate the nature of genetic variation and the significance of recombination in bacterial species. The final section of the chapter examines the evolutionary genetics underlying the origin and spread of new pathogenic clones of E. coli associated with the emergence of novel human enteric diseases.
An elementary view of the genetic structure of a bacterial species considers a population as a mixture of asexual cell lines or bacterial clones. In a rigorous sense, a bacterial clone consists of a single cell and all its descendants, representing a monophyletic branch on an evolutionary tree. In this strict sense, a clonal lineage is a closed genetic system that accumulates differences only through genetic processes that occur within a single cell, such as point mutations, inversions, duplications and deletions, and transpositions. Clearly, because of these processes, members of a clone are not necessarily genetically or phenotypically homogeneous. In practice, the term clone is used in a looser sense to refer to an extant group of bacteria within a species which have many similarities derived from a common ancestor that are not shared by other organisms in the species. Thus, clone is used in bacterial population genetics in the same sense that clade is used to refer to species that belong to a monophyletic branch of an evolutionary tree (47).
In a clonally structured population, genotypes are the currency of evolutionary change in a population and they can persist for many generations, diverging only through mutation and genetic changes that occur within cells. In the absence of recombination between cells, the appearance of an advantageous mutation results in the replacement of cell lines, a phenomenon referred to as periodic selection (4). In a clone mixture, the random sampling of lines resulting from stochastic extinction or periodic selection profoundly reduces the effective population size and the amount of genetic variation in a population (42, 44, 49).
Although the clone model of population structure assumes strictly asexual reproduction and no recombination between cell lines, it is clearly unrealistic in its strict form because many bacterial populations have evolved mechanisms for exchanging genetic information between cells. Comparative nucleotide sequencing of strains of E. coli, Salmonella enterica, and other bacterial species has revealed many mosaic segments of DNA (1, 23, 56, 57, 59, 88). To take into account the role of recombination in shaping genetic variation and genomic divergence in natural populations of E. coli, broader models of population structure have been developed.
To accommodate gene transfer and recombination, periodic selection, and mutation in the evolution of E. coli populations, Milkman and Bridges (56, 57) have developed a model in which a clonal frame, or specific chromosomal background, is driven to high frequency by an advantageous mutation. As the clone carrying the beneficial mutation increases in frequency, it replaces preexisting clones and purges variation in the population as a whole. While the adaptive clone is increasing and spreading geographically, it accumulates neutral mutations and replaces bits of its genome through gene transfer. The clonal frame refers to the remnants of DNA sequence representing the original chromosomal background of the founder cell. The degree to which the clonal frame remains intact depends, of course, on the relative rates of these processes.
The clonal frame model places special emphasis on the occurrence of a beneficial mutation that marks the origin of a new clone. It predicts that as the adaptive clone expands, it becomes more diverse as transfer of genetic material from outside the clone creates mosaic genomes. In addition, if replacements are infrequent and involve short segments of DNA, the integrity of the clonal frame can remain evident over long periods of time.
More complex models of population structure have been devised that incorporate adaptation of bacterial subpopulations to different habitats or distinct niches (51, 80). To account for the well-known fact that proteins and lipopolysaccharides found on the surface of E. coli cells are highly polymorphic (i.e., the number of types exceeds 160 for the somatic [O] antigens, 80 for K capsular antigens, and 60 for flagellar [H] antigens), Reeves (80) developed a model in which natural selection favors different alleles, encoding specific surface antigens, in different niches (e.g., host species, tissues, cellular environment). Beneficial alleles and their corresponding antigens are maintained in high frequency in specific niches because they confer a fitness advantage to the clones carrying the selected alleles. Gene transfer from organisms outside the niche introduces nonadaptive alleles which remain in low frequency if gene transfer is infrequent and selection is effective in eliminating nonadaptive alleles. With low levels of gene transfer and assortative recombination, the niche-specific selection model predicts specialization of clones to different niches as they accumulate niche-specific adaptive mutations. When a favorable mutation occurs, the clone carrying it will replace other clones in the particular habitats to which it is adapted, but it will not replace all other clones in the species as a whole (51). Thus, adaptive clonal genotypes will predominate in the bacterial subpopulations found in the favored habitats or niches. The extent to which these bacteria disperse or spread into different habitats or local geographic populations determines how common the adaptive clones become in the species as a whole. In addition, the original beneficial mutation may subsequently spread throughout the species by horizontal transfer to other clones.
In contrast, with high levels of gene transfer, adaptive mutations will spread rapidly and achieve high frequency in the bacterial subpopulation in the niche without association with a particular clone. In this case, gene frequencies, but not clonal genotypes, will vary between niche subpopulations.
A consequence of the restricted recombination in a clonally structured population is the presence of linkage disequilibrium, or the nonrandom association of alleles in haploid genotypes (27, 102, 103). Multilocus associations at the population level mean that certain combinations of alleles at different genes occur at greater frequency than expected, whereas other combinations are rare or absent. Extensive linkage disequilibrium reflects the genotypic structure of a population, which, in principle, decays as a function of the rate of recombination (44). In the absence of natural selection favoring specific gene combinations, frequent recombination will tend to randomize the allele combinations in genotypes, and the genotypic structure of the population will approach linkage equilibrium (28).
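The idea of nonrandom association can be made concrete for a single pair of loci. The following minimal sketch computes the classical pairwise disequilibrium coefficient, D = p(AB) - p(A)p(B), for haploid genotypes. The function name and the two toy samples are hypothetical illustrations and are not data from any of the studies cited here.

```python
from collections import Counter

def pairwise_disequilibrium(genotypes, allele_a, allele_b):
    """Return D = p_AB - p_A * p_B for one allele at each of two loci.

    genotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one tuple per haploid isolate.
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    p_ab = counts[(allele_a, allele_b)] / n
    p_a = sum(1 for g in genotypes if g[0] == allele_a) / n
    p_b = sum(1 for g in genotypes if g[1] == allele_b) / n
    return p_ab - p_a * p_b

# Toy clonal sample (hypothetical): only two allele combinations occur.
clonal = [("a1", "b1")] * 5 + [("a2", "b2")] * 5
# Toy shuffled sample (hypothetical): all four combinations equally frequent.
shuffled = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")] * 3

d_clonal = pairwise_disequilibrium(clonal, "a1", "b1")
d_shuffled = pairwise_disequilibrium(shuffled, "a1", "b1")
```

In the clonal toy sample the combination a1-b1 is in excess of its random expectation, so D is positive; in the shuffled sample every combination occurs at the product of its allele frequencies and D is zero.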
The contrast in genotypic structure between a clonal and a freely recombining population is illustrated in Fig. 1 (100). The dendrograms show genetic relationships for 30 haploid genotypes, depicted in the patterned bars as distinct allele combinations for 20 loci on a chromosome. In the top panel, cell lineages have diverged in multilocus genotype as mutations create new alleles without recombination. With the accumulation of allele substitutions, linkage disequilibrium builds up, as reflected in the deep branch in the dendrogram. In this case, the dendrogram reflects the phylogeny, or the underlying history of branching and ancestry, of the clonal genotypes. The allele mismatch distribution (found by comparing all pairs of chromosomal genotypes and tallying the number of allele differences between each pair) has a large variance and bimodal shape due to extensive multilocus linkage disequilibrium. The variance of the mismatch distribution forms the basis of a statistical measure (9) of multilocus genotypic structure discussed below.
With frequent recombination, however, alleles become randomized in genotypes, as shown in the bottom panel of the figure. In this case, there is little linkage disequilibrium. The dendrogram is bush-like, with short branches between interior nodes and long tips, and there is a reduction in the variance as the mismatch distribution becomes unimodal in shape. The dendrogram in this case is a phenogram that reflects the overall similarity in genotypes but does not correspond to a tree-like branching pattern expected from binary fission in the absence of recombination (50). Thus, in a population near linkage equilibrium, most genotypes lie near the mode of the distribution and recombination becomes a "cohesive force" that counters mutational divergence of bacterial lineages.
As a tool for studying genetic variation in natural populations, multilocus enzyme electrophoresis is based on the principle that differences in the migration rate of protein variants are a direct consequence of codon changes in the corresponding gene that cause amino acid substitutions affecting net electrostatic charge (85). Electrophoretic studies of protein variants of known amino acid sequence have demonstrated that this method detects most nucleotide substitutions resulting in amino acid replacements (54, 79). The allelic variation revealed by the electrophoresis of proteins, however, underestimates the total genetic variation at a locus because of amino acid replacements that do not alter mobility under standard conditions (12) and silent base substitutions that do not result in amino acid changes.
To examine the molecular basis of electrophoretic variation in the polymorphic enzyme malate dehydrogenase, Boyd et al. (7) determined nucleotide sequences for the mdh gene representing five electromorphs (protein mobility variants) among 20 strains of E. coli. They found that 6 of the 40 polymorphic nucleotide sites involved nonsynonymous substitutions which resulted in replacements of charged amino acids that accounted for all of the mobility variation. In contrast to mdh, the correspondence between amino acid replacements and electrophoretic mobility was poor for alleles of 6-phosphogluconate dehydrogenase (encoded by the gnd locus), where highly diverse alleles with multiple amino acid changes were indistinguishable in mobility (6). These limited data suggest that for certain highly polymorphic proteins, enzyme electrophoresis gives a poor assessment of the extent of allelic variation at a locus.
A comparison of the sequences of homologous genes from E. coli (strain K-12) and S. enterica (Typhimurium strain LT2) shows how much of the total genetic variation at a locus is revealed by examining amino acid differences (Fig. 2). For 87 pairs of homologous genes and their inferred protein products, the number of amino acid differences and charged amino acid differences is plotted against the number of base differences. The slopes of the linear regression lines show that about 26% of the base differences in genes result in amino acid differences in proteins and about 9% of the total number of base differences result in charged amino acid differences. If evolutionary divergence between alleles within E. coli populations follows the same dynamics as the divergence between species, then protein electrophoresis is capable of detecting about 1/10 of the total sequence variation for the average structural gene.
Thus, natural populations of E. coli harbor large amounts of genetic variation at individual loci, as revealed by protein polymorphisms of central metabolic enzymes. This observation agrees with the well-known fact that pathogenic strains of E. coli are among the most antigenically variable bacteria seen in clinical settings (75) and with the findings of previous work on DNA polymorphisms among natural isolates (26).
A persistent question in the study of the evolutionary genetics of E. coli is what role recombination plays in the generation of genotypic diversity under natural conditions. To address this question, it is useful to make a distinction between three classes of recombination events (101). The classes are distinct, not because of the mechanism of gene transfer (conjugation or transduction) or the details of the molecular pathways for recombining DNA molecules, but because they are defined by the outcome of recombination events from a population genetic perspective. Events in each class involve the transfer of genetic material between cells and result in the production of recombinant genotypes. Assortative recombination events are those in which new chromosomal genotypes are produced by the reshuffling of existing alleles into new combinations. Intragenic recombination involves substitutions of pieces of genes, shorter in length than a cistron, that generate novel mosaic alleles and, thus, new genotypes. Finally, additive recombination occurs when genetic elements integrate to form a composite genotype that is the sum of the recombining molecules (84). One important type of additive event is the acquisition of genes from other bacterial species.
To assess the rates at which the alleles in natural populations of bacteria are reshuffled to create recombinant chromosomal genotypes, a method of monitoring allelic variation at multiple loci is required. Multilocus enzyme electrophoresis has been widely applied to bacterial populations because it readily permits an assessment of allelic variation at multiple loci in the large numbers of isolates required for population analysis (86).
In application to E. coli populations, the study of protein polymorphisms has provided evidence for a clonal structure. The clonal hypothesis originated from early observations of identical phenotypes, in such variable traits as serotype and biotype, among E. coli strains recovered from separate outbreaks of disease (74, 76). Selander and Levin (87) extended the clone concept to the E. coli species as a whole, based on the repeated recovery of isolates with identical multilocus enzyme genotypes. This finding is incompatible with high rates of assortative recombination, given the large number of alleles per locus. Further evidence for a clonal population structure in E. coli was the demonstration of extensive linkage disequilibrium for many enzyme loci (102, 103). The frequencies of multilocus genotypes in natural populations depart significantly from those expected under a model of random association (103), and the genotypic combination of the most common alleles, the modal genotype, was not observed in a sample of more than 1,600 isolates from natural sources (72).
Given a clonal population structure in which assortative recombination is infrequent, an analysis of all sets of genetic characteristics that are representative of the genome should yield similar genetic relationships among isolates. Numerous studies have demonstrated a concordance between phylogenetic relationships inferred from protein polymorphisms and other characteristics (3, 25, 60, 70, 77, 106); but at the same time, discordant phylogenies, based on comparisons of different genes or different character sets, have been used as evidence for past recombination events (5, 18, 20, 86).
Multilocus Linkage Disequilibrium.
Application of multilocus enzyme electrophoresis has revealed that E. coli populations are genetically diverse, with multiple electrophoretically detectable alleles found at most enzyme loci (86). Several examples of the amount of genetic diversity in E. coli populations from natural sources are given in Table 1. The data include electrophoretic variation among (i) 72 strains of the E. coli reference collection, a sample of isolates originally established to represent the range of genetic variation among the E. coli population (71, 86); (ii) 317 isolates cultured from stool samples of 13 children in three rural Mexican villages (101); and (iii) 283 isolates recovered from water samples in a reservoir (Whittam, unpublished data). For the samples from infants and water, the average single locus diversity (H) was 0.41 to 0.42. These values are slightly lower than those that have been reported for E. coli enteric populations from humans in other localities, which range from 0.44 in Finland to 0.54 in Massachusetts (86).
Table 1. Multilocus linkage disequilibrium in E. coli from natural sources
The effect of recombination on the variance of the allele mismatch distribution forms the basis of a statistical method for assessing multilocus linkage disequilibrium. Brown et al. (9) devised an index of multilocus association, designated IA, which is equal to the ratio of the observed variance of the mismatch distribution (VO) to the expected variance (VE) in the absence of linkage disequilibrium, minus 1 (IA = VO/VE - 1). IA is near zero as a population approaches linkage equilibrium and increases in value with increasing amounts of multilocus linkage disequilibrium. For the example in Fig. 1, the expected variance under linkage equilibrium is VE = 2.9 and is the same for both the top and bottom distributions because the expected variance is a simple function of single-locus diversity (9). The observed variance, however, decreases from VO = 7.7 (IA = 1.6) under the strictly clonal model to 3.8 (IA = 0.3) for a population at linkage equilibrium.
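The calculation can be sketched in a few lines. The example below is a minimal illustration, taking IA as VO/VE - 1 (the form that reproduces the values quoted for Fig. 1) and assuming the commonly used expression for the expected variance, VE = sum of h(1 - h) over loci, where h is the single-locus diversity. The function names and the toy genotypes are hypothetical, and the choice of the population (rather than sample) variance for VO is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

def mismatch_counts(genotypes):
    """Number of loci at which each pair of genotypes differs."""
    return [sum(x != y for x, y in zip(g1, g2))
            for g1, g2 in combinations(genotypes, 2)]

def locus_diversity(alleles):
    """Probability that two isolates drawn without replacement differ."""
    n = len(alleles)
    same = sum(c * (c - 1) for c in Counter(alleles).values())
    return 1.0 - same / (n * (n - 1))

def index_of_association(genotypes):
    """Return (V_O, V_E, I_A), with I_A = V_O / V_E - 1."""
    k = mismatch_counts(genotypes)
    mean_k = sum(k) / len(k)
    v_obs = sum((x - mean_k) ** 2 for x in k) / len(k)
    # zip(*genotypes) regroups the data locus by locus
    h = [locus_diversity(locus) for locus in zip(*genotypes)]
    v_exp = sum(hj * (1.0 - hj) for hj in h)
    return v_obs, v_exp, v_obs / v_exp - 1.0

# Hypothetical strictly clonal sample: two genotypes, four loci each.
clonal = [("1", "1", "1", "1")] * 3 + [("2", "2", "2", "2")] * 3
v_obs, v_exp, i_a = index_of_association(clonal)
```

For this toy sample the pairs differ at either none or all four loci, so the mismatch distribution is bimodal, the observed variance (3.84) greatly exceeds the expected variance (0.96), and IA is 3.0.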
The variance ratio method was first applied to bacterial populations in studies of natural isolates of E. coli and demonstrated significant and widespread multilocus linkage disequilibrium both globally and in separate geographic populations (102, 103). The amount of multilocus linkage disequilibrium in E. coli populations is generally large and involves significant and complex allele associations at many loci (103). In commensal populations from humans and environmental samples from water (Table 1), multilocus linkage disequilibrium is indicated in the inflated variance of the mismatch distribution, which is one to two times the variance expected at linkage equilibrium (Table 1).
The variance ratio method was applied by Maynard Smith and colleagues (52) to strains of several serovars of Salmonella enterica and showed significant inflation of the variance of the mismatch distribution, as measured by IA, supporting the hypothesis that serogroups of S. enterica have predominantly clonal population structures. Interestingly, multilocus enzyme data from Neisseria gonorrhoeae and Rhizobium meliloti were close to linkage equilibrium, indicating that in these populations assortative recombination occurs so frequently that any disequilibrium is rapidly dissipated (52).
In addition to commensal E. coli populations, strains representing serotypes associated with enteric diseases also exhibit evidence of linkage disequilibrium (Table 1). Bacteria of three O serogroups (O55, O111, and O157) commonly associated with diarrheal diseases in infants, adults, or domesticated animals are highly genetically variable, with an average single-locus diversity among strains within these O serogroups several times greater than that found in the E. coli reference collection. These groups of enteric pathogens also show strong multilocus associations, as reflected in the significant IA values, when either isolates or electrophoretic types (ETs) are the unit of analysis. The degree of nonrandom association is more extreme when all isolates are examined because the bulk of the strains within a serogroup belong to a limited number of common ETs (10, 105, 107).
The results reinforce the general finding that populations of commensal E. coli and strains associated with enteric diseases have extensive multilocus structuring at the population level. This structure supports the hypothesis that, under natural conditions, assortative recombination of alleles of housekeeping enzymes produces new genotypic combinations at a slow rate, leaving clonal frames relatively intact and identifiable.
Assortative Recombination Parameter.
To estimate the rate at which alleles are assorted in nature, Hedrick and Thomson (28) began with computer simulations to examine the effect of recombination on measures of linkage disequilibrium between a pair of loci in a finite population. In these models, recombination was measured in a composite parameter, C = 2Nec, where Ne is the effective population size and c is the fraction of genotypes produced by recombination between the loci each generation. They tabulated the distribution of the linkage disequilibrium statistic (Q*) for different sample sizes, numbers of alleles, and levels of recombination. Using data from 100 E. coli isolates recovered from humans in Sweden and characterized for variation at 12 enzyme loci (11), they found that most of the mean values of Q* for 45 locus pairs were less than those predicted for a strictly clonal population (C = 0) and fell between the theoretical values of C = 10 and C = 100. The distribution of disequilibrium values for six pairs of loci with three alleles at one locus and six alleles at the other was close to the theoretical distribution for neutral alleles for C = 10. This analysis clearly demonstrated that the neutral recombination parameter, C, is greater than 0 in the Swedish E. coli population, indicating that recombination was assorting alleles into genotypes at a rate about five times the reciprocal of the effective population size.
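The step from the composite parameter to a rate per generation follows directly from the definition C = 2Nec. As a worked restatement, using only the symbols defined above and the value C = 10:

```latex
C = 2 N_e c
\quad\Longrightarrow\quad
c = \frac{C}{2 N_e},
\qquad
C = 10 \;\Rightarrow\; c = \frac{10}{2 N_e} = 5 \cdot \frac{1}{N_e}
```

That is, a value of C near 10 corresponds to a fraction of recombinant genotypes per generation of about five times the reciprocal of the effective population size.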
The same method was used to estimate C between polymorphic protein loci in E. coli from children in Mexican villages (101). For the 190 possible pairwise comparisons between 20 polymorphic enzyme loci, 43 (23%) of the Q* values were significantly different from 0 when compared with a chi-square distribution, indicating that alleles for many locus pairs were in significant linkage disequilibrium. Overall, the average observed was Q* = 0.104. When compared with theoretical values for increments of C between C = 0 (strictly clonal) and C = 100 (virtually free recombination), theoretical curves for C equal to 1 and to 10 bracketed most of the observed means as well as the observed curve (101). Together these analyses of linkage disequilibrium reveal that the neutral recombination parameter, C, exceeds 0 in samples of E. coli populations in Mexico and Sweden. Although the estimates have large standard errors, the results suggest that the rate of assortative recombination in the average E. coli population is on the order of 5 to 10 times the reciprocal of the effective population size.
The amount of linkage disequilibrium estimated from multilocus enzyme electrophoresis data provides information about one type of recombination in bacteria, assortative recombination, in which whole alleles are transferred and give rise to new multilocus genotypes. Intragenic recombination, on the other hand, produces novel mosaic alleles that cannot be distinguished from new alleles that arise by point mutation without knowledge of the nucleotide sequences.
To obtain a more complete understanding of the genetic basis of allelic variation and elucidate the role of recombination, several laboratories have been involved in large-scale sequencing of multiple wild strains of E. coli (5, 7, 18, 20, 23, 25, 57, 58, 63, 65). Allelic variation in nucleotide sequence has been determined for 15 protein-encoding chromosomal genes among wild strains of E. coli, many of which are in the E. coli reference collection (70). Table 2 summarizes the data (extracted from GenBank) for the following genes, or parts of genes, and their associated products: trpB, tryptophan synthetase beta, and trpC, phosphoribosyl-anthranilate isomerase (58); gnd, 6-phosphogluconate dehydrogenase (5, 19, 20, 64); phoA, alkaline phosphatase (18); gapA, glyceraldehyde-3-phosphate dehydrogenase (65); putP, proline permease (63); mdh, malate dehydrogenase (7); pabB, p-aminobenzoate synthetase (24); sppA, protease IV (23); zwf, glucose-6-phosphate dehydrogenase (23); and two open reading frames (ORF2 and ORF3) in a region upstream of the trp operon (94). Table 2 also includes information on three genes that encode the phosphoenolpyruvate-dependent phosphotransferase system enzyme III protein specific for β-glucoside sugars (celC), glucose (crr), and glucitol (gutB) (25). In addition to these nucleotide sequences, Roger Milkman kindly provided high-resolution restriction site maps for five genes (56), including nirR (fumarate and nitrate reduction regulatory protein), fumB (anaerobic class I fumarase), tonB (sensitivity to T1 phages), topA (DNA topoisomerase I), and fumA (fumarase), data which were analyzed previously (101).
Table 2. Nucleotide sequence variation in 15 polymorphic genes in E. coli strains from natural sources
The analysis summarized in Table 2 lists two estimates for variation at the gnd locus. The first is based on the analysis of 10 sequences (5) with a sample size comparable to the other genes in the table. The second estimate is based on a combined data set consisting of 35 sequences and includes several highly divergent alleles (although alleles hypothesized to be foreign to E. coli have not been included [64]).
The average pairwise difference between alleles of a gene, expressed as a percentage, is given for both the nucleotide base and the amino acid level in Table 2. The amount of variation at the nucleotide level ranges from a low of 0.24% for the highly conserved gapA locus to a high of 4 to 5% for the highly variable gnd locus. Divergence in amino acid sequence as inferred from DNA sequences ranges from a high of 1.87% for the protein produced by pabB to no amino acid differences for the product of crr (Table 2). In general, the primary sequences of enzymes encoded by these genes are conserved, as reflected in the simple ratio of nucleotide to amino acid differences.
Neutral-Mutation Parameter.
Table 3 gives the number of polymorphic nucleotide sites (S), the average number of differences between pairs of sequences (k), the estimated neutral-mutation parameters, M(S) and M(k), and Tajima’s D statistic (95), a measure for detecting selection against deleterious alleles. The estimates of M range from 65.04 for M(S) for gnd to 2.36 for M(k) for the gap alleles (Table 3). Under the neutral-mutation hypothesis, the two estimates of M should not be significantly different, and none of the six D values falls outside the 95% confidence intervals listed in Tajima’s Table 2. However, the negative D for the gap alleles slightly exceeds the lower 90% limit for n = 13 (95), and it is in the direction suggesting that some variants may be weakly deleterious. Overall the differences between the two values of M are insignificant and do not deviate from those expected under the neutral-mutation hypothesis.
Table 3. Estimates of the parameters of neutral-mutation and intragenic recombination
A similar analysis of alleles of five coding regions based on DNA polymorphisms detected by four-cutter restriction enzymes provides two estimates of M that were used in the Tajima test (101). Here, M(S) was estimated by $p / \sum_{i=1}^{n-1} (1/i)$, in which n is the number of sequences and p is the proportion of polymorphic nucleotide sites (62); p was calculated from the number of polymorphic restriction sites (31), and M(k) was calculated from π (62). The paired estimates of M did not differ significantly, as indicated by the magnitude of Tajima’s D, and thus variations of these genes also do not show strong departures from the expectations of the neutral-mutation hypothesis.
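The comparison of the two estimates of M can be illustrated with a short calculation. The sketch below is a minimal example, assuming the standard published form of Tajima's D, in which the difference between the two estimates is scaled by its estimated standard deviation. Here both estimates are expressed per locus (S and k as counts) rather than per site, and the function name and input values are hypothetical and are not taken from Table 3 or Table 4.

```python
import math

def tajima_d(n, S, k):
    """Return (M_S, M_k, D) for n sequences, S polymorphic sites, and
    k average pairwise differences, all on a per-locus scale."""
    a1 = sum(1.0 / i for i in range(1, n))        # sum of 1/i, i = 1..n-1
    a2 = sum(1.0 / i ** 2 for i in range(1, n))   # sum of 1/i^2, i = 1..n-1
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    m_s = S / a1   # estimate of M from the number of polymorphic sites
    m_k = k        # estimate of M from the average pairwise difference
    d = (m_k - m_s) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return m_s, m_k, d

# Arbitrary illustrative inputs (hypothetical).
m_s, m_k, d = tajima_d(n=10, S=16, k=3.888889)
```

With these inputs M(S) exceeds M(k) and D is negative (about -1.45), the direction expected when an excess of rare variants is present; whether such a value is significant must be judged against the tabulated confidence limits for the given sample size.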
Nucleotide Diversity.
Although M is an important quantity for determining amounts of neutral variation at the level of the locus, it is also necessary to have a measure of variation at the nucleotide level. At the nucleotide level, there are two measures that are useful for quantifying genetic variation: p, the proportion of polymorphic nucleotide sites, and π, the nucleotide diversity (62). For the 15 E. coli genes, the overall level of genetic variation at the nucleotide level ranges 10-fold, from the extensive polymorphism observed for gnd to the low level of polymorphism observed for gap alleles (Fig. 3). Overall, the proportion of polymorphic sites (p) averaged across the 15 genes is 0.062, which means that 6.2% of the sites differ in a sample of alleles. Nucleotide diversity (π) ranges from 0.039 for gnd (0.048 for 35 alleles) to 0.002 for gapA, with an average across 15 genes of 0.018, so that a pair of alleles differ on average at 1.8% of their sites.
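Both measures are simple to compute from a set of aligned sequences. The following minimal sketch assumes equal-length alignments without gaps; the function name and the four short toy sequences are hypothetical and serve only to show the arithmetic.

```python
from itertools import combinations

def polymorphism_summary(seqs):
    """Return (p, pi) for equal-length aligned sequences without gaps."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # p: proportion of sites at which more than one base is observed
    polymorphic = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: proportion of differing sites, averaged over all pairs
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return polymorphic / length, diffs / (len(pairs) * length)

# Hypothetical toy alignment: 4 alleles, 10 sites.
toy = ["ATGCATGCAT", "ATGCATGCAA", "ATGTATGCAT", "ATGCATGCAT"]
p, pi = polymorphism_summary(toy)
```

In this toy alignment 2 of the 10 sites are polymorphic (p = 0.2), and a pair of alleles differs on average at 10% of sites (π = 0.1). Because p counts every variable site regardless of frequency, it increases with sample size, whereas π does not depend on sample size in the same way.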
Neutral-Recombination Parameter.
To understand the contribution of recombination at the DNA sequence level, Hudson (32) devised an estimator of the recombination parameter C (= 2Nec) based on the variance of the number of site differences between pairs of sequences in a random sample of genes. In this case, c is the recombination rate per generation between the ends of the DNA segment being studied. Hudson’s estimator is useful because it gives values near the true value of recombination for a wide range of evolutionary parameters, although large samples are required to obtain reliable values, and it can be applied to both nucleotide sequence and restriction site data (32).
The amounts of intragenic recombination based on the calculation of Hudson’s C for 15 E. coli genes are given in Table 3. The values of C when all polymorphic nucleotide sites were considered range from 23.2 for phoA to 331.3 for gapA genes (Table 3). A similar analysis of restriction site polymorphisms for five loci gives estimates of C that range from about 4 to 34 (Table 4). The ratio C/M reflects the contribution of intragenic recombination relative to mutation in the generation of allelic variation. Interestingly, C and M are positively correlated across 14 loci, and their ratio also falls within a relatively narrow range of about 1 to 8 (Fig. 4). The positive correlation suggests that for genes under weaker constraints, both mutation and intragenic recombination rates are increased.
Table 4. Estimates of the neutral-mutation and recombination parameters from polymorphic restriction sites for five genes
The greatest absolute estimate of C based on synonymous sites is for the gnd alleles sequenced by Bisercić et al. (5). The magnitude of this parameter suggests that intragenic recombination occurs at a rate more than 100 times greater than the reciprocal of the effective population size. Recombination at this locus has been invoked to explain the extensive clustering of polymorphic sites along the gene and the mosaic structure of alleles at this highly variable locus (20, 64, 82).
Additive recombination can result from a variety of gene transfer events, including integration of bacteriophages and plasmids and translocation of insertion sequences and transposable elements (84). I will not review the numerous studies of additive recombination here, but instead, will focus attention on genes that are suspected to have recently originated from outside the E. coli genome, thus representing interspecific transfer events.
Recent Acquisition of Foreign Genes.
There are many examples of gene exchanges between different species of prokaryotes, some involving very distantly related organisms (53). One intriguing example that points to the recent incorporation of foreign genes into the E. coli population is the observation that 15% of wild E. coli strains produce an unusual satellite DNA that is synthesized by bacterial reverse transcriptase (RT) (29, 40). First discovered in E. coli in strain B (48), this satellite DNA, called multicopy single-stranded DNA (msDNA), is unusual because it consists of a single-stranded DNA linked to an RNA sequence by a 2',5'-phosphodiester linkage (15) and exists in many copies per cell. The genes that encode RT and msDNA are linked together in a "retron," which is thought to represent a primitive type of retroelement (33).
Although uncommon in the E. coli population, msDNA-producing strains occur in several highly divergent clusters of strains, suggesting that retrons have been independently acquired in separate phylogenetic lineages (29, 40). In fact, Inouye and colleagues have discovered that, in one case, a retron is part of a cryptic prophage (34) and, in another case, a retron is contained within a larger unique sequence that was probably integrated into the E. coli genome by transposition or phage integration (30). The hypothesis that these retrons had diverged in other species of bacteria before being transferred to E. coli is supported by the observation that the average codon adaptation index (CAI) (91) for the RT genes of two msDNA-producing strains is 0.17 (29), a value lower than those reported for any of 165 E. coli genes (91). Such a low degree of codon adaptation strongly argues that retrons are foreign to E. coli and were transferred relatively recently and independently into different lineages. However, the phenotypic effects of msDNA and RT expression and the adaptive benefits that are conferred to host cells, if any, remain to be elucidated.
As this example illustrates, genes that have recently been transferred into the E. coli genome often show atypical patterns of codon usage. To estimate the extent of recent transfers representing gene flow into the E. coli population, Whittam and Ake (101) tabulated the percent GC content and CAI for 500 coding E. coli sequences obtained from GenBank (Fig. 5). Most of the loci fall into a cloud of points between 47 and 57% GC content and range in CAI from 0.20 to 0.85. However, approximately 6% of the points have a percent GC of <45 and CAI of <0.37. These genes are atypical for the E. coli genome and include, for example, the coding regions of the msDNA retron, the erm genes encoding high-level resistance to macrolide-lincosamide-streptogramin antibiotics (8), and hemolysin-encoding genes (21), all of which are considered to have been recently transferred into E. coli from distantly related organisms. For additional discussion of codon usage in E. coli, see chapter 114.
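The screening logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the CAI value is taken as a precomputed input (calculating it requires a reference set of codon weights that the chapter does not list), and the cutoffs are simply the thresholds quoted in the text (GC below 45% and CAI below 0.37).

```python
def gc_percent(seq):
    """Percent G+C content of a coding sequence."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return 100.0 * gc / len(seq)


def is_atypical(gc, cai, gc_cutoff=45.0, cai_cutoff=0.37):
    """Flag a gene whose GC content and CAI both fall below the cutoffs."""
    return gc < gc_cutoff and cai < cai_cutoff


# Hypothetical (GC percent, CAI) pairs for illustration only.
genes = {"typical_gene": (51.0, 0.50), "candidate_import": (40.0, 0.17)}
flagged = [name for name, (gc, cai) in genes.items() if is_atypical(gc, cai)]
```

Requiring both values to be low mirrors the joint criterion in the text; a gene with low GC content but ordinary codon adaptation would not be flagged.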
Estimates of Evolutionary Parameters.
One of the goals of empirical population genetics is to refine estimates of the parameters of the evolutionary process in natural populations. For the E. coli population comprising the normal enteric flora, estimates of several of the parameters of the evolutionary process are summarized in Table 5. Genetic diversity detected by enzyme electrophoresis typically falls in the range of 0.34 to 0.54 (86). Whittam and Ake (101) used the observed level of genetic diversity to estimate the neutral mutation parameter, M = 2Nev. Because M is the product of the effective population size (Ne) and the rate of neutral mutation (v), the two are inversely related for a given value of M, so that information about the size of one can be used to infer the size of the other (41). With a working value of M = 1 for E. coli populations, and under the assumption that the rate of electrophoretically detectable mutation is on the order of 10⁻⁷ per locus per generation, the effective population size is on the order of 10⁷. This value for Ne seems unreasonably small considering that the standing crop of E. coli cells may be as large as 10²⁰ (59). Hence, we arrive at a paradox: either Ne is much smaller than seems biologically reasonable or, if Ne is larger, say on the order of 10¹⁰ (59), then the mutational production of electrophoretic variants occurs at a rate much slower than expected.
Table 5. Summary of evolutionary genetic parameters for natural E. coli populations
Resolution of this paradox comes from consideration of the effects of periodic selection (4) and random extinction of cell lines under low rates of recombination. Kubitschek (42) recognized that in asexual haploid populations accumulated neutral variation would be lost during periodic selection events. Maruyama and Kimura (49) found that when local extinction and recolonization occur frequently, the effective size of a haploid population consisting of independent cell lines can be profoundly reduced, to the order of 10⁷, if local extinction occurs as frequently as every thousand generations. Levin (44) has also demonstrated that, in principle, low rates of periodic selection and gene exchange purge neutral variation in bacterial systems. He concludes that under reasonable values of population density, turnover rate, and gene transfer, the fate of neutral alleles in populations of E. coli would be similar to the fate of neutral alleles in populations of sexually reproducing species with small effective sizes.
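The arithmetic behind this paradox can be made concrete with a short Python sketch. It assumes the standard infinite-alleles expectation for a haploid population at mutation-drift equilibrium, H = M / (1 + M), which the chapter does not state explicitly and which may differ in detail from the procedure used in reference 101. The function names are hypothetical.

```python
def neutral_parameter_from_diversity(h):
    """Invert H = M / (1 + M) to get M from genetic diversity H.

    Assumes the infinite-alleles model at equilibrium (an assumption,
    not a formula given in the chapter).
    """
    return h / (1.0 - h)


def effective_size(m, v):
    """Solve M = 2 * Ne * v for Ne, given M and the mutation rate v."""
    return m / (2.0 * v)


# Diversity range quoted in the text, and the assumed mutation rate.
m_low = neutral_parameter_from_diversity(0.34)
m_high = neutral_parameter_from_diversity(0.54)
ne = effective_size(1.0, 1e-7)
```

Under these assumptions, the observed diversity range brackets M = 1, and a mutation rate of 10⁻⁷ implies an effective size of several million, many orders of magnitude below the estimated standing crop.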
A second consideration is that the production of electrophoretically detectable alleles, which involves the rate of nonsynonymous mutation, may be lower in bacteria than in eukaryotic species (73). Whittam and Ake (101) compared 62 published sequences for E. coli K-12 and S. enterica Typhimurium strain LT2 and calculated that, on average, there were 4.6 charge differences per locus per lineage. With the assumption that protein electrophoresis detects all charge differences and using Ochman and Wilson’s (73) estimate that E. coli and S. typhimurium diverged 140 million years ago, the rate of charged amino acid substitution was 3.2 × 10⁻⁸ per year, or 3.2 × 10⁻¹⁰ per generation (assuming 10² generations per year).
Thus, the equilibrium levels of protein polymorphism in natural populations of E. coli are consistent with a neutral-mutation parameter of M = 1. This parameter represents the product of the rate of mutation of electrophoretic variants per locus, which we estimate could be as low as 3 × 10⁻¹⁰ per generation, and the effective population size, which concomitantly could be as large as 3 × 10⁹ (Table 5). A value of M = 1 is also consistent with a frequency of about 5% for the repeated recovery of the same electrophoretic type (based on 20 enzyme loci) from a population (see reference 51).
The level of assortative recombination between loci within populations falls within an order of magnitude (1 < C < 10) of the neutral-mutation parameter, which means that the rate of assortative recombination in natural E. coli populations could be as low as 10⁻⁹ per generation (Table 5). Although estimates of these parameters are crude at best, resting on the assumptions of strictly neutral mutations and a population at mutation-drift steady state, they can be refined by studies of sequence variation at the DNA level.
For the 15 genes that have been examined for DNA polymorphism within E. coli, the amount of variation per locus ranges within an order of magnitude from 0.013 for gapA to 0.130 for gnd. The level of recombination within genes, as measured by Hudson’s C, indicates that the rate of intragenic recombination is several times greater than the neutral-mutation rate, on the order of 10⁻⁸ to 10⁻⁹ per locus per generation (Table 5). Milkman and Bridges (56) estimated a replacement rate of 5 × 10⁻¹² per nucleotide per generation, which is on the same order of magnitude as our estimate of 10⁻⁹ to 10⁻¹⁰ per locus, given that the average locus is about 1,000 nucleotides. Guttman and Dykhuizen (23) obtained a higher estimate of about 50 times the mutation rate, when they compared strains within group A of the E. coli reference collection, but the rate dropped to about 14 times the mutation rate when data from both the group A and group D strains were combined. Together these findings indicate that intragenic and assortative recombination can play a significant role in rapidly generating the diversity of new alleles and driving the divergence of clonal backgrounds.
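The rate conversion in this paragraph is simple division, and a brief Python sketch makes the steps explicit. The variable names are hypothetical; the inputs are the figures quoted in the text. The result lands close to the published value, and any small difference presumably reflects rounding of the inputs.

```python
# Inputs quoted in the text.
charge_differences_per_lineage = 4.6   # per locus per lineage
divergence_time_years = 140e6          # E. coli / Salmonella split
generations_per_year = 100             # assumed in the text

# Substitution rate per locus per year, then per generation.
rate_per_year = charge_differences_per_lineage / divergence_time_years
rate_per_generation = rate_per_year / generations_per_year
```

Both rates are per locus, so they can be compared directly with the per-locus rate of electrophoretically detectable mutation used elsewhere in this section.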
Intragenic recombination appears to have been particularly important in the evolution of variation at the gnd locus. This locus has two to three times more variation than expected based on the size of the gene (101); and although a highly variable enzyme in E. coli populations, 6-phosphogluconate dehydrogenase is not a good indicator of overall genetic relatedness of strains, as reflected in the low correlation with genetic distance (86).
One factor that may account for the unusually high variability of the gnd locus is the indirect effect of natural selection operating on the nearby rfb gene cluster, a region involved in the synthesis of the lipopolysaccharide O antigen (see chapter 147). In E. coli, antigenic expression is highly variable, and strains with particular O antigens are prevalent in certain diseases of humans and domestic animals while occurring nonrandomly in the normal enteric bacteria of these hosts (78). Bisercić et al. (5) have suggested that the close proximity of gnd and rfb inhibits genetic drift at the gnd locus, presumably because of the action of selection on antigenic variation. One possibility is that recombinants involving the gnd-rfb region have selective advantages and increase in frequency under certain conditions that favor strains with specific O antigens. By attaining higher frequencies, these variants are less likely to be lost by random drift or through turnovers caused by periodic selection.
Table 5 also includes Sharp’s analysis of sequence divergence between homologous genes of E. coli and S. enterica Typhimurium (90). The rates of synonymous and nonsynonymous substitution vary greatly across genes, with an average KS = 0.94 per synonymous site and KA = 0.043 per nonsynonymous site. The average values were converted to evolutionary rates by dividing by the estimated splitting time of 140 million years. The synonymous rate is a function of the codon bias with a small effect of chromosomal position (proximity to the origin of replication) so that genes farther from the origin have slightly higher KS values (92).
The variation detected by multilocus enzyme electrophoresis has provided useful systems of genetic markers for analyzing the population structure of bacterial species that cause human infectious diseases. A key generalization derived from these studies is that many bacterial pathogens have a clonal population structure, with a limited number of geographically widespread, pathogenic clones accounting for most cases of disease (89). For example, Shigella sonnei, a pathogen that causes dysentery, represents a homogeneous, geographically widespread clone that is in reality a single clonal lineage of E. coli (72, 103). Karaolis et al. (37) have shown that strains of S. sonnei collected from unassociated hosts on different continents over a 40-year period are virtually homogeneous in serotype and ET and are similar in nucleotide sequence for two housekeeping genes. However, despite the overall low level of variation among S. sonnei strains, restriction site analyses of ribosomal genes show a clear change in the frequency of alleles, suggesting a recent clonal replacement (37).
The concept that E. coli populations are composed of widespread clones was introduced by the Ørskovs and colleagues (74, 76) to account for the observation of identical phenotypes, in such variable traits as serotype and biotype, among E. coli strains recovered from separate outbreaks of disease. The first group of strains that were epidemiologically linked to enteric disease were the enteropathogenic E. coli (EPEC) strains that caused severe outbreaks of infantile diarrhea in hospitals in Great Britain more than 40 years ago (45). Only a few specific O:H serotypes of E. coli were incriminated in outbreaks, and these strains were later shown to have a distinct phenotype defined by the pattern of adhesion to tissue culture cells, called localized adherence (LA) (13). The LA+ phenotype corresponds to the special ability of bacterial cells to form distinct microcolonies on eukaryotic cells. The full expression of the LA phenotype is mediated by a plasmid that carries an adherence factor (22, 61).
Characterization of the ETs of 50 LA+ strains of nine EPEC serotypes revealed that most serotypes represent homogeneous bacterial clones (77). Strains of the same serotype from diverse sources were identical in ET and indistinguishable in outer membrane protein profile and other phenotypic properties. More important, LA+ strains fell into two related groups, a division that was not evident in the distribution of O serotypes (77). Within each cluster, a variety of O antigens was present; however, flagellar antigens (H types) tended to be conserved within a lineage, with strains of one cluster typically expressing H6 antigen and those of the other cluster expressing H2 antigens (77).
These results indicate that classical EPEC strains, associated with infant gastroenteritis, represent widespread clones that are organized into two distinct complexes. Isolates of the clones have been recovered from cultures more than 50 years old, and in many cases the clones are sufficiently stable that they have spread intact into human populations on several continents. Most cases of EPEC disease are caused by infection by members of these clone complexes. EPEC clones possess special virulence properties: strains of both clusters express the LA phenotype, which is plasmid mediated. These two EPEC lineages are only distantly related to other pathogenic E. coli strains (107), suggesting that the LA ability has evolved in divergent chromosomal backgrounds, presumably through the horizontal spread of plasmid-borne genes. The maintenance of LA phenotypes in separate lineages also suggests that this ability confers a selective advantage to enteropathogenic strains.
In contrast to the situation in which bacteria of a single clone or clone complex cause the majority of cases of infectious disease (89), the analysis of genetic variation in E. coli strains from certain populations has disclosed surprisingly high levels of clonal diversity. That is, for certain infections, a large number of different E. coli genotypes are associated with disease. For example, Woodward et al. (109) characterized the multilocus enzyme genotypes of 87 E. coli isolates collected from neonatal pigs with diarrhea in Australia. Although the isolates represented three common O serogroups (O9, O20, and O101) and were collected from one pathological condition in a single country, the genetic diversity was extensive, with 73 multilocus genotypes resolved for 19 enzyme loci. There was no single predominant clonal type or common genetic background associated with porcine diarrheal disease. Similar results have been obtained from diseased birds in domesticated flocks, where three or four diverse clone complexes are typically found among isolates recovered from cases of colibacillosis, airsacculitis, or pericarditis (98, 106).
There are several hypotheses that could account for the diversity of strains recovered from certain disease populations. First, the clinical signs defining a specific condition may be the manifestation of more than one type of infection so that bacteria collected from affected hosts represent a mixture of divergent clones with different virulence factors and modes of pathogenesis. Second, strains implicated in disease might share a powerful virulence factor that has been spread horizontally among many clonal frames in the bacterial population. Under this hypothesis, the transmission of a factor by plasmids or phage could promote the virulence of a large number of strains with different chromosomal genomes. Third, a large fraction of the E. coli isolates recovered from diseased individuals may be opportunistic infections by the commensal microflora, which is multiclonal and dynamic (see reference 99 for a review). In this case, the frequency of opportunistic clones obtained from diseased individuals would reflect the prevalence of those bacterial genotypes in the normal intestinal flora. Evidence for this type of mixture of clones is found for strains implicated in urinary tract infections in humans and animals, in which population genetic studies have shown that the variety of ETs and serotypes recovered from affected individuals represent both uropathogenic clones and opportunistic infections (11, 108).
In 1982 several outbreaks of an unusual form of bloody diarrhea, called hemorrhagic colitis, drew attention to a new pathogenic E. coli strain, serotype O157:H7, with a novel mode of pathogenesis that previously had not been associated with enteric disease (81). The strains recovered from these outbreaks did not possess the virulence determinants typical of other E. coli strains that cause infectious enteric disease: they failed to produce the classical heat-labile or heat-stable enterotoxins, lacked invasive abilities, and were serotypically distinct from EPEC strains that have long been associated with worldwide outbreaks of infantile diarrhea (81). Since these initial outbreaks, strains of the O157:H7 serotype have caused many serious outbreaks of hemorrhagic colitis and hemolytic uremic syndrome and have been linked to hundreds of sporadic cases of gastrointestinal illness in the United States and Canada (39). For example, in January 1993, O157:H7 strains caused a regional outbreak of severe diarrhea in Washington that resulted in three deaths and more than 600 reported cases of illness (68). Illnesses associated with E. coli O157:H7 infections, as well as diseases caused by other cytotoxin-producing strains, have emerged as a major health problem in North America and Europe (14, 39, 93).
The mechanism of virulence and pathogenesis involved in infections of O157:H7 strains is not fully understood, but it differs from other mechanisms described for pathogenic E. coli that cause diarrheal disease in humans and animals. Several factors have been implicated in the virulence of O157:H7 strains, including a high level of expression of potent Shiga-like cytotoxins (66, 69), carriage of plasmids that encode adhesins mediating bacterial adherence to intestinal cells (38, 46), and production of intimin, a protein encoded by the chromosomal eaeA gene that is involved in the intimate attachment of bacteria to enterocytes and subsequent effacement of the microvilli (16, 36, 111).
To determine the genetic relationships of O157:H7 strains to other pathogenic forms of E. coli, multilocus enzyme electrophoresis was used to study the genetic diversity and clonal relationships among O157:H7 isolates and strains of other serotypes implicated in diarrheal disease (107). Population genetic analysis demonstrated that O157:H7 isolates from recent epidemics of hemorrhagic colitis and hemolytic uremic syndrome in North America belonged to a clone complex that was not closely allied to Shiga-like cytotoxin-producing strains of other E. coli serotypes (104), many of which produce a clinically similar form of bloody diarrhea (96, 97). The O157:H7 clone was also found to be only distantly related to other ETs of the O157 group associated with enteric infections in animals (105).
Further comparisons of O157:H7 strains to a diverse collection of isolates of serotypes associated with infectious diarrheal disease revealed that 72% of the isolates belong to 15 major ETs, each of which marks a bacterial clone with a widespread geographic distribution (107). Genetically, the O157:H7 clone is most closely related to a clone of O55:H7 strains that has long been associated with worldwide outbreaks of infantile diarrhea. Both O55:H7 and O157:H7 strains attach intimately to the surfaces of intestinal epithelial cells in the initial stages of infection, efface the microvilli, and induce characteristic histological and ultrastructural lesions, called attaching and effacing (A/E) lesions, in animal models (43). The production of A/E lesions is determined, in part, by the expression of intimin and other products of the chromosomal eae gene complex (17, 43). Presumably, the eae gene complex was present in the most recent ancestor of the O55:H7 and O157:H7 strains because of the overall close genomic relatedness of these strains.
The above observations suggest the hypothesis that the O157:H7 pathogenic clone emerged when an O55:H7-like progenitor, already possessing a mechanism for adherence to intestinal cells, acquired secondary virulence factors (Shiga-like cytotoxins and plasmid-encoded adhesins) via horizontal transfer and recombination. The working model is that, first, an ancestral E. coli strain evolved the chromosomally encoded gene products that mediate A/E adherence. This attribute alone may be sufficient for bacteria to cause diarrheal disease in infants, as is the case of the contemporary O55:H7 clone. Second, an O55:H7-like progenitor cell, already able to cause disease with the A/E mechanism, acquired secondary virulence factors, such as the genes encoding Shiga-like toxins and adhesins, via horizontal genetic transfer from other strains. With these genes expressed in the A/E chromosomal background, a new pathogenic lineage causing a new type of disease emerged—the O157:H7 clone.
The hypothesis that the O157:H7 clone originated from an O55:H7-like ancestor through horizontal transfer and recombination gains support from the observation that Shiga-like toxins occur in diverse lineages of the E. coli population (104) and can be transferred by phages under laboratory conditions (66, 69). The toxin-converting phages are members of a diverse family that appear to be widespread in nature (67). Related cytotoxins have also been discovered in Citrobacter freundii, suggesting that toxin genes have laterally spread among enteric bacteria (83). Finally, the pattern of codon usage in the Shiga-like toxin genes also differs from that of most E. coli protein-encoding genes (35) and the CAI (91) value falls below those reported for 165 E. coli genes. Such a low degree of codon adaptation strongly argues that the toxin genes are foreign to E. coli and were relatively recently acquired via horizontal transfer (101).
The study of genetic polymorphisms has shown that natural populations of E. coli harbor extensive genetic diversity that is organized into a limited number of genetically distinct clones. Such a clonal population structure suggests that the rate of recombination between strains in nature is low. Here I distinguish three types of past recombination events: assortative recombination, in which whole alleles are transferred between strains and assorted into new genotypic combinations; intragenic recombination, in which short pieces of genes recombine to generate new alleles and thus new genotypes; and additive recombination events, in which genes are transferred from other bacterial species into the E. coli genome. Distributions of linkage disequilibrium coefficients between allozymes indicate that assortative recombination occurs about as frequently as mutation in the evolutionary divergence of multilocus enzyme genotypes. Comparisons of the patterns of nucleotide polymorphisms among alleles of 15 different protein-encoding genes reveal that, in most cases, amino acid variation is highly constrained for housekeeping enzymes. The clustering of polymorphic sites, as reflected in the variance of the number of site differences between alleles, suggests estimates of the rate of intragenic recombination that are 1 to 10 times the point mutation rate. For gnd in particular, assortative, intragenic, and foreign transfer have all contributed to the variation at this locus, presumably through indirect selection on nearby loci encoding antigenic surface molecules. The joint distribution of GC content and CAI for 500 E. coli genes indicates that about 6% of genes show aberrant values, presumably as a result of their having been recently acquired by transfer from other species of bacteria. Finally, the acquisition of Shiga-like cytotoxins by a widespread pathogenic E. coli clone was a crucial step in the emergence of a new, highly virulent human pathogen.
References
1. Achtman, M., and R. Hakenbeck. 1992. Recent developments regarding the evolution of pathogenic bacteria, p. 13–31. In C. E. Hormache, C. W. Penn, and C. J. Smyth (ed.), Molecular Biology of Bacterial Infection: Current Status and Future Perspectives. Cambridge University Press, New York.
2. Achtman, M., and G. Pluschke. 1986. Clonal analysis of descent of virulence among selected Escherichia coli. Annu. Rev. Microbiol. 40:185–210.
3. Arbeit, R. D., M. Arthur, R. Dunn, C. Kim, R. K. Selander, and R. Goldstein. 1990. Resolution of recent evolutionary divergence among Escherichia coli from related lineages: the application of pulsed field electrophoresis to molecular epidemiology. J. Infect. Dis. 161:220–235.
4. Atwood, K. C., L. K. Schneider, and F. J. Ryan. 1951. Periodic selection in Escherichia coli. Proc. Natl. Acad. Sci. USA 37:146–155.
5. Bisercić, M., J. Y. Feutrier, and P. R. Reeves. 1991. Nucleotide sequences of the gnd genes from nine natural isolates of Escherichia coli: evidence of intragenic recombination as a contributing factor in the evolution of the polymorphic gnd locus. J. Bacteriol. 173:3894–3900.
6. Bisercić, M., and H. Ochman. 1993. The ancestry of insertion sequences common to Escherichia coli and Salmonella typhimurium. J. Bacteriol. 175:7863–7868.
7. Boyd, E. F., K. Nelson, F.-S. Wang, T. S. Whittam, and R. K. Selander. 1994. Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:1280–1284.
8. Brisson-Noël, A., M. Arthur, and P. Courvalin. 1988. Evidence for natural gene transfer from gram-positive cocci to Escherichia coli. J. Bacteriol. 170:1739–1745.
9. Brown, A. H. D., M. W. Feldman, and E. Nevo. 1980. Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96:523–536.
10. Campos, L. C., T. S. Whittam, T. A. T. Gomes, J. R. C. Andrade, and L. R. Trabulsi. 1994. Escherichia coli serogroup O111 includes several clones of diarrheagenic strains with different virulence properties. Infect. Immun. 62:3282–3288.
11. Caugant, D. A., B. R. Levin, G. Lidin-Janson, T. S. Whittam, C. S. Edén, and R. K. Selander. 1983. Genetic diversity and relationships among strains of Escherichia coli in the intestine and those causing urinary tract infections. Prog. Allergy 33:203–227.
12. Coyne, J. A. 1982. Gel electrophoresis and cryptic protein variation. Curr. Top. Biol. Med. 5:1–32.
13. Cravioto, A., R. J. Gross, S. M. Scotland, and B. Rowe. 1979. An adhesive factor found in strains of Escherichia coli belonging to the traditional infantile enteropathogenic serotypes. Curr. Microbiol. 3:95–99.
14. Cryan, B. 1990. Enterohaemorrhagic Escherichia coli. Scand. J. Infect. Dis. 22:1–4.
15. Dhundale, A., B. Lampson, T. Furuichi, M. Inouye, and S. Inouye. 1987. Structure of msDNA from Myxococcus xanthus: evidence for a long, self-annealing RNA precursor for the covalently linked, branched RNA. Cell 51:1105–1112.
16. Donnenberg, M. S., and J. B. Kaper. 1992. Enteropathogenic Escherichia coli. Infect. Immun. 60:3953–3961.
17. Donnenberg, M. S., J. Yu, and J. B. Kaper. 1993. A second chromosomal gene necessary for intimate attachment of enteropathogenic Escherichia coli to epithelial cells. J. Bacteriol. 175:4670–4680.
18. DuBose, R. F., D. E. Dykhuizen, and D. L. Hartl. 1988. Genetic exchange among natural isolates of bacteria: recombination within the phoA gene of Escherichia coli. Proc. Natl. Acad. Sci. USA 85:7036–7040.
19. Dykhuizen, D. E., and L. Green. 1986. DNA sequence variation, DNA phylogeny and recombination in E. coli. Genetics 113:s71.
20. Dykhuizen, D. E., and L. Green. 1991. Recombination in Escherichia coli and the definition of biological species. J. Bacteriol. 173:7257–7268.
21. Felmlee, T., S. Pellett, and R. A. Welch. 1985. Nucleotide sequence of an Escherichia coli chromosomal hemolysin. J. Bacteriol. 163:94–105.
22. Gomes, T. A. T., M. A. M. Vieira, I. K. Wachsmuth, P. A. Blake, and L. R. Trabulsi. 1989. Serotype-specific prevalence of Escherichia coli strains with EPEC adherence factor genes in infants with and without diarrhea in São Paulo, Brazil. J. Infect. Dis. 160:131–135.
23. Guttman, D. S., and D. E. Dykhuizen. 1994. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380–1383.
24. Guttman, D. S., and D. E. Dykhuizen. 1994. Detecting selective sweeps in naturally occurring Escherichia coli. Genetics 138:993–1003.
25. Hall, B. G., and P. M. Sharp. 1992. Molecular population genetics of Escherichia coli: DNA sequence diversity at the celC, crr, and gutB loci of natural isolates. Mol. Biol. Evol. 9:654–665.
26. Harshman, L., and M. Riley. 1980. Conservation and variation of nucleotide sequences in Escherichia coli strains isolated from nature. J. Bacteriol. 144:560–568.
27. Hartl, D. L., and D. E. Dykhuizen. 1984. The population genetics of Escherichia coli. Annu. Rev. Genet. 18:31–68.
28. Hedrick, P. W., and G. Thomson. 1986. A two-locus neutrality test: applications to humans, E. coli, and lodgepole pine. Genetics 112:135–156.
29. Herzer, P. J., S. Inouye, M. Inouye, and T. S. Whittam. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175–6181.
30. Hsu, M.-Y., M. Inouye, and S. Inouye. 1990. Retron for the 67-base multicopy single-stranded DNA from Escherichia coli: a potential transposable element encoding both reverse transcriptase and Dam methylase functions. Proc. Natl. Acad. Sci. USA 87:9454–9458.
31. Hudson, R. R. 1982. Estimating genetic variability with restriction endonucleases. Genetics 100:711–719.
32. Hudson, R. R. 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245–250.
33. Inouye, M., and S. Inouye. 1991. Retroelements in bacteria. Trends Biochem. Sci. 16:18–21.
34. Inouye, S., M. G. Sunshine, E. W. Six, and M. Inouye. 1991. Retronphage R73: an E. coli phage that contains a retroelement and integrates into a tRNA gene. Science 252:969–971.
35. Jackson, M. P., R. J. Neill, A. D. O’Brien, R. K. Holmes, and J. W. Newland. 1987. Nucleotide sequence analysis and comparison of the structural genes for Shiga-like toxin I and Shiga-like toxin II encoded by bacteriophages from Escherichia coli 933. FEMS Microbiol. Lett. 44:109–114.
36. Jerse, A. E., J. Yu, B. D. Tall, and J. B. Kaper. 1990. A genetic locus of enteropathogenic Escherichia coli necessary for the production of attaching and effacing lesions on tissue culture cells. Proc. Natl. Acad. Sci. USA 87:7839–7843.
37. Karaolis, D. K. R., R. Lan, and P. R. Reeves. 1994. Sequence variation in Shigella sonnei (Sonnei), a pathogenic clone of Escherichia coli, over four continents and 41 years. J. Clin. Microbiol. 32:796–802.
38. Karch, H., J. Heesemann, R. Laufs, A. D. O’Brien, C. O. Tacket, and M. M. Levine. 1987. A plasmid of enterohemorrhagic Escherichia coli O157:H7 is required for expression of a new fimbrial antigen and for adhesion to epithelial cells. Infect. Immun. 55:455–461.
39. Karmali, M. 1989. Infection by verocytotoxin-producing Escherichia coli. Clin. Microbiol. Rev. 2:15–38.
40. Kawaguchi, T., P. J. Herzer, M. Inouye, and S. Inouye. 1992. Sequence diversity of the 1.3 kb retron (retron-Ec107) among three distinct phylogenetic groups of Escherichia coli. Mol. Microbiol. 6:355–361.
41. Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, London.
42. Kubitschek, H. E. 1974. Operation of selection pressure on microbial populations, p. 105–130. In M. J. Carlile and J. J. Skehel (ed.), Evolution in the Microbial World. Cambridge University Press, London.
43. Law, D. 1994. Adhesion and its role in the virulence of enteropathogenic Escherichia coli. Clin. Microbiol. Rev. 7:152–173.
44. Levin, B. R. 1981. Periodic selection, infectious gene exchange, and the genetic structure of E. coli populations. Genetics 99:1–23.
45. Levine, M. M., and R. Edelman. 1984. Enteropathogenic Escherichia coli of classic serotypes associated with infant diarrhea: epidemiology and pathogenesis. Epidemiol. Rev. 6:31–51.
46. Levine, M. M., J. Xu, J. B. Kaper, H. Lior, V. Prado, B. Tall, J. Nataro, H. Karch, and K. Wachsmuth. 1987. A DNA probe to identify enterohemorrhagic Escherichia coli of O157:H7 and other serotypes that cause hemorrhagic colitis and hemolytic uremic syndrome. J. Infect. Dis. 156:175–182.
47. Li, W.-H., and D. Graur. 1991. Fundamentals of Molecular Evolution. Sinauer Associates, Inc., Sunderland, Mass.
48. Lim, D., and W. K. Maas. 1989. Reverse transcriptase-dependent synthesis of a covalently linked, branched DNA-RNA compound in E. coli B. Cell 56:891–904.
49. Maruyama, T., and M. Kimura. 1980. Genetic variability and effective population size when local extinction and recolonization of subpopulations are frequent. Proc. Natl. Acad. Sci. USA 77:6710–6714.
50. Maynard Smith, J. 1990. The evolution of prokaryotes: does sex matter? Annu. Rev. Ecol. Syst. 21:1–12.
51. Maynard Smith, J. 1991. The population genetics of bacteria. Proc. R. Soc. Lond. B 245:37–41.
52. Maynard Smith, J., N. H. Smith, M. O’Rourke, and B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384–4388.
53. Mazodier, P., and J. Davies. 1991. Gene transfer between distantly related bacteria. Annu. Rev. Genet. 25:147–171.
54. McLellan, T. 1984. Molecular charge and electrophoretic mobility in cetacean myoglobins of known sequence. Biochem. Genet. 22:181–200.
55. Milkman, R. 1973. Electrophoretic variation in Escherichia coli from natural sources. Science 182:1024–1026.
56. Milkman, R., and M. M. Bridges. 1990. Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126:505–517.
57. Milkman, R., and M. M. Bridges. 1993. Molecular evolution of the Escherichia coli chromosome. IV. Sequence comparisons. Genetics 133:455–468.
58. Milkman, R., and I. P. Crawford. 1983. Clustered third-base substitutions among wild strains of Escherichia coli. Science 221:378–380.
59. Milkman, R., and A. Stoltzfus. 1988. Molecular evolution of the Escherichia coli chromosome. II. Clonal segments. Genetics 120:359–366.
60. Miller, R. D., and D. L. Hartl. 1986. Biotyping confirms a nearly clonal population structure in Escherichia coli. Evolution 40:1–12.
61. Nataro, J. P., I. C. A. Scaletsky, J. B. Kaper, M. M. Levine, and L. R. Trabulsi. 1985. Plasmid-mediated factors conferring diffuse and localized adherence of enteropathogenic Escherichia coli. Infect. Immun. 48:378–383.
62. Nei, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York.
63. Nelson, K., and R. K. Selander. 1992. Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886–6895.
64. Nelson, K., and R. K. Selander. 1994. Intergeneric transfer and recombination of the 6-phosphogluconate dehydrogenase gene (gnd) in enteric bacteria. Proc. Natl. Acad. Sci. USA 91:10227–10231.
65. Nelson, K., T. S. Whittam, and R. K. Selander. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667–6671.
66. Newland, J. W., N. A. Strockbine, S. F. Miller, A. D. O’Brien, and R. K. Holmes. 1985. Cloning of Shiga-like toxin structural genes from a toxin converting phage of Escherichia coli. Science 230:179–181.
67. O’Brien, A. D., and R. K. Holmes. 1987. Shiga and Shiga-like toxins. Microbiol. Rev. 51:206–220.
68. O’Brien, A. D., A. R. Melton, C. K. Schmitt, M. L. McKee, M. L. Batts, and D. E. Griffin. 1993. Profile of Escherichia coli O157:H7 pathogen responsible for hamburger-borne outbreak of hemorrhagic colitis and hemolytic uremic syndrome in Washington. J. Clin. Microbiol. 31:2799–2801.
69. O’Brien, A. D., J. W. Newland, S. F. Miller, R. K. Holmes, H. W. Smith, and S. B. Formal. 1984. Shiga-like toxin-converting phages from Escherichia coli strains that cause hemorrhagic colitis or infantile diarrhea. Science 226:694–696.
70. Ochman, H., and R. K. Selander. 1984. Evidence for clonal population structure in Escherichia coli. Proc. Natl. Acad. Sci. USA 81:198–201.
71. Ochman, H., and R. K. Selander. 1984. Standard reference strains of Escherichia coli from natural populations. J. Bacteriol. 157:690–693.
72. Ochman, H., T. S. Whittam, D. A. Caugant, and R. K. Selander. 1983. Enzyme polymorphism and genetic population structure in Escherichia coli and Shigella. J. Gen. Microbiol. 129:2715–2726.
73. Ochman, H., and A. C. Wilson. 1987. Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74–86.
74. Ørskov, F., and I. Ørskov. 1983. The clone concept in epidemiology, taxonomy, and evolution of the Enterobacteriaceae and other bacteria. J. Infect. Dis. 148:346–357.
75. Ørskov, F., and I. Ørskov. 1992. Escherichia coli serotyping and disease in man and animals. Can. J. Microbiol. 38:699–704.
76. Ørskov, F., I. Ørskov, D. J. Evans, Jr., R. B. Sack, D. A. Sack, and T. Wadstrom. 1976. Special Escherichia coli serotypes among enterotoxigenic strains from diarrhoea in adults and children. Med. Microbiol. Immunol. 162:73–80.
77. Ørskov, F., T. S. Whittam, A. Cravioto, and I. Ørskov. 1990. Clonal relationships among classic enteropathogenic Escherichia coli (EPEC) belonging to different O groups. J. Infect. Dis. 162:76–81.
78. Ørskov, I., F. Ørskov, B. Jann, and K. Jann. 1977. Serology, chemistry, and genetics of O and K antigens of Escherichia coli. Bacteriol. Rev. 41:667–710.
79. Ramshaw, J. A. M., J. A. Coyne, and R. C. Lewontin. 1979. The sensitivity of gel electrophoresis as a detector of genetic variation. Genetics 93:1019–1037.
80. Reeves, P. R. 1992. Variation in O-antigens, niche-specific selection and bacterial populations. FEMS Microbiol. Lett. 100:509–516.
81. Riley, L. W., R. S. Remis, S. D. Helgerson, H. B. McGee, J. G. Wells, B. R. Davis, R. J. Hebert, E. S. Olcott, L. M. Johnson, N. T. Hargrett, P. A. Blake, and M. L. Cohen. 1983. Hemorrhagic colitis associated with a rare Escherichia coli serotype. N. Engl. J. Med. 308:681–685.
82. Sawyer, S. 1989. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6:526–538.
83. Schmidt, H., M. Montag, J. Bockemühl, J. Heesemann, and H. Karch. 1993. Shiga-like toxin II-related cytotoxins in Citrobacter freundii strains from humans and beef samples. Infect. Immun. 61:534–543.
84. Schwesinger, M. K. 1977. Additive recombination in bacteria. Bacteriol. Rev. 41:872–902.
85. Selander, R. K., D. A. Caugant, H. Ochman, J. M. Musser, M. H. Gilmour, and T. S. Whittam. 1986. Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Appl. Environ. Microbiol. 51:873–884.
86. Selander, R. K., D. A. Caugant, and T. S. Whittam. 1987. Genetic structure and variation in natural populations of Escherichia coli, p. 1625–1648. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.
87. Selander, R. K., and B. R. Levin. 1980. Genetic diversity and structure in Escherichia coli populations. Science 210:545–547.
88. Selander, R. K., J. Li, E. F. Boyd, F.-S. Wang, and K. Nelson. 1994. DNA sequence analysis of the genetic structure of populations of Salmonella enterica and Escherichia coli, p. 17–49. In F. G. Priest, A. Ramos-Cormenzana, and B. J. Tindall (ed.), Bacterial Diversity and Systematics. Plenum, New York.
89. Selander, R. K., and J. M. Musser. 1990. Population genetics of bacterial pathogenesis, p. 11–36. In B. H. Iglewski and V. L. Clark (ed.), Molecular Basis of Bacterial Pathogenesis. Academic Press, Inc., San Diego, Calif.
90. Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23–33.
91. Sharp, P. M., and W.-H. Li. 1987. The codon adaptation index—a measure of the directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281–1295.
92. Sharp, P. M., D. Shields, K. H. Wolfe, and W.-H. Li. 1989. Chromosomal location and evolutionary rate variation in enterobacterial genes. Science 246:808–810.
93. Smith, H. R., and S. M. Scotland. 1988. Vero cytotoxin-producing strains of Escherichia coli. J. Med. Microbiol. 26:77–85.
94. Stoltzfus, A., J. F. Leslie, and R. Milkman. 1988. Molecular evolution of the Escherichia coli chromosome. I. Analysis of structure and natural variation in a previously uncharacterized region between trp and tonB. Genetics 120:345–358.
95. Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
96. Tzipori, S., H. Karch, K. I. Wachsmuth, R. M. Robins-Browne, A. D. O’Brien, H. Lior, M. L. Cohen, J. Smithers, and M. M. Levine. 1987. Role of a 60-megadalton plasmid and Shiga-like toxins in the pathogenesis of infection caused by enterohemorrhagic Escherichia coli O157:H7 in gnotobiotic piglets. Infect. Immun. 55:3117–3125.
97. Tzipori, S., I. K. Wachsmuth, C. Chapman, R. Birner, J. Brittingham, C. Jackson, and J. Hogg. 1986. The pathogenesis of hemorrhagic colitis caused by Escherichia coli O157:H7 in gnotobiotic piglets. J. Infect. Dis. 154:712–716.
98. White, D. G., R. A. Wilson, A. S. Gabriel, M. Saco, and T. S. Whittam. 1990. Genetic relationships among strains of avian Escherichia coli associated with swollen-head syndrome. Infect. Immun. 58:3613–3620.
99. Whittam, T. S. 1989. Clonal dynamics of Escherichia coli in its natural habitat. Antonie Leeuwenhoek 55:23–32.
100. Whittam, T. S. 1992. Sex in the soil. Curr. Biol. 2:676–678.
101. Whittam, T. S., and S. E. Ake. 1993. Genetic polymorphisms and recombination in natural populations of Escherichia coli, p. 223–245. In N. Takahata and A. G. Clark (ed.), Mechanisms of Molecular Evolution. Sinauer Associates, Inc., Sunderland, Mass.
102. Whittam, T. S., H. Ochman, and R. K. Selander. 1983. Geographic components of linkage disequilibrium in natural populations of Escherichia coli. Mol. Biol. Evol. 1:67–83.
103. Whittam, T. S., H. Ochman, and R. K. Selander. 1983. Multilocus genetic structure in natural populations of Escherichia coli. Proc. Natl. Acad. Sci. USA 80:1751–1755.
104. Whittam, T. S., I. K. Wachsmuth, and R. A. Wilson. 1988. Genetic evidence of clonal descent of Escherichia coli O157:H7 associated with hemorrhagic colitis and hemolytic uremic syndrome. J. Infect. Dis. 157:1124–1133.
105. Whittam, T. S., and R. A. Wilson. 1988. Genetic relationships among pathogenic Escherichia coli of serogroup O157. Infect. Immun. 56:2467–2473.
106. Whittam, T. S., and R. A. Wilson. 1988. Genetic relationships among pathogenic strains of avian Escherichia coli. Infect. Immun. 56:2458–2466.
107. Whittam, T. S., M. L. Wolfe, I. K. Wachsmuth, F. Ørskov, I. Ørskov, and R. A. Wilson. 1993. Clonal relationships among Escherichia coli strains that cause hemorrhagic colitis and infantile diarrhea. Infect. Immun. 61:1619–1629.
108. Whittam, T. S., M. L. Wolfe, and R. A. Wilson. 1989. Genetic relationships among Escherichia coli isolates causing urinary tract infections in humans and animals. Epidemiol. Infect. 102:37–46.
109. Woodward, J. M., I. D. Connaughton, V. A. Fahy, A. J. Lymbery, and D. J. Hampson. 1993. Clonal analysis of Escherichia coli of serogroups O9, O20, and O101 isolated from Australian pigs with neonatal diarrhea. J. Clin. Microbiol. 31:1185–1188.
110. Young, J. P. W. 1989. The population genetics of bacteria, p. 417–438. In D. A. Hopwood and K. F. Chater (ed.), Genetics of Bacterial Diversity. Academic Press, New York.
111. Yu, J., and J. B. Kaper. 1992. Cloning and characterization of the eae gene of enterohaemorrhagic Escherichia coli O157:H7. Mol. Microbiol. 6:411–417.