Phylogenetics and the Amelioration of Bacterial Genomes
Chapter 142
HOWARD OCHMAN and JEFFREY G. LAWRENCE
Compared to mammals—or even most other metazoans—bacteria appear to have relatively few characteristics that might serve as a basis for their classification. Bacteriologists have traditionally relied upon morphological features supplemented by a suite of metabolic properties to differentiate among isolates of bacteria and to classify the isolates into species and higher taxonomic units (43, 89). But to what extent does this classification scheme reflect the actual evolutionary relationships among bacteria? For species constituting the Enterobacteriaceae, the agreement is reasonably good; however, this conclusion is based on some prior notion about the true phylogeny of these bacterial lineages.
While microbiologists have long been aware of the problems associated with establishing the relationships among bacteria (100), many of these uncertainties have been removed with comparisons of macromolecular sequences, particularly of rRNAs (69, 79, 108). These sequences have allowed both the identification of closely related species and the hierarchical grouping of species in a manner reflecting the evolutionary histories of the organisms. Our faith in phylogenies based on biological sequences is founded on considerations of how evolution occurs: cells acquire the majority of their genetic material from their direct ancestors (vertical transfer), and these heritable nucleic acids accumulate changes over time. Therefore, the degree of sequence similarity among bacterial lineages reflects the extent of their recent common ancestry.
Aside from establishing relationships among organisms, phylogenetic trees can provide a framework for examining the distribution, patterns of change, and relative timing of events in bacterial evolution. In this chapter, we examine several aspects of genome organization and evolution in bacteria from a phylogenetic perspective. The small size and relative paucity of noncoding DNA in bacterial chromosomes allow investigations into the structure of genes within operons, the physical relationships among operons within chromosomes, and the presence and distribution of genetic elements among chromosomes at levels not attainable in most eukaryotic species. Underlying each of these studies is the need to ascertain the evolutionary relationships of the genes, or organisms, possessing each trait.
Direct use of molecular sequences, i.e., changes in the primary structure of DNA, RNA, or proteins, can provide the information required to quantify the evolutionary relationships among bacterial genomes and their components. As summarized by Olsen and Woese (68), genotypic data hold certain advantages over phenotypic characters for phylogenetic reconstruction in that (i) variation in molecular sequences is discrete and well defined; (ii) each sequence comprises hundreds, if not thousands, of characters bearing potentially useful information; (iii) sequences accumulate substitutions rapidly and consistently, even when the phenotype does not change; and (iv) the multifactorial nature of many phenotypic traits renders these traits misleading and inadequate for phylogenetic inference.
Molecular analyses cannot completely remedy the problems encountered in reconstructing the phylogenetic relationships among taxa. While one usually assumes that the majority of the information accurately recounts evolutionary processes, some portion of the molecular data collected is contradictory, in part because of the stochastic accumulation of molecular differences. DNA molecules bear only four character states, and identical bases therefore need not reflect identity by descent. As more distantly related taxa are included for analysis, the extent to which taxa converge on similar independently derived states ("homoplasy") increases and produces a background of conflicting data. Moreover, because the analysis of small regions of DNA can yield so much information, phylogenies are routinely constructed with data collected from only one segment of the genome, which may or may not be representative of the genome as a whole. However, once the "true" evolutionary framework has been revealed, further examination can show which incongruencies have arisen from interesting biological phenomena. For this reason, it becomes essential to infer robust, statistically confident phylogenies that allow one to determine the plausibility of each incongruency in the data set (22, 38). To minimize these problems, the molecular phylogenetic approach can be reduced to a few steps.
(i) Select a molecule to analyze. Early approaches applying molecular data to resolve relationships among bacterial taxa included the use of DNA hybridization (5). With this approach, the differences between entire genomes reduce to a measure of relatedness that is based on the melting temperature of interspecific heteroduplex molecules. These methods proved powerful in discriminating among groups of bacterial species but lacked the fine-scale resolution needed to discern interspecific relationships. More detailed data were gathered later as catalogs of partial sequences of nuclease-digested rRNAs (110). The degree of congruence among these oligonucleotide catalogs equalled the similarity of the genomes, and because of the presence of rRNA homologs in all life-forms and the slow rate of rRNA evolution, these measures provided a context for accurately assessing the relationships among very disparate classes of organisms. Recently, direct RNA sequencing has structured the previously chaotic world of bacterial systematics (106, 109).
While slowly evolving rRNAs can be used to resolve relationships among distantly related taxa, these molecules prove inefficient in establishing relationships among more closely related taxa because of the lack of informative characters. Protein-coding sequences comprise two major classes of sites (synonymous and nonsynonymous) that accumulate substitutions at different rates and are useful in determining the relationships among certain closely related organisms (52). Since substitutions at synonymous sites do not alter the amino acid composition of the protein, they are usually considered to be relatively unaffected by natural selection (i.e., they are neutral) and are thought to accumulate substitutions at frequencies approaching the mutation rate (47, 48). In comparisons between homologous genes from Escherichia coli and Salmonella typhimurium (official designation, Salmonella enterica serovar Typhimurium), the frequency of synonymous (or silent) substitutions was approximately 20 times that of nonsynonymous (or replacement) substitutions (65). Therefore, since genomes comprise elements that evolve at a variety of rates, the scope of the investigation determines the molecule of choice.
(ii) Align molecular sequences. The reliability of a phylogenetic analysis rests on the assignment of homologous characters; that is, the nucleotide positions being compared need to be related by ancestry. In practice, this means that sites in molecules from different taxa must be assigned as though they were derived from the same ancestral molecule. When few differences are observed between taxa, the task is not difficult, but few data are available. On the other hand, any doubts in the assignment of ancestry lead to uncertainties about the conclusions of the analysis. Such uncertainties can arise when large differences in sequences, as well as insertion or deletion events, provide more than one way of aligning nucleotide sequences. Since the alignment that yields the greatest degree of correspondence among sequences may not accurately represent homologous sites, it is necessary to cull any suspect data (23, 53).
(iii) Infer phylogenetic relationships. Once homologous sites have been identified, mathematical models are applied to translate differences in nucleotide sequences into plausible scenarios of evolutionary change. The results are typically depicted as a "tree" representing the degree of divergence of extant taxa from reconstructed primordial taxa over time. To the nonpractitioner, this aspect of phylogenetic analysis might seem to resemble religion more than science owing to the zeal displayed by advocates of each approach to extracting evolutionary relationships. The results of the analyses, when different, typically hinge on the weighting of certain characteristics and the resolution of conflicting data (homoplasies) on the basis of differing evolutionary models. Frequently employed methods available include maximum parsimony, which examines characteristics that are uniquely shared between taxa and is often used for deriving relationships when differences are few and typically irreversible (36, 90); evolutionary parsimony, which is based on analyses of transversion mutations to discriminate among three possible relationships for four taxa and is often used for elucidating relationships among distantly related taxa (50); phenetic-distance methods (such as neighbor joining), which rely on an assessment of the extent of difference between taxa (17, 20, 26, 76); and maximum likelihood methods, which attempt to infer relationships likely to have produced the data, in contrast to the above approaches, which attempt to deduce phylogenies from the data (18, 28). All of these methods have received detailed treatment elsewhere (61, 97), and the method of choice often depends on the type of data to be analyzed.
(iv) Assess the reliability of the phylogeny. Following the deduction of phylogenetic relationships among taxa, it is critical to assess the reliability, or statistical confidence, of the data. The use of phylogenies to identify biologically interesting differences from the evolutionary histories of molecules rests entirely on the robustness of the inferred relationships. If the inferred relationships are not reliable, inconsistencies in those relationships cannot be identified with any confidence. Several methods for assessing the reliability of phylogenetic trees have been described elsewhere (19, 71). The most frequently used method of examining the integrity of a phylogenetic tree is the "bootstrap" approach (15, 21), which systematically infers new phylogenetic relationships by resampling the data set and compares these new relationships with the phylogeny inferred from the original data set.
(v) Repeat until completed. Even if these procedures are followed, it is possible that the relationships among bacterial taxa will be misleading. In short, we have determined the relationships among a group of molecules making up a small fraction of the organism’s genome, and these segments might not be representative of the genome as a whole. This possibility necessitates the analysis of molecules from distinct regions of the genome, and as we amass congruent phylogenies, our confidence that inferred relationships reflect the evolutionary history of the genome as a whole increases (14, 52). While this is, admittedly, a substantial undertaking, such relationships have been determined for enteric bacteria and are accumulating for other taxonomic groups. While gene exchange and recombination among lineages would act to homogenize bacterial genomes, phylogenies based on several genes sequenced within and among enteric species are, with few exceptions, congruent. Such consistent branching orders indicate that most sequences are not subject to horizontal transfer and can be used to ascertain the relationships among enteric taxa.
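Steps (ii) through (iv) can be illustrated with a minimal Python sketch. It assumes an alignment has already been made, measures pairwise differences as the simple proportion of differing sites, applies the Jukes-Cantor correction for multiple substitutions, and then resamples alignment columns with replacement to ask how often the same closest pair of taxa is recovered. The three sequences, the taxon names, and the "closest pair" criterion are hypothetical stand-ins chosen for brevity; a real analysis would infer a full tree by one of the methods cited above.

```python
import math
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (gapped sites skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; defined only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}


def closest_pair(alignment):
    """Return the pair of taxa with the smallest p-distance (ties go to the first pair)."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = p_distance(alignment[a], alignment[b])
            if best is None or d < best[0]:
                best = (d, (a, b))
    return best[1]


def bootstrap_support(alignment, replicates=200, seed=0):
    """Fraction of replicates that recover the closest pair found in the original data."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(bootstrap_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, hits / replicates


# Hypothetical toy alignment (not real sequence data).
toy = {
    "taxonA": "ATGCCGTTAGCTAGGCTAAC",
    "taxonB": "ATGCCGTTAGCTAGGCTTAC",
    "taxonC": "ATGACGTTGGCTCGGATAAC",
}
pair, support = bootstrap_support(toy)
print(pair, support)
```

A support value near 1 indicates that the grouping does not depend on a few columns of the alignment; low values signal that the data cannot discriminate among alternatives, which is the situation in which apparent incongruencies should not be interpreted biologically.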
While there are distinct advantages to assessing evolutionary relatedness by nucleotide sequence comparisons, the initial characterizations of nucleic acids in bacterial genomes were limited, for methodological reasons, to determining the overall base composition (GC content) of each species (56). As reviewed by Sueoka (91, 92, 93, 94, 95), these analyses of base composition within and among species resulted in several insights into the attributes, organization, and evolution of bacterial chromosomes. Moreover, the recent accumulation of DNA sequence information has substantially confirmed these early observations and allowed further scrutiny of the particular nature of base compositional changes within and among genomes. In this regard, four features of the DNA base composition of bacteria should be noted.
(i) Base composition varies among species. Surveys of GC contents in a large number of genomes indicate a wide variation in base composition among bacterial species. GC contents as determined for several hundred species (62) range from 25% in Mycoplasma spp. to 75% in Micrococcus spp., which is a much broader range than that observed in animals and plants (91). This range is due in part to the longer evolutionary history of bacterial lineages and to the large size of eukaryotic genomes, which may bear portions with distinct GC contents. The base compositions of bacterial genomes have been ascertained by a variety of methods, including chromatographic separation of nucleotides (2, 49, 98), thermal denaturation temperature (56), buoyant density in CsCl gradients (75, 78, 96), and chromogenic staining (54). The various procedures yield similar estimates of genomic base composition, and, as expected, there is relatively little variation in GC content among strains within a species (on the order of 1 to 2% and probably attributable to experimental error) (16, 78). The differences in base composition among bacterial species are largely due to mutational biases at each of the four bases, termed "directional mutation pressure" by Sueoka (92), which vary among species.
(ii) Base composition is related to phylogeny; i.e., GC contents of the chromosomes of closely related organisms tend to be similar. By examining the base compositions of bacterial taxa in the context of their genealogical relationships, we can observe that the phylogenetic tree is organized into large groups of lineages of fairly similar nucleotide contents (Fig. 1), which indicates that base composition is stable over long evolutionary periods. While it is possible for very divergent taxa to have the same base composition (E. coli and Corynebacterium diphtheriae are both 51% GC), it is usually phylogenetically related species whose GC contents are alike. For example, a group of related gram-positive species, including Bacillus, Staphylococcus, and Lactobacillus spp., has low GC contents, and an array of gram-positive taxa, such as Micrococcus and Streptomyces spp., have high GC contents. With few exceptions, such as Serratia marcescens (58% GC) and Proteus vulgaris (37% GC), most enteric species have nucleotide contents ranging from 50 to 55% GC.
(iii) Base composition is relatively uniform over the entire bacterial chromosome. Unlike the genomes of warm-blooded vertebrates, which are partitioned into large regions with distinct base compositions (3), bacterial genomes show relatively little spatial heterogeneity in GC content within themselves (Fig. 2). The majority of S. typhimurium coding sequences have GC contents between 48 and 58%. While mammalian genomes harbor very large regions (on the order of 1 Mb) that differ in their GC contents, small segments (<5 kb) of various GC contents are dispersed in bacterial genomes (4, 25). For example, the Salmonella hsdM gene (61% GC) is adjacent to the hsdS gene (41% GC).
The relative homogeneity of bacterial chromosomes was known in the 1950s, before sequence information became available (75), and has since been corroborated by analyses of the nucleotide sequences of individual genes (32, 107) and by long-range sequencing of bacterial genomes (8, 12, 41, 113). GC content was originally determined by the buoyant density of fractionated genomes in cesium gradients, and each species tested had a narrow, unimodal banding profile denoting homogeneity in the base composition of DNA fragments. In contrast to the wide variation in base composition across bacterial taxa, the vast majority of genes within a given species have the same base composition as the genome as a whole.
(iv) Within each species, codon positions 1, 2, and 3 have characteristic base compositions. The differences in genomic GC contents arise from biases in the mutation rates among the four bases and are apparent in the base composition at each codon position of sequenced genes (44, 60). While there is a positive linear relationship between the genomic base composition and the GC content for each codon position, the effect is most extreme at codon position 3, where the majority of changes are synonymous substitutions (Fig. 3). For example, the GC contents at codon position 3 of sequenced genes are 10% in Mycoplasma spp., 58% in E. coli, and 95% in Streptomyces spp., whose chromosomal base compositions are 25, 51, and 73% GC, respectively. Since mutations at codon position 2 always cause an amino acid replacement and are therefore under stronger selective constraints than those at position 3, differences in base composition at position 2 are modest even among species with very different GC contents: species varying by as much as 50% in genomic GC contents differ by no more than 15% GC at position 2 of codons. As a consequence of these mutation biases, each codon position of genes has a characteristic GC content in a given species.
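The quantities discussed in points (i) through (iv) are simple to compute from a coding sequence. The following Python sketch tallies overall GC content and the GC content at codon positions 1, 2, and 3. The function names and the short example sequence are hypothetical and serve only to show the bookkeeping; the sketch assumes an in-frame sequence and ignores ambiguous bases.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2, and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    return tuple(gc_fraction(cds[i:usable:3]) for i in range(3))


# Hypothetical five-codon reading frame: ATG GCC AAG CTG TAA
example = "ATGGCCAAGCTGTAA"
print(gc_fraction(example))
print(gc_by_codon_position(example))
```

Applied to all sequenced genes of a species, the third value is the one expected to track genomic base composition most steeply, because most changes at position 3 are synonymous.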
As with codon position 3, the base compositions of nontranscribed regions, which are essentially free of selective constraints, are thought to largely reflect the species-specific mutation biases (3, 44, 70, 84, 104); however, within some species, the GC contents of noncoding spacer sequences differ from those observed at position 3 (66, 80). Because mutational biases are evident in the base composition of codons, particularly at position 3, they are expected to have significant consequences for the choice among alternative synonymous codons (7, 83, 84). Patterns of codon usage are highly nonrandom and, like overall genomic base composition, vary among species (81, 105). Also, although codon preferences are not wholly determined by mutational biases (45, 66, 85), the observed differences in codon usage patterns among species are largely determined by directional mutation pressures.
As a consequence of the patterns observed in bacterial genes and genomes, regions of anomalous base composition or codon usage have often been cited as evidence of horizontal transfer in bacteria. This conclusion rests on phylogenetic considerations: since base composition is relatively uniform over the entire genome and is conserved within and among related lineages, a gene having atypical features is likely to have been acquired from a distantly related organism (Table 1). This rationale has been applied to infer the ancestry of several chromosomally encoded genes in enteric bacteria, including the following: phoN (32), sinR (33), nanH (39), the rfb cluster (103, 111), and the spa-inv complex (29, 31) in salmonellae, and argF (101), ermBC (6), lacAY (9), yibB (87), the hly genes (24), multicopy single-stranded DNA (37, 51), and certain virulence-associated loci (55) within E. coli. After examining the GC contents and codon usage patterns of 500 genes from E. coli, Whittam and Ake (107) estimated that some 6% of the sequences had atypical characteristics and were presumably introduced by horizontal transfer. This percentage may underestimate the extent of gene acquisition by lateral processes, because sequences are apt to be transferred from organisms whose base compositions are similar to that of E. coli. On the basis of codon usage, Médigue et al. (58) partitioned the genes of E. coli into three classes and hypothesized that one class, making up 16% of the sequenced genes, arose through horizontal transfer.
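The logic of using atypical base composition as a screen can be sketched in a few lines of Python. The example below flags genes whose GC content lies more than a chosen number of standard deviations from the mean of the gene set. This cutoff rule, the threshold value, and the gene names and GC values are all assumptions made for illustration; they are not the criteria used in the studies cited above, which also considered codon usage.

```python
from statistics import mean, stdev


def flag_atypical(gc_by_gene, threshold=1.5):
    """Return genes whose GC content deviates from the set mean by > threshold SDs."""
    values = list(gc_by_gene.values())
    mu, sd = mean(values), stdev(values)
    return sorted(
        gene for gene, gc in gc_by_gene.items() if abs(gc - mu) > threshold * sd
    )


# Hypothetical GC fractions for seven genes (illustrative values only).
genes = {
    "geneA": 0.52, "geneB": 0.51, "geneC": 0.53, "geneD": 0.50,
    "geneE": 0.54, "geneF": 0.52, "geneG": 0.38,
}
print(flag_atypical(genes))
```

As noted in the text, a screen of this kind cannot detect sequences acquired from donors whose base compositions resemble that of the recipient, so it is better read as a lower bound than as a census of transferred genes.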
Table 1. Sequenced genes confined to the chromosome of S. typhimurium
Supplemental evidence indicates that segments of the genome with atypical base compositions have been acquired through horizontal processes. In some cases, a region is associated with translocatable elements that are known to promote gene exchange. For example, adjacent to the phoN gene of S. typhimurium is a short sequence exhibiting a high degree of similarity to the leading end of the transfer origins of IncFII plasmids (32). In E. coli, the argF gene and the region containing the lac operon are each flanked by insertion sequences (9, 40, 112), and pap and prs genes encoding pilus-associated adhesins are associated with transposable elements that are responsible for the dissemination of these genes (55).
Lateral transfer has been invoked in cases in which an unprecedented degree of similarity appears among genes from very divergent organisms, which are occasionally from different kingdoms (35, 57). Although rarely noted, such conclusions are grounded in phylogenetic principles: extreme similarity among genes from unrelated taxa imparts an inconsistency to the established phylogeny (88). Several of the genes listed above are known to have very limited phylogenetic distributions, and when they were tested by low-stringency hybridizations against a panel of DNAs from representative species, many of the regions, such as the phoN, lac, inv-spa, sinR, and hly genes, were restricted to one or a few lineages. Alternatively, these sequences may be polymorphic within certain species, or they may be present in additional lineages but remain undetected by DNA hybridizations because of a high level of sequence divergence. For example, Fitts (27) and Galan and Curtiss (30) isolated sequences that, according to Southern hybridizations, were originally considered to be unique to salmonellae; however, further analysis, applying less stringent experimental conditions, showed that many of these sequences were broadly distributed among enteric bacteria (31, 33). While the experimental procedures employed to detect homologous regions may not always yield the actual distribution of these regions among species, the observation that many of these genes are unique to a single lineage corroborates the use of base composition to infer ancestry.
The use of nucleotide contents or codon usage patterns to infer the ancestry of a region of the chromosome rests on two additional premises: that the gene has retained features of its original host chromosome and that "atypical" features are not attributable to either stochastic variation (46) or selective factors specific to a sequence. In practice, this means that genomes have characteristic nucleotide patterns reflecting species-specific mutational pressures, codon biases, and selective constraints and that constituent genes maintain these properties in the new environment. For example, the spa genes of S. typhimurium and Shigella flexneri present compositional motifs characteristic of certain AT-rich genomes (Fig. 3).
If base composition and codon usage patterns are principally caused by mutational biases, sequences introduced by horizontal transfer will incur substitutions and will eventually manifest features observed for other genes on the chromosome. This process of amelioration, whereby a sequence adjusts to the base composition and codon usage of the genome, is a function of the relative rate of A or T → G or C mutations and should be most evident at sites with few functional constraints, such as codon position 3 or noncoding DNA. Sueoka (92, 95) examined the consequences of this directional mutation pressure quantitatively, computing the number of generations necessary to shift a particular base composition to an equilibrium value, given certain frequencies of each type of mutation. To investigate how these estimates describe the effective rate of change in GC content in bacterial genomes requires information about both mutation rates and generation times (or the actual nucleotide substitution rate) in natural populations.
Despite the lack of a fossil record for bacteria, it is possible to obtain the dates of divergence among taxa and apply these divergence times to estimating rates of nucleotide substitutions. By relating certain branch points in the phylogenetic tree of bacteria to specific events in the geological past, Ochman and Wilson (64, 65) calibrated the rate of 16S rRNA evolution in bacteria and were able to determine divergence times between all pairs of species by interpolation. According to this approach, E. coli and S. typhimurium diverged from a common ancestor some 120 to 140 million years ago, a date that fits with evidence about the origins of the habitats now occupied by these enteric species. Applying this date yields a substitution rate at synonymous sites close to 1% per million years between these enteric species (corresponding to 0.5% per million years within a lineage) and a substitution rate at nonsynonymous sites some 20 times lower.
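The relationship between the pairwise rate and the per-lineage rate follows from the fact that two lineages have each been accumulating substitutions since their common ancestor. A short Python sketch makes the arithmetic explicit. The divergence value used in the example (1.3 substitutions per synonymous site) is an assumption chosen only so that a 130-million-year divergence reproduces the rates quoted above; it is not a measured value from the cited work.

```python
def pairwise_rate(divergence_per_site, divergence_time_myr):
    """Substitutions per site per million years between two species."""
    return divergence_per_site / divergence_time_myr


def per_lineage_rate(divergence_per_site, divergence_time_myr):
    """Substitutions per site per million years along a single lineage.

    Both lineages have evolved independently for the full divergence time,
    so the total elapsed evolutionary time is twice the divergence time.
    """
    return divergence_per_site / (2.0 * divergence_time_myr)


# Assumed illustrative inputs: 1.3 substitutions per synonymous site, 130 Myr.
print(pairwise_rate(1.3, 130.0))
print(per_lineage_rate(1.3, 130.0))
```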
Divergence times and subsequent estimates of substitution rates rest on assumptions that the rate of evolution in rRNAs is similar for different taxa and that species of bacteria diverge as new habitats or hosts become available. To test these assumptions, Moran et al. (59) compared the phylogenetic relationships, as determined by 16S rRNAs, among bacterial endosymbionts and their aphid hosts, whose divergence times could be inferred from fossils. Because the molecular phylogeny of the bacterial endosymbionts was consistent with that of their hosts, Moran et al. concluded that speciation probably occurs simultaneously in the bacteria and their hosts. This cospeciation allowed Moran et al. (59) to apply the dates of divergence determined for fossil aphids to calibrate the rate of 16S rRNA evolution in the bacterial symbionts. Their estimate of 0.01 to 0.02 substitution per site per 50 million years for 16S rRNA was very similar to that used by Ochman and Wilson (64, 65) to construct a timescale for bacterial evolution and corroborates dating of the E. coli-S. typhimurium divergence.
Applying a synonymous substitution rate per lineage of 0.5% per million years enables us to predict the rates and patterns of amelioration at sites having few or no functional constraints. Therefore, the rate of change at codon position 3 is expected to be much higher than the rate at position 1 or 2. If the value of 0.5% per site per million years is substituted into the equations of Sueoka (92, 95), then the GC content approaches 58% (the GC content at codon position 3 in E. coli and S. typhimurium) for regions with different initial base compositions (Fig. 4). Sequence amelioration is a relatively slow process, requiring as many as 400 million years for a region to attain the equilibrium base composition, depending on the GC content of the sequence at the time of introgression. Moreover, the nucleotide contents of codon positions 1, 2, and 3 ameliorate at different rates, reflecting different selective constraints on their substitution rates (95). Although most bacterial genomes accommodate sequences that at some time in the past were acquired through horizontal transfer, it is usually not possible to observe the process of amelioration under normal circumstances. Cox and Yanofsky (11) analyzed the change in base composition in a strain of E. coli harboring a mutation in the mutT gene that increases the frequency of mutations by 3 orders of magnitude (99) and preferentially changes AT to GC base pairs. After an estimated 1,200 to 1,600 generations, the overall increase in genomic GC content was 0.4 to 0.7%, representing about seven directional substitutions per genome per generation.
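The qualitative behavior described here can be reproduced with a simplified two-state model in which GC content relaxes exponentially toward its equilibrium value. The Python sketch below is an illustration under stated assumptions, not a reproduction of Sueoka's equations or of Fig. 4: it treats the combined rate of change toward equilibrium as equal to the per-lineage synonymous rate of 0.005 per site per million years, takes 58% GC as the equilibrium for codon position 3, and uses hypothetical starting compositions.

```python
import math


def gc_after(p0, p_eq, rate, t):
    """GC fraction after t million years, relaxing exponentially from p0 toward p_eq.

    rate is the assumed combined rate of change (per site per million years).
    """
    return p_eq + (p0 - p_eq) * math.exp(-rate * t)


def time_to_within(p0, p_eq, rate, tol):
    """Million years needed for GC content to come within tol of equilibrium."""
    gap = abs(p0 - p_eq)
    if gap <= tol:
        return 0.0
    return math.log(gap / tol) / rate


P_EQ = 0.58    # position 3 GC content of E. coli and S. typhimurium (from the text)
RATE = 0.005   # assumption: per-lineage synonymous rate used as the relaxation rate

# Hypothetical initial compositions for an acquired region.
for p0 in (0.30, 0.45, 0.80):
    trajectory = [round(gc_after(p0, P_EQ, RATE, t), 3) for t in range(0, 401, 100)]
    print(p0, trajectory, round(time_to_within(p0, P_EQ, RATE, 0.02)))
```

Under these assumptions the gap between an acquired sequence and the resident genome shrinks by a constant proportion per unit time, so sequences that begin far from equilibrium remain recognizably atypical for hundreds of millions of years.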
To monitor the rates and patterns of compositional change in natural populations requires knowledge of both the GC content of a sequence at the time the sequence is introduced into the genome and the amount of time the sequence has been in the new genome. The age and ancestry of a sequence can often be traced by analyzing the occurrence of homologous regions among lineages with known divergence times; however, the original base composition of the sequence is generally difficult to determine. In several species of plants, genes that differ in base composition, dinucleotide frequencies, and codon usage patterns have been transferred from the chloroplast to the nuclear genome. In each case, the relocated sequences are more similar to the overall features of genes encoded by the nucleus (67); however, the antiquity of the transfer events makes it impossible to estimate an absolute rate of amelioration for these genes.
There is convincing evidence that an identical set of low-GC-content genes has been acquired independently by salmonellae and shigellae, providing the first opportunity to directly examine the rates at which base composition is adjusted over evolutionary time. The spa genes are required for bacterial entry into host intestinal cells by both Salmonella and Shigella spp., and the sizes, orientation, and degree of overlap of reading frames are nearly identical in the two species, indicating a common source (31, 102). Judging from the distribution of spa sequences among strains of E. coli, Shigella spp., and Salmonella spp., it is likely that these genes were incorporated into the Salmonella chromosome some 140 million years ago (corresponding to the divergence between Salmonella spp. and E. coli), while shigellae acquired a large plasmid carrying the spa genes (and other virulence attributes) about 20 million years ago (63, 65). The base composition of the spa region in Salmonella spp. is 46% GC (compared to 52% GC estimated for the entire chromosome) but that for the homologous region from Shigella spp. is only 34% GC (Table 2).
Table 2. Compositional structure of spa operons of S. typhimurium and Shigella flexneri
To a first approximation, the relatively recent acquisition of the spa genes by Shigella spp. implies that the 34% GC content is similar to the original base composition of the region and that the increase in GC content for the homologous region in Salmonella spp. reflects the process of amelioration during residence in this genome. We can also compare the shift in base composition of the spa genes to that predicted by the models presented above. Examining the overall base composition of the region reveals an increase in GC content of 12% over some 100 million years; however, the most pronounced effect is observed at the third positions of the codons, which are largely synonymous sites (Table 2). Within each species, there is very little variation among genes in GC content at each codon position, as expected if substitutions are neutral, and in Salmonella spp., the GC content of the spa complex approaches that of typical genes (Fig. 3). Given that the average GC content at position 3 in E. coli and S. typhimurium is 58%, the observed rate of increase at this site agrees with the expectation that all changes are due to directional mutation pressure. The fit of these values to the predicted curve substantiates the use of this timescale and substitution rate for analyzing bacterial evolution and provides further evidence that the spa genes of Shigella and Salmonella spp. were obtained from a common source (Fig. 4).
Although GC content is fairly uniform over the entire bacterial chromosome, some regions are expected to differ from the average base composition of the genome. Several factors other than introgression might cause deviations from the characteristic base composition of a genome, as exemplified by the long AT-rich tail in the distribution shown in Fig. 2. Because most of the bacterial chromosome consists of protein-coding regions, constraints on amino acid composition and codon usage patterns serve as a source of variation in the base composition observed among genes. Noncoding regions may contain repetitive elements or certain regulatory sequences whose base compositions are not representative of the chromosome as a whole (10). Finally, some degree of variation is probably attributable to stochastic factors encountered when small portions of a broadly homogeneous region are sampled.
There is ample evidence that bacterial chromosomes are chimeric, i.e., that they contain sequences inherited by both vertical and horizontal processes, and that regions gained from extraneous sources will often display anomalous structural characteristics. Given sufficient time, however, the base compositions of regions gained through horizontal transfer should adjust to that of the resident genome, while more recent acquisitions will retain some features of their original host genome. How is it possible to distinguish regions acquired through horizontal transfer from those that vary because of selection or random processes? To answer this question, we need to examine the base composition, codon usage patterns, and phylogenetic distribution of an array of genes from a single species.
In this regard, S. typhimurium, with its large number of genes of known sequence and map locations, serves as the ideal model organism. However, the principal advantage for examining genes from S. typhimurium is the availability of corresponding information from E. coli. The most comprehensive data pertaining to the structure, organization, and patterns of change in bacterial genes and genomes have come from comparative analyses of these two enteric species. Their chromosomes are roughly the same size and base composition, and when their genetic maps are compared, the order, orientation, and spacing of mapped loci are highly conserved (73, 74). Therefore, for the vast majority of genes from S. typhimurium, it is possible to examine the compositional structure, codon bias, map position, and coincidence of adjacent loci in a homologous region from E. coli. With the munificence of sequence data available for E. coli, it is likely that the homologs for most Salmonella genes, if present, have already been identified in E. coli.
S. typhimurium chromosomal gene sequences were extracted from the GenBank (version 82) sequence database and culled of short, incomplete, and duplicated sequences to yield 421 independent genes, most of them >150 bp long (Table 3). Genes were analyzed for (i) GC content; (ii) GC content at each codon position; (iii) codon adaptation index (CAI), which measures the relative degree to which codon usage is biased toward the set preferred by very highly expressed genes in E. coli (82); and (iv) deviation from uniform codon usage by the χ² statistic (86). The overall base composition of the Salmonella genes as estimated from these sequences is 52.1% ± 5.8% GC.
Table 3. S. typhimurium genes analyzed
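The first two measurements listed above, overall GC content and GC content at each codon position, can be sketched as follows. This is an illustrative outline, not the procedure actually used for the 421 genes; the function names and the short example sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return float("nan")
    return sum(b in "GC" for b in counted) / len(counted)

def gc_by_codon_position(cds):
    """Return (overall, pos1, pos2, pos3) GC fractions for an in-frame CDS."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]  # drop any trailing partial codon
    per_position = tuple(gc_fraction(cds[i::3]) for i in range(3))
    return (gc_fraction(cds),) + per_position

# Hypothetical seven-codon sequence, used only to exercise the functions.
example_cds = "ATGGCGAAAGTTCTGGAATAA"
overall, gc1, gc2, gc3 = gc_by_codon_position(example_cds)
```

Running the same function on a sequence shifted by one or two bases gives the corresponding values for the +1 and +2 reading frames.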
For a majority of reading frames, codon position 2 was more AT-rich than either position 1 or 3, as anticipated for protein-coding regions from Salmonella spp., and the CAI and χ² values were larger than those of the +1 and +2 reading frames. However, several genes within this set have atypical GC contents (Fig. 2) and CAIs (Fig. 5). Genes with a high CAI, mainly highly expressed housekeeping genes, have GC contents between 50 and 57%, and sequences known to be present in both E. coli and S. typhimurium, e.g., the his, trp, pur, and nad genes, have moderate to high CAIs (0.25 to 0.8) and typical nucleotide contents. The anomalous base compositions and CAIs of some genes could result from stochastic or selective factors; however, many of the atypical genes are confined to Salmonella spp. (Table 1), which suggests that these genes arose through horizontal transfer.
The relationship between the GC content at codon position 3 and the total nucleotide content of a gene provides some indication of the source of the observed diversity in base composition (Fig. 6). If the variation in base composition of a gene reflects differences in amino acid composition or stochastic processes, the GC content of nucleotide position 3, primarily silent sites, would be relatively unaffected; rather, it would vary in a manner independent of the GC content at codon positions 1 and 2. However, as displayed in Fig. 6, there is a strong correlation between genic nucleotide content and position 3 nucleotide content (r² = 0.82, P < 0.0001), and variations at codon positions 1 and 2 are similarly correlated with genic GC content (r² = 0.62 and 0.35, respectively). These patterns are the same as those observed for the variation in base composition among taxa (Fig. 3), such that species with different GC contents have characteristic base compositions at codon positions 1, 2, and 3 (60), and are consistent with the introgression of these sequences into the Salmonella genome. Since variation at codon position 3 is correlated with that at positions 1 and 2 (r² = 0.50, P < 0.001), it is unlikely that much of the variation in base composition arose through stochastic or selective processes acting within the Salmonella genome.
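The r² values quoted above are squared Pearson correlation coefficients between per-gene measurements. A minimal sketch of that calculation is given below; the function name `r_squared` and the data points are hypothetical and are not taken from Fig. 6.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical (genic GC, position 3 GC) values for five genes, in percent.
genic_gc = [38.0, 45.0, 51.0, 53.0, 56.0]
gc3 = [25.0, 40.0, 55.0, 60.0, 68.0]
fit = r_squared(genic_gc, gc3)
```

In practice each gene in the data set would contribute one point, with the genic and position-specific GC contents computed from its coding sequence.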
Not surprisingly, genes of low GC content have low CAIs (Fig. 7) and are atypical for genes native to the Salmonella genome. Also, while a low GC content is generally a reliable indicator of genes acquired through horizontal transfer, this feature would not reveal those genes transferred from organisms bearing GC contents close to the Salmonella average. However, close examination of codon usage patterns can often reveal whether a gene was acquired through horizontal transfer, even if its base composition resembles that of the entire chromosome. When codon usage deviates from uniformity, i.e., when codons for each amino acid are not utilized at equal frequencies, the χ² value for codon usage increases, and if the deviation is toward the preferred set of codons used by E. coli and Salmonella spp. (81, 82), the CAI also increases (Fig. 7). However, genes introduced by horizontal transfer are not necessarily biased toward the set of codons preferred by Salmonella spp., and these genes would be expected to have a high χ² value but a low CAI.
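The deviation from uniform codon usage can be sketched as a χ² summed over synonymous codon families. The version below assumes the scaled form, in which the summed χ² is divided by the number of codons counted; the text does not spell out the scaling, so this choice, the function names, and the toy sequences are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group sense codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)

def codon_counts(cds):
    """Count recognized codons in an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return Counter(c for c in codons if c in CODON_TABLE)

def scaled_chi_square(cds):
    """Chi-square for departure from equal synonymous codon use, per codon."""
    counts = codon_counts(cds)
    chi2, total = 0.0, 0
    for family in SYNONYMS.values():
        if len(family) < 2:  # Met and Trp have no synonyms
            continue
        n = sum(counts[c] for c in family)
        if n == 0:
            continue
        expected = n / len(family)
        chi2 += sum((counts[c] - expected) ** 2 / expected for c in family)
        total += n
    return chi2 / total if total else float("nan")

biased = scaled_chi_square("AAAAAAAAAAAG")   # three AAA and one AAG (Lys)
uniform = scaled_chi_square("AAAAAG")        # one AAA and one AAG
```

A gene that uses all synonymous codons equally scores zero, and the score grows as usage concentrates on a few codons, regardless of which codons those are.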
Applying these criteria, i.e., a high χ² coupled with a low CAI, to the listing of Salmonella sequences uncovered several genes that have apparently been introduced into the Salmonella genome by horizontal transfer (Fig. 7). For some genes, such as those making up the spa region, there is already compelling evidence of introgression; however, this approach can also be used to substantiate the origin of other sequences. For example, the open reading frame (ORF) downstream of the miaE gene is present in Salmonella spp. but not in E. coli (72), perhaps because of its introduction into the Salmonella genome or because of deletion of the corresponding region from E. coli. In this case, its low GC content (43%) and abnormally high χ² value (1.03) for its CAI (0.25) support the hypothesis that this gene was acquired by the Salmonella genome rather than lost from E. coli. In contrast, S. typhimurium lacks an alkaline phosphatase, PhoA, that is present in E. coli and other enteric species (13). The phoA gene has high χ² and CAI values, as expected for a region ancestral to enteric bacteria, which further argues that this phosphatase gene was deleted after the divergence of Salmonella spp.
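The CAI side of this screen can be sketched along the lines of the index defined by Sharp and Li (82): each codon is weighted by its frequency relative to the most-used synonym in a reference set of highly expressed genes, and the CAI is the geometric mean of those weights over the gene. The code below is an illustrative outline only. The reference set is a toy sequence, the 0.5 pseudocount for unobserved codons is an assumption made here to avoid taking the logarithm of zero, and the cutoffs passed to `is_transfer_candidate` are hypothetical, since the text gives no numerical thresholds.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)

def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def relative_adaptiveness(reference_genes):
    """Weight of each codon relative to the most-used synonym in the reference."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for family in FAMILIES.values():
        if len(family) < 2:  # Met and Trp carry no information
            continue
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in family}
        top = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / top
    return weights

def cai(cds, weights):
    """Geometric mean of codon weights over the informative codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

def is_transfer_candidate(chi2, cai_value, chi2_min, cai_max):
    """Apply the high chi-square, low CAI criterion with caller-chosen cutoffs."""
    return chi2 >= chi2_min and cai_value <= cai_max

toy_weights = relative_adaptiveness(["AAAAAAAAAAAG"])  # toy reference set
cai_mixed = cai("AAAAAG", toy_weights)
# Values for the ORF downstream of miaE, with hypothetical cutoffs.
flagged = is_transfer_candidate(1.03, 0.25, chi2_min=0.5, cai_max=0.3)
```

A realistic reference set would consist of very highly expressed genes, and the cutoffs would be chosen from the joint distribution of the two statistics across all genes rather than fixed in advance.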
Aside from genes with restricted phylogenetic distributions, some sequences with aberrant characteristics are common to both E. coli and S. typhimurium. Even if a gene had been introduced just prior to the E. coli-Salmonella divergence, one would anticipate some degree of amelioration of the sequence, particularly at codon position 3, over the past 140 million years. However, it appears that the uhp sugar phosphate transport genes, with a GC content of 72% at codon position 3 (42), were acquired from a very divergent GC-rich organism and have not yet adjusted to characteristics of the resident genome. In some cases, where an atypical sequence is present in both E. coli and Salmonella spp., selective constraints counteract the directional mutation pressures. For example, the rplL operon encoding the L7L12 ribosomal protein of S. typhimurium has an unusually low GC content of 29% at codon position 3, but as a structural component of the ribosomes, it is certainly ancestral to bacteria. The protein has a very atypical amino acid composition, with alanine, lysine, valine, and glutamic acid making up 71 of its 121 residues (114). For each of these amino acids, the preferred codon (as recognized in highly expressed genes in E. coli and S. typhimurium) contains either adenine or thymine at position 3, and functional constraints on the protein sequence combined with codon usage bias therefore produce a sequence with an uncharacteristic base composition.
When examining the base composition of genes from E. coli and S. typhimurium, several authors have noted that the distribution of GC contents, as shown in Fig. 2, is asymmetric, with a relatively large number of genes with low GC content. Since this AT-rich tail in the distribution largely reflects sequences obtained through horizontal transfer, it seems that E. coli and Salmonella spp. acquire sequences from low-GC rather than high-GC organisms. However, since enteric bacteria are likely to encounter and obtain genes from organisms of any base composition, there is probably a mechanistic rather than an ecological explanation for the paucity of genes with GC contents much above the average. There is evidence from E. coli that the extent of homologous exchange promoted by RecA decreases with GC content such that very GC-rich sequences will not be incorporated into the chromosome (34). By blocking the recombination of sequences on the basis of their GC contents, this mechanism, along with directional mutation pressure, serves to homogenize the base compositional structure of genomes.
According to these analyses, what portion of the Salmonella genome originated through horizontal transfer? To date, 10% of the sequenced genes from Salmonella spp. have features that depart from the prevalent characteristics of the genome. However, certain biases in the data set could lead to an over- or underestimation of the amount of introgressed DNA. For example, a large number of sequences from Salmonella spp. were originally recovered because of their overall similarity to known genes in E. coli, while several others were examined because they conferred novel characteristics, i.e., traits not observed in E. coli. However, the overall proportion of atypical sequences in Salmonella spp. is close to that estimated for E. coli on the basis of sequence characteristics (58, 107) and similar to the amount of DNA apparently unique to S. typhimurium (∼400 kb), as determined by aligning genetic maps (73).
This work was supported by postdoctoral fellowship GM-15868 to J.G.L. and grant GM-48407 to H.O. from the National Institutes of Health.
References
1. Bachmann, B. J. 1990. Linkage map of Escherichia coli K-12, edition 8. Microbiol. Rev. 54:130–197.
2. Belozersky, A. N., and A. S. Spirin. 1958. A correlation between the compositions of deoxyribonucleic and ribonucleic acids. Nature (London) 182:111–112.
3. Bernardi, G. 1989. The isochore organization of the human genome. Annu. Rev. Genet. 23:637–661.
4. Blake, R. D., and S. Earley. 1986. Distribution and evolution of sequence characteristics in the E. coli genome. J. Biomol. Struct. Dynam. 4:291–307.
5. Brenner, D. J., G. R. Fanning, K. E. Johnson, R. V. Citarella, and S. Falkow. 1969. Polynucleotide sequence relationships among members of the Enterobacteriaceae. J. Bacteriol. 98:637–650.
6. Brisson-Noel, A., M. Arthur, and P. Courvalin. 1988. Evidence for natural gene transfer from gram-positive cocci to Escherichia coli. J. Bacteriol. 170:1739–1745.
7. Bulmer, M. 1988. Are codon usage patterns in unicellular organisms determined by selection-mutation balance? J. Evol. Biol. 1:15–16.
8. Burland, V., G. Plunkett III, D. L. Daniels, and F. R. Blattner. 1993. DNA sequence and analysis of 136 kilobases of the Escherichia coli genome: organizational symmetry around the origin of replication. Genomics 16:551–561.
9. Buvinger, W. E., K. A. Lampel, R. J. Bojanowski, and M. Riley. 1984. Location and analysis of nucleotide sequences at one end of a putative lac transposon in the Escherichia coli chromosome. J. Bacteriol. 159:618–623.
10. Cardon, L. R., C. Burge, G. A. Schachtel, E. B. Blaisdell, and S. Karlin. 1993. Comparative DNA sequence features in two long Escherichia coli contigs. Nucleic Acids Res. 21:3875–3884.
11. Cox, E. C., and C. Yanofsky. 1967. Altered base ratios in the DNA of an Escherichia coli mutator strain. Proc. Natl. Acad. Sci. USA 58:1895–1902.
12. Daniels, D. L., G. Plunkett III, V. Burland, and F. R. Blattner. 1992. Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes. Science 257:771–778.
13. DuBose, R. F., and D. L. Hartl. 1990. The molecular evolution of alkaline phosphatase: correlating variation among enteric bacteria to experimental manipulations of the protein. Mol. Biol. Evol. 7:547–577.
14. Dykhuizen, D. E., and L. Green. 1991. Recombination in Escherichia coli and the definition of biological species. J. Bacteriol. 173:7257–7268.
15. Efron, B., and G. Gong. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37:36–48.
16. Ezaki, T., S. M. Saidi, S.-L. Lui, Y. Hashimoto, Y. Yamamoto, and E. Yabuuchi. 1990. Rapid procedure to determine the DNA base composition from small amounts of gram-positive bacteria. FEMS Microbiol. Lett. 67:127–130.
17. Farris, J. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645–668.
18. Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.
19. Felsenstein, J. 1983. Statistical inference of phylogenies. J. R. Soc. Stat. 146:246–272.
20. Felsenstein, J. 1984. Distance methods for inferring phylogenies: a justification. Evolution 38:16–24.
21. Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
22. Felsenstein, J. 1988. Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22:521–565.
23. Feng, D. F., M. S. Johnson, and R. F. Doolittle. 1984. Aligning amino acid sequences: comparison of commonly used methods. J. Mol. Evol. 21:112–125.
24. Fermlee, T., S. Pellett, and R. A. Welch. 1985. Nucleotide sequence of an Escherichia coli chromosomal hemolysin. J. Bacteriol. 163:94–105.
25. Fickett, J. W., D. C. Tourney, and D. R. Wolf. 1992. Base compositional structure of genomes. Genomics 13:1056–1064.
26. Fitch, W. M., and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155:279–284.
27. Fitts, R. 1985. Development of a DNA-DNA hybridization test for the presence of Salmonella in foods. Food Technol. 39:95–102.
28. Fukami-Kobayashi, K., and Y. Tateno. 1991. Robustness of maximum likelihood tree estimation against different patterns of base substitutions. J. Mol. Evol. 32:79–91.
29. Galan, J., C. Ginnochio, and P. Costeas. 1992. Molecular and functional characterization of the Salmonella invasion gene invA: homology of InvA to members of a new protein family. J. Bacteriol. 174:4338–4349.
30. Galan, J. E., and R. Curtiss III. 1991. Distribution of the invA, -B, -C, and -D genes of Salmonella typhimurium among other Salmonella serovars: invA mutants of Salmonella typhi are deficient for entry into mammalian cells. Infect. Immun. 59:2901–2908.
31. Groisman, E. A., and H. Ochman. 1993. Cognate genes govern invasion of host epithelial cells by Salmonella typhimurium and Shigella flexneri. EMBO J. 12:3779–3787.
32. Groisman, E. A., M. H. Saier, Jr., and H. Ochman. 1992. Horizontal transfer of a phosphatase gene as evidence for mosaic structure of the Salmonella genome. EMBO J. 11:1309–1316.
33. Groisman, E. A., M. Sturmoski, F. Soloman, R. Lin, and H. Ochman. 1993. Molecular, functional and evolutionary analysis of sequences specific to Salmonella. Proc. Natl. Acad. Sci. USA 90:1033–1037.
34. Gruss, A., V. Moretto, S. D. Ehrlich, P. Duwat, and P. Dabert. 1991. GC-rich DNA sequences block homologous recombination. J. Biol. Chem. 266:6667–6669.
35. Heinemann, J. A. 1991. Genetics of gene transfer between species. Trends Genet. 7:171–175.
36. Hennig, W. 1965. Phylogenetic systematics. Annu. Rev. Entomol. 10:97–116.
37. Herzer, P. J., S. Inouye, M. Inouye, and T. S. Whittam. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175–6181.
38. Hillis, D. M., J. P. Huelsenbeck, and C. W. Cunningham. 1994. Application and accuracy of molecular phylogenies. Science 264:671–677.
39. Hoyer, L. L., A. C. Hamilton, S. M. Steenbergen, and E. R. Vimr. 1992. Cloning, sequencing and distribution of the Salmonella typhimurium LT2 sialidase gene, nanH, provides evidence for interspecies gene transfer. Mol. Microbiol. 6:873–884.
40. Hu, M., and R. C. Deonier. 1981. Mapping of IS1 elements flanking the argF region of the E. coli K-12 chromosome. Mol. Gen. Genet. 181:222–229.
41. Ikemura, T., K.-W. Wada, and S.-I. Aota. 1990. Giant G+C% mosaic structures of the human genome found by arrangement of GenBank human DNA sequences according to genetic positions. Genomics 8:207–216.
42. Island, M. D., B.-Y. Wei, and R. J. Kadner. 1992. Structure and function of the UHP genes for the sugar phosphate transport system in Escherichia coli and Salmonella typhimurium. J. Bacteriol. 174:2754–2762.
43. Johnson, R., R. R. Colwell, R. Sakazaki, and K. Tamura. 1975. Numerical taxonomy of the Enterobacteriaceae. Int. J. Syst. Bacteriol. 25:12–37.
44. Jukes, T. H., and V. Bhushan. 1986. Silent nucleotide substitutions and G+C content of some mitochondrial and bacterial genes. J. Mol. Evol. 24:39–44.
45. Jukes, T. H., S. Osawa, and A. Muto. 1987. Divergence and directional mutation pressures. Nature (London) 325:668–670.
46. Karlin, S., and V. Brendel. 1993. Patchiness and correlations in DNA sequences. Science 259:677–680.
47. Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
48. King, J. L., and T. H. Jukes. 1969. Non-Darwinian evolution. Science 164:788–798.
49. Krane, D. E., D. L. Hartl, and H. Ochman. 1991. Rapid determination of the nucleotide content and its application to the study of genome organization. Nucleic Acids Res. 19:5181–5185.
50. Lake, J. A. 1987. A rate-invariant technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol. Biol. Evol. 4:167–191.
51. Lampson, B. C., J. Sun, M.-Y. Hsu, J. Vallejo-Ramirez, S. Inouye, and M. Inouye. 1989. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA. Science 243:1033–1038.
52. Lawrence, J. G., H. Ochman, and D. L. Hartl. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911–1921.
53. Lipman, D. J., W. J. Wilbur, T. F. Smith, and M. S. Waterman. 1984. On the statistical significance of nucleic acid similarities. Nucleic Acids Res. 12:215–226.
54. Mabuchi, T., and S. Nishikawa. 1990. Selective staining with fluorochromes of DNA fragments on gels depending on their AT-content. Nucleic Acids Res. 18:7461–7642.
55. Marklund, B. I., J. M. Tennent, E. Garcia, A. Hamers, M. Baga, F. Lindberg, W. Gaastra, and S. Normark. 1992. Horizontal gene transfer of the Escherichia coli pap and prs pili operons as a mechanism for the development of tissue-specific adhesive properties. Mol. Microbiol. 6:2225–2242.
56. Marmur, J., and P. Doty. 1962. Determination of the base composition of deoxyribonucleic acid from thermal denaturation temperature. J. Mol. Biol. 5:109–118.
57. Mazodier, P., and J. Davies. 1991. Gene transfer between distantly related bacteria. Annu. Rev. Genet. 25:147–171.
58. Médigue, C., T. Rouxel, P. Vigier, A. Hénaut, and A. Danchin. 1991. Evidence of horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851–856.
59. Moran, N. A., M. A. Munson, P. Baumann, and H. Ishikawa. 1993. A molecular clock in endosymbiotic bacteria is calibrated using insect hosts. Proc. R. Soc. Lond. Sect. B 253:167–171.
60. Muto, A., and S. Osawa. 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 84:166–169.
61. Nei, M. 1991. Relative efficiencies of different tree-making methods for molecular data, p. 90–128. In M. M. Miyamoto and J. Cracraft (ed.), Phylogenetic Analysis of DNA Sequences. Oxford University Press, New York.
62. Normore, W. M., and J. R. Brown. 1970. G+C composition in bacteria, p. 24–74. In H. A. Sober (ed.), Handbook of Biochemistry: Selected Data for Molecular Biology. CRC Press, Inc., Cleveland.
63. Ochman, H., and E. A. Groisman. 1994. The origin and evolution of species differences in Escherichia coli and Salmonella typhimurium, p. 479–493. In B. Schierwater, B. Streit, G. P. Wagner, and R. DeSalle (ed.), Molecular Ecology and Evolution: Approaches and Applications. Birkhauser Verlag, New York.
64. Ochman, H., and A. C. Wilson. 1987. Evolutionary history of enteric bacteria, p. 1649–1654. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.
65. Ochman, H., and A. C. Wilson. 1988. Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74–86.
66. Ohama, T., A. Muto, and S. Osawa. 1990. Role of GC-biased mutation pressure on synonymous codon choice in Micrococcus luteus, a bacterium with high genomic GC-content. Nucleic Acids Res. 18:1565–1569.
67. Oliver, J. L., A. Marin, and J. M. Martinez-Zapater. 1990. Chloroplast genes transferred to the nuclear plant genome have adjusted to nuclear base composition and codon usage. Nucleic Acids Res. 18:65–71.
68. Olsen, G. J., and C. R. Woese. 1993. Ribosomal RNA: a key to phylogeny. FASEB J. 7:113–123.
69. Olsen, G. J., C. R. Woese, and R. Overbeek. 1994. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176:1–6.
70. Osawa, S., T. H. Jukes, A. Muto, F. Yamao, T. Ohama, and Y. Andachi. 1987. Role of directional mutation pressure in the evolution of the eubacterial genetic code. Cold Spring Harbor Symp. Quant. Biol. 52:777–789.
71. Penny, D., and M. D. Hendy. 1986. Estimating the reliability of evolutionary trees. Mol. Biol. Evol. 3:403–417.
72. Persson, B. C., and G. R. Bjork. 1993. Isolation of the gene (miaE) encoding the hydroxylase involved in the synthesis of 2-methylthio-cis-ribozeatin in tRNA of Salmonella typhimurium and characterization of mutants. J. Bacteriol. 175:7776–7785.
73. Riley, M., and S. Krawiec. 1987. Genome organization, p. 967–981. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.
74. Riley, M., and K. E. Sanderson. 1990. Comparative genetics of Escherichia coli and Salmonella typhimurium, p. 85–96. In M. Riley and K. Drlica (ed.), The Bacterial Chromosome. American Society for Microbiology, Washington, D.C.
75. Rolfe, R., and M. Meselson. 1959. The relative homogeneity of microbial DNA. Proc. Natl. Acad. Sci. USA 45:1039–1042.
76. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
77. Sanderson, K. E., and J. R. Roth. 1988. Linkage map of Salmonella typhimurium, edition 7. Microbiol. Rev. 52:485–532.
78. Schildkraut, C. L., J. Marmur, and P. Doty. 1962. Determination of the base composition of deoxyribonucleic acid from its buoyant density in CsCl. J. Mol. Biol. 4:430–443.
79. Schleifer, K. H., and E. Stackebrandt. 1983. Molecular systematics of prokaryotes. Annu. Rev. Microbiol. 37:143–187.
80. Sharp, P. M. 1990. Processes of genome evolution reflected by base frequency differences among Serratia marcescens genes. Mol. Microbiol. 4:119–122.
81. Sharp, P. M., E. Cowe, D. G. Higgins, D. C. Shields, K. H. Wolfe, and F. Wright. 1988. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens: a review of the considerable within-species diversity. Nucleic Acids Res. 16:8207–8211.
82. Sharp, P. M., and W.-H. Li. 1987. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281–1295.
83. Sharp, P. M., and G. Matassi. 1994. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4:851–860.
84. Shields, D. C. 1990. Switches in species-specific codon preferences: the influence of mutation biases. J. Mol. Evol. 31:71–80.
85. Shields, D. C., and P. M. Sharp. 1987. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 15:8023–8040.
86. Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. "Silent" sites in Drosophila are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.
87. Sirisena, D. M., R. P. MacLachlan, S.-L. Liu, A. Hessel, and K. E. Sanderson. 1994. Molecular analysis of the rfaD gene, for heptose synthesis, and the rfaF gene, for heptose transfer, in lipopolysaccharide synthesis in Salmonella typhimurium. J. Bacteriol. 176:2379–2385.
88. Smith, M. W., D.-W. Feng, and R. F. Doolittle. 1992. Evolution by acquisition: the case for horizontal gene transfers. Trends Biochem. Sci. 17:489–493.
89. Sneath, P. H. A. 1974. Phylogeny of micro-organisms. Soc. Gen. Microbiol. Symp. 25:1–39.
90. Stewart, C.-B. 1993. The powers and pitfalls of parsimony. Nature (London) 361:603–607.
91. Sueoka, N. 1961. Variation and heterogeneity of base composition of deoxyribonucleic acids: a compilation of old and new data. J. Mol. Biol. 3:31–40.
92. Sueoka, N. 1962. On the genetic basis of variation and heterogeneity in base composition. Proc. Natl. Acad. Sci. USA 48:582–592.
93. Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.
94. Sueoka, N. 1992. Directional mutation pressure, selective constraints, and genetic equilibria. J. Mol. Evol. 34:95–114.
95. Sueoka, N. 1993. Directional mutation pressure, mutator mutations, and dynamics of molecular evolution. J. Mol. Evol. 37:137–153.
96. Sueoka, N., J. Marmur, and P. Doty. 1959. Heterogeneity in deoxyribonucleic acids. II. Dependence of the density of deoxyribonucleic acids on guanine-cytosine. Nature (London) 183:1427–1931.
97. Swofford, D., and G. J. Olson. 1990. Phylogenetic reconstruction, p. 411–501. In D. M. Hillis and C. Moritz (ed.), Molecular Systematics. Sinauer, Sunderland, Mass.
98. Tamaoka, J., and K. Komagata. 1984. Determination of DNA base composition by reversed-phase high-performance liquid chromatography. FEMS Microbiol. Lett. 25:125–128.
99. Treffers, H. P., V. Spinelli, and N. O. Belser. 1954. A factor (or mutator gene) influencing mutation rates in Escherichia coli. Proc. Natl. Acad. Sci. USA 55:274–281.
100. van Niel, C. B. 1946. The classification and natural relationships among bacteria. Cold Spring Harbor Symp. Quant. Biol. 11:285–301.
101. Van Vliet, F., A. Boyen, and N. Glansdorff. 1988. On interspecies gene transfer: the case of the argF gene in E. coli. Ann. Inst. Pasteur Microbiol. 139:493–496.
102. Venkatesan, M. M., J. M. Buyusse, and E. V. Oaks. 1992. Surface presentation of Shigella flexneri invasion plasmid antigens requires products of the spa locus. J. Bacteriol. 174:1990–2001.
103. Verma, N., and P. Reeves. 1989. Identification and sequence of rfbS and rfbE, which confer antigenic specificity on group A and group D salmonellae. J. Bacteriol. 171:5694–5701.
104. Wada, A., A. Suyama, and R. Hanai. 1991. Phenomenological theory of GCAT pressure on base composition. J. Mol. Evol. 32:374–378.
105. Wada, K., Y. Wada, F. Ishibashi, T. Gojobori, and T. Ikemura. 1992. Codon usage tabulated from the GenBank genetic sequence data. Nucleic Acids Res. 18:2367–2413.
106. Wheelis, M. L., O. Kandler, and C. R. Woese. 1992. On the nature of global classification. Proc. Natl. Acad. Sci. USA 89:2930–2934.
107. Whittam, T. S., and S. Ake. 1992. Genetic polymorphisms and recombination in natural populations of Escherichia coli, p. 223–246. In N. Takahata and A. G. Clark (ed.), Mechanisms of Molecular Evolution. Japan Scientific Society Press, Tokyo.
108. Wilson, A. C., S. S. Carlson, and T. J. White. 1977. Biochemical evolution. Annu. Rev. Biochem. 46:573–639.
109. Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.
110. Woese, C. R., and G. E. Fox. 1977. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. USA 74:5088–5090.
111. Wyk, P., and P. Reeves. 1989. Identification and sequence of abequose synthase, which confers antigenic specificity on group B salmonellae: homology with galactose epimerase. J. Bacteriol. 171:5687–5693.
112. York, M. K., and M. Stodolsky. 1981. Characterization of P1argF derivatives from E. coli K-12 transduction. I. IS1 elements flank the argF segment. Mol. Gen. Genet. 181:230–240.
113. Yura, T., H. Mori, H. Nagai, T. Nagata, A. Ishihama, N. Fujita, K. Isono, K. Mizobuchi, and A. Nakata. 1992. Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res. 20:3305–3308.
114. Zhyvoloup, A. N., M. I. Woodmaska, I. V. Kroupskaya, and E. B. Paton. 1990. Nucleotide sequence of the rplJL operon and the deduced primary structure of the encoded L10 and L7L12 proteins of Salmonella typhimurium compared to that of Escherichia coli. Nucleic Acids Res. 18:4620.