"Whoever wishes to learn must first know how to doubt, for the doubt of the spirit leads to the manifestation of truth." ╚»♥GABYVEN*कलियुग♥
December 7, 2008 · General



Aspects of this invention were made with government support (DOE grant number DE-FG02-02ER63453). The government has certain rights in the invention. This application claims the benefit of the filing date of U.S. provisional application 60/725,295, filed October 12, 2005, which is incorporated by reference herein in its entirety.


This invention relates, e.g., to the identification of non-essential genes of bacteria, and of a minimal set of genes required to support viability of a free-living organism.


One consequence of progress in the new field of synthetic biology is an emerging view of cells as assemblages of parts that can be put together to produce an organism with a desired phenotype (1). That perspective begs the question: "How few parts would it take to construct a cell?" In an environment that is free from stress and provides all necessary nutrients, what would comprise the simplest free-living organism? This problem has been approached theoretically and experimentally in our laboratory and elsewhere.

In a comparison of the first two bacterial genomes sequenced, Mushegian and Koonin projected that the 256 orthologous genes shared by the Gram negative Haemophilus influenzae and the Gram positive M. genitalium genomes are a close approximation of a minimal gene set for bacterial life (2). More recently Gil et al. proposed a 206 protein-coding gene core of a minimal bacterial gene set based on analysis of several free-living and endosymbiotic bacterial genomes (3).

In 1999 some of the present inventors reported the first use of global transposon mutagenesis to experimentally determine the genes not essential for laboratory growth of M. genitalium (4). Since then there have been numerous other experimental determinations of bacterial essential gene sets using our approach and other methods such as site-directed gene knockouts and antisense RNA (5-12). Most of these studies were done with human pathogens, often with the aim of identifying essential genes that might be used as antibiotic targets. Almost all of these organisms contain relatively large genomes that include many paralogous gene families. Disruption or deletion of such genes shows they are non-essential but does not determine if their products perform essential biological functions. It is only through gene essentiality studies of bacteria that have near minimal genomes that we bring empirical verification to the compositions of hypothetical minimal gene sets.

The Mollicutes, generically known as the mycoplasmas, are an excellent experimental platform for defining a minimal gene set. These wall-less bacteria evolved from more conventional progenitors in the Firmicutes taxon by a process of massive genome reduction. Mycoplasmas are obligate parasites that live in relatively unchanging niches requiring little adaptive capability. M. genitalium, a human urogenital pathogen, is the extreme manifestation of this genomic parsimony, having only 482 protein-coding genes and the smallest genome, at ~580 kb, of any known free-living organism capable of being grown in pure culture (13). The bacteria can grow independently on an agar plate free of other living cells. While more conventional bacteria with larger genomes used in gene essentiality studies have on average 26% of their genes in paralogous gene families, M. genitalium has only 6% (Table 1). Thus, with its lack of genomic redundancy and contingencies for different environmental conditions, M. genitalium is already close to being a minimal bacterial cell.

The 1999 report by some of the present inventors on the essential microbial gene set for M. genitalium and its closest relative, Mycoplasma pneumoniae, mapped ~2200 transposon insertion sites in these two species, and identified 130 putatively non-essential M. genitalium protein-coding genes or M. pneumoniae orthologs of M. genitalium genes. In that report (Hutchison et al. (1999) Science 286, 2165-9), those authors estimated that 265 to 350 of the protein-coding genes of M. genitalium are essential under laboratory growth conditions (4). However, proof of gene dispensability requires isolation and characterization of pure clonal populations, which they did not do. In that report, the authors grew Tn4001 transformed cells in mixed pools for several weeks, and then isolated genomic DNA from those mixtures of mutants. They sequenced amplicons from inverse PCRs using that DNA as a template to identify the transposon insertion sites in the mycoplasma genomes. Most of the genes containing transposon insertions encoded either hypothetical proteins or other proteins not expected to be essential. Nonetheless, some of the putatively disrupted genes, such as isoleucyl and tyrosyl-tRNA synthetases (MG345 & MG455), DNA replication gene dnaA (MG469), and DNA polymerase III, subunit alpha (MG261) are thought to perform essential functions. They hypothesized how genes generally thought to be essential might be disrupted: a gene may be tolerant of the transposon insertion and not actually disrupted, cells could contain two copies of a gene, or the gene product may be supplied by other cells in the same mixed pool of mutants.

Disclosed herein is an expanded study in which we have isolated and characterized M. genitalium Tn4001 insertion mutants that were present in individual colonies picked from agar plates. This analysis has provided a new, more thorough, estimate of the number of essential genes in this minimalist bacterium.

DESCRIPTION OF THE DRAWINGS

Figure 1 shows the accumulation of new disrupted M. genitalium genes (top line, thick) and new transposon insertion sites in the genome (bottom line, thin) as a function of the total number of analyzed primary colonies and subcolonies with insertion sites different from that of the parental primary colony.

Figures 2A-2I show global transposon mutagenesis of M. genitalium. The locations of transposon insertions from the current study are noted by a Δ below the insertion site on the map. The letters over the Gene Loci (MG###) refer to the functional category of the gene product as listed.

A  Biosynthesis of cofactors, prosthetic groups, and carriers
B  Purines, pyrimidines, nucleosides, and nucleotides
C  Cell envelope
D  Cellular processes
E  Central intermediary metabolism
F  DNA metabolism
G  Energy metabolism
H  Fatty acid and phospholipid metabolism
I  Hypothetical proteins
J  Protein fate
K  Protein synthesis
L  Regulatory functions
M  Transcription
N  Transport and binding proteins
X  Unknown function
P  Cell/organism defense
R  rRNA and tRNA genes

Figure 3 shows the frequency of Tn4001tet insertions. These histograms show the frequency with which we identified mutants with transposon insertions at different sites in the genome. The abscissa is the M. genitalium genome site where the transposon inserts. Some mutations proved to be highly prone to transposon migration. In subcolonies with insertion sites different than the primary clone there was a preference to jump to a region of the genome from ~350,000 to 500,000 base pairs rich in topological features such as palindromic regions and cruciform elements (van Noort et al. (2003) Trends Genet 19, 365-369).

Figure 4 shows metabolic pathways and substrate transport mechanisms encoded by M. genitalium. White letters on black boxes mark non-essential functions or proteins based on our current gene disruption study. Question marks denote enzymes or transporters not identified that would be necessary to complete pathways, and those missing enzyme and transporter names are italicized. Transporters are drawn spanning the cell membrane. The arrows indicate the predicted direction of substrate transport. The ABC type transporters are drawn with a rectangle for the substrate-binding protein, diamonds for the membrane-spanning permeases, and circles for the ATP-binding subunits.


The inventors have identified 101 protein-coding genes that are non-essential for sustaining the growth of an organism, such as a bacterium, in a rich bacterial culture medium, e.g. SP4. Such a culture medium contains all of the salts, growth factors, nutrients etc. required for bacterial growth under laboratory conditions. A minimal set of genes required for sustaining the viability of a free-living organism under laboratory conditions is extrapolated from the identification of these non-essential genes. By a "minimal gene set" is meant the minimal set of genes whose expression allows the viability (e.g., survival, growth, replication, proliferation, etc.) of a free-living organism in a particular rich bacterial medium as discussed above.

The 101 protein-coding genes of M. genitalium that were disrupted in the bacteria, which nevertheless retained viability, and are thus dispensable (non-essential) for growth, are listed in Table 2, where they are grouped by their functional roles. The 381 genes that were not disrupted are summarized in Table 3, where they are also grouped by functional roles. These genes form part of a minimal essential gene set. Other genes may also be part of a minimal gene set. At minimum, these other genes include protein-coding genes for ABC transporters for phosphate and/or phosphonate, and certain lipoproteins and/or glycerophosphoryl diester phosphodiesterases; and RNA-encoding genes.

As noted above, some of the present inventors published a preliminary study in 1999 that reported putative sets of genes that appeared to be either essential or dispensable for viability. Table 4 lists genes identified in the present study as being dispensable, but which were not so identified in the 1999 paper. Table 5 lists genes identified in the present study as being required for growth, but which were not so identified in the 1999 paper.

One aspect of the invention is a set of protein-coding genes that provides the information required for replication of a free-living organism under axenic conditions in a rich bacterial culture medium, such as SP4, (e.g., a minimal set of protein-coding genes), wherein the gene set lacks at least 40 of the 101 protein-coding genes listed in Table 2 (the "lacking genes"), or functional equivalents thereof, wherein at least one of the genes in Table 4 is among the lacking genes; wherein the set comprises between 350 and 381 of the 381 protein-coding genes listed in Table 3, or functional equivalents thereof, including at least one of the genes in Table 5; and wherein the set comprises no more than 450 protein-coding genes.
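The numerical limits recited in this aspect can be read as a membership check on sets of gene identifiers. The following Python sketch is illustrative only: the function name `satisfies_gene_set_limits` and the placeholder identifiers are hypothetical, the real contents of Tables 2 to 5 are not reproduced, and functional equivalents are not modeled.

```python
def satisfies_gene_set_limits(candidate, table2, table3, table4, table5):
    """Check a candidate gene set against the numerical limits stated above.

    All arguments are sets of gene identifiers. Genes are compared by
    identifier only (an assumption); functional equivalents are ignored.
    """
    lacking = table2 - candidate          # Table 2 genes absent from the set
    retained = table3 & candidate         # Table 3 genes present in the set
    return (
        len(lacking) >= 40                # lacks at least 40 Table 2 genes
        and bool(lacking & table4)        # at least one Table 4 gene is lacking
        and 350 <= len(retained) <= 381   # between 350 and 381 Table 3 genes
        and bool(retained & table5)       # includes at least one Table 5 gene
        and len(candidate) <= 450         # no more than 450 protein-coding genes
    )


# Placeholder identifiers, not the real table contents.
table2 = {f"N{i:03d}" for i in range(101)}   # 101 non-essential genes
table3 = {f"E{i:03d}" for i in range(381)}   # 381 undisrupted genes
table4 = {"N000", "N001"}                    # hypothetical subset of Table 2
table5 = {"E000", "E001"}                    # hypothetical subset of Table 3

# Keep every Table 3 gene plus 51 of the Table 2 genes (50 are lacking).
candidate = set(table3) | {f"N{i:03d}" for i in range(50, 101)}
print(satisfies_gene_set_limits(candidate, table2, table3, table4, table5))
```

In this toy case the candidate holds 432 genes and passes; the full 482-gene complement would fail because it lacks none of the Table 2 genes.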

A set of genes that "provides the information" required for replication of a free-living organism can be in any form that can be transcribed (e.g. into mRNA, rRNA or tRNA) and, in the case of protein-encoding sequences, translated into protein, wherein the transcription/translation products provide functions that allow the free-living organism to function.

This set of protein-coding genes is smaller than the complete complement of genes found in M. genitalium (482 genes), the smallest known set of naturally occurring genes in a free-living organism. A set of protein-coding genes of the invention can lack at least about 55 (e.g. at least about 70, 80 or 90) of the genes listed in Table 2, and/or it can comprise at least about 360 (e.g. at least about 370 or 380) of the genes listed in Table 3.

A set of the invention can further comprise: genes encoding an ABC transporter for phosphate import, selected from the group consisting of (a) MG410, MG411 and MG412, and (b) MG289, MG290 and MG291, and functional equivalents thereof; and/or a lipoprotein-encoding gene selected from the group consisting of MG185 and MG260, and functional equivalents thereof; and/or a glycerophosphoryl diester phosphodiesterase gene selected from the group consisting of MG293 and MG385, and functional equivalents thereof.

Furthermore, a set of the invention can further comprise the 43 RNA-coding genes of Mycoplasma genitalium, or functional equivalents thereof.

The genes in a set of the invention may constitute a chromosome; and/or may be from M. genitalium. Another aspect of the invention is a free-living organism that can grow and replicate under axenic conditions in a rich bacterial culture medium (such as SP4), whose set of genes consists of a set of the invention, e.g. a set that comprises at least one gene involved in hydrogen or ethanol production.


Another aspect of the invention is a method for determining the function of a gene, comprising inserting, mutating or removing the gene into/in/from such a free-living organism, and measuring a property of the organism. Another aspect of the invention is a method of hydrogen or ethanol production, comprising growing a free-living organism of the invention that comprises at least one gene involved in hydrogen or ethanol production, in a suitable medium such that hydrogen or ethanol is produced.

Another aspect of the invention is an effective subset of a set as noted above. An "effective subset," as used herein, refers to a subset that provides the information required for replication of a free-living organism in a rich bacterial culture medium, such as SP4.

A minimal gene set of the invention has a variety of applications. For example, a minimal gene set of the invention can be introduced into cells of a microorganism, such as a bacterium, which lack a genome or a functional genome (e.g. ghost cells) and used experimentally to investigate requirements for cell growth, protein synthesis, replication or other bacterial functions under varying conditions. One or more of the minimal genes in the ghost cells can be modified or substituted with orthologous genes, or substituted with non-orthologous genes that express proteins which perform the same function(s), to allow structure/function studies of those genes. Cells comprising a minimal gene set of the invention can be modified to further comprise one or more expressible heterologous genes, either integrated into the genome or replicating on one or more independent plasmids. These cells can be used, e.g., to study properties or activities of the heterologous genes (e.g., structure/function studies), or to produce useful amounts of the heterologous proteins (e.g. biologic drugs, vaccines, catalytic enzymes, energy sources, etc.).

As noted, a minimal gene set is one that provides the information required for replication of a free-living organism in a rich bacterial culture medium. The minimal gene set described herein was identified based on genes that were shown to be non-essential for bacterial growth in the medium SP4 (whose composition is described in reference # 17), in the presence of tetracycline selection (the tetracycline resistance gene is present in the transposon used to inactivate the genes which were shown to be non-essential). The set of non-essential genes may be different for organisms grown under different conditions (e.g. in a different bacterial medium, under different selection conditions, etc.). In general, a culture medium that supports growth and proliferation of a minimal organism (containing a gene set as discussed herein), with as few environmental stresses as possible, contains energy sources such as glucose, arginine or urea; protein or peptides; all amino acids; nucleotides; vitamins; cofactors; fatty acids and other membrane components such as cholesterol; enzyme cofactors; salts; minerals and buffers.

Such a medium is SP4 (Spiroplasma medium), which is a highly nutritious mixture of beef heart infusion, peptone supplemented with yeast extract, CMRL 1066 Medium and 17% fetal bovine serum. The yeast extract provides diphosphopyridine nucleotides and the serum provides cholesterol and a source of protein. (See, e.g., Tully et al. (1979) J. Infect. Dis. 139, 478-82.) In particular, SP4 medium contains the following components:


Mycoplasma Broth Base                    3.5 g
Bacto Tryptone                           10 g
Bacto Peptone                            5.3 g
Distilled water                          600 ml

Adjust pH to 7.5. Autoclave at 121°C for 15 min. Add aseptically:

20% Glucose                              25 ml
CMRL 1066 (10X)                          50 ml
7.5% Sodium Bicarbonate                  14.6 ml
200 mM L-Glutamine                       5 ml
Yeast Extract Solution                   35 ml
2% Autoclaved TC Yeastolate              100 ml
Fetal Bovine Serum (heat inactivated)    170 ml
Penicillin G (10^7 IU/ml)                100 μl
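As a consistency check on the recipe above, the liquid volumes can be summed to see what the stocks dilute to in the finished medium. This is a rough Python sketch that assumes volumes are additive and ignores the volume of the dissolved dry components; the variable and function names are illustrative.

```python
# Volumes in ml, taken from the recipe above (100 µl penicillin = 0.1 ml).
base_ml = 600.0  # distilled water in which the dry components are dissolved
supplements_ml = {
    "20% glucose": 25.0,
    "CMRL 1066 (10X)": 50.0,
    "7.5% sodium bicarbonate": 14.6,
    "200 mM L-glutamine": 5.0,
    "yeast extract solution": 35.0,
    "2% TC yeastolate": 100.0,
    "fetal bovine serum": 170.0,
    "penicillin G": 0.1,
}

# Assumption: volumes are simply additive.
total_ml = base_ml + sum(supplements_ml.values())


def final_fraction(name):
    """Volume fraction of one supplement in the finished medium."""
    return supplements_ml[name] / total_ml


serum_percent = 100 * final_fraction("fetal bovine serum")
cmrl_fold = 10 * final_fraction("CMRL 1066 (10X)")        # 10X stock, diluted
glutamine_mM = 200 * final_fraction("200 mM L-glutamine")  # from the stock only

print(f"total volume: {total_ml:.1f} ml")
print(f"serum: {serum_percent:.1f}% v/v")
print(f"CMRL 1066: {cmrl_fold:.2f}X")
print(f"L-glutamine from stock: {glutamine_mM:.2f} mM")
```

Under these assumptions the total comes to just under one liter, which agrees with the 17% fetal bovine serum quoted for SP4.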

CMRL 1066 Components

Chemical    1X Molarity (mM)

Calcium chloride (CaCl2·2H2O) 1.800

Potassium chloride (KCl) 5.300

Magnesium sulfate (MgSO4) 0.814

Sodium chloride (NaCl) 116.000

Sodium phosphate, mono (NaH2PO4) 1.010

Thiamine pyrophosphate 0.0021

Coenzyme A 0.00326

2'-deoxyadenosine 0.0398

2'-deoxycytidine 0.4441

2'-deoxyguanosine 0.0375

Beta-nicotinamide adenine dinucleotide 0.0105

Flavin adenine dinucleotide 0.00127

D-Glucose 3.33000

Glutathione reduced 0.0325

5-Methyl-2'-deoxycytidine 0.0004

Phenol red 0.0502

Sodium acetate·3H2O 0.6100

D-Glucuronic acid 0.0177

Thymidine 0.0413

Beta-nicotinamide adenine dinucleotide phosphate 0.0013

Tween 80 5 mg/L

Uridine-5'-triphosphate 0.0020

L-Alanine 0.281

L-Arginine 0.330

L-Aspartic acid 0.230

L-Cystine 1.480

L-Cysteine 0.108

L-Glutamic acid 0.510

Glycine 0.667

L-Histidine 0.952

trans-4-Hydroxy-L-proline 0.763

L-Isoleucine 0.153

L-Leucine 0.458

L-Lysine 0.383

L-Methionine 0.101

L-Phenylalanine 0.152

L-Proline 0.348

L-Serine 0.238

L-Threonine 0.252

L-Tryptophan 0.049

L-Tyrosine disodium salt 0.260

L-Valine 0.214

Biotin 0.000041

D-Pantothenic acid hemicalcium salt 0.000021

Choline Chloride 0.0035

Folic acid 0.0000227

myo-Inositol 0.0002

Niacinamide 0.00203

Niacin 0.0002

4-Aminobenzoic Acid 0.0003

Pyridoxal Hydrochloride 0.0001

Pyridoxine Hydrochloride 0.00012

Riboflavin 0.0000266

Thiamine hydrochloride 0.0000297

Ascorbic Acid 0.284

Cholesterol 0.000517

Sodium bicarbonate (NaHCO3) 26.200

L-Glutamine 2.000

The term "gene," as used herein, refers to a polynucleotide comprising a protein-coding or RNA-coding sequence, in an expressible form, e.g. operably linked to an expression control sequence. The "coding sequences" of the gene generally do not include expression control sequences, unless they are embedded within the coding sequence. In different embodiments of the invention, the coding sequences of the genes listed in Tables 2 to 5 can be under the control of the naturally occurring expression control sequences or they can be under the control of heterologous expression control sequences, or combinations thereof.

An "expression control sequence," as used herein, refers to a polynucleotide sequence that regulates expression of a polypeptide coded for by a polynucleotide to which it is functionally ("operably") linked. Expression can be regulated at the level of the mRNA or polypeptide. Thus, the term expression control sequence includes mRNA-related elements and protein-related elements. Such elements include promoters, domains within promoters, ribosome binding sequences, transcriptional terminators, etc. An expression control sequence is operably linked to a nucleotide sequence when the expression control sequence is positioned in such a manner to effect or achieve expression of the coding sequence. For example, when a promoter is operably linked 5' to a coding sequence, expression of the coding sequence is driven by the promoter.

The minimal gene set suggested in the Examples herein is composed of genes or sequences from Mycoplasma genitalium (M. genitalium) G37 (ATCC 33530). The complete genome of this bacterium is provided as Genbank accession number L43976. The individual genes are annotated in the Genbank listing as MG001, MG002 through MG470. The sequences of the genes were published on the TIGR web site in early October, 2005.

However, any of a variety of other protein- or RNA-coding genes or sequences can be substituted in a minimal gene set for the exemplified protein- or RNA-coding gene or sequences, provided that the protein or RNA encoded by the substituting gene can be expressed and that it provides a sufficient amount of the activity, function and/or structure to substitute for the M. genitalium gene or sequence in a minimal gene set. Such substitutes are sometimes referred to herein as "functional equivalents" of the exemplified genes or coding sequences.

Suitable genes or coding sequences that can be substituted include, for example, an active mutant, variant, polymorph etc. of an M. genitalium gene; or a corresponding (orthologous) gene from another bacterium, such as a different Mycoplasma species (e.g., M. capricolum). Furthermore, genes or sequences from the minimal gene set can be substituted with orthologous genes from an evolutionarily more diverse organism, such as an archaebacterium or a eukaryotic organism. Genes from eukaryotic organisms which must be post-translationally modified in order to function by a mechanism unavailable in a bacterial host cannot, of course, be used. Similarly, expression control sequences from eukaryotic genes can be used only if they can function in the background of a bacterial cell.

In one embodiment of the invention, genes from the minimal gene set are replaced by non-orthologous gene displacement (by a different set of genes providing an equivalent function or activity). For example, genes from the glycolytic pathway of M. genitalium as shown in the Examples can be substituted with genes from a different organism that utilizes a different source for generating energy (such as hydrolysis of urea, fermentation of arginine, etc.).

For example, M. genitalium generates energy via glycolysis. One can substitute a different energy generation system from another organism that would make most of the genes that express the enzymes of the glycolytic pathway superfluous. For instance, energy generation in Ureaplasma parvum, a bacterium closely related to M. genitalium, is based on the hydrolysis of urea. That system includes 8 genes that encode the urease enzyme complex, two ammonium transporters, an as yet unidentified nickel ion transporter (presumably one of several U. parvum cation transporters), and possibly a urea transporter (no transporter has been identified, and the very small urea molecule may enter the cell by diffusion). We expect that substitution of these 11-12 U. parvum genes for 15-20 M. genitalium genes encoding glycolytic enzymes and carbohydrate transporters would produce an organism with fewer genes capable of more robust growth, as is seen with U. parvum.

As used herein, the term "polynucleotide" includes a single stranded DNA corresponding to the single strand provided in the Genbank listing, or to the complete complement thereto, or to the double stranded form of the molecule. Also included are RNA and DNA-like or RNA-like materials, such as branched DNAs, peptide nucleic acids (PNA) or locked nucleic acids (LNA). Functional equivalents of genes can also include a variety of variant polynucleotides, provided that the variant polynucleotide can provide at least a measurable amount of the function of the original polynucleotide from which it varies. Preferably, the variant can provide at least about 50%, 75%, 90% or 95% of the function of the original polynucleotide. For example, a functional variant of a polynucleotide as described herein includes a polynucleotide that includes degenerate codons; or that is an active fragment of the original polynucleotide; or that exhibits at least about 90% identity (e.g. at least about 95% or 98% identity) with the original polynucleotide; or that can hybridize specifically to the original polynucleotide under conditions of high stringency.

Unless otherwise indicated, the term "about," as used herein, refers to plus or minus 10%. Thus, about 90%, as used above, includes 81% to 99%. As used herein, the end points of a range are included within the range.
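The definition of "about" amounts to a small inclusive-range calculation. A minimal Python sketch, with hypothetical helper names `about` and `is_about`, might look like this:

```python
def about(value, tolerance=0.10):
    """Return the inclusive (low, high) range covered by "about value"."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(x, value, tolerance=0.10):
    """True if x lies within "about value"; end points are included."""
    low, high = about(value, tolerance)
    return low <= x <= high


low, high = about(90)
print(low, high)
print(is_about(95, 90), is_about(80, 90))
```

The tolerance is relative to the stated value, so "about 90%" spans 9 percentage points on either side rather than 10.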

Functional variant polynucleotides may take a variety of forms, including, e.g., naturally or non-naturally occurring polymorphisms, including single nucleotide polymorphisms (SNPs), allelic variants, and mutants. They may comprise, e.g., one or more additions, insertions, deletions, substitutions, transitions, transversions, inversions, chromosomal translocations, variants resulting from alternative splicing events, or the like, or any combinations thereof.

The degree of sequence identity can be obtained by conventional algorithms, such as those described by Lipman and Pearson (Proc. Natl. Acad. Sci. 80:726-730, 1983) or Martinez/Needleman-Wunsch (Nucl. Acids Res. 11:4629-4634, 1983).

A polynucleotide that hybridizes specifically to a second polynucleotide under conditions of high stringency hybridizes preferentially to that polynucleotide. Conditions of "high stringency," as used herein, mean, for example, incubating a blot or other hybridization reaction overnight (e.g., at least 12 hours) with a long polynucleotide probe in a hybridization solution containing, e.g., about 5X SSC, 0.5% SDS, 100 μg/ml denatured salmon sperm DNA and 50% formamide, at 42°C. Blots can be washed at high stringency conditions that allow, e.g., for less than 5% bp mismatch (e.g., wash twice in 0.1X SSC and 0.1% SDS for 30 min at 65°C), thereby selecting sequences having, e.g., 95% or greater sequence identity. Other non-limiting examples of high stringency conditions include a final wash at 65°C in aqueous buffer containing 30 mM NaCl and 0.5% SDS. Another example of high stringency conditions is hybridization in 7% SDS, 0.5 M NaPO4, pH 7, 1 mM EDTA at 50°C, e.g., overnight, followed by one or more washes with a 1% SDS solution at 42°C. Whereas high stringency washes can allow for less than 5% mismatch, reduced or low stringency conditions can permit up to 20% nucleotide mismatch. Hybridization at low stringency can be accomplished as above, but using lower formamide conditions, lower temperatures and/or lower salt concentrations, as well as longer periods of incubation time.

The minimal gene set suggested herein has been derived by taking into account some of the following factors. Furthermore, the minimal gene set may be modified, e.g. for growth under other culture conditions, taking into account some of the following factors: Although the noted protein-coding genes appear to be essential for growth under the conditions of the experiments described herein, additional protein-coding genes may be required under other conditions. For example, we isolated mutants in DNA metabolism genes that were expendable for the duration of our experiment, but might be necessary for the long-term survival of the organism. These were six genes involved in recombination and DNA repair: recA (MG339), recU (MG352), Holliday junction DNA helicases ruvA (MG358) and ruvB (MG359), formamidopyrimidine-DNA glycosylase mutM (MG262a), which excises oxidized purines from DNA, and a likely DNA damage inducible protein gene (MG360). Perhaps because of an accumulation of cell damage over time, mutants in chromosome segregation protein SMC (MG298) and hypothetical gene MG115, which is similar to the cinA gene of the Streptococcus pneumoniae competence-inducible (cin) operon, grew more poorly after repeated passage.

Even with its near minimal gene set M. genitalium has apparent enzymatic redundancy. We disrupted two complete ABC transporter gene cassettes for phosphate (MG410, MG411, MG412) and putatively phosphonate (MG289, MG290, MG291) import. The PhoU regulatory protein gene (MG409) was not disrupted, suggesting it is needed for both cassettes. Phosphate is an essential metabolite that must be imported. Either phosphate might be imported by both transporters as a result of relaxed substrate specificity by the phosphonate system, or there is a metabolic capacity to interconvert phosphate and phosphonate. Although we disrupted both of these three-gene cassettes, cells presumably need at least one phosphate transporter. Therefore, a minimal gene set preferably contains three ABC transporter genes for phosphate importation. Relaxed substrate specificity is a recurring theme proposed and shown for several M. genitalium enzymes as a mechanism by which this bacterium meets its metabolic needs with fewer genes (21, 22).

M. genitalium generates ATP through glycolysis, and although none of the genes encoding enzymes involved in the initial glycolytic reactions were disrupted, mutations in two energy generation genes suggested there may be still more unexpected genomic redundancy in this essential pathway. We identified viable insertion mutants in genes encoding lactate/malate dehydrogenase (MG460) and the dihydrolipoamide dehydrogenase subunit of the pyruvate dehydrogenase complex (MG271). Mutations in either of these dehydrogenases would be expected to reduce glycolytic ATP production and unbalance NAD+ and NADH levels, which are the primary oxidizing and reducing agents in glycolysis. These mutations should have greatly reduced growth rate and accelerated acidification of the growth medium. While the MG271 mutants grew about 20% slower than wild type cells, inexplicably, the lactate dehydrogenase mutants grew ~20% faster. We also isolated a mutant in glycerol-3-phosphate dehydrogenase (MG039), a phospholipid biosynthesis enzyme. The loss of functions in these mutants could have been compensated for by other M. genitalium dehydrogenases or reductases. This could be another case of mycoplasma enzymes having a relaxed substrate specificity, as has been reported for lactate/malate dehydrogenase (21) and nucleotide kinases (22).

Under our laboratory conditions we identified 101 non-essential protein-coding genes. It appears that the remaining 381 M. genitalium protein-coding genes, plus three phosphate transporter genes, and 43 RNA-coding genes comprise the essential gene set for this minimal cell (Table 3). We disrupted genes in only 5 of the 12 M. genitalium paralogous gene families. Only for the two families comprised of lipoproteins MG185 and MG260 and glycerophosphoryl diester phosphodiesterases MG293 and MG385 did we disrupt all members. Accordingly, these families' functions may be essential, and we expanded our projection of the essential gene set to 386 genes to include them (one of MG185 or MG260, and one of MG293 or MG385). This is a significantly greater number of essential genes than the 265-350 predicted in the inventors' previous study of M. genitalium (4), or in the gene knockout/disruption study that identified 279 essential genes in B. subtilis, which is a more conventional bacterium from the same Firmicutes taxon as M. genitalium (6). Similarly, our finding of 386 essential protein-coding genes greatly exceeds theoretical projections of how many genes comprise a minimal genome, such as Mushegian and Koonin's 256 genes shared by both H. influenzae and M. genitalium (2), and the 206 gene core of a minimal bacterial gene set proposed by Gil et al. (3). One of the surprises about the present essential gene set is its inclusion of 108 hypothetical proteins and proteins of unknown function.
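The gene counts in this paragraph follow from simple arithmetic, which the short Python sketch below spells out. The variable names are illustrative; the numbers are those given in the text.

```python
total_protein_coding = 482      # protein-coding genes in M. genitalium
non_essential = 101             # genes disrupted in viable mutants (Table 2)
not_disrupted = total_protein_coding - non_essential   # Table 3 genes

# Genes added back because every member of a redundant group was disrupted,
# yet the function itself is presumed essential.
phosphate_transporter = 3       # one three-gene ABC transporter cassette
lipoprotein = 1                 # one of MG185 or MG260
gpd_phosphodiesterase = 1       # one of MG293 or MG385

essential_protein_coding = (
    not_disrupted + phosphate_transporter + lipoprotein + gpd_phosphodiesterase
)
rna_coding = 43                 # structural RNA genes

print(not_disrupted)                           # 381
print(essential_protein_coding)                # 386
print(essential_protein_coding + rna_coding)   # 429
```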

These data suggest that a genome constructed to encode the 386 protein-coding and 43 structural RNA genes could sustain a viable synthetic cell, which has been referred to hypothetically as a Mycoplasma laboratorium (24). A variety of mechanisms can be used for preparing such a viable synthetic cell. For example, the minimal gene set can be introduced into a ghost cell, from which the resident genome has been removed or disabled. In one embodiment, ribosomes, membranes and other cellular components important for gene regulation, transcription, translation, post-transcriptional modification, secretion, uptake of nutrients or other substances, etc., are present in the ghost cell. In another embodiment, one or more of these components is prepared synthetically. In one embodiment of the invention, the genes in the minimal gene set, or a subset of those genes, are cloned into conventional vectors to form a library. The DNA to be cloned can be obtained from any suitable source, including naturally occurring genes, genes previously cloned into a different vector, or artificially synthesized genes. The genes may be cloned by in vitro, synthetic procedures, such as those disclosed in co-pending PCT application PCT/2006/16349, filed 1 May 2006, "Amplification and Cloning of Single DNA Molecules Using Rolling Circle Amplification," incorporated by reference herein in its entirety. For example, synthetically prepared genes of the gene set may be amplified and assembled to form a synthetic gene or genome. This can be performed by diluting DNA molecules, such that each sample of diluted DNA contains, on average, one molecule of DNA, in fragments of about 5 kb, for example, and then converting to single stranded DNA circles, and then amplifying the DNA circles using Φ29 polymerase.

As a library, the gene sets of the invention can be arranged in any form, in single or multiple copies, and can be arranged in individual oligonucleotides each having a section of one of the genes, one of the genes, or more than one of the genes. These oligonucleotides can be arranged as cassettes. The cassettes can be joined up to form larger gene assemblies, including a minimal genome comprising or consisting of all the genes of the gene set of the invention. The genes can be assembled by a method such as that described in PCT International Patent Application No. PCT/US06/31214, filed 11 August 2006, "Method For In Vitro Recombination Employing a 3' Exonuclease Activity," incorporated by reference herein in its entirety. PCT/US06/31214 describes methods of joining cassettes of genes into larger assemblies, and can be used to produce a single DNA molecule comprising the gene set of the invention. In particular, that application describes an in vitro method, using isolated proteins, for joining two or more double-stranded (ds) DNA molecules of interest, wherein the distal region of the first DNA molecule and the proximal region of the second DNA molecule of each pair share a region of sequence identity, comprising (a) treating the DNA molecules with an enzyme having an exonuclease activity, under conditions effective to yield single-stranded overhanging portions of each DNA molecule which contain a sufficient length of the region of sequence homology to hybridize specifically to the region of sequence homology of its pair; (b) incubating the treated DNA molecules of (a) under conditions effective to achieve specific annealing of the single-stranded overhanging portions; and (c) treating the incubated DNA molecules in (b) under conditions effective to fill in remaining single-stranded gaps and to seal the nicks thus formed, wherein the region of sequence identity comprises at least 20 non-palindromic nucleotides (nt).

The DNA molecules of the library may have a size of any practical length. The lower size limit for a dsDNA to circularize is about 200 base pairs. Therefore, the total length of the joined fragments (including, in some cases, the length of the vector) is preferably at least about 200 bp. The DNAs can take the form of either a circle or a linear molecule. The library may include from two to a very large number of DNA molecules, which can be joined together. In general, at least about 10 fragments can be joined.

More particularly, the number of DNA molecules or cassettes that may be joined to produce an end product, in one or several assembly stages, may be at least or no greater than about 2, 3, 4, 6, 8, 10, 15, 20, 25, 50, 100, 200, 500, 1000, 5000, or 10,000 DNA molecules, for example in the range of about 4 to about 100 molecules. The DNA molecules or cassettes in a library of the invention may each have a starting size in a range of at least or no greater than about 80 bs, 100 bs, 500 bs, 1 kb, 3 kb, 5 kb, 6 kb, 10 kb, 18 kb, 20 kb, 25 kb, 32 kb, 50 kb, 65 kb, 75 kb, 150 kb, 300 kb, 500 kb, 600 kb, or larger, for example in the range of about 3 kb to about 100 kb. According to the invention, methods may be used for assembly of about 100 cassettes of about 6 kb each into a DNA molecule of about 600 kb.

One embodiment of the invention is to join cassettes, such as 5-6 kb DNA molecules representing adjacent regions of a gene or genome included in a gene set of the invention, to create combinatorial assemblies. For example, it may be of interest to modify a bacterial genome, such as a putative minimal genome or a minimal genome, so that one or more of the genes is eliminated or mutated, and/or one or more additional genes is added. Such modifications can be carried out by dividing the genome into suitable cassettes, e.g. of about 5-6 kb, and assembling a modified genome by substituting a cassette containing the desired modification for the original cassette. Furthermore, if it is desirable to introduce a variety of changes simultaneously (e.g. a variety of modifications of a gene of interest, the addition of a variety of alternative genes, the elimination of one or more genes, etc.), one can assemble a large number of genomes simultaneously, using a variety of cassettes corresponding to the various modifications, in combinatorial assemblies. After the large number of modified sequences is assembled, preferably in a high throughput manner, the properties of each of the modified genomes can be tested to determine which modifications confer desirable properties on the genome (or an organism comprising the genome). This "mix and match" procedure produces a variety of test genomes or organisms whose properties can be compared. The entire procedure can be repeated as desired in a recursive fashion.
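The "mix and match" idea can be illustrated as a bookkeeping exercise: if each cassette position has one or more alternative versions, the set of candidate genomes is the Cartesian product of those alternatives. The following is a minimal sketch under that assumption; the cassette labels and function names are hypothetical, and the length estimate ignores any overlap between adjacent cassettes.

```python
from itertools import product


def combinatorial_assemblies(cassette_variants):
    """Yield every design made by picking one variant per cassette position.

    cassette_variants: list, ordered by genome position, of lists of
    variant labels available at that position.
    """
    for choice in product(*cassette_variants):
        yield choice


def assembled_length_kb(n_cassettes, cassette_kb):
    """Approximate assembly size, ignoring overlaps between cassettes."""
    return n_cassettes * cassette_kb


# Hypothetical labels: three positions with 1, 2 and 3 alternative cassettes.
variants = [
    ["c1_wt"],
    ["c2_wt", "c2_geneX_deleted"],
    ["c3_wt", "c3_mutA", "c3_mutB"],
]
designs = list(combinatorial_assemblies(variants))
print(len(designs))                 # 6 candidate genomes (1 * 2 * 3)
print(assembled_length_kb(100, 6))  # 600, matching the ~600 kb example above
```

The number of designs grows multiplicatively with the number of variants per position, which is why the text suggests high-throughput assembly and testing.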

Methods of cloning, as well as many of the other molecular biological methods used in conjunction with the present invention, are discussed, e.g., in Sambrook et al. (1989), Molecular Cloning, a Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (1995), Current Protocols in Molecular Biology, N.Y., John Wiley & Sons; Davis et al. (1986), Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York; Hames et al. (1985), Nucleic Acid Hybridization, IRL Press; Dracopoli et al., Current Protocols in Human Genetics, John Wiley & Sons, Inc.; and Coligan et al., Current Protocols in Protein Science, John Wiley & Sons, Inc.

Another aspect of the invention is a set of genes or polynucleotides of the invention which are in a free-living organism. The organism may be in a dormant or resting state (e.g., lyophilized, stored in a suitable solution, such as glycerol, or stored in culture medium), or it may be growing and/or replicating, for example in a rich culture medium, such as SP4.

Another aspect of the invention is a set of polypeptides encoded by a set of genes or polynucleotides of the invention. The polypeptides may be, e.g., in a free-living organism.

Another aspect of the invention is a set of genes or polynucleotides of the invention that are recorded on computer readable media. As used herein, "computer readable media" refers to any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. The skilled artisan will readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon a polynucleotide or amino acid sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. The skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide or amino acid sequence information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a set of nucleotide or amino acid sequences of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. The skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
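As an illustration of the two storage styles mentioned above (a plain ASCII text file and a database), the sketch below formats sequences as FASTA-style text and stores the same records in a relational table. It uses Python's built-in sqlite3 module purely as a stand-in for the database applications named in the text; the table layout, function names, locus labels and sequences are all hypothetical placeholders.

```python
import sqlite3


def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA-style ASCII text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def store_in_database(records, path=":memory:"):
    """Store (identifier, sequence) pairs in a SQLite table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(locus TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sequences VALUES (?, ?)", records)
    conn.commit()
    return conn


# Placeholder records, not real M. genitalium sequences.
records = [("example_locus_1", "ATGAAACTG"), ("example_locus_2", "ATGGGTTAA")]
fasta_text = to_fasta(records)
conn = store_in_database(records)
```

Which structure is preferable depends, as the text notes, on how the stored information will be accessed: a flat file suits sequential reading by search tools, while a table suits lookups by locus.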

By providing a set of nucleotide or amino acid sequences of the invention in computer readable form, the skilled artisan can routinely access the sequence information for a variety of purposes. For example, one skilled in the art can use the nucleotide or amino acid sequences of the invention in computer readable form to compare the sequences with orthologous sequences that can be substituted for the present sequences in an alternative version of the minimal genome. Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium for analysis and comparison to other sequences. A variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting searches is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI).

For example, software which implements the BLAST (Altschul et al. (1990) J. Mol. Biol. 215:403-410) and BLAZE (Brutlag et al. (1993) Comp. Chem. 17:203-207) search algorithms on a Sybase system can be used to identify open reading frames (ORFs) of the sequences of the invention which contain homology to ORFs or proteins from other libraries. Such ORFs are protein-encoding fragments and are useful in producing commercially important proteins such as enzymes used in various reactions and in the production of commercially useful metabolites.
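Before any homology search, candidate ORFs have to be located in the nucleotide sequence. The sketch below is a naive forward-strand ORF scan, not an implementation of BLAST or BLAZE, and its function name and parameters are illustrative. Stop codons are a parameter because M. genitalium reads TGA as tryptophan rather than as a stop, so the default here uses only TAA and TAG.

```python
def find_orfs(seq, stop_codons=("TAA", "TAG"), min_codons=3):
    """Return (start, end) half-open index pairs of forward-strand ORFs.

    An ORF here runs from the first in-frame ATG to the next in-frame
    stop codon (inclusive) and must span at least min_codons codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stop_codons:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy = "CCATGAAATGATAAGG"  # made-up sequence for illustration
print(find_orfs(toy))                                     # [(2, 14)]
print(find_orfs(toy, stop_codons=("TAA", "TAG", "TGA")))  # [(2, 11)]
```

The two calls show how the choice of genetic code changes the predicted ORF: with TGA treated as a sense codon the reading frame extends to the downstream TAA.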

In the foregoing and in the following example, all temperatures are set forth uncorrected in degrees Celsius; and, unless otherwise indicated, all parts and percentages are by weight.


I. Materials and Methods

A. Cells and plasmids. We obtained wild type M. genitalium G37 (ATCC® Number: 33530™) from the American Type Culture Collection (Manassas, VA). As part of this project we re-sequenced and re-annotated the genome of this bacterium. The new M. genitalium G37 sequence (GenBank accession number CP000122) differed from the previous M. genitalium genome sequence (13) at 34 sites. Several genes previously listed as having frameshifts were merged, including MG016, MG017, and MG018 (DEAD helicase) and MG419 and MG420 (DNA polymerase III gamma/tau subunit). Our transposon mutagenesis vector was the plasmid pIVT-1, which contains the Tn4001 transposon with a tetracycline resistance gene (tetM) (15), and was a gift from Dr. Kevin Dybvig at the University of Alabama at Birmingham.

B. Transformation of M. genitalium with Tn4001 by electroporation. Confluent flasks of M. genitalium cells were harvested by scraping into electroporation buffer (EB) comprised of 8 mM HEPES + 272 mM sucrose at pH 7.4. We washed and then resuspended the cells in a total volume of 200-300 μl EB. On ice, 100 μl cells were mixed with 30 μg pIVT-1 plasmid DNA and transferred to a 2 mm chilled electroporation cuvette (BioRad, Hercules, CA). We electroporated using 2500 V, 25 μF, and 100 Ω. After electroporation we resuspended the cells in 1 ml of 37°C SP4 medium and allowed the cells to recover for 2 hours at 37°C with 5% CO2. Aliquots of 200 μl of cells were spread onto SP4 agar plates containing 2 mg/L tetracycline hydrochloride (VWR, Bridgeport, NJ). The plates were incubated for 3-4 weeks at 37°C with 5% CO2 until colonies were visible. When colonies were 3-4 weeks old, we transferred individual M. genitalium colonies into SP4 medium + 7 mg/L tetracycline in 96 well plates. We incubated the plates at 37°C with 5% CO2 until the SP4 in most of the wells began to turn acidic and became yellow or orange (~4 days). We froze those mutant stock cells at -80°C.

C. Amplification of isolated colonies for DNA extraction. We inoculated 4 ml SP4 containing 7 μg/ml tetracycline in 6 well plates with 20 μl transposon mutant stock cells and incubated the plates at 37°C with 5% CO2 until the cells reached 100% confluence. To extract genomic DNA from confluent cells, we scraped the cells and then transferred the cell suspension to a tube for pelleting by centrifugation. Thus any non-adherent cells were not lost. We washed the cells in PBS (Mediatech, Herndon, VA) and then resuspended them in a mixture of 100 μl PBS and 100 μl of the chaotropic MTL buffer from a Qiagen MagAttract DNA Mini M48 Kit (Qiagen, Valencia, CA). Tubes were stored at -20°C until the genomic DNA could be extracted using a Qiagen BioRobot M48 workstation (Qiagen).

D. Location of Tn4001tet insertion sites by DNA sequencing from M. genitalium genomic templates. Our 20 μl sequencing reactions contained ~0.5 μg of genomic DNA and 6.4 pmol of the 30 base oligonucleotide GTACTCAATGAATTAGGTGGAAGACCGAGG (SEQ ID NO:1) (Integrated DNA Technologies, Coralville, IA). The primer binds in the tetM gene 103 base pairs from one of the transposon/genome junctions. Using BLAST we located the insertion site on the M. genitalium genome.

E. Quantitative PCR to determine colony homogeneity and gene duplication. We designed quantitative PCR primers (Integrated DNA Technologies) flanking transposon insertion sites using the default conditions for the primer design software Primer Express 1.5 (Applied Biosystems). Using quantitative PCR done on an Applied Biosystems 7700 Sequence Detection System, we determined the amounts of the target genes lacking a Tn4001 insertion in genomic DNA prepared from mutant colonies relative to the amounts of those genes in wild type M. genitalium. Reactions were done in Eurogentec qPCR Mastermix Plus SYBR Green (San Diego, CA). Genomic DNA concentrations were normalized after determining their relative amounts using a TaqMan quantitative PCR specific for the 16S rRNA gene that was done in Eurogentec qPCR Mastermix Plus. We calculated the amounts of target genes lacking the transposon in mutant genomic DNA preparations relative to the amounts in wild type using the delta-delta Ct method (16).
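The delta-delta Ct calculation referred to above can be sketched as follows. This is a minimal illustration that assumes perfect doubling of product per PCR cycle and uses the 16S rRNA assay as the reference; the function name and the Ct values in the example are hypothetical and are not data from this study.

```python
def relative_amount(ct_target_sample, ct_ref_sample, ct_target_wt, ct_ref_wt):
    """Amount of uninterrupted target in a mutant prep relative to wild type.

    Implements 2 ** -(ddCt), where
    ddCt = (Ct_target - Ct_ref)_sample - (Ct_target - Ct_ref)_wild_type.
    Assumes 100% amplification efficiency (product doubles each cycle).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_wt = ct_target_wt - ct_ref_wt
    return 2.0 ** -(delta_sample - delta_wt)


# Hypothetical Ct values: the wild type copy of the target gene appears
# 7 cycles later in the mutant prep than in wild type DNA.
fraction = relative_amount(ct_target_sample=27.0, ct_ref_sample=15.0,
                           ct_target_wt=20.0, ct_ref_wt=15.0)
print(fraction)  # 0.0078125, i.e. under 1% wild type sequence
```

A seven-cycle shift corresponds to a 128-fold reduction, which is the kind of result that would place a subcolony below a 1% wild type threshold.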

II. Identification of a Minimal Gene Set

We sequenced across the transposon-genome junctions of our mutants using a primer specific for Tn4001tet. Presence of a transposon in the central region of a gene of a viable bacterium indicated that the gene was disrupted and therefore non-essential (dispensable). We considered transposon insertions disruptive only if they were after the first three codons and before the 3'-most 20% of the coding sequence of a gene. Thus, non-disruptive mutations resulting from transposon-mediated duplication of short sequences at the insertion site (18, 19), and potentially inconsequential COOH-terminal insertions, do not result in erroneous determination of gene expendability. Without wishing to be bound by any particular theory, it is suggested that these disruptions actually occurred, even though, theoretically, some genes might tolerate transposon insertions, and we did not confirm the absence of the gene products. To exclude the possibility that gene disruptions were the result of a transposon insertion in one copy of a duplicated gene, we used PCR to detect genes lacking the insertion. This showed us that almost all of our colonies contained both disrupted and wild type versions of the genes identified as having the Tn4001. Further analysis using quantitative PCR showed most colonies were mixtures of two or more mutants; thus we operationally refer to them, and any DNA isolated from them, as colonies rather than clones. This cell clumping led us to isolate individual mutants using filter cloning. To do this we forced cells through 0.22 μm filters before plating to break up clumps of cells possibly containing multiple different mutants. We used these cells to produce subcolonies, which we both sequenced and analyzed using quantitative PCR. For each disrupted gene we subcloned at least one primary colony.
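The disruption criterion stated above (after the first three codons, before the 3'-most 20% of the coding sequence) can be expressed as a small predicate. This is a sketch under assumed conventions: coordinates are 1-based and inclusive, and the exact treatment of the boundary bases is a choice made here, not something the text specifies. The function name is illustrative.

```python
def is_disruptive(cds_start, cds_end, strand, insertion_pos):
    """Apply the 'first three codons / last 20%' rule to one insertion.

    cds_start, cds_end: 1-based inclusive genome coordinates (start < end).
    strand: '+' or '-' (orientation of the coding sequence).
    insertion_pos: 1-based genome coordinate of the insertion.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not cds_start <= insertion_pos <= cds_end:
        return False  # insertion lies outside this coding sequence
    length = cds_end - cds_start + 1
    # Distance in bases from the 5' end of the coding sequence (0-based).
    if strand == "+":
        offset = insertion_pos - cds_start
    else:
        offset = cds_end - insertion_pos
    after_first_three_codons = offset >= 9
    before_last_20_percent = offset < 0.8 * length
    return after_first_three_codons and before_last_20_percent


# A hypothetical 1000 bp gene occupying genome positions 1 to 1000.
print(is_disruptive(1, 1000, "+", 500))  # True: mid-gene insertion
print(is_disruptive(1, 1000, "+", 5))    # False: within the first three codons
print(is_disruptive(1, 1000, "+", 900))  # False: within the 3'-most 20%
print(is_disruptive(1, 1000, "-", 900))  # True: near the 5' end on the minus strand
```

Handling the strand explicitly matters because the 3'-most 20% of a minus-strand gene lies at the low-coordinate end of the gene.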

In total we analyzed 3,152 M. genitalium transposon insertion mutant primary colonies and subcolonies to determine the locations of Tn4001tet inserts. For 75% of these we generated sequence data that enabled us to map the transposon insertion sites. Colonies containing multiple Tn4001tet insertions cannot be characterized using this approach. Only 62% of primary colonies generated useful sequence. This was likely because of the tendency of mycoplasma cells to form persistent cell aggregates, leading to colonies containing mixtures of multiple mutants that proved refractory to sequencing. For subcolonies the success rate was 82%. Of the successfully sequenced subcolonies, in 59% the transposon insert was at a different site than in the parental primary colony. The rate at which we identified mutants with previously unhit insertion sites on the genome was higher for the primary colonies than the subcolonies. However, the rate of accumulation of new insertion sites dropped after our first 600 colonies, indicating we were approaching saturation mutagenesis of all non-lethal insertion sites (Fig. 1).
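The saturation argument rests on an accumulation curve: the count of distinct insertion sites seen so far, plotted against the number of colonies analyzed. The sketch below shows how such a curve could be computed from an ordered list of mapped sites; the function name and the coordinates in the example are illustrative, not the study's data.

```python
def accumulation_curve(insertion_sites):
    """Cumulative number of distinct insertion sites, in order of analysis.

    insertion_sites: genome coordinates of mapped insertions, one per
    colony, listed in the order the colonies were analyzed.
    """
    seen = set()
    curve = []
    for site in insertion_sites:
        seen.add(site)
        curve.append(len(seen))
    return curve


# Illustrative coordinates only: repeated hits at two sites, then a new one.
sites = [517751, 520114, 517751, 517751, 520114, 88000]
print(accumulation_curve(sites))  # [1, 2, 2, 2, 2, 3]
```

A curve that flattens, as in the middle of this toy example, means additional colonies are mostly re-hitting known sites, which is the behavior the text interprets as approaching saturation.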

We mapped a total of 2293 different transposon insertion sites on the genome (Fig. 2). Eighty-seven percent of the mutations were in protein-coding genes. None of the 43 RNA-encoding genes (for rRNA, tRNA, or structural RNA) contained insertions. To address the question of which M. genitalium genes were not essential for growth in SP4 (17), a rich laboratory medium, we used the following criteria to designate a gene disruption. We considered transposon insertions disruptive if they were after the first three codons and before the 3'-most 20% of the coding sequence of a gene. Thus, non-disruptive mutations resulting from transposon-mediated duplication of short sequences at the insertion site (18, 19), and potentially inconsequential COOH-terminal insertions, do not result in erroneous determination of gene expendability. Using these criteria we identified a total of 101 dispensable M. genitalium genes (Table 2). Fig. 1 shows that the number of newly disrupted genes, as a function of primary colonies and subcolonies analyzed, plateaus, suggesting that we have disrupted, or very nearly disrupted, all non-essential genes. Transposon mutants in non-essential genes were able to form colonies on solid agar, and isolated colonies were able to grow in liquid culture, both under tetracycline selection.

We wanted to determine if any of our disrupted genes were in cells bearing two copies of the gene. Unexpectedly, PCRs using primers flanking the transposon insertion sites produced amplicons of the size expected for wild type templates from all 5 colonies initially tested. End-stage analysis of PCRs could not tell us if the wild type sequences we amplified were the result of a low level of transposon jumping out of the target gene, or if there was a gene duplication. To address this, for at least one colony or subcolony for each disrupted gene we used quantitative PCR to measure how many copies of contaminating wild type versions of that gene there were in the sequenced DNA.


Analysis of the quantitative PCR results showed most colonies were mixtures of multiple mutants. This was likely a consequence of our high transformation efficiency and the tendency of mycoplasma cells to aggregate. The direct genomic sequencing identified only the plurality member of the population. To address this issue we adapted our mutant isolation protocol to include one or two rounds of filter cloning. Existing colonies of interest were filter subcloned. We isolated 10 subcolonies and the sites of their Tn4001 insertions were determined. We took both rapidly growing colonies and M. genitalium colonies that were delayed in their appearance. Often only a minority of the subcolonies had inserts in the same location as found with the parental colony. After filter cloning we still found that almost every subcolony had some low level of a wild type copy of the disrupted gene. This is likely the result of Tn4001 jumping (20). After subcloning we were able to isolate gene disruption mutant colonies for 100 of our 101 different disrupted M. genitalium genes that had less than 1% wild type sequence.

Several mutants manifested remarkable phenotypes. While many of the mutants grew slowly, mutants in lactate/malate dehydrogenase (MG460) and in conserved hypothetical proteins MG414 and MG415 had doubling times up to 20% faster than wild type M. genitalium (data not shown). Cells with transposon insertions in the transketolase gene (MG066), which encodes a membrane protein and pentose phosphate pathway enzyme, grew in chains of clumped cells rather than in the monolayers characteristic of wild type M. genitalium. Other mutant cells grew in suspension rather than adhering to plastic. Some cells would lyse when washed with PBS, and thus had to be processed in either SP4 medium or 100% serum.

We isolated mutants with transposon insertions at some sites much more frequently than others (Fig. 3). Colonies with mutations at hot spots in four genes, MG339 (recA), the fast-growing MG414 and MG415, and MG428 (putative regulatory protein), comprised 31% of the total mutant pool. There was a striking difference in the most frequently found transposon insertion sites among primary colonies relative to the subcolonies having different insertion sites than their parental colonies (Fig. 3). We isolated 169 colonies and subcolonies having different insertion sites than their parental colonies with Tn4001tet inserted at base pair 517,751, which is in MG414. Only 5 (3%) of those were primary colonies. Conversely, we isolated 209 colonies with inserts in the 520,114 to 520,123 region, which is in MG415, and 56% of those were primary colonies. The MG414 mutants were probably due both to rapid growth and to Tn4001 preferential jumping to that genome region, whereas the high frequency and near equal distribution of MG415 primary and subcolony transposon insertions may only be because those mutants grow more rapidly than others.

III. Verification (or modification) of the minimal gene set

As noted above, at least 386 protein-coding genes and all of the RNA genes are essential and could form a minimal set. However, it seems unlikely that all of those "one-at-a-time" dispensable genes could be eliminated simultaneously. To determine a subset that can be simultaneously deleted, a wild type chromosome is constructed synthetically. The synthetic genome is constructed hierarchically from chemically synthesized oligonucleotides. Subsets of the dispensable genes are then removed. The synthetic natural chromosome and the reduced genome are tested for viability by transplantation into cells from which the resident chromosome has been removed. Rapid advances in gene synthesis technology and efforts at developing genome transplantation methods allow the confirmation that the M. genitalium essential gene set described above is a true minimal gene set, or provide a basis to modify that gene set.

References

1. Ferber, D. (2004) Science 303, 158-61.

2. Mushegian, A. R. & Koonin, E. V. (1996) Proc Natl Acad Sci USA 93, 10268-73.

3. Gil, R., Silva, F. J., Pereto, J. & Moya, A. (2004) Microbiol Mol Biol Rev 68, 518-37, table of contents.

4. Hutchison, C. A., Peterson, S. N., Gill, S. R., Cline, R. T., White, O., Fraser, C. M., Smith, H. O. & Venter, J. C. (1999) Science 286, 2165-9.

5. Forsyth, R. A., Haselbeck, R. J., Ohlsen, K. L., Yamamoto, R. T., Xu, H., Trawick, J. D., Wall, D., Wang, L., Brown-Driver, V., Froelich, J. M. & et al. (2002) Mol Microbiol 43, 1387-400.

6. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P. & et al. (2003) Proc Natl Acad Sci USA 100, 4678-83.

7. Salama, N. R., Shepherd, B. & Falkow, S. (2004) J Bacteriol 186, 7926-35.

8. Herring, C. D., Glasner, J. D. & Blattner, F. R. (2003) Gene 311, 153-63.

9. Mori, H., Isono, K., Horiuchi, T. & Miki, T. (2000) Res Microbiol 151, 121-8.

10. Ji, Y., Zhang, B., Van Horn, S. F., Warren, P., Woodnutt, G., Burnham, M. K. & Rosenberg, M. (2001) Science 293, 2266-9.

11. Reich, K. A., Chovan, L. & Hessler, P. (1999) J Bacteriol 181, 4961-8.

12. Sassetti, C. M., Boyd, D. H. & Rubin, E. J. (2001) Proc Natl Acad Sci USA 98, 12712-7.

13. Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G., Kelley, J. M. & et al. (1995) Science 270, 397-403.

15. Dybvig, K., French, C. T. & Voelker, L. L. (2000) J Bacteriol 182, 4343-7.

15a. Pour-El, I., Adams, C. & Minion, F. C. (2002) Plasmid 47, 129-37.

16. Relative Quantitation of Gene Expression (1997) The Perkin-Elmer Corporation, Foster City, CA.

17. Tully, J. G., Rose, D. L., Whitcomb, R. F. & Wenzel, R. P. (1979) J Infect Dis 139, 478-82.

18. Dyke, K. G., Aubert, S. & el Solh, N. (1992) Plasmid 28, 235-46.

19. Rice, L. B., Carias, L. L. & Marshall, S. H. (1995) Antimicrob Agents Chemother 39, 1147-53.

20. Mahairas, G. G., Lyon, B. R., Skurray, R. A. & Pattee, P. A. (1989) J Bacteriol 171, 3968-72.

21. Cordwell, S. J., Basseal, D. J., Pollack, J. D. & Humphery-Smith, I. (1997) Gene 195, 113-20.

22. Pollack, J. D., Myers, M. A., Dandekar, T. & Herrmann, R. (2002) Omics 6, 247-58.

23. Dhandayuthapani, S., Rasmussen, W. G. & Baseman, J. B. (1999) Proc Natl Acad Sci USA 96, 5227-32.

24. Reich, K. A. (2000) Res Microbiol 151, 319-24.


We used a common definition for members of paralogous gene families, requiring that they have 30% identity over 60% of the length of the longer protein sequence (a single linkage clustering then defines the families).
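The paralog definition above combines a pairwise threshold with single linkage clustering. The sketch below shows one way to implement that combination with a union-find structure. It assumes pairwise identities and aligned lengths have already been computed by some alignment tool, reads the thresholds as "at least" 30% and 60%, and uses hypothetical gene names and values.

```python
def is_paralog_pair(identity, aligned_len, len_a, len_b,
                    min_identity=0.30, min_coverage=0.60):
    """True if a hit meets the identity and longer-protein coverage thresholds."""
    longer = max(len_a, len_b)
    return identity >= min_identity and aligned_len >= min_coverage * longer


def single_linkage_families(lengths, hits):
    """Group genes into families by single linkage over qualifying hits.

    lengths: dict mapping gene name to protein length (residues).
    hits: iterable of (gene_a, gene_b, identity, aligned_len) tuples.
    Returns a sorted list of families (each a sorted list, size >= 2).
    """
    parent = {gene: gene for gene in lengths}

    def find(gene):
        # Walk to the root, compressing the path along the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, identity, aligned_len in hits:
        if is_paralog_pair(identity, aligned_len, lengths[a], lengths[b]):
            parent[find(a)] = find(b)

    families = {}
    for gene in lengths:
        families.setdefault(find(gene), []).append(gene)
    return sorted(sorted(f) for f in families.values() if len(f) > 1)


# Hypothetical proteins and hits, for illustration only.
lengths = {"geneA": 300, "geneB": 320, "geneC": 310, "geneD": 500}
hits = [
    ("geneA", "geneB", 0.45, 250),  # passes both thresholds
    ("geneB", "geneC", 0.35, 200),  # passes both thresholds
    ("geneC", "geneD", 0.50, 120),  # fails coverage (120 < 0.6 * 500)
]
print(single_linkage_families(lengths, hits))  # [['geneA', 'geneB', 'geneC']]
```

Under single linkage, geneA and geneC fall in the same family even though they were never compared directly, because each is linked to geneB.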

Table 2. Mycoplasma genitalium genes with Tn4001tet insertions that are disrupted. Genes are grouped by functional roles.

All information is based on the M. genitalium genome sequence and annotation reported herein. Genes are grouped by main biological roles. The columns are as follows:

M. genitalium gene locus

Gene symbol

Gene common name

A. Orthologous genes essential in Bacillus subtilis (1).

B. In theoretical minimal 256 gene set defined by Mushegian and Koonin as orthologous genes present in M. genitalium and H. influenzae (2).

C. In theoretical 206 gene core of a minimal genome set defined by Gil et al. (3).


1. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., et al. (2003) Proc Natl Acad Sci USA 100, 4678-83.

2. Mushegian, A. R. & Koonin, E. V. (1996) Proc Natl Acad Sci USA 93, 10268-73.

3. Gil, R., Silva, F. J., Pereto, J. & Moya, A. (2004) Microbiol Mol Biol Rev 68, 518-37, table of contents.

Table 3. Mycoplasma genitalium protein coding genes that were not disrupted in this study. Genes are grouped by functional roles.

All information is based on the M. genitalium genome sequence and annotation reported herein. Genes are grouped by main biological roles. The columns for the protein coding genes are as follows:

M. genitalium gene locus

Gene symbol

Gene common name

A. Orthologous genes essential in Bacillus subtilis (1).

B. In theoretical minimal 256 gene set defined by Mushegian and Koonin as orthologous genes present in M. genitalium and H. influenzae (2).

C. In theoretical 206 gene core of a minimal genome set defined by Gil et al. (3).

References

1. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., et al. (2003) Proc Natl Acad Sci USA 100, 4678-83.

2. Mushegian, A. R. & Koonin, E. V. (1996) Proc Natl Acad Sci USA 93, 10268-73.

3. Gil, R., Silva, F. J., Pereto, J. & Moya, A. (2004) Microbiol Mol Biol Rev 68, 518-37, table of contents.

Table 4. Mycoplasma genitalium genes with Tn4001tet insertions that were not reported as being disrupted (dispensable) in the 1999 study by Hutchison et al., but which have been shown to be dispensable in the present study. Genes are grouped by functional roles.


Locus Symbol Common Name A B C

Cell envelope

MG147 membrane protein, putative (disrupted 7/06 using different Tn4001 system)

DNA metabolism

MG214 segregation and condensation protein B x

MG262.1 mutM formamidopyrimidine-DNA glycosylase x

MG298 smc chromosome segregation protein SMC x x

MG315 DNA polymerase III, delta subunit, putative x x

MG358 ruvA Holliday junction DNA helicase x

MG359 ruvB Holliday junction DNA helicase RuvB x

Energy metabolism

MG063 fruK 1-phosphofructokinase, putative x x

MG066 tkt transketolase x x x

MG112 rpe ribulose-phosphate 3-epimerase x x

MG271 lpdA dihydrolipoamide dehydrogenase x

MG398 atpC ATP synthase F1, epsilon subunit x x

MG460 ldh L-lactate dehydrogenase/malate dehydrogenase x x

Fatty acid and phospholipid metabolism

MG437 cdsA phosphatidate cytidylyltransferase x x

Hypothetical proteins

MG134 conserved hypothetical protein

MG149.1 conserved hypothetical protein

MG220 conserved hypothetical protein

MG248 conserved hypothetical protein

MG397 conserved hypothetical protein

MG456 conserved hypothetical protein

Protein fate

MG210 signal peptidase II x

MG238 tig trigger factor x

Protein synthesis

MG012 alpha-L-glutamate ligases, RimK family, putative X

MG463 dimethyladenosine transferase X


MG367 rnc ribonuclease III

Transport and binding proteins

MG061 Mycoplasma MFS transporter x

MG121 ABC transporter, permease protein x

MG289 phosphonate ABC transporter, substrate binding protein (P37), putative

MG290 phosphonate ABC transporter, ATP-binding protein, putative

Unknown function

MG056 tetrapyrrole (corrin/porphyrin) methylase protein x x

MG115 competence/damage-inducible protein CinA domain protein

MG138 lepA GTP-binding protein LepA x x

MG360 ImpB/MucB/SamB family protein x

MG454 OsmC-like protein

All information is based on the new M. genitalium genome sequence and annotation reported here. Genes are grouped by main biological roles. The columns are as follows:

M. genitalium gene locus

Gene symbol

Gene common name

A. Orthologous genes essential in Bacillus subtilis (1).

B. In theoretical minimal 256 gene set defined by Mushegian and Koonin as orthologous genes present in M. genitalium and H. influenzae (2).

C. In theoretical 206 gene core of a minimal genome set defined by Gil et al. (3).

References

1. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., et al. (2003) Proc Natl Acad Sci USA 100, 4678-83.

2. Mushegian, A. R. & Koonin, E. V. (1996) Proc Natl Acad Sci USA 93, 10268-73.

3. Gil, R., Silva, F. J., Pereto, J. & Moya, A. (2004) Microbiol Mol Biol Rev 68, 518-37, table of contents.

Table 5. Mycoplasma genitalium genes with Tn4001tet insertions that were not reported as being required in the 1999 study by Hutchison et al., but which are shown to be required in the present study. Genes are grouped by functional roles.


Locus Symbol Common Name B D

Biosynthesis of cofactors, prosthetic groups, and carriers

MG394 glyA serine hydroxymethyltransferase

Cell envelope

MG068 lipoprotein, putative P X

MG218 hmw2 HMW2 cytadherence accessory protein P

MG306 membrane protein, putative P

MG307 lipoprotein, putative P

MG320 membrane protein, putative P

MG443 membrane protein, putative P

MG025 glycosyl transferase, group 2 family protein X X

MG191 mgpA MgPa adhesin X X

MG192 p110 P110 protein X X

MG317 hmw3 HMW3 cytadherence accessory protein X X

MG338 lipoprotein, putative X

MG395 lipoprotein, putative X

MG440 lipoprotein, putative X

Cellular processes

MG278 relA GTP pyrophosphokinase P X

MG335 GTP-binding protein engB, putative X X

DNA metabolism

MG261 polC-2 DNA polymerase III, alpha subunit P X X

MG469 chromosomal replication initiator protein DnaA P X X

MG186 Staphylococcal nuclease homologue, putative X

MG421 uvrA excinuclease ABC, A subunit X X

Energy metabolism

MG118 galE UDP-glucose 4-epimerase P X

MG299 pta phosphate acetyltransferase P X

Hypothetical proteins

MG074 conserved hypothetical protein P

MG241 conserved hypothetical protein P

MG389 conserved hypothetical protein P

MG141.1 conserved hypothetical protein X

MG202 conserved hypothetical protein X

MG296 conserved hypothetical protein X

MG323.1 conserved hypothetical protein X

MG366 conserved hypothetical protein X

MG423 conserved hypothetical protein X X

MG442 GTP-binding conserved hypothetical protein X

Protein fate

MG055 preprotein translocase, SecE subunit P X X

MG208 glycoprotease family protein P

MG270 lipoyltransferase/lipoate-protein ligase, putative P X

MG392 groL chaperonin GroEL P X X X

Protein synthesis

MG059 smpB SsrA-binding protein P X X

MG455 tyrS tyrosyl-tRNA synthetase P X X X

MG182 tRNA pseudouridine synthase A X X

MG209 pseudouridine synthase, RluA family X X

MG295 trmU tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase

MG345 ileS isoleucyl-tRNA synthetase X X X

MG372 thiamine biosynthesis/tRNA modification protein ThiI X

MG426 rpmB ribosomal protein L28 X X X

Purines, pyrimidines, nucleosides, and nucleotides

MG231 nrdE ribonucleoside-diphosphate reductase, alpha chain P X X

MG049 deoD purine nucleoside phosphorylase X X

MG052 cytidine deaminase X X


Transcription

MG249 rpoD RNA polymerase sigma factor RpoD P X

Transport and binding proteins

MG045 ABC transporter, spermidine/putrescine binding protein, putative P X

MG014 ABC transporter, ATP-binding/permease protein X X

MG085 hprK HPr(Ser) kinase/phosphatase X

MG467 ABC transporter, ATP-binding protein X X

MG468 ABC transporter, permease protein X

Unknown function

MG137 UDP-galactopyranose mutase P

MG236 expressed protein of unknown function P

MG263 Cof-like hydrolase P

MG029 DJ-1/PfpI family protein X

MG130 uncharacterized domain HDIG X

MG132 HIT domain protein X

MG308 ATP-dependent RNA helicase, DEAD/DEAH box family X X

MG310 hydrolase, alpha/beta fold family X

MG327 hydrolase, alpha/beta fold family X

MG470 CobQ/CobB/MinD/ParA nucleotide binding domain X X

All information is based on the M. genitalium genome sequence and annotation reported herein. Genes are grouped by main biological roles. The columns for these protein coding genes are as follows:

M. genitalium gene locus

Gene symbol

Gene common name

A. M. genitalium genes disrupted in the 1999 study are noted with an "X". Genes assumed to be non-essential because only the M. pneumoniae ortholog of the M. genitalium gene was disrupted are noted with a "P".

B. Orthologous genes essential in Bacillus subtilis (1).

C. In theoretical minimal 256 gene set defined by Mushegian and Koonin as orthologous genes present in M. genitalium and H. influenzae (2).

D. In theoretical 206 gene core of a minimal genome set defined by Gil et al. (3).

References

1. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., et al. (2003) Proc Natl Acad Sci U S A 100, 4678-83.

2. Mushegian, A. R. & Koonin, E. V. (1996) Proc Natl Acad Sci U S A 93, 10268-73.

3. Gil, R., Silva, F. J., Pereto, J. & Moya, A. (2004) Microbiol Mol Biol Rev 68, 518-37, table of contents.

From the foregoing description, one skilled in the art can easily ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make changes and modifications of the invention to adapt it to various usage and conditions and to utilize the present invention to its fullest extent. The preceding specific embodiments are to be construed as merely illustrative, and not limiting of the scope of the invention in any way whatsoever. The entire disclosures of all applications, patents, and publications (including U.S. provisional application 60/725,295, filed October 12, 2005) cited above and in the figures are hereby incorporated in their entirety by reference.

Posted by gabyven at 09:50