The Samuel Roberts Noble Foundation, Inc.    
     
Center for Medicago Genomics Research: ESTs
 
 
     

Expressed Sequence Tags (ESTs)

The Medicago Genome Initiative (MGI) is an EST sequence database for the model legume M. truncatula. The database is publicly available and results from a collaborative research effort between the Noble Foundation and the National Center for Genome Resources to investigate the genome of M. truncatula. MGI was first reported in the Nucleic Acids Research 2001 Database Issue and featured a prototype database, interface, and analysis pipeline [7]. We have since developed an entirely new system that retains the advantages of the prototype, with improvements that make it more portable, modular, flexible, interactive, and reusable [8]. The data model is designed around the concept of an analysis operation (which may run a third-party sequence analysis tool) whose input and output consist of sets of sequences (zero, one, or many sequences). This permits analysis methods that use individual sequences (e.g. similarity search) or multiple sequences (e.g. EST clustering) to interact with the same generalized relational database structure. It also allows for the flexible addition of sequence analysis methods, and the storage and analysis of genomic DNA sequences in the same schema.
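The idea of an analysis operation that consumes and produces sets of sequences can be illustrated with a minimal Python sketch. This is not the MGI schema or code: the class names, the length filter (standing in for a per-sequence analysis such as similarity search), and the prefix grouping (standing in for EST clustering) are all hypothetical, chosen only to show that both kinds of operation can share one interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Sequence:
    name: str
    residues: str


# A set of sequences may hold zero, one, or many sequences.
SequenceSet = List[Sequence]


@dataclass
class AnalysisOperation:
    """Hypothetical operation: takes sets of sequences, returns sets of sequences."""
    name: str
    run: Callable[[List[SequenceSet]], List[SequenceSet]]


def min_length_filter(inputs: List[SequenceSet]) -> List[SequenceSet]:
    # Per-sequence operation: each sequence is judged on its own.
    return [[s for s in seqs if len(s.residues) >= 8] for seqs in inputs]


def group_by_prefix(inputs: List[SequenceSet]) -> List[SequenceSet]:
    # Multi-sequence operation: a toy stand-in for clustering that
    # groups sequences sharing the same first four residues.
    groups = {}
    for seqs in inputs:
        for s in seqs:
            groups.setdefault(s.residues[:4], []).append(s)
    return list(groups.values())


operations = [
    AnalysisOperation("filter", min_length_filter),
    AnalysisOperation("cluster", group_by_prefix),
]

sets = [[
    Sequence("est1", "ACGTACGTAA"),
    Sequence("est2", "ACGTTTTTGG"),
    Sequence("est3", "GGCC"),
    Sequence("est4", "TTGACCATGA"),
]]

# Because every operation has the same input and output shape,
# operations can be chained in any configured order.
for op in operations:
    sets = op.run(sets)

grouped_names = [[s.name for s in group] for group in sets]
print(grouped_names)
```

In this sketch the filter drops the short sequence and the grouping step returns one two-member set and one single-member set, which mirrors the distinction between clusters and singletons used later in the text.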

 

The analysis pipeline is run automatically upon receipt of new sequences and can be configured to perform any series of available operations. The current suite of operations includes: Import; Vector Screen; Quality Control; BLASTN search to identify non-mRNA contamination; clustering, multiple sequence alignment, and extraction of a consensus; BLASTX versus a protein database; and Blocks+ (protein motif) search. Annotation is automated by linking high-scoring BLAST and Blocks+ hits to their cognate entries in the Gene Ontology database (http://geneontology.org). Users view, query, and manipulate their data via a WWW browser through an interface running on a secure server. All analysis operations are performed on consensus sequences (gene sequences) resulting from the clustering and assembly operation, rather than on individual ESTs. MGI now incorporates all publicly available M. truncatula data from GenBank, combined with public Noble data, in clustering and analysis runs. Typically the data are refreshed, including a complete reanalysis with all available new data, four times per year. As of September 2001, MGI contained over 95,000 sequences, of which the 65,000 GenBank ESTs grouped into 8,843 clusters and 11,279 singletons, resulting in 20,122 total analyzed consensus sequences. Clusters ranged in membership from two ESTs (3,585 clusters) to 256 ESTs (one cluster). A publicly viewable version of MGI has been deployed (http://xgi.ncgr.org/mgi), which can be accessed by following the login instructions on the main page.
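The relationship between clusters, singletons, and analyzed consensus sequences can be checked with a short Python sketch. It assumes, as the figures above imply, that each cluster contributes one consensus sequence and each singleton is analyzed as its own sequence. The function name and the toy group sizes are hypothetical; only the September 2001 counts come from the text.

```python
from collections import Counter


def summarize_assembly(group_sizes):
    """Count clusters (two or more ESTs) and singletons (exactly one EST)."""
    clusters = sum(1 for n in group_sizes if n >= 2)
    singletons = sum(1 for n in group_sizes if n == 1)
    return {
        "clusters": clusters,
        "singletons": singletons,
        # One consensus per cluster plus one sequence per singleton.
        "consensus_sequences": clusters + singletons,
        "size_histogram": Counter(group_sizes),
    }


# Toy data (hypothetical): number of ESTs in each assembled group.
toy = summarize_assembly([1, 1, 2, 2, 2, 5, 256])
print(toy["clusters"], toy["singletons"], toy["consensus_sequences"])

# Counts reported in the text for September 2001.
reported_clusters = 8_843
reported_singletons = 11_279
reported_total = reported_clusters + reported_singletons
print(reported_total)
```

The reported cluster and singleton counts sum to the 20,122 consensus sequences quoted above.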

In addition to data from the Noble Foundation and the inclusion of all publicly available M. truncatula data from GenBank, the database and analysis system has been designed to present a gene-centric view of ESTs. The new interface improvements include keyword searches, query restriction by library and sequence type, a multiple sequence alignment viewer, and a features and annotation viewer. These additions, coupled with automated assignment of Gene Ontology annotations, have resulted in a vastly improved information resource for model legume research.

As of May 2003, the M. truncatula research community has generated more than 189,000 ESTs from 32 different cDNA libraries. These ESTs separate into more than 170,000 tentative consensus sequences and more than 19,000 singletons. More than 36,000 unique sequences have been identified. A breakdown of the types of genes identified, based on Gene Ontology assignments, can be found in Figure 2. In addition to MGI, the TIGR Medicago truncatula Gene Index (http://www.tigr.org/tdb/mtgi/) and the NSF-sponsored Medicago truncatula Consortium (http://www.medicago.org/) are two databases of particular interest to the legume research community.

 

 


Figure 2. Gene Ontology assignments of the M. truncatula expressed sequence tags.

An M. truncatula whole genome sequence program has begun at the University of Oklahoma (http://www.genome.ou.edu/medicago.html). The initial goal of the project was to generate approximately one-fold whole genome shotgun sequence coverage of the 500 megabase genome from a plasmid-based genomic library and obtain target shotgun clones for additional primer walking-based sequencing. However, preliminary results from the shotgun approach suggest that the M. truncatula genome is highly repetitive. As previously predicted, estimates are that approximately 80% of the genome is highly repetitive and that approximately 80% of the gene-rich regions represent only 20% of the total genome. To reduce the amount of redundant sequence, the strategy has now been modified to sequence bacterial artificial chromosome (BAC) clones from an M. truncatula BAC library. More than 1,000 BACs will be identified based on DNA markers or gene content and, in the first year, will be sequenced to working draft coverage (four- to five-fold) using a BAC-based shotgun sequencing approach.
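To give a sense of scale for the coverage figures above, the following Python sketch relates fold coverage to read count and to the expected fraction of a target covered. It rests on assumptions not stated in the text: an average read length of 500 bases (hypothetical) and the idealized model of uniformly random reads, under which the expected covered fraction at coverage c is 1 - e^(-c). A highly repetitive genome violates that model, so the numbers are rough guides only.

```python
import math

GENOME_SIZE_BP = 500_000_000   # 500 megabase genome, from the text
READ_LENGTH_BP = 500           # assumed average read length (hypothetical)


def reads_needed(genome_size_bp, fold_coverage, read_length_bp):
    """Reads required so that total bases sequenced = coverage x genome size."""
    return math.ceil(fold_coverage * genome_size_bp / read_length_bp)


def expected_fraction_covered(fold_coverage):
    """Idealized expectation for uniformly random reads: 1 - exp(-c)."""
    return 1.0 - math.exp(-fold_coverage)


one_fold_reads = reads_needed(GENOME_SIZE_BP, 1.0, READ_LENGTH_BP)
print(one_fold_reads)

# One-fold whole genome shotgun versus four- to five-fold BAC draft coverage.
for c in (1.0, 4.0, 5.0):
    print(c, round(expected_fraction_covered(c), 3))
```

Under these assumptions, one-fold coverage leaves a substantial part of the target unsampled, while four- to five-fold coverage of an individual BAC is expected to sample nearly all of it.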

The whole genome shotgun approach has already resulted in the sequencing of the M. truncatula chloroplast genome, since the total genomic DNA preparation contains not only the nuclear genome but also a significant amount of chloroplast DNA. The DNA sequence of the M. truncatula chloroplast genome has now been completed and consists of one contiguous 124,039 base pair circle. Artificially linearizing the sequence at the histidine tRNA prior to the psbA gene allows the Medicago chloroplast genomic sequence to be co-linear with the Arabidopsis, tobacco, and most other chloroplast genomes. The semi-automated annotation of the M. truncatula chloroplast genome using Web-Artemis has been completed and can be viewed at http://www.genome.ou.edu/medicago_chloroplast/med_chloro_art.html.
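Linearizing a circular sequence at a chosen feature amounts to rotating it so that the feature comes first. The Python sketch below shows the operation on a toy string; the sequence and the marker "GATA" are hypothetical placeholders and do not represent the chloroplast genome or the histidine tRNA. The marker is deliberately placed across the arbitrary end of the stored string to show why the search must treat the sequence as circular.

```python
def find_on_circle(circular_seq, feature):
    """Return the start index of feature on the circle, or -1 if absent."""
    # Append the first len(feature) - 1 bases so matches spanning the
    # arbitrary end of the stored string are still found.
    extended = circular_seq + circular_seq[:len(feature) - 1]
    return extended.find(feature)


def linearize_at(circular_seq, start):
    """Rotate the circle so the linear form begins at index start."""
    start %= len(circular_seq)
    return circular_seq[start:] + circular_seq[:start]


circle = "TAGGCCAATTGA"   # toy circular sequence (hypothetical)
marker = "GATA"           # toy marker spanning the end/start junction

start = find_on_circle(circle, marker)
linear = linearize_at(circle, start)
print(start, linear)
```

Rotation changes only the starting point: the length is preserved and the result is still a reading of the same circle, which is what allows genomes linearized at the same feature to be compared in a co-linear fashion.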


Medicago genome

 

ABI 3700 DNA Analyzer

 

Bio Mek 2000 Liquid Handling Robot

 
         
       
© 1997-2008 by The Samuel Roberts Noble Foundation, Inc.