Data sources


Genome sequences


Organism                      Source and release   Number of sequences
Oryza sativa                  v6                   67393
Arabidopsis thaliana          v9                   33200
Medicago truncatula           v2                   38749
Sorghum bicolor               v1.4                 36338
Populus trichocarpa           v1.1                 45555
Vitis vinifera                v1                   30434
Physcomitrella patens         v1.1                 35938
Selaginella moellendorffii    v1                   34697
Glycine max                   v1                   75778
Ostreococcus tauri            v2                   7725
Chlamydomonas reinhardtii     v4                   16706
Cyanidioschyzon merolae       v1                   5014
Brachypodium distachyon       v1                   44411
Carica papaya                 n/a                  24782
Ricinus communis              v0.1                 31221
Zea mays                      v4                   53764

Remark about sequence identifiers

Sequences are usually identified by the locus tags defined by the consortia responsible for the annotation (e.g. At5g20240.1). For some draft genomes, we built modified identifiers from the scaffold id and a species code (e.g. Phypa_96903).
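The two identifier styles described above can be told apart with a small helper. This is a minimal sketch only: the regular expressions are assumptions inferred from the two examples given (At5g20240.1 and Phypa_96903), and the function name `classify_identifier` is hypothetical. Real locus tags from other consortia may follow other patterns.

```python
import re

# Assumed patterns, inferred only from the two examples in the text.
# Locus tag with a gene-model suffix, e.g. At5g20240.1
LOCUS_TAG = re.compile(r"^(?P<locus>[A-Za-z0-9]+)\.(?P<model>\d+)$")
# Modified draft-genome identifier, e.g. Phypa_96903
DRAFT_ID = re.compile(r"^(?P<species>[A-Za-z]+)_(?P<number>\d+)$")


def classify_identifier(seq_id):
    """Return a dict describing which identifier style seq_id follows."""
    match = DRAFT_ID.match(seq_id)
    if match:
        return {
            "kind": "draft",
            "species_code": match.group("species"),
            "local_id": match.group("number"),
        }
    match = LOCUS_TAG.match(seq_id)
    if match:
        return {
            "kind": "locus_tag",
            "locus": match.group("locus"),
            "model": int(match.group("model")),
        }
    # Anything else is left unclassified rather than guessed at.
    return {"kind": "unknown", "raw": seq_id}


if __name__ == "__main__":
    for example in ("At5g20240.1", "Phypa_96903"):
        print(example, classify_identifier(example))
```

Identifiers that match neither pattern are reported as unknown, so annotation-specific formats are not silently misread.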

Data associated with genomes

Protein domain and domain architecture

Each sequence was analysed with InterProScan [1] to identify InterPro domains (InterPro 16.2, 04 February 2008).
Each group of sequences was analysed with MEME/MAST [2, 3] to define its domain architecture.
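One way to picture a domain architecture is as the ordered list of motifs found along a sequence. The sketch below assumes that motif hits have already been extracted into (motif id, start position) pairs. It does not read MEME or MAST output files, and the function names and example data are hypothetical.

```python
from collections import Counter


def architecture(hits):
    """Order the motif ids of one sequence by their start position.

    hits is an iterable of (motif_id, start) pairs (assumed input shape).
    """
    return tuple(motif for motif, _ in sorted(hits, key=lambda hit: hit[1]))


def group_architectures(hits_by_sequence):
    """Count how many sequences of a group share each architecture."""
    return Counter(architecture(hits) for hits in hits_by_sequence.values())


if __name__ == "__main__":
    # Hypothetical motif hits for a group of three sequences.
    group = {
        "seqA": [("motif2", 120), ("motif1", 10)],
        "seqB": [("motif1", 15), ("motif2", 130)],
        "seqC": [("motif1", 12)],
    }
    for arch, count in group_architectures(group).most_common():
        print(count, " > ".join(arch))
```

Counting architectures per group makes the dominant motif order easy to spot, along with sequences that deviate from it.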

UniProt (Universal Protein Resource)

Correspondence with UniProtKB [4] (last update: 22 May 2009) was established through the ordered locus name when available (in the 'Gene names' section).
Otherwise, the mapping used the first BLAST hit with an identity score > 90%.
UniProt Taxonomy
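The two-step mapping rule (ordered locus first, BLAST as a fallback) can be sketched as follows. The input structures, the function name `map_to_uniprot` and the accessions are hypothetical. The sketch also assumes that "first BLAST hit" means the top-ranked hit, which is accepted only if its identity exceeds 90%; the text could also be read as "first hit among those above 90%".

```python
def map_to_uniprot(locus, locus_index, blast_hits, min_identity=90.0):
    """Return (accession, method) for one locus.

    locus_index: dict mapping ordered locus names to UniProtKB accessions
                 (assumed to be built from the 'Gene names' section).
    blast_hits:  dict mapping a locus to a list of (accession, identity)
                 pairs, best hit first (assumed input shape).
    """
    # Step 1: direct correspondence through the ordered locus name.
    if locus in locus_index:
        return locus_index[locus], "ordered_locus"

    # Step 2: fall back to the top BLAST hit, kept only above the threshold.
    hits = blast_hits.get(locus, [])
    if hits:
        accession, identity = hits[0]
        if identity > min_identity:
            return accession, "blast"

    return None, "unmapped"


if __name__ == "__main__":
    # Hypothetical data; ACC_* are placeholders, not real accessions.
    index = {"At5g20240": "ACC_A"}
    hits = {
        "Phypa_96903": [("ACC_B", 97.5), ("ACC_C", 91.0)],
        "Phypa_00001": [("ACC_D", 85.0)],
    }
    for locus in ("At5g20240", "Phypa_96903", "Phypa_00001"):
        print(locus, map_to_uniprot(locus, index, hits))
```

Returning the method alongside the accession keeps track of which mappings rest on sequence similarity rather than on a declared locus name.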

KEGG (Kyoto Encyclopedia of Genes and Genomes)

KEGG [5] data were downloaded from the KEGG Orthology (KO) database when available (last update: 02/02/2009).

Gene Ontology (controlled vocabulary of terms for describing gene products)

GO terms, in particular those of the Plant GO slim, were obtained from InterPro and UniProt.

PubMed

Clusters are tagged with selected PubMed ids referenced in UniProt entries. Annotators can also add their own publications.

References

  1. Zdobnov E.M. and Apweiler R. "InterProScan - an integration platform for the signature-recognition methods in InterPro". Bioinformatics, 2001, 17(9): 847-848.
  2. Bailey T.L. and Elkan C. "Fitting a mixture model by expectation maximization to discover motifs in biopolymers". Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California, 1994, pp. 28-36.
  3. Bailey T.L. and Gribskov M. "Combining evidence using p-values: application to sequence homology searches". Bioinformatics, 1998, 14: 48-54.
  4. Schneider M., Bairoch A., Wu C.H. and Apweiler R. "Plant protein annotation in the UniProt Knowledgebase". Plant Physiol., 2005, 138: 59-66.
  5. Kanehisa M., Goto S., Hattori M., Aoki-Kinoshita K.F., Itoh M., Kawashima S., Katayama T., Araki M. and Hirakawa M. "From genomics to chemical genomics: new developments in KEGG". Nucleic Acids Res., 2006, 34: D354-357.

Bioversity cirad GCP