Title: Comparative genome analysis data resource
1Comparative genome analysis data resource
- Or, Gramene: a Web-accessible database available
at http://www.gramene.org
3Overview
- Background (Gary)
- Genomic sequence data (Aaron)
- Databases and ontologies (Paul)
- Web site overview (Zhiping)
- Real world query (Nolan)
4What is a comparative genome analysis data
resource?
- Our goal is to facilitate the study of
cross-species homology relationships using
information derived from public projects involved
in genomic and EST sequencing, protein structure
and function analysis, genetic and physical
mapping, interpretation of biochemical pathways,
gene and QTL localization and descriptions of
phenotypic characters and mutations.
5Where does Gramene fit in?
- Lies somewhere between a resource like a Model
Organism Database (MOD) and a resource like the
National Center for Biotechnology Information
(NCBI)
- Gramene
- MOD (specific), NCBI (general)
- NCBI
- a national resource for molecular biology
information
6What is a MOD?
- Model Organisms
- Set of organisms studied in great detail
- Advantages for experimental research
- e.g. S. cerevisiae, D. melanogaster, D. rerio
- Model Organism Databases
- Core data resource for a community
- Integrates data from many sources with expert
domain knowledge, e.g. FlyBase
http://flybase.bio.indiana.edu
- Gramene
- Is like a MOD for Rice
- And a comparative grass genomics resource
7Gramene scope
- Genomic data Phenotypic data
8Gramene scope
- Rice
- whole genome sequence
- Other crop grasses
- Maize
- Sorghum
- Millet
- Sugarcane
- Wheat
- Oats
- Barley
9Synteny in Gramene
- Synteny
- genes on the same chromosome
- Conserved synteny
- genes on the same chromosome in one species also found
together on the same chromosome in another species
- Synteny among crop grasses
- Rice genomic sequence: a window onto other genomes
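The definition of conserved synteny above can be illustrated with a small sketch. It assumes ortholog assignments and chromosome locations are already available as simple dictionaries; the gene names, chromosome labels, and the `conserved_synteny` helper are hypothetical and are not taken from Gramene.

```python
from itertools import combinations

def conserved_synteny(orthologs, chrom_a, chrom_b):
    """Return gene pairs from species A that share a chromosome in A
    and whose orthologs also share a chromosome in species B.

    orthologs: dict, gene in species A -> ortholog in species B
    chrom_a, chrom_b: dict, gene -> chromosome label in each species
    """
    conserved = []
    for g1, g2 in combinations(sorted(orthologs), 2):
        same_in_a = chrom_a[g1] == chrom_a[g2]
        same_in_b = chrom_b[orthologs[g1]] == chrom_b[orthologs[g2]]
        if same_in_a and same_in_b:
            conserved.append((g1, g2))
    return conserved

# Hypothetical toy data (illustration only)
rice_chrom = {"r1": "4", "r2": "4", "r3": "1"}
maize_chrom = {"m1": "2", "m2": "2", "m3": "10"}
rice_to_maize = {"r1": "m1", "r2": "m2", "r3": "m3"}

pairs = conserved_synteny(rice_to_maize, rice_chrom, maize_chrom)
print(pairs)
```

Here only r1 and r2 qualify: they sit together on one rice chromosome and their orthologs sit together on one maize chromosome. Real comparative maps also consider gene order and distance, which this sketch ignores.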
11Gramene curation
- Curation is the use of expert domain knowledge to
enter data into a database
- Much of the rice data is curated
- Maps
- Markers
- Proteins
- Phenotypes
- Literature
- Other data automated
12Importance of MODs
- Researchers are becoming dependent on MODs for
information needed to do research
- Gramene uses rice as a framework to organize
information for other grasses
- Provides comparative maps
- Maize has a huge amount of classical genetic data
- Whole is greater than the sum of the parts
14Crown Eukaryote Genome Sequencing Projects
15Cereal Genome Sizes
- Sorghum 1000 Mb
- Maize 3000 Mb
- Barley 5000 Mb
- Wheat 16,000 Mb
- Rice 420 Mb
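To put the sizes listed above in proportion, the following sketch expresses each genome as a multiple of the rice genome. It uses only the figures from this slide; the variable names are illustrative.

```python
# Genome sizes in Mb, as listed on the slide
genome_size_mb = {
    "Rice": 420,
    "Sorghum": 1000,
    "Maize": 3000,
    "Barley": 5000,
    "Wheat": 16000,
}

rice = genome_size_mb["Rice"]
# Fold size relative to rice, rounded to one decimal place
fold = {species: round(size / rice, 1) for species, size in genome_size_mb.items()}

for species, size in sorted(genome_size_mb.items(), key=lambda kv: kv[1]):
    print(f"{species:8s} {size:6d} Mb  {fold[species]:5.1f}x rice")
```

By these figures wheat is roughly 38 times the size of rice, which is part of why the compact rice genome serves as the reference for the other cereals.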
16Two rice genomes
- Beijing Genomics Institute (established 1999)
- Private nonprofit, sequence available on GenBank
- Oryza sativa indica
- 4x coverage, 99.9% accuracy
- Shotgun sequencing of small insert clones
- RePS assembly (repeat-masked Phrap with
scaffolding)
- Did not assemble across highly repetitive regions
- Syngenta
- Private, sequence available, but not on GenBank
- Oryza sativa japonica
- 6x coverage, 99.8% accuracy, 10% of cost of
IRGSP
- Shotgun sequencing of small insert clones,
assembly anchored by genetic and physical maps
- Compared against International Rice Genome
Sequencing Project sequence for quality control
17Japonica and Indica
18The Rice Genome
- 32,000-62,000 genes, depending on how you
predict genes and count them
- 77% of predicted genes showed homology to another
predicted gene within the genome
- estimated 15,000 gene families
- Polyploidization event 40-50 mya
- 85% of Arabidopsis genes have a homolog in rice
- 50% of predicted rice genes have an Arabidopsis
homolog (annotation problems?)
- 8000 Arabidopsis proteins (33%) found in rice but
no other sequenced genome: plant specific
- 25% of genes involved in metabolism
19Predicted Functions
20Proportion of genome allocated to given functions
is similar in rice and Arabidopsis
21Cereal GC content unusual among sequenced genomes
22Comparing the two subspecies
23Repetitive Elements
24Genes from other cereal genomes have homologs in
Rice
25Rice and Maize Synteny
26Ontology/Controlled vocabulary
- Ontology as a Collaboration tool
- Gene Ontology (Gene Ontology Consortium)
- Molecular function of a gene product
- Biological process
- Cellular component that identifies the
localization of the protein in a cell
- Plant Ontology
- Anatomy
- Developmental stages
- Trait Ontology
- Traits
30Trait Ontology (TO) to describe
Mutants/phenotypes in rice
33Informatics Organization
Firewall
Phenotype
Map CUGI,FPC
Ontology GO,TO,PO
Grass Genes and Proteins
SWISSPROT_ACC  SWISSPROT_ID  Gene_Product_Description
Q06967  1433_ORYSA  14-3-3-like protein S94
Q07215  1A11_ORYSA  1-aminocyclopropane-1-carboxylate synthase 1; ACC1; ACC synthase 1; S-adenosyl-L-methionine methylthioadenosine-lyase 1
P17814  4CL1_ORYSA  4-coumarate--CoA ligase 1; 4CL1; 4-coumaroyl-CoA synthase 1; 4CL; 4CL 1
Q42982  4CL2_ORYSA  4-coumarate--CoA ligase 2; 4CL2; 4-coumaroyl-CoA synthase 2; 4CL 2; 4CL.2
P37833  AATC_ORYSA  Aspartate aminotransferase, cytoplasmic; Transaminase A
35Rice Protein Sequence
>1433_ORYSA|Q06967
MSPAEASREENVYMAKLAEQAERYEEMVEFMEKVAKTTDVGELTVEERNLLSVAYK
ARRASWRIISSIEQKEESRGNEAYVASIKEYRSRIETELSKICDGILKLLDSHLVPSATA
AESNVFYLKMKGDYHRYLAEFKSGAERKEAAENTLVAYKSAQDIALADLPTTHPIRLG
LNLSVFYYEILNSPDRACNLAKQAFDDAIAELDTLGEESYKDSTLIMQLLRDNLTLWTS
NAEDGGDEIKEAAKPEGEGH
40Gramene, a tool for grass genomics
(www.plantphysiol.org/cgi/doi/10.1104/pp.015248)
- A comparative genome mapping database for grasses
and a community resource for rice.
- It combines cereal genomics, ESTs, genetic maps,
map relations and publications.
- Comparative maps of rice, maize, sorghum, barley
and oat.
42Map Search: Comparative Maps
43Map Search: matrix
44Map Search: Physical Maps
45Map Search: Physical Maps
46Map Search: Physical Maps
47Rice Genome Browser
50Orthology, paralogy and determining gene functions
51Homologs, orthologs and paralogs
- Homologs - any genes related by common descent.
- Orthologs - genes in different species that
evolved from a common ancestral gene by
speciation.
- Paralogs - genes related by duplication within a
genome.
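One way to make these definitions concrete is to label each branching point of a gene tree as a speciation or a duplication, and classify two genes by the event at their most recent common ancestor. The sketch below assumes such a labeled tree is already available; the tree, the gene names, and the `relationship` helper are hypothetical.

```python
# Gene tree as nested tuples: (event, left, right); leaves are gene names.
# This toy tree has one ancient duplication (A vs. B) followed by a
# rice/maize speciation in each copy.
tree = ("duplication",
        ("speciation", "rice_A", "maize_A"),
        ("speciation", "rice_B", "maize_B"))

def leaves(node):
    """Return the set of gene names under a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def relationship(node, g1, g2):
    """Classify two genes by the event at their most recent common ancestor."""
    if isinstance(node, str):
        raise ValueError("genes must be two distinct leaves of the tree")
    event, left, right = node
    # Descend while both genes still sit under the same child
    for child in (left, right):
        members = leaves(child)
        if g1 in members and g2 in members:
            return relationship(child, g1, g2)
    members = leaves(node)
    if g1 not in members or g2 not in members:
        raise ValueError("gene not found in tree")
    return "orthologs" if event == "speciation" else "paralogs"

print(relationship(tree, "rice_A", "maize_A"))
print(relationship(tree, "rice_A", "rice_B"))
```

All four genes in the toy tree are homologs. Note that under this scheme rice_A and maize_B also come out as paralogs, since the duplication predates the split of the two species.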
52Orthologs: Gene A in several species
Gene A
Gene A
Gene A
Gene A
53Paralogs: genes A and B
Gene A
Gene A Gene B
54Paralogs and orthologs
Gene A
Gene A Gene B
Gene A Gene B
Gene A Gene B
Gene A Gene B
55Homologs: orthologs or paralogs
www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
56Anthocyanins
57An important gene: Lc
- A41388 Lc regulatory protein - maize
- gi 100897, pir A41388
58Finding homologs: BLAST
59Rice CDS search
60IRGSP search: nearby genes
61Info on Ra through Gramene
62Mutant phenotypes
63Rice Ra
- Gene Name: Rice R gene-a
- Gene Symbol: Ra
- Phenotypic Description: One rice R gene. An
active homologue with extensive homology with
other R genes. Located at a position on
chromosome 4 previously shown to be in synteny
with regions of maize chromosomes 2 and 10 that
contain the B and R loci, respectively. Ra gene
can activate the anthocyanin pathway.
- 53 other genes in R gene family in rice.
64Maize Lc and Rice Ra: Orthologs?
- Reciprocal best hits
- Very similar function
- But not easy to tell because of similarity to
other genes in the gene family
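The reciprocal-best-hit test mentioned above can be sketched as follows: a pair counts only if each gene is the other's top-scoring match when searched in the opposite direction. The scores and the name `rice_R2` are invented for illustration, and the helper functions are hypothetical; real input would come from parsed BLAST results.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of (query, subject) -> similarity score
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Invented scores: maize queries against rice, and rice queries against maize
maize_vs_rice = {("Lc", "Ra"): 900, ("Lc", "rice_R2"): 700,
                 ("B", "Ra"): 650, ("B", "rice_R2"): 600}
rice_vs_maize = {("Ra", "Lc"): 900, ("Ra", "B"): 650,
                 ("rice_R2", "Lc"): 700, ("rice_R2", "B"): 600}

rbh = reciprocal_best_hits(maize_vs_rice, rice_vs_maize)
print(rbh)
```

In this toy data only Lc and Ra are reciprocal best hits. As the slide notes, a large gene family with many similar members can make the top hit unstable, so a reciprocal best hit is supporting evidence for orthology rather than proof.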
65Homologues in A. thaliana
SCORE  P   ACCESSION  GI        PROTEIN DESCRIPTION
765    11  AAB72192   2335192   bHLH protein
761    11  AAG52418   12324939  putative transcription factor
608    11  Q9FT81     27151708  TRANSPARENT TESTA 8 protein
- Transcription factors, 2 of which are involved in
regulating anthocyanin biosynthesis in ATH
66Conclusions
- Gene function can be established through homology
to known genes.
- Orthology can be determined by reciprocal BLAST
hits
- Related genes can be found in EST and other
databases for numerous species
- Conserved synteny may aid in confirming orthology
and in building new maps