Comparative genome analysis data resource

Provided by: bioInform1
1
Comparative genome analysis data resource
  • Or, Gramene: a Web-accessible database available
    at http://www.gramene.org

3
Overview
  • Background (Gary)
  • Genomic sequence data (Aaron)
  • Databases and ontologies (Paul)
  • Web site overview (Zhiping)
  • Real world query (Nolan)

4
What is a comparative genome analysis data
resource?
  • Our goal is to facilitate the study of
    cross-species homology relationships using
    information derived from public projects involved
    in genomic and EST sequencing, protein structure
    and function analysis, genetic and physical
    mapping, interpretation of biochemical pathways,
    gene and QTL localization and descriptions of
    phenotypic characters and mutations.

5
Where does Gramene fit in?
  • Lies somewhere between a resource like a Model
    Organism Database (MOD) and a resource like the
    National Center for Biotechnology Information
    (NCBI)
  • Spectrum: MOD (specific), Gramene, NCBI (general)
  • NCBI: a national resource for molecular biology
    information

6
What is a MOD?
  • Model Organisms
  • Set of organisms studied in great detail
  • Advantages for experimental research
  • e.g. S. cerevisiae, D. melanogaster, D. rerio
  • Model Organism Databases
  • Core data resource for a community
  • Integrates data from many sources with expert
    domain knowledge, e.g. FlyBase
    http://flybase.bio.indiana.edu
  • Gramene
  • Is like a MOD for Rice
  • And a comparative grass genomics resource

7
Gramene scope
  • Genomic data and phenotypic data

8
Gramene scope
  • Rice
  • whole genome sequence
  • Other crop grasses
  • Maize
  • Sorghum
  • Millet
  • Sugarcane
  • Wheat
  • Oats
  • Barley

9
Synteny in Gramene
  • Synteny: genes on the same chromosome
  • Conserved synteny: genes on the same chromosome in
    one species are also found together on the same
    chromosome in another species
  • Synteny among crop grasses
  • Rice genomic sequence: a window onto other genomes
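
The definition of conserved synteny given on this slide can be sketched in a few lines of Python. This is only an illustration: the gene names, chromosome labels and ortholog pairs below are invented and are not Gramene data, and the threshold of two shared genes is an arbitrary assumption.

```python
from collections import defaultdict


def conserved_synteny(chrom_a, chrom_b, orthologs, min_genes=2):
    """Group ortholog pairs by (chromosome in A, chromosome in B).

    chrom_a and chrom_b map gene name -> chromosome for each species.
    orthologs maps a gene in species A to its ortholog in species B.
    A chromosome pair that shares at least min_genes ortholog pairs is
    reported as a candidate conserved-synteny group.
    """
    groups = defaultdict(list)
    for gene_a, gene_b in orthologs.items():
        if gene_a in chrom_a and gene_b in chrom_b:
            key = (chrom_a[gene_a], chrom_b[gene_b])
            groups[key].append((gene_a, gene_b))
    return {key: sorted(genes) for key, genes in groups.items()
            if len(genes) >= min_genes}


# Hypothetical toy data (not real gene names or positions).
rice = {"r1": "chr4", "r2": "chr4", "r3": "chr4", "r4": "chr1"}
maize = {"m1": "chr2", "m2": "chr2", "m3": "chr10", "m4": "chr5"}
pairs = {"r1": "m1", "r2": "m2", "r3": "m3", "r4": "m4"}

synteny_groups = conserved_synteny(rice, maize, pairs)
print(synteny_groups)
```

In this toy case only rice chr4 and maize chr2 share two ortholog pairs, so that is the single group reported. A real analysis would also consider gene order and distance, which this sketch ignores.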

11
Gramene curation
  • Curation is the use of expert domain knowledge to
    enter data into a database
  • Much of the rice data is curated
  • Maps
  • Markers
  • Proteins
  • Phenotypes
  • Literature
  • Other data are handled automatically

12
Importance of MODs
  • Researchers are becoming dependent on MODs for the
    information needed to do research
  • Gramene uses rice as framework to organize
    information for other grasses
  • Provides comparative maps
  • Maize has huge amount of classical genetic data
  • Whole is greater than the sum of the parts

14
Crown Eukaryote Genome Sequencing Projects
15
Cereal Genome Sizes
  • Sorghum 1000 Mb
  • Maize 3000 Mb
  • Barley 5000 Mb
  • Wheat 16,000 Mb
  • Rice 420 Mb
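
To make the size differences concrete, the figures on this slide can be expressed as multiples of the rice genome. The short Python sketch below uses only the numbers listed above; the function name and the choice of rice as the reference are illustrative assumptions.

```python
# Genome sizes in megabases, as listed on the slide.
genome_size_mb = {
    "Rice": 420,
    "Sorghum": 1000,
    "Maize": 3000,
    "Barley": 5000,
    "Wheat": 16000,
}


def fold_vs_reference(sizes, reference="Rice"):
    """Return each genome size as a multiple of the reference genome."""
    ref = sizes[reference]
    return {name: size / ref for name, size in sizes.items()}


ratios = fold_vs_reference(genome_size_mb)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:8s} {genome_size_mb[name]:>6d} Mb  {ratio:5.1f}x rice")
```

On these figures wheat is roughly 38 times the size of rice, which helps explain why rice was used as the reference genome for the grasses.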

16
Two rice genomes
  • Beijing Genomics Institute (established 1999)
  • Private nonprofit, sequence available on GenBank
  • Oryza sativa indica
  • 4x coverage, 99.9% accuracy
  • Shotgun sequencing of small insert clones
  • RePS assembly (repeat-masked Phrap with
    scaffolding)
  • Did not assemble across highly repetitive regions
  • Syngenta
  • Private, sequence available, but not on GenBank
  • Oryza sativa japonica
  • 6x coverage, 99.8% accuracy, 10% of the cost of
    IRGSP
  • Shotgun sequencing of small insert clones,
    assembly anchored by genetic and physical maps
  • Compared against International Rice Genome
    Sequencing Project sequence for quality control

17
Japonica and Indica
18
The Rice Genome
  • 32,000 to 62,000 genes, depending on how you
    predict genes and count them
  • 77% of predicted genes showed homology to another
    predicted gene within the genome
  • estimated 15,000 gene families
  • Polyploidization event 40-50 mya
  • 85% of Arabidopsis genes have a homolog in rice
  • 50% of predicted rice genes have an Arabidopsis
    homolog (annotation problems?)
  • 8000 Arabidopsis proteins (33%) found in rice but
    in no other sequenced genome: plant specific
  • 25% of genes involved in metabolism

19
Predicted Functions
20
Proportion of genome allocated to given functions is
similar in rice and Arabidopsis
21
Cereal GC content unusual among sequenced genomes
22
Comparing the two subspecies
23
Repetitive Elements
24
Genes from other cereal genomes have homologs in
Rice
25
Rice and Maize Synteny
26
Ontology/Controlled vocabulary
  • Ontology as a Collaboration tool
  • Gene Ontology (Gene Ontology Consortium)
  • Molecular function of a gene product
  • Biological process
  • Cellular component that identifies the
    localization of the protein in a cell
  • Plant Ontology
  • Anatomy
  • Developmental stages
  • Trait Ontology
  • Traits

30
Trait Ontology (TO) to describe
mutants/phenotypes in rice
33
Informatics Organization
  • Diagram labels: Firewall; Phenotype; Map (CUGI, FPC);
    Ontology (GO, TO, PO)
  • Grass Genes and Proteins

    SWISSPROT_ACC  SWISSPROT_ID  Gene_Product_Description
    Q06967         1433_ORYSA    14-3-3-like protein S94
    Q07215         1A11_ORYSA    1-aminocyclopropane-1-carboxylate synthase 1; ACC1; ACC synthase 1; S-adenosyl-L-methionine methylthioadenosine-lyase 1
    P17814         4CL1_ORYSA    4-coumarate--CoA ligase 1; 4CL1; 4-coumaroyl-CoA synthase 1; 4CL; 4CL 1
    Q42982         4CL2_ORYSA    4-coumarate--CoA ligase 2; 4CL2; 4-coumaroyl-CoA synthase 2; 4CL 2; 4CL.2
    P37833         AATC_ORYSA    Aspartate aminotransferase, cytoplasmic; Transaminase A
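
The SWISSPROT_ID values in these records follow the Swiss-Prot entry-name convention of a protein mnemonic, an underscore, and a species code (ORYSA for Oryza sativa). As a small illustration, the Python sketch below splits the identifiers shown on the slide; the helper name `split_entry_name` is hypothetical and not part of any Gramene or Swiss-Prot tool.

```python
def split_entry_name(entry_name):
    """Split a Swiss-Prot style entry name into (protein, species) parts."""
    protein, separator, species = entry_name.partition("_")
    if not separator or not protein or not species:
        raise ValueError(f"not a PROTEIN_SPECIES entry name: {entry_name!r}")
    return protein, species


# Accession and entry-name pairs taken from the records on the slide.
records = [
    ("Q06967", "1433_ORYSA"),
    ("Q07215", "1A11_ORYSA"),
    ("P17814", "4CL1_ORYSA"),
    ("Q42982", "4CL2_ORYSA"),
    ("P37833", "AATC_ORYSA"),
]

species_codes = {split_entry_name(entry)[1] for _, entry in records}
proteins = [split_entry_name(entry)[0] for _, entry in records]
print(species_codes, proteins)
```

Grouping on the species code in this way is one simple route to pulling all rice entries out of a mixed-species protein list.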
35
Rice Protein Sequence
    >1433_ORYSA|Q06967
    MSPAEASREENVYMAKLAEQAERYEEMVEFMEKVAKTTDVGELTVEERNLLSVAYK
    ARRASWRIISSIEQKEESRGNEAYVASIKEYRSRIETELSKICDGILKLLDSHLVPSATA
    AESNVFYLKMKGDYHRYLAEFKSGAERKEAAENTLVAYKSAQDIALADLPTTHPIRLG
    LNLSVFYYEILNSPDRACNLAKQAFDDAIAELDTLGEESYKDSTLIMQLLRDNLTLWTS
    NAEDGGDEIKEAAKPEGEGH
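
The sequence above is in FASTA format: a header line beginning with ">" followed by one or more lines of residues. A minimal Python reader for this layout might look like the sketch below. The record used to exercise it is a made-up toy, not the rice protein, and the function is an illustration rather than the parser Gramene uses.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence.

    The header is everything after the leading ">" on a header line.
    Sequence lines are concatenated with whitespace stripped.
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical toy record, used only to exercise the parser.
example = """>toy_protein hypothetical record
MSPAEASREE
NVYMAK
"""
fasta_records = parse_fasta(example)
for name, sequence in fasta_records.items():
    print(name, len(sequence))
```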

40
Gramene: a tool for grass genomics
(www.plantphysiol.org/cgi/doi/10.1104/pp.015248)
  • A comparative genome mapping database for grasses
    and a community resource for rice.
  • It combines cereal genomics, ESTs, genetic maps,
    map relations and publications.
  • Comparative maps of rice, maize, sorghum, barley
    and oat.

42
Map Search: Comparative Maps
43
Map Search: Matrix
44
Map Search: Physical Maps
45
Map Search: Physical Maps
46
Map Search: Physical Maps
47
Rice Genome Browser
50
Orthology, paralogy and determining gene functions
  • Nolan Kane

51
Homologs, orthologs and paralogs
  • Homologs - any genes related by common descent.
  • Orthologs - genes in different species that
    evolved from a common ancestral gene by
    speciation.
  • Paralogs - genes related by duplication within a
    genome.
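
One way to apply these definitions is to look at the event where two gene lineages last shared an ancestor in a gene tree: a speciation makes them orthologs, a duplication makes them paralogs. The Python sketch below assumes a tiny hand-labelled tree written as nested tuples of the form (event, left, right); the tree, the gene names and this representation are all invented for illustration.

```python
def leaves(tree):
    """Return the set of leaf (gene) names under a tree node."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return leaves(left) | leaves(right)


def relationship(tree, gene_x, gene_y):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(tree, str):
        raise ValueError("both genes must be distinct leaves of the tree")
    event, left, right = tree
    # Descend while both genes sit under the same child.
    for child in (left, right):
        child_leaves = leaves(child)
        if gene_x in child_leaves and gene_y in child_leaves:
            return relationship(child, gene_x, gene_y)
    all_leaves = leaves(left) | leaves(right)
    if gene_x not in all_leaves or gene_y not in all_leaves:
        raise ValueError("both genes must be distinct leaves of the tree")
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical tree: an ancestral duplication gave genes A and B,
# and each copy was then inherited by rice and maize.
gene_tree = (
    "duplication",
    ("speciation", "rice_A", "maize_A"),
    ("speciation", "rice_B", "maize_B"),
)

print(relationship(gene_tree, "rice_A", "maize_A"))
print(relationship(gene_tree, "rice_A", "rice_B"))
```

Note that in this toy tree rice_A and maize_B come out as paralogs even though they are in different species, because the lineages separated at the duplication.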

52
Orthologs: Gene A in several species
(Diagram: four copies of Gene A, one per species)
53
Paralogs: genes A and B
(Diagram: Gene A, then Gene A and Gene B)
54
Paralogs and orthologs
(Diagram: Gene A, then Gene A and Gene B in each of
four species)
55
Homologs: orthologs or paralogs
www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
56
Anthocyanins
57
An important gene: Lc
  • A41388 Lc regulatory protein - maize
    gi|100897|pir||A41388 [100897]

58
Finding homologs: BLAST
59
Rice CDS search
60
IRGSP search: nearby genes
61
Info on Ra through Gramene
62
Mutant phenotypes
63
Rice Ra
  • Gene Name: Rice R gene-a
  • Gene Symbol: Ra
  • Phenotypic Description: One rice R gene. An
    active homologue with extensive homology with
    other R genes. Located at a position on
    chromosome 4 previously shown to be in synteny
    with regions of maize chromosomes 2 and 10 that
    contain the B and R loci, respectively. Ra gene
    can activate the anthocyanin pathway.
  • 53 other genes in the R gene family in rice.

64
Maize Lc and Rice Ra: Orthologs?
  • Reciprocal best hits
  • Very similar function
  • But not easy to tell because of similarity to
    other genes in the gene family
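
The reciprocal-best-hits test mentioned on this slide can be sketched as follows. The hit lists and scores are invented, and "Rb" is a made-up name standing in for another member of the rice R gene family. In practice the triples would come from two BLAST searches, one in each direction.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, score) triples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores for illustration only.
maize_vs_rice = [("Lc", "Ra", 900.0), ("Lc", "Rb", 850.0), ("B", "Rb", 700.0)]
rice_vs_maize = [("Ra", "Lc", 880.0), ("Rb", "Lc", 860.0), ("Rb", "B", 690.0)]

rbh_pairs = reciprocal_best_hits(maize_vs_rice, rice_vs_maize)
print(rbh_pairs)
```

With these toy scores only Lc and Ra are reciprocal best hits. The close runner-up score for the second family member shows why a large gene family makes the call hard: a small change in score would change the answer.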

65
Homologues in A. thaliana
    SCORE  P   ACCESSION  GI        PROTEIN DESCRIPTION
    765    11  AAB72192   2335192   bHLH protein
    761    11  AAG52418   12324939  putative transcription factor
    608    11  Q9FT81     27151708  TRANSPARENT TESTA 8 protein
  • Transcription factors, 2 of which are involved in
    regulating anthocyanin biosynthesis in A. thaliana
66
Conclusions
  • Gene function can be established through homology
    to known genes.
  • Orthology can be determined by reciprocal best
    BLAST hits.
  • Related genes can be found in EST and other
    databases for numerous species.
  • Conserved synteny may aid in confirming orthology
    and in building new maps.