Ensembl Database and Web Browser (www.ensembl.org)

1
Ensembl Database and Web Browser: www.ensembl.org
Stephen Baird
Apoptosis Research Centre
Children's Hospital of Eastern Ontario
sbaird_at_arc.cheo.ca
2
  • Focus on vertebrates
  • No fungi/plants
  • Brassica/Arabidopsis genome browser is at
    http://ensembl.warwick.ac.uk/

3
What is Ensembl?
  • Joint project of EBI and Sanger
  • Automated annotation of eukaryotic genomes
  • Open source software
  • Relational database system
  • Web interface

The main aim of this campaign is to encourage
scientists across the world - in academia,
pharmaceutical companies, and the biotechnology
and computer industries - to use this free
information.
- Dr. Mike Dexter, Director of the Wellcome Trust
4
Ensembl components
Search tools
Data
  • Chromosomes (FeatureView, KaryoView, CytoView,
    MapView)
  • SNPs and Haplotypes (SNPView, GeneSNPView,
    HaploView, LDView)
  • Sequence Similarity (BLAST, SSAHA)
  • Diseases (DiseaseView)
  • Genome Sequence (ContigView)
  • Genes (GeneView, TransView, ExonView, GeneSeqView)
  • Markers (MarkerView)
  • Functions (GOView)
  • Text (TextView)
  • Other Annotations
  • Protein (ProtView, DomainView, FamilyView)
  • Anything (BioMart/MartView)
  • Comparative Genomics (ContigView,
    MultiContigView, SyntenyView, GeneView)
5
Ensembl Gene Annotation
  • Basis for initial analysis and publication of
    most vertebrate genomes
  • Genome assembly from NCBI
  • Gene build system
  • Targeted gene builds predict known genes
  • Similarity gene builds predict novel genes

6
Curwen et al., Genome Res 14:942-950, 2004
7
Targeted gene build
  • Align known proteins with pmatch and BLAST
  • Incorporate aligned cDNA sequences to find splice
    sites, UTRs with genewise

[Figure: ContigView of a best-in-genome gene with its
associated evidence. Labels: known gene (p53),
predicted UTRs, aligned proteins, aligned UniGene
clusters, aligned cDNAs]
8
Similarity gene build
  • Identify novel exons ab initio using Genscan
  • Confirm exons by BLAST to known proteins, mRNAs,
    UniGene clusters
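
The confirmation step above can be sketched roughly as follows. This is a minimal illustration, not the Ensembl gene build code: it assumes ab initio exon predictions and aligned evidence (proteins, mRNAs, UniGene clusters) have already been reduced to coordinate intervals on the same sequence, and it treats any overlap as confirmation. The names Interval, overlaps and confirm_exons are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A 1-based, inclusive coordinate range on one sequence (hypothetical type)."""
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def confirm_exons(predicted: List[Interval], evidence: List[Interval]) -> List[Interval]:
    """Keep only predicted exons that overlap at least one evidence alignment.

    Assumption: simple overlap stands in for the real confirmation criteria,
    which the slides do not spell out.
    """
    return [exon for exon in predicted
            if any(overlaps(exon, hit) for hit in evidence)]


# Toy data: three predicted exons, two evidence alignments.
predicted = [Interval(100, 200), Interval(500, 650), Interval(900, 1000)]
evidence = [Interval(150, 210), Interval(940, 990)]
confirmed = confirm_exons(predicted, evidence)
print(confirmed)
```

In this toy run the middle prediction has no supporting alignment and is dropped, which is the behaviour the slide describes for unconfirmed exons.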

9
Ensembl Gene Annotation
  • Resulting Ensembl genes are highly accurate,
    with low false positive rates
  • Ensembl human gene identifiers are 95% stable
    between builds
  • Ensembl and RefSeq differ for 8-12% of the
    genes
  • The Consensus CDS (CCDS) project is a
    collaborative effort between Ensembl/EBI, UCSC
    and NCBI to identify a core set of human protein
    coding regions that are consistently annotated
    and of high quality (13,000 genes).

10
Manually curated genes: VEGA
  • Some chromosomes contain manually curated genes
    from the VEGA database
  • The Otter manual annotation system lets
    annotators in the Human and Vertebrate Analysis
    and Annotation (HAVANA) group at the Sanger
    Centre integrate automatic and manual
    annotations (e.g. from Apollo) into Ensembl

VEGA gene
11
Ensembl EST genes
  • ESTs not accurate enough to produce Ensembl
    genes, but important for identifying alternative
    transcripts
  • ESTs aligned to genome and merged to create an
    independent set of EST genes

Known gene
EST genes
Unigene clusters aligned
12
Pseudogenes
  • Processed pseudogenes in annotation identified
    (lack of introns, frameshifts, presence of
    multi-exon version elsewhere in genome, etc.)
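
The criteria listed above can be turned into a simple flagging rule. The sketch below is an assumption-laden illustration, not the Ensembl pseudogene pipeline: the GeneModel fields and the way the criteria are combined (intronless, plus either a frameshift or a multi-exon copy elsewhere) are choices made here for clarity.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical summary of one annotated gene model."""
    name: str
    exon_count: int
    has_frameshift: bool
    multi_exon_copy_elsewhere: bool


def looks_like_processed_pseudogene(gene: GeneModel) -> bool:
    """Flag intronless models that also show a frameshift or have a
    multi-exon version elsewhere in the genome.

    Assumption: this combination rule is illustrative only.
    """
    intronless = gene.exon_count == 1
    return intronless and (gene.has_frameshift or gene.multi_exon_copy_elsewhere)


# Toy models (invented names and values).
models = [
    GeneModel("model_a", exon_count=1, has_frameshift=True, multi_exon_copy_elsewhere=True),
    GeneModel("model_b", exon_count=7, has_frameshift=False, multi_exon_copy_elsewhere=False),
    GeneModel("model_c", exon_count=1, has_frameshift=False, multi_exon_copy_elsewhere=False),
]
flagged = [m.name for m in models if looks_like_processed_pseudogene(m)]
print(flagged)
```

Note that model_c, a genuine single-exon gene with no other warning signs, is not flagged; an intronless structure alone is not treated as sufficient here.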

Pseudogene
13
Noncoding RNA Genes
  • Genes with no ORFs that are functional (tRNAs,
    rRNAs, miRNAs)
  • 7220 annotations from Sean Eddy and Tom Jones

miRNAs
Coding gene
14
Example 1: Exploring Caspase-3
  • Aim: to demonstrate basic browsing and views
  • Caspase-3 is a gene involved in apoptosis (cell
    suicide)
  • We will look at:
      • Gene annotation
      • SNPs
      • Orthologs and genome alignments
      • Alternative transcripts and EST genes
      • Protein structure

15
Text Search
Species-specific homepage
caspase-3
Gene
16
GeneView
GeneSpliceView
GeneRegulationView
ContigView
GeneSNPView
ExonView
TransView of transcript
ProteinView
ExportView
Orthologs predicted by sequence similarity and
synteny
17
GeneView
DAS (Distributed Annotation System): external
annotation of splicing, transcripts, array
expression, PubMed links, associated phenotypes,
Protonet, Reactome, UniProt.
Information for each transcript: similarity
matches, links to RefSeq, OMIM, PDB, array
probes, GO, InterPro, Protein FamilyView,
transcript structure, protein properties.
18
GeneView
GeneSNPView
19
GeneSNPView
20
Other SNP/Haplotype tools
  • SNPView: information on a single SNP
  • ProteinView: protein sequence with SNP markup
  • LDView: view linkage disequilibrium (only limited
    regions)
  • HaploView: view haplotypes (only limited regions)

21
GeneView
Click Back to
22
ContigView
Chromosome and bands
Sequence contigs
To Detailed view
23
ContigView Detailed View
See other tracks, options in menus
Genscan predictions
Targeted gene predictions (2 alternative
transcripts)
Gene annotations
EST genes
Other tracks Aligned sequences etc.
Base View Region
24
ContigView: Features menu
Export image (PS, PDF, SVG) or FASTA file
Click on close menu
25
MultiContigView
Conserved regions
Rat ortholog
26
Other Comparative Genomics Tools
  • Up to 6 genome alignments with MLAGAN in
    AlignSliceView
  • SyntenyView is another comparative view
  • Comparative genomics data can also be accessed
    through EnsMart (now BioMart)

27
DAS: Distributed Annotation System
28
Data Mining with BioMart
  • Allows very fast, cross-data source querying
  • Search for genes (features, sequences, etc.) or
    SNPs based on:
      • Position, function, domains, similarity,
        expression, etc.
  • Accessible from Ensembl website (MartView) as
    well as stand-alone
  • Extremely powerful for data mining

29
Example 2: BioMart
  • A new disease locus has been mapped between
    markers D21S1991 and D21S171. It may be that the
    gene involved has already been identified as
    having a role in another disease. What candidates
    are in this region?
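
A region query like this one can also be expressed as a BioMart XML query rather than through the MartView pages. The sketch below only builds the XML text and does not contact any server. Several things in it are assumptions rather than facts from the slides: the dataset name hsapiens_gene_ensembl, the filter names chromosome_name, start and end, and the attribute names may differ between BioMart releases, and REGION_START and REGION_END are placeholders that must be replaced with the mapped positions of D21S1991 and D21S171.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_biomart_query(dataset: str, filters: Dict[str, str], attributes: List[str]) -> str:
    """Build a BioMart-style XML query string (no network access)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Placeholder coordinates: NOT the real marker positions.
REGION_START = 1
REGION_END = 1_000_000

query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",            # assumed dataset name
    filters={
        "chromosome_name": "21",                # both markers are on chromosome 21
        "start": str(REGION_START),             # assumed filter name
        "end": str(REGION_END),                 # assumed filter name
    },
    attributes=["ensembl_gene_id", "external_gene_name"],  # assumed attribute names
)
print(query_xml)
```

Adding a disease-related attribute to the attribute list would be the natural next step for this exercise; which attribute names are available depends on the mart being queried.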

30
BioMart: Choosing your dataset
31
BioMart: Filtering
21
32
BioMart: Output
Note: you can output different types of information
33
BioMart: Output
34
Sequence Similarity Searching
  • Use SSAHA for exact matches (fast)
  • Use BLAST for more distant similarity (slow)
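
To see why a hash-based tool is fast for exact matches, here is a heavily simplified sketch of the idea: index every k-mer of the reference once, then look up the first k-mer of the query and verify the rest. This is not the SSAHA implementation (which, among other differences, works on very large databases and handles near-exact matches); the function names and the choice of k are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List


def build_kmer_index(reference: str, k: int) -> DefaultDict[str, List[int]]:
    """Map every k-mer in the reference to the 0-based positions where it starts."""
    index: DefaultDict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_exact_matches(query: str, reference: str,
                       index: DefaultDict[str, List[int]], k: int) -> List[int]:
    """Return 0-based start positions where the whole query matches exactly."""
    if len(query) < k:
        return []  # query too short to seed a lookup
    hits = []
    # .get avoids inserting empty entries into the defaultdict
    for pos in index.get(query[:k], []):
        if reference.startswith(query, pos):
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"
k = 4
index = build_kmer_index(reference, k)
matches = find_exact_matches("ACGTT", reference, index, k)
print(matches)
```

The index is built once and each query then costs a dictionary lookup plus a short verification, which is why this style of search suits exact or near-exact matches, while more distant similarity needs an alignment-based tool such as BLAST.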

35
Looking for Help?
36
DAS: Getting Your Own Data into Ensembl
  • DAS (Distributed Annotation System)
  • Anyone can load data into Ensembl and allow
    others to view it in the same view (e.g.
    ContigView) as other Ensembl annotations
  • Click on Manage sources in the DAS dropdown menu

37
Other Ways to Access Ensembl
  • MySQL database directly accessible
  • APIs for Perl and Java
  • Other software:
      • Apollo: Java genome annotation viewer/editor
      • Sockeye: Java viewer
  • You can install your own local copy of the
    Ensembl software and data, both freely available
  • http://www.ensembl.org/Docs/
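
Direct database access means a region lookup can be written as ordinary SQL. The sketch below shows the shape of such a query using an in-memory SQLite database as a stand-in so that it runs without a server. The gene table, its column names and the rows are all hypothetical and much simpler than the real Ensembl schema; consult the Ensembl documentation for the actual tables.

```python
import sqlite3
from typing import List

# In-memory stand-in database with a simplified, hypothetical gene table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    "stable_id TEXT, chromosome TEXT, seq_start INTEGER, seq_end INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("GENE_A", "21", 100, 500),   # invented rows for illustration
        ("GENE_B", "21", 900, 1500),
        ("GENE_C", "4", 200, 800),
    ],
)


def genes_in_region(db: sqlite3.Connection, chromosome: str,
                    start: int, end: int) -> List[str]:
    """Return IDs of genes overlapping [start, end] on one chromosome."""
    rows = db.execute(
        "SELECT stable_id FROM gene "
        "WHERE chromosome = ? AND seq_end >= ? AND seq_start <= ? "
        "ORDER BY seq_start",
        (chromosome, start, end),
    )
    return [row[0] for row in rows]


region_genes = genes_in_region(conn, "21", 400, 1000)
print(region_genes)
```

The overlap condition (gene end at or after the region start, gene start at or before the region end) is the part that carries over to a real schema; the table layout does not.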

Sockeye
38
Exercises
  • Ex 1. Homologues of human genes are often present
    in Fugu rubripes in a more condensed form (with
    shorter introns). Is this true for PTEN, a tumor
    suppressor gene often mutated in advanced
    cancers?
      • Try MultiContigView. Can you think of another
        way to get this information?
  • Ex 2. The microRNA bantam regulates the
    Drosophila (fruit fly) gene hid by binding its
    3' UTR. Hid is involved in apoptosis, and it is
    possible that binding sites for bantam could be
    found in the 3' UTRs of other apoptosis genes as
    well. Obtain the 3' UTR sequences of all
    Drosophila genes known to be involved in
    apoptosis.
      • Use BioMart. The GO term for apoptosis is
        GO:0006915; use evidence code TAS.
  • Ex 3. The file PCR_product.txt on the webserver
    contains the sequence of a PCR product amplified
    from a mouse cDNA library. What gene does the
    product correspond to? Does it contain the
    complete coding sequence of that gene?
      • Would it be better to use BLAST or SSAHA?