1 Genomes with Ensembl
Dr. Giulietta M. Spudich
European Bioinformatics Institute, Hinxton, UK
2 Today
- Introduction to the Ensembl project
- Walk-through of the browser
- BioMart
- Variation
- Comparative Genomics
3 Introduction to Ensembl
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Help and tutorials
4 Genome browsers provide a map
Figure adapted from the ENCODE project: www.nature.com/nature/focus/encode/
5 Genome Browsers
- Ensembl Genome browser
- http://www.ensembl.org
- NCBI Map Viewer
- http://www.ncbi.nlm.nih.gov/mapview/
- UCSC Genome Browser
- http://genome.ucsc.edu
6 What Distinguishes Ensembl from the UCSC and NCBI Browsers?
- The gene set: automatic annotation based on mRNA and protein information
- Programmatic access via the Perl API (open source)
- BioMart
- Integration with other databases (DAS)
- Comparative analysis (gene trees)
7 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- How can we extract data from Ensembl?
- Where can I find help?
8 To meet a challenge
- Ensembl's aim: to provide annotation for the biological community that is freely available and of high quality
- Started in 2000
- Joint project between EBI and Sanger
- Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, EU, BBSRC and MRC
9 Vertebrates are available
Extension to other genomes (plants, microorganisms): www.ensemblgenomes.org
Non-chordates: D. melanogaster, C. elegans, S. cerevisiae
10 Extending Ensembl across the taxonomic space
Slide design by Jeff Almeida-King
F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel and P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March 2006.
11 Exploring genomes
- Vertebrates focus: www.ensembl.org
- Other species: www.ensemblgenomes.org
12 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl (vertebrate) genes and genomes
- Help and tutorials
13 What is known?
Genomic assemblies from sequencing consortia
14 What is known?
Proteins and cDNA/mRNA sequences from the research community, found in:
- UniProt/Swiss-Prot (manually curated)
- UniProt/TrEMBL
- www.uniprot.org
- NCBI RefSeq (manually curated)
- www.ncbi.nlm.nih.gov/RefSeq
15 Combining genes and genomes
tgcctgttag...
16 Too many pieces
17 Ensembl shows one transcript with underlying evidence
18 VEGA/Havana
- Automatic annotation pipeline: gene building all at once (whole genome)
- Ensembl
- Manual curation: case-by-case basis
- VEGA (Vertebrate Genome Annotation)
- Havana
19 HAVANA
- http://www.sanger.ac.uk/HGP/havana/
20 Genes and Transcripts in Ensembl
- Ensembl known transcripts
- Ensembl novel transcripts
- Ensembl merged transcripts (Havana)
- EST clusters
- More manual curation (SGD, WormBase, FlyBase)
21 Ensembl/Havana
- Transcripts are labelled
- Ensembl
- Havana
- Ensembl/Havana merge
22 Names in Ensembl
- ENSG: Ensembl Gene ID
- ENST: Ensembl Transcript ID
- ENSP: Ensembl Peptide ID
- ENSE: Ensembl Exon ID
- For species other than human, a species code is added after ENS
- MUS (Mus musculus) for mouse: ENSMUSG
- DAR (Danio rerio) for zebrafish: ENSDARG, etc.
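The naming scheme above can be sketched in code. This is a minimal Python illustration and not part of the Ensembl API: the function name parse_stable_id, the assumption that the species code is exactly three letters, and the lookup tables (limited to the examples on this slide) are all choices made for the sketch. The IDs passed in at the end are example strings used only to exercise the pattern.

```python
import re

# Feature-type letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes named on the slide; an empty code means human.
SPECIES_CODES = {"": "Homo sapiens", "MUS": "Mus musculus", "DAR": "Danio rerio"}

# Assumed layout: "ENS" + optional three-letter species code + type letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into species, feature type and number."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    code, type_letter, digits = match.groups()
    code = code or ""
    return {
        "species": SPECIES_CODES.get(code, f"unknown species code {code}"),
        "feature": FEATURE_TYPES[type_letter],
        "number": int(digits),
    }


print(parse_stable_id("ENSG00000139618"))
print(parse_stable_id("ENSMUSG00000017167"))
```

Species codes that are not in the small lookup table are still parsed, but reported as unknown rather than guessed.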
23 Low-coverage genomes
- High-coverage sequencing is time-consuming and expensive
- BAC clones (>10x): Human, Mouse, Zebrafish
- Whole Genome Shotgun (6x): Chimp, Rat, Chicken, ...
- Low (2x) coverage genome sequencing
- Faster, cheaper, but only useful when annotated
- Assembled into lots of scaffolds
- Classic Ensembl gene-build would result in many partial and fragmented genes
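One way to see why 2x assemblies break into many scaffolds is the Lander-Waterman (Poisson) approximation. It is not on the slide and is used here only as an idealised model: if reads land uniformly at random, the expected fraction of bases covered by at least one read at coverage c is 1 - e^(-c). The Python sketch below evaluates this for the coverage levels mentioned above. The function name and the use of 10x to stand in for ">10x" are illustrative choices.

```python
import math


def expected_covered_fraction(coverage):
    """Fraction of bases hit by at least one read under a Poisson model."""
    return 1.0 - math.exp(-coverage)


for label, coverage in [("low", 2), ("whole genome shotgun", 6), ("BAC clones", 10)]:
    fraction = expected_covered_fraction(coverage)
    print(f"{label:>20} ({coverage}x): {fraction:.4%} of bases expected to be covered")
```

Under this model about 13.5% of bases are expected to be missed at 2x, against roughly 0.25% at 6x. The many resulting gaps are consistent with a fragmented assembly.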
24 Some 2X genomes
25 Low-Coverage Gene-Build
- Whole Genome Alignment to an annotated high-quality reference genome
- Guided re-ordering of scaffolds
- Annotation of longer, more complete gene structures
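The guided re-ordering step can be illustrated with a toy Python sketch. It assumes each scaffold already has a single best alignment to the reference, summarised as a reference chromosome, start coordinate and strand. The data class, field names and example values are all hypothetical, and the real pipeline is certainly more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaffoldAlignment:
    scaffold: str    # name of the low-coverage scaffold
    ref_chrom: str   # reference chromosome it aligns to
    ref_start: int   # start of the alignment on the reference
    strand: int      # +1 if same orientation as the reference, -1 if reversed


def order_scaffolds(alignments):
    """Arrange scaffolds in reference order, noting which need flipping."""
    ordered = sorted(alignments, key=lambda a: (a.ref_chrom, a.ref_start))
    return [(a.scaffold, "reverse" if a.strand < 0 else "forward") for a in ordered]


# Hypothetical alignments of two scaffolds to one reference chromosome.
alignments = [
    ScaffoldAlignment("scaffold_2", "chr1", 50_000, +1),
    ScaffoldAlignment("scaffold_1", "chr1", 10_000, -1),
]
print(order_scaffolds(alignments))
```

Once scaffolds are placed in reference order, a gene whose exons are split across neighbouring scaffolds can be annotated as one longer structure instead of several fragments.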
26 2X Genebuild
[Figure labels: Human gene; Human genome; Cat scaffold 2; Cat scaffold 1; Human or dog gene (projected)]
27 What other annotation?
- Non-coding (nc)RNAs
- IDs in other databases
- microarray probes, clonesets, BAC maps
- Other features of the genome
- repeats, CpG islands
- Comparative data
- orthologues and paralogues, protein families, whole genome alignments, syntenic regions
- Variation data
- SNPs, InDels
- Regulatory data (a first guess at promoter and enhancer elements)
- Data from external sources (DAS)
28 Sources of Variation
- NCBI dbSNP
- Import alleles, flanking sequence, frequencies
- Calculate position, transcript effect
- http://www.ncbi.nlm.nih.gov/SNP/
- For human also
- HGVbase
- Affy GeneChip 100K and 500K Mapping Array
- Affy Genome-Wide SNP array 6.0
- Ensembl-called SNPs (from Celera reads and Jim Watson's and Craig Venter's genomes)
- For mouse, rat, dog and chicken also
- Sanger- and Ensembl-called SNPs (other strains / breeds)
- STAR Project for rat, other projects
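The "transcript effect" calculation mentioned above can be sketched for the simplest case: a single-base change inside a coding sequence. The Python below is an illustration only and not the Ensembl variation pipeline. The function name, the effect labels and the example sequence are assumptions, and the input is assumed to be an upper-case coding sequence whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_effect(cds, position, alt_base):
    """Classify a single-base change at a 0-based position in a coding sequence."""
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "non-synonymous"


# Hypothetical coding sequence: ATG GCC TGG TAA
cds = "ATGGCCTGGTAA"
print(coding_effect(cds, 5, "T"))  # GCC -> GCT, both alanine
print(coding_effect(cds, 3, "A"))  # GCC -> ACC, alanine to threonine
print(coding_effect(cds, 8, "A"))  # TGG -> TGA, tryptophan to stop
```

Variants outside coding sequence (intronic, UTR, splice-site and so on) would need the position mapped against the transcript structure first, which this sketch leaves out.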
29 External Sources
- Large-scale variations in
- DECIPHER
- Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources
- DGV loci
- Database of Genomic Variants
- CNVs, Inversions, InDels
30 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Help and tutorials
31 How is this information organised?
- Ensembl Views (Website)
- Ensembl Database (open source)
- BioMart data-mining tool
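As a rough illustration of the data-mining route, BioMart queries can be written as a small XML document naming a dataset, filters and attributes. The Python sketch below only builds such a document as a string and does not contact any server. The helper name build_biomart_query is hypothetical, and the dataset, filter and attribute names in the example are assumptions that should be checked against the BioMart interface before use.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (no network access involved)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": str(value)})
    # Attributes choose which columns appear in the output.
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names, for illustration: human genes on chromosome 1 with their transcripts.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "1"},
    attributes=["ensembl_gene_id", "ensembl_transcript_id"],
)
print(xml_query)
```

Keeping filters and attributes as separate arguments mirrors the two steps of a BioMart query: first narrow the records, then pick the output columns.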
32 Help and Information
- Comments and questions?
- helpdesk@ensembl.org
- Check out our tutorials page
- www.ensembl.org/info/website/tutorials/index.html
- Videos: http://www.youtube.com/user/EnsemblHelpdesk
- Mailing list: ensembl-announce@ebi.ac.uk
- Come visit our blog! http://ensembl.blogspot.com/
- FTP site: ftp://ftp.ensembl.org
- Amazon Web Services: http://aws.amazon.com/publicdatasets
33 Ensembl Team
Ensembl: Paul Flicek (EBI), Steve Searle (Sanger Institute)
Software: Glenn Proctor, Andreas Kähäri, Stephen Keenan, Rhoda Kinsella, Eugene Kulesha, Ian Longden, Daniel Rios, Iliana Toneva
Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon, Albert Vilella
Functional Genomics: Ian Dunham, Nathan Johnson, Steven Wilder
Variation: Fiona Cunningham, Yuan Chen, Pontus Larrson, Will McLaren
Analysis and Annotation: Bronwen Aken, Julio Banet, Susan Fairley, Jan-Hinnerck Vogel, Simon White, Amonida Zadissa
Web Team: Anne Parker, Eugene Bragin, Bethan Pritchard, Steve Trevanion (VEGA)
Outreach: Xosé M Fernández, Jeff Almeida-King, Bert Overduin, Michael Schuster (QC), Giulietta Spudich
Systems Support: Guy Coates, James Beal, Gen-Tao Chiang, Peter Clapham, Simon Kelley, Shelley Goddard, Tracy Mumford, Kerry Smith
Research: Benoît Ballester, Damian Keefe, Dace Ruklisa, Petra Catalina Schwalie, Guy Slater
Vertebrate Genomics: Illka Lappalainen, Chao-Kung Chen, Laura Clark, Jonathan Hinton, Vasudev Kumanduri, Edoardo Marcora, Damian Smedley, Richard Smith, Phil Wilkinson, Holly Zheng-Bradley
Ensembl Genomes: Paul Kersey, Paul Derwent, Matthias Haimel, Arnaud Kerhornou, Uma Maheswari, Michael Nuhn, Dan Staines, Andy Yates
VectorBase: Dan Lawson, Gautier Koscielny, Karyn Megy
Zebrafish: Kerstin Howe, Kim Brugger (GRC), Will Chow, Britt Reimholz, James Torrance
Ensembl Strategy: Ewan Birney, Richard Durbin, Tim Hubbard
34 The Wellcome Trust Genome Campus