Title: Genome Analysis
- Gene catalogues
- Data retrieval
- Comparative Genome Analysis
Sequencing large genomes
The hybrid approach
The evidence for a gene
mRNA → reverse transcription → cDNA
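To make the relationship concrete, here is a minimal Python sketch of what reverse transcription does at the sequence level, assuming an idealised full-length copy (the function names are hypothetical). The first-strand cDNA is the reverse complement of the mRNA written with T in place of U; the second strand matches the mRNA sequence itself.

```python
# Maps each RNA base to the DNA base that pairs with it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return first-strand cDNA (5'->3') for an mRNA given 5'->3'.

    Reverse transcriptase builds DNA complementary to the template, so the
    product is the reverse complement of the mRNA, in DNA letters.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def cdna_sense_strand(mrna: str) -> str:
    """Return the cDNA strand with the same sequence as the mRNA (U -> T)."""
    return mrna.upper().replace("U", "T")
```

For example, `first_strand_cdna("AUGGCC")` gives `"GGCCAT"`, and `cdna_sense_strand("AUGGCC")` gives `"ATGGCC"`, the form that is aligned back to the genome as evidence.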
Ensembl gene catalogue
Supporting evidence for each exon:
- protein alignment
- DNA alignment
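As a simplified sketch of attaching alignments to exons as supporting evidence, the hypothetical helper below flags an exon as supported when any protein or DNA alignment overlaps it by at least one base. That overlap rule is an assumption for illustration only; Ensembl's actual gene build pipeline is considerably more involved.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Closed genomic interval [start, end] on one chromosome (1-based)."""
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def supported_exons(exons: List[Interval],
                    protein_hits: List[Interval],
                    dna_hits: List[Interval]) -> List[dict]:
    """Report, per exon, whether any protein or DNA alignment overlaps it."""
    return [
        {
            "exon": exon,
            "protein": any(overlaps(exon, hit) for hit in protein_hits),
            "dna": any(overlaps(exon, hit) for hit in dna_hits),
        }
        for exon in exons
    ]
```

An exon at 500-650 with only a DNA hit at 600-700 would be reported as DNA-supported but not protein-supported.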
Gene catalogues as information hubs
- Ensembl, UCSC and NCBI
- entry points for interrogating genome data
Reference Sequences
Goal: one sequence entry for each naturally occurring DNA, RNA and protein molecule.
Accession formats (key: curated vs. calculated):
- NC_000000: chromosome (curated)
- NG_000000: gene (curated)
- NM_000000: mRNA (curated)
- NP_000000: protein (curated)
- NT_000000: contig (calculated)
- XM_000000: predicted mRNA (calculated)
- XP_000000: predicted protein (calculated)
Multiple products of one gene are instantiated as separate RefSeqs with the same GeneID.
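The accession prefix alone tells you what kind of record you are looking at. A minimal sketch covering only the seven prefixes listed on this slide (RefSeq defines others); the curated/calculated labels follow our reading of the slide's colour key, and `classify_refseq` is a hypothetical helper.

```python
# Prefix -> (molecule type, how the record is produced), as listed on this slide.
REFSEQ_PREFIXES = {
    "NC": ("chromosome", "curated"),
    "NG": ("gene", "curated"),
    "NM": ("mRNA", "curated"),
    "NP": ("protein", "curated"),
    "NT": ("contig", "calculated"),
    "XM": ("predicted mRNA", "calculated"),
    "XP": ("predicted protein", "calculated"),
}


def classify_refseq(accession: str) -> tuple:
    """Return (molecule, status) for an accession such as 'NM_000000' or 'XP_000000.1'."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"unrecognised RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]
```

Because all products of one gene share a GeneID, grouping accessions by GeneID and then classifying each one separates curated transcripts from predicted models.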
General gene classification
- Known genes, as catalogued by the Reference Sequence project
  - Ensembl known genes (red)
  - NCBI known genes
- Novel genes (1): based on similarity to known genes or cDNAs; these need not have 100% matching supporting evidence
  - Ensembl novel genes (black)
  - NCBI Loc (locus) genes
General gene classification (continued)
- Novel genes (2): based on the presence of ESTs, which are also a resource for studying alternative splicing
  - EST genes in Ensembl (purple)
  - Database of Transcribed Sequences (DoTS), www.allgenes.org
  - Acembly (assembly)
- Gene prediction: genes predicted with NO prior expressed sequence as evidence
  - single organism: Genscan
  - comparative information (commonalities and differences): Twinscan
- Pseudogenes: match a known gene but have a disrupted ORF
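The pseudogene criterion (a match to a known gene, but with a disrupted ORF) can be illustrated with a small check on the coding sequence. This sketch assumes the candidate CDS has already been extracted from the genomic match in the sense orientation; it tests only for a premature in-frame stop or a frameshifted length, and `orf_is_disrupted` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_is_disrupted(cds: str) -> bool:
    """Return True if the sequence does not read as one intact ORF.

    An intact ORF here starts with ATG, has a length divisible by 3, and has
    its only stop codon as the final codon. A premature in-frame stop or a
    length that is not a multiple of 3 (frameshift) counts as a disruption.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return True
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Any stop before the last codon is a premature stop.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return True
    # The last codon must be the stop that terminates the ORF.
    return codons[-1] not in STOP_CODONS
```

For instance, `"ATGGCCTAA"` reads as intact, whereas `"ATGTAAGCC"` (premature stop) and `"ATGGCCTA"` (frameshift) are flagged.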
Ensembl entry points
Data Retrieval
The perfect model?
Mouse genes and MGI
http://www.informatics.jax.org/
And the list goes on.
Comparative Genome Analysis
- Functional units may be conserved at the sequence level
- Evolutionarily conserved regions (ECRs) therefore point to candidate functional units
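A common way to turn "conserved at the sequence level" into concrete ECR calls is a sliding-window identity scan over a pairwise alignment. The sketch below assumes the alignment already exists as two equal-length gapped strings, and uses a 100 bp window with a 70% identity cut-off; those thresholds are assumptions (similar values are widely used in conservation plots), not values given on this slide, and `find_ecrs` is a hypothetical name.

```python
from typing import List, Tuple


def find_ecrs(ref_aln: str, other_aln: str,
              window: int = 100,
              min_identity: float = 70.0) -> List[Tuple[int, int]]:
    """Return merged (start, end) ranges of windows meeting min_identity.

    Coordinates are 0-based, end-exclusive alignment columns, not positions
    in either ungapped genome sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same real base; gaps and masked N never count.
    match = [1 if r == o and r not in "-N" else 0
             for r, o in zip(ref_aln.upper(), other_aln.upper())]
    ecrs: List[Tuple[int, int]] = []
    if len(match) < window:
        return ecrs
    identical = sum(match[:window])  # identical columns in the first window
    for start in range(len(match) - window + 1):
        if start > 0:
            # Slide by one column: drop the column that left, add the one that entered.
            identical += match[start + window - 1] - match[start - 1]
        if 100.0 * identical / window >= min_identity:
            end = start + window
            if ecrs and start <= ecrs[-1][1]:
                ecrs[-1] = (ecrs[-1][0], end)  # merge with the overlapping previous hit
            else:
                ecrs.append((start, end))
    return ecrs
```

The returned ranges are alignment columns; mapping them back to genome coordinates requires tracking gaps, which this sketch leaves out.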
Why Comparative Genome Analysis?
- it gives us a greater understanding of vertebrate evolution
- it tells us what is common and what is unique between different species at the genome level
- the function of human genes and other regions may be revealed by studying their counterparts in lower organisms
- it helps identify both coding and non-coding genes, and regulatory elements
- genome browsers have already done some of this work; we can do more
Compare genome sequences: MultiContigView in Ensembl
Compare genome sequences: UCSC
Identify Evolutionarily Conserved Regions yourself
To identify ECRs, you must:
- identify (and, if necessary, extract) the relevant genome sequences
- annotate the genome sequences
- mask out repetitive sequences
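For the masking step, genome sequence downloaded from UCSC is soft-masked: repeats appear in lowercase. Assuming such input, here is a minimal sketch (helper names are hypothetical) that hard-masks repeats to N, a form some alignment tools expect, and reports how much of the sequence ended up masked.

```python
def hard_mask(soft_masked: str, mask_char: str = "N") -> str:
    """Replace lowercase (soft-masked repeat) bases with mask_char."""
    return "".join(mask_char if base.islower() else base for base in soft_masked)


def masked_fraction(seq: str, mask_char: str = "N") -> float:
    """Fraction of positions equal to mask_char (0.0 for an empty sequence).

    Note that this also counts any N gaps already present in the assembly.
    """
    if not seq:
        return 0.0
    return seq.count(mask_char) / len(seq)
```

For example, `hard_mask("ACGTacgtAC")` returns `"ACGTNNNNAC"`, of which 40% is masked.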
Programmes for identifying ECRs
- PipMaker: http://bio.cse.psu.edu/cgi-bin/pipmaker
  - requires repeat-masked sequences and annotation files
- VISTA: http://www-gsd.lbl.gov/vista
  - requires annotation files; repeat-masks the sequences for you
- zPicture: http://www.dcode.org
PIP: http://bio.cse.psu.edu/cgi-bin/pipmaker/
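A PIP (percent identity plot), as produced by PipMaker, places each gap-free segment of the alignment at its position in the first sequence and plots that segment's percent identity. The sketch below computes only those points from an existing gapped alignment; it does not perform the alignment or the plotting, and `gap_free_segments` is a hypothetical name.

```python
from typing import List, Tuple


def gap_free_segments(ref_aln: str, other_aln: str) -> List[Tuple[int, int, float]]:
    """Split a pairwise alignment into gap-free segments.

    Returns (ref_start, ref_end, percent_identity) per segment, using 0-based,
    end-exclusive coordinates in the ungapped reference sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    segments: List[Tuple[int, int, float]] = []
    ref_pos = 0       # current position in the ungapped reference
    seg_start = None  # reference position where the open segment began
    matches = 0
    for r, o in zip(ref_aln.upper(), other_aln.upper()):
        if r == "-" or o == "-":
            # A gap in either sequence closes the open segment.
            if seg_start is not None:
                segments.append((seg_start, ref_pos, 100.0 * matches / (ref_pos - seg_start)))
                seg_start = None
            if r != "-":
                ref_pos += 1  # a reference base aligned to a gap still advances
            continue
        if seg_start is None:
            seg_start, matches = ref_pos, 0
        if r == o:
            matches += 1
        ref_pos += 1
    if seg_start is not None:
        segments.append((seg_start, ref_pos, 100.0 * matches / (ref_pos - seg_start)))
    return segments
```

Each tuple corresponds to one horizontal bar on a PIP: its extent along the reference and its height on the identity axis.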
VISTA: http://www-gsd.lbl.gov/vista/
zPicture
- uploads sequence and annotation files directly from UCSC
- allows regulatory elements to be added to the alignment via a link to rVista
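rVista's underlying idea is to keep only those predicted transcription-factor binding sites that fall in conserved sequence. The sketch below shows just that filtering step, assuming site predictions and ECR coordinates are already in the same 0-based, end-exclusive coordinate system; it does not reproduce rVista's own matrix search or cross-species checks, and the factor names in the test are placeholders.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """A predicted binding site: factor name plus 0-based, end-exclusive span."""
    factor: str
    start: int
    end: int


def sites_in_ecrs(sites: List[Site], ecrs: List[Tuple[int, int]]) -> List[Site]:
    """Keep only sites lying completely inside at least one conserved region."""
    return [
        site for site in sites
        if any(start <= site.start and site.end <= end for start, end in ecrs)
    ]
```

Requiring full containment is a deliberate, conservative choice here; a site that straddles an ECR boundary is discarded.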
Visualisation of ECRs