The Genome Access Course: Genome Analysis

Slides: 41
Provided by: james858


1
The Genome Access Course: Genome Analysis
From a 13th century French Bible
2
  1. Genome Sequencing and Assembly
  2. Genome Analysis
  3. Genomes on Display

3
Sequenced genomes (organism, year, genome size):
  • Poliovirus, 1981, 7,433 nt: first eukaryotic virus
  • Haemophilus influenzae, 1995, 1.83 Mbp: first cellular organism
  • Saccharomyces cerevisiae, 1996, 13 Mbp: first eukaryote
  • Escherichia coli, 1997, 4.6 Mbp
  • Caenorhabditis elegans, 1998, 97 Mbp: first multicellular organism
  • Drosophila melanogaster, 2000, 137 Mbp
  • Arabidopsis thaliana, 2000, 125 Mbp: first plant
  • Homo sapiens, 2001, 3.2 Gbp: first vertebrate
  • Oryza sativa, 2002, 430 Mbp

4
Hierarchical vs. Whole Genome Shotgun
5
Hierarchical Shotgun Sequencing
6
Sequencing Software Examples
  • Phred base-calling
  • Phrap assembler
  • Cross_Match
  • Consed graphical editor
  • AutoFinish finishing
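Phred assigns each base call a quality value Q, defined from the estimated probability P that the call is wrong as Q = -10 log10(P); Q20 therefore means a 1-in-100 chance of error and Q30 a 1-in-1,000 chance. A minimal sketch of that conversion follows; `expected_errors` is a hypothetical helper (not part of Phred) that sums the per-base error probabilities of a read.

```python
import math
from typing import Iterable


def phred_quality(error_prob: float) -> float:
    """Convert an error probability P into a Phred quality Q = -10 * log10(P)."""
    if not 0 < error_prob <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(error_prob)


def error_probability(quality: float) -> float:
    """Invert the Phred formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities: Iterable[float]) -> float:
    """Hypothetical helper: expected number of miscalled bases in one read."""
    return sum(error_probability(q) for q in qualities)


if __name__ == "__main__":
    print(round(phred_quality(0.001)))              # 30
    print(round(expected_errors([10, 20, 30]), 3))  # 0.111
```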

7
Sequence Trace
8
Phrap Aligns Reads
AGCTNGTATCGTAGCTNGATCGCAA
GTAGCTAGATCGCTATACGTACNACGT
GATCGCTATACGTACCACGT
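The three reads above overlap, and an ambiguous call (N) in one read is often covered by a confident base in another. The sketch below is not Phrap's algorithm (Phrap uses Phred quality values to build the consensus); it is a simplified, assumed stand-in that finds the longest gap-free suffix/prefix overlap while treating N as a wildcard and tolerating a chosen number of mismatches, then fills N calls from the other read. The function names are hypothetical.

```python
def bases_agree(x: str, y: str) -> bool:
    """Two calls agree if they are identical or either one is the ambiguous call N."""
    return x == y or x == "N" or y == "N"


def best_overlap(left: str, right: str, min_len: int = 5, max_mismatches: int = 0) -> int:
    """Length of the longest suffix of `left` that aligns (gap-free) to a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        mismatches = sum(not bases_agree(x, y) for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return length
    return 0


def merge(left: str, right: str, overlap: int) -> str:
    """Join two reads. Inside the overlap an N in `left` takes the base from `right`;
    on a true disagreement this sketch simply keeps `left`'s base."""
    if overlap == 0:
        return left + right
    consensus = "".join(
        y if x == "N" else x for x, y in zip(left[-overlap:], right[:overlap])
    )
    return left[:-overlap] + consensus + right[overlap:]


R1 = "AGCTNGTATCGTAGCTNGATCGCAA"
R2 = "GTAGCTAGATCGCTATACGTACNACGT"
R3 = "GATCGCTATACGTACCACGT"

if __name__ == "__main__":
    k = best_overlap(R2, R3)
    print(k, merge(R2, R3, k))
```

R2 and R3 overlap by 20 bases, and R3 resolves R2's N as a C. The R1/R2 overlap also contains a genuine A/T disagreement, so it is only found with `max_mismatches=1`; deciding which base is right in such a column is exactly where base-quality information matters.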
9
(No Transcript)
10
Finishing A BAC
[Figure: reads aligned across a BAC, marking regions with multiple-clone coverage of both strands, areas of single-clone coverage, areas of single-strand coverage, and gaps]
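Finishing targets the weak regions shown in the figure. A sketch that labels each BAC position from placed reads, assuming each read is known by its start, end, strand and source subclone; the `PlacedRead` type, the label names and the "finished" criterion (both strands and at least two clones) are illustrative assumptions, not a published finishing standard.

```python
from typing import List, NamedTuple, Tuple


class PlacedRead(NamedTuple):
    start: int   # first covered BAC position (0-based)
    end: int     # one past the last covered position
    strand: str  # "+" or "-"
    clone: str   # subclone the read came from


def classify_positions(reads: List[PlacedRead], length: int) -> List[str]:
    """Label every position as gap, single-strand, single-clone or finished."""
    labels = []
    for pos in range(length):
        covering = [r for r in reads if r.start <= pos < r.end]
        strands = {r.strand for r in covering}
        clones = {r.clone for r in covering}
        if not covering:
            labels.append("gap")
        elif len(strands) < 2:
            labels.append("single-strand")
        elif len(clones) < 2:
            labels.append("single-clone")
        else:
            labels.append("finished")
    return labels


def to_regions(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse per-position labels into (start, end, label) runs."""
    regions = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            regions.append((start, i, labels[start]))
            start = i
    return regions
```

Every region not labelled "finished" is a candidate for extra reads or directed sequencing.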
11
(No Transcript)
12
Constructing Maps Using Fingerprints
  • A restriction digest (HindIII) of each BAC generates a series of fragments
  • Gel electrophoresis determines the fragment sizes
  • Comparing fingerprints (shared fragment sizes) determines which clones overlap
  • STS markers localize the fingerprinted clone contigs
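The fingerprinting steps above can be sketched in code: cut a sequence at every HindIII site (A^AGCTT) to get fragment sizes, then count the fragment sizes two clones share within a sizing tolerance. This is a simplification under stated assumptions: FPC scores overlaps with a probability (the Sulston score) rather than the raw shared-band count used here, and `digest`/`shared_bands` are hypothetical names.

```python
import re
from typing import List

HINDIII_SITE = "AAGCTT"  # HindIII recognizes A^AGCTT
HINDIII_CUT = 1          # the cut falls after the first base of the site


def digest(seq: str, site: str = HINDIII_SITE, cut: int = HINDIII_CUT) -> List[int]:
    """Return fragment lengths, in order along the sequence, after cutting at every site."""
    seq = seq.upper()
    # The lookahead finds every site position without consuming characters.
    cuts = [m.start() + cut for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def shared_bands(a: List[int], b: List[int], tolerance: int = 0) -> int:
    """Count fragments of clone `a` that pair with a distinct fragment of clone `b`."""
    remaining = sorted(b)
    shared = 0
    for size in sorted(a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band can be matched only once
                break
    return shared
```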

13
BAC Fingerprints
14
Contig Assembly
  1. Use FPC map
  2. Find (potential) overlapping sequences
  3. Order the fragments
  4. Generate the sequence
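Steps 2 to 4 can be illustrated with a greedy overlap assembler. This is a deliberately simplified stand-in: it ignores the FPC map (which in practice restricts which clones are compared), requires exact overlaps, and repeatedly merges the pair with the longest suffix/prefix overlap until none is left. The function names are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair repeatedly; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
```

Greedy merging orders the fragments implicitly (step 3) and emits the sequence (step 4); repeats longer than the reads can fool it, which is one reason the map-based ordering matters.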

15
Directed Sequencing
  • Sequence walking
  • Use a primer near the end of the contig
  • Extend the sequence
  • Repeat if the gap is not covered
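Choosing the walking primer can be sketched as a scan over the end of the contig for a window with acceptable GC content and melting temperature. Assumptions: the melting temperature uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, a rough rule of thumb usually quoted for short oligonucleotides; the default length, window and ranges are illustrative; `pick_walking_primer` is a hypothetical name.

```python
from typing import Optional, Tuple


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def pick_walking_primer(
    contig: str,
    length: int = 20,
    window: int = 100,
    tm_range: Tuple[int, int] = (55, 65),
    gc_range: Tuple[float, float] = (0.4, 0.6),
) -> Optional[Tuple[int, str]]:
    """Scan the last `window` bases, nearest the contig end first, for a forward
    primer that would extend the sequence past the end. Returns (start, primer)."""
    contig = contig.upper()
    lowest_start = max(0, len(contig) - window)
    for start in range(len(contig) - length, lowest_start - 1, -1):
        primer = contig[start:start + length]
        if "N" in primer:
            continue  # never prime on ambiguous calls
        if (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
                and gc_range[0] <= gc_fraction(primer) <= gc_range[1]):
            return start, primer
    return None
```

In practice primers are usually placed somewhat back from the contig end, because the first bases of a new read tend to be low quality; `window` (and a minimum distance from the end) is where that would be tuned.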

16
Genome Analysis
  • Whole genome analysis
      • Gene count
      • Gene classification
      • Repeat content
      • Chromosomal duplications
  • Multi-Genome Analysis
      • Synteny
      • Sequence similarity
      • Gene classification comparisons

17
Gene Count: How do we find genes in genomic sequences?
Map cDNA sequences to a genome:
  • Sim4 (http://pbil.univ-lyon1.fr/sim4.html)
  • EST2Genome (http://bioweb.pasteur.fr/seqanal/interfaces/est2genome.html)
  • Genomewise
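Mapping a cDNA to a genome means splitting it into exon blocks separated by introns. The sketch below is much simpler than what Sim4 or EST2Genome actually do: it greedily takes the longest exact cDNA prefix that still occurs downstream in the genome, records it as an exon, and repeats; a second helper checks whether each intron has the canonical GT...AG ends. Both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # half-open genomic interval


def map_cdna(cdna: str, genome: str, min_exon: int = 5) -> Optional[List[Exon]]:
    """Greedy exact-match chaining of cDNA blocks onto the genome, left to right."""
    exons: List[Exon] = []
    pos_c = pos_g = 0
    while pos_c < len(cdna):
        # Grow the block while the longer cDNA prefix still occurs downstream.
        k = 0
        while pos_c + k < len(cdna) and cdna[pos_c:pos_c + k + 1] in genome[pos_g:]:
            k += 1
        if k < min_exon:
            return None  # the remaining cDNA cannot be placed
        start = genome.find(cdna[pos_c:pos_c + k], pos_g)
        exons.append((start, start + k))
        pos_c += k
        pos_g = start + k
    return exons


def canonical_introns(exons: List[Exon], genome: str) -> List[bool]:
    """For each intron between consecutive exons, check for GT at the start and AG at the end."""
    return [
        genome[a_end:a_end + 2] == "GT" and genome[b_start - 2:b_start] == "AG"
        for (_, a_end), (b_start, _) in zip(exons, exons[1:])
    ]
```

Because it extends greedily, this sketch can shift an exon boundary by a few bases when an intron starts with the same bases as the next exon; real spliced aligners refine boundaries using splice-site signals.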
18
Finding Genes Cont.
Gene predictions:
  • Fgenesh (http://www.softberry.com)
  • GenemarkHMM (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
  • Genscan (http://genes.mit.edu/GENSCAN.html)
  • Grail (http://compbio.ornl.gov/Grail-1.3/)
  • Glimmer (http://www.tigr.org/softlab/glimmer/glimmer.html)
Homology:
  • blastx
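The predictors listed above rely on statistical models of gene structure (for example hidden Markov models). The simplest baseline, shown here as a swapped-in technique rather than how any of these tools work, is an open reading frame (ORF) scan: in each forward reading frame, report stretches from an ATG to the next in-frame stop codon. The name `find_orfs` and the `min_codons` cutoff are assumptions; a full scan would also search the reverse complement.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, end exclusive and including the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```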
19
Gene Prediction Types
  • Known: cDNA evidence/homology
  • Putative: gene prediction which has homology to a known gene
  • Unknown: EST matching a gene prediction
  • Hypothetical: gene prediction(s) only
20
Gene Classification
  • 1) Automated
      • Similarity search against an annotated database:
          Swiss-Prot, nr (NCBI non-redundant protein database)
      • Protein domain search:
          i. Pfam (http://www.sanger.ac.uk/Software/Pfam/)
          ii. PROSITE
          iii. PRINTS
          iv. ProDom
          v. InterPro (http://www.ebi.ac.uk/interpro/scan.html)
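Automated classification by similarity search often reduces to "take the best database hit that passes a significance cutoff and transfer its annotation". A sketch, assuming the search was run with BLAST tabular output (`-outfmt 6`), whose 12 standard columns put the e-value in column 11 and the bit score in column 12; the 1e-5 cutoff, the function names and the subject IDs in the test are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple


def best_hits(blast_tab: str, max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject that passes the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_annotation(hits: Dict[str, str], descriptions: Dict[str, str]) -> Dict[str, str]:
    """Give each query the description of its best hit, or 'unclassified'."""
    return {q: descriptions.get(s, "unclassified") for q, s in hits.items()}
```

Curated classification would then check these transferred labels against the literature.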

21
Gene Classification Cont.
  • 2) Curated
      • Similar to the automated approach, but people usually verify the results through literature searches

22
Looking for Repeats
  • RepeatMasker can find and mask repeats in DNA sequence
  • It can be run on cerebus or at http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker
  • RepeatMasker is often run on genomic sequences before gene prediction, so that repeat-derived sequence does not produce spurious gene predictions
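Masking replaces repeat bases with N so that later steps ignore them. RepeatMasker mainly compares the sequence against a library of known interspersed repeats, which this sketch does not attempt; it only illustrates the masking idea on simple tandem repeats (microsatellites), using a regular-expression backreference to find at least `min_copies` adjacent copies of a 1 to `max_unit` bp unit. The function name and defaults are assumptions.

```python
import re


def mask_tandem_repeats(seq: str, max_unit: int = 6, min_copies: int = 5, mask_char: str = "N") -> str:
    """Replace runs of >= min_copies tandem copies of a 1..max_unit bp unit with mask_char.
    The input is upper-cased first, so the output is upper case."""
    seq = seq.upper()
    masked = list(seq)
    for unit in range(1, max_unit + 1):
        # ([ACGT]{unit}) captures one copy; \1{n,} requires n more identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            for i in range(m.start(), m.end()):
                masked[i] = mask_char
    return "".join(masked)
```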

23
Comparative Genome Analysis
  • MUMmer
  • PipMaker
  • Vista

24
MUMmer
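  • Whole-genome alignments
  • Compares closely related sequences
  • Finds maximal unique matches (MUMs): exact matches that occur exactly once in each sequence and cannot be extended in either direction
  • Example (MUMs in uppercase):
    agctcgatGGGCTTTAGACTCTCGATAggcgcagagGCTCGCTAGAATCGCTAGATCac
    agacctaaGGGCTTTAGACTCTCGATAagtctatccGCTCGCTAGAATCGCTAGATCta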
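The MUM definition can be checked directly with a naive search: start only at positions that cannot be extended to the left, extend to the right as far as the bases agree, and keep matches of at least `min_len` that occur exactly once in each sequence. MUMmer itself uses a suffix tree to do this in linear time; this quadratic sketch, with hypothetical function names, is only meant to make the definition concrete.

```python
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 15) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for every maximal unique match."""
    a, b = a.upper(), b.upper()
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and positions that are not left-maximal.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1  # extend right until a mismatch or a sequence end
            match = a[i:i + length]
            if (length >= min_len
                    and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, length))
    return mums
```

On the two example sequences above this returns the two uppercase segments.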

25
(No Transcript)
26
Segmentally duplicated regions in the Arabidopsis
genome, detected using MUMmer
Individual chromosomes are depicted as horizontal
grey bars (with chromosome 1 at the top),
centromeres are marked black. Coloured bands
connect corresponding duplicated segments.
Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815 (2000).
27
Aligning Human Chromosomes using MUMmer
  • Standard MUMmer is not sensitive enough to align whole chromosomes to one another, because it was designed to align closely similar sequences
  • Modification:
      • Concatenate all proteins in the order their genes occur on each chromosome (on either strand)
      • Align the concatenated strings with MUMmer
      • Cluster the resulting matches to extract all sets of three or more that occur in close proximity on each chromosome; these are potential duplications

Science 291, February 2001.
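The clustering step can be sketched as follows, assuming each match is recorded as a pair of protein indices (position along chromosome A, position along chromosome B). Matches are visited in order along A and attached to an existing cluster when they fall within `max_gap` proteins of that cluster's last match in both coordinates; clusters of at least three are reported. The gap rule and parameter values are illustrative assumptions, not the exact procedure used in the cited study.

```python
from typing import List, Tuple

Match = Tuple[int, int]  # (protein index on chromosome A, protein index on chromosome B)


def cluster_matches(matches: List[Match], max_gap: int = 5, min_size: int = 3) -> List[List[Match]]:
    """Group nearby matches; keep groups of at least min_size as candidate duplications."""
    clusters: List[List[Match]] = []
    for a, b in sorted(matches):
        for cluster in clusters:
            last_a, last_b = cluster[-1]
            # abs() on the B side lets a cluster run in either direction,
            # so inverted duplications are kept too.
            if a - last_a <= max_gap and abs(b - last_b) <= max_gap:
                cluster.append((a, b))
                break
        else:
            clusters.append([(a, b)])
    return [c for c in clusters if len(c) >= min_size]
```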
28
(No Transcript)
29
(No Transcript)
30
PipMaker
  • PIP stands for Percent Identity Plot
  • Graphical view of similarity between two or more sequences
  • http://bio.cse.psu.edu/pipmaker/
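A percent identity plot draws each gap-free segment of a pairwise alignment as a horizontal line: its position along the first sequence on the x-axis, its percent identity on the y-axis. A sketch of the data behind such a plot, assuming the alignment is given as two equal-length strings with "-" for gaps (PipMaker computes its own alignments; `pip_segments` is a hypothetical name):

```python
from typing import List, Tuple

Segment = Tuple[int, int, float]  # (start, end) in sequence 1, percent identity


def pip_segments(aln1: str, aln2: str, min_len: int = 1) -> List[Segment]:
    """Split an alignment into gap-free blocks and report each block's percent identity."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned strings must have equal length")
    segments: List[Segment] = []
    pos1 = 0            # current position in ungapped sequence 1
    block_start = None  # start of the current gap-free block, if any
    length = matches = 0

    def close_block() -> None:
        if block_start is not None and length >= min_len:
            segments.append((block_start, block_start + length, 100.0 * matches / length))

    for c1, c2 in zip(aln1, aln2):
        if c1 != "-" and c2 != "-":
            if block_start is None:
                block_start, length, matches = pos1, 0, 0
            length += 1
            matches += c1 == c2
        else:
            close_block()  # a gap in either sequence ends the block
            block_start = None
        if c1 != "-":
            pos1 += 1
    close_block()
    return segments
```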

31
[Figure panels: Alignment, PIP Plot, Dot Plot]
32
(No Transcript)
33
[Figure: PIP of Fugu PTEN compared with H. sapiens, M. musculus, X. laevis, D. melanogaster, C. briggsae, C. elegans, A. thaliana (2 and 3), L. major and S. pombe; panels A and B; features labelled 1 to 9; percent identity axis 50-100; scale marks at 2, 4 and 6 kb]
34
Vista
  • Similar to PipMaker
  • http://www-gsd.lbl.gov/vista/

35
(No Transcript)
36
Genomes on Display
  • UCSC Browser
  • Ensembl browser
  • NCBI Browser
  • GMOD

37
UCSC Browser
  • http://genome.ucsc.edu

38
Ensembl Browser
  • http://www.ensembl.org/

39
NCBI Browser
  • http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/hum_srch?chrhum_chr.infquery

40
GMOD
  • Generic Model Organism Database
  • An effort to build a common set of database and genome-browser tools for many species
  • www.gmod.org