Dr. Wishart - PowerPoint PPT Presentation

Slides: 39
Provided by: Comp632

Transcript and Presenter's Notes

Title: Dr. Wishart


1
Dr. Wishart
  • Office hours
  • Generally available right after class
  • Best time to catch me is after 4:00 pm in my
    office in Athabasca Hall
  • Usually around until 6:00 pm
  • Arranging an appointment is always best (email)
  • Email responses take 1-2 days

2
Where to get your notes
  • http://redpoll.pharmacy.ualberta.ca/
  • Look under Courses

3
General Course Outline
  • Bioinformatics introduction
  • Bioinformatics Databases
  • Sequence Alignment
  • Protein Feature ID
  • Computational microbiology
  • Peptide/Protein Analysis
  • Protein Structure (Xray)

4
General Course Outline
  • Protein Structure (NMR)
  • Mass Spectrometry 1
  • Mass Spectrometry 2
  • Proteomics
  • Systems Biology
  • Enzymology/Systems Biology
  • Protein Structure (Xray)

5
Introduction to Bioinformatics
  • Microbiology 343
  • David Wishart Rm. 3-41 Ath
  • david.wishart_at_ualberta.ca

6
Objectives Outline
  • Definitions and roles of bioinformatics
  • DNA sequencing (foundation to bioinformatics)
  • Genomes and Genomics
  • Gene finding in prokaryotes
  • From genes to proteins

7
Bioinformatics
Definition - A field of information technology which endeavours
to improve the storage, management and analysis of biological,
medical and pharmaceutical data. A blend of information
technology and biotechnology.
8
Bioinformatics - Converting Data to Knowledge
Data
Knowledge
9
Bioinformatics Software

10
What's the Appeal?
  • No spills or smells
  • No need for a lab coat
  • No need to get hands or clothes messy
  • Provides a faster or alternative route to create
    hypotheses, perform difficult experiments, avoid
    unnecessary experiments, compare, visualize and
    analyze data, make predictions, see what is
    unseeable and handle a growing tidal wave of
    both data and knowledge

11
Bioinformatics
Genomics
Proteomics
Bioinformatics
12
High Throughput DNA Sequencing
13
Shotgun Sequencing
Isolate Chromosome
Shear DNA into Fragments
Clone into Seq. Vectors
Sequence
14
Principles of DNA Sequencing
[Figure: DNA fragment cloned into a pBR322 vector (labels: Primer, DNA fragment, Amp, Tet, Ori)]
Denature with heat to produce ssDNA
Klenow + ddNTP + dNTP + primers
15
The Secret to Sanger Sequencing
16
Principles of DNA Sequencing
3' Template
G C A T G C
5'
5' Primer
ddG
GddC
GCddA
GCAddT
GCATddG
GCATGddC
17
Principles of DNA Sequencing
[Figure: sequencing gel with lanes G, T, C, A; reading the bands from short to long fragments gives G C A T G C]
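
The chain-termination idea on the last two slides can be mimicked in a few lines. This is only a toy sketch: it assumes exactly one fragment per position and no read errors, and the function names are made up for illustration.

```python
import random

def chain_terminated_fragments(new_strand):
    """One fragment per position: the new strand up to and including a ddNTP."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_from_gel(fragments):
    """Order fragments from short to long and read each terminal (dideoxy) base."""
    return "".join(f[-1] for f in sorted(fragments, key=len))

fragments = chain_terminated_fragments("GCATGC")
random.shuffle(fragments)            # fragments come out of the reaction unordered
print(read_from_gel(fragments))      # prints GCATGC
```

Sorting by length plays the role of electrophoresis; the last base of each fragment is the one carrying the label.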
18
Capillary Electrophoresis
Separation by Electro-osmotic Flow
19
Multiplexed CE with Fluorescent detection
ABI 3700
96x700 bases
20
Shotgun Sequencing
Assembled Sequence
Sequence Chromatogram
Send to Computer
21
Shotgun Sequencing
  • Very efficient process for small-scale (10 kb)
    sequencing (preferred method)
  • First applied to whole genome sequencing in 1995
    (H. influenzae)
  • Now standard for all prokaryotic genome
    sequencing projects
  • Successfully applied to D. melanogaster
  • Moderately successful for H. sapiens
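
The "assemble on the computer" step can be illustrated with a toy greedy overlap merge. This is a sketch under strong assumptions (error-free reads, one strand only, no repeats); `overlap` and `greedy_assemble` are hypothetical helper names, not part of any real assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))          # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:                          # nothing overlaps: leave separate contigs
            break
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

reads = ["GTACGTTAGC", "ATGGCGTACG", "TTAGCCTGAAT"]
print(greedy_assemble(reads))   # prints ['ATGGCGTACGTTAGCCTGAAT']
```

Highly repetitive sequence gives many equally good overlaps, which is one reason large repeat-rich genomes are harder to assemble this way.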

22
The Finished Product
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA
GATTACAGATTACAGATTAGAGATTACAGATTACAGATTACAGATTAC
AGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTAC
AGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTAC
AGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTA
CAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTA
CAGATTACAGATTACAGAT
23
Sequenced Genomes
http://www.genomenewsnetwork.org/
24
Genomes to Date
  • 8 vertebrates (human, mouse, rat, fugu, dog,
    chimp)
  • 3 plants (Arabidopsis, rice, poplar)
  • 2 insects (fruit fly, mosquito)
  • 2 nematodes (C. elegans, C. briggsae)
  • 1 sea squirt
  • 4 parasites (Plasmodium, Guillardia)
  • 4 fungi (S. cerevisiae, S. pombe)
  • 200 bacteria and archaebacteria
  • 2000 viruses

25
Prokaryotes
  • Simple gene structure
  • Small genomes (0.5 to 10 million bp)
  • No introns (uninterrupted)
  • Genes are called Open Reading Frames or ORFs
    (include start and stop codons)
  • High coding density (>90%)
  • Some genes overlap (nested)
  • Some genes are quite short (<60 bp)

26
Prokaryotic Gene Structure
ORF (open reading frame)
TATA box
Stop codon
Start codon
ATGACAGATTACAGATTACAGATTACAGGATAG
Frame 1
Frame 2
Frame 3
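
To make the three forward reading frames concrete, here is a small sketch that splits the slide's example sequence into codons at offsets 0, 1 and 2. The helper name `codons_in_frame` is an assumption for illustration.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete codons starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

seq = "ATGACAGATTACAGATTACAGATTACAGGATAG"
for frame in range(3):
    print("Frame", frame + 1, " ".join(codons_in_frame(seq, frame)))
```

Frame 1 runs from the ATG start codon to the TAG stop codon, whereas Frame 2 begins with the stop codon TGA, so only Frame 1 contains the ORF.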
27
Simple Gene Finding
  • Scan forward strand until a start codon is found
  • Staying in the same frame, scan in groups of three
    until a stop codon is found
  • If the number of codons between start and stop is
    greater than 50, identify as a gene, go back to the
    last start codon and proceed with step 1
  • If the number of codons between start and stop is
    less than 50, go back to the last start codon and go
    to step 1
  • At the end of the chromosome, repeat the process for
    the reverse complement
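
The steps above can be sketched as a short scanner. This is a minimal illustration, not a production gene finder: it assumes upper-case A/C/G/T input, uses ATG as the only start codon, and reports reverse-strand coordinates relative to the reverse complement. The names `find_orfs` and `reverse_complement` are made up for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=50, start_codon="ATG"):
    """Return (strand, start, end) for each ORF with more than min_codons codons."""
    genes = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        start = s.find(start_codon)
        while start != -1:
            # Stay in the frame of this start codon and step in groups of three.
            for i in range(start + 3, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    n_codons = (i - start) // 3      # start codon up to, not including, stop
                    if n_codons > min_codons:
                        genes.append((strand, start, i + 3))
                    break
            # Go back to the last start codon and resume scanning just after it.
            start = s.find(start_codon, start + 1)
    return genes

seq = "ATGACAGATTACAGATTACAGATTACAGGATAG"
print(find_orfs(seq, min_codons=5))   # prints [('+', 0, 33)]
print(find_orfs(seq))                 # prints [] (ORF shorter than the default cutoff)
```

Because scanning resumes right after each start codon, an in-frame ATG inside a long ORF is reported as a second, shorter candidate; a real tool would need a rule for choosing among them.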

28
Advanced Gene Finding
  • Identify all ORFs (open reading frames) > 200
    bases on both strands using normal and alternate
    start/stop codons
  • Find high scoring -10, -35 and RBS sites at 5'
    ends of putative ORFs
  • Find high scoring rho terminators at 3' ends of
    putative ORFs
  • Exclude ORFs without identified signals at 5' or
    3' ends

29
Key Prokaryotic Gene Signals
  • Alternate start codons
  • RNA polymerase promoter site (-10, -35 site or
    TATA box)
  • Shine-Dalgarno sequence (Ribosome binding
    site-RBS)
  • Stem-loop (rho-independent) terminators
  • High GC content (CpG islands)

30
Alternate Start Codons (E. coli)
Class I:   ATG (Met), GTG (Val), TTG (Leu)
Class IIa: CTG (Leu), ATT (Ile), ATA (Ile), ACG (Thr)
31
-10, -35 Site (RNA pol Promoter)
Position  -36 -35 -34 -33 -32 ... -13 -12 -11 -10  -9  -8
Base        T   T   G   A   C ...   T   A   t   A   A   T
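
A simple way to look for such a promoter is to count matches against the two consensus boxes. The sketch below makes assumptions that are not on the slide: it uses the commonly quoted hexamers TTGACA (-35) and TATAAT (-10), allows a 15 to 19 bp spacer between them, and scores by plain identity counts; `best_promoter` is a hypothetical name.

```python
def matches(site, consensus):
    """Number of positions where site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))

def best_promoter(upstream, box35="TTGACA", box10="TATAAT", spacers=range(15, 20)):
    """Return (score, pos35, pos10) for the best -35/-10 pair, or None if too short."""
    best = None
    for i in range(len(upstream) - 11):
        score35 = matches(upstream[i:i + 6], box35)
        for gap in spacers:
            j = i + 6 + gap                  # start of the candidate -10 box
            if j + 6 > len(upstream):
                break
            score = score35 + matches(upstream[j:j + 6], box10)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best

upstream = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGCC"
print(best_promoter(upstream))   # prints (12, 2, 25)
```

A score of 12 means both hexamers match perfectly; real promoters rarely do, so a practical scan would use a cutoff or a position weight matrix instead.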
32
RBS (Shine Dalgarno Seq)
Position  -13 -12 -11 -10  -9  -8 ...  -1   0   1   2   3   4
Base        G   G   G   G   G   G ...   n   A   T   G   n   C
33
Terminator Stem-loops
34
More Sophisticated Methods
RBS site
promoter site
HMM
35
Really Sophisticated Methods
  • GLIMMER
  • http://www.tigr.org/software/glimmer/
  • Uses interpolated Markov models (IMM)
  • Requires training of sample genes
  • Takes about 1 minute/genome
  • GeneMark.hmm
  • http://opal.biology.gatech.edu/GeneMark/gmhmm2_prok.cgi
  • Available as a web server
  • Uses hidden Markov models (HMM)
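
To give a feel for why these tools need training genes, here is a toy fixed-order Markov chain scorer. It is not GLIMMER's interpolated model or GeneMark's HMM: it assumes a single order k, add-one smoothing, and a uniform 0.25 background, and `train_markov` and `log_odds` are hypothetical names.

```python
import math
from collections import defaultdict

def train_markov(seqs, k=2, alphabet="ACGT"):
    """Estimate P(next base | previous k bases) from training genes, with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def prob(context, base):
        seen = counts.get(context, {})
        return (seen.get(base, 0) + 1) / (sum(seen.values()) + len(alphabet))

    return prob

def log_odds(seq, prob, k=2):
    """Log-likelihood ratio of seq under the trained model versus a uniform background."""
    return sum(math.log(prob(seq[i - k:i], seq[i]) / 0.25) for i in range(k, len(seq)))

model = train_markov(["ATGGCGGCGGCGGCGTAA"])
print(log_odds("GCGGCGGCG", model) > 0)    # prints True  (resembles the training gene)
print(log_odds("TTTTATTTT", model) < 0)    # prints True  (does not)
```

Candidate ORFs that score well above zero look like the training genes. An interpolated model goes further by mixing several context lengths depending on how much training data supports each.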

36
Glimmer Performance
37
What Next?
  • Raw DNA sequence -> Gene sequences
  • Gene seqs -> Protein sequences
  • Gene & Protein seqs -> Databases
  • Gene & Protein info -> Databases
  • Most protein and DNA sequence data is entered
    into GenBank through XXX
  • Next Lecture: Databases
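
The gene-to-protein step listed above is a table lookup. The sketch below builds the standard genetic code from its usual compact 64-letter form and translates the example ORF from the gene structure slide; `translate` is an illustrative helper and it ignores alternate start codons and non-standard codes.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(orf):
    """Translate an ORF codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGACAGATTACAGATTACAGATTACAGGATAG"))   # prints MTDYRLQITG
```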

38
Sample Exam Question
  • Describe an algorithm or sketch a flowchart for
    gene finding in prokaryotes
  • What are the key features of a prokaryotic ORF?
  • Following is a gene sequence: identify and label
    all major features