Lecture 6: Gene Prediction - PowerPoint PPT Presentation

Title:

Lecture 6: Gene Prediction

Description:

Regulatory binding sites. Focus for finding coding regions. Splice Junctions. Codon bias ... in these basal trc factors make them unreliable by themselves ... – PowerPoint PPT presentation

Slides: 17
Provided by: MICHELLE6

Transcript and Presenter's Notes


1
Lecture 6: Gene Prediction
  • Chapter 6
  • Eukaryotic gene and intron splicing prediction
  • Genome and gene organization

2
Eukaryote Gene Prediction
  • It's like finding a 2-gram needle in a 6,000-kg
    haystack.

3
Finding Gene-Rich Regions in whole genome
  • Isochores (high density versus low density)
  • CpG islands
  • Codon biased regions
  • Mask the junk (mask the repeats)

4
Isochores
  • Genome organization
  • Nucleotides → genes → isochores → chromosomes
  • Just another level of organization to help
    organize our information!
  • Isochore
  • 1 Mb
  • Homogeneous bp composition (GC content uniform in
    individual isochores)
  • Human example
  • 5 different classes of isochores
  • L1, L2, H1, H2, H3
  • L = low density (lower GC and lower gene content)
  • H = high density (higher GC and higher gene
    content)
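
Since isochores are defined by uniform GC content, one way to look for them is to measure GC content in windows along a sequence. The following is a minimal sketch, not a method from the lecture; the function names, the window size, and the toy sequence are all assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction of each full window, returned as (start, gc) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATTAAT" + "GCGCGGCCGC"
for start, gc in windowed_gc(toy, window=10, step=10):
    print(start, gc)
```

On real data the window would be far larger than 10 bp; the toy values are only there to make the contrast between low-GC and high-GC regions visible.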

Where would you start to sequence a genome?

5
CpG Islands
  • Randomness of most dinucleotide pairs
  • CpG found at 20% of the expected random frequency
    overall
  • Some regions are more dense with CpG than others
  • CpG Islands
  • Higher density of CpG in a given area as compared
    to overall percentage of CpG
  • CpG islands occur at specific sites
  • 45,000 islands in human genome
  • Found typically in housekeeping genes and at many
    regulatory binding sites or promoter regions
  • 1-2 kb at 5' ends of many human genes
  • About 4,500 islands in human genes
  • Never found in introns or in junk DNA or in gene
    free regions
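
The comparison between how often CpG is seen and how often it would be expected by chance can be written as an observed/expected ratio. This is a rough sketch under assumptions: the function name is made up, and the expected count is taken as (number of C × number of G) / length, which is one common convention rather than something stated on the slide.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for one sequence window.

    Expected CpG count is assumed to be (#C * #G) / length.
    Returns 0.0 when the window has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is safe
    return observed * len(seq) / (c_count * g_count)


# Toy windows (assumed data): a CpG-dense stretch and a CpG-free stretch.
print(cpg_obs_exp("CGCGCGCG"))
print(cpg_obs_exp("ATATATAT"))
```

A ratio near 0.2 would match the genome-wide depletion noted above, while a window with a clearly higher ratio is a candidate CpG island.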

6
Codon Bias
  • Every organism seems to prefer to use some
    codons over others within genes
  • Yeast: AGA for arginine 48% of the time
  • Codon bias occurs in exons, but not in introns
  • A loss of codon bias can be used as more evidence
    for the presence of an intron
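
A figure such as "AGA for arginine 48% of the time" comes from counting codons in known coding sequence. The sketch below shows that counting step only; the function names and the toy coding sequence are assumptions, and the six arginine codons are those of the standard genetic code.

```python
from collections import Counter

# The six arginine codons in the standard genetic code (DNA alphabet).
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_fraction(counts: Counter, codon: str, synonyms: tuple) -> float:
    """Share of one codon among all codons for the same amino acid."""
    total = sum(counts[c] for c in synonyms)
    return counts[codon] / total if total else 0.0


# Hypothetical toy coding sequence: ATG AGA AGA CGT AGA TAA
toy_cds = "ATGAGAAGACGTAGATAA"
print(synonymous_fraction(codon_counts(toy_cds), "AGA", ARG_CODONS))
```

Running the same count over a sliding window of an unannotated region, and comparing it with the organism's known preferences, is one way such bias could be turned into evidence for or against coding sequence.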

7
Codon Usage in a Properly Spliced Gene

8
Locating Genes and Coding Regions in Eukaryote
Gene-Rich Regions
  • ORF locating versus Coding Region Location
  • Not easy!
  • Large genomes
  • About 1% of human DNA encodes functional genes.
  • In contrast, 85% of bacterial DNA encodes
    functional genes
  • Genes are interspersed among long stretches of
    non-coding DNA.
  • More than one chromosome and more than one copy
    of a chromosome
  • Repeats, pseudo-genes, and introns confound
    matters
  • Will take at least 15 years to find all genes in
    a given genome once genome is sequenced.

9
Eukaryote Prediction
  • A variety of features must be found and the
    features must be found in specific locations
  • Any one feature (e.g., finding a promoter region)
    cannot be used by itself to predict a functional
    gene
  • Prediction algorithms have an accuracy of about
    50% for eukaryotes
  • Focus for finding genes
  • Promoter elements
  • Regulatory binding sites
  • Focus for finding coding regions
  • Splice Junctions
  • Codon bias
  • All DNA motifs are read in the 5' to 3' direction
    on the coding/sense strand

10
Finding Genes: Promoter Elements
  • Binding of transcription factors to promoter
  • Assist in RNA polymerase II binding
  • TATA box
  • TATAWAW, where W = A or T
  • At -25 from the transcription (trc) start site
  • Inr sequence (initiator sequence)
  • YYCARR, where Y = C or T and R = G or A
  • At +1 (the trc start site), downstream of the TATA box
  • Subtle differences in these basal trc factors
    make them unreliable by themselves
  • Need more evidence
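
The degenerate motifs TATAWAW and YYCARR can be searched for by expanding each ambiguity letter into a character class. The following is a small sketch assuming only the W, Y, R, and N codes are needed; the helper names and the toy promoter are invented for the example.

```python
import re

# IUPAC ambiguity codes used in the motifs above, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]",
}


def find_motif(seq: str, motif: str) -> list[int]:
    """Start positions (0-based) of every match, including overlapping ones."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead matches without consuming, so overlapping hits are reported.
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical toy promoter: a TATA box, a G-only spacer, then an Inr element.
toy = "GGCC" + "TATAAAA" + "G" * 20 + "TCCAGA" + "GG"
print(find_motif(toy, "TATAWAW"))
print(find_motif(toy, "YYCARR"))
```

Because short motifs like these match by chance all over a genome, a hit on its own says little, which is the point the slide makes about needing more evidence.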

11
Finding Genes: Regulatory Protein Binding Sites
  • Found upstream from trc start site
  • Regulatory proteins bind to DNA and promote or
    prevent trc
  • CAAT box found at -80 from the actual gene
  • Enhancers
  • Ex.: GGGCGG site → binding site for Sp1 protein
  • -500 to +500 relative to the trc start site
  • Bend DNA → change DNA orientation to initiate trc
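
A fixed site such as GGGCGG can lie on either strand, so a search usually checks the sequence and the reverse complement of the site. This is an illustrative sketch only; the function names and the toy sequence are assumptions, and positions for minus-strand hits are reported as the leftmost coordinate on the given strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = "GGGCGG") -> list[tuple[int, str]]:
    """(position, strand) for every occurrence of site on either strand."""
    seq = seq.upper()
    hits = []
    for strand, query in (("+", site.upper()), ("-", reverse_complement(site))):
        start = seq.find(query)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(query, start + 1)  # step by 1 to allow overlaps
    return sorted(hits)


# Hypothetical toy sequence with one site on each strand.
toy = "AA" + "GGGCGG" + "AA" + "CCGCCC" + "AA"
print(find_sites(toy))
```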

12
Finding Coding Regions Within Genes: Splice
Junctions
  • Junctions between an intron and its 2 flanking exons
  • GT-AG rule (usually)
  • First 2 nucleotides of the intron are GU in the
    pre-mRNA (GT in the coding DNA)
  • Last 2 nucleotides of the intron are AG
  • Introns must be at least 60 bp, but there are no
    limits on length
  • No limits on distribution
  • Other splice junctions exist and differ for
    different genes
  • Alternative splicing complicates things here.
  • 20% of all human genes!
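
The GT-AG rule and the 60 bp minimum can be combined into a simple filter for candidate introns. The sketch below is a naive enumeration written for illustration, not a real gene finder; the function name and the toy gene are assumptions, and because there is no upper length limit, every GT is paired with every sufficiently distant AG.

```python
def candidate_introns(seq: str, min_len: int = 60) -> list[tuple[int, int]]:
    """All (start, end) spans that begin with GT, end with AG, and are
    at least min_len bases long. end is exclusive, so seq[start:end]
    is the candidate intron.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [
        (d, a + 2)
        for d in donors
        for a in acceptors
        if a + 2 - d >= min_len
    ]


# Hypothetical toy gene: exon, a 64 bp intron (GT ... AG), exon.
toy = "ATGCCC" + "GT" + "C" * 60 + "AG" + "CCCTAA"
for start, end in candidate_introns(toy):
    print(start, end, end - start)
```

On real sequence this produces a very large number of candidates, which is why the dinucleotides are combined with other evidence such as the flanking consensus and codon bias.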

13
Finding Coding Regions Within Genes: Splice
Junction
Exon | Intron: G(100) T(100) A(62) A(68) G(64) T(63) ... 12 x C/T, N, C(68) A(100) G(100) | Exon
5' Splice Junction (5'-GT-3'): start of intron
3' Splice Junction (5'-AG-3'): end of intron
  • The number in parentheses is the percentage of
    times you see that particular base.
  • Dots represent length of intron (can be any
    number of nucleotides greater than about 60).
  • Consensus: GTAAGT...YYYYYYYYYYYYNCAG
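
The percentages at the 5' splice junction can be used to rank candidate donor sites by how closely they follow the consensus. The following is a deliberately crude sketch: the slide gives only the frequency of the consensus base at each position, so the score (an assumption, not a standard measure) is the share of consensus weight that a six-base site matches.

```python
# Consensus base and its percentage at each of the first six intron
# positions, as listed for the 5' splice junction.
DONOR_CONSENSUS = [
    ("G", 100), ("T", 100), ("A", 62), ("A", 68), ("G", 64), ("T", 63),
]


def donor_score(site: str) -> float:
    """Share of total consensus weight matched by a 6-base site (0.0 to 1.0)."""
    if len(site) != len(DONOR_CONSENSUS):
        raise ValueError("site must be exactly 6 bases long")
    matched = sum(
        pct
        for base, (consensus, pct) in zip(site.upper(), DONOR_CONSENSUS)
        if base == consensus
    )
    total = sum(pct for _, pct in DONOR_CONSENSUS)
    return matched / total


# Hypothetical candidate sites: a perfect match and a GT-only match.
for site in ("GTAAGT", "GTCCCC"):
    print(site, round(donor_score(site), 3))
```

A fuller model would need the frequency of every base at every position, which is not recoverable from the slide.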

14
Finding Coding Regions Within Genes: Alternative
Splicing
Mouse Troponin T: cardiac muscle or skeletal
muscle
[Figure: exon/intron diagrams for two transcripts, labeled "Tnni3 Cardiac" and "Tnnt1 Skeletal"]

15
Finding Coding Regions Within Genes: Codon Bias
  • Also used for predicting coding regions

16
Gene Prediction Software: Great for Eukaryotes
  • GENSCAN and HMMgene
  • Used to predict exon locations and repeated
    elements
  • Splices exons together (when more than one is
    present) and translates them so you can do a BLASTP
  • http://genes.mit.edu/GENSCAN.html
  • http://www.cbs.dtu.dk/services/HMMgene/
  • Used primarily for Human/vertebrate genomic
    sequences
  • Not good for invertebrate sequences
  • Practice Together