Gene Structure Prediction

Provided by: isrecI

Transcript and Presenter's Notes



1
Gene Structure Prediction
  • Eukaryotic gene structures
  • Statement of the gene prediction problem
  • Overview of the GENSCAN program
  • Interpolated Markov chains
  • Performance evaluation of gene prediction programs

EPFL Bioinformatics I 12 Dec 2005
3
Variation in human gene structure
4
Determination of gene structure
  • Experimental: isolation and sequencing of RNA sequences.
  • Public data: assembled cDNA sequences and single-read EST
    sequences.
  • Limitations: rare transcripts, tissue and cell-type specificity,
    conditional expression. It may not be possible to extract RNA
    from all cell types, at all developmental stages, and under all
    possible environmental conditions from a complex organism.
  • Computational: ab initio prediction based on sequence features.
  • Hybrid: ab initio prediction with constraints derived from
    partial RNA sequences or matches with known homologous genes
    from other organisms (protein or DNA sequence similarity).
5
Gene prediction: statement of the problem
  • Given a genome sequence:
      • Predict the structure of all transcripts.
      • Predict the structure of the coding part of all
        transcripts.
  • Performance criteria:
      • Number of correctly predicted/missed genes
      • Number of correctly predicted/missed exons
      • Number of correctly predicted/missed coding
        nucleotides
  • Further complications:
      • It is not known in advance how many genes a
        sequence contains.
      • The sequence may start or end in the middle of a
        gene.
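
The exon-level criterion above can be sketched in a few lines. This is a minimal illustration, not the evaluation procedure used in the course: it assumes that an exon counts as correctly predicted only when both of its boundaries match exactly, and the function name, the "wrong" category, and the coordinates are hypothetical.

```python
def exon_level_counts(actual_exons, predicted_exons):
    """Compare two collections of (start, end) exon coordinates.

    Assumption: an exon is 'correct' only if both boundaries match exactly.
    """
    actual = set(actual_exons)
    predicted = set(predicted_exons)
    return {
        "correct": len(actual & predicted),   # predicted with exact boundaries
        "missed": len(actual - predicted),    # annotated but not predicted
        "wrong": len(predicted - actual),     # predicted but not annotated
    }


# Hypothetical coordinates, for illustration only.
actual = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 410), (500, 650), (800, 900)]
counts = exon_level_counts(actual, predicted)
print(counts)
```

Note that the exon predicted as (300, 410) is counted both as a missed annotated exon and as a wrong prediction, because only exact matches are accepted under this assumption.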

6
GENSCAN, an example of a gene finding algorithm
  • Principle: GENSCAN finds the optimal parse of a
    sequence.
  • A parse is a succession of:
      • intergenic regions
      • 5' UTRs (untranslated regions)
      • exons
      • introns
      • 3' UTRs
  • Evaluation of alternative parses with the aid of:
      • Weight matrices or similar models for sites:
        promoters, translation starts, splice donors and
        acceptors, translation stops, polyadenylation
        sites.
      • Interpolated Markov chains (3-periodic HMMs), and
        length distributions for exons, introns, 5' and
        3' UTRs, and intergenic regions.
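
To make the idea of a weight matrix for a site concrete, here is a minimal sketch of a log-odds weight matrix built from aligned example sites and used to scan a sequence. It is not GENSCAN's actual site model: the training sites are toy data, and the pseudocount, the uniform background of 0.25, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical aligned donor-like sites (toy data, not real training data).
SITES = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Return one dict per position mapping base -> log2 odds score."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score_site(matrix, window):
    """Sum the position-specific scores of the bases in the window."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_site(matrix, seq):
    """Return (start index, score) of the best-scoring window in seq."""
    width = len(matrix)
    scored = [(score_site(matrix, seq[i:i + width]), i)
              for i in range(len(seq) - width + 1)]
    score, index = max(scored)
    return index, score


matrix = build_weight_matrix(SITES)
index, score = best_site(matrix, "TTCAGGTAAGTCC")
print(index, round(score, 2))
```

In a gene finder, site scores of this kind are combined with the content and length scores of the regions between the sites to compare alternative parses.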

7
Interpolated Markov chains
There are three scoring tables for hexamers starting at the 1st, 2nd,
and 3rd codon position, respectively. The scores are computed as
log-odds scores from the observed hexamer frequencies. The score of an
exonic region is computed as the sum of the scores of its overlapping
hexamers plus a score for its length.

Hypothetical example for the sequence ACTTGCAGAAC...

hexamer   frame 1   frame 2   frame 3
ACTTGC      -2        -1         0
CTTGCA      -7        -3         2
TTGCAG       1        -1         5
TGCAGA      -3        -2         4

Red, green, and blue scores are from tables 1, 2 and 3, respectively.
The total number of parameters of the exon model is 3 × 4096 (each
table holds one score for each of the 4^6 = 4096 possible hexamers).
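
The hypothetical example can be reproduced with a short sketch. The scores are copied from the slide's example table; the frame convention (frame f means a codon starts at 0-based offset f - 1), the function names, and the omission of the length score are assumptions made for illustration.

```python
# Per-hexamer scores from the hypothetical example:
# (hexamer, score in frame 1, score in frame 2, score in frame 3).
EXAMPLE_ROWS = [
    ("ACTTGC", -2, -1, 0),
    ("CTTGCA", -7, -3, 2),
    ("TTGCAG", 1, -1, 5),
    ("TGCAGA", -3, -2, 4),
]


def hexamers(seq, k=6):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def codon_position(hexamer_index, frame):
    """Codon position (1, 2 or 3) at which a hexamer starts.

    Assumption: in frame f, a codon starts at 0-based offset f - 1.
    The result selects which of the three scoring tables applies.
    """
    return (hexamer_index - (frame - 1)) % 3 + 1


def frame_totals(rows):
    """Sum the hexamer scores separately for frames 1, 2 and 3."""
    return [sum(row[frame] for row in rows) for frame in (1, 2, 3)]


totals = frame_totals(EXAMPLE_ROWS)
n_parameters = 3 * 4 ** 6  # three tables, one score per possible hexamer
print(hexamers("ACTTGCAGA"))
print(totals)
print(n_parameters)
```

With these numbers frame 3 receives the highest total, so the content score alone would favour reading this fragment in frame 3; the length score mentioned above is left out of the sketch.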
8
GENSCAN model (variant HMM)
10
Evaluation of performance of GENSCAN
Performance measures:
  • Sensitivity: TP/AP
  • Specificity: TP/PP
  • Correlation coefficient
  • Approximate correlation
Abbreviations: TP = true positives, TN = true negatives, FP = false
positives, FN = false negatives, AP = actual positives, AN = actual
negatives, PP = predicted positives, PN = predicted negatives.
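
A minimal sketch of these measures at the nucleotide level follows. Sensitivity and specificity use the definitions given above. The formulas for the correlation coefficient and the approximate correlation did not survive in the transcript, so the sketch assumes the definitions commonly used in gene-finder evaluation: CC = (TP·TN − FP·FN) / sqrt(PP·PN·AP·AN) and AC = ½ (TP/AP + TP/PP + TN/AN + TN/PN) − 1. The function name and the toy coding masks are hypothetical, and the code assumes that AP, AN, PP and PN are all non-zero.

```python
import math


def nucleotide_metrics(actual, predicted):
    """Compare two equal-length 0/1 strings marking coding nucleotides."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "1" and p == "1" for a, p in pairs)
    tn = sum(a == "0" and p == "0" for a, p in pairs)
    fp = sum(a == "0" and p == "1" for a, p in pairs)
    fn = sum(a == "1" and p == "0" for a, p in pairs)
    ap, an = tp + fn, tn + fp   # actual positives / negatives
    pp, pn = tp + fp, tn + fn   # predicted positives / negatives
    return {
        "sensitivity": tp / ap,
        "specificity": tp / pp,  # as defined on the slide (TP/PP)
        # Assumed standard definitions, not taken from the transcript:
        "cc": (tp * tn - fp * fn) / math.sqrt(pp * pn * ap * an),
        "ac": 0.5 * (tp / ap + tp / pp + tn / an + tn / pn) - 1,
    }


# Toy masks: 1 = coding nucleotide, 0 = non-coding.
metrics = nucleotide_metrics("0011110000", "0001111000")
print(metrics)
```

The approximate correlation is useful because it stays defined in some cases where the denominator of the correlation coefficient vanishes, provided the individual ratios that are undefined are handled separately; the sketch does not cover those cases.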