Genome Annotation Assessment in Drosophila melanogaster

1
Genome Annotation Assessment in Drosophila melanogaster
  • By Lauren Gonzalez
  • Jeff Hu
  • Cathy Zhou

2
Purpose
  • The goal of the Genome Annotation Assessment
    Project (GASP) was to assess and refine the
    accuracy of gene annotation methods so that they
    could be used more effectively in genome
    sequencing and annotation.

3
Identifying the Standard Sets
  • The researchers created three standard sets of
    annotated DNA against which the experimental
    results were compared:
  • std1
  • std2
  • std3

4
Standard Sets
  • std1: Accurate base-pair positions, but might
    not include every gene
  • std2: The original 80 cDNA sequences
  • std3: Includes most genes, but their exact
    positions might not be known

5
6 classes of gene annotation
  • 1. Gene finding
  • 2. Promoter prediction
  • 3. Repeat finders
  • 4. Protein homology annotation
  • 5. EST/cDNA alignment
  • 6. Gene function

6
A comparison of testing results

7
Understanding results
  • True positive
  • True negative
  • False positive
  • False negative
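One common way to count these four outcomes is base by base, comparing a prediction against a standard set. The Java sketch below assumes, hypothetically, that both are stored as boolean arrays in which true marks a base annotated as coding; the class ConfusionCounts and its compare method are illustrative names, not part of the original project.

```java
/**
 * Counts true/false positives and negatives base by base.
 * Hypothetical representation: reference[i] and predicted[i] are true when
 * base i is annotated as coding in the standard set / the prediction.
 */
class ConfusionCounts {
    final long tp, tn, fp, fn;

    ConfusionCounts(long tp, long tn, long fp, long fn) {
        this.tp = tp;
        this.tn = tn;
        this.fp = fp;
        this.fn = fn;
    }

    static ConfusionCounts compare(boolean[] reference, boolean[] predicted) {
        if (reference.length != predicted.length) {
            throw new IllegalArgumentException("masks must cover the same sequence");
        }
        long tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < reference.length; i++) {
            if (predicted[i] && reference[i]) tp++;         // predicted coding, really coding
            else if (!predicted[i] && !reference[i]) tn++;  // predicted non-coding, really non-coding
            else if (predicted[i]) fp++;                    // predicted coding, really non-coding
            else fn++;                                      // real coding base that was missed
        }
        return new ConfusionCounts(tp, tn, fp, fn);
    }

    public static void main(String[] args) {
        // Toy 10-base sequence: reference coding at bases 2-6, prediction at bases 4-8.
        boolean[] reference = new boolean[10];
        boolean[] predicted = new boolean[10];
        for (int i = 2; i <= 6; i++) reference[i] = true;
        for (int i = 4; i <= 8; i++) predicted[i] = true;
        ConfusionCounts c = compare(reference, predicted);
        System.out.printf("TP=%d TN=%d FP=%d FN=%d%n", c.tp, c.tn, c.fp, c.fn);
    }
}
```

For the toy masks in main, the program prints TP=3 TN=3 FP=2 FN=2: bases 4 to 6 are found correctly, bases 7 and 8 are over-predicted, and bases 2 and 3 are missed.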

8
Sensitivity vs. Specificity
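  • Sensitivity (Sn): ratio of true positives to all
    real positives (how many of the real hits were
    found)
  • Sn = TP / (TP + FN)
  • Specificity (Sp): ratio of true positives to all
    predicted positives (how many of the predicted
    hits are real)
  • Sp = TP / (TP + FP)
  • Gene-prediction studies usually define
    specificity this way; in other fields the same
    ratio is called precision.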
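Written as code, the two measures are simple ratios of the counts. This is a minimal sketch: the class and method names are hypothetical, and returning NaN when a denominator is zero is a choice made here, not something the slides specify.

```java
/**
 * Sensitivity and specificity using the gene-prediction convention,
 * where specificity is Sp = TP / (TP + FP).
 */
class Accuracy {
    /** Sn = TP / (TP + FN): fraction of the real positives that were found. */
    static double sensitivity(long tp, long fn) {
        return ratio(tp, tp + fn);
    }

    /** Sp = TP / (TP + FP): fraction of the predicted positives that are real. */
    static double specificity(long tp, long fp) {
        return ratio(tp, tp + fp);
    }

    // Returns NaN instead of dividing by zero when there is nothing to measure.
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? Double.NaN : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Example counts: 3 true positives, 2 false negatives, 1 false positive.
        System.out.println(sensitivity(3, 2));
        System.out.println(specificity(3, 1));
    }
}
```

With 3 true positives, 2 false negatives and 1 false positive, the sketch prints 0.6 and 0.75.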

9
Algorithmic Problem Testing
  • We will investigate how well a Java program
    identifies repeat sequences in different ORFs
    (open reading frames) of a DNA strand. The
    threshold for calling a repeat depends on the
    length of the repeated unit:
  • Repeat unit length: threshold
  • 1 base pair: 4 in a row
  • 2 base pairs: 8 in a row
  • 3 base pairs: 12 in a row
  • 4 base pairs: 16 in a row
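One way to apply the threshold rule above is a scan that, for each unit length from 1 to 4, extends a candidate unit while the following bases repeat it exactly. This is a minimal sketch, not the Java program described in the slides: the class and method names are hypothetical, and it assumes the thresholds count bases (4, 8, 12 and 16 bases, i.e. at least four consecutive copies of the unit). It also skips units such as "AA" that are themselves repeats of a shorter unit, so a poly-A run is reported once, as a 1-base repeat.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a tandem-repeat finder for repeat units of 1 to 4 bases.
 * Assumption: the thresholds 4, 8, 12 and 16 "in a row" count bases,
 * i.e. a repeat needs at least four consecutive copies of its unit.
 */
class RepeatFinder {
    static final int MIN_COPIES = 4;  // 4 copies x unit length = threshold (assumed)
    static final int MAX_UNIT = 4;    // longest repeat unit, in bases

    static final class Repeat {
        final int start;    // 0-based position of the first base
        final String unit;  // repeated unit, e.g. "CA"
        final int copies;   // consecutive full copies of the unit

        Repeat(int start, String unit, int copies) {
            this.start = start;
            this.unit = unit;
            this.copies = copies;
        }

        @Override
        public String toString() {
            return unit + " x" + copies + " @" + start;
        }
    }

    /** Reports every run of at least MIN_COPIES copies, for unit lengths 1..MAX_UNIT. */
    static List<Repeat> find(String dna) {
        String s = dna.toUpperCase();
        List<Repeat> hits = new ArrayList<>();
        for (int k = 1; k <= MAX_UNIT; k++) {
            int i = 0;
            while (i + k <= s.length()) {
                String unit = s.substring(i, i + k);
                int copies = 1;
                // Extend while the next k bases repeat the unit exactly.
                while (i + (copies + 1) * k <= s.length()
                        && s.regionMatches(i + copies * k, unit, 0, k)) {
                    copies++;
                }
                if (copies >= MIN_COPIES && unit.matches("[ACGT]+") && isPrimitive(unit)) {
                    hits.add(new Repeat(i, unit, copies));
                    i += copies * k;  // continue after the reported run
                } else {
                    i++;
                }
            }
        }
        return hits;
    }

    /** False if the unit is itself a repeat of a shorter unit, e.g. "AA" or "ATAT". */
    static boolean isPrimitive(String unit) {
        int k = unit.length();
        for (int d = 1; d < k; d++) {
            if (k % d != 0) continue;
            boolean periodic = true;
            for (int j = d; j < k && periodic; j++) {
                periodic = unit.charAt(j) == unit.charAt(j - d);
            }
            if (periodic) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(find("GAAAAGTCACACACAGT"));
    }
}
```

For the sample strand in main, the program prints [A x4 @1, CA x4 @7]: four A's starting at base 1 and four copies of CA starting at base 7 (0-based positions).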

10
APT cont.
  • We will then count the repeat sequences (and
    record their lengths) that are at or above the
    threshold, and determine which repeat sequence
    occurs most often. We will run this Java program
    on many different strands of DNA to see which
    strands have higher repeat rates and how reading
    in different ORFs changes our results.
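The counting and comparison steps can be sketched as follows. The input is a hypothetical list of repeat units reported for one strand (or one reading frame); the class and method names are illustrative, and the per-kilobase rate is an assumed way to make strands of different lengths comparable.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the summary step: tally the repeat units found in one strand
 * (or one reading frame), pick the most frequent one, and normalize the
 * repeat count by sequence length.
 */
class RepeatSummary {
    /** Occurrences of each repeat unit; a TreeMap keeps the keys sorted. */
    static Map<String, Integer> tally(List<String> units) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String unit : units) {
            counts.merge(unit, 1, Integer::sum);
        }
        return counts;
    }

    /** Unit with the highest count (alphabetically first on ties), or null if none. */
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > counts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    /** Repeats per 1,000 bases, so strands of different lengths can be compared. */
    static double repeatsPerKb(int repeatCount, int sequenceLength) {
        return sequenceLength == 0 ? 0.0 : 1000.0 * repeatCount / sequenceLength;
    }

    public static void main(String[] args) {
        // Hypothetical finder output for a hypothetical 2,000-base strand.
        List<String> units = List.of("A", "CA", "CA", "T", "CAG", "CA");
        Map<String, Integer> counts = tally(units);
        System.out.printf(Locale.ROOT, "%s most frequent=%s rate=%.1f per kb%n",
                counts, mostFrequent(counts), repeatsPerKb(units.size(), 2000));
    }
}
```

For the toy input in main, this prints {A=1, CA=3, CAG=1, T=1} most frequent=CA rate=3.0 per kb. Calling the same methods once per strand, or once per reading frame, yields numbers that can be compared directly.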