Title: Computational Analysis of Transcript Identification Using GenBank
1Computational Analysis of Transcript
Identification Using GenBank
2Differentiation of hematopoietic cells
4Genome-wide gene expression
16SAGE (Serial Analysis of Gene Expression)
17Figure 1. Schematic illustration of the SAGE process.
Jes Stollberg et al., Genome Res. 2000; 10: 1241-1248
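As a rough illustration of what a SAGE tag is, the sketch below pulls a tag out of a cDNA sequence. It assumes the common SAGE setup, which the slides do not spell out: the anchoring enzyme site is CATG and the tag is the 10 bases immediately 3' of the 3'-most site. The function name `extract_sage_tag` and its parameters are hypothetical.

```python
def extract_sage_tag(cdna, anchor="CATG", tag_len=10):
    """Return the tag_len bases that follow the 3'-most anchor site.

    Returns None when the sequence has no anchor site or when fewer than
    tag_len bases follow the last site. (Assumed protocol: CATG anchor,
    10-base tag; adjust both for other SAGE variants.)
    """
    seq = cdna.upper()
    pos = seq.rfind(anchor)          # index of the 3'-most anchor site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


if __name__ == "__main__":
    example = "GGCATGAAAACCCCGGTTTCATGACGTACGTACAAAA"
    print(extract_sage_tag(example))
```

Only the last anchor site is used, so an upstream CATG in the same transcript does not produce a second tag in this sketch.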
18SAGE GLGI Overview
19What is the chance of duplicate tags?
- We can assume we are drawing randomly from the
set of all sequences over the 4-letter DNA
alphabet with the given tag length
- This is the same problem as requiring unique
overlaps in the contig matching problem for
shotgun sequencing
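The uniform-draw assumption can be turned into numbers with a short sketch. It treats tags as independent uniform draws from the 4^k possible sequences of length k (4^10 = 1,048,576 for 10-base tags), which is the classic birthday problem. The function names below are hypothetical, and this is only the idealized model, not a description of real tag data.

```python
import math


def p_no_duplicates(n_tags, tag_len=10, alphabet=4):
    """Probability that n_tags uniform random tags are all distinct."""
    space = alphabet ** tag_len
    if n_tags > space:
        return 0.0                   # pigeonhole: a duplicate is certain
    # Product of (1 - i/space) for i = 0..n_tags-1, summed in log space
    log_p = sum(math.log1p(-i / space) for i in range(n_tags))
    return math.exp(log_p)


def expected_distinct(n_draws, tag_len=10, alphabet=4):
    """Expected number of distinct tags seen after n_draws uniform draws."""
    space = alphabet ** tag_len
    # space * (1 - (1 - 1/space)**n_draws), written to limit rounding error
    return -space * math.expm1(n_draws * math.log1p(-1.0 / space))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, p_no_duplicates(n), expected_distinct(n))
```

Under this model the chance of at least one duplicate is `1 - p_no_duplicates(n)`, which grows quickly once n approaches the square root of the tag space.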
20Random Model
21Random model does not reflect biological process
- Genes evolve by duplication as well as point
mutation
- Many motifs are repeated
- Function widgets at work?
- Result is a strong bias in observed biological
sequences, not the uniform distribution that the
simple model assumes
- Here are some numbers.
22SAGE tags match many genes (Tags from
Hashimoto S, et al. Blood 94:837, 1999)
23Tag Frequency Groups for 10-base Tag Set
Containing 878,938 Tags for UniGene Human
24Unique Tags among 878,938 EST Derived Tags
25Unique Tags among 32,851 Gene Derived Tags
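The tables from these slides did not survive in the transcript, but the kind of tally they describe can be sketched as follows. The code groups tags by how many distinct source sequences share them, so that group 1 holds the unique tags. The input format (pairs of tag and sequence ID) and the function name are assumptions made for illustration; the toy data is invented and is not taken from UniGene.

```python
from collections import Counter


def tag_frequency_groups(tag_source_pairs):
    """Map 'number of distinct sources sharing a tag' -> 'number of such tags'.

    tag_source_pairs: iterable of (tag, source_id) tuples (assumed format).
    """
    sources_per_tag = {}
    for tag, source_id in tag_source_pairs:
        sources_per_tag.setdefault(tag, set()).add(source_id)
    groups = Counter(len(sources) for sources in sources_per_tag.values())
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    toy_pairs = [
        ("AAAAAAAAAA", "seq1"),
        ("AAAAAAAAAA", "seq2"),   # same tag, different source: not unique
        ("CCCCCCCCCC", "seq3"),
        ("GGGGGGGGGG", "seq4"),
        ("GGGGGGGGGG", "seq4"),   # repeated pair counts once
    ]
    groups = tag_frequency_groups(toy_pairs)
    print(groups)                  # {1: 2, 2: 1}
    print("unique tags:", groups.get(1, 0))
```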
26Converting tag into longer 3' sequence
27Generation of Longer 3' cDNA for Gene
Identification (GLGI)
28UniGene Human 3' Part Length Distribution
29Myeloid Tag Matches with UniGene Human SAGE Tag
Reference Database
30SAGE Tag Processing with GIST
31k-mer tree
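The slide names the data structure without describing it, so the following is only a minimal sketch of one way a k-mer tree could support tag lookup: a trie with one level per base, where the node reached after the last base stores the IDs of the sequences that carry that k-mer. The class `KmerTree` and its methods are hypothetical and are not taken from GIST.

```python
class KmerTree:
    """Trie keyed by bases; terminal nodes hold a list of associated IDs."""

    _IDS = "$"  # reserved key for the ID list; never a valid base

    def __init__(self):
        self.root = {}

    def insert(self, kmer, seq_id):
        """Record that seq_id contains kmer."""
        node = self.root
        for base in kmer.upper():
            node = node.setdefault(base, {})
        node.setdefault(self._IDS, []).append(seq_id)

    def lookup(self, kmer):
        """Return the IDs stored for kmer, or an empty list if absent."""
        node = self.root
        for base in kmer.upper():
            node = node.get(base)
            if node is None:
                return []
        return list(node.get(self._IDS, []))


if __name__ == "__main__":
    tree = KmerTree()
    tree.insert("ACGTACGTAC", "Hs.1")
    tree.insert("ACGTACGTAC", "Hs.2")
    tree.insert("ACGTACGTAA", "Hs.3")
    print(tree.lookup("ACGTACGTAC"))   # ['Hs.1', 'Hs.2']
    print(tree.lookup("TTTTTTTTTT"))   # []
```

Lookup cost depends on the tag length rather than on the number of stored tags, and tags with a shared prefix share nodes, which is the usual motivation for a tree layout over a flat list.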
33GIST Performance with Improved IO
34Conspirators
Terry Clark, Andrew Huntwork, Josef Jurek,
L. Ridgway Scott, Sanggyu Lee, Janet D. Rowley,
San Ming Wang