Indexing Genome Sequences

Description:

State of the art. Dynamic Programming. Slow but accurate. Never ... Not suited for average DNA/Protein query lengths. IITB - Bioinformatics Workshop 2001 ...

Provided by: srik6

Transcript and Presenter's Notes

Title: Indexing Genome Sequences

1
Indexing Genome Sequences

Srikanta B. J.
Database Systems Lab (DSL)
Indian Institute of Science

2
Background

Sequences
DNA (Deoxyribonucleic Acid)
Proteins
Similarity of sequences
The extent to which nucleotide or protein
sequences are related
Percent sequence identity, and/or Conservation

3
Genome Sequence Analysis

Hypothesize
Function of Proteins
Phylogenetic trees
Causes of Diseases
First step in unraveling the mystery of Life!
Sequence Similarity → Structural Similarity → Functional Similarity

4
Sequence Similarity

Alignment
between two sequences, S1 and S2 (perhaps of unequal length)
Insert spaces into, or at the ends of, S1 (S2)
Place them so that every character or space in either string is opposite a unique character or space in the other. E.g.,
q a c - d b d
q a w x - b -
Global and Local Alignments
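As a rough illustration of how "percent sequence identity" can be read off an alignment such as the one above, here is a minimal Python sketch. The function name and the choice of denominator (the number of alignment columns) are assumptions made for illustration; other definitions of percent identity divide by the length of the shorter sequence instead.

```python
def percent_identity(row1: str, row2: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both rows hold the same character.

    Assumption: identity is measured over all columns of the alignment.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    if not row1:
        raise ValueError("alignment must have at least one column")
    # A column counts as a match only if the characters agree and are not spaces.
    matches = sum(1 for a, b in zip(row1, row2) if a == b and a != gap)
    return 100.0 * matches / len(row1)


# The example alignment from the slide: 3 matching columns (q, a, b) out of 7.
print(f"{percent_identity('qac-dbd', 'qawx-b-'):.1f}")  # 42.9
```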

5
Alignment

Global
Given two sequences, find best alignment over
full length
E.g., between (agtcacaaaact, actcgga)
a g t c a c a a a a c t
a c t c g g a - - - - -
Local
Look for islands of high similarity
E.g., between (agtcacaaaact, actcgga)
a g t c a c a a a a c t
a c t c g g a

O(mn) with Dynamic Programming
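The O(mn) dynamic programming mentioned above can be sketched as follows. This is a minimal score-only version of the textbook global (Needleman-Wunsch style) and local (Smith-Waterman style) recurrences, not code from the presentation. The function name and the scores (+5 match, -4 mismatch, -4 per space) are assumptions chosen for illustration.

```python
def alignment_score(s1, s2, match=5, mismatch=-4, gap=-4, local=False):
    """Best alignment score of s1 and s2 in O(len(s1) * len(s2)) time.

    local=False: global alignment over the full length of both strings.
    local=True:  best-scoring pair of substrings ("island" of similarity).
    Scoring values are illustrative assumptions.
    """
    n = len(s2)
    # Row 0: aligning the empty prefix of s1 against prefixes of s2.
    prev = [0 if local else j * gap for j in range(n + 1)]
    best = 0
    for i in range(1, len(s1) + 1):
        curr = [0 if local else i * gap] + [0] * n
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[n]


s1, s2 = "agtcacaaaact", "actcgga"
print(alignment_score(s1, s2))              # global score
print(alignment_score(s1, s2, local=True))  # local score, never below the global one
```

Only two rows of the table are kept, so memory is O(n); recovering the alignment itself, rather than just its score, would need the full table or a traceback structure.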
6
Scoring the Alignments

Scoring Schemes
Value for aligning character x against character y
Provided as scoring matrix, for alphabet Σ
E.g.,
BLOSUM
PAM - 120
DNA-BLAST (5 for match, -4 for mismatch)
Optimizing alignments
E.g., Edit Distance
Scoring scheme: Insert: 1, Delete: 1, 0 otherwise
=> edit_distance(surgery, surgeon) = 4
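The value 4 follows from the scoring scheme above, which charges 1 for each insert and each delete and offers no substitution: "surge" is shared, "ry" is deleted and "on" is inserted. A minimal sketch of that insert/delete-only distance (the function name is hypothetical):

```python
def indel_edit_distance(a: str, b: str) -> int:
    """Edit distance with insert = 1, delete = 1, match = 0, no substitution."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])                  # match costs nothing
            else:
                curr.append(1 + min(prev[j], curr[j - 1]))  # delete or insert
        prev = curr
    return prev[-1]


print(indel_edit_distance("surgery", "surgeon"))  # 4
```

If substitutions were also allowed at cost 1 (the usual Levenshtein distance), the same pair would be at distance 2.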

7
Search Process

Given sequence to be studied
Want all similar (global/local) known sequences
Collections of sequences
NCBI-GenBank, SwissProt etc.
Contain millions of sequences

8
State of the art

Dynamic Programming
Slow but accurate
Never misses a significant alignment
FastA
Faster than Dynamic Programming
Uses statistical heuristics
Reduced sensitivity → False dismissals
BLAST
Fastest and popular
Lower sensitivity than FastA
Requires whole database in memory!

9
BLAST - on $1,000 Budget!

BODHI experience [DSL, 2001]
51,000 DNA sequences in database
CAFÉ experience [Williams and Zobel, 2001]
120,000 DNA sequences in memory
Time - 67.1 seconds/BLAST

→ 10.6 seconds / BLAST
10
NCBI GenBank Growth

Doubles every 13 months
In 1998, estimated 40,000 sequence similarity
queries per day
That was 3 years ago!!
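To put the doubling rate in perspective, the short calculation below estimates how much the database would have grown over those three years. It assumes the 13-month doubling time held steadily throughout, which the slide does not state explicitly.

```python
DOUBLING_MONTHS = 13  # doubling time quoted on the slide


def growth_factor(months: float) -> float:
    """Growth multiple after `months`, assuming steady exponential doubling."""
    return 2 ** (months / DOUBLING_MONTHS)


print(round(growth_factor(36), 1))  # 6.8, i.e. nearly 7 times larger after 3 years
```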

11
We Need Indexes for Sequence Similarity
Searching NOW!!
12
Indexed Searching

Inverted Indexes
RAMdb [Fondrat and Dessen, 1995]
CAFÉ [Williams and Zobel, 2001]
FLASH [Califano and Rigoutsos, 1993]
Multi-Dimensional Indexes
MRS-indexing [Kahveci and Singh, 2001]
Persistent Prefix Tree [Hunt et al., 2001]

13
RAMdb (Rapid Access Motif db)

Each sequence in the repository is indexed by its
constituent overlapping subsequences
800-fold speedup over Dynamic Programming
Prohibitive index size
No ranking (goodness) of alignments
False dismissals

ACTC → Seq1, Seq2, ...
CTCG → Seq1, Seq4, ...
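The idea of indexing each sequence by its overlapping subsequences can be sketched as a small inverted index. This is an illustrative toy, not RAMdb's actual implementation. The function names, the window length k = 4, and the three toy sequences are assumptions, chosen so that ACTC and CTCG map to the sequences shown in the slide's figure.

```python
from collections import defaultdict


def build_kmer_index(sequences, k=4):
    """Map every overlapping length-k subsequence to the ids of sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidate_hits(index, query, k=4):
    """Count, per sequence id, how many of the query's k-mer positions hit the index."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for seq_id in index.get(query[i:i + k], ()):
            hits[seq_id] += 1
    return dict(hits)


# Hypothetical toy repository.
repo = {"seq1": "ACTCG", "seq2": "GACTC", "seq4": "CTCGA"}
index = build_kmer_index(repo)
print(sorted(index["ACTC"]))  # ['seq1', 'seq2']
print(sorted(index["CTCG"]))  # ['seq1', 'seq4']
```

Every sequence contributes one posting per window position, which gives a feel for why the index size becomes prohibitive; and a lookup only finds exact k-mer matches, which is one way false dismissals can arise.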
14
CAFÉ

Partitioned Search
Coarse searching with compressed inverted index
Fine searching in small fraction of database,
with ranking
14-fold speedup over BLAST
Compression reduces the index size
Distant sequence relationships are lost
Lower retrieval effectiveness
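The coarse/fine partitioning can be sketched roughly as below. This is a simplified illustration of the general idea, not CAFÉ's algorithm: index compression is omitted, the coarse ranking is a plain count of shared k-mers, and the fine step is a basic local alignment score. All names, the scoring values, and the `keep` cutoff are assumptions.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """All overlapping length-k substrings of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(db, k):
    """Uncompressed inverted index: k-mer -> ids of sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in db.items():
        for km in kmers(seq, k):
            index[km].add(seq_id)
    return index


def local_score(s1, s2, match=5, mismatch=-4, gap=-4):
    """Score of the best local alignment (illustrative scoring values)."""
    prev, best = [0] * (len(s2) + 1), 0
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


def partitioned_search(db, index, query, k, keep=2):
    """Coarse step: shortlist by shared k-mers. Fine step: align and rank the shortlist."""
    coarse = Counter()
    for km in set(kmers(query, k)):
        for seq_id in index.get(km, ()):
            coarse[seq_id] += 1
    shortlist = [seq_id for seq_id, _ in coarse.most_common(keep)]
    ranked = sorted(((local_score(query, db[s]), s) for s in shortlist), reverse=True)
    return [(seq_id, score) for score, seq_id in ranked]


# Hypothetical toy database.
db = {"s1": "ACTCGGA", "s2": "TTTTTTTT", "s3": "GGACTCAA"}
result = partitioned_search(db, build_index(db, 4), "ACTCGG", k=4)
print(result)  # s1 ranks first; s2 is never aligned
```

Sequences that share no k-mer with the query never reach the fine step, which is where the speedup comes from and also why distant relationships can be lost.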

15
MRS - Indexing

Uses progressive wavelet coefficients to
represent sequence

16
MRS-Indexing (contd.)

Builds a hierarchy of Multi-Dim. Indexes
Only for edit distances - no general scoring
schemes
Not suited for average DNA/Protein query lengths
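The slides do not spell out the representation, so the following is only a loose sketch based on the general idea of frequency-vector methods, under stated assumptions: a sequence is summarized by its letter counts, a second coefficient captures the difference between the counts of its two halves, and a "frequency distance" between count vectors gives a lower bound on edit distance (with insert, delete and substitute each costing 1). All function names are hypothetical and the details may differ from the published MRS-index.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA alphabet


def frequency_vector(seq):
    """Number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[c] for c in ALPHABET]


def first_two_coefficients(seq):
    """Sum and difference of the frequency vectors of the two halves of seq."""
    half = len(seq) // 2
    left, right = frequency_vector(seq[:half]), frequency_vector(seq[half:])
    first = [l + r for l, r in zip(left, right)]   # equals frequency_vector(seq)
    second = [l - r for l, r in zip(left, right)]  # how the halves differ
    return first, second


def frequency_distance(u, v):
    """Lower bound on edit distance between strings with frequency vectors u and v.

    Each edit changes the counts by at most +1 in one letter and -1 in another,
    so at least max(surplus, deficit) edits are needed.
    """
    surplus = sum(a - b for a, b in zip(u, v) if a > b)
    deficit = sum(b - a for a, b in zip(u, v) if b > a)
    return max(surplus, deficit)


print(frequency_vector("ACTCGGA"))       # [2, 2, 2, 1]
print(first_two_coefficients("AACC"))    # ([2, 2, 0, 0], [2, -2, 0, 0])
print(frequency_distance(frequency_vector("AAAA"), frequency_vector("AATT")))  # 2
```

Because the bound is derived from counting edit operations, it says nothing about scores from a general scoring matrix, which is consistent with the limitation noted on the slide.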

17
Summary

Rapid growth in sequence databases
Existing algorithms do not scale
Indexed approach to Sequence Similarity is
necessary
Improvements needed in Indexed Searching methods
