Information Retrieval from Biological Databases

1
Information Retrieval from Biological Databases
2
  • 3.1 Introduction
  • 3.2 Integrated Information Retrieval: The Entrez
    System
  • 3.3 Gene-Centric Information Retrieval: LocusLink
    (superseded by Entrez Gene)
  • 3.4 Sequence Databases Beyond NCBI
  • 3.5 Medical Databases
  • 3.6 Summary

3
3.1 Introduction
  • The Human Genome Project's major goal was
    completed in April 2003.
  • High-quality sequence information is available
    from public databases, which advances
    bioinformatics research and accelerates
    biological discovery.
  • In this chapter, the discussion centers on
    querying databases at NCBI.

4
3.2 Integrated Information Retrieval: The Entrez
System
  • The most widely used interface for the retrieval
    of information from biological databases is the
    NCBI Entrez system.
  • Entrez is not a database itself, but rather is
    the interface through which all of its component
    databases can be accessed and traversed.

5
Relationships Between Database Entries
Neighboring
  • The concept of neighboring allows for entries
    within a given database to be connected to one
    another.
  • The establishment of neighboring relationships
    within a database is based on statistical
    measures of similarity, as follows.

6
  • 1. BLAST (sequence similarity)
  • Sequence data are compared with one another
    using the Basic Local Alignment Search Tool.
  • 2. VAST (structure similarity)
  • Sets of coordinate data are compared using
    a vector-based method known as VAST, for Vector
    Alignment Search Tool.

7
  • 3. Weighted Key Terms
  • Based on the relevance pairs model of retrieval:
  • (1) Define the weighted key terms, or the
    common words.
  • (2) Score the pairs of key terms:
  • closer together > farther apart
  • (3) Score the common words:
  • in title > in abstract
  • infrequent > frequent
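
The three scoring rules above can be illustrated with a toy sketch. This is not the algorithm Entrez actually uses: the whitespace tokenizer, the IDF-style rarity weight, the title boost of 2.0, and all function names are assumptions chosen only to make the ordering rules (closer > farther, title > abstract, infrequent > frequent) concrete.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def rarity_weights(docs):
    """Weight each word so that infrequent words score higher than frequent ones.

    Uses an IDF-style formula (an assumption, not the Entrez formula).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc["title"])) | set(tokenize(doc["abstract"])))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def pair_proximity(tokens, term_a, term_b):
    """Score a pair of key terms: the closer together they occur, the higher."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return 0.0
    gap = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / max(gap, 1)


def similarity(doc_a, doc_b, weights, title_boost=2.0):
    """Sum the weights of shared words; words shared by both titles count more."""
    title_a, title_b = set(tokenize(doc_a["title"])), set(tokenize(doc_b["title"]))
    all_a = title_a | set(tokenize(doc_a["abstract"]))
    all_b = title_b | set(tokenize(doc_b["abstract"]))
    score = 0.0
    for word in all_a & all_b:
        boost = title_boost if word in title_a and word in title_b else 1.0
        score += boost * weights[word]
    return score
```

Each document is assumed to be a dict with "title" and "abstract" keys. Ranking every other document by `similarity` against a chosen one would give a crude list of its neighbors.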

8
Hard Links
  • Hard links are applied between entries in
    different databases and exist wherever there is
    a logical relationship between the entries.
  • Searches can, in essence, begin anywhere within
    Entrez.
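
One way to picture why a search can begin anywhere is to treat hard links as edges in a graph of records. The sketch below is only an illustration: the database names are used loosely, the record IDs (`P1`, `N1`, and so on) are made-up placeholders, and `reachable` is a hypothetical helper, not part of any NCBI tool.

```python
from collections import deque

# Hypothetical cross-database links; the record IDs are made-up placeholders.
HARD_LINKS = {
    ("pubmed", "P1"): [("nucleotide", "N1")],
    ("nucleotide", "N1"): [("pubmed", "P1"), ("protein", "R1")],
    ("protein", "R1"): [("nucleotide", "N1"), ("structure", "S1")],
    ("structure", "S1"): [("protein", "R1")],
}


def reachable(start):
    """Return every record reachable from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        record = queue.popleft()
        for linked in HARD_LINKS.get(record, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return seen
```

Because the links in this toy graph run in both directions, starting from the literature record or from the structure record reaches the same set of entries.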

9
The Entrez Discovery Pathway
  • Navigating the Entrez Search Space.
  • 1. Coupling search terms with the Boolean
    operators AND, OR, and NOT.
  • 2. Using field tags (Table 3.1).
  • Cubby (superseded by My NCBI).
  • Limits and History.
  • Structure.
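
The first two points above, Boolean operators and field tags, can be sketched as a small query-string builder. The `term[TAG]` bracket syntax is how Entrez field tags are written. The helper names `tagged` and `combine` are hypothetical, and the example tags (MH for MeSH terms, TI for title, DP for publication date) are chosen for illustration rather than taken from Table 3.1.

```python
def tagged(term, tag=None):
    """Attach an Entrez field tag, e.g. tagged("Smith J", "AU") gives "Smith J[AU]"."""
    return f"{term}[{tag}]" if tag else term


def combine(operator, *terms):
    """Join terms with a Boolean operator and wrap the result in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(terms) + ")"


# Records indexed under a MeSH heading OR with a gene name in the title,
# restricted (AND) to a publication year.
query = combine(
    "AND",
    combine("OR", tagged("breast neoplasms", "MH"), tagged("BRCA1", "TI")),
    tagged("2003", "DP"),
)
print(query)
```

The parentheses make the grouping explicit, so the resulting string does not depend on the order in which the search engine evaluates the operators.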

10
Navigating the Entrez Search Space
12
MEDLINE MH term: MH = Medical Subject Heading (MeSH)
34
Cubby (My NCBI)
36
  • My NCBI is a central place to customize NCBI Web
    services.
  • My NCBI is free.
  • You can use My NCBI to:
  • 1. Save searches
  • 2. Set up e-mail alerts for new content
  • 3. Display links to Web resources (LinkOut)
  • 4. Choose filters that group search results

37
Limits and History
43
Structure
47
3.3 Gene-Centric Information Retrieval: LocusLink
(superseded by Entrez Gene)
50
3.4 Sequence Databases Beyond NCBI
  • Specialized databases with additional information
    do not always fit the NCBI data model.
  • Nucleic Acids Research devotes its first issue
    every year to papers describing these databases.
  • Examples: MGD (mouse), FlyBase (Drosophila),
    SGD (yeast)

54
  • NAR Databases
  • http://www3.oup.co.uk/nar/database/c/

55
3.5 Medical Databases
  • OMIM (Online Mendelian Inheritance in Man): a
    database of human genetic disorders.

56
Gene symbol
58
  • OMIM numbering system
  • The first digit of an entry number indicates the
    mode of inheritance (e.g., 604896 or 236700):
  • 1 = autosomal dominant
  • 2 = autosomal recessive
  • 3 = X-linked locus or phenotype
  • 4 = Y-linked locus or phenotype
  • 5 = mitochondrial
  • 6 = autosomal locus or phenotype
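
The first-digit rule above amounts to a lookup table. A minimal sketch, assuming entry numbers are always six digits; the function name `omim_category` is hypothetical, and the category labels are copied from the list on this slide.

```python
OMIM_FIRST_DIGIT = {
    "1": "autosomal dominant",
    "2": "autosomal recessive",
    "3": "X-linked locus or phenotype",
    "4": "Y-linked locus or phenotype",
    "5": "mitochondrial",
    "6": "autosomal locus or phenotype",
}


def omim_category(mim_number):
    """Map a six-digit OMIM number to the category implied by its first digit."""
    digits = str(mim_number).strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected a six-digit OMIM number, got {mim_number!r}")
    try:
        return OMIM_FIRST_DIGIT[digits[0]]
    except KeyError:
        raise ValueError(f"no category listed for first digit {digits[0]!r}") from None


# The two example numbers from the slide.
for number in (604896, 236700):
    print(number, omim_category(number))
```

The function accepts either an integer or a string, since OMIM numbers are often copied from text.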

59
  • A phenotype may be caused by a mutation at a
    single locus.
  • A phenotype may be caused by mutations at
    multiple loci.

60
The information in each entry
  • Gene symbol
  • Alternative names for the disease
  • Disease description, including allelic variants
  • Clinical synopsis
  • References

62
Map Viewer