Bioinformatics - PowerPoint PPT Presentation

Provided by: Simm63


1
Bioinformatics & LIS
  • A brief talk for librarians, information
    scientists, and computer scientists about
    resources and collaborative opportunities with
    biology.
  • April 18, 2006
  • G. Benoit

2
Outline of the talk
  • Bioinformatics defined
  • Generation of data
  • Tools and databases
  • Activities for librarianship, computer science,
    and information science
  • Examples: Entrez, NCBI, visualization
  • Collaborations

3
Bioinformatics defined
  • Over 70 definitions
  • Differences arise from the kind of work being
    done
  • National Center for Biotechnology Information
    (NCBI) definition:
  • The development of new algorithms and statistics
    with which to assess relationships among members
    of large data sets;
  • the analysis and interpretation of various types
    of data, including nucleotide and amino acid
    sequences, protein domains, and protein
    structures; and
  • the development and implementation of tools that
    enable efficient access to and management of
    different types of information.

4
Without getting into the science
  • How the data started
  • Four chemical bases: the purines adenine (A) and
    guanine (G), and the pyrimidines cytosine (C) and
    thymine (T)
  • Their precise order and linking (each base is
    attached to a sugar molecule and to a phosphate
    molecule to create a nucleotide)

5
DNA

6
  • A pairs with T, and G with C; the bases form
    unique and very long strings, called sequences
  • E.g., AATGACCAT carries different genetic
    information than GGGCCATAG
  • Replication
  • RNA consists of A, G, C, and uracil (U) and has
    ribose instead of deoxyribose
  • The point: because pairing is complementary, one
    can sometimes predict missing data
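Complementary pairing is what makes "predicting missing data" possible: if one strand is known, its partner follows. The following is a minimal Python sketch, assuming both strands are aligned position by position (real strands are antiparallel, so a production tool would usually work with the reverse complement) and that unknown bases are written `N`, the IUPAC code for "any base". The function names are hypothetical, not part of any bioinformatics library.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the partner strand, position by position (not reversed).

    Any base other than A, C, G, or T comes back as 'N' (unknown).
    """
    return "".join(DNA_COMPLEMENT.get(base, "N") for base in strand.upper())


def fill_gaps(strand: str, partner: str) -> str:
    """Replace each unknown base ('N') in `strand` with the complement of
    the base at the same position in `partner`."""
    if len(strand) != len(partner):
        raise ValueError("strands must be aligned and the same length")
    return "".join(
        DNA_COMPLEMENT.get(other, "N") if base == "N" else base
        for base, other in zip(strand.upper(), partner.upper())
    )


def transcribe(strand: str) -> str:
    """DNA coding strand -> RNA: thymine (T) becomes uracil (U)."""
    return strand.upper().replace("T", "U")


partner = complement_strand("AATGACCAT")
print(partner)                            # TTACTGGTA
print(fill_gaps("AATGNCCNT", partner))    # AATGACCAT
print(transcribe("AATGACCAT"))            # AAUGACCAU
```

The gap filling only works where the partner strand is itself known; positions unknown on both strands stay `N`.
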

7
In short
the nucleotides are linked in a certain order, or
sequence, through the phosphate group; their
precise order and linking within the DNA
determines what proteins a gene produces and,
ultimately, the phenotype of the organism.

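To make the claim that order determines the protein concrete, this sketch reads the two 9-base strings from slide 6 three bases (one codon) at a time. The codon table is deliberately partial: it lists only the codons these examples need, plus ATG and the three stop codons, and any other codon is reported as `X`. The table name and `translate` function are illustrative, not a library API.

```python
# Deliberately PARTIAL table of the standard genetic code
# (DNA coding-strand codon -> one-letter amino acid). The full code has
# 64 codons; '*' marks a stop codon.
PARTIAL_CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "AAT": "N",  # asparagine
    "GAC": "D",  # aspartate
    "CAT": "H",  # histidine
    "GGG": "G",  # glycine
    "CCA": "P",  # proline
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(seq: str) -> str:
    """Read `seq` in frame 0, one codon at a time, and return the
    one-letter protein sequence, stopping at the first stop codon.
    Codons missing from the partial table are reported as 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AATGACCAT"))  # NDH
print(translate("GGGCCATAG"))  # GP (TAG is a stop codon)
```

Both strings contain the letters CAT, but only in the first does the reading frame make them a codon (histidine); in the second they straddle the CCA and TAG codons.
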
8
Generation of Data
  • Raw data from sequencing
  • Expression data
  • Data generated by linking other raw data in very
    large, multidimensional databases (e.g., OMIM)
  • Research literature (full-text journals)
  • Data models to describe the literature for
    retrieval, linking to other data, and linking to
    the raw data
  • New data models to support greater flexibility
    in describing & manipulating data

9
Generation of Data
  • To support integrated search and retrieval
  • To focus on single organisms or find similarities
    across them
  • To feed other technologies
  • To support visualization of natural and abstract
    phenomena

10
Tools & Databases
  • A host of tools for database searching
  • BLAST (basic local alignment search tool)
  • FASTA (sequence strings)
  • ChopUp (protein analysis)
  • Integrated packages (Lasergene Sequence Analysis
    Software)
  • The many services offered through NCBI and NLM
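BLAST is a fast heuristic that approximates optimal local alignment. For readers new to the idea, here is a minimal sketch of the exact method it approximates, Smith-Waterman dynamic programming, returning only the best local score. This is not BLAST itself, and the scoring values (+2 match, -1 mismatch, -1 per gap position) are simple assumptions; real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.

    Returns the best score of any local alignment between a and b.
    """
    # Only the previous row of the dynamic-programming matrix is needed.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            # Local alignment: a score never drops below zero; 0 = restart.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("AAACGTTT", "CG"))  # 4
print(local_alignment_score("ACGT", "ACT"))     # 5 (ACGT vs AC-T)
print(local_alignment_score("AAAA", "TTTT"))    # 0 (nothing in common)
```

The cost is proportional to len(a) × len(b) per comparison, which is exactly why database-scale tools like BLAST use seeding heuristics instead.
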

11
  • Take a look at the handout, Table 1: publicly
    accessible databases

12
Data Categories
  • Monographs, Journals, Announcements (text)
  • Datasets
  • Bibliographic (http://www.expasy.org/links.html)
  • Taxonomic
  • Nucleic acid
  • Genomic (e.g., GDB, OMIM)
  • Protein DB (SwissProt, TrEMBL)
  • Protein families, domains, and functional sites
  • Proteomics initiative
  • Enzyme/metabolic pathways
  • Sequence Retrieval System (SRS) and NCBI Data
    Model

13
  • Take a look at the handout, Table 2: publicly
    accessible databases defined
  • Then the Entrez sample, Table 3

14
Entrez example
  • Notice the familiar access points (author,
    journal, title) as well as domain-specific ones
    (exon, gene, organism)
  • Notice, too, the DNA sequence itself
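The same mix of familiar and domain-specific access points is available programmatically. A hedged sketch: it builds (but does not send) a query URL for NCBI's E-utilities ESearch service, which is today's scripted interface to Entrez and not necessarily what the 2006 web page used. The field tags shown (au, ti, ta in PubMed; gene, orgn in the nucleotide database) are Entrez search-field tags; the helper `build_esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (programmatic Entrez searching).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, **fields: str) -> str:
    """Build (but do not send) an ESearch query URL.

    `fields` maps an Entrez search-field tag to a term, e.g. au/ti/ta
    (author, title, journal) in PubMed, or gene/orgn in the nucleotide
    database. Terms are combined with AND.
    """
    term = " AND ".join(f"{value}[{tag}]" for tag, value in fields.items())
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


# Bibliographic access points (PubMed):
print(build_esearch_url("pubmed", au="Benoit G", ti="bioinformatics"))
# Domain-specific access points (nucleotide database):
print(build_esearch_url("nuccore", gene="BRCA1", orgn="Homo sapiens"))
```

`urlencode` percent-encodes the square brackets and turns spaces into `+`, so the second call yields `...esearch.fcgi?db=nuccore&term=BRCA1%5Bgene%5D+AND+Homo+sapiens%5Borgn%5D`.
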

15
NCBI Homepage
  • http://www.ncbi.nih.gov/
  • Notice the variety of tools (left menu)
  • Site map: http://www.ncbi.nih.gov/Sitemap/index.html
  • Alpha list: http://www.ncbi.nih.gov/Sitemap/AlphaList.html

16
Linking across resources
  • http://www.ncbi.nlm.nih.gov/entrez/query/static/linking.html
  • NCBI's structure database, the Molecular
    Modeling Database (MMDB), is a subset of the
    experimentally determined (non-theoretical) 3D
    structures in the Protein Data Bank (PDB). The
    data come from X-ray crystallography and NMR
    spectroscopy. The goal is to make it easier to
    compare structures.
  • Searching: a variety of access points (author,
    title, text terms, a 4-character PDB code, or a
    numerical MMDB-id)
  • MMDB data: PDB records are parsed to extract
    sequences, citations, and structural information,
    then converted to ASN.1.
  • The Taxonomy database helps end users see
    relationships among taxonomic terms, links to
    related databases, and literature references
  • Example: http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html/
  • http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Escherichia+coli&lvl=0&srchmode=1
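The parsing step for MMDB starts from PDB records. As a minimal sketch of only the sequence-extraction part (not NCBI's MMDB pipeline and not the ASN.1 conversion), this reads SEQRES records from fixed-column PDB-format text: the chain ID sits in column 12 and residue names in columns 20-70. Non-standard residues are reported as `X`, and the toy record below is made up for illustration, not a real PDB entry.

```python
# Three-letter -> one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text: str) -> dict[str, str]:
    """Collect each chain's one-letter sequence from SEQRES records."""
    chains: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue  # skip HEADER, ATOM, JRNL and all other record types
        chain_id = line[11]            # column 12 (1-based)
        residues = line[19:70].split()  # columns 20-70
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(res, "X") for res in residues)
    return {chain: "".join(seq) for chain, seq in chains.items()}


toy_record = "\n".join([
    "HEADER    MADE-UP EXAMPLE, NOT A REAL PDB ENTRY",
    "SEQRES   1 A    5  MET GLY SER ALA LYS",
    "SEQRES   1 B    3  GLY GLY HIS",
])
print(seqres_sequences(toy_record))  # {'A': 'MGSAK', 'B': 'GGH'}
```

Slicing by column rather than splitting the whole line on spaces follows the format's fixed-width layout, which keeps the parser correct even when numeric fields run together.
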

17
Linking across resources
  • XML: there are hundreds of XML schemas used in
    biology
  • Calls for mapping to ASN.1 records (see the NCBI
    example)
  • Calls for mapping across schemas
  • Calls for exporting data for different devices
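Mapping across schemas is often done with a crosswalk: a table saying where each target field comes from in the source. Here is a minimal sketch with Python's standard `xml.etree.ElementTree`; both schemas (`<record>` and `<Entry>`, and all their field names) are hypothetical and stand in for real biological XML formats or NCBI's ASN.1-derived XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: target element name -> how to read it from a
# source <record>. Neither schema is a real standard.
CROSSWALK = {
    "Identifier": lambda rec: rec.get("accession"),
    "Organism": lambda rec: rec.get("organism"),
    "Sequence": lambda rec: rec.findtext("seq", default="").strip(),
}


def crosswalk(source_xml: str) -> str:
    """Map one <record> (schema A) to an <Entry> (schema B)."""
    record = ET.fromstring(source_xml)
    entry = ET.Element("Entry")
    for target_name, extract in CROSSWALK.items():
        value = extract(record)
        if value:  # fields missing from the source are simply omitted
            ET.SubElement(entry, target_name).text = value
    return ET.tostring(entry, encoding="unicode")


source = ('<record accession="EX0001" organism="Escherichia coli">'
          '<seq>ATGAAA</seq></record>')
print(crosswalk(source))
```

Keeping the mapping in a table rather than in code means supporting another target schema (or another output device) is a matter of adding a table, not rewriting the converter.
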

18
Visualization
  • Cn3D: uses MMDB, Entrez's structure database
  • http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
  • RasMol: http://www.umass.edu/microbio/rasmol/
  • Protein Explorer: http://www.umass.edu/microbio/rasmol/rotating.htm
  • OpenRasMol: http://www.openrasmol.org/
  • MolviZ.org: http://www.umass.edu/microbio/chime
  • World Index of Molecular Visualization:
    http://molvis.sdsc.edu/visres/index.html

19
Recap main points
  • Very large data sets, homogenized through ASN.1
  • Goal: integration (text to text, visualization to
    text, text to visualization)
  • Raw data, research literature, and visualization
  • Biologists provide domain knowledge
  • XML is a big player
  • CS and IS provide technology
  • Librarians provide maintenance and access to
    resources

20
Collaborative Opportunities
  • For LIS and CS:
  • Domain analysis
  • Information use, communication, theories of
    information
  • Systems analysis and design
  • Data modeling
  • Classification
  • Storage and retrieval
  • HCI
  • All mapped onto a generalized model of a
    molecular biology experimental cycle (Denn &
    MacMullen, 2002, p. 556)

21
Collaborative Opportunities
  • Insertion points: development of new tools and
    methods for managing, integrating, and
    visualizing data
  • For local use: download selected data sets for
    local needs (Stapley & Benoit, 2000)
  • XML transformations
  • XML → SVG → X3D
  • Automated retrieval
  • Clustering (data- and text-mining)
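Clustering for text mining can start very simply. This sketch groups short texts, such as article titles, by word overlap: it uses Jaccard similarity between word sets and single-linkage grouping (implemented with a union-find structure). The similarity measure and the 0.3 threshold are assumptions chosen for illustration, and the sample titles are made up; real text mining would add stop-word removal, stemming, and TF-IDF weighting.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word set; a stand-in for real text preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Single-linkage clustering: documents with Jaccard similarity
    >= threshold end up (transitively) in the same cluster.
    Returns clusters as lists of document indices."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    word_sets = [tokens(d) for d in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(word_sets[i], word_sets[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


titles = [
    "protein structure visualization",
    "visualization of protein structure data",
    "library catalog metadata",
    "metadata for the library catalog",
]
print(cluster(titles))  # [[0, 1], [2, 3]]
```

The pairwise loop is quadratic in the number of documents, which is fine for a handout-sized collection but not for all of PubMed.
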

22
Collaborative Opportunities
  • Biologists' needs
  • To go beyond mining of genomic data to
    investigate causal entailments in intra- and
    intercellular dynamics
  • LIS's response
  • To aid understanding of the scientific processes
    through visualization of literature, metadata,
    and graphic representations, both in general and
    for disease-specific analysis

23
Back to you
  • Thanks
