Biological Database Systems

Provided by: denissh
1
Biological Database Systems
  • Denis Shestakov,
  • University of Turku/Tampere

2
Course Information
  • Contact info
  • Email
  • Office B6019, ICT
  • Course URL: http://users.utu.fi/denshe/biodb/index.html
  • Course Blog (will be updated occasionally):
    http://biodb.wordpress.com/

3
Course Information
  • Course structure
  • Lectures (topics): approx. 12 (plus today's intro
    and a review lecture at the end of the course)
  • Project work: details will be given next time
  • Exam: easy to pass if all assignments are
    completed

4
Course Information
  • Project work
  • Build a database that combines data from several
    sources
  • Please send me your suggestions for the project
    (if you have any); otherwise I will suggest one
  • Design part
  • E-R schema, relational schema, XML schema
  • Deploying part
  • Data converters
  • Building a relational db (and an XML-based db)
    based on the corresponding schemas
  • Performance comparison: relational database vs.
    XML-based db vs. file storage system
  • Invoking web services

5
Course Information: Literature
  • Slides
  • References at the end of the slides
  • Books
  • Database System Concepts, 5th edition, by
    Silberschatz, Korth & Sudarshan, McGraw-Hill, 2005,
    ISBN-10: 0072958863
  • Bioinformatics: Managing Scientific Data, by
    Lacroix & Critchlow, Morgan Kaufmann, 2003,
    ISBN-10: 155860829X
  • Articles
  • Biological database design and implementation, by
    Birney & Clamp (the Ensembl project), Briefings
    in Bioinformatics, 5(1):31-38, 2004

6
Biological Database Systems
  • 1.1. Course Content
  • 1.2. Course Objectives
  • 1.3. Database and DBMS
  • 1.4. Biological Databases

7
Course content: main topics
  • Database concepts, overview of database design
    process
  • Entity-relationship (ER) data model
  • Relational data model
  • Introduction to SQL
  • XML and XML Schema
  • Design of biological database systems

8
Course content: main topics (continued)
  • Entity-attribute-value (EAV) modeling
  • Model organism databases
  • Web services
  • Integration of biological data
  • Analysis workflows

9
Course focus
  • Database issues
  • Biology-specific
  • Representation of biological data
  • Design of biological databases
  • NOT about
  • Usage of existing databases
  • Accessing/retrieving data from bio-databases

10
Course goal
  • Give basic knowledge of biological database
    design for molecular biology

11
Do you need to know this?
  • Work in wet laboratory
  • One bioinformatician and many biologists
  • Likely to be IT guru for others
  • Expect to answer IT-related questions
    (database-related too)
  • Work in bioinformatics lab
  • Many bioinformaticians
  • Group may maintain several dbs
  • Basics are helpful
  • Interested in creating/maintaining biological
    databases
  • Start learning!
  • Ask for more information

12
Database?
From the Merriam-Webster dictionary
(http://www.merriam-webster.com/dictionary/database)
13
Database?
  • A collection of data
  • structured
  • searchable (i.e., indexable)
  • updated
  • cross-referenced
  • Objective
  • Transform meaningless raw data into useful
    information which can be accessed and analyzed in
    the best way
  • Database Management System (DBMS)
  • software designed for the purpose of managing
    databases (access, insert, delete, update, etc.)

14
DBMS: database management system
  • A set of tools that
  • Store
  • Extract
  • Modify
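
The store / extract / modify operations listed above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as the DBMS; the table name, columns, and accession values are made up for illustration and are not taken from the course material.

```python
import sqlite3


def demo_dbms_operations():
    """Store, extract, modify and delete rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table of sequence records.
    conn.execute(
        "CREATE TABLE sequence ("
        "accession TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
    )

    # Store: insert two made-up records.
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?)",
        [("X0001", "E. coli", "ATGC"), ("X0002", "S. cerevisiae", "GGCTA")],
    )

    # Extract: query each accession together with its sequence length.
    stored = conn.execute(
        "SELECT accession, LENGTH(residues) FROM sequence ORDER BY accession"
    ).fetchall()

    # Modify: update the residues of one record and read them back.
    conn.execute(
        "UPDATE sequence SET residues = ? WHERE accession = ?",
        ("ATGCCC", "X0001"),
    )
    updated = conn.execute(
        "SELECT residues FROM sequence WHERE accession = ?", ("X0001",)
    ).fetchone()[0]

    # Delete: remove the other record and count what is left.
    conn.execute("DELETE FROM sequence WHERE accession = ?", ("X0002",))
    remaining = conn.execute("SELECT COUNT(*) FROM sequence").fetchone()[0]

    conn.close()
    return stored, updated, remaining
```

Calling demo_dbms_operations() returns the rows as first stored, the modified sequence, and the number of rows left after the delete. An in-memory database is used only to keep the sketch self-contained; a real biological database would live on disk or on a server.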

15
Biological Databases?
  • Explosive growth in biological data
  • E.g., tremendous increase in nucleotide sequences
    (first increase in data due to the polymerase
    chain reaction (PCR) technique development in
    1983)
  • 1980: 80 genes fully sequenced

16
Biological Databases?
  • EMBL Database Growth

Total nucleotides (Nov 07): 188,490,792,445
Number of entries (Nov 07): 106,144,026
17
Biological Databases?
  • Database systems are crucial for managing large
    and very large collections of data
  • Data (genomic sequences, 3D structures, 2D gel
    analysis, microarrays, etc.) are submitted directly
    to databases
  • Essential tools for biological research, like
    reading relevant literature

18
Biological Databases: History
  • 1965
  • Margaret Dayhoff et al. publish Atlas of Protein
    Sequence and Structure
  • 1982
  • EMBL initiates DNA sequence databases, followed
    within a year by GenBank and in 1984 by the DNA
    Data Bank of Japan (DDBJ)
  • 1988
  • EMBL/GenBank/DDBJ agree on common format for data
    elements

19
Biological Databases: some statistics
  • More than 1000 different databases
  • 968 databases reported in The Molecular Biology
    Database Collection: 2007 update by Galperin,
    Nucleic Acids Research, 2007, Vol. 35, Database
    issue, D3-D4
  • Metabase: database of biological databases,
    http://biodatabase.org/index.php/Main_Page
  • Database sizes: <100kB to >100GB (EMBL >500GB)
  • DNA: >100GB
  • Protein: 1GB
  • 3D structure: 5GB
  • Update (adding new data) frequency: daily to
    annually
  • Freely accessible (as a rule)

20
Some databases in the field of molecular biology
  • AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
  • ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
  • BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
  • BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
  • CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
  • ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
  • CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
  • Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
  • ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
  • ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
  • GCRDB, GDB, GENATLAS, GenBank, GeneCards,
  • Genline, GenLink, GENOTK, GenProtEC, GIFTS,
  • GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
  • HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
  • HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
  • HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
  • KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
  • Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
  • Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, ...

Find more at http://biodatabase.org
21
Categories of Biological Databases
  • Nucleotide sequences
  • Genomics (information on gene chromosomal
    location and nomenclature, provide links to
    sequence databases)
  • Mutation/polymorphism (sequence variations linked
    or not to genetic diseases)
  • Protein sequences
  • Protein domain/family
  • Proteomics (2D gel, MS)

22
Categories of Biological Databases
  • Microarray (high-dimensional data: profiles of
    thousands of genes depending on
    hundreds/thousands of various conditions)
  • Organism-specific
  • 3D structure
  • Metabolism (e.g., metabolic pathways: graph
    data)
  • Bibliography
  • Others

23
Biological Databases: specific features
  • Sub-class of scientific databases
  • Autonomous: many independent maintainers
  • Heterogeneous: data formats (e.g., various data
    formats for the same data entities) and various
    types of biological data (genomic, microarray,
    proteomic, ...)
  • Dynamic: frequent and continuous changes in data
    content (and, more importantly, in data schema)
  • Broad domain knowledge
  • Workflow-oriented databases: rich set of
    analysis tools
  • Information integration is essential: data
    aggregation from several databases
24
Biological Databases: integration
Figure is taken from Bioinformatics: Managing
Scientific Data by Lacroix & Critchlow, p. 20