Introduction to Databases - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Introduction to Databases

Description:

Introduction to Databases Daniela Puiu Applications Specialist Center for the Study of Biological Complexity, VCU dpuiu_at_vcu.edu 804-827-0952 General Concepts Database ... – PowerPoint PPT presentation

Number of Views:181
Avg rating:3.0/5.0
Slides: 25
Provided by: Daniel1314
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Databases


1
Introduction to Databases
  • Daniela Puiu
  • Applications Specialist
  • Center for the Study of Biological Complexity,
    VCU
  • dpuiu_at_vcu.edu
  • 804-827-0952

2
General Concepts
  • Database definition
  • Organized collection of logically related data
  • Data
  • Known facts
  • Types text, graphics, images, sound, videos
  • Database management system (DBMS)
  • Software package for defining and managing a
    database

3
Database Examples
  • Class roster
  • Hospital patients
  • Literature (published articles in a certain
    field)
  • Genomic information
  • Protein structure
  • Taxonomy
  • Single nucleotide polymorphism

4
Example Microbial Database
Data about the protein coding regions in the
microbial genomes sequenced so far.
  • Gene (protein coding regions)
  • Name
  • Accession number
  • Organism
  • Location on the chromosome (start,end)
  • Strand
  • Size
  • Product
  • Sequence
  • Organism
  • Name
  • Accession number
  • Genome size
  • GC
  • Release date
  • Genome center
  • Sequence

5
Database Models
  • Flat files 60
  • Hierarchical 60
  • Network 70
  • Relational 80
  • Object oriented 90
  • Object relational 90
  • Web enabled 90

6
Database Types (cont.)
Type Typical number of users Typical architecture Typical size
Personal 1 Desktop/Laptop/ PDA MB
Workgroup 5-25 Client/server2 tier MB-GB
Department 25-100 Client/server3 tier GB
Enterprise gt100 Client/server distributed GB-TB
Internet gt1000 Web sever application servers MB-GB
7
Flat Files
  • Characteristics
  • Data is stored as records in regular files
  • Records usually have a simple structure and fixed
    number of fields
  • For fast access may support indexing of fields in
    the records
  • No mechanisms for relating data between files
  • One needs special programs in order to access and
    manipulate the data

8
Flat Files Example
  • Microbial database
  • Genbank format
  • Escherichia coli K12
  • Streptococcus pneumoniae R6
  • Fasta format multiple files
  • Escherichia coli K12 genome , genes , gene
    positions
  • Streptococcus pneumoniae R6 genome , genes ,
    gene positions
  • Data manipulation
  • Sequence extraction, search
  • Indexing
  • Format conversion

9
Relational Database
  • Characteristics
  • Data is organized into tables rows columns
  • Each row represents an instance of an entity
  • Each column represents an attribute of an entity
  • Metadata describes each table column
  • Relationships between entities are represented by
    values stored in the columns of the corresponding
    tables (keys)
  • Accessible through Standard Query Language (SQL)

10
Enterprise data model
  • Graphical representation of the high level
    entities
  • Example Microbial database
  • each organism has multiple corresponding genes
  • OneMany relation

1
m
Organism
Gene
11
Metadata
  • Data that describes the properties or
    characteristics of other data
  • Does not include sample data
  • Allows database designers and users to understand
    the meaning of the data

12
Metadata Data Table
Organism
Name Type Max Length Description
Name Alphanumeric 100 Organism name
Size Integer 10 Genome length (bases)
Gc Float 5 Percent GC
Accession Alphanumeric 10 Accession number
Release Date 8 Release date
Center Alphanumeric 100 Genome center name
Sequence Alphanumeric Variable Sequence
Name Size Gc Accession Release Center Sequence
Escherichia coli K12 4,640,000 50 NC_000913 09/05/1997 Univ. Wisconsin AGCTTTTCATT
Streptococcus pneumoniae R6 2,040,000 40 NC_003098 09/07/2001 Eli Lilly and Company TTGAAAGAAAA

13
Metadata Data Table (cont.)
Gene
Name Type Max Length Description
Name Alphanumeric 100 Gene name
Accession Alphanumeric 10 Gene accession number
OAccesion Alphanumeric 10 Organism accession number
Start Integer 10 Gene start
End Integer 10 Gene end
Strand Character 1 Gene strand
Product Alphanumeric 1000 Gene annotation
Sequence Alphanumeric Variable Gene sequence
Name Accession OAccession Start End Strand Product Sequence
thrL 16127995 NC_000913 190 255 the operon leader peptide MKRI
thrA 16127996 NC_000913 337 2799 homoserine dehydrogenase I MRVL
transposase_A 15902058 NC_003098 20207 20554 transposase MWYN
14
Relationships
  • Used to connect tables
  • Field(s) that have the same value in the related
    tables
  • Organism.AccessionGene.OAccession
  • Organism.Accession
  • Unique
  • Primary key
  • Gene.OAccession
  • Not unique
  • Secondary key

15
SQL
  • ANSI (American National Standards Institute)
    standard computer language for accessing and
    manipulating database systems.
  • SQL statements are used to retrieve and update
    data in a database.
  • Includes
  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)

16
Data Manipulation Language
  • Syntax for executing queries, updating,
    inserting, and deleting records.
  • SELECT - extracts data from one or more table
  • INSERT INTO - inserts new data into a table
  • UPDATE - updates data in a table
  • DELETE FROM - deletes data from a table

17
DML Example
  • Select all Escherichia coli K12 genes which are
    in the 1MB-2MB region of the chromosome
  • SELECT
  • FROM Organism, Gene
  • WHERE
  • Organism.NameEscherichia coli K12 AND
    Organism.AccessionGene.OAccession AND
    Gene.Startgt1,000,000 AND
  • Gene.Endlt2,000,000

18
DML Example (cont.)
  • INSERT INTO Gene
  • (Name, Accession, OAccession, Start, End, Strand,
    Sequence)
  • VALUES
  • (thrL, 16127995,NC_000913,190,255,,thr
    operon leader peptide, MKRI)
  • UPDATE Gene SET Start160 WHERE Accession
    NC_000913
  • DELETE FROM Gene WHERE Accession NC_000913

19
Data Definition Language
  • Syntax for creating ,editing, deleting
  • Databases
  • Tables
  • Views
  • Indexes
  • Constraints
  • Users
  • Privileges

20
DDL Examples
  • CREATE DATABASE Microbial
  • CREATE TABLE Organism (
  • Name varchar(100)
  • Size int(10)
  • Gc decimal(5)
  • Accession varchar(10)
  • Release date(8)
  • Center varchar(100))
  • ALTER TABLE Organism ADD Sequence varchar
  • DROP TABLE Organism

21
DBMS
  • Software package for defining and managing a
    database.
  • Examples
  • Proprietary MS Access, MS SQL Server, DB2,
    Oracle, Sybase
  • Open source MySql, PostgreSQL

22
DBMS Advantages
  • Program-data independence
  • Minimal data redundancy
  • Improved data consistency quality
  • Access control
  • Transaction control
  • Improved accessibility data sharing
  • Increased productivity of application development
  • Enforced standards

23
Web Databases
  • Data is accessible through Internet
  • Have different underlying database models
  • Example biological databases
  • Molecular data NCBI , Swissprot , PDB , GO
  • Protein interaction DIP , BIND
  • Organism specific Mouse , Worm, Yeast
  • Literature Pubmed
  • Disease

24
CSBC Resources
  • Database and software list
  • Molecular databases Genbank, EMBL, NR, NT,
    RefSeq, Swissprot
  • DBMS
  • MS Excel, MS Access
  • MySQL, PostgreSQL
  • Computer resources
  • watson.vcu.edu 8 processor Sun server
  • medusa.vcu.edu 64 processor Beowulf cluster
Write a Comment
User Comments (0)
About PowerShow.com