Biological Database Systems - PowerPoint PPT Presentation

1 / 45
About This Presentation
Title:

Biological Database Systems

Description:

Relational schema for protein sequences: information is distributed ... if there is no non-null amount ... Mozilla Firefox, Skype, .... PDAs, Symbian ... – PowerPoint PPT presentation

Number of Views:71
Avg rating:3.0/5.0
Slides: 46
Provided by: denissh
Category:

less

Transcript and Presenter's Notes

Title: Biological Database Systems


1
Biological Database Systems
  • 5.1. Introduction to SQL Querying
  • 5.2. Introduction to SQL Modifying data
  • 5.3. SQLite Demo
  • Assignment
  • - Extracting and preparing data for bulk load
    into relational dbms

2
Relational data model relation
Fields (attributes, columns)
Tuples (records, rows)
  • Cardinality7, degree3

3
Relational data model example
Relational schema for protein sequences
information is distributed over three tables
4
Relational data model example
Flat-file representation (data in just one table)
If two proteins share the same function of taxon
the information in some attributes is duplicated
5
SQL Queries
  • Sailors(sidinteger, snamestring,
    ratinginteger, agereal)
  • Boats(bidinteger, bnamestring, colorstring)
  • Reserves(sidinteger, bidinteger, daydate)

6
SQL Queries Query Structure
  • A typical SQL query has the form
  • SELECT DISTINCTALL A1, A2,
    ..., An FROM r1, r2, ..., rm
    WHERE P
  • Ai represents an attribute
  • ri represents a relation
  • P is a predicate (conditions that results must
    satisfy)
  • DISTINCT or ALL are optional keywords DISTINCT
    means that the answer should not contain
    duplicates (duplicates are not eliminated by
    default!) ALL is used by default and omitted
  • The result of an SQL query is a relation (result
    can be queried again)

7
SQL Queries SELECT clause
  • Find the names and ages of all sailors
  • SELECT DISTINCT Sailors.sname, Sailors.age
  • FROM Sailors
  • (in this query Sailors.name sname)
  • Results Results
    (without

  • DISTINCT)

8
SQL Queries SELECT clause
  • An asterisk in the SELECT clause denotes
    all attributes SELECT FROM Sailors
  • The SELECT clause can contain arithmetic
    expressions involving the operation, , ,, and
    /, and operating on constants or attributes of
    tuples
  • The query SELECT rating2 FROM Sailors returns
    one-column relation where the values of the
    rating attribute from the Sailors relation
    incremented by 2

9
SQL Queries WHERE clause
  • The WHERE clause specifies conditions that the
    result must satisfy
  • Find all sailors who are older than 30 with a
    rating above 7
  • SELECT sid, sname, rating, age
  • FROM Sailors WHERE ratinggt7 AND
    age gt30
  • Comparisons can be combined using the logical
    connectives AND, OR, and NOT
  • Comparisons can be applied to results of
    arithmetic expressions

10
SQL Queries FROM clause
  • The FROM clause lists the relations involved in
    the query
  • SELECT FROM Sailors,Reserves the Cartesian
    product of Sailors X Reserves (or cross-product
    of tables Sailors and Reserves)
  • Note for simplicity consider the following
    instances of Sailors and Reserves
  • Cross-product

11
SQL Queries FROM clause
  • Find the names of sailors who have reserved boat
    number 103
  • SELECT Sailors.sname
  • FROM Sailors, Reserves
  • WHERE Sailors.sidReserves.sid and
    Reserves.bid103

12
SQL Queries Tuple Variables
  • SELECT Sailors.sname
  • FROM Sailors, Reserves
  • WHERE Sailors.sidReserves.sid and
    Reserves.bid103
  • SELECT S.sname
  • FROM Sailors AS S, Reserves AS R
  • WHERE S.sidR.sid and R.bid103
  • SELECT S.sname , age
  • FROM Sailors AS S, Reserves AS R
  • WHERE S.sidR.sid and bid103
  • Keyword AS is optional, may be omitted

13
SQL Queries String Operations
  • SQL includes a string-matching operator for
    comparisons on character strings. The operator
    LIKE uses patterns that are described using two
    special characters
  • percent () matches any substring
  • underscore (_) matches any character
  • Find the ages of sailors whose name begins and
    ends with B and has at least three characters
  • SELECT S.age
  • FROM Sailors S
  • WHERE S.sname LIKE B_B
  • Match the name Main LIKE Main\ ESCAPE \
  • SQL supports a variety of string operations
  • concatenation (using )
  • converting from upper to lower case (and vice
    versa)
  • finding string length, extracting substrings, etc.

14
SQL Queries Ordering the display of tuples
  • List in alphabetic order the names of sailors who
    reserved at least one boat
  • SELECT Sailors.sname FROM Sailors S,
    Reserves R
  • WHERE S.sidR.sid
  • ORDER BY S.sname
  • We may additionally specify DESC for descending
    order or ASC for ascending order, for each
    attribute ascending order is the default
  • Example ORDER BY sname DESC

15
SQL Queries Set operations
  • The set operations UNION, INTERSECT, and
    EXCEPT(or MINUS) operate on relations and
    correspond to the relational algebra operations
    ???????
  • Each of these operations automatically eliminates
    duplicates to retain all duplicates use the
    corresponding multiset versions UNION ALL,
    INTERSECT ALL and EXCEPT ALL
  • If a tuple occurs m times in r and n times in
    s, then, it occurs
  • m n times in r UNION ALL s
  • min(m,n) times in r INTERSECT ALL s
  • max(0, m n) times in r EXCEPT ALL s
  • Note set operations may not be supported by
    certain RDBMSs (e.g., no support of INTERSECT and
    EXCEPT in MySQL)

16
SQL Queries Set operations
  • Find all customers who have a loan, an account,
    or both

(SELECT customer_name FROM depositor)UNION(SELEC
T customer_name FROM borrower)
17
SQL Queries Set operations
Find all customers who have both a loan and an
account
(SELECT customer_name FROM depositor)INTERSECT(S
ELECT customer_name FROM borrower)
Find all customers who have an account but no
loan
(SELECT customer_name FROM depositor)EXCEPT(SELE
CT customer_name FROM borrower)
18
SQL Queries Set operations
19
SQL Queries Set operations
20
SQL Queries Aggregate Functions
  • These functions operate on the multiset of values
    of a column of a relation, and return a value
  • AVG average value
  • AVG(A), AVG(DISTINCT
    A) MIN minimum value MAX maximum
    value SUM sum of values
  • SUM(A), SUM(DISTINCT
    A) COUNT number of values
  • COUNT(), COUNT(A),
    COUNT(DISTINCT A)

21
SQL Queries Aggregate Functions
22
SQL Queries Aggregate Functions GROUP BY
How to apply aggregate operations to each of a
number of groups of rows in a relation, where the
number of groups depends on the relation instance
(i.e., is not known in advance)?
Suppose we know that ratings are integers in the
range 1 to 10. Then, the following (ten) queries
are required
But, what if rating values are unknown?
23
SQL Queries Aggregate Functions GROUP BY,
HAVING
  • Predicates in the HAVING clause are applied
    after the formation of groups, whereas predicates
    in the WHERE clause are applied before forming
    groups
  • One answer tuple is generated per qualifying
    group

24
SQL Queries Aggregate Functions GROUP BY,
HAVING
Find the age of the youngest sailor with agegt18
for each rating with at least two such sailors
Eliminating columns
25
SQL Queries Aggregate Functions GROUP BY,
HAVING
Sorting to identify the groups
26
SQL Queries NULL values
  • Tuples may have a null value, denoted by NULL,
    for some of their attributes
  • NULL signifies an unknown value or that a value
    does not exist
  • The predicate IS NULL can be used to check for
    null values
  • Example Find all loan number which appear in the
    loan relation with NULL values for amount
  • SELECT loan_number FROM loan
    WHERE amount IS NULL
  • The result of any arithmetic expression involving
    NULL is NULL
  • Example 5 NULL returns NULL
  • However, aggregate functions simply ignore nulls
    (see next slides)

27
SQL Queries NULL values
  • Any comparison with NULL returns unknown
  • Example 5 lt NULL or NULL ltgt NULL or NULL
    NULL
  • Three-valued logic using the truth value unknown
  • OR (unknown or true) true
  • (unknown or false) unknown
    (unknown or unknown) unknown
  • AND (true and unknown) unknown
    (false and unknown) false (unknown
    and unknown) unknown
  • NOT (not unknown) unknown
  • P is unknown evaluates to true if predicate P
    evaluates to unknown
  • Result of WHERE clause predicate is treated as
    false if it evaluates to unknown

28
SQL Queries NULL values
  • Total all loan amounts
  • SELECT SUM (amount ) FROM loan
  • This statement ignores NULL amounts
  • Result is NULL if there is no non-null amount
  • All aggregate operations except COUNT() ignore
    tuples with NULL values on the aggregated
    attributes

29
SQL Queries Nested queries
  • SQL provides a mechanism for nesting of
    subqueries
  • A subquery is a SELECT-FROM-WHERE expression,
    that is nested within another query i.e., a
    WHERE clause can itself contain an SQL query
  • (compare with the same query,
    slide 157)
  • To find sailors whove not reserved 103, use NOT
    IN

30
SQL Queries Nested queries
31
SQL Queries Views
  • In some cases, it is not desirable for all users
    to see the entire logical model (i.e., all the
    actual relations stored in the database)
  • Consider a person who needs to know a customers
    name, loan number and branch name, but has no
    need to see the loan amount this person should
    see a relation described, in SQL, by
  • ( SELECT customer_name, borrower.loan_number,
    branch_name
  • FROM borrower, loan WHERE
    borrower.loan_number loan.loan_number )
  • A view provides a mechanism to hide certain data
    from the view of certain users (virtual
    relation)
  • Any relation that is not of the conceptual model
    but is made visible to a user as a virtual
    relation is called a view

32
SQL Queries Views
  • A view is defined using the CREATE VIEW statement
    which has the form
  • CREATE VIEW v AS lt query expression gt,
  • where ltquery expressiongt is any legal SQL
    expression the view name is represented by v
  • Once a view is defined, the view name can be used
    to refer to the virtual relation that the view
    generates
  • When a view is created, the query expression is
    stored in the database the expression is
    substituted into queries using the view

33
SQL Queries Modification
  • branch (branch_name, branch_city, assets)
  • customer (customer_name, customer_street,
  • customer_city)
  • loan (loan_number, branch_name, amount)
  • borrower (customer_name, loan_number)
  • account (account_number, branch_name, balance)
  • depositor (customer_name, account_number)

34
SQL Queries Modification - deletion
  • Delete all account tuples at the Perryridge
    branch
  • DELETE FROM accountWHERE branch_name
    'Perryridge'
  • Delete all accounts at every branch located in
    the city Needham
  • DELETE FROM accountWHERE branch_name IN
  • (SELECT branch_name
    FROM branch WHERE branch_city
    'Needham')

35
SQL Queries Modification - insertion
  • Add a new tuple to account
  • INSERT INTO accountVALUES ('A-9732',
    'Perryridge', 1200)
  • Or equivalently (recommended) INSERT INTO
    account(branch_name, balance, account_number)
  • VALUES ('Perryridge', 1200, 'A-9732')
  • Add a new tuple to account with balance set to
    NULL value
  • INSERT INTO accountVALUES ('A-9732',
    'Perryridge', NULL)

36
SQL Queries Modification - insertion
  • INSERT INTO table1 SELECT FROM table2
  • Suppose bank provide a gift for all loan
    customers of the Perryridge branch a 200
    (units) savings account, let loan numbers be
    account numbers for new savings account
  • INSERT INTO accountSELECT loan_number,
    branch_name, 200FROM loanWHERE branch_name
    'Perryridge'
  • INSERT INTO depositorSELECT customer_name,
    loan_numberFROM loan, borrowerWHERE branch_name
    'Perryridge' AND loan.account_number
    borrower.account_number
  • The SELECT FROM WHERE statement is evaluated
    fully before any of its results are inserted

37
SQL Queries Modification - updates
  • Increase all accounts with balances over 10,000
    by 6, all other accounts receive 5
  • Two UPDATE statements
  • UPDATE account SET balance balance ? 1.06
    WHERE balance gt 10000
  • UPDATE account SET balance balance ?
    1.05 WHERE balance ? 10000
  • The order (of UPDATEs) is important

38
SQL Queries Some remarks
  • Typically, biological databases are read-only
    after creation/population num.of reads gtgt num.of
    writes
  • Each RDBMS uses its own SQL statements/features
    , which are valid in one systems and may not work
    in others or work but differently (see slide 161
    for a specific example) do read specific manual!
    )
  • In terms of SQL language
  • Database creation CREATE TABLE statements
  • Database (schema) modification not covered in
    this course
  • Database population (data loading) INSERT
    statements
  • Querying database (data reading) SELECT
    statements
  • Data modification (data writing)
    DELETE/INSERT/UPDATE statements

39
SQLite
  • SQLite is a SQL database engine, which is
  • Self-contained
  • Serverless
  • Zero-configuration
  • Transactional
  • Most widely deployed database
  • Mozilla Firefox, Skype, .
  • PDAs, Symbian smartphones, iPhones
  • Convenient for educational purposes
  • Open-source
  • Current version (Linux/Mac/Windows) 3.5.4
  • http//www.sqlite.org
  • Documentation http//www.sqlite.org/docs.html
  • Books
  • The Definitive Guide to SQLite by Owens, ISBN-10
    1590596730
  • SQLite by Newman, ISBN-10 067232685X

40
SQLite
  • Implements most of SQL92
  • SQL features that are not supported
  • FOREIGN KEY constraints are parsed but are not
    enforced (technically, the equivalent constraint
    enforcement can be achieved using triggers)
  • VIEWs (see CREATE VIEW) are read-only one can
    not execute a DELETE, INSERT, or UPDATE statement
    on a view
  • No user access permissions (the only access
    permissions that can be applied are the normal
    file access permissions of the underlying
    operating system)
  • Also see those specified here http//www.sqlite.or
    g/omitted.html

41
SQLite Interfaces
  • Command-line sqlite3.exe
  • GUI SQLiteSpy
  • Perl DBIDBDSqlite modules, http//dbi.perl.org
  • Other programming languages Java, Python,
  • SQLite interface for the R statistical package,
    http//cran.r-project.org/src/contrib/Descriptions
    /RSQLite.html

42
SQLite Demo
  • Use data (database dump) from COGENT (Complete
    Genome Tracking Database) (COmplete GENome
    Tracking (COGENT) a flexible data environment
    for computational genomics by Janssen et al.,
    Bioinformatics, 19(11), 2003) http//cgg.ebi.ac.u
    k/services/cogent/
  • Very simple schema

43
SQLite Demo
  • See URL
  • http//users.utu.fi/denshe/biodb/sqlite_demo.html

44
Project Description
  • Extracting and preparing data for bulk load
    into relational dbms (SQLite) (before Feb 4)
  • INSERT statements
  • Decide on sources, on what and how to extract,
    then extract and convert to a set of INSERTs
  • More information will be given on the course page

45
  • References
  • Any introduction to SQL, e.g.
  • http//www.w3schools.com/sql/sql_intro.asp
  • http//sqlzoo.net
  • http//www.1keydata.com/sql/sql.html
Write a Comment
User Comments (0)
About PowerShow.com