Title: Biological Database Systems
1Biological Database Systems
- 5.1. Introduction to SQL Querying
- 5.2. Introduction to SQL Modifying data
- 5.3. SQLite Demo
- Assignment
- - Extracting and preparing data for bulk load
into relational dbms
2Relational data model relation
Fields (attributes, columns)
Tuples (records, rows)
3Relational data model example
Relational schema for protein sequences
information is distributed over three tables
4Relational data model example
Flat-file representation (data in just one table)
If two proteins share the same function of taxon
the information in some attributes is duplicated
5SQL Queries
- Sailors(sidinteger, snamestring,
ratinginteger, agereal) - Boats(bidinteger, bnamestring, colorstring)
- Reserves(sidinteger, bidinteger, daydate)
6SQL Queries Query Structure
- A typical SQL query has the form
- SELECT DISTINCTALL A1, A2,
..., An FROM r1, r2, ..., rm
WHERE P - Ai represents an attribute
- ri represents a relation
- P is a predicate (conditions that results must
satisfy) - DISTINCT or ALL are optional keywords DISTINCT
means that the answer should not contain
duplicates (duplicates are not eliminated by
default!) ALL is used by default and omitted - The result of an SQL query is a relation (result
can be queried again)
7SQL Queries SELECT clause
- Find the names and ages of all sailors
- SELECT DISTINCT Sailors.sname, Sailors.age
- FROM Sailors
- (in this query Sailors.name sname)
- Results Results
(without -
DISTINCT)
8SQL Queries SELECT clause
- An asterisk in the SELECT clause denotes
all attributes SELECT FROM Sailors - The SELECT clause can contain arithmetic
expressions involving the operation, , ,, and
/, and operating on constants or attributes of
tuples - The query SELECT rating2 FROM Sailors returns
one-column relation where the values of the
rating attribute from the Sailors relation
incremented by 2
9SQL Queries WHERE clause
- The WHERE clause specifies conditions that the
result must satisfy - Find all sailors who are older than 30 with a
rating above 7 - SELECT sid, sname, rating, age
- FROM Sailors WHERE ratinggt7 AND
age gt30 - Comparisons can be combined using the logical
connectives AND, OR, and NOT - Comparisons can be applied to results of
arithmetic expressions
10SQL Queries FROM clause
- The FROM clause lists the relations involved in
the query - SELECT FROM Sailors,Reserves the Cartesian
product of Sailors X Reserves (or cross-product
of tables Sailors and Reserves) - Note for simplicity consider the following
instances of Sailors and Reserves - Cross-product
11SQL Queries FROM clause
- Find the names of sailors who have reserved boat
number 103 - SELECT Sailors.sname
- FROM Sailors, Reserves
- WHERE Sailors.sidReserves.sid and
Reserves.bid103
12SQL Queries Tuple Variables
- SELECT Sailors.sname
- FROM Sailors, Reserves
- WHERE Sailors.sidReserves.sid and
Reserves.bid103 - SELECT S.sname
- FROM Sailors AS S, Reserves AS R
- WHERE S.sidR.sid and R.bid103
- SELECT S.sname , age
- FROM Sailors AS S, Reserves AS R
- WHERE S.sidR.sid and bid103
- Keyword AS is optional, may be omitted
13SQL Queries String Operations
- SQL includes a string-matching operator for
comparisons on character strings. The operator
LIKE uses patterns that are described using two
special characters - percent () matches any substring
- underscore (_) matches any character
- Find the ages of sailors whose name begins and
ends with B and has at least three characters - SELECT S.age
- FROM Sailors S
- WHERE S.sname LIKE B_B
- Match the name Main LIKE Main\ ESCAPE \
- SQL supports a variety of string operations
- concatenation (using )
- converting from upper to lower case (and vice
versa) - finding string length, extracting substrings, etc.
14SQL Queries Ordering the display of tuples
- List in alphabetic order the names of sailors who
reserved at least one boat - SELECT Sailors.sname FROM Sailors S,
Reserves R - WHERE S.sidR.sid
- ORDER BY S.sname
- We may additionally specify DESC for descending
order or ASC for ascending order, for each
attribute ascending order is the default - Example ORDER BY sname DESC
15SQL Queries Set operations
- The set operations UNION, INTERSECT, and
EXCEPT(or MINUS) operate on relations and
correspond to the relational algebra operations
??????? - Each of these operations automatically eliminates
duplicates to retain all duplicates use the
corresponding multiset versions UNION ALL,
INTERSECT ALL and EXCEPT ALL - If a tuple occurs m times in r and n times in
s, then, it occurs - m n times in r UNION ALL s
- min(m,n) times in r INTERSECT ALL s
- max(0, m n) times in r EXCEPT ALL s
- Note set operations may not be supported by
certain RDBMSs (e.g., no support of INTERSECT and
EXCEPT in MySQL)
16SQL Queries Set operations
- Find all customers who have a loan, an account,
or both
(SELECT customer_name FROM depositor)UNION(SELEC
T customer_name FROM borrower)
17SQL Queries Set operations
Find all customers who have both a loan and an
account
(SELECT customer_name FROM depositor)INTERSECT(S
ELECT customer_name FROM borrower)
Find all customers who have an account but no
loan
(SELECT customer_name FROM depositor)EXCEPT(SELE
CT customer_name FROM borrower)
18SQL Queries Set operations
19SQL Queries Set operations
20SQL Queries Aggregate Functions
- These functions operate on the multiset of values
of a column of a relation, and return a value - AVG average value
- AVG(A), AVG(DISTINCT
A) MIN minimum value MAX maximum
value SUM sum of values - SUM(A), SUM(DISTINCT
A) COUNT number of values - COUNT(), COUNT(A),
COUNT(DISTINCT A)
21SQL Queries Aggregate Functions
22SQL Queries Aggregate Functions GROUP BY
How to apply aggregate operations to each of a
number of groups of rows in a relation, where the
number of groups depends on the relation instance
(i.e., is not known in advance)?
Suppose we know that ratings are integers in the
range 1 to 10. Then, the following (ten) queries
are required
But, what if rating values are unknown?
23SQL Queries Aggregate Functions GROUP BY,
HAVING
- Predicates in the HAVING clause are applied
after the formation of groups, whereas predicates
in the WHERE clause are applied before forming
groups - One answer tuple is generated per qualifying
group
24SQL Queries Aggregate Functions GROUP BY,
HAVING
Find the age of the youngest sailor with agegt18
for each rating with at least two such sailors
Eliminating columns
25SQL Queries Aggregate Functions GROUP BY,
HAVING
Sorting to identify the groups
26SQL Queries NULL values
- Tuples may have a null value, denoted by NULL,
for some of their attributes - NULL signifies an unknown value or that a value
does not exist - The predicate IS NULL can be used to check for
null values - Example Find all loan number which appear in the
loan relation with NULL values for amount - SELECT loan_number FROM loan
WHERE amount IS NULL - The result of any arithmetic expression involving
NULL is NULL - Example 5 NULL returns NULL
- However, aggregate functions simply ignore nulls
(see next slides)
27SQL Queries NULL values
- Any comparison with NULL returns unknown
- Example 5 lt NULL or NULL ltgt NULL or NULL
NULL - Three-valued logic using the truth value unknown
- OR (unknown or true) true
- (unknown or false) unknown
(unknown or unknown) unknown - AND (true and unknown) unknown
(false and unknown) false (unknown
and unknown) unknown - NOT (not unknown) unknown
- P is unknown evaluates to true if predicate P
evaluates to unknown - Result of WHERE clause predicate is treated as
false if it evaluates to unknown
28SQL Queries NULL values
- Total all loan amounts
- SELECT SUM (amount ) FROM loan
- This statement ignores NULL amounts
- Result is NULL if there is no non-null amount
- All aggregate operations except COUNT() ignore
tuples with NULL values on the aggregated
attributes
29SQL Queries Nested queries
- SQL provides a mechanism for nesting of
subqueries - A subquery is a SELECT-FROM-WHERE expression,
that is nested within another query i.e., a
WHERE clause can itself contain an SQL query - (compare with the same query,
slide 157) - To find sailors whove not reserved 103, use NOT
IN
30SQL Queries Nested queries
31SQL Queries Views
- In some cases, it is not desirable for all users
to see the entire logical model (i.e., all the
actual relations stored in the database) - Consider a person who needs to know a customers
name, loan number and branch name, but has no
need to see the loan amount this person should
see a relation described, in SQL, by - ( SELECT customer_name, borrower.loan_number,
branch_name - FROM borrower, loan WHERE
borrower.loan_number loan.loan_number ) - A view provides a mechanism to hide certain data
from the view of certain users (virtual
relation) - Any relation that is not of the conceptual model
but is made visible to a user as a virtual
relation is called a view
32SQL Queries Views
- A view is defined using the CREATE VIEW statement
which has the form - CREATE VIEW v AS lt query expression gt,
- where ltquery expressiongt is any legal SQL
expression the view name is represented by v - Once a view is defined, the view name can be used
to refer to the virtual relation that the view
generates - When a view is created, the query expression is
stored in the database the expression is
substituted into queries using the view
33SQL Queries Modification
- branch (branch_name, branch_city, assets)
- customer (customer_name, customer_street,
- customer_city)
- loan (loan_number, branch_name, amount)
- borrower (customer_name, loan_number)
- account (account_number, branch_name, balance)
- depositor (customer_name, account_number)
34SQL Queries Modification - deletion
- Delete all account tuples at the Perryridge
branch - DELETE FROM accountWHERE branch_name
'Perryridge' - Delete all accounts at every branch located in
the city Needham - DELETE FROM accountWHERE branch_name IN
- (SELECT branch_name
FROM branch WHERE branch_city
'Needham')
35SQL Queries Modification - insertion
- Add a new tuple to account
- INSERT INTO accountVALUES ('A-9732',
'Perryridge', 1200) - Or equivalently (recommended) INSERT INTO
account(branch_name, balance, account_number) - VALUES ('Perryridge', 1200, 'A-9732')
- Add a new tuple to account with balance set to
NULL value - INSERT INTO accountVALUES ('A-9732',
'Perryridge', NULL)
36SQL Queries Modification - insertion
- INSERT INTO table1 SELECT FROM table2
- Suppose bank provide a gift for all loan
customers of the Perryridge branch a 200
(units) savings account, let loan numbers be
account numbers for new savings account - INSERT INTO accountSELECT loan_number,
branch_name, 200FROM loanWHERE branch_name
'Perryridge' - INSERT INTO depositorSELECT customer_name,
loan_numberFROM loan, borrowerWHERE branch_name
'Perryridge' AND loan.account_number
borrower.account_number - The SELECT FROM WHERE statement is evaluated
fully before any of its results are inserted
37SQL Queries Modification - updates
- Increase all accounts with balances over 10,000
by 6, all other accounts receive 5 - Two UPDATE statements
- UPDATE account SET balance balance ? 1.06
WHERE balance gt 10000 -
- UPDATE account SET balance balance ?
1.05 WHERE balance ? 10000 - The order (of UPDATEs) is important
38SQL Queries Some remarks
- Typically, biological databases are read-only
after creation/population num.of reads gtgt num.of
writes - Each RDBMS uses its own SQL statements/features
, which are valid in one systems and may not work
in others or work but differently (see slide 161
for a specific example) do read specific manual!
) - In terms of SQL language
- Database creation CREATE TABLE statements
- Database (schema) modification not covered in
this course - Database population (data loading) INSERT
statements - Querying database (data reading) SELECT
statements - Data modification (data writing)
DELETE/INSERT/UPDATE statements
39SQLite
- SQLite is a SQL database engine, which is
- Self-contained
- Serverless
- Zero-configuration
- Transactional
- Most widely deployed database
- Mozilla Firefox, Skype, .
- PDAs, Symbian smartphones, iPhones
- Convenient for educational purposes
- Open-source
- Current version (Linux/Mac/Windows) 3.5.4
- http//www.sqlite.org
- Documentation http//www.sqlite.org/docs.html
- Books
- The Definitive Guide to SQLite by Owens, ISBN-10
1590596730 - SQLite by Newman, ISBN-10 067232685X
40SQLite
- Implements most of SQL92
- SQL features that are not supported
- FOREIGN KEY constraints are parsed but are not
enforced (technically, the equivalent constraint
enforcement can be achieved using triggers) - VIEWs (see CREATE VIEW) are read-only one can
not execute a DELETE, INSERT, or UPDATE statement
on a view - No user access permissions (the only access
permissions that can be applied are the normal
file access permissions of the underlying
operating system) - Also see those specified here http//www.sqlite.or
g/omitted.html
41SQLite Interfaces
- Command-line sqlite3.exe
- GUI SQLiteSpy
- Perl DBIDBDSqlite modules, http//dbi.perl.org
- Other programming languages Java, Python,
- SQLite interface for the R statistical package,
http//cran.r-project.org/src/contrib/Descriptions
/RSQLite.html -
42SQLite Demo
- Use data (database dump) from COGENT (Complete
Genome Tracking Database) (COmplete GENome
Tracking (COGENT) a flexible data environment
for computational genomics by Janssen et al.,
Bioinformatics, 19(11), 2003) http//cgg.ebi.ac.u
k/services/cogent/ - Very simple schema
43SQLite Demo
- See URL
- http//users.utu.fi/denshe/biodb/sqlite_demo.html
44Project Description
- Extracting and preparing data for bulk load
into relational dbms (SQLite) (before Feb 4) - INSERT statements
- Decide on sources, on what and how to extract,
then extract and convert to a set of INSERTs - More information will be given on the course page
45-
- References
- Any introduction to SQL, e.g.
- http//www.w3schools.com/sql/sql_intro.asp
- http//sqlzoo.net
- http//www.1keydata.com/sql/sql.html