Title: The Structured Query Language
1The Structured Query Language
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- November 15, 2009
Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2Administrivia
- Homework 2 handed out today
- Homework 1 is NOW DUE
3Recall Basic SQL
- SELECT DISTINCT T1.attrib, , T2.attribFROM
relation T1, relation T2, WHERE
predicates - And also
- UNION/UNION ALL, EXCEPT/EXCEPT ALL
- IN / NOT IN
select-list
from-list
qualification
4Our Example Data Instance
STUDENT
COURSE
Takes
PROFESSOR
Teaches
5Correlated Subqueries
- Most common EXISTS/NOT EXISTS
- Find all students who have taken DB but not AI
6Universal and Existential Quantification
- Generally used with subqueries
- op ANY, op ALL
- Find the students with the best expected grades
7Table Expressions
- Can substitute a subquery for any relation in the
FROM clause - SELECT S.sidFROM (SELECT sid FROM STUDENT WHERE
sid 5) SWHERE S.sid 4 - Notice that we can actually simplify this query!
- What is this equivalent to?
8Aggregation
- GROUP BY
- SELECT group-attribs, aggregate-operator(attr
ib)FROM relation T1, relation T2, WHERE
predicatesGROUP BY group-list - Aggregate operators
- AVG, COUNT, SUM, MAX, MIN
- DISTINCT keyword for AVG, COUNT, SUM
9Some Examples
- Number of students in each course offering
- Number of different grades expected for each
course offering - Number of (distinct) students taking AI courses
10Our Example Data Instance Again
STUDENT
COURSE
Takes
PROFESSOR
Teaches
11What If You Want to Only ShowSome Groups?
- The HAVING clause lets you do a selection based
on an aggregate (there must be 1 value per
group) - SELECT C.subj, COUNT(S.sid)FROM STUDENT S,
Takes T, COURSE CWHERE S.sid T.sid AND T.cid
C.cidGROUP BY subjHAVING COUNT(S.sid) gt 5 - Exercise For each subject taught by at least
two professors, list the minimum expected grade
12Aggregation and Table Expressions(aka Derived
Relations)
- Sometimes need to compute results over the
results of a previous aggregationSELECT subj,
AVG(size)FROM ( SELECT C.cid AS id, C.subj AS
subj, COUNT(S.sid) AS size FROM STUDENT S,
Takes T, COURSE C WHERE S.sid T.sid AND T.cid
C.cid GROUP BY cid, subj)GROUP BY subj
13Thought Exercise
- Tables are great, but
- Not everyone is uniform I may have a cell phone
but not a fax - We may simply be missing certain information
- We may be unsure about values
- How do we handle these things?
14One Answer Null Values
- We designate a special null value to represent
unknown or N/A -
- But a question what doesdo?
SELECT FROM CONTACT WHERE Fax lt 789-1111
15Three-State Logic
- Need ways to evaluate boolean expressions and
have the result be unknown (or T/F) - Need ways of composing these three-state
expressions using AND, OR, NOT - Can also test for null-ness attr IS NULL, attr
IS NOT NULL - Finally need rules for arithmetic, aggregation
T OR U TF OR U UU OR U U
T AND U UF AND U FU AND U U
NOT U U
16Nulls and Joins
- Sometimes need special variations of joins
- I want to see all courses and their students
- But what if theres a course with no students?
- Outer join
- Most common is left outer join
- SELECT C.subj, C.cid, T.sid FROM COURSE C LEFT
OUTER JOIN Takes T ON C.cid T.cidWHERE
17Data Instance Again (!)
STUDENT
COURSE
Takes
PROFESSOR
Teaches
18Warning on Outer Join
- Oracle doesnt support standard SQL syntax
hereSELECT C.subj, C.cid, T.sid FROM COURSE C
, Takes T WHERE C.cid () T.cid
19Beyond Null
- Can have much more complex ideas of incomplete or
approximate information - Probabilistic models (tuple 80 likely to be an
answer) - Naïve tables (can have variables instead of
NULLs) - Conditional tables (tuple IF some condition
holds) - And what if you want 0 or more?
- In relational databases, create a new table and
foreign key - But can have semistructured data (like XML)
20Modifying the DatabaseInserting Data
- Inserting a new literal tuple is easy, if
wordyINSERT INTO PROFESSOR (fid, name)VALUES
(4, Simpson) - But we can also insert the results of a
query!INSERT INTO PROFESSOR (fid, name)
SELECT sid AS fid, name FROM STUDENT WHERE
sid lt 20
21Deleting Tuples
- Deletion is a fairly simple operationDELETEFRO
M STUDENT SWHERE S.sid lt 25
22Updating Tuples
- What kinds of updates might you want to
do?UPDATE STUDENT SSET S.sid 1 S.sid,
S.name JanetWHERE S.name Jane
23Now, How Do I Talk to the DB?
- Generally, apps are in a different (host)
language with embedded SQL statements - Static (query fixed) SQLJ, embedded SQL in C
- Dynamic (query generated by program at runtime)
ODBC, JDBC, ADO, OLE DB, - Predefined mappings between SQL types and host
language types - CHAR, VARCHAR ? String
- INTEGER ? int
- DOUBLE ? double
24Static SQL using SQLJ
int sid 5 String name5 "Jim", name5 //
Database connection setup omitted sql INSERT
INTO STUDENT VALUES(sid, name) sql
SELECT name INTO name6 FROM STUDENT WHERE
sid 6
25JDBC Dynamic SQL
- import java.sql.
- Connection conn DriverManager.getConnection()
- Statement s conn.createStatement()
- int sid 5
- String name "Jim"
- s.executeUpdate("INSERT INTO STUDENT VALUES("
sid ", '" name "')") - // or equivalently
- s.executeUpdate(" INSERT INTO STUDENT VALUES(5,
'Jim')")
26Static vs. Dynamic SQL
- Syntax
- Static is cleaner that Dynamic
- Dynamic doesnt extend language syntax, so you
can use any tool you like - Execution
- Static must be precompiled
- Can be faster at runtime
- Extra step is needed to deploy application
- Static checks SQL syntax at compilation time,
Dynamic at run time - Well focus on JDBC, since its easy to use
27The Impedance Mismatch and Cursors
- SQL is set-oriented it returns relations
- Theres no relation type in most languages!
- Solution cursor thats opened, read
- ResultSet rs stmt.executeQuery("SELECT FROM
STUDENT") - while (rs.next())
- int sid rs.getInt("sid")
- String name rs.getString("name")
- System.out.println(sid " " name)
28JDBC Prepared Statements (1)
- But query compilation takes a (relatively) long
time! - This example is therefore inefficient.
int students 1, 2, 4, 7, 9for (int i 0
i lt students.length i) ResultSet rs
stmt.executeQuery("SELECT " "FROM STUDENT
WHERE sid " studentsi) while (rs.next())
29JDBC Prepared Statements (2)
- To speed things up, prepare statements and bind
arguments to them - This also means you dont have to worry about
escaping strings, formatting dates, etc. - Problems with this lead to a lot of security
holes (SQL injection) - Or suppose a user inputs the name OReilly
PreparedStatement stmt conn.prepareStatement("SE
LECT " "FROM STUDENT WHERE sid ? ")int
students 1, 2, 4, 7, 9for (int i 0 i lt
students.length i) stmt.setInt(1,
studentsi) ResultSet rs stmt.executeQuery()
while (rs.next())
30Language Integrated Querying (LINQ) MS .Net
Languages, e.g., C
- Represent a table as a collection (e.g., a list)
- Integrate SQL-style select-from-where and allow
for iterators - List products GetProductList()
- var expensiveInStockProducts
- from p in products
- where p.UnitsInStock gt 0 p.UnitPrice gt
3.00M - select p
- Console.WriteLine("In-stock products costing gt
3.00") - foreach (var product in expensiveInStockProduc
ts) - Console.WriteLine("0 in stock and costs
gt 3.00.", product.ProductName) -
31Database-Backed Web Sites
- We all know traditional static HTML web sites
Web-Browser
Web-Server
HTTP-Request GET ...
32Common Gateway Interface (CGI)
- Can have the web server invoke code (with
parameters) to generate HTML
Web Server
Web Server
File-System
HTTP-Request
Load File
HTML
HTML?
HTML-File
File
33CGI Discussion
- Advantages
- Standardized works for every web-server, browser
- Flexible Any language (C, Perl, Java, ) can
be used - Disadvantages
- Statelessness query-by-query approach
- Inefficient new process forked for every request
- Security CGI programmer is responsible for
security - Updates To update layout, one has to be a
programmer
34DB Access in Java
BrowserJVM
Java Applet
TCP/UDP IP
Java-Server-Process
JDBC Driver manager
JDBC-Driver
JDBC-Driver
JDBC-Driver
Sybase
Oracle
...
35Java Applets Discussion
- Advantages
- Can take advantage of client processing
- Platform independent assuming standard Java
- Disadvantages
- Requires JVM on client self-contained
- Inefficient loading can take a long time ...
- Resource intensive Client needs to be state of
the art - Restrictive can only connect to server where
applet was loaded from (for security can be
configured)
36SP Server Pages and Servlets(IIS, Tomcat, )
Web Server
Web Server
HTTP Request
File-System
Load File
HTML
HTML?
File
HTML File
May have a built- in VM (JVM, CLR)
37One Step Beyond Content Management Systems
(Strudel, Cocoon, )
Web Server
DB-Driven Web Server
HTTP Request
Styles
Cache
HTML File
HTML
LocalDatabase
Script?
DynamicHTMLGeneration
Data
Other datasources
38Wrapping Up
- Weve seen how to query in SQL
- Basic foundation is TRC-based
- Subqueries and aggregation add extra power beyond
RC - Nulls and outer joins add flexibility of
representation - We can update tables
- Weve also seen that SQL doesnt precisely match
standard host language semantics - Embedded SQL
- Dynamic SQL
- Weve seen a hint of data-driven web site
architectures