Title: Functional Dependency Graphs and SQL
1Functional Dependency Graphs and SQL
CS157A
Lecture 12
- Prof. Sin-Min Lee
- Department of Computer Science
- San Jose State University
2Data Normalization
- Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary duplication of
data. - The process of decomposing relations with
anomalies to produce smaller, well-structured
relations. - Primary Objective Reduce Redundancy,Reduce
nulls, - Improve modify activities
- insert,
- update,
- delete,
- but not read
- Price degraded query, display, reporting
3Functional Dependency and Keys
- Functional Dependency The value of one attribute
(the determinant) determines the value of another
attribute. - Candidate Key Each non-key field is functionally
dependent on every candidate key.
4Functional dependency
- a constraint between two attributes (columns) or
two sets of columns - A ? B if for every valid instance of A, that
value of A uniquely determines the value of B - or A ?B if there exists at most one value of B
for every value of A
5(No Transcript)
6Functional Dependencies
R
X Y Z
- FDs defined over two sets of attributes X, Y
Ì R - Notation X à Y reads as X determines Y
- If X à Y, then all tuples that agree on X must
also agree on Y
1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
7Functional Dependencies Graph(example)
X Y Z
X Y Z
1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
8(No Transcript)
9 functional dependency
- some examples
- SSN ? Name, Address, Birthdate
- VIN ? Make, Model, Color
- note the LHS is the determinant
- so functional dependency is the technical term
for determines
10Candidate Keys
- an attribute (or set of attributes) that uniquely
identifies a row - primary key is a special candidate key
- values cannot be null
- e.g.
- ENROLL (Student_ID, Name, Address, )
- PK Student_ID
- candidate key Name, Address
11 candidate key
- a candidate key must satisfy
- unique identification.
- implies that each nonkey attribute is
functionally dependent on the key (for not(A ? B)
to be true, A must occur more than once (with a
different B), or A must map to more than one B in
a given row) - nonredundancy
- no attribute in the key can be deleted and still
be unique - minimal set of columns (Simsion)
12keys and dependencies
EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary)
determinant
Emp_ID Name Dept_Name Salary
functional dependency
13EMPLOYEE2 (Emp_ID, Course_Title, Name,
Dept_Name, Salary, Date_Completed)
Emp_ID Course_ Title Name Dept_ Name Salary Date_Comp.
not fully functionally dependant on the primary
key
14determinants candidate keys
- candidate key is always a determinant (one way to
find a determinant) - determinant may or may not be a candidate key
- ? candidate key is a determinant that uniquely
identifies the remaining (nonkey) attributes - determinant may be
- a candidate key
- part of a composite candidate key
- nonkey attribute
15Introduction
- Data integrity maintained by various constraints
on data - Functional dependencies are application
constraints that help DB model real-world entity - Join dependencies are a further constraint that
help resolve some FD constraint limitations
16(No Transcript)
17(No Transcript)
18(No Transcript)
19(No Transcript)
20(No Transcript)
21What is SQL?
- It is a language used to communicate with a
database. - SQL statements are used to perform tasks such as
update or retrieve data from a database. - Standard SQL commands Select, Create,
Insert, Update, Delete, and Drop can be
used to accomplish almost everything that a
database needs to do.
22History
- Originally developed by IBM at San Jose Research
Laboratory (a.k.a. Almaden Research Center) - Oracle, Sybase, Microsoft SQL server are some
common relational database management systems
that use SQL
23Table Basics
- Data or information for the database are stored
in tables. - Tables are uniquely identified by their names and
are comprised of columns and rows. - Rows contain data for the columns.
City State High Low
Phoenix Arizona 105 90
Tucson Arizona 101 92
Flagstaff Arizona 88 69
San Diego California 77 60
San Jose California 80 65
24(No Transcript)
25Basic Structure
- The basic structure of an SQL expression consists
of three clauses select, from, and where. - The select clause corresponds to the projection
operation of the relational algebra. It is used
to list attributes desired in the result. - The from clause corresponds to the
Cartesian-product operation of the relational
algebra. It lists the relations to be scanned in
the evaluation of the expression. - The where clause corresponds to the selection
predicate of the relational algebra. It consists
of a predicate involving attributes of the
relations that appear in the from clause.
26Selecting Data
- The Select statement is used to query the
database and retrieve selected data that match
the criteria that you specify - Example
- select column1
- ,column2, etc
- from tablename
- where condition
- optional continued
27Selecting Data cont
- The column names that follow the select keyword
determine which columns will be returned in the
results. - The table name that follows the keyword from
specifies the table that will be queried to
retrieve the desired results. - cont
- Example
- select column1
- ,column2, etc
- from tablename
- where condition
- optional
28Selecting Data cont
- The where clause is optional and specifies which
data values or rows will be returned or
displayed, based on the criteria described on the
condition. - Conditional selections used in the where clause
, gt, lt, gt, lt, and ltgt (not equal to).
- Example
- select column1
- ,column2, etc
- from tablename
- where condition
- optional
29The where Clause
- Problem Find all loan numbers for loans made at
the Perryridge branch with loan amounts greater
than 1200.
- Answer
- select loan-number
- from loan
- where branch-name Perryridge and amount gt
1200 - SQL uses the logical connectives and, or, and
not rather than the mathematical symbols in the
where clause.
30The from Clause
- The from clause, by itself, defines a Cartesian
product of the relations in the clause. - Problem For all customers who have a loan from
the bank, find their names, loan numbers, and
loan amount.
- Answer
- select customer-name, borrower.loan- number,
amount - from borrower, loan
- where borrower.loan- number loan.loan- number
31Creating Tables
- Example of simple create table statement
- create table tablename
- (column1 data type,
- column2 data type,
- column3 data type)
- The create table statement is used to create a
new table. - The data types specify what the type of data can
be for that particular column.
32Most Common Data Types
- char (size) Fixed length character string. Size
is specified in parenthesis. Max 255 bytes. - varchar (size) Variable-length character
string. Max size is specified in parenthesis. - number (size) Number value with a max number of
column digits specified in parenthesis. - date Date value
- number (size, d) Number value with a max number
of digits of size total, with a max number of
d digits to the right of the decimal.
33Creating Tables cont
- Example
- Problem Create a table for a new company that
contains the following information about your new
employees first name, last name, title, age, and
salary.
- Answer
- Create table myemployees
- (firstname varchar(30),
- lastname varchar(30),
- title varchar (30),
- age number(2),
- salary number(8, 2))
34Inserting into a Table
- Example 1
- Insert into tablename
- (first_column, last_column)
- values (first_value,last_value)
- Example 2
- Insert into employee
- (first, last, age, address, city, state)
- values (Luke, Duke, 45, 2130 Boars Nest,
Hazard Co., Georgia)
- The insert statement is used to insert or add a
row of data into the table. - Strings should be enclosed in single quotes, and
numbers should not.
35Updating Records
- The update statement is used to update or change
records that match a specified criteria. - This is accomplished by carefully constructing a
where clause. - Example 1
- update phone_book
- set area_code 623
- where prefix 979
- Example 2
- update tablename
- set columnname newvalue
- ,nextcolumn newvalue2
- where columnname
- OPERATOR value
- and or column
- OPERATOR value
- optional
36Deleting Records
- The delete statement is used to delete records or
rows from the table. - To delete an entire row/record, enter delete
from followed by the table name, the where
clause. - If you leave off the where clause, it will
delete all records.
- Example
- delete from tablename
- where columnname
- OPERATOR value
- and or column
- OPERATOR value
- optional
37Drop a Table
- The drop table command is used to delete a table
and all rows in the table. - drop table is different from deleting all of the
records in the table. Dropping the table removes
the table definition as well as all of its rows.
- Example
- Drop table tablename
38Ordering the Display of Tuples
- Example To list in alphabetic order all
customers who have a loan at the Perryridge
branch - select distinct customer-name
- from borrower, loan
- where borrower.loan-number loan.loan-number
and branch-name Perryridge - order by customer-name
- The order by clause causes the tuples in the
result of a query to appear in sorted order. - By default, the order by clause lists items in
ascending order.
39Aggregates
- Functions that operate on sets
- COUNT, SUM, AVG, MAX, MIN
- Produce numbers (not tables)
- Not part of relational algebra
SELECT COUNT() FROM Professor P
SELECT MAX (Salary) FROM Employee E
40Aggregates
Count the number of courses taught in S2000
SELECT COUNT (T.CrsCode) FROM Teaching T WHERE
T.Semester S2000
But if multiple sections of same course are
taught, use
SELECT COUNT (DISTINCT T.CrsCode) FROM Teaching
T WHERE T.Semester S2000
41Aggregates Proper and Improper Usage
SELECT COUNT (T.CrsCode), T. ProfId
makes no sense (in the absence of
GROUP BY clause)
SELECT COUNT (), AVG (T.Grade)
but this is OK
WHERE T.Grade gt COUNT (SELECT .)
aggregate cannot be applied to result
of SELECT statement
42Grouping
- But how do we compute the number of courses
taught in S2000 per professor? - Strategy 1 Fire off a separate query for each
professor - SELECT COUNT(T.CrsCode)
- FROM Teaching T
- WHERE T.Semester S2000 AND T.ProfId
123456789 - Cumbersome
- What if the number of professors changes? Add
another query? - Strategy 2 define a special grouping operator
- SELECT T.ProfId, COUNT(T.CrsCode)
- FROM Teaching T
- WHERE T.Semester S2000
- GROUP BY T.ProfId
43GROUP BY
44GROUP BY - Example
Transcript
Attributes -students Id -avg grade
-number of courses
1234 1234 1234 1234
1234 3.3 4
SELECT T.StudId, AVG(T.Grade), COUNT () FROM
Transcript T GROUP BY T.StudId
45HAVING Clause
- Eliminates unwanted groups (analogous to WHERE
clause) - HAVING condition constructed from attributes of
GROUP BY list and aggregates of attributes not in
list
SELECT T.StudId, AVG(T.Grade) AS CumGpa,
COUNT () AS NumCrs FROM
Transcript T WHERE T.CrsCode LIKE CS GROUP
BY T.StudId HAVING AVG (T.Grade) gt 3.5
46Evaluation of GroupBy with Having
47Example
- Output the name and address of all seniors on the
Deans List
SELECT S.Id, S.Name FROM Student S,
Transcript T WHERE S.Id T.StudId AND
S.Status senior GROUP BY HAVING AVG
(T.Grade) gt 3.5 AND SUM (T.Credit) gt 90
S.Id -- wrong S.Id, S.Name --
right
Every attribute that occurs in SELECT clause must
also occur in GROUP BY or it must be an
aggregate. S.Name does not.
48ORDER BY Clause
- Causes rows to be output in a specified order
SELECT T.StudId, COUNT () AS NumCrs,
AVG(T.Grade) AS CumGpa FROM Transcript
T WHERE T.CrsCode LIKE CS GROUP BY
T.StudId HAVING AVG (T.Grade) gt 3.5 ORDER BY
DESC CumGpa, ASC StudId
49Query Evaluation Strategy
- Evaluate FROM produces Cartesian product, A, of
tables in FROM list - Evaluate WHERE produces table, B, consisting of
rows of A that satisfy WHERE condition - Evaluate GROUP BY partitions B into groups that
agree on attribute values in GROUP BY list - Evaluate HAVING eliminates groups in B that do
not satisfy HAVING condition - Evaluate SELECT produces table C containing a
row for each group. Attributes in SELECT list
limited to those in GROUP BY list and aggregates
over group - Evaluate ORDER BY orders rows of C
50Nested Queries
List all courses that were not taught in S2000
SELECT C.CrsName FROM Course C WHERE C.CrsCode
NOT IN (SELECT T.CrsCode --subquery
FROM Teaching T WHERE T.Sem
S2000)
Evaluation strategy subquery evaluated once
to produces set of courses taught in S2000.
Each row (as C) tested against this set.
51Correlated Nested Queries
Output a row ltprof, deptgt if prof has taught a
course in dept.
SELECT P.Name, D.Name --outer
query FROM Professor P, Department D WHERE
P.Id IN (set of Ids of all profs who
have taught a course in
D.DeptId)
SELECT T.ProfId
--subquery FROM Teaching T, Course C WHERE
T.CrsCodeC.CrsCode AND
C.DeptIdD.DeptId --correlation
52Correlated Nested Queries (cont)
- Tuple variables T and C are local to subquery
- Tuple variables P and D are global to subquery
- Correlation subquery uses a global variable, D
- The value of D.DeptId parameterizes an
evaluation of the subquery - Subquery must (at least) be re-evaluated for
each distinct value of D.DeptId - Correlated queries can be expensive to evaluate
53Division SQL Solution
SELECT P.Id FROM Professor P WHERE NOT EXISTS
(SELECT D.DeptId -- set B of all dept
Ids FROM Department D EXCEPT
SELECT C.DeptId -- set A of dept
Ids of depts in
-- which P has taught a course
FROM Teaching T, Course C WHERE
T.ProfIdP.Id -- global variable
AND T.CrsCodeC.CrsCode)
54Division
- Query type Find the subset of items in one set
that are related to all items in another set - Example Find professors who have taught courses
in all departments - Why does this involve division?
ProfId DeptId
DeptId
Contains row ltp,dgt if professor p has taught
a course in department d
All department Ids
55Division
- Strategy for implementing division in SQL
- Find set of all departments in which a particular
professor, p, has taught a course - A - Find set of all departments - B
- Output p if A ? B, or equivalently if B-A is
empty
56Division SQL Solution
SELECT P.Id FROM Professor P WHERE NOT EXISTS
(SELECT D.DeptId -- B set of all dept
Ids FROM Department D EXCEPT
SELECT C.DeptId -- A set of dept
Ids of depts in
-- which P has taught a course
FROM Teaching T, Course C WHERE
T.ProfIdP.Id --global variable
AND T.CrsCodeC.CrsCode)
57GROUP BY
Table output by WHERE clause - Divide
rows into groups based on subset of
attributes - All members of a group
agree on those attributes
a a b b c c c c c d d d
group
Each group can be described by a single row in a
table with attributes limited to -Attributes
all group members share (listed in GROUP
BY clause) -Aggregates over group
GROUP BY attributes
58GROUP BY - Example
Transcript
Attributes -students Id -avg grade
-number of courses
1234 1234 1234 1234
1234 3.3 4
SELECT T.StudId, AVG(T.Grade), COUNT () FROM
Transcript T GROUP BY T.StudId
59HAVING Clause
- Eliminates unwanted groups (analogous to WHERE
clause) - HAVING condition constructed from attributes of
GROUP BY list and aggregates of attributes not in
list
SELECT T.StudId, AVG(T.Grade) AS CumGpa,
COUNT () AS NumCrs FROM Transcript
T WHERE T.CrsCode LIKE CS GROUP BY
T.StudId HAVING AVG (T.Grade) gt 3.5
60Example
- Output the name and address of all seniors on the
Deans List
SELECT S.Name, S.Address FROM Student S,
Transcript T WHERE S.StudId T.StudId AND
S.Status senior GROUP BY HAVING AVG
(T.Grade) gt 3.5 AND SUM (T.Credit) gt 90
S.StudId -- wrong S.Name,
S.Address -- right
61SQL and Relational Algebra
- RELATIONAL ALGEBRA SQL
- PSEUDO-CODE
-
- 1) R customers where city 'Dallas' SELECT
from customers where city 'Dallas' -
- For c1 first row of customers until c1 last
row of customers - If c1.city Dallas
- Display c1.
- End-If
- End-For
-
-
- 2) R customers cid, cname SELECT
cid, cname from customers -
- For c1 first row of customers until c1 last
row of customers - Display c1.cid, c1.cname
- End-For
62Constructing SQL
63(No Transcript)
64(No Transcript)
65Aggregate Functions
- Aggregate Functions are functions that take a
collection (a set or multiset) of values as input
and return a single value. - SQL offers five built-in aggregate functions
- Average avg
- Minimum min
- Maximum max
- Total sum
- Count count
66Embedded SQL
- The SQL standard defines embeddings of SQL in a
variety of programming languages such as C,
Cobol, Java, Pascal, etc. - A language in which SQL queries are embedded is
referred to as a host language, and the SQL
structures permitted in the host language
constitute embedded SQL.
67Embedded SQL cont
- Two important reasons why a programmer must have
access to a database from a general purpose
programming language - 1. Not all queries can be expressed in SQL,
since SQL does not provide the full expressive
power of a general-purpose language. - 2. Nondeclarative actions such as printing a
report, interacting with a user, or sending the
results of a query to a graphical user interface
cannot be done from within SQL.
68Dynamic SQL
- The dynamic SQL component of SQL allows programs
to construct and submit SQL queries at run-time. - Using dynamic SQL, programs can create SQL
queries as strings at run time (perhaps based on
input from the user) and can either have them
executed immediately or have them prepared for
subsequent use.
69Relational Algebra and SQL
Read the following
http//www.cs.ru.nl/gerp/IS0/sheets/IS0_Relatione
le_Algebra_SQL2.pdf