Title: Subqueries Example
1Subqueries Example
Find the name of the producer of Star Wars.
Movie(title, year, length, inColor, studioName,
producerC) MovieExec(name, address, cert,
netWorth) We can do SELECT name FROM
Movie, MovieExec WHERE title 'Star Wars'
AND producerC cert Or we can have a
subquery SELECT name FROM MovieExec
WHERE cert (SELECT producerC
FROM Movie WHERE title 'Star Wars')
If we can deduce that there will be only a single
value produced by the subquery, then we can use
this expression, surrounded by parentheses, as if
it were a constant.
2Conditions Involving Relations
There are a number of SQL operators that we can
apply to a relation R and produce a Boolean
result. Typically used in the WHERE clause. 1.
EXISTS R is a condition that is true if R is not
empty. 2. s IN R is true if s is equal to one of
the tuples in R. Likewise, s NOT IN R is true if
and only if s is equal to no tuple in R.
3Example
Give all the producers of movies in which Julia
Roberts stars. SELECT name
FROM MovieExec WHERE cert IN (SELECT
producerC FROM Movie WHERE (title,
year) IN (SELECT movieTitle, movieYear
FROM StarsIn WHERE starName 'Julia
Roberts'))
4Remark
The previous nested query can, like many nested
queries, be written as a single SELECT-FROM-WHERE
expression. SELECT name FROM MovieExec,
Movie, StarsIn WHERE cert producerC AND
title movieTitle AND year movie Year
AND starName 'Julia Roberts'
5(Continued)
3. s gt ALL R is true if s is greater than every
value in the unary (one column) relation R.
Similarly, the gt operator could be replaced by
any other comparison operators with the analogous
meaning. For instance, s ltgt ALL R is the same as
s NOT IN R. 4. s gt ANY R is true if s is greater
than at least one value in unary relation R.
Similarly we can use any other comparison
operators in place of gt. For instance, s ANY R
is the same as s IN R. EXISTS, ALL, and ANY
operators can be negated by putting NOT in front
of the entire expression.
6Bag Semantics and Union, Intersection and
Difference
- Although the SELECT-FROM-WHERE statement uses bag
semantics, the default for union, intersection,
and difference is set semantics. - Motivation?
- When doing projection in relational algebra, it
is easier to avoid eliminating duplicates. Just
work tuple-at-a-time. - When doing intersection or difference, it is most
efficient to sort the relations first. At that
point you may as well eliminate the duplicates
anyway.
7Controlling Duplicate Elimination
- Force the result to be a set by
- SELECT DISTINCT . . .
- Force the result to be a bag (i.e., dont
eliminate duplicates) by ALL, as in - UNION ALL . . .
- Only UNION ALL supported in ORACLE.
- Example
- Find all the different studios producing movies
- Movie(title, year, length, inColor, studioName,
producerC), -
- SELECT DISTINCT studioname
- FROM Movie
- Notice that without DISTINCT, a studioname would
be listed as many times as there were movies from
that studio.
8Aggregations
- SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to produce that
aggregation on the column. - Example
- Find the average length of movies from Disney.
- SELECT AVG(length)
- FROM Movie
- WHERE studioName 'Disney'
- Remark
- We can also use COUNT() which counts the number
of tuples in the relation constructed from the
FROM and WHERE clauses of the query.
9Eliminating Duplicates in an Aggregation
- DISTINCT inside an aggregation causes duplicates
to be eliminated before the aggregation. - Example
- Find the number of different producers for Disney
movies. - SELECT COUNT(DISTINCT producerc)
- FROM Movie
- WHERE studioname 'Disney'
- This is not the same as
- SELECT DISTINCT COUNT(producerc)
- FROM Movie
- WHERE studioname 'Disney'
DISTINCT here is useless! Why?
10Not only in COUNT
- SELECT AVG(DISTINCT length)
- FROM Movie
- WHERE studioname 'Disney'
- This will produce the average of only the
distinct values for length.
11NULLs Ignored in Aggregation
- NULL never contributes to a sum, average, or
count, and can never be the minimum or maximum of
a column. - SELECT SUM(networth)
- FROM moviestar NATURAL FULL OUTER JOIN movieexec
- But if there are no non-NULL values in a column,
then the result of the aggregation is NULL.
12Example Effect of NULLs
- SELECT count()
- FROM Movie
- WHERE studioName 'Disney'
- SELECT count(length)
- FROM Movie
- WHERE studioName 'Disney'
The number of movies from Disney.
13Grouping
- We may follow a SELECT-FROM-WHERE expression by
GROUP BY and a list of attributes. - The relation that results from the
SELECT-FROM-WHERE - is grouped according to the values of all the
listed attributes in GROUP BY, and - any aggregation is applied only within each
group. - Example
- From the Movie relation, find the average length
for each studio. - SELECT studioName, AVG(length)
- FROM Movie
- GROUP BY studioName
14Another Example
- From Movie and MovieExec, find the producers
total length of film produced -
- SELECT name, SUM(length)
- FROM Movie, MovieExec
- WHERE producerc cert
- GROUP BY name
15Restriction on SELECT Lists With Aggregation
- If any aggregation is used, then each element of
the SELECT list must be either - Aggregated, or
- An attribute on the GROUP BY list.
16Illegal Query Example
- We might think we could find the shortest movie
of Disney as - SELECT title, MIN(length)
- FROM Movie
- WHERE studioName 'Disney'
- But this query is illegal in SQL.
- Because title is neither aggregated nor on the
GROUP BY list. - We should do instead
- SELECT title, length
- FROM Movie
- WHERE studioName 'Disney' AND length
- (SELECT MIN(length)
- FROM Movie
- WHERE studioName 'Disney')
17Or
- SELECT title, length
- FROM Movie NATURAL JOIN
- (SELECT MIN(length) AS length
- FROM Movie
- WHERE studioName 'Disney')
- WHERE studioName 'Disney'
resembling Relational Algebra.
18HAVING Clauses
- HAVING ltconditiongt may follow a GROUP BY clause.
- If so, the condition applies to each group, and
groups not satisfying the condition are
eliminated. - Example
- Consider again the query
- SELECT name, SUM(length)
- FROM Movie, MovieExec
- WHERE producerc cert
- GROUP BY name
- Suppose we didnt wish to include all the
producers in our table of aggregated lengths. We
want those producers - with networth less than 1,000,000, and
- that have at least one movie before 1973.
- Solution
- SELECT name, SUM(length)
- FROM MovieExec, Movie
- WHERE producerc cert AND networthlt1000000
- GROUP BY name
- HAVING MIN(year) lt 1973
19Requirements on HAVING Conditions
- These conditions may refer to any relation in the
FROM clause. - They may refer to attributes of those relations,
as long as the attribute makes sense within a
group i.e., it is either - A grouping attribute, or
- Aggregated attribute.
20Having is a special kind of ?
- The previous query can also be written as
- SELECT name, sumLength
- FROM (
- SELECT name, MIN(year) AS minYear, SUM(length) AS
sumLength - FROM MovieExec, Movie
- WHERE producerc cert AND networth lt 1000000
- GROUP BY name)
- WHERE minYear lt 1973
21Correlated Subqueries
- Suppose StarsIn relation has an additional
attribute salary - StarsIn(movieTitle, movieYear, starName,
salary) - Now, find the stars who were paid for some movie
more than the average salary for that movie. - SELECT starName, movieTitle, movieYear
- FROM StarsIn X
- WHERE salary gt
- (SELECT AVG(salary)
- FROM StarsIn
- WHERE movieTitle X.movieTitle
- AND movieYearX.movieYear)
- Remarks
- Outer query cannot reference any columns in the
subquery. - Subquery references the tuple in the outer query.
- Value of the tuple changes by row of the outer
query, so the database must rerun the subquery
for each row comparison.
22Another Solution (Nesting in FROM)
- SELECT X.starName, X.movieTitle, X.movieYear
- FROM StarsIn X, (SELECT movieTitle, movieYear,
AVG(salary) AS avgSalary - FROM StarsIn
- GROUP BY movieTitle, movieYear) Y
- WHERE X.salarygtY.avgSalary AND
- X.movieTitleY.movieTitle AND
X.movieYearY.movieYear
23Exercise
- Product(maker, model, type)
- PC(model, speed, ram, hd, rd, price)
- Laptop(model, speed, ram, hd, screen, price)
- Printer(model, color, type, price)
- Find those manufacturers that sell Laptops, but
not PC's. - Find those hard-disk sizes that occur in two or
more PC's. - Find those manufacturers of at least two
different computers (PC or Laptops) with speed of
at least 700. - Find the manufacturers who sell exactly three
different models of PC.