Title: CMSC424: Review
1CMSC424 Review
2Database Management Systems
- Manage data
- Store data
- Update data
- Answer questions about the data
3What kind of data ?
- Enterprise data
- Banking
- Supermarkets, Sales
- Airlines
- Universities
- Manufacturing
- Human resources
- More recent
- Semi-structured Data (XML)
- Scientific data
- Biological data
- Sensor network data etc etc
4Naïve solutions
- Dont offer
- Consistency
- Atomicity, durability etc
- Concurrency
- Declarative data retrieval
- Control of redundancy
- Dynamic data evolution
5DBMS
- Database Management Systems provide
- Data abstraction
- Key in evolving systems
- Probably the most important purpose of a DBMS
- Goal Hiding low-level details from the users of
the system - Guarantees about data integrity
- In presence of concurrent access, failures
- Speed !!
6Data Abstraction
What data users and application programs see ?
Logical Level
What data is stored ? describe data
properties such as data semantics, data
relationships
Physical Level
How data is actually stored ? e.g. are we
using disks ? Which file system ?
7DBMS at a Glance
- Data Modeling
- Data Retrieval
- Data Storage
- Data Integrity
8Data Modeling
- A data model is a collection of concepts for
describing data properties and domain knowledge - Data relationships
- Data semantics
- Data constraints
- We discussed two models
- Entity-relationship Model
- Diagrammatic representation
- Easier to work with
- Syntax not important, but remember the meaning
- Remember what you can model
- Relational Model
- Only one abstract concept
- Closer to the physical representation on disk
- Normalization
9Data Retrieval
- Query Declarative data retrieval program
- describes what data to acquire, not how to
acquire it - Non-declarative
- scan the accounts file
- look for number 55 in the 2nd field
- subtract 50 from the 3rd field
- Declarative (posed against the tables
abstraction) - update accounts
- set balance balance - 50
- where acct_no 55
- Why ?
- Easier to write
- More efficient to execute
- Database system can decide how to execute it
10Data Storage
- Where and how to store data ?
- Main memory ?
- What if the database larger than memory size ?
- Disks
- We discussed properties of disks
- RAID
- How to move data between memory and disk ?
- Buffer Management
- LRU, MRU, Clock
- Indexes
- Closely tied to data retrieval
- B-trees, Hashing
11Data Integrity
- Manage concurrency and crashes
- Transaction A sequence of database actions
enclosed within special tags - Properties
- Atomicity Entire transaction or nothing
- Consistency Transaction, executed completely,
take database from one consistent state to
another - Isolation Concurrent transactions appear to run
in isolation - Durability Effects of committed transactions are
not lost - Consistency Transaction programmer needs to
guarantee that - DBMS can do a few things, e.g., enforce
constraints on the data - Rest DBMS guarantees
- Havent covered in class yet
12Data Integrity
- Semantic constraints
- Typically specified at the logical level
- E.g. balance gt 0
- Assert statements
- Functional dependencies
- kinda
13DBMS at a glance
- Data Models
- Conceptual representation of the data
- Data Retrieval
- How to ask questions of the database
- How to answer those questions
- Data Storage
- How/where to store data, how to access it
- Data Integrity
- Manage crashes, concurrency
- Manage semantic inconsistencies
- Not fully disjoint categorization !!
14SQL Assignment
Report the home run champs in the last three
years (2002 to 2004).
select h.year, firstname, lastname, h.hrs from
playerinfo p, hitting h where p.playerid
h.playerid and h.hrs (
select max(hrs) from hitting h2
where h.year h2.year)
15SQL Assignment
Report the last name of the batter who would be
reported first in alphabetical order.
select firstname, lastname from playerinfo
where firstname lt all (select firstname from
playerinfo) and lastname
lt all ( select lastname
from playerinfo p2 where
p2.firstname playerinfo.firstname)
16SQL Assignment
20 Create the dream NL batting team (that will
have the most total RBI) from 2004
statistics. Remember, a NL team consists of
1 LF, 1CF, 1RF, 1 SS, 1 2B, 1 3B, 1 1B, 1
Catcher, and 1 Pitcher. Only consider the
position at which the hitter played the
maximum number of games so a player will
only qualify at one fielding position. Break
Ties arbitrarily.
17create table rbistable as select p.firstname,
p.lastname, p.playerid, h.rbis, f.pos from
playerinfo p, hitting h, fielding f where
p.playerid h.playerid and f.playerid
h.playerid and f.year 2004 and
h.year 2004 and f.numgames (select
max(f2.numgames) from fielding
f2 where f2.playerid
f.playerid and f2.year f.year) create table
rbistable2 as select firstname, lastname, pos,
rbis, playerid from rbistable r1 where r1.rbis
(select max(r2.rbis) from rbistable r2
where r2.pos r1.pos) select
firstname, lastname, pos, rbis from rbistable2
r where playerid lt all (select playerid from
rbistable2 r2 where r2.pos r.pos)
1822 Rank the 2004 teams by their number of
wins. The output should contains a table with
two columns Team Name, and Rank (between 1 and
30), and it should be sorted by Rank. Two
teams with same number of wins will be ranked the
same, and the next rank will be skipped in that
case.
select t1.teamname, t1.wins, 31 -
count(t2.teamname) as rank from teams t1, teams
t2 where t1.year 2004 and t1.year t2.year
and t1.losses lt t2.losses group by
t1.teamname, t1.wins order by rank