Title: Introduction to Database Design
1Introduction to Database Design
July 2006 Ken Nunes knunes _at_ sdsc.edu
2Database Design Agenda
- Introductions
- General Design Considerations
- Entity-Relationship Model
- Normalization
- Overview of SQL
- Star Schemas
- Additional Information
- QA
3General Design Considerations
- Users
- Application Requirements
- Legacy Systems/Data
4Users
- Who are they?
- Administrative
- Scientific
- Technical
- Impact
- Access Controls
- Interfaces
- Service levels
5Application Requirements
- What kind of database?
- OnLine Analytical Processing (OLAP)
- OnLine Transactional Processing (OLTP)
- Budget
- Platform / Vendor
- Workflow?
- order of operations
- error handling
- reporting
6Legacy Systems/Data
- What systems are currently in place?
- Where does the data come from?
- How is it generated?
- What format is it in?
- What is the data used for?
- Which parts of the system must remain static?
7Entity - Relationship Model
- A logical design method which emphasizes
simplicity and readability. - Basic objects of the model are
- Entities
- Relationships
- Attributes
8Entities
- Data objects detailed by the information in the
database. - Denoted by rectangles in the model.
Employee
Department
9Attributes
- Characteristics of entities or relationships.
- Denoted by ellipses in the model.
Employee
Department
Name
SSN
Name
Budget
10Relationships
- Represent associations between entities.
- Denoted by diamonds in the model.
Employee
Department
works in
Name
SSN
Name
Budget
Start date
11Relationship Connectivity
- Constraints on the mapping of the associated
entities in the relationship. -
- Denoted by variables between the related
entities. - Generally, values for connectivity are expressed
as one or many
Employee
Department
1
N
work
Name
SSN
Name
Budget
Start date
12Connectivity
one-to-one
Department
Manager
1
1
has
one-to-many
Department
Project
N
1
has
many-to-many
Employee
Project
N
M
works on
13ER example
- Retailer wants to create an online webstore.
- The retailer requires information on
- Customers
- Items
- Orders
14Webstore Entities Attributes
- Customers - name, credit card, address
- Items - name, price, inventory
- Orders - item, quantity, cost, date, status
cost
date
status
price
Name
credit card
Orders
Items
Customers
name
address
item
quantity
inventory
15Webstore Relationships
- Identify the relationships.
- The orders are recorded each time a customer
purchases items, so the customer and order
entities are related. - Each customer may make several purchases so the
relationship is one-to-many
1
N
Customer
Order
purchase
16Webstore Relationships
- Identify the relationships.
- The order consists of the items a customer
purchases but each item can be found in multiple
orders. - Since a customer can purchase multiple items and
make multiple orders the relationship is many to
many.
N
M
Order
Item
consists
17Webstore ER Diagram
credit card
name
address
Customers
1
purchase
date
status
N
Orders
Items
consists
M
N
item
quantity
cost
name
price
inventory
18Logical Design to Physical Design
- Creating relational SQL schemas from
entity-relationship models. - Transform each entity into a table with the key
and its attributes. - Transform each relationship as either a
relationship table (many-to-many) or a foreign
key (one-to-many and many-to-many).
19Entity tables
Transform each entity into a table with a key and
its attributes.
Employee
create table employee (emp_no number, name
varchar2(256), ssn number, primary key
(emp_no))
Name
SSN
20Foreign Keys
- Transform each one-to-one or one-to-many
relationship as a foreign key. - Foreign key is a reference in the child (many)
table to the primary key of the parent (one)
table.
create table department (dept_no number, name
varchar2(50), primary key (dept_no))
Department
1
has
create table employee (emp_no number, dept_no
number, name varchar2(256), ssn
number, primary key (emp_no), foreign key
(dept_no) references department)
N
Employee
21Foreign Key
Department
Accounting has 1 employee Brian Burnett Human
Resources has 2 employees Nora Edwards Ben
Smith IT has 3 employees Ajay Patel John
OLeary Julia Lenin
Employee
22Many-to-Many tables
- Transform each many-to-many relationship as a
table. - The relationship table will contain the foreign
keys to the related entities as well as any
relationship attributes.
Project
create table project_employee_details (proj_no
number, emp_no number, start_date
date, primary key (proj_no, emp_no), foreign
key (proj_no) references project foreign key
(emp_no) references employee)
N
Start date
has
M
Employee
23Many-to-Many tables
Project
Project_employee_details
Employee
Employee Audit has 1 employee Brian
Burnett Budget has 2 employees Julia
Lenin Nora Edwards Intranet has 3
employees Julia Lenin John OLeary Ajay Patel
24Normalization
- A logical design method which minimizes data
redundancy and reduces design flaws. - Consists of applying various normal forms to
the database design. - The normal forms break down large tables into
smaller subsets.
25First Normal Form (1NF)
- Each attribute must be atomic
- No repeating columns within a row.
- No multi-valued columns.
- 1NF simplifies attributes
- Queries become easier.
261NF
Employee (unnormalized)
Employee (1NF)
27Second Normal Form (2NF)
- Each attribute must be functionally dependent on
the primary key. - Functional dependence - the property of one or
more attributes that uniquely determines the
value of other attributes. - Any non-dependent attributes are moved into a
smaller (subset) table. - 2NF improves data integrity.
- Prevents update, insert, and delete anomalies.
28Functional Dependence
Employee (1NF)
Name, dept_no, and dept_name are functionally
dependent on emp_no. (emp_no -gt name, dept_no,
dept_name) Skills is not functionally dependent
on emp_no since it is not unique to each emp_no.
292NF
Employee (1NF)
Employee (2NF)
Skills (2NF)
30Data Integrity
Employee (1NF)
- Insert Anomaly - adding null values. eg,
inserting a new department does not require the
primary key of emp_no to be added. - Update Anomaly - multiple updates for a single
name change, causes performance degradation. eg,
changing IT dept_name to IS - Delete Anomaly - deleting wanted information.
eg, deleting the IT department removes employee
Barbara Jones from the database
31Third Normal Form (3NF)
- Remove transitive dependencies.
- Transitive dependence - two separate entities
exist within one table. - Any transitive dependencies are moved into a
smaller (subset) table. - 3NF further improves data integrity.
- Prevents update, insert, and delete anomalies.
32Transitive Dependence
Employee (2NF)
Dept_no and dept_name are functionally dependent
on emp_no however, department can be considered a
separate entity.
333NF
Employee (2NF)
Employee (3NF)
Department (3NF)
34Other Normal Forms
- Boyce-Codd Normal Form (BCNF)
- Strengthens 3NF by requiring the keys in the
functional dependencies to be superkeys (a column
or columns that uniquely identify a row) - Fourth Normal Form (4NF)
- Eliminate trivial multivalued dependencies.
- Fifth Normal Form (5NF)
- Eliminate dependencies not determined by keys.
35Normalizing our webstore (1NF)
orders
items
order_id cust_id item_id quantity cost date status
405 45 34 2 100 2/306 shipped
405 45 35 1 50 2/306 shipped
405 45 56 3 75 2/306 shipped
408 78 56 2 50 3/5/06 refunded
410 102 72 2 150 3/10/06 shipped
410 102 81 1 175 3/10/06 shipped
item_id name price inventory
34 sweater red 50 21
35 sweater blue 50 10
56 t-shirt 25 76
72 jeans 75 5
81 jacket 175 9
customers
cust_id name address credit_card_num credit_card_type
45 Mike Speedy 123 A St. 45154 visa
45 Mike Speedy 123 A St. 32499 mastercard
45 Mike Speedy 123 A St. 12834 discover
78 Frank Newmon 2 Main St. 45698 visa
102 Joe Powers 343 Blue Blvd. 94065 mastercard
102 Joe Powers 343 Blue Blvd. 10532 discover
36Normalizing our webstore (2NF 3NF)
customers
credit_cards
cust_id name address
45 Mike Speedy 123 A St.
78 Frank Newmon 2 Main St.
102 Joe Powers 343 Blue Blvd.
cust_id num type
45 45154 visa
45 32499 mastercard
45 12834 discover
78 45698 visa
102 94065 mastercard
102 10532 discover
37Normalizing our webstore (2NF 3NF)
items
item_id name price inventory
34 sweater red 50 21
35 sweater blue 50 10
56 t-shirt 25 76
72 jeans 75 5
81 jacket 175 9
order details
orders
order_id item_id quantity cost
405 34 2 100
405 35 1 50
405 56 3 75
408 56 2 50
410 72 2 150
410 81 1 175
order_id cust_id date status
405 45 2/306 shipped
408 78 3/5/06 refunded
410 102 3/10/06 shipped
38Revisit webstore ER diagram
address
Customers
Credit card
have
1
name
N
1
card number
card type
purchase
N
status
Orders
date
name
price
inventory
1
consists
quantity
N
M
N
Items
Order details
consists
cost
39Structured Query Language
- SQL is the standard language for data definition
and data manipulation for relational database
systems. - Nonprocedural
- Universal
40Data Definition Language
- The aspect of SQL that defines and manipulates
objects in a database. - create tables
- alter tables
- drop tables
- create views
41Create Table
name
address
create table customer (cust_id number, name
varchar(50) not null, address varchar(256) not
null, primary key (cust_id)) create table
credit_card (cust_id number not null,
credit_card_type char(5) not null,
credit_card_num number not null, foreign key
(cust_id) references customer)
Customer
1
have
N
Credit card
card number
card type
42Modifying Tables
alter table customer modify name
varchar(256) alter table customer add
credit_limit number drop table customer
43Data Manipulation Language
- The aspect of SQL used to manipulate the data in
a database. - queries
- updates
- inserts
- deletes
44Data Manipulation Language
- The aspect of SQL used to manipulate the data in
a database. - queries
- updates
- inserts
- deletes
45Select command
- Used to query data from database tables.
- Format
- Select ltcolumnsgt From lttablegt
- Where ltconditiongt
46Query example
customers
cust_id name address
45 Mike Speedy 123 A St.
78 Frank Newmon 2 Main St.
102 Joe Powers 343 Blue Blvd.
Select name from customers result Mike
Speedy Frank Newmon Joe Powers
47Query example
customers
cust_id name address
45 Mike Speedy 123 A St.
78 Frank Newmon 2 Main St.
102 Joe Powers 343 Blue Blvd.
select name from customers where address 123 A
St. result Mike Speedy
48Query example
customers
credit_cards
cust_id name address
45 Mike Speedy 123 A St.
78 Frank Newmon 2 Main St.
102 Joe Powers 343 Blue Blvd.
cust_id num type
45 45154 visa
45 32499 mastercard
45 12834 discover
78 45698 visa
102 94065 mastercard
102 10532 discover
select from customers where customers.cust_id
credit_cards.cust_id and type
visa returns
Cust_id Name Address Cust_id Num type
45 Mike Speedy 123 A St. 45 45154 visa
78 Frank Newmon 2 Main St. 78 45698 visa
49Changing Data
There are 3 commands that change data in a
table. Insert insert into lttablegt (ltcolumnsgt)
values (ltvaluesgt) insert into customer
(cust_id, name) values (3, Fred
Flintstone) Update update lttablegt set
ltcolumngt ltvaluegt where ltconditiongt update
customer set name Mark Speedy where cust_id
45 Delete delete from lttablegt where
ltconditiongt delete from customer where cust_id
45
50Star Schemas
- Designed for data retrieval
- Best for use in decision support tasks such as
Data Warehouses and Data Marts. - Denormalized - allows for faster querying due to
less joins. - Slow performance for insert, delete, and update
transactions. - Comprised of two types tables facts and
dimensions.
51Fact Table
- The main table in a star schema is the Fact
table. - Contains groupings of measures of an event to be
analyzed. - Measure - numeric data
Invoice Facts
units sold unit amount total sale price
52Dimension Table
- Dimension tables are groupings of descriptors and
measures of the fact. - descriptor - non-numeric data
Customer Dimension
Time Dimension
cust_dim_key name address phone
time_dim_key invoice date due date delivered date
Location Dimension
Product Dimension
loc_dim_key store number store address store
phone
prod_dim_key product price cost
53Star Schema
The fact table forms a one to many relationship
with each dimension table.
Customer Dimension
Time Dimension
1
1
cust_dim_key name address phone
time_dim_key invoice date due date delivered date
Invoice Facts
N
N
cust_dim_key loc_dim_key time_dim_key prod_dim_ke
y units sold unit amount total sale price
Product Dimension
Location Dimension
N
prod_dim_key product price cost
N
loc_dim_key store number store address store
phone
1
1
54Analyzing the webstore
- The manager needs to analyze the orders obtained
from the webstore. - From this we will use the order table to create
our fact table.
Order Facts
date items customers
55Webstore Dimension
We have 2 dimensions for the schema customers
and items.
Item Dimension
Customer Dimension
item_dim_key name price inventory
cust_dim_key name address credit_card_type
56Webstore Star Schema
Order Facts
date items customers
N
N
1
1
Item Dimension
Customer Dimension
item_dim_key name price inventory
cust_dim_key name address credit_card_type
57Books and Reference
- Database Design for Mere Mortals,
- Michael J. Hernandez
- Information Modeling and Relational Databases,
- Terry Halpin
- Database Modeling and Design,
- Toby J. Teorey
58Continuing Education
UCSD Extension Data Management Courses DBA
Certificate Program Database Application
Developer Certificate Program
59Data Central
- The Data Services Group provides Data Allocations
for the research community. - http//datacentral.sdsc.edu/
- Tools and expertise for making data collections
available to the broader scientific community. - Provide disk, tape, and database storage
resources.