Title: 376a. Database Design
1376a. Database Design
- Dept. of Computer Science
- Vassar College
- http//www.cs.vassar.edu/cs376
2Introduction
- What are databases?
- What are the different types of databases.
- Data modeling
- Access
- Security
3Who am I
- Bill Yoshimi
- Office 106 Old Laundry Building
- Phone 437-5986
- Email yoshimi_at_cs.vassar.edu
- Office Hours 400pm-500pm MW
4Required Texts
- Fundamentals of Database Systems,
Elmasri/Navathe, Benjamin Cummings.
5How course will be graded
- 40 - Homework/programming assignments
- 20 - Midterm
- 30 - Final Exam
6Programming assignments
- One homework/programming assignment per week
(consisting of 3 or 4 exercises). - Absolutely essential for learning the material.
- Try to hand assignments in on time, future
assignments depend on knowledge learned from
previous assignments.
7Late policy
- All assignments are due at the start of class on
the date specified. - Assignments arriving after the start of class but
before the start of the next class will be
accepted with a 10 penalty. Late assignments
will not be accepted after this point.
8Tests
- Closed book and closed notes.
- Each test is cumulative.
- Tests will be announced later in semester.
9Original authoring of materials
- Talking to classmates about general ideas is OK.
- Each student must do his or her programming
assignment entirely by himself or herself. - Do not discuss or share programs with other
students. - Vassar regulations require the professor to
report suspected violations of academic integrity
to the Dean of Studies. - Read the Originality and Attribution pamphlet.
10- Now, were finished with the administrative
tasks, on with the course
11Define database
- A collection of related data, logically connected
and organized. - A database management System (DBMS) enables users
to create and use a database
12Simplified Database System
Application Programs/Queries
Software to process queries/programs
Software to access stored data
13Example
- STUDENT file stores data on each student.
- COURSE file stores data on eahc course
- SECTION file stores data on each section of a
course - GRADE_REPORT file stores grades that students
receive in the various sections they complete - PREREQUISITE file stores prerequisites for each
course.
14STUDENT file
- Each student has
- Name
- StudentID
- Class
- Major
- Define the types of each of these
15Look at relationships between these files
PREREQUISITE
COURSE
STUDENT
SECTION
GRADE_REPORT
16Basic operations
- Queries (list the names of all students enrolled
in Databases) - Updates (change Smiths section from 1 to 2)
- These are informal queries (must be fully
specified before real query can be made against
database)
17Old way to access data
- Flat file system.
- Easiest to implement quickly.
- Each system has its own copy of the data.
- Every programmer must keep track of their own
interfaces, must maintain consistency themselves.
18Fundamental characteristic of a database
- Should be self describing.
- System catalog contains info about all data,
fields, constraints and relationships between
data. (Also called meta-data) - Database program should be application agnostic.
(as long as relationships can be described by
meta-data.) - Information is looked up according to meta-data
definitions. In ordinary files, this information
is encoded in accessor routines.
19Databases provide
- Program-data independence. Change in physical
file structure should not change access routines. - Data abstraction all details about how data is
stored is hidden (access is via data model.) - Multiple views subsets of data and virutal
data.
20Who accesses the database?
- Database administrators (DBA) authorizing
access, coordinating and monitoring use. - Database designers identify data and how it
will be stored in DB, understand requirements,
devise views. - End users
- Casual end users use query language to access
database. - Naïve or parametric end users use canned
queries (little variation.) - Sophisticated end users thoroughly familiar
with intricacy of database, make sophisticated
queries. - System Analyst determine needs of Naïve and
parameteric users - Application programs implement SA reqs.
21Behind the scenes
- DBMS designers and implementers maintain
modules inside the database (query processors,
data access, security, etc.) - Tool developers Facilitate design and improve
performance. (performance monitoring tools,
natural language interfaces, etc.) - Operators and maintenance personnel
22Why use a DBMS?
- Control redundancy
- Shared data (concurrent access)
- Restrict unauthorized access
- Provide multiple interfaces to data
- Represent complex relationships between data.
- Enforce integrity constraints
- Backup and recovery
23Lesser advantages
- Ability to enforce standards across organization.
- Flexibility
- Reduced development time for new applications.
- Work with up to date data
- Economies of scale
24When not to use a DBMS
- Database and application are simple, well
defined, not expected to change often. - System has stringent real-time constraints that
DBMS my not meet. - Multiple access to data is not required.
- DBMS are high overhead systems (hardware,
planning, training and maintenance).
25DBMS Concepts and Architecture
26Data model
- Describes structure of the database (types,
relationships and constraints.) - High level (conceptual) vs. low level (physical)
vs. implementation data models - High level (object-based models)
- Entity an object
- Attribute some quality or quantity associated
with an object - Relationship how objects are related
- Implementation level
- Relational, network, hierarchical.
27Database Schema
- Description of the database. Specified during
design time and not expected to change. - Schema Diagram data model specific convention
for displaying schemas. - Schema Construct single object description in a
schema.
28Schema Diagram example
STUDENT
Name StudentNumber Class Major
COURSE
CourseName Number CreditHour Dept
SECTION
SectionID CourseNumber Semester Year
29Understand
- Data changes frequently, data scheme, if well
designed, should not change. - The data in the database at any time is called
the database instance, occurance or state. Any
insert, delete or change converts DB from one
instance to another. - DBMS is responsible for verifying that each
instance of the DB adheres to schema. - Scheme also called intension, instance
extension of the schema.
30Three Schema Architecture
- Supports program data independence, multi-user
views, and catalog to store DB schema. - Internal level has internal scheme physical
structure of DB and access paths. - Conceptual level has conceptual schema
describes entities, datatype, relationships, and
constraints. - External or view level has multiple external
schemas or user views. views for a particular
audience.
31Schemas are only descriptions of data
- Data only exists at the physical level.
- Mappings are used to convert from schema to
schema (walk through an example).
32How schema preserve data independence.
- Logical data independence changes to conceptual
scheme do not affect external schemas. - Physical data independence changes to internal
schema (physical layer) do not affect conceptual
schema. - In reality, difficult to implement (also multiple
level mappings add overhead to queries.)
33Interfaces to Databases
- Languages
- Data Definition Language (DDL) used to define
internal and conceptual schemas - Storage Definition Language (SDL) mapping
between conceptual and internal schema. - View Definition Language (VDL) defines user
views and mapping to conceptual - Data Manipulation Language (DML) methods for
retrieving, inserting, deleting and modifying
data. - High-level (non procedural) set-at-a-time
- Low-level (procedural) must be embedded.
Record-at-a-time.
34How languages are used
- Data Manipulation calls are made either directly
(query language) or embedded in a host language
(data sublanguage) - Naïve and parametric users have user-friendly
interfaces
35User Friendly DBMS Interfaces
- Menus, graphical interfaces, forms, natural
language, command languages for parametric users,
DBA interfaces - Are these really user friendly?
36Classification of DBMS
- Relational, network, hierarchical, other.
- Single vs. multi-user
- Number of sites (centralized or distributed).
- Homo vs. heterogeneous (federated)
- Cost
- Types of access path
- General vs. special purpose
37Main criteria the data model
- Relational data organized in tables, high level
query language, limited form of user views.
Conceptual and internal views are not
distinguishable. - Network set of records and implements limited
1N relationships. Must be embedded in host
programming language. - Hierarchical include parent-child relationships.
38Database design process
- Collect requirements and analyze results.
- Create conceptual schema (conceptual database
design) (datatypes, relationships, and
constraints.) - Implement database (data model mapping)
- Design physical database (specify internal
storage structures and file organizations)
39Example a company database
- Company is organized in departments. Each
department has a name, a number and an employee
who manages the department. Track when the
employee started managing the department.
Department may have several locations. - Departments have projects, each has a name, a
number and a location. - Employees have SSN, address, salary, sex and
birthdate. Employee is assigned to one
department but may work on multiple projects.
Track of hours worked per week and on which
projects. Track employees direct supervisor.
40Entity
- A thing in the real world. Can be real or
conceptual. - Entity has particular properties attributes
- E.g.
E1
Name John Addr 1 J St, Apt 2, NY, NY
10002 Age 55 Phone 222 222 2222
41Attributes
- Atomic vs. composite.
- Atomic or simple are not decomposable.
- Composite attributes can form a hierarchy (street
address number, street, apartment) - Composite attributes useful when parts are
referenced as well as whole.
42Attributes cont.
- Single valued vs. multivalued.
- Derivable attributes (age from BD, of employees
by sum) - NULL- not applicable or unknown
43Entity Types
- Similar entities (having similar attributes but
with different values) - Described using Entity Type Schema (intension)
common structure shared by entities. - E.g. EMPLOYEE has Name, Age, Salary
- Sets of instances valid at a given point in time
is called an extension of the entity type. - E.g. (John Smith, 45, 80k) (Fred Brown, 32, 20K)
- Entity type schema should not change extensions
change often.
44Key Attribute
- Uniqueness constraint on attributes.
- An entity type usually an attribute whose values
are distinct for each individual entry. - Key attribute is used to uniquely identify an
entity. - E.g. PERSON has SocialSecurityNumber, COMPANY
has Name. - No two instances can have the same Key Attribute.
- Some entities can have more than one Key
Attribute.
45Value Sets of Attributes
- A Attribute
- E Entity Type
- V Value Set
- AE -gt P(V)
- P(V) is the Power Set of V the set of all subsets
of V.
46Value Sets of Attributes cont.
- A(e) value of attribute A for entity e
- A(e) is a singleton for SV attributes (has only
one element) - A(e) may be empty set, single element or multiple
element for multivalued attribute. - A(e) for composite attribute is cartesian product
(all pairs mapping) of P(V1)xP(V2)xP(V3) - Use () and comma separted list to denote
composite attribute. - Use and comma separated list to denote
multi-valued attribute
47Value Set of Entity cont.
- E.g. if a person can have more than one residence
and each residence can have more than one phone
then the attribute AddressPhone - AddressPhone(Phone(AreaCode,PhoneNumber),
Address(StreetAddress(Number, Street,
ApartmentNumber),City,State,Zip))
48Relationships
- Relationship type R among N entities E1EN is a
set of associations among these types. - R is a set of ri where each ri is an n tuple of
(e1, e2, en) and each ej in ri is a member of
entity type Ej 1ltjltN
49Example
EMPLOYEE
DEPARTMENT
WORKS_FOR
e1 e2 e3 e4 e5 e6 e7
r1 r2 r3 r4 r5 r6 r7
d1 d2 d3 d4 d5
50Degree of a Relationship Model
- Degree is number of participating Entity Types.
- In previous example, degree is 2 or binary. (most
used form) - Ternary relationship type has three Entity Types.
(holds more information than 3 binary
relationships). - Connection trap can occur when 3 binary relations
used instead of ternary relation.
51Relations as Attributes
52Example
53 54Entity-Relationship Model
55(No Transcript)
56Example
Student Name StudentID Class Major
Smith 17 1 COSC
Brown 8 2 COSC
57Quoting
- Quoting by using (quote ) or protects the
symbol that follows like (quote helloworld) or
helloworld
58Summary
- State what has been learned
- Define ways to apply training
- Request feedback of training session
59Where to Get More Information
- Other training sessions
- List books, articles, electronic sources
- Consulting services, other sources