LIS 557 Database Design and Management - PowerPoint PPT Presentation

1 / 31
About This Presentation
Title:

LIS 557 Database Design and Management

Description:

1970 Invented by Edgar (Ted) Codd (IBM) 'A Relational Model of ... 2002 Forbes names relational databses as one of the most important modern innovations ... – PowerPoint PPT presentation

Number of Views:64
Avg rating:3.0/5.0
Slides: 32
Provided by: comminfo
Category:

less

Transcript and Presenter's Notes

Title: LIS 557 Database Design and Management


1
LIS 557Database Design and Management
  • William Voon
  • Michael Cole
  • Spring '04

2
Relational Databases
  • 5 February 2004

3
  • "Simplicity does not precede complexity, but
    follows it." - Alan J. Perlis

4
The Relational Database
  • 1970 Invented by Edgar (Ted) Codd (IBM)
  • "A Relational Model of Data for Large Shared
    Data Banks"
  • http//www.acm.org/classics/nov95/toc.html
  • 1981 Turing Award
  • 2002 Forbes names relational databses as one of
    the most important modern innovations
  • http//www.forbes.com/forbes/2002/1223/172_print.h
    tml

5
The Problem
  • Databases were already important, but computer
    time cost hundreds of dollars a minute.
  • So, programs needed to be as efficient as
    possible before execution.
  • Early databases had
  • a rigid hierarchical structure, or
  • a complex system of navigational pointers to the
    physical locations of the data on magnetic tapes.
  • It took groups of programmers to write database
    queries (CODASYL). Every query required a new
    program. New data types required a redesign of
    the database. The price of efficiency was utter
    inflexibility.

6
Codd's Insight
  • Relationships between data items should be based
    on the item's values, and not on separately
    specified linking or nesting.
  • A big simplification for queries since we only
    talk about the data
  • Allows flexibility to use existing data in new
    ways
  • Users should be able to work at a more
    natural-language level and not have to know
    anything about the details of the data storage

7
Codd's RDB
  • This separation of the data from the details of
    its storage, etc. requires much more computing
    power to keep track of those details
  • In this respect relational databases are far less
    efficient than thier predecessors, but they are
    spectacularly more cost effective
  • Relational databases really took off as computing
    power continued to improve (Moore's Law)
  • Today, relational databases are dominant.

8
RDB development
  • 1973 - IBM begins development in the System R
    project
  • Structured Query Language (SQL), to express
    queries (Don Chamberlin and Ray Boyce)
  • A cost-based optimizer, which automatically
    translates a high-level query into an efficient
    plan for executing the query (Pat Selinger)
  • A query compiler, to save query plans for future
    use (Raymond Lorie)
  • Boyce and Codd also developed Boyce-Codd Normal
    Form to design efficient RDB tables without
    needless duplication of information in different
    tables

9
Early RDB Commercialization
  • 1970s INGRES UC-Berkeley (Mike Stonebraker)
  • 1977 Relational Software Inc. (Larry Ellison)
    produces first commercially available relational
    database.
  • 1981 IBM announces SQL/DS.
  • 1983 IBM introduces DB2 for large mainframes.
    It now stores data on handhelds to supercomputers
    and handles billions of transactions per day
  • 1983 Relational Software renamed Oracle

10
The RDBMS
  • Handles the complex physical details of the
    location and actual relationships between the
    data sets
  • Presents the data model so that the user sees the
    database as a collection of tables
  • In designing and using the database we need only
    think about relationships between tables

11
Tables
Columns store attributes (same data type for each
column!)
Rows (tuples) store entities
Table entity set relation
12
A flat version of a database
Why is this inefficient?
13
Relating Tables
Table name CUSTOMER
Table name SALESREP
The tables are independent, but we can still
learn that Nam June Paik is Sonny Blount's sales
rep or that Paik handles both Blount and John
Gilmore (and that makes sense).
14
What's the big deal?
  • Complex things (the point of the database) can be
    broken into logical units (perhaps along the
    lines of the data domains)
  • The units (aka tables) can be selected so that
    they minimize the redundant data in all of the
    tables

15
Nature of Inefficiency
  • Duplicated data
  • Natural groupings are associated with types of
    queries

16
Anatomy of a Table
  • A table (relation) is a 2-D structure
  • Each row (tuple) defines a single entity
  • Each column defines the attributes of the
    entities in the set
  • Each row/column element has a single data value
    of a type appropriate to the attribute (and
    allowed by the software)
  • Order of rows and columns does not matter to the
    user
  • Every table has a primary key

17
Column Attribute
  • Each column represents an attribute
  • Each column has a unique name
  • The values in the column are of a single type
  • Range of possible values is the column's domain

18
Row Entity
  • A row corresponds to an entity, so reading across
    the columns we see all of the specific attributes
    for that entity
  • Taken together the rows are the entity set.
    Notice that this is logically equivalent to the
    table itself.

19
Null Values
  • Can be useful
  • Placeholder for an unknown attribute value
  • Known, but missing attribute value
  • A 'not applicable' condition

20
Data Types
  • Options are defined by the software
  • Should be appropriate to the attribute
  • Standard data types
  • Numeric
  • Character
  • Date
  • Logical
  • Other data types may be supported

21
Numeric Data
  • Numbers that are arithmetically meaningful
  • It makes sense to add, subtract, etc.
  • Other uses of numbers are handled as character
    data
  • e.g. Telephone numbers, credit card identifiers,
    zip codes

22
Character Data
  • Also called string data
  • Any set of characters that is not intended for
    mathematical manipulation

23
Date Data
  • Calendar dates are stored in a Julian date format
    invented by Joseph Scaliger in 1583
  • the Julian date is the number of elapsed days
    since the beginning of a cycle of 7,980 years
  • The system allows easy calculation of the integer
    difference between two calendar dates

24
Julian Dates
  • The 7,980 year cycle was derived by combining
    several traditional time cycles (solar, lunar,
    and a particular Roman tax cycle) for which 7,980
    was a common multiple.
  • The first Julian cycle began on January 1, 4713
    B.C. and will end on January 22, 3268.
  • February 5, 2004 _at_ 190000 EST
  • 2453041.5000
  • A calendar calculator http//www.fourmilab.com/do
    cuments/calendar/

25
Logical Data
  • True / False
  • Can perform Boolean logical calculations (AND,
    OR, NOT) on this data

26
Keys
  • The RDB works because of controlled redundancy
  • Tables share common attributes, so they can be
    linked
  • The idea is that a specific attribute in one
    table can be used to point to a row (entity) in
    another table

27
Key Types
  • Superkey
  • Can be an attribute or a combination of
    attributes that uniquely identifies each entity
    (row) in a table
  • Candidate key
  • A minimal superkey (A superkey that contains no
    superkeys)
  • Primary key
  • A candidate key that identifies all the values in
    a row (entity)

28
The Primary Key
  • Each table must have a primary key
  • There must be a unique key for each entity (row),
    null values are not allowed
  • The primary key is the identifier for each entity
  • A table with this property has entity integrity

29
Key Types
  • Secondary key
  • An attribute (or combination) used only for data
    retrieval
  • Foreign key
  • An attribute (or combination) in one table whose
    values must match the primary key in another
    table or be null.

30
Integrity
  • Entity integrity depends on the primary key
  • Ensures each entity has a unique identifier
  • Referential integrity depends on the foreign
    key(s)
  • Makes it impossible for an attribute to have an
    invalid entry (might be null)

31
Assignment
  • Design a collection of tables for a database
  • Briefly, what is the database about? Who are the
    users? What purposes will the database serve?
  • At least three related tables
  • Explain why this is the most efficient design
  • Is this also the most logical design?
Write a Comment
User Comments (0)
About PowerShow.com