Title: What is a Database
1What is a Database?
- Definition A database is a structure that can
store information about multiple types of
entities, the attributes of those entities, and
the relationships among the entities. (Pratt
Adamski, 2003)
2Traditional Data Processing
- Languages like COBOL and FORTRAN
- Data in files
- Not interfaced
- Some are STILL in use!
3Problems with this approach?
- Changes difficult
- Errors and inconsistencies data integrity
- Same data in multiple places
- Duplication and redundancy
4Advantages of a DBMS
- Reduction of redundancies and errors within the
data - Easier to extract information
- The system is more flexible
- Security is improved
- Integrity can be improved
- Data model needs to be produced
5Disadvantages of a DBMS
- Takes longer to access the data
- Need special knowledge
- Applications may be dependant to specific DBMS
versions - Initial cost may be large.
6Characteristics
- A good DBMS has the following
- Data integrity
- Security
- Concurrency control
- Transaction processing
- Recovery functions
7Database Lifecycle
8ER Terminology - Relationship
- A set of associations between one or more entity
type. (Connolly Begg, 2002) - Examples
- A Staff member works at a Branch
- A Branch has Staff members
works at
has
9ER Terminology - Cardinality
- Represented by the presence or absence of a
crows foot - One-to-One
- One-to-Many
- Many-to-Many
10ER Terminology - Optionality
- Whether the information is mandatory or optional
- A customer may not have an invoice
- An invoice will have a customer
- Represented by the circles and bars
11Example Relationship
12Developing Models
- First stage initial study
- gather information
- Second stage database design
- inc. development of the ER model based on the
information gathered - Iterative process (sometime you have to go back
to the first stage)
13Attribute
- State of an entity
- Correspond to columns in a table
- Book
- ISBN
- Author
- Name
- Publisher
14Primary Key
- Attribute(s) that are distinct for each entry
- Primary key is UNIQUE
- Examples Student Number, Book ISBN, PIN,
Drivers Licence Number, Course Code...
15Intelligent keys
- When business-related columns are used as primary
keys they are often called intelligent or natural
keys. - For example, if the user gives each customer a
unique customer number that value might naturally
serve as the primary key for the customer table
(e.g. StudentID, DriverLicenceID). - An alternative is to use system-generated
artificial primary key values. These are often
called surrogate keys because they are
replacements for the intelligent keys, or blind
keys because the user doesn't see them.
16Surrogate keys
- According to the Webster's Unabridged Dictionary,
a surrogate is an "artificial or synthetic
product that is used as a substitute for a
natural product." - That's a great definition for the surrogate keys
we use in data warehouses. A surrogate key is an
artificial or synthetic key that is used as a
substitute for a natural key.
17Foreign Keys
- A foreign key attribute set is a primary key
attribute in one table that is repeated as a
foreign key in another table - A non-key attribute in one relation that also
appears as a primary key in another relation
18What is Normalization?
- Reorganisation and evaluation of table structures
- Produces a set of stable, well structured tables.
- Remove repeating/redundant data
19Why Normalize?
- Remove redundant data
- Reduce update anomalies
- Reduce delete anomalies
- Reduce insert anomalies
20Normal Forms
- UNF a table that contains one or more repeating
groups - 1NF A relation is in 1NF is all value are
atomic and no rows are repeated. - 2NF A relation is said to be in 2NF if it is in
1NF and every non-key attribute is fully
functionally dependent on the primary key - 3NF A relation is said to be in 3NF if it is in
2NF and every non-key is non-transitively
dependent on the primary key - All attributes in every table must be determined
by the key, the whole key, and nothing but the
key
21Un-normalized Data
- Grade-Report
- sNumber sName sAddress uNumber uTitle iNam
e iRoom sMark - 38214 Bright 123 Smith St IS350 Database CODD B104
A - 38214 Bright 123 Smith St IS465 Sys
Anal KEMP B213 C - 69173 Smith 35 Main Rd IS465 Sys Anal KEMP B213
A - 69173 Smith 35 Main Rd PM300 Proj Mgt LEWIS D317
B - 69173 Smith 35 Main Rd QM440 OpSys KEMP B213 A
- What entities are present?
- What fields do they have?
22Normalization Conclusion
- All attributes in every table must be determined
by the key, the whole key, and nothing but the
key
23SQL (Structured Query Language)
- Developed by IBM in late 70s for DB2
- SEQUEL
- Used to
- Build
- Alter
- Query
24Data Manipulation Language
- Used to retrieve, add, modify and delete
- Commands
- SELECT
- INSERT
- UPDATE
- DELETE
- ANSI vs Various DBMS implementations
25SELECT STATEMENT
- SELECT column(s)
- FROM table(s)
- WHERE IS, NOT, NULL, , ltgt, gt , lt, IN, NOT IN,
BEETWEEN, LIKE - GROUP BY column(s)
- HAVING COUNT
- ORDER BY column(s) ASC, DESC
26SQL Functions
- Five functions
- AVG
- MAX
- MIN
- SUM
- COUNT
- MS-Access functions
- DATE()
- LCASE/UCASE
- MID
- LTRIM/RTRIM
- Etc
27Database Development Activities
- Six main phases
- Enterprise modelling
- Conceptual data modelling
- Logical database design
- Physical database design and creation
- Database implementation
- Database maintenance
28Data and Database Administration
- Two factors are driving the changes in the data
administration and DB administration - the availability of more technologies and
platforms that must be managed concurrently - The increased pace of business changes
29Growth Change
- What happens when the database grows?
- Analyse space utilization and performance
- Additional space allocated
- Change in structure
- Check if already available (where?)
- Y use current data
- N add new data-types or relationships
- Change in usage
- Move to faster devices
- Change placement (clustering)
- Change contents
30DA vs DBA
- Strategic planning
- Sets long term goals
- Sets Policies and standards
- Broad scope
- Long term
- Managerial orientation
- DBMS-independent
- Control and supervision
- Executes plans
- Enforces policies / procedures prog stand
- Narrow scope
- Short term (daily)
- Technical
- DBMS-specific
316 Types of Information Systems
- Transaction Processing Systems (TPS)
- Management Information Systems (MIS)
- Decision Support Systems (DSS)
- Executive Information Systems (EIS)
- Expert Systems (ES)
- Geographical Information Systems (GIS)
32Organisations
- Organisation classification (Mintzberg, 1979)
- Machine Bureaucracy
- Divisionalised Bureaucracy
- Entrepreneurial structure
- Professional
- Ad hoc racy
33Distributed DBMS
- Formal Definition
- A distributed database management system (DDBMS)
governs the storage and processing of logically
related data over interconnected computer systems
in which both data and processing functions are
distributed among several sites.
34Distributed DBMS Characteristics
- A collection of logically related shared data
- Data is split into a number of fragments
- Fragments may be replicated
- Fragments/replicates are allocated to sites
- The sites are linked by a communication network
- The data at each site is under the control of a
DBMS - The DBMS at each site can handle local
applications - Each DBMS participates in at least one global
application
35DDBMS Commandments
- Local site independence
- Central Site Independence
- Failure Independence
- Location Transparency
- Partitioning Transparency
- Replication Transparency
- Distributed Query Processing
- Distributed Transaction Processing
- Hardware Independence
- Operating System independence
- Network Independence
- Database Independence
36Distributed Database Design
- Data partitioning
- Horizontal partitioning
- Vertical partitioning
- Mixed
37Transaction processing terms
- A transaction is an atomic unit of work. i.e.
changes to the database that must be done
together. - For example, in discharging a patient from a
hospital the system needs to update tables to
indicate his bed is free and needs to update
tables to store his bill. Doing one without the
other would be incomplete.
38Transaction processing terms
- A program that changes data in tables is called
transaction. - Single user vs Multi user. DBMS may allow more
than one user to update data stored in the DB.
However users may update the same data at the
same time i.e. Concurrently - Concurrency control is a term that describes
procedures used to ensure correct interleaving of
transactions
39Transaction processing
- Begin transaction
- Read, Write
- End transaction
- Commit transaction
- Rollback
- Redo
- Undo
40Concurrency Control
- A program that changes data in tables (as an
atomic unit of work) is called transaction - Concurrency control (CC) controls flow in a
multi-user database - CC is managing and maintaining data integrity
- CC is related to transactions
41Two Types of Locks
- Read / Shared others allowed to read locked
data but cannot update. Placing a shared lock
prevents another user for placing exclusive lock
on that record set - Write / Exclusive prohibits other transaction
from accessing data (reading). A transaction
should put a write lock on record set when is
about to update that record set. Placing write
lock prevents users from placing any type of lock
on that record set - LOCK TABLE Statement
- LOCK TABLE tableName IN SHARE/EXCLUSIVE MODE
42Deadlocks
- Also referred to as Deadly Embrace
- Locking introduces this problem
- Example
- Application A puts read lock on table X
- Application B puts read lock on table X
- A requests write lock to X (wait)
- B requests write lock to X (wait)
43Deadlock
- Results when two or more transactions have locked
a common resource, and each waits for the other
to unlock that resource. - Unless the DBMS intervenes (by maintaining a
matrix of usage), the transactions will wait
indefinitely.
44Handling Deadlocks
- Mainly DBMS, partly programmer
- So far we covered locking called Pessimistic
concurrency control - Exclusive locks on all records until complete
- A newer approach to concurrency control called
versioning takes Optimistic approach that most
of the time other users do not want the same
record, or if they do, they only want to read
(but not update) the record.
45Versioning
- Each transaction timestamped
- Read requested latest timestamped version is
used - Read requested on item being updated
- update transaction rolled back and given later
timestamp
46Decision Support Systems (DSS)
- Methodology designed to extract information from
data accumulated over time - Arrangement of computerized tools to assist
decision making - Used at all levels tailored to focus on
specific areas or problems - Provides ability to conduct ad hoc queries
47Data Warehouse
- Data Warehouse (DW) is a broad based, shared
database for management decision making that
contains data that has been accumulated over time - Data must be high quality, aggregated, often
denormalised and is not necessarily absolutely
current
48Contrast
- Operational
- Current transactions
- Specific transactions
- Focus on atomic transactions
- Volume megabytes
- Scope narrow
- DSS Data
- Longer time frame
- Different levels
- DSS analysed from multi-dimensions
- Volume gigabytes
- Broad scope
49Data Warehouse Concepts
- The Data Warehouse is an integrated,
subject-oriented, time-variant, non-volatile
database that provides support for
decision-making
50Data Mart
- Data Marts are based on a limited number of
subjects (possibly one) - Support to small group of people
- Designed for local or departmental problems
- A large company will often have several Data
Marts.
51On-Line Analytical Processing (OLAP)
- Decision support methodology based on viewing
data in multiple dimensions - OLAP is well suited for querying and multi time
period trend analysis - OLAP concepts
- Drill down (yearly-gtmonthly-gtweekly)
- Slice (subset of the data e.g. products-gtshoes)
- Rotate (interchange the dimensions)
52Data Mining
- DSS tools reactive
- Data-mining proactive
- Automatically search for anomalies and possible
relationships