Title: Database Technology
1 Database Technology
Prof. Hyoung-Joo Kim, Internet Database Lab
School of Computer Science and Engineering, Seoul National University
2 Contents
- A general survey of DBMS
- History of DBMS
- Database market share
- The current DBMS trend
- Research in IDB Lab.
3 What is a Database? (1/10)
- DBMS
- A software system that provides an environment for storing and retrieving massive amounts of data efficiently
4 What is a Database? (2/10)
- A large collection of data
[Figure: data and programs are stored in the database]
5 What is a Database? (3/10)
- Registration and course information for the 40,000 students of Seoul National University
Fields: course, term, register, grade, prof
45 courses, a 10K Byte record per student
10K Byte × 40,000 = 400M Byte
Others: library, health center, S-card, etc.
6 What is a Database? (4/10)
- SAT management information
Fields: profile, answer, rate, ranking
An 8K Byte record per student
Students: 550,000 in 2006, 570,000 in 2005
8K Byte × 550,000 = 4.4G Byte (10^9)
7 What is a Database? (5/10)
- Mobile phone call information
Fields: phone number, station, time
A 60 Byte record per call
39M × 60 Byte × 5 calls/day × 365 days ≈ 4T Byte
Subscribers: Korea 39M (2006.7), China 370M (2005)
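The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 60 Byte record size used in the multiplication and decimal terabytes (10^12 bytes); the constant names are made up for the illustration.

```python
# Rough estimate of one year of mobile phone call records.
SUBSCRIBERS = 39_000_000   # Korea, 2006.7 (from the slide)
RECORD_BYTES = 60          # phone number, station, time
CALLS_PER_DAY = 5
DAYS_PER_YEAR = 365

total_bytes = SUBSCRIBERS * RECORD_BYTES * CALLS_PER_DAY * DAYS_PER_YEAR
total_tb = total_bytes / 10**12   # assumption: 1 TB = 10^12 bytes

print(f"{total_bytes:,} bytes = {total_tb:.2f} TB")
```

The result is a little above 4 TB, which matches the rounded figure on the slide.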
8 What is a Database? (6/10)
- Resident registration information
Fields: SSN, name, addr, domicile
A 10K Byte record per person
10K Byte × 470M ≈ 5T Byte (47 millions)
9 What is a Database? (7/10)
- 8 billion websites, 2 billion indexing
- Terminology management
- Usenet archive: 700 million messages
20K Byte/message × 700 million = 14T Byte
10 What is a Database? (8/10)
- Hubble space telescope data from Mars
Over 12T Byte of data accumulated by 2005
35G Byte of data produced and sent abroad daily
11 What is a Database? (9/10)
- NCBI (National Center for Biotechnology Information)
- GenBank
- Manages information on 165,000 species
- Adds 3 million new DNA sequences monthly
12 What is a Database? (10/10)
- Venture company MacroGen (SNU Medical School)
Early version: 900G Byte
Final product: 15T Byte
13 What do we do with a Database? (1/2)
- Record search
- Retrieve the math grade of the student whose SSN is 840101-12121
740,000 × 5 records = 3.7M records
12 ms to fetch a record and check its content
3.7M × 12 ms = 44.4K seconds (over 12 hours)
If we use a DBMS, it will take less than 0.1 sec!
- Statistical processing for the population census
- Search for purchase patterns of customer groups
- Search for correlations between genes and diseases
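The record-search numbers above can be reproduced as follows. The sequential-scan part uses only the slide's figures. The second part is an assumption added for illustration: the slide does not say how a DBMS gets below 0.1 sec, so the sketch supposes a tree-shaped index with a hypothetical fan-out of 100 keys per node, where each level costs one 12 ms fetch.

```python
import math

STUDENTS = 740_000
RECORDS_PER_STUDENT = 5
FETCH_MS = 12   # time to fetch one record and check its content

# Sequential scan: every record is fetched and checked.
records = STUDENTS * RECORDS_PER_STUDENT
scan_seconds = records * FETCH_MS / 1000
scan_hours = scan_seconds / 3600

# Assumption (not from the slide): an index with 100 keys per node,
# so a lookup touches one node per level of the tree.
FANOUT = 100
index_levels = math.ceil(math.log(records, FANOUT))
index_seconds = index_levels * FETCH_MS / 1000

print(f"scan:  {records:,} records, {scan_seconds:,.0f} s ({scan_hours:.1f} h)")
print(f"index: {index_levels} levels, {index_seconds:.3f} s")
```

Under these assumptions the scan takes 44,400 seconds, while an indexed lookup needs only a handful of fetches, which is in line with the "less than 0.1 sec" claim.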
14 What do we do with a Database? (2/2)
- Most (all?) computing applications use some type of database
[Figure: CRM, ERP, MIS, Data Warehouse, OLTP and EDPS applications, each built on a database]
15 Database Management System (DBMS) (1/3)
[Figure: a warehouse]
16 Database Management System (DBMS) (2/3)
[Figure: a warehouse and its warehouse keeper]
17 Database Management System (DBMS) (3/3)
[Figure: users run applications (online order management, wage management, manager information management) that reach the database (profile, product, customer, sale, stock) through the DBMS]
18 DBMS Architecture
Users and their interfaces:
- naive users: application programs
- application programmers: system calls
- casual users: query
- database administrator: database scheme
DBMS:
- data manipulation language pre-compiler
- query processor
- data definition language compiler
- application programs object code
- database manager
File manager
Disk storage
19 A Sample Relational Database
20 SQL
- SQL: a widely used commercial query language
- E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'
- E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465'
      and depositor.account-number = account.account-number
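Both queries can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the sample database of the lecture: the rows are hypothetical, invented for the illustration, and the hyphenated column names are written with underscores (customer_id, account_number) so that they are valid unquoted identifiers in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215');
""")

# First query: the name of the customer with a given customer-id.
name = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = ?",
    ("192-83-7465",),
).fetchone()[0]

# Second query: balances of all accounts held by that customer (a join).
balances = [row[0] for row in conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.balance""",
    ("192-83-7465",),
)]

print(name, balances)
```

The queries only state which rows are wanted; how the tables are scanned or joined is left to the database engine.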
21 Major Commercial DBMS in 2006 (1/3)
22 Major Commercial DBMS in 2006 (2/3)
23 Major Commercial DBMS in 2006 (3/3)
24 Database Companies in the World
26 Hierarchical, Network DBMS
- The early 1970s
- IMS (IBM), System 2000 (MRI)
- DMS 1100 (Sperry), Total (Cincom)
- Advantage: quick data access using links
- Drawback: impossible to build applications that are independent of the database structure
27 Network Database example
Root Record
Customer records (name, street, city):
  Lowery, Maple, Queens
  Hodges, SideHill, Brooklyn
  Shiver, North, Bronx
Amount records: 900, 556, 647, 647, 801
Query: What's the total balance of Mr. Shiver in Bronx?
28 Network DB query example
    sum := 0
    get first customer where customer.name = "Shiver" and customer.city = "Bronx"
    while DB_status = 0 do
    begin
        sum := sum + customer.amount
        get next customer where customer.name = "Shiver" and customer.city = "Bronx"
    end
    print(sum)
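The navigational style of this query can be imitated in Python by following links by hand. This is a minimal sketch: the class and function names are hypothetical, and the assignment of amount records to customers is an assumption taken from the relational version of the same data later in the lecture (Shiver: 556 and 647).

```python
from dataclasses import dataclass, field

@dataclass
class AmountRecord:
    amount: int

@dataclass
class CustomerRecord:
    name: str
    street: str
    city: str
    amounts: list = field(default_factory=list)  # links to member records

# The root record links to the customer records (assumed linkage).
root = [
    CustomerRecord("Lowery", "Maple", "Queens", [AmountRecord(900)]),
    CustomerRecord("Shiver", "North", "Bronx",
                   [AmountRecord(556), AmountRecord(647)]),
    CustomerRecord("Hodges", "SideHill", "Brooklyn",
                   [AmountRecord(801), AmountRecord(647)]),
]

def total_balance(root, name, city):
    """Walk the links record by record, as the application must in a network DB."""
    total = 0
    for customer in root:                    # like "get first / get next customer"
        if customer.name == name and customer.city == city:
            for record in customer.amounts:  # follow links to the amount records
                total += record.amount
    return total

print(total_balance(root, "Shiver", "Bronx"))  # 556 + 647
```

The traversal logic lives in the application, so a change in how the records are linked means rewriting the program; this is the drawback the previous slides point to.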
29 Relational DBMS
- The late 1970s and early 1980s
- E. F. Codd, 1970 CACM paper on the relational data model
- Relational algebra and calculus
- The Spartan Simplicity!
- SQL: Structured Query Language
- System R - 1976, IBM's RDBMS research prototype
- Ingres - 1976, first academic RDBMS
30 Relational DBMS example
name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647
    select sum(amount)
    from customer
    where customer.name = 'Shiver'
      and customer.city = 'Bronx'
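The same question asked of the relational table can be run with Python's built-in sqlite3 module. This is a small sketch that loads the five rows of the slide into an in-memory table; the in-memory database and the parameterized form of the query are choices made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Lowery", "Maple", "Queens", 900),
        ("Shiver", "North", "Bronx", 556),
        ("Shiver", "North", "Bronx", 647),
        ("Hodges", "SideHill", "Brooklyn", 801),
        ("Hodges", "SideHill", "Brooklyn", 647),
    ],
)

# Declarative query: no loops, no links to follow.
(total,) = conn.execute(
    "SELECT SUM(amount) FROM customer WHERE name = ? AND city = ?",
    ("Shiver", "Bronx"),
).fetchone()

print(total)  # 556 + 647
```

Compared with the network DB program, the application no longer navigates records; it only describes the rows it wants.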
31 The advent of new DB applications in the 1980s (1/4)
- CAD/CASE/CAM: massive design data
- Artificial Intelligence: expert systems
- Telecommunication
- Multimedia: image, text, audio, video, etc.
Rich data model, DBMS functions
32 The advent of new DB applications in the 1980s (2/4)
- Massive design data in CAD/CASE/CAM
[Figure: previous data (the customer table with name, street, city, amount) next to CAD data]
33 The advent of new DB applications in the 1980s (3/4)
- Artificial Intelligence: expert systems
[Figure: previous data (the customer table with name, street, city, amount) next to expertise data for vehicle disorders: symptoms in control, drive, brake, handle, gearbox and engine lead to the conclusion "engine ECU disorder"]
34 The advent of new DB applications in the 1980s (4/4)
- Multimedia: image, audio, video
[Figure: previous data (the customer table with name, street, city, amount) next to multimedia data]
35 Advent of Object Oriented DBMS
36 Features of Object Oriented DBMS
- Object-oriented paradigm support
- Object, object identity
- Traversal: go back to the network DB?
- Class hierarchy, inheritance
37 Object Oriented Database example
[Figure: the customer data (name, street, city, amount) modeled as objects connected by ISA and is-part-of relationships]
38 OQL query of Object Oriented DBMS
    select sum(customer.deposit.balance)
    from Customer customer
    where customer.name = "Shiver"
      and customer.deposit.branch.city = "Bronx"
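The path expressions in the OQL query (customer.deposit.balance, customer.deposit.branch.city) correspond to following object references. The Python sketch below illustrates that idea only; it is not an OQL engine. The class layout, the choice to hold deposits as a list, and the sample objects are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    city: str

@dataclass
class Deposit:
    balance: int
    branch: Branch

@dataclass
class Customer:
    name: str
    deposits: list = field(default_factory=list)

# Hypothetical objects, loosely based on the lecture's customer data.
bronx = Branch("Bronx")
brooklyn = Branch("Brooklyn")
customers = [
    Customer("Shiver", [Deposit(556, bronx), Deposit(647, bronx)]),
    Customer("Hodges", [Deposit(801, brooklyn), Deposit(647, brooklyn)]),
]

# Each dotted path in the OQL query becomes an attribute access here.
total = sum(
    deposit.balance
    for customer in customers
    if customer.name == "Shiver"
    for deposit in customer.deposits
    if deposit.branch.city == "Bronx"
)

print(total)  # 556 + 647
```

References between objects replace the join conditions of the relational query, which is why the previous slide asks whether this is a return to network-style traversal.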
39 Object Relational DBMS
- 1980-1985: ORDBMS research prototypes
  - Postgres by UC Berkeley
  - System/R Engineering Extension
- Relational DBMS with object-oriented functions
- Extension within SQL tables!
- The early 1990s: OODBMS (Illustra, UniSQL, Matisse) downfall
- 1997: advent of the Big 3 ORDBMS
40 Object Relational Database example
name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647
41 Principal functions of Object Relational DBMS
- LOB (large object) support
- User defined type support
- Stored procedure support
- Abstract Data Type support
- SQL procedure extension
- Application domain specific extension support
- Rule/trigger system support
- Type inheritance support
42 Products of Object Relational DBMS
44 DBMS market share (1/2)
- Worldwide market share for biggest sellers of corporate databases, 2005
[Chart: shares of 48.6%, 22% and 15%]
Source: Gartner Dataquest
45 DBMS market share (2/2)
- Worldwide sales for biggest sellers of corporate databases, 2005
[Chart: sales of 6.7, 3.0 and 2.1 billions of dollars]
Source: Gartner Dataquest
46 Domestic DBMS market share
Source: Report for database industry and perspective in Korea, 2004
47 Domestic DBMS market sales
- Domestic sales for biggest sellers of corporate databases, 2004
[Chart: sales of 57.2, 45.3 and 25.1 billions of won]
Source: Gartner Dataquest, South Korea (2005)
48 Preference in domestic market
[Chart: Others 3]
Source: Report for database industry and perspective in Korea, 2004
50 XML Technology (1/2)
- The late 1990s and now
- What is XML (eXtensible Markup Language)?
  - Developed by the W3C
  - Semi-structured text for dissemination and publication
  - Self-describing
HTML: tagging for display
    <tr> <td> <font color=red>??</font> </td> <td>???</td> </tr>
    <tr> <td> <b>??</b> </td>
XML: tagging for structure and semantics
    <person>
      <name>???</name>
      <city>??</city>
      <age>20</age>
    </person>
51 XML Technology (2/2)
- Why XML
- Standard data format for storage and exchange
XML
    <person>
      <name>???</name>
      <city>??</city>
    </person>
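Because the tags describe structure rather than display, a program can read an XML record without knowing anything about its layout on a page. The sketch below uses Python's standard xml.etree.ElementTree module on a person element like the one on the XML slides; the name and city values are hypothetical placeholders, since the original values did not survive.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for name and city; age 20 is taken from the slide.
document = "<person><name>Kim</name><city>Seoul</city><age>20</age></person>"

# Storing side: parse the text into a tree and read fields by tag name.
person = ET.fromstring(document)
record = {child.tag: child.text for child in person}
record["age"] = int(record["age"])

# Exchange side: serialize the tree back to text for another system.
exchanged = ET.tostring(person, encoding="unicode")

print(record)
print(exchanged)
```

The receiving side needs only the tag names (name, city, age) to interpret the data, which is what "self-describing" refers to.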
52 Semantic Web (1/2)
[Slide text not recoverable: one bullet about the web with three numbered sub-points]
[Figure: a patient using a search engine]
53 Semantic Web (2/2)
- Semantic web
[Slide text not recoverable: the numbered sub-points mention software agents]
[Figure: software agents working on behalf of a patient with clinics' web pages (with Semantic web) and appointment schedules]
54 Knowledge discovery
[Figure: data from the database and the data warehouse goes through processing and data mining (knowledge discovery) to yield useful, interesting, hidden information, which is applied to decisions]
55 Data warehouse (1/2)
- Stores data over time
- Analyzes patterns over time
- Summarized data
- Observes data from various viewpoints
- Non-volatile
Need for a new data model: the dimensional model
56 Data warehouse (2/2)
[Figure: sales-volume cube with three dimensions: time (Jan, Feb, Mar), product (A, B, C) and sales person (Wong, Dewitt, Stonebreaker)]
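A dimensional model like this sales cube can be sketched as a mapping from a coordinate in each dimension to a measure. In the Python sketch below, the dimension values come from the slide, but the sales volumes are hypothetical numbers invented for the example, and roll_up is an illustrative helper rather than the API of any product.

```python
from collections import defaultdict

DIMENSIONS = ("time", "product", "sales person")

# Hypothetical sales volumes keyed by (time, product, sales person).
sales = {
    ("Jan", "A", "Wong"): 10,
    ("Jan", "B", "Dewitt"): 4,
    ("Feb", "A", "Wong"): 7,
    ("Feb", "C", "Stonebreaker"): 5,
    ("Mar", "B", "Dewitt"): 6,
    ("Mar", "A", "Stonebreaker"): 3,
}

def roll_up(cube, dimension):
    """Summarize the cube along one dimension, summing over the other two."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for coordinates, volume in cube.items():
        totals[coordinates[axis]] += volume
    return dict(totals)

by_month = roll_up(sales, "time")
by_product = roll_up(sales, "product")
by_person = roll_up(sales, "sales person")

print(by_month)
print(by_product)
print(by_person)
```

The same facts are summarized from three viewpoints without changing the stored data, which is the kind of analysis the dimensional model is meant to support.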
57 Data mining (1/2)
[Slide text not recoverable: two bullets with descriptions; the only surviving term is "data mining algorithm"]
58 Data mining (2/2)
[Slide text not recoverable: three headed items; the figures 80 and 74 survive in the first]
59 The Emerging Challenges
- Rapid development of H/W: disk and RAM size, access time, bandwidth
- Rapid spread of Web and Internet: millions of users connected on the Web
- New areas emerging: sensor streams, scientific data, uncertain data, information privacy
60 The Emerging Challenges
- Sophisticated data type support
[Figure: a new DBMS covering structured, temporal and unstructured data]
61 The Emerging Challenges
- Sensor streams
- Battery constraint, communication cost
- Rapidly changing configuration (sensors die or disconnect)
- Complex forms of information integration: locate a person from the heat, sound and vibration sensors
62 The Emerging Challenges
- Reasoning about uncertain data
- Scientific measurement errors
- Location data for moving objects
- Sequence, image and text similarity
[Figure: location data, sequence data, scientific measurement]
63 The Emerging Challenges
- Personalization
- Different person, different answer
- Web CRM example
[Figure: web site entry, page views and events (select product, insert item into shopping cart) feed a recommendation engine, which produces a personalized view of recommendations]
64 The Emerging Challenges
- Privacy
- How to support the protection of personal or sensitive information
- Access control by user and usage
- Include a purpose description in the query
Name    Income
Alice   25K
John    40K
We just want the statistics of the income, not the personal information!
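One way to picture a purpose description in a query is a gate that checks the stated purpose before any data is touched and returns only aggregates. The Python sketch below is a toy illustration under stated assumptions: the policy table, the function name query_income and the purpose labels are hypothetical, and returning aggregates alone is not a complete privacy guarantee.

```python
from statistics import mean

# The income table from the slide (25K and 40K).
INCOMES = {"Alice": 25_000, "John": 40_000}

# Hypothetical policy: which aggregates each stated purpose may compute.
ALLOWED = {"statistics": {"count", "mean"}}

def query_income(aggregate, purpose):
    """Answer only aggregate queries whose stated purpose permits them."""
    if aggregate not in ALLOWED.get(purpose, set()):
        raise PermissionError(
            f"'{aggregate}' is not permitted for purpose '{purpose}'"
        )
    values = list(INCOMES.values())   # names never leave this function
    if aggregate == "count":
        return len(values)
    return mean(values)

print(query_income("mean", purpose="statistics"))
```

A request for individual rows, or a request with an unapproved purpose, is rejected with PermissionError, so the caller obtains the income statistics without the personal information.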