ICS 214B: Transaction Processing and Distributed Data Management
Lecture 17: Providing Database as a Service
Professor Chen Li
- Based on slides developed by
- Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra
- ICDE 2002, San Jose, CA, USA
Talk Outline
- Software as a Service
- Database as a Service
- NetDB2 System
- Challenges for Database as a Service
- User Interface Issues
- Performance Issues
- Data Privacy Issues
- Data Encryption in DBMSs for Data Privacy
- Conclusion
Software as a Service
- Get
- what you need
- when you need
- Pay
- what you use
- Don't worry about
- how to deploy, implement, maintain, upgrade
Software as a Service
- Driving forces behind the paradigm shift
- Faster, cheaper, more accessible networks
- Rise of distributed architectures
- Virtualization in server and storage technologies
- Established e-business infrastructures
- Hardware/software is not the largest part of total cost of ownership
- User operations: 46%
- Technical support: 24%
- Capital cost (HW/SW): 21% (Source: Gartner Group)
- Hardware, software, and network costs have been decreasing more sharply than personnel costs
Software as a Service
- Already in the market as
- storage services, disaster recovery services, e-mail services, rent-a-spreadsheet services, etc.
- Sun ONE, Oracle Online Services, Microsoft .NET My Services, etc.
- Why not Database as a Service?
Database as a Service - Why?
- Organizations need data management
- DBMSs are complex systems to deploy, set up, and maintain
- They require highly skilled people (DBAs, etc.) at high cost
Database as a Service - Offerings
- Inherits all the advantages of software as a service, plus
- The service provider offers mechanisms to
- create, store, and access databases
- DB management is transferred to the service provider for
- backup, administration, restoration, space management, and upgrades
- Clients use the service provider's HW, SW, and personnel instead of their own
NetDB2 - Database Service Provision
- Developed in collaboration between the University of California, Irvine and IBM
- Deployed on the Internet over a year ago
- Has been used by 15 universities and more than 2,500 students to help teach database classes
- Currently offered through the IBM Scholars Program
NetDB2 System Architecture
- Three-tier architecture
- Client: as thin as possible, just a browser
- Java-based implementation
- Backed by fail-over solutions
- Allows expansion and user-driven integration for application development
Database as a Service - Issues
- Issues to address
- User Interface
- Performance
- Data Privacy
User Interface
- Simple yet powerful
- supports SQL queries, scripts, UDFs, stored procedures, metadata, and data upload
- Consistent
- Region-based composition
- Expansion/Integration
- User-defined interfaces
Performance
- Interaction takes place over a different medium: the network
- Performance should at least match what we already have
- Experimented with the TPC-H database and queries
Data Privacy
- Users give control of their data to the service provider
- Attacks on stored data are a well-known problem
- So, users need data security in place
- Security of data over the network is well studied
- SSL, TLS
- Establish security for stored data
- even if stolen, the data should not make sense
- How? Encryption!
Encryption Alternatives
- Implementation Level
- Software vs. Hardware encryption
- Granularity of Data
- Field (Attribute) level
- Row (Record) level
- (Disk) Page level
Encryption Alternatives (2)
- Field level encryption
- Pros
- Easier to implement and integrate
- Flexible
- Allows selective encryption, reduces the number of bytes to encrypt/decrypt
- Cons
- Increases encryption overhead significantly due to invocation cost
- Data size expansion (for block cipher algorithms)
- Current optimization technologies do not handle foreign functions well
Encryption Alternatives (3)
- Row level encryption
- Pros
- Reduces the data size expansion problem
- Reduces invocation cost
- Better security because of total encryption
- Cons
- Does not allow selective encryption, increases the number of bytes to encrypt/decrypt
- Implementation and integration can be hard when row functions are not supported
Encryption Alternatives (4)
- Page level encryption
- Pros
- Significantly reduces encryption/decryption overhead due to reduced invocation cost
- Eliminates the data size expansion problem (for block ciphers)
- Better security because of total encryption
- Cons
- Implementation and integration are not straightforward
- Increases the number of bytes to encrypt/decrypt each time
- Higher update/delete cost: requires re-encryption of all affected pages
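The pros and cons on the last three slides can be made concrete with a rough cost model. This is an illustrative sketch under stated assumptions, not the experimental setup: it assumes an 8-byte block cipher (Blowfish's block size) with PKCS#7-style padding, a hypothetical fixed-width row of four fields, and a hypothetical 4,096-byte page. The names `padded_len`, `Cost`, `field_level`, `row_level`, and `page_level` are invented for this example.

```python
import math
from dataclasses import dataclass
from typing import List

BLOCK = 8  # Blowfish encrypts 64-bit (8-byte) blocks


def padded_len(n: int, block: int = BLOCK) -> int:
    """Ciphertext bytes for n plaintext bytes under PKCS#7-style padding.

    PKCS#7 always appends 1..block bytes, so the result is the smallest
    multiple of `block` strictly greater than n.
    """
    return (n // block + 1) * block


@dataclass
class Cost:
    invocations: int   # cipher calls to encrypt the whole table
    stored_bytes: int  # ciphertext bytes stored for the table
    read_bytes: int    # bytes decrypted to read the first field of one row


def field_level(rows: int, field_sizes: List[int]) -> Cost:
    # One call per field value; every value is padded on its own.
    per_row = sum(padded_len(s) for s in field_sizes)
    return Cost(rows * len(field_sizes), rows * per_row, padded_len(field_sizes[0]))


def row_level(rows: int, field_sizes: List[int]) -> Cost:
    # One call per row; padding is paid once per row instead of per field.
    row_ct = padded_len(sum(field_sizes))
    return Cost(rows, rows * row_ct, row_ct)


def page_level(rows: int, field_sizes: List[int], page_size: int = 4096) -> Cost:
    # One call per page. Assumes page_size is a multiple of BLOCK, so the
    # page is encrypted as-is without padding.
    assert page_size % BLOCK == 0
    rows_per_page = page_size // sum(field_sizes)
    pages = math.ceil(rows / rows_per_page)
    # Reading any field means decrypting its whole page.
    return Cost(pages, pages * page_size, page_size)


if __name__ == "__main__":
    sizes = [4, 8, 8, 8]  # hypothetical fixed-width 28-byte row
    for name, model in [("field", field_level), ("row", row_level), ("page", page_level)]:
        print(name, model(1000, sizes))
```

With these hypothetical sizes and 1,000 rows, page level needs 7 cipher invocations versus 1,000 at row level and 4,000 at field level (all four fields encrypted), but reading a single field forces 4,096 bytes to be decrypted instead of 8. Selective field encryption would shrink the field-level numbers further.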
Encryption Alternatives Experiments
- Experimented with the TPC-H database and queries
- Encryption scheme alternatives (V = evaluated, - = not evaluated)

    Implementation        Field Level   Row Level   Page Level
    Software encryption   V             -           -
    Hardware encryption   -             V           V
Software - Field Level Encryption
- Block cipher algorithm: Blowfish
- Implemented as a foreign function (UDF)
- Sample insert
- insert into lineitem (discount) values (encrypt(10, key))
- Sample select
- select decrypt(discount, key) from lineitem where custid 300
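A minimal sketch of the UDF approach, using Python's built-in sqlite3 module as a stand-in for DB2. Several assumptions are worth flagging: the cipher is a toy SHA-256 keystream XOR, not Blowfish, and is not secure. `encrypt`/`decrypt` are hypothetical UDFs that store a number as an 8-byte double behind an 8-byte random nonce. The key string and the `custid < 300` predicate are illustrative choices.

```python
import hashlib
import os
import sqlite3
import struct

NONCE_LEN = 8


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter) blocks. NOT Blowfish, NOT secure.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(value, key):
    """Hypothetical UDF: numeric value -> BLOB (nonce || ciphertext)."""
    plain = struct.pack(">d", float(value))
    nonce = os.urandom(NONCE_LEN)
    ks = _keystream(key.encode(), nonce, len(plain))
    return nonce + bytes(p ^ k for p, k in zip(plain, ks))


def decrypt(blob, key):
    """Hypothetical UDF: BLOB produced by encrypt() -> float."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = _keystream(key.encode(), nonce, len(body))
    return struct.unpack(">d", bytes(c ^ k for c, k in zip(body, ks)))[0]


def run_demo(key="client-secret"):
    conn = sqlite3.connect(":memory:")
    # Register the foreign functions so SQL statements can call them.
    conn.create_function("encrypt", 2, encrypt)
    conn.create_function("decrypt", 2, decrypt)
    conn.execute("create table lineitem (custid integer, discount blob)")
    conn.execute("insert into lineitem (custid, discount) values (?, encrypt(10, ?))", (250, key))
    stored = conn.execute("select discount from lineitem").fetchone()[0]
    rows = conn.execute(
        "select decrypt(discount, ?) from lineitem where custid < 300", (key,)
    ).fetchall()
    conn.close()
    return stored, rows


if __name__ == "__main__":
    stored, rows = run_demo()
    print(len(stored), rows)
```

Each `decrypt` call in the select is a separate foreign-function invocation per row, which is where the invocation overhead listed under the field-level cons comes from.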
Software - Field Level Encryption (2)
- The creator supplies the key
- Unauthorized persons cannot get hold of the key
- Gives some level of protection even from the service provider
- Users can easily implement a different encryption algorithm and check it into the system
- Different encryption algorithms/keys can be used for different fields
Software - Field Level Encryption (3)
- TPC-H queries, except Q1
- Only one field (l_discount of the lineitem table) encrypted
- Introduced very large overhead
TPC-H Query 1
- Problem: multiple decryptions of the same field (l_discount appears in three aggregates)

    select
      l_returnflag, l_linestatus,
      sum(l_quantity) as sum_qty,
      sum(l_extendedprice) as sum_base_price,
      sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
      sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
      avg(l_quantity) as avg_qty,
      avg(l_extendedprice) as avg_price,
      avg(l_discount) as avg_disc,
      count(*) as count_order
    from tpcd.lineitem
    where l_shipdate <= date ('1998-12-01') - 90 day
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus
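To see why this query is the problem case, count decryptions. If l_discount is stored encrypted, sum_disc_price, sum_charge, and avg_disc each wrap it in their own decrypt call, so every qualifying row is decrypted three times. The model below is a hypothetical Python simulation, not DB2 behavior: `CountingDecrypt` stands in for the UDF, the "ciphertext" is a tagged tuple, the WHERE clause is omitted, only the three aggregates that read l_discount are computed, and the sample rows are invented.

```python
from collections import defaultdict


class CountingDecrypt:
    """Hypothetical stand-in for a decrypt UDF that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, ciphertext):
        self.calls += 1
        return ciphertext[1]  # toy "ciphertext" is the tuple ("enc", plaintext)


def q1_discount_aggs(rows, decrypt, cse=False):
    """sum_disc_price, sum_charge and avg_disc per (l_returnflag, l_linestatus)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for r in rows:
        a = acc[(r["l_returnflag"], r["l_linestatus"])]
        if cse:
            # Rewritten: decrypt once per row and reuse the plaintext.
            d1 = d2 = d3 = decrypt(r["l_discount"])
        else:
            # Naive: every expression that mentions l_discount decrypts it again.
            d1 = decrypt(r["l_discount"])
            d2 = decrypt(r["l_discount"])
            d3 = decrypt(r["l_discount"])
        a[0] += r["l_extendedprice"] * (1 - d1)                     # sum_disc_price
        a[1] += r["l_extendedprice"] * (1 - d2) * (1 + r["l_tax"])  # sum_charge
        a[2] += d3                                                  # numerator of avg_disc
        a[3] += 1
    return {k: (v[0], v[1], v[2] / v[3]) for k, v in acc.items()}


# Hypothetical rows; l_discount is stored "encrypted".
SAMPLE = [
    {"l_returnflag": "A", "l_linestatus": "F", "l_extendedprice": 100.0, "l_tax": 0.02, "l_discount": ("enc", 0.05)},
    {"l_returnflag": "A", "l_linestatus": "F", "l_extendedprice": 250.0, "l_tax": 0.04, "l_discount": ("enc", 0.10)},
    {"l_returnflag": "N", "l_linestatus": "O", "l_extendedprice": 80.0, "l_tax": 0.00, "l_discount": ("enc", 0.02)},
    {"l_returnflag": "R", "l_linestatus": "F", "l_extendedprice": 40.0, "l_tax": 0.08, "l_discount": ("enc", 0.07)},
]

if __name__ == "__main__":
    naive, cse = CountingDecrypt(), CountingDecrypt()
    q1_discount_aggs(SAMPLE, naive)
    q1_discount_aggs(SAMPLE, cse, cse=True)
    print(naive.calls, cse.calls)
```

Both variants produce identical aggregates; only the number of decrypt calls changes, from three per row to one.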
Query Rewrite to Improve Performance
- Problem: multiple decryptions of the same field (e.g., TPC-H Q1)
- Common subexpression elimination (CSE) based algorithm to eliminate redundant decryptions
- Use a temporary view
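One possible way to realize the rewrite, sketched with Python's sqlite3: decrypt each l_discount once into a temporary relation, then run the aggregates against the plaintext column. Assumptions: the slide's temporary view is materialized here as a TEMP TABLE, because SQLite may expand a view inline and re-evaluate the decrypt call at each reference. The toy "cipher" (integer basis points XOR an integer key), `KEY`, `CountingDecrypt`, and the table contents are all hypothetical. Values are chosen to be exact in binary floating point, so both plans give identical sums.

```python
import sqlite3

KEY = 0x5A5A  # hypothetical integer key for the toy transform below


def toy_encrypt(discount: float, key: int) -> int:
    # Toy reversible transform, NOT encryption: basis points XOR key.
    return round(discount * 10000) ^ key


class CountingDecrypt:
    """Toy decrypt UDF that records how often SQLite calls it."""

    def __init__(self):
        self.calls = 0

    def __call__(self, stored: int, key: int) -> float:
        self.calls += 1
        return (stored ^ key) / 10000


# (l_returnflag, l_linestatus, l_extendedprice, l_tax, discount): binary-exact values
ROWS = [
    ("A", "F", 100.0, 0.25, 0.25),
    ("A", "F", 200.0, 0.5, 0.5),
    ("N", "O", 400.0, 0.25, 0.125),
]

Q1_SUBSET = """
select l_returnflag, l_linestatus,
       sum(l_extendedprice * (1 - {d})) as sum_disc_price,
       sum(l_extendedprice * (1 - {d}) * (1 + l_tax)) as sum_charge,
       avg({d}) as avg_disc
from {t}
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
"""
NAIVE = Q1_SUBSET.format(d="decrypt(l_discount, :k)", t="lineitem")
REWRITTEN = Q1_SUBSET.format(d="l_discount", t="dec_lineitem")


def make_db():
    conn = sqlite3.connect(":memory:")
    dec = CountingDecrypt()
    conn.create_function("decrypt", 2, dec)
    conn.execute("create table lineitem (l_returnflag text, l_linestatus text, "
                 "l_extendedprice real, l_tax real, l_discount integer)")
    conn.executemany("insert into lineitem values (?, ?, ?, ?, ?)",
                     [(f, s, p, t, toy_encrypt(d, KEY)) for f, s, p, t, d in ROWS])
    return conn, dec


def run_rewritten(conn):
    # Materialize the decrypted column once per row, then aggregate plaintext.
    conn.execute("create temp table dec_lineitem (l_returnflag text, l_linestatus text, "
                 "l_extendedprice real, l_tax real, l_discount real)")
    conn.execute("insert into dec_lineitem select l_returnflag, l_linestatus, "
                 "l_extendedprice, l_tax, decrypt(l_discount, :k) from lineitem", {"k": KEY})
    return conn.execute(REWRITTEN).fetchall()


if __name__ == "__main__":
    conn, dec = make_db()
    naive = conn.execute(NAIVE, {"k": KEY}).fetchall()
    print("naive decrypt calls:", dec.calls)
    dec.calls = 0
    print(run_rewritten(conn) == naive, "rewritten decrypt calls:", dec.calls)
```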
Hardware - Row Level Encryption
- Specialized hardware: IBM S/390 Cryptographic Coprocessor under IBM OS/390
- editproc facility
- invoked for the whole row
- upon a read/write request, encryption/decryption is invoked in hardware for the row
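The editproc idea (one cipher call per row on every write and every read, transparent to the query) can be mimicked in a few lines. This is a hypothetical Python sketch, not DB2's actual edit-procedure interface and not the S/390 coprocessor. `RowEncryptingTable` and its toy SHA-256 keystream cipher are invented for illustration, and rows are serialized as JSON before encryption.

```python
import hashlib
import json
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR with a SHA-256 counter keystream. NOT secure; encrypt == decrypt.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class RowEncryptingTable:
    """Hypothetical table with an editproc-style per-row encrypt/decrypt hook."""

    def __init__(self, key: bytes):
        self._key = key
        self.stored = []       # what lands on disk: nonce || ciphertext, one per row
        self.cipher_calls = 0  # one call per row written or read

    def _on_write(self, row: dict) -> bytes:
        self.cipher_calls += 1
        nonce = os.urandom(16)
        return nonce + _xor_stream(self._key, nonce, json.dumps(row).encode())

    def _on_read(self, blob: bytes) -> dict:
        self.cipher_calls += 1
        return json.loads(_xor_stream(self._key, blob[:16], blob[16:]))

    def insert(self, row: dict) -> None:
        self.stored.append(self._on_write(row))

    def scan(self) -> list:
        # Even if a query needs one column, every row is decrypted whole.
        return [self._on_read(blob) for blob in self.stored]


if __name__ == "__main__":
    t = RowEncryptingTable(b"secret")
    t.insert({"l_discount": 0.05, "l_tax": 0.02, "l_quantity": 17})
    print(t.scan(), t.cipher_calls)
```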
SW Field Level vs. HW Row Level
- Experimented on TPC-H Q1
- Software field level: only one field is encrypted
- Hardware row level: all fields are encrypted
Hardware - Page Level Encryption
- Page level encryption is simulated
- It gives significant improvement due to the reduction in start-up cost
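Since page-level encryption was only simulated, a toy simulation of our own may help show where its costs move: loading and scanning need one cipher call per page rather than per row, but updating one row means decrypting and re-encrypting its whole page. Everything here is hypothetical: `PageEncryptingStore`, the rows-per-page setting, the JSON page encoding (real disk pages would be fixed-size), and the toy SHA-256 keystream cipher.

```python
import hashlib
import json
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR with a SHA-256 counter keystream. NOT secure; encrypt == decrypt.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class PageEncryptingStore:
    """Hypothetical store that encrypts whole pages of rows as single units."""

    def __init__(self, key: bytes, rows_per_page: int = 4):
        self._key = key
        self.rows_per_page = rows_per_page
        self.pages = []        # encrypted pages: nonce || ciphertext
        self.cipher_calls = 0  # one per page encrypted or decrypted

    def _seal(self, rows: list) -> bytes:
        self.cipher_calls += 1
        nonce = os.urandom(16)
        return nonce + _xor_stream(self._key, nonce, json.dumps(rows).encode())

    def _open(self, page: bytes) -> list:
        self.cipher_calls += 1
        return json.loads(_xor_stream(self._key, page[:16], page[16:]))

    def load(self, rows: list) -> None:
        n = self.rows_per_page
        self.pages = [self._seal(rows[i:i + n]) for i in range(0, len(rows), n)]

    def read_all(self) -> list:
        # A scan decrypts each page once, however many rows it holds.
        return [row for page in self.pages for row in self._open(page)]

    def update(self, index: int, row: dict) -> None:
        # Changing one row forces its whole page to be decrypted and re-encrypted.
        p, slot = divmod(index, self.rows_per_page)
        rows = self._open(self.pages[p])
        rows[slot] = row
        self.pages[p] = self._seal(rows)


if __name__ == "__main__":
    s = PageEncryptingStore(b"secret")
    s.load([{"id": i} for i in range(10)])
    s.update(5, {"id": 50})
    print(len(s.pages), s.cipher_calls)
```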
Conclusion
- Database as a Service is a new model that alleviates the need to
- hire professionals
- purchase expensive hardware/software
- deal with administrative and maintenance tasks
- It is a viable model and can emerge as a successful offering
- Encryption is a solution for privacy, the most important issue
- Hardware encryption is clearly superior to software encryption
- Hardware makes encryption practical for databases
- There are trade-offs in the granularity of encrypted data