Rajkumar Sen - PowerPoint PPT Presentation

About This Presentation

Title:

Rajkumar Sen

Description:

One of the most powerful handhelds. Low cost and shareable. Developed at IISc ... MySQL and BerkeleyDB primarily designed for disk-based systems. Data in Flash Memory ... – PowerPoint PPT presentation

Number of Views:75

Avg rating:3.0/5.0

Slides: 29

Provided by: Gue142

Category:

more less

Transcript and Presenter's Notes

Title: Rajkumar Sen

1
An open source DBMS for handheld devices

by
Rajkumar Sen

2
Outline

Introduction and Motivation
System Architecture
Storage Model and Index
Query Processing
Data Synchronization
Transaction Management
Summary and Proposed Future Work

3
Introduction and Motivation

What is a handheld device
A small computer with limited resources
e.g. Simputer, Palm devices, iPAQ etc.
The Simputer
One of the most powerful handhelds
Low cost and shareable
Developed at IISc
Intel StrongArm processor
Flash Memory for stable storage (24MB)
Limited main memory (32MB)
Recognizes Smartcards
Linux operating system

4
Introduction and Motivation

Why DBMS for a Simputer
Increasing number of applications
e.g. Microbanking, E-governance, Agricultural
market pricing etc.
They deal with a fair amount of data
Complex queries involving joins and aggregates
Atomicity and Durability for data consistency
Ease of application development
Need for Synchronization
Data from remote server downloaded on the
Simputer
Updates at both places
Common data needs to be synchronized
Open source since Simputer is designed to be
low-cost

5
Introduction and Motivation

No open source DBMS designed for handheld devices
Oracle Lite and DB2 Everyplace are not open
source
MySQL and BerkeleyDB primarily designed for
disk-based systems
Data in Flash Memory
Access time less than disk
Random access as good as sequential access
System Constraints
Limited storage capacity
Less main memory
Slow writes in Flash Memory

6
Introduction and Motivation

Goal is to design a system that consists of
A lightweight relational DBMS that resides on the
Simputer
An application that downloads data from a remote
server
A synchronization tool
Database Module Toolkit
Developing the building blocks as modules
Modules can be plugged depending on the type of
application

7
System Architecture

Key components of the system
Data and Metadata Manager
Representation of relational data and data
structures for metadata
Main Memory Manager
Manages data to be kept in memory
Transaction Manager
Ensures data consistency
Access Right Manager
Security rules needed as Simputer is shareable
Query Engine
SQL compiler and query processing
FetchRelation Module
Fetches data from a remote server
Synchronization Module
Synchronizes common data in Simputer and
remote server

8
System Architecture
Figure System Architecture
9
Storage Model and Indexing

Aim at compactness in representation of data as
well as index
Existing models
Flat Storage
Tuples are stored sequentially. Ensures access
locality but consumes
space. Access locality not an issue since data in
flash memory
Pointer-based Domain Storage
Eliminates duplicates
Values partitioned into domains which are sets of
unique values
Tuples reference the attribute value by means of
pointers
One domain shared among multiple attributes
No index
Pointer-based Ring Storage (Bobineau et al)
Modification to Domain Storage
Uses domain structure as index
Tuple-to-tuple pointers connect two tuples that
have same attribute value
Index structure in the form of a ring

10
Storage Model and Indexing

Ring Storage Model contd.
Advantages
Addresses data and index compactness
Join becomes easy
Drawbacks
For projecting value of an attribute, need to
traverse half of the ring
Some other Models
DBGraph (Pucheral et al)
Maintains value-to-tuple as well as
tuple-to-value pointers.
Too many pointers
Domain Tree (Missikov et al)
Maintains Domain trees, for simple
projection need to scan
all the domain trees.
Another model (Krithi et al ) suggested for data
warehouses
uses projection index and join index

11
Storage Model and Indexing

Our Model
Based on Domain Storage Model
Each domain element is a (length, value) tuple
Domains used for duplicates and variable width
attributes
When primary key-foreign key relation, instead of
storing value in child table we store pointer to
the tuple in the parent table which contains the
value

Figure Our storage model
12
Storage Model and Indexing

Index structure
Sorted array of Tuple Identifier List (pointers
to tuples)
Each List corresponds to a value, consists of
pointers to those tuples that share it
Ordering preserved in the index structure, hence
efficient execution of range queries

Figure Our index model
13
Storage Model and Indexing

Our model eliminates the problems by storing
some additional pointers
Data storage separate from index structure, hence
projection is easy
Index created on attribute rather than domains
For primary key-foreign key relation, join
requires scanning of only child table
For domain based join attributes, join indices
can be built

Figure Join index
14
Query Processing

Entire data is memory resident
Aim to reduce number of tuples read from memory
Join without indices
Assuming R and S are the outer and inner
relations with n and m tuples
Nested loop join
Most reasonable choice when very less memory
available.
No additional structure
Cost of join is mn
Hash Join
Requires the creation of hash partitions in main
memory.
Memory permitted, hash join is faster
Instead of storing values in the
partitions, we store pointers to tuples
Cost (m n ) //for building hash
partitions
(m n) //for reading the hash
partitions

15
Query Processing

Sort-Merge Join
Need memory for sorting
While sorting, do not store the
tuples but only pointers to the tuples
Cost ( m log m n log n) //for
sorting
( m n)
// for reading to merge
(m n) writes to memory and (m
n) pointer chases
Join with Indices
Indexed Nested loop join
Relation on which index is available is
treated as the inner relation
Cost n //for reading the
outer relation R
n log m //for each tuple of
R, index into S

16
Query Processing

Sort-Merge Join
If indices available on both join
attributes, sorting not needed
Cost m n //for merging
Some cost involved in
traversing the index and chasing pointers
Join Indices
A Single traversal of the index and
chasing pointers gives the tuples
which are candidates for joining
Other operations
Selection, Projection and Sorting are
straightforward.
Duplicate elimination, aggregation
,union, intersection, and difference
can be computed by sorting and building
hash indices on relations

17
Query Processing

Query Plan Generation and Memory Allocation
An optimal query execution plan is needed
Reduce materialization, left-deep tree and bushy
trees are ruled out
Choose from right-deep and extreme right-deep
tree depending on
memory available for storing intermediate
results
Query optimizer has to be memory cognizant
Memory must be optimally allocated among all
operators since we
cannot assume entire memory is available for each
operator
Arvind et. al. suggests memory allocation to
operators based on cost
functions
Aggregation, sorting and duplicate removal
generally performed on
materialized results.
By enforcing a particular tuple arrival order at
the leaf of the plan tree,
pipelining of aggregation etc. is possible

18
Synchronization

Main data resides on a remote server
Information in the form of relations in a remote
DBMS
Relevant information is downloaded on the
Simputer
Copies of data present on both Simputer and
remote server
Need for synchronization
Two scenarios in data synchronization
1. Relation downloaded on the Simputer
Download the entire relation R only once
Subsequently, transfer only the differentials
Rsimpnew Rsimpold U ?Rmain
Rmainnew Rmainold U ?Rsimp
Problem arises when the same tuple is
updated at both places.
Tuple Conflict
Detection harder than resolution

19
Synchronization

2.Resultset downloaded on the Simputer
Execute the query at the remote server and
transfer only once
Subsequently transfer only the differentials
q (R S)
qnew qold U (ir S)
qnew qold - (dr S)
For each query, find out what differential to
compute.
Resembles the Incremental View Maintenance
Problem
In-place updates on the resultset in the Simputer
means changes
have to be propagated to the remote database
Resembles the View Update Problem
Updates to downloaded relations and resultsets
recorded in
Intention List and Intention View List
respectively.
Determine whether the changes can be reflected

20
Synchronization

View Update Considerations
Only legal view updates
Only legal database updates
Only underlying tuples affected
Insertion of extra tuples
Modification of other view tuples
Keller suggests translations for inserts,
deletes, and modifications
- Restricts the relations to BCNF
- Joins should have an inclusion dependency
- Our remote server relations need not
satisfy
Langerak suggests translations based on extension
joins
- No restriction on normal forms and joins
- Can be the basis of our synchronization
engine
Most of the rules need data and metadata
about the base relations
- Remote database connection needed to
validate updates

21
Transaction Management

ACID properties need to be maintained
Global atomicity
- data updates in both Simputer and
smartcard.
- Simputer participates in
distributed transactions
Simputer a single user system, why Concurrency
Control
- Transactions on behalf of data
synchronization
- Long transaction doing some
aggregation
Concurrency Control
Most of the transactions expected to complete
fast
Lock contention reduces, hence large locking
granules
Korth et al proposes a serial protocol . Main aim
is to improve
response time rather than throughput
Instead of hash tables, bits can be used for lock
information

22
Transaction Management

Commit processing
Shadow updates
Data locality not an issue since data in memory
Not suitable for pointer-based storage models
If one domain value changes, its location also
changes
and all tuple pointers need to be updated
Size of the shadow is also a concern
Log-based updates
Better suited for pointer-based models
Cost of maintaining the logs
Pointer-based logging instead of value-based
logging
Limited memory, hence Steal buffer replacement
policy.
Dirty blocks output to smartcard stable storage
when connected
If all dirty blocks of a transaction output to
smartcard , no Undo
More detailed survey left as future work

23
Summary and Proposed Future Work

Discussion and solutions proposed can be
generalized for any
handheld device.
However, device specific optimizations possible
Future Work
More detailed survey of concurrency control and
recovery
Protocols for validation of updates on downloaded
data
Query engine needs to be designed
Access Rights Management issues need to be
addressed
Database module toolkit
Implement the system on the Simputer
Evaluate the system

24
Thank You

25
References

A. Ammann, M. Hanrahan, and R. Krishnamurthy.
Design of a Memory Resident DBMS. In IEEE
COMPCON, 1985.
2. C. Bobineau, L. Bouganim, P. Pucheral, and P.
Valduriez. PicoDBMS Scaling down Database
Techniques for the Smartcard. In VLDB, 2000.
3. Stephen Blott and Henry F. Korth. An Almost
Serial Protocol for Transaction Execution in Main
Memory Database Systems. In VLDB, 2002.
4. DB2 Everyplace. http//www.ibm.com/software/
data/db2/everyplace.
5. Anindya Datta, Debra VanderMeer, Krithi
Ramamritham, and Bongki Moon. Applying Parallel
Processing Techniques in Data Warehousing and
OLAP. In VLDB, 1999.
6. A. Hulgeri, S. Sudarshan, and S. Seshadri.
Memory Cognizant Query Optimization. In Advances
In Data Management, 2000.