Title: Rajkumar Sen
1An open source DBMS for handheld devices
2Outline
- Introduction and Motivation
- System Architecture
- Storage Model and Index
- Query Processing
- Data Synchronization
- Transaction Management
- Summary and Proposed Future Work
3Introduction and Motivation
- What is a handheld device
- A small computer with limited resources
- e.g. Simputer, Palm devices, iPAQ etc.
- The Simputer
- One of the most powerful handhelds
- Low cost and shareable
- Developed at IISc
- Intel StrongArm processor
- Flash Memory for stable storage (24MB)
- Limited main memory (32MB)
- Recognizes Smartcards
- Linux operating system
4Introduction and Motivation
- Why DBMS for a Simputer
- Increasing number of applications
- e.g. Microbanking, E-governance, Agricultural
market pricing etc. - They deal with a fair amount of data
- Complex queries involving joins and aggregates
- Atomicity and Durability for data consistency
- Ease of application development
- Need for Synchronization
- Data from remote server downloaded on the
Simputer - Updates at both places
- Common data needs to be synchronized
- Open source since Simputer is designed to be
low-cost
5Introduction and Motivation
- No open source DBMS designed for handheld devices
- Oracle Lite and DB2 Everyplace are not open
source - MySQL and BerkeleyDB primarily designed for
disk-based systems - Data in Flash Memory
- Access time less than disk
- Random access as good as sequential access
- System Constraints
- Limited storage capacity
- Less main memory
- Slow writes in Flash Memory
6Introduction and Motivation
- Goal is to design a system that consists of
-
- A lightweight relational DBMS that resides on the
Simputer - An application that downloads data from a remote
server - A synchronization tool
- Database Module Toolkit
- Developing the building blocks as modules
- Modules can be plugged depending on the type of
application
7System Architecture
- Key components of the system
- Data and Metadata Manager
- Representation of relational data and data
structures for metadata - Main Memory Manager
- Manages data to be kept in memory
- Transaction Manager
- Ensures data consistency
- Access Right Manager
- Security rules needed as Simputer is shareable
- Query Engine
- SQL compiler and query processing
- FetchRelation Module
- Fetches data from a remote server
- Synchronization Module
- Synchronizes common data in Simputer and
remote server
8System Architecture
Figure System Architecture
9Storage Model and Indexing
- Aim at compactness in representation of data as
well as index - Existing models
- Flat Storage
- Tuples are stored sequentially. Ensures access
locality but consumes - space. Access locality not an issue since data in
flash memory - Pointer-based Domain Storage
- Eliminates duplicates
- Values partitioned into domains which are sets of
unique values - Tuples reference the attribute value by means of
pointers - One domain shared among multiple attributes
- No index
- Pointer-based Ring Storage (Bobineau et al)
- Modification to Domain Storage
- Uses domain structure as index
- Tuple-to-tuple pointers connect two tuples that
have same attribute value - Index structure in the form of a ring
10Storage Model and Indexing
- Ring Storage Model contd.
- Advantages
- Addresses data and index compactness
- Join becomes easy
- Drawbacks
- For projecting value of an attribute, need to
traverse half of the ring - Some other Models
- DBGraph (Pucheral et al)
- Maintains value-to-tuple as well as
tuple-to-value pointers. - Too many pointers
- Domain Tree (Missikov et al)
- Maintains Domain trees, for simple
projection need to scan - all the domain trees.
- Another model (Krithi et al ) suggested for data
warehouses - uses projection index and join index
11Storage Model and Indexing
- Our Model
- Based on Domain Storage Model
- Each domain element is a (length, value) tuple
- Domains used for duplicates and variable width
attributes - When primary key-foreign key relation, instead of
storing value in child table we store pointer to
the tuple in the parent table which contains the
value -
Figure Our storage model
12Storage Model and Indexing
-
- Index structure
- Sorted array of Tuple Identifier List (pointers
to tuples) - Each List corresponds to a value, consists of
pointers to those tuples that share it - Ordering preserved in the index structure, hence
efficient execution of range queries -
Figure Our index model
13Storage Model and Indexing
- Our model eliminates the problems by storing
some additional pointers - Data storage separate from index structure, hence
projection is easy - Index created on attribute rather than domains
- For primary key-foreign key relation, join
requires scanning of only child table - For domain based join attributes, join indices
can be built -
Figure Join index
14Query Processing
- Entire data is memory resident
- Aim to reduce number of tuples read from memory
- Join without indices
- Assuming R and S are the outer and inner
relations with n and m tuples - Nested loop join
- Most reasonable choice when very less memory
available. - No additional structure
- Cost of join is mn
- Hash Join
- Requires the creation of hash partitions in main
memory. - Memory permitted, hash join is faster
- Instead of storing values in the
partitions, we store pointers to tuples -
- Cost (m n ) //for building hash
partitions - (m n) //for reading the hash
partitions
15Query Processing
- Sort-Merge Join
- Need memory for sorting
- While sorting, do not store the
tuples but only pointers to the tuples - Cost ( m log m n log n) //for
sorting - ( m n)
// for reading to merge - (m n) writes to memory and (m
n) pointer chases - Join with Indices
- Indexed Nested loop join
- Relation on which index is available is
treated as the inner relation -
- Cost n //for reading the
outer relation R - n log m //for each tuple of
R, index into S
16Query Processing
- Sort-Merge Join
- If indices available on both join
attributes, sorting not needed - Cost m n //for merging
-
- Some cost involved in
traversing the index and chasing pointers - Join Indices
- A Single traversal of the index and
chasing pointers gives the tuples - which are candidates for joining
- Other operations
- Selection, Projection and Sorting are
straightforward. - Duplicate elimination, aggregation
,union, intersection, and difference - can be computed by sorting and building
hash indices on relations -
17Query Processing
- Query Plan Generation and Memory Allocation
- An optimal query execution plan is needed
- Reduce materialization, left-deep tree and bushy
trees are ruled out - Choose from right-deep and extreme right-deep
tree depending on - memory available for storing intermediate
results - Query optimizer has to be memory cognizant
- Memory must be optimally allocated among all
operators since we - cannot assume entire memory is available for each
operator - Arvind et. al. suggests memory allocation to
operators based on cost - functions
- Aggregation, sorting and duplicate removal
generally performed on - materialized results.
- By enforcing a particular tuple arrival order at
the leaf of the plan tree, - pipelining of aggregation etc. is possible
18Synchronization
- Main data resides on a remote server
- Information in the form of relations in a remote
DBMS - Relevant information is downloaded on the
Simputer - Copies of data present on both Simputer and
remote server - Need for synchronization
- Two scenarios in data synchronization
- 1. Relation downloaded on the Simputer
- Download the entire relation R only once
- Subsequently, transfer only the differentials
- Rsimpnew Rsimpold U ?Rmain
- Rmainnew Rmainold U ?Rsimp
- Problem arises when the same tuple is
updated at both places. - Tuple Conflict
- Detection harder than resolution
19Synchronization
- 2.Resultset downloaded on the Simputer
- Execute the query at the remote server and
transfer only once - Subsequently transfer only the differentials
- q (R S)
-
- qnew qold U (ir S)
- qnew qold - (dr S)
- For each query, find out what differential to
compute. - Resembles the Incremental View Maintenance
Problem - In-place updates on the resultset in the Simputer
means changes - have to be propagated to the remote database
- Resembles the View Update Problem
- Updates to downloaded relations and resultsets
recorded in - Intention List and Intention View List
respectively. - Determine whether the changes can be reflected
20Synchronization
- View Update Considerations
- Only legal view updates
- Only legal database updates
- Only underlying tuples affected
- Insertion of extra tuples
- Modification of other view tuples
- Keller suggests translations for inserts,
deletes, and modifications - - Restricts the relations to BCNF
- - Joins should have an inclusion dependency
- - Our remote server relations need not
satisfy - Langerak suggests translations based on extension
joins - - No restriction on normal forms and joins
- - Can be the basis of our synchronization
engine - Most of the rules need data and metadata
about the base relations - - Remote database connection needed to
validate updates
21Transaction Management
- ACID properties need to be maintained
- Global atomicity
- - data updates in both Simputer and
smartcard. - - Simputer participates in
distributed transactions - Simputer a single user system, why Concurrency
Control - - Transactions on behalf of data
synchronization - - Long transaction doing some
aggregation - Concurrency Control
- Most of the transactions expected to complete
fast - Lock contention reduces, hence large locking
granules - Korth et al proposes a serial protocol . Main aim
is to improve - response time rather than throughput
- Instead of hash tables, bits can be used for lock
information
22Transaction Management
- Commit processing
- Shadow updates
- Data locality not an issue since data in memory
- Not suitable for pointer-based storage models
- If one domain value changes, its location also
changes - and all tuple pointers need to be updated
- Size of the shadow is also a concern
- Log-based updates
- Better suited for pointer-based models
- Cost of maintaining the logs
- Pointer-based logging instead of value-based
logging - Limited memory, hence Steal buffer replacement
policy. - Dirty blocks output to smartcard stable storage
when connected - If all dirty blocks of a transaction output to
smartcard , no Undo - More detailed survey left as future work
23Summary and Proposed Future Work
- Discussion and solutions proposed can be
generalized for any - handheld device.
- However, device specific optimizations possible
- Future Work
- More detailed survey of concurrency control and
recovery - Protocols for validation of updates on downloaded
data - Query engine needs to be designed
- Access Rights Management issues need to be
addressed - Database module toolkit
- Implement the system on the Simputer
- Evaluate the system
24Thank You
25References
- A. Ammann, M. Hanrahan, and R. Krishnamurthy.
Design of a Memory Resident DBMS. In IEEE
COMPCON, 1985. - 2. C. Bobineau, L. Bouganim, P. Pucheral, and P.
Valduriez. PicoDBMS Scaling down Database
Techniques for the Smartcard. In VLDB, 2000. - 3. Stephen Blott and Henry F. Korth. An Almost
Serial Protocol for Transaction Execution in Main
Memory Database Systems. In VLDB, 2002. - 4. DB2 Everyplace. http//www.ibm.com/software/
data/db2/everyplace. - 5. Anindya Datta, Debra VanderMeer, Krithi
Ramamritham, and Bongki Moon. Applying Parallel
Processing Techniques in Data Warehousing and
OLAP. In VLDB, 1999. - 6. A. Hulgeri, S. Sudarshan, and S. Seshadri.
Memory Cognizant Query Optimization. In Advances
In Data Management, 2000.
26References
- 7. Arthur M. Keller. Algorithms for Translating
View Updates to - Database Updates for Views Involving
Selections, Projections and - Joins. In ACM PODS, 1985.
- 8. Rom Langerak. View Updates in Relational
Databases with an - Independent Scheme. In ACM PODS, 1990.
- T. Lehmann and M. Carey. A Study of Index
Structures for Main - Memory DBMS. In VLDB, 1986.
- 10. M. Missikov and M. Scholl. Relational Queries
in a Domain Based - DBMS. In ACM SIGMOD, 1983.
- Mysql. http//www.mysql.com.
- 12. P. Pucheral, P. Valduriez, and J.M.Thevenin.
EÆcient Main - Memory Data Management using the DBGraph
Storage Model. In - VLDB, 1990.
- 13. The Simputer. http//www.simputer.org.
27Storage Model and Indexing
- Aim at compactness in representation of data as
well as index - Existing models
- Flat Storage
- Tuples are stored sequentially. Ensures access
locality but consumes - space. Access locality not an issue since data in
flash memory - Pointer-based Domain Storage
- Eliminates duplicates
- Values partitioned into domains which are sets of
unique value - Tuples reference the attribute value by means of
pointers - One domain shared among multiple attributes
- No index
Figure Domain Storage
28Storage Model and Indexing
- Pointer-based Ring Storage (Bobineau et al)
- Modification to Domain Storage
- Uses domain structure as index
- Tuple-to-tuple pointers connect two tuples that
have same attribute value - Index structure in the form of a ring
-
-
Figure Ring Storage