Rajkumar Sen - PowerPoint PPT Presentation

About This Presentation
Title:

Rajkumar Sen

Description:

One of the most powerful handhelds. Low cost and shareable. Developed at IISc ... MySQL and BerkeleyDB primarily designed for disk-based systems. Data in Flash Memory ... – PowerPoint PPT presentation

Number of Views:75
Avg rating:3.0/5.0
Slides: 29
Provided by: Gue142
Category:

less

Transcript and Presenter's Notes

Title: Rajkumar Sen


1
An open source DBMS for handheld devices
  • by
  • Rajkumar Sen

2
Outline
  • Introduction and Motivation
  • System Architecture
  • Storage Model and Index
  • Query Processing
  • Data Synchronization
  • Transaction Management
  • Summary and Proposed Future Work

3
Introduction and Motivation
  • What is a handheld device
  • A small computer with limited resources
  • e.g. Simputer, Palm devices, iPAQ etc.
  • The Simputer
  • One of the most powerful handhelds
  • Low cost and shareable
  • Developed at IISc
  • Intel StrongArm processor
  • Flash Memory for stable storage (24MB)
  • Limited main memory (32MB)
  • Recognizes Smartcards
  • Linux operating system

4
Introduction and Motivation
  • Why DBMS for a Simputer
  • Increasing number of applications
  • e.g. Microbanking, E-governance, Agricultural
    market pricing etc.
  • They deal with a fair amount of data
  • Complex queries involving joins and aggregates
  • Atomicity and Durability for data consistency
  • Ease of application development
  • Need for Synchronization
  • Data from remote server downloaded on the
    Simputer
  • Updates at both places
  • Common data needs to be synchronized
  • Open source since Simputer is designed to be
    low-cost

5
Introduction and Motivation
  • No open source DBMS designed for handheld devices
  • Oracle Lite and DB2 Everyplace are not open
    source
  • MySQL and BerkeleyDB primarily designed for
    disk-based systems
  • Data in Flash Memory
  • Access time less than disk
  • Random access as good as sequential access
  • System Constraints
  • Limited storage capacity
  • Less main memory
  • Slow writes in Flash Memory

6
Introduction and Motivation
  • Goal is to design a system that consists of
  • A lightweight relational DBMS that resides on the
    Simputer
  • An application that downloads data from a remote
    server
  • A synchronization tool
  • Database Module Toolkit
  • Developing the building blocks as modules
  • Modules can be plugged depending on the type of
    application

7
System Architecture
  • Key components of the system
  • Data and Metadata Manager
  • Representation of relational data and data
    structures for metadata
  • Main Memory Manager
  • Manages data to be kept in memory
  • Transaction Manager
  • Ensures data consistency
  • Access Right Manager
  • Security rules needed as Simputer is shareable
  • Query Engine
  • SQL compiler and query processing
  • FetchRelation Module
  • Fetches data from a remote server
  • Synchronization Module
  • Synchronizes common data in Simputer and
    remote server

8
System Architecture
Figure System Architecture
9
Storage Model and Indexing
  • Aim at compactness in representation of data as
    well as index
  • Existing models
  • Flat Storage
  • Tuples are stored sequentially. Ensures access
    locality but consumes
  • space. Access locality not an issue since data in
    flash memory
  • Pointer-based Domain Storage
  • Eliminates duplicates
  • Values partitioned into domains which are sets of
    unique values
  • Tuples reference the attribute value by means of
    pointers
  • One domain shared among multiple attributes
  • No index
  • Pointer-based Ring Storage (Bobineau et al)
  • Modification to Domain Storage
  • Uses domain structure as index
  • Tuple-to-tuple pointers connect two tuples that
    have same attribute value
  • Index structure in the form of a ring

10
Storage Model and Indexing
  • Ring Storage Model contd.
  • Advantages
  • Addresses data and index compactness
  • Join becomes easy
  • Drawbacks
  • For projecting value of an attribute, need to
    traverse half of the ring
  • Some other Models
  • DBGraph (Pucheral et al)
  • Maintains value-to-tuple as well as
    tuple-to-value pointers.
  • Too many pointers
  • Domain Tree (Missikov et al)
  • Maintains Domain trees, for simple
    projection need to scan
  • all the domain trees.
  • Another model (Krithi et al ) suggested for data
    warehouses
  • uses projection index and join index

11
Storage Model and Indexing
  • Our Model
  • Based on Domain Storage Model
  • Each domain element is a (length, value) tuple
  • Domains used for duplicates and variable width
    attributes
  • When primary key-foreign key relation, instead of
    storing value in child table we store pointer to
    the tuple in the parent table which contains the
    value

Figure Our storage model
12
Storage Model and Indexing
  • Index structure
  • Sorted array of Tuple Identifier List (pointers
    to tuples)
  • Each List corresponds to a value, consists of
    pointers to those tuples that share it
  • Ordering preserved in the index structure, hence
    efficient execution of range queries

Figure Our index model
13
Storage Model and Indexing
  • Our model eliminates the problems by storing
    some additional pointers
  • Data storage separate from index structure, hence
    projection is easy
  • Index created on attribute rather than domains
  • For primary key-foreign key relation, join
    requires scanning of only child table
  • For domain based join attributes, join indices
    can be built

Figure Join index
14
Query Processing
  • Entire data is memory resident
  • Aim to reduce number of tuples read from memory
  • Join without indices
  • Assuming R and S are the outer and inner
    relations with n and m tuples
  • Nested loop join
  • Most reasonable choice when very less memory
    available.
  • No additional structure
  • Cost of join is mn
  • Hash Join
  • Requires the creation of hash partitions in main
    memory.
  • Memory permitted, hash join is faster
  • Instead of storing values in the
    partitions, we store pointers to tuples
  • Cost (m n ) //for building hash
    partitions
  • (m n) //for reading the hash
    partitions

15
Query Processing
  • Sort-Merge Join
  • Need memory for sorting
  • While sorting, do not store the
    tuples but only pointers to the tuples
  • Cost ( m log m n log n) //for
    sorting
  • ( m n)
    // for reading to merge
  • (m n) writes to memory and (m
    n) pointer chases
  • Join with Indices
  • Indexed Nested loop join
  • Relation on which index is available is
    treated as the inner relation
  • Cost n //for reading the
    outer relation R
  • n log m //for each tuple of
    R, index into S

16
Query Processing
  • Sort-Merge Join
  • If indices available on both join
    attributes, sorting not needed
  • Cost m n //for merging
  • Some cost involved in
    traversing the index and chasing pointers
  • Join Indices
  • A Single traversal of the index and
    chasing pointers gives the tuples
  • which are candidates for joining
  • Other operations
  • Selection, Projection and Sorting are
    straightforward.
  • Duplicate elimination, aggregation
    ,union, intersection, and difference
  • can be computed by sorting and building
    hash indices on relations

17
Query Processing
  • Query Plan Generation and Memory Allocation
  • An optimal query execution plan is needed
  • Reduce materialization, left-deep tree and bushy
    trees are ruled out
  • Choose from right-deep and extreme right-deep
    tree depending on
  • memory available for storing intermediate
    results
  • Query optimizer has to be memory cognizant
  • Memory must be optimally allocated among all
    operators since we
  • cannot assume entire memory is available for each
    operator
  • Arvind et. al. suggests memory allocation to
    operators based on cost
  • functions
  • Aggregation, sorting and duplicate removal
    generally performed on
  • materialized results.
  • By enforcing a particular tuple arrival order at
    the leaf of the plan tree,
  • pipelining of aggregation etc. is possible

18
Synchronization
  • Main data resides on a remote server
  • Information in the form of relations in a remote
    DBMS
  • Relevant information is downloaded on the
    Simputer
  • Copies of data present on both Simputer and
    remote server
  • Need for synchronization
  • Two scenarios in data synchronization
  • 1. Relation downloaded on the Simputer
  • Download the entire relation R only once
  • Subsequently, transfer only the differentials
  • Rsimpnew Rsimpold U ?Rmain
  • Rmainnew Rmainold U ?Rsimp
  • Problem arises when the same tuple is
    updated at both places.
  • Tuple Conflict
  • Detection harder than resolution

19
Synchronization
  • 2.Resultset downloaded on the Simputer
  • Execute the query at the remote server and
    transfer only once
  • Subsequently transfer only the differentials
  • q (R S)
  • qnew qold U (ir S)
  • qnew qold - (dr S)
  • For each query, find out what differential to
    compute.
  • Resembles the Incremental View Maintenance
    Problem
  • In-place updates on the resultset in the Simputer
    means changes
  • have to be propagated to the remote database
  • Resembles the View Update Problem
  • Updates to downloaded relations and resultsets
    recorded in
  • Intention List and Intention View List
    respectively.
  • Determine whether the changes can be reflected

20
Synchronization
  • View Update Considerations
  • Only legal view updates
  • Only legal database updates
  • Only underlying tuples affected
  • Insertion of extra tuples
  • Modification of other view tuples
  • Keller suggests translations for inserts,
    deletes, and modifications
  • - Restricts the relations to BCNF
  • - Joins should have an inclusion dependency
  • - Our remote server relations need not
    satisfy
  • Langerak suggests translations based on extension
    joins
  • - No restriction on normal forms and joins
  • - Can be the basis of our synchronization
    engine
  • Most of the rules need data and metadata
    about the base relations
  • - Remote database connection needed to
    validate updates

21
Transaction Management
  • ACID properties need to be maintained
  • Global atomicity
  • - data updates in both Simputer and
    smartcard.
  • - Simputer participates in
    distributed transactions
  • Simputer a single user system, why Concurrency
    Control
  • - Transactions on behalf of data
    synchronization
  • - Long transaction doing some
    aggregation
  • Concurrency Control
  • Most of the transactions expected to complete
    fast
  • Lock contention reduces, hence large locking
    granules
  • Korth et al proposes a serial protocol . Main aim
    is to improve
  • response time rather than throughput
  • Instead of hash tables, bits can be used for lock
    information

22
Transaction Management
  • Commit processing
  • Shadow updates
  • Data locality not an issue since data in memory
  • Not suitable for pointer-based storage models
  • If one domain value changes, its location also
    changes
  • and all tuple pointers need to be updated
  • Size of the shadow is also a concern
  • Log-based updates
  • Better suited for pointer-based models
  • Cost of maintaining the logs
  • Pointer-based logging instead of value-based
    logging
  • Limited memory, hence Steal buffer replacement
    policy.
  • Dirty blocks output to smartcard stable storage
    when connected
  • If all dirty blocks of a transaction output to
    smartcard , no Undo
  • More detailed survey left as future work

23
Summary and Proposed Future Work
  • Discussion and solutions proposed can be
    generalized for any
  • handheld device.
  • However, device specific optimizations possible
  • Future Work
  • More detailed survey of concurrency control and
    recovery
  • Protocols for validation of updates on downloaded
    data
  • Query engine needs to be designed
  • Access Rights Management issues need to be
    addressed
  • Database module toolkit
  • Implement the system on the Simputer
  • Evaluate the system

24
Thank You

25
References
  • A. Ammann, M. Hanrahan, and R. Krishnamurthy.
    Design of a Memory Resident DBMS. In IEEE
    COMPCON, 1985.
  • 2. C. Bobineau, L. Bouganim, P. Pucheral, and P.
    Valduriez. PicoDBMS Scaling down Database
    Techniques for the Smartcard. In VLDB, 2000.
  • 3. Stephen Blott and Henry F. Korth. An Almost
    Serial Protocol for Transaction Execution in Main
    Memory Database Systems. In VLDB, 2002.
  • 4. DB2 Everyplace. http//www.ibm.com/software/
    data/db2/everyplace.
  • 5. Anindya Datta, Debra VanderMeer, Krithi
    Ramamritham, and Bongki Moon. Applying Parallel
    Processing Techniques in Data Warehousing and
    OLAP. In VLDB, 1999.
  • 6. A. Hulgeri, S. Sudarshan, and S. Seshadri.
    Memory Cognizant Query Optimization. In Advances
    In Data Management, 2000.

26
References
  • 7. Arthur M. Keller. Algorithms for Translating
    View Updates to
  • Database Updates for Views Involving
    Selections, Projections and
  • Joins. In ACM PODS, 1985.
  • 8. Rom Langerak. View Updates in Relational
    Databases with an
  • Independent Scheme. In ACM PODS, 1990.
  • T. Lehmann and M. Carey. A Study of Index
    Structures for Main
  • Memory DBMS. In VLDB, 1986.
  • 10. M. Missikov and M. Scholl. Relational Queries
    in a Domain Based
  • DBMS. In ACM SIGMOD, 1983.
  • Mysql. http//www.mysql.com.
  • 12. P. Pucheral, P. Valduriez, and J.M.Thevenin.
    EÆcient Main
  • Memory Data Management using the DBGraph
    Storage Model. In
  • VLDB, 1990.
  • 13. The Simputer. http//www.simputer.org.

27
Storage Model and Indexing
  • Aim at compactness in representation of data as
    well as index
  • Existing models
  • Flat Storage
  • Tuples are stored sequentially. Ensures access
    locality but consumes
  • space. Access locality not an issue since data in
    flash memory
  • Pointer-based Domain Storage
  • Eliminates duplicates
  • Values partitioned into domains which are sets of
    unique value
  • Tuples reference the attribute value by means of
    pointers
  • One domain shared among multiple attributes
  • No index

Figure Domain Storage
28
Storage Model and Indexing
  • Pointer-based Ring Storage (Bobineau et al)
  • Modification to Domain Storage
  • Uses domain structure as index
  • Tuple-to-tuple pointers connect two tuples that
    have same attribute value
  • Index structure in the form of a ring

Figure Ring Storage
Write a Comment
User Comments (0)
About PowerShow.com