Research in Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Research in Databases

Description:

Directory Services. LDAP (lightweight directory access protocol) ... Cell phone directory a cache for global phone directory. Data Characteristics. Read-mostly ... – PowerPoint PPT presentation

Provided by: Vrb
Transcript and Presenter's Notes

1
Research in Databases
2
Research in Databases
  • Not just using Oracle
  • Not just relational databases anymore
  • There are new frontiers for databases
  • Wireless networks
  • Data Grids
  • The web (XML)

3
Research in Databases
  • So what is it?
  • Taking traditional techniques
  • ACID properties, indexing, concurrency control,
    security
  • and extending them for new environments
  • data Grids, mobile databases, web databases
  • Strategy oriented
  • Design new algorithms (Not like CS601)
  • Sometimes it is just a tweak to an existing
    strategy
  • Show yours is better by comparing to existing
    ones

4
Research in Databases
  • Designing new strategies (algorithms) so that
    data access can be more
  • Efficiently (access time)
  • Varied (semi-structured data)
  • Green (power aware)

5
Examples
  • Data-intensive computing: Data Grids
  • Data replication, query processing
  • Distributed databases: mobile databases
  • How to process a query, replicate/update data
  • Management of heterogeneous and semi-structured
    data (XML)
  • Storage, query processing, indexing

6
DB Research in General
  • Pick a topic
  • Study what has been done
  • Propose new strategy, improve existing one
  • Run simulation to compare to existing strategies
  • Or, implement on an existing system (our cluster)
  • May have some theoretical results as well

7
Beyond Relational Databases by M. Seltzer
  • Changes in computing devices
  • Small, highly mobile
  • PDAs, laptops, palmtops, mobile telephony
    handsets
  • Powerful platforms to deliver new applications
    and services
  • New computing and network elements needed to
    support the infrastructure

8
Current environment
  • Services now available
  • Text and multimedia messaging
  • Location based searching
  • Multiplayer games
  • Need data storage and retrieval functions
  • Move messages reliably
  • Map physical to logical, location based
    information
  • Record/share current state, deliver in real-time
  • Storage and retrieval needed
  • Data management needed, but must it be relational?

9
Relational History
  • IBM and Berkeley in the 1970s
  • Reaction to complex DB systems
  • Existing DBs gave programs no independence from
    the data layout
  • Reorganizing data or adding new data was a problem
  • Relational hid physical, used logical
  • Relational used declarative language
  • Programmers no longer had to optimize
  • No longer had to rewrite code when layout changed

10
Relational History
  • In 1980s RDBMS vendors arose
  • Relational tremendously successful
  • Functionality of RDBs increased over years for
    niche markets
  • Applications use a decreasing fraction of feature
    set
  • Increased complexity
  • Price associated with complexity

11
New DB Frontier
  • Vendors try to convince everyone that the RDBMS is
    the answer to everything
  • Rethinking DBMS architecture
  • DBMSs need to be more modular: simple,
    component-based building blocks
  • One size no longer fits all
  • Different data management strategies needed

12
Data Warehousing
  • Every customer transaction recorded
  • Data Characteristics
  • Read-mostly
  • Updated by appending
  • Operations
  • Huge tables
  • Query only a few columns
  • Scan tables sorted in different ways
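One way to picture why "query only a few columns" on huge, append-only tables matters is a column-oriented layout. The sketch below is only an illustration under that assumption; the slides do not name a storage layout, and `ColumnStore`, `append`, and `total` are hypothetical names.

```python
class ColumnStore:
    """Append-only table kept as one list per column (illustrative sketch)."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}

    def append(self, row):
        # Read-mostly workload: the only update is appending a new row.
        for name in self.columns:
            self.columns[name].append(row[name])

    def total(self, column):
        # A query over one column reads that column's list and nothing else.
        return sum(self.columns[column])


store = ColumnStore(["customer", "item", "amount"])
store.append({"customer": "a", "item": "pen", "amount": 3})
store.append({"customer": "b", "item": "ink", "amount": 7})
print(store.total("amount"))
```

A row-oriented layout would have to read every field of every row to answer the same query.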

13
Directory Services
  • LDAP (lightweight directory access protocol)
  • Fast lookup of hierarchically arranged data
  • Data Characteristics
  • Read-mostly
  • Queries
  • single-row
  • lookups based on attribute values
  • Relational inefficient because of multivalued
    attributes
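The multivalued-attribute point can be sketched with a small in-memory directory. This is a toy model, not an LDAP implementation; `Directory`, `add`, and `search` are hypothetical names, and the entry shown is made-up example data.

```python
from collections import defaultdict


class Directory:
    """Entries map an attribute to a list of values (illustrative sketch)."""

    def __init__(self):
        self.entries = {}
        # (attribute, value) -> set of entry names holding that value
        self.index = defaultdict(set)

    def add(self, dn, attributes):
        self.entries[dn] = attributes
        for attr, values in attributes.items():
            for value in values:
                self.index[(attr, value)].add(dn)

    def search(self, attr, value):
        # Lookup by attribute value; usually returns a single entry.
        return sorted(self.index.get((attr, value), set()))


directory = Directory()
directory.add(
    "cn=Ann,ou=People,dc=example,dc=com",
    {"cn": ["Ann"], "mail": ["ann@example.com", "a@example.com"]},
)
print(directory.search("mail", "a@example.com"))
```

In a normalized relational schema the two `mail` values would typically live in a separate table and need a join, which is the inefficiency the slide refers to.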

14
Web Search
  • Data semistructured (HTML)
  • Data Characteristics
  • Read-mostly
  • bulk updates
  • Queries
  • Keyword lookups resulting in sorted list of
    possibles
  • Typical solutions
  • Inverted indices
  • Parallelized index and lookup implementations
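A minimal inverted index might look like the following. The ranking by raw term count is an assumption chosen for brevity, and `build_index` and `search` are hypothetical names; real search engines use far richer scoring.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Bulk-build a map from word to per-document occurrence counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index


def search(index, keyword):
    """Return matching document ids, most occurrences first."""
    postings = index.get(keyword.lower(), Counter())
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))


docs = {
    "a": "database research and database systems",
    "b": "web search over a database",
    "c": "mobile caching",
}
index = build_index(docs)
print(search(index, "database"))
```

Because each word's postings are independent, the index can be split across machines by word or by document, which is one route to the parallelized implementations mentioned above.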

15
Mobile Device Caching
  • Small mobile devices require caching of the
    needed portions of the data
  • A cell phone directory is a cache of the global
    phone directory
  • Data Characteristics
  • Read-mostly
  • transitory
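The phone-directory example can be sketched as a small cache in front of a larger remote directory. Least-recently-used eviction is an assumption made here, not something the slides specify; `DeviceCache`, `fetch_remote`, and `GLOBAL_DIRECTORY` are hypothetical names.

```python
from collections import OrderedDict


class DeviceCache:
    """Fixed-size cache over a remote lookup function (illustrative sketch)."""

    def __init__(self, fetch_remote, capacity):
        self.fetch_remote = fetch_remote
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def lookup(self, name):
        if name in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(name)
            return self.entries[name]
        # Miss: fetch from the global directory and keep a transitory copy.
        self.misses += 1
        number = self.fetch_remote(name)
        self.entries[name] = number
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return number


GLOBAL_DIRECTORY = {"ann": "555-0101", "bob": "555-0102", "cy": "555-0103"}
cache = DeviceCache(GLOBAL_DIRECTORY.__getitem__, capacity=2)
for name in ["ann", "bob", "ann", "cy"]:
    cache.lookup(name)
```

Cached entries are disposable: losing one costs only another remote fetch, which is why transactional guarantees matter less for this workload.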

16
XML Management
  • More online transactions exchanging XML-encoded
    documents
  • Current solution
  • Convert/store into RDBMS
  • Convert again when they are used
  • Inefficient?
  • Native XML stores with XQuery and XPath
  • Data Characteristics
  • Read-only

17
Stream Processing
  • Data filtering instead of data management
  • Filter stream for hotly traded stock
  • Filters look like SQL
  • Data Characteristics
  • Real-time stream
  • Not persistently stored
  • Still use SQL but different management system
  • RDBMSs run dynamic queries over static data;
    stream processing is the opposite
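The inversion described above (a fixed query, with data flowing past it) can be sketched with a generator. The tick format and the names `hot_trades`, `symbol`, and `min_volume` are assumptions for illustration.

```python
def hot_trades(ticks, symbol, min_volume):
    """Standing filter over a stream of ticks (illustrative sketch).

    Roughly: SELECT * FROM ticks WHERE symbol = ? AND volume >= ?
    """
    for tick in ticks:
        if tick["symbol"] == symbol and tick["volume"] >= min_volume:
            yield tick


# The stream is consumed once and never stored.
ticks = iter([
    {"symbol": "ACME", "volume": 1500},
    {"symbol": "INIT", "volume": 9000},
    {"symbol": "ACME", "volume": 200},
    {"symbol": "ACME", "volume": 2000},
])
matches = list(hot_trades(ticks, "ACME", 1000))
print(matches)
```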

18
Solutions
  • RDBMS
  • Ad hoc queries, write traffic, strong
    transactional and integrity guarantees
  • Do we need transactional guarantees?
  • Read-mostly
  • Single solution?
  • Every application builds its own storage service
  • Provide DM options under SQL
  • Configurable storage engine
  • Not a good solution!

19
Solutions
  • Another solution: provide many management
    options, each for a particular application class
  • Approach being used in the relational market
  • SQL used to hide different capabilities for DW,
    etc.
  • Better solution: produce a storage engine that is
    configurable

20
Configurability and Modularity
  • Developers must
  • Understand configuration options
  • Integrate the component into the product being designed
  • Handful of systems useful for many application
    classes
  • Also need modularity
  • Exclude major subsystems so no increase in
    complexity or cost
  • Must run on different platforms
  • Configure to specific hardware and OS

21
Modularity
  • Architectural mechanism
  • Build DM capability out of small, simple reusable
    components
  • Query capabilities available at different levels
    of sophistication
  • Indexing/updating/selection
  • Select-project-join
  • Aggregates

22
Modularity
  • Modularity also for
  • Concurrency control
  • Transactions
  • Logging
  • Concurrency control
  • Applications single-threaded, no locking
  • Table level locks
  • Fine-grained locking

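Modular concurrency control can be sketched as a locking policy that the application plugs into the store. This is a minimal sketch with hypothetical names (`NoLocking`, `TableLocking`, `Store`); a finer-grained policy would be a third class with the same `lock_for` method.

```python
import threading
from contextlib import nullcontext


class NoLocking:
    """For single-threaded applications: pay nothing for locks."""

    def lock_for(self, table):
        return nullcontext()


class TableLocking:
    """One lock per table."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, table):
        with self._guard:
            return self._locks.setdefault(table, threading.Lock())


class Store:
    """Storage component that takes its locking policy as a parameter."""

    def __init__(self, locking, table_names):
        self.locking = locking
        self.tables = {name: {} for name in table_names}

    def put(self, table, key, value):
        with self.locking.lock_for(table):
            self.tables[table][key] = value


single_threaded = Store(NoLocking(), ["users"])
multi_threaded = Store(TableLocking(), ["users"])
single_threaded.put("users", 1, "ann")
multi_threaded.put("users", 1, "ann")
```

The store code is identical in both configurations; only the component passed in changes, so an application that needs no locking carries none of its cost.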
23
Modularity
  • Transactions
  • Checkpoints (savepoints)
  • 2-phase commit
  • Nested transactions
  • Logging
  • No logging
  • Logging
  • Expanded logging so it can be used for auditing
  • Availability
  • High availability configuration
  • Use with heartbeat protocols, etc.

24
Modularity
  • Well-defined, clean exposed interfaces allow for
    extensibility
  • Given transactions manager, lock manager, log
    manager
  • Incorporate electrical control over chips into
    transactions, e.g. power up interface card part
    of transaction

25
Configurability
  • Runtime mechanism
  • How well a system can match its environment and
    application needs
  • Hardware, OS, software architecture, natural data
    format

26
Configurability
  • Variability in storage technologies is a challenge
    for the DB engine
  • Must work on magnetic storage, on flash drives with
    constraints, or with no persistent storage (in memory)
  • Same transactional component should allow
    programmer to control data persistence
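Letting the programmer control persistence through a single component could look like the sketch below. The append-only JSON log and the names `KeyValueStore` and `path` are assumptions for illustration; the slides do not describe a specific mechanism.

```python
import json
import os


class KeyValueStore:
    """Same interface with or without persistence (illustrative sketch)."""

    def __init__(self, path=None):
        # path=None means purely in-memory: nothing survives a restart.
        self.path = path
        self.data = {}
        if path is not None and os.path.exists(path):
            # Replay the append-only log to rebuild the state.
            with open(path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        self.data[key] = value
        if self.path is not None:
            with open(self.path, "a", encoding="utf-8") as log:
                log.write(json.dumps([key, value]) + "\n")
                log.flush()
                os.fsync(log.fileno())


volatile = KeyValueStore()
volatile.put("state", "on")
```

Passing a file path instead, for example `KeyValueStore("kv.log")`, makes the same `put` calls durable without changing application code.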

27
Configurability
  • DBS portable to special-purpose hardware devices
    and OS
  • DB must accommodate architectural choices
  • 1 thread
  • multiple threads in single process, etc.
  • Accommodate different network protocols
  • Indexing
  • Control over primary key selection
  • Ignore clustering issues if not persistent
  • Allow application specific indexing mechanism

28
Configurability
  • Permit flexible internal structure of data items
  • SQL, XPath, XQuery, LDAP, etc.
  • Programmer selects the most natural format for
    greatest benefit

29
Conclusions
  • Need new-style databases to solve new-style
    problems
  • Need to recognize there are options in data
    management
  • Use right tool for simplicity, robustness,
    efficiency