Peter Boncz - PowerPoint PPT Presentation

Author: CWI. Last modified by: Peter en Cecilia. Created: 6/26/2006 3:27:58 PM. Presentation format: On-screen Show.

Transcript and Presenter's Notes

1
Everything you always wanted to know about Updates in MonetDB/XQuery but were afraid to ask
Peter Boncz (CWI)
Sjoerd Mullender: update actions
Jens Teubner: XQUF parsing
Niels Nes: logging
Stefan Manegold: the rest
2
Overview
  • XQuery Update Facility (XQUF): semantics, the update tape
  • Updatable XML storage in BATs: maintaining order in an array without O(N) cost
  • Snapshot Isolation: why we want it, how we got it
  • Concurrency Control: optimistic, with abort convoys
  • Durability: physical logging
  • Conclusion and Future Challenges

3
XQuery Update Facility (XQUF)
  • January 2006, first proposal
  • Internal primitives:
    upd:insertBefore, upd:insertAfter, upd:insertInto,
    upd:insertIntoAsLast, upd:insertAttributes, upd:delete,
    upd:replaceValue, upd:rename
  • Pending update list concept: upd:applyUpdates

4
Example
    insert
      <item id="id">
        <location>Brazil</location>
        <quantity>200</quantity>
        <name>XML in a nutshell</name>
        <payment>Credit Card, Personal check</payment>
        <shipping>Will ship internationally</shipping>
        <incategory category="category1"/>
      </item>
    as last into
      fn:doc("xmark.xml")/site/regions/samerica

6
Semantics
    let $root := doc("foo.xml")
    for $i in (1,2,3)
    return
      (do insert <x>{$i}</x> as first into $root,
       do insert <y>{$i}</y> as first into $root)
  • What is the resulting order?
  • We need to define an execution order, and enforce it

7
The Update Tape
  • update sequence: (int, node, node/str, node/str)
  • fn:delete() → (DELETE, node, nil, nil)
  • fn:insert_() → (INSERT, tgt-node, tgt-level, expr-node)
  • fn:set-attr() → (ATTR, node, qn, val)
  • fn:unset-attr() → (ATTR, node, qn, nil)
  • fn:set-text() → (TEXT, node, val, nil)
  • fn:set-pi() → (PI, node, ins-val, arg-val)
  • fn:set-comment() → (COMMENT, node, val, nil)

The constructs that combine updates (such as element construction) enforce the correct order of the update tape. The Pathfinder compiler automatically inserts a call to fn:update(item) on the result of all update queries.
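
As a rough illustration of why the order of the update tape matters, the following Python sketch models the tape as a list of 4-tuples and replays the loop from the Semantics slide. It is not MonetDB code: the `Update` type, the `"first"` marker in the third field, and the helper names are assumptions made for this example only.

```python
from typing import List, NamedTuple, Optional

class Update(NamedTuple):
    """One tape entry, mirroring the (int, node, node/str, node/str) shape."""
    kind: str                   # e.g. "INSERT", "DELETE" (hypothetical encoding)
    node: int                   # target node handle (hypothetical)
    arg1: Optional[str] = None
    arg2: Optional[str] = None

def build_tape(root: int, values) -> List[Update]:
    """Emit the updates in the order the loop body produces them."""
    tape = []
    for i in values:
        tape.append(Update("INSERT", root, "first", f"<x>{i}</x>"))
        tape.append(Update("INSERT", root, "first", f"<y>{i}</y>"))
    return tape

def apply_tape(tape: List[Update]) -> List[str]:
    """Apply 'insert as first' updates strictly in tape order."""
    children: List[str] = []
    for upd in tape:
        if upd.kind == "INSERT" and upd.arg1 == "first":
            children.insert(0, upd.arg2)
    return children

result = apply_tape(build_tape(root=0, values=(1, 2, 3)))
```

With this particular tape order, every later insert lands in front of the earlier ones; a different tape order would give a different document, which is why the order has to be defined and enforced.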
8
XPath Accelerator [SIGMOD 02]
<a>
  <b> <c> <d/> <e/> </c> </b>
  <f> <g/> <h> <i/> <j/> </h> </f>
</a>

node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6

Node-based relational encoding of XQuery's data model
9
XML Storage Revisited
node  pre  size  level  post
a     0    9     0      9
b     1    3     1      3
c     2    2     2      2
d     3    0     3      0
e     4    0     3      1
f     5    4     1      8
g     6    0     2      4
h     7    2     2      7
i     8    0     3      5
j     9    0     3      6

post = pre + size - level
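
The relation post = pre + size - level can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` to derive pre, size, level and post for the example document; the function name `encode` and the dictionary layout are choices made for this illustration, not part of MonetDB/XQuery.

```python
import xml.etree.ElementTree as ET

def encode(xml_text):
    """Return one row per element with pre, size, level and post ranks."""
    root = ET.fromstring(xml_text)
    rows = []
    post_counter = 0

    def visit(elem, level):
        nonlocal post_counter
        pre = len(rows)                      # preorder rank = arrival order
        row = {"node": elem.tag, "pre": pre, "size": 0,
               "level": level, "post": None}
        rows.append(row)
        for child in elem:
            visit(child, level + 1)
        row["size"] = len(rows) - pre - 1    # number of descendants
        row["post"] = post_counter           # postorder rank = leave order
        post_counter += 1

    visit(root, 0)
    return rows

DOC = "<a><b><c><d/><e/></c></b><f><g/><h><i/><j/></h></f></a>"
rows = encode(DOC)
holds = all(r["post"] == r["pre"] + r["size"] - r["level"] for r in rows)
```

Because post can be recomputed from the other three columns, only pre, size and level need to be stored.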
10
Updates: Mission Impossible?
INSERT SUBTREE I into the document and pre/post table of slide 8:
  • SIZE: the sizes of the ancestors of the insert point change
  • PRE: the pre numbers of all following nodes shift
  • size(following) is O(N) → killer (?)
11
XML Storage Revisited
  • Allow holes
  • Define logical pages

pre  size  level
0    9     0
1    3     1
2    2     2
3    0     3
4    0     3
5    4     1
6    0     2
7    2     2
8    0     3
9    0     3

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9

post = pre + size - level
12
XML Storage Revisited
  • Allow holes
  • Define logical pages

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

page map
0  0
1  2
2  1
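
A small Python sketch of how a page map can turn the physical rid order back into document (pre) order. It assumes a page size of 4 tuples (12 tuples over 3 pages) and reads the page map as logical page → physical page; both are assumptions inferred from the tables, and the function names are hypothetical.

```python
PAGE_SIZE = 4             # assumed: 12 tuples spread over 3 pages
PAGE_MAP = [0, 2, 1]      # assumed direction: logical page -> physical page

# nid column of the rid table, in physical (rid) order; None marks a hole
RID_NIDS = ["N0", "N1", None, None,
            "N6", "N7", "N8", "N9",
            "N2", "N3", "N4", "N5"]

def pre_to_rid(pre):
    """Translate a logical position (pre) into a physical position (rid)."""
    page, offset = divmod(pre, PAGE_SIZE)
    return PAGE_MAP[page] * PAGE_SIZE + offset

def rid_to_pre(rid):
    """Inverse translation: physical position back to logical position."""
    page, offset = divmod(rid, PAGE_SIZE)
    return PAGE_MAP.index(page) * PAGE_SIZE + offset

# Scanning pre = 0..11 through the page map yields document order again.
document_order = [RID_NIDS[pre_to_rid(pre)] for pre in range(len(RID_NIDS))]
```

Under these assumptions, moving a page's worth of tuples only requires changing one page-map entry instead of shifting every following tuple.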
13
XML Storage Revisited
  • Update-friendly
  • rid-table is append-only
  • rid-tuples may be unused
  • rid: autoincrement column
  • MonetDB
  • rid not stored but computed (virtual oid)
  • allows positional lookup/join
  • Not stored → no need to update it either

14
XML Storage Revisited
pre  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

  • Updatable document collection
  • pf:add-doc(URI, docname, perc>0)
  • pf:add-doc(URI, docname, collname, perc>0)
  • pre = nid.leftfetchjoin(nid_rid).swizzle(map_pid)
  • Read-only document collection
  • pf:add-doc(URI, docname, 0)
  • pf:add-doc(URI, docname, collname, 0)
  • NID = RID = PRE
  • pre = nid.leftfetchjoin(nid_rid).swizzle(map_pid) is FREE!!
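
The lookup chain nid → rid → pre can be mimicked in a few lines of Python. This is only a model of the idea: the positional fetch is a plain list lookup, `swizzle` is written from scratch here, and the page size of 4 and the direction of `MAP_PID` (physical page → logical page) are assumptions read off the two tables.

```python
PAGE_SIZE = 4                      # assumed tuples per page
MAP_PID = [0, 2, 1]                # assumed: physical page -> logical page
# nid_rid: position = nid number (N0..N9), value = rid of that node
NID_RID = [0, 1, 8, 9, 10, 11, 4, 5, 6, 7]

def swizzle(rid, map_pid=MAP_PID):
    """Replace the physical page number of a rid by its logical page number."""
    page, offset = divmod(rid, PAGE_SIZE)
    return map_pid[page] * PAGE_SIZE + offset

def nid_to_pre(nid):
    """Updatable collection: positional fetch into nid_rid, then swizzle."""
    return swizzle(NID_RID[nid])

def nid_to_pre_readonly(nid):
    """Read-only collection: NID = RID = PRE, so the mapping is the identity."""
    return nid

pres = [nid_to_pre(n) for n in range(10)]
```

In the read-only case no lookup happens at all, which is the sense in which the mapping comes for free.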

17
Snapshot Isolation
  • Versus 2-phase locking (2PL): full serializability
  • Why not 2PL for XML?
  • lock semantics much more complex than in the relational case (order matters!!)
  • node-level locking in staircase join?? (now 10 cycles/node)
  • Why Snapshot Isolation?
  • great for read-queries, great for ll_scj (runs unmodified)
  • quite strong: better than repeatable read; Oracle/Postgres do it
  • Problem with Snapshot Isolation
  • in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))

18
Snapshot Isolation
Read Query 1, Read Query 2 and Update Query are each shown with the same table:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9

  • Isolation by shadow paging (copy-on-write mmap)
  • rid/pre: delete/insert; attr: replace
  • Touch one byte per physical page: *addr = *addr
  • MMU traps, OS replaces page by a copy
  • we would like to replace the master copy once, not all client copies
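
Copy-on-write mapping can be tried out with Python's standard `mmap` module, whose `ACCESS_COPY` mode gives a private mapping whose writes never reach the file. The sketch below is only an analogy for the mechanism named on the slide; the temporary file stands in for a master table, and `demo` is a hypothetical helper.

```python
import mmap
import os
import tempfile

def demo():
    """Write through a private (copy-on-write) mapping; the file stays intact."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * (2 * mmap.PAGESIZE))            # the "master" data
        reader = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # another query's view
        private = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY) # updater's private view
        private[0] = private[0]   # touch one byte: the page gets a private copy
        private[0:1] = b"B"       # later changes stay in that private copy
        seen = (reader[0:1], private[0:1])
        private.close()
        reader.close()
        with open(path, "rb") as f:
            on_disk = f.read(1)
        return seen + (on_disk,)
    finally:
        os.close(fd)
        os.remove(path)

views = demo()
```

The reader and the file keep the old byte while the updater sees its own change, which is the isolation effect the bullets describe.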

19
Snapshot Isolation
Isolate-page (table and bullets repeated from slide 18)

20
Snapshot Isolation
Isolate-page (table and bullets repeated from slide 18)

21
Snapshot Isolation
Master-update (table and bullets repeated from slide 18)

22
Durability
  • Masters become dirty
  • no time to flush them during query
  • log all changes to a WAL
  • log all tuples that changed entire pages
  • Recovery
  • after a crash, we do not know whether dirty pages got saved
  • solution: overwrite tables with values from the WAL
  • Checkpointing Thread
  • every 5 minutes, if many changes occurred, checkpoint
  • memory mapped BATs are sync()-ed → only dirty pages get written
  • checkpoint locks collection, halts query processing
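
The recovery rule "overwrite tables with values from the WAL" works because each log record carries the new value itself, so replaying it is harmless whether or not the dirty page reached disk. A minimal Python sketch, assuming a JSON-lines log file and a table held as a list; the class and function names are invented for this example.

```python
import json
import os

class LoggedTable:
    """A table whose every change is appended to a write-ahead log first."""

    def __init__(self, rows, wal_path):
        self.rows = list(rows)
        self.wal_path = wal_path

    def update(self, rid, value):
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps({"rid": rid, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())     # the log record is durable ...
        self.rows[rid] = value         # ... before the master copy changes

def recover(possibly_stale_rows, wal_path):
    """Overwrite table cells with the logged values, in log order."""
    rows = list(possibly_stale_rows)
    with open(wal_path) as wal:
        for line in wal:
            record = json.loads(line)
            rows[record["rid"]] = record["value"]
    return rows
```

Calling `recover` on a table image where none, some, or all of the logged changes were already saved gives the same result, so the recovery code does not need to know which dirty pages made it to disk.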
24
The Update Sequence
  • Execute Query
  • build update tape
  • queries get isolated copies of a document (VM
    copy-on-write mmap)
  • Prepare Intensional Updates
  • execute update tape.
  • does not modify masters (except append-only
    tables)
  • Commit Phase (locked phase per doc-collection)
  • precommit
  • detect conflicts (not the size-ancestors)
  • write WAL (globally locked)
  • read master-size-ancestors, use delta, log
    result
  • update master tables
  • isolate first! Only then update masters.
  • update index structures
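
One possible reading of "read master-size-ancestors, use delta, log result" is that a transaction carries only size deltas for the ancestors it touched, and the absolute sizes are computed from the master copy inside the locked commit phase. The Python sketch below follows that reading; the data layout and the function name `commit_sizes` are assumptions for illustration.

```python
def commit_sizes(master_sizes, size_deltas, wal):
    """Apply per-ancestor size deltas to the master copy during commit.

    master_sizes: dict rid -> current size in the master table
    size_deltas:  dict rid -> change caused by this transaction
    wal:          list that receives the logged absolute results
    """
    for rid, delta in sorted(size_deltas.items()):
        new_size = master_sizes[rid] + delta     # read master, use delta
        wal.append(("size", rid, new_size))      # log the result
        master_sizes[rid] = new_size             # then update the master

master = {0: 9, 1: 3}
log = []
commit_sizes(master, {0: 2, 1: 2}, log)   # transaction 1 inserted 2 nodes under rid 1
commit_sizes(master, {0: 1}, log)         # transaction 2 inserted 1 node elsewhere
```

Under this reading, two transactions that both change the size of the root do not have to be treated as conflicting, which would fit the remark that conflict detection skips the size-ancestors.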

25
Many more Issues Solved
  • Indexing and Updates
  • Runtime QN → NID mapping, with hash table
  • read-only: not a hash, but keep sorted, persistent
  • keep INS and DEL deltas to commit without changing the hash table
  • Runtime NID → ATTR hash table
  • isolation loses you MonetDB dynamic hash table reuse
  • share an old copy, exploit append-mostly

Concurrency: Updates vs. Checkpoint, Shredding vs. Query, Shredding vs. Updates
  • Conflicting Updates
  • detect conflicting queries
  • look at RID page numbers and attr-IDs
  • reacting to conflicts
  • abort query, automatic restart
  • run CONVOY of 5 next update queries serially
  • ACID properties on the Meta Level
  • Shredding a new doc into a collection vs. Query
  • Shredding a new doc into a collection vs. Update
  • Using a collection vs. Deleting/adding documents
  • Meta Querying vs. Deleting/adding documents
  • Allocating New Pages and NIDs
  • Offload shredding interference with freelist
  • Unlocked access to private pages
26
Snapshot Isolation
2PL (): 375 transactions / 5 minutes, about 1.2 transactions/sec
27
Conclusions
  • It works! Reasonable/good performance!
  • transaction mgmt as a module extension outside a kernel works
  • identified VM primitives that databases really need
  • Future work
  • Test on XML update benchmark TPoX (DB2: 700 trans/second)
  • Packed Memory Arrays: alternative for page remapping?
  • page remapping is technically O(N)
  • Engineering
  • support for value-indexing (does PF support it already?)
  • asynchronous WAL writing to boost throughput
  • port MIL to C primitives; port C primitives to Monet5