V Storage Manager

1 / 53
About This Presentation
Title:

V Storage Manager

Description:

Template design: Polly M., Silver Fox Productions, Inc. Formatter: Event Date: Event Location: Speech Length: Audience: Key Topics: – PowerPoint PPT presentation

Number of Views:6
Avg rating:3.0/5.0
Slides: 54
Provided by: VishalT8

less

Transcript and Presenter's Notes

Title: V Storage Manager


1
V Storage Manager
  • Shahram Ghandeharizadeh
  • Computer Science Department
  • University of Southern California

2
Simple Tests
  • Arbitrary
  • Designed to stress your software and ensure its
    robustness
  • Repeated creation of the same datazone name.
  • Deletion of records in iteration j when they were
    deleted in iteration (j-1)
  • Etc.
  • Designed to be scalable.
  • Your test can be implemented with a variable A
    that captures the number of records in alpha
    (10,000).
  • Value of all other important parameters can be a
    function of A, e.g., number of records for beta,
    number of iterations in Test 3.
  • Start with a small value for A, say 100. Once
    your tests are working, increase the value of A
    to 1,000. Once this works, increase A to 10,000.
  • Do not be surprised to find your code break when
    you increase the value of A from 100 to 1,000.
    This is the reality of developing robust
    software Size matters!

3
Suggestion
  • Focus on a single-threaded version of your
    implementation.
  • Once all tests are running, extend to analyze the
    impact of multi-threading.
  • This may require a re-visit of your designs.

4
Heap versus Stack
  • Execution of your program consists of two kinds
    of dynamic memory Heap and Stack.
  • Use of malloc and new allocates memory from Heap.
  • The programmer is responsible to free this memory
    and return it to heap.
  • Invocation of a method uses a stack. All
    variables declared in a method are placed on the
    stack. When the method returns, its stack is
    freed.

Code
static data
stack
heap
5
Heap versus Stack (Example)
  • In method Test 1, the character array named
    payload is declared on the stack when Test 1 is
    invoked.
  • Its memory is freed when Test 1 completes
    execution.
  • The programmer is NOT responsible fore managing
    the memory assigned to payload because it is a
    local variable managed using the stack.
  • Test 1 ()
  • char10000 payload
  • vdt vptr
  • vptr.set_data(payload)
  • .

Code
static data
stack
heap
6
Heap versus Stack (Example)
  • In method Test 1, the character array named
    payload is assigned memory from the heap (using
    new).
  • The variable payload is on the stack!
  • The memory pointed to by payload is allocated
    from the heap.
  • The programmer is responsible for freeing this
    memory using delete.
  • Test 1 ()
  • char payload
  • vdt vptr
  • payload new char1000
  • vptr.set_data(payload)
  • .

Code
static data
stack
heap
7
Urban Legends about Heap
  • The following is FALSE Memory allocated in
    method X can be freed only in method X. See the
    example as proof.
  • Cause Debugging C/C programs is difficult.
    It is easy to corrupt memory if you are not
    careful! These errors are difficult to find.
    They are also stressful, resulting in beliefs
    that are not true ? Urban legend is born.
  • How to avoid these kinds of conceptual traps?
    Write small programs to verify a belief that
    sounds too good to be true. It is simple and
    avoids digressions that waste your time and cause
    a lot of heart ache.
  • GenMemGenMem(Vdt v)
  • char cptr
  • cptr new char10
  • v-gtset_data(cptr)
  • memcpy(cptr, "Shahram", 7)
  • int _tmain(int argc, _TCHAR argv)
  • Vdt vptr
  • char cptr
  • GenMem GM new GenMem(vptr)
  • cptr (char ) vptr.get_data()
  • delete cptr

8
Variant Indexes by P. ONeil and D. Quass
  • Shahram Ghandeharizadeh
  • Computer Science Department
  • University of Southern California

9
Key Assumptions
  • A read-mostly database that is updated
    infrequently.
  • Complex indexes to speedup queries.
  • Focuses on physical designs to enhance
    performance.

10
Example Data Warehouse
  • McDonalds keeping track of different sandwich
    purchases.

SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
11
Example Data Warehouse
  • Key Observations
  • A handful of products, a PROD table with tens of
    rows.
  • Many millions of rows for SALES tables.

SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
12
A B-Tree On the Pid of Sales
  • Assuming McDonalds sales 12 different products

B-tree Leaf page
(Big Mac, (1,1), (1, 3), (1, 4), (2,4), .
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
13
A B-Tree On the Pid of Sales
  • Assuming McDonalds sales 12 different products

B-tree Leaf page
(Big Mac, (1,1), (1, 3), (1, 4), (2,4), .
What happens with a SALES table consisting of a
million rows?
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
14
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
15
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
16
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
RID List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
17
Conjunctive Queries
  • Count number of Big Mac Sales on Presidents
    Day assuming a B-tree on product (pid) and day
    of SALES
  • With RID-Lists
  • Get the Value-List for Big Mac using the
    B-tree, obtain RID-List1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain RID-List2.
  • Compute set-intersect of RID-List1 and RID-List2
  • Count the number of RIDs in the intersection set.
  • Is there a better way?
  • Yes, use bit-maps and logical bit-wise operands.

18
Bitmap Indexes
  • Use a bitmap to represent the existence of a
    record with a certain attribute value.
  • Example If a record has the indexed attribute
    value Big Mac then its corresponding entry in
    the bitmap is set to one. Otherwise, it is a
    zero.

19
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 0
20
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 1
21
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 2
22
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, 01111111.
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
23
Logical Bit-Wise Operations
  • Three key operands AND, OR, NOT
  • Assume a bit map consisting of 4 bits
  • 0011 AND 0101 0001
  • 0011 OR 0101 0111
  • NOT 0011 1100
  • This paper assumes bit maps consisting of
    millions, if not billions, of bits. In Example
    3.1, they assume a bitmap consisting of
    100,000,000 bits, 12.5 Mega bytes.
  • A large bit map is stored in a sequence of disk
    pages. Each disk page full of bits is termed a
    fragment.
  • Some bit positions may correspond to non-existent
    rows. An Existence Bitmap (EBM) has exactly
    those 1 bits corresponding to existing rows.

24
Summary
  • ANY QUESTIONS?

25
Range Predicate
  • SELECT target-list
  • FROM T
  • WHERE C-range
  • C-range C gt c1, C gt c1, C c1, C lt c1, C lt
    C1, C between c1 and c2
  • How to process with a bit-map index?

26
Range Predicate
  • SELECT target-list
  • FROM T
  • WHERE C-range
  • C-range C gt c1, C gt c1, C c1, C lt c1, C lt
    C1, C between c1 and c2
  • How to process with a bit-map index?

27
Range Predicate
  • SELECT target-list
  • FROM T
  • WHERE C-range
  • C-range C gt c1, C gt c1, C c1, C lt c1, C lt
    C1, C between c1 and c2
  • How to process with a bit-map index?

28
Conjunctive Queries
  • Count number of Big Mac Sales on Presidents
    Day assuming a B-tree on product (pid) and day
    of SALES
  • With RID
  • Get the Value-List for Big Mac using the
    B-tree, obtain RID-List1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain RID-List2.
  • Compute set-intersect of RID-List1 and RID-List2
  • Count the number of RIDs in the intersection set.
  • With bit maps
  • Get the Value-List for Big Mac using the
    B-tree, obtain bit-map1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain bit-map2.
  • Recall Existence Bitmap (EBM) identify rows that
    exist.
  • Let RES logical AND of bit-map1, bit-map2, and
    EBM.
  • Count the number of bits set to one to identify
    how many Big Macs were sold on Presidents Day.

29
Example 2.1
30
Projection Index
  • Reminiscent of vertical partitioning.
  • Once the qualifying records are found, the
    projection index enables the system to find the
    amt attribute value of the record with a few disk
    I/Os.

amt
cid
pid
holliday
amt
450 699 598 799 520
450 699 598 799 520
Labor day Labor day Presidents day Labor
day Labor day ..
31
Projection Index (Definition)
  • Page 41, first paragraph of Section 2.2

32
Projection Index (Example Usage)
  • Page 41, middle of left hand column

33
Bit-Sliced Indexes Motivation
  • Assume the Amt values are in dollars and as
    follows

1 3 5 7 3 3 1
34
Bit-Sliced Indexes Motivation
  • Assume the Amt values are in dollars and as
    follows. Their binary representation is

1 3 5 7 3 3 1
001 011 101 111 011 011 011
35
Bit-Sliced Indexes Motivation
  • Now, number the order of records as before

1 3 5 7 3 3 1
001 011 101 111 011 011 011
0 1 2 3 4 5 6
36
Bit-Sliced Indexes Motivation
  • Construct a Bit-Sliced index

1 3 5 7 3 3 1
001 011 101 111 011 011 011
0 1 2 3 4 5 6
Bit 0, 1111111 Bit 1, 0101111 Bit 2, 0011000
37
Bit-Sliced Indexes Motivation
  • To compute the sum of all records using the
    existence bit-map bnn (1111111)

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
?
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
38
Bit-Sliced Indexes Motivation
  • To compute the sum of all records using the
    existence bit-map bnn (1111111)

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
1 (7 records with bit 0 set to 1) 2 (4
records with bit 1 set to 1) 4 (2 records
with bit 2 set to 1)
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
39
Bit-Sliced Indexes Motivation
  • To compute the sum of all records using the
    existence bit-map bnn (1111111)

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
1 (7 records with bit 0 set to 1) 2 (4
records with bit 1 set to 1) 4 (2 records
with bit 2 set to 1) (1 7) (2 4) (4
2) 23
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
40
Bit-Sliced Indexes Definition
  • Interpret the value of the Amt column as an
    integer number of pennies, represented as a
    binary number with N1 bits. Define

41
Bit-Sliced Indexes Definition
  • Interpret the value of the Amt column as an
    integer number of pennies, represented as a
    binary number with N1 bits. Define

Why maintain Bn?
42
Bit-Sliced Indexes Definition
  • Interpret the value of the Amt column as an
    integer number of pennies, represented as a
    binary number with N1 bits. Define

The result of a scalar such as SUM involving a
null will itself be a null. Example,
see http//www.oracle.com/technology/oramag/oracl
e/05-jul/o45sql.html
43
Bit-Sliced Index
  • 20 Bitmaps for the Amt column represents
    quantities up to 220 1 pennies, 10,485.75.
  • If we assume normal sales range up to 100.00,
    and all values are as likely to occur, a
    Value-List index would have nearly 10,000
    different values. A Bitmap representation would
    lose its effectiveness. However, Bit-sliced
    indexes continue to perform well.

44
Example with Value-List Index
  • Assume SALES table has 100 million rows. Each
    row is 200 bytes in length. Disk page is 4
    Kbytes, holding 20 rows.
  • Query
  • SELECT SUM(AMT)
  • FROM SALES
  • WHERE condition
  • Bitmap Bf the Foundset
  • Bitmap Bv for each value
  • Bnn Existance bitmap

45
Example with Bit-Sliced Indexes
  • Query
  • SELECT SUM(AMT)
  • FROM SALES
  • WHERE condition
  • Bitmap Bf the Foundset
  • Bitmap Bv for each value
  • Bnn Existance bitmap
  • 20 bits
  • Bit 0, 01010101010
  • Bit 1, 10101011111
  • Bit 19, 0000000001

46
Other Aggregate Functions
  • Ignore MEDIAN Column-Product.
  • SELECT AGG(C) FROM T WHERE condition
  • AGG(C) is COUNT, SUM, AVG, MIN, MAX

47
Range Queries
  • SELECT target-list
  • FROM T
  • WHERE C-range
  • C-range C gt c1, C gt c1, C c1, C lt c1, C lt
    C1, C between c1 and c2

48
Bit-Sliced Indexes
  • Assume c1 3, 011

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BGT BLT 0000000 BEQ 1111111 If bit 2 is on
in constant c1 Else BGT 0000000 (1111111
0011000) 0011000 BEQ 1111111
(0011000) 1111111 1100111
1100111 Iteration 1 on Bit 2 BLT
0000000 BGT 0011000 BEQ 1100111
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
49
Bit-Sliced Indexes
  • Assume c1 3, 011

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BLT 0000000 BGT 0011000 BEQ 1100111 If bit
1 is on in constant c1 BLT BLT (BEQ
NOT(B1)) 0000000 (1100111 (0101110))
0000000 (1100111 1010001) 1000001 BEQ
BEQ B1 1100111 0101110
0100110 Iteration 2 on Bit 1 BLT 1000001 BGT
0011000 BEQ 0100110
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
50
Bit-Sliced Indexes
  • Assume c1 3, 011

1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BLT 1000001 BGT 0011000 BEQ 0100110 If bit
0 is on in constant c1 BLT BLT (BEQ
NOT(B1)) 1000001 (0100110 (1111111))
1000001 (0100110 0000000) 1000001 BEQ
BEQ B1 0100110 1111111
0100110 Iteration 3 on Bit 0 BLT 1000001 BGT
0011000 BEQ 0100110
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
51
Bit-Sliced Indexes Range Queries
  • Note that lt and gt are computed using BEQ, BLT
    and BGT

52
Range Queries
53
Variant Indexes
  • You are not responsible for Section 5, OLAP style
    queries.
Write a Comment
User Comments (0)