Title: V Storage Manager
1V Storage Manager
- Shahram Ghandeharizadeh
- Computer Science Department
- University of Southern California
2Simple Tests
- Arbitrary
- Designed to stress your software and ensure its
robustness - Repeated creation of the same datazone name.
- Deletion of records in iteration j when they were
deleted in iteration (j-1) - Etc.
- Designed to be scalable.
- Your test can be implemented with a variable A
that captures the number of records in alpha
(10,000). - Value of all other important parameters can be a
function of A, e.g., number of records for beta,
number of iterations in Test 3. - Start with a small value for A, say 100. Once
your tests are working, increase the value of A
to 1,000. Once this works, increase A to 10,000.
- Do not be surprised to find your code break when
you increase the value of A from 100 to 1,000.
This is the reality of developing robust
software Size matters!
3Suggestion
- Focus on a single-threaded version of your
implementation. - Once all tests are running, extend to analyze the
impact of multi-threading. - This may require a re-visit of your designs.
4Heap versus Stack
- Execution of your program consists of two kinds
of dynamic memory Heap and Stack. - Use of malloc and new allocates memory from Heap.
- The programmer is responsible to free this memory
and return it to heap. - Invocation of a method uses a stack. All
variables declared in a method are placed on the
stack. When the method returns, its stack is
freed.
Code
static data
stack
heap
5Heap versus Stack (Example)
- In method Test 1, the character array named
payload is declared on the stack when Test 1 is
invoked. - Its memory is freed when Test 1 completes
execution. - The programmer is NOT responsible fore managing
the memory assigned to payload because it is a
local variable managed using the stack.
- Test 1 ()
- char10000 payload
- vdt vptr
- vptr.set_data(payload)
- .
Code
static data
stack
heap
6Heap versus Stack (Example)
- In method Test 1, the character array named
payload is assigned memory from the heap (using
new). - The variable payload is on the stack!
- The memory pointed to by payload is allocated
from the heap. - The programmer is responsible for freeing this
memory using delete.
- Test 1 ()
- char payload
- vdt vptr
- payload new char1000
- vptr.set_data(payload)
- .
Code
static data
stack
heap
7Urban Legends about Heap
- The following is FALSE Memory allocated in
method X can be freed only in method X. See the
example as proof. - Cause Debugging C/C programs is difficult.
It is easy to corrupt memory if you are not
careful! These errors are difficult to find.
They are also stressful, resulting in beliefs
that are not true ? Urban legend is born. - How to avoid these kinds of conceptual traps?
Write small programs to verify a belief that
sounds too good to be true. It is simple and
avoids digressions that waste your time and cause
a lot of heart ache.
- GenMemGenMem(Vdt v)
-
- char cptr
- cptr new char10
- v-gtset_data(cptr)
- memcpy(cptr, "Shahram", 7)
-
- int _tmain(int argc, _TCHAR argv)
-
- Vdt vptr
- char cptr
- GenMem GM new GenMem(vptr)
- cptr (char ) vptr.get_data()
- delete cptr
8Variant Indexes by P. ONeil and D. Quass
- Shahram Ghandeharizadeh
- Computer Science Department
- University of Southern California
9Key Assumptions
- A read-mostly database that is updated
infrequently. - Complex indexes to speedup queries.
- Focuses on physical designs to enhance
performance.
10Example Data Warehouse
- McDonalds keeping track of different sandwich
purchases.
SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
11Example Data Warehouse
- Key Observations
- A handful of products, a PROD table with tens of
rows. - Many millions of rows for SALES tables.
SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
12A B-Tree On the Pid of Sales
- Assuming McDonalds sales 12 different products
B-tree Leaf page
(Big Mac, (1,1), (1, 3), (1, 4), (2,4), .
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
13A B-Tree On the Pid of Sales
- Assuming McDonalds sales 12 different products
B-tree Leaf page
(Big Mac, (1,1), (1, 3), (1, 4), (2,4), .
What happens with a SALES table consisting of a
million rows?
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
14A B-Tree on Major Holidays
- A B-tree index on different holidays of the
SALES table.
B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
15A B-Tree on Major Holidays
- A B-tree index on different holidays of the
SALES table.
B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
16A B-Tree on Major Holidays
- A B-tree index on different holidays of the
SALES table.
B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
RID List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
17Conjunctive Queries
- Count number of Big Mac Sales on Presidents
Day assuming a B-tree on product (pid) and day
of SALES - With RID-Lists
- Get the Value-List for Big Mac using the
B-tree, obtain RID-List1. - Get the Value-List for Presidents Day using
the B-tree, obtain RID-List2. - Compute set-intersect of RID-List1 and RID-List2
- Count the number of RIDs in the intersection set.
- Is there a better way?
- Yes, use bit-maps and logical bit-wise operands.
18Bitmap Indexes
- Use a bitmap to represent the existence of a
record with a certain attribute value. - Example If a record has the indexed attribute
value Big Mac then its corresponding entry in
the bitmap is set to one. Otherwise, it is a
zero.
19A Bitmap
- A Bitmap B is defined on T as a sequence of M
bits. - For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero. - Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.
Pres Day, 0100001100111111110000011001..
Record 0
20A Bitmap
- A Bitmap B is defined on T as a sequence of M
bits. - For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero. - Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.
Pres Day, 0100001100111111110000011001..
Record 1
21A Bitmap
- A Bitmap B is defined on T as a sequence of M
bits. - For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero. - Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.
Pres Day, 0100001100111111110000011001..
Record 2
22A B-Tree on Major Holidays
- A B-tree index on different holidays of the
SALES table.
B-tree Leaf page
(Pres day, 01111111.
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
23Logical Bit-Wise Operations
- Three key operands AND, OR, NOT
- Assume a bit map consisting of 4 bits
- 0011 AND 0101 0001
- 0011 OR 0101 0111
- NOT 0011 1100
- This paper assumes bit maps consisting of
millions, if not billions, of bits. In Example
3.1, they assume a bitmap consisting of
100,000,000 bits, 12.5 Mega bytes. - A large bit map is stored in a sequence of disk
pages. Each disk page full of bits is termed a
fragment. - Some bit positions may correspond to non-existent
rows. An Existence Bitmap (EBM) has exactly
those 1 bits corresponding to existing rows.
24Summary
25Range Predicate
- SELECT target-list
- FROM T
- WHERE C-range
- C-range C gt c1, C gt c1, C c1, C lt c1, C lt
C1, C between c1 and c2 - How to process with a bit-map index?
26Range Predicate
- SELECT target-list
- FROM T
- WHERE C-range
- C-range C gt c1, C gt c1, C c1, C lt c1, C lt
C1, C between c1 and c2 - How to process with a bit-map index?
27Range Predicate
- SELECT target-list
- FROM T
- WHERE C-range
- C-range C gt c1, C gt c1, C c1, C lt c1, C lt
C1, C between c1 and c2 - How to process with a bit-map index?
28Conjunctive Queries
- Count number of Big Mac Sales on Presidents
Day assuming a B-tree on product (pid) and day
of SALES - With RID
- Get the Value-List for Big Mac using the
B-tree, obtain RID-List1. - Get the Value-List for Presidents Day using
the B-tree, obtain RID-List2. - Compute set-intersect of RID-List1 and RID-List2
- Count the number of RIDs in the intersection set.
- With bit maps
- Get the Value-List for Big Mac using the
B-tree, obtain bit-map1. - Get the Value-List for Presidents Day using
the B-tree, obtain bit-map2. - Recall Existence Bitmap (EBM) identify rows that
exist. - Let RES logical AND of bit-map1, bit-map2, and
EBM. - Count the number of bits set to one to identify
how many Big Macs were sold on Presidents Day.
29Example 2.1
30Projection Index
- Reminiscent of vertical partitioning.
- Once the qualifying records are found, the
projection index enables the system to find the
amt attribute value of the record with a few disk
I/Os.
amt
cid
pid
holliday
amt
450 699 598 799 520
450 699 598 799 520
Labor day Labor day Presidents day Labor
day Labor day ..
31Projection Index (Definition)
- Page 41, first paragraph of Section 2.2
32Projection Index (Example Usage)
- Page 41, middle of left hand column
33Bit-Sliced Indexes Motivation
- Assume the Amt values are in dollars and as
follows
1 3 5 7 3 3 1
34Bit-Sliced Indexes Motivation
- Assume the Amt values are in dollars and as
follows. Their binary representation is
1 3 5 7 3 3 1
001 011 101 111 011 011 011
35Bit-Sliced Indexes Motivation
- Now, number the order of records as before
1 3 5 7 3 3 1
001 011 101 111 011 011 011
0 1 2 3 4 5 6
36Bit-Sliced Indexes Motivation
- Construct a Bit-Sliced index
1 3 5 7 3 3 1
001 011 101 111 011 011 011
0 1 2 3 4 5 6
Bit 0, 1111111 Bit 1, 0101111 Bit 2, 0011000
37Bit-Sliced Indexes Motivation
- To compute the sum of all records using the
existence bit-map bnn (1111111)
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
?
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
38Bit-Sliced Indexes Motivation
- To compute the sum of all records using the
existence bit-map bnn (1111111)
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
1 (7 records with bit 0 set to 1) 2 (4
records with bit 1 set to 1) 4 (2 records
with bit 2 set to 1)
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
39Bit-Sliced Indexes Motivation
- To compute the sum of all records using the
existence bit-map bnn (1111111)
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
1 (7 records with bit 0 set to 1) 2 (4
records with bit 1 set to 1) 4 (2 records
with bit 2 set to 1) (1 7) (2 4) (4
2) 23
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
40Bit-Sliced Indexes Definition
- Interpret the value of the Amt column as an
integer number of pennies, represented as a
binary number with N1 bits. Define
41Bit-Sliced Indexes Definition
- Interpret the value of the Amt column as an
integer number of pennies, represented as a
binary number with N1 bits. Define
Why maintain Bn?
42Bit-Sliced Indexes Definition
- Interpret the value of the Amt column as an
integer number of pennies, represented as a
binary number with N1 bits. Define
The result of a scalar such as SUM involving a
null will itself be a null. Example,
see http//www.oracle.com/technology/oramag/oracl
e/05-jul/o45sql.html
43Bit-Sliced Index
- 20 Bitmaps for the Amt column represents
quantities up to 220 1 pennies, 10,485.75. - If we assume normal sales range up to 100.00,
and all values are as likely to occur, a
Value-List index would have nearly 10,000
different values. A Bitmap representation would
lose its effectiveness. However, Bit-sliced
indexes continue to perform well.
44Example with Value-List Index
- Assume SALES table has 100 million rows. Each
row is 200 bytes in length. Disk page is 4
Kbytes, holding 20 rows. - Query
- SELECT SUM(AMT)
- FROM SALES
- WHERE condition
- Bitmap Bf the Foundset
- Bitmap Bv for each value
- Bnn Existance bitmap
45Example with Bit-Sliced Indexes
- Query
- SELECT SUM(AMT)
- FROM SALES
- WHERE condition
- Bitmap Bf the Foundset
- Bitmap Bv for each value
- Bnn Existance bitmap
- 20 bits
- Bit 0, 01010101010
- Bit 1, 10101011111
-
- Bit 19, 0000000001
46Other Aggregate Functions
- Ignore MEDIAN Column-Product.
- SELECT AGG(C) FROM T WHERE condition
- AGG(C) is COUNT, SUM, AVG, MIN, MAX
47Range Queries
- SELECT target-list
- FROM T
- WHERE C-range
- C-range C gt c1, C gt c1, C c1, C lt c1, C lt
C1, C between c1 and c2
48Bit-Sliced Indexes
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BGT BLT 0000000 BEQ 1111111 If bit 2 is on
in constant c1 Else BGT 0000000 (1111111
0011000) 0011000 BEQ 1111111
(0011000) 1111111 1100111
1100111 Iteration 1 on Bit 2 BLT
0000000 BGT 0011000 BEQ 1100111
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
49Bit-Sliced Indexes
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BLT 0000000 BGT 0011000 BEQ 1100111 If bit
1 is on in constant c1 BLT BLT (BEQ
NOT(B1)) 0000000 (1100111 (0101110))
0000000 (1100111 1010001) 1000001 BEQ
BEQ B1 1100111 0101110
0100110 Iteration 2 on Bit 1 BLT 1000001 BGT
0011000 BEQ 0100110
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
50Bit-Sliced Indexes
1 3 5 7 3 3 1
001 011 101 111 011 011 001
0 1 2 3 4 5 6
BLT 1000001 BGT 0011000 BEQ 0100110 If bit
0 is on in constant c1 BLT BLT (BEQ
NOT(B1)) 1000001 (0100110 (1111111))
1000001 (0100110 0000000) 1000001 BEQ
BEQ B1 0100110 1111111
0100110 Iteration 3 on Bit 0 BLT 1000001 BGT
0011000 BEQ 0100110
Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
51Bit-Sliced Indexes Range Queries
- Note that lt and gt are computed using BEQ, BLT
and BGT
52Range Queries
53Variant Indexes
- You are not responsible for Section 5, OLAP style
queries.