Title: MISTRAL Performance of TPC-D Benchmark and Datawarehouses
1MISTRALPerformance of TPC-D Benchmark and
Datawarehouses
- Prof. R. Bayer, Ph.D.
- Dr. Volker Markl
- Dept. of Computer Science, Technical University
Munich - and Bavarian Research Center for Knowledgebased
Systems - (FORWISS)
2Test Bed for Performance Measurements
- Hardware
- Compaq Proliant 5000
- 4 Pentium II 200 MHz
- 512 MB RAM
- hard disk 7 4 GB 28 GB
- Operating System
- Windows NT 4.0
- RDBMS
- Oracle 8
- 8kB pages
- Access Methods
- Tetris Algorithm for UB-Trees
- Oracle IOT (clustering B-Tree)
- Oracle FTS (full table scan)
3TPC-D Schema
4Shipping Priority Query (Q3)
SELECT L_ORDERKEY, SUM(L_EXTENDEDPRICE(1-L
_DISCOUNT)) AS REVENUE, O_ORDERDATE,
O_SHIPPRIORITYFROM CUSTOMER, ORDER,
LINEITEMWHERE C_MKTSEGMENT 'FOOD'
AND C_CUSTKEY O_CUSTKEY AND L_ORDERKEY
O_ORDERKEY AND O_ORDERDATE lt DATE 1.5.98 AND
L_SHIPDATE gt DATE 1.6.98GROUP BY L_ORDERKEY,
O_ORDERDATE, O_SHIPPRIORITYORDER BY REVENUE
DESC, O_ORDERDATE
5Tetris algorithm Q3
6Response times 50 LINEITEM (Q3)
7Temporary Storage 50 LINEITEM (Q3)
8Sorting 50 of LINEITEM
9Forecasting Revenue Change Query (Q6)
SUM(L_EXTENDEDPRICEL_DISCOUNT) AS REVENUEFROM
LINEITEMWHERE L_SHIPDATE gt date
AND L_SHIPDATE lt date INVERVAL 1 YEAR
AND L_DISCOUNT BETWEEN discount -0.01 AND
discount 0.01 AND L_QUANTITY lt quantity
10Forecasting Revenue Change Query (Q6)
11Performance of Q6
12Retrieving 3,3 of LINEITEM
13GFKSnowflake Schema
14TETRIS MHC
15Performance Measurements GFK
- DBMS
- TransBase (covering, clustering compound
B-Trees) - UB/API on top of TransBase (UB-Tree, two ESQL
Statements are optimized and processed per
UB-Tree page access) - TransBase Hypercube (UB-Tree inside the DBMS
Kernel) - Database
- real world data warehouse from GFK
- 3D Snowflake Schema
- Time (3 years 18 MP)
- Segment (10500 outlets)
- Product ( 500000 items in 604 product groups)
- 42 Mio fact tuples ( 4 GB fact table size)
- Computer
- Sun ULTRA 1 Workstation (64 MB Main Memory)
16Indexes
- MHC to encode hierarchies
- TIME_CS (5 bits)
- SEGMENT_CS (24 bits)
- PRODUCT_CS (29 bits)
- Compound on (PRODUCT_CS, TIME_CS, SEGMENT_CS)
or(TIME_CS, SEGMENT_CS, PRODUCT_CS) - UB-Tree (UB/API) onTIME_CS, PRODUCT_CS,
SEGMENT_CS
17GFK Datawarehouse Reports
? selectivity ltlt 1
18Compound fixed 2MP, varying PG
19UB-Tree fixed 2MP, varying PG
20Response Time Result Set Size
21Clustering of UB-Trees
Ø 0.85
s / dc
clustering factor
22Clustering depending on Result Set Size
23Summary UB-Tree
- Excellent performance on large real DBs, gt factor
10 - Very low storage requirement
- 1st answer extremely fast, interactive use!!
- Response time proportional to size of answer
- Wide applicability all DBs are
multidimensional!! - Easy integration into DBMS, simple DDL extension
- Very useful as middleware
- Patent applications