www'SageLogix'Com

About This Presentation

Title:

www'SageLogix'Com

Description:

Pseudo-randomizing such data with REVERSE key indexes relieves this performance problem ... All buckets assumed to have the same number of rows ... – PowerPoint PPT presentation

Number of Views:58

Avg rating:3.0/5.0

Slides: 43

Provided by: TimGo6

Category:

more less

Transcript and Presenter's Notes

Title: www'SageLogix'Com

1
International Oracle Users Group Live 2004
Technical Session 549Understanding
IndexesTim GormanPrincipal - SageLogix,
Inc.Email tim_at_sagelogix.com

www.SageLogix.Com

2
Agenda

BTree index architecture
Issues with BTree indexes in Oracle
Real issues
Sparsely-populated indexes
Contention on INSERT
Uneven data distribution
Low data cardinality (a.k.a. selectivity)
Imagined or mythical issues
Indexes needing rebalancing
???
Some interesting options for indexing
Function-based, descending, compression

3
BTree index architecture

Oracle implements balanced BTree indexes as its
primary indexing method
Tree structure consisting of root, branch, and
leaf nodes
each node is one database block in Oracle
root node is the top-level branch node
starting point for searches into the index
branch nodes are connectors
point to other branch nodes or to leaf nodes
leaf nodes
contain data values and pointers to rows

4
BTree index architecture

Branch entries contain
Max data value piece
DBA of next level branch or leaf
Leaf entries contain
Data value
ROWID of table row

Index PK_X
root

branch
branch
branch
branch

leaf
leaf
leaf
leaf
leaf
leaf
leaf

Table X
5
BTree index architecture

Index read I/O is sequential in nature
Oracle wait-event for single-block I/O requests
used during indexed access
db file sequential read
Used with RANGE, UNIQUE, and FULL scans
FAST FULL index-scans are an exception
Uses sequential multiblock I/O similar to FULL
table-scans
Conventional wisdom that tables and indexes must
be separated to different tablespaces to enable
some form of parallel I/O is a myth
Think of how a treasure hunt is performed.

6
BTree index architecture

When indexes are built on populated tables using
CREATE INDEX or ALTER INDEX xxx REBUILD
Table is scanned and column values are sorted
Leaf blocks of index are populated first
Then supporting branch levels are built to
support those leaves
All blocks are filled up to PCTFREE threshold

7
BTree index architecture

When indexes are built transactionally using
conventional-path INSERT or UPDATE statements
How the indexes grow is dependent upon two
possible design decisions
Optimize either for
random data values
Or
sequentially ascending data values

8
BTree index architecture

If optimizing for random data values
As blocks fill, split them in half, move upper
half to new block, leave lower half behind
Leaves behind half-full blocks
Allows back-filling as lower data values occur
If optimizing for sequential data values
As blocks fill, simply overflow into a new block
Leaves behind full blocks
No need to worry about back-filling as lower data
values could never occur

9
BTree index architecture
Grows like this
Overflow

Overflowing uses space more efficiently in a
heap (non-sorted) data structure
Also uses space more efficiently when optimizing
sequential data values

Split
Grows like this

Splitting anticipates back-filling in a sorted
data structure
Therefore uses space more efficiently when
optimizing random data values

10
BTree index architecture

If the last index entry inserted is the highest
data value in the block
Then data will overflow into the next block on
the same level
If the last index entry inserted is not the
highest data value in the block
Then the current block will split, with the lower
set of values remaining in place and the higher
set of values moving to the next block on the
same level

11
BTree index architecture

If the branch block above cannot accommodate
another child
Then this algorithm is applied recursively first
If this algorithm iterates recursively all the
way back to the root block, and the root becomes
full
Then the root splits/overflows to generate a new
branch level (BLEVEL)
Root block stays in place, generates two new
blocks at a new 1st BLEVEL below

12
BTree index architecture

Balanced BTree indexes grow balanced by design,
automatically
New levels are added to the tree structure above,
dynamically, as needed
Indexes do not shrink automatically when
deletions occur
Instead, empty index entries are left in place
For possible reuse by newly inserted data
ROWID (pointer to data) portion is set to NULL
Deletions may cause an index to become sparse and
thus less efficient over time

13
Issues with BTree indexes

BTree indexes optimize
Both the random and sequential data values!
Even distribution of distinct data values
High data cardinality or selectivity of values
As a result, issues with BTree indexes are
Sparseness
Resulting from data deletions
Block contention when inserting sequential data
Uneven distribution of data values
Popular and unpopular data values
Low data cardinality

14
Sparsely-populated indexes

Indexes never become unbalanced over time
If so, then why does performance sometimes
deteriorate over time?
Instead, indexes can become sparsely populated
due to
Deletions of row data
Unfortunate patterns of inserted random data
Sparseness simply means that more I/O is required
to perform the same work

15
Sparsely-populated indexes

How is this condition detected?
ANALYZE INDEX VALIDATE STRUCTURE command
Populate session-private view named INDEX_STATS
with only one row
Column PCT_USED is the average percentage of
space utilized in the blocks belonging to the
index
Derived from the ratio of the value of the
columns USED_SPACE and BTREE_SPACE
Expect PCT_USED to be 90 by default
Lesser values may indicate sparseness developing
Best to watch values over time

16
Sparsely-populated indexes

How is this condition detected?
If PCT_USED is much less than 90
Non-zero values in the column DEL_LF_ROWS
Cause of sparseness is probably row deletions, if
the value in the column DEL_LF_ROWS is a large
percentage of value in column LF_ROWS

17
Sparsely-populated indexes

Rebuild or coalesce are two possible solutions
for index sparseness
Dont guess!!!
Make certain using ANALYZE INDEX VALIDATE
STRUCTURE
Be aware that the ANALYZE command locks the
index, however
To rebuild an index
Use ALTER INDEX REBUILD
To coalesce space within an index
Use ALTER INDEX COALESCE

18
Sparsely-populated indexes

ALTER INDEX REBUILD is the most commonly-used
solution for sparsely-populated indexes
Unlike CREATE INDEX, uses the existing index as
the source, which is faster because
Index is usually smaller than the table (less
I/O)
Index is already sorted (some sorting still
needed)
Other features of ALTER INDEX REBUILD include
Parallel execution (throw more resources at task)
Direct-path operations (no rollback/undo
generated)
NOLOGGING (no redo generated)
COMPUTE STATISTICS (better CBO stats cheaply)
ONLINE (allows full usage of index during task)

19
Sparsely-populated indexes

ALTER INDEX COALESCE is another alternative to
rebuilding
Merges unused space within indexes to free blocks
for reuse
Slower than ALTER INDEX REBUILD because it is a
transactional operation, not a bulk operation
No direct-path, parallel, nologging, etc
But COALESCE is implicitly an ONLINE operation

20
Contention on INSERT

Multiple concurrently-executing INSERTs into an
index on column(s) with sequential data values
can also bottle-neck on buffer busy waits
(a.k.a. block-level contention)
Pseudo-randomizing such data with REVERSE key
indexes relieves this performance problem
Instead of many processes attempting to insert
index entries into the right-most leaf block
Insertions are evenly scattered over all of the
available leaf blocks

21
Contention on INSERT

CREATE ALTER INDEX REVERSE
Pseudo-randomizes non-random data
By flipping or reversing the physical order
of the data value during storage
Causes sequential data values to become more
random by simply reversing data
123456 becomes 654321
123457 becomes 754321, etc...
Index data values are transparently converted
(flipped) and unconverted (unflipped) upon insert
and retrieval, respectively

22
Contention on INSERT

Adverse impacts of using REVERSE indexes
only equivalence operations will use the index
, !, ltgt, IN, and NOT IN
range-scans will not use the index
gt, gt, lt, lt, LIKE, BETWEEN
But, is this really a problem?
Depends on how the sequential data values are
used
Sequence numbers or surrogate system-generated
keys are usually sought using equivalence
operations
No problem here
But timestamps are often range-scanned
There is likely a problem with these situations

23
Uneven distribution of data values

Basic rule of thumb for using indexes
Use indexes when selecting unpopular data
Otherwise, use FULL table-scan
Or a different index? Or partitioning? Or
By default, the Oracle cost-based optimizer (CBO)
assumes even data distribution
Only LOVAL, HIVAL, and DISTINCT_KEYS gathered
during ANALYZE or DBMS_STATS
Real-life often invalidates this assumption
Selectively gathering column-level statistics
provides the CBO with information on data values
which are popular and which are unpopular

24
Uneven distribution of data values

There are two commands for gathering CBO
statistics
ANALYZE TABLE FOR ALL INDEXED COLUMNS
DBMS_STATS.GATHER_COLUMN_STATS
Populates data dictionary views
DBA_TAB_HISTOGRAMS and DBA_PART_HISTOGRAMS with
rows representing buckets of data values
All buckets assumed to have the same number of
rows
Each bucket is defined by its highest value
By default, only one row populated

25
Uneven distribution of data values

Please dont gather column-level statistics
unless the problem is proven
Gathering column-level statistics is not a good
choice as a default operation
Perform SQL tuning with SQL Trace/TKPROF or
STATSPACK to provide proof of a problem
If the CBO fails to choose an index
it could be because it has detected low
cardinality on average
If the CBO chooses an index
it could perform poorly if the data value is
overly popular

26
Illustrating the importance of data

SQLgt create table t1
2 ( c1 varchar2(30),
3 c2 number,
4 c3 number
5 ) tablespace tools
Table created.
SQLgt begin
2 for i in 1..100000 loop
3 insert into t1
4 values(to_char(mod(i,187)), i,
mod(i,187))
5 end loop
6 end
7 /
PL/SQL procedure successfully completed.

27
Illustrating the importance of data

SQLgt create index i1 on t1(c1) tablespace tools
Index created.
SQLgt analyze table t1 compute statistics
Table analyzed.
SQLgt set autotrace on
SQLgt select c1 from t1 where c1 '10000'
no rows selected
Execution Plan
--------------------------------------------------
---
0 SELECT STATEMENT OptimizerCHOOSE
(Cost37 Card535 Bytes1605)
1 0 TABLE ACCESS (FULL) OF 'T1' (Cost37
Card535 Bytes1605)

28
Illustrating the importance of data

SQLgt select num_rows, blocks
2 from user_tables
3 where table_name 'T1'
NUM_ROWS BLOCKS
------------- ---------
100,000 232
SQLgt select num_rows, distinct_keys,
2 avg_leaf_blocks_per_key, avg_data_blocks_per_
key
3 from user_indexes where index_name 'I1'
Avg Leaf Avg Data
Distinct Blocks Blocks
Nbr Rows Key Per Key Per Key
-------- -------- ---------- ---------
100,000 187 1 231

29
Illustrating the importance of data

SQLgt truncate table t1
Table truncated.
SQLgt begin
2 for i in 1..100000 loop
3 insert into t1
4 values(to_char(round(i/187,0)), i,
5 round(i/187,0))
6 end loop
7 end
8 /
PL/SQL procedure successfully completed.

30
Illustrating the importance of data

SQLgt create index i1 on t1(c1) tablespace tools
Index created.
SQLgt analyze table t1 compute statistics
Table analyzed.
SQLgt set autotrace on
SQLgt select c1 from t1 where c1 '10000'
no rows selected
Execution Plan
--------------------------------------------------
---
0 SELECT STATEMENT OptimizerCHOOSE (Cost2
Card187 Bytes561)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T1'
(Cost2 Card187 Bytes561)
2 1 INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE)
(Cost1 Card187)

31
Illustrating the importance of data

SQLgt select num_rows, blocks
2 from user_tables
3 where table_name 'T1'
NUM_ROWS BLOCKS
------------- ---------
100000 242
SQLgt select num_rows, distinct_keys,
2 avg_leaf_blocks_per_key, avg_data_blocks_per_
key
3 from user_indexes where index_name 'I1'
Avg Leaf Avg Data
Distinct Blocks Blocks
Nbr Rows Key Per Key Per Key
-------- -------- ---------- ---------
100000 536 1 1

32
Low data cardinality

BTree indexes work best with high cardinality
Think of the example of a telephone book
Would you use the standard indexing method to
find all occurances of a name comprising 25 of
the book?
Or, would it be better to just scan through the
book?

33
Low data cardinality

Bitmap indexes are extremely compact
That is their primary advantage
No real magic involved
Less I/O is faster, plain and simple
Still, no matter how compact they are
a FULL table scan is still likely faster than a
single bitmap index on low cardinality data
However, merging two or more bitmap indexes
brings out the real power of the mechanism
Scanning of merged bitmaps using fast low-level
bitmasking operations
Sifts through huge numbers of rows quickly

34
Low data cardinality

Bitmap indexes are designed for quickly scanning
low cardinality data
each database block is comprised of two bitmap
segments
each bitmap segment is comprised of a bitmap, a
ROWID list, and a list of distinct data values

ROWID list is a forward-compressed list of
ROWIDs, positioned according to bits in bitmap

ROWID list
Distinct values list
Bitmap

Bitmap is organized into columns for distinct
data values and rows of bits
representing rows

ROWID list
Distinct values list
Bitmap

More distinct data values means less room for
rows within each data value

35
Low data cardinality

Bitmap indexes extremely slow during DML
INSERT, UPDATE, and DELETE
For each change, entries within the bitmap
segments must be
Expanded/decoded
Manipulated
Re-encoded
For multiple concurrently-executing DML changes
Slower individual changes cause queueing
Only two bitmap segments per block increases
contention
Because of this, bitmap indexes are most feasible
on partitioned tables in non-transactional
applications
Data changes should be performed in bulk using
some variation on the technique of EXCHANGE
PARTITION

36
Function-based indexes

Indexes can be based on functions and expressions
create index xxx_ix1 on xxx(upper(c1))
create index xxx_ix2 on xxx(upper(c1)yadda_yadda
(c2))

37
Function-based indexes

User-defined functions usable along with standard
built-in functions
Restrictions for using user-defined functions
But functions and expressions must be
deterministic
repeatable given the same inputs, the function
or expression must always provide the same
results
user-defined functions cannot access tables or
package variables
PRAGMA RESTRICT_REFERENCES(, RNDS, WNDS, RNPS,
WNPS)
CREATE FUNCTION DETERMINISTIC

38
Function-based indexes

Function-based indexes must have statistics
gathered before they can be used
System permissions
GLOBAL QUERY REWRITE needed to CREATE or
ALTER REBUILD
Parameter QUERY_REWRITE_ENABLED TRUE to use

39
Descending indexes

Implemented as a form of function-based indexes
CREATE INDEX DESC
data values in index leaf blocks are sorted in
descending order
must have statistics before it will be utilized
not usable with rule-based optimizer
system permission QUERY REWRITE or GLOBAL QUERY
REWRITE needed to CREATE or ALTER REBUILD
parameter QUERY_REWRITE_ENABLED not necessary to
use

40
Index compression

Since Oracle8i, compression has been available on
BTree indexes
Reduces repeated storage of prefix column data
values
Index compression
COMPRESS -prefix-cols NOCOMPRESS
UNIQUE default -prefix-cols is (-cols)-1
NONUNIQUE default -prefix-cols is -cols
Same syntax for IOT in USING clause as for index
compression
DML is supported on compressed indexes, but it
becomes much slower
Index compression is best used for read-mostly or
read-only situations
Mixing of compressed and uncompressed index
partitions is possible

41
Solutions for BTree indexes

Dont assume that there is anything wrong with a
BTree index
Prove it first!!!
To address sparse index structures
ALTER INDEX REBUILD COALESCE
To address block contention on INSERTs
ALTER INDEX REVERSE
To address uneven data distribution
Gather column-level statistics or histograms
To address low cardinality
Use bitmap indexes or no indexes at all
Additional groovy stuff
Function-based, descending, and compressed indexes

42
Technical Session 549 Slides, paper, and
scripts downloadable from http//www.SageLogix.com
and http//www.EvDBT.com/papers.htm Email
tim_at_sagelogix.com

Write a Comment

User Comments (0)