Title: Tree Indexing on Flash Disks
1 Tree Indexing on Flash Disks
- Yinan Li
- In cooperation with Bingsheng He, Qiong Luo, and Ke Yi
- Hong Kong University of Science and Technology
2 Introduction
"Tape is Dead, Disk is Tape, Flash is Disk" (Jim Gray)
- Flash-based devices are the mainstream storage in mobile devices and embedded systems.
- Recently, the flash disk, or flash Solid State Disk (SSD), has emerged as a viable alternative to the magnetic hard disk for non-volatile storage.
3 Flash SSD
- Intel X-25M 80GB SATA SSD
- Mtron 64GB SATA SSD
- Other manufacturers: Samsung, SanDisk, Seagate, Fusion-io, etc.
4 Internal Structure of Flash Disk
5 Flash Memory
- Three basic operations of flash memory
- Read Page (512B-2KB), 80us
- Write Page (512B-2KB), 200us
- Writes can only change bits from 1 to 0.
- Erase Block (128-512KB), 1.5ms
- Clears all bits to 1.
- Each block can be erased only a finite number of times before wearing out.
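To make the write/erase asymmetry concrete, here is a minimal Python sketch of the three operations (the block size is assumed, and one byte stands in for a whole page): write_page can only clear bits, so rewriting a page in place generally requires erasing its whole block first.

  PAGES_PER_BLOCK = 64  # assumed for illustration

  class FlashBlock:
      def __init__(self):
          self.pages = [0xFF] * PAGES_PER_BLOCK  # after erase, all bits are 1

      def read_page(self, i):                    # ~80us on the SSDs above
          return self.pages[i]

      def write_page(self, i, value):            # ~200us; bits may only go 1 -> 0
          assert value & self.pages[i] == value, "cannot set a 0 bit back to 1 without erase"
          self.pages[i] &= value

      def erase(self):                           # ~1.5ms; wears the block out over time
          self.pages = [0xFF] * PAGES_PER_BLOCK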
6 Flash Translation Layer (FTL)
- Flash SSDs employ a firmware layer, called the FTL, to implement an out-of-place update scheme.
- It maintains a mapping table between logical and physical pages.
- Address Translation
- Garbage Collection
- Wear Leveling
- Page-Level Mapping, Block-Level Mapping, Fragmentation
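A toy sketch of the out-of-place update idea (the names and structure are illustrative, not the firmware of any real device): each logical write is redirected to a clean physical page through the mapping table, and the superseded page becomes garbage for the garbage collector to erase later.

  class SimpleFTL:
      def __init__(self, num_physical_pages):
          self.mapping = {}                      # logical page id -> physical page id
          self.free = list(range(num_physical_pages))
          self.garbage = []                      # stale physical pages awaiting erase

      def write(self, logical_page, data):
          new_phys = self.free.pop()             # always write to a clean page
          if logical_page in self.mapping:
              self.garbage.append(self.mapping[logical_page])  # old copy is now garbage
          self.mapping[logical_page] = new_phys
          # the device-level page write of `data` to new_phys would happen here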
7 Superiority of Flash Disk
- Pure electrical device (no mechanical moving parts)
- Extremely fast random read speed
- Low power consumption
[Figure: magnetic hard disk vs. flash disk.]
8 Challenge of Flash Disk
- Due to the physical features of flash memory, the flash disk exhibits relatively poor random write performance.
9 Bandwidth of Basic Access Patterns
- Random writes are 5.6-55X slower than random reads on the Intel, Mtron, and Samsung SSDs.
- Random accesses are significantly slower than sequential ones with multi-page optimization.
[Figures: bandwidth at access unit sizes of 2KB and 512KB.]
10 Tree Indexing on Flash Disk
- Tree indexes are a primary access method in databases.
- Tree indexes on flash disk
- exploit the fast random read speed.
- suffer from the poor random write performance.
- We study how to adapt tree indexes to the flash disk, exploiting the hardware features for efficiency.
11 B-Tree
- Search I/O cost: O(log_B N) random reads
- Update I/O cost: O(log_B N) random reads + O(1) random writes
[Figure: a B-tree with O(log_B N) levels; example of searching key 48 and inserting key 40.]
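- As a rough illustration with the experimental settings used later in this talk (8-byte entries, so f = 256 entries per 2KB page, and an 8GB index holding N = 10^9 entries): a B-tree has about ceil(log_256 10^9) = 4 levels, so a search costs about 4 random reads, while every insertion dirties a leaf page and costs O(1) random writes, which is exactly the access pattern flash disks handle worst.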
12 LSM-Tree (Log-Structured Merge Tree)
- Search I/O cost: O(log_k N * log_B N) random reads
- Update I/O cost: O(log_k N) sequential writes (amortized)
[Figure: an LSM-tree as O(log_k N) B-trees of sizes growing by ratio k, each with O(log_B N) levels; a search for key X probes every B-tree, while an insertion of key Y goes to the smallest tree and is merged downward.]
[1] P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 1996.
13 BFTL
- Search I/O cost: O(c * log_B N) random reads
- Update I/O cost: O(1/c) random writes
- c: max length of the linked lists
[Figure: BFTL's node translation table; each logical node id (Pid 0, 1, 2, ..., 100) maps to a linked list of up to c physical pages.]
[2] Chin-Hsien Wu, Tei-Wei Kuo, and Li-Ping Chang. An Efficient B-tree Layer Implementation for Flash-Memory Storage Systems. In RTCSA, 2003.
14 Designing an Index for Flash Disk
- Our goals
- reducing the update cost
- preserving search efficiency
- Two ways to reduce the random write cost
- Transform random writes into sequential ones.
- Limit them within a small area (< 512KB-8MB).
15 Outline
- Introduction
- Structure of FD-Tree
- Cost Analysis
- Experimental Results
- Conclusion
16 FD-Tree
- Transform random writes into sequential ones using the logarithmic method.
- Inserts are performed on a small tree first.
- The small tree is gradually merged into larger ones.
- Improve search efficiency by fractional cascading.
- In each level, a special entry is used to find the page in the next level that the search should visit next.
17 Data Structure of FD-Tree
- L levels
- one head tree (a B-tree) on the top
- L-1 sorted runs at the bottom
- Logarithmically increasing sizes (capacities) of the levels
18 Data Structure of FD-Tree
- Entry: a pair of a key and a pointer
- Fence: a special entry used to improve search efficiency, as sketched below
- Its key equals the FIRST key in the page it points to.
- Its pointer is the ID of the page in the immediately next level that the search should visit next.
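A minimal Python sketch of this entry layout (field names are illustrative):

  from dataclasses import dataclass

  @dataclass
  class Entry:
      key: int
      pointer: int             # record id for a normal entry; next-level page id for a fence
      is_fence: bool = False   # fence: key equals the first key of the pointed-to page
      is_filter: bool = False  # filter entry marking a deletion (see slides 24-25)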
19 Data Structure of FD-Tree
- Each page is pointed to by one or more fences in the immediately higher level.
- The first entry of each page is a fence. (If not, we insert one.)
20 Insertion on FD-Tree
- Insert the new entry into the head tree.
- If the head tree is full, merge it into the next level and then empty it.
- The merge process may invoke recursive merges into lower levels, as sketched below.
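A sketch of this insertion path (levels[0] is the head tree; merge_levels is the routine sketched on the next slide; add, is_full, and the level objects are illustrative helpers):

  def insert(fdtree, entry):
      fdtree.levels[0].add(entry)        # cheap: the head tree is small
      i = 0
      while fdtree.levels[i].is_full():
          # merging Li into Li+1 yields the new run Li+1 plus a fence-only new Li
          new_li, new_li1 = merge_levels(fdtree.levels[i], fdtree.levels[i + 1])
          fdtree.levels[i] = new_li
          fdtree.levels[i + 1] = new_li1
          i += 1                         # cascade: the merge may overflow Li+1 too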
21 Merge on FD-Tree
- Scan the two sorted runs and generate the new sorted runs.
[Figure: merging Li (entries 2, 3, 19, 29 plus fences 1, 11) with Li+1 (1, 5, 6, 7, 9, 10, 11, 12, 15, 22, 24, 26); the merged new Li+1 is written out sequentially, and a new fence-only Li (1, 9, 11, 22) is generated with one fence per new page.]
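A Python sketch of this merge, using the Entry sketch from slide 18 (the page size and list-based runs are illustrative; real runs are streamed page by page so all writes stay sequential): old fences are discarded, entries are merged in key order, and a fence pointing at each new Li+1 page is emitted into the new Li.

  import heapq

  PAGE_ENTRIES = 4   # tiny page for illustration

  def merge_levels(li, li1):
      new_li, new_li1 = [], []
      stream = heapq.merge(li, li1, key=lambda e: e.key)
      for e in stream:
          if e.is_fence:
              continue                              # old fences are dropped and rebuilt
          if len(new_li1) % PAGE_ENTRIES == 0:      # first entry of a fresh page
              page_id = len(new_li1) // PAGE_ENTRIES
              new_li.append(Entry(e.key, page_id, is_fence=True))
          new_li1.append(e)
      return new_li, new_li1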
22 Insertion Merge on FD-Tree
- When the top L levels are full, merge the top L levels and replace them with new ones.
[Figure: an insert triggering a multi-level merge.]
23 Search on FD-Tree
[Figure: searching key 81; the head tree L0 directs the search down one page per level, following fences through L1 and L2 until entry 81 is found.]
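A sketch of one step of this cascaded search (assuming the entry layout from slide 18; the head-tree lookup that yields the first page is elided): each lower level costs exactly one page read, because the fence followed from the level above names the only page that can contain the key.

  import bisect

  def search_level(page, key):
      # `page` is the sorted list of entries read from the fenced page;
      # its first entry is guaranteed to be a fence (slide 19)
      keys = [e.key for e in page]
      pos = bisect.bisect_right(keys, key) - 1     # rightmost entry with key <= search key
      match = next((e for e in page if e.key == key and not e.is_fence), None)
      fence = next(e for e in reversed(page[:pos + 1]) if e.is_fence)
      return match, fence.pointer                  # pointer: page to read in the next level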
24 Deletion on FD-Tree
- A deletion is handled in a way similar to an insertion.
- Insert a special entry, called a filter entry, to mark that the original entry, called the phantom entry, has been deleted.
- As merges occur, a filter entry eventually encounters its corresponding phantom entry at some level; we then discard both of them.
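A sketch of this cancellation rule applied inside a merge (illustrative; it assumes a filter entry and its phantom carry the same key and therefore become adjacent in the merged stream):

  def cancel_phantoms(merged_stream):
      out = []
      for e in merged_stream:
          if out and out[-1].key == e.key and out[-1].is_filter != e.is_filter:
              out.pop()          # the filter entry meets its phantom: discard both
          else:
              out.append(e)
      return out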
25 Deletion on FD-Tree
[Figure: deleting three entries (37, 45, 16) by inserting filter entries into L0; merging L0 with L1, and later L0, L1, and L2, cancels each filter entry against its phantom entry.]
26 Outline
- Introduction
- Structure of FD-Tree
- Cost Analysis
- Experimental Results
- Conclusion
27 Cost Analysis of FD-Tree
- I/O cost of FD-tree
- Search: O(log_k N) random reads (one page per level, guided by fences)
- Insertion: O((k/f) * log_k N) amortized sequential writes
- Deletion: Search + Insertion
- Update: Deletion + Insertion
- k: size ratio between adjacent levels; f: entries per page; N: entries in the index; |L0|: entries in the head tree
28 I/O Cost Comparison
- Search cost: B-tree O(log_B N); BFTL O(c * log_B N); LSM-tree O(log_k N * log_B N); FD-tree O(log_k N) random reads.
- Update cost: B-tree O(log_B N) random reads + O(1) random writes; BFTL O(1/c) random writes; LSM-tree O(log_k N) sequential writes; FD-tree O((k/f) * log_k N) sequential writes.
- For simplicity of comparison, one may take k and B to be of the same order; then FD-tree matches the B-tree's search cost while keeping all of its writes sequential.
29 Cost Model
- Tradeoff in the k value
- Large k value: high insertion cost
- Small k value: high search cost
- We develop a cost model to calculate the optimal value of k, given the characteristics of both the flash SSD and the workload, as sketched below.
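A sketch of the cost-model idea (page timings taken from slide 5; the per-request formulas follow the analysis on slide 27; everything else is an assumed simplification): enumerate candidate k values and keep the one minimizing the workload-weighted cost.

  import math

  def estimated_cost_us(k, n_entries, f=256, read_us=80.0, write_us=200.0,
                        search_fraction=0.5):
      levels = max(1, math.ceil(math.log(n_entries, k)))
      search = levels * read_us                 # one page read per level
      insert = (k / f) * levels * write_us      # amortized sequential merge writes
      return search_fraction * search + (1 - search_fraction) * insert

  optimal_k = min(range(2, 256), key=lambda k: estimated_cost_us(k, 10**9))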
30 Cost Model
[Figure: estimated cost for varying k values.]
31 Outline
- Introduction
- Structure of FD-Tree
- Cost Analysis
- Experimental Results
- Conclusion
32 Implementation Details
- Storage Layout
- Fixed-length record page format
- OS disk buffering disabled
- Buffer Manager
- LRU replacement policy
[Figure: system stack: the FD-tree, LSM-tree, BFTL, and B-tree implementations share a common Buffer Manager and Storage Layout on top of the flash SSDs.]
33 Experimental Setup
- Platform
- Intel Quad Core CPU
- 2GB memory
- Windows XP
- Three flash SSDs
- Intel X-25M 80GB, Mtron 64GB, Samsung 32GB
- SATA interface
34 Experimental Settings
- Index size: 128MB-8GB (8GB by default)
- Entry size: 8 bytes (4-byte key + 4-byte pointer)
- Buffer size: 16MB
- Warm-up period: 10,000 queries
- Workload: 50% search, 50% insertion (by default)
35 Validation of the Cost Model
- The estimated costs are very close to the measured ones.
- Our cost model can estimate a relatively accurate k value that minimizes the overall cost.
[Figures: estimated vs. measured costs on the Mtron and Intel SSDs.]
36 Overall Performance Comparison
- On the Mtron SSD, FD-tree is 24.2X, 5.8X, and 1.8X faster than B-tree, BFTL, and LSM-tree, respectively.
- On the Intel SSD, FD-tree is 3X, 3X, and 1.5X faster than B-tree, BFTL, and LSM-tree, respectively.
[Figures: overall performance on the Intel and Mtron SSDs.]
37 Search Performance Comparison
- FD-tree has search performance similar to B-tree.
- FD-tree and B-tree outperform the others on both SSDs.
[Figures: search performance on the Intel and Mtron SSDs.]
38 Insertion Performance Comparison
- FD-tree has insertion performance similar to LSM-tree.
- FD-tree and LSM-tree outperform the others on both SSDs.
[Figures: insertion performance on the Intel and Mtron SSDs.]
39 Performance Comparison
- W_Search: 80% search, 10% insertion, 5% deletion, 5% update
- W_Update: 20% search, 40% insertion, 20% deletion, 20% update
40 Outline
- Introduction
- Structure of FD-Tree
- Cost Analysis
- Experimental Results
- Conclusion
41 Conclusion
- We design a new index structure that transforms almost all random writes into sequential ones while preserving search efficiency.
- We show empirically and analytically that FD-tree outperforms all the other indexes on various flash SSDs.
42 Related Publications
- Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash Disks. ICDE 2009 (short paper).
- Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash-Based Solid State Drives. In preparation for journal submission.
43 Q&A
45 Additional Slides
46 Block-Level FTL
- Mapping granularity: block
- Cost per page update: 1 erase + N page writes + N page reads (N = pages per block)
47 Page-Level FTL
- Mapping granularity: page
- Larger mapping table
- Amortized cost per page update: 1/N erase + 1 page write + 1 page read
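- For example, with N = 64 pages per block, updating one page costs about 1 erase + 64 reads + 64 writes under block-level mapping, but amortizes to roughly 1/64 erase + 1 read + 1 write under page-level mapping, at the price of a mapping table 64X larger.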
48 Fragmentation
- When the flash disk is full, space must be recycled.
- Cost of recycling ONE block: N^2 reads, N(N-1) writes, and N erases.
49 Deamortized FD-Tree
- Normal FD-tree
- High average insertion performance
- Poor worst-case insertion performance
- Deamortized FD-tree
- Reduces the worst-case insertion cost
- Preserves the average insertion cost
50 Deamortized FD-Tree
- Maintain two head trees, T0 and T0'
- Insert into T0'
- Search on both T0 and T0'
- Concurrent merge
[Figure: searches consult both T0 and T0', inserts go to T0', and T0 is concurrently merged into the next level.]
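A minimal sketch of the two-head-tree scheme (dicts stand in for the head B-trees, and the body of the background merge step is elided):

  class DeamortizedHead:
      def __init__(self):
          self.t0 = {}          # full head tree, being merged down in the background
          self.t0_prime = {}    # active head tree receiving new inserts

      def insert(self, key, ptr):
          self.t0_prime[key] = ptr                 # inserts go only to T0'

      def search(self, key):
          # a lookup must consult both head trees during the concurrent merge
          return self.t0_prime.get(key, self.t0.get(key))

      def merge_step(self, budget_pages):
          ...                   # advance the merge of T0 into L1 by a few pages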
51 Deamortized FD-Tree
- The high merge cost is amortized across all entries inserted into the head tree.
- The overall cost is (almost) unchanged.
52 FD-Tree vs. Deamortized FD-Tree
- Relatively high worst-case performance
- Low overhead