1
STACS: Storage Access Coordination of Tertiary
Storage for High Energy Physics Applications
Arie Shoshani, Alex Sim, John Wu, Luis Bernardo,
Henrik Nordberg, Doron Rotem
Scientific Data Management Group
Computing Science Directorate
Lawrence Berkeley National Laboratory
(no longer at LBNL)
2
Outline
  • Short High Energy Physics overview (of data
    handling problem)
  • Description of the Storage Coordination System
  • File tracking
  • The Query Estimator (QE)
  • Details of the bit-sliced index
  • The Query Monitor
  • coordination of file bundles
  • The Cache Manager
  • tertiary storage queuing and tape coordination
  • transfer time for query estimation

3
Optimizing Storage Management for High Energy
Physics Applications
Data Volumes for planned HENP experiments
STAR = Solenoidal Tracker At RHIC; RHIC = Relativistic Heavy Ion Collider
4
Particle Detection Systems
Phenix at RHIC
STAR detector at RHIC
5
Result of Particle Collision (event)
6
Typical Scientific Exploration Process
  • Generate large amounts of raw data
  • large simulations
  • collect from experiments
  • Post-processing of data
  • analyze data (find particles produced, tracks)
  • generate summary data
  • e.g. momentum, no. of pions, transverse energy
  • Number of properties is large (50-100)
  • Analyze data
  • use summary data as guide
  • extract subsets from the large dataset
  • Need to access events based on partial property
    specification (range queries)
  • e.g. ((0.1 < AVpT < 0.2) ∧ (10 < Np < 20)) ∨ (N >
    6000)
  • apply analysis code

7
Size of Data and Access Patterns
  • STAR experiment
  • 10^8 events over 3 years
  • 1-10 MB per event reconstructed data
  • events organized into 0.1 - 1 GB files
  • 10^15 bytes total size
  • 10^6 files, 30,000 tapes (30 GB tapes)
  • Access patterns
  • Subsets of events are selected by region in
    high-dimensional property space for analysis
  • 10,000 - 50,000 out of a total of 10^8
  • Data is randomly scattered all over the tapes
  • Goal: optimize access from tape systems

8
EXAMPLE OF EVENT PROPERTY VALUES
(I = integer property, R = real-valued property)
I event = 1
I N(1) = 9965, N(2) = 1192, N(3) = 1704
I Npip(1) = 2443, Npip(2) = 551, Npip(3) = 426
I Npim(1) = 2480, Npim(2) = 541, Npim(3) = 382
I Nkp(1) = 229, Nkp(2) = 30, Nkp(3) = 50
I Nkm(1) = 209, Nkm(2) = 23, Nkm(3) = 32
I Np(1) = 255, Np(2) = 34, Np(3) = 24
I Npbar(1) = 94, Npbar(2) = 12, Npbar(3) = 24
I NSEC(1) = 15607, NSEC(2) = 1342
I NSECpip(1) = 638, NSECpip(2) = 191
I NSECpim(1) = 728, NSECpim(2) = 206
I NSECkp(1) = 3, NSECkp(2) = 0
I NSECkm(1) = 0, NSECkm(2) = 0
I NSECp(1) = 524, NSECp(2) = 244
I NSECpbar(1) = 41, NSECpbar(2) = 8
R AVpT(1) = 0.325951, AVpT(2) = 0.402098
R AVpTpip(1) = 0.300771, AVpTpip(2) = 0.379093
R AVpTpim(1) = 0.298997, AVpTpim(2) = 0.375859
R AVpTkp(1) = 0.421875, AVpTkp(2) = 0.564385
R AVpTkm(1) = 0.435554, AVpTkm(2) = 0.663398
R AVpTp(1) = 0.651253, AVpTp(2) = 0.777526
R AVpTpbar(1) = 0.399824, AVpTpbar(2) = 0.690237
I NHIGHpT(1) = 205, NHIGHpT(2) = 7, NHIGHpT(3) = 1, NHIGHpT(4) = 0, NHIGHpT(5) = 0
54 properties, as many as 10^8 events
9
Opportunities for optimization
  • Prevent / eliminate unwanted queries => query
    estimation (fast estimation index)
  • Read only events qualified for a query from a
    file (avoid reading irrelevant events) => exact
    index over all properties
  • Share files brought into cache by multiple
    queries => look ahead for files needed and cache
    management
  • Read files from the same tape when possible =>
    coordinating file access from tape

10
The Storage Access Coordination System (STACS)
[Architecture diagram: the user's application sends query estimation / execution requests and open/read/close calls; the Query Estimator (QE) answers them using the bit-sliced index; the Query Monitor (QM), guided by the Caching Policy Module, sends file caching requests to the Cache Manager (CM); the CM caches and purges files on the disk cache and looks files up in the File Catalog (FC).]
11
A typical SQL-like Query
SELECT * FROM star_dataset WHERE
500 < total_tracks < 1000 AND energy < 3
-- The index will generate the set of files that the query needs:
--   F6 {E4, E17, E44, ...}, F13 {E6, E8, E32, ...}, ..., F1036 {E503, E3112}
-- The files can be returned to the application in any order
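
As a rough illustration of what the index's answer looks like, here is a minimal Python sketch (not the actual STACS interface; the event and file structures and the function name are hypothetical) that groups qualifying event IDs by the file holding them, mirroring the F6 {E4, E17, E44, ...} form above.

from typing import Dict, List

def answer_query(events: Dict[int, dict], event_to_file: Dict[int, str]) -> Dict[str, List[int]]:
    # events: event ID -> property dict; event_to_file: event ID -> file ID
    # (both are illustrative stand-ins for the index and the file catalog)
    files: Dict[str, List[int]] = {}
    for eid, props in events.items():
        if 500 < props["total_tracks"] < 1000 and props["energy"] < 3:
            files.setdefault(event_to_file[eid], []).append(eid)
    return files  # e.g. {"F6": [4, 17, 44], "F13": [6, 8, 32], ...}; any order is fine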
12
File Tracking (1)
13
File Tracking (2)
14
File Tracking
[Chart: file tracking over time; annotations mark where query1, query2, and query3 start, and the period when all 3 queries are active.]
15
Typical Processing Flow
[Diagram: processing flow between the application, the STACS components, HPSS, and the local disk. Steps recoverable from the figure: 1 new query / quick estimate, 2 execute / full estimate, 3 execute, 4 request whichFileToCache / FileID ToCache, 7 stage, 8 file info, 9 file caching request, 10 file caching, 11 staged, 12 retrieve, 14 release, 15/16 purge, 17 purged, 18 done.]
16
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
17
Bit-Sliced Index (used by the Query Estimator)
  • Index size
  • property space
  • 10^8 events x 100 properties x 4 bytes = 40 GB
  • index requirements
  • range queries, e.g. (10 < Np < 20) ∧ (0.1 < AVpT < 0.2)
  • number of properties involved is small (3-5)
  • Problem
  • how to organize the property space index

18
Indexing over all properties
  • Multi-dimensional index methods
  • partitioning MD space (KD-trees, n-QUAD-trees,
    ...)
  • for high dimensionality - either fanout or tree
    depth too large
  • e.g. symmetric n-QUAD-trees require a 2^100 fanout
  • non-symmetric solutions are order dependent

19
Partitioning property space
  • One possible solution
  • partition property space into subsets
  • e.g. 7 dimensions at a time
  • Performance
  • good for non-partial range queries (full
    hypercube)
  • bad if only a few of the dimensions in each
    partition are involved in the query
  • S. Berchtold, C. Bohm, H. Kriegel, The
    Pyramid-Technique Towards Breaking the Curse of
    Dimensionality, SIGMOD 1998
  • best for non-skewed (random) data
  • best for full hypercube queries
  • for partial range queries (e.g. 3 out of 100), close
    to a sequential scan

20
Bit-Sliced Index
  • Solution: take advantage of the fact that the index
    only needs to be append-only
  • partition each property into bins
  • (e.g. for 0 < Np < 300, have 20 equal-size bins)
  • for each bin generate a bit vector
  • compress each bit vector (run length encoding)
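
A minimal sketch of the binning idea, assuming NumPy and equal-width bins; the bin count, value range, and random data are illustrative, not the values used by STACS.

import numpy as np

def build_bin_bitmaps(values: np.ndarray, lo: float, hi: float, nbins: int):
    """Return (edges, bitmaps) where bitmaps[b][i] is True iff event i falls in bin b."""
    edges = np.linspace(lo, hi, nbins + 1)
    which = np.clip(np.digitize(values, edges) - 1, 0, nbins - 1)
    bitmaps = [(which == b) for b in range(nbins)]
    return edges, bitmaps

# Example: 20 equal-size bins for 0 < Np < 300, then estimate the range query
# 10 < Np < 20 by OR-ing every bin that overlaps the range.
np_values = np.random.randint(0, 300, size=1_000_000)
edges, bitmaps = build_bin_bitmaps(np_values, 0, 300, 20)
hit = np.zeros(len(np_values), dtype=bool)
for b in range(20):
    if edges[b + 1] > 10 and edges[b] < 20:      # bin overlaps (10, 20)
        hit |= bitmaps[b]
print("candidate events:", hit.sum())            # upper bound; edge bins may over-count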

21
Run Length Compression
Uncompressed: 0000000000001111000000000...0000001000000001111111100000000...000000
Compressed: 12, 4, 1000, 1, 8, 1000 (very short sequences are stored as-is)
Advantage: AND, OR, and COUNT operations can be performed on the compressed data
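
A minimal sketch of run-length encoding, and of AND/COUNT performed directly on the compressed runs; the real encoding also stores very short sequences verbatim, which this sketch omits, and all names are illustrative.

from itertools import groupby
from typing import List, Tuple

Runs = List[Tuple[int, int]]          # [(bit, run_length), ...]

def rle_encode(bits: List[int]) -> Runs:
    # e.g. 12 zeros then 4 ones -> [(0, 12), (1, 4), ...]
    return [(b, len(list(g))) for b, g in groupby(bits)]

def rle_count_ones(runs: Runs) -> int:
    return sum(n for b, n in runs if b == 1)

def rle_and(a: Runs, b: Runs) -> Runs:
    """AND two run-length-encoded bit vectors of equal length without decompressing."""
    out: Runs = []
    ia = ib = 0
    ra, na = a[0]
    rb, nb = b[0]
    while True:
        step = min(na, nb)
        bit = ra & rb
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)   # merge with previous run
        else:
            out.append((bit, step))
        na -= step
        nb -= step
        if na == 0:
            ia += 1
            if ia == len(a):
                break
            ra, na = a[ia]
        if nb == 0:
            ib += 1
            if ib == len(b):
                break
            rb, nb = b[ib]
    return out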
22
Bit-Sliced Index
  • Advantages
  • space for the index is very small - can fit in memory
  • Need only touch properties involved in queries
    (vertical partitioning)
  • Need only touch bins involved

min-max
Query estimation is done in memory only!
23
Inner Bins vs. Edge Bins
[Diagram: a 2-D range query over Range(x) and Range(y); bins entirely inside the range are inner bins, bins that straddle the range boundary are edge bins.]
24
[Diagram: vertical partitions of the event properties on disk (20-40 GB) and the bit-sliced index in memory (50-100 MB). Range conditions select bins of the index; events in inner bins qualify directly, events in edge bins are checked against the vertical partitions, producing the file list and the events that qualify in each file.]
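
A minimal sketch of the inner-bin / edge-bin evaluation for one property, assuming the binned bitmaps from the earlier sketch; read_exact_values stands in for a read of the vertical partition on disk and is hypothetical.

import numpy as np

def range_query(lo, hi, edges, bitmaps, read_exact_values):
    """Return a boolean mask of events satisfying lo < x < hi."""
    hit = np.zeros(bitmaps[0].shape, dtype=bool)
    for b in range(len(bitmaps)):
        blo, bhi = edges[b], edges[b + 1]
        if blo >= lo and bhi <= hi:              # inner bin: every event qualifies
            hit |= bitmaps[b]
        elif bhi > lo and blo < hi:              # edge bin: must check exact values
            candidates = np.flatnonzero(bitmaps[b])
            exact = read_exact_values(candidates)    # one read of the vertical partition
            hit[candidates[(exact > lo) & (exact < hi)]] = True
    return hit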
25
Experimental Results on Index
  • Simulated dataset (HIJING)
  • 10 million events
  • 70 properties
  • property space
  • BSI: 2.5 GB
  • Oracle: 3.5 GB
  • index size
  • BSI: 280 MB (4 MB/property)
  • Oracle: 7 GB (100 MB/property)
  • index creation time
  • BSI: 3 hours (2.5 min / property)
  • Oracle: 47 hours (40 min / property)

26
Experimental Results on Index
  • Run a count query (preliminary)
  • BSI
  • 1 property: 14 - 70 sec (depending on the size of the
    range)
  • 2 properties: 90 sec (both about half the range)
  • => linear in the number of bins touched
  • Oracle
  • 1 property: comparable (counts only)
  • 2 properties: > 2 hours!
  • uses one index, loops over the table
  • => need to tune Oracle
  • run analyze on the indexes, choose the policy
  • bitmap index - did not help
  • => after tuning: 12 min

27
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
28
File Bundles: Multiple Event Components
29
A typical SQL-like Query for Multiple Components
SELECT Vertices, Raw FROM star_dataset WHERE
500 < total_tracks < 1000 AND energy < 3
-- The index will generate the set of bundles that the query needs:
--   {F7, F16} {E4, E17, E44, ...}, {F13, F16} {E6, E8, E32, ...}, ...
-- The bundles can be returned to the application in any order
-- A bundle is the set of files that need to be in cache at the same time
30
File Weight Policy for Managing File Bundles
  • File weight (per bundle): 1 if the file appears in the
    bundle, 0 otherwise
  • Initial file weight: the SUM (over all bundles of each
    query) over all queries
  • Example
  • query 1: file FK appears in 5 bundles
  • query 2: file FK appears in 3 bundles. Then IFW(FK) = 8

31
File Weight Policy for Managing File Bundles
(cont'd)
  • Dynamic file weight: the file weight of a file
    in a bundle that was processed is decremented by
    1
  • Dynamic Bundle Weight

32
How file weights are used for caching and purging
  • Bundle caching policy
  • For each query, in turn, cache the bundle with
    the most files in cache
  • In case of a tie, select the bundle with the
    highest weight
  • Ensures that bundles that include files needed
    by other bundles/queries have priority
  • File purging policy
  • No file purging occurs until space is needed
  • Purge the file not in use with the smallest weight
  • Ensures that files needed by other bundles stay
    in cache (a sketch of these policies follows)
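
A minimal sketch of the weight-based policies described on the last three slides; the data structures (queries as lists of bundles, bundles as sets of file IDs) and the function names are illustrative, not the STACS implementation.

from typing import Dict, List, Set

def initial_file_weights(queries: List[List[Set[str]]]) -> Dict[str, int]:
    """IFW(f) = number of bundles, over all queries, in which file f appears."""
    w: Dict[str, int] = {}
    for bundles in queries:
        for bundle in bundles:
            for f in bundle:
                w[f] = w.get(f, 0) + 1
    return w   # e.g. FK in 5 bundles of query 1 and 3 of query 2 -> IFW(FK) = 8

def pick_bundle_to_cache(bundles: List[Set[str]], cache: Set[str], fw: Dict[str, int]) -> Set[str]:
    """Cache the bundle with the most files already in cache; break ties by total weight."""
    return max(bundles, key=lambda b: (len(b & cache), sum(fw[f] for f in b)))

def pick_file_to_purge(cache: Set[str], in_use: Set[str], fw: Dict[str, int]) -> str:
    """Purge the file not in use with the smallest weight (only when space is needed)."""
    return min(cache - in_use, key=lambda f: fw[f])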

33
Other policies
  • Pre-fetching policy
  • queries can request pre-fetching of bundles,
    subject to a limit
  • Currently, limit set to two bundles
  • multiple pre-fetching useful for parallel
    processing
  • Query service policy
  • queries serviced in Round Robin fashion
  • queries that have all their bundles cached and
    are still processing are skipped

34
Managing the queues
[Diagram: the Query Monitor's queues - the query queue, the bundle set, the file set, the files being processed, and the files in cache.]
35
File Tracking of Bundles
[Chart: file tracking of bundles over time. Annotations: a bundle (3 files) is formed, then passed to the query; a bundle is shared by two queries; a bundle was found in cache; markers show where Query 1 and Query 2 start.]
36
Summary
  • The key to managing bundle caching and purging
    policies is weight assignment
  • caching - based on bundle weight
  • purging - based on file weight
  • Other file weight policies are possible
  • e.g. based on bundle size
  • e.g. based on tape sharing
  • Proving which policy is best is a hard problem
  • can test in a real system - expensive, needs a
    stand-alone system
  • simulation - too many parameters in the query profile
    can vary: processing time, inter-arrival time,
    number of drives, size of cache, etc.
  • model with a system of queues - hard to model the
    policies
  • we are working on the last two methods

37
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
38
Queuing File Transfers
  • The number of PFTPs to HPSS is limited
  • limit set by a parameter - NoPFTP
  • parameter can be changed dynamically
  • CM is multi-threaded
  • issues and monitors multiple PFTPs in parallel
  • All requests beyond PFTP limit are queued
  • The File Catalog is used to provide, for each file:
  • HPSS path/file_name
  • Disk cache path/file_name
  • File size
  • tape ID

39
File Queue Management
  • Goal
  • minimize tape mounts
  • still respect the order of requests
  • do not postpone unpopular tapes forever
  • File clustering parameter - FCP
  • If the file at the top of the queue is on Tape_i and
    FCP > 1 (e.g. 5), then up to 4 files from Tape_i will
    be selected to be transferred next
  • then, go back to the file at the top of the queue
  • Parameter can be set dynamically

[Figure: a file request queue with five entries; entries 1, 2, 3, and 4 are files F(Ti) on the same tape Ti.]
40
File Queue Management
  • Goal
  • minimize tape mounts
  • still respect the order of requests
  • do not postpone unpopular tapes forever
  • File clustering parameter - FCP
  • If the file at the top of the queue is on Tape_i and
    FCP > 1 (e.g. 4), then up to 4 files from Tape_i will
    be selected to be transferred next
  • then, go back to file at top of queue
  • Parameter can be set dynamically

[Figure: the file queue with the order of file service; files F1(Ti), F2(Ti), F3(Ti), F4(Ti) from tape Ti are served consecutively.]
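
A minimal sketch of the file-clustering rule, assuming the queue holds (file, tape) pairs; the names and the deque representation are illustrative.

from collections import deque

def next_batch(queue: deque, fcp: int):
    """Pop and return the next files to transfer: the head of the queue plus
    up to fcp-1 more queued files from the same tape."""
    if not queue:
        return []
    first = queue.popleft()
    batch, tape = [first], first[1]
    extra = [item for item in queue if item[1] == tape][: fcp - 1]
    for item in extra:
        queue.remove(item)
        batch.append(item)
    return batch          # afterwards, service resumes at the new head of the queue

# Example: FCP = 5 -> the top file plus up to 4 more files from the same tape.
q = deque([("f1", "T1"), ("f2", "T2"), ("f3", "T1"), ("f4", "T1"), ("f5", "T3")])
print(next_batch(q, 5))   # [('f1', 'T1'), ('f3', 'T1'), ('f4', 'T1')]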
41
File Caching Order for different File Clustering
Parameters
[Charts: file caching order with File Clustering Parameter = 1 vs. File Clustering Parameter = 10.]
42
Transfer Rate (Tr) Estimates
  • Need Tr to estimate the total time of a query
  • Tr is averaged over recent file transfers, from
    the time the PFTP request is made to the time the
    transfer completes. This includes
  • mount time, seek time, read to the HPSS
    RAID, transfer to the local cache over the network
  • For a dynamic network speed estimate
  • check the total bytes for all files being
    transferred over small intervals (e.g. 15 sec)
  • calculate a moving average over n intervals (e.g.
    10 intervals); see the sketch below
  • Using this, the actual time in HPSS can be estimated
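
A minimal sketch of the moving-average rate estimate, using the interval and window sizes quoted above as defaults; the class and method names are illustrative.

from collections import deque

class TransferRateEstimator:
    def __init__(self, interval_sec: float = 15.0, n_intervals: int = 10):
        self.interval = interval_sec
        self.samples = deque(maxlen=n_intervals)   # bytes moved per interval
        self.last_total = 0

    def sample(self, total_bytes_transferred: int) -> None:
        """Call once per interval with the cumulative bytes of all in-flight transfers."""
        self.samples.append(total_bytes_transferred - self.last_total)
        self.last_total = total_bytes_transferred

    def rate(self) -> float:
        """Moving-average transfer rate in bytes/second over the last n intervals."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / (len(self.samples) * self.interval)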

43
Dynamic Display of Various Measurements
44
Query Estimate
  • Given the transfer rate Tr
  • Given a query for which
  • X files are in cache
  • Y files are in the queue
  • Z files are not scheduled yet
  • Let s(file_set) be the total byte size of all
    files in file_set
  • If Z = 0, then
  • QuEst = s(Y)/Tr
  • If Z > 0, then
  • QuEst = (s(T) + q·s(Z))/Tr, where q is the number
    of active queries (see the sketch below)

[Figure: the transfer queue holding the query's files F1(Y)-F4(Y); T denotes the queued work ahead of them.]
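
A minimal sketch of the estimate, following the two formulas above; reading s(T) as the size of the files already queued ahead of the query is an assumption.

def query_estimate(s_Y: float, s_Z: float, s_T: float, tr: float, q: int) -> float:
    """Estimated remaining time (seconds) for a query, given transfer rate tr (bytes/s).
    s_Y, s_Z, s_T are the total byte sizes s(Y), s(Z), s(T); names are illustrative."""
    if s_Z == 0:                      # all remaining files are already in the queue
        return s_Y / tr
    return (s_T + q * s_Z) / tr       # unscheduled files compete with q active queries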
45
Reason for q.s(Z)
[Charts: 20 queries of length 20 minutes launched 20 minutes apart - estimate pretty close; 20 queries of length 20 minutes launched 5 minutes apart - estimate bad, requests accumulate in the queue.]
46
Error Handling
  • 5 generic errors
  • file not found
  • return error to the caller
  • PFTP limit reached
  • can't log in
  • re-queue the request, try later (1-2 min)
  • HPSS error (I/O, device busy)
  • remove the partial file from cache, re-queue
  • try n times (e.g. 3), then return the
    error transfer_failed
  • HPSS down
  • re-queue the request, try repeatedly until successful
  • respond to File_status requests with HPSS_down
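
A minimal sketch of how the generic error classes might be dispatched; the error names, request object, and callbacks are illustrative, not the Cache Manager's actual interface.

RETRY_DELAY_SEC = 90          # "try later (1-2 min)"
MAX_IO_RETRIES = 3            # "try n times (e.g. 3)"

def handle_transfer_error(error: str, request, requeue, remove_partial_file):
    if error == "file_not_found":
        return "error_file_not_found"              # returned to the caller
    if error in ("pftp_limit_reached", "login_failed"):
        requeue(request, delay=RETRY_DELAY_SEC)    # try again later
        return "requeued"
    if error == "hpss_io_error":
        remove_partial_file(request)               # drop the partial file from cache
        if request.attempts < MAX_IO_RETRIES:
            requeue(request, delay=RETRY_DELAY_SEC)
            return "requeued"
        return "error_transfer_failed"
    if error == "hpss_down":
        requeue(request, delay=RETRY_DELAY_SEC)    # retry until HPSS comes back
        return "hpss_down"                         # reported on file_status requests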

47
Summary
  • HPSS Hierarchical Resource Manager (HRM)
  • insulates applications from transient HPSS and
    network errors
  • limits concurrent PFTPs to HPSS
  • manages queue to minimize tape mounts
  • provides file/query time estimates
  • handles errors in a generic way
  • The same API can be used for any MSS, such as
    Unitree, Enstore, etc.

48
Web pointers
  • http://gizmo.lbl.gov/stacs
  • http://gizmo.lbl.gov/arie/download.papers.html
    -- to download papers
  • http://gizmo.lbl.gov/stacs/stacs.slides/index.htm
    -- a STACS presentation