1
STACS: Storage Access Coordination of Tertiary
Storage for High Energy Physics Applications
Arie Shoshani, Alex Sim, John Wu, Luis Bernardo,
Henrik Nordberg, Doron Rotem
Scientific Data Management Group
Computing Science Directorate
Lawrence Berkeley National Laboratory
(no longer at LBNL)
2
Outline
  • Short High Energy Physics overview (of data
    handling problem)
  • Description of the Storage Coordination System
  • File tracking
  • The Query Estimator (QE)
  • Details of the bit-sliced index
  • The Query Monitor
  • coordination of file bundles
  • The Cache Manager
  • tertiary storage queuing and tape coordination
  • transfer time for query estimation

3
Optimizing Storage Management for High Energy
Physics Applications
Data Volumes for planned HENP experiments
STAR = Solenoidal Tracker At RHIC; RHIC = Relativistic Heavy Ion Collider
4
Particle Detection Systems
Phenix at RHIC
STAR detector at RHIC
5
Result of Particle Collision (event)
6
Typical Scientific Exploration Process
  • Generate large amounts of raw data
  • large simulations
  • collect from experiments
  • Post-processing of data
  • analyze data (find particles produced, tracks)
  • generate summary data
  • e.g. momentum, no. of pions, transverse energy
  • Number of properties is large (50-100)
  • Analyze data
  • use summary data as guide
  • extract subsets from the large dataset
  • Need to access events based on partial property
    specification (range queries)
  • e.g. ((0.1 < AVpT < 0.2) ∧ (10 < Np < 20)) ∨ (N >
    6000)
  • apply analysis code

7
Size of Data and Access Patterns
  • STAR experiment
  • 10^8 events over 3 years
  • 1-10 MB per event reconstructed data
  • events organized into 0.1 - 1 GB files
  • 10^15 bytes total size
  • 10^6 files, 30,000 tapes (30 GB tapes)
  • Access patterns
  • Subsets of events are selected by region in
    high-dimensional property space for analysis
  • 10,000 - 50,000 out of a total of 10^8
  • Data is randomly scattered all over the tapes
  • Goal: optimize access from tape systems

8
EXAMPLE OF EVENT PROPERTY VALUES
(I = integer property, R = real-valued property)
I event = 1
I N(1) = 9965, N(2) = 1192, N(3) = 1704
I Npip(1) = 2443, Npip(2) = 551, Npip(3) = 426
I Npim(1) = 2480, Npim(2) = 541, Npim(3) = 382
I Nkp(1) = 229, Nkp(2) = 30, Nkp(3) = 50
I Nkm(1) = 209, Nkm(2) = 23, Nkm(3) = 32
I Np(1) = 255, Np(2) = 34, Np(3) = 24
I Npbar(1) = 94, Npbar(2) = 12, Npbar(3) = 24
I NSEC(1) = 15607, NSEC(2) = 1342
I NSECpip(1) = 638, NSECpip(2) = 191
I NSECpim(1) = 728, NSECpim(2) = 206
I NSECkp(1) = 3, NSECkp(2) = 0
I NSECkm(1) = 0, NSECkm(2) = 0
I NSECp(1) = 524, NSECp(2) = 244
I NSECpbar(1) = 41, NSECpbar(2) = 8
R AVpT(1) = 0.325951, AVpT(2) = 0.402098
R AVpTpip(1) = 0.300771, AVpTpip(2) = 0.379093
R AVpTpim(1) = 0.298997, AVpTpim(2) = 0.375859
R AVpTkp(1) = 0.421875, AVpTkp(2) = 0.564385
R AVpTkm(1) = 0.435554, AVpTkm(2) = 0.663398
R AVpTp(1) = 0.651253, AVpTp(2) = 0.777526
R AVpTpbar(1) = 0.399824, AVpTpbar(2) = 0.690237
I NHIGHpT(1) = 205, NHIGHpT(2) = 7, NHIGHpT(3) = 1, NHIGHpT(4) = 0, NHIGHpT(5) = 0
54 properties, as many as 10^8 events
9
Opportunities for optimization
  • Prevent / eliminate unwanted queries => query
    estimation (fast estimation index)
  • Read only events qualified for a query from a
    file (avoid reading irrelevant events) => exact
    index over all properties
  • Share files brought into cache by multiple
    queries => look ahead for files needed and cache
    management
  • Read files from the same tape when possible =>
    coordinating file access from tape

10
The Storage Access Coordination System (STACS)
[Architecture diagram: the user's application sends query estimation / execution requests and open/read/close calls; the Query Estimator (QE) answers them using the bit-sliced index; the Query Monitor (QM), guided by the Caching Policy Module, sends file caching requests to the Cache Manager (CM); the CM caches and purges files on the disk cache and looks files up in the File Catalog (FC).]
11
A typical SQL-like Query
SELECT * FROM star_dataset WHERE
500 < total_tracks < 1000 AND energy < 3
-- The index will generate the set of files that the query needs:
--   F6 {E4, E17, E44, ...}, F13 {E6, E8, E32, ...}, ..., F1036 {E503, E3112}
-- The files can be returned to the application in any order
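
As a rough illustration of what the index's answer looks like, here is a minimal Python sketch (not the actual STACS interface; the event and file structures and the function name are hypothetical) that groups qualifying event IDs by the file holding them, mirroring the F6 {E4, E17, E44, ...} form above.

from typing import Dict, List

def answer_query(events: Dict[int, dict], event_to_file: Dict[int, str]) -> Dict[str, List[int]]:
    # events: event ID -> property dict; event_to_file: event ID -> file ID
    # (both are illustrative stand-ins for the index and the file catalog)
    files: Dict[str, List[int]] = {}
    for eid, props in events.items():
        if 500 < props["total_tracks"] < 1000 and props["energy"] < 3:
            files.setdefault(event_to_file[eid], []).append(eid)
    return files  # e.g. {"F6": [4, 17, 44], "F13": [6, 8, 32], ...}; any order is fine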
12
File Tracking (1)
13
File Tracking (2)
14
File Tracking
[Chart: file tracking over time; annotations mark where query1, query2, and query3 start, and the period when all 3 queries are active.]
15
Typical Processing Flow
[Diagram: processing flow between the application, the STACS components, HPSS, and the local disk. Steps recoverable from the figure: 1 new query / quick estimate, 2 execute / full estimate, 3 execute, 4 request whichFileToCache / FileID ToCache, 7 stage, 8 file info, 9 file caching request, 10 file caching, 11 staged, 12 retrieve, 14 release, 15/16 purge, 17 purged, 18 done.]
16
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
17
Bit-Sliced Index (used by the Query Estimator)
  • Index size
  • property space
  • 10^8 events x 100 properties x 4 bytes = 40 GB
  • index requirements
  • range queries, e.g. (10 < Np < 20) ∧ (0.1 < AVpT < 0.2)
  • number of properties involved is small (3-5)
  • Problem
  • how to organize the property space index

18
Indexing over all properties
  • Multi-dimensional index methods
  • partitioning MD space (KD-trees, n-QUAD-trees,
    ...)
  • for high dimensionality - either fanout or tree
    depth too large
  • e.g. symmetric n-QUAD-trees require a 2^100 fanout
  • non-symmetric solutions are order dependent

19
Partitioning property space
  • One possible solution
  • partition property space into subsets
  • e.g. 7 dimensions at a time
  • Performance
  • good for non-partial range queries (full
    hypercube)
  • bad if only a few of the dimensions in each
    partition are involved in the query
  • S. Berchtold, C. Bohm, H. Kriegel, The
    Pyramid-Technique Towards Breaking the Curse of
    Dimensionality, SIGMOD 1998
  • best for non-skewed (random) data
  • best for full hypercube queries
  • for partial range queries (e.g. 3 out of 100), close
    to a sequential scan

20
Bit-Sliced Index
  • Solution: take advantage of the fact that the index
    only needs to be append-only
  • partition each property into bins
  • (e.g. for 0 < Np < 300, have 20 equal-size bins)
  • for each bin generate a bit vector
  • compress each bit vector (run length encoding)
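
A minimal sketch of the binning idea, assuming NumPy and equal-width bins; the bin count, value range, and random data are illustrative, not the values used by STACS.

import numpy as np

def build_bin_bitmaps(values: np.ndarray, lo: float, hi: float, nbins: int):
    """Return (edges, bitmaps) where bitmaps[b][i] is True iff event i falls in bin b."""
    edges = np.linspace(lo, hi, nbins + 1)
    which = np.clip(np.digitize(values, edges) - 1, 0, nbins - 1)
    bitmaps = [(which == b) for b in range(nbins)]
    return edges, bitmaps

# Example: 20 equal-size bins for 0 < Np < 300, then estimate the range query
# 10 < Np < 20 by OR-ing every bin that overlaps the range.
np_values = np.random.randint(0, 300, size=1_000_000)
edges, bitmaps = build_bin_bitmaps(np_values, 0, 300, 20)
hit = np.zeros(len(np_values), dtype=bool)
for b in range(20):
    if edges[b + 1] > 10 and edges[b] < 20:      # bin overlaps (10, 20)
        hit |= bitmaps[b]
print("candidate events:", hit.sum())            # upper bound; edge bins may over-count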

21
Run Length Compression
Uncompressed: 0000000000001111000000000...0000001000000001111111100000000...000000
Compressed: 12, 4, 1000, 1, 8, 1000 (very short sequences are stored as-is)
Advantage: AND, OR, and COUNT operations can be performed on the compressed data
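
A minimal sketch of run-length encoding, and of AND/COUNT performed directly on the compressed runs; the real encoding also stores very short sequences verbatim, which this sketch omits, and all names are illustrative.

from itertools import groupby
from typing import List, Tuple

Runs = List[Tuple[int, int]]          # [(bit, run_length), ...]

def rle_encode(bits: List[int]) -> Runs:
    # e.g. 12 zeros then 4 ones -> [(0, 12), (1, 4), ...]
    return [(b, len(list(g))) for b, g in groupby(bits)]

def rle_count_ones(runs: Runs) -> int:
    return sum(n for b, n in runs if b == 1)

def rle_and(a: Runs, b: Runs) -> Runs:
    """AND two run-length-encoded bit vectors of equal length without decompressing."""
    out: Runs = []
    ia = ib = 0
    ra, na = a[0]
    rb, nb = b[0]
    while True:
        step = min(na, nb)
        bit = ra & rb
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)   # merge with previous run
        else:
            out.append((bit, step))
        na -= step
        nb -= step
        if na == 0:
            ia += 1
            if ia == len(a):
                break
            ra, na = a[ia]
        if nb == 0:
            ib += 1
            if ib == len(b):
                break
            rb, nb = b[ib]
    return out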
22
Bit-Sliced Index
  • Advantages
  • space for the index is very small - can fit in memory
  • Need only touch properties involved in queries
    (vertical partitioning)
  • Need only touch bins involved

min-max
Query estimation is done in memory only!
23
Inner Bins vs. Edge Bins
[Diagram: a 2-D range query over Range(x) and Range(y); bins entirely inside the range are inner bins, bins that straddle the range boundary are edge bins.]
24
[Diagram: vertical partitions of the event properties on disk (20-40 GB) and the bit-sliced index in memory (50-100 MB). Range conditions select bins of the index; events in inner bins qualify directly, events in edge bins are checked against the vertical partitions, producing the file list and the events that qualify in each file.]
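
A minimal sketch of the inner-bin / edge-bin evaluation for one property, assuming the binned bitmaps from the earlier sketch; read_exact_values stands in for a read of the vertical partition on disk and is hypothetical.

import numpy as np

def range_query(lo, hi, edges, bitmaps, read_exact_values):
    """Return a boolean mask of events satisfying lo < x < hi."""
    hit = np.zeros(bitmaps[0].shape, dtype=bool)
    for b in range(len(bitmaps)):
        blo, bhi = edges[b], edges[b + 1]
        if blo >= lo and bhi <= hi:              # inner bin: every event qualifies
            hit |= bitmaps[b]
        elif bhi > lo and blo < hi:              # edge bin: must check exact values
            candidates = np.flatnonzero(bitmaps[b])
            exact = read_exact_values(candidates)    # one read of the vertical partition
            hit[candidates[(exact > lo) & (exact < hi)]] = True
    return hit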
25
Experimental Results on Index
  • Simulated dataset (HIJING)
  • 10 million events
  • 70 properties
  • property space
  • BSI: 2.5 GB
  • Oracle: 3.5 GB
  • index size
  • BSI: 280 MB (4 MB/property)
  • Oracle: 7 GB (100 MB/property)
  • index creation time
  • BSI: 3 hours (2.5 min / property)
  • Oracle: 47 hours (40 min / property)

26
Experimental Results on Index
  • Run a count query (preliminary)
  • BSI
  • 1 property: 14 - 70 sec (depending on the size of the
    range)
  • 2 properties: 90 sec (both about half the range)
  • => linear in the number of bins touched
  • Oracle
  • 1 property: comparable (counts only)
  • 2 properties: > 2 hours!
  • uses one index, loops over the table
  • => need to tune Oracle
  • run analyze on the indexes, choose the policy
  • bitmap index - did not help
  • => after tuning: 12 min

27
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
28
File Bundles: Multiple Event Components
29
A typical SQL-like Query for Multiple Components
SELECT Vertices, Raw FROM star_dataset WHERE
500 < total_tracks < 1000 AND energy < 3
-- The index will generate the set of bundles that the query needs:
--   {F7, F16} {E4, E17, E44, ...}, {F13, F16} {E6, E8, E32, ...}, ...
-- The bundles can be returned to the application in any order
-- A bundle is the set of files that need to be in cache at the same time
30
File Weight Policy for Managing File Bundles
  • File weight (per bundle): 1 if the file appears in the
    bundle, 0 otherwise
  • Initial file weight: the SUM (over all bundles of each
    query) over all queries
  • Example
  • query 1: file FK appears in 5 bundles
  • query 2: file FK appears in 3 bundles. Then IFW(FK) = 8

31
File Weight Policy for Managing File Bundles
(cont'd)
  • Dynamic file weight: the file weight of a file
    in a bundle that was processed is decremented by
    1
  • Dynamic Bundle Weight

32
How file weights are used for caching and purging
  • Bundle caching policy
  • For each query, in turn, cache the bundle with
    the most files in cache
  • In case of a tie, select the bundle with the
    highest weight
  • Ensures that bundles that include files needed
    by other bundles/queries have priority
  • File purging policy
  • No file purging occurs until space is needed
  • Purge the file not in use with the smallest weight
  • Ensures that files needed by other bundles stay
    in cache (a sketch of these policies follows)
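
A minimal sketch of the weight-based policies described on the last three slides; the data structures (queries as lists of bundles, bundles as sets of file IDs) and the function names are illustrative, not the STACS implementation.

from typing import Dict, List, Set

def initial_file_weights(queries: List[List[Set[str]]]) -> Dict[str, int]:
    """IFW(f) = number of bundles, over all queries, in which file f appears."""
    w: Dict[str, int] = {}
    for bundles in queries:
        for bundle in bundles:
            for f in bundle:
                w[f] = w.get(f, 0) + 1
    return w   # e.g. FK in 5 bundles of query 1 and 3 of query 2 -> IFW(FK) = 8

def pick_bundle_to_cache(bundles: List[Set[str]], cache: Set[str], fw: Dict[str, int]) -> Set[str]:
    """Cache the bundle with the most files already in cache; break ties by total weight."""
    return max(bundles, key=lambda b: (len(b & cache), sum(fw[f] for f in b)))

def pick_file_to_purge(cache: Set[str], in_use: Set[str], fw: Dict[str, int]) -> str:
    """Purge the file not in use with the smallest weight (only when space is needed)."""
    return min(cache - in_use, key=lambda f: fw[f])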

33
Other policies
  • Pre-fetching policy
  • queries can request pre-fetching of bundles,
    subject to a limit
  • Currently, limit set to two bundles
  • multiple pre-fetching useful for parallel
    processing
  • Query service policy
  • queries serviced in Round Robin fashion
  • queries that have all their bundles cached and
    are still processing are skipped

34
Managing the queues
[Diagram: the Query Monitor's queues - the query queue, the bundle set, the file set, the files being processed, and the files in cache.]
35
File Tracking of Bundles
[Chart: file tracking of bundles over time. Annotations: a bundle (3 files) is formed, then passed to the query; a bundle is shared by two queries; a bundle was found in cache; markers show where Query 1 and Query 2 start.]
36
Summary
  • The key to managing bundle caching and purging
    policies is weight assignment
  • caching - based on bundle weight
  • purging - based on file weight
  • Other file weight policies are possible
  • e.g. based on bundle size
  • e.g. based on tape sharing
  • Proving which policy is best is a hard problem
  • can test in a real system - expensive, needs a
    stand-alone system
  • simulation - too many parameters in the query profile
    can vary: processing time, inter-arrival time,
    number of drives, size of cache, etc.
  • model with a system of queues - hard to model the
    policies
  • we are working on the last two methods

37
The Storage Access Coordination System (STACS)
[Architecture diagram repeated from slide 10.]
38
Queuing File Transfers
  • The number of PFTPs to HPSS is limited
  • limit set by a parameter - NoPFTP
  • parameter can be changed dynamically
  • CM is multi-threaded
  • issues and monitors multiple PFTPs in parallel
  • All requests beyond PFTP limit are queued
  • The File Catalog is used to provide, for each file:
  • HPSS path/file_name
  • Disk cache path/file_name
  • File size
  • tape ID

39
File Queue Management
  • Goal
  • minimize tape mounts
  • still respect the order of requests
  • do not postpone unpopular tapes forever
  • File clustering parameter - FCP
  • If the file at the top of the queue is on Tape_i and
    FCP > 1 (e.g. 5), then up to 4 files from Tape_i will
    be selected to be transferred next
  • then, go back to the file at the top of the queue
  • Parameter can be set dynamically

[Figure: a file request queue with five entries; entries 1, 2, 3, and 4 are files F(Ti) on the same tape Ti.]
40
File Queue Management
  • Goal
  • minimize tape mounts
  • still respect the order of requests
  • do not postpone unpopular tapes forever
  • File clustering parameter - FCP
  • If the file at the top of the queue is on Tape_i and
    FCP > 1 (e.g. 4), then up to 4 files from Tape_i will
    be selected to be transferred next
  • then, go back to file at top of queue
  • Parameter can be set dynamically

[Figure: the file queue with the order of file service; files F1(Ti), F2(Ti), F3(Ti), F4(Ti) from tape Ti are served consecutively.]
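
A minimal sketch of the file-clustering rule, assuming the queue holds (file, tape) pairs; the names and the deque representation are illustrative.

from collections import deque

def next_batch(queue: deque, fcp: int):
    """Pop and return the next files to transfer: the head of the queue plus
    up to fcp-1 more queued files from the same tape."""
    if not queue:
        return []
    first = queue.popleft()
    batch, tape = [first], first[1]
    extra = [item for item in queue if item[1] == tape][: fcp - 1]
    for item in extra:
        queue.remove(item)
        batch.append(item)
    return batch          # afterwards, service resumes at the new head of the queue

# Example: FCP = 5 -> the top file plus up to 4 more files from the same tape.
q = deque([("f1", "T1"), ("f2", "T2"), ("f3", "T1"), ("f4", "T1"), ("f5", "T3")])
print(next_batch(q, 5))   # [('f1', 'T1'), ('f3', 'T1'), ('f4', 'T1')]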
41
File Caching Order for different File Clustering
Parameters
[Charts: file caching order with File Clustering Parameter = 1 vs. File Clustering Parameter = 10.]
42
Transfer Rate (Tr) Estimates
  • Need Tr to estimate the total time of a query
  • Tr is averaged over recent file transfers, from
    the time the PFTP request is made to the time the
    transfer completes. This includes
  • mount time, seek time, read to the HPSS
    RAID, transfer to the local cache over the network
  • For a dynamic network speed estimate
  • check the total bytes for all files being
    transferred over small intervals (e.g. 15 sec)
  • calculate a moving average over n intervals (e.g.
    10 intervals); see the sketch below
  • Using this, the actual time in HPSS can be estimated
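
A minimal sketch of the moving-average rate estimate, using the interval and window sizes quoted above as defaults; the class and method names are illustrative.

from collections import deque

class TransferRateEstimator:
    def __init__(self, interval_sec: float = 15.0, n_intervals: int = 10):
        self.interval = interval_sec
        self.samples = deque(maxlen=n_intervals)   # bytes moved per interval
        self.last_total = 0

    def sample(self, total_bytes_transferred: int) -> None:
        """Call once per interval with the cumulative bytes of all in-flight transfers."""
        self.samples.append(total_bytes_transferred - self.last_total)
        self.last_total = total_bytes_transferred

    def rate(self) -> float:
        """Moving-average transfer rate in bytes/second over the last n intervals."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / (len(self.samples) * self.interval)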

43
Dynamic Display of Various Measurements
44
Query Estimate
  • Given the transfer rate Tr
  • Given a query for which
  • X files are in cache
  • Y files are in the queue
  • Z files are not scheduled yet
  • Let s(file_set) be the total byte size of all
    files in file_set
  • If Z = 0, then
  • QuEst = s(Y)/Tr
  • If Z > 0, then
  • QuEst = (s(T) + q·s(Z))/Tr, where q is the number
    of active queries (see the sketch below)

[Figure: the transfer queue holding the query's files F1(Y)-F4(Y); T denotes the queued work ahead of them.]
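
A minimal sketch of the estimate, following the two formulas above; reading s(T) as the size of the files already queued ahead of the query is an assumption.

def query_estimate(s_Y: float, s_Z: float, s_T: float, tr: float, q: int) -> float:
    """Estimated remaining time (seconds) for a query, given transfer rate tr (bytes/s).
    s_Y, s_Z, s_T are the total byte sizes s(Y), s(Z), s(T); names are illustrative."""
    if s_Z == 0:                      # all remaining files are already in the queue
        return s_Y / tr
    return (s_T + q * s_Z) / tr       # unscheduled files compete with q active queries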
45
Reason for q.s(Z)
[Charts: 20 queries of length 20 minutes launched 20 minutes apart - estimate pretty close; 20 queries of length 20 minutes launched 5 minutes apart - estimate bad, requests accumulate in the queue.]
46
Error Handling
  • 5 generic errors
  • file not found
  • return error to the caller
  • PFTP limit reached
  • can't log in
  • re-queue the request, try later (1-2 min)
  • HPSS error (I/O, device busy)
  • remove the partial file from cache, re-queue
  • try n times (e.g. 3), then return the
    error transfer_failed
  • HPSS down
  • re-queue the request, try repeatedly until successful
  • respond to File_status requests with HPSS_down
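
A minimal sketch of how the generic error classes might be dispatched; the error names, request object, and callbacks are illustrative, not the Cache Manager's actual interface.

RETRY_DELAY_SEC = 90          # "try later (1-2 min)"
MAX_IO_RETRIES = 3            # "try n times (e.g. 3)"

def handle_transfer_error(error: str, request, requeue, remove_partial_file):
    if error == "file_not_found":
        return "error_file_not_found"              # returned to the caller
    if error in ("pftp_limit_reached", "login_failed"):
        requeue(request, delay=RETRY_DELAY_SEC)    # try again later
        return "requeued"
    if error == "hpss_io_error":
        remove_partial_file(request)               # drop the partial file from cache
        if request.attempts < MAX_IO_RETRIES:
            requeue(request, delay=RETRY_DELAY_SEC)
            return "requeued"
        return "error_transfer_failed"
    if error == "hpss_down":
        requeue(request, delay=RETRY_DELAY_SEC)    # retry until HPSS comes back
        return "hpss_down"                         # reported on file_status requests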

47
Summary
  • HPSS Hierarchical Resource Manager (HRM)
  • insulates applications from transient HPSS and
    network errors
  • limits concurrent PFTPs to HPSS
  • manages queue to minimize tape mounts
  • provides file/query time estimates
  • handles errors in a generic way
  • The same API can be used for any MSS, such as
    Unitree, Enstore, etc.

48
Web pointers
  • http://gizmo.lbl.gov/stacs
  • http://gizmo.lbl.gov/arie/download.papers.html
    -- to download papers
  • http://gizmo.lbl.gov/stacs/stacs.slides/index.htm
    -- a STACS presentation