Title: Optimizing End-User Data Delivery Using Storage Virtualization
1. Optimizing End-User Data Delivery Using Storage Virtualization
- Sudharshan Vazhkudai
- Oak Ridge National Laboratory
- Ohio State University
- Systems Group Seminar
- October 20th, 2006
- Columbus, Ohio
2. Outline
- Problem space: client-side caching
- Storage Virtualization
- FreeLoader Desktop Storage Cache
- A virtual cache: prefix caching
- End on a funny note!!
3. Problem Domain
- Data deluge
- Experimental facilities: SNS, LHC (PBs/yr)
- Observatories: sky surveys, world-wide telescopes
- Simulations from NLCF end-stations
- Internet archives: NIH GenBank (serves 100 gigabases of sequence data)
- Typical user access traits for large scientific data
- Download remote datasets using favorite tools
- FTP, GridFTP, hsi, wget
- Shared interest among groups of researchers
- A bioinformatics group collectively analyzes and visualizes a sequence database for a few days. Locality of interest!
- Original datasets are often discarded after interest dissipates
4. So, what's the problem with this story?
- Wide-area data movement is full of pitfalls
- Server bottlenecks, bandwidth/latency fluctuations
- GridFTP-like tuned tools not widely available
- Popular Internet repositories are still served through modest transfer tools!
- User applications are often latency intolerant
- e.g., real-time viz: rendering a TerraServer map from Microsoft on ORNL's tiled display!
- Why can't we address this with the current storage landscape?
- Shared storage: limited quotas
- Dedicated storage: SAN storage is a non-trivial expense! (a 4 TB disk array runs ~$40K)
- Local storage: usually not enough for such large datasets
- Archiving in mass storage for future accesses: high latency
- Upshot
- Retrieval rates significantly lower than local I/O or LAN throughput
5. Is there a silver lining at all? (Desktop Traits)
- Desktop capabilities are better than ever before
- The ratio of used space to available storage is significantly low in academic and industry settings
- Increasing numbers of workstations are online most of the time
- At ORNL-CSMD, 600 machines are estimated to be online at any given time
- At NCSU, > 90% availability across 500 machines
- Well-connected, secure LAN settings
- A high-speed LAN connection can stream data faster than local disk I/O
6. Storage Virtualization?
- Can we use novel storage abstractions to provide
- More storage than locally available
- Better performance than local or remote I/O
- A seamless architecture for accessing and storing transient data
7. Desktop Storage Scavenging as a Means to Virtualize I/O Access
- FreeLoader
- Imagine Condor, for storage
- Harness the collective storage potential of desktop workstations, the way Condor harnesses idle CPU cycles
- Increased throughput due to striping
- Split large datasets into pieces, "morsels", and stripe them across desktops (sketched below)
- Scientific data trends
- Usually write-once-read-many
- A remote copy is held elsewhere
- Primarily sequential accesses
- Data trends + LAN/desktop traits + user access patterns make collaborative caches using storage scavenging a viable alternative!
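A minimal sketch of the striping idea, assuming a fixed morsel size and round-robin placement (the names here are illustrative, not FreeLoader's actual API):

    # Hypothetical sketch: split a dataset into fixed-size "morsels"
    # and assign them round-robin across donor workstations.

    MORSEL_SIZE = 1 << 20  # 1 MB, the chunk granularity the deck mentions

    def stripe_dataset(dataset_size, donors):
        """Map each morsel index to a donor, round-robin."""
        n_morsels = (dataset_size + MORSEL_SIZE - 1) // MORSEL_SIZE
        return {i: donors[i % len(donors)] for i in range(n_morsels)}

    # A client reading morsels 0, 1, 2, ... pulls from all donors in
    # parallel, so throughput can exceed any single donor's disk or NIC.
    placement = stripe_dataset(10 * (1 << 30), ["node-a", "node-b", "node-c"])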
8. Old wine in a new bottle?
- Key strategies derived from best practices across a broad range of storage paradigms
- Desktop storage scavenging, from P2P systems
- Striping and parallel I/O, from parallel file systems
- Caching, from cooperative Web caching
- And applied to scientific data management for
- Access locality, I/O aggregation, network bandwidth, and data sharing
- Posing new challenges and opportunities: heterogeneity, striping, volatility, donor impact, cache management, and availability
9. FreeLoader Environment
10. FreeLoader Architecture
- Lightweight UDP
- Scavenger device: metadata bitmaps, morsel organization
- Morsel service layer
- Monitoring and impact control
- Global free space management
- Metadata management
- Soft-state registrations (sketched below)
- Data placement
- Cache management
- Profiling
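A minimal sketch of soft-state registration, assuming a simple heartbeat-and-expire scheme (illustrative only, not FreeLoader's actual protocol):

    # Donors periodically re-register with the manager; entries that stop
    # renewing simply expire, so a crashed or withdrawn donor needs no
    # explicit teardown. The TTL value is an assumption.
    import time

    REGISTRATION_TTL = 30.0  # seconds; assumed value

    class MetadataManager:
        def __init__(self):
            self.last_seen = {}  # donor id -> last heartbeat time

        def register(self, donor_id):
            self.last_seen[donor_id] = time.time()  # (re)register

        def live_donors(self):
            cutoff = time.time() - REGISTRATION_TTL
            return [d for d, t in self.last_seen.items() if t >= cutoff]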
11. Testbed and Experiment Setup
- FreeLoader installed in a user's HPC setting
- GridFTP access to NFS
- GridFTP access to PVFS
- hsi access to HPSS
- Cold data from tapes
- Hot data from disk caches
- wget access to an Internet archive
12. Comparing FreeLoader with other storage systems
13. Optimizing Access to the Cache: Client Access-Pattern-Aware Striping
- The uploading client is likely to access the dataset more frequently
- So, let's try to optimize data placement for that client!
- Overlap network I/O with local I/O
- What is the optimal local/remote data ratio?
- Model (a first-order sketch follows)
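One plausible first-order model, offered as an assumption (the deck does not spell out its model): place a fraction x of the morsels on the uploading client's own disk so that, with local disk reads and network fetches fully overlapped, both streams finish at the same time.

    def local_fraction(r_disk, r_net):
        """Balance condition: x / r_disk == (1 - x) / r_net, i.e. the
        overlapped local disk I/O and network I/O complete together."""
        return r_disk / (r_disk + r_net)

    # e.g., a 60 MB/s local disk and 100 MB/s of aggregate LAN striping
    # suggest keeping ~37.5% of the dataset on the client itself.
    print(local_fraction(60.0, 100.0))  # 0.375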
14. Philosophizing
- What the scavenged storage is not
- Not a file system, not a replacement for high-end storage
- Not intended for wide-area resource integration
- What it is
- A low-cost, best-effort storage cache for scientific data sources
- Intended to facilitate
- Transient access to large, read-only datasets
- Data sharing within an administrative domain
- To be used in conjunction with higher-end storage systems
15. Towards a virtual cache
- Scientific data caches typically host complete datasets
- Not always feasible in our environment, since
- Desktop workstations can fail, or space contributions can be withdrawn, leaving partial datasets
- There may not be enough space in the cache to host a new dataset in its entirety
- Cache evictions can leave partial copies of datasets
- Can we host partial copies of datasets and yet serve client accesses to the entire dataset?
- Analogy: FileSystem = BufferCache + Disk :: FreeLoader = Cache + RemoteDataSource
16. The Prefix Caching Problem: Impedance Matching on Steroids!!
- HTTP prefix caching
- Multimedia, streaming data delivery
- BitTorrent P2P system: leechers can download and yet serve
- Benefits
- Bootstrapping the download process
- Store more datasets
- Allows for efficient cache management
- Oh, those scientific data trends again (how convenient?)
- Immutable data, remote source copy, primarily sequential accesses
- Challenges
- Clients should be oblivious to a dataset being only partially available
- Performance hit?
- How much of the prefix of a dataset to cache?
- So that client accesses can progress seamlessly
- Online patching issues
- Mismatch between client access I/O and remote patching I/O
- Wide-area download vagaries
17. Virtual Cache Architecture
- Capability-based resource aggregation
- Persistent-storage donors and BW-only donors
- Client serving: parallel get
- Remote patching using URIs
- Better cache management
- Stripe a dataset entirely when space is available
- When eviction is needed, stripe only a prefix of the dataset
- Victims chosen based on LRU
- Evict chunks from the tail until only a prefix remains (sketched below)
- Entire datasets are evicted only after all such tails are evicted
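A sketch of this two-pass eviction policy under assumed data structures (all names here are hypothetical): trim LRU datasets from the tail down to their prefixes, and only evict whole datasets once every tail is gone.

    def make_room(datasets, bytes_needed, prefix_size):
        """datasets: LRU-ordered list (least recent first). Each dataset
        has .cached_bytes and .truncate_tail(n), which frees n bytes of
        tail chunks. prefix_size(ds) gives the prefix to keep (slide 18)."""
        freed = 0
        # Pass 1: evict tail chunks until only prefixes remain.
        for ds in datasets:
            surplus = ds.cached_bytes - prefix_size(ds)
            if surplus > 0 and freed < bytes_needed:
                n = min(surplus, bytes_needed - freed)
                ds.truncate_tail(n)
                freed += n
        # Pass 2: only after every tail is trimmed, evict whole datasets.
        for ds in list(datasets):
            if freed >= bytes_needed:
                break
            freed += ds.cached_bytes
            ds.truncate_tail(ds.cached_bytes)
            datasets.remove(ds)
        return freed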
18. Prefix Size Prediction
- Goal: eliminate client-perceived delay in data access
- What is the optimal prefix size to hide the cost of suffix patching?
- Prefix size depends on
- Dataset size, S
- In-cache data access rate by the client, Rclient
- Suffix patching rate, Rpatch
- Initial latency in suffix patching, L
- The client access rate dictates the time available to patch: S / Rclient = L + (S - Sprefix) / Rpatch
- Thus, Sprefix = S (1 - Rpatch / Rclient) + L * Rpatch
19. Collective Download
- Why?
- Wide-area transfer reasons
- Storage systems and protocols for HEC are tuned for bulk transfers (GridFTP, HSI)
- Wide-area transfer pitfalls: high latency, connection establishment cost
- Client local-area cache access reasons
- Client accesses to the cache use a smaller stripe size (e.g., 1 MB chunks in FreeLoader)
- Finer granularity for better client access rates
- Can we borrow from collective I/O in parallel I/O?
20. Collective Download Implementation
- Patching nodes perform bulk remote I/O: 256 MB per request
- Reducing repeated authentication costs per dataset
- Automated interactive session with Expect for single sign-on
- FreeLoader patching framework instrumented with Expect
- Protocol needs to allow sessions (GridFTP, HSI)
- Need to reconcile the mismatch between the client access stripe size and the bulk remote I/O request size
- Shuffling (sketched below)
- The p patching nodes redistribute the downloaded chunks among themselves according to the client's striping policy
- Redistribution enables round-robin client access
- Each patching node redistributes (p - 1)/p of the downloaded data
- Shuffling is done in memory, to accommodate BW-only donors
- Thus, client serving, collective download, and shuffling are all overlapped
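A sketch of the shuffle bookkeeping (illustrative; FreeLoader's actual wire protocol is not shown): each of the p patching nodes fetches a contiguous 256 MB run of 1 MB chunks, but clients expect round-robin striping, so chunk i belongs on node i % p.

    CHUNK = 1 << 20           # 1 MB client stripe unit (from the deck)
    BULK = 256 * (1 << 20)    # 256 MB bulk remote request

    def shuffle_plan(p, first_chunk):
        """For one bulk request starting at first_chunk, return
        {dest_node_rank: [chunk indices]} matching round-robin striping."""
        plan = {}
        for i in range(first_chunk, first_chunk + BULK // CHUNK):
            plan.setdefault(i % p, []).append(i)
        return plan

    # With p = 4 nodes, each keeps ~1/4 of what it downloaded and forwards
    # the rest: the (p - 1)/p fraction quoted on this slide.
    plan = shuffle_plan(4, first_chunk=0)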
21. Testbed and Experiment Setup
- UberFTP: a stateful client to GridFTP servers at TeraGrid-PSC and TeraGrid-ORNL
- HSI access to HPSS
- Cold data from tapes
- FreeLoader patching framework deployed in this setting
22. Collective Download Performance
23. Prefix Size Model Verification
24. Impact of Prefix Caching on Cache Hit Rate
- Tera-ORNL sees improvements around the 0.2 and 0.4 curves (308% and 176% for 20% and 40% prefix ratios)
- Tera-PSC sees up to a 76% improvement in hit rate with an 80% prefix ratio
25. Let me philosophize again
- Novel storage abstractions as a means to
- Provide performance impedance matching
- Overlap remote I/O, cache I/O, and local I/O into a seamless data pathway
- Provide rich resource aggregation models
- Provide a low-cost, best-effort architecture for transient data
- A combination of best practices from parallel I/O, P2P scavenging, cooperative caching, and HTTP multimedia streaming, brought to bear on scientific data caching
26. (No transcript)
27. Let me advertise
- http://www.csm.ornl.gov/vazhkuda/Storage.html
- Email: vazhkudaiss@ornl.gov
- Collaborator: Xiaosong Ma (NCSU)
- Funding: DOE, ORNL LDRD (Terascale/Petascale initiatives)
- Interested in joining our team?
- Full-time positions and summer internships available
28. More slides
- Some performance numbers
- Impact studies
29. Striping Parameters
30. Client-side Filters
31. Computation Impact
32. Network Activity Test
33. Disk-intensive Task
34. Impact Control