Title: BNL Batch System
1 - BNL Batch System
- David Yu
- Brookhaven National Laboratory
2 - Sourcing Data from HPSS
- Files are written (sunk) onto tape cartridges sequentially.
- However, file retrieval requests access the tapes randomly, 24 x 7.
- Large numbers of retrieval requests and a large quantity of cartridges, but a limited number of tape drives.
- Extremely high tape mount/dismount rate.
- We need to guarantee that the necessary tape drives are available for migration. How can we guarantee that the drives are available for migration?
3 - Sourcing Data from HPSS
- Oak Ridge Batch is one of the HPSS applications that attempts to provide resource control for tape:
- It stages multiple files by sorting requests by tape.
- It can guarantee drive availability for migration.
- The Oak Ridge Batch System has been modified to fit BNL's requirements.
4 - Oak Ridge Batch
- Like HSI, Oak Ridge Batch aggregates files by tape ID and then stages the files in batch, which improves read performance by reducing tape mounts and dismounts (a minimal sketch of this aggregation follows this slide).
- Not available in HSI: resource control, i.e. logical resource management that throttles tape drive use for reading, thus guaranteeing that drives remain available for migration.
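A minimal Python sketch of the aggregation idea described above, assuming a simple (file_path, tape_id) request list and user-supplied mount/read/dismount callbacks; this is an illustration, not the Oak Ridge Batch code.

    # Group pending stage requests by tape ID so each tape is mounted once
    # and every requested file on it is read during that single mount.
    from collections import defaultdict

    def group_by_tape(requests):
        """requests: iterable of (file_path, tape_id) pairs from HPSS metadata."""
        by_tape = defaultdict(list)
        for path, tape_id in requests:
            by_tape[tape_id].append(path)
        return by_tape

    def stage_in_batches(requests, mount, read, dismount):
        """One mount/dismount per tape instead of one per file."""
        for tape_id, paths in group_by_tape(requests).items():
            mount(tape_id)
            for path in paths:
                read(path)        # files on the same tape are read back-to-back
            dismount(tape_id)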
5 - BNL's Environment
- 3 experiments: Atlas, Phenix, and Star
- Peak stage rates in the last 5 months:
- Atlas 12759 files, 370 tapes (July 14)
- Phenix 12991 files, 580 tapes (Aug 9)
- Star 20530 files, 302 tapes (June 6)
- Average stage 7 TB / day
6 - BNL's Environment
- Since file stage requests randomly access different tapes, real-time access such as pftp is not appropriate when there are so many requests but limited tape drives. HSI does not provide the capability to control the number of drives.
- Due to the nature of tape storage, we need to stage files in batch mode.
- We need resource control to guarantee drive availability for migration.
- The Oak Ridge Batch System was introduced to BNL for resource management a few years ago.
7 - BNL's Environment
- Data sourcing activities run 24x7, drawing randomly from 38 million files, with hundreds of requests per hour.
- BNL's HPSS system receives data from 5 different sources (7 projects).
- BNL has 7 instances of Batch.
- The 7 instances are distributed across 2 servers.
- For resource allocation purposes, some instances use multiple PVRs and some share PVRs.
8 - BNL's Environment
- Multiple instances and multiple PVRs
- [Diagram: Batch instances 1-3 mapped to PVR 1 (9940B), PVR 2 (LTO-3), and PVR 3 (LTO-4).]
9 - BNL's Requirements
- Stability
- Enhance error handling
- Enhance performance
- Support multiple PVRs
- Tape drive resource control, management, and statistics for performance tuning
- Enhance monitoring tools
- Need dynamic configuration without shutting down the process
10 - BNL Batch's New Features
- Supports multiple PVRs. Each PVR is handled independently by a dedicated thread, which throttles the number of drives and the number of requests for that PVR (a minimal sketch of this per-PVR throttling follows this slide).
- Added web-based monitoring tools.
- All instances are monitored and managed by a web-based GUI in real time.
- All transactions are stored in a database, for:
- Tracking purposes (file lost? why? how?).
- Performance analysis, to learn from historical data and fine-tune the configuration.
- Performance analysis, to compare against other Batch instances and see why another user can stage more files.
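A rough illustration (not the actual BNL Batch implementation) of the per-PVR threading model described above: one dedicated worker thread per PVR, each limiting how many tape drives its stage jobs may occupy at once. The class and callback names are invented for this sketch.

    import queue
    import threading

    class PVRWorker(threading.Thread):
        def __init__(self, pvr_name, max_drives, stage_tape):
            super().__init__(daemon=True)
            self.pvr_name = pvr_name
            self.requests = queue.Queue()                  # (tape_id, [files]) batches for this PVR
            self.drives = threading.Semaphore(max_drives)  # drives this PVR is allowed to use
            self.stage_tape = stage_tape                   # callback: mount the tape, read the files

        def _stage(self, tape_id, files):
            try:
                self.stage_tape(self.pvr_name, tape_id, files)
            finally:
                self.drives.release()                      # give the drive back

        def run(self):
            while True:
                tape_id, files = self.requests.get()       # next tape queued for this PVR
                self.drives.acquire()                      # wait for one of the allocated drives
                threading.Thread(target=self._stage,
                                 args=(tape_id, files),
                                 daemon=True).start()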
11 - BNL Batch's New Features
- Configuration can be modified dynamically: debug level, error auto-retry, priority, number of drives, and number of files to stage at the same time. PVRs can be locked/unlocked at the user level or the PVR level.
- Added web-based tools for general historical performance lookups.
- Stage priority is either FIFO (default) or high-demand (sorted by the number of requests on a tape); the sketch after this slide shows the two orderings.
- An optional timeout value can be specified in a request.
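A hedged sketch of the two stage-priority orderings named above, over an assumed request record with submit_time and tape_id fields (not the real BNL Batch schema).

    from collections import Counter

    def order_fifo(requests):
        # Default ordering: oldest request first.
        return sorted(requests, key=lambda r: r["submit_time"])

    def order_high_demand(requests):
        # High-demand ordering: tapes with the most pending requests are served
        # first, so a single mount satisfies as many requests as possible.
        per_tape = Counter(r["tape_id"] for r in requests)
        return sorted(requests, key=lambda r: per_tape[r["tape_id"]], reverse=True)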
12 - BNL Batch Monitoring Tools
13 - BNL Batch Queue View
- Due to the nature of its usage, instance A has requests from all around the world, so the chance of multiple users accessing the same tape in the same timeframe is very low.
- In this example, 157 files are located on 124 tapes, which means the stage will require 124 tape mounts and dismounts.
- Average: 14 mounts / hour / drive.
14 - BNL Batch Queue View
- In instance S's case, 1793 files are on 32 tapes, so it will take only 32 tape mounts/dismounts to complete the stage.
- Average: 4 mounts / hour / drive.
15 - BNL Batch Resource Management
- Tape drives are used for both read and write.
- A PVR may also be shared by multiple Batch instances.
- In the following example, the PVR "Star Raw LTO-3" has a total of 14 drives, but only 7 are allocated to the instance Starrdat.
- We can always adjust the drive allocation when necessary:
- To allow more drives for migration
- To allow other users to use more drives
16 - BNL Batch Resource Management
- BNL Batch Resource Management is not in full control of resource allocation; full control is achieved only when there is no direct user access to HPSS.
- Tape drive allocation can be adjusted manually from the Web GUI, or automatically by scripts.
- When a disk's free space is reaching the threshold, our monitoring scripts automatically reduce the drive allocation for the PVR using that disk. This slows down the growth of disk usage and frees up more drives for HPSS to process migration (a rough sketch of such a script follows this slide).
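A hypothetical sketch of the kind of monitoring script described above; the threshold, reduced allocation, and callback name are assumptions rather than BNL's actual values or interfaces.

    import shutil

    FREE_SPACE_THRESHOLD = 0.10   # assumed: act when less than 10% of the disk is free
    REDUCED_DRIVES = 2            # assumed fallback drive allocation

    def check_disk_and_throttle(disk_path, pvr_name, set_drive_allocation):
        """set_drive_allocation: callback that lowers the PVR's drive limit;
        in BNL Batch such a change takes effect dynamically, without a restart."""
        usage = shutil.disk_usage(disk_path)
        free_fraction = usage.free / usage.total
        if free_fraction < FREE_SPACE_THRESHOLD:
            set_drive_allocation(pvr_name, REDUCED_DRIVES)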
17 - BNL Batch Resource Management
- When we need to lock an entire PVR for maintenance, staging activities for the other PVRs should not be affected. BNL Batch allows you to suspend a PVR (while continuing to queue up requests) without affecting other PVRs' staging activities.
- We can also suspend all staging activities with a single button.
18 - BNL Batch Historical Data Search
- BNL Batch allows you to view/search the status of a request.
- This feature generates a report based on your interest.
- You may use a cron job to download the report for each instance and email it to the subscribed users (a hypothetical example follows this slide).
- For performance reasons, this feature accesses the historical database (aka the Secondary DB).
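A purely hypothetical example of a cron-driven report mailer as described above; the report URL, recipients, and SMTP host are placeholders, not BNL's actual setup.

    import smtplib
    import urllib.request
    from email.message import EmailMessage

    def mail_daily_report(report_url, sender, recipients, smtp_host="localhost"):
        # Fetch the instance's report (e.g. from the web GUI's report page).
        report = urllib.request.urlopen(report_url).read().decode()
        msg = EmailMessage()
        msg["Subject"] = "BNL Batch daily report"
        msg["From"] = sender
        msg["To"] = ", ".join(recipients)
        msg.set_content(report)
        with smtplib.SMTP(smtp_host) as smtp:   # invoked once per day from cron
            smtp.send_message(msg)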
19 - Architecture Overview
- Requirements, design, and implementation
- Use well-established 3rd-party tools to save development time.
- Use MySQL as the central repository.
- Each Batch instance is a MySQL client that updates its status in MySQL from time to time. It also pulls messages from MySQL periodically and processes them accordingly (a minimal polling sketch follows this slide).
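A minimal sketch of the status-update/message-polling loop described above, written against a generic DB-API (PEP 249) MySQL connection; the table and column names are assumptions, not the actual BNL Batch schema.

    import time

    def run_client_loop(conn, instance_id, collect_status, handle_message,
                        poll_seconds=60):
        cur = conn.cursor()
        while True:
            # Push this instance's current status (heartbeat) to the central DB.
            cur.execute(
                "UPDATE batch_status SET status = %s, updated = NOW() WHERE instance = %s",
                (collect_status(), instance_id))
            # Pull pending control messages (config change, suspend, shutdown, ...).
            cur.execute(
                "SELECT id, body FROM batch_messages WHERE instance = %s AND done = 0",
                (instance_id,))
            for msg_id, body in cur.fetchall():
                handle_message(body)
                cur.execute("UPDATE batch_messages SET done = 1 WHERE id = %s", (msg_id,))
            conn.commit()
            time.sleep(poll_seconds)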
20 - Architecture Overview
- [Diagram: a request for File1.txt arrives at Batch Instance 1, which issues an HPSS "get file attribute" call (via DB2) and learns that File1.txt resides on Tape 3348 in PVR B. HPSS manages multiple PVRs (A through G), each holding its own set of tapes.]
21 - Architecture Overview
- [Diagram: Batch Instance 1 stages from several PVRs concurrently, e.g. 2 tapes from PVR A, 5 tapes from PVR B, and 2 tapes from PVR D, with each PVR throttled independently.]
22 - Architecture Overview
- [Diagram: once File1.txt has been staged from tape to HPSS disk, Batch Instance 1 on the BNL Batch server runs a delivery script (PFTP) to copy the file to its destination and a notification script to report that File1.txt has been staged.]
23 - Architecture Overview
- [Diagram: Batch instances 1-7 exchange heartbeat, monitoring, configuration-update, and shutdown messages with the MySQL production database, which the web GUI browsers use for monitoring and for changing configuration. The production database is synchronized hourly to the MySQL historical database, which serves searches, hourly performance queries via MySQL scripts, and cron-driven daily reports.]
24 - BNL's Experience
- The stage activity statistics have been used heavily for fine-tuning performance.
- We proved to our users that small files and random file access are performance killers.
- When a disk is getting full, we no longer need to kill the Batch process.
- When we need to bring down a PVR for maintenance, we no longer have to kill all the Batch processes using that PVR.
25 - BNL's Experience
- When a file is lost, we always have the ability to track the complete history of that request.
- We also have scripts that continuously monitor errors in Batch and send alerts to the system admins when the situation is critical.
26 - BNL's Experience
- BNL has a high demand for fully utilizing the HPSS system, both sinking and staging files, 24 x 7.
- We use BNL Batch to throttle the tape drives by limiting the number of stages, thus guaranteeing that some unused drives remain available for HPSS migration.
- How do other sites satisfy this requirement? Any suggestions?
27 - Questions?
28 - Thank you!