Title: SAM: Past, Present, and Future
1 SAM: Past, Present, and Future
- Lee Lueking
- All Dzero Meeting
- November 2, 2001
2 SAM: Past, Present, and Future
- Part I: Past and Present
- Stats: users, groups, datasets, projects, files. How is the system being utilized?
- Cache and job management: How do the caching and fair share mechanisms work?
- Central analysis groups and queues.
- Tape access: What are the encp stats for last month? Tapes: good, bad, and recoverable.
- Remote sites: data forwarding from remote MC processing centers
- Part II: Future (post shutdown)
- New tape facilities
- SAM on Farm and ClueD0
- Storing user/group data into SAM
- Delivering data to remote sites
- Problems and concerns
3 Part I: Past and Present
4 SAM Usage Statistics
- 428 registered SAM users in production
- 283 of them have at some time run at least one SAM project
- 267 of them have run a SAM project at some time in the past year
- 181 of them have run a SAM project in the past 2 months
- 222 registered nodes
- 150,847 cached files on disk somewhere
- 146,908 of them on d0mino
- 1299 on d0lxac1
- 2301 on a clued0 node
- 337 on the Imperial College test machine in the UK
- 503 on the Linux build machine
- 281,066 data files known to SAM
- 43,534 raw files (all stored on tape)
- 78,463 reconstructed files (76,305 of them actually stored)
- 19,700 root-tuple files
5 Active Stations
6 Central-analysis Cache
- All groups currently use the Least Recently Used (LRU) replacement algorithm
- Files can migrate from one group's cache to another if used frequently by the other group.
- Currently, caches are large and there is little turnover.
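The Least Recently Used policy named above can be illustrated with a minimal sketch. This is not SAM's implementation: the class name `GroupCache`, the capacity counted in files rather than bytes, and the file names are all assumptions made for illustration, and migration between group caches is not modeled.

```python
from collections import OrderedDict


class GroupCache:
    """Toy least-recently-used (LRU) file cache for one group (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of files held
        self.files = OrderedDict()        # oldest access first, newest last

    def access(self, name):
        """Record a file access; return the name of any evicted file."""
        if name in self.files:
            self.files.move_to_end(name)  # cache hit: mark as most recent
            return None
        self.files[name] = True           # cache miss: bring the file in
        if len(self.files) > self.capacity:
            evicted, _ = self.files.popitem(last=False)  # drop the LRU file
            return evicted
        return None


cache = GroupCache(capacity=2)
cache.access("raw_001")
cache.access("reco_001")
cache.access("raw_001")               # raw_001 becomes most recently used
evicted = cache.access("tuple_001")   # cache is full, so the LRU file goes
print(evicted)                        # prints: reco_001
```

With large caches, as on the slide, the eviction branch is rarely reached, which is what "little turnover" means in practice.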
7 Central-analysis Cache Turnover
8 Resource Management Approaches
- Fair Sharing (policies)
- Allocation of resources and scheduling of jobs
- The goal is to ensure that, in a busy environment, each group gets a fixed share of resources or gets a fixed share of work done
- Co-allocation and reservation (optimization)
9 Fair Share and Computational Economy
- Jobs, when executed, incur costs (through resource utilization) and realize benefits (through getting work done)
- Maintain a tuple (vector) of cumulative costs/benefits for each group and compare it to the group's allocated fair share to set its priority higher or lower
- Incorporate all known resource types and benefit metrics; totally flexible. Examples: tape mounts, tape reads, network, cache, CPU, and memory.
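One way to picture the cost vector and its comparison against a fair share is the sketch below. It is an illustration under stated assumptions, not the SAM scheduler: the resource weights, the usage numbers, the 50/50 shares, and the function names `weighted_cost` and `priorities` are all hypothetical. Only the resource types and the group names come from the slides.

```python
# Hypothetical resource weights; the slides list the resource types
# but give no values for how they are combined.
WEIGHTS = {"tape_mounts": 5.0, "tape_reads": 1.0, "network": 0.5,
           "cache": 0.2, "cpu": 1.0, "memory": 0.1}


def weighted_cost(usage):
    """Collapse a per-resource usage vector into one scalar cost."""
    return sum(WEIGHTS[resource] * amount for resource, amount in usage.items())


def priorities(usage_by_group, fair_share):
    """Return fair share minus actual share of total cost for each group.

    A positive value means the group has used less than its share and
    should be scheduled with higher priority; negative means lower.
    """
    costs = {g: weighted_cost(u) for g, u in usage_by_group.items()}
    total = sum(costs.values())
    return {g: fair_share[g] - (costs[g] / total if total else 0.0)
            for g in costs}


usage = {
    "top":   {"tape_mounts": 10, "cpu": 50},   # cost 5*10 + 50 = 100
    "higgs": {"tape_mounts": 2, "cpu": 290},   # cost 5*2 + 290 = 300
}
shares = {"top": 0.5, "higgs": 0.5}
result = priorities(usage, shares)
print(result)   # prints: {'top': 0.25, 'higgs': -0.25}
```

Here "top" has consumed a quarter of the total cost against a half share, so its next job would be favored over one from "higgs".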
10 Job Control: Station Integration with the Abstract Batch System
[Diagram components: Client; Job Manager (Project Master); Local RM (Station Master); Process Manager (SAM wrapper script); Batch System; User Task]
[Diagram arrows, in numbered order: 1. user sam submit; 2. submit to SM; 3. invoke; 4. submit to BS; 5. SAM condition satisfied; 6. dispatch; 7. started; 8. invoke; 9. setJobCount/stop; 10. resubmit; unnumbered: jobEnd]
- Fair Share Job Scheduling
- Resource Co-allocation
11 Forwarding + Caching = Global Replication
[Diagram labels: Fermilab; D0robot; Mass Storage System; Sara; NIKHEF (Amsterdam), 155 Mbps; Station; Site; Replica; WAN; Data flow]
12 Enstore Statistics: Delivery
- Start Date: "10/22/01 00:00:00"; End Date: "10/29/01 00:00:00"
- Delivered Files: 938 total
- Delivered Bytes: 268.82 GB
- Average File Size: 293.47 +/- 107.66 MB
- Average Delivery Time: 718.20 +/- 1017.20 s
- Average Queue Wait Time: 611.32 +/- 947.45 s
- Average Mount Time: 3.25 +/- 13.45 s
- Average Seek Time: 24.07 +/- 42.30 s
- Average Transfer Time: 42.78 +/- 84.48 s
- Average Transfer Rate: 9.10 +/- 2.24 MB/s
- File Delivery Error Statistics: Total Errors 856
- "USERERROR" Errors: 72 (8.41% of Total Errors)
- "NOACCESS" Errors: 675 (78.86% of Total Errors)
- "NOTALLOWED" Errors: 109 (12.73% of Total Errors)
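The delivery figures above are internally consistent, which a short check makes visible. The numbers are copied from the slide; the conversion 1 GB = 1024 MB is an assumption, chosen because it reproduces the quoted byte total.

```python
# Figures copied from the delivery statistics above.
delivered_files = 938
avg_file_size_mb = 293.47
errors = {"USERERROR": 72, "NOACCESS": 675, "NOTALLOWED": 109}

# Total volume implied by file count times mean size,
# assuming 1 GB = 1024 MB (an assumption, not stated on the slide).
implied_gb = delivered_files * avg_file_size_mb / 1024
print(f"Implied volume: {implied_gb:.2f} GB")   # prints: Implied volume: 268.82 GB

# Share of each error type among all delivery errors, in percent.
total_errors = sum(errors.values())
error_share = {name: round(100 * n / total_errors, 2)
               for name, n in errors.items()}
print(total_errors, error_share)
```

The error shares come out at 8.41, 78.86, and 12.73 percent of 856 errors, matching the slide, with NOACCESS volumes accounting for most failed deliveries.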
13 Enstore Statistics: Store
- Start Date: "10/22/01 00:00:00"; End Date: "10/29/01 00:00:00"
- File Store Success Statistics: Stored Files 1622
- Total Stored Bytes: 514.27 GB
- Average File Size: 324.67 +/- 231.03 MB
- Average Delivery Time: 208.71 +/- 273.28 s
- Average Queue Wait Time: 53.89 +/- 174.69 s
- Average Mount Time: 8.34 +/- 18.85 s
- Average Seek Time: 34.98 +/- 52.26 s
- Average Transfer Time: 82.50 +/- 154.56 s
- Average Transfer Rate: 4.37 +/- 2.11 MB/s
- File Store Error Statistics: Total Errors 4
- "USERERROR" Errors: 3 (75.00% of Total Errors)
- "EEXIST" Errors: 1 (25.00% of Total Errors)
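Comparing the mean queue wait with the mean end-to-end time shows where each operation spends its time. The sketch below uses only the averages quoted on the delivery and store slides; it takes a ratio of means, which is a rough indicator given the large spreads reported, not a per-file result.

```python
# Mean times in seconds, copied from the delivery and store statistics.
stats = {
    "delivery": {"total": 718.20, "queue": 611.32, "mount": 3.25,
                 "seek": 24.07, "transfer": 42.78},
    "store":    {"total": 208.71, "queue": 53.89, "mount": 8.34,
                 "seek": 34.98, "transfer": 82.50},
}


def queue_fraction(s):
    """Fraction of the mean end-to-end time spent waiting in the queue."""
    return s["queue"] / s["total"]


for name, s in stats.items():
    print(f"{name}: {100 * queue_fraction(s):.0f}% of mean time in queue")
# prints:
# delivery: 85% of mean time in queue
# store: 26% of mean time in queue
```

On these averages, deliveries are dominated by queue wait, while stores spend most of their time in seek and transfer.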
14 Current Tape Storage Summary
- 45 TB on tape
- 1362 volumes in total
- Currently 18 volumes are marked noaccess
- 80 are marked notallowed
15 Part II: The Future (post shutdown)
16 New Tape Facilities
- STK 9940 Drives
- Very reliable (no problems in 30 TB)
- 60 GB cartridge
- Share an STK PowderHorn silo with other lab customers
- Have 6-7 x 9940 drives for us.
- 1000 tape slots
- In March, move to our own PowderHorn
- Space in FCC now being prepared
- Robot already here
- Deploy and test starting Jan-Feb.
- Dzero STK PowderHorn silo
- Have 9 x 9940 drives now, up to 20 drives.
- 5500 tape slots total.
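The slot counts and cartridge size above give an upper bound on capacity. The sketch assumes every slot holds a full 60 GB cartridge and that 1 TB = 1000 GB; both are simplifications, and the function name `silo_capacity_tb` is hypothetical.

```python
CARTRIDGE_GB = 60  # 9940 cartridge capacity quoted above


def silo_capacity_tb(slots, cartridge_gb=CARTRIDGE_GB):
    """Upper-bound capacity in TB, assuming every slot holds a full
    cartridge and 1 TB = 1000 GB (both are simplifying assumptions)."""
    return slots * cartridge_gb / 1000


shared = silo_capacity_tb(1000)     # slots in the shared PowderHorn silo
dedicated = silo_capacity_tb(5500)  # slots in the Dzero PowderHorn silo
print(shared, dedicated)            # prints: 60.0 330.0
```

Under these assumptions the dedicated silo could hold up to about 330 TB, compared with about 60 TB in the shared one.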
17 Use Existing AML/2 for MC
- Replacing M2 drives with LTO.
- 100 GB cartridge
- Have 6 drives, expand to 10 later.
- Very reliable in tests so far (1 problem in 30 TB)
- Plan to use for all MC and some group data
18 SAM Distributed Cache
19 Case Study: Distributed Reconstruction Farm
- 90 dual-processor Linux nodes (growing)
- 30 GB disk each
- 100 Mbit Ethernet NICs on workers
- D0bbin is a 4-processor SGI O2000 with a Gigabit NIC
[Diagram labels: Enstore Mass Storage; D0bbin Farm Server; LAN; Worker 1, Worker 2, Worker 3, ..., Worker N]
No disks are cross-mounted. Worker nodes get files directly from the MSS via encp. SAM moves data with rcp from where it is cached to where it is needed.
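The two transfer paths described above (encp from mass storage, rcp between node caches) suggest a simple decision rule, sketched below. The order of preference (local disk, then another node's cache, then tape), the helper name `plan_transfer`, and the node and file names are assumptions; the slide states only which mechanisms exist. No command is executed.

```python
def plan_transfer(filename, node, cache_locations):
    """Decide how a worker node obtains a file (hypothetical helper).

    cache_locations maps a file name to the set of nodes caching it.
    Returns a (method, source) pair; no command is actually executed.
    """
    holders = cache_locations.get(filename, set())
    if node in holders:
        return ("local", node)              # already on this node's disk
    if holders:
        return ("rcp", sorted(holders)[0])  # copy from another node's cache
    return ("encp", "MSS")                  # fetch from mass storage


locations = {"reco_001": {"worker2"}, "raw_007": {"worker1", "worker3"}}
print(plan_transfer("raw_007", "worker1", locations))   # ('local', 'worker1')
print(plan_transfer("reco_001", "worker1", locations))  # ('rcp', 'worker2')
print(plan_transfer("raw_999", "worker1", locations))   # ('encp', 'MSS')
```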
20 Case Study: Distributed Analysis Cluster ClueD0
- The ClueD0-ripon node has a 640 GB SAM cache disk
- 100 Linux desktop nodes have 4-5 TB of distributed SAM cache
- 5 nodes in SAM mode now
[Diagram labels: Mass Storage; ClueD0-ripon (file server node); Desktop 1, Desktop 2, Desktop 3, ..., Desktop 100]
All (tape) data enters the ClueD0 station through the main file server node, ClueD0-ripon. The station migrates data as needed and manages the cache distributed among the many desktop constituents.
21 Storing Group Data in SAM
- Each group will have tapes allocated for specific tiers of data: gen, d0gstar, d0sim, reconstructed, root-tuples, others.
- Each group will have a tape allocation limit
- Group data will be added with the special tier designation -bygroup to distinguish it from farm and other production data.
- A document describing the details is available under the SAM documentation: "Storing Group Data into SAM".
- Groups set up so far include top, higgs, and tauid.
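The tier designation and the per-group tape limit can be pictured with a small sketch. How -bygroup is actually attached to a tier name is not specified on the slide, so appending it as a suffix is an assumption, as are the function names and the GB figures.

```python
GROUP_SUFFIX = "-bygroup"  # designation named on the slide


def group_tier(tier):
    """Build the tier name used for group data (naming scheme assumed)."""
    return tier + GROUP_SUFFIX


def can_store(used_gb, file_gb, limit_gb):
    """True if storing the file keeps the group within its tape allocation."""
    return used_gb + file_gb <= limit_gb


print(group_tier("reconstructed"))                         # reconstructed-bygroup
print(can_store(used_gb=950, file_gb=40, limit_gb=1000))   # True
print(can_store(used_gb=950, file_gb=60, limit_gb=1000))   # False
```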
22 Routing + Caching = Global Replication
[Diagram labels: Mass Storage System; Station; Site; Replica; WAN; Data flow]
23 Issues
- Tape problems should be under control
- The CORBA naming server has caused problems in the past. We are testing a new naming service with persistency that should resolve this. Plan to deploy this month.
- Some queries have caused the system to jam. We have split the user db server away from the dbserver for the stations. Looking into how to deal with long (usually event picking) queries.
- User support is sometimes slower than people would like
- We are training many Dzero volunteers to help
- Lauri is available at Dzero every Wednesday on DAB5 (my office). She has not been overwhelmed by walk-ins.
24 Conclusion
- SAM is heavily used by D0
- The cache management and fair share resource allocations are designed to help control the use of resources in the system.
- SAM provides easy storage of data for on-site and off-site production customers.
- In spite of many tape problems, the Enstore system has been storing and serving lots of data.
25 Conclusion (2)
- The new tape and robot technologies will make the tape-based data storage and access extremely reliable.
- SAM provides a framework within which to operate distributed processing and analysis clusters. These will be very important in the future.
- We are ready to store group data into the system on a regular basis.
- Delivery of data to remote stations from robot stores is coming.
- We have addressed, and continue to address, many issues to make the system serve Dzero better than ever.