Title: First Look at the New NFSv4.1 Based dCache
1  First Look at the New NFSv4.1 Based dCache
- Art Kreymer, Stephan Lammel, Margaret Votava, and Michael Wang for the CD-REX Department
- CD Scientific Computing Facilities Leaders Meeting
- 13 December 2011
2  Introduction
- Investigate alternatives to the BlueArc-based IF central disk servers
  - BA has performed fairly well for the currently rather modest requirements of the IF experiments (0.5 PB)
  - Will it continue to satisfy the growing requirements of the IF experiments in the years to come in a reasonable, cost-effective way?
- Started surveying the storage options available
  - NFSv4.1 is all the rage among the major storage vendors (Panasas, IBM, EMC, NetApp, even BlueArc)
  - Despite all the hype, no stable server implementation was readily available for evaluation
- Stumbled upon a presentation on the web by the DESY dCache team
  - Described a stable NFSv4.1 implementation in a new Chimera-based version of dCache
  - All the nice features of the old dCache PLUS all files in the exported filesystem tree are now directly accessible (POSIX compliant) without special protocols (like DCAP)!
  - i.e. the dCache filesystem can now behave like a regular NFS-accessible area on a worker node (see the sketch below)
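To make the POSIX point concrete, here is a minimal Python sketch, assuming the dCache export is already mounted on the worker node at a hypothetical path such as /pnfs/fnal.gov/usr/testarea; ordinary open/read/stat calls are all that is needed, with no dCache-specific client library.

```python
import os

MOUNT = "/pnfs/fnal.gov/usr/testarea"   # assumed (hypothetical) NFSv4.1 mount point

path = os.path.join(MOUNT, "hello.txt")

# Plain POSIX calls, exactly as on any locally mounted filesystem:
with open(path, "w") as f:
    f.write("written through the NFSv4.1 door, no DCAP needed\n")

with open(path) as f:
    print(f.read(), end="")

print(os.stat(path).st_size, "bytes")   # metadata via an ordinary stat()
os.remove(path)
```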
3  Introduction
- Approached our local dCache experts
  - REX and DMS meeting where DMS gave an overview of the new dCache
  - The DMS department set up a test dCache system (version 1.9.12) for us to evaluate (many thanks to Dmitry Litvintsev, Yujun Wu, Terry Jones, Stan Naymola, and Gene Oleynik from DMS for their support)
- Brief overview of this talk
  - Description of the test setup
  - Presentation of some initial test results
- The focus is on technical I/O performance
  - No discussion of other nice features of NFSv4.1 (e.g. ACLs)
  - No cost comparisons or studies (relative to BA)
4  Test setup
- Client side
  - SLF6 virtual machines on Fermicloud (many thanks to Steve Timm and Farooq Lowe of the Fermigrid Dept.)
  - Linux 2.6.40 kernel (a renamed 3.0 kernel) with pNFS client support; see the mount sketch after this slide
- Server side
  - dCache 1.9.12 with one head node and two pool nodes
  - Each pool node has two RAID6 partitions, each built from 4x250 GB SATA drives
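As a rough illustration of the client-side setup, the sketch below shows how such a VM might mount the dCache NFSv4.1 door; the host name, export path, and mount point are assumptions, not the actual test-stand values.

```python
import subprocess

DOOR = "dcache-head.example.fnal.gov"   # hypothetical dCache NFSv4.1 door host
EXPORT = "/pnfs/fnal.gov/usr"           # hypothetical exported filesystem root
MOUNTPOINT = "/mnt/dcache"              # hypothetical local mount point

# "-t nfs4 -o minorversion=1" requests NFSv4.1, which lets the pNFS client
# obtain layouts from the door and move data directly to/from the pool nodes.
# Must be run as root on a client whose kernel has pNFS support.
subprocess.run(
    ["mount", "-t", "nfs4", "-o", "minorversion=1",
     "{}:{}".format(DOOR, EXPORT), MOUNTPOINT],
    check=True,
)
```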
5  Throughput test results
IOzone run in cluster mode with a sequential write and read test (a simplified single-client sketch follows this slide).
One 4 GB file transferred per client, one client per VM.
When the number of clients is increased beyond 10 (multiple clients per VM), the aggregate data transferred is kept fixed at 40 GB.
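For orientation, here is a minimal Python sketch of the kind of sequential write/read measurement each IOzone client performs: write a 4 GB file to the mounted dCache area, read it back, and time both passes. The mount point is an assumption, and this single-process sketch omits the cluster-mode coordination that IOzone provides.

```python
import os
import time

MOUNT = "/mnt/dcache"                      # assumed NFSv4.1 mount point
FILE = os.path.join(MOUNT, "seqio_testfile")
SIZE = 4 * 1024**3                         # 4 GB per client, as in the test
CHUNK = 1024 * 1024                        # 1 MB record size
block = b"\0" * CHUNK

t0 = time.time()
with open(FILE, "wb") as f:
    for _ in range(SIZE // CHUNK):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())                   # make sure data reaches the pool nodes
write_mb_s = SIZE / (1024**2) / (time.time() - t0)

t0 = time.time()
with open(FILE, "rb") as f:
    while f.read(CHUNK):
        pass
# Note: the read pass can be served partly from the client page cache;
# IOzone takes care to control for such cache effects.
read_mb_s = SIZE / (1024**2) / (time.time() - t0)

print("write: {:.1f} MB/s, read: {:.1f} MB/s".format(write_mb_s, read_mb_s))
os.remove(FILE)
```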
6  Monitoring pool node disk activity during IOzone test
[Figure: strip-chart recordings (x-axis: time, y-axis: MB/sec) of the disk write and read rates for Partition A and Partition B on pool nodes 1 and 2 during the IOzone test]
7  Metadata test results
mdtest run with multiple MPI tasks, each creating, stat-ing, and removing 100 directories and 100 zero-length files (a simplified single-process sketch follows this slide).
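The sketch below is a rough single-process analogue of that mdtest workload: create, stat, and remove 100 directories and 100 zero-length files in the mounted dCache area, timing each phase. mdtest itself runs many MPI tasks in parallel, and the mount point here is an assumption.

```python
import os
import time

MOUNT = "/mnt/dcache"                          # assumed NFSv4.1 mount point
base = os.path.join(MOUNT, "mdtest_like")
os.makedirs(base, exist_ok=True)

dirs = [os.path.join(base, "d{}".format(i)) for i in range(100)]
files = [os.path.join(base, "f{}".format(i)) for i in range(100)]

def timed(label, func, items):
    # Apply func to each item and report the metadata operation rate.
    t0 = time.time()
    for item in items:
        func(item)
    print("{:>12}: {:.0f} ops/s".format(label, len(items) / (time.time() - t0)))

timed("dir create", os.mkdir, dirs)
timed("dir stat", os.stat, dirs)
timed("dir remove", os.rmdir, dirs)
timed("file create", lambda p: open(p, "w").close(), files)
timed("file stat", os.stat, files)
timed("file remove", os.remove, files)
os.rmdir(base)
```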
8  Conclusion
- Presented some preliminary test results on the new NFSv4.1 Chimera-based version of dCache
- Results look promising; throughput scales well with the number of pool nodes
- Metadata performance may be adequate for now, but it may become a cause for concern in the future (need to consult and discuss with the developers)
- Will do more real-world tests, e.g. with Art Kreymer's BlueArc performance monitoring scripts
- More details can be found in a write-up in CD DocDB (CS-doc-4583)
  - http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=4583
- Details on setting up VM clients with pNFS-enabled Linux kernels are available on the Fermi Redmine IF-storage project wiki
  - https://cdcvs.fnal.gov/redmine/projects/if-storage/wiki
- Many thanks to the DMS and Fermigrid Depts. for their unwavering support!
9  End
10  Monitoring pool node disk activity
[Figure: disk activity strip charts for Partition A and Partition B on pool nodes 1 and 2]