Computer Hardware and Procurement at CERN

1
Computer Hardware and Procurement at CERN
  • Helge.Meinhard (at) cern.ch
  • HEPiX Fall 2005 at SLAC

2
Outline
  • Procedures
  • Hardware (being) procured
  • Power measurements
  • Observations

3
Procedures
4
Constraints (1)
  • CERN is an international organisation with strict
    administrative rules
  • Competitive tendering required covering (at
    least) member states
  • No way to avoid for commodity equipment
  • Lowest compliant bid wins
  • No negotiations about added value of higher offers

5
Constraints (2)
  • Different procedures depending on expected volume
  • service. Four weeks response time
  • market survey. Six weeks response time
  • 750000 CHF: CERN's Finance Committee (5 sessions/year, papers
    ready two months in advance)
  • (1 CHF ≈ 0.78 USD ≈ 0.65 EUR)

6
Our problems
  • Procedures badly adapted to quickly evolving
    computing market
  • Difficult to give preference to good, reliable
    equipment

7
Our choices (1)
  • For significant purchases (> 100 kCHF) we require
    (a) sample system(s)
  • with the tender for big tenders
  • on CERN's request for small tenders
  • Tenders include 3 years on-site warranty for
    hardware
  • Typical requirements
  • 4 working hours response / 12 working hours
    repair for critical machines
  • 3 working days response / 5 working days repair
    for farm nodes
  • Supplier can subcontract on-site warranty

8
Our choices (2)
  • Payment within 30 days after provisional
    acceptance on receipt of bank guarantee of 5% of
    purchase sum valid until end of warranty period
  • Delivery within 6 weeks, penalty for late
    delivery 2% of purchase sum per complete week,
    max. 10%
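
The penalty clause can be sketched as a small calculation. This is only an illustration, assuming the figures are percentages (2% of the purchase sum per complete week of delay, capped at 10%). The function name and parameters are hypothetical and not part of any CERN tooling.

```python
def late_delivery_penalty(purchase_sum_chf, days_late,
                          percent_per_week=2, max_percent=10):
    """Penalty in CHF for a late delivery.

    Only complete weeks of delay count; the total is capped at max_percent.
    The percentage values are assumptions read from the slide.
    """
    complete_weeks = max(days_late, 0) // 7
    percent = min(complete_weeks * percent_per_week, max_percent)
    return purchase_sum_chf * percent / 100


# Example: an order of 500 kCHF delivered 23 days late (3 complete weeks).
print(late_delivery_penalty(500_000, 23))
```

A delay of less than one complete week gives no penalty, and anything from five complete weeks onwards hits the cap.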

9
Our choices (3)
  • If more than 10 systems fail during acceptance
    or during the first month after, right to return
    the whole batch
  • If a system fails 3 or more times during any
    6-month period, right to request complete
    replacement of system
  • If more than 20% of any component fail during any
    6-month period, right to request complete
    replacement of this component across batch
  • If CERN adds third-party devices, no impact on
    warranty obligations for system as delivered
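
The two replacement clauses above lend themselves to a simple check against a failure log. The sketch below is an illustration only: it assumes "6 months" can be approximated as 183 days and that the component threshold is 20%. All names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: a "6 months period" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def system_replacement_due(failure_dates, threshold=3, window=SIX_MONTHS):
    """True if `threshold` or more failures fall within any single window."""
    dates = sorted(failure_dates)
    # Slide over every run of `threshold` consecutive failures.
    for i in range(len(dates) - threshold + 1):
        if dates[i + threshold - 1] - dates[i] <= window:
            return True
    return False


def component_replacement_due(failed_in_window, total_in_batch):
    """True if more than 20% of a component type failed within one window.

    The caller is expected to count only failures inside the same
    6-month window.
    """
    return failed_in_window * 5 > total_in_batch


failures = [date(2005, 1, 10), date(2005, 3, 1), date(2005, 6, 20)]
print(system_replacement_due(failures))
print(component_replacement_due(45, 200))
```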

10
Our choices (4)
  • If justified by volume, procure from two
    suppliers (lowest and second-lowest compliant)
  • Better protection if one delivers crap or nothing
    at all
  • Better chance for companies to win an order
  • Increased workload on our part

11
Example of a procurement
  • Procurement of equipment worth
  • Approval by Finance Committee not needed
  • Market survey already done
  • Market survey can cover different types of
    equipment
  • Valid for 1 year
  • If not done yet, add 16 weeks

12
Steps (1)
  • Fix scope: 2 w
  • Write technical, commercial docs: 3 w
  • IT-internal review
  • Revise technical, commercial docs: 2 w
  • Specification meeting
  • Revise technical, commercial docs: 1 w
  • Tender out
  • Deadline for replies: 6 w
  • Opening of replies: 1 w
  • (Total so far: 15 weeks, at best compressible to
    12 weeks)

Typical case
13
Steps (2)
  • (Total from previous slide: 15 w, min. 12 w)
  • Technical analysis of replies: 1 w
  • Visual inspection, mounting: 1 w
  • Benchmarks, reports: 3 w
  • Technical clarifications: 1 w
  • Purchase request, order: 2 w
  • Delivery: 7 w
  • Preliminary acceptance: 6 w
  • Total: 36 weeks, compressible to 30 weeks
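
The totals quoted on the two "Steps" slides can be checked by adding up the listed durations. In this sketch, milestones that carry no duration on the slides (reviews, meetings, "Tender out") are entered as 0 weeks, which is an assumption.

```python
# Durations in weeks, as listed on the two "Steps" slides.
# Milestones without a stated duration are entered as 0 (an assumption).
TENDER_PHASE = [
    ("Fix scope", 2),
    ("Write technical, commercial docs", 3),
    ("IT-internal review", 0),
    ("Revise technical, commercial docs", 2),
    ("Specification meeting", 0),
    ("Revise technical, commercial docs", 1),
    ("Tender out", 0),
    ("Deadline for replies", 6),
    ("Opening of replies", 1),
]

EVALUATION_PHASE = [
    ("Technical analysis of replies", 1),
    ("Visual inspection, mounting", 1),
    ("Benchmarks, reports", 3),
    ("Technical clarifications", 1),
    ("Purchase request, order", 2),
    ("Delivery", 7),
    ("Preliminary acceptance", 6),
]


def total_weeks(steps):
    """Sum the durations of a list of (name, weeks) pairs."""
    return sum(weeks for _, weeks in steps)


tender_weeks = total_weeks(TENDER_PHASE)
overall_weeks = tender_weeks + total_weeks(EVALUATION_PHASE)
print(tender_weeks, overall_weeks)
```

The sums reproduce the 15 and 36 weeks given on the slides. The compressed figures (12 and 30 weeks) are not broken down per step in the talk, so they are not modelled here.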

Typical case
14
Hardware (being) procured
15
Objectives
  • Cover existing needs with as few different models
    and as few procurement procedures as possible
  • Closely follow technology and market evolution
    and satisfy requirements with modern hardware at
    low cost

(These two objectives contradict each other.)
16
Fabric Infrastructure and Operations (1)
From CERN site report 2005/10/11
  • RedHat 7.3 phased out on public services
  • Campaign on storage nodes far advanced
  • New in machine room since Karlsruhe
  • 200 farm PCs (dual Nocona) in production
  • 116 disk servers ( 5 TB usable each, total of
    900 TB gross capacity) part in production, part
    under acceptance test
  • 112 midrange servers under acceptance test
  • 32-node Infiniband-based cluster for Theory
  • Refurbishment of machine room proceeding
  • LHS being populated, but power remains limited

Talk
17
Hardware being procured (1)
  • Large volumes: several times 750 kCHF per year
  • Farm PCs: non-redundant, cheap dual-processor
    work horses
  • Disk servers: storage-in-a-box systems with
    many SATA disks for streaming applications

18
Hardware being procured (2)
  • Medium-size volumes: once or several times
  • Midrange servers: redundant building blocks
    for specific applications
  • Tape servers: midrange servers with an FC
    interface
  • Disk arrays: autonomous RAID units with FC
    uplinks
  • SAN infrastructure (most notably FC switches)
  • Head nodes for serial console infrastructure
  • Small disk servers, somewhere between disk
    servers and midrange servers
  • Miscellaneous

19
Specifications Farm PCs (1)
  • 2 boxed Intel Noconas of 2.8 GHz
  • Mainboard
  • BMC (IPMI 1.5 or higher)
  • PXE, USB boot
  • BBS menu
  • Console redirection
  • Configurable to stay off on AC power loss
  • 2 GB ECC memory
  • From mainboard manufacturer's approved list
  • Upgradable to 4 GB without removing modules

20
Specifications Farm PCs (2)
  • 1 disk 140 GB, IDE not permitted
  • Certified for 24/7, 3 y warranty by disk manuf.
  • 1 GigE providing PXE and IPMI access
  • 19" chassis max. 4 U, with rails
  • Power, reset button
  • Power, disk activity LED
  • Power supply supporting machine 50 W
  • Active PFC
  • C13 to C14 LSZH power cord
  • Guaranteed to run under RHEL 3 (i386 and x86_64)
  • Delivery within 6 weeks from dispatch of order

21
Specifications Disk server (1)
  • 1 or 2 boxed Intel Xeon with EM64T
  • Mainboard as for Farm PCs
  • Now adding support for memory mirroring
  • Memory as for Farm PCs
  • General requirements for disks etc.
  • 7200 rpm, no EIDE, 3 y warranty, certified for
    24/7 by manufacturer
  • Metallic hot-swap trays certified by chassis
    manuf.
  • Indicators for power and activity for each tray
  • PCB backplanes for disks, multilane cabling
  • Intelligent RAID controllers

22
Specifications Disk server (2)
  • System disks: 2 x 140 GB mirrored
  • Data disks: all identical
  • Redundant RAIDs with hot spares (min. 1/15)
  • Total usable capacity per system above 5 TB
  • Battery buffer if controller with active cache
  • 1 GigE providing required performance, PXE, IPMI
    access
  • 19" chassis rack-mountable with rails
  • Min. 40 TB usable in 42 U high rack
  • Power supply N+1 redundant, active PFC
  • Guaranteed to run under RHEL 3 (i386 and x86_64)
  • Delivery within 6 weeks from dispatch of order
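
The rack-density requirement implies an upper bound on chassis height. The following sketch works it out under two assumptions that are not on the slide: each server provides exactly the minimum 5 TB usable, and the full 42 U is available for disk servers. The function name is hypothetical.

```python
import math


def rack_density(rack_height_u=42, min_rack_tb=40, usable_tb_per_server=5):
    """Return (servers needed per rack, max chassis height in U).

    Assumes the whole rack is available for servers and every server
    offers the same usable capacity.
    """
    servers_needed = math.ceil(min_rack_tb / usable_tb_per_server)
    max_height_u = rack_height_u // servers_needed
    return servers_needed, max_height_u


print(rack_density())
```

Under these assumptions a server at the minimum capacity cannot be taller than 5 U. Larger usable capacity per box relaxes the bound.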

23
Specifications Disk server (3)
  • Performance memory to disk: iozone with 16 GB
    files and 256 kB record size
  • Single stream: 40 MB/s write, 40 MB/s read
  • Multi-stream (at least 10): 115 MB/s write, 170
    MB/s read (*)
  • Memory to network: iperf
  • Single stream: 100 MB/s write, 100 MB/s read
  • Two streams: 110 MB/s write, 110 MB/s read
  • Two streams in, two streams out: 145 MB/s

24
Specifications Disk server (4)
  • Global (disk to network) performance: at least 10
    clients transferring 2 GB files via rfio
  • Reading from system: 95 MB/s (*)
  • Writing to system: 90 MB/s (*)
  • (*) Requirements scale linearly with usable
    capacity, numbers for 5000 GB usable
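
The linear scaling stated in the footnote can be written out as follows. This is a minimal sketch: the dictionary keys are hypothetical labels, and the reading that only the four footnoted figures scale (multi-stream iozone and the rfio read/write rates) is an interpretation of the slides.

```python
REFERENCE_CAPACITY_GB = 5000

# Footnoted requirements in MB/s, valid for 5000 GB usable capacity.
REFERENCE_REQUIREMENTS = {
    "iozone_multi_stream_write": 115,
    "iozone_multi_stream_read": 170,
    "rfio_read_from_system": 95,
    "rfio_write_to_system": 90,
}


def scaled_requirements(usable_capacity_gb):
    """Scale the footnoted requirements linearly with usable capacity."""
    factor = usable_capacity_gb / REFERENCE_CAPACITY_GB
    return {name: mb_per_s * factor
            for name, mb_per_s in REFERENCE_REQUIREMENTS.items()}


# Example: a server offering 7500 GB usable capacity.
for name, value in scaled_requirements(7500).items():
    print(f"{name}: {value:.1f} MB/s")
```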

25
Power measurements
  • Done by
  • Andras Horvath, CERN

26
Power measurements
http://ahorvath.home.cern.ch/ahorvath/power
27
Observations
28
Observations (1)
  • Profile of winning companies
  • Tier-1 suppliers competing with large integrators
  • Small round-the-corner companies eliminated at
    Market Survey stage
  • Almost always the integrators win
  • Specially tailored solutions responding to our
    specifications
  • Prices of Tier-1s rather high in Europe

29
Observations (2)
  • Stress test as (important) part of the acceptance
    test
  • Introduced 2 years ago (triggered by
    presentations from SLAC and FNAL at HEPiX)
  • Very useful
  • Based on va-ctcs
  • No longer sufficiently actively maintained
  • Large number of false positives
  • Looking for a replacement

30
Observations (3)
  • Pushing these procedures through requires
    dedicated (and knowledgeable) person power
  • Not obvious to run multiple procedures in
    parallel
  • In particular, if things go wrong, e.g. stress
    test fails

31
Summary
  • Computer hardware procurement is an excellent
    experimental confirmation of two fundamental laws
    of human nature
  • Murphy: Everything that can go wrong will go
    wrong.
  • Hofstadter: Things always take longer than you
    think, even if you take into account
    Hofstadter's law.