Hepmark - PowerPoint PPT Presentation

Provided by: michel458

1
Hepmark
  • Evaluation of the computing power of worker nodes
    in HEP
  • Michele Michelotto
  • Padova Ferrara Bologna

2
Computing model
Computing centers at different tier levels
3
Computing requirements
  • Tape and disk storage
  • Very easy: events → terabytes
  • Disk storage
  • Easy again: events → terabytes
  • (1000x1000 or 1024x1024?)
  • RAID-protected or raw size?
  • Computing power
  • Tricky: events/sec? Sim or Reco?
  • MIPS, CernUnit, MHz, SPEC, SI2K...
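
To make the two storage questions concrete, here is a small sketch (not from the slides): a binary terabyte (1024^4 bytes) is about 10% larger than a decimal one (1000^4 bytes), and RAID parity further separates raw from usable capacity. The RAID 6 layout and the 12-disk, 24 TB array are illustrative assumptions.

```python
# Decimal vs binary terabytes, and raw vs RAID-usable capacity.
# The RAID level and array size used below are illustrative assumptions.

TB_DECIMAL = 1000 ** 4  # 1 TB  = 10^12 bytes
TB_BINARY = 1024 ** 4   # 1 TiB = 2^40 bytes


def binary_overhead() -> float:
    """Relative difference between a binary and a decimal terabyte."""
    return TB_BINARY / TB_DECIMAL - 1.0


def usable_tb(raw_tb: float, disks: int, parity_disks: int) -> float:
    """Usable capacity of a parity RAID set built from equal disks.

    parity_disks = 1 for RAID 5, 2 for RAID 6; filesystem overhead is ignored.
    """
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return raw_tb * (disks - parity_disks) / disks


print(f"1024^4 vs 1000^4: +{binary_overhead():.1%}")
print(f"RAID 6, 12 disks, 24 TB raw: {usable_tb(24.0, 12, 2):.1f} TB usable")
```

So the same disk purchase can be quoted with roughly a 10% spread from the unit convention alone, and more from the RAID level.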

4
  • SI2K is the benchmark used until now to measure
    the computing power of all the HEP experiments
  • Computing power requested by the experiments
  • Computing power provided by a Tier-0, 1 or 2
  • SI2K is the nickname for the
    SPEC CPU Int 2000 benchmark
  • It came after SPEC89, SPEC Int 92 and SPEC Int 95
  • Declared obsolete by SPEC in 2006
  • Replaced by SPEC CPU Int 2006
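
For reference, a SPEC CPU score is the geometric mean of per-benchmark ratios (reference time divided by measured time); CPU2000 scales each ratio by 100, so the reference machine scores 100. The sketch below shows that calculation; the three benchmark timings are made up and are not SPEC data.

```python
from statistics import geometric_mean


def spec_ratio(reference_s: float, measured_s: float, scale: float = 100.0) -> float:
    """Per-benchmark ratio: scale * reference time / measured time.

    scale=100 as in CPU2000 (reference machine scores 100); use 1 for CPU2006.
    """
    return scale * reference_s / measured_s


def spec_score(ratios: list[float]) -> float:
    """Overall score: geometric mean of the per-benchmark ratios."""
    return geometric_mean(ratios)


# Hypothetical (reference, measured) times in seconds for three benchmarks.
timings = [(1400.0, 700.0), (1800.0, 450.0), (1100.0, 1100.0)]
ratios = [spec_ratio(ref, run) for ref, run in timings]
print(ratios)                        # [200.0, 400.0, 100.0]
print(round(spec_score(ratios), 1))  # 200.0
```

The geometric mean has the useful property that the ratio between two machines' scores does not depend on the reference times, but it also means one benchmark that happens to run very fast can lift the whole score.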

5
T1 T2 CPU budget - LHC
[Chart: CPU budget in SI2K, measured in thousands (kSI2K)]
6
The SI2K inflation
  • The main problem with SI2000 in our community:
    it is no longer proportional to HEP code performance
    (as it used to be)
  • You can buy processors with a huge SI2K number but
    a smaller increase in real performance
  • SI2K results for the latest-generation processors
    are affected by inflation

7
Nominal SI vs real SI
  • So CERN (and FZK) started to use a new currency:
    SI2K measured with gcc, the GNU C compiler,
    using two flavours of optimization
  • High tuning: gcc -O3 -funroll-loops -march=ARCH
  • Low tuning: gcc -O2 -fPIC -pthread
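
The two tunings can be written as compiler argument lists, for example to pass to Python's subprocess module. This is only a sketch: ARCH is a placeholder on the slide, so `arch` is left as a parameter (`native` is just an example value), and `bench.c` is a hypothetical source file.

```python
def gcc_flags(tuning: str, arch: str = "native") -> list[str]:
    """Optimization flags for the CERN/FZK 'high' and 'low' gcc tunings."""
    if tuning == "high":
        return ["-O3", "-funroll-loops", f"-march={arch}"]
    if tuning == "low":
        return ["-O2", "-fPIC", "-pthread"]
    raise ValueError(f"unknown tuning: {tuning!r}")


def compile_cmd(source: str, output: str, tuning: str, arch: str = "native") -> list[str]:
    """Build (but do not run) a gcc command line for one tuning."""
    return ["gcc", *gcc_flags(tuning, arch), source, "-o", output]


print(" ".join(compile_cmd("bench.c", "bench", "low")))
print(" ".join(compile_cmd("bench.c", "bench", "high", arch="core2")))
```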

8
  • CERN proposal: use as the site rating the "real SI",
    i.e. SI2K measured with gcc-low and
    increased by 50%
  • Actually, this makes sense only for a short period
    of time and only for the latest generation of processors
  • Run n copies in parallel, where n is the number of
    cores in the worker node, to take into account the
    drop in performance of a multicore machine when fully loaded

9
Too many SI2K
Example
  • Take as an example a worker node with two Intel
    Woodcrest dual-core 5160 at 3.06 GHz
  • SI2K nominal: 2929-3089 (min-max)
  • SI2K sum over 4 cores: 11716-12536
  • SI2K gcc-low: 5523
  • SI2K gcc-high: 7034
  • SI2K gcc-low +50%: 8284
  • The goal is to find a commercially maintained
    benchmark to replace SI2K
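
The figures above follow from simple arithmetic, sketched below. The sketch assumes the gcc-low value (5523) is already the node total, i.e. the sum over the n = 4 parallel copies; truncating 1.5 × 5523 gives the 8284 on the slide, and the same rule reproduces the 64-bit figure (6021 → 9031). The function names are illustrative.

```python
import math

CERN_UPLIFT = 1.5  # "increased by 50%" in the CERN proposal


def nominal_node_rating(per_core_si2k: float, cores: int) -> float:
    """Naive node rating: published per-core SI2K times the number of cores."""
    return per_core_si2k * cores


def cern_node_rating(gcc_low_total: float) -> int:
    """CERN-style rating: gcc-low SI2K summed over n parallel copies, plus 50%.

    Truncated to an integer, which matches the figures on the slides.
    """
    return math.floor(gcc_low_total * CERN_UPLIFT)


# Dual Woodcrest 5160 worker node (4 cores), values from the slides.
print(nominal_node_rating(2929, 4))  # 11716
print(cern_node_rating(5523))        # 8284 (32-bit gcc-low)
print(cern_node_rating(6021))        # 9031 (64-bit gcc-low)
```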

10
  • Cache: importance of the cache architecture
  • 1st, 2nd and 3rd level, cache latency
    (time), cache bandwidth (transfer speed), shared or
    exclusive?
  • Memory access time
  • Power consumption
  • Example: a big Tier-2 with 500 boxes needs
    100 kW
  • About 800 MWh in one year
  • Energy cost 0.12 Euro per kWh → energy bills of
    100 kEuro/year
  • A 10% improvement in power efficiency
    means 10 kEuro/year savings
  • And savings on the infrastructure (power
    distribution, UPS, cooling)
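
The power-cost estimate can be checked with a few lines of arithmetic. This sketch assumes the farm draws a constant 100 kW around the clock (8760 h/year); at full load that gives 876 MWh and about 105 kEuro at 0.12 Euro/kWh, consistent with the rounded "about 800 MWh" and "100 kEuro/year" above. Infrastructure savings (cooling, UPS) are not modelled.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h; assumes the farm runs all year


def annual_energy_mwh(load_kw: float) -> float:
    """Energy drawn in one year at a constant load, in MWh."""
    return load_kw * HOURS_PER_YEAR / 1000.0


def annual_cost_eur(load_kw: float, eur_per_kwh: float) -> float:
    """Yearly energy bill at a constant load and a flat tariff."""
    return load_kw * HOURS_PER_YEAR * eur_per_kwh


boxes, load_kw, tariff = 500, 100.0, 0.12
watts_per_box = load_kw * 1000.0 / boxes  # 200 W per worker node
energy = annual_energy_mwh(load_kw)       # 876 MWh at full load
cost = annual_cost_eur(load_kw, tariff)   # about 105 kEuro
savings = 0.10 * cost                     # 10% better power efficiency
print(f"{watts_per_box:.0f} W/box, {energy:.0f} MWh/year, "
      f"{cost / 1000:.1f} kEuro/year, savings {savings / 1000:.1f} kEuro/year")
```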

11
Many gaps
  • Difficult to measure
  • Not easy to get machines on loan from server
    resellers or manufacturers
  • Not easy to borrow machines from colleagues
  • And always only for short periods of time
  • A SPEC run can last 15-20 hours
  • We need a set of dedicated worker nodes
    to make SPEC and HEP application measurements

12
  • Padova: Michele Michelotto (1st Technologist) 0.70,
    Matteo Menguzzato (University) 0.40
  • Ferrara: Alberto Gianoli (1st Technologist) 0.20
  • Bologna: Franco Brasolin (CTER) 0.20
    Total FTE: 1.5
  • Milestones
  • 2009: understand SPEC CPU 2006; propose a new
    benchmark to replace SI2K
  • Measure the performance of current
    architectures for Monte Carlo simulation (events/s vs
    SPEC)
  • 2009/2010: power performance, cache profiling

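Budget (FE = Ferrara, PD = Padova)
        | Domestic travel | Foreign travel | Consumables | Equipment | Total
FE      |      1.00       |                |             |           |  1.00
PD      |      1.00       |      2.00      |    3.00     |   16.00   | 22.00
Total   |      2.00       |      2.00      |    3.00     |   16.00   | 23.00
2010    |      1.00       |      3.00      |    2.00     |   16.00   | 22.00
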
13
Memory: Intel vs AMD
  • Who is faster?
  • It depends on the block size
  • In the red zones Intel is better
  • In the green zones AMD is better

14
Cache behaviour
  • The Xeon 54xx has lower latency even with a bigger cache
  • The 3 processors behave very differently in the
    4 MB to 64 MB range
  • If your (HEP) application works in this range, you
    will see a big change in performance when changing
    processor

15
CMS software: SIM and Pythia
  • CMS Monte Carlo simulation (32-bit) and Pythia
    (64-bit) show the same performance once normalized
  • Both published SPECint 2006 and SPECint 2006
    measured with gcc show the same behaviour
  • Published SI2K does not match HEP software
  • SI2K CERN is better, but not as good as SI2006
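
One way to quantify whether a benchmark "matches" HEP software: for each machine, divide the HEP throughput (events/s) by the benchmark score. A benchmark that tracks the HEP code gives a nearly constant ratio. The sketch below uses the coefficient of variation of that ratio as the spread measure; both the metric and the numbers are illustrative assumptions, not the method behind these plots.

```python
from statistics import mean, pstdev


def ratio_spread(hep_rates: list[float], bench_scores: list[float]) -> float:
    """Coefficient of variation of (HEP events/s) / (benchmark score).

    0.0 means the benchmark is exactly proportional to the HEP code.
    """
    if len(hep_rates) != len(bench_scores) or not hep_rates:
        raise ValueError("need one benchmark score per HEP measurement")
    ratios = [h / b for h, b in zip(hep_rates, bench_scores)]
    return pstdev(ratios) / mean(ratios)


# Hypothetical events/s on three machines and two candidate benchmarks.
hep = [10.0, 20.0, 30.0]
bench_a = [100.0, 200.0, 300.0]  # proportional to the HEP code
bench_b = [100.0, 300.0, 330.0]  # "inflated" on the second machine
print(ratio_spread(hep, bench_a), round(ratio_spread(hep, bench_b), 3))
```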

16
BaBar Tier-A results
  • If you normalize by core and clock, all new
    processors have the same performance
  • Double that of the older-generation CPUs
  • SI2006 matches this pattern (published/gcc
    ratio constant)
  • SI2000-CERN is better than nominal SI2K
  • SI2000 clearly doesn't work
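
Normalizing by core and clock means dividing the measured node throughput by cores × GHz. A minimal sketch; the machines and event rates are hypothetical, chosen to mirror the "same for all new processors, double the old generation" pattern.

```python
def per_core_per_ghz(events_per_s: float, cores: int, clock_ghz: float) -> float:
    """Node throughput normalized by number of cores and clock frequency."""
    return events_per_s / (cores * clock_ghz)


# Hypothetical nodes: (name, events/s for the whole node, cores, GHz).
nodes = [("old 2-core", 6.0, 2, 3.0),
         ("new 4-core A", 24.0, 4, 3.0),
         ("new 8-core B", 40.0, 8, 2.5)]
for name, rate, cores, ghz in nodes:
    print(f"{name}: {per_core_per_ghz(rate, cores, ghz):.2f} events/s per core-GHz")
```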

17
4-core processors
18
Intel 54xx
19
AMD 4-core
20
Transactional load (comparison between processors)
  • Performance doesn't drop on the new 4-core processor
  • Clovertown drops with respect to Harpertown
  • A dual-core processor holds up only to load 3
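
A load test like this can be summarized as a scaling efficiency: aggregate throughput at load n divided by n times the single-job throughput. A value near 100% means performance does not drop as jobs are added. The throughput numbers below are hypothetical.

```python
def scaling_efficiency(throughput: dict[int, float]) -> dict[int, float]:
    """Efficiency at each load n: throughput(n) / (n * throughput(1))."""
    single = throughput[1]
    return {n: t / (n * single) for n, t in sorted(throughput.items())}


# Hypothetical aggregate throughput (transactions/s) vs number of concurrent jobs.
measured = {1: 100.0, 2: 198.0, 4: 380.0, 8: 400.0}
for load, eff in scaling_efficiency(measured).items():
    print(f"load {load}: {eff:.0%}")
```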

21
Perf/watt
  • AMD Barcelona at 65 nm: performance per watt
    similar to Intel Xeon at 45 nm

23
Memory: Intel vs AMD
  • Access times are very similar
  • At 1 GB (typical footprint of an HEP application)
    the new AMD behaves better
  • But the new Xeon 54xx are much better than the
    53xx

25
Cache behaviour
  • We need to study the behaviour of typical HEP
    applications:
  • Simulation, event generation, reconstruction,
    analysis
  • To understand how to write more efficient
    applications

26
Power issues
  • Power consumption changes from one processor to
    another
  • Clock, high-K dielectric, active power
    management, clock throttling

27
An HEP data center
  • We need to measure the power usage of HEP
    applications
  • Example: a big Tier-2 with 500 boxes needs 100 kW
  • Like the whole data center (CED) of INFN Padova
  • About 800 MWh in one year
  • Energy cost 0.12 Euro per kWh → energy bills of
    100 kEuro/year
  • A 10% improvement in power efficiency
    means 10 kEuro/year savings
  • And savings on the infrastructure (power
    distribution, UPS, cooling)

28
Financial request
  • Need to buy a new worker node each time a new
    processor is released in the dual-processor market
    segment
  • Only if significant new features are present
  • One or two each for Intel and AMD per year
  • 4 kEuro each (dual processor, 2 GB/core, 1 disk)
  • 2 boxes to start with

29
Transition problem
  • Impossible to find published SPEC Int 2000
    results for the new processors
    (e.g. the not so new Clovertown 4-core)
  • Impossible to find published SPEC Int 2006 results
    for old processors (before 2006)
  • E.g. old P4 Xeon, P4, AMD 2xx
  • You can't convert from SI2000 to SI2006, but the
    ratio for the x86 architecture is in the 137-172
    range
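
Reading the quoted range as SI2000/SI2006 (the reading consistent with SI2000 scores in the thousands and SI2006 scores in the tens), an SI2000 value can at best be bracketed, not converted. A sketch of that bracketing, with a hypothetical input score:

```python
# SI2000/SI2006 ratio range for x86 quoted on the slide.
RATIO_LOW, RATIO_HIGH = 137.0, 172.0


def si2006_bracket(si2000: float) -> tuple[float, float]:
    """Range of SI2006 values compatible with an SI2000 score.

    A rough bracket, not a conversion: the real ratio depends on the processor.
    """
    return si2000 / RATIO_HIGH, si2000 / RATIO_LOW


low, high = si2006_bracket(3440.0)  # hypothetical SI2000 score
print(f"SI2006 between {low:.1f} and {high:.1f}")
```

The resulting interval is about 25% wide, which is why the slide warns against treating the ratio as a conversion factor.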

30
Even more
  • Actually, all the gcc results in the previous
    slide are on i386 (32-bit)
  • If you would like to know how your code
    runs on a 64-bit machine, you can measure
    SPEC Int 2000 with gcc on x86_64.
  • So, for the worker node with two Intel Woodcrest dual-core
    5160 at 3.06 GHz:
  • SI2K nominal: 2929-3089 (min-max)
  • SI2K sum over 4 cores: 11716-12536
  • SI2K gcc-low: 6021
  • SI2K gcc-high: 6409
  • SI2K gcc-low +50%: 9031

31
ATLAS
  • Here 100 is the Xeon 5160
  • Few results for SI2006 gcc, but no difference from CMS
    and BaBar
  • Few results also for published SI2006, because
    of several old architectures
  • SI2K gcc is not bad
  • Published SI2K heavily overestimates the new Xeon
  • ATLAS simulation, normalized, performs the same on
    the new Intel Core or AMD Opteron (like CMS,
    BaBar)
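
An index in which a reference machine (here the Xeon 5160) scores 100 is each result divided by the reference result, times 100. A sketch with hypothetical event rates; the machine names other than the Xeon 5160 are placeholders.

```python
def relative_index(results: dict[str, float], reference: str) -> dict[str, float]:
    """Scale results so that the reference machine scores 100."""
    ref = results[reference]
    return {name: 100.0 * value / ref for name, value in results.items()}


# Hypothetical ATLAS-simulation event rates (events/s per node).
rates = {"Xeon 5160": 12.0, "machine A": 9.0, "machine B": 15.0}
print(relative_index(rates, "Xeon 5160"))
```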

32
Power consumption
33
Power meter
  • Need a device to measure voltage and current
  • with logging capabilities
  • E.g. Fluke 1735
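
Once a meter log is exported, energy is the time integral of real power, and performance per watt follows from it. The sketch assumes a hypothetical log of (time in s, real power in W) samples, with real power already computed by the meter from voltage, current and power factor; the trapezoidal rule is one simple choice of integration, not a feature of any particular meter.

```python
def energy_wh(samples: list[tuple[float, float]]) -> float:
    """Trapezoidal integral of (time_s, power_w) samples, in watt-hours."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3600.0


def perf_per_watt(events: float, samples: list[tuple[float, float]]) -> float:
    """Events per second divided by the average power drawn (= events per joule)."""
    duration_s = samples[-1][0] - samples[0][0]
    avg_w = energy_wh(samples) * 3600.0 / duration_s
    return (events / duration_s) / avg_w


# Hypothetical log: one sample every 10 minutes over one hour.
log = [(0.0, 240.0), (600.0, 260.0), (1200.0, 255.0), (1800.0, 250.0),
       (2400.0, 245.0), (3000.0, 250.0), (3600.0, 250.0)]
print(f"{energy_wh(log):.1f} Wh, {perf_per_watt(36000.0, log):.4f} events/s per W")
```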

34
FZK measurement
  • In 2001, SPEC with gcc was 80% of the average
    published data
  • In 2006 the gap was much wider

35
Which is better?
  • I started to measure the performance of HEP codes on
    several machines
  • The goal was to find a commercially maintained
    benchmark to replace SI2K
  • I compared HEP code with:
  • SI2K published results
  • SI2K measured with gcc and CERN tuning
  • SI2006 and SI2006 rate published results
  • SI2006 and SI2006 with gcc4 (32- and 64-bit)

36
Cache
  • In the 80s, memory latency was 3-10 clock cycles
  • Now it is hundreds of clock cycles
  • Importance of the cache architecture
  • 1st level, 2nd level, 3rd level
  • Cache latency (time)
  • Cache bandwidth (transfer speed)
  • Shared or exclusive?
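
A standard way to see why the cache hierarchy matters this much is the textbook average memory access time model, AMAT = hit time + miss rate × miss penalty, applied level by level. The latencies and miss rates below are illustrative assumptions, not measurements of the processors discussed.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time (cycles) of a cache hierarchy.

    levels: (hit_time_cycles, local_miss_rate) ordered from L1 outwards.
    Each level's miss penalty is the AMAT of everything behind it.
    """
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Illustrative numbers: L1 4 cycles / 5% misses, L2 15 cycles, memory 300 cycles.
print(amat([(4.0, 0.05), (15.0, 0.20)], 300.0))  # working set mostly fits in L2
print(amat([(4.0, 0.05), (15.0, 0.40)], 300.0))  # working set spills out of L2
```

Doubling the L2 miss rate adds about 40% to the average access time here, which is the kind of jump the "Cache behaviour" measurements show when an application's working set crosses a cache boundary.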