Title: Disconnected Diagrams, Multigrid, Nvidia
1 Disconnected Diagrams, Multi-grid, Nvidia and all that†
Richard Brower (Boston University), James Brannick (Penn), Ron Babich (BU), Kipton Barros (BU), Mike Clark (BU), George Fleming (Yale), James Osborn (Argonne), Claudio Rebbi (BU)
QCDNA 2008, Regensburg, Sept 5, 2008
† WARNING: Much here is a FUTURE plan, NOT proven results, but .....
2 What is QCD?
Acronym  Definition
QCD      Qualified Charitable Distribution (IRS)
QCD      Quality, Cost, Delivery
QCD      Quantum Chromodynamics
QCD      QuarkCopyDesk (file extension)
QCD      Quasi-Cyclic Dyadic
QCD      Quick Change Directory
QCD      Quick Claim Deed (real estate)
QCD      Quintessential CD (PC media player)
QCD      Quit Claim Deed (real estate)
QCD      Quality Control Department
3 What do these mean?
- QCD, from Wikipedia, the free encyclopedia
- Quintessential Player, formerly known as Quintessential CD
- Quality, Cost, Delivery, a three-letter acronym used in lean manufacturing
- Quad City DJ's, Southern rap group
- Quick Control Dial, a control on many DSLR cameras, like the Canon EOS 40D
- Quote-Comma-Delimited, known also as Comma-separated values
- Quantum chromodynamics, the theory describing the Strong Interaction
4 Outline
- Physics (How strange is the proton?)
- Algorithms (Multi-grid to the rescue?)
- Hardware (GPU propagator farm?)
5 Physics: Disconnected Diagrams
- Connected vs. Disconnected
- Want matrix element
6 How strange† is the proton? Who cares?
- Violation of Standard Model
- Dark Matter (Neutralino scattering)
- NuTeV anomaly
- Nucleon Physics (include u/d and s quarks)
- iso-scalar Form Factors, nucleon structure function, Spin crisis for proton, matrix element, etc.
† see Lattice 2008, http://conferences.jlab.org/lattice2008/parallel-bytopic-struct.html
S. Collins, G. Bali, A. Schäfer: Hunting for the strangeness ... nucleon
Takumi Doi et al.: Strangeness and glue in the nucleon from lattice QCD
Ron Babich et al.: Strange quark content of the nucleon
8 Direct detection of dark matter
- In SUSY, the neutralino scatters from a nucleon via Higgs exchange
- The strange scalar matrix element is a major uncertainty
- Uncertainty in f_Ts gives up to a factor of 4 uncertainty in the cross-section!
- Bottino et al., hep-ph/0111229
- Ellis et al., hep-ph/0502001
9 Nuclear Experiment
PVES, BNL E734 (νp scattering)
Parity-violating electron scattering (SAMPLE, HAPPEx, PVA4, G0)
J. Liu et al., arXiv:0706.0226 [nucl-ex]
(see also Young et al., nucl-ex/0605010)
Pate et al., arXiv:0805.2889 [hep-ex]
10 Algorithm
- Monte Carlo update (long autocorrelation times)
- Global heat bath, aka stochastic estimator (zero autocorrelations)
- Find $\phi = D^{-1}\eta$ for Gaussian or Gauge or Z2 noise $\eta$
- (Zero autocorrelations!)
- With $\langle \eta^\dagger_y \eta_x \rangle = \delta_{yx}$
$A_{xy}$
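The stochastic estimator above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the operator `toy_operator` is a hypothetical stand-in for D (a 1-d periodic Laplacian plus mass), the dense `solve` replaces a real Krylov solver, and all function names are assumptions made for the example. Each Z2 noise vector gives an independent sample of Tr D^{-1}, so there are no autocorrelations between samples.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def toy_operator(n, mass=0.5):
    """Hypothetical stand-in for D: 1-d periodic Laplacian plus a mass term."""
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 2.0 + mass
        D[i][(i + 1) % n] -= 1.0
        D[i][(i - 1) % n] -= 1.0
    return D

def exact_trace_inverse(D):
    """Tr D^{-1} from one point-source solve per site (the expensive reference)."""
    n = len(D)
    total = 0.0
    for x in range(n):
        e = [1.0 if i == x else 0.0 for i in range(n)]
        total += solve(D, e)[x]
    return total

def z2_trace_estimate(D, n_noise, rng):
    """Average eta^T D^{-1} eta over independent Z2 noise vectors eta."""
    n = len(D)
    total = 0.0
    for _ in range(n_noise):
        eta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        phi = solve(D, eta)                      # phi = D^{-1} eta
        total += sum(e * p for e, p in zip(eta, phi))
    return total / n_noise

if __name__ == "__main__":
    D = toy_operator(16)
    print("exact   :", exact_trace_inverse(D))
    print("estimate:", z2_trace_estimate(D, 400, random.Random(1)))
```

The estimate is unbiased because the noise average of eta_y eta_x is delta_yx; its error comes entirely from the off-diagonal elements of D^{-1}.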
11 Improving Stochastic Estimate
- Variance reduction
- Dilution vs hopping parameter† (Short distance)
- Multi-grid vs deflation/truncation† (Long distance)
- Curing volume divergence
- Trace versus Gauge fluctuations
- Better and more sources (all to all?).
- Full multi-grid O(N log N) Trace?
† S. Collins, G. Bali, A. Schäfer: Hunting for the strangeness ... nucleon
12 Trace estimation
- Two sources of error: gauge noise and error in trace. In this calculation, we largely eliminate the second source by calculating a nearly exact trace on four time-slices.
- 864 sources (x12 for color/spin). A given source is nonzero on 4 sites on each of 4 time-slices.
- Minimal spatial separation between sites is . Small residual contamination is gauge-variant and averages to zero.
- Equivalent to using a single stochastic source with extreme dilution.
4 x 6^3 = 864
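The link between dilution and an exact trace can be illustrated with a toy sketch. Everything here is an assumption made for the example: `toy_green` is a hypothetical matrix standing in for D^{-1} with exponentially decaying off-diagonal elements, and the interlaced partition is only one possible dilution pattern, not the one used in the calculation above.

```python
import random

def toy_green(n, decay=0.5):
    """Hypothetical stand-in for D^{-1}: elements fall off as decay**|x - y|."""
    return [[decay ** abs(x - y) for y in range(n)] for x in range(n)]

def z2_noise(n, rng):
    """One Z2 noise vector with entries +1 or -1."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def diluted_trace_estimate(M, eta, n_parts):
    """Estimate Tr M from a single noise vector split into n_parts interlaced pieces."""
    n = len(M)
    total = 0.0
    for k in range(n_parts):
        # eta_k keeps eta on sites x with x % n_parts == k and is zero elsewhere
        eta_k = [eta[x] if x % n_parts == k else 0.0 for x in range(n)]
        m_eta = [sum(M[x][y] * eta_k[y] for y in range(n)) for x in range(n)]
        total += sum(eta_k[x] * m_eta[x] for x in range(n))
    return total

if __name__ == "__main__":
    n = 12
    M = toy_green(n)
    eta = z2_noise(n, random.Random(3))
    for parts in (1, 2, 4, n):
        print(parts, "pieces ->", diluted_trace_estimate(M, eta, parts))
```

With `n_parts` equal to the number of sites, each piece is a point source and the result is the exact trace for any noise vector. With fewer pieces, the only contamination comes from pairs of sites inside the same piece, which is why a large separation between the nonzero sites of a source keeps the residual small.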
13 Preliminary Methods
- Configurations were provided by the LHPC Spectrum Collaboration
- anisotropic lattice with
- 2 dynamical flavors, Wilson fermion and gauge actions
- 863 configurations
- 64 (x 12) inversions per configuration at the light quark mass, for the nucleon correlators
- 864 (x 12) inversions per configuration at the strange mass, for the trace
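For a sense of scale, the solve counts quoted above can be tallied directly. The constants are the numbers from this slide; the function names are only illustrative.

```python
N_CONFIGS = 863        # gauge configurations
N_COLOR_SPIN = 12      # color/spin components per source
LIGHT_SOURCES = 64     # light-quark sources per configuration (nucleon correlators)
STRANGE_SOURCES = 864  # strange-quark sources per configuration (trace), 4 * 6**3

def solves_per_config():
    """Dirac solves needed on one configuration, light plus strange."""
    return (LIGHT_SOURCES + STRANGE_SOURCES) * N_COLOR_SPIN

def total_solves():
    """Dirac solves over the whole ensemble."""
    return N_CONFIGS * solves_per_config()

if __name__ == "__main__":
    print("per configuration:", solves_per_config())
    print("whole ensemble   :", total_solves())
```

The strange-quark trace accounts for 864 of every 928 sources, which is why the trace dominates the cost of the analysis.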
14 Strange scalar form factor
15 Ratio approach
- Conventionally, one extracts the (e.g. zero-momentum) form factor from the large t behavior of the ratio
- (or from a similar expression integrated over time).
- Instead, we fit the numerator directly, since this allows us
- to avoid contamination from backward-propagating states, which are problematic due to the short temporal extent of our lattice ( ).
- to explicitly take into account the contribution of (forward-propagating) excited states.
- In the following, we always treat the system symmetrically with
17 Direct fit
- First, we perform a fit to the nucleon two-point function, of the form
- The coefficients and masses are very well-determined, since we are required to calculate correlators from all initial times (a total of 863 x 64 = 55,232).
- Next, we perform a fit to the three-point function,
- Here j1 and j2 are the form factors for the proton and its first excited state, and j12 is a transition matrix element between them. In practice, we expect j2 and j12 to absorb the contribution of still higher states, and trust only j1 to be reliable.
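The fit forms themselves did not survive in this transcript. As an assumption only, a common two-state ansatz consistent with the description above (forward-propagating states only, source at time 0, operator at t', sink at t, with j_1, j_2, j_{12} corresponding to j1, j2, j12) would read as follows; the symbols A_1, A_2, m_1, m_2 are introduced here for illustration and may differ from the authors' parametrization.

```latex
C_2(t) = A_1 e^{-m_1 t} + A_2 e^{-m_2 t}

C_3(t, t') = j_1 A_1 e^{-m_1 t} + j_2 A_2 e^{-m_2 t}
           + j_{12} \sqrt{A_1 A_2}
             \left[ e^{-m_1 t'} e^{-m_2 (t - t')} + e^{-m_2 t'} e^{-m_1 (t - t')} \right]
```

In such a scheme the two-point fit fixes A_1, A_2, m_1, m_2, so the three-point fit is linear in the remaining parameters j_1, j_2, j_{12}.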
18 Strange scalar form factor
- For the renormalization-invariant quantity f_Ts, we estimate
- where we have inserted the physical nucleon mass. The first error is statistical, the second is the uncertainty in relating this mass to the lattice scale, and no other systematics are included.
- Note that the matrix element in the numerator was calculated for a world with a 400 MeV pion. If we work consistently in such a world by inserting our calculated nucleon mass, the scale dependence drops out, and we find
19 Momentum dependence of $G_S^s(q^2)$
PRELIMINARY
20 Strange axial form factor
PRELIMINARY
- Results have not been renormalized.
- Calculated value is distinct from zero at the 3σ level.
21 Error $O(L^{3/2}) \to \infty$ as $L^3 \to \infty$ for Exact Trace in a Connect correlator
35 Most Important New Trick: Multi-grid Variance Reduction
- The signal and variance of the first term are down by 1 to 2 orders of magnitude because $D_c \approx D$
- The coarse-level trace for $D_c^{-1}$ is as cheap to calculate as the level down operator inverse.
- This can of course be done recursively, giving (I think) an O(N log N) trace calculation to fixed tolerance.
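One way to read the splitting above is Tr D^{-1} = Tr[D^{-1} - P D_c^{-1} P^T] + Tr[D_c^{-1} P^T P], with the first term estimated stochastically on the fine grid and the second handled on the coarse grid. The sketch below checks this on a toy problem. It is an assumption-laden illustration, not the talk's algorithm: the operator is a 1-d periodic Laplacian plus mass, the prolongator P is piecewise-constant aggregation, D_c is the Galerkin operator P^T D P, and dense inverses replace real solvers.

```python
def inverse(A):
    """Dense Gauss-Jordan inverse with partial pivoting; fine for tiny toy matrices."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def toy_operator(n, mass=0.05):
    """Hypothetical stand-in for D: 1-d periodic Laplacian plus a small mass."""
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 2.0 + mass
        D[i][(i + 1) % n] -= 1.0
        D[i][(i - 1) % n] -= 1.0
    return D

def prolongator(n, block):
    """Piecewise-constant aggregation: fine site x belongs to coarse site x // block."""
    return [[1.0 if x // block == c else 0.0 for c in range(n // block)] for x in range(n)]

def two_level_split(D, P):
    """Return (D^{-1} - P Dc^{-1} P^T, Tr[Dc^{-1} P^T P]) with Dc = P^T D P."""
    Pt = transpose(P)
    Dc_inv = inverse(matmul(Pt, matmul(D, P)))
    coarse_part = matmul(P, matmul(Dc_inv, Pt))
    fine_term = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(inverse(D), coarse_part)]
    return fine_term, trace(matmul(Dc_inv, matmul(Pt, P)))

def z2_variance(M):
    """Variance of eta^T M eta for real Z2 noise: sum over x != y of M_xy^2 + M_xy M_yx."""
    n = len(M)
    return sum(M[x][y] ** 2 + M[x][y] * M[y][x]
               for x in range(n) for y in range(n) if x != y)

if __name__ == "__main__":
    D = toy_operator(16)
    fine_term, coarse_trace = two_level_split(D, prolongator(16, 4))
    print("Tr D^-1          :", trace(inverse(D)))
    print("fine + coarse    :", trace(fine_term) + coarse_trace)
    print("noise var, plain :", z2_variance(inverse(D)))
    print("noise var, fine  :", z2_variance(fine_term))
```

The coarse trace is a problem of the same kind on a smaller grid, so the same split can be applied to it again, which is the recursion the slide refers to.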
36 HARDWARE
- Graphics hardware is well suited to highly parallel numerical tasks.
- Hardware vendors provide development tools to support high performance computing.
- NVIDIA's CUDA offers direct access to graphics hardware through a programming language similar to C.
- The Dirac-Wilson operator runs at an effective 68 Gigaflops on the Tesla C870 GPU.
- On the recently released GTX 280 GPU it runs at 92 Gigaflops, and we expect improvement pending code optimization.
- (Now 98 Gigaflops; hope to get O(150) Gigaflops)
37 Nvidia GPU architecture
38 Two Generations: Consumer vs HPC GPUs
- Consumer cards → High Performance (HPC) GPUs
- I. 8800 GTX → Tesla C870
- (16 multi-processors with 8 cores each)
- II. GTX 280 → Tesla C1060
- (30 multi-processors with 8 cores each)
39 C870 code using 60% of the memory bandwidth.
44 http://www.scala-lang.org/
45 Future software Plans
- Need to find out why we are only saturating 60% of memory bandwidth
- Further reduce memory traffic
- 8 real numbers per SU(3) matrix (2/3 of the 12 used now)
- share spinors in 4^3 blocks (5/9 of the traffic used now)
- Generalize to clover Wilson and Domain Wall operators (slightly better flops/mem ratio).
- DMA between GPUs on Quad system and network for cluster
- Start to design SciDAC API for many-core technologies.
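The "12 used now" figure refers to storing only part of each SU(3) link and rebuilding the rest on the fly. A minimal sketch of the standard 12-real scheme follows: keep the first two rows (6 complex numbers) and recover the third row as the complex conjugate of their cross product. The function names and the test matrix are assumptions for illustration, written in plain Python rather than CUDA; the 8-real parametrization mentioned above is not shown.

```python
import cmath
import math

def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def example_su3(theta=0.3, phi=1.1, alpha=0.7, beta=-0.4):
    """Build an SU(3) matrix as rotation * diagonal phases * rotation (det = 1)."""
    c, s = math.cos(theta), math.sin(theta)
    r12 = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    c2, s2 = math.cos(phi), math.sin(phi)
    r23 = [[1, 0, 0], [0, c2, -s2], [0, s2, c2]]
    phases = [[cmath.exp(1j * alpha), 0, 0],
              [0, cmath.exp(1j * beta), 0],
              [0, 0, cmath.exp(-1j * (alpha + beta))]]
    return matmul3(r12, matmul3(phases, r23))

def compress12(U):
    """Keep only the first two rows: 6 complex numbers = 12 reals."""
    reals = []
    for row in U[:2]:
        for z in row:
            reals.extend((complex(z).real, complex(z).imag))
    return reals

def reconstruct(reals):
    """Rebuild the full 3x3 matrix; the third row is conj(row0 x row1)."""
    z = [complex(reals[2 * i], reals[2 * i + 1]) for i in range(6)]
    a, b = z[:3], z[3:]
    c = [(a[1] * b[2] - a[2] * b[1]).conjugate(),
         (a[2] * b[0] - a[0] * b[2]).conjugate(),
         (a[0] * b[1] - a[1] * b[0]).conjugate()]
    return [a, b, c]

if __name__ == "__main__":
    U = example_su3()
    V = reconstruct(compress12(U))
    err = max(abs(V[i][j] - U[i][j]) for i in range(3) for j in range(3))
    print("reals stored:", len(compress12(U)), " max reconstruction error:", err)
```

The trade is extra arithmetic for less memory traffic, which pays off when a kernel is limited by memory bandwidth rather than by flops.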
46 Tesla 10-Series: What's the Big Deal?
47 Consumer Chip GTX 280 → Tesla C1060
48 1U Quad S1070 System, $8K
49 CUDA 2.0 (Compute Unified Device Architecture)
- Can compile CUDA code into highly efficient SSE-based multi-threaded C code
50 Need a GPU Dirac Propagator Farm
- The Clark-Kennedy RHMC Paradox (the faster you go, the harder it is to keep up)
- Analysis is now the Achilles heel
- Solution: Dedicated analysis farm.
- GPU can deliver O(10) to O(100) gain in flops/$
- Two quad Tesla → 1 Sustained Teraflop!
- Two quad Tesla @ $25K vs. one BG/L rack @ $2,000K
51 Commercial Break
- BOSTON POST DOC IN SEPT 2009
- PetaAPPS/SciDAC fellow
- (QCDNA in Boston Fall 2009?)