Title: QCDOC: Project Status and First Results
1. QCDOC: Project Status and First Results
SciDAC 2005, June 30, 2005
Norman H. Christ, Columbia University
2. Outline
- QCDOC Computer
  - Architecture
  - Construction
  - Software
  - Performance
- SciDAC component
- First QCDOC results
- Overview of Lattice QCD
  - Current emphasis
    - Control/reduce errors
    - Extend reach
  - New ideas
    - Symanzik improvement
    - Chiral fermions
- Targeted QCD Computers
3. Review of Lattice QCD
- Introduce a space-time lattice.
- Perform the Euclidean Feynman path integral:
  - a precise, non-perturbative formulation;
  - capable of numerical evaluation.
- Evaluate using Monte Carlo importance sampling, with hybrid molecular dynamics/Langevin evolution (see the formula below).
- The direct space-time formulation maps easily onto a parallel computer.
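In symbols (a standard textbook form, not transcribed from the slide), the path-integral average that the Monte Carlo estimates is

    \langle O \rangle = \frac{1}{Z} \int \prod_{x,\mu} dU_\mu(x)\; O(U)\, \det D(U)\, e^{-S_g(U)},
    \qquad Z = \int \prod_{x,\mu} dU_\mu(x)\, \det D(U)\, e^{-S_g(U)},

where the U are the lattice gauge links, S_g is the gauge action, and det D is the quark determinant. Importance sampling draws gauge configurations with weight det D(U) e^{-S_g(U)}; the hybrid molecular dynamics/Langevin evolution is the update scheme that generates them.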
4. Physics Underlying QCD
- Quarks interact by simple gluon exchange.
- Same geometrical beauty as electromagnetism.
- With the ~1%-accurate neglect of electromagnetism, this treatment is exact!
- Interaction energy generates 99% of the known mass in the Universe.
- Should explain all of nuclear physics.
- Must be mastered if the underlying properties of quarks are to be learned from experiment.
[Figure: quark/anti-quark π meson]
5. Sources of Error
6. Finite Lattice Spacing Errors
- Computational costs rise rapidly, as ~1/a⁸, when a → 0:
  - space-time volume: 1/a⁴
  - Dirac operator inversions: 1/a
  - critical slowing down: 1/a
  - molecular dynamics time step: 1/a²
- Symanzik improvement (Runge-Kutta for field theory):
  - Represent the O(aⁿ) errors of the lattice theory by higher-dimension operators in an effective continuum theory.
  - Adjust irrelevant lattice operators to make the coefficients c_i^(n) = 0 (see the effective action below).
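In the standard Symanzik language (a sketch of the usual construction, not taken verbatim from the slide), the lattice theory at spacing a is described by an effective continuum action

    S_\text{eff} = S_\text{cont} + \sum_{n \ge 1} a^n \sum_i c_i^{(n)} \int d^4x\, O_i^{(n+4)}(x),

where the O_i^{(n+4)} are operators of dimension n+4. Tuning the irrelevant lattice operators so that the leading coefficients c_i^{(n)} vanish removes the O(aⁿ) errors order by order.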
7. Chiral Fermions
- Domain wall fermions are the most thoroughly explored.
- A 5-D theory with 4-D chiral surface states.
8. Residual Chiral Symmetry Breaking
- Finite Ls produces residual chiral symmetry breaking.
- The size of m_res depends on the roughness of the gauge field (see below).
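A commonly quoted schematic form for this dependence (an assumption added here for orientation, not taken from the slide) is

    m_\text{res}(L_s) \approx c_1\, e^{-\lambda L_s} + \frac{c_2}{L_s},

where the exponentially falling piece comes from extended modes of the 5-D transfer matrix and the power-law piece from localized modes associated with rough gauge fields; both coefficients grow as the gauge field gets rougher.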
9. QCD Machines?
- The regularity of lattice QCD makes parallelization easy and reduces network cost.
- Vanishing I/O and small memory needs permit an economical configuration.
- The simple, fundamental character of the theory and the stability of the numerical formulation encourage a hardware effort.
10. QCD Machines!
[Figure: timeline of QCD machines, 1985, 1987, 1989, 1998, 2005, with performance figures 16 Mflops, 64 Mflops, 3.2 Gflops, and 64 Gflops]
11. Columbia QCD Machines
- 16-node, 0.256 Gflops (1985)
- 256-node, 16 Gflops (1989)
- 8192-node, 0.4 Tflops QCDSP machine (1998)
12. QCDOC Goals
- A massively parallel machine capable of strong scaling: using many nodes on a small problem.
- Large inter-node bandwidth.
- Small communications latency.
- $1 per sustained Mflops cost/performance.
- Low-power, easily maintained, modular design.
13. QCDOC Collaboration
- UKQCD (PPARC)
  - Peter Boyle
  - Mike Clark
  - Balint Joo
- RBRC (RIKEN)
  - Shigemi Ohta
  - Tilo Wettig
- IBM
  - Dong Chen
  - Alan Gara
  - Design groups:
    - Yorktown Heights, NY
    - Rochester, MN
    - Raleigh, NC
- Columbia (DOE)
  - Norman Christ
  - Saul Cohen
  - Calin Cristian
  - Zhihua Dong
  - Changhoan Kim
  - Ludmila Levkova
  - Sam Li
  - Xiaodong Liao
  - Huey-Wen Lin
  - Guofeng Liu
  - Meifeng Lin
  - Robert Mawhinney
  - Azusa Yamaguchi
- BNL (SciDAC)
  - Robert Bennett
  - Chulwoo Jung
  - Konstantin Petrov
  - Stratos Efstathiadis
14. QCDOC Architecture
- IBM-fabricated, single-chip node: 50 million transistors, 5 Watt, 1.3 cm × 1.3 cm.
- Processor:
  - PowerPC 32-bit RISC core.
  - 64-bit, 1 Gflops floating-point unit.
- Memory/node: 4 Mbyte on-chip, plus up to 2 Gbyte of external DIMM.
- Communications network:
  - 6-dimensional, supporting lower-dimensional partitions.
  - Global sum/broadcast functionality.
  - Multiple DMA engines; minimal processor overhead.
- Ethernet connection to each node for booting, I/O, and host control.
- 7-8 Watt per node; 15 in³ per node.
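A quick consistency check using only numbers quoted in this talk: with 1 Gflops per node, a 12,288-node machine has

    12{,}288 \times 1\ \text{Gflops} \approx 12.3\ \text{Tflops (peak)},

so the 3-5 Tflops sustained figures quoted on the project-status slide correspond to roughly 25-40% of peak.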
15. Network Architecture
[Diagram legend:]
- Red boxes are nodes.
- Blue boxes are mother boards.
- Red lines are communications links.
- Green lines are Ethernet connections.
- Green boxes are Ethernet switches.
- Pink boxes are host CPU processors.
16. Mesh Geometry
- N0 · N1 · N2 mother boards are wired as an N0 × N1 × N2 torus.
- With 2⁶ = 64 nodes on a mother board, the resulting machine is a 2N0 × 2N1 × 2N2 × 2 × 2 × 2, six-dimensional torus.
- The extra dimensions are used to create lower-dimensional tori.
- qpartition_remap -X01 -Y23 -Z4 -T5 maps the six machine dimensions (0-5) into four physical dimensions automatically.
[Figure: 4 × 4 × 2 (machine) → 32 (physics); see the folding sketch below]
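As a rough illustration of the idea behind such a remap (a toy sketch only; fold2 is a hypothetical helper, and the real mapping is performed by qpartition_remap and the QCDOC system software), a boustrophedon fold combines two periodic machine axes into one longer logical ring while keeping logical nearest neighbours one machine hop apart:

    #include <stdio.h>

    /* Toy sketch, not QCDOC system code: fold two periodic machine
     * dimensions of sizes s0 and s1 into one logical ring of length
     * s0*s1 using a boustrophedon ("snake") path.  Consecutive logical
     * sites are always nearest neighbours on the 2-D machine torus,
     * and for even s1 the ring also closes with a single wraparound
     * hop, so nearest-neighbour physics traffic needs no multi-hop
     * routing. */
    static int fold2(int c0, int c1, int s0)
    {
        return c1 * s0 + ((c1 % 2 == 0) ? c0 : s0 - 1 - c0);
    }

    int main(void)
    {
        const int s0 = 4, s1 = 4;  /* two machine axes of the 4 x 4 x 2 example */
        for (int c1 = 0; c1 < s1; c1++)
            for (int c0 = 0; c0 < s0; c0++)
                printf("machine (%d,%d) -> logical X = %2d\n",
                       c0, c1, fold2(c0, c1, s0));
        return 0;
    }

Applying such a fold twice (4 × 4 → 16, then 16 × 2 → 32) turns the 4 × 4 × 2 machine partition into the 32-site physics ring of the example above.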
17. QCDOC Chip
- 50 million transistors, 0.18 micron process, 1.3 × 1.3 cm die, 5 Watt.
18. Software Environment
- Lean kernel on each node:
  - Protected kernel mode and address space.
  - RPC support for host access.
  - NFS access to NAS disks (/pfs).
  - Normal Unix services, including stdout and stderr.
- Threaded host daemon:
  - Efficient performance on an 8-processor SMP host.
  - User shell (qsh) with extended commands.
  - Host file system (/host).
  - Simple remapping of the 6-D machine to a (6-n)-D torus.
- Programming environment:
  - POSIX-compatible, open-source libc.
  - gcc and xlc compilers.
- SciDAC standards:
  - Level 1: QMP message-passing protocol (see the sketch below).
  - Level 2: parallelized linear algebra, QDP/QDP++.
  - Level 3: efficient inverters:
    - Wilson/clover
    - Domain wall fermions
    - Asqtad
    - p4 (underway)
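As a flavor of the Level-1 layer, here is a minimal sketch of a persistent nearest-neighbour exchange written against the SciDAC QMP C interface (the function names follow the published QMP API, but treat the details as approximate and check them against your qmp.h; the 4-D layout and buffer sizes are invented for the example):

    #include <qmp.h>

    int main(int argc, char **argv)
    {
        QMP_thread_level_t provided;
        QMP_init_msg_passing(&argc, &argv, QMP_THREAD_SINGLE, &provided);

        /* Declare a 4-D logical machine; QMP maps it onto the physical torus. */
        int dims[4] = {4, 4, 4, 8};            /* hypothetical 512-node layout */
        QMP_declare_logical_topology(dims, 4);

        /* Register communication buffers and wire them to the +t/-t neighbours. */
        static double sendbuf[1024], recvbuf[1024];
        QMP_msgmem_t sm = QMP_declare_msgmem(sendbuf, sizeof sendbuf);
        QMP_msgmem_t rm = QMP_declare_msgmem(recvbuf, sizeof recvbuf);
        QMP_msghandle_t sh = QMP_declare_send_relative(sm, 3, +1, 0);
        QMP_msghandle_t rh = QMP_declare_receive_relative(rm, 3, -1, 0);

        /* Start both transfers (DMA-driven on QCDOC) and wait for completion;
         * the handles are persistent and would normally be reused every sweep. */
        QMP_start(sh);  QMP_start(rh);
        QMP_wait(sh);   QMP_wait(rh);

        QMP_free_msghandle(sh);  QMP_free_msghandle(rh);
        QMP_free_msgmem(sm);     QMP_free_msgmem(rm);
        QMP_finalize_msg_passing();
        return 0;
    }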
19. Daughter board (2 nodes)
20. BNL-constructed test jig
21. Mother board (64 nodes)
22. Edge view of mother board
23. Single mother board test jig
24. 512-Node Machine
25. First 4 racks installed at Columbia
26. UKQCD Machine (12,288 nodes / 10 Tflops)
27. SciDAC Role
- SciDAC supports a community-wide software effort:
  - Postdocs at universities.
  - New staff at national labs.
  - Some management support.
  - Software coordinating committee.
- Software structure defined and much code written:
  - Level 3: high-performance inverters, tailored for QCDOC and for clusters.
  - Level 2: parallel linear algebra routines needed for LGT.
  - Level 1:
    - Single-node, optimized linear algebra routines.
    - QMP message-passing protocol (MPI-like):
      - Supports efficient nearest-neighbor transfers.
      - Includes efficient use of the QCDOC hardware.
28. SciDAC Software Project
[Organization chart: software coordinating committee; UK: Peter Boyle, Balint Joo (Mike Clark)]
29. SciDAC Pay-Off
- Funding for a common effort encouraged unprecedented collaboration.
- Solid software preparation permitted a compelling case to be made to HEP/NP for significant program funding.
- New multi-year DOE program support:
  - 5 Tflops QCDOC installed at BNL.
  - Continuing multi-Teraflops investment in clusters, starting FY06.
- U.S. LGT community resources increased 10x.
- Major science advances expected in the next 1-2 years.
30. Brookhaven Installation
- DOE (left) and RBRC (right) 12K-node QCDOC machines.
31. Project Status
- UKQCD: 13,312 nodes, $5.2M, 3-5 Tflops sustained.
  - Installed in Edinburgh 12/04.
  - Running production at 400 MHz.
- RBRC: 12,288 nodes, $5M, 3-5 Tflops sustained.
  - Installed at BNL 2/05.
  - Running production at 400 MHz.
- DOE: 12,288 nodes, $5.1M, 3-5 Tflops sustained.
  - Installed at BNL 4/05.
  - 1/3 being debugged; 2/3 performing physics tests.
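Taking these figures at face value, the $1 per sustained Mflops goal of slide 12 is roughly met; for example, for the RBRC machine at the midpoint of its quoted range,

    \frac{\$5.0\text{M}}{4 \times 10^{6}\ \text{sustained Mflops}} \approx \$1.25\ \text{per Mflops}.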
32. Asqtad Performance
33. Application Performance (double precision)
[Performance plots: 1024-node machine; 4096-node machine (UKQCD)]
34. QCDOC First Results
- Given the difficulty of QCD and our ambitious goals, important first results will require 6 months to 1 year.
- Topics now being pursued:
  - QCD thermodynamics: study of the quark-gluon plasma (Bielefeld/Brookhaven/Columbia/RBRC).
  - Dynamical, 2+1 flavor, staggered fermions (Asqtad) (MILC and UKQCD collaborations).
  - Dynamical, 2+1 flavor, domain wall fermions:
    - JLab/UKQCD algorithm development.
    - RBC/UKQCD large-scale simulation.
- The resulting Monte Carlo samples will be used for many cutting-edge projects.
35. RBC Collaboration
- Columbia
  - Michael Cheng
  - Norman Christ
  - Saul Cohen
  - Changhoan Kim (Southampton)
  - Ludmila Levkova (Indiana)
  - Meifeng Lin
  - Huey-Wen Lin
  - Oleg Loktik
  - Robert Mawhinney
  - Samuel Shu
  - Azusa Yamaguchi (Glasgow)
- RBRC
  - Yasumichi Aoki (Wuppertal)
  - Tom Blum
  - Chris Dawson
  - Taku Izubuchi (Kanazawa)
  - Yukio Nemoto
  - Jun-Ichi Noaki (Southampton)
  - Kostas Orginos (MIT)
  - Norikazu Yamada
  - Takeshi Yamazaki
- BNL
  - Federico Berruto
  - Michael Creutz
  - Jack Laiho (Fermilab)
  - Peter Petreczky
  - Konstantin Petrov
  - Sasa Prelovsek
  - Amarjit Soni
36. Edinburgh/UKQCD
- Edinburgh
  - David Antonio
  - Kenneth Bowler
  - Peter Boyle
  - Michael Clark
  - Balint Joo
  - Anthony Kennedy
  - Richard Kenway
  - Christopher Maynard
  - Robert Tweedie
- Glasgow
  - Azusa Yamaguchi
- Southampton
  - Changhoan Kim
  - Jun-Ichi Noaki
37. Simulations with Dynamical DWF (RBC)
- Improved algorithms give a 2-4x speed-up.
- 2002-2004 on QCDSP: algorithm development and interesting physics.
[Figure: evolution of the topological charge; 2-flavor, 16³ × 32, Ls = 12, 1/a ≈ 1.7 GeV]
38. B_K Results
39. Large DWF Simulations
- The first parts of QCDOC were used to explore:
  - the lattice spacing,
  - the action,
  - the quark mass.
- A joint RBC/UKQCD effort.
- Extensive initial studies performed.
40. Exploratory Runs this Spring (preliminary): 16³ × 32, Ls = 8
41. Large DWF Simulations
- These studies determined the initial parameter values:
  - Iwasaki gauge action
  - m_strange · a = 0.04
  - m_ud · a = 0.01, 0.02, and 0.03
  - β = 2.13 → 1/a ≈ 1.8 GeV (see the scale-setting arithmetic below)
- These three runs, on large 24³ × 64 lattices, are now underway at BNL and Edinburgh.
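For orientation (a standard unit conversion, not stated on the slide): with 1/a ≈ 1.8 GeV and ħc ≈ 0.197 GeV·fm,

    a \approx \frac{0.197\ \text{GeV fm}}{1.8\ \text{GeV}} \approx 0.11\ \text{fm},
    \qquad L = 24\,a \approx 2.6\ \text{fm},

so each 24³ × 64 lattice corresponds to a spatial box of roughly (2.6 fm)³.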
42. First 40 Trajectories: 24³ × 64, Ls = 16
[Figure: evolution of the gauge action; the upper graph is from Edinburgh, the lower from Brookhaven.]
43. Outlook
- The new QCDOC machines offer >10x capability.
- SciDAC software support:
  - Convenient tools boost efficiency.
  - Application-specific communications interface.
  - High-level staff to support and evolve the software base.
- Close UK/US collaboration.
- Expect important results with a major impact on high energy and nuclear physics.