1
THE EARTH SIMULATOR SYSTEM
  • By Shinichi HABATA, Mitsuo YOKOKAWA, and
    Shigemune KITAWAKI
  • Presented by Anisha Thonour

2
Extracted from the government website
A high-end supercomputer (the Earth Simulator) is
just like an alien with a very big head (brain)
but small arms and legs.
To make the most of its CPU power, thousands of
arms and legs are necessary.
3
Definitions
  • Supercomputer
  • A supercomputer is a computer that leads the
    world in terms of processing capacity,
    particularly speed of calculation, at the time of
    its introduction.
  • "Cost is no object with advanced technologies."
    - Dr. Pfeiffer
  • Parallel Processing
  • Processing in which multiple processors work on
    a single application simultaneously (a minimal
    sketch follows this list).
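A minimal sketch of this idea (generic Python, not
ES-specific), with four workers sharing one
computation:

  from multiprocessing import Pool

  def partial_sum(chunk):
      # each processor works simultaneously on its own chunk
      return sum(x * x for x in chunk)

  if __name__ == "__main__":
      data = list(range(1_000_000))
      chunks = [data[i::4] for i in range(4)]  # split one application 4 ways
      with Pool(processes=4) as pool:
          total = sum(pool.map(partial_sum, chunks))
      print(total)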

4
Cross-sectional View of the Earth Simulator
Building
5
Topics to be introduced
  • Introduction
  • System Overview
  • Processor Node
  • Interconnection Network
  • Performance
  • Conclusion

6
Introduction
  • Global change prediction using computer
    simulation
  • About 1000 times faster than the supercomputers
    commonly used for climate research when the
    project began
  • Developed from 1997 to February 2002
  • 87.5% of peak performance (35.86 TFLOPS) on the
    LINPACK benchmark
  • 64.9% of peak performance (26.58 TFLOPS) on a
    global atmospheric circulation model with the
    spectral method

7
System Overview
  • Parallel vector supercomputer
  • 640 processor nodes plus an interconnection
    network
  • Each processor node holds 8 arithmetic
    processors and a main memory system
  • Peak performance 40 TFLOPS (all 640 nodes; a
    quick arithmetic check follows this list)
  • Achieved performance 35.86 TFLOPS (LINPACK)
  • Interconnection network 640 x 640 non-blocking
    crossbar switch
  • Bandwidth 12.3 GB/s between any two nodes
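These figures are mutually consistent; a quick
arithmetic check in Python, using the per-AP peak
from the Processor Node slide:

  nodes         = 640
  aps_per_node  = 8
  gflops_per_ap = 8            # peak of one arithmetic processor

  peak_tflops = nodes * aps_per_node * gflops_per_ap / 1000
  print(peak_tflops)           # 40.96, quoted as the "40 TFLOPS" peak

  linpack = 35.86
  print(round(100 * linpack / peak_tflops, 1))  # 87.5 (% of peak, as on slide 6)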

8
System Overview ctd.
9
System Overview ctd..
  • 1 cluster consists of 16 processor nodes, a
    cluster control station, an I/O control station
    and system disks
  • The 640 nodes are divided into 40 clusters
  • 2 types of clusters: S cluster (1), L
    cluster (39)
  • S cluster: 2 nodes are used, one for interactive
    processing and the other for small-size batch jobs
  • User disks: store user files
  • Mass storage system: cartridge tape library
    system

10
System Overview ctd...
  • The super-cluster control station manages all 40
    clusters and provides a single-system-image
    operational environment
  • High-performance, high-efficiency architectural
    features
  • Vector processor
  • Shared memory
  • High-bandwidth, non-blocking crossbar
    interconnection network
  • Parallelization for high sustained performance
    (a hybrid sketch follows this list)
  • Vector processing on a processor
  • Parallel processing with shared memory within a
    node
  • Parallel processing among distributed nodes via
    the interconnection network
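None of this is ES code, but the three levels can
be sketched in Python: numpy whole-array calls
stand in for vector instructions, threads stand in
for the 8 APs sharing a node's memory, and a plain
loop stands in for message passing among
distributed nodes:

  import numpy as np
  from concurrent.futures import ThreadPoolExecutor

  def ap_kernel(block):
      # level 1: vector processing -- one call operates on a whole
      # array, as one vector instruction operates on a vector register
      return np.sqrt(block) + 1.0

  def node_compute(array, aps=8):
      # level 2: shared memory within a node -- 8 "APs" (threads)
      # work on slices of the same in-memory array
      blocks = np.array_split(array, aps)
      with ThreadPoolExecutor(max_workers=aps) as pool:
          return np.concatenate(list(pool.map(ap_kernel, blocks)))

  # level 3: distributed nodes -- each node owns a partition and would
  # exchange data over the interconnection network (e.g. via MPI);
  # modeled here as a simple loop over per-node partitions
  partitions = [np.arange(1000.0) + 1000.0 * i for i in range(4)]
  results = [node_compute(p) for p in partitions]
  print(sum(r.sum() for r in results))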

11
Processor Node
  • Each PN consists of 8 APs, a main memory system,
    a remote-access control unit (RCU) and an I/O
    processor.
  • Each arithmetic processor delivers up to 8
    GFLOPS, giving a 64 GFLOPS peak per node.
  • Each AP uses a high-efficiency heat sink based
    on heat pipes.
  • High-speed main memory devices reduce the
    memory access latency.
  • The paradigms provided within a processor node
    are
  • Vector processing on a processor.
  • Parallel processing with shared memory.

12
Processor Node Configuration
13
(No Transcript)
14
Interconnection Network
  • 640 x 640 non-blocking crossbar switch
  • Byte-slicing technique (sketched after this
    list)
  • 1 control unit and 128 data switch units
  • 320 PN cabinets and 65 IN cabinets
  • Each PN cabinet holds 2 processor nodes; the 65
    IN cabinets house the interconnection network.
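The slides do not detail the slicing itself;
purely as an illustration of the idea, assuming
each independent switch plane carries every n-th
byte of a transfer:

  def slice_bytes(word: bytes, planes: int):
      # distribute a message byte-by-byte across independent planes
      return [word[i::planes] for i in range(planes)]

  def merge_bytes(slices):
      # receiver reassembles the original byte order
      planes = len(slices)
      out = bytearray(sum(len(s) for s in slices))
      for i, s in enumerate(slices):
          out[i::planes] = s
      return bytes(out)

  msg = bytes(range(16))
  assert merge_bytes(slice_bytes(msg, 4)) == msg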

15
(No Transcript)
16
Interconnection Network Wiring
17
Inter-node communication mechanism
  • Node A requests the control unit to reserve a
    data path from node A to node B; the control
    unit reserves the data path, then replies to
    node A.
  • Node A begins the data transfer to node B.
  • Node B receives all the data, then sends a data
    transfer completion code to node A (a toy model
    of this handshake follows this list).
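A minimal toy model of the three-step handshake
(the ControlUnit class, mailboxes, and the
retry-on-busy policy are illustrative assumptions,
not ES terminology):

  class ControlUnit:
      def __init__(self):
          self.busy = set()            # destinations with reserved paths

      def reserve(self, dst):
          # step 1: reserve a crossbar data path to dst, then reply
          if dst in self.busy:
              return False
          self.busy.add(dst)
          return True

      def release(self, dst):
          self.busy.discard(dst)

  def send(cu, mailboxes, src, dst, data):
      while not cu.reserve(dst):       # retry until the path is granted
          pass
      mailboxes[dst].extend(data)      # step 2: node A transfers the data
      cu.release(dst)                  # step 3: completion frees the path;
      return "transfer complete"       #         node B acknowledges node A

  cu, mailboxes = ControlUnit(), {1: []}
  print(send(cu, mailboxes, src=0, dst=1, data=b"payload"))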

18
Inter-node interface with ECC codes
19
Inter-node interface with ECC codes
  • To cope with the error occurrence rate, ECC
    codes are added to the transferred data.
  • A receiver node detects intermittent inter-node
    communication failures by checking the ECC
    codes, and the erroneous byte can almost always
    be corrected by the RCU within the receiver
    node.
  • ECC is also used to recover from inter-node
    communication failures caused by a data switch
    unit malfunction.
  • Correction continues until the faulty switch
    unit is repaired (an illustrative sketch follows
    this list).
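The slides do not specify which ECC the links use;
purely as an illustration of single-error
correction, here is a classic Hamming(12,8) code
that corrects one flipped bit per byte:

  DATA_POS   = [3, 5, 6, 7, 9, 10, 11, 12]  # codeword positions for data bits
  PARITY_POS = [1, 2, 4, 8]                 # power-of-two parity positions

  def encode(byte):
      code = {pos: (byte >> i) & 1 for i, pos in enumerate(DATA_POS)}
      for p in PARITY_POS:
          # even parity over the data positions this parity bit covers
          code[p] = sum(code[pos] for pos in DATA_POS if pos & p) % 2
      return code

  def decode(code):
      # XOR of positions of all set bits gives the error position (0 = none)
      syndrome = 0
      for pos, bit in code.items():
          if bit:
              syndrome ^= pos
      if syndrome:
          code[syndrome] ^= 1            # correct the single flipped bit
      byte = 0
      for i, pos in enumerate(DATA_POS):
          byte |= code[pos] << i
      return byte, syndrome

  cw = encode(0xA7)
  cw[6] ^= 1                             # inject one fault, as a bad switch might
  data, err = decode(cw)
  assert data == 0xA7 and err == 6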

20
Barrier Synchronization mechanism using GBC
21
Barrier synchronization mechanism using GBC
  • GBC - global barrier counter
  • GBF - global barrier flag
  • Barrier synchronization mechanism (modeled in
    the sketch after this list)
  • The master node sets the number of nodes used by
    the parallel program into the GBC within the
    IN's control unit
  • The control unit resets all GBFs of the nodes
    used by the program
  • Each node, when its task completes, decrements
    the GBC within the control unit, then repeatedly
    checks its GBF until the GBF is asserted
  • When GBC = 0, the control unit asserts all GBFs
    of the nodes used by the program
  • All the nodes begin to process their next tasks.
  • The barrier synchronization time is constantly
    less than 3.5 µs
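In the real machine the counter and flags are
hardware in the IN's control unit; as a software
analogue only, the same counter-and-flag protocol
in Python threads:

  import threading

  class GBCBarrier:
      def __init__(self, num_nodes):
          self.gbc  = num_nodes            # master loads the node count into GBC
          self.gbf  = [False] * num_nodes  # control unit resets every GBF
          self.lock = threading.Lock()

      def task_done(self, node_id):
          with self.lock:
              self.gbc -= 1                # node decrements GBC on completion
              if self.gbc == 0:            # GBC = 0: assert all GBFs
                  self.gbf = [True] * len(self.gbf)
          while not self.gbf[node_id]:     # node spins on its own GBF
              pass

  barrier = GBCBarrier(4)

  def worker(i):
      # ... the node's share of the parallel task would run here ...
      barrier.task_done(i)                 # then synchronize at the barrier

  threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
  for t in threads: t.start()
  for t in threads: t.join()
  print("all nodes passed the barrier")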

22
Bird's-eye View of the Earth Simulator System
23
(No Transcript)
24
Performance
  • Using the GBC feature, the MPI_Barrier
    synchronization time is constantly less than
    3.5 µs.
  • The software barrier synchronization time grows
    in proportion to the number of nodes.

25
Performance
  • The interconnection network is a single-stage
    network, so this performance is always achieved
    for every two-node communication.

26
Performance
  • The ratio of sustained to peak performance is
    more than 85%.
  • Performance is proportional to the number of
    nodes.

27
Conclusion
  • High-performance, high-efficiency architectural
    features
  • Vector processor
  • Shared memory
  • High-bandwidth, non-blocking crossbar
    interconnection network
  • Parallelization for high sustained performance
  • Vector processing on a processor
  • Parallel processing with shared memory within a
    node
  • Parallel processing among distributed nodes via
    the interconnection network
  • 87.5% of peak performance (35.86 TFLOPS) on the
    LINPACK benchmark
  • 64.9% of peak performance (26.58 TFLOPS) on a
    global atmospheric circulation model with the
    spectral method

28
Applications-Solid Earth Simulation Group
  • We are developing new algorithms for
    geophysical simulations, as well as new grid
    systems for spherical geometry.

29
Solid Earth Simulation Group
30
To understand the mechanism of variability on time
scales from a few days to decades, and to study
predictability in the atmosphere.
31
To study the effects of meso-scale phenomena on
the ocean general circulation and the material
transport.
32
To understand the mechanism of the variability
and to study the predictability in the coupled
atmosphere-ocean system.
33
References
  • http://www.thocp.net/hardware/nec_ess.htm
  • http://www.es.jamstec.go.jp/esc/eng/Hardware/in.html

34
Thank You