About This Presentation
Title:

IBM's

Description:

4-CPU 'quads' (AKA 'NUMA nodes') Intel Pentium Pro, Xeon, Tanner, and ... Differences due to per-quad memory and Intel bus. Microcoded cache-coherency protocol ...

Slides: 14
Provided by: paulmc75
Tags: ibm | quad

Transcript and Presenter's Notes

1
IBM's Sequent's Experience with NUMA
Paul E. McKenney, Hubertus Franke
NUMA on Sequent's DYNIX/ptx
Technology Trends
NUMA-on-Linux Architectural Decisions
2
Introduction
  • Technology Trends
  • Influences from system-architecture evolution
  • NUMA on PTX
  • One data point
  • NUMA on AIX
  • One more data point
  • Next Steps...

3
Sequent's IA32 NUMA HW
  • "Flat NUMA" Hardware
  • 4-CPU "quads" (AKA "NUMA nodes")
  • Intel Pentium Pro, Xeon, Tanner, and Cascades
    CPUs
  • Per-quad memory (up to 8GB/quad)
  • Per-quad PCI I/O
  • Interconnect Based on SCI Ring
  • Differences due to per-quad memory and Intel bus
  • Microcoded cache-coherency protocol

4
Sequent HW Diagram
[Diagram: two quads joined by SCI. Labels for each quad: SCI I/F, I/O, Mem, and four CPUs.]
5
NUMA Support on ptx
  • Topology discovery: tmp_ctl()
  • TMP_NENG, TMP_NQUAD, TMP_ENGTOQUAD,
    TMP_QUADNEXTENG, TMP_QUADTOENG
  • Resource Specification
  • quademptyset(), etc. similar to sigmask
    manipulation
  • rsrcdescr_t (specify relative to other object)
  • Process Placement: qfork(), qexec()
  • Memory Placement: mmapq(), shmgetq()
  • Process Migration: attach_quad()
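
The topology queries and sigmask-style quad sets listed above can be illustrated with a small toy model. This is only a sketch, not the DYNIX/ptx interface: the class and function names are hypothetical, and it assumes engines (CPUs) are numbered consecutively within each quad, which the slides do not state.

```python
from dataclasses import dataclass
from typing import List

CPUS_PER_QUAD = 4  # from the slides: 4-CPU "quads"


@dataclass(frozen=True)
class Topology:
    """Toy flat-NUMA topology (hypothetical; not the ptx data structures)."""
    nquad: int

    @property
    def neng(self) -> int:
        # Rough analogue of a TMP_NENG-style query: total engine count.
        return self.nquad * CPUS_PER_QUAD

    def eng_to_quad(self, eng: int) -> int:
        # Rough analogue of a TMP_ENGTOQUAD-style query.
        # Assumes consecutive engine numbering within each quad.
        if not 0 <= eng < self.neng:
            raise ValueError(f"no such engine: {eng}")
        return eng // CPUS_PER_QUAD

    def quad_to_engs(self, quad: int) -> List[int]:
        # All engines that belong to one quad.
        if not 0 <= quad < self.nquad:
            raise ValueError(f"no such quad: {quad}")
        first = quad * CPUS_PER_QUAD
        return list(range(first, first + CPUS_PER_QUAD))


# Quad sets as bitmasks, in the spirit of sigemptyset()/sigaddset().
def quad_empty_set() -> int:
    return 0


def quad_add_set(qs: int, quad: int) -> int:
    return qs | (1 << quad)


def quad_del_set(qs: int, quad: int) -> int:
    return qs & ~(1 << quad)


def quad_is_member(qs: int, quad: int) -> bool:
    return bool(qs & (1 << quad))
```

A caller might build a quad set containing the quad of the CPU it is running on and pass that set to a placement call; the placement call itself is outside this sketch.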

6
Hierarchical NUMA
[Diagram: hierarchical NUMA. Surviving labels: Memory, Memory.]
7
AIX NUMA (1)
  • Topology Discovery
  • rs_getpartition(), rs_numrads(), rs_addlocal(),
    rs_findcpu(), rs_getinfo(), rs_getlatency(),
    rs_getinfo_detailed()
  • Resource Set Manipulation
  • rs_alloc(), rs_free(), rs_op(), rs_test(),
    rs_registername(), rs_getnameattr(),
    rs_setnameattr(), rs_discardname(),
    rs_getnamedrset()
  • Attachment Services
  • ra_fork(), ra_exec(), ra_shmget(), ra_mmap(),
    ra_attach(), ra_detach_all(), ra_get_attachinfo(),
    ra_free_attachinfo()
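
The three service groups above (resource sets, named sets, attachment) fit together roughly as follows. This is a minimal sketch of the idea only: every name here is hypothetical and none of it reproduces the AIX rs_*/ra_* signatures, which the slides do not give.

```python
from typing import Dict, FrozenSet, Iterable, Set

# A resource set is modeled as an immutable set of resource names,
# e.g. {"cpu0", "cpu1", "mem0"}. This representation is an assumption.
ResourceSet = FrozenSet[str]

_named_sets: Dict[str, ResourceSet] = {}   # name -> registered resource set
_attachments: Dict[int, ResourceSet] = {}  # pid  -> attached resource set


def register_name(name: str, rset: Iterable[str]) -> None:
    """Store a resource set under a name so other code can look it up."""
    _named_sets[name] = frozenset(rset)


def get_named_rset(name: str) -> ResourceSet:
    """Fetch a previously registered resource set."""
    return _named_sets[name]


def attach(pid: int, rset: Iterable[str]) -> None:
    """Restrict a process to the resources in rset."""
    _attachments[pid] = frozenset(rset)


def detach_all(pid: int) -> None:
    """Remove any restriction on the process."""
    _attachments.pop(pid, None)


def allowed_cpus(pid: int, all_cpus: Iterable[str]) -> Set[str]:
    """CPUs a toy scheduler may use for pid, given its attachment."""
    rset = _attachments.get(pid)
    if rset is None:
        return set(all_cpus)
    return {cpu for cpu in all_cpus if cpu in rset}
```

The design point is that placement is expressed against an abstract set of resources rather than against hardware-specific node numbers.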

8
AIX NUMA (2)
  • Location Services
  • rs_getlocation(), rs_devlocation(), rs_radid()
  • Threads Services
  • pthread_attr_setattach(), pthread_attr_getattach()
  • Memory Services
  • ra_memalloc()
  • Kernel Services
  • ra_thread_create()

9
Technology Trends
  • Latency ratios
  • Need to address low-end performance
  • Critical to Linux's installed base
  • Critical to Linux's success on the desktop/laptop
  • Need to address high-end scaling
  • Critical to Linux's success in the data center
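
To see why latency ratios matter, it can help to compute the average memory latency a workload sees when some fraction of its accesses stay on the local node. The function below is a sketch of that weighted average; the function name and the example numbers are illustrative assumptions, not measurements from the presentation.

```python
def effective_latency(local_ns: float, ratio: float, local_fraction: float) -> float:
    """Average memory latency for a given mix of local and remote accesses.

    local_ns       -- latency of a local access (illustrative units: ns)
    ratio          -- remote latency divided by local latency
    local_fraction -- fraction of accesses satisfied locally, in [0, 1]
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_ns = local_ns * ratio
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns
```

With a made-up local latency of 100 ns and a ratio of 10, keeping 90% of accesses local still gives an average near 190 ns, almost double the local latency. Small remote fractions dominate when the ratio is large, which is the motivation for placement APIs.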

10
Future Trends
  • CPU Speed Increasing Relative to Memory
  • Cache size increases
  • Cache-hierarchy depth increases
  • Shared caches become more prevalent
  • NUMA Latency Ratios Bounded, but Significant
  • Hierarchical NUMA Systems
  • Multiple CPUs on a die, multiple dies in a node
  • Memory not directly involved
  • But will have NUMA-like performance
    characteristics
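
The hierarchy described above (CPUs on a die, dies in a node, nodes in a system) can be sketched as a simple location tuple plus a function that reports the closest level two CPUs share. The type, field names, and level labels are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class CpuLocation(NamedTuple):
    """Position of a CPU in a three-level hierarchy (hypothetical model)."""
    node: int
    die: int
    cpu: int


def sharing_level(a: CpuLocation, b: CpuLocation) -> str:
    """Return the closest level of the hierarchy that two CPUs share."""
    if a == b:
        return "same cpu"
    if (a.node, a.die) == (b.node, b.die):
        return "same die"      # likely share on-die cache
    if a.node == b.node:
        return "same node"     # different dies, same local memory
    return "remote node"       # communication crosses the interconnect
```

A scheduler or allocator could use such a level to rank candidate CPUs, preferring "same die" over "same node" over "remote node", even when, as the slide notes, memory is not directly involved.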

11
Memory Latency Trends
[Chart: memory latency trends. Surviving label: Sequent systems.]
12
NUMA on Linux
  • Experience from Many NUMA Pioneers
  • Range of Hardware Architectures
  • Reports of Early NUMA Work in Linux
  • Architectural Trends

13
Linux Architectural Decisions
  • One NUMA API?
  • One NUMA Implementation?
  • Flat, Hierarchical, or Flat with ability to
    extend?
  • Targeted Memory-Latency Ratios?
  • Next Steps