Title: OSCAR SCM Architecture for Multigrain Parallel Processing
1 Panel: Software Challenges in Multi-Core Chip Era
Hironori Kasahara
Professor, Department of Computer Science
Director, Advanced Chip-Multiprocessor Research Institute
Waseda University
http://www.kasahara.cs.waseda.ac.jp
2 Prof. Gao's Questions (1/3)
- Q1: From a software angle, do you expect chip-level multi-core architectures to converge soon to one or two styles, as single-core microprocessors did historically (e.g., VLIW vs. superscalar)?
- If not, why not? If yes, what would those one or two styles be, from a software angle?
- Answer
- Yes. I think multi-core architecture will converge to SMP for small non-real-time systems, and to an OSCAR-like architecture, in which software and hardware cooperate, for real-time embedded systems. The OSCAR-like architecture has local, distributed shared, and centralized memories with a DMA controller.
3 MPCore™
ARM and NEC Collaboration
[Figure: MPCore block diagram. Labels: configurable between 1 and 4 symmetric CPUs; per-CPU Timer, Wdog, CPU interface, and IRQ; private FIQ lines; Interrupt Distributor with a configurable number of hardware interrupt lines; per-CPU aliased peripherals; Private Peripheral Bus; I and D 64-bit buses; coherence control bus; Snoop Control Unit (SCU); duplicated L1 tag; primary AXI R/W 64-bit bus; optional second AXI R/W 64-bit bus; L2 (L220).]
4 Fujitsu FR-1000 Multicore Processor
[Figure: FR-V multi-core processor. Labels: FR550 VLIW processor; Integer Operation Unit; Media Operation Unit; GR; FR; instruction slots Inst. 0 through Inst. 7; crossbar (FR1000); bus; Fast I/O Bus.]
- Memory bus: 64 bit x 2 ch / 266 MHz
- System bus: 64 bit / 178 MHz
5 CELL Processor Overview
- Power Processor Element (PPE)
  - Power core handles OS and control tasks
  - 2-way multi-threaded
- Synergistic Processor Element (SPE)
  - 8 SPEs offer high performance
  - Dual-issue RISC architecture
  - 128-bit SIMD (16-way)
  - 128 x 128-bit general registers
  - 256 KB local store
  - Dedicated DMA engines
[Figure: chip diagram. Labels: SPE; PPE; 512 KB; 32 KB + 32 KB.]
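The SPE figures above can be checked with a little arithmetic. The sketch below assumes that "16-way" refers to 8-bit elements packed into one 128-bit register; the slide does not say so explicitly. The function names simd_lanes and regfile_bytes are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

/* Number of SIMD lanes when a register of reg_bits is divided into
   elements of elem_bits. Hypothetical helper for illustration. */
unsigned simd_lanes(unsigned reg_bits, unsigned elem_bits) {
    assert(elem_bits > 0 && reg_bits % elem_bits == 0);
    return reg_bits / elem_bits;
}

/* Total size in bytes of a register file with num_regs registers
   of reg_bits each. Hypothetical helper for illustration. */
unsigned regfile_bytes(unsigned num_regs, unsigned reg_bits) {
    return num_regs * reg_bits / 8;
}
```

Under that assumption, simd_lanes(128, 8) gives the 16 lanes quoted on the slide, and regfile_bytes(128, 128) gives a 2 KB register file per SPE.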
6 OSCAR Multi-Core Architecture
7 Prof. Gao's Questions (2/3)
- Q2: Automatic compilation for parallel machines did not succeed in general, as past history has shown. What do you expect this time, for the multi-core revolution? Will it succeed? If yes, why do you think we may succeed this time? If not, what other software technology (if any) do you predict may have a chance of succeeding?
- Answer
- Yes, I think we will succeed this time, because we have continued compiler research for twenty years and have finally developed multigrain parallelization, local memory management, data transfer control, and frequency, voltage, and power-off control.
- I believe these long-running efforts and the real need for such a compiler will change the situation.
8 Prof. Gao's Questions (3/3)
- Q3: What is your favorite parallel programming model (if any)? Why?
- Do you believe that so-called general-purpose parallel programming models are the way to go for the new multi-core era? Why or why not?
- Answer
- My favorite model is OpenMP, because vendors already support it. We use only the Parallel, Section, Flush, and Critical directives, especially the section directives for coarse-grain task parallel processing. We will also add directives for OSCAR-type architectures with local, distributed shared, and on-chip and off-chip centralized shared memories, DMA controllers, and power control functions.
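As an illustration of the section directives mentioned in the answer, here is a minimal sketch of coarse-grain task parallelism in standard OpenMP for C. The two tasks (sum_array and max_array) and the wrapper run_coarse_tasks are hypothetical examples, not code from the OSCAR compiler, and none of the planned OSCAR-specific directives are shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coarse-grain task 1: sum of all elements. */
static long sum_array(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Hypothetical coarse-grain task 2: largest element (n must be > 0). */
static int max_array(const int *a, size_t n) {
    assert(n > 0);
    int m = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > m) m = a[i];
    return m;
}

/* Runs the two independent tasks as OpenMP sections. Each section may
   execute on a different thread; the implicit barrier at the end of the
   parallel region guarantees both results are ready before they are
   copied out. */
void run_coarse_tasks(const int *a, size_t n, long *sum_out, int *max_out) {
    long s = 0;
    int m = 0;
    #pragma omp parallel sections shared(s, m)
    {
        #pragma omp section
        { s = sum_array(a, n); }

        #pragma omp section
        { m = max_array(a, n); }
    }
    *sum_out = s;
    *max_out = m;
}
```

Each section writes a different variable, so no Critical or Flush directive is needed in this sketch. If the file is compiled without OpenMP support, the pragmas are ignored and the two tasks simply run one after the other with the same result.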
9 API and Parallelizing Compiler in METI/NEDO Advanced Multicore for Realtime Consumer Electronics Project
[Figure: compilation flow. Realtime consumer electronics application programs (image, secure audio streaming, etc.) are written as a sequential application program in a subset of the C language. The Waseda OSCAR Compiler, with an API to specify data assignment, data transfer, and power reduction control, translates the program into parallel code for each vendor, shown as scheduled tasks for Proc0, Proc1, and Proc2. Each vendor's backend compiler (API decoder and sequential compiler) produces machine code, giving executable code for each vendor chip: SH multicore, FR-V, (CELL). Schedule labels: T1, T2, T3, T4, T6, Stop, Slow; data transfer by DTC (DMAC).]
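The per-processor task lists in the figure come from a compile-time schedule. The sketch below is a simple greedy static assignment, not the OSCAR compiler's actual algorithm, which is not described on the slide. It assumes independent tasks with known costs and ignores task dependences, data transfer, and power control. The name schedule_static, the constant NUM_PROCS, and the cost values used in the usage note are hypothetical.

```c
#include <assert.h>

#define NUM_PROCS 3  /* hypothetical: three processors, as in Proc0..Proc2 */

/* Greedy static scheduling sketch: each task, taken in the given order,
   is assigned to the processor with the smallest accumulated load
   (lowest index wins ties).
   cost[t]  : estimated cost of task t, in arbitrary units (assumption)
   owner[t] : output, index of the processor chosen for task t
   Returns the largest per-processor load (the makespan). */
int schedule_static(const int *cost, int num_tasks, int *owner) {
    int load[NUM_PROCS] = {0};
    assert(num_tasks >= 0);
    for (int t = 0; t < num_tasks; t++) {
        int best = 0;
        for (int p = 1; p < NUM_PROCS; p++)
            if (load[p] < load[best]) best = p;
        owner[t] = best;
        load[best] += cost[t];
    }
    int makespan = 0;
    for (int p = 0; p < NUM_PROCS; p++)
        if (load[p] > makespan) makespan = load[p];
    return makespan;
}
```

For example, with six tasks of hypothetical costs {4, 3, 3, 2, 2, 1}, the function assigns them to processors 0, 1, 2, 1, 2, 0 and returns a makespan of 5. A real compiler would also have to respect dependences between tasks and account for data transfer time.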