Title: stxp
2DAC 2006: CAD Challenges for Leading-Edge Multimedia Designs
3NOMADIK: The challenge of low power, high performance and scalable multimedia acceleration
Alain Artieri - Patrick Blouet
STMicroelectronics
July 26, 2006
4Multimedia Computing Landscape
5The convergence paradigm
Personal Computer
New Mobile Multimedia Computing Architecture
Consumer Electronics
Mobile Phone
6Consumer versus Computer
- Consumer Products
- High quality of service
- Designed for worst case
- Highly parallel architecture
- Hardware accelerators
- Personal Computer
- Monolithic processor architecture
- High MHz for performance
- High power consumption
- Open OS
- Flexibility
- Rich set of standard interfaces for storage and
connectivity
- Open platform, multi OS
- Flexibility
- Rich set of standard interfaces for storage and
connectivity
New computing architecture must combine the best
of both worlds
7Cell Phones a Key Driver
8Competing Technical Constraints
Scalability
Low Power
Multimedia Performance
9Multimedia Performance Requirements
- Multiple video standards, encode and decode (MPEG4, H264, WMV, ...), up to HDTV format
- High resolution VGA screen and above in small form factor, output to HDTV with large screen
- Multi-megapixel camera, DSC-class image reconstruction chain and picture improvement
- Sophisticated audio use cases: combination of multiple codecs, sound effects, speech codecs, ...
- Advanced 3D graphics acceleration for gaming
- Consume and produce high bandwidth multimedia content
10Low Power
- A key system technology driver
- Of course a product feature
- Battery life time
- But helps product manufacturability
- Stacking in a power budget
- And product cost
- Low cost packaging
- No heat sink
11Nomadik Architecture Overview
12Application Processor Content
Host Processor
Multimedia Accelerator
Peripherals
Embedded Memory
Host processor peripherals, No differentiation
Multimedia Acceleration, differentiating factor
- The architecture design challenge is in Multimedia Acceleration (Audio, Video, Imaging, Graphics)
- This is where innovation is required and competitive advantage is built
13Nomadik Multimedia Acceleration Model
Figure: several DSP sub-systems on a common interconnect. Each sub-system combines a DSP (multiple DSP), tightly coupled HW (attached HW acceleration) and a DMA engine (data mover).
- Multiple DSP based sub-system
- Symmetrical DSPs (generic S/W component can run anywhere)
- Attached HW resources (dependence resolved at component manager level)
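The placement rule described for this model (generic components run on any DSP, components that depend on attached HW are resolved by the component manager) can be sketched roughly as follows. This is only an illustration: the class names, the sub-system names and the least-loaded tie-break are assumptions, not details of the Nomadik software.

```python
from dataclasses import dataclass


@dataclass
class DspSubsystem:
    """One DSP plus the hardware attached to it (all names are hypothetical)."""
    name: str
    attached_hw: frozenset
    load: int = 0  # number of components currently placed on this DSP


@dataclass
class Component:
    """A software component and the attached hardware it depends on, if any."""
    name: str
    needs_hw: frozenset = frozenset()


class ComponentManager:
    """Resolves hardware dependences when placing components on symmetrical DSPs."""

    def __init__(self, subsystems):
        self.subsystems = list(subsystems)

    def place(self, component):
        # A generic component (no hardware dependence) can run on any DSP;
        # a dependent one is restricted to DSPs that own the required hardware.
        candidates = [s for s in self.subsystems
                      if component.needs_hw <= s.attached_hw]
        if not candidates:
            raise LookupError(f"no DSP sub-system offers {sorted(component.needs_hw)}")
        target = min(candidates, key=lambda s: s.load)  # assumed policy: least loaded
        target.load += 1
        return target.name


manager = ComponentManager([
    DspSubsystem("video_dsp", frozenset({"video_codec_hw"})),
    DspSubsystem("imaging_dsp", frozenset({"image_recon_hw"})),
    DspSubsystem("audio_dsp", frozenset()),
])

placements = {
    c.name: manager.place(c)
    for c in [
        Component("h264_decoder", frozenset({"video_codec_hw"})),
        Component("sound_effects"),
        Component("mixer"),
    ]
}
print(placements)
```

Here the decoder is pinned to the sub-system that owns the video codec hardware, while the two generic components are spread over the remaining DSPs.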
14Multiple DSP approach benefits
- High computing performance
- Multiple non-interfering domains of intense activity, each having its own processor, DMA services and hardware accelerators for data intensive functions
- Hardware acceleration embedding standard functions (e.g. video codec, image reconstruction and improvement)
- Highest predictable performance through a careful bus and memory hierarchy design
- Low Power (target 100s of mW)
- Intrinsic low power sub-systems
- Fine grain power management at sub-system level
- Leakage management by switching sub-systems on and off
15Power management
- Combination of multiple techniques
- Dynamic power reduction
- Clock gating
- Voltage scaling (DVFS)
- Pulse-Width Modulation (PWM)
- Static power reduction
- Biasing
- Power On/Off switching (Power gating)
- A global system issue from power management
inside the OS down to silicon process (e.g. gate
leakage)
16DVFS Principle
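Figure: the Operating System Load Monitor (SW) maps CPU performance requirements to a CPU voltage through Voltage/Frequency Tables:
- 100% performance at 1.3V (reference)
- 85% performance at 1.2V: 28% energy saving
- 62% performance at 1.1V: 55% energy saving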
- Process Requirements
- Large voltage excursion
- Low leakage
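The savings quoted for DVFS are consistent with the usual first-order model in which dynamic power scales with V² × f. The sketch below assumes that model (the slides do not state it), reads the 100/85/62 figures as percentages of full performance, and assumes performance tracks clock frequency. The function name is illustrative.

```python
def dvfs_saving(voltage, perf, ref_voltage=1.3, ref_perf=1.0):
    """Fractional dynamic-power saving versus the reference operating point.

    Assumes P_dyn is proportional to V^2 * f and that performance tracks f.
    """
    relative_power = (voltage / ref_voltage) ** 2 * (perf / ref_perf)
    return 1.0 - relative_power


# Operating points from the DVFS slide: (CPU voltage, fraction of full performance).
for volts, perf in [(1.3, 1.00), (1.2, 0.85), (1.1, 0.62)]:
    print(f"{volts:.1f} V at {perf:.0%} performance: "
          f"{dvfs_saving(volts, perf):.1%} saving")
```

Under these assumptions the 1.2V and 1.1V points give savings of about 27.6% and 55.6%, in line with the 28 and 55 shown on the slide.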
17PWM Principle
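Figure: the Operating System Load Monitor (SW) maps CPU performance requirements to an active clock ratio through the Active clock ratio table, at a constant CPU voltage of 1.0V:
- 100% performance at 1.0V (reference)
- 85% performance at 1.0V: 15% energy saving
- 62% performance at 1.0V: 38% energy saving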
- Process Requirements
- Clock as fast as possible
- Source bias or switch off when clock is stopped
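With the voltage held at 1.0V, the PWM savings follow directly from the fraction of time the clock is running. A minimal sketch, assuming the 100/85/62 figures are percentages, that consumption while the clock is stopped is negligible (source bias or switch off), and that performance equals the active clock ratio. The function name is illustrative.

```python
def pwm_saving(active_clock_ratio):
    """Fractional dynamic-energy saving when the clock runs only part of the time.

    Assumes fixed voltage and frequency while active, and negligible
    consumption while the clock is stopped.
    """
    if not 0.0 <= active_clock_ratio <= 1.0:
        raise ValueError("active clock ratio must be between 0 and 1")
    return 1.0 - active_clock_ratio


# Active clock ratios from the PWM slide.
for ratio in (1.00, 0.85, 0.62):
    print(f"active ratio {ratio:.0%}: {pwm_saving(ratio):.0%} saving")
```

The saving is linear in the active ratio (15% at 85%, 38% at 62%), whereas DVFS also gains from the quadratic dependence on voltage, which is why the two techniques have different process requirements.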
18Multi-step PWM
- Power management state machine under SW control
- Source Bias for short clock stop period
- Power off with context save/restore for long
period
save
restore
Short stop (Source Bias reduced leakage)
Long stop (Power Off zero leakage)
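One way the software-controlled state machine could choose between the two stop modes is a break-even test on energy. This is a sketch under assumptions: the leakage and save/restore figures are hypothetical inputs, wake-up latency is ignored, and the slides do not describe the actual decision rule.

```python
from enum import Enum


class StopMode(Enum):
    SOURCE_BIAS = "short stop: source bias, reduced leakage"
    POWER_OFF = "long stop: power off, zero leakage, context save/restore"


def choose_stop_mode(stop_us, leak_bias_mw, save_restore_uj):
    """Pick the stop mode that costs less energy for a clock stop of stop_us.

    Source bias pays reduced leakage for the whole stop; power off pays a
    fixed context save/restore energy and no leakage. All figures are
    hypothetical inputs, not values from the slides.
    """
    bias_energy_uj = leak_bias_mw * stop_us / 1000.0  # mW * us = nJ; /1000 gives uJ
    if bias_energy_uj <= save_restore_uj:
        return StopMode.SOURCE_BIAS
    return StopMode.POWER_OFF


# With 0.5 mW of biased leakage and 2 uJ per save/restore, break-even is 4000 us.
print(choose_stop_mode(100, leak_bias_mw=0.5, save_restore_uj=2.0))
print(choose_stop_mode(50_000, leak_bias_mw=0.5, save_restore_uj=2.0))
```

A real policy would also need to account for the latency of restoring context, which constrains how long a stop must be before power off is acceptable.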
19Power management
- Power mode changes are managed by software
- Constraints and impact must be known by the software developer.
- Information initially needed only at design level is now flowing into the software space.
- Power awareness in the software world is coming from the design world through a better link between design tools and software development tools.
- Need for a power view of the application accessible to software developers.
20Software Architecture for Multimedia Acceleration
21Complex Multimedia Software Stack
User Interface
Operating System
Upward pervasion of design constraints
Multimedia Framework
Multimedia API
Media Network Server
SoC design perimeter
Execution Infrastructure
Codecs, Sensors, Presentation
Hardware
22Objectives
- A unified programming model for distributed computing
- One S/W component can run anywhere possible
- Dynamically configurable
- Run complex algorithms that require more than one DSP
- Enforce software architecture
- Modularity
- Component programming model
- Multimedia framework
- Comprehensive debug
- System level monitoring
- Component observable by construction (auto code instrumentation)
23Complex use case illustration
- 16 QCIF decode
- 1 Grab Viewfinder
- Graphics control on Host CPU
- SVGA display
- 100mW
24Architecture evolution
25SoC evolution across technology nodes
- Constant SoC Die Size
- Slow evolution of peripherals (area decrease)
- General purpose CPU sub-system complexity doubles at each node (constant area)
- Embedded memory capacity doubles at each node (constant area)
- Loosely coupled DSP sub-system complexity increases by 30% at each node (30% area decrease)
26Main trends
- Host CPU evolving toward multi-core architecture to meet the performance increase requirements
- HW acceleration mapped on reconfigurable arrays
- Performance close to dedicated HW in many areas
- Good fit with regular design constraints imposed by 45nm process and beyond
- Excellent structure for best optimized power management
- And FLEXIBILITY
27Reconfigurable Hardware (DSP fabric)
- Target signal processing and arithmetic intensive applications
- Reconfigurable array of simple DSP cores (CNode)
- Low power architecture
- Hierarchical clock gating
- Distributed leakage control (fine grain power gating)
- Programmable DMA engine
- Reconfigurable at run time, multi task
28Mapping Flow
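Figure: mapping flow. Behavioral code (a procedure with in, out and inout parameters and constants) is turned into a DFG, then partitioned and statically scheduled, producing a coarse grained configuration. The target shows nodes N0, N1 and N2 with input and output ports (N0_i/N0_o, N1_i/N1_o, N2_i/N2_o), grouped in clusters and connected through MUX levels 0, 1 and 2 between data in and data out.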
- ALUs execute a cyclic micro-sequence
- Data exchanges through hierarchical clustered interconnect
- Configuration step is sequence loading and interconnect programming
Figure: ILP software pipelining, shown as successive stages each with its own data in and data out.
29Mapping Flow
- 3D optimization problem (place/route/schedule)
- Traditional scheduling techniques for VLIW or clustered VLIW don't apply
- These solutions don't take into account the spatial dimension of the problem
- Traditional place and route used in FPGA doesn't apply either because it doesn't consider the time dimension
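To make the coupling between space and time concrete, here is a toy greedy mapper that assigns each operation of a small data-flow graph to a (node, cycle) slot, with a transfer latency that depends on where the producer was placed. Everything in it is an assumption made for illustration (the graph, the 4-node two-cluster fabric, the latency values, single-cycle operations, the greedy heuristic). It is not the mapping flow used for the CNode fabric, and routing conflicts are not modeled.

```python
from itertools import count

# Toy data-flow graph: op -> list of producer ops (hypothetical example).
DFG = {
    "a": [], "b": [],
    "mul": ["a", "b"],
    "add": ["mul", "a"],
    "out": ["add"],
}

# Toy fabric: 4 nodes in 2 clusters; crossing clusters costs extra cycles.
CLUSTER = {0: 0, 1: 0, 2: 1, 3: 1}


def hop_latency(src, dst):
    """Cycles to move a value (assumed: 0 same node, 1 same cluster, 2 across)."""
    if src == dst:
        return 0
    return 1 if CLUSTER[src] == CLUSTER[dst] else 2


def map_dfg(dfg):
    """Greedy joint placement and scheduling: returns op -> (node, cycle)."""
    slot = {}     # op -> (node, cycle)
    busy = set()  # occupied (node, cycle) pairs
    for op, preds in dfg.items():  # dict order is already topological here
        best = None
        for node in CLUSTER:
            # Earliest cycle at which every operand can have arrived at `node`
            # (each op takes one cycle, then the value travels to `node`).
            ready = max((slot[p][1] + 1 + hop_latency(slot[p][0], node)
                         for p in preds), default=0)
            cycle = next(c for c in count(ready) if (node, c) not in busy)
            if best is None or cycle < best[1]:
                best = (node, cycle)
        slot[op] = best
        busy.add(best)
    return slot


mapping = map_dfg(DFG)
for op, (node, cycle) in mapping.items():
    print(f"{op}: node {node}, cycle {cycle}")
```

The start cycle of each operation depends on where its operands were placed, so placement and scheduling cannot be solved one after the other without losing quality. That is the point made above about VLIW schedulers and FPGA place and route.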
30What can fit in 45mm² in 45nm
Programmable Multimedia Accelerator
Imaging H/W
192 CNode (40 GOPS)
Video H/W
Interconnect
4MB Multi-port Embedded Memory
L2
Peripherals analog
L1
L1
Host Core 2
Host Core 1
31CAD Challenges
32Main area of CAD challenges
- Low Power design
- Static and dynamic power global optimization
- Power control is becoming very fine grain. Must be tightly linked with software environment.
- Power control is beyond the pure SoC. System level power view is needed.
- Software design
- Efficient software design on hierarchical multiprocessor engine
- Capability to architect and design software architecture as efficiently as HW
- Capture tools, simulation, verification, automated code generation
33Main area of CAD challenges
- Synthesis on Reconfigurable hardware
- Configuring the hardware network
- 3D place and route of massively parallel code on arrays of DSPs
- Design constraints going up in the software
- Reconfiguration latency
- Expected performance.
- Reconfigurable hardware managed at software level.
- Software development environment has to be aware of reconfigurable hardware.
- Profiling to extract hot spots and the benefit of moving them to hardware.
- Code generation as well as reconfiguration sequence for hardware.
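The profiling step mentioned above weighs the time saved in hardware against the reconfiguration latency. A minimal sketch of that trade-off follows. The cost model, the function names and all numbers are assumptions for illustration, and it charges one reconfiguration per hot spot.

```python
def offload_gain_ms(calls, sw_ms_per_call, hw_ms_per_call, reconfig_ms):
    """Net time saved by moving a hot spot to reconfigurable hardware (negative = loss)."""
    return calls * (sw_ms_per_call - hw_ms_per_call) - reconfig_ms


def rank_hot_spots(profile, reconfig_ms):
    """Return (name, gain) pairs that are worth offloading, best first."""
    gains = [(name, offload_gain_ms(calls, sw, hw, reconfig_ms))
             for name, (calls, sw, hw) in profile.items()]
    return sorted((g for g in gains if g[1] > 0), key=lambda g: g[1], reverse=True)


# name: (calls, software ms/call, estimated hardware ms/call), made-up numbers
profile = {
    "idct": (1000, 0.20, 0.02),
    "color_convert": (300, 0.10, 0.03),
    "header_parse": (10, 0.05, 0.04),
}
ranked = rank_hot_spots(profile, reconfig_ms=5.0)
print(ranked)
```

With these made-up figures the rarely called function is dropped because the reconfiguration cost outweighs its gain, which is why reconfiguration latency has to be visible to the software tools.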
34Conclusion
- For multimedia processors, the complexity is moving to software design
- Hardware complexity resolved through regular design (multicore host, multi-DSP, coarse-grained DSP fabric)
- CAD challenge lies essentially in S/W design tools
- Multimedia software execution infrastructure, simulation, debug
- Programmable hardware acceleration