stxp - PowerPoint PPT Presentation

About This Presentation
Title:

stxp

Description:

Multiple video standard, encode and decode (MPEG4, H264, WMV, ... (e.g. video codec, image ... Profiling to extract hot spot and benefit if ... – PowerPoint PPT presentation

Slides: 34
Provided by: dmdmar

Transcript and Presenter's Notes

2
DAC 2006: CAD Challenges for Leading-Edge
Multimedia Designs
3
NOMADIK: The challenge of low power, high
performance and scalable multimedia acceleration
Alain Artieri - Patrick Blouet
STMicroelectronics
July 26, 2006
4
Multimedia Computing Landscape
5
The convergence paradigm

Personal Computer

New Mobile Multimedia Computing Architecture
Consumer Electronics
Mobile Phone
6
Consumer versus Computer
  • Consumer Products
  • High quality of service
  • Designed for worst case
  • Highly parallel architecture
  • Hardware accelerators
  • Personal Computer
  • Monolithic processor architecture
  • High MHz for performance
  • High power consumption
  • Open OS
  • Flexibility
  • Rich set of standard interfaces for storage and
    connectivity
  • Open platform, multi OS
  • Flexibility
  • Rich set of standard interfaces for storage and
    connectivity

New computing architecture must combine the best
of both worlds
7
Cell Phones a Key Driver
8
Competing Technical Constraints
Scalability
Low Power
Multimedia Performance
9
Multimedia Performance Requirements
  • Multiple video standards, encode and decode
    (MPEG4, H264, WMV, ...), up to HDTV format
  • High resolution VGA screen and above in small
    form factor, output to HDTV with large screen
  • Multi-megapixel camera, DSC-class image
    reconstruction chain and picture improvement
  • Sophisticated audio use cases: combination of
    multiple codecs, sound effects, speech codecs, ...
  • Advanced 3D graphics acceleration for gaming
  • Consume and produce high-bandwidth multimedia
    content

10
Low Power
  • A key system technology driver
  • Of course a product feature
      • Battery life time
  • But helps product manufacturability
      • Stacking in a power budget
  • And product cost
      • Low cost packaging
      • No heat sink

11
Nomadik Architecture Overview
12
Application Processor Content
Host Processor
Multimedia Accelerator
Peripherals
Embedded Memory
Host processor peripherals, No differentiation
Multimedia Acceleration, differentiating factor
  • The architecture design challenge is in
    Multimedia Acceleration (Audio, Video, Imaging,
    Graphics)
  • This is where innovation is required and
    competitive advantage is built

13
Nomadik Multimedia Acceleration Model
[Diagram: multiple DSPs on a shared interconnect. Each DSP has tightly
coupled HW attached to it for HW acceleration, and a DMA engine acting as
data mover.]
  • Multiple DSP based sub-system
  • Symmetrical DSPs (generic S/W component can run
    anywhere)
  • Attached HW resources (dependence resolved at
    component manager level)
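
The placement rule described above (generic components can run on any DSP, while components that depend on attached hardware are resolved by a component manager) could be sketched as follows. This is a minimal illustration, not Nomadik's actual component manager: the `Dsp`, `Component` and `place` names, the hardware labels and the least-loaded tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Dsp:
    name: str
    attached_hw: frozenset  # hypothetical labels for the HW attached to this DSP
    load: int = 0           # abstract load units already placed on this DSP


@dataclass
class Component:
    name: str
    needs_hw: frozenset = frozenset()  # empty set = generic S/W component
    cost: int = 1                      # abstract load units this component adds


def place(component, dsps):
    """Place a component on the least-loaded DSP that has the HW it needs."""
    candidates = [d for d in dsps if component.needs_hw <= d.attached_hw]
    if not candidates:
        raise LookupError(f"no DSP provides {set(component.needs_hw)}")
    target = min(candidates, key=lambda d: d.load)
    target.load += component.cost
    return target.name


dsps = [
    Dsp("dsp0", frozenset({"video"})),
    Dsp("dsp1", frozenset({"imaging"})),
    Dsp("dsp2", frozenset()),
]
# A component tied to the video HW can only land on dsp0.
video_target = place(Component("video_decode", frozenset({"video"}), cost=3), dsps)
# A generic component can run anywhere; here it goes to the least-loaded DSP.
generic_target = place(Component("audio_mixer"), dsps)
print(video_target, generic_target)
```

Keeping the hardware dependence in one place means the components themselves stay symmetric: only the manager needs to know which DSP has which accelerator attached.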

14
Multiple DSP approach benefits
  • High computing performance
      • Multiple non-interfering domains of intense
        activity, each having its own processor, DMA
        services and hardware accelerators for
        data-intensive functions
      • Hardware acceleration embedding standard
        functions (e.g. video codec, image
        reconstruction and improvement)
      • Highest predictable performance through a
        careful bus and memory hierarchy design
  • Low Power (target 100s of mW)
      • Intrinsic low power sub systems
      • Fine grain power management at sub system level
      • Leakage management by switching sub systems
        on/off

15
Power management
  • Combination of multiple techniques
      • Dynamic power reduction
          • Clock gating
          • Voltage scaling (DVFS)
          • Pulse-Width Modulation (PWM)
      • Static power reduction
          • Biasing
          • Power On/Off switching (Power gating)
  • A global system issue from power management
    inside the OS down to silicon process (e.g. gate
    leakage)

16
DVFS Principle
Operating System Load Monitor (SW) selects a CPU voltage from the
Voltage/Frequency Tables according to CPU performance requirements:

  CPU performance requirement   CPU voltage   Energy saving
  100%                          1.3V          -
  85%                           1.2V          28%
  62%                           1.1V          55%
  • Process Requirements
  • Large voltage excursion
  • Low leakage
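
The slide's savings figures can be checked with a short calculation. The sketch below assumes the usual first-order model in which dynamic power scales with frequency times voltage squared, that frequency scales with the performance requirement, and that the slide's 100/85/62 performance levels and 28/55 savings are percentages relative to the 1.3V operating point. None of these assumptions is stated in the presentation, and `dvfs_saving` is a name invented for the example.

```python
def dvfs_saving(perf_ratio, voltage, v_ref=1.3):
    """Fractional power saving versus full speed at v_ref.

    Assumes power ~ f * V^2 with f proportional to the performance ratio.
    """
    return 1.0 - perf_ratio * (voltage / v_ref) ** 2


# Operating points as read from the slide: (performance ratio, CPU voltage).
for perf, volts in [(1.00, 1.3), (0.85, 1.2), (0.62, 1.1)]:
    print(f"{perf:.0%} performance at {volts}V: {dvfs_saving(perf, volts):.1%} saving")
```

Under these assumptions the model gives roughly 27.6% and 55.6% for the two reduced operating points, which is close to the 28 and 55 quoted on the slide. It also shows why a large voltage excursion matters: most of the saving comes from the squared voltage term.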

17
PWM Principle
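Operating System Load Monitor (SW) selects an active clock ratio from the
active clock ratio table according to CPU performance requirements; the
CPU voltage stays constant:

  CPU performance requirement   CPU voltage   Energy saving
  100%                          1.0V          -
  85%                           1.0V          15%
  62%                           1.0V          38%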
  • Process Requirements
  • Clock as fast as possible
  • Source bias or switch off when clock is stopped
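
With the voltage fixed, the PWM saving is simply the fraction of time the clock is stopped, provided the stopped state draws negligible power. The sketch below makes that explicit and adds a hypothetical `stopped_power_fraction` parameter (not from the slides) to show how residual leakage during the stopped phase erodes the saving, which is why the process requirement calls for source bias or switch-off when the clock is stopped.

```python
def pwm_saving(active_ratio, stopped_power_fraction=0.0):
    """Fractional power saving versus running the clock all the time.

    active_ratio: share of time the clock runs (e.g. 0.85 for 85% performance).
    stopped_power_fraction: power drawn while stopped, relative to active
    power (hypothetical parameter; 0.0 models ideal source bias / switch-off).
    """
    average_power = active_ratio + (1.0 - active_ratio) * stopped_power_fraction
    return 1.0 - average_power


ideal_85 = pwm_saving(0.85)        # ideal stop: saving equals the stopped share
ideal_62 = pwm_saving(0.62)
leaky_62 = pwm_saving(0.62, 0.2)   # assumed 20% residual power while stopped
print(f"{ideal_85:.0%} {ideal_62:.0%} {leaky_62:.1%}")
```

With an ideal stopped state the function returns 15% and 38% for active ratios of 0.85 and 0.62, matching the slide's figures when they are read as percentages.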

18
Multi-step PWM
  • Power management state machine under SW control
  • Source Bias for short clock stop period
  • Power off with context save/restore for long
    period

[Diagram: a short stop uses source bias (reduced leakage); a long stop
uses power off (zero leakage), with a context save before and a restore
after.]
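
One way software could choose between the two stop modes is a break-even test: power off only when the leakage energy it avoids exceeds the energy spent saving and restoring context. The sketch below is an illustration under that assumption; the function name, parameters and all numeric values are hypothetical and do not come from the presentation.

```python
from enum import Enum


class StopMode(Enum):
    SOURCE_BIAS = "source bias"  # short stop: reduced leakage, context kept
    POWER_OFF = "power off"      # long stop: zero leakage, context save/restore


def choose_stop_mode(stop_us, bias_leak_mw, save_restore_uj):
    """Pick the stop mode with the lower energy cost for an expected stop.

    stop_us: expected clock-stop duration in microseconds.
    bias_leak_mw: leakage power under source bias, in milliwatts.
    save_restore_uj: energy of one context save plus restore, in microjoules.
    """
    # mW * us = nJ, so divide by 1000 to express the leakage energy in uJ.
    bias_energy_uj = bias_leak_mw * stop_us / 1000.0
    if bias_energy_uj > save_restore_uj:
        return StopMode.POWER_OFF
    return StopMode.SOURCE_BIAS


# Hypothetical figures: 1 mW leakage under bias, 5 uJ per save/restore,
# so the break-even stop duration is 5000 us.
short_stop = choose_stop_mode(100, bias_leak_mw=1.0, save_restore_uj=5.0)
long_stop = choose_stop_mode(20_000, bias_leak_mw=1.0, save_restore_uj=5.0)
print(short_stop.value, "/", long_stop.value)
```

A real power management state machine would also have to account for wake-up latency, not only energy.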
19
Power management
  • Power mode changes are managed by software
  • Constraints and impact must be known by software
    developer.
  • Information initially needed only at design
    level is now flowing into the software space.
  • Power awareness in the software world is coming
    from the design world through a better link between
    design tools and software development tools.
  • Need for a power view of the application
    accessible to software developers.

20
Software Architecture for Multimedia Acceleration
21
Complex Multimedia Software Stack
User Interface
Operating System
Upward pervasion of design constraints
Multimedia Framework
Multimedia API
Media Network Server
SoC design perimeter
Execution Infrastructure
Codecs, Sensors, Presentation
Hardware
22
Objectives
  • A unified programming model for distributed
    computing
  • One S/W component can run anywhere possible
  • Dynamically configurable
  • Run complex algorithms that require more than
    one DSP
  • Enforce software architecture
  • Modularity
  • Component programming model
  • Multimedia framework
  • Comprehensive debug
  • System level monitoring
  • Component observable by construction (auto code
    instrumentation)

23
Complex use case illustration
  • 16 QCIF decode
  • 1 Grab Viewfinder
  • Graphics control on Host CPU
  • SVGA display
  • 100mW

24
Architecture evolution
25
SoC evolution across technology nodes
  • Constant SoC Die Size
  • Slow evolution of peripherals (area decrease)
  • General purpose CPU sub-system complexity doubles
    at each node (constant area)
  • Embedded memory capacity doubles at each node
    (constant area)
  • Loosely coupled DSP sub-system complexity
    increases by 30% at each node (30% area decrease)

26
Main trends
  • Host CPU evolving toward multi-core architecture
    to meet the performance increase requirements
  • HW acceleration mapped on reconfigurable arrays
  • Performance close to dedicated HW in many areas
  • Good fit with regular design constraints imposed
    by 45nm process and beyond
  • Excellent structure for best optimized power
    management
  • And FLEXIBILITY

27
Reconfigurable Hardware (DSP fabric)
  • Target signal processing and arithmetic intensive
    applications
  • Reconfigurable array of simple DSP cores (CNode)
  • Low power architecture
  • Hierarchical clock gating
  • Distributed leakage control (fine grain power
    gating)
  • Programmable DMA engine
  • Reconfigurable at run time, multi task

28
Mapping Flow
[Figure labels: Behavioral code (Procedure(In,Out,inout) Constant A,b,c,
Begin Xa-in0 .. End); DFG; Partitioning/static scheduling; Coarse grained
configuration. Array diagram: clusters at Level0 and Level 1, Mux level 2,
nodes N0_i/N0_o, N1_i/N1_o, N2_i/N2_o, Data in, Data out.]
  • Alus execute a cyclic micro-sequence
  • Data exchanges through hierarchical clustered
    interconnect
  • Configuration step is sequence loading and
    interconnect programming

[Figure: ILP software pipelining, shown as repeated Data in / Data out
stages.]
29
Mapping Flow
  • 3D optimization problem (place/route/schedule)
  • Traditional scheduling techniques for VLIW or
    clustered VLIW don't apply
  • These solutions don't take into account the spatial
    dimension of the problem
  • Traditional P&R used in FPGAs doesn't apply either
    because it doesn't consider the time dimension
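
To make the combined space and time dimensions concrete, the toy mapper below assigns each operation of a data-flow graph to both a node and a cycle, charging extra cycles for the distance between producer and consumer. It is a greedy sketch for illustration only, not the mapping flow used in the presentation: the 1D array, the one-cycle-per-hop routing cost, the one-operation-per-node-per-cycle limit and the `map_dfg` name are all assumptions.

```python
def map_dfg(dfg, n_nodes):
    """Greedy place-and-schedule of a DFG on a 1D array of compute nodes.

    dfg: {op: [predecessor ops]}, listed in topological order.
    Returns {op: (node, cycle)}.
    """
    placement = {}
    busy = set()  # (node, cycle) slots already occupied
    for op, preds in dfg.items():
        best = None
        for node in range(n_nodes):
            # An operand is usable one cycle after its producer executes,
            # plus one cycle per hop between producer node and this node.
            ready = max(
                (placement[p][1] + 1 + abs(placement[p][0] - node) for p in preds),
                default=0,
            )
            cycle = ready
            while (node, cycle) in busy:  # a node runs one op per cycle
                cycle += 1
            if best is None or cycle < best[1]:
                best = (node, cycle)
        placement[op] = best
        busy.add(best)
    return placement


# c = f(a, b); d = g(c)
dfg = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
mapping = map_dfg(dfg, n_nodes=2)
print(mapping)
```

Even in this small case the placement of `a` and `b` on different nodes delays `c` by a routing hop, so the schedule cannot be chosen independently of the placement.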

30
What can fit in 45mm² in 45nm
Programmable Multimedia Accelerator
Imaging H/W
192 CNode (40 GOPS)
Video H/W
Interconnect
4MB Multi-port Embedded Memory
L2
Peripherals & analog
L1
L1
Host Core 2
Host Core 1
31
CAD Challenges
32
Main areas of CAD challenges
  • Low Power design
  • Static & dynamic power global optimization
  • Power control is becoming very fine grain. Must
    be tightly linked with software environment.
  • Power control is beyond the pure SoC. System
    level power view is needed.
  • Software design
  • Efficient software design on hierarchical
    multiprocessor engine
  • Capability to architect & design software
    architecture as efficiently as HW
  • Capture tools, simulation, verification,
    automated code generation

33
Main areas of CAD challenges
  • Synthesis on Reconfigurable hardware
  • Configuring the hardware network
  • 3D place & route of massively parallel code on
    arrays of DSPs
  • Design constraints going up in the software
  • Reconfiguration latency
  • Expected performance.
  • Reconfigurable hardware managed at software
    level.
  • Software development environment has to be aware
    of reconfigurable hardware.
  • Profiling to extract hot spots and the benefit of
    doing them in hardware.
  • Code generation as well as reconfiguration sequence
    for hardware.
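
The benefit of moving a profiled hot spot to hardware can be estimated with Amdahl's law. This is a standard formula rather than something taken from the slides, and the profile numbers below are hypothetical; the estimate also ignores the reconfiguration latency listed above.

```python
def offload_speedup(hot_fraction, hw_speedup):
    """Overall speedup when a hot spot taking `hot_fraction` of run time
    executes `hw_speedup` times faster in hardware (Amdahl's law)."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hw_speedup)


# Hypothetical profile: one hot spot takes 80% of run time and the
# reconfigurable hardware runs it 10x faster.
overall = offload_speedup(hot_fraction=0.8, hw_speedup=10.0)
print(f"overall speedup: {overall:.2f}x")
```

The remaining software fraction bounds the gain: with 20% of the run time left on the processor, no hardware speedup can push the overall figure past 5x.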

34
Conclusion
  • For multimedia processors, the complexity is
    moving to software design
  • Hardware complexity resolved through regular
    design (multicore host, multi-DSP,
    coarse-grained DSP fabric)
  • CAD challenge lies essentially in S/W design
    tools
  • Multimedia software execution infrastructure,
    simulation, debug
  • Programmable hardware acceleration