Title: McKinsey presentation
1General Presentation on IMEC's Thematic Design Activities
Ivo Bolsens, Hugo De Man
125 researchers
bolsens_at_imec.be
2IMEC organization
- CEO Gilbert Declerck
- Divisions
- DESICS: design technology (Ivo Bolsens)
- SPT: process technology (Luc van den Hove)
- STDI: silicon techn. device integration (Herman Maes)
- MCP: microsystems packaging (Robert Mertens)
- INVOMEC: training (Etienne Bourdeaudhui)
3Mission
- Design of
- Architectures, Methods and Tools
- for the Implementation of
- Multimedia
- Internet Terminals
4How
- Study requirements of embedded IT systems
- Identify and solve RELEVANT design challenges
- build application demonstrators
- Work out systematic design methods and supporting tools
- build tools for real-life design support
- Develop re-usable, parameterized, white-box IP
- Train and educate
- industry
- university
5Measures of success
- scientific impact
- cooperation with universities in complementary fields
- international network of cooperation with most of the important industrial performers in our field
- portfolio of protected intellectual property
- transfer of technologies to existing companies
- creation of new spin-off companies
- attracting foreign investments in the field of microelectronics and ICT
- turn-over of well trained researchers to industry
6Bridge the gap between systems and silicon
Systems Heaven: JAVA, CORBA, JINI
5 million lines of VHDL
0.1 µm (1/300 hair), 200 M transistors
Power/ cm2 V3/l3 T_intercon r l2/l2
Physics Hell
7Intelligent Home
WWW
MPEG-4: > 100 Gop/s, 5 Gtr/s, 5 Watt
8DESICS organization
- DIMA: design of integrated multimedia applications
- MICS: multimedia image compression systems
- EMSYS: embedded systems design
- SEMP: system exploration for memory and power
- DISTA: design of integrated systems for telecom applications
- MIRA: mixed signal and RF applications
- WISE: wireless systems
- DBATE: digital broadband terminals
9Configurable Home Terminal
Head End
Homegateway router storage basestation
IPv6
HFC
IP Home network
Service Server
Internet home Appliance
modem
CSL
User premises
10Embedded connectivity
Distributed Application
11Challenges
Reconfigurable Software
Agent
Agent
Agent
VM
TCP/IP
RTOS
user data
Digital
CTL
MON
software agents
Front End
Tx/Rx DSP
CFIL/CB
To I/O
parameters synchro
Hardware
Analog
12Challenges in Dynamic Reconfiguration
- Run-time FPGA management
- dynamic creation and deletion of HW processes
- dynamic creation of the related HW/SW interfaces
- dynamic extension of the instruction set
- downloading of FPGA configuration for additional instructions
- Fast HW compilation
- Novel FPGA architectures optimized for partial runtime configuration
- Performance Estimation (dynamic, configuration time)
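The run-time management tasks listed above can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: the class `FpgaManager`, its method names and the notion of a fixed number of reconfigurable regions are hypothetical and do not describe any IMEC tool or FPGA vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class FpgaManager:
    """Toy run-time manager for a partially reconfigurable FPGA (illustrative only)."""
    num_regions: int                                   # hypothetical number of reconfigurable regions
    regions: dict = field(default_factory=dict)        # region index -> HW process name
    instruction_set: set = field(default_factory=set)  # opcodes currently backed by hardware

    def create_hw_process(self, name, bitstream):
        """Configure the first free region with `bitstream`; return its index."""
        for region in range(self.num_regions):
            if region not in self.regions:
                # A real manager would download `bitstream` to the device here
                # and create the matching HW/SW interface.
                self.regions[region] = name
                return region
        raise RuntimeError("no free reconfigurable region")

    def delete_hw_process(self, region):
        """Release a region; drop the opcode if it implemented an instruction."""
        name = self.regions.pop(region)
        if name.startswith("instr:"):
            self.instruction_set.discard(name[len("instr:"):])

    def extend_instruction_set(self, opcode, bitstream):
        """Add an instruction by configuring a region that implements it."""
        region = self.create_hw_process("instr:" + opcode, bitstream)
        self.instruction_set.add(opcode)
        return region
```

Creating a process, extending the instruction set and deleting a region then map onto three calls. Configuration time, which the last bullet singles out for performance estimation, is deliberately not modelled here.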
13Networked re-configurable computing
Application Layer (Java applet FPGA bitstream)
FPGA
Middleware Layer
FPGA API
FPGA Controller
Real-Time Operating System
Java Native Interface
Virtual Bus
Hardware Platform
Native Device Driver
Local Bus
Software
Hardware
14The first demonstrator FPGA based NetCam
Internet Client
Ibis CMOS sensor
ATMEL EEPROM
Netscape
HTTP
GIF Engine
Reconfiguration
FPGA
request
IP layers
TCP/IP layers
image
10 Base T
10 Base T
network
- first FPGA-based thin-server Internet Appliance (vs. dedicated, Linux or uC based)
- low power (FPGA 0.7 W)
- throughput scales up to 80 Mb/s
15World's first 80 MB/sec WLAN technology
Base station 155 Mb/s multi-user rx antenna diversity
wired backbone
Multi-path fading
- Orthogonal Frequency Division Multiplexing (OFDM)
- Turbo-coding
- Spatial Division Multiple Access (SDMA)
- Hiperlan-2/IEEE 802.11 compatible
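To illustrate why OFDM suits a channel with multi-path fading, the sketch below sends QPSK symbols on a handful of subcarriers through a two-tap echo channel. It is a minimal model under stated assumptions: the function names, the QPSK mapping, the subcarrier count and the channel taps are all chosen for illustration and are not taken from the IMEC modem. With a cyclic prefix at least as long as the echo, each subcarrier can be equalized by a single complex division.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def ofdm_modulate(bits, cp_len):
    """Map bit pairs to QPSK, one symbol per subcarrier, and prepend a cyclic prefix."""
    carriers = [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1])
                for i in range(0, len(bits), 2)]
    time = dft(carriers, inverse=True)
    return time[len(time) - cp_len:] + time


def multipath(samples, taps):
    """Echo channel: y[n] = sum_k taps[k] * x[n - k]."""
    return [sum(h * samples[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(samples))]


def ofdm_demodulate(samples, cp_len, taps):
    """Drop the prefix, transform, and equalize each subcarrier with one division."""
    body = samples[cp_len:]
    n = len(body)
    channel = dft(list(taps) + [0] * (n - len(taps)))
    bits = []
    for y, h in zip(dft(body), channel):
        s = y / h
        bits += [0 if s.real > 0 else 1, 0 if s.imag > 0 else 1]
    return bits
```

The receiver is assumed to know the channel taps. A real modem estimates them from pilots, and adds coding (such as the turbo-coding listed above) on top.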
16Single-package transceiver
CMOS IF and digital circuitry
BiCMOS RF circuitry
MEMS switches, varactor, resonators
MCM interconnect inductors, capacitors,
resistors, filters, baluns
Antenna
17Multimedia MPEG-4 member SCtee
- Diversity: 3D, Facial and Body Animation, Video
- Scalability: time, space, SNR
- Interactivity: behaviour = f (input bits, user)
18Focus
- Graceful degradation, QOS
- Encode once/ decode everywhere
- Reduces the terminal cost (soft conformance with pathological cases)
- Man-Machine Interface Facial Animation
- Real-time SOFTWARE video-coding of CIF images
- Application Specific Processor for Wavelet coding
Demo
19Challenges
Multimedia
MPEG 4 JPEG2000
Several orders of magnitude in performance and
power dissipation need to be gained
Huge requirements: > 2 GOP/s, > 6 GB/s, > 10 MB storage
Drastic reduction of design complexity required
20World's first MPEG-4 compliant silicon
Max 30 fps CIF (352x288)
Scalable architecture
21C/C++ system refinement and exploration
Data mngnt
Concurrency mngnt
Platform constraint
Platform integration
22Deeply embedded system
Interfaces
Dedicated logic
- µP core
- Dedicated logic
- accelerator synthesis
- multi-DSP core
- retargetable ASIP compiler
- Memory/MMU
- Interfaces
- system integration
- Analog
phone book
keypad intfc
phonebook
RAM ROM
DMA
protocol
control
S/P
Frontier
Coware
Demod and sync
Target
Viterbi Equal.
voice recognition
speech quality enhancement
de-intl decoder
A
RPE-LTP speech decoder
digital down conv
D
Multi-DSP core
All of this fits in one, cheap, package
23Deeply embedded system
µP core
Memory/MMU
System protocol
- µP core
- system layer compiler
- Dedicated logic
- multi-DSP core
- memory/MMU
- dynamic static mem mngnt addr expr.
- Interfaces
- Analog
- A/D RF
Data
phone book
keypad intfc
phonebook
RAM ROM
DMA
protocol
control
S/P
Demod and sync
Viterbi Equal.
voice recognition
speech quality enhancement
Mixed Signal
de-intl decoder
A
RPE-LTP speech decoder
digital down conv
D
Analog
All of this fits in one, cheap, package
24Current challenges and solutions
- System Specification and System-level Refinement with Exploration Support (algorithm design level, concurrent task level, system timing simulation)
- Data Transfer and Storage Exploration for Massive Real Time Data Manipulation (dynamic memory management, static transfer and storage, address generation)
- Co-Design for Heterogeneous Implementation Paradigms (refinement from unified HW/SW model, RTOS modeling, complete system simulation)
- RF front-end exploration (fast mixed-signal co-simulation, chip-package co-design, noise coupling)
25SoC or --- (S.O.S.)
- Design productivity gap grows!
- Complexity increase: 40% per year
- Design productivity increase: 15% per year
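A short calculation shows how fast such a gap compounds. The sketch below reads the two figures above as annual percentage growth rates (40 and 15 percent), which is an assumption about the slide, and the function name `productivity_gap` is hypothetical.

```python
def productivity_gap(years, complexity_growth=0.40, productivity_growth=0.15):
    """Ratio of design complexity to design productivity after `years`,
    normalized to 1.0 at year 0 and assuming compound annual growth."""
    return ((1.0 + complexity_growth) / (1.0 + productivity_growth)) ** years


for years in (1, 5, 10):
    print(years, round(productivity_gap(years), 2))
```

Under these assumptions the gap widens by about 22 percent each year, which compounds to roughly a factor of 7 after ten years.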
26System-level design
- Higher abstraction level
- Executable specs
- Object-oriented design
- Multi-paradigm modeling
- Behavioral IP re-use
- Incremental refinement to RT-HDL (HW) and C/C++ (SW)
27System design issues in IT-Application domain
Embedded system
28Global concurrency management design flow for
dynamic concurrent tasks with data-dominated
behaviour
Dynamic memory mgmt
Physical memory mgmt
Address optimization
29TCM steps aim at removing the bottlenecks for
better performance
Optimized system specification
Task1
Task2
Inter-task DTSE
Task concurrency mngnt
Task3
Task-level system architecture
30The gray box approach focuses on the most
relevant TCM issues
High Level Specification
Black-box TCG 1
Improved Gray-box <10
task concurrency extraction improvement
Initial gray-box TCG 10
Reduce complexity Create freedom
Initial TCG 50
Simplify the model
White-box TCG 100
C Specification
31Task Level DTSE and TCM
32Results on IM1 player
[Plot: cost versus time budget (MA cycle budget)]
33The 2-processor approach (scheduling assignment)
Taskn
Task2
Task1
Vdd = 1 V
Vdd = 3.3 V
34Comparison of scheduling the original and
transformed graphs
original
Transformed
35Combination of static and dynamic scheduler
[Figure: static scheduling combined with dynamic scheduling; node labels 1, 2, 3 and A, B]
Static scheduling is done at compile time, exploring all optimization possibilities.
Dynamic scheduling is done at run time, providing flexibility and dynamic control at low cost.
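One way to picture this split is a compile-time table of alternative schedules per task, from which a cheap run-time scheduler picks a combination. The sketch below is an illustration under stated assumptions, not the scheduler used in the project: tasks are assumed to run one after another, each table lists (execution time, energy) points ordered from slow and cheap to fast and expensive, and the greedy selection rule and all names are hypothetical.

```python
def dynamic_schedule(static_points, active_tasks, deadline):
    """Pick one precomputed (time, energy) point per active task.

    static_points maps a task name to points sorted by strictly decreasing
    time and increasing energy (the result of static scheduling).
    """
    choice = {task: 0 for task in active_tasks}      # start at the lowest-energy point

    def total_time():
        return sum(static_points[t][choice[t]][0] for t in active_tasks)

    while total_time() > deadline:
        best = None
        for task in active_tasks:
            i = choice[task]
            if i + 1 < len(static_points[task]):
                t0, e0 = static_points[task][i]
                t1, e1 = static_points[task][i + 1]
                slope = (e1 - e0) / (t0 - t1)        # extra energy per unit of time saved
                if best is None or slope < best[0]:
                    best = (slope, task)
        if best is None:
            raise ValueError("deadline cannot be met")
        choice[best[1]] += 1                         # speed up the cheapest candidate
    return {t: static_points[t][choice[t]] for t in active_tasks}
```

The expensive exploration sits in the construction of `static_points`. The run-time part only walks the tables, which keeps its own cost low.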
36Dynamic Scheduling result
[Plot: total energy versus node number in timer threads; tick values 20, 24, 32, 32, 39]
Two Proc. (Vlow = 1 V, Vhigh = 5 V)
One Proc. (V = 5 V)
37SoC refinement and exploration
- Implementation
- Final hardware
- Appl. software
- OS services optimized for application
- System requirements
- Abstract functionality
- Real-time constraints
- Target platform constr.
REQUIREMENT
SoC appl. timing
Application implementation (HW/SW)
Process mgmt constr.
Memory mgmt constr.
REAL
Process mgmt impl. (HW/SW)
Memory mgmt impl. (HW/SW)
Final platform (Silicon)
Target platform
38Refinement and exploration
- Memory mgmt
- Dynamic memory
- alloc / free (C)
- new / delete (C++)
- abstract data type refinement
- virtual memory mgmt
- Static memory
- platform-independent code transformations
- real-time cost-optimal physical memory organisation
- Address optimisation
- Process mgmt
- Task level concurrency mgmt (platform indep.)
- transformations
- static/dynamic scheduling
- resource allocation
- Instruction-level concurrency mgmt
- refinement from unified HW/SW model
- RTOS modeling/simulation including timing
- traditional HW/SW co-design and compilers
39Refinement - OCAPI / MATISSE
- Implementation
- Target hardware
- OS services optimized for application
- Virtual prototype
- Soft implementation using host OS and host hardware
SoC appl. arch.
Application implementation (HW/SW)
VIRTUAL
Process mgmt
Memory mgmt
REAL
Process mgmt impl. (HW/SW)
Memory mgmt impl. (HW/SW)
OSAPI
Target HW (Silicon)
Host HW (HP/PC)
40Unified Modeling and Refinement of HW and SW
OCAPI-xl C++ Class Lib
Flexible Primitives express
High Level System Model
- Concurrency
- Communication
- Interface design/reuse
unified HW/SW model
Built-in Code Generators create
Refined Model
- VHDL/Verilog/C
- Testbenches
41SoC design flow
C System Model
C
HW
SW
OSAPI
FSMD
42System Model
C
HW
SW
OSAPI
FSMD
43Global data management design flow for dynamic
concurrent tasks with data-dominated behaviour
Dynamic memory mgmt
Physical memory mgmt
Address optimization
44Data Management Flow
Abstract Data Type (ADT) Refinement
ADT
Concrete Data types
Dynamic Memory Mngnt.
Virtual memory mgmt (VMM) Refinement
Virtual Memory Segments
Physical memory mgmt (PMM) Refinement
Physical Memories
Physical Memory Mngnt.
45Matisse ADT refinement
ATM_cell *Data_In;
Association_Table *Routing_Table;
Routing_Table = new Association_Table();
Data_In = new ATM_cell();
if ( Routing_Table->Lookup(Data_In) ) ...
Impl. alternatives
46ADT refinement results
- Select best DT impl. for each ADT
[Plot: power cost function for different data types; labels LL(A), LL(B), PA(A), PA(B), BT(A), AR(B)]
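Selecting the best implementation for an abstract data type can be sketched as follows: keep the interface fixed, run a representative workload on each candidate, and compare a cost estimate. Everything below is illustrative. The two classes, the use of a memory access count as a stand-in for the power cost function, and the workloads are assumptions, and they do not correspond to the LL, PA, BT or AR variants named on the slide, whose definitions the slide does not give.

```python
class ListTable:
    """Unsorted list: one access per insert, linear search on lookup."""

    def __init__(self):
        self.keys = []
        self.accesses = 0

    def insert(self, key):
        self.keys.append(key)
        self.accesses += 1

    def lookup(self, key):
        for k in self.keys:
            self.accesses += 1
            if k == key:
                return True
        return False


class SortedArrayTable:
    """Sorted array: insert shifts the tail, lookup is a binary search."""

    def __init__(self):
        self.keys = []
        self.accesses = 0

    def insert(self, key):
        pos = 0
        while pos < len(self.keys) and self.keys[pos] < key:
            pos += 1
        self.accesses += len(self.keys) - pos + 1   # shifted elements plus the write
        self.keys.insert(pos, key)

    def lookup(self, key):
        lo, hi = 0, len(self.keys)
        while lo < hi:
            mid = (lo + hi) // 2
            self.accesses += 1
            if self.keys[mid] == key:
                return True
            if self.keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return False


def select_implementation(candidates, inserts, lookups):
    """Profile each candidate on the workload; return (best name, all costs)."""
    costs = {}
    for cls in candidates:
        table = cls()
        for key in inserts:
            table.insert(key)
        for key in lookups:
            table.lookup(key)
        costs[cls.__name__] = table.accesses
    return min(costs, key=costs.get), costs
```

A lookup-heavy workload favours the sorted array, while a workload of out-of-order inserts favours the plain list. That dependence on the workload is the reason the choice is made per ADT rather than once for the whole application.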
47VM size for ATM MUX in network 1
[Figure: four alternative virtual memory segment (VMS) organisations; labels PA(9), PA(5), AR(4), 32, 256]
- 1 VMS: size 133 mm2, power 110 mW
- 3 VMS: size 137 mm2, power 37 mW
- 2 VMS: size 137 mm2, power 49 mW
- 2 VMS: size 137 mm2, power 68 mW
48Memory CPU Performance Bottleneck
[Plot: performance (log scale, 1 to 1000) versus time (1980 to 2000), with Moore's Law as reference. Source: Patterson]
49Data-transfer and data-storage bottlenecks SDRAM
access
50Data-transfer and data-storage bottlenecks cache
misses
51Data-transfer and data-storage bottlenecks
system bus load
[Block diagram: system chip with data paths, L1 cache and L2 cache; L2 bus; main system bus; main memory; disk access bus; hard disk; other system resources]
52Memory Power Bottleneck
53Multi-processor System Design
54Platform design requires change
Application engineer
55Data Transfer Storage Principles
[Block diagram: processor data paths, local latches 1 to N with banks 1 to N, L1 cache, L2 cache and cache bank recombine on chip; off-chip SDRAM]
3 Exploit memory hierarchy
5 Meet real-time constraints
6 Exploit limited life-time and data layout freedom
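The effect of exploiting the memory hierarchy can be shown on a toy kernel. The sketch below computes a moving sum twice, once reading every operand from background memory and once through a small local buffer, and counts the off-chip reads in both cases. The kernel, the function names and the access counting are assumptions made for illustration, not an example taken from the DTSE methodology itself.

```python
from collections import deque


def moving_sum_direct(data, window):
    """Every operand of every output is fetched from background memory."""
    offchip_reads = 0
    out = []
    for i in range(len(data) - window + 1):
        acc = 0
        for j in range(window):
            acc += data[i + j]
            offchip_reads += 1
        out.append(acc)
    return out, offchip_reads


def moving_sum_buffered(data, window):
    """Each sample is fetched once and then reused from a small local buffer."""
    offchip_reads = 0
    local = deque(maxlen=window)        # models a `window`-entry on-chip buffer
    out = []
    for sample in data:
        offchip_reads += 1
        local.append(sample)
        if len(local) == window:
            out.append(sum(local))      # served from the local buffer
    return out, offchip_reads
```

Both versions produce the same output. For 100 samples and a window of 8, the direct version issues 744 background reads and the buffered one 100, because each sample lives in the buffer for exactly as long as it is needed.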
56Pareto curves allow task trade-off decision: DAB illustration
[Plots for TASK-1, TASK-2 and TASK-3: energy versus execution time]
Mapped on two processors
Source: Digital Audio Broadcast
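A Pareto curve for a task keeps only the implementations that are not beaten in both execution time and energy by another one. The sketch below is a minimal version of that filtering step, followed by a lookup of the cheapest point within a time budget. The function names and the sample points in the usage note are hypothetical and are not the DAB measurements from the plots.

```python
def pareto_front(points):
    """Return the non-dominated (execution_time, energy) points, sorted by time.

    A point is dropped when another point is at least as fast and uses
    no more energy.
    """
    front = []
    best_energy = float("inf")
    for time, energy in sorted(points):
        if energy < best_energy:        # strictly better than every faster point
            front.append((time, energy))
            best_energy = energy
    return front


def cheapest_within(front, time_budget):
    """Lowest-energy Pareto point that meets the time budget, or None."""
    feasible = [p for p in front if p[0] <= time_budget]
    return min(feasible, key=lambda p: p[1]) if feasible else None
```

For example, `pareto_front([(10, 5), (20, 3), (15, 6), (30, 1), (25, 3), (10, 7)])` keeps `(10, 5)`, `(20, 3)` and `(30, 1)`. With one such curve per task, a time budget can be traded between tasks by moving along the individual curves.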
57Pareto curves allow task trade-off decision
[Plots for TASK-1, TASK-2 and TASK-3: energy versus execution time]
Single proc. Large mem. overhead
Source: Digital Audio Broadcast
58Pareto curves allow task trade-off decision
[Plots for TASK-1, TASK-2 and TASK-3: energy versus execution time]
Source: Digital Audio Broadcast
59Cavity Detection Algorithm on Intel Pentium-MMX
(execution time)
60Resource limited software
TriMedia processor
[Bar chart: execution time, power and bus load as a percentage (0 to 100), initial algorithm versus DTSE transformed]
61Voice coder (SW cache) full power summary
Relative power
- Gain in power of an additional factor 6 compared to optimized (platform-independent) code
62MPEG - 4 Motion Estimation
[Bar chart: relative power, scale 0.0 to 1.0]
Resulting Power Reduction 8
63Consistent Speed Up on Different Platforms for
MPEG4 video decoder
Performance of PI MPEG-4 Video Decoder on Different Platforms
[Bar chart: frame rate (frames/second, 0 to 120) on Pentium II 350 MHz, HP PA RISC 180 MHz and TriMedia 100 MHz, for M D CIF 120 kbps 30 fps, Foreman CIF 450 kbps 25 fps and Cal Mob CIF 2 Mbps 30 fps]
64Power Reduced with Factor 21 to 48
Assessment: Memory Power Reduction (Proprietary Architecture)
[Bar chart: remaining power (%), scale 0.0 to 5.0, for M D CIF 120 kbps 30 fps, Foreman CIF 450 kbps 25 fps and Cal Mob CIF 2 Mbps 30 fps]
65Turbo coding principle
[Block diagram: encoder with input U and outputs C 1 and C 2 (C); decoder with input Y and output Û]
66Results
Original: bit-rate 0.07 Mbit/s, power 1.07 µJ/bit, latency 5900 µs, area 3.5 mm2
67 3D Texture Mapping using Mesa GL on TriMedia TM1000
- Reduction in total cycles by 44%
- Reduction in data cache accesses by factor 2
- Reduction in instruction cache accesses by 40%
68Crisis in current (RT) design flow
E F F O R T
Ok?
69Objectives
Drastically shorten design time (months to weeks!) -> raise the abstraction level
Meet timing constraints as soon as possible -> expose timing bottlenecks at higher level
Low implementation cost -> systematic methodology to control cost
70(Re)Using High-Level Synthesis
Conventional HLS
ADOPT HLS
Fewer muxes/registers using ACUs (NOT conventional High-Level Synthesis)
71Disabling the time-bomb for logic synthesis
[Bar chart: synthesis time in minutes (0 to 160) for scheduling and logic synthesis]
72Exploration at high level avoids complexity explosion
[Bar chart: (V)HDL lines (x 10^3, 0 to 8) at behavior, RT and gate level]
73Efficient use of high-level synthesis (I)
reduced cost
[Bar chart: gates (x 10^3, 0 to 7) in muxes and registers, HLS versus ADOPT HLS, after logic synthesis]
74Efficient use of high-level synthesis (II)
improved delay
Critical path
After logic synthesis !
75Results for programmable processors cavity
detection
[Bar chart: performance in seconds (0 to 16) for Initial, Adopt, DTSE, DTSE Adopt (Glb.Trf.) and DTSE Adopt (Loc.Trf.)]
Image: 1280x1000 pixels; HP 9000/777, 256 MB RAM
76Analog-Digital Co-Design FAST
Demonstrator: 5 GHz WLAN terminal
Mixed-signal front-end architecture exploration -> tools
Analog/digital partitioning
MCM vs. on-chip passives
Digital channel filtering
Chip partitioning
Noise coupling in mixed-signal ICs -> tools, methods
LO
Chip-package co-design -> architectures
77Interaction with ROW
UNIVERSITIES: KULeuven, RUGent, VUBrussel, EC univ
INDUSTRY
Problems Tools, IP,.
System Specialists
3 acad. staff, 15 Ph.D., 125 researchers, 10 Residents
Algorithm specialists
Residents
Circuit specialists
78The Desics pipeline
Alcatel National Philips Ericsson Intel ESA
D6 RESEARCH PROGRAMME
D6/ INDUSTRY TRANSFER PROJECTS
Industry product development
79Strategic Research Cooperation
- Wireless Local Area Network
- MPEG-4
- System-on-Chip Design Technology
80IMEC is part of a closed loop
- Closed Loop approach
- You cannot keep an economic engine running without a closed belt.
- Only the right combination of ALL elements can foster a successful industrial development, based upon an increasingly knowledge-based society.
Knowledge creation
State-of-the-art science parks
DSP Valley
Venture capital
Entrepreneurship
Permanent training initiatives
81Conclusions
- requirements for future embedded system applications learned from IIAPs by
- building demonstrators
- systematic design flows and methods
- white box IP re-use
- design automation
- transfer through education and training
- http://www.imec.be/ocapi
- http://www.imec.be/3/3.6.html
82Ivo Bolsens, Vice President
Hugo De Man, Senior Fellow
Paul Six, Associate VP
Jean Roggen, Manager Strategic Programmes
Niek Van Dierdonck, DST: DSP Technology Support
Annemie Stas, Administration
Marc Engels, Department Director
Stephane Donnay, MIRA: Mixed Signal and RF Applications
Bert Gyselinckx, WISE: Wireless Systems
Serge Vernalde, DBATE: Digital Broadband Terminals
Francky Catthoor, SEMP: System Exploration for Memory and Power
Didi Verkest, EMSYS: Embedded Systems
Jan Bormans, MICS: Multi-media Image Compression Systems