1
Application Specific Processors with VLIW Architecture
  • Anshul Kumar
  • anshul@cse.iitd.ernet.in
  • Dept of CSE, I.I.T. Delhi
  • Jan 2, 2002

2
Outline
  • Why VLIW Architecture
  • Various Facets of VLIW Architecture
  • Exploring VLIW Design Space
  • Control Path
  • Data Path
  • VLIWs with Application Specific FUs

4
Motivation for ASIPs
  • Better performance and lower power consumption
    (compared to general purpose processors)
  • Higher flexibility and reuse potential (compared
    to ASICs)
  • Our focus: high performance

5
Classification of Parallel Architectures
Parallel architectures (PAs)
  • Data-parallel architectures (DPs)
    Vector architectures
    Associative and neural architectures
    SIMDs
    Systolic architectures
  • Function-parallel architectures
    Instruction-level PAs (ILPs)
      Pipelined processors
      VLIWs
      Superscalar processors
    Thread-level PAs
    Process-level PAs (MIMDs)
      Distributed-memory MIMD
      Shared-memory MIMD
Ref: Sima et al
6
Distinction between VLIW and Superscalar processors
VLIW Approach
[Figure: cache/memory, fetch unit, a single multi-operation instruction, several FUs and a register file]
Ref: Sima et al
7
Distinction between VLIW and Superscalar processors
Superscalar Approach
[Figure: cache/memory, fetch unit, a sequential stream of instructions, a decode and issue unit issuing multiple instructions to several FUs, and a register file; legend: instruction/control, data, FU = functional unit]
Ref: Sima et al
8
Instruction Execution Timings in various Architectures
Ref: Hwang et al
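The timing figure itself is not recoverable here. As a rough stand-in, the sketch below uses a common idealized model (an assumption, not the slide's figure): a k-stage pipeline, N independent instructions, no stalls or hazards, and a machine that starts either m instructions (superscalar of degree m) or one m-operation VLIW instruction per cycle. The function names are hypothetical.

```python
import math


def cycles_scalar(n_instr: int, k: int) -> int:
    """Base scalar pipeline: k cycles to deliver the first result, then one per cycle."""
    return k + n_instr - 1


def cycles_parallel_issue(n_instr: int, k: int, m: int) -> int:
    """Idealized degree-m superscalar, or a VLIW with m operations per instruction:
    m independent operations enter the k-stage pipeline every cycle, with no stalls."""
    return k + math.ceil(n_instr / m) - 1


def ideal_speedup(n_instr: int, k: int, m: int) -> float:
    """Speedup of the degree-m machine over the base scalar pipeline."""
    return cycles_scalar(n_instr, k) / cycles_parallel_issue(n_instr, k, m)


if __name__ == "__main__":
    for m in (1, 2, 4):
        print(m, cycles_parallel_issue(12, 4, m), round(ideal_speedup(12, 4, m), 2))
```

With N = 12 and k = 4, the scalar pipeline needs 15 cycles and the m = 4 machine needs 6, a speedup of 2.5; the ideal speedup approaches m only as N grows relative to k.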
9
VLIW History
  • The term was coined by J.A. Fisher (Yale) in 1983
    ELI-512 (prototype)
    Trace (commercial)
  • Origin lies in horizontal microcode optimization
  • Another pioneering work by B. Ramakrishna Rau in 1982
    Polycyclic (prototype)
    Cydra-5 (commercial)
  • Recent developments
    TriMedia (Philips)
    TMS320C6x (Texas Instruments)

10
Why are superscalar processors commercially more popular than VLIW processors?
  • Binary code compatibility among scalar and superscalar processors of the same family
  • The same compiler works for all processors (scalar and superscalar) of the same family
  • Assembly programming of VLIWs is tedious
  • Code density in VLIWs is very poor
    - Instruction encoding schemes: area vs. performance trade-off

11
VLIW Architecture for ASIPs
  • Advantages of superscalar processors don't hold in the ASIP domain
    - Code compatibility with off-the-shelf processors
    - Use of off-the-shelf compilers
  • ASIPs require retargetable compilers or compiler generators
  • VLIWs fit nicely into the ASIP philosophy: analyze the application and adapt the architecture
  • Area saved by omitting dynamic scheduling can be used for application-specific features

13
Data path: A simple VLIW architecture
[Figure: three FUs sharing one register file]
Scalability? Access time, area and power consumption increase sharply with the number of register ports
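To make the scalability concern concrete, the sketch below counts register-file ports and applies a first-order area model. Both are assumptions for illustration: each FU needs two read ports and one write port, and register-file area grows roughly with the square of its port count. It ignores the extra ports and wiring an intercluster network would add.

```python
def rf_ports(n_fus: int, reads_per_fu: int = 2, writes_per_fu: int = 1) -> int:
    """Ports one register file needs so that all attached FUs can access it every cycle."""
    return n_fus * (reads_per_fu + writes_per_fu)


def relative_rf_area(n_fus: int, n_clusters: int = 1) -> int:
    """First-order model (assumption): a register file's area grows with the square of
    its port count. FUs are split evenly over clusters, each with a local register file;
    ports for intercluster communication are ignored."""
    if n_fus % n_clusters:
        raise ValueError("n_fus must be divisible by n_clusters")
    ports_per_file = rf_ports(n_fus // n_clusters)
    return n_clusters * ports_per_file ** 2


if __name__ == "__main__":
    for clusters in (1, 2, 4):
        print(clusters, rf_ports(8 // clusters), relative_rf_area(8, clusters))
```

For 8 FUs, one shared register file needs 24 ports (relative area 576), while four clusters of two FUs each need 6 ports per local file (relative area 4 × 36 = 144). This growth is what motivates the clustered organizations with distributed register files.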
14
Data path: Clustered VLIW architecture (distributed register file)
Alternative 1
[Figure: six FUs and three register files connected by an interconnection network]
15
Data path: Clustered VLIW architecture (distributed register file)
Alternative 2
[Figure: four FUs and two register files connected by an interconnection network; FU outputs can be multicast]
16
Controlling FUs by Instructions
  • Data stationary encoding
    All fields of an instruction are in the same word
  • Time stationary encoding
    Fields of different instructions that act in the same time slot are in the same word
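The two encodings hold the same control fields, arranged differently. The sketch below converts data-stationary words into time-stationary words for a hypothetical pipeline, assuming one operation is issued per cycle and that stage s of operation i acts in cycle i + s; `None` marks an idle stage. The operation strings are placeholders.

```python
from typing import List, Optional

Field = Optional[str]  # None marks an idle stage (a NOP field)


def to_time_stationary(ops: List[List[str]]) -> List[List[Field]]:
    """Data-stationary input: ops[i][s] is the control field for pipeline stage s of
    operation i, all stored in one word. Time-stationary output: word t holds the
    fields for every stage in cycle t. Assumes operation i is issued in cycle i, so
    its stage s acts in cycle i + s."""
    if not ops:
        return []
    n_stages = len(ops[0])
    n_cycles = len(ops) + n_stages - 1
    words: List[List[Field]] = []
    for t in range(n_cycles):
        word: List[Field] = []
        for s in range(n_stages):
            i = t - s  # the operation that occupies stage s in cycle t
            word.append(ops[i][s] if 0 <= i < len(ops) else None)
        words.append(word)
    return words


if __name__ == "__main__":
    # Hypothetical 3-stage operations: operand read, execute, write back.
    data_stationary = [
        ["rd r1,r2", "add", "wr r3"],
        ["rd r4,r5", "mul", "wr r6"],
    ]
    for t, word in enumerate(to_time_stationary(data_stationary)):
        print(t, word)
```

In the data-stationary form the hardware must delay each field until its stage is reached; in the time-stationary form that bookkeeping moves to the compiler, and each word directly drives all stages in one cycle.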

17
Data Stationary Unicast type UniOp
[Figure: control flow for data stationary unicast type UniOp]
18
Time Stationary Unicast type UniOp
[Figure: control flow for time stationary unicast type UniOp]
19
Data Stationary Multicast type UniOp
[Figure: control flow for data stationary multicast type UniOp]
20
Time Stationary Multicast type UniOp
[Figure: control flow for time stationary multicast type UniOp]
22
Instruction Encoding: NOP Compression
Ref: Hwang et al (book)
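A minimal sketch of one common NOP-compression idea, assumed here for illustration rather than taken from Hwang et al: each stored instruction carries a slot mask (one bit per FU slot) followed only by its non-NOP operations. The fixed operation width and the example program are hypothetical.

```python
from typing import List, Optional, Tuple

Op = Optional[str]  # None is a NOP in that slot


def compress(instr: List[Op]) -> Tuple[int, List[str]]:
    """Return a slot mask (bit s set if slot s holds a real operation) and the non-NOP ops."""
    mask = 0
    ops: List[str] = []
    for s, op in enumerate(instr):
        if op is not None:
            mask |= 1 << s
            ops.append(op)
    return mask, ops


def decompress(mask: int, ops: List[str], n_slots: int) -> List[Op]:
    """Re-expand to the full-width instruction, inserting NOPs where mask bits are 0."""
    it = iter(ops)
    return [next(it) if (mask >> s) & 1 else None for s in range(n_slots)]


def code_size_bits(program: List[List[Op]], op_bits: int) -> Tuple[int, int]:
    """(uncompressed, mask-compressed) size of a program, in bits."""
    n_slots = len(program[0])
    plain = len(program) * n_slots * op_bits
    packed = sum(n_slots + op_bits * len(compress(instr)[1]) for instr in program)
    return plain, packed


if __name__ == "__main__":
    prog = [
        ["add", None, None, None],
        ["ld", "mul", None, None],
        ["add", "sub", "ld", "st"],
    ]
    assert all(decompress(*compress(i), 4) == i for i in prog)
    print(code_size_bits(prog, op_bits=32))
```

For a 4-slot machine with 32-bit operations and three instructions holding 1, 2 and 4 operations, the uncompressed code needs 384 bits and the masked form 236 bits. The price is variable-length instructions, which complicate fetch and alignment.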
23
Instruction Encoding: Application Specific
  • Identify the limited set of patterns (combinations of fields) that actually occur, and encode them
  • Eliminate constant bits / fields
  • Eliminate bits / fields derivable from other bits / fields
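The sketch below applies these three ideas to a profiled list of instructions, each reduced (by assumption) to its control fields as small integers. Constant fields are dropped, and only the simplest derivable case, a field that always equals an earlier one, is detected; real derivations can be arbitrary logic functions. Field widths and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Instr = Sequence[int]  # one integer value per control field


def constant_fields(instrs: List[Instr]) -> Set[int]:
    """Fields that hold the same value in every instruction the application uses."""
    n = len(instrs[0])
    return {f for f in range(n) if len({ins[f] for ins in instrs}) == 1}


def duplicate_fields(instrs: List[Instr], skip: Set[int]) -> Set[int]:
    """Simplest 'derivable' case: field f always equals some earlier field g that is kept."""
    dup: Set[int] = set()
    for f in range(len(instrs[0])):
        if f in skip:
            continue
        for g in range(f):
            if g not in skip and g not in dup and all(ins[f] == ins[g] for ins in instrs):
                dup.add(f)
                break
    return dup


def encoding_bits(instrs: List[Instr], widths: List[int]) -> Dict[str, int]:
    """Instruction width before pruning, after pruning, and with pattern indexing."""
    const = constant_fields(instrs)
    dup = duplicate_fields(instrs, const)
    kept = [f for f in range(len(widths)) if f not in const and f not in dup]
    n_patterns = len({tuple(ins) for ins in instrs})
    return {
        "original": sum(widths),
        "pruned": sum(widths[f] for f in kept),
        "pattern_index": max(1, math.ceil(math.log2(n_patterns))),
    }


if __name__ == "__main__":
    profiled = [(1, 1, 2, 0), (3, 3, 2, 0), (5, 5, 2, 0), (1, 1, 2, 0)]
    print(encoding_bits(profiled, widths=[4, 4, 3, 1]))
```

With widths [4, 4, 3, 1] and the four profiled instructions above, fields 2 and 3 are constant and field 1 duplicates field 0, so 12 bits shrink to 4; encoding the 3 distinct patterns instead would need only 2 bits plus a decoder table.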

24
Synthesizing Instruction Set from Parameterized Micro-Architecture
Huang et al, DAC 1994
[Figure: bit-width specification for some instruction field types]
25
Synthesizing Instruction Set from Parameterized Micro-Architecture
[Figure: MOP specification]
26
Synthesizing Instruction Set from Parameterized Micro-Architecture
MO1: m(r20) <- r20
MO2: r0 <- r20
MO3: m(r21) <- r21
MO4: r1 <- r21
MO5: r2 <- r22
MO6: PC <- PC1024
[Figure: data/control flow graph of the MOP of a basic block]
27
Synthesizing Instruction Set from Parameterized Micro-Architecture
[Figure: Schedule I for the MOP in the previous slide and the resulting instructions]
28
Synthesizing Instruction Set from Parameterized Micro-Architecture
[Figure: Schedule II for the same MOP (slide 26) and the resulting instructions]
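The dependences in the original flow graph are not recoverable, so the sketch below uses hypothetical ones: MO5 waits for MO1 to MO4, and MO6 waits for all the others. It is a plain greedy list scheduler, one possible way to turn a MOP into instruction words and not necessarily Huang et al's algorithm. Each scheduled step corresponds to one instruction, and the `width` limit is a stand-in for the micro-architecture's resource constraints.

```python
from typing import Dict, List, Set


def list_schedule(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedy list scheduling of micro-operations (MOs).

    deps maps every MO to the set of MOs that must be placed in an earlier step.
    Each step packs up to `width` ready MOs and corresponds to one instruction word.
    Ties are broken by MO name, purely for determinism.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    done: Set[str] = set()
    remaining = sorted(deps)
    schedule: List[List[str]] = []
    while remaining:
        ready = [mo for mo in remaining if deps[mo] <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        step = ready[:width]
        schedule.append(step)
        done.update(step)
        remaining = [mo for mo in remaining if mo not in done]
    return schedule


if __name__ == "__main__":
    # Hypothetical dependences; the original flow graph is not recoverable.
    deps = {
        "MO1": set(), "MO2": set(), "MO3": set(), "MO4": set(),
        "MO5": {"MO1", "MO2", "MO3", "MO4"},
        "MO6": {"MO1", "MO2", "MO3", "MO4", "MO5"},
    }
    for width in (2, 4):
        print(width, list_schedule(deps, width))
```

With width 2 the MOP becomes four instructions, [MO1, MO2], [MO3, MO4], [MO5], [MO6]; with width 4 it becomes three. Different schedules of the same MOP therefore yield different instruction formats, which is the trade-off that comparing two schedules exposes.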
30
Implementation of Basic Operations
Operations in the given application:
  • Primitive operations: implemented in the HW kernel
  • Basic operations: multiple implementation alternatives
    - HW choices
    - SW (using primitive and other basic operations)
Objective: maximize performance under given area and power constraints
M. Imai et al, ISSS 1996
31
Problem Formulation
Solution vector $X = (x_0, x_1, \ldots, x_i, \ldots, x_n)$, where
  • $x_0$ = HW kernel for all primitive operations
  • $x_i$ = implementation method for the $i$th operation
Area constraint: $\sum_{i=0}^{n} a(x_i) \le A_{max}$, where $a(x_i)$ = area for $x_i$
Power constraint: $\sum_{i=0}^{n} p(x_i) \le P_{max}$, where $p(x_i)$ = power of $x_i$
Objective function: minimize $T(X)$, where $T(X)$ = execution time for choice $X$
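For a handful of operations, the selection problem can be sketched by exhaustive search over all implementation choices. This is only an illustration, not the method of Imai et al, and it replaces T(X) with a simplified estimate (assumption): the sum over basic operations of execution count times cycles per execution. The candidate tuples and all numbers are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Impl = Tuple[str, float, float, int]  # (label, area, power, cycles per execution)


def select_implementations(
    candidates: Sequence[Sequence[Impl]],  # candidates[i]: choices for basic operation i
    freq: Sequence[int],                    # dynamic execution count of basic operation i
    kernel: Tuple[float, float],            # (area, power) of the HW kernel x0
    a_max: float,
    p_max: float,
) -> Optional[Tuple[int, List[str]]]:
    """Return (estimated cycles, chosen labels) for the fastest feasible choice, or None."""
    best: Optional[Tuple[int, List[str]]] = None
    for choice in product(*candidates):
        area = kernel[0] + sum(c[1] for c in choice)
        power = kernel[1] + sum(c[2] for c in choice)
        if area > a_max or power > p_max:
            continue  # violates the area or power constraint
        t = sum(f * c[3] for f, c in zip(freq, choice))  # simplified T(X)
        if best is None or t < best[0]:
            best = (t, [c[0] for c in choice])
    return best


if __name__ == "__main__":
    candidates = [
        [("mul_sw", 0, 0, 20), ("mul_hw", 30, 5, 2)],
        [("div_sw", 0, 0, 40), ("div_hw", 50, 8, 10)],
    ]
    print(select_implementations(candidates, freq=[100, 10], kernel=(100, 20),
                                 a_max=150, p_max=30))
```

Here the hardware multiplier fits the budget, but adding the hardware divider as well would exceed A_max = 150, so the search keeps the software divider: T = 100·2 + 10·40 = 600 cycles. The number of combinations grows exponentially with n, so larger instances need a smarter formulation.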
32
Estimating Execution Time T(X)
$T(X) = \sum_j F_j \cdot \bigl(t(B_j, X) + C_j\bigr) - b$, where
  • $F_j$ = execution frequency of basic block $B_j$
  • $t(B_j, X)$ = execution cycles for $B_j$ using $X$
  • $C_j$ = cycles needed to branch from $B_j$ to other blocks
  • $b$ = execution cycles saved by untaken branches
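Evaluating T(X) for a fixed choice X is a weighted sum over basic blocks. The sketch below assumes the per-block cycle counts t(B_j, X), the branch costs C_j and the untaken-branch saving b have already been estimated; the numbers in the example are hypothetical.

```python
from typing import Sequence


def total_time(freqs: Sequence[int], block_cycles: Sequence[int],
               branch_cycles: Sequence[int], untaken_saving: int) -> int:
    """T(X) = sum_j F_j * (t(B_j, X) + C_j) - b for one fixed implementation choice X.

    freqs[j]         F_j, execution frequency of basic block B_j
    block_cycles[j]  t(B_j, X), cycles for B_j under choice X
    branch_cycles[j] C_j, cycles to branch from B_j to other blocks
    untaken_saving   b, cycles saved by untaken branches
    """
    if not len(freqs) == len(block_cycles) == len(branch_cycles):
        raise ValueError("expected one entry per basic block")
    return sum(f * (t + c) for f, t, c in zip(freqs, block_cycles, branch_cycles)) - untaken_saving


if __name__ == "__main__":
    F, C, b = [1, 100, 1], [1, 2, 0], 50
    print(total_time(F, [5, 12, 3], C, b))  # choice X1
    print(total_time(F, [5, 8, 3], C, b))   # choice X2: faster inner block
```

The frequently executed block dominates: trimming it from 12 to 8 cycles lowers T from 1359 to 959, which is why implementation choices for operations in hot blocks matter most.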
33
Estimating Block Execution Time $t(B_j, X)$
Computed from $\bar{\tau}(i)$, the execution cycles for the $i$th basic operation:
$\bar{\tau}(i) = \left\lceil \frac{\sum_u k_i(u) \cdot \tau_i(u)}{f_i} \right\rceil$, where
  • $u$ = a distinct data value tuple for operation $i$
  • $\tau_i(u)$ = execution cycles for $u$
  • $k_i(u)$ = frequency count for $u$
  • $f_i$ = total frequency count
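The per-operation average can be computed directly from a value profile: a frequency-weighted mean of per-tuple cycle counts. The sketch below assumes the profile is a list of operand tuples observed for one operation and that the cycle model is supplied as a function; the data-dependent multiply cost used in the example (4 cycles plus 2 per set bit of the multiplier) is hypothetical, and rounding the mean up to whole cycles is an assumption.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def avg_op_cycles(profile: Iterable[Hashable], tau: Callable[[Hashable], int]) -> int:
    """Frequency-weighted mean of tau(u) over the data value tuples u observed for one
    basic operation, rounded up to whole cycles (the rounding is an assumption)."""
    k = Counter(profile)                      # k_i(u): occurrences of tuple u
    f = sum(k.values())                       # f_i: total frequency count
    if f == 0:
        raise ValueError("empty profile")
    weighted = sum(k_u * tau(u) for u, k_u in k.items())
    return -(-weighted // f)                  # integer ceiling division


def mul_cycles(u) -> int:
    """Hypothetical data-dependent multiply: 4 cycles plus 2 per set bit of the multiplier."""
    _, b = u
    return 4 + 2 * bin(b).count("1")


if __name__ == "__main__":
    observed = [(3, 0), (3, 0), (5, 1), (7, 3)]
    print(avg_op_cycles(observed, mul_cycles))
```

For the observed tuples the per-tuple costs are 4, 4, 6 and 8 cycles, a mean of 5.5, which rounds up to 6 cycles.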
34
Intercluster connections
Sanchez et al, ISSS 2000
[Figure: (a) VLIW clustered architecture: clusters and L1 caches connected by buses; (b) detailed architecture of a single cluster: FUs, local register file, incoming value register, MUX and MUX network]
35
Intercluster connections: Compute Accelerator on Transmogrifier-2
Zhang et al, FPCCM 2000
[Figure: Field-Programmable Compute Accelerator, top-level architecture: host computer, data/signal bus, Processing Modules (PMs) and top-level interconnection network]
36
Intercluster connections: Compute Accelerator on Transmogrifier-2
Zhang et al, FPCCM 2000
[Figure: Processing Module (PM) architecture: FUs, register files (RF), memory, control store, PC and control logic, PM-level interconnection block; links to/from the top-level interconnection network and the host computer]
37
Parameters for each Processing Module
  • Number, sizes and ports of storage units
  • Number and types of FUs
  • PM level interconnections
  • Top level interconnections
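One way to capture these parameters for design-space exploration is a small configuration record. The sketch below is hypothetical (the class and field names are not from Zhang et al) and adds one first-order sanity check, assuming each FU needs two read ports and one write port into the PM's storage units.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StorageUnit:
    name: str
    words: int
    read_ports: int
    write_ports: int


@dataclass
class PMConfig:
    """Hypothetical record of the per-PM parameters listed on the slide."""
    storage: List[StorageUnit]                                     # number, sizes, ports
    fu_types: List[str]                                            # number and types of FUs
    pm_links: List[Tuple[str, str]] = field(default_factory=list)  # PM-level interconnections
    top_links: List[str] = field(default_factory=list)             # top-level interconnections

    def ports_suffice(self, reads_per_fu: int = 2, writes_per_fu: int = 1) -> bool:
        """First-order check: can every FU read and write storage in the same cycle?"""
        reads = sum(s.read_ports for s in self.storage)
        writes = sum(s.write_ports for s in self.storage)
        n = len(self.fu_types)
        return reads >= n * reads_per_fu and writes >= n * writes_per_fu


if __name__ == "__main__":
    pm = PMConfig(
        storage=[StorageUnit("rf0", 16, 4, 2), StorageUnit("rf1", 16, 2, 1)],
        fu_types=["alu", "alu", "mul"],
    )
    print(pm.ports_suffice())
    pm.fu_types.append("alu")
    print(pm.ports_suffice())
```

With 6 read and 3 write ports, three FUs pass the check; a fourth FU would need 8 read ports, so it fails. An exploration loop could vary these fields and reject configurations like this before estimating performance.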

39
Coarse-grain FUs with VLIW core
Busa et al, ISSS 2000
[Figure: embedded (co-)processors as FUs in a VLIW architecture: IR, microcode, program counter logic, multiplexer network, register files (Reg1, Reg2), ALU, MULT, RAM and a coarse-grain FU]
40
Application Specific FUs - 1
FU characteristics:
  • Number of inputs and number of outputs
  • Functionality
  • Latency, initiation interval, I/O time shape
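Latency and initiation interval together determine an FU's throughput. A minimal sketch, assuming n independent operations issued back to back to one FU that accepts a new operation every `ii` cycles and delivers each result `latency` cycles after that operation starts; the I/O time shape, which would shift when individual inputs are read and outputs written, is ignored here.

```python
def fu_cycles(n_ops: int, latency: int, ii: int) -> int:
    """Cycles from the first issue until the last result is available on one FU, for
    n_ops independent operations: a new operation starts every `ii` cycles and each
    result appears `latency` cycles after its operation starts."""
    if latency < 1 or ii < 1:
        raise ValueError("latency and ii must be positive")
    if n_ops <= 0:
        return 0
    return (n_ops - 1) * ii + latency


if __name__ == "__main__":
    for ii in (1, 2, 4):  # ii == latency models an unpipelined FU
        print(ii, fu_cycles(10, latency=4, ii=ii))
```

For 10 operations with latency 4, a fully pipelined FU (II = 1) needs 13 cycles, II = 2 needs 22, and an unpipelined FU (II = 4) needs 40; for long operation streams the initiation interval, not the latency, sets the throughput.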
41
Application Specific FUs - 2
  • FUs may access memory (Mem/Cache) as well as the RF
  • FUs may include control
42
Conclusions: What needs to be done?
  • Define design space
  • Profile and analyze application
  • Identify architectural parameters
  • Identify special functional units
  • Estimate performance
  • Synthesize processor
  • Generate code
  • Validate design

43
Acknowledgements
  • Manoj Kumar Jain
  • manoj@cse.iitd.ernet.in
  • C. P. Joshi
  • csm99003@cse.iitd.ernet.in

44
Thanks