Title: Register Allocation for Clustered VLIW Architecture in IMPACT
1 Register Allocation for Clustered VLIW Architecture in IMPACT
Under the Guidance of Prof. Anshul Kumar
M.Sai Sasi Kiran 2001MCS015
2 Presentation Outline
- Objective
- Background
- IMPACT compiler
- Instruction Set Pruning
- Architecture Representation
- Implementation
- Example
- Test Results
- References
3 Objective
- To perform register allocation for a clustered VLIW architecture in the IMPACT compiler framework.
- Inputs
  - Prepass scheduled intermediate code
  - Processor architecture
- Outputs
  - Register allocated intermediate code
5 Typical VLIW Architecture
[Figure: typical VLIW architecture]
6 Typical Clustered VLIW Architecture
[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), each with its own register file and functional units FU 1, FU 2 and FU 3, joined by an interconnection network]
8 IMPACT compiler
10 Instruction Set Pruning
- Instruction set in IMPACT
  - Large instruction set with around 260 instructions.
  - Contains many complex instructions, e.g. Multiply_add, Multiply_subtract.
  - Contains some instructions which are not supported by the public version.
11 Need for Instruction Set Pruning
[Figure: a register file, interconnection networks, functional units FU 1, FU 2, ..., FU n and AFU 1, ..., AFU n]
- The architecture's fine-grain FUs are of the uni-operation type and follow the SPARC instruction set (i.e. a RISC instruction set).
- Better instruction encoding with fewer opcodes.
12 Instruction Set Pruning (contd)
- Pruning the instruction set
  - Splitting large complex instructions in... two or more small instructions.
    - mac r1, r2, r3, r4  =>  mul r1, r2, r5
                              add r5, r3, r4
  - Removing unimplemented instructions
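The splitting step can be sketched as below. This is only an illustration, not IMPACT's code: the tuple instruction format, the function name `split_complex` and the way temporary registers are supplied are all assumptions. The operand order follows the slide's `mac` example, with the destination written last.

```python
from itertools import count

def split_complex(instrs, temps=None):
    """Rewrite each ('mac', a, b, c, d) as ('mul', a, b, t) + ('add', t, c, d).

    `temps` is an iterator of fresh temporary register names (hypothetical
    naming scheme; by default t0, t1, ...).
    """
    if temps is None:
        temps = (f"t{i}" for i in count())
    out = []
    for op, *operands in instrs:
        if op == "mac":
            a, b, c, d = operands
            t = next(temps)               # temporary holding the product
            out.append(("mul", a, b, t))
            out.append(("add", t, c, d))
        else:
            out.append((op, *operands))   # simple instructions pass through
    return out

code = [("mac", "r1", "r2", "r3", "r4"), ("sub", "r4", "r1", "r6")]
pruned = split_complex(code, iter(["r5"]))
```

With `r5` supplied as the temporary, `pruned` reproduces the `mul`/`add` pair from the slide, followed by the untouched `sub`.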
14 Example of a Clustered VLIW Architecture
[Figure: two register files (Register file 1, Register file 2) with functional units ALU, MEM, ALU, BR, ALU, MEM, ALU]
15 What is the additional information needed?
- Number of Clusters
- Size and Type of Register Files
- Number and Type of Functional Units
- Interconnection Network
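One way to hold these four pieces of information in memory is sketched below. This is not the HMDes syntax and not IMPACT's internal structure; the class names, fields and the example wiring are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class RegisterFile:
    kind: str      # "int" or "float"
    size: int      # number of registers

@dataclass
class FunctionalUnit:
    name: str
    kind: str          # e.g. "ALU", "MEM", "BR"
    clusters: tuple    # ids of the clusters whose register files its ports reach

@dataclass
class Machine:
    register_files: dict = field(default_factory=dict)  # cluster id -> [RegisterFile]
    units: list = field(default_factory=list)
    links: set = field(default_factory=set)             # directed (src, dst) cluster pairs

    def num_clusters(self):
        return len(self.register_files)

    def units_in(self, cluster):
        return [u.name for u in self.units if cluster in u.clusters]

# Hypothetical two-cluster machine; the wiring is invented for illustration.
machine = Machine(
    register_files={1: [RegisterFile("int", 16)], 2: [RegisterFile("int", 16)]},
    units=[
        FunctionalUnit("alu0", "ALU", (1,)),
        FunctionalUnit("mem0", "MEM", (1,)),
        FunctionalUnit("alu1", "ALU", (1, 2)),  # shared ALU reaching both register files
        FunctionalUnit("br0", "BR", (2,)),
    ],
    links={(1, 2), (2, 1)},
)
```

Recording, for every functional unit, which clusters its ports reach is what later lets a cluster assignment pass tell single-cluster ports from shared ones.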
16 Internal representation of the Architecture (contd)
18 Overall Implementation Approach
1. Value live range table construction
2. Cluster assignment for virtual registers (each virtual register has only one value)
3. Addition of MOV statements to the prepass scheduled intermediate code for data movement between clusters
4. Register allocation for each value live range
19 Value Live Range Table Construction
- Input
  - Prepass scheduled intermediate code
- Output
  - Value live range table
20 Value Live Range Table Construction (contd..)
for each cycle (k)
  for each issued instruction (j)
    for each operand of type register (l) of instruction (j)
      if (register value is produced) then
        Add new entry of VR (l) in the value live range table
        Add the register details in the produced column
      else  // the VR is already present in the table
        Go to the corresponding VR entry row in the table
        Add the register details in the consumed column
      end if
    end for
  end for
end for
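The loop above can be sketched in a few lines. The schedule format (a list of cycles, each a list of `(opcode, dests, srcs)` tuples indexed by slot) and the function name are assumptions; the recorded details are cycle, slot and opcode, and whether an occurrence is a source or a destination is implied by the column it lands in.

```python
def build_live_range_table(schedule):
    """schedule[cycle][slot] = (opcode, [dest VRs], [source VRs])."""
    table = {}
    for cycle, issued in enumerate(schedule):
        for slot, (opcode, dests, srcs) in enumerate(issued):
            detail = (cycle, slot, opcode)
            for vr in srcs:
                # VR normally already has a row; setdefault also covers
                # live-in values that are never produced in this block.
                row = table.setdefault(vr, {"produced": None, "consumed": []})
                row["consumed"].append(detail)
            for vr in dests:
                # Each VR holds one value, so it is produced exactly once.
                row = table.setdefault(vr, {"produced": None, "consumed": []})
                row["produced"] = detail
    return table

schedule = [
    [("ld", ["VR1"], []), ("ld", ["VR2"], [])],   # cycle 0, slots 0 and 1
    [("add", ["VR3"], ["VR1", "VR2"])],           # cycle 1
    [("st", [], ["VR3"])],                        # cycle 2
]
table = build_live_range_table(schedule)
```

Here `table["VR1"]` records a production at cycle 0, slot 0 and one consumption at cycle 1, slot 0.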
21 Value Live Range Table Construction (contd..)
- Register details recorded for each occurrence
  - Cycle number
  - Slot number
  - Whether it is source or destination
  - Opcode of issued instruction (j)
22 Example
[Figure: Register File 1 and Register File 2 with an ALU, showing VR1, VR2 and VR3]
- Phase 1: gives the live ranges for VR1, VR2 and VR3.
- Note: each VR contains only one value.
23 Example: Value Live Range Table
- Entries are represented as Cycle_Slot
24 Cluster assignment for Virtual Registers
- Inputs
  - Value live range table
  - Processor architecture (HMDes representation)
  - Prepass scheduled intermediate code
- Outputs
  - Value live range table in which the cluster assignments for the virtual registers are filled in
25 Cluster assignment for Virtual Registers
for each live range (l)
  for each occurrence of Virtual Register (k)
    Find associated FU and port  // either produces or consumes
    if (FU port connected to single cluster) then
      assign VR to that cluster
    end if
  end for
  for each occurrence of Virtual Register (k)
    Find associated FU and port  // either produces or consumes
    if (FU port not connected to single cluster) then
      for (all the clusters to which FU port is connected) do
        count number of times the VR occurs in the corresponding cluster
      end for
      assign VR to the cluster in which it occurred the maximum number of times
    end if
  end for
end for
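A simplified reading of this pass is sketched below. It merges the two loops into one majority count: every occurrence votes for each cluster its FU port can reach, so a single-cluster port votes only for its own cluster. The input format, the function name and the tie-break (lowest cluster id) are assumptions, not part of the original algorithm.

```python
from collections import Counter

def assign_clusters(occurrences):
    """occurrences: VR -> non-empty list of sets; each set holds the clusters
    reachable from the FU port of one occurrence (produce or consume)."""
    assignment = {}
    for vr, port_clusters in occurrences.items():
        votes = Counter()
        for reachable in port_clusters:
            votes.update(reachable)          # one vote per reachable cluster
        best = max(votes.values())
        # Assumed tie-break: pick the lowest-numbered cluster.
        assignment[vr] = min(c for c, n in votes.items() if n == best)
    return assignment

occurrences = {
    "VR1": [{1}],
    "VR2": [{1}, {2}, {2}, {2}],   # once in cluster 1, three times in cluster 2
    "VR3": [{1, 2}, {2}],          # first port reaches both clusters
}
assignment = assign_clusters(occurrences)
```

With these inputs VR2 goes to cluster 2, since it occurs there three times against once in cluster 1.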
26 Example
[Figure: Register File 1 and Register File 2 with an ALU, showing VR1, VR2 and VR3]
- Phase 1: gives the live ranges for VR1, VR2 and VR3.
- Phase 2: for VR2, the occurrence counts are Cluster 1: 1 and Cluster 2: 3.
27 Data Movement Between Clusters (Mov Statements)
- Input
  - Prepass scheduled intermediate code
  - Processor architecture
  - Value live range table
- Output
  - Modified prepass scheduled intermediate code
28 Data Movement Between Clusters (Mov Statements)
for each live range (l)
  if (not all VR occurrences in the current live range are in one cluster) then
    if (there exists a path between clusters) then
      Insert mov statements
    else
      Report error
      Exit
    end if
  end if
end for
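The check-and-insert logic can be sketched as follows. The data layout (a home cluster plus a list of consumer clusters per VR, and the interconnect as a set of directed cluster pairs) and the function names are assumptions; the sketch only plans which moves are needed and does not place them in the schedule.

```python
from collections import deque

def has_path(links, src, dst):
    """Breadth-first search over directed (src, dst) cluster links."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for a, b in links:
            if a == node and b not in seen:
                seen.add(b)
                queue.append(b)
    return False

def plan_moves(live_ranges, links):
    """live_ranges: VR -> (cluster holding the value, [clusters of its consumers])."""
    moves = []
    for vr, (home, uses) in live_ranges.items():
        for target in sorted(set(uses) - {home}):   # consumers outside the home cluster
            if not has_path(links, home, target):
                raise ValueError(f"no path from cluster {home} to {target} for {vr}")
            moves.append(("mov", vr, home, target))
    return moves

links = {(2, 1)}                                    # values can move from cluster 2 to 1
moves = plan_moves({"VR1": (2, [1, 1]), "VR2": (1, [1])}, links)
```

Raising an exception stands in for the "Report error, Exit" branch of the pseudocode.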
29 Example
[Figure: Register File 1 and Register File 2 with an ALU, showing VR1, VR2 and VR3]
- If VR1 is produced in RF2 (and required in the ALU), then according to the algorithm we use this ALU to transfer VR1 from RF2 to RF1.
30 Value Live Range ... Register Assignment
for each cluster (I)
  Pass the corresponding register file details to the register allocator
  Perform register assignment for all the value live ranges belonging to cluster (I)
end for
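The per-cluster loop is sketched below. A simple greedy interval allocator stands in for IMPACT's register allocator, which the slides do not describe; the live-range format `(cluster, first_cycle, last_cycle)`, the function name and the absence of spilling are all assumptions.

```python
def allocate_per_cluster(live_ranges, register_files):
    """live_ranges: VR -> (cluster, first_cycle, last_cycle).
    register_files: cluster -> list of machine register numbers."""
    result = {}
    for cluster, registers in register_files.items():
        # Only the live ranges assigned to this cluster, ordered by start cycle.
        mine = sorted((start, end, vr)
                      for vr, (c, start, end) in live_ranges.items()
                      if c == cluster)
        free = list(registers)
        busy = []                                  # (last_cycle, register)
        for start, end, vr in mine:
            # Release registers whose value died before this range starts.
            free += [r for e, r in busy if e < start]
            busy = [(e, r) for e, r in busy if e >= start]
            if not free:
                raise RuntimeError(f"cluster {cluster}: no register for {vr}")
            reg = free.pop(0)
            result[vr] = reg
            busy.append((end, reg))
    return result

register_files = {0: [128, 129], 1: [136]}
live_ranges = {
    "VR1": (0, 0, 2), "VR2": (0, 1, 3), "VR3": (0, 3, 5),
    "VR4": (1, 0, 1),
}
allocation = allocate_per_cluster(live_ranges, register_files)
```

Because each cluster sees only its own register file, a VR can never receive a register from another cluster; here VR3 reuses register 128 once VR1 is dead.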
32 Test Results: Architecture
- Architecture
  - Issue width: 8
  - Number of clusters: 4
  - Size of integer register file in each cluster: 16
  - Size of floating point register file in each cluster: 16
  - Number of ALUs: 8
  - Number of LD/ST units: 8
  - Number of FALUs: 8
  - Number of branch units: 1
33 Interconnection Network
[Figure: four clusters with Register Files 1 to 4, connected to MEMORY. Each cluster has 2 IALUs, 2 FALUs and 2 LD/ST units; one cluster also has the single branch unit.]
34 Test Results: Register Files
- CLUSTER No. Register File Type Machine registers
- CLUSTER 0 INTEGER CALLER REG FILE 128 ... 135
- CLUSTER 0 INTEGER CALLEE REG FILE 160 ... 167
- CLUSTER 1 INTEGER CALLER REG FILE 136 ... 143
- CLUSTER 1 INTEGER CALLEE REG FILE 168 ... 175
- CLUSTER 2 INTEGER CALLER REG FILE 144 ... 151
- CLUSTER 2 INTEGER CALLEE REG FILE 176 ... 183
- CLUSTER 3 INTEGER CALLER REG FILE 152 ... 159
- CLUSTER 3 INTEGER CALLEE REG FILE 184 ... 191
- CLUSTER 0 FLOAT CALLER REG FILE 192 ... 199
- CLUSTER 0 FLOAT CALLEE REG FILE 224 ... 231
- CLUSTER 1 FLOAT CALLER REG FILE 200 ... 207
- CLUSTER 1 FLOAT CALLEE REG FILE 232 ... 239
- CLUSTER 2 FLOAT CALLER REG FILE 208 ... 215
- CLUSTER 2 FLOAT CALLEE REG FILE 240 ... 247
- CLUSTER 3 FLOAT CALLER REG FILE 216 ... 223
- CLUSTER 3 FLOAT CALLEE REG FILE 248 ... 255
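The ranges in this table appear to follow a regular pattern: each register type and save class starts at a base number, and every cluster gets the next 8 registers. The sketch below encodes that pattern. It is inferred from the listed ranges only, it is not an IMPACT definition, and it assumes the second file of every cluster is the callee-saved one, as the allocation results for registers 168 and 184 suggest. The names `BASES`, `machine_registers` and `owning_cluster` are hypothetical.

```python
REGS_PER_FILE = 8

# Base register number for cluster 0, read off the table above.
BASES = {
    ("int", "caller"): 128,
    ("int", "callee"): 160,
    ("float", "caller"): 192,
    ("float", "callee"): 224,
}

def machine_registers(cluster, kind, save_class):
    """Range of machine register numbers for one register file."""
    first = BASES[(kind, save_class)] + cluster * REGS_PER_FILE
    return range(first, first + REGS_PER_FILE)

def owning_cluster(reg):
    """Cluster that a machine register number (128..255) belongs to."""
    return (reg - 128) % 32 // REGS_PER_FILE
```

For example, `machine_registers(3, "float", "callee")` covers 248 to 255, and `owning_cluster(163)` is 0.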
35 Test Results: Register Allocation for Cluster 0
- VR No. Type Machine Register Class
- 1 INT 163 CALLEE
- 2 INT 161 CALLEE
- 4 FLOAT 224 CALLEE
- 5 FLOAT 196 CALLER
- 14 FLOAT 194 CALLER
- 25 INT 134 CALLER
- 26 INT 132 CALLER
- 30 INT 131 CALLER
- 34 FLOAT 193 CALLER
- 38 FLOAT 192 CALLER
- 46 INT 135 CALLER
- 48 INT 133 CALLER
- 49 INT 135 CALLER
- 54 FLOAT 195 CALLER
- 67 INT 162 CALLEE
36 Test Results (contd)
- VR No. Type Machine Register Class
- 68 INT 164 CALLEE
- 78 INT 133 CALLER
- 82 INT 160 CALLEE
- 83 INT 130 CALLER
- 86 INT 134 CALLER
- 88 INT 130 CALLER
- 92 INT 129 CALLER
- 93 INT 131 CALLER
- 94 FLOAT 192 CALLER
- 103 INT 128 CALLER
- 112 INT 129 CALLER
- 114 INT 128 CALLER
- 115 INT 132 CALLER
- 124 INT 128 CALLER
37 Test Results: Register Allocation for Cluster 1
- VR No. Type Machine Register Class
- 87 INT 142 CALLER
- 89 INT 168 CALLEE
- 90 INT 169 CALLEE
- 91 INT 137 CALLER
- 95 INT 141 CALLER
- 96 INT 143 CALLER
- 106 INT 136 CALLER
- 108 FLOAT 197 CALLER
- 109 INT 136 CALLER
- 110 INT 139 CALLER
- 116 INT 138 CALLER
- 119 INT 140 CALLER
- 121 INT 137 CALLER
38 Test Results: Register Allocation for Cluster 2
- VR No. Type Machine Register Class
- 3 INT 176 CALLEE
39 Test Results: Register Allocation for Cluster 3
- VR No. Type Machine Register Class
- 97 INT 157 CALLER
- 98 INT 158 CALLER
- 99 INT 184 CALLEE
- 100 INT 155 CALLER
- 101 INT 156 CALLER
- 102 INT 157 CALLER
- 104 INT 154 CALLER
- 105 INT 153 CALLER
- 107 INT 152 CALLER
40 Test Results (contd)
- VR No. Type Machine Register Class
- 111 INT 152 CALLER
- 113 INT 158 CALLER
- 117 INT 152 CALLER
- 118 INT 155 CALLER
- 120 INT 156 CALLER
- 122 INT 154 CALLER
- 123 INT 153 CALLER
- 125 INT 159 CALLER
43 Acknowledgements
- Prof. Anshul Kumar
- Prof. M.Balakrishnan
- Dr. P.R. Panda
- Anup Gangwar
- Basant K. Dwivedi
44 Thanks