Title: 32-bit PowerPC Assembly Basics
1. 32-bit PowerPC Assembly Basics
- Comparison instructions
- Branch and jump instructions
- Simple Code Sequences
2. Where Are Branches Used?
- In C control statements
  - If statement
    - if (n > 0)
    -   ...
    - else
    -   ...
  - While loop
    - while (s != NULL)
    -   ...
  - For loop
    - for (i = 0; i < N; i++)
    -   ...
  - Do loop
    - do
    -   ...
    - while (s != NULL);
  - Others
    - e.g. max = (x > y) ? x : y;
3. Comparison Instructions
- To set up conditions in CR or XER bits
  - Set by arithmetic/logic/shift instructions with the "." suffix
  - Set by comparison instructions
- Compare signed word and unsigned word
  - cmpw  r3, r4       # set CR0 as for signed r3 - r4
  - cmplw r3, r4       # set CR0 as for unsigned r3 - r4
  - cmplw: compare logical
- Compare using immediate values
  - cmpwi  r3, 200     # set CR0 as for signed r3 - 200
  - cmplwi r3, 200     # set CR0 as for unsigned r3 - 200
4. Comparison Instructions
- Compare and set specific condition registers
  - Comparison may specify which CR field to use
  - cmpw   cr3, r3, r4     # set CR3 instead of CR0
  - cmplwi cr2, r3, 200    # logical compare using an immediate, and set CR2
  - cmpw   cr0, r3, r4     # equivalent to cmpw r3, r4
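The difference between cmpw and cmplw can be sketched in C. This is a minimal model, not a full emulator: the names cmpw_model, cmplw_model, and the CR_* masks are hypothetical, and the SO bit (copied from XER on real hardware) is left at zero.

```c
#include <stdint.h>

/* Bit masks for one 4-bit CR field, most significant bit first. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Model of cmpw: both operands are interpreted as signed 32-bit values.
   The uint32_t -> int32_t casts assume the usual two's-complement wrap. */
static unsigned cmpw_model(uint32_t ra, uint32_t rb) {
    int32_t a = (int32_t)ra, b = (int32_t)rb;
    return a < b ? CR_LT : (a > b ? CR_GT : CR_EQ);
}

/* Model of cmplw: both operands are interpreted as unsigned 32-bit values. */
static unsigned cmplw_model(uint32_t ra, uint32_t rb) {
    return ra < rb ? CR_LT : (ra > rb ? CR_GT : CR_EQ);
}
```

With ra = 0xFFFFFFFF and rb = 1, the signed model reports LT (since -1 < 1) while the logical model reports GT, which is why the choice of compare instruction matters before a conditional branch.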
5. Branch Basic Terms
- branch condition, branch-target
- Unconditional branches
- Always jump to the target address
- Conditional branches
- Take the branch only if some condition holds
- Target address
- Determining the address of the next instruction
6. Unconditional Branches
- Unconditional branches
  - C
    - while (1)
    -   XX += 1;
  - Assembly
    - loop: addi r9, r9, 1
    -       b    loop          # encoded offset: -4
- The target loop is specified as an offset from the current instruction (PC-relative).
7. Conditional Branches
- Commonly used branches
  - Use condition register CR0: LT, GT, EQ, SO
  - Common form: ble target_address
  - ble: branch if less than or equal → GT = 0
  - blt: branch if less than → LT = 1
  - beq: branch if equal → EQ = 1
  - bne: branch if not equal → EQ = 0
  - bge: branch if greater than or equal → LT = 0
  - bgt: branch if greater than → GT = 1
  - All encoded in the same instruction format (see next)
8. Conditional Branches
- Using CR fields
  - bne cr2, target        # branch if EQ of CR2 is zero
- Example using branch with comparison instructions
  - loop:
  -   addi r3, r3, 1       # increment r3
  -   cmpw r3, r4          # compare with r4
  -   bne  target          # branch if r3 != r4
- Example using a different CR field
  - loop:
  -   addi r3, r3, 1       # increment r3
  -   cmpw cr3, r3, r4     # compare using cr3
  -   bne  cr3, target     # branch if r3 != r4
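The mapping from branch mnemonics to CR bits can be sketched as C predicates over one 4-bit CR field. The helper names (cr_field, blt_taken, and so on) are hypothetical; the only hardware fact assumed is that CR0 occupies the four most significant bits of the 32-bit CR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for one 4-bit CR field, most significant bit first. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Extract CR field n (0..7) from the 32-bit condition register. */
static unsigned cr_field(uint32_t cr, unsigned n) {
    return (cr >> (28u - 4u * n)) & 0xFu;
}

/* Each predicate tests a single bit, exactly as listed on the slide. */
static bool blt_taken(unsigned f) { return (f & CR_LT) != 0; }  /* LT = 1 */
static bool bgt_taken(unsigned f) { return (f & CR_GT) != 0; }  /* GT = 1 */
static bool beq_taken(unsigned f) { return (f & CR_EQ) != 0; }  /* EQ = 1 */
static bool bne_taken(unsigned f) { return (f & CR_EQ) == 0; }  /* EQ = 0 */
static bool bge_taken(unsigned f) { return (f & CR_LT) == 0; }  /* LT = 0 */
static bool ble_taken(unsigned f) { return (f & CR_GT) == 0; }  /* GT = 0 */
```

For example, bne cr3, target corresponds to bne_taken(cr_field(cr, 3)): the branch looks only at the EQ bit of field 3 and ignores CR0 entirely.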
9. Determining Target Address
- PC-relative: next PC = PC + EXTS(PC-Offset || 0b00)
- Absolute: next PC = EXTS(PC-Offset || 0b00)
- Register: next PC = value of register
  - Can use two special registers: LR or CTR
- Why sign-extension of an address (for absolute)?
  - Are addresses ever negative?
  - Upper address space is usually reserved for I/O addresses (say 0xff000000 onwards).
  - 0xff00 gets sign-extended to 0xffffff00.
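The target-address rules above can be written out as a short C sketch for the unconditional branch forms. The function names exts and bx_target are hypothetical; li24 stands for the 24-bit offset field and aa for the absolute-address bit.

```c
#include <stdint.h>

/* EXTS: sign-extend the low `bits` bits of v to 32 bits (1 <= bits <= 31). */
static uint32_t exts(uint32_t v, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    v &= (sign << 1) - 1u;          /* keep only the low `bits` bits */
    return (v ^ sign) - sign;       /* propagate the sign bit upward */
}

/* Next PC for b/ba/bl/bla: the 24-bit field is shifted left by two
   (appending 0b00), sign-extended from 26 bits, and then either added
   to the current PC (aa == 0) or used as the address itself (aa != 0). */
static uint32_t bx_target(uint32_t pc, uint32_t li24, int aa) {
    uint32_t disp = exts(li24 << 2, 26);
    return aa ? disp : pc + disp;
}
```

A negative absolute target wraps into the top of the address space, which is how a short branch field can still reach I/O regions near 0xff000000 and above.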
10. Determining Target Address
- Use PC-relative or absolute addressing: "a" suffix
  - Use PC-relative address: b loop
  - Use absolute address: ba loop
- Update LR option: "l" suffix
  - If updating, save PC+4 into LR
  - Do not update LR: b target_addr
  - Update LR: bl func_addr
  - Update LR and use absolute address: bla func_addr
- When do we want to save PC+4?
11. Underlying Details
Instruction format (bit 0 is the most significant bit)
Instruction  0-5  6-10  11-15  16-29                        30  31
bx           18   PC-Offset (bits 6-29)                     AA  LK
bcx          16   BO    BI     BD                           AA  LK
bclrx        19   BO    BI     00000 (16-20), 16 (21-30)        LK
bcctrx       19   BO    BI     00000 (16-20), 528 (21-30)       LK
- bx encodes a 24-bit address (26-bit effective)
- bcx encodes a 14-bit address (16-bit effective)
- bclrx uses the LR register as the target address
- bcctrx uses the CTR register as the target address
- x represents the AA and LK bits, e.g. l, a, la
12. Underlying Details
Instruction Fields
Instruction  0-5  6-10  11-15  16-29  30  31
bcx          16   BO    BI     BD     AA  LK
- BO: branch options
  - Encodes branching on TRUE or FALSE or on CTR values
- BI: index of the CR bit to use
  - Five bits index the 32 CR bits: 3 bits for the CR field index, 2 bits to select LT, GT, EQ, or SO
- BD: branch displacement
  - 14-bit (16-bit effective), sign-extended
- AA: absolute address bit
  - 1: use absolute addressing; 0: use PC-relative addressing
- LK: link bit
  - 1: update LR with PC+4; 0: do not update
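As a rough illustration of how these fields pack into one 32-bit word, here is a C sketch of a bc encoder. The function name encode_bc is hypothetical, and disp is assumed to be a byte displacement that is a multiple of 4 and fits in 16 signed bits.

```c
#include <stdint.h>

/* Pack the bc fields into an instruction word.
   Layout (bit 0 = most significant): opcode 16 in bits 0-5, BO in 6-10,
   BI in 11-15, BD in 16-29, AA in bit 30, LK in bit 31. */
static uint32_t encode_bc(unsigned bo, unsigned bi, int32_t disp,
                          unsigned aa, unsigned lk) {
    return (16u << 26)
         | ((bo & 0x1Fu) << 21)
         | ((bi & 0x1Fu) << 16)
         | ((uint32_t)disp & 0xFFFCu)   /* BD: low two bits are implied 00 */
         | ((aa & 1u) << 1)
         | (lk & 1u);
}
```

Calling encode_bc(4, 1, 12, 0, 0), that is "branch if CR0[GT] is false, 12 bytes ahead", yields 0x4081000C, the same word that appears for ble in the disassembly slide.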
13. Underlying Details
BO and BI Fields
- Frequently used BO encodings in bc, bclr, and bcctr
  - BO = 00100 (4): branch if the condition is false
  - BO = 01100 (12): branch if the condition is true
  - BO = 10100 (20): branch always
  - BO = 10000 (16): decrement CTR, then branch if CTR != 0
- Examples
  - blt target_addr ⇔ bc 12, 0, target_addr
  - blt cr3, target_addr ⇔ bc 12, 12, target_addr
  - blr ⇔ bclr 20, 0: unconditional branch to the address in LR
  - bnelr ⇔ bclr 4, 2: branch to LR if not equal
- Explanation: bc 4, 14, target_addr branches if bit 14 in CR (CR3[EQ]) is false (because BO = 4) ⇔ bne cr3, target_addr
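The BI arithmetic and the BO decision can be sketched in C as follows. All names here (bi_index, bo_taken, the BIT_* constants) are hypothetical; the logic follows the architecture's rule that BO[0] disables the condition test, BO[1] is the required condition value, BO[2] disables the CTR decrement, and BO[3] selects branching on CTR == 0 rather than CTR != 0.

```c
#include <stdint.h>

/* Position of each condition bit inside a 4-bit CR field. */
enum cr_bit { BIT_LT = 0, BIT_GT = 1, BIT_EQ = 2, BIT_SO = 3 };

/* BI = 4 * (CR field number) + (bit within the field). */
static unsigned bi_index(unsigned crf, enum cr_bit bit) {
    return 4u * crf + (unsigned)bit;
}

/* Decide whether a conditional branch is taken.
   bo     : the 5-bit BO field (BO[0] is the 0x10 bit)
   cr_bit : value (0 or 1) of the CR bit selected by BI
   ctr    : pointer to the modelled CTR, decremented when BO[2] == 0 */
static int bo_taken(unsigned bo, int cr_bit, uint32_t *ctr) {
    if (!(bo & 0x04u)) {
        --*ctr;
    }
    int ctr_ok  = (bo & 0x04u) || ((*ctr != 0) != ((bo & 0x02u) != 0));
    int cond_ok = (bo & 0x10u) || ((cr_bit != 0) == ((bo & 0x08u) != 0));
    return ctr_ok && cond_ok;
}
```

bi_index(3, BIT_EQ) gives 14, matching the bc 4, 14, target_addr example, and BO = 16 is the form used by counted loops that decrement CTR on every pass.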
14. Underlying Details
AA and LK fields
Instruction  0-5  6-10  11-15  16-29                        30  31
bx           18   Offset (bits 6-29)                        AA  LK
bcx          16   BO    BI     BD                           AA  LK
bclrx        19   BO    BI     00000 (16-20), 16 (21-30)        LK
bcctrx       19   BO    BI     00000 (16-20), 528 (21-30)       LK
- Branch examples using AA and LK bits (zeros by default)
  - bl  target_addr     # branch and save PC+4 in LR
  - ba  target_addr     # branch using absolute addressing
  - bla target_addr     # branch using absolute addressing and save PC+4 in LR
15. Support Procedure Call/Return
- Link Register
  - Supporting function calls
- A parent function calls a child function: bl child_func
  - LR <- PC + 4
  - PC <- child function address
- The child function executes
- The child function returns: blr
  - PC <- LR
- The parent function continues
- blr: branch to link register address
- All bx forms: b, ba, bl, bla
  - LK = 1 if the link register is to be updated
- Q: What to do if the child function calls another function?
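The call/return sequence, and the problem raised by the closing question, can be traced with a tiny C model of PC and LR. This is only a sketch: struct cpu, the addresses 0x100/0x200/0x300, and the idea that each function calls at its first instruction are all assumptions made for illustration. On real hardware the save and restore steps would use mflr and mtlr plus a stack slot.

```c
#include <stdint.h>

struct cpu { uint32_t pc, lr; };

/* bl: save the return address in LR, then jump. */
static void bl(struct cpu *c, uint32_t target) { c->lr = c->pc + 4; c->pc = target; }

/* blr: jump to the address held in LR. */
static void blr(struct cpu *c) { c->pc = c->lr; }

/* Parent at 0x100 calls a leaf child at 0x200, which returns. */
static uint32_t call_and_return(void) {
    struct cpu c = { 0x100, 0 };
    bl(&c, 0x200);              /* LR = 0x104 */
    blr(&c);                    /* PC = 0x104: parent continues */
    return c.pc;
}

/* Child calls a grandchild without saving LR: the parent's
   return address is overwritten and the child can never get back. */
static uint32_t nested_without_save(void) {
    struct cpu c = { 0x100, 0 };
    bl(&c, 0x200);              /* LR = 0x104 */
    bl(&c, 0x300);              /* LR = 0x204, 0x104 is lost */
    blr(&c);                    /* back in child at 0x204 */
    blr(&c);                    /* still 0x204, not 0x104 */
    return c.pc;
}

/* Child saves LR before its own call and restores it afterwards. */
static uint32_t nested_with_save(void) {
    struct cpu c = { 0x100, 0 };
    bl(&c, 0x200);
    uint32_t saved_lr = c.lr;   /* like mflr + store to the stack */
    bl(&c, 0x300);
    blr(&c);
    c.lr = saved_lr;            /* like load from the stack + mtlr */
    blr(&c);                    /* PC = 0x104: parent continues */
    return c.pc;
}
```

The middle function shows why a non-leaf function has to preserve LR somewhere before issuing its own bl.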
16. Simple Code Sequences
- How to translate
- C arithmetic expressions
- C if statement
- C for loops
- Function calls (next week)
17. C Arithmetic Expressions
- Basic operations
  - static int sum;
  - static int x1, x2;
  - static int y1, y2;
  -
  - sum = (x1+x2)-(y1+y2)+100;
- Assembly
  - lwz  r3, 4(r13)      # load x1
  - lwz  r0, 8(r13)      # load x2
  - add  r4, r3, r0      # x1+x2
  - lwz  r3, 12(r13)     # load y1
  - lwz  r0, 16(r13)     # load y2
  - add  r0, r3, r0      # y1+y2
  - subf r3, r0, r4      # (x1+x2)-(y1+y2)
  - addi r0, r3, 100     # add 100
  - stw  r0, 0(r13)      # store sum
Q: What would happen if signed is changed to unsigned?
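One way to explore the question above is to compare the bit patterns produced by signed and unsigned versions of the same expression. The sketch below uses hypothetical function names and fixed-width types in place of int and unsigned int. Since add, subf, and addi operate on 32-bit two's-complement words without regard to signedness, the expectation is that both versions produce identical bits as long as the signed version does not overflow.

```c
#include <stdint.h>

/* The slide's expression with signed operands. */
static int32_t sum_signed(int32_t x1, int32_t x2, int32_t y1, int32_t y2) {
    return (x1 + x2) - (y1 + y2) + 100;
}

/* The same expression with unsigned operands (wraps modulo 2^32). */
static uint32_t sum_unsigned(uint32_t x1, uint32_t x2, uint32_t y1, uint32_t y2) {
    return (x1 + x2) - (y1 + y2) + 100u;
}
```

For inputs such as x1 = -5, x2 = 2, y1 = 7, y2 = -1, both functions yield the word 0x0000005B (91) when the signed arguments are reinterpreted as unsigned.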
18. C Arithmetic Expressions
- Sign extension
  - static short sum;
  - static short x1, x2;
  - static short y1, y2;
  -
  - sum = (x1+x2)-(y1+y2)+100;
- Assembly
  - lha  r3, 2(r13)      # load x1
  - lha  r0, 4(r13)      # load x2
  - add  r4, r3, r0      # x1+x2
  - lha  r3, 6(r13)      # load y1
  - lha  r0, 8(r13)      # load y2
  - add  r0, r3, r0      # y1+y2
  - subf r3, r0, r4      # (x1+x2)-(y1+y2)
  - addi r0, r3, 100     # add 100
  - sth  r0, 0(r13)      # store sum
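What lha does to a halfword, compared with the zero-extending lhz, can be sketched in C. The *_model names are hypothetical; the halfword is passed in as its raw 16-bit pattern and the result is the 32-bit register contents.

```c
#include <stdint.h>

/* lha: load halfword algebraic, i.e. sign-extend 16 bits to 32. */
static uint32_t lha_model(uint16_t half) {
    return ((uint32_t)half ^ 0x8000u) - 0x8000u;
}

/* lhz: load halfword and zero, i.e. zero-extend 16 bits to 32. */
static uint32_t lhz_model(uint16_t half) {
    return (uint32_t)half;
}

/* sth: store only the low 16 bits of the register. */
static uint16_t sth_model(uint32_t reg) {
    return (uint16_t)(reg & 0xFFFFu);
}
```

A short holding -2 has the pattern 0xFFFE: lha turns it into 0xFFFFFFFE (still -2 as a word), whereas lhz would give 0x0000FFFE (65534), so signed short operands need lha for the following add and subf to work on the intended values.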
19. If-then-else
- C Program
  - if (x > y)
  -   z = 1;
  - else
  -   z = 0;
- Assembly
  -        cmpw r3, r4
  -        ble  skip1
  -        li   r31, 1
  -        b    skip2
  - skip1: li   r31, 0
  - skip2:
- Notes
  - Code generated by CodeWarrior and then revised
  - x → r3, y → r4, z → r31
  - li r31, 1 => addi r31, 0, 1; li is called a simplified mnemonic
20. If-then-else
- C Program
  - static int x, y;
  - static int max;
  - if (x - y > 0)
  -   max = x;
  - else
  -   max = y;
- Assembly
  -        lwz   r4, 4(r13)     # load y
  -        lwz   r0, 0(r13)     # load x
  -        subf  r0, r4, r0     # x - y
  -        cmpwi r0, 0x0000     # x - y > 0?
  -        ble   skip1          # no, skip max = x
  -        lwz   r0, 0(r13)     # load x
  -        stw   r0, 8(r13)     # max = x
  -        b     skip2          # skip max = y
  - skip1: lwz   r0, 4(r13)     # load y
  -        stw   r0, 8(r13)     # max = y
  - skip2:
- Notes
  - Generated by CodeWarrior and then revised
  - Can you optimize the code, i.e. reduce the number of instructions while producing the same output?
21. If-then-else
Binary code
- Disassembled code
  - Address   Binary    Assembly
  - 00000048  7C001800  cmpw r0,r3
  - 0000004C  4081000C  ble  12
  - 00000050  3BE00001  li   r31,1
  - 00000054  48000008  b    8
  - 00000058  3BE00000  li   r31,0
  - 0000005C
- Assembly Source
  -        cmpw r0, r3
  -        ble  skip1
  -        li   r31, 1
  -        b    skip2
  - skip1: li   r31, 0
  - skip2:
22. For loop
- C code
  - static int sum;
  - static int X[100];
  -
  - int i;
  - sum = 0;
  - for (i = 0; i < 100; i++)
  -   sum += X[i];
- Assembly
  -       li    r0, 0             # sum = 0
  -       stw   r0, 0(r13)        # sum = 0
  -       li    r31, 0            # i → r31
  -       b     cmp_
  - loop: slwi  r4, r31, 2        # r4 = i*4
  -       lis   r3, X_at_ha       # load X address
  -       ori   r3, r3, X_at_lo   # load X address
  -       add   r3, r3, r4        # address of X[i]
  -       lwz   r4, 0(r3)         # load X[i]
  -       lwz   r0, 0(r13)        # load sum
  -       add   r0, r0, r4        # sum + X[i]
  -       stw   r0, 0(r13)        # store sum
  -       addi  r31, r31, 1       # increment i
  - cmp_: cmpwi r31, 0x0064       # 0x64 = 100
  -       blt   loop
- (generated by CodeWarrior and then revised)
Exercise: (1) How many instructions will be executed? (2) Optimize the code to reduce the loop body to 4 instructions. (3) Further reduce the loop body to 3 instructions. The loop body includes the branch instruction.
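The control-flow shape of the generated loop (jump to the test first, body above it, conditional branch back) can be mirrored in C with goto. This is only a structural sketch: the function name sum_array and its parameters are hypothetical, and it keeps sum and i in local variables rather than reproducing the memory traffic of the assembly.

```c
/* Same shape as the assembly: b cmp_ ; loop: body ; cmp_: test ; blt loop. */
static int sum_array(const int *X, int n) {
    int sum = 0;            /* li / stw: sum = 0 */
    int i = 0;              /* li r31, 0 */
    goto cmp_;              /* b cmp_ */
loop:
    sum += X[i];            /* address computation, lwz, add, stw */
    i += 1;                 /* addi r31, r31, 1 */
cmp_:
    if (i < n) goto loop;   /* cmpwi + blt loop */
    return sum;
}
```

Placing the test at the bottom means each iteration ends with a single compare and conditional branch, and the initial b cmp_ keeps the zero-iteration case correct.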