Title: Appendix C: Assembly Language Programming
1 Appendix C: Assembly Language Programming
- CS 271 Computer Architecture
- Indiana University-Purdue University Fort Wayne
2 Machine and assembly language
- Machine language
- Used to program the ISA level of a computer system
- The ISA level is just above the microarchitecture level
- Instructions consist of strings of 1s and 0s
- Writing programs is very difficult and tedious
- Assembly language
- An easier-to-use symbolic representation of machine language
- Uses mnemonics for operations
- ADD, MUL, MOV, CMP, PUSH, etc.
- Also includes calls to operating system service routines
- An assembly language program is translated into machine language by a program called an assembler
3 Intel 8088 assembly language
- Every distinct processor has its own assembly language
- However, most assembly languages are similar
- The 8088 processor was used in the original IBM PC
- 8088 assembly language programs run on modern Pentium 4 processors
- Most of the Pentium's core instructions are the same as the 8088's
- However, they act on 32-bit registers instead of 16-bit registers
- Learning about computer architecture is facilitated by learning an assembly language
- Intel 8088 assembly language is a good choice and is also a gentle introduction to Pentium assembly language programming
4 8088 sample assembly language program
(a) An assembly language program (b) The
corresponding tracer display
5 8088 sample assembly language program
- In the sample program make note of . . .
- Constant definitions (used by the assembler)
- _EXIT = 1, _WRITE = 4, _STDOUT = 1
- Pseudoinstructions (commands to the assembler)
- .SECT .TEXT ! Activity section
- .SECT .DATA ! Data section
- .ASCII "Hello World\n" ! 12-byte string initialization
- Labels (converted to memory addresses by the assembler)
- start, hw, de
- Instructions (translated into machine language)
- MOV, PUSH, ADD, SUB
- Operating system call: SYS
6 Intel 8088 assembly language
- To program the 8088, it is necessary to have detailed knowledge of the instruction set architecture
- Processor and fetch / execute cycle
- Registers
- Memory and addressing
- The instruction set
7 8088 processor and fetch / execute cycle
- The fetch / execute cycle involves special registers
- Program Counter (PC)
- Also known as Instruction Pointer (IP)
- The PC always contains the address of the next instruction
- Fetch / execute cycle
- 1. Fetch the next instruction from the memory location referred to by the PC register
- 2. Increment the PC
- 3. Decode the fetched instruction
- 4. Fetch any needed data from memory and/or processor registers
- 5. Execute the instruction
- 6. Store the results of the instruction in memory and/or registers
- 7. Go back to step 1 and repeat
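The cycle above can be sketched as a small Python loop. This is only a toy model: the three-instruction "ISA" (LOAD, ADD, HALT) and the function name `run` are invented for illustration and are not 8088 machine code.

```python
def run(program):
    """Run a toy program given as a list of (opcode, argument) pairs."""
    regs = {"AX": 0, "PC": 0}
    while True:
        instr = program[regs["PC"]]   # 1. fetch from the location held in PC
        regs["PC"] += 1               # 2. increment the PC
        op, arg = instr               # 3. decode the fetched instruction
        if op == "HALT":
            return regs
        if op == "LOAD":              # 4-6. get data, execute, store the result
            regs["AX"] = arg
        elif op == "ADD":
            regs["AX"] += arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
        # 7. loop back to the fetch step


final = run([("LOAD", 5), ("ADD", 7), ("HALT", None)])
print(final)
```

Note that the PC is already past an instruction while that instruction executes, which is why jump distances are measured from the following instruction.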
8 8088 registers
- There is a set of 14 registers
- Each register is 16 bits wide
9 8088 registers
- The registers AX, BX, CX, and DX are general registers
- The high and low bytes of each can be accessed
- E.g., AH and AL are the high and low bytes of AX
- AX is the accumulator
- Used for results and as the target of many instructions
- BX is the base register
- Used as a pointer to a memory address for some instructions
- CX is the counter register
- Used as a loop counter
- Automatically decremented
- Loop ends when CX reaches 0
- DX is the data register
- Used with AX to hold the high-order bits when a 32-bit long word is needed
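The relationship between a 16-bit register, its two byte halves, and the DX/AX pair is plain bit arithmetic. A minimal Python sketch follows; the helper names `split_word` and `join_long` are made up for this example.

```python
def split_word(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit register value."""
    ax &= 0xFFFF
    return ax >> 8, ax & 0xFF


def join_long(dx, ax):
    """Combine DX (high-order word) and AX (low-order word) into a 32-bit value."""
    return ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)


ah, al = split_word(0x1234)
print(hex(ah), hex(al))           # 0x12 0x34
print(join_long(0x0001, 0x86A0))  # 100000
```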
10 8088 registers
- Maximum memory is 1 MB
- Each byte is addressable
- 20-bit addresses are needed to address 1 MB (2^20 bytes)
- 16-bit registers can address only 2^16 bytes = 64 KB
- Segment registers point to the base address of a 64 KB area of memory known as a segment
- A segment register gives the 16 high-order bits
- The remaining 4 bits are all zero
- Base addresses must be evenly divisible by 2^4 = 16
- Segment registers are CS, DS, SS, and ES
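Since a segment register supplies the 16 high-order bits of a 20-bit base address, the address of a byte is the segment value times 16 plus a 16-bit offset. A short Python sketch of that calculation (the function name is illustrative only):

```python
def physical_address(segment, offset):
    """20-bit address: segment register shifted left 4 bits, plus the offset."""
    base = (segment & 0xFFFF) << 4                   # base is always a multiple of 16
    return (base + (offset & 0xFFFF)) & 0xFFFFF      # keep 20 address bits


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Different segment and offset pairs can name the same byte; for example, 0x1234 with offset 0x0010 and 0x1235 with offset 0x0000 both give 0x12350.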
11 8088 segment registers
- CS is the code segment register
- Points to the code segment containing program instructions
- DS is the data segment register
- Points to the data segment containing program data
- SS is the stack segment register
- Points to the system stack segment used for subroutine linkage
- ES points to the extra segment
- Can be used whenever another segment is needed
12 The 8088 system stack
- The system stack . . .
- Holds stack frames and temporary variables
- Grows toward smaller addresses
- Holds only 2-byte words, stored at even addresses
- A stack frame is created each time a subroutine is called, for holding . . .
- Return address
- Parameters
- Local variables
- Temporary variables are also pushed onto and popped from the stack as the subroutine runs
13 8088 pointer and index registers
- There are four registers in this group: SP, BP, SI, DI
- SP is the stack pointer
- An index that is added to the base address given by SS to point to the top of the system stack
- For a PUSH or CALL, the SP is decremented
- For a POP, the SP is incremented
- BP is the base pointer
- Contains an index to a location within the stack
- Typically points to the beginning of the current subroutine's stack frame
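The way SP moves on PUSH and POP can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: the `Stack` class is invented for illustration, memory is a small byte array standing in for the stack segment, and there is no overflow checking.

```python
class Stack:
    """Toy model of a stack that grows toward smaller addresses."""

    def __init__(self, size=16):
        self.mem = bytearray(size)
        self.sp = size                # empty stack: SP is just past the last byte

    def push(self, word):
        self.sp -= 2                  # PUSH first decrements SP by 2 ...
        self.mem[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        word = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp += 2                  # ... and POP increments it again
        return word


s = Stack()
s.push(0x1234)
s.push(0xBEEF)
print(s.sp)            # 12
print(hex(s.pop()))    # 0xbeef
```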
14 8088 pointer and index registers
- SI is the source index register
- DI is the destination index register
- SI and DI are often used . . .
- with BP to address data on the stack
- with BX to compute the address of a data location in memory
- An additional index register is the PC (also called IP)
- The PC indexes into the code segment to address the next instruction in a program
- The programmer has no direct control over the PC
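15 Segments
- Typically, the stack segment and the data segment are the same
[Figure: memory map from address 0 up to higher addresses. The code segment (64 KB) holds the program code, with CS marking the segment and PC pointing into it. The combined stack and data segment (64 KB) is marked by SS and DS, with the data at its low end and the stack at its high end; BP points to the current stack frame and SP to the top of the stack.]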
16 Condition codes and flags
- The flag register is actually a set of 1-bit registers
- Also called the condition code register
- Some of the bits are set according to the result of arithmetic instructions
- Z - set if the result is 0
- S - set if the result is negative
- O - set if overflow occurred
- C - set by a carry
- P - set according to the parity of the result
- Other bits control processor operation
- I bit enables interrupts
- T bit enables tracing mode for debugging
- D bit controls direction of string operations
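As an illustration of how the arithmetic flags follow from a result, here is a Python sketch for a 16-bit addition. The function name is invented; the parity rule shown (P set when the low byte of the result has an even number of 1 bits) is the 8086-family convention.

```python
def add16_flags(a, b):
    """Add two 16-bit values; return (result, flags) in the style of ADD."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    flags = {
        "Z": result == 0,                                   # zero result
        "S": bool(result & 0x8000),                         # sign bit set
        "C": total > 0xFFFF,                                # carry out of bit 15
        # overflow: operands share a sign that the result does not have
        "O": bool(~(a ^ b) & (a ^ result) & 0x8000),
        "P": bin(result & 0xFF).count("1") % 2 == 0,        # even parity, low byte
    }
    return result, flags


print(add16_flags(0x7FFF, 1))
print(add16_flags(0xFFFF, 1))
```

The first call overflows as a signed addition (O and S set, C clear); the second wraps to zero as an unsigned addition (Z and C set, O clear).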
17 Data
- The 8088 supports 4 data types
- 1-byte byte
- 2-byte word
- 4-byte long
- Binary coded decimal (not supported by the interpreter)
- The 8088 is little endian
- The low-order part of a word is stored in the lower address
- A long is stored in the DX:AX combination with the low-order word in AX
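Little-endian storage can be made concrete with a short Python sketch. The helper names are invented, and `mem` is just a byte array standing in for a memory segment.

```python
def store_word(mem, addr, value):
    """Store a 16-bit word little endian: low byte at the lower address."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def load_word(mem, addr):
    """Read a 16-bit word stored little endian."""
    return mem[addr] | (mem[addr + 1] << 8)


mem = bytearray(4)
store_word(mem, 0, 0x1234)
print(mem.hex())                 # 34120000
print(hex(load_word(mem, 0)))    # 0x1234
```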
18 Addressing
- Addressing refers to techniques (addressing modes) for representing the locations of data elements in memory or in registers
- An operand is an assembly language code used to represent a data element
- An effective address is an address in a memory segment
- Parentheses around a register indicate the register is a pointer to the effective address
- In describing addressing, the symbol # indicates a numerical value or label
19 Addressing
- Instructions can have 0, 1, or 2 operands
- The operands of two-address instructions are typically called destination and source
- Example: MOV AX, BX
- AX is the destination
- BX is the source
- This instruction replaces the contents of AX by a copy of BX
- Sometimes an operand is implicit (not mentioned)
- Example: MULB BL
- This multiplies AL by BL (1 byte each) and stores the 16-bit result in AX
20 Addressing modes
- Register addressing example: AL and CL
- MOV CL, AL
- The operand is simply the name of a byte or word register
21 Data segment addressing modes
- Direct addressing examples: (#)
- ADD CX, (20)
- The word in the data segment at index 20 is added to CX
- Involves addresses 20 and 21
- ADD CL, (20)
- The byte in the data segment at index 20 is added to CL
- Register indirect addressing example: (SI)
- MOV CX, (SI)
- Move the data segment word pointed to by SI into CX
22 Data segment addressing modes
- Register displacement addressing example: 20(SI)
- MOVB AL, 20(SI)
- If SI contains 17, then the effective address is byte 37
- Move the data segment byte at effective address 37 into AL
- Register with index addressing example: (BX)(DI)
- PUSH (BX)(DI)
- The effective address is the sum of the BX and DI registers
- PUSH the data segment word at the effective address on the system stack
23 Data segment addressing modes
- Register index displacement addressing example: #(BX)(DI)
- This combines the previous two modes
- PUSH 20(BX)(DI)
- The effective address is the sum of the BX and DI registers plus 20
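All of the data segment modes above compute an effective address as a sum of up to three parts: a displacement, a base register (BX), and an index register (SI or DI). A minimal Python sketch, with a made-up function name and parameter names:

```python
def effective_address(disp=0, base=0, index=0):
    """Sum the displacement, base register, and index register (16-bit wrap)."""
    return (disp + base + index) & 0xFFFF


# Register displacement, 20(SI) with SI = 17
print(effective_address(disp=20, index=17))                 # 37
# Register with index, (BX)(DI) with BX = 0x100, DI = 4
print(hex(effective_address(base=0x100, index=4)))          # 0x104
# Register index displacement, 20(BX)(DI)
print(hex(effective_address(disp=20, base=0x100, index=4))) # 0x118
```

Direct addressing is the case with only a displacement, and register indirect addressing is the case with only one register.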
24 Stack segment addressing modes
25 Stack segment addressing modes
- Except for direct addressing, all the data segment modes carry over for the stack segment
- However . . .
- The BP pointer is used in place of BX
- Neither SI nor DI may be used in indirect or displacement modes
- The names of the stack segment addressing modes are
- Base pointer indirect
- Base pointer displacement
- Base pointer with index
- Base pointer index displacement
26 Immediate addressing
- With immediate addressing, a source operand is a constant byte or word
- Example
- MOV AX, 23
- The AX register is loaded with a decimal 23
27 Implied addressing
- The operand is implicit in the instruction itself
- Example: PUSH AX
- This decrements SP by 2 and copies AX to the location pointed to by SP
- Example: CLC
- Clears the carry flag
28 The 8088 instruction set
- There are various groups of instructions
- Data transfer
- Arithmetic
- Logical
- Shift and rotate
- Test and bit flag
- Looping
- Repetitive string operations
- Jump and call
29 Notation
- Operand type
- r - a register
- e - an effective address in memory
- # - immediate data
- label
- string
- Direction, if relevant, is shown by an arrow
- Status flags: a mark in a flag's column indicates the flag is affected
- MOV(B) indicates both . . .
- word version MOV
- byte version MOVB
30 The 8088 instruction set
- Move (actually copy), exchange, and stack instructions
- Arithmetic (addition / subtraction, multiplication / division)
31 The 8088 instruction set
- More arithmetic
- Logical
- Shift and rotate
32 The 8088 instruction set
- Test and bit flag
- Looping (destination label must be within 128 bytes of PC)
- Repetitive string operations REPx (used with next group)
- Jump and call
33 Jump, call, and return instructions
- CALL and unconditional JMP may be near or far
- A near jump is within the current code segment
- A far jump . . .
- is anywhere within the 20-bit address space
- A new value for CS must be supplied
- Conditional jumps
- Must be within 128 bytes of the PC
- Otherwise, for example, replace
- JZ farlabel
- by
- JNZ ahead
- JMP farlabel
- ahead: . . .
- This is done automatically by the assembler
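The replacement of an out-of-range conditional jump can be sketched in Python. This is an illustration of the idea only, not the as88 implementation: the `INVERSE` table, the function names, and the assumption that the short displacement is a signed byte measured from the instruction after the jump are all choices made for this example.

```python
# Each conditional jump paired with the jump on the opposite condition.
INVERSE = {"JZ": "JNZ", "JNZ": "JZ", "JE": "JNE", "JNE": "JE"}


def fits_short(next_pc, target):
    """True if target is reachable with a signed 8-bit displacement."""
    return -128 <= target - next_pc <= 127


def conditional_jump(mnemonic, next_pc, target, label):
    """Return the instruction lines needed to reach label."""
    if fits_short(next_pc, target):
        return [f"{mnemonic} {label}"]
    # Too far: skip over an unconditional jump when the condition is false.
    return [f"{INVERSE[mnemonic]} ahead", f"JMP {label}", "ahead:"]


print(conditional_jump("JZ", 100, 150, "farlabel"))
print(conditional_jump("JZ", 100, 1000, "farlabel"))
```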
34 Conditional jumps
- Usually depend on the values in status flags
- Status flags are set by a prior TEST or CMP instruction
- For signed operations . . .
- Use greater than or less than
- For unsigned operations . . .
- Use above or below
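The same 16-bit pattern compares differently depending on whether it is read as signed or unsigned, which is why there are two families of conditional jumps. A small Python sketch (function names are invented for this example):

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's complement signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def is_less(a, b):
    """Signed comparison, the 'less than' sense."""
    return to_signed16(a) < to_signed16(b)


def is_below(a, b):
    """Unsigned comparison, the 'below' sense."""
    return (a & 0xFFFF) < (b & 0xFFFF)


# 0xFFFF is -1 when signed but 65535 when unsigned.
print(is_less(0xFFFF, 1))    # True
print(is_below(0xFFFF, 1))   # False
```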
35 Conditional jumps
36 Subroutine call and return instructions
- Parameters (arguments) are pushed onto the stack in reverse order prior to the call
- The subroutine call instruction CALL . . .
- Pushes the PC onto the stack
- This is the return address
- Loads the PC with the label or effective address
- The return instruction RET . . .
- Pops the return address from the stack and stores it in the PC
- Execution thus continues at the instruction immediately after the CALL instruction
37 Subroutine calls
- RET #, with immediate #
- Adds # bytes to the SP to eliminate arguments from the stack
- To access arguments, the subroutine should
- Push BP onto the stack
- Copy SP into BP
- Now . . .
- Return address is at BP + 2
- Argument 2 is at BP + 6
- Local variable 2 is at BP - 4
38 Subroutine calls
- To clean up local variables and temporary results on the stack before returning . . .
- Copy BP into SP
- Pop the old BP
- Good subroutine practice
- Caller should assume AX and DX will change
- The caller should stack them prior to pushing arguments as needed
- The subroutine should stack any registers it will change and restore them prior to returning
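The stack frame layout produced by this calling sequence can be checked with a small Python model. It is a sketch only: `build_frame` and its parameters are invented for illustration, the stack is a dictionary from address to word, and the starting SP of 0x100 is arbitrary.

```python
def build_frame(args, return_address, old_bp, n_locals, sp=0x100):
    """Model the pushes made by the caller, CALL, and the subroutine entry."""
    mem = {}

    def push(word):
        nonlocal sp
        sp -= 2                   # the stack grows toward smaller addresses
        mem[sp] = word

    for arg in reversed(args):    # caller pushes arguments in reverse order
        push(arg)
    push(return_address)          # CALL pushes the return address
    push(old_bp)                  # subroutine pushes BP
    bp = sp                       # subroutine copies SP into BP
    sp -= 2 * n_locals            # room for local variables below BP
    return mem, bp, sp


mem, bp, sp = build_frame([11, 22], 0x0040, 0x00F0, n_locals=2)
print(hex(mem[bp + 2]))   # return address
print(mem[bp + 4])        # argument 1
print(mem[bp + 6])        # argument 2
```

Local variable 1 would live at BP - 2 and local variable 2 at BP - 4, which is where SP ends up in this model. Copying BP back into SP and popping the old BP undoes the frame.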
39 System calls and function calls
- System calls invoke operating system services
- To invoke . . .
- Push the needed arguments in reverse order
- Push the call number
- Execute SYS
40 System calls
- After a system call
- Return values are left in AX or DX:AX
- Arguments remain on the stack
- Caller should adjust SP accordingly
- It is good practice to define system call numbers as constants at the beginning of an assembler program
- _OPEN and _CREAT have 2 arguments
- The name argument is the effective address of the start of a string for the file name
- The second argument is 0, 1, or 2
- 0 - open for reading
- 1 - open for writing
- 2 - open for both reading and writing
- The return integer in AX is a file descriptor to be used for reading, writing, and closing the file
41 System calls
- Some files are automatically opened
- Standard input (descriptor 0)
- Standard output (descriptor 1)
- Standard error output (descriptor 2)
- _READ and _WRITE have 3 arguments
- File descriptor
- The starting address of a buffer to hold data
- Number of bytes to transfer
- _CLOSE involves only the file descriptor
42 Function calls
- _GETCHAR reads one character from standard input and puts it in AL (with AH set to zero)
- _PUTCHAR writes a byte to standard output
- _PRINTF outputs formatted information
- The first argument is the address of a format string
- %d converts an integer to a decimal string
- %x and %o convert to hex and octal, respectively
- There should be one argument on the stack for each value expected by the format string
- %s indicates a null-terminated string with effective address on the stack
- Format string "x = %d and y = %d\n" prints 2 numbers followed by a line feed
43 Function calls
- _SPRINTF is like _PRINTF except the formatted string is sent to a buffer in memory
- _SSCANF is the reverse of _SPRINTF and . . .
- Reads a string containing integers in decimal, octal, or hex from a buffer
- Converts the values according to a format string
- Places the converted values into memory locations indicated by additional arguments
44 The assembler
- An assembler is a program that translates an assembly language program into machine language
- The assembly language program is written with symbolic names such as ADD and AX together with labels and constant definitions to represent computer activity in symbolic form
- The output of the assembler is an object file
- The object file must be combined with the object files of any needed system subroutines
- A linker program performs this task
- The result is a single executable binary file
- The executable binary file may be loaded into memory and executed
45 The assembler
- An assembler typically makes two passes through a program in order to translate it
- Pass 1 builds a symbol table
- A symbol table associates the identifiers used for labels and constant definitions with numbers
- Constants can be entered directly
- Labels represent addresses and must be calculated
- Label calculation
- The assembler maintains an internal location counter that keeps track of the number of bytes allocated so far for data and instructions
- When a label appears, it is given the current value of the location counter
46 The assembler
- Pass 2 does code generation
- The value of every symbol is known at the beginning of pass 2
- Each instruction is read again and . . .
- If an instruction refers to a label, the symbol table is consulted
- The numerical equivalent is written into the object file
- The assembler also initializes data in any data section
- This results from pseudoinstructions such as . . .
- message: .ASCII "Hello World"
- table: .WORD 11, 19, 26
- Note: constant definitions, labels, and documentation are not carried over into the object file
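The two passes can be sketched in Python for a toy instruction list. Everything here is an assumption made for illustration: the input format of (label, opcode, operand) tuples, the `SIZES` table of instruction lengths, and the output as a list of tuples rather than real machine code.

```python
# Hypothetical instruction sizes in bytes, for illustration only.
SIZES = {"JMP": 3, "NOP": 1, "WORD": 2}


def assemble(lines):
    """Two-pass translation of (label, opcode, operand) tuples."""
    # Pass 1: build the symbol table using a location counter.
    symbols, location = {}, 0
    for label, op, operand in lines:
        if label:
            symbols[label] = location   # label gets the current counter value
        location += SIZES[op]

    # Pass 2: every symbol is now known, so labels can be replaced by numbers.
    output = []
    for label, op, operand in lines:
        value = symbols.get(operand, operand)
        output.append((op, value))
    return symbols, output


program = [
    ("start", "JMP", "done"),   # forward reference: needs pass 1 to resolve
    (None, "NOP", None),
    ("done", "NOP", None),
]
symbols, output = assemble(program)
print(symbols)
print(output)
```

The forward reference to `done` in the first line is the reason a single pass is not enough: its address is not known until the lines before it have been sized.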
47 The as88 assembler
- To assemble prog.s, enter as88 prog at a command prompt
- Program comments start with ! and continue until end-of-line
- Program sections
- .SECT .TEXT
- For processor instructions placed in the code segment
- .SECT .DATA
- To reserve memory in the data segment and initialize it
- .SECT .BSS
- Block Started by Symbol section
- To reserve memory in the data segment that is not initialized
- A program may have many occurrences of each section
- However, .TEXT must be first, .DATA second, and .BSS third
- The linker arranges the sections in the code and data segments
- Each section has its own location counter
48 The as88 assembler
- Labels
- Any instruction or data word may begin with a label
- A label all by itself is associated with the next instruction or word
- Global labels
- Alphanumeric identifier followed by a colon, such as here:
- These must all be unique and not keywords or mnemonics
- Must appear at the start of each section
- Local labels
- Single digit followed by a colon, such as 5:
- Instruction JMP 3f jumps forward to the closest 3: label
- Instruction JMP 2b jumps backward to the closest 2: label
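Because a local label digit may be defined many times, a reference such as 3f or 2b has to be resolved relative to where it is used. A Python sketch of that lookup follows; the function name and the list of (digit, address) pairs are invented, and treating a label at the current address as "backward" is an assumption of this example.

```python
def resolve_local(ref, here, labels):
    """Resolve a reference like '3f' or '2b' from address here.

    labels is a list of (digit, address) pairs for every local label defined.
    """
    digit, direction = ref[:-1], ref[-1]
    candidates = [addr for name, addr in labels if name == digit]
    if direction == "f":
        return min(a for a in candidates if a > here)    # closest one ahead
    if direction == "b":
        return max(a for a in candidates if a <= here)   # closest one behind
    raise ValueError(f"bad local label reference {ref!r}")


labels = [("3", 10), ("2", 20), ("3", 40)]
print(resolve_local("3f", 25, labels))   # 40
print(resolve_local("3b", 25, labels))   # 10
print(resolve_local("2b", 25, labels))   # 20
```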
49 The as88 assembler
- Constant symbols may be defined
- TABLESIZE = 100
- System defined values usually begin with an underscore
- _WRITE = 4
- Numerical values
- Decimal
- Default, as in 1234
- Hex
- Starts with 0x, as in 0x713
- Constants and labels
- Only the first 8 characters are significant
- Arithmetic operations such as +, -, *, and / are allowed by the assembler
- For grouping, use square brackets instead of parentheses
50 The as88 assembler
- Pseudoinstructions
- Directives to the assembler
- .BYTE, .WORD, and .LONG expect a comma-separated list of constant expressions
51 The as88 assembler
- Pseudoinstructions
- .ASCIZ and .ASCII
- Represent the string supplied in double quotes
- .ASCIZ appends an additional zero byte
- Escape symbols are allowed in strings
52 The as88 assembler
- Pseudoinstructions
- .SPACE n increments the location counter by n
- Used in the .BSS section to reserve memory area
- .ALIGN 2 and .ALIGN 4 advance the location counter to the first address evenly divisible by 2 or 4, respectively
- Used before the .WORD and .LONG pseudoinstructions
- .EXTRN identifier requests that the identifier be made available to the linker for external references
- Used, for example, when identifier is the entry point of a subroutine that will be called from a separately assembled program
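The effect of .ALIGN and .SPACE on the location counter is simple arithmetic. A Python sketch, with invented function names:

```python
def align(location, boundary):
    """Advance location to the first address evenly divisible by boundary."""
    return (location + boundary - 1) // boundary * boundary


def space(location, n):
    """Reserve n bytes by moving the location counter forward."""
    return location + n


print(align(5, 2))   # 6
print(align(5, 4))   # 8
print(align(8, 4))   # 8  (already aligned, so unchanged)
print(space(8, 10))  # 18
```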
53 The as88 tracer
- After assembly of program prog.s, enter . . .
- s88 prog to run the program
- t88 prog to trace or debug the program
- Tracer windows are indicated below
54 The as88 tracer
Each command must be followed by a carriage return (the Enter key). An empty box indicates that just a carriage return is needed. Commands with no Address field listed above have no address. The symbol # represents an integer offset.
55 The as88 tracer
56 Examples
- To understand techniques for programming in assembly language, study five of the examples found in the textbook
- Hello World example, HlloWrld.s, pp. 736-739
- General registers example, genReg.s, pp. 740-741
- Vector product example, vecprod.s, pp. 741-744
- Debugging the arrayprt.s program, pp. 744-747
- Dispatch table example, jumptbl.s, pp. 750-752
- The textbook's description of the code leads you through each program
- Study and understand the program code
57 Hello World example, HlloWrld.s
(a) An assembly language program (b) The
corresponding tracer display
58 General registers example, genReg.s
- (a) Part of a program.
- (b) The tracer register window after line 7 has been executed.
- (c) The tracer register window after 7 loop iterations (note the DX:AX pair)
59 Vector product example, vecprod.s
60 Vector product example (continued)
61 Vector product example (continued)
- Execution of vecprod.s when it reaches line 28 for the first time.
62 Debugging the arrayprt.s program
63 Dispatch table example, jumptbl.s
- A program demonstrating a multiway branch using a dispatch table.
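The idea of a dispatch table, independent of the textbook's jumptbl.s listing, is to index into a table of routine addresses instead of testing each case in turn. A Python sketch with invented handler names; here the table holds function objects rather than addresses.

```python
def on_zero():
    return "zero"


def on_one():
    return "one"


def on_two():
    return "two"


# The table position selects the routine, like an index into a table of labels.
TABLE = [on_zero, on_one, on_two]


def dispatch(i):
    """Multiway branch: call the routine stored at position i."""
    if 0 <= i < len(TABLE):
        return TABLE[i]()
    return "error"       # out-of-range selector


print(dispatch(1))   # one
print(dispatch(7))   # error
```

The range check matters: in assembly language, an unchecked index would jump through whatever word happens to lie past the end of the table.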
64 Dispatch table example (continued)