Lecture Assembly language and assemblers - PowerPoint PPT Presentation


Provided by: martin159

1
Lecture Assembly language and assemblers

In this lecture I will cover
Assembly language and its relation to machine
languages
Examination of the structure and operation of an
assembler as an example of a language translator

2
An assembler

We are going to look at assemblers as an example
of a translator.
Remember that a translator converts the whole of a
source code file into an output file of
executable code for the given target machine.
As our example we are using an assembler for an
assembly language version of Simple Machine
Language (SML).
Assemblers translate assembly language source
code into machine code.

3
Assembly language

So what is an assembly language?
When writing at a low level you could write
programs using numerical values for the
instructions, the address locations, etc. For
example, in SML we might write,
2007
3008
2107

For very small programs, and using SML, which has
only a small number of instructions, this might be
doable, but for anything else it becomes very
time consuming and prone to error.
A programmer would have to remember
1. all the numerical values for all of the op
codes
2. the addresses of the locations being used for
data and in particular which locations are being
used to hold which data
3. the addresses of locations within the code to
branch to

To solve this problem assembly languages were
developed.
The op code numeric values are replaced with
mnemonic names which symbolically represent the
operation the op code performs, e.g. ADD instead
of 30, BRANCHNEG instead of 41, etc.
The numeric address values are replaced with
label names which identify a given location,
which can be
1. location of instruction - where an instruction
is held (locations to branch to) or
2. location of data - where data is held

So assuming that address location 0A is the
address of the next instruction after the end of
a loop, then we can give that location a label
name like ENDOFLOOP1 and thus write
BRANCHNEG ENDOFLOOP1
to specify the instruction, instead of writing
410A

Equally, assuming that address location 08 holds
some data, we can give it a label like NUM2.
Then in our assembly language we can write
ADD NUM2
instead of 3008
It is much easier to read and to write assembly
language code than machine code.
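
The substitution described above can be sketched in a few lines of Python. This is only an illustration: the names OPCODES, LABELS and encode are invented here, only the two op codes and two labels quoted on the slides are included, and the instruction word is assumed to be four hexadecimal digits (op code in the high byte, address in the low byte).

```python
# Op code values quoted on the slides, treated as hexadecimal (assumption).
OPCODES = {"ADD": 0x30, "BRANCHNEG": 0x41}

# Label addresses assumed on the slides: NUM2 at 08, ENDOFLOOP1 at 0A.
LABELS = {"NUM2": 0x08, "ENDOFLOOP1": 0x0A}


def encode(mnemonic, label):
    """Combine op code and address into a four-digit instruction word."""
    word = (OPCODES[mnemonic] << 8) | LABELS[label]
    return format(word, "04X")


print(encode("ADD", "NUM2"))              # 3008
print(encode("BRANCHNEG", "ENDOFLOOP1"))  # 410A
```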

8
Op code mnemonics

9
Labels

As we have seen, labels can be used to represent
address values - to locate instructions or data.
So the question is - how are we to assign address
values to the labels?
There are 2 methods for doing this, one for each
of the 2 different types of location that a label
might represent.

Assigning addresses to labels when the label
represents
1. Location of an instruction. The label name is
placed at the start of the line which contains the
instruction you want to label, or on the line
immediately before it (e.g. so we can
branch to it at some time). The label is given
the address of the location in the code of the
instruction it is labelling.
2. Location of data. The label name is placed at
the start of a line and a special directive to
the assembler is used to indicate that the
location in the code where the label is placed
is to be used to hold data, and to permit the user
to initialise that location with a data value.
We will see some examples later.

11
Assembler directives

An assembler directive is an instruction to the
assembler program about something the assembler
must do; it is not a source code instruction
that will be translated into machine code for
later execution.
INITIALISE is an assembler directive that allows us
to initialise data at a specific location, which
we can access using a label.
If INITIALISE follows a label name then the
number following INITIALISE is placed in the
location identified by the label.

12
Characteristics of assembly languages

Characteristics of an assembly language include
1. one-to-one correspondence between assembly
language mnemonics and machine code instructions
2. one-to-one correspondence between label names
and memory addresses
In an assembly language for a real processor
there are many more instructions, and thus
mnemonics, and many more sophisticated directives
to the assembler.

13
Format of assembly language code

Typically source code is arranged into 4 fields
1. An optional label field - must start in the
first column of a line
2. An operation field - for mnemonics or
assembler directives - must NOT start in first
column of a line
3. An operand field - for label names that
represent locations of data or locations to
branch to - some operations e.g. HALT do not have
an operand
4. An optional comment field
Fields are separated by at least one space or tab
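
As a rough sketch of how these four fields might be picked out of a line, the Python below uses the first-column rule to decide whether a label is present. The function name split_fields and the set NO_OPERAND are invented for this illustration, and it assumes HALT is the only operation without an operand.

```python
NO_OPERAND = {"HALT"}  # assumption: only HALT takes no operand


def split_fields(line):
    """Return (label, operation, operand, comment) for one source line."""
    # A label is present only if the line starts in the first column.
    has_label = bool(line) and not line[0].isspace()
    tokens = line.split()  # fields are separated by spaces or tabs
    label = tokens.pop(0) if has_label else None
    operation = tokens.pop(0) if tokens else None
    operand = None
    if operation is not None and operation not in NO_OPERAND and tokens:
        operand = tokens.pop(0)
    # Whatever is left over is treated as the comment field.
    comment = " ".join(tokens) or None
    return label, operation, operand, comment


print(split_fields("OUT2ND WRITE SECOND otherwise"))
print(split_fields("        HALT is larger"))
```

Because the comment field has no marker of its own, this sketch has to know which operations take an operand in order to tell an operand from the first word of a comment.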

14
Example assembly language program for SML

        READ       FIRST
        READ       SECOND
        LOAD       FIRST
        SUBTRACT   SECOND    first-second
        BRANCHNEG  OUT2ND    if neg then
        WRITE      FIRST     2nd number
        HALT                 is larger
OUT2ND  WRITE      SECOND    otherwise
        HALT                 first larger or
FIRST   INITIALISE 0         same size
SECOND  INITIALISE 0

15
Object code for example

1009
100A
2009
310A
4107
1109
4300
110A
4300
0000
0000
The assembly language version on the previous
slide is easier to read, understand and write.

16
Major functions of assembler

1. Replace symbolic op-codes with numeric
op-codes
2. Replace symbolic location names (e.g. labels)
with actual numeric addresses
3. Allocate addresses into which numeric code is
to be placed, i.e. what to use as the start address
of code
4. Detect and report errors encountered

17
Implementation of an assembler

We are now going to look at how an assembler is
written (implemented in code). We will look at
how
1. To hold the information about the mapping
between instruction mnemonics and opcode values
and between label names and address values. This
is done using a symbol table.
2. To break up (analyse) the program text into
meaningful components so we can identify the
instruction mnemonics and label names, etc. This
is done using a tokenizer.

18
1. Use a symbol table

Symbol table,
1. Associates opcode mnemonic with numeric opcode
value, e.g. BRANCHNEG with 41, and
2. Associates label name with numeric address, e.g.
OUT2ND with 07 in the example program
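
One simple way to hold such a table is a dictionary, sketched below in Python. The op code values are those that can be read off the example program and its object code; the label addresses are the ones that example produces. The names symbol_table and define, and the check for duplicate definitions, are assumptions made for this illustration.

```python
symbol_table = {}


def define(name, value):
    """Enter a symbol, refusing to define the same name twice."""
    if name in symbol_table:
        raise ValueError(f"symbol {name!r} defined twice")
    symbol_table[name] = value


# Mnemonics and their op codes (values inferred from the example slides).
for mnemonic, opcode in [("READ", 0x10), ("WRITE", 0x11), ("LOAD", 0x20),
                         ("ADD", 0x30), ("SUBTRACT", 0x31),
                         ("BRANCHNEG", 0x41), ("HALT", 0x43)]:
    define(mnemonic, opcode)

# Labels and their addresses for the example program.
define("OUT2ND", 0x07)
define("FIRST", 0x09)
define("SECOND", 0x0A)

print(format(symbol_table["BRANCHNEG"], "02X"))  # 41
print(format(symbol_table["SECOND"], "02X"))     # 0A
```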

19
2. Use a tokenizer

During the assembly process it is necessary to
analyse (i.e. break down) each line of assembly
source code into its components.
A tokenizer takes a string as a parameter and
separates out from the string all the significant
components that make up the string.
It analyses the string into a series of tokens
that are separated by delimiters.

The default delimiters are white space characters
like <space> and <tab>, so given a string from the
earlier assembly language program,
OUT2ND WRITE SECOND otherwise
it would break it up into 4 tokens
OUT2ND
WRITE
SECOND
otherwise
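
The slides do not say which tokenizer class is used, so the following is only a Python sketch of the same idea. The function name tokenize and its delimiters parameter are invented here; with no delimiters given it falls back to white space, as described above.

```python
import re


def tokenize(text, delimiters=None):
    """Split text into tokens; white space is the default delimiter."""
    if delimiters is None:
        return text.split()
    # Treat any run of the given delimiter characters as one separator.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, text) if token]


print(tokenize("OUT2ND\tWRITE SECOND   otherwise"))
# ['OUT2ND', 'WRITE', 'SECOND', 'otherwise']
print(tokenize("a,b;;c", ",;"))
# ['a', 'b', 'c']
```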

21
Operation of an assembler

The structure of an assembler is determined by the
forward referencing problem - a symbol may be used
(referenced) in an operand field before it has
been defined. This is a problem because the
assembler cannot generate machine code for the
instruction until the reference has been resolved,
i.e. until it knows what address value the symbolic
forward reference represents.
This is normally handled by reading the source file
in 2 passes.

22
First pass - to sort out the references

The first pass reads through the assembly language
program one line at a time, noting the occurrence
of labels (i.e. symbolic references) that occur
at the beginning of a line and storing them in
the symbol table, together with the location
address value they identify.
To enter the correct address value for a label the
assembler must know the address of the location
of the instruction being labelled or the address of
the location of the data word being labelled.

To do this, as the assembler works through one
line after another it calculates how much space
is required for each instruction and operand it
comes across and keeps the address of current
location in a location counter which is updated
as the assembler moves from one instruction to
the next
So when the assembler finds a label it knows what
the address of the current location is.
In SML this is easy to do because all
instructions take up the same space i.e. one
word, but in other assembly languages this is not
the case and the calculation is more complex.
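
A minimal Python sketch of this first pass, run over the example program from the earlier slide, is given below. The names SOURCE and first_pass are invented for the illustration, comments are left out of the source text, and the sketch assumes every instruction or data word occupies exactly one location, as in SML.

```python
SOURCE = """\
        READ       FIRST
        READ       SECOND
        LOAD       FIRST
        SUBTRACT   SECOND
        BRANCHNEG  OUT2ND
        WRITE      FIRST
        HALT
OUT2ND  WRITE      SECOND
        HALT
FIRST   INITIALISE 0
SECOND  INITIALISE 0
"""


def first_pass(lines, start=0):
    """Record the address of every label using a location counter."""
    labels = {}
    location = start
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line: nothing to place in memory
        if not line[0].isspace():
            # Text in the first column is a label for the current location.
            labels[tokens.pop(0)] = location
        if tokens:
            # An instruction or directive follows: it takes one word.
            location += 1
    return labels


print(first_pass(SOURCE.splitlines()))
# {'OUT2ND': 7, 'FIRST': 9, 'SECOND': 10}
```

A line holding only a label does not advance the location counter, so the label takes the address of the instruction on the following line.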

24
Second pass - to output the translation

The second pass reads through the source code one
line at a time, constructing the numeric instruction
values which are to be output. It does this by,
1. looking up the mnemonic op-codes in the symbol
table and reading out the numeric value that has
been associated with the given mnemonic.
2. looking up the operand labels in the symbol
table and reading out the address values, which it
appends to the numeric opcode values to give the
complete instruction word
3. outputting the instruction word to the object file
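
Putting the two passes together, a small Python sketch of an assembler for the example program might look like this. It is an illustration under stated assumptions, not the lecture's implementation: the names OPCODES, parse and assemble are invented, the op code values are those read off the example object code, words are assumed to be four hexadecimal digits, INITIALISE values are assumed to be non-negative decimal numbers, and comments are not handled.

```python
OPCODES = {"READ": 0x10, "WRITE": 0x11, "LOAD": 0x20,
           "SUBTRACT": 0x31, "BRANCHNEG": 0x41, "HALT": 0x43}


def parse(line):
    """Return (label or None, remaining tokens) for one source line."""
    tokens = line.split()
    label = tokens.pop(0) if tokens and not line[0].isspace() else None
    return label, tokens


def assemble(source):
    lines = [line for line in source.splitlines() if line.strip()]

    # Pass 1: give each label the value of the location counter.
    symbols, location = {}, 0
    for line in lines:
        label, rest = parse(line)
        if label:
            symbols[label] = location
        if rest:
            location += 1

    # Pass 2: build one four-digit word per instruction or data item.
    words = []
    for line in lines:
        _, rest = parse(line)
        if not rest:
            continue
        operation = rest[0]
        if operation == "INITIALISE":
            word = int(rest[1])            # data value placed as-is
        elif operation == "HALT":
            word = OPCODES[operation] << 8  # no operand: address 00
        else:
            word = (OPCODES[operation] << 8) | symbols[rest[1]]
        words.append(format(word, "04X"))
    return words


EXAMPLE = """\
        READ       FIRST
        READ       SECOND
        LOAD       FIRST
        SUBTRACT   SECOND
        BRANCHNEG  OUT2ND
        WRITE      FIRST
        HALT
OUT2ND  WRITE      SECOND
        HALT
FIRST   INITIALISE 0
SECOND  INITIALISE 0
"""

print(assemble(EXAMPLE))
```

Run on the example program, this produces the same eleven words as the object code slide, from 1009 through to the two 0000 data words.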

25
Relocatable code

Currently Simpletron loads a machine code program
into consecutive locations starting from address
0. It would be nice to be able to load a program
anywhere in memory and be able to run the program
from that location.
We can easily tell the virtual machine to load a
program starting from some address other than 0.
BUT - our machine language programs use address
values for the location of data and the location
of places to branch to. Loading the program
starting from a memory location other than 0 will
then mean that all the addresses in the machine
code will now be wrong.

To solve this problem the assembler must output
relocatable code, so that a loader (called a
relocating loader) can load the program anywhere
in memory. To do this the assembler
1. produces code as if it was going to be loaded
starting from address 0 as before but also,
2. adds information to each instruction record in
the object file indicating whether or not
instruction operand values need modification
The relocating loader then will add to operand
address values the value of the new start address
(often called an offset). This will give the
correct address.
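
The loader's side of this can be sketched in Python as follows. The record layout is an assumption made for the illustration: each record is a pair of a four-hex-digit word and a flag saying whether its operand needs modification. The names OBJECT_RECORDS and relocate are invented here.

```python
# (word, needs_relocation) pairs for the example object code.
# HALT instructions and data words carry no address, so they are not flagged.
OBJECT_RECORDS = [
    ("1009", True), ("100A", True), ("2009", True), ("310A", True),
    ("4107", True), ("1109", True), ("4300", False), ("110A", True),
    ("4300", False), ("0000", False), ("0000", False),
]


def relocate(records, offset):
    """Add the load offset to the operand of every flagged word."""
    loaded = []
    for word, needs_relocation in records:
        value = int(word, 16)
        if needs_relocation:
            opcode, address = value >> 8, value & 0xFF
            address += offset
            if address > 0xFF:
                raise ValueError("relocated address does not fit in a word")
            value = (opcode << 8) | address
        loaded.append(format(value, "04X"))
    return loaded


print(relocate(OBJECT_RECORDS, 0x20))
```

With a start address of 20 (hex), the first instruction 1009 becomes 1029, while the HALT words 4300 and the data words 0000 are left unchanged.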