Lecture Assembly language and assemblers - PowerPoint PPT Presentation

Provided by: martin159
Transcript and Presenter's Notes


1
Lecture Assembly language and assemblers
  • In this lecture I will cover
  • Assembly language and its relation to machine
    languages
  • Examination of the structure and operation of an
    assembler as an example of a language translator

2
An assembler
  • We are going to look at Assemblers as an example
    of a translator
  • Remember that a translator converts the whole
    contents of a source code file into an output file
    of executable code for the given target machine
  • As our example we are using an assembler for an
    assembly language version of Simple Machine
    Language (SML)
  • Assemblers translate assembly language source
    code into machine code

3
Assembly language
  • so what is an assembly language?
  • when writing at low level you could write
    programs using numerical values for the
    instructions and the address locations, etc. e.g.
    in SML we might write,
  • 2007
  • 3008
  • 2107

4
  • for very small programs and using SML which only
    has a small number of instructions this might be
    doable, but for anything else it becomes very
    time consuming and prone to error
  • a programmer would have to remember
  • 1. all the numerical values for all of the op
    codes
  • 2. the addresses of the locations being used for
    data and in particular which locations are being
    used to hold which data
  • 3. the addresses of locations within the code to
    branch to

5
  • to solve this problem assembly languages were
    developed
  • the numeric op code values are replaced with
    mnemonic names which symbolically represent the
    operation the op code performs e.g. ADD instead
    of 30, BRANCHNEG instead of 41, etc.
  • the numeric address values are replaced with
    label names which identify a given location,
    which can be
  • 1. location of instruction - where an instruction
    is held (locations to branch to) or
  • 2. location of data - where data is held

6
  • So assuming that address location 0A is the
    address of the next instruction after the end of
    a loop, then we can give that location a label
    name like ENDOFLOOP1 and thus write
  • BRANCHNEG ENDOFLOOP1
  • to specify the instruction, instead of writing
  • 410A

7
  • Equally assuming that address location 08 holds
    some data we can give it a label like NUM2,
    then in our assembly language we can write
  • ADD NUM2
  • instead of 3008
  • it is much easier to read and to write assembly
    language code rather than machine code.
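
The two substitutions above can be sketched in a few lines of Python. This is only an illustration: the function name `encode` and the choice to hold opcodes as two-character strings are assumptions, while the values (ADD = 30, BRANCHNEG = 41, NUM2 at 08, ENDOFLOOP1 at 0A) come from the slides.

```python
# Opcode values ADD=30 and BRANCHNEG=41 are the ones quoted in the lecture.
OPCODES = {"ADD": "30", "BRANCHNEG": "41"}

# Label addresses from the two examples above (08 for NUM2, 0A for ENDOFLOOP1).
LABELS = {"NUM2": 0x08, "ENDOFLOOP1": 0x0A}


def encode(mnemonic: str, label: str) -> str:
    """Build a 4-digit instruction word: 2-digit opcode + 2-digit hex address.

    `encode` is a hypothetical helper name used only for this sketch.
    """
    return f"{OPCODES[mnemonic]}{LABELS[label]:02X}"


print(encode("ADD", "NUM2"))              # 3008
print(encode("BRANCHNEG", "ENDOFLOOP1"))  # 410A
```

The programmer writes only the names; the numeric values live in one place and are looked up mechanically.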

8
Op code mnemonics

9
Labels
  • As we have seen labels can be used to represent
    address values - to locate instructions or data
  • So the question is: how are we to assign address
    values to the labels?
  • There are 2 methods for doing this, one for each
    of the 2 different types of location that a label
    might represent.

10
  • Assigning addresses to labels when the label
    represents
  • 1. Location of instructions. The label name is
    placed at the start of a line or immediately
    before a line which contains the instruction
    which you want to be labelled (e.g. so we can
    branch to it at some time) - the label is given
    the address of the location in the code of the
    instruction it is labelling.
  • 2. Location of data. The label name is placed at
    the start of a line and a special directive to
    the assembler is used to indicate that the
    location in the code where the label is placed
    is to be used to hold data and to permit the user
    to initialise that location with a data value
  • We will see some examples later

11
Assembler directives
  • An assembler directive is an instruction to the
    assembler program about something the assembler
    must do; it is not a source code instruction
    that will be translated into machine code for
    later execution.
  • INITIALISE is an assembler directive to allow us
    to initialise data at a specific location, which
    we can access using a label
  • if INITIALISE follows a label name then the
    number following INITIALISE is placed in the
    location identified by the label

12
Characteristics of assembly languages
  • Characteristics of an assembly language include
  • 1. one-to-one correspondence between assembly
    language mnemonics and machine code instructions
  • 2. one-to-one correspondence between label names
    and memory addresses
  • in an assembly language for a real processor
    there are many more instructions and thus
    mnemonics, and many more sophisticated directives
    to the assembler

13
Format of assembly language code
  • Typically source code is arranged into 4 fields
  • 1. An optional label field - must start in the
    first column of a line
  • 2. An operation field - for mnemonics or
    assembler directives - must NOT start in first
    column of a line
  • 3. An operand field - for label names that
    represent locations of data or locations to
    branch to - some operations e.g. HALT do not have
    an operand
  • 4. An optional comment field
  • Fields are separated by at least one space or tab
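
The first-column rule is what lets an assembler tell a label from an operation. A minimal sketch of splitting one line into the four fields follows; the function name `split_fields` and the `NO_OPERAND` set are assumptions made for illustration, and only HALT is listed as operand-free because it is the only such operation the slides name.

```python
# Operations that take no operand. Only HALT is named in the lecture;
# a fuller assembler would list any others here.
NO_OPERAND = {"HALT"}


def split_fields(line: str):
    """Split one source line into (label, operation, operand, comment).

    A label is present only if the line starts in the first column,
    i.e. its first character is not a space or tab.
    """
    has_label = bool(line) and not line[0].isspace()
    tokens = line.split()  # fields are separated by spaces or tabs
    label = tokens.pop(0) if has_label and tokens else None
    operation = tokens.pop(0) if tokens else None
    takes_operand = operation is not None and operation not in NO_OPERAND
    operand = tokens.pop(0) if takes_operand and tokens else None
    comment = " ".join(tokens)  # whatever is left over is the comment
    return label, operation, operand, comment


print(split_fields("OUT2ND WRITE SECOND otherwise"))
print(split_fields("\tHALT"))
```

Without the list of operand-free operations, the first word of a comment after HALT would be mistaken for an operand.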

14
Example assembly language program for SML
  •          READ       FIRST
  •          READ       SECOND
  •          LOAD       FIRST
  •          SUBTRACT   SECOND     first-second
  •          BRANCHNEG  OUT2ND     if neg then
  •          WRITE      FIRST      2nd number
  •          HALT                  is larger
  • OUT2ND   WRITE      SECOND     otherwise
  •          HALT                  first larger or
  • FIRST    INITIALISE 0          same size
  • SECOND   INITIALISE 0

15
Object code for example
  • 1009
  • 100A
  • 2009
  • 310A
  • 4107
  • 1109
  • 4300
  • 110A
  • 4300
  • 0000
  • 0000
  • The assembly language on the previous slide is
    easier to read, understand and write

16
Major functions of assembler
  • 1. Replace symbolic op-codes with numeric
    op-codes
  • 2. Replace symbolic location names (e.g. labels)
    with actual numeric addresses
  • 3. Allocate addresses into which numeric code is
    to be placed i.e. what to use as the start address
    of code
  • 4. Detect and report errors encountered

17
Implementation of an assembler
  • We are now going to look at how an assembler is
    written (implemented in code). We will look at
    how
  • 1. To hold the information about the mapping
    between instruction mnemonics and opcode values
    and between label names and address values. This
    is done using a symbol table.
  • 2. To break up (analyse) the program text into
    meaningful components so we can identify the
    instruction mnemonics and label names, etc. This
    is done using a tokenizer.

18
1. Use a symbol table
  • Symbol table,
  • 1. Associates opcode mnemonic with numeric opcode
    value and
  • 2. Associates label name with numeric address e.g.
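
One possible shape for such a table is sketched below in Python. The class name `SymbolTable` and its methods are assumptions for illustration, not the lecture's implementation; the entries used are the mnemonics and labels quoted on earlier slides. Reporting duplicate or unknown symbols here also covers part of the assembler's error-detection role.

```python
class SymbolTable:
    """Hypothetical symbol table: mnemonics -> opcodes, labels -> addresses."""

    def __init__(self, opcodes):
        self._opcodes = dict(opcodes)  # fixed when the assembler starts
        self._labels = {}              # filled in as labels are found

    def define_label(self, name, address):
        """Record a label, rejecting clashes and duplicate definitions."""
        if name in self._opcodes:
            raise ValueError(f"label {name!r} clashes with a mnemonic")
        if name in self._labels:
            raise ValueError(f"label {name!r} defined twice")
        self._labels[name] = address

    def opcode(self, mnemonic):
        if mnemonic not in self._opcodes:
            raise ValueError(f"unknown mnemonic {mnemonic!r}")
        return self._opcodes[mnemonic]

    def address(self, label):
        if label not in self._labels:
            raise ValueError(f"undefined label {label!r}")
        return self._labels[label]


table = SymbolTable({"ADD": "30", "BRANCHNEG": "41"})
table.define_label("NUM2", 0x08)
table.define_label("ENDOFLOOP1", 0x0A)
print(table.opcode("ADD"), table.address("ENDOFLOOP1"))
```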

19
2. Use a tokenizer
  • During the assembly process it is necessary to
    analyse (i.e. break down) each line of assembly
    source code into its components
  • a Tokenizer takes a string as a parameter and
    separates out from the string all the significant
    components that make up the string
  • it analyses the string into a series of tokens
    that are separated by delimiters

20
  • the default delimiters are white space characters
    like <space> and <tab> so given a string from the
    earlier assembly language program,
  • OUT2ND WRITE SECOND otherwise
  • it would break it up into 4 tokens
  • OUT2ND
  • WRITE
  • SECOND
  • otherwise
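
In Python the same behaviour is available from the built-in `str.split()`, which with no arguments splits on runs of white space (spaces and tabs included). The wrapper name `tokenize` is an assumption for this sketch; the lecture does not say which tokenizer class it uses.

```python
def tokenize(line: str) -> list[str]:
    """Split a source line into tokens using white space as the delimiter."""
    return line.split()


tokens = tokenize("OUT2ND WRITE SECOND otherwise")
print(tokens)

# Tabs and repeated spaces are treated the same way as a single space.
same_tokens = tokenize("OUT2ND\tWRITE   SECOND\totherwise")
print(same_tokens == tokens)
```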

21
Operation of an assembler
  • The structure of an assembler is determined by the
    forward referencing problem - a symbol may be used
    (referenced) in an operand field before it has
    been defined. This is a problem because the
    assembler cannot generate machine code for the
    instruction until the reference has been resolved
    i.e. until it knows what address value the
    symbolic forward reference represents.
  • This is normally handled by reading the source
    file in 2 passes.

22
First pass - to sort out the references
  • First pass reads through assembly language
    program one line at a time, noting the occurrence
    of labels (i.e. symbolic references) that occur
    at the beginning of a line and storing them into
    the symbol table, together with the location
    address value they identify
  • to enter correct address value for a label the
    assembler must know the address of the location
    of the instruction being labelled or address of
    location of data word being labelled

23
  • To do this, as the assembler works through one
    line after another it calculates how much space
    is required for each instruction and operand it
    comes across and keeps the address of current
    location in a location counter which is updated
    as the assembler moves from one instruction to
    the next
  • So when the assembler finds a label it knows what
    the address of the current location is.
  • In SML this is easy to do because all
    instructions take up the same space i.e. one
    word, but in other assembly languages this is not
    the case and the calculation is more complex.
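
A minimal sketch of the first pass for SML, where every instruction or data word occupies exactly one location, might look like this. The function name `first_pass` and the `start` parameter are assumptions; the source text is the example program from the earlier slide, with comments left out.

```python
SOURCE = """\
        READ FIRST
        READ SECOND
        LOAD FIRST
        SUBTRACT SECOND
        BRANCHNEG OUT2ND
        WRITE FIRST
        HALT
OUT2ND  WRITE SECOND
        HALT
FIRST   INITIALISE 0
SECOND  INITIALISE 0
"""


def first_pass(lines, start=0):
    """Return {label: address} by walking the source with a location counter."""
    labels = {}
    location = start  # the location counter
    for line in lines:
        if not line.strip():
            continue  # blank lines occupy no memory
        if not line[0].isspace():  # text in the first column is a label
            label = line.split()[0]
            if label in labels:
                raise ValueError(f"label {label!r} defined twice")
            labels[label] = location
        location += 1  # every SML instruction or data word is one word long
    return labels


labels = first_pass(SOURCE.splitlines())
print(labels)  # {'OUT2ND': 7, 'FIRST': 9, 'SECOND': 10}
```

For a machine with variable-length instructions, the `location += 1` step would instead add the size of the instruction on that line.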

24
Second pass - to output the translation
  • Second pass reads through source code one line at
    a time, constructing the numeric instruction
    values which are to be output. It does this by,
  • 1. looking up the mnemonic op-codes in the symbol
    table and reading out the numeric value that has
    been associated with the given mnemonic.
  • 2. looking up the operand labels in the symbol
    table and reading out the address values which it
    appends onto numeric opcode values to give
    complete instruction word
  • 3. The instruction word is then output to the
    object file
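
The second pass can be sketched as follows, taking the label addresses found by the first pass as given. The opcode values are the ones implied by the example program and its object code on the earlier slides; the function name `second_pass`, the 4-hex-digit formatting of INITIALISE values and the `00` address part for HALT are assumptions chosen to match that object code.

```python
# Mnemonic -> opcode, as implied by the example program and its object code.
OPCODES = {"READ": "10", "WRITE": "11", "LOAD": "20",
           "SUBTRACT": "31", "BRANCHNEG": "41", "HALT": "43"}

# Label -> address, i.e. the result of the first pass over the same program.
LABELS = {"OUT2ND": 0x07, "FIRST": 0x09, "SECOND": 0x0A}

SOURCE = [
    "        READ FIRST",
    "        READ SECOND",
    "        LOAD FIRST",
    "        SUBTRACT SECOND",
    "        BRANCHNEG OUT2ND",
    "        WRITE FIRST",
    "        HALT",
    "OUT2ND  WRITE SECOND",
    "        HALT",
    "FIRST   INITIALISE 0",
    "SECOND  INITIALISE 0",
]


def second_pass(lines, labels):
    """Translate each source line into one 4-digit word of object code."""
    words = []
    for line in lines:
        if not line.strip():
            continue
        tokens = line.split()
        if not line[0].isspace():
            tokens = tokens[1:]  # skip the label; its address is already known
        operation, operands = tokens[0], tokens[1:]
        if operation == "INITIALISE":
            # Directive: emit the data value itself (assumed decimal in source).
            words.append(f"{int(operands[0]):04X}")
        elif operation == "HALT":
            words.append(OPCODES[operation] + "00")  # no operand
        else:
            address = labels[operands[0]]  # look the operand label up
            words.append(f"{OPCODES[operation]}{address:02X}")
    return words


object_code = second_pass(SOURCE, LABELS)
print(object_code)
```

Because `BRANCHNEG OUT2ND` appears before `OUT2ND` is defined, this pass only works once the label table is complete, which is why it runs second.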

25
Relocatable code
  • Currently Simpletron loads a machine code program
    into consecutive locations starting from address
    0. It would be nice to be able to load a program
    anywhere in memory and be able to run the program
    from that location.
  • We can easily tell the virtual machine to load a
    program starting from some address other than 0.
  • BUT - our machine language programs use address
    values for the location of data and the location
    of places to branch to. Loading the program
    starting from a memory location other than 0 will
    then mean that all the addresses in the machine
    code will now be wrong.

26
  • To solve this problem the assembler must output
    relocatable code, so that a loader (called a
    relocating loader) can load the program anywhere
    in memory. To do this the assembler
  • 1. produces code as if it was going to be loaded
    starting from address 0 as before but also,
  • 2. adds information to each instruction record in
    the object file indicating whether or not
    instruction operand values need modification
  • The relocating loader then will add to operand
    address values the value of the new start address
    (often called an offset). This will give the
    correct address.
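
As a rough illustration, the object file can be modelled as a list of (word, needs_relocation) pairs and the relocating loader as a function that adds the start address to each flagged operand. The record layout and the name `relocate` are assumptions for this sketch; the words are the object code of the example program, with HALT and data words left unflagged.

```python
# Each record: (instruction word assembled for start address 0, relocation flag).
OBJECT_FILE = [
    ("1009", True), ("100A", True), ("2009", True), ("310A", True),
    ("4107", True), ("1109", True), ("4300", False), ("110A", True),
    ("4300", False), ("0000", False), ("0000", False),
]


def relocate(records, start):
    """Return the words as they should appear when loaded at address `start`."""
    loaded = []
    for word, needs_relocation in records:
        if needs_relocation:
            opcode, address = word[:2], int(word[2:], 16)
            new_address = address + start  # add the offset to the operand
            if new_address > 0xFF:
                raise ValueError("relocated address does not fit in two hex digits")
            word = f"{opcode}{new_address:02X}"
        loaded.append(word)
    return loaded


relocated = relocate(OBJECT_FILE, 0x20)
print(relocated)
```

Loading with a start address of 0 leaves every word unchanged, so the same object file still works for the original case.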