Translating Code - PowerPoint PPT Presentation

Provided by: ise2

1
Assembly Language

Chapter 7

2
Translating Code

Translation
Given source, produce target: Source -> Target.
Done when no processor can execute the source directly.
Results in generating an executable.
Can optimize, and check semantics.
C, C++, Pascal.
Interpretation
Read each line and execute it.
Done when no processor can execute the source directly.
No executable generated.
Not much optimization possible, little checking.
Lisp.

3
Assemblers and Compilers

Compiler
Source: High-level language (C, C++).
Target: Machine language, or its symbolic
representation.
Assembler
Source: Assembly language = symbolic
representation of machine language.
Target: Machine language.

4
Assembly Language

Symbolic representation of a numeric machine
language.
Why?
Easy to read and understand.
Easier to remember ADD, MV than their
hexadecimal codes.
Performance: A good assembly language program can
beat the output of an optimizing compiler.
Access to machine: A high-level language does not
have full access to the underlying machine.
One line of assembly corresponds to one machine
instruction.

5
Sometimes only Assembly

Examples with limited resources:
Smart cards.
Embedded systems, cell phones, pagers etc.
Characteristics:
Limited power, memory, CPU cycles etc.
No paging, code should fit in RAM.
No operating system either.

6
Performance

10% of the code is executed 90% of the time. It is
called critical, because it is the bottleneck in
improving performance.
Rewrite the critical parts in assembly and hand
optimize them: this is called tuning.
There are examples of tuning making code
20% to 50% faster with about 1/3
to 1/5 of the original code length.
Downside: Tedious and time consuming.

7
Some Statistics
8
Format of Assembler Statements

Statements above the blank line compute N = I + J.
Statements below the blank line reserve memory.
Parts: label, opcode, operands, comments.

9
Labels

Symbolic names for memory addresses.
Needed as targets for branches.
Needed to access data storage.
Usually start in the first column and have a
limited width (say 8 columns).
Colons:
Motorola does not require them.
SPARC requires them.
Intel requires them for code labels, but not for
data labels.

10
Opcode Field

The opcode field holds one of two kinds of entry:
Command to the assembler (assembler directive).
Symbolic representation of a machine opcode,
e.g. MOV, LD.
Motorola has MOVE for Memory <-> Register
movement of data.
SPARC uses LD (load) and ST (store).
Some instructions need more than one line.

11
Instructions Needing Two Lines

Example: SPARC has 32-bit or 44-bit addresses.
Instructions hold at most 22 bits of immediate
data. How is the full address provided?
SETHI %HI(I),%R1
Zeros the upper 32 bits and lower 10 bits of the
64-bit register R1, and puts the high-order 22
bits of the 32-bit address I into bits 10 to 31.
LD [%R1+%LO(I)],%R1
Adds R1 and the low-order 10 bits of address I
(forming the address of I) and fetches that word
from memory into R1.
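
The split of an address into a 22-bit high part and a 10-bit low part can be checked with a little arithmetic. The following is a minimal sketch in Python, not SPARC code: the function names and the sample address are assumptions made for illustration, and only the 32-bit case is modeled.

```python
def hi22(address: int) -> int:
    """High-order 22 bits of a 32-bit address (the part SETHI carries)."""
    return (address >> 10) & 0x3FFFFF


def lo10(address: int) -> int:
    """Low-order 10 bits of the address (the part the LD adds back)."""
    return address & 0x3FF


def sethi(address: int) -> int:
    """Register value after SETHI: high bits in positions 10..31, low 10 bits zero."""
    return hi22(address) << 10


address_of_i = 0x12345678            # hypothetical address of the variable I
r1 = sethi(address_of_i)             # first line of the two-line sequence
effective = r1 + lo10(address_of_i)  # address formed by the LD

print(hex(r1), hex(lo10(address_of_i)), hex(effective))
```

The two parts always recombine to the original address, which is why two instructions are enough.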

12
Addressing Granularity

Operands can be bytes, words or long words. How
is the granularity indicated?
Option 1: Use different register names (Intel).
EAX to move 32-bit items.
AX to move 16-bit items.
Option 2: Use suffixes indicating length (Motorola).
MOVE.L for long words.
MOVE.W for words.

13
Pseudoinstructions

Assembler directives = pseudoinstructions.
Directives for the assembler, not machine instructions.
Examples:
SEGMENT starts a new segment.
EQU names a symbolic expression, e.g. BASE EQU 10;
now BASE can be used instead of 10.
Storage allocation, e.g. TABLE DB 11, 23, 49
allocates space for 3 bytes, initializes them to
11, 23, 49 and sets TABLE to the address of the 11.

14
Controlling Visibility of Symbols

Programs reside in many files. Need to refer to
symbols in other files.
PUBLIC: Allows other files to refer to this
definition.
EXTERN: Tells the assembler that this symbol is
defined in another file.
INCLUDE: Includes the contents of another file bodily
in this file.

15
Some Assembly Directives
16
Conditional Assembly

WORDSIZE EQU 16
IF WORDSIZE GT 16
WSIZE DW 32
ELSE
WSIZE DW 16
ENDIF
Can maintain one source for many machines
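
The selection step above can be sketched as a small filter over source lines. This is a minimal illustration in Python, assuming a simplified directive syntax (only `IF <symbol> GT <number>`, `ELSE`, `ENDIF` and `<name> EQU <number>`); the function name is hypothetical and real assemblers support many more forms.

```python
def conditional_assemble(lines):
    """Return the lines that survive IF/ELSE/ENDIF selection."""
    symbols = {}       # names defined with EQU
    emitting = [True]  # one flag per open IF: are we keeping lines?
    output = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "IF":
            _, name, op, value = fields
            if op != "GT":
                raise ValueError("this sketch only understands GT")
            condition = symbols[name] > int(value)
            emitting.append(emitting[-1] and condition)
        elif fields[0] == "ELSE":
            was_emitting = emitting.pop()
            emitting.append(emitting[-1] and not was_emitting)
        elif fields[0] == "ENDIF":
            emitting.pop()
        elif emitting[-1]:
            if len(fields) == 3 and fields[1] == "EQU":
                symbols[fields[0]] = int(fields[2])
            output.append(line)
    return output


source = [
    "WORDSIZE EQU 16",
    "IF WORDSIZE GT 16",
    "WSIZE DW 32",
    "ELSE",
    "WSIZE DW 16",
    "ENDIF",
]
result = conditional_assemble(source)
print(result)
```

Changing only the EQU line to a larger value selects the other branch, which is how one source can serve several machines.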

17
Macros

Programs often need repeated sequences of instructions.
Three ways:
Write them all over again.
Laborious and error prone.
Write a procedure and call it when needed.
Good for long sequences, but the call overhead can
significantly slow down the code when the sequence
is only a few lines.
Macro definition.
Give a name to a piece of text, possibly with
some parameters.
Use it by stating the name, possibly with
actual values for the parameters.

18
Example Macro Definition
19
More on Macros

Macro call: Using a macro name as an opcode.
Macro expansion: Replacing the name with the
body.
Macro expansion happens during the assembly process,
NOT during execution. Hence no stack is used.
The same code is executed by the processor with or
without the macro. NOT a procedure call.

20
Comparison of Macros and Procedures
21
Macros with Parameters
22
Advanced Features of Macros

Macros within Macros
M1 MACRO
IF WORDSIZE GT 16
M2 MACRO
.
ENDM
ELSE
M3 MACRO
.
ENDM
ENDIF
ENDM

One of the problems:
Label (address) duplication: a label in the macro
body is defined again on every expansion.
Solution: Ability to pass the label as a parameter.
Recursive calling:
Need a method to pass a parameter from
caller to callee.
Callee decreases the parameter to stop the recursion.

23
Implementing Macros

Assembler maintains a table of macro definitions
with:
Macro name.
Stored text of definition.
Parameters.
Parameters are written in an easily recognizable
format.
During expansion, replace:
Name with body text.
Formal parameters with actual parameters.
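
The table and the two replacement steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `NAME MACRO P1,P2` ... `ENDM` syntax, the SWAP example and the function name are chosen for illustration, and nested or recursive macros are not handled.

```python
import re


def expand_macros(lines):
    """Record macro definitions in a table, then expand each macro call."""
    macros = {}  # macro name -> (formal parameters, stored body text)
    output = []
    i = 0
    while i < len(lines):
        fields = lines[i].split(None, 2)
        if len(fields) >= 2 and fields[1] == "MACRO":
            # Definition: store the body up to ENDM; nothing is emitted.
            formals = fields[2].split(",") if len(fields) == 3 else []
            formals = [name.strip() for name in formals]
            body = []
            i += 1
            while lines[i].strip() != "ENDM":
                body.append(lines[i])
                i += 1
            macros[fields[0]] = (formals, body)
        elif fields and fields[0] in macros:
            # Call: replace the name with the body, formals with actuals.
            formals, body = macros[fields[0]]
            rest = lines[i].split(None, 1)
            actuals = rest[1].split(",") if len(rest) == 2 else []
            binding = dict(zip(formals, (a.strip() for a in actuals)))
            if binding:
                names = "|".join(re.escape(name) for name in binding)
                pattern = re.compile(r"\b(" + names + r")\b")
                body = [pattern.sub(lambda m: binding[m.group(1)], text)
                        for text in body]
            output.extend(body)
        else:
            output.append(lines[i])
        i += 1
    return output


source = [
    "SWAP MACRO P1,P2",
    "MOV EAX,P1",
    "MOV EBX,P2",
    "MOV P2,EAX",
    "MOV P1,EBX",
    "ENDM",
    "SWAP P,Q",
]
expanded = expand_macros(source)
print("\n".join(expanded))
```

The output contains only ordinary instructions, so the processor never sees the macro.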

24
Assembly Process

Two-pass process
Pass One: Collects definitions of symbols,
statement labels etc.
Pass Two: Re-reads the statements, replaces
symbolic names with values, and translates to
the target language.
Why two passes?
Need to solve the forward reference problem.
Need to find values of names that have not yet
been defined.

25
Pass One

Pass one builds:
Symbol table: Containing (symbol, value) pairs.
Pseudoinstruction table.
Opcode table: Details of opcodes used.
In assigning a value, the assembler must know the
address the symbol will have during execution.
To do so it maintains an
Instruction Location Counter (ILC) during
assembly.
Starts at zero.
Increased by the instruction length each time a new
instruction is processed.
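
A minimal Python sketch of the ILC bookkeeping follows. The instruction lengths, the `LABEL:` syntax and the tiny program are assumptions made for illustration and do not describe any real machine.

```python
# Hypothetical instruction lengths in bytes, standing in for the opcode table.
INSTRUCTION_LENGTH = {"MOV": 2, "ADD": 2, "JMP": 3, "DW": 4}


def pass_one(lines):
    """Build the symbol table by tracking the Instruction Location Counter."""
    symbol_table = {}
    ilc = 0  # starts at zero
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].endswith(":"):
            # A label gets the current ILC as its value.
            symbol_table[fields[0][:-1]] = ilc
            fields = fields[1:]
        if fields:
            ilc += INSTRUCTION_LENGTH[fields[0]]
    return symbol_table, ilc


program = [
    "START: MOV R1,I",
    "       ADD R1,J",
    "       JMP DONE",   # forward reference: DONE is not defined yet
    "I:     DW 3",
    "J:     DW 4",
    "DONE:  MOV N,R1",
]
symbols, final_ilc = pass_one(program)
print(symbols, final_ilc)
```

Pass one never looks at the operand of `JMP DONE`, which is why the forward reference causes no trouble at this stage.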

26
Instruction Location Counter
27
Contents of Symbol table

Symbol.
Length of data field.
Relocation bits, i.e. does the symbol's value change
if the program is loaded at a different address?
Recall immediate addressing and indirect
addressing.
Visibility/security bits, i.e. should the
procedure be accessible to other procedures?

28
Example Symbol Table
29
Contents of the Opcode Table

Symbolic opcode.
Operands.
Hexadecimal value of opcode.
Instruction length.
Type number indicating the instruction group.
Depends on address types (immediate vs direct),
parameter types etc.
A 32-bit add is different from a 16-bit add.

30
Example Opcode Table
31
Pass One
32
Pass two

Generates the object program from the data collected
in pass one.
Outputs information to be used by a linker.
Linker: Forms a single executable from object
code produced at different times.
Errors detected during pass two are printed out
and the translation procedure is stopped.
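
The substitution step of pass two can be sketched in Python as below. The symbol table is written out by hand here, the register set and program are hypothetical, and the sketch collects error messages instead of stopping, purely so that the result is easy to inspect.

```python
REGISTERS = {"R1", "R2"}  # hypothetical register names


def pass_two(lines, symbol_table):
    """Replace symbolic operands with values; report undefined symbols."""
    output, errors = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]  # labels were already handled in pass one
        if not fields:
            continue
        opcode = fields[0]
        operands = fields[1].split(",") if len(fields) > 1 else []
        resolved = []
        for operand in operands:
            if operand in symbol_table:
                resolved.append(str(symbol_table[operand]))
            elif operand in REGISTERS or operand.isdigit():
                resolved.append(operand)
            else:
                errors.append(f"line {number}: undefined symbol {operand}")
                resolved.append("?")
        output.append((opcode, resolved))
    return output, errors


# Symbol table as pass one might have produced it (values are made up).
table = {"START": 0, "I": 7, "DONE": 15}
program = ["START: MOV R1,I", "JMP DONE", "JMP NOWHERE"]
translated, problems = pass_two(program, table)
print(translated)
print(problems)
```

`JMP DONE` resolves even though DONE appears later in the file, because the table was complete before pass two started.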

33
Pass Two
34
Implementing the Symbol Table

Associative Memory
A set of pairs; given a key (symbol), it must produce
the value.
Implemented as an array of pairs with proper
operations:
insert(), delete(), ..., getValue().
Binary Search Tree
Keep (symbol, value) pairs in a sorted binary
tree.
In searching, compare with the key, go left or right
depending on (key on node <, =, > key).
Hash Tables
Values attached to hash buckets.
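
A minimal Python sketch of the hash-bucket variant follows. The class name, the bucket count and the deliberately simple hash function (sum of character codes) are assumptions for illustration; Python's built-in dict would normally be used instead.

```python
class HashSymbolTable:
    """Symbol table with (symbol, value) pairs chained in hash buckets."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, symbol):
        # Hypothetical hash: sum of character codes modulo the bucket count.
        return self.buckets[sum(map(ord, symbol)) % len(self.buckets)]

    def insert(self, symbol, value):
        bucket = self._bucket(symbol)
        for index, (name, _) in enumerate(bucket):
            if name == symbol:
                bucket[index] = (symbol, value)  # redefine existing symbol
                return
        bucket.append((symbol, value))

    def get_value(self, symbol):
        for name, value in self._bucket(symbol):
            if name == symbol:
                return value
        raise KeyError(symbol)

    def delete(self, symbol):
        bucket = self._bucket(symbol)
        bucket[:] = [(n, v) for n, v in bucket if n != symbol]


table = HashSymbolTable()
table.insert("AB", 100)
table.insert("BA", 200)   # same character sum, so it lands in the same bucket
table.insert("LOOP", 300)
print(table.get_value("AB"), table.get_value("BA"), table.get_value("LOOP"))
```

Symbols that collide share a bucket and are told apart by comparing names within it.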

35
Hash Table
36
Linking and Loading

Translating sources to an executable involves:
Compiling or assembling all source procedures
into separate object modules.
On Unix these are .o files, on NT .obj files.
Linking all object modules into one executable.
On Unix these have no specific extension
(sometimes a.out) and on NT .exe.
The two-step process saves time: after a change, only
the modified source files need to be retranslated.

37
Assembling and Linking Process
38
What the Linker has to do

Take separately compiled objects, put them in
one linear address space, and adjust the addresses
so that they are what the individual modules had in
mind.
It must:
Find externally defined addresses.
Translate addresses so that the program can be loaded
at any physical location.

39
Object Modules
40
(No Transcript)
41
Linker Activity

The linker solves the relocation problem and the
external reference problem by:
Constructing a table of all object modules and
their lengths.
Assigning, based on this table, a starting address
to each object module.
Adding the relocation constant (the starting address
of its module) to each memory reference.
Finding all instructions that reference other
procedures and inserting their addresses in place.
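
The four linker steps above can be sketched in Python. The module layout used here (a dict with a name, a length, entry points and a code list tagged "rel", "ext" or "abs") is an assumption made for illustration and is not a real object file format.

```python
def link(modules, origin=0):
    """Assign start addresses, relocate references, resolve externals."""
    # Steps 1 and 2: table of modules and lengths, then starting addresses.
    start = {}
    next_free = origin
    for module in modules:
        start[module["name"]] = next_free
        next_free += module["length"]

    # Addresses of all entry points, after relocation.
    entry = {}
    for module in modules:
        for symbol, offset in module["entry_points"].items():
            entry[symbol] = start[module["name"]] + offset

    # Steps 3 and 4: patch every instruction.
    image = []
    for module in modules:
        base = start[module["name"]]  # relocation constant of this module
        for opcode, operand, kind in module["code"]:
            if kind == "rel":      # address inside this module
                operand += base
            elif kind == "ext":    # name of a procedure in another module
                operand = entry[operand]
            image.append((opcode, operand))
    return start, image


modules = [
    {"name": "A", "length": 400, "entry_points": {"A": 0},
     "code": [("JMP", 300, "rel"), ("CALL", "B", "ext")]},
    {"name": "B", "length": 600, "entry_points": {"B": 0},
     "code": [("JMP", 100, "rel"), ("LOADI", 5, "abs")]},
]
starts, image = link(modules, origin=100)
print(starts)
print(image)
```

Constants tagged "abs" pass through unchanged, which is why the linker needs to be told which operands are relocatable.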

42
Structure of an Object Module I

Identification: Names and lengths of the different
parts.
Entry Point: Symbols in this module and their
values.
External..: List of externally defined symbols
and which instructions use them.
The linker inserts these later on.
Machine Inst.: Only this part is loaded into memory
for execution.

43
Structure of an Object Module II

Relocation Directory: Relocatable addresses are
listed here. The linker has no way to guess this.
The assembler produces this part.
End of Module: May have parity information such
as a checksum.

44
Binding Time and Dynamic Relocation

A process moves through queues and may occupy
different memory locations over its lifetime.
Addresses need to change accordingly. Hence
absolute addresses cannot be computed in advance.
Binding time: The time at which actual addresses
are computed.

45
Possible Binding Time

When the module is written.
When the module is assembled.
When the program is linked.
When the program is loaded.
When the base register used for addressing is loaded.
When the instruction containing the address is executed.

46
Issues in Address Binding

If an instruction containing an address is moved
after binding, the address becomes incorrect.
Address binding has two parts:
When symbolic names are bound to virtual
addresses.
When virtual addresses are bound to physical
addresses.
The linker creates the binding of symbolic names to
virtual addresses.
Whether memory is paged or not has no effect on this.

47
Three Methods to Relocate

Virtual memory and paging:
Only the page table needs to change.
Relocation register: Points to the starting physical
address. All references automatically add the
relocation register to the address.
Relative addressing: All addresses are either
constant (for devices) or relative to the PC.
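
The relocation register method can be illustrated with a toy memory model. This Python sketch is an assumption-laden illustration: the class and method names are hypothetical, and real hardware performs the addition on every reference without any software involvement.

```python
class RelocatingMemory:
    """Toy memory in which every reference adds a relocation register."""

    def __init__(self, size):
        self.cells = [0] * size
        self.relocation_register = 0  # starting physical address of the program

    def load(self, address):
        return self.cells[self.relocation_register + address]

    def store(self, address, value):
        self.cells[self.relocation_register + address] = value

    def move_program(self, length, new_base):
        """Copy the program elsewhere and update only the register."""
        old_base = self.relocation_register
        block = self.cells[old_base:old_base + length]
        self.cells[new_base:new_base + length] = block
        self.relocation_register = new_base


memory = RelocatingMemory(100)
memory.relocation_register = 10
memory.store(3, 42)            # program address 3 -> physical address 13
memory.move_program(5, 50)     # program now lives at physical 50..54
print(memory.load(3), memory.cells[53])
```

The program keeps using address 3 before and after the move; no address inside the program had to be rewritten.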

48
Dynamic Linking

The linker described previously links all procedures
statically at link time, whether they are used or not.
Some procedures (such as exception handlers) are
rarely used. Hence executable size can be reduced if
linking is postponed until the procedure is called.
This is called dynamic linking.

49
MULTICS Style Dynamic Linking

Each object has a linkage segment with the addresses
of procedures that may be called.
Procedure calls are translated to indirect references
through this block.
The compiler fills in an invalid address, so the
first call causes a trap.
In turn the dynamic linker finds the proper
address, fills it in and restarts the instruction.
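
The trap-and-patch sequence above can be imitated in Python. This is only a sketch: an exception stands in for the hardware trap, a dict stands in for the linkage segment, and all names are hypothetical.

```python
class LinkageTrap(Exception):
    """Stands in for the trap caused by an invalid address."""


INVALID = None  # marker the compiler is assumed to place in the linkage segment


class Program:
    def __init__(self, library):
        self.library = library       # name -> procedure, as if found on disk
        self.linkage_segment = {}    # name -> address, or INVALID
        self.link_count = 0          # how often the dynamic linker ran

    def declare(self, name):
        self.linkage_segment[name] = INVALID

    def _indirect_call(self, name, *args):
        target = self.linkage_segment[name]
        if target is INVALID:
            raise LinkageTrap(name)
        return target(*args)

    def call(self, name, *args):
        try:
            return self._indirect_call(name, *args)
        except LinkageTrap:
            # Dynamic linker: find the procedure and fill in its address.
            self.linkage_segment[name] = self.library[name]
            self.link_count += 1
            return self._indirect_call(name, *args)  # restart the call


program = Program({"square": lambda x: x * x})
program.declare("square")
first = program.call("square", 3)    # traps, links, then runs
second = program.call("square", 4)   # address already filled in
print(first, second, program.link_count)
```

Only the first call pays for linking; later calls go straight through the filled-in address.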

50
Before the Call
51
After the Call
52
Dynamic Linking in Windows

DLLs: Special file format holding procedures and/or
data.
Library sharing is done through DLLs.
DLLs have no main, hence cannot run by
themselves.
Implicit linking: User program is statically linked
with an import library (glue). The operating system
examines the addresses and provides address
translation.
Explicit linking: User program makes an explicit
call at run time to bind to the library, then makes
additional calls to the OS to get the addresses of
the library procedures it needs.