Title: Translating Code
1Assembly Language
2Translating Code
- Translation
- Given source, produce target: Source -> Target.
- Done when there is no processor that can execute the source directly.
- Results in an executable.
- Can optimize and check semantics.
- Examples: C, C++, Pascal.
- Interpretation
- Read each line and execute it.
- Done when there is no processor that can execute the source directly.
- No executable is generated.
- Little optimization or checking is possible.
- Example: Lisp.
3Assemblers and Compilers
- Compiler
- Source: High-level language (C, C++).
- Target: Machine language, or its symbolic representation.
- Assembler
- Source: Assembly language, a symbolic representation of machine language.
- Target: Machine language.
4Assembly Language
- Symbolic representation of a numeric machine language.
- Why?
- Easy to read and understand.
- Mnemonics such as ADD and MV are easier to remember than their hexadecimal values.
- Performance: A good assembly language program can outperform the output of an optimizing compiler.
- Access to machine: A high-level language does not have direct access to the underlying machine.
- One line of assembly corresponds to one machine instruction.
5Sometimes only Assembly
- Examples with limited resources
- Smart cards
- Embedded systems, cell phones, pagers etc.
- Characteristics
- Limited power, memory, CPU cycles, etc.
- No paging, code should fit in RAM.
- No operating system either.
6Performance
- 10% of the code is executed 90% of the time. This code is called critical because it is the bottleneck in improving performance.
- Rewrite the critical parts in assembly and hand-optimize them. This is called tuning.
- There are examples of tuning making code 20% to 50% faster with about 1/3 to 1/5 of the original code length.
- Downside: Tedious and time-consuming.
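The payoff of tuning only the critical code can be estimated with a short calculation. The sketch below assumes the critical code accounts for 90% of the run time and, purely as a hypothetical figure, that hand tuning makes that part twice as fast. The function name `overall_speedup` and the factor of 2 are illustrative assumptions, not values from the slides.

```python
def overall_speedup(critical_fraction, critical_speedup):
    """Whole-program speedup when only the critical part is made faster."""
    # Time that remains: the untouched part plus the shortened critical part.
    remaining = (1.0 - critical_fraction) + critical_fraction / critical_speedup
    return 1.0 / remaining


# Assumed figures: 90% of the time is spent in critical code, tuned to run 2x faster.
print(round(overall_speedup(0.90, 2.0), 2))
```

Because the remaining 10% of the run time is untouched, the whole program speeds up by less than the factor applied to the critical part.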
7Some Statistics
8Format of Assembler Statements
- Statements above the blank line compute N = I + J.
- Statements below the blank line reserve memory.
- Parts: label, opcode, operands, comments.
9Labels
- Symbolic names for memory addresses
- Needed for branching.
- Needed to access data storage.
- Usually start in the first column and have a bounded width (say 8 columns).
- Colons
- Motorola does not require them.
- SPARC requires them.
- Intel requires them for code labels, but not for data labels.
10Opcode Field
- Two Kinds
- Opcode
- Command to the assembler (assembler directive)
- Opcode
- Symbolic representation of a machine opcode, e.g. MOV, LD.
- Motorola uses MOVE for memory <-> register movement of data.
- SPARC uses LD (load) and ST (store).
- Some instructions need more than one line.
11Instructions Needing Two Lines
- Example: SPARC has 32-bit or 44-bit addresses.
- Instructions hold at most 22 bits of immediate data. How is the full address provided?
- SETHI %HI(I),%R1
- Places the high-order 22 bits of the address of I in bits 10 to 31 of the 64-bit register R1, and zeros the upper 32 bits and the lower 10 bits.
- LD [%R1+%LO(I)],%R1
- Adds R1 and the low-order 10 bits of the address of I (forming the full address of I), then fetches that word from memory into R1.
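The two-instruction sequence works because a 32-bit address splits cleanly into a 22-bit high part and a 10-bit low part. The sketch below models only that arithmetic, not real SPARC execution. The helper names `hi`, `lo`, and `sethi`, and the address value, are hypothetical.

```python
LOW_BITS = 10  # the LO part keeps the low-order 10 bits of the address


def hi(address):
    """High-order 22 bits of a 32-bit address (the SETHI immediate)."""
    return (address >> LOW_BITS) & 0x3FFFFF


def lo(address):
    """Low-order 10 bits of the address."""
    return address & ((1 << LOW_BITS) - 1)


def sethi(imm22):
    """Register value after SETHI: immediate in bits 10..31, all other bits zero."""
    return imm22 << LOW_BITS


address_of_i = 0x12345678          # hypothetical address of I
r1 = sethi(hi(address_of_i))       # first line: load the high 22 bits
effective = r1 + lo(address_of_i)  # second line: add the low 10 bits
print(hex(r1), hex(effective))
```

After the first step the register holds the address with its low 10 bits cleared, and the addition in the second step restores the full address.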
12Addressing Granularity
- Operands can be addressed as bytes, words, or long words. How to indicate the granularity?
- Option 1: Use different instructions or register names.
- EAX to move 32-bit items.
- AX to move 16-bit items.
- Option 2: Use suffixes indicating the length.
- MOVE.L for long words.
- MOVE.W for words.
13Pseudoinstructions
- Assembler directives = pseudoinstructions
- Directives for the assembler, not machine instructions.
- Examples
- SEGMENT starts a new segment.
- EQU defines a symbolic name for an expression, e.g. BASE EQU 10; BASE can then be used instead of 10.
- Storage allocation, e.g. TABLE DB 11, 23, 49
- Allocates space for 3 bytes, initializes them to 11, 23, 49, and sets TABLE to the address of the byte holding 11.
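The effect of EQU and DB on the assembler's state can be sketched as follows. This is a minimal illustration under simplifying assumptions: every line has the form `NAME DIRECTIVE OPERANDS`, operands are decimal numbers, and `process_directives` and `start_address` are hypothetical names.

```python
def process_directives(lines, start_address=0):
    """Handle EQU and DB lines; return (symbols, initialized memory bytes)."""
    symbols = {}
    memory = bytearray()
    for line in lines:
        name, directive, operands = line.split(None, 2)
        if directive == "EQU":
            # EQU only names a value; no storage is reserved.
            symbols[name] = int(operands)
        elif directive == "DB":
            # DB reserves bytes; the name becomes the address of the first one.
            symbols[name] = start_address + len(memory)
            memory.extend(int(value) for value in operands.split(","))
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return symbols, bytes(memory)


symbols, memory = process_directives(
    ["BASE EQU 10", "TABLE DB 11, 23, 49"], start_address=100
)
print(symbols, list(memory))
```

Note that BASE is bound to a value while TABLE is bound to an address, which is why only TABLE depends on where the data is placed.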
14Controlling Visibility of Symbols
- Programs reside in many files and need to refer to symbols in other files.
- PUBLIC: Allows other files to refer to this definition.
- EXTERN: Tells the assembler that this symbol is defined in another file.
- INCLUDE: Inserts the contents of another file verbatim into this file.
15Some Assembly Directives
16Conditional Assembly
- WORDSIZE EQU 16
- IF WORDSIZE GT 16
- WSIZE DW 32
- ELSE
- WSIZE DW 16
- ENDIF
- Can maintain one source for many machines
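How an assembler might decide which statements survive conditional assembly can be sketched as below. This is an illustrative model only: it assumes the only relation is GT, that constants are defined with EQU, and that `conditional_assemble` is a hypothetical name.

```python
def conditional_assemble(lines):
    """Return the statements that survive IF/ELSE/ENDIF evaluation."""
    constants = {}     # names defined with EQU
    branch_taken = []  # one flag per open IF: is its current branch selected?
    output = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        active = all(branch_taken)  # inside only selected branches?
        if fields[0] == "IF":
            if fields[2] != "GT":
                raise ValueError("only GT is supported in this sketch")
            branch_taken.append(active and constants[fields[1]] > int(fields[3]))
        elif fields[0] == "ELSE":
            branch_taken[-1] = not branch_taken[-1]
        elif fields[0] == "ENDIF":
            branch_taken.pop()
        elif active:
            if len(fields) == 3 and fields[1] == "EQU":
                constants[fields[0]] = int(fields[2])
            output.append(line)
    return output


source = ["WORDSIZE EQU 16", "IF WORDSIZE GT 16", "WSIZE DW 32",
          "ELSE", "WSIZE DW 16", "ENDIF"]
print(conditional_assemble(source))
```

Changing only the first line to `WORDSIZE EQU 32` selects the other branch, which is how one source file can serve several machines.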
17Macros
- Need repeated sequences of instructions.
- Three ways
- Write them all over again
- Laborious, and error prone.
- Write a procedure and call it when needed
- Good for long sequences, but the call overhead can significantly slow down the code when the sequence is only a few lines long.
- Macro definition
- Give a name to a piece of text, possibly with some parameters.
- Use it by stating the name, possibly with actual values for the parameters.
18Example Macro Definition
19More on Macros
- Macro call: Using a macro name as an opcode.
- Macro expansion: Replacing the name with the body.
- Macro expansion happens during the assembly process, NOT during execution. Hence no stack is used.
- The processor executes the same code with or without the macro. It is NOT a procedure call.
20Comparison of Macros and Procedures
21Macros with Parameters
22Advanced Features of Macros
- Macros within Macros
- M1 MACRO
- IF WORDSIZE GT 16
- M2 MACRO
- .
- ENDM
- ELSE
- M3 MACRO
- .
- ENDM
- ENDIF
- ENDM
- One of the problems
- Label (address) duplication: a label in the macro body is defined again at every expansion.
- Solution: Ability to pass a label as a parameter.
- Recursive calling
- Needs a method to pass a parameter from caller to callee.
- Callee decreases the parameter to stop the recursion.
23Implementing Macros
- Assembler maintains a table of macro definitions with
- Macro name.
- Stored text of the definition.
- Parameters.
- Parameters are written in an easily recognizable format.
- During expansion, replace
- Name with body text.
- Formal parameters with actual parameters.
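The table-driven expansion described above can be sketched in a few lines. This is a simplified illustration, not a real assembler: it assumes definitions have the form `NAME MACRO P1,P2` ... `ENDM`, it does not handle nested definitions or labels, and the function name `expand_macros` and the sample CHANGE macro are hypothetical.

```python
import re


def expand_macros(lines):
    """Collect macro definitions, then replace each call with its body."""
    macros = {}     # macro name -> (formal parameters, body lines)
    output = []
    current = None  # name of the macro being defined, if any
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "MACRO":
            current = fields[0]
            formals = [p for p in "".join(fields[2:]).split(",") if p]
            macros[current] = (formals, [])
        elif fields and fields[0] == "ENDM":
            current = None
        elif current is not None:
            macros[current][1].append(line)  # store the definition text
        elif fields and fields[0] in macros:
            formals, body = macros[fields[0]]
            actuals = [a for a in "".join(fields[1:]).split(",") if a]
            binding = dict(zip(formals, actuals))
            for body_line in body:
                if binding:
                    # Replace all formals in one pass so actuals are never re-substituted.
                    pattern = r"\b(" + "|".join(map(re.escape, binding)) + r")\b"
                    body_line = re.sub(pattern, lambda m: binding[m.group(1)], body_line)
                output.append(body_line)
        else:
            output.append(line)
    return output


source = ["CHANGE MACRO P1,P2", "MOV EAX,P1", "MOV EBX,P2",
          "MOV P2,EAX", "MOV P1,EBX", "ENDM", "CHANGE P,Q"]
print(expand_macros(source))
```

The output contains only ordinary instructions, so the later assembly passes never see the macro call.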
24Assembly Process
- Two pass process
- Pass One: Collects definitions of symbols, statement labels, etc.
- Pass Two: Re-reads the statements, replaces symbolic names with values, and translates to the target language.
- Why two passes?
- Need to solve the forward reference problem.
- Need to find values of names that have not yet been defined.
25Pass One
- Pass one builds
- Symbol table: Contains (symbol, value) pairs.
- Pseudoinstruction table
- Opcode table: Details of the opcodes used.
- To assign a value, the assembler must know the address the symbol will have during execution.
- To do so it maintains
- Instruction Location Counter (ILC) during assembly.
- Starts at zero.
- Increases by the instruction length each time a new instruction is processed.
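The role of the ILC in pass one can be sketched as follows. The statement syntax (an optional label ending in a colon), the instruction lengths in `INSTRUCTION_LENGTH`, and the sample program are all hypothetical and chosen only for illustration.

```python
# Hypothetical opcode table: mnemonic -> instruction length in bytes.
INSTRUCTION_LENGTH = {"MOV": 2, "ADD": 2, "CMP": 2, "JE": 3, "JMP": 3}


def pass_one(statements):
    """Build the symbol table by tracking the Instruction Location Counter."""
    symbol_table = {}
    ilc = 0  # starts at zero
    for statement in statements:
        fields = statement.split()
        if fields and fields[0].endswith(":"):
            label = fields[0][:-1]
            if label in symbol_table:
                raise ValueError(f"label defined twice: {label}")
            symbol_table[label] = ilc  # the label names the current address
            fields = fields[1:]
        if fields:
            ilc += INSTRUCTION_LENGTH[fields[0]]  # advance by instruction length
    return symbol_table, ilc


program = [
    "START: MOV EAX,I",
    "CMP EAX,J",
    "JE DONE",        # forward reference: DONE is defined later
    "ADD EAX,J",
    "JMP START",
    "DONE: MOV N,EAX",
]
print(pass_one(program))
```

Pass one never needs the value of DONE when it meets `JE DONE`; it only needs the instruction's length, which is why the forward reference causes no trouble here.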
26Instruction Location Counter
27Contents of Symbol table
- Symbol.
- Length of data field.
- Relocation bits, i.e. does the symbol change if the program is loaded at a different address?
- Recall immediate addressing and indirect addressing.
- Visibility/security bits, i.e. should the procedure be accessible to other procedures?
28Example Symbol Table
29Contents of the Opcode Table
- Symbolic opcode.
- Operands.
- Hexadecimal value of opcode.
- Instruction length.
- Type number indicating group.
- Depends on address types (immediate vs direct), parameter types, etc.
- A 32-bit add is different from a 16-bit add.
30Example Opcode Table
31Pass One
32Pass Two
- Generates the object program from the data collected in pass one.
- Outputs information to be used by a linker.
- Linker: Forms a single executable from object code produced at different times.
- Errors detected during pass two are printed out and the translation is stopped.
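Pass two can be sketched as a second scan that looks up each symbolic name in the symbol table built earlier. The opcode values, the 16-bit little-endian address format, and the names `OPCODES` and `pass_two` are made up for this illustration and do not describe any real machine.

```python
# Hypothetical opcode table: mnemonic -> (opcode byte, instruction length).
OPCODES = {"NOP": (0x00, 1), "JMP": (0x10, 3), "JE": (0x11, 3)}


def pass_two(statements, symbol_table):
    """Translate statements to bytes, replacing symbolic names with values."""
    object_code = bytearray()
    for number, statement in enumerate(statements, start=1):
        fields = statement.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]  # labels were already handled in pass one
        if not fields:
            continue
        mnemonic, operands = fields[0], fields[1:]
        if mnemonic not in OPCODES:
            raise ValueError(f"line {number}: unknown opcode {mnemonic}")
        opcode, length = OPCODES[mnemonic]
        object_code.append(opcode)
        if length == 3:  # opcode byte followed by a 16-bit address
            if not operands or operands[0] not in symbol_table:
                raise ValueError(f"line {number}: undefined symbol")
            object_code += symbol_table[operands[0]].to_bytes(2, "little")
    return bytes(object_code)


statements = ["START: NOP", "JE DONE", "JMP START", "DONE: NOP"]
symbols = {"START": 0, "DONE": 7}  # as pass one would compute for these lengths
print(pass_two(statements, symbols).hex())
```

An undefined symbol raises an error and stops the translation, mirroring the behavior described for pass two.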
33Pass Two
34Implementing the Symbol Table
- Associative memory
- A set of pairs; given a key (symbol), it must produce the value.
- Implemented as an array of pairs with the proper operations.
- insert(), delete(), getValue().
- Binary search tree
- Keep (symbol, value) pairs in a sorted binary tree.
- When searching, compare the search key with the key at the node (<, =, >) and go left or right accordingly.
- Hash tables
- Values are attached to hash buckets.
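A hash-table symbol table with chained buckets might look like the sketch below. The class name, the bucket count, and the deliberately simple hash function (sum of character codes) are assumptions made for illustration; a real assembler would use a stronger hash.

```python
class SymbolTable:
    """Symbol table as a hash table: each bucket holds (symbol, value) pairs."""

    def __init__(self, bucket_count=16):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, symbol):
        # Illustrative hash: sum of character codes modulo the bucket count.
        return self.buckets[sum(map(ord, symbol)) % len(self.buckets)]

    def insert(self, symbol, value):
        bucket = self._bucket(symbol)
        for index, (name, _) in enumerate(bucket):
            if name == symbol:
                bucket[index] = (symbol, value)  # overwrite an existing entry
                return
        bucket.append((symbol, value))

    def get_value(self, symbol):
        for name, value in self._bucket(symbol):
            if name == symbol:
                return value
        raise KeyError(symbol)

    def delete(self, symbol):
        bucket = self._bucket(symbol)
        for index, (name, _) in enumerate(bucket):
            if name == symbol:
                del bucket[index]
                return
        raise KeyError(symbol)


table = SymbolTable()
table.insert("AB", 100)
table.insert("BA", 200)  # same hash as "AB": both land in one bucket
print(table.get_value("AB"), table.get_value("BA"))
```

Colliding symbols share a bucket and are told apart by comparing names, so lookups stay correct even with a weak hash.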
35Hash Table
36Linking and Loading
- Translating sources to an executable involves
- Compiling or assembling all source procedures into separate object modules.
- On Unix these are .o files, on NT .obj files.
- Linking all object modules into one executable.
- On Unix these have no specific extension (sometimes a.out); on NT they are .exe files.
- The two-step process saves time, because only the modules that changed need to be retranslated.
37Assembling and Linking Process
38What the Linker has to do
- Take separately compiled objects, put them in one linear address space, and adjust the addresses so that they are what the individual modules had in mind.
- It should
- Find externally defined addresses.
- Translate addresses so that the program can be loaded at any physical location.
39Object Modules
41Linker Activity
- The linker solves the relocation problem and the external reference problem by
- Constructing a table of all object modules and their lengths.
- Assigning a starting address to each module based on this table.
- Adding a relocation constant (the starting address of its module) to each memory reference.
- Finding all instructions that reference other procedures and inserting the procedure addresses in place.
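The four linker steps can be sketched on a toy object format. Everything about the format here is an assumption for illustration: each module is a dictionary holding a list of code words, the indices of words that contain module-relative addresses, the symbols it defines, and the word indices that refer to external symbols.

```python
def link(modules, origin=0):
    """Combine modules into one image, relocating and resolving references."""
    # Steps 1 and 2: table of modules and lengths -> starting addresses.
    base = {}
    address = origin
    for module in modules:
        base[module["name"]] = address
        address += len(module["code"])
    # Global symbol values: offset within the module plus the module's base.
    global_symbols = {}
    for module in modules:
        for symbol, offset in module["entry_points"].items():
            global_symbols[symbol] = base[module["name"]] + offset
    image = []
    for module in modules:
        code = list(module["code"])
        # Step 3: add the relocation constant to each memory reference.
        for index in module["relocations"]:
            code[index] += base[module["name"]]
        # Step 4: insert the addresses of externally defined procedures.
        for symbol, indices in module["externals"].items():
            for index in indices:
                code[index] = global_symbols[symbol]
        image.extend(code)
    return image, base


module_a = {"name": "A", "code": [10, 3, 0, 11], "relocations": [1],
            "entry_points": {"MAIN": 0}, "externals": {"B_ENTRY": [2]}}
module_b = {"name": "B", "code": [20, 21, 2], "relocations": [2],
            "entry_points": {"B_ENTRY": 1}, "externals": {}}
image, bases = link([module_a, module_b], origin=100)
print(bases, image)
```

Only the words listed as relocatable or external are changed; all other words are copied into the image untouched.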
42Structure of an Object Module I
- Identification: Names and lengths of the different parts.
- Entry points: Symbols defined in this module and their values.
- External references: List of externally defined symbols and which instructions use them.
- Linker inserts these later on.
- Machine instructions: Only this part is loaded into memory for execution.
43Structure of an Object Module II
- Relocation directory: Relocatable addresses are listed here. The linker has no way to guess them, so the assembler produces this part.
- End of module: May hold error-checking information such as a checksum.
44Binding Time and Dynamic Relocation
- A process moves through queues and may occupy different memory locations over its lifetime.
- Addresses need to change accordingly. Hence absolute addresses cannot be computed in advance.
- Binding time: The actual time at which addresses are computed.
45Possible Binding Time
- When module is written.
- When module assembled.
- When program linked.
- When program loaded.
- When the base register used for addressing is loaded.
- When instruction containing address is executed.
46Issues in Address Binding
- If an instruction containing an address is moved after binding, the address becomes incorrect.
- Address binding
- When symbolic names are bound to virtual addresses.
- When virtual addresses are bound to physical addresses.
- The linker creates the binding of symbolic names to virtual addresses.
- This holds whether or not memory is paged.
47Three Methods to Relocate
- Virtual memory and paging.
- Only the page table needs to change.
- Relocation register: Points to the starting physical address. The hardware automatically adds the relocation register to every address reference.
- Relative addressing: All addresses are either constant (for devices) or relative to the PC.
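The relocation-register method can be illustrated with a small model in which every memory reference passes through one addition. The class `RelocatedMemory` and its sizes are hypothetical; on a real machine the addition is done by hardware.

```python
class RelocatedMemory:
    """Physical memory accessed through a relocation register."""

    def __init__(self, physical_size):
        self.physical = [0] * physical_size
        self.relocation_register = 0

    def load(self, program, start):
        """Place the program at a physical address and point the register at it."""
        self.physical[start:start + len(program)] = program
        self.relocation_register = start

    def read(self, address):
        """Every reference automatically adds the relocation register."""
        return self.physical[self.relocation_register + address]


memory = RelocatedMemory(64)
memory.load([7, 8, 9], start=10)
first = memory.read(2)            # program-relative address 2
memory.load([7, 8, 9], start=40)  # the program is moved elsewhere
second = memory.read(2)           # same program address still works
print(first, second)
```

The program keeps using address 2 before and after the move; only the register changes.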
48Dynamic Linking
- The linker described so far links all possible procedures statically at link time, whether or not they are used.
- Some procedures (such as exception handlers) are rarely used. Hence the executable size can be reduced if linking is postponed until the procedure is called.
- This is called dynamic linking.
49MULTICS Style Dynamic Linking
- Each object has a linkage segment with the addresses of procedures that may be called.
- Procedure calls are translated to addresses in this block.
- The compiler fills in an invalid address, which causes a trap on the first call.
- In turn the dynamic linker finds the proper address, fills it in, and restarts the instruction.
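The trap-and-fill-in behavior can be imitated in software. In the sketch below a missing slot plays the role of the invalid address, and a dictionary of Python functions stands in for procedures that have not been linked yet. The class `LinkageSegment` and all its names are hypothetical and only model the idea, not MULTICS itself.

```python
class LinkageSegment:
    """Table of procedure addresses that is filled in on first use."""

    def __init__(self, library):
        self.library = library  # name -> procedure, not yet linked
        self.slots = {}         # name -> bound procedure; absent = invalid address
        self.traps = 0          # how many times the dynamic linker ran

    def call(self, name, *args):
        if name not in self.slots:
            # Invalid address: trap to the dynamic linker, which finds the
            # procedure and fills in the slot.
            self.traps += 1
            self.slots[name] = self.library[name]
        # Restart the call, now through a valid slot.
        return self.slots[name](*args)


segment = LinkageSegment({"square": lambda x: x * x})
print(segment.call("square", 3), segment.call("square", 4), segment.traps)
```

The linker runs only on the first call to each procedure; later calls go straight through the filled-in slot, and procedures that are never called are never linked.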
50Before the Call
51After the Call
52Dynamic Linking in Windows
- DLLs: Special file format with procedures and/or data.
- Library sharing is done through DLLs.
- DLLs have no main, hence cannot run by themselves.
- Implicit linking: The user program is statically linked with an import library that serves as glue. The operating system examines the addresses and provides address translation.
- Explicit linking: The user program makes an explicit call at runtime to load the library, then makes additional calls to the OS to get the addresses of the library procedures.
53Using DLLs
54Dynamic Linking in Unix
- Shared library: Supports only implicit linking.
- An archive file containing multiple data segments and procedures.
- Has two parts
- Host library: Statically linked into the executable.
- Target library: Called at runtime.