Title: Languages and the Machine
1Languages and the Machine
2Topics
- The Compilation Process
- The Assembly Process
- Linking and Loading
- Macros
- We will skip
  - Case Study: Extensions to the Instruction Set (the Intel MMX and Motorola AltiVec SIMD instructions)
3Compilation Process
- Translating assembly to machine code is fairly straightforward, but compilation is not
- Compilation translates a program written in a high-level language into a functionally equivalent program in assembly language
- Consider a simple high-level language assignment statement
  - Foo = Bar + Zot + 15;
- Steps involved in compiling this statement into assembly code
  - Lexical analysis: separate into tokens (Foo, =, Bar, +, etc.)
  - Syntactic analysis / parsing: determine that we are performing an assignment, VAR = EXPRESSION
  - Semantic analysis: determine that Foo, Bar, Zot are names and 15 is an integer
  - Code generation: determine the proper assembly code to perform the action
    - ld Bar, r0, r1
    - ld Zot, r0, r2
    - addcc r1, r2, r1
    - addcc r1, 15, r2
    - st r2, r0, Foo
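The lexical analysis step can be illustrated with a minimal C sketch. The token kinds, the `struct token` layout and the `tokenize` function are hypothetical names chosen for this illustration, not part of any real compiler, and the sketch only handles names, decimal numbers and single-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_NAME, TOK_NUMBER, TOK_OPERATOR };

struct token {
    enum token_kind kind;
    char text[16];
};

/* Split src into tokens; returns the number of tokens written to out. */
size_t tokenize(const char *src, struct token *out, size_t max_tokens)
{
    size_t count = 0;
    assert(src != NULL && out != NULL);
    while (*src != '\0' && count < max_tokens) {
        if (isspace((unsigned char)*src)) {   /* whitespace separates tokens */
            src++;
            continue;
        }
        struct token *t = &out[count++];
        size_t len = 0;
        if (isalpha((unsigned char)*src)) {
            t->kind = TOK_NAME;               /* e.g. Foo, Bar, Zot */
            while (isalnum((unsigned char)*src) && len < sizeof t->text - 1)
                t->text[len++] = *src++;
        } else if (isdigit((unsigned char)*src)) {
            t->kind = TOK_NUMBER;             /* e.g. 15 */
            while (isdigit((unsigned char)*src) && len < sizeof t->text - 1)
                t->text[len++] = *src++;
        } else {
            t->kind = TOK_OPERATOR;           /* single character such as = or + */
            t->text[len++] = *src++;
        }
        t->text[len] = '\0';
    }
    return count;
}
```

Calling `tokenize` on the string `Foo = Bar + Zot + 15` yields seven tokens: four names or numbers separated by three operators. A real lexer would also handle multi-character operators, comments and error reporting.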
4Compiler Issues
- Each compiler is specific to a particular ISA
  - E.g., an int on one machine may be 32 bits, on another may be 64 bits
    - Cause of error in a networking library ported to Alpha
    - Int issue not a problem in Java: the JVM specifies 32 bits
  - E.g., in the previous example, if the ISA allowed operands of addcc to be memory addresses, we could have done
    - addcc Bar, Zot, r1
    - addcc r1, 15, Foo
- Hopefully the compiler generates efficient code, but optimization is a tough issue!
- Cross compiler: one that generates code for an ISA different from the one it runs on (example: CodeWarrior)
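The int-size issue can be made concrete in C. The sketch below contrasts the platform-dependent width of `int` with the fixed-width `int32_t` from C99's `<stdint.h>`. The function names are hypothetical, and using `int32_t` is one common way to avoid the portability problem, not necessarily how the library mentioned above was fixed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of the platform's int in bits; this depends on the compiler and ISA. */
size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Width of int32_t in bits; exactly 32 wherever the type is provided. */
size_t fixed_width_bits(void)
{
    size_t bits = sizeof(int32_t) * CHAR_BIT;
    assert(bits == 32);
    return bits;
}
```

Code that must agree on data layout across machines (for example, fields in a network packet) can use the fixed-width types instead of relying on `int`.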
5Mapping Variables to Memory
- Global variables
  - Accessible from anywhere in the program, given a fixed address
  - E.g., global variable X at memory address 400
- Local variables
  - Also called automatic variables
  - Defined inside a function or method, e.g.
      void foo()
      {
        int a, b;
      }
  - These variables are created when foo is invoked and destroyed when foo exits
  - They are created by pushing them on the stack when the function is invoked, and are popped off when the function exits
6Local Variables and the Stack
- Recall that the stack typically grows downward in memory
- Here we start with 1234 stored on the top of the stack, then push FFFF

Address   Mem (before)   Mem (after Push FFFF)
0
4                        FFFF
8         1234           1234
SP        8              4
7Local Variables and the Stack
- In our case, local variables are pushed on the stack upon entering the function
  - void foo() { int a; }
- Copy SP into the Frame Pointer FP (also called the Base Pointer, or BP)

Address   Mem before Foo   Mem in Foo
0
4                          Var a
8         1234             1234
SP        8                4
FP                         8
8Accessing Stack Variables
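- These variables are referenced as offsets from the frame pointer, called based addressing
  - To access a: fp - 4
- Why not use sp? Consider pushing lots of stuff, or data structures, on the stack: sp moves with every push, so the offset of a from sp keeps changing, while its offset from fp stays fixed

Address   Mem in Foo
0
4         Var a
8         1234
SP        4
FP        8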
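The push operation and the frame-pointer access described on the last three slides can be simulated in a few lines of C. This is a sketch under stated assumptions: the `struct machine` layout, the 12-byte memory (addresses 0, 4 and 8, as in the figures) and all function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 12u   /* byte addresses 0, 4 and 8, as in the figures */
#define WORD_SIZE 4u

struct machine {
    uint32_t mem[MEM_BYTES / WORD_SIZE];
    uint32_t sp;        /* stack pointer: byte address of the top of the stack */
    uint32_t fp;        /* frame pointer: fixed reference point inside a function */
};

/* Push a word: the stack grows downward, so SP decreases first. */
void push(struct machine *m, uint32_t value)
{
    assert(m->sp >= WORD_SIZE);              /* room must remain above address 0 */
    m->sp -= WORD_SIZE;
    m->mem[m->sp / WORD_SIZE] = value;
}

/* Pop a word: read the top of the stack, then move SP back up. */
uint32_t pop(struct machine *m)
{
    assert(m->sp < MEM_BYTES);
    uint32_t value = m->mem[m->sp / WORD_SIZE];
    m->sp += WORD_SIZE;
    return value;
}

/* Entering foo(): copy SP into FP, then push local variable a. */
void enter_foo(struct machine *m, uint32_t initial_a)
{
    m->fp = m->sp;
    push(m, initial_a);
}

/* Based addressing: read the word at fp + offset (locals use negative offsets). */
uint32_t load_fp(const struct machine *m, int32_t offset)
{
    uint32_t addr = m->fp + (uint32_t)offset;   /* unsigned wraparound handles negative offsets */
    assert(addr < MEM_BYTES);
    return m->mem[addr / WORD_SIZE];
}
```

Starting with SP = 8, `enter_foo` leaves FP = 8 and SP = 4, so `load_fp(&m, -4)` reads a. A further `push` moves SP to 0, yet `load_fp(&m, -4)` still reads a, which is the reason for addressing locals from FP instead of SP.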
9C to ASM Example on x86
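C source:
  #include <stdio.h>
  int c;
  int main()
  {
    int a, b;
    a = 3;
    b = 4;
    c = a * b;
  }

Generated x86 assembly:
  pushl %ebp              # save the caller's frame pointer
  movl %esp, %ebp         # copy SP into FP
  subl $8, %esp           # reserve 8 bytes for a and b
  movl $3, -4(%ebp)       # a = 3
  movl $4, -8(%ebp)       # b = 4
  movl -4(%ebp), %eax     # eax = a
  imull -8(%ebp), %eax    # eax = a * b
  movl %eax, c            # c = eax
  .comm c,4,4             # global c: 4 bytes, 4-byte aligned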
10Arrays in Memory
- Arrays may be allocated on the stack or allocated off the heap, a pool of memory where portions may be dynamically allocated
- Accessing elements of an array is a bit different than accessing regular variables
- int A[10]; (array of 10 integers)

Mem allocated for A, with A (Base) = 4:
Address   4      8      ...   40
Element   A[0]   A[1]   ...   A[9]

- ElementAddr = A + (Index * Size), e.g. A[2] is at 4 + (2 * 4) = 12
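The element-address formula can be written directly as a C function. In this sketch, `element_addr` and `formula_matches_c` are hypothetical helper names; the second function assumes a platform with a flat address space, where converting a pointer to `uintptr_t` gives its byte address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ElementAddr = Base + (Index * Size) */
uintptr_t element_addr(uintptr_t base, size_t index, size_t elem_size)
{
    return base + index * elem_size;
}

/* Check the formula against the address C itself computes for &A[index]. */
int formula_matches_c(size_t index)
{
    int A[10];
    assert(index < 10);
    return element_addr((uintptr_t)A, index, sizeof A[0]) == (uintptr_t)&A[index];
}
```

With the numbers from the slide, `element_addr(4, 2, 4)` evaluates to 12, and `element_addr(4, 9, 4)` evaluates to 40, the address of A[9].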
11If-Statements
- Conditional statements map to a comparison and a branch instruction
- C
  - if (x == y) statement1; else statement2;
- Assembly (assume X in r1, Y in r2)
  - subcc r1, r2 ! Zero flag set if res = 0
  - bne Statement2 ! Branch if zero flag is not set
  - ! Statement1 code
  - ba StatementNext ! Branch always
  - Statement2: ! Statement2 code
  - StatementNext:
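The same compare-and-branch shape can be written in C with `goto`, which makes the correspondence with the assembly easier to follow. This is only an illustrative sketch: the function names are hypothetical, and the return values 1 and 2 stand in for "statement1 ran" and "statement2 ran".

```c
#include <assert.h>

/* Structured form: if (x == y) statement1; else statement2; */
int if_else_structured(int x, int y)
{
    if (x == y)
        return 1;
    else
        return 2;
}

/* Lowered form, following the layout of the assembly version. */
int if_else_lowered(int x, int y)
{
    int which;
    if (x != y)              /* like subcc + bne: branch when the zero flag is not set */
        goto statement2;
    which = 1;               /* Statement1 code */
    goto statement_next;     /* like ba StatementNext: skip over Statement2 */
statement2:
    which = 2;               /* Statement2 code */
statement_next:
    return which;
}

/* Both forms make the same decision for any inputs. */
void check_equivalent(int x, int y)
{
    assert(if_else_lowered(x, y) == if_else_structured(x, y));
}
```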
12Loops
- While, Do-While, and For loops are implemented using the same conditional check and branch as the if-then statement
- The branch returns back to previous code instead of jumping forward over code
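A while loop can be lowered by hand in C to show the backward branch. In this sketch, the summing loop and the names `sum_while`, `sum_lowered` and `check_loops_agree` are hypothetical examples chosen for illustration.

```c
#include <assert.h>

/* Sum 1..n written as an ordinary while loop. */
int sum_while(int n)
{
    int i = 1, total = 0;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop with the conditional check and branches made explicit. */
int sum_lowered(int n)
{
    int i = 1, total = 0;
loop_top:
    if (!(i <= n))           /* conditional check: leave the loop when the test fails */
        goto loop_done;
    total += i;
    i++;
    goto loop_top;           /* backward branch to previously executed code */
loop_done:
    return total;
}

/* Both versions compute the same result. */
void check_loops_agree(int n)
{
    assert(sum_while(n) == sum_lowered(n));
}
```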
13Production Level Assemblers
- Allow the programmer to specify the location of data and code
- Provide mnemonics for all instructions and addressing modes
- Permit the use of symbolic labels to represent addresses and constants
- Provide a means to specify the starting address of the program
- Include a way to share variables between different assembled programs
- Support macros
14Assembly Example
15Assembled Code
16Two Pass Assemblers
- Most assemblers are two-pass
- First pass
  - Determine addresses of all data and instructions
  - Perform any assembly-time arithmetic
  - Put definitions and constants into the symbol table
- Second pass
  - Generate machine code
  - Insert actual addresses and values of symbols, which are known from the symbol table
- Two passes are useful for forward references, i.e. references to symbols defined later in the program
17Forward Reference
18Symbol Table
- Generated during the first pass
- Maps identifiers to values; the table is filled in as values are encountered while the program is parsed from top to bottom
- .org 2048: says to assemble code starting at address 2048
- const .equ value: defines const equal to value
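The two passes and the symbol table can be sketched in C on a toy program representation. Everything here is an assumption made for illustration: the `struct line` format, the fixed 4-byte size of every line, and the function names. A real assembler also parses text, handles directives such as .equ, and emits machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16
#define WORD_SIZE   4

/* One line of a toy assembly program (hypothetical format for this sketch). */
struct line {
    const char *label;     /* label defined on this line, or NULL */
    const char *mnemonic;  /* not interpreted here; every line takes one word */
    const char *operand;   /* symbol referenced by this line, or NULL */
};

struct symbol { const char *name; int address; };
struct symtab { struct symbol entries[MAX_SYMBOLS]; size_t count; };

/* Look a name up in the symbol table; returns -1 if it is not defined. */
int lookup(const struct symtab *tab, const char *name)
{
    for (size_t i = 0; i < tab->count; i++)
        if (strcmp(tab->entries[i].name, name) == 0)
            return tab->entries[i].address;
    return -1;
}

/* Pass 1: walk the program top to bottom, assigning an address to each
   line and recording every label in the symbol table. */
void first_pass(const struct line *prog, size_t n, int origin, struct symtab *tab)
{
    int location = origin;             /* plays the role of .org */
    tab->count = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            assert(tab->count < MAX_SYMBOLS);
            tab->entries[tab->count].name = prog[i].label;
            tab->entries[tab->count].address = location;
            tab->count++;
        }
        location += WORD_SIZE;
    }
}

/* Pass 2: every label is now known, so operands resolve even when they
   name a label defined later in the program (a forward reference).
   resolved[i] receives the operand's address, or -1 if the line has no
   operand or the symbol is undefined. */
void second_pass(const struct line *prog, size_t n,
                 const struct symtab *tab, int *resolved)
{
    for (size_t i = 0; i < n; i++)
        resolved[i] = prog[i].operand ? lookup(tab, prog[i].operand) : -1;
}
```

For a three-line program whose first line branches to a label on the third line, with origin 2048, the first pass records that label at 2056, and the second pass can then fill in 2056 for the branch even though the label appears after its use.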
20Assembled Program
21Final Tasks of the Assembler
- Linking and Loading
- We need the following additional info
  - Module name and size
  - Address of the start symbol
  - Information about global and external symbols
  - Information about any library routines
  - Values of constants
  - Relocation information
22Location of Programs in Memory
- We have been using .org to specify a fixed start location
- Typically we will want programs capable of running in arbitrary locations
- If we are concatenating together different modules, the addresses for identifiers in the different modules must be relocated
- Linker: software that combines separately assembled modules
- Loader: software that loads another program into memory and may modify addresses if the program is loaded in a location different from the origin
  - Must also set appropriate registers, e.g. SP
23Linking .global and .extern
- A .global is used in the module where a symbol is defined, and .extern is used in every other module that refers to it
24Linking and Loading
- Symbol tables for the previous example
- Symbols whose address might change are marked relocatable (not all addresses! Some may be fixed)
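Relocation itself amounts to adding a displacement to every value marked relocatable. The sketch below assumes a hypothetical module representation (`struct word` with a per-word flag) and a hypothetical `relocate` function; real object formats store relocation information in separate tables.

```c
#include <assert.h>
#include <stddef.h>

/* One assembled word, plus a flag saying whether it holds a relocatable address. */
struct word {
    int value;
    int relocatable;   /* nonzero if value is an address that moves with the module */
};

/* Adjust a module assembled at old_origin so that it can run at load_address. */
void relocate(struct word *module, size_t n, int old_origin, int load_address)
{
    int delta = load_address - old_origin;
    assert(module != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (module[i].relocatable)
            module[i].value += delta;   /* fixed values such as constants are left alone */
}
```

A module assembled at 2048 and loaded at 4096 has every relocatable address increased by 2048, so an address of 2056 becomes 4104, while a constant such as 15 is unchanged.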
25DLLs
- Windows uses Dynamic Link Libraries, or DLLs
  - Linking a common routine into many programs results in duplicate code from that common routine in each program
  - In a DLL, commonly used routines (e.g. memory management, graphics) are present in only one place, the DLL
    - Smaller program sizes: each program does not need to have its own copy
    - All programs share the exact same code while executing
    - Programs don't need recompiling or relinking
- Disadvantages
  - Deletion of a shared DLL by mistake can cause problems
  - Versions must be the same
  - A DLL code file can live in many places in Windows
  - "DLL Hell"
26Macros
- An assembly macro looks kind of like defining a subroutine
- For example, say that there is no PUSH instruction to push data on the stack. We can make a macro for push
27Macro Expansion
- Given the previous macro, we could now write the following code
  - push r15 ! Push r15 on the stack
  - push r20 ! Push r20 on the stack
- Upon assembly, these macros are expanded to generate the following actual code
  - addcc r14, -4, r14
  - st r15, r14
  - addcc r14, -4, r14
  - st r20, r14
28Macros vs. Subroutines
- Later we will see how to write actual subroutines we can call
  - Only one copy of the shared code in a subroutine
- Tradeoffs
  - Subroutines
    - Take up less memory since there is only one copy of the code
    - But slower than macros: subroutines have the overhead of invoking and returning
  - Macros
    - Take up more space than a subroutine call due to macro expansion for each occurrence of the macro
    - Faster than subroutines: no overhead to invoke/return
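The C preprocessor offers a rough analogy to the macro versus subroutine tradeoff. In this sketch, the toy memory `mem`, the variable `r14` standing in for the stack pointer register, and the names `PUSH`, `push_sub` and `demo` are all hypothetical. The macro body is copied into the code at every use, while the function exists once and is reached by a call.

```c
#include <assert.h>
#include <stdint.h>

uint32_t mem[16];      /* toy memory, one word per 4-byte address */
uint32_t r14 = 64;     /* stack pointer; starts just past the end of mem */

/* Macro: expanded in place at every use, mirroring
   addcc r14, -4, r14 followed by a store through r14. */
#define PUSH(value) do { r14 -= 4; mem[r14 / 4] = (value); } while (0)

/* Subroutine: a single copy of the code, with call and return overhead. */
void push_sub(uint32_t value)
{
    assert(r14 >= 4);              /* room must remain in the toy memory */
    r14 -= 4;
    mem[r14 / 4] = value;
}

/* Two macro uses produce two copies of the push code inside demo. */
void demo(uint32_t r15, uint32_t r20)
{
    PUSH(r15);
    PUSH(r20);
}
```

An optimizing C compiler may inline `push_sub` as well, so the analogy holds for the source-level mechanism and not necessarily for the generated code.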
29Skipping for now
- Discussion on Pentium MMX
- We may return to this later if time permits