Title: Introduction to Computer Organization (Pengantar Organisasi Komputer)
IKI10230 Introduction to Computer Organization
Lecture No. 09: Compiling-Assembling-Linking

Sources:
1. Paul Carter, PC Assembly Language
2. Hamacher, Computer Organization, 5th ed.
3. Lecture material from CS61C/2000 and CS152/1997, UCB

21 April 2004
L. Yohanes Stefanus (yohanes_at_cs.ui.ac.id)
Bobby Nazief (nazief_at_cs.ui.ac.id)
Lecture material
Steps to Starting a Program
[Figure: Assembly program (foo.s) → Object (machine language module, foo.o) → Executable (machine language program, foo.exe)]
Example: C → Asm → Obj → Exe → Run
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int i;
        int sum = 0;
        for (i = 0; i <= 100; i = i + 1)
            sum = sum + i * i;
        printf("The sum from 0 .. 100 is %d\n", sum);
    }
Compiler
- Input: High-Level Language Code (e.g., C, Java)
- Output: Assembly Language Code (e.g., Intel x86)
- Note: Output may contain directives
Example: C → Asm → Obj → Exe → Run
    segment .text
    LC0:    db "The sum from 0 .. 100 is %d",0xa,0
    _main:
            push ebp
            mov  ebp,esp
            sub  esp,24
            mov  dword [ebp-8],0
            mov  dword [ebp-4],0
    L3:
            cmp  dword [ebp-4],100
            jle  L6
            jmp  L4
    L6:
            mov  eax,[ebp-4]
            imul eax,[ebp-4]
            add  [ebp-8],eax
    L5:
            inc  dword [ebp-4]
            jmp  L3
    L4:
            add  esp,-8
            mov  eax,[ebp-8]
            push eax
            push dword LC0
            call _printf
            add  esp,16
    L2:
            mov  esp,ebp
            pop  ebp
            ret
Where Are We Now?
[Figure: Assembly program (foo.s) → Object (machine language module, foo.o) → Executable (machine language program, a.out)]
Assembler
- Reads and uses directives
- Replaces pseudoinstructions
- Produces machine language
- Creates object file
Producing Machine Language
- Simple case
  - Arithmetic, logical, shifts, and so on.
  - All necessary info is already within the instruction.
- What about branches?
  - PC-relative
  - So once pseudoinstructions are replaced by real ones, we know by how many bytes to branch.
- What about jumps?
  - Some require an absolute address.
- What about references to data?
  - These will require the full 32-bit address of the data.
- Addresses can't be determined yet, so we create two tables.
Symbol Table
- List of items in this file that may be used by other files.
- What are they?
  - Labels: function calling
  - Data: anything in the .data section; variables which may be accessed across files
- First pass: record label-address pairs
- Second pass: produce machine code
- Result: can jump to a later label without first declaring it
Relocation Table
- List of items for which this file needs the address.
- What are they?
  - Any label jumped to: jmp or call
    - internal
    - external (including lib files)
  - Any piece of data
Object File Format
- object file header: size and position of the other pieces of the object file
- text segment: the machine code
- data segment: binary representation of the data in the source file
- relocation information: identifies lines of code that need to be handled
- symbol table: list of this file's labels and data that can be referenced
- debugging information
Example: C → Asm → Obj → Exe → Run
    segment .text
    0x0:    db "The sum from 0 .. 100 is %d",0xa,0
    0x1d:   push ebp
            mov  ebp,esp
            sub  esp,24
            mov  dword [ebp-8],0
            mov  dword [ebp-4],0
    0x34:   cmp  dword [ebp-4],100
            jle  0x05 (0x42)
            jmp  0x00000012 (0x54)
    0x42:   mov  eax,[ebp-4]
            imul eax,[ebp-4]
            add  [ebp-8],eax
    0x4c:   inc  dword [ebp-4]
            jmp  0xffffffe0 (0x34)
    0x54:   add  esp,-8
            mov  eax,[ebp-8]
            push eax
            push 0x0          ; LC0, filled in by the linker
            call 0x0          ; _printf, filled in by the linker
            add  esp,16
    0x6e:   mov  esp,ebp
            pop  ebp
            ret
Symbol Table Entries
Symbol Table
    Label   Address
    LC0     0x00000000
    main    0x0000001d
    L3      0x00000034
    L6      0x00000042
    L5      0x0000004c
    L4      0x00000054
    L2      0x0000006e
Relocation Information
    Offset      Type    Value
    0x0000005f  dir32   .text     (LC0 is at offset 0 of the .text segment)
    0x00000064  DISP32  _printf
Where Are We Now?
[Figure: Assembly program (foo.s) → Object (machine language module, foo.o) → Executable (machine language program, a.out)]
Link Editor/Linker
- Step 1: Take the text segment from each .o file and put them together.
- Step 2: Take the data segment from each .o file, put them together, and concatenate this onto the end of the text segments.
- Step 3: Resolve references.
  - Go through the relocation table and handle each entry.
  - That is, fill in all absolute addresses.
Four Types of Addresses
- PC-relative addressing (e.g., conditional jumps such as jle): never relocate
- Absolute address (jmp or call to an absolute target): always relocate
- External reference (usually call): always relocate
- Data reference: always relocate
Resolving References
- The linker assumes the first word of the first text segment is at address 0x00000000.
- The linker knows:
  - the length of each text and data segment
  - the ordering of text and data segments
- The linker calculates:
  - the absolute address of each label to be jumped to (internal or external) and of each piece of data being referenced
- To resolve references:
  - search for the reference (data or label) in all symbol tables
  - if not found, search library files (for example, for printf)
  - once the absolute address is determined, fill in the machine code appropriately
- Output of the linker: an executable file containing text and data (plus header)
Example: C → Asm → Obj → Exe → Run
    segment .text
    0x15c0: db "The sum from 0 .. 100 is %d",0xa,0
    0x15dd: push ebp
            mov  ebp,esp
            sub  esp,24
            mov  dword [ebp-8],0
            mov  dword [ebp-4],0
    0x15f4: cmp  dword [ebp-4],100
            jle  0x05 (0x1602)
            jmp  0x12 (0x1614)
    0x1602: mov  eax,[ebp-4]
            imul eax,[ebp-4]
            add  [ebp-8],eax
    0x160c: inc  dword [ebp-4]
            jmp  0xffffffe0 (0x15f4)
    0x1614: add  esp,-8
            mov  eax,[ebp-8]
            push eax
            push 0x000015c0
            call 0x00001778 (0x2da0)
            add  esp,16
    0x162e: mov  esp,ebp
            pop  ebp
            ret

    (call target: 0x1628 + 0x1778 = 0x2da0, where 0x1628 is the address of the instruction after the call)
.EXE Memory Map
[Figure: memory map of the linked executable. Address marks: 00000000, ..., 000015C0, 00001631, ..., 0000B000, ..., 0000BB04. Regions labeled "other objects" and "other objects (..., _printf, ...)".]
Where Are We Now?
[Figure: Assembly program (foo.s) → Object (machine language module, foo.o) → Executable (machine language program, a.out)]
Loader (1/3)
- Executable files are stored on disk.
- When one is run, the loader's job is to load it into memory and start it running.
- In reality, the loader is the operating system (OS)
  - loading is one of the OS tasks
Loader (2/3)
- So what does a loader do?
- Reads the executable file's header to determine the size of the text and data segments
- Creates a new address space for the program, large enough to hold the text and data segments along with a stack segment
- Copies instructions and data from the executable file into the new address space (this may be anywhere in memory)
Loader (3/3)
- Copies arguments passed to the program onto the stack
- Initializes machine registers
  - Most registers are cleared, but the stack pointer is assigned the address of the first free stack location
- Jumps to a start-up routine that copies the program's arguments from the stack to registers and sets the PC
- If the main routine returns, the start-up routine terminates the program with the exit system call
Example: C → Asm → Obj → Exe → Run
    0x000015c0: 0x20656854 0x206d7573 0x6d6f7266 0x2e203020
    0x000015d0: 0x3031202e 0x73692030 0x0a642520 0xe5895500
    0x000015e0: 0x0018ec81 0x45c70000 0x000000f8 0xfc45c700
    0x000015f0: 0x00000000 0x64fc7d81 0x7e000000 0x0012e905
    0x00001600: 0x458b0000 0x45af0ffc 0xf84501fc 0xe9fc45ff
    0x00001610: 0xffffffe0 0xfff8c481 0x458bffff 0xc06850f8
    0x00001620: 0xe8000015 0x00001778 0x0010c481 0xec890000
    0x00001630: 0x0000c35d

    000015c0  54 68 65 20 73 75 6d 20 66 72 6f 6d 20 30 20 2e   The sum from 0 .
    000015dd  55              push ebp
    000015de  89e5            mov  ebp,esp
    000015e0  81ec18000000    sub  esp,0x18
    000015e6  c745f800000000  mov  dword [ebp-8],0
    000015ed  c745fc00000000  mov  dword [ebp-4],0
    000015f4  817dfc64000000  cmp  dword [ebp-4],0x64
    000015fb  7e05            jle  0x1602
.ASM, .O, .EXE
Example: C → Asm → Obj → Exe → Run
            .text
    LC0:
            .ascii "The sum from 0 .. 100 is %d\12\0"
    main:
            pushl %ebp
            movl  %esp,%ebp
            subl  $24,%esp
            movl  $0,-8(%ebp)
            movl  $0,-4(%ebp)
    L3:
            cmpl  $100,-4(%ebp)
            jle   L6
            jmp   L4
    L6:
            movl  -4(%ebp),%eax
            imull -4(%ebp),%eax
            addl  %eax,-8(%ebp)
    L5:
            incl  -4(%ebp)
            jmp   L3
    L4:
            addl  $-8,%esp
            movl  -8(%ebp),%eax
            pushl %eax
            pushl $LC0
            call  _printf
            addl  $16,%esp
    L2:
            movl  %ebp,%esp
            popl  %ebp
            ret
Example: C → Asm → Obj → Exe → Run
            .text
    0x0:    .ascii "The sum from 0 .. 100 is %d\12\0"
    0x20:   pushl %ebp
            movl  %esp,%ebp
            subl  $24,%esp
            movl  $0,-8(%ebp)
            movl  $0,-4(%ebp)
    0x34:   cmpl  $100,-4(%ebp)
            jle   6 (0x40)
            jmp   0x14 (0x50)
    0x40:   movl  -4(%ebp),%eax
            imull -4(%ebp),%eax
            addl  %eax,-8(%ebp)
    0x4a:   incl  -4(%ebp)
            jmp   -0x1b (0x34)
    0x50:   addl  $-8,%esp
            movl  -8(%ebp),%eax
            pushl %eax
            pushl $0x0
            call  0x0 (undefined)
            addl  $16,%esp
    0x64:   movl  %ebp,%esp
            popl  %ebp
            ret
Symbol Table Entries
Symbol Table
    Label   Address
    LC0     0x00000000
    L2      0x00000064
    L3      0x00000034
    L4      0x00000050
    L5      0x0000004a
    L6      0x00000040
    main    0x00000020
Relocation Information
    Address     Instr. Type  Dependency
    0x0000005c  call         printf
Example: C → Asm → Obj → Exe → Run
            .text
    0x15c0: .ascii "The sum from 0 .. 100 is %d\12\0"
    0x15e0: pushl %ebp
            movl  %esp,%ebp
            subl  $24,%esp
            movl  $0,-8(%ebp)
            movl  $0,-4(%ebp)
    0x15f4: cmpl  $100,-4(%ebp)
            jle   6 (0x1600)
            jmp   0x14 (0x1610)
    0x1600: movl  -4(%ebp),%eax
            imull -4(%ebp),%eax
            addl  %eax,-8(%ebp)
    0x160a: incl  -4(%ebp)
            jmp   -0x1b (0x15f4)
    0x1610: addl  $-8,%esp
            movl  -8(%ebp),%eax
            pushl %eax
            pushl $0x15c0
            call  0x2d90
            addl  $16,%esp
    0x1624: movl  %ebp,%esp
            popl  %ebp
            ret