Title: Generating Programs and Linking


1
Generating Programs and Linking
  • Professor Rick Han
  • Department of Computer Science
  • University of Colorado at Boulder

2
CSCI 3753 Announcements
  • Moodle - posted last Thursday's lecture
  • Programming shell assignment 0 due Thursday at
    11:55 pm, not 11 am
  • Introduction to Operating Systems
  • Chapters 3 and 4 in the textbook

3
Operating System Architecture
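[Figure: layered architecture diagram. Applications
(App1, App2, App3) run on top of System Libraries and
Tools (Compilers, Shells, GUIs), which sit above the
OS Kernel (Scheduler, VM, File System), which manages
the hardware (CPU, Memory, Disk, Display, Mouse, I/O).]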
4
What is an Application?
  • A software program consists of a sequence of code
    instructions and data
  • for now, let a simple app be a program
  • The computer executes the instructions one after
    another
  • code instructions operate on data

[Figure: Program P1, made up of Code and Data]
5
Loading and Executing a Program
6
Loading and Executing a Program
[Figure: the OS Loader places the Program P1 binary
in Main Memory]
7
Generating a Program's Binary Executable
  • We write source code in a high-level language
    like C or Java, and use tools like compilers to
    create a program's binary executable

[Figure: P1.c (source code) -> Compiler -> P1.s ->
Assembler -> P1.o -> Linker -> program P1's binary
executable (code and data)]

Technically, there is a preprocessing step before
the compiler. gcc -c generates relocatable object
files and does not run the linker.
8
Linking Multiple Object Files Into an Executable
[Figure: P1.c (source code) -> Compiler (cc1) -> P1.s
-> Assembler (as) -> P1.o; the Linker (ld) combines
P1.o, foo2.o, and foo3.o into the executable P1 or
P1.exe (code and data)]
  • linker combines multiple .o object files into one
    binary executable file
  • why split a program into multiple objects and
    then relink them?
  • breaking up a program into multiple files, and
    compiling them separately, reduces amount of
    recompilation if a single file is edited
  • you don't have to recompile the entire program,
    just the object file of the changed source file,
    then relink the object files
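As a rough illustration of the point above, the sketch below decides which source files need recompiling by comparing timestamps, in the spirit of what a build tool does. The function name objects_to_rebuild, the foo2.c and foo3.c source names, and the timestamps are all made up for this example.

```python
def objects_to_rebuild(source_mtime, object_mtime):
    """Return the sources whose object file is missing or older than the source."""
    stale = []
    for src, src_time in sorted(source_mtime.items()):
        obj = src.rsplit(".", 1)[0] + ".o"      # P1.c -> P1.o
        obj_time = object_mtime.get(obj)
        if obj_time is None or obj_time < src_time:
            stale.append(src)
    return stale


# Hypothetical timestamps: only foo2.c was edited after its object was built.
sources = {"P1.c": 100, "foo2.c": 250, "foo3.c": 120}
objects = {"P1.o": 200, "foo2.o": 200, "foo3.o": 200}
print(objects_to_rebuild(sources, objects))  # ['foo2.c']
```

Only foo2.c would be recompiled here; P1.o and foo3.o are reused as they are and everything is then relinked.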

9
Linking Multiple Object Files Into an Executable
[Figure: P1.c (source code) -> Compiler (cc1) -> P1.s
-> Assembler (as) -> P1.o; the Linker (ld) combines
P1.o, foo2.o, and foo3.o into the executable P1 or
P1.exe (code and data)]
  • in combining multiple object files, the linker
    must:
  • resolve references to variables and functions
    defined in other object files - this is called
    symbol resolution
  • relocate each object's internal addresses so that
    the executable's combination of objects is
    consistent in its memory references
  • an object's code and data are compiled in their
    own private world, starting at address zero

10
Linker Resolves Unknown Symbols
P1.c:
    int globalvar1 = 0;
    main(...) {
        -----
        f1(...);
        -----
    }

foo2.c:
    void f1(...) {
        ----
    }
    void f2(...) {
        ----
        globalvar1 = 4;
        ----
    }
11
Linker Resolves Unknown Symbols
  • An ELF relocatable object file contains the
    following sections:
  • ELF header (type, size, size/number of sections)
  • code (.text)
  • data (.data, .bss, .rodata)
  • .data: initialized global variables
  • .bss: uninitialized global variables (does not
    actually occupy space on disk, just a
    placeholder)
  • symbol table (.symtab)
  • relocation info (.rel.text, .rel.data)
  • debug symbol table (.debug, only if the -g
    compile flag is used)
  • line info (.line, maps C source line numbers to
    .text instructions, only if -g)
  • string table (.strtab, for the symbol tables)

[Figure: file layout, top to bottom: ELF header,
.text, .rodata, .data, .bss, .symtab, .rel.text,
.rel.data, .debug, .line, .strtab, section header
table]
12
Linker Resolves Unknown Symbols
  • The symbol table contains 3 types of symbols:
  • global symbols defined in this object
  • global symbols referenced but not defined here
  • local symbols, defined and referenced exclusively
    by this object, e.g. static global variables and
    functions
  • local symbols are not the same as local
    variables, which get allocated on the stack at
    run time

13
Linker Resolves Unknown Symbols
    extern float f1();     <- global symbol referenced here but defined elsewhere
    int globalvar1 = 0;    <- global symbol defined here
    void f2(...)           <- global symbol defined here
    static int x = -1;     <- local symbol
    -----

  • The symbol table informs the linker where symbols
    referenced or referenceable by each object file
    can be found
  • if another file references globalvar1, then look
    here for info
  • if this file references f1, then another object
    file's symbol table will list f1 as defined

14
Linker Resolves Unknown Symbols
  • Each entry in the ELF symbol table looks like:

    typedef struct {
        int  name;        /* string table offset */
        int  value;       /* section offset or VM address */
        int  size;        /* object size in bytes */
        char type:4,      /* data, func, section or src file name (4 bits) */
             binding:4;   /* local or global (4 bits) */
        char reserved;    /* unused */
        char section;     /* section header index, ABS, UNDEF, */
    } ELF_Symbol;

  • the section field is where we flag the undefined
    status
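The struct on the slide is a simplified teaching view. As a rough sketch, the snippet below decodes one 16-byte symbol entry using the 32-bit layout from the ELF specification (Elf32_Sym), where the section index is 16 bits wide rather than a char. The sample entry, the function name decode_symbol, and the dictionary keys are made up for illustration.

```python
import struct

# Elf32_Sym layout (little-endian assumed here): name, value and size are
# 32-bit words, info and other are single bytes, and shndx is 16 bits.
ELF32_SYM = struct.Struct("<IIIBBH")
SHN_UNDEF = 0  # section index that marks an undefined symbol


def decode_symbol(raw):
    """Decode one 16-byte symbol table entry into a dictionary."""
    name, value, size, info, other, shndx = ELF32_SYM.unpack(raw)
    return {
        "name_offset": name,            # offset into the string table
        "value": value,                 # section offset or VM address
        "size": size,                   # object size in bytes
        "binding": info >> 4,           # upper 4 bits: 0 local, 1 global, 2 weak
        "type": info & 0xF,             # lower 4 bits: 1 data object, 2 function
        "undefined": shndx == SHN_UNDEF,
    }


# Hypothetical entry: a global function referenced here but defined elsewhere.
raw = ELF32_SYM.pack(17, 0, 0, (1 << 4) | 2, 0, SHN_UNDEF)
print(decode_symbol(raw))
```

The undefined flag is simply a reserved value of the section index, which is why the linker can spot unresolved symbols by reading the symbol table alone.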
15
Linker Resolves Unknown Symbols
  • During linking, the linker goes through each
    input object file and determines if unknown
    symbols are defined in other object files

Linker
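The idea can be sketched with toy symbol tables. The snippet below is only an illustration, not how ld is implemented: each object is reduced to a set of defined symbols and a set of undefined ones, and the dictionary shape and the function name resolve are assumptions made for this example. The sample data follows the P1.c and foo2.c example from the earlier slide.

```python
def resolve(objects):
    """objects maps file name -> {"defined": set, "undefined": set}.

    Returns (where, missing): where maps (referencing file, symbol) to the
    file that defines the symbol; missing holds symbols nobody defines.
    """
    definitions = {}
    for name, obj in objects.items():
        for sym in obj["defined"]:
            definitions[sym] = name
    where, missing = {}, set()
    for name, obj in objects.items():
        for sym in obj["undefined"]:
            if sym in definitions:
                where[(name, sym)] = definitions[sym]
            else:
                missing.add(sym)
    return where, missing


objects = {
    "P1.o": {"defined": {"main", "globalvar1"}, "undefined": {"f1"}},
    "foo2.o": {"defined": {"f1", "f2"}, "undefined": {"globalvar1"}},
}
where, missing = resolve(objects)
print(where)
print(missing)
```

This version looks at all inputs at once; it deliberately ignores command-line order and duplicate definitions.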
16
Linker Resolves Unknown Symbols
  • What if two object files use the same name for a
    global variable?
  • The linker resolves multiply defined global
    symbols
  • functions and initialized global variables are
    defined as strong symbols, while uninitialized
    global variables are weak symbols
  • Rule 1: multiple strong symbols with the same
    name are not allowed
  • Rule 2: given a strong symbol and weak symbols,
    choose the strong symbol
  • Rule 3: given multiple weak symbols, choose any
    one
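The three rules can be written down as a small function. This is a minimal sketch under the assumption that every candidate defines the same global name and is tagged "strong" or "weak"; the name pick_definition and the tuple format are invented for the example.

```python
def pick_definition(candidates):
    """candidates: non-empty list of (object_file, strength) pairs for one
    global name, where strength is "strong" or "weak".
    Returns the object file whose definition is used."""
    strong = [obj for obj, strength in candidates if strength == "strong"]
    if len(strong) > 1:
        # Rule 1: multiple strong symbols are not allowed.
        raise ValueError("multiple strong definitions: " + ", ".join(strong))
    if strong:
        # Rule 2: a strong symbol wins over any weak ones.
        return strong[0]
    # Rule 3: only weak symbols, so any one will do; take the first.
    return candidates[0][0]


print(pick_definition([("a.o", "weak"), ("b.o", "strong")]))  # b.o
print(pick_definition([("a.o", "weak"), ("b.o", "weak")]))    # a.o
```

Rule 3 is the risky one in practice: which weak definition wins is arbitrary, so two files that silently share an uninitialized global can overwrite each other's data.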

17
Linker Resolves Unknown Symbols
  • Linking with static libraries
  • Bundle many related .o files together into a
    single file called a library or .a file
  • e.g. the C library libc.a contains printf(),
    strcpy(), random(), atoi(), etc.
  • the library is created using the archive tool, ar
  • the library is input to the linker as one file
  • the linker can accept multiple libraries
  • the linker copies only those object modules in
    the library that are referenced by the
    application program
  • Example: gcc main.c /usr/lib/libm.a
    /usr/lib/libc.a

18
Linker Resolves Unknown Symbols
  • a static library is a collection of relocatable
    object modules
  • group together related object modules
  • within each object, can further group related
    functions
  • if an application links to libfoo.a, and only
    calls a function in foo3.o, then only foo3.o will
    be linked into the program

[Figure: libfoo.a contains the object modules foo1.o,
foo2.o, foo3.o, and foo4.o]
19
Linker Resolves Unknown Symbols
  • The linker scans object files and libraries
    sequentially, left to right on the command line,
    to resolve unknown symbols
  • for each input file on the command line, the
    linker:
  • updates a list of defined symbols with the
    object's defined symbols
  • tries to resolve the undefined symbols (from the
    object and from the list of previously undefined
    symbols) with the list of previously defined
    symbols
  • carries over the lists of defined and undefined
    symbols to the next input object file
  • so the linker looks for a symbol's definition
    only after the symbol has been seen as undefined!
  • it doesn't go back over the entire set of input
    files to resolve the unknown symbol
  • if a symbol is first referenced after the library
    that defines it has been scanned, then the linker
    won't be able to resolve the symbol!
  • Thus, order on the command line is important -
    put libraries last!
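The left-to-right scan can be simulated in a few lines. This is a toy model under stated assumptions, not a description of any real linker's internals: inputs are tuples invented for this example, ("obj", name, defined, undefined) for an object file and ("lib", name, members) for a library, and the function name link is hypothetical.

```python
def link(inputs):
    """Scan inputs in command-line order. Returns (linked, unresolved)."""
    defined, undefined, linked = set(), set(), []

    def add_object(name, defs, refs):
        linked.append(name)
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)  # drop whatever is now defined

    for item in inputs:
        if item[0] == "obj":
            add_object(item[1], item[2], item[3])
            continue
        # Library: copy only members that define a currently undefined
        # symbol, repeating because a copied member may need another one.
        remaining = list(item[2])
        pulled = True
        while pulled:
            pulled = False
            for member in list(remaining):
                member_name, defs, refs = member
                if undefined & set(defs):
                    add_object(item[1] + "(" + member_name + ")", defs, refs)
                    remaining.remove(member)
                    pulled = True
    return linked, undefined


main_o = ("obj", "main.o", {"main"}, {"f1"})
libfoo = ("lib", "libfoo.a", [("foo1.o", {"f0"}, set()),
                              ("foo3.o", {"f1"}, set())])
print(link([libfoo, main_o]))  # library first: f1 is left unresolved
print(link([main_o, libfoo]))  # library last: only foo3.o is copied in
```

With the library first, nothing is undefined yet when it is scanned, so no member is copied and f1 stays unresolved; with the library last, the scan finds f1 in foo3.o and links just that member.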

20
Linker Resolves Unknown Symbols
  • Example: gcc libfoo.a main.c
  • main.c calls a function f1 defined in libfoo.a
  • scanning left to right, when the linker hits
    libfoo.a, there are no unresolved symbols, so no
    object modules are copied
  • when the linker hits main.c, f1 is unresolved and
    gets added to the unresolved list
  • Since there are no more input files, the linker
    stops and generates a linking error:
  • /tmp/something.o: In function 'main':
  • /tmp/something.o: undefined reference to 'f1'

21
Linker Resolves Unknown Symbols
  • Example: gcc main.c libfoo.a
  • main.c calls a function f1 defined in libfoo.a
  • scanning left to right, when the linker hits
    main.c, it will add f1 to the list of unresolved
    references
  • when the linker next hits libfoo.a, it will look
    for f1 in the library's object modules, see that
    it is found, and add the object module to the
    linked program
  • No errors are generated. A binary executable is
    generated.
  • Lesson 1: the order of linking can be important,
    so put libraries at the end of command lines
  • Lesson 2: an undefined symbol error can also
    mean that you
  • didn't link in the right libraries, or didn't add
    the right library path
  • forgot to define the symbol somewhere in your code

22
Linker Relocates Addresses
  • After resolving symbols, the linker relocates
    addresses when combining the different object
    modules
  • merges separate code .text sections into a single
    .text section
  • merges separate .data sections into a single
    .data section
  • each section is assigned a memory address
  • then each symbol reference in the code and data
    sections is reassigned to the correct memory
    address
  • these are virtual memory addresses, which are
    mapped to physical memory addresses when the
    program is loaded and run
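A minimal sketch of the relocation step, assuming only .text sections and ignoring alignment and .data: each object's symbols start as offsets from zero, the sections are laid end to end from a base address, and every reference is then patched with the final address. The base address 0x1000, the section sizes, and the function names relocate and patch are hypothetical.

```python
def relocate(objects, text_base):
    """objects: list of dicts with "name", "text_size" and "symbols"
    (symbol -> offset inside that object's own .text, which starts at 0).
    Returns symbol -> address in the merged .text section."""
    addresses = {}
    base = text_base
    for obj in objects:
        for sym, offset in obj["symbols"].items():
            addresses[sym] = base + offset   # relocated address
        base += obj["text_size"]             # next object's .text follows
    return addresses


def patch(references, addresses):
    """references: list of (site, symbol). Returns site -> final address."""
    return {site: addresses[sym] for site, sym in references}


objs = [
    {"name": "P1.o", "text_size": 0x40, "symbols": {"main": 0x0}},
    {"name": "foo2.o", "text_size": 0x30, "symbols": {"f1": 0x0, "f2": 0x10}},
]
addrs = relocate(objs, 0x1000)
print({sym: hex(addr) for sym, addr in addrs.items()})
print(patch([("call in main", "f1")], addrs))
```

Both main and f1 sit at offset zero in their own object files, yet end up at different addresses once the sections are merged.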

23
Linked ELF Executable Object File
  • An ELF executable object file contains the
    following sections:
  • ELF header (type, size, size/number of sections)
  • segment header table
  • .init (program's entry point, i.e. address of
    first instruction)
  • other sections similar
  • Note the absence of .rel.text and .rel.data -
    they've been relocated!
  • Ready to be loaded into memory and run
  • only sections through .bss are loaded into memory
  • .symtab and below are not loaded into memory
  • code section is read-only
  • .data and .bss are read/write

[Figure: file layout, top to bottom: ELF header,
segment header table, .init, .text, .rodata, .data,
.bss, .symtab, .debug, .line, .strtab, section header
table]
24
Loading Executable Object Files
  • Run-time memory image
  • Essentially code, data, stack, and heap
  • Code and data loaded from executable file
  • Stack grows downward, heap grows upward

[Figure: run-time memory, top to bottom: user stack,
unallocated, heap, read/write segment (.data, .bss),
read-only segment (.init, .text, .rodata)]
25
Object Files are Relocatable
[Figure: P1.c (source code) -> Compiler (cc1) -> P1.s
-> Assembler (as) -> P1.o -> Linker (ld) -> P1.exe
(code and data)]
  • the assembler generates relocatable object code
    (.o) in ELF format (UNIX) or PE/COFF format
    (Windows)
  • the assembler doesn't generate absolute addresses,
    because
  • it doesn't know which other object files this
    object will be linked with
  • the binary executable could be loaded anywhere in
    RAM
  • so relocate or translate the addresses to their
    proper memory locations later

26
What do Relocatable Object Files Contain?
  • In order to be relocatable, any line of code
    referencing a memory address must be flagged for
    relocation

27
Linking Multiple Object Files Into an Executable
P1.c:
    main(int argc, char **argv) {
        -----
        f1(parameters);
        -----
    }
    int function1(parameters) {
        -----
    }
28
Generating a Program's Binary Executable
  • We write source code in a high-level language
    like C or Java, and use tools like compilers to
    create a program's binary executable

[Figure: P1.c (source code) -> Compiler -> P1.s ->
Assembler -> P1.o -> Linker -> program P1's binary
executable (code and data)]

Technically, there is a preprocessing step before
the compiler.
29
A Relocatable Object File
[Figure: P1.c (source code) -> Compiler -> P1.s ->
Assembler (as) -> P1.o -> Linker (ld) -> P1.exe (code
and data)]
  • the linker combines multiple .o relocatable
    object files into one binary executable file

31
Executing a Program
Main Memory
Program P1 binary
32
CPU Execution of a Program
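  • Program Counter (PC) points to the address of the
    next instruction to fetch
  • CPU fetches the next instruction indicated by the
    PC
  • Fetch any data needed
  • ALU: Arithmetic Logic Unit
  • Write any output data

[Figure: the CPU, with its PC and ALU, fetches
instructions and data from the Program P1 binary in
Main Memory]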
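The cycle on this slide can be sketched as a toy interpreter. The four-instruction machine below (LOAD, ADD, STORE, HALT with a single accumulator) is invented for illustration and does not correspond to any real instruction set.

```python
def run(program, memory):
    """program: list of (opcode, operand) pairs; memory: dict of data cells."""
    pc = 0                      # program counter: index of next instruction
    acc = 0                     # single accumulator register
    while pc < len(program):
        op, arg = program[pc]   # fetch the instruction the PC points to
        pc += 1                 # PC now points to the following instruction
        if op == "LOAD":        # fetch any data needed
            acc = memory[arg]
        elif op == "ADD":       # the ALU does the arithmetic
            acc += memory[arg]
        elif op == "STORE":     # write any output data
            memory[arg] = acc
        elif op == "HALT":
            break
        else:
            raise ValueError("unknown opcode: " + op)
    return memory


mem = {"x": 2, "y": 3, "z": 0}
run([("LOAD", "x"), ("ADD", "y"), ("STORE", "z"), ("HALT", None)], mem)
print(mem["z"])  # 5
```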
33
Multiprogramming: Batch Processing
  • Load program P1 into memory, called a job, and
    execute on CPU, running to completion
  • Then load program P2 into memory, and run to
    completion
  • or you could have multiple programs in memory,
    arranged in a queue, lined up waiting for the CPU
  • You would submit a batch job to the computer, and
    while the batch job was running, you could go
    play tennis, and then come back for the results
  • very non-interactive

34
Multiprogramming
[Figure: timeline of programs executing on the CPU
while P1, P2, and P3 sit in Main Memory. P1 runs,
then blocks on I/O; the CPU is idle (=> poor
utilization, billions of wasted cycles) until P1
resumes; P1 completes, then P2 starts.]
35
Multiprogramming
  • What if Program P1 blocks waiting for something
    to complete?
  • waiting on I/O, e.g. waiting for a disk write to
    complete, or waiting for a packet to arrive over
    the radio
  • I/O can be very slow compared to CPU speed
  • then CPU is idle for potentially billions of
    cycles!
  • Better if CPU switches to another program P2 and
    begins executing P2
  • better utilization of the CPU
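A back-of-the-envelope sketch of the utilization argument, under a deliberately simple model: each job is one CPU burst followed by one I/O wait, I/O waits of different jobs can overlap, and switching costs nothing. The job times and function names are made up for the example.

```python
def utilization_sequential(jobs):
    """jobs: list of (cpu_time, io_time). One job at a time, so the CPU
    sits idle during every I/O wait."""
    busy = sum(cpu for cpu, _ in jobs)
    total = sum(cpu + io for cpu, io in jobs)
    return busy / total


def utilization_multiprogrammed(jobs):
    """When a job blocks on I/O, the CPU switches to the next job at once."""
    clock = 0    # time at which the CPU becomes free
    finish = 0   # time at which the last I/O wait completes
    for cpu, io in jobs:
        clock += cpu
        finish = max(finish, clock + io)
    return sum(cpu for cpu, _ in jobs) / finish


jobs = [(2, 8), (2, 8), (2, 8)]  # hypothetical (CPU, I/O) times per job
print(round(utilization_sequential(jobs), 2))       # 0.2
print(round(utilization_multiprogrammed(jobs), 2))  # 0.43
```

In this model the CPU does the same 6 units of work either way, but the elapsed time drops from 30 units to 14 once another program runs during each I/O wait.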

36
Multiprogramming
[Figure: timeline of programs executing on the CPU
while P1, P2, and P3 sit in Main Memory. The OS
scheduler switches the CPU between multiple executing
programs: P1 runs, then blocks on I/O; P2 runs; P2
blocks, P3 starts; P3 completes, P1 resumes.]
37
Multiprogramming
  • The CPU is time-multiplexed between executing
    programs
  • programs share CPU
  • Memory is space-multiplexed between multiple
    programs
  • programs share RAM
  • Each program sees an abstract machine (provided
    by OS)
  • it has its own private (slower) CPU
  • it has its own private (smaller) memory

[Figure: Main Memory divided among P1, P2, and P3]
38
Multitasking
  • Early computers were big mainframes
  • We'd like to share the memory and CPU of a
    mainframe not just between different programs or
    batch jobs, but also between different human
    users
  • Time sharing systems were developed
  • Give each user a very small slice of the CPU pie
    frequently
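One common way to hand out small, frequent slices is round-robin scheduling. The slides do not name a specific policy, so the sketch below is just one plausible illustration; the function name, burst times, and quantum are made up.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """burst_times: dict program -> CPU time still needed.
    Returns the list of (program, time_run) slices in the order given out."""
    queue = deque(burst_times.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        schedule.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # back of the line
    return schedule


print(round_robin({"P1": 3, "P2": 5, "P3": 2}, quantum=2))
```

Each program gets at most one quantum before the next one runs, so every user sees some progress even while longer jobs are still executing.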

39
Multitasking
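[Figure: timeline of programs executing on the CPU
while the programs sit in Main Memory. The OS
scheduler switches the CPU rapidly between multiple
executing programs: P1, P2, and P3 run in short
interleaved slices, and the others keep alternating
after P3 finishes.]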
40
Multitasking
  • Enables interactivity
  • In the small time slice a program is given, it
    can draw a character on the screen that you've
    just typed - appearance of interactivity
  • In old time-sharing systems, depending on the
    load, it may take 15 seconds for the character to
    appear on screen! (learned to type ahead)
  • In time, this was applied to multiple programs on
    a PC's CPU
  • listen to MP3s while editing your documents -
    interactive multitasking

41
Operating Systems Course Overview
  • Chapter 3: OS Organization
  • Chapters 4-5: Hardware/Device Management
  • Single-application view: OS provides hardware
    abstraction
  • Process Management
  • Multiple-application view: OS provides hardware
    abstraction, resource sharing and isolation
  • Memory Management
  • File Management
  • Security
  • Distributed OS

42
Operating System Abstraction Model
  • Multiprogramming, virtual memory, and other
    OS-related concepts seek to give each process an
    abstract representation of the machine
  • each Process has its own private memory or
    address space within which it executes and
    manipulates data
  • each Process has its own private CPU (slower than
    real CPU)
  • Well-defined interfaces to other resources
    (devices, shared memory, etc.)

43
Operating Systems Process Management
  • For example, in Process Management we will cover:
  • Process definition, Address Spaces
  • Multithreading
  • Is a program the same as an application? Ex. a
    threaded Web server as a multithreaded app versus
    a multi-process app
  • a process defines an address space
  • multiple threads in a process can share an
    address space
  • A single application may spawn multiple processes
    and/or threads
  • Cuts down on context switch overhead, and allows
    rapid sharing of memory
  • Scheduling
  • Synchronization
  • Deadlock

44
Operating System Trends
  • Hardware support for operating systems has
    evolved too
  • Mode bit support in CPU
  • user mode vs. kernel/supervisor mode
  • early PCs did not have this support
  • Many of today's embedded microcontrollers also
    lack this support
  • Page faulting hardware and MMU
  • Lack of such HW support can allow user programs
    to accidentally or maliciously overwrite OS
    kernel code!

45
Done
46
Timeline
  • Single program view
  • OS only provides hardware abstraction
  • Not resource sharing and isolation
  • Multiprogramming view
  • OS provides hardware abstraction and resource
    sharing and isolation
  • Programs have to share
  • memory, CPU, hardware access, files, etc.

47
Timeline
  • Drill down abstraction of each component
  • each application as a program, a sequence of
    code/instructions
  • each program is stored on disk - permanent or
    non-volatile storage
  • as needed, programs are loaded into memory, need
    a way to share memory
  • programs in memory take turns executing on and
    sharing the CPU
  • In multitasking systems, take turns quickly, in a
    finely interleaved manner
  • Stay with the big picture - that's what's missing
    from these OS textbooks - component view

48
Timeline
  • Hardware and devices after another big picture
    intro
  • Bryant and O'Hallaron
  • interrupts
  • Traps
  • Signals
  • CPU mode bit - user mode vs kernel/supervisor
    mode