1
Where's My Compiler?
  • Developer tools: past, present, and future
  • Jim Miller
  • Software Architect, Developer Frameworks
  • Microsoft Corporation
  • (with help from Carol Eidt, Phoenix Project,
    Microsoft Corporation)

2
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere

3
What Is A Compiler?
  • A converter from one representation (source code)
    to another (executable code)
  • Preserves (most of) the meaning of the source
  • One part of a modern tool chain used to produce
    executable artifacts (applications)

4
A Compiler
Compiler
5
Figures of Merit
  • Code Quality: how efficient is the generated
    code?
    • Speed and Space: these aren't independent, but
      they aren't the same either
  • Throughput: how fast is the code generated?
  • Footprint: how large is the compiler?

6
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere

7
1950s: Just a Compiler, Please
  • The compiler references a runtime, but the
    runtime is supplied by the OS at a fixed location
    in memory
    • FORTRAN runtime: input/output formatting
    • COBOL runtime: also search and sort
  • OS loader loads the compiler output into memory,
    transfers control
  • Address space is small (< 8K words), CPU is slow
    (< 1,000 instructions/sec.)
  • Figure of merit: Code Quality
    • Compiler must optimize code for space
    • Compiler must optimize code for speed

8
Inside the Compiler (in concept)
(Diagram labels: Source Code, Front End, Back End, Executable Code)
9
Inside the Compiler (in concept)
(Diagram labels: Source Code, Back End, Executable Code)
10
Inside the Compiler (in concept)
(Diagram labels: Source Code, Front End, Back End, Executable Code;
the Back End stage is expanded into the steps below)
  • Linearize parse tree
  • Code analysis
    • Basic block analysis
    • Control- and data-flow graph analysis
  • Optimize (machine-independent)
    • Redundant and dead code elimination
    • Code restructuring
  • Convert to executable code
    • Register allocation
    • Peephole optimization
    • Branch prediction and tensioning
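
Basic block analysis, one of the back-end steps listed on this slide, can be illustrated with a small sketch. The instruction format below (tuples of opcode and operand, with "label", "jmp", and "br" opcodes) is a hypothetical toy IR invented for this example, not the format of any real compiler.

```python
from typing import List, Optional, Tuple

# Hypothetical toy IR: (opcode, operand). "label" marks a jump target;
# "jmp" and "br" transfer control and therefore end a block.
Instr = Tuple[str, Optional[str]]

def split_basic_blocks(code: List[Instr]) -> List[List[Instr]]:
    """Partition a linear instruction list into basic blocks."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in code:
        opcode = instr[0]
        # A label starts a new block, because control may jump to it.
        if opcode == "label" and current:
            blocks.append(current)
            current = []
        current.append(instr)
        # A jump or branch is always the last instruction of its block.
        if opcode in ("jmp", "br"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

program = [
    ("load", "x"),
    ("br", "else"),      # conditional branch to the label "else"
    ("store", "a"),
    ("jmp", "end"),
    ("label", "else"),
    ("store", "b"),
    ("label", "end"),
    ("ret", None),
]
blocks = split_basic_blocks(program)
for number, block in enumerate(blocks):
    print(number, block)
```

Each resulting block has a single entry and a single exit, which is what makes the later control- and data-flow analysis tractable.
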
11
1960s: Linkers
  • Programs are growing in size
  • Programs are built with libraries
  • Libraries provide reusable code fragments
  • Virtual memory systems are invented
  • Tool chain is in two stages
    • Compile independent modules
    • Combine the modules using a linker
  • Figure of merit: Code quality (speed)
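
The two-stage tool chain can be sketched as a toy linker. The module layout used here (a dictionary with "size", "defines", and "refs" entries) is an assumption made up for illustration; real object file formats are far richer.

```python
from typing import Dict, Tuple

# Hypothetical object modules: each has a size, the symbols it defines
# (as offsets within the module), and the external symbols it references.
modules = {
    "main.o": {"size": 10, "defines": {"main": 0}, "refs": ["add"]},
    "math.o": {"size": 6, "defines": {"add": 0, "mul": 3}, "refs": []},
}

def link(mods: Dict[str, dict]) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Lay modules out back to back and resolve external references."""
    symbol_table: Dict[str, int] = {}
    base = 0
    # Pass 1: assign each module a base address and record its symbols.
    for name, mod in mods.items():
        for symbol, offset in mod["defines"].items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbol_table[symbol] = base + offset
        base += mod["size"]
    # Pass 2: every external reference must now resolve to an address.
    resolved: Dict[str, Dict[str, int]] = {}
    for name, mod in mods.items():
        for symbol in mod["refs"]:
            if symbol not in symbol_table:
                raise ValueError(f"undefined symbol {symbol} in {name}")
        resolved[name] = {symbol: symbol_table[symbol] for symbol in mod["refs"]}
    return symbol_table, resolved

symbol_table, resolved = link(modules)
print(symbol_table)
print(resolved)
```

The two passes mirror the slide: modules are compiled independently, and only the linker sees enough to turn a symbolic reference such as "add" into an address.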

12
Tools: Compiler and Linker
(Diagram labels: Includes external references, Linker, Executable Code)
13
1970s: Symbolic Debugger
  • OS written in high-level language
  • Compilers provide sufficient code performance and
    low-level access
  • High-level languages provide large runtime
    libraries in multiple units
  • Static linker pulls only required units into a
    given program image
  • Compiler exports symbol table for use by
    debugger, not just internal to front-/back-end
  • Figure of merit: Code quality (speed)

14
Compiler, Linker, Debugger
(Diagram labels: Symbol table(s), Linker, Debugger, Running Program)
15
1980s: Dynamic Loading, Threading
  • To improve OS performance, by reducing physical
    memory pressure, read-only parts of libraries are
    shared between applications
    • Loaded on first reference
    • OS loader fixes up references to shared libraries
      just like the static linkers
    • Not all libraries are loaded into the same
      virtual address
  • Concurrency issues addressed in programming
    languages
    • Locks, monitors, events, polling
    • Order of operations visible across thread
      boundaries
    • Memory model semantics become an issue
    • Ada introduces rendezvous, other languages have
      other constructs
  • Tool chain
    • Compiler(s)
    • Linker
    • Loader
    • Symbolic debugger
  • Figure of merit: Code quality (speed, but this is
    related to space)
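
The loader fixups mentioned above can be sketched as follows. The image representation (a list of words plus a list of indexes that hold absolute addresses) is a simplified assumption for illustration; real image formats carry typed relocation records.

```python
from typing import List

def load_image(words: List[int], fixups: List[int],
               preferred_base: int, actual_base: int) -> List[int]:
    """Return a copy of the image adjusted for the address it was loaded at.

    `fixups` lists the indexes of words that hold absolute addresses
    computed as if the image were loaded at `preferred_base`.
    """
    delta = actual_base - preferred_base
    loaded = list(words)
    for index in fixups:
        loaded[index] += delta   # rebase the stored address
    return loaded

# Words 1 and 3 hold addresses inside the library; words 0 and 2 are plain data.
image = [0x10, 0x1004, 0x20, 0x1008]
loaded = load_image(image, fixups=[1, 3],
                    preferred_base=0x1000, actual_base=0x5000)
print([hex(word) for word in loaded])
```

Only the words named in the fixup list change, which is why the read-only parts of a library can still be shared: they contain no addresses that need patching.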

16
OS Dynamic Loader
(Diagram labels: Includes fixups for shared code, Static Linker,
Symbol table(s), three Image Files, OS Loader, Debugger, Running Program)
17
1990s: JITs and Managed Runtimes
  • Garbage Collection goes mainstream
    • Previously: LISP, APL, Smalltalk
    • 1990s: Java, JScript, C#, VB
  • Verification requires runtime to analyze code
    • Verification is similar to front-end compiler
      work
    • Can be done to native code, but much simpler with
      an intermediate language
  • Just-in-time (JIT) compilation increases
    performance over pure interpretation
    • Typically by a factor of 5 to 15
  • Tool chain: split the compiler in two!
    • Linearize the AST to create Intermediate Language
      (IL)
    • Save symbol table as metadata
    • Reorder the chain
  • Figures of merit: Throughput first, code quality
    second
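
Splitting the compiler in two can be sketched with Python's own `ast` module standing in for the front end. The stack-machine IL below (opcodes "push", "load", "add", "sub", "mul") is a hypothetical format invented for this example, and the `execute` function is a stand-in for a runtime back end, not a JIT.

```python
import ast
import operator

# Hypothetical stack-machine IL: (opcode, argument) pairs.
BINARY_OPCODES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
ARITHMETIC = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def linearize(source):
    """Front end: parse an expression and flatten its AST into postfix IL."""
    il = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            il.append(("push", node.value))
        elif isinstance(node, ast.Name):
            il.append(("load", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPCODES:
            emit(node.left)     # operands first ...
            emit(node.right)
            il.append((BINARY_OPCODES[type(node.op)], None))  # ... operator last
        else:
            raise SyntaxError(f"unsupported construct: {type(node).__name__}")

    emit(ast.parse(source, mode="eval").body)
    return il

def execute(il, variables):
    """Back-end stand-in: run the IL on a small stack machine."""
    stack = []
    for opcode, argument in il:
        if opcode == "push":
            stack.append(argument)
        elif opcode == "load":
            stack.append(variables[argument])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITHMETIC[opcode](left, right))
    return stack.pop()

il = linearize("a + b * 2")
result = execute(il, {"a": 1, "b": 3})
print(il)
print(result)
```

The IL list is what would be saved in the image file; anything that consumes it later (interpreter, verifier, or JIT) never needs to see the source text.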

18
OS Dynamic Loader (repeat)
(Diagram labels: Includes fixups for shared code, Static Linker,
Symbol table(s), three Image Files, OS Loader, Debugger, Running Program)
19
OS Dynamic Loader (repeat)
(Diagram labels: Source Code, Front End, Compiler, Back End, Object Code,
Static Linker, Image File, OS Loader, Debugger, Running Program)
20
Managed Runtime
(Diagram labels: Compiler, Compiler, Image File, OS Loader, Dynamic Linker,
Runtime, Back End, Debugger, Running Program)
21
Managed Runtime
(Diagram labels: Metadata and Intermediate Language, Compiler, Compiler,
Image File, OS Loader, Dynamic Linker, Runtime, Back End, Debugger,
Running Program)
22
2000s: Reflection-based Computation
  • Reflection: ability of a program to observe and
    possibly modify its structure and behavior
    • Compilers preserve meaning, but runtime
      reflection makes more information visible, so
      optimizations are more limited
    • Metadata (symbol table) or equivalent needed at
      runtime, not just compile/link time
  • Interactive Development Environments (IDEs)
    • IntelliSense
    • Refactoring
    • Interactive syntax analysis
  • Query Integration
    • Builds expression trees (ASTs) at compile time
    • Runtime operations to combine and manipulate them
  • Figures of merit
    • Compiler and JIT compiler: throughput
    • Pre-JIT compiler: balance of throughput and
      code quality
23
Runtime Reflection
(Diagram labels: Source Code, Metadata and Intermediate Language, Front End,
Development Environment, Image File, OS Loader, Dynamic Linker,
Metadata (symbol table), Back End, Debugger, Running Program)
24
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere

25
1970: Numbles
  • Number puzzles for Nimble minds
  • Column in Computers and Automation
  • Numble verifier written by Stuart Nelson
  • Input language
    • SEND + MORE = MONEY
  • Output: a program to try all possible assignments
    of digits to letters
  • Handled +, -, ×, and ÷
  • Hand coded in PDP-9 assembly language
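
A present-day sketch of the same idea, in Python rather than PDP-9 assembly: a tiny "compiler" whose input is a letter puzzle and whose output is the source text of a program that tries every assignment of digits to letters. This is an illustration written for this transcript, not a reconstruction of the original verifier; the function name `compile_numble` and the generated `solve` function are hypothetical.

```python
import re

def compile_numble(puzzle):
    """Translate a puzzle such as 'TO + GO = OUT' into Python source code."""
    words = re.findall(r"[A-Z]+", puzzle)
    letters = sorted(set("".join(words)))
    if len(letters) > 10:
        raise ValueError("more than ten distinct letters")
    leading = sorted({word[0] for word in words})

    def value(word):
        # 'TO' becomes '(O*1 + T*10)': each letter times its place value.
        terms = [f"{ch}*{10 ** i}" for i, ch in enumerate(reversed(word))]
        return "(" + " + ".join(terms) + ")"

    condition = re.sub(r"[A-Z]+", lambda m: value(m.group()),
                       puzzle.replace("=", "=="))
    result = "{" + ", ".join(f"{ch!r}: {ch}" for ch in letters) + "}"
    return "\n".join([
        "from itertools import permutations",
        "def solve():",
        # The trailing comma keeps tuple unpacking valid for a single letter.
        f"    for {', '.join(letters)}, in permutations(range(10), {len(letters)}):",
        f"        if 0 in ({', '.join(leading)},):",
        "            continue  # no leading zeros",
        f"        if {condition}:",
        f"            return {result}",
        "    return None",
    ])

source = compile_numble("TO + GO = OUT")
print(source)

namespace = {}
exec(source, namespace)          # "load" the generated program
solution = namespace["solve"]()  # and run it
print(solution)                  # {'G': 8, 'O': 1, 'T': 2, 'U': 0}
```

The same function accepts SEND + MORE = MONEY, but the generated program then searches about 1.8 million permutations, so it takes noticeably longer. Other operators work only to the extent that Python's expression syntax accepts them.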

26
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere
    • Free-standing compilers
    • Under the hood
    • Inside applications
    • In the tool chain
    • Inside libraries

27
Special-Purpose Compilers
  • Compile-to-hardware
  • Aspect-Oriented Programming (AOP) weaver
    • Parser finds new syntax to mark insertion points
    • Back-end inserts code snippets for different
      aspects
    • More generally: assembly rewriting
  • Work-flow and object design languages
    • Input may be textual or graphic layouts
    • Output may be code or graphic designs

28
Mark-up Compilers
  • XML schema (or DTD)
    • Output: parser
    • Output: deserializer
  • Web Services Description Language (WSDL)
    • Output: proxy that parses input and dispatches
    • Output: code to convert data structure to XML
      (serializer)
  • XAML (Windows Presentation Foundation)
    • Output: parser
    • Output: executable code

29
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere
    • Free-standing compilers
    • Under the hood
    • Inside applications
    • In the tool chain
    • Inside libraries

30
Modern Hardware: CPU
  • Compile machine code to microcode
    • CPU architecture is the abstraction boundary
    • RISC vs. CISC is an old debate
    • x86 and x64 are CISC on the outside, RISC on the
      inside
  • Part of the instruction cache
    • Engineering note: an icache miss now often means
      a pause to compile in addition to a memory fetch!
  • Allows innovation in actual hardware while still
    running existing code
    • Chips optimized for specific usage scenarios
    • Chips take advantage of materials science
      advances
    • Chips take advantage of new internal
      architectures (multi-core)

31
Modern Hardware: Graphics
  • Graphics memory isn't just for data
    • Very sophisticated compilation steps
    • Parallel execution with CPU
  • Adapts to changing hardware organization
    • Raster scan vs. vector
    • Resolution, speed, synchronization
  • Adapts to predominant usage pattern
    • Animation
    • 3D
    • Shading

32
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere
    • Free-standing compilers
    • Under the hood
    • Inside applications
    • In the tool chain
    • Inside libraries

33
Databases
  • SQL is a full programming language
  • Compiled to intermediate form on client
  • Intermediate form is passed to server for
    execution
  • Server optimizes the intermediate form to produce
    an execution plan
  • Query optimization
  • Additional inputs include
    • Size of tables
    • Frequency of query types
    • Indexing information
  • Outputs include
    • Executable code
    • Temporary indexes
    • Background indexing requests
    • Updated frequency information

34
Hardware Emulators
  • Object code translation at runtime
    • HP3000 to PA-RISC in 1983
    • VAX to Alpha in the 1990s
    • 32-bit programs on 64-bit hardware
  • Alternate hardware emulation
    • Device emulators for everything from smart cards
      to cell phones to iPods to Pocket PCs
  • JIT compilation trades start-up time for
    high-performance execution
    • Often, but not always, a good trade-off

35
Code Analysis Tools
  • Analyzing API surface
    • Simple to do with front-end ASTs
  • Remodularizing implementation
    • Requires static and dynamic dependency analysis:
      normal compiler back-end work
    • Requires rebuilding the program, easily done
      using front-end ASTs
  • Race detection
    • Instrument code at compile time
    • Gather data as it runs under high stress

36
Tree Shakers
  • Start with an AST and an appropriate dependency
    graph
  • Pull AST nodes found starting at a given graph
    node, recursively
  • Convert resulting set of AST nodes to appropriate
    output format
  • Example uses
    • Subset library based on initial set of types
    • Statically link subset of library for a given
      application
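
The "pull nodes recursively" step is a reachability walk over the dependency graph. A minimal sketch, assuming the graph is given as a dictionary from each node name to the names it depends on (the type names below are invented for the example):

```python
from typing import Dict, Iterable, Set

def shake(dependencies: Dict[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """Return every node reachable from the roots; the rest can be dropped."""
    keep: Set[str] = set()
    worklist = list(roots)
    while worklist:
        node = worklist.pop()
        if node in keep:
            continue                      # already pulled in
        keep.add(node)
        worklist.extend(dependencies.get(node, ()))
    return keep

# Hypothetical library: type name -> types it depends on.
library = {
    "List": {"Object", "Iterator"},
    "Iterator": {"Object"},
    "Regex": {"String", "Object"},
    "String": {"Object"},
    "Object": set(),
}
kept = shake(library, roots=["List"])
print(sorted(kept))   # ['Iterator', 'List', 'Object']
```

An application that only uses "List" would link three of the five types; "Regex" and "String" are shaken out. Reflection weakens this guarantee, since a name looked up at runtime creates a dependency the graph does not show.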

37
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere
    • Free-standing compilers
    • Under the hood
    • Inside applications
    • In the tool chain
    • Inside libraries

38
A Modern Interactive Development Environment (IDE)
  • Code editor
    • Knows the programming language, provides syntax
      support and context-sensitive name lookup
  • Project system
    • Tracks the public shape of components
    • Tracks dependencies between components
  • Build system
    • Orders clean-up, compile, and link operations
  • Debugger
    • Allows inspection and modification of values at
      runtime
    • Allows control operations (e.g., breakpoint,
      continue, restart)
  • Dynamic Support
    • Allows program modification interwoven with
      execution (edit and continue)
    • Global interaction space (read-eval-print loop)

39
Compilers in the IDE (I)
  • In the code editor
    • Incrementally parses the code as it is being
      entered. Note: must deal with incorrect syntax
      and partial programs.
    • Suggests possible completions based on a symbol
      table. Note: symbol table must include external
      references maintained by the project system.
    • Refactoring operations require both syntactic and
      semantic analysis. Note: refactoring requires
      information maintained by the project system.
  • In the debugger
    • Expression evaluation

40
Compilers In the IDE (II)
  • Dynamic support
    • Edit-and-continue
      • Requires a full, incremental compiler
      • For efficiency, it also requires the ability to
        compress the output as a diff between the
        original and the new code
    • Interactive workspace
      • Like LISP, APL, Smalltalk, Python, etc.
      • Requires
        • a compiler or
        • an interpreter -- really, a compiler front end
          to generate an AST combined with a tree walker
          to execute the tree.
      • The compiler must be capable of generating code
        that uses code and objects resident in the
        evaluation environment, which generally means a
        reliance on reflection.
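
The "front end plus tree walker" form of interpreter can be sketched in a few lines, again borrowing Python's `ast` module as the front end. The supported subset (numeric constants, names, four operators, simple assignment) and the function names are choices made for this illustration only.

```python
import ast
import operator

BINARY_OPERATORS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def walk(node, environment):
    """Tree walker: evaluate an expression node directly, with no code generation."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return environment[node.id]       # values live in the workspace
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPERATORS:
        function = BINARY_OPERATORS[type(node.op)]
        return function(walk(node.left, environment),
                        walk(node.right, environment))
    raise NotImplementedError(type(node).__name__)

def repl_eval(line, environment):
    """One read-eval step: parse the line, then walk the resulting tree."""
    statement = ast.parse(line, mode="single").body[0]
    if (isinstance(statement, ast.Assign) and len(statement.targets) == 1
            and isinstance(statement.targets[0], ast.Name)):
        environment[statement.targets[0].id] = walk(statement.value, environment)
        return None
    if isinstance(statement, ast.Expr):
        return walk(statement.value, environment)
    raise NotImplementedError(type(statement).__name__)

workspace = {}
repl_eval("x = 6", workspace)
answer = repl_eval("x * 7", workspace)
print(answer)   # 42
```

The `workspace` dictionary plays the role of the evaluation environment: each new line can use whatever earlier lines left behind, which is the property the last bullet on the slide is concerned with.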

41
Compilers in the Linker
  • The linker sees the whole program, so it's
    better positioned to do global analysis
  • Solution: write a compiler
    • Input language is object file format (native code
      or IL)
    • Output language is OS image file format
  • Optimizations
    • Aggressive in-lining across module boundaries
    • Code motion across module boundaries
    • Full type system analysis (treat leaf types as
      sealed)
  • Issues
    • These flow graphs are big
    • The linker doesn't see the whole program (dynamic
      linking)
    • Reflection and dynamic linking reduce permitted
      optimizations
      • Or require the ability to back out or recompute
        optimizations at runtime

42
Profile-Guided Optimization
  • Idea: instrument the program, run it with typical
    loads, then re-optimize using this profiling
    data. (Similar to HotSpot)
  • Optimizations
    • Optimize only hot code fragments
      • So you can spend more time on them
    • Method and basic block reordering to increase
      code density
    • Code reordering to optimize branch prediction and
      minimize long references
    • Cache locality optimizations for data and code
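
Basic block reordering from profile data can be sketched with a simple greedy rule: after each block, place the successor that the profile says is taken most often. The edge counts below are invented sample data, and real optimizers use more elaborate layout algorithms; this only shows the shape of the idea.

```python
from typing import Dict, List, Tuple

def layout_blocks(entry: str, edge_counts: Dict[Tuple[str, str], int]) -> List[str]:
    """Order blocks so that the hottest path falls through without jumps."""
    order: List[str] = []
    placed = set()
    current = entry
    while current is not None:
        order.append(current)
        placed.add(current)
        # Unplaced successors of the current block, with their profile counts.
        candidates = [
            (count, target)
            for (source, target), count in edge_counts.items()
            if source == current and target not in placed
        ]
        current = max(candidates)[1] if candidates else None
    # Cold blocks that were never on the hot chain go at the end.
    for source, target in edge_counts:
        for block in (source, target):
            if block not in placed:
                order.append(block)
                placed.add(block)
    return order

# Hypothetical profile: how often each control-flow edge was taken.
profile = {
    ("entry", "check"): 1000,
    ("check", "fast"): 990,
    ("check", "error"): 10,
    ("fast", "exit"): 990,
    ("error", "exit"): 10,
}
order = layout_blocks("entry", profile)
print(order)   # ['entry', 'check', 'fast', 'exit', 'error']
```

The rarely executed "error" block ends up after "exit", so the common path is contiguous in memory, which is the code-density and locality effect the slide describes.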

43
Outline
  • What Is A Compiler?
  • A Brief History of Developer Tools
  • My First Compiler
  • Compilers, compilers, everywhere
    • Free-standing compilers
    • Under the hood
    • Inside applications
    • In the tool chain
    • Inside libraries

44
For the Developer
  • Regular expression parsing
    • Grammar is usually more powerful than regular
      expressions
  • Serialization and Deserialization
    • Reflects on data type to be marshalled
    • Generates specialized code to convert to stream
      format (serialization) or parse into in-memory
      format (deserialization)
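
The "reflect, then generate specialized code" pattern can be sketched in Python using `dataclasses.fields` for reflection and `exec` for code generation. JSON is used as the stream format purely for convenience; the `make_serializer` helper and the `Point` type are hypothetical names for this example.

```python
import json
from dataclasses import dataclass, fields

def make_serializer(cls):
    """Reflect on a dataclass and generate a serializer specialized for it."""
    names = [field.name for field in fields(cls)]
    pairs = ", ".join(f"{name!r}: obj.{name}" for name in names)
    # The generated function touches exactly the fields of `cls`,
    # with no per-call reflection.
    source = (
        "def serialize(obj):\n"
        f"    return json.dumps({{{pairs}}})\n"
    )
    namespace = {"json": json}
    exec(source, namespace)
    return namespace["serialize"], source

@dataclass
class Point:
    x: int
    y: int

serialize_point, generated_source = make_serializer(Point)
print(generated_source)
encoded = serialize_point(Point(1, 2))
print(encoded)   # {"x": 1, "y": 2}
```

Reflection runs once, when the serializer is built; every later call runs straight-line generated code, which is the reason libraries take this route instead of inspecting the type on each call.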

45
For the Compiler Writer
  • Parser-generators
    • lex
    • yacc
  • AST tool kits
    • Microsoft is investing in this area
    • Provides integration into many aspects of the IDE
  • Executable file format tool kits
    • Queensland University of Technology: PERWAPI
  • Optimization tool kits
    • Microsoft's Phoenix project

46
Questions?