Research Overview for LLVM Group




1
Automatic Pool Allocation: Compile-Time Control
over Complete Pointer-Based Data Structures
Vikram Adve
University of Illinois at Urbana-Champaign
Joint work with Chris Lattner, Dinakar
Dhurjati, Sumant Kowshik
Thanks: NSF (CAREER, Embedded '02, NGS '00, NGS '99,
OSC '99), MARCO/DARPA
2
Why Does Data Layout Matter?
  • Performance: working sets, spatial locality,
    temporal locality, heap allocation overheads
  • Security: buffer overruns, dangling pointers,
    uninitialized pointers
  • Software reliability: dangling pointers,
    checkpointing, static bug detection, static data
    race detection
  • ...and complex heap-based data structures are
    ubiquitous.

3
Compiling Pointer-Intensive Codes Today
  • Current analyses and transformations focus on
    primitives
  • disambiguate individual loads and stores
  • optimize individual loads and stores
  • reorder, split, or merge individual data types
  • Q. Can compilers manipulate entire logical data
    structures?
  • A list?
  • A tree of linked lists?
  • A hashtable?
  • A graph?

4
[Figure: heap objects grouped as List 1 Nodes, List 2 Nodes, and Tree Nodes]
5
Why Segregate Data Structures into Pools?
  • Programs are designed around data structures
  • Direct benefit of segregation: better performance
  • Smaller working sets
  • Improved spatial locality
  • Sometimes converts irregular strides into regular
    ones
  • Primary goal: better compiler information and
    control
  • Compiler knows where (sets of) data structures
    live in memory
  • Compiler knows order of data in memory (in some
    cases)
  • Compiler knows type information ⇒ run-time
    points-to graph
  • Compiler knows which pools point to which other
    pools
  • Compiler knows bounds on pool lifetimes

6
Outline
  • Automatic Pool Allocation
    [LA:PLDI05]
  • Using Pool Allocation to Improve Performance
  • Use 1: Improving heap locality and performance
  • Use 2: Transparent pointer compression
    [LA:MSP05]
  • Using Pool Allocation for Bug Detection and Security
  • Use 3: Detecting buffer overruns fast and
    transparently [DA:ICSE06]
  • Use 4: Detecting all dangling pointer errors fast
    [DA:Submitted]
  • Use 5: SAFECode ...
  • SAFECode: A Safe Execution Environment for C/C++
  • Sound program analysis, memory safety for full C
    [DKA:PLDI06]
  • Memory safety for type-safe C
    [DKAL:TECS05]

7
  • Automatic Pool Allocation
  • The transformation algorithm

Lattner and Adve, PLDI 2005 (Best Paper Award)
8
Pool Allocation: Current Approaches
Compiler has no information about pool properties
  • Current: manual pool allocation
  • Via library: by class (e.g., C++ STL), scope, or
    data structure
  • Via language support: by scope or data structure
  • Automatic region inference for ML (Tofte &
    Birkedal; Aiken)
  • By lifetime only, e.g., stack of regions
  • Limited destructive updates

Goal is memory management, not layout control,
not DS separation
  • Never automated before
  • Imperative languages, including C and C++
  • Pool allocation by logical data structures

9
Pool Allocation: The Key Insight
  • Partition heap objects according to the results
    of some pointer analysis.
  • The pointer analysis representation we use is
    called a Data Structure Graph (DS Graph).

10
DS Graph Properties
    int G;
    void twoLists() {
      list *X = makeList(10);
      list *Y = makeList(100);
      addGToList(X);
      addGToList(Y);
      freeList(X);
      freeList(Y);
    }

11
DS Graph for Olden MST Benchmark
Key insight: a fully context-sensitive points-to
graph identifies data structure instances. Fully
context-sensitive ⇒ objects are identified by full
acyclic call paths.
12
DS Graph for Olden EM3D Benchmark
13
DS Graph for Olden Power Benchmark
  • Olden-Power Benchmark
    build_tree() {
      t = malloc(...);
      t->l = build_lateral(...);
    }
    build_lateral() {
      l = malloc(...);
      l->next = build_lateral(...);
      l->b = build_branch(...);
    }

14
Automatic Pool Allocation Overview
  • Segregate memory according to points-to graph
  • N graph nodes → 1 pool (default: 1-to-1)
  • Retain explicit free() for objects

[Figure: points-to graph of two disjoint linked lists, mapped to Pool 1 and Pool 2]
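The following is a minimal C sketch of the kind of runtime a pool-allocated program calls into. The names poolalloc, poolfree, and pooldestroy echo the calls mentioned on these slides, but the signatures and design are assumptions for illustration, not the actual LLVM pool allocation runtime. In particular, the sketch gives each pool one fixed object size, omits a size argument to poolalloc, and uses a slab plus free-list layout. Each pool carves objects only from its own slabs, so objects belonging to different points-to graph nodes never share memory, while explicit frees still return objects to their own pool.

```c
#include <stdlib.h>

/* Hypothetical, simplified pool runtime: each pool hands out objects of a
 * single fixed size, carved from slabs that belong to that pool alone. */
typedef struct Slab { struct Slab *next; } Slab;
typedef struct FreeObj { struct FreeObj *next; } FreeObj;

typedef struct Pool {
  size_t objsize;     /* bytes per object, rounded up to pointer alignment */
  size_t per_slab;    /* objects carved from each new slab */
  Slab *slabs;        /* all slabs owned by this pool */
  FreeObj *freelist;  /* objects available to poolalloc */
} Pool;

void poolcreate(Pool *p, size_t objsize) {
  size_t a = sizeof(void *);
  if (objsize < sizeof(FreeObj)) objsize = sizeof(FreeObj);
  p->objsize = (objsize + a - 1) / a * a;
  p->per_slab = 64;
  p->slabs = NULL;
  p->freelist = NULL;
}

/* Get a new slab from malloc and thread its objects onto the free list. */
static int poolgrow(Pool *p) {
  Slab *s = malloc(sizeof(Slab) + p->per_slab * p->objsize);
  if (s == NULL) return 0;
  s->next = p->slabs;
  p->slabs = s;
  char *base = (char *)(s + 1);
  for (size_t i = 0; i < p->per_slab; i++) {
    FreeObj *o = (FreeObj *)(base + i * p->objsize);
    o->next = p->freelist;
    p->freelist = o;
  }
  return 1;
}

void *poolalloc(Pool *p) {
  if (p->freelist == NULL && !poolgrow(p)) return NULL;
  FreeObj *o = p->freelist;
  p->freelist = o->next;
  return o;
}

/* Explicit frees are retained: the object goes back to its own pool. */
void poolfree(Pool *p, void *ptr) {
  FreeObj *o = ptr;
  o->next = p->freelist;
  p->freelist = o;
}

/* Release every slab at once; objects never freed are reclaimed too. */
void pooldestroy(Pool *p) {
  while (p->slabs != NULL) {
    Slab *next = p->slabs->next;
    free(p->slabs);
    p->slabs = next;
  }
  p->freelist = NULL;
}
```

With the default 1-to-1 mapping, each of the two disjoint lists would get its own Pool, and a single pooldestroy call reclaims a whole list's memory.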
15
Points-to Graph Assumptions
  • Specific assumptions
  • Separate points-to graph for each function
  • Unification-based graph
  • Can be used to compute escape info
  • Use any points-to analysis that satisfies the above
  • Our implementation uses DSA [Lattner:PhD]
  • Infers C type info for many objects
  • Context-sensitive
  • Field-sensitive analysis
  • Results show that it is very fast

DSA + pool allocation time < 3% of GCC -O3 for all
tested programs.
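To show why a unification-based graph merges nodes, here is a hypothetical, much-simplified Steensgaard-style sketch in C. DSA is also context- and field-sensitive; this sketch ignores both. Each node has at most one pointee class. An assignment p = q forces the pointees of p and q into a single class via union-find. This is the kind of merging that determines whether two data structures end up sharing a pool.

```c
#define MAXNODES 256

/* Hypothetical sketch of unification-based points-to analysis:
 * union-find over graph nodes, one pointee class per node class. */
static int parent[MAXNODES];   /* union-find forest over graph nodes */
static int pointee[MAXNODES];  /* some member of the pointee class, or -1 */
static int nnodes;

int newnode(void) {
  /* no overflow check: MAXNODES is assumed large enough for the demo */
  parent[nnodes] = nnodes;
  pointee[nnodes] = -1;
  return nnodes++;
}

int find(int x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];  /* path halving */
    x = parent[x];
  }
  return x;
}

/* Merge two node classes; their pointee classes must then merge too. */
void unify(int a, int b) {
  a = find(a);
  b = find(b);
  if (a == b) return;
  int pa = pointee[a], pb = pointee[b];
  parent[b] = a;
  pointee[a] = (pa >= 0) ? pa : pb;
  if (pa >= 0 && pb >= 0) unify(pa, pb);
}

/* Pointee class of x, creating a placeholder node on demand. */
static int target(int x) {
  int r = find(x);
  if (pointee[r] < 0) pointee[r] = newnode();
  return find(pointee[r]);
}

void addr_of(int p, int obj) { unify(target(p), obj); }    /* p = &obj */
void assign(int p, int q) { unify(target(p), target(q)); } /* p = q    */

/* Representative of what x points to, or -1 if nothing yet. */
int pts(int x) {
  int t = pointee[find(x)];
  return t < 0 ? -1 : find(t);
}
```

Two pointers to separately allocated objects keep distinct pointee classes, which means separate pools. A third variable that is assigned from both forces their targets into one class.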
16
Pool Allocation Example
    list *makeList(int Num) {
      list *New = malloc(sizeof(list));
      New->Next = Num ? makeList(Num-1) : 0;
      New->Data = Num;
      return New;
    }
    int twoLists() {
      list *X = makeList(10);
      list *Y = makeList(100);
      GL = Y;
      addGToList(X);
      addGToList(Y);
      freeList(X);
      freeList(Y);
    }

Change calls to free into calls to poolfree ⇒
retain explicit deallocation
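The slides do not show the transformed code, so what follows is a hand-written sketch of what pool allocation might produce for this example. It makes four assumptions. First, makeList and freeList gain an extra Pool * parameter. Second, malloc and free become poolalloc and poolfree. Third, X's pool is created and destroyed inside twoLists. Fourth, Y's pool descriptor PG is global because Y escapes through the global GL, following the rule for pools reachable from global variables. The tiny pool runtime, the names PG and poolcreate, and the omission of addGToList (its definition is not shown) are also assumptions.

```c
#include <stdlib.h>

/* Hypothetical minimal pool: objects carry a hidden doubly linked header so
 * that poolfree can release one object and pooldestroy can release the rest. */
typedef struct Hdr { struct Hdr *prev, *next; } Hdr;
typedef struct Pool { Hdr head; size_t live; } Pool;

void poolcreate(Pool *P) {
  P->head.prev = P->head.next = &P->head;
  P->live = 0;
}

void *poolalloc(Pool *P, size_t n) {
  Hdr *h = malloc(sizeof(Hdr) + n);
  if (h == NULL) abort();
  h->next = P->head.next;
  h->prev = &P->head;
  P->head.next->prev = h;
  P->head.next = h;
  P->live++;
  return h + 1;
}

void poolfree(Pool *P, void *ptr) {
  Hdr *h = (Hdr *)ptr - 1;
  h->prev->next = h->next;
  h->next->prev = h->prev;
  P->live--;
  free(h);
}

void pooldestroy(Pool *P) {
  while (P->head.next != &P->head) poolfree(P, P->head.next + 1);
}

typedef struct list { struct list *Next; int Data; } list;

list *GL;  /* the slide's global: Y escapes through it */
Pool PG;   /* so Y's pool is global (call poolcreate(&PG) once at startup) */

list *makeList(int Num, Pool *P) {  /* pool passed as an extra argument */
  list *New = poolalloc(P, sizeof(list));
  New->Next = Num ? makeList(Num - 1, P) : 0;
  New->Data = Num;
  return New;
}

void freeList(list *L, Pool *P) {   /* free(...) became poolfree(P, ...) */
  while (L != NULL) {
    list *Next = L->Next;
    poolfree(P, L);
    L = Next;
  }
}

int twoLists(void) {
  Pool PX;                 /* X never escapes: its pool is local */
  poolcreate(&PX);
  list *X = makeList(10, &PX);
  list *Y = makeList(100, &PG);
  GL = Y;
  /* addGToList(X); addGToList(Y); omitted: not defined on the slides */
  freeList(X, &PX);
  freeList(Y, &PG);
  pooldestroy(&PX);        /* X's pool dies before twoLists returns */
  return 0;
}
```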
17
Pool Allocation Algorithm Details
  • Indirect function calls
  • call fp1(arg1, ..., argN), fp1 ∈ {F1, F2}
  • call fp2(arg1, ..., argN), fp2 ∈ {F2, F3}
  • Must pass same pool arguments to F1, F2, and F3
  • Partition functions into equivalence classes
  • If F1, F2 have a common call site ⇒ same class
  • Merge points-to graphs for each equivalence class
  • Apply previous transformation unchanged
  • Pools reachable from global variables
  • Such a pool descriptor is a run-time constant,
    so make it global also
  • See paper for details [LA:PLDI05]
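The equivalence-class step can be sketched with union-find. This hypothetical C fragment uses illustrative function IDs and names. It merges all functions that can be called from the same indirect call site. With fp1 ∈ {F1, F2} and fp2 ∈ {F2, F3}, F1, F2, and F3 therefore land in one class and must take the same pool arguments.

```c
#include <stddef.h>

#define MAXFUNCS 64

/* Hypothetical sketch: union-find over function IDs. Functions reachable
 * from a common indirect call site end up in the same class. */
static int fparent[MAXFUNCS];

void classes_init(int nfuncs) {
  for (int f = 0; f < nfuncs; f++) fparent[f] = f;
}

int class_of(int f) {
  while (fparent[f] != f) {
    fparent[f] = fparent[fparent[f]];  /* path halving */
    f = fparent[f];
  }
  return f;
}

/* Record one indirect call site and the functions it may call. */
void add_call_site(const int *callees, size_t n) {
  for (size_t i = 1; i < n; i++) {
    int a = class_of(callees[0]), b = class_of(callees[i]);
    if (a != b) fparent[b] = a;
  }
}
```

The points-to graphs of all functions in a class would then be merged, so the existing transformation can be applied unchanged.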

18
Two Further Refinements
  • (1) Eliminating poolfree()
  • poolfree() just before pooldestroy() is
    redundant
  • This is effectively static garbage collection!
    DS = Create(P);
    ProcessData(DS);
    Free(DS, P);      // redundant if ...
    pooldestroy(P);
  • (2) Reducing Pool Lifetimes
  • Pools need not be created / destroyed at function
    boundaries
  • Intraprocedural flow analysis to create later,
    destroy earlier
  • Can be extended interprocedurally [Aiken et al.,
    PLDI '96]

19
Pool Allocation Properties
  • Strengths
  • Transparent: fully automatic for any LLVM program
  • Static map: every pointer variable/field points to a
    unique, known pool
  • Pool type information: many type-homogeneous
    pools
  • Lifetimes: lifetime of every pool is bounded
  • Pool points-to graph: compiler knows which pools
    contain pointers to every pool, and vice versa
  • Limitations
  • No deallocation: no automatic deallocation of
    items in pools
  • Unsafe: no guarantee of memory safety
  • Lifetimes: pools reachable from global variables
    have global lifetime
  • Missing type info: type-unsafe objects (DS nodes)

20
  • Use 1 of Pool Allocation
  • Improving performance of heap-intensive codes

Lattner and Adve, PLDI 2005
21
Simple Pool Allocation Statistics
DSA + pool allocation compile time is small: less
than 3% of GCC compile time for all tested
programs. See paper for details.
22
Pool Allocation Speedup
  • Several programs unaffected by pool allocation
  • 10-20% speedup across many pointer-intensive
    programs
  • Some programs (ft, chomp) order of magnitude
    faster

23
Cache/TLB miss reduction
Miss rates measured with perfctr on AMD Athlon
2100
  • Sources
  • Defragmented heap
  • Reduced inter-object padding
  • Segregating the heap!

24
Chomp Access Pattern with Malloc
25
Chomp Access Pattern with PoolAlloc
26
FT Access Pattern With Malloc
  • Heap segregation has a similar effect on FT
  • See Lattner's Ph.D. thesis for details

27
Pool Specific Optimizations
  • Different Data Structures Have Different
    Properties
  • Pool allocation segregates heap
  • Optimize using pool-specific properties
  • Examples of properties we look for
  • Pool is type-homogeneous
  • Pool contains data that only requires 4-byte
    alignment
  • Opportunities to reduce allocation overhead

28
Looking Closely: Anatomy of a Heap
  • Fully general malloc-compatible allocator
  • Supports malloc/free/realloc/memalign etc.
  • Standard malloc overheads: object header,
    alignment
  • Allocates slabs of memory with exponential growth
  • By default, all returned pointers are 8-byte
    aligned
  • In memory, things look like this (16-byte allocations):

[Figure: within one 32-byte cache line, each 16-byte allocation is preceded by a 4-byte object header and 4 bytes of padding for user-data alignment]
29
Pool-Specific Optimizations
  • Selective Pool Allocation
  • Don't pool allocate when not profitable
  • PoolFree Elimination
  • poolfree redundant if followed by pooldestroy
  • Bump-pointer allocation if pool has no
    poolfree
  • Eliminate per-object header
  • Eliminate freelist overhead (faster object
    allocation)
  • Type-safe pools infer a type for the pool
  • Use 4-byte alignment for pools we know don't need
    8-byte alignment

30
PAOpts (3/4): Bump-Pointer Optimization
  • If a pool has no poolfrees:
  • Eliminate per-object header
  • Eliminate freelist overhead (faster object
    allocation)
  • Eliminates 4 bytes of inter-object padding
  • Packs objects more densely in the cache
  • Interacts with poolfree elimination (PAOpt 2/4)!
  • If poolfree elimination deletes all frees, BumpPtr
    can apply

[Figure: 16-byte objects packed back to back with no headers or padding, shown against one 32-byte cache line]
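A bump-pointer pool can be sketched in a few lines of C. This is a hypothetical illustration; the bumppool_* names are not from the slides. Allocation rounds the current pointer up to the requested alignment and advances it. There is no per-object header and no free list, and memory is released only when the whole pool is destroyed. That is why the optimization applies only to pools with no poolfree.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump-pointer pool: no object headers, no free list. */
typedef struct Slab { struct Slab *next; } Slab;

typedef struct BumpPool {
  char *cur, *end;   /* unused space in the current slab */
  Slab *slabs;       /* all slabs, released by bumppool_destroy */
  size_t slabsize;   /* default bytes per slab */
} BumpPool;

void bumppool_init(BumpPool *p, size_t slabsize) {
  p->cur = p->end = NULL;
  p->slabs = NULL;
  p->slabsize = slabsize;
}

/* Bytes needed to round ptr up to a multiple of align (a power of two). */
static size_t pad_for(const char *ptr, size_t align) {
  return (size_t)(-(uintptr_t)ptr & (align - 1));
}

void *bumppool_alloc(BumpPool *p, size_t n, size_t align) {
  if (p->cur == NULL || pad_for(p->cur, align) + n > (size_t)(p->end - p->cur)) {
    /* current slab exhausted: start a new one big enough for this request */
    size_t sz = n + align > p->slabsize ? n + align : p->slabsize;
    Slab *s = malloc(sizeof(Slab) + sz);
    if (s == NULL) return NULL;
    s->next = p->slabs;
    p->slabs = s;
    p->cur = (char *)(s + 1);
    p->end = p->cur + sz;
  }
  char *at = p->cur + pad_for(p->cur, align);
  p->cur = at + n;   /* the "bump" */
  return at;
}

void bumppool_destroy(BumpPool *p) {
  while (p->slabs != NULL) {
    Slab *next = p->slabs->next;
    free(p->slabs);
    p->slabs = next;
  }
  p->cur = p->end = NULL;
}
```

Consecutive 16-byte allocations with 4-byte alignment land exactly 16 bytes apart. This is the dense packing shown in the figure.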
31
PAOpts (4/4): Alignment Analysis
  • Malloc must return 8-byte aligned memory
  • It has no idea what types will be used in the
    memory
  • Some machines raise bus errors, others suffer
    performance problems, for unaligned memory
  • Type-safe pools infer a type for the pool
  • Use 4-byte alignment for pools we know don't need
    8-byte alignment
  • Reduces inter-object padding

[Figure: 4-byte object header and 16-byte user data with no alignment padding, shown against one 32-byte cache line]
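The byte counts in these figures can be checked with simple arithmetic. The sketch uses a simplified layout model, which is an assumption: the header is padded so user data starts on an alignment boundary, and user data is padded so the next header does too. Under that model, a 16-byte object costs 24 bytes with a 4-byte header and 8-byte alignment. It costs 16 bytes with the bump-pointer layout (no header), and 20 bytes with a header but only 4-byte alignment.

```c
#include <stddef.h>

/* Round x up to a multiple of align. */
static size_t round_up(size_t x, size_t align) {
  return (x + align - 1) / align * align;
}

/* Bytes consumed per object under the simplified layout model:
 * padded header followed by padded user data. */
size_t object_stride(size_t header, size_t user, size_t align) {
  return round_up(header, align) + round_up(user, align);
}
```

For 1,000 such objects, that is 24,000, 16,000, and 20,000 bytes respectively.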
32
Pool Optimization Speedup (FullPA)
[Chart: run time normalized to PA time]
  • Baseline: 1.0 = run time with pool allocation
  • Optimizations help all of these programs
  • Despite being very simple, they make a big impact

33
  • Use 3 of Pool Allocation
  • Detecting buffer overruns fast and transparently
  • Dhurjati and Adve, ICSE 2006, to appear

34
Array Bounds Errors
  • Most common reason for security attacks
  • Over 50% of attacks reported by CERT
  • 1988: first exploited
  • 2006: continues to get exploited

Key problem: tracking the target object of each
pointer is very expensive (without fat pointers)
35
Jones-Kelly: Transparent Bounds Checking
[Figure: global splay tree of (start, size) entries such as (p, n*4); ref = lookup(q), then Check(ref, r)]
Idea: register all array objects in a global
splay tree; do a lookup on every pointer calculation.
Advantage: backwards-compatible, no wrappers
needed. Problem: 4-5x slowdowns (up to 12x for
the Ruwase-Lam extension).
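The Jones-Kelly idea can be sketched in C as follows. This is a hypothetical simplification: a sorted array with binary search stands in for the splay tree, and register_obj, lookup, and check are illustrative names. lookup(q) finds the registered object that the source pointer q points into. check(ref, r) then accepts a derived pointer r only if it stays within that object; one past the end is allowed, as in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAXOBJ 128

typedef struct { uintptr_t base; size_t size; } ObjRange;

/* Global table of registered objects, kept sorted by base address.
 * A sorted array with binary search stands in for the splay tree. */
static ObjRange table[MAXOBJ];
static int nobj;

/* Register an object, e.g. an array of n ints: register_obj(p, n * sizeof(int)). */
void register_obj(const void *p, size_t size) {
  if (nobj == MAXOBJ) return;  /* sketch assumes the table never fills */
  uintptr_t b = (uintptr_t)p;
  int i = nobj++;
  while (i > 0 && table[i - 1].base > b) {  /* insertion keeps it sorted */
    table[i] = table[i - 1];
    i--;
  }
  table[i] = (ObjRange){ b, size };
}

/* Find the registered object containing address q, or NULL. */
const ObjRange *lookup(const void *q) {
  uintptr_t a = (uintptr_t)q;
  int lo = 0, hi = nobj - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (a < table[mid].base) hi = mid - 1;
    else if (a >= table[mid].base + table[mid].size) lo = mid + 1;
    else return &table[mid];
  }
  return NULL;
}

/* A derived pointer r is legal only if it stays inside ref's object. */
bool check(const ObjRange *ref, const void *r) {
  uintptr_t a = (uintptr_t)r;
  return ref != NULL && a >= ref->base && a <= ref->base + ref->size;
}
```

Every pointer calculation pays for a search of one global structure holding all objects. This is the source of the slowdowns quoted above.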
36
Separate search tree per pool
ref = lookup(P1, q)
  • 3 key insights:
  • Splay tree for a pool should be (very) small.
  • In fact, 2-element cache works great!
  • Pool for each pointer is known!
  • In type-homogeneous pools, can distinguish (and
    ignore) scalars.

Check(ref, r)
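A per-pool variant with the 2-element cache might look like the following hypothetical C sketch. PoolTable, pool_register, and pool_lookup are illustrative names, and a linear scan stands in for the per-pool splay tree. Because the compiler already knows which pool a pointer belongs to, a lookup searches only that pool. Repeated accesses to the same one or two objects are answered from the cache without touching the table.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_MAXOBJ 64

typedef struct { uintptr_t base; size_t size; } ObjRange;

/* Hypothetical per-pool object table with a 2-entry cache. */
typedef struct {
  ObjRange objs[POOL_MAXOBJ];  /* objects allocated in this pool */
  int n;
  const ObjRange *cache[2];    /* two most recently matched objects */
  unsigned misses;             /* lookups that had to search the table */
} PoolTable;

void pool_table_init(PoolTable *P) {
  P->n = 0;
  P->cache[0] = P->cache[1] = NULL;
  P->misses = 0;
}

void pool_register(PoolTable *P, const void *p, size_t size) {
  if (P->n < POOL_MAXOBJ) P->objs[P->n++] = (ObjRange){ (uintptr_t)p, size };
}

static int contains(const ObjRange *o, uintptr_t a) {
  return o != NULL && a >= o->base && a < o->base + o->size;
}

const ObjRange *pool_lookup(PoolTable *P, const void *q) {
  uintptr_t a = (uintptr_t)q;
  if (contains(P->cache[0], a)) return P->cache[0];
  if (contains(P->cache[1], a)) return P->cache[1];
  P->misses++;
  /* a linear scan stands in for the per-pool splay tree */
  for (int i = 0; i < P->n; i++) {
    if (contains(&P->objs[i], a)) {
      P->cache[1] = P->cache[0];
      P->cache[0] = &P->objs[i];
      return &P->objs[i];
    }
  }
  return NULL;
}
```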
37
Experimental Results
  • Dramatic improvement in lookup overheads
  • Average overhead 12% for Olden (34% and 69% for 2
    cases)
  • < 4% for 2 system daemons
  • Compares with 5x-6x for original Jones-Kelly
  • Up to 11x-12x for Ruwase-Lam extension (which we
    use)
  • Effective in finding bugs
  • Zitser's suite models 14 buffer overruns in
    sendmail (7), wu-ftpd (4), bind (3)
  • All 14 detected successfully

Caveat: like J-K, doesn't work for casts from
pointers to int and back
38
  • Use 5 SAFECode
  • A Safe Compilation Strategy for C/C++ Programs
  • Sound analysis [Dhurjati and Adve, PLDI 2006,
    to appear]
  • Formal proof of soundness is in the accompanying
    technical report TR UIUCDCS-R-2005-2657.
  • Memory safety [Dhurjati et al., PLDI 2006; TECS
    2005]

39
Safe Languages Provide Basic Guarantees
e.g., Java, C#, Modula-3, ML
  • Prevent memory access violations
  • Detect errors during development
  • Enable sound compile-time analyses
  • e.g., in tools for safety checking, model
    checking, program verification

Often ignored
Weakly typed languages like C and C++ do not provide
any of these benefits
40
Why Care about C/C++?
  • Huge body of essential legacy software
  • Dominant in critical domains: OS kernels,
    embedded systems, daemons, language run-time
    systems
  • Example: Microsoft Longhorn (basis of Vista)
  • Less than 25% in C# [Amitabh Srivastava,
    CGO '04 keynote address]
  • Mostly high-level components, e.g., windowing
    system
  • Performance-critical code still in C/C++

The features that make C/C++ popular for system
software are the features that make C/C++
unsafe: nested structs, stack-allocated objects,
untagged unions, explicit free, custom
allocators.
41
Current Solutions
Solution            Overhead      No memory    Error     Sound static
                                  violations   checking  analysis
Pure C:
Purify, Valgrind    Several 100x  -            some      -
SafeC               5x            -            some      -
Jones-Kelly         5-6x          -            some      -
SFI                 Over 2x       Y            -         -
Fisher-Patil        2x-6x         Y            Y         -
Yong                Over 2x       -            some      -
SAFECode            0-30%         Y            some      Y
Modified C:
CCured              Up to 1.87x   Y            some      Y
Cyclone             1x-2x         Y            some      Y
42
SAFECode Compiler and Run-time System
  • A typed assembly language (LLVM)
  • Language-independent
  • Simple, transparent runtime system
  • Sound analysis and memory safety
  • Heap safety via Automatic Pool Allocation and
    run-time checks
  • Stack safety via Data Structure Analysis (DSA) and
    heap conversion
  • Array safety via pool checks or precise array
    bounds checks

Initially, for type-safe C with restricted
pointer casts [TECS 2005]. Now, for nearly
arbitrary, unmodified C programs [PLDI 2006].
43
Guaranteeing Static Analysis
  • Many program verification tools build on alias
    analysis, call graph, assumed type information
  • E.g., SLAM, ESP, BLAST
  • Memory errors can invalidate these analyses
  • Detecting all memory errors is expensive
  • Dangling pointer errors
  • Precise array bounds errors

Solution: enforce key analyses (alias analysis, call
graph, type information) in the presence of some
memory errors.
44
What Is Alias Analysis?
A static summary of memory objects and their
connectivity
    struct List *head = makeList(20);
    int P[4];
    P[i] = ...;
    struct List *Q = (struct List *)P;
    Q->val ...
TK = Type Known, TU = Type Unknown
45
Memory errors invalidate alias analysis
    struct List *tail, head;
    int B[4];                  // TU node
    head.field1 = tail;
    Tmp = (struct List *)B;
    Tmp->field6 = ...;         // could corrupt head.field1
  • head.field1 could point anywhere in memory
  • So the pointer analysis is incorrect
  • head.field1 could corrupt memory of another TK
    node

46
Enforcing Alias Analysis
  • Problem 1
  • Must ensure that tmp points to
    an object in this points-to
    set
  • With normal allocation
  • Objects are scattered in memory
  • Checking set membership at run-time is extremely
    expensive
  • Insight 1
  • Automatic Pool Allocation partitions heap
    corresponding to nodes in the graph. These
    partitions are compact and can be checked
    efficiently!

Caveat: currently only flow-insensitive,
unification-based
47
Enforcing Alias Analysis
  • Problem 2
  • Checking every pointer access or initialization
    is still very expensive
  • Insight 2
  • Ignoring memory errors, any pointer obtained
    from TK pool already has correct aliasing
    behavior.
  • Pointers obtained from other pools will be
    explicitly checked
  • Poolcheck(PP, p, align)
  • Mask lower k bits of p, look in hash table of
    page addresses in PP
  • Alignment check if array references in TK pool
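The following is a hedged C sketch of the Poolcheck(PP, p, align) test described above. It rests on several assumptions for illustration. Pages are 4 KiB, so k = 12. Each pool keeps a small, fixed-size open-addressing hash set of its page addresses. The alignment check requires p to sit at a multiple of align from the start of its page, which models objects of size align laid out from each page start.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages, so k = 12 */
#define NBUCKETS 64    /* assumption: small fixed table that never fills */

/* Hypothetical per-pool set of page base addresses; 0 marks an empty slot. */
typedef struct { uintptr_t pages[NBUCKETS]; } PoolPages;

static size_t bucket(uintptr_t page) {
  return (size_t)((page >> PAGE_SHIFT) % NBUCKETS);
}

void pool_add_page(PoolPages *pp, uintptr_t page) {
  size_t i = bucket(page);
  while (pp->pages[i] != 0 && pp->pages[i] != page) i = (i + 1) % NBUCKETS;
  pp->pages[i] = page;
}

static bool pool_has_page(const PoolPages *pp, uintptr_t page) {
  size_t i = bucket(page);
  for (size_t n = 0; n < NBUCKETS && pp->pages[i] != 0; n++) {
    if (pp->pages[i] == page) return true;
    i = (i + 1) % NBUCKETS;
  }
  return false;
}

/* p must lie on a page owned by pool PP and be suitably aligned. */
bool poolcheck(const PoolPages *PP, const void *p, size_t align) {
  uintptr_t a = (uintptr_t)p;
  uintptr_t page = a & ~(((uintptr_t)1 << PAGE_SHIFT) - 1);  /* mask lower k bits */
  if (!pool_has_page(PP, page)) return false;
  return (a - page) % align == 0;
}
```

The check costs one mask and a short hash probe, so it is much cheaper than searching an object tree.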

48
Tolerating Dangling Pointers
  • Problem 3
  • But memory errors (dangling pointer errors, array
    bounds violations) could corrupt locations in TK
    pools
  • Insight 3 (also used for type-safe C w/o GC)
  • Reallocating a freed block to a new request of
    the same type cannot cause any type violation or
    (in the same pool) aliasing violation, despite
    dangling pointers.
  • Only array references in TK pools must be
    checked (can optimize)
  • Poolcheck(PP, p, align)

49
Evaluation of Run-time Overhead
  • Programs: Olden, Ptrdist, 3 system daemons
  • No source changes necessary
  • Compared Olden results with CCured

Program    SAFECode ratio   CCured ratio
bh         1.03             1.31
bisort     1.00             0.97
em3d       1.27             1.49
treeadd    0.99             2.72
tsp        0.99             1.23
yacr2      1.30             -
ftpd       1.00             -
fingerd    1.03             -
Max        1.30             2.72
50
  • Summary

51
What Could You Do With Pool Allocation?
  • Embedded Systems
  • Pointer compression, data compression for
    embedded codes
  • Data partitioning for explicit local memories /
    buffers / tiles
  • Power savings for dead / dormant pools
  • Dependable Systems
  • Efficient checkpointing by ignoring unmodified
    pools
  • Efficient replicated execution for servers
  • Focusing instrumentation for program testing
  • High Performance Systems
  • Data-structure-centric profiling
  • Linked pointer prefetching

52
Summary
  • Automatic Pool Allocation
  • Gives compilers information about data structure
    layouts, lifetimes, points-to information
  • SAFECode
  • A sound execution strategy for C and C++
    programs: enables sound analysis, enforces memory
    safety

llvm.cs.uiuc.edu
53
llvm.cs.uiuc.edu