Secure Compiler Seminar 9/12 Survey on Design of Secure Low-Level Languages

1
Secure Compiler Seminar 9/12: Survey on Design of Secure Low-Level Languages
  • Toshihiro YOSHINO, Yonezawa Lab.
  • <tossy-2_at_yl.is.s.u-tokyo.ac.jp>

2
A Secure Low-Level Language is Needed
  • As the target language of a secure compiler
  • "Secure" means it has a method to prove properties of
    programs
  • Memory safety, control-flow safety, ...
  • For this, a concrete formal model of the language should be
    given
  • It should also be low-level
  • To reduce the complexity of the JIT compiler (in other
    words, the TCB)

[Figure: boxes labeled "Secure Compiler", "Verifier" and "JIT compilation"]
3
Existing Research
  • Two major approaches
  • TAL, PCC
  • Extensions to conventional assembly languages
  • Utilize a certain logic (such as type theories) to
    prove safety
  • Virtual Machines
  • Introduce intermediate languages of their own
  • Many of them adopt a safe-by-construction design
  • e.g. Java VM is semantically safe in memory
    operations
  • Java VM, Microsoft CIL, mvm [Franz et al.
    2003], Jinja [Klein et al. 2006], ADL [Yoshino
    2006], ...

4
Comparison: PCC vs. Java VM
  • PCC
  • (Extended) machine code + proof
  • Generated code and proof are machine-dependent
  • Requires one VC implementation for each
    architecture
  • Java VM
  • Verifier ensures type and control-flow safety
  • It often restricts optimizations
  • Leads to performance degradation
  • High cost to perform verification
  • Stack is nothing more than a set of untyped
    (variable-number) registers

[Figure: axis running from "Machine Independent" to "Machine Dependent"]
5
Limitations of Java Bytecode Verification
  • Initialization check is incomplete
  • Correlation with other variables is not taken
    into account
  • Example
  • class Test { int test(boolean b) { int i;
    try { if (b) return 1; i = 2; } finally {
    if (b) i = 3; } return i; } }
  • Incomplete common subexpression elimination
  • Cannot eliminate common subexpressions in address
    calculations (array refs)

6
Then What Should We Do?
  • Maximize machine-independent part
  • Avoids porting cost of the system
  • Tradeoff against the size of TCB (Trusted
    Computing Base)
  • But recent work on PCC and TAL (e.g.
    Foundational PCC) aims solely to minimize the TCB
  • Reduce proof size and generation cost
  • PCC requires much effort to produce a proof,
    because the target's level is very low
  • Registers and memory are untyped, etc.

7
mvm [Franz et al. 2003]
  • Aimed to find the semantic level that
  • Is effective at supporting proof-carrying code
  • Can also be translated efficiently into highly
    performing native code (on many platforms)
  • Separated design between the VM layer and the PCC layer

[1] M. Franz et al. A Portable Virtual Machine
Target for Proof-Carrying Code. IVME '03.
8
mvm Virtual Machine Design
  • Register-based architecture
  • The number of registers is not bounded
  • Registers are categorized by the type of
    values: Integer, Boolean, Pointer, Address
  • Pointer registers are used to store pointers to
    heap objects (more specifically, array heads)
  • Address registers are for storing results of
    address arithmetic
  • Bounds checks are not performed in the arithmetic, so
    they have to be done in a higher layer
  • Heap can be used to store objects
  • Heap model is explained next

9
mvm Virtual Machine Design
[Figure: the mvm virtual machine, showing Integer, Boolean, Pointer and Address register sets, a code area of labeled instructions (label1, instr, instr, label2), and the Heap]
10
mvm Heap Model
  • mvm heap consists of arrays of objects
  • Object representation in mvm
  • Each object is tagged
  • Tag can only be written with new operation and is
    immutable after creation
  • Two sections of data area: values and pointers
  • Integers, booleans are stored into the first
    section
  • Pointers are stored into the second section
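
The object representation above can be pictured with a small Python sketch. This is only an illustration under assumptions: the names HeapObject and new_array, the zero/None initial contents, and the choice to pass the section sizes explicitly are all hypothetical and are not taken from the mvm paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeapObject:
    """One tagged heap object with a value section and a pointer section."""
    _tag: int
    values: List[int] = field(default_factory=list)        # integers and booleans
    pointers: List[Optional[list]] = field(default_factory=list)  # references only

    @property
    def tag(self) -> int:
        # No setter is defined, so the tag cannot be changed after creation.
        return self._tag


def new_array(tag: int, count: int, size_v: int, size_p: int) -> List[HeapObject]:
    """Create an array of `count` objects that all carry the same tag (hypothetical helper)."""
    return [HeapObject(tag, [0] * size_v, [None] * size_p) for _ in range(count)]


pairs = new_array(tag=7, count=3, size_v=2, size_p=1)
others = new_array(tag=8, count=1, size_v=1, size_p=0)
pairs[0].values[0] = 42         # a value goes into the first section
pairs[0].pointers[0] = others   # a reference goes into the second section
```

Keeping values and pointers in separate sections means a garbage collector only has to scan the second section to find references.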

[Figure: example object holding the values 1 and 42]
11
mvm Heap Model
  • A type is associated with its tag value, layout
    and structure
  • This association is managed by compiler
  • Layout describes the sizes of data sections
  • Structure describes the possible substructure
    inside pointer section
  • Example of type information
    datatype T = Int of int | Pair of int * int
    T list
    ("|" means disjunction; "<>" means a tuple)
12
mvm Heap Model
  • Example of a T list object tree

[Figure: object tree with nodes labeled "T list", "T" and "int, int"]
13
mvm Instructions
  • Arithmetic and logical operations
  • Similar to many other languages :-)
  • Branch
  • Unconditional: goto label
  • Conditional: brtrue bi, label / brfalse bi, label
  • Condition must be taken from a boolean register
  • Jump is allowed only to a label
  • Conditional by object tag (RTTI): iftag

14
mvm Instructions
  • Object creation and access
  • pj = new(tag, ik)
  • Creates an array of ik objects with type tag
  • r = load(sizev, sizep tag, pk, offset)
  • store(sizev, sizep tag, pk, offset, r)
  • sizes and tag are used to check memory safety
  • Pointer registers and address registers
  • Object access also permits address registers ak
  • This distinction is for supporting garbage
    collection
  • Address registers always contain derived
    pointers

15
mvm Instructions
  • Accessing arrays
  • an = adda(sizev, sizep tag, pk, il)
  • Calculates the address of the il-th element in an
    array of type tag stored at pk
  • in = getlen(pk)
  • Guards
  • Bounds checking: CHECKLEN(pk, il)
  • Validity checking: CHECKNOTNULL(pk)
  • Type checking: CHECKTAG(pi, sizev, sizep tag)
  • These guards are inserted when static checking
    fails
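
As a rough illustration of what the three guards protect against, here is a minimal Python sketch of a fully guarded element load. It is not mvm code: arrays are modeled as Python lists of dicts, and the names GuardError, check_not_null, check_len, check_tag and guarded_load are hypothetical.

```python
class GuardError(Exception):
    """Raised when a dynamic guard fails (hypothetical name)."""


def check_not_null(p):
    # Plays the role of CHECKNOTNULL(pk).
    if p is None:
        raise GuardError("null pointer")


def check_len(p, i):
    # Plays the role of CHECKLEN(pk, il).
    if not 0 <= i < len(p):
        raise GuardError("index out of bounds")


def check_tag(obj, tag):
    # Plays the role of CHECKTAG: the object must carry the expected tag.
    if obj["tag"] != tag:
        raise GuardError("unexpected tag")


def guarded_load(p, i, tag, offset):
    """Load one value from the i-th element of array p with every guard enabled."""
    check_not_null(p)
    check_len(p, i)
    elem = p[i]              # stands in for adda: address of the i-th element
    check_tag(elem, tag)
    return elem["values"][offset]


array = [{"tag": 7, "values": [10, 20]}, {"tag": 7, "values": [30, 40]}]
second = guarded_load(array, 1, 7, 0)
```

In the design described on the slide, a guard like these would only be emitted where the static check could not establish the property; where the proof succeeds, the plain load is enough.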

16
An Example mvm Program
17
Type Safety in mvm Programs
  • Operations on primitives are all type-safe
  • Because registers to store values are distinct
  • Type-safety proofs are needed only for
    non-primitive operations
  • Pointers, arrays and records
  • For every pointer operation, check that the resulting
    pointer
  • Points to the beginning of an array, record or
    value
  • Points to an object of the correct type

18
Jinja [Klein et al. 2006]
  • A Java-like programming language built on
    Isabelle/HOL
  • Formal descriptions of the Jinja language, the Jinja VM
    and the compiler are given
  • Several properties were machine-checked
  • Big-step evaluation and small-step evaluation
    (atomic operations) are equivalent
  • Compiler correctness

[2] G. Klein, T. Nipkow. A Machine-Checked Model
for a Java-Like Language, Virtual Machine,
and Compiler. TOPLAS 28(4), 2006.
19
Jinja Language
  • Jinja is not Java
  • Object-oriented language with exceptions
  • A program is a set of class definitions, and a
    class consists of several fields and methods
  • Method body is an expression
  • Overriding is supported as in Java
  • But not overloading, because it is complicated
  • Language is statically typed
  • Type system ensures that the execution of a
    well-typed program never gets stuck

20
Jinja Language: Language Elements
  • Values
  • Boolean Bool b, Integer Intg i, Reference Addr a
  • Null reference Null, Dummy value Unit
  • Expressions
  • Val v, binary operations e1 op e2, Var V, V := e,
    e1; e2
  • Conditional if (e) e1 else e2, while (e) e,
    Block {V:T; e}
  • Object construction new C
  • Casting Cast C e
  • Field access e.F{D}, e.F{D} := e
  • D is an annotation added in preprocessing (e.g., by
    the typechecker)
  • Method call e.M(e, e, ...)
  • Exception throw e, try e1 catch(C V) e2

21
Jinja Language Semantics
  • Big step semantics
  • Typical operational semantics
  • State = <Heap, Local Variables>
  • Details omitted because there is nothing special
  • Small step semantics
  • Finer-grained semantics
  • One-step evaluation
  • Useful for formalizing parallelism (?)
  • Each (small) operation is considered atomic
  • Not discussed in the paper
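
To make the difference between the two styles concrete, here is a small Python sketch on a toy language of integer additions. It is far simpler than Jinja (no heap, no variables, no exceptions), and the tuple encoding and the names big_step, small_step and run_small are assumptions made for this illustration only.

```python
def big_step(e):
    """Big-step style: evaluate an expression to its final value in one judgment."""
    if isinstance(e, int):
        return e
    _, left, right = e
    return big_step(left) + big_step(right)


def small_step(e):
    """Small-step style: perform exactly one reduction on a non-value expression."""
    _, left, right = e
    if not isinstance(left, int):
        return ("add", small_step(left), right)    # reduce the left operand first
    if not isinstance(right, int):
        return ("add", left, small_step(right))    # then the right operand
    return left + right                            # both are values: add them


def run_small(e):
    """Iterate small steps until a value is reached; also count the steps."""
    steps = 0
    while not isinstance(e, int):
        e = small_step(e)
        steps += 1
    return e, steps


expr = ("add", ("add", 1, 2), ("add", 3, 4))
value_big = big_step(expr)
value_small, step_count = run_small(expr)
```

Both evaluators reach the same value for this expression; the small-step version additionally exposes each intermediate expression, which is the granularity the slide refers to.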

22
Jinja Language Semantics
  • Big-step and small-step semantics are proven to be
    equivalent
  • wwf-J-prog means weak well-formedness, which is
    defined by the following properties
  • Number of parameter types and of parameter names
    are equal
  • this is not included in parameter list
  • Free variables in the method body only refer to
    this or these parameters
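
The three conditions listed above are simple enough to check mechanically. The following Python sketch covers only those three conditions for a single method; the function name and the representation of a method (lists of parameter types and names plus a precomputed set of free variables) are hypothetical, and the actual Isabelle definition may contain further conditions.

```python
def weakly_well_formed(param_types, param_names, free_vars):
    """Check the three slide conditions for one method declaration."""
    same_count = len(param_types) == len(param_names)
    this_not_param = "this" not in param_names
    # The body may only mention 'this' and the declared parameters.
    body_closed = set(free_vars) <= set(param_names) | {"this"}
    return same_count and this_not_param and body_closed


ok = weakly_well_formed(["int", "bool"], ["x", "b"], {"x", "this"})
bad_count = weakly_well_formed(["int"], ["x", "y"], {"x"})
bad_this = weakly_well_formed(["int"], ["this"], set())
bad_free = weakly_well_formed(["int"], ["x"], {"x", "y"})
```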

23
Jinja VM
  • Similar to Java VM
  • Stack-based machine with heap
  • State = <addr option, heap, frame list>
  • First element is possibly a generated exception
  • Third element is a call stack
  • Frame = <stack, registers, cname, mname, pc>
  • where stack : value list, registers : value list
  • Evaluation of operands is done on the stack
  • Registers are for storing local variables

24
Jinja VM Instructions
  • Basic operations
  • Push v / Pop
  • Register operations: Load n / Store n
  • Arithmetic: IAdd, ...
  • Logical operations: CmpEq, ...
  • Object manipulation
  • Construction: New cname
  • Casting: Checkcast cname
  • Field access: Getfield vname cname / Putfield
    vname cname
  • Method invocation: Invoke mname n

25
Jinja VM Instructions
  • Control flow operation
  • Branching: Goto n / IfFalse n
  • n is a relative offset from the instruction
  • Exit from a method: Return
  • Exception
  • Throwing an exception: Throw
  • Information about exception handlers (try-catch)
    is attached to method declarations
  • Handler is retrieved from there when needed
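
A stack machine with relative branches of this kind can be sketched in a few lines of Python. The interpreter below is an assumption-laden toy: it handles a single frame, leaves out objects, method invocation and exceptions, and encodes instructions as (opcode, operand) pairs, which is not how the Isabelle model represents them.

```python
def run(code, registers):
    """Execute a list of (opcode, operand) pairs in one frame and return the result."""
    stack, pc = [], 0
    while True:
        op, arg = code[pc]
        if op == "Push":
            stack.append(arg)
            pc += 1
        elif op == "Pop":
            stack.pop()
            pc += 1
        elif op == "Load":
            stack.append(registers[arg])
            pc += 1
        elif op == "Store":
            registers[arg] = stack.pop()
            pc += 1
        elif op == "IAdd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == "CmpEq":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
            pc += 1
        elif op == "Goto":
            pc += arg                                  # offset relative to this instruction
        elif op == "IfFalse":
            pc += arg if stack.pop() is False else 1   # branch only on False
        elif op == "Return":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")


# if (r0 == 0) return 10 else return 20
code = [
    ("Load", 0),
    ("Push", 0),
    ("CmpEq", None),
    ("IfFalse", 3),      # skips the next two instructions when the test is False
    ("Push", 10),
    ("Return", None),
    ("Push", 20),
    ("Return", None),
]
then_result = run(code, [0])
else_result = run(code, [5])
```

Like the VM described on the slides, this interpreter performs no type checks of its own; it simply assumes that the operands on the stack have the right shape.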

26
Jinja VM Semantics
  • Please refer to the paper for details
  • Basically, straightforward and intuitive
  • At this level, there are no runtime checks
  • For example, IAdd (integer addition) does not
    check whether its arguments are really integers
  • If they are not, the result is unspecified
  • These kinds of checks are performed by a bytecode
    verifier

27
Jinja VM Bytecode Verification
  • JVM relies on the following assumptions
  • Types are correct
  • No overflow or underflow in stack
  • Code containment
  • Register initialization before use
  • Just the same as Java VM
  • Bytecode verifier statically ensures these
    assumptions

28
Jinja VM Bytecode Verification
  • Abstract interpretation
  • Instead of values, consider only types
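
The idea of running the program on types instead of values can be sketched as follows. This Python fragment handles straight-line code only; a real bytecode verifier also has to merge type states at branch targets with a dataflow analysis. The function name verify, the string type names and the use of None for an uninitialized register are assumptions for this illustration.

```python
def verify(code, register_types, max_stack):
    """Abstractly execute straight-line code on types; return the final type stack."""
    stack = []
    regs = list(register_types)          # None marks an uninitialized register
    for op, arg in code:
        if op == "Push":
            stack.append(type(arg).__name__)    # 'int' or 'bool'
        elif op == "Load":
            if regs[arg] is None:
                raise TypeError("register used before initialization")
            stack.append(regs[arg])
        elif op == "Store":
            if not stack:
                raise TypeError("stack underflow")
            regs[arg] = stack.pop()
        elif op == "IAdd":
            if len(stack) < 2:
                raise TypeError("stack underflow")
            if stack.pop() != "int" or stack.pop() != "int":
                raise TypeError("IAdd expects two integers")
            stack.append("int")
        else:
            raise ValueError(f"unsupported instruction {op}")
        if len(stack) > max_stack:
            raise TypeError("stack overflow")
    return stack


good = verify([("Push", 1), ("Push", 2), ("IAdd", None)], [], max_stack=2)
```

The checks in this sketch correspond to three of the assumptions listed on the previous slide: correct types, no stack overflow or underflow, and register initialization before use.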

[Figure: "State" abstracted to "State Type"]
29
Jinja Compiler and its Correctness
  • Two-stage compilation
  • Map parameter names to register indices
  • Assign local variables to registers
  • Gather variable occurrences and use them for lookup
  • Code generation
  • expression → instruction list (compE2)
  • Straightforward definition
  • Exception table generation (compEx2)
  • Separated from compE2, because the exception table
    must contain global addresses
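
The two stages can be imitated on a tiny expression language. The Python sketch below is hypothetical throughout: registers_for and comp_e are illustrative names (comp_e only loosely mirrors the role of compE2), the register layout with 'this' in register 0 is an assumption, and only values, variables and addition are covered.

```python
def registers_for(param_names):
    """Stage 1 (sketch): map names to register indices, assuming 'this' is register 0."""
    return {name: i for i, name in enumerate(["this"] + list(param_names))}


def comp_e(e, reg_of):
    """Stage 2 (sketch): translate an expression tree into a flat instruction list."""
    kind = e[0]
    if kind == "Val":
        return [("Push", e[1])]
    if kind == "Var":
        return [("Load", reg_of[e[1]])]
    if kind == "Add":
        # Code for both operands, then the instruction that combines them.
        return comp_e(e[1], reg_of) + comp_e(e[2], reg_of) + [("IAdd", None)]
    raise ValueError(f"unsupported expression {kind}")


reg_of = registers_for(["x", "y"])
code = comp_e(("Add", ("Var", "x"), ("Add", ("Var", "y"), ("Val", 1))), reg_of)
```

Because the generated code is just the concatenation of the code for the subexpressions, the translation stays compositional, which is part of what makes a correctness proof by induction over expressions feasible.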

30
Jinja Compiler and its Correctness
  • Correctness of compilation
  • If a program is weakly well-formed, then

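[Figure: correctness diagram relating the evaluation of the Jinja program (from Heap, Vars to Heap, Vars) to the execution of the compiled JVM bytecode (from Heap, Frame to Heap, ...)]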
31
Implementation of Jinja
  • http://afp.sourceforge.net/entries/Jinja.shtml
  • About 20kLoC in Isabelle/HOL
  • Over 1,000 theorems are defined
  • It takes about 25 min. to process these proofs on
    a 3GHz Pentium 4 machine with 1GB RAM

32
Summary of Today's Talk
  • We need a secure low-level language as the
    target of a secure compiler
  • Minimize machine-dependent part to reduce
    implementation cost
  • Also reduce cost for proof generation
  • To answer this, surveyed two VM projects
  • mvm
  • Aimed to find the sweet spot that reconciles
    high performance and small type-safety proofs
  • Jinja
  • Constructed a unified formal model of a Java-like
    language, the underlying VM and compiler
  • In contrast to mvm, this research is oriented
    toward the properties of higher-level languages and
    compilers

33
How about ADL [Yoshino 2006]?
  • The position of ADL is close to mvm
  • To provide a common basis of implementing
    verifier for low-level languages
  • Assumed translation direction is opposite
  • mvm is an intermediate code of compilation
  • ADL is designed to simulate real machines

[Figure: diagram relating "JVM, mvm", "Machine Code", "ADL" and "Secure L3"]
34
How about ADL [Yoshino 2006]?
  • ADL takes a minimalist approach
  • Only 7 kinds of commands
  • Instead, an expression-based design allows complex
    formulae to be written easily
  • Can ADL be used as an intermediate language?
  • Probably some modification is needed
  • Register allocation is done, except for
    variables
  • Minimalist design, however, may increase
    complexity in constructing a verification logic
  • Abstract interpretation is often not sufficient,
    so a verification logic may want to calculate
    exact values

35
More References
  • LLVM Project [Lattner 2000]
  • http://www.llvm.org/
  • Uses a VM for interprocedural optimization
  • SafeTSA [Amme et al. 2001]
  • SSA-based language for mobile code security
  • Dis virtual machine [Winterbottom et al. 1997]
  • Omniware system [Adl-Tabatabai et al. 1996]

36
(Typical) Compiler Construction and Several Intermediate Languages
[Figure: compiler pipeline with the stages Lexing/Parsing, Type Checking, Normalize (SSA, etc.), Optimize, Intermediate Code Generation, Register Allocation, Target Code Generation and Pretty Printing; the intermediate languages placed along it are LLVM, SafeTSA; Java, CIL, Jinja (VM); TAL, PCC; mvm; ADL]