Title: Secure Compiler Seminar 9/12 Survey on Design of Secure Low-Level Languages
1. Secure Compiler Seminar 9/12: Survey on Design of Secure Low-Level Languages
- Toshihiro YOSHINO, Yonezawa Lab.
- <tossy-2_at_yl.is.s.u-tokyo.ac.jp>
2. A Secure Low-Level Language is Needed
- As the target language of a secure compiler
  - "Secure" means it has a method to prove properties of programs
    - Memory safety, control-flow safety, ...
  - For this, its concrete formal model should be given
- It should also be low-level
  - To reduce the complexity of the JIT compiler (in other words, of the TCB)
[Figure: Secure Compiler, Verifier, JIT compilation]
3. Existing Research
- Two major approaches
- TAL, PCC
  - Extensions to conventional assembly languages
  - Use a certain logic (such as a type theory) to prove safety
- Virtual machines
  - Introduce intermediate languages of their own
  - Many of them adopt a safe-by-construction design
    - e.g. Java VM is semantically safe in memory operations
  - Java VM, Microsoft CIL, mvm [Franz et al. 2003], Jinja [Klein et al. 2006], ADL [Yoshino 2006], ...
4. Comparison: PCC vs. Java VM
- PCC
  - (Extended) machine code + proof
  - Generated code and proof are machine-dependent
  - Requires one VC implementation for each architecture
- Java VM
  - Verifier ensures type and control flow safety
    - It often restricts optimizations
    - Leads to performance degradation
  - High cost to perform verification
    - Stack is nothing more than a set of untyped (variable-number) registers
[Figure: axis labels Machine Independent and Machine Dependent]
5. Limitations of Java Bytecode Verification
- Initialization check is incomplete
  - Correlation with other variables is not taken into account
  - Example:
      class Test {
        int test(boolean b) {
          int i;
          try {
            if (b) return 1;
            i = 2;
          } finally {
            if (b) i = 3;
          }
          return i;
        }
      }
- Incomplete common subexpression elimination
  - Cannot eliminate common subexpressions in address calculations (array refs)
6. Then What Should We Do?
- Maximize the machine-independent part
  - Avoids the porting cost of the system
  - Trade-off against the size of the TCB (Trusted Computing Base)
  - But recent works in PCC and TAL (e.g. Foundational PCC) aim solely to minimize the TCB
- Reduce proof size and generation cost
  - PCC requires much effort to produce a proof, because the target's level is very low
  - Registers and memory are untyped, etc.
7. mvm [Franz et al. 2003]
- Aimed to find the semantic level that
  - Is effective at supporting proof-carrying code
  - Can also be translated efficiently into highly performing native code (on many platforms)
- Separated design between the VM layer and the PCC layer
[1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code. IVME '03.
8. mvm Virtual Machine Design
- Register-based architecture
  - The number of registers is not bounded
  - Registers are categorized by the type of values: Integer, Boolean, Pointer, Address
  - Pointer registers are used to store pointers to heap objects (more specifically, array heads)
  - Address registers are for storing results of address arithmetic
  - Bounds checking is not performed in arithmetic, so it has to be done in a higher layer
- Heap can be used to store objects
  - Heap model is explained next
9. mvm Virtual Machine Design
[Figure: the mvm virtual machine, showing the Integer, Boolean, Pointer and Address register sets, a labeled instruction sequence (label1, instr, instr, label2) and the Heap]
10. mvm Heap Model
- The mvm heap consists of arrays of objects
- Object representation in mvm
  - Each object is tagged
    - The tag can only be written by the new operation and is immutable after creation
  - Two sections of data area: values and pointers
    - Integers and booleans are stored in the first section
    - Pointers are stored in the second section
[Figure: example object; surviving labels: 1, 42]
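The object layout described on this slide can be pictured with a minimal Python sketch. The names HeapObject, values, pointers and new_array are hypothetical and do not come from the mvm paper, and passing the section sizes explicitly is an assumption made for brevity (in mvm the layout is associated with the tag).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeapObject:
    """Hypothetical model of one mvm heap object (illustrative names only)."""
    tag: int                                                       # written once, at creation
    values: List[int] = field(default_factory=list)                # first section: integers, booleans
    pointers: List[Optional[list]] = field(default_factory=list)   # second section: pointers to arrays

    def __setattr__(self, name, value):
        # Reject any attempt to overwrite the tag after it has been set.
        if name == "tag" and "tag" in self.__dict__:
            raise AttributeError("tag is immutable after creation")
        super().__setattr__(name, value)


def new_array(tag, count, size_v, size_p):
    """Sketch of `new`: an array of `count` objects that all carry `tag`."""
    return [HeapObject(tag, [0] * size_v, [None] * size_p) for _ in range(count)]


arr = new_array(tag=1, count=2, size_v=1, size_p=1)
arr[0].values[0] = 42   # writing into the value section is allowed
```

Keeping values and pointers in separate lists mirrors the two data sections; the tag check in __setattr__ is only one possible way to model immutability.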
11. mvm Heap Model
- A type is associated with its tag value, layout and structure
  - This association is managed by the compiler
  - Layout describes the sizes of the data sections
  - Structure describes the possible substructure inside the pointer section
- Example of type information
  - datatype T = Int of int | Pair of int * int
  - T list
  - Notation: | means disjunction, <> means a tuple
12. mvm Heap Model
- Example of a T list object tree
[Figure: object tree; node labels: int, int, T, T list]
13. mvm Instructions
- Arithmetic and logical calculation
  - Similar to many other languages :-)
- Branch
  - Unconditional: goto label
  - Conditional: brtrue bi, label / brfalse bi, label
    - Condition must be taken from a boolean register
  - Jump is allowed only to a label
  - Conditional by object tag (RTTI): iftag
14. mvm Instructions
- Object creation and access
  - pj = new(tag, ik)
    - Creates an array of ik objects with type tag
  - r = load(sizev, sizep tag, pk, offset)
  - store(sizev, sizep tag, pk, offset, r)
    - The sizes and tag are used to check memory safety
- Pointer registers and address registers
  - Object access also permits address registers ak
  - This distinction is for supporting garbage collection
    - Address registers always contain derived pointers
15. mvm Instructions
- Accessing arrays
  - an = adda(sizev, sizep tag, pk, il)
    - Calculates the address of the il-th element in an array of type tag stored at pk
  - in = getlen(pk)
- Guards
  - Bounds checking: CHECKLEN(pk, il)
  - Validity checking: CHECKNOTNULL(pk)
  - Type checking: CHECKTAG(pi, sizev, sizep tag)
  - These guards are inserted when static checking fails
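As a rough illustration of what the three guards protect against, here is a minimal Python sketch. The dictionary representation of an array, the GuardError exception and the function names are all assumptions for illustration; the sketch also drops the size arguments of CHECKTAG and says nothing about how mvm actually reports a failed guard.

```python
class GuardError(Exception):
    """Raised when a run-time guard fails (hypothetical failure behaviour)."""


def check_not_null(p):
    # Counterpart of CHECKNOTNULL(pk): the pointer must refer to something.
    if p is None:
        raise GuardError("null pointer")


def check_tag(p, tag):
    # Counterpart of CHECKTAG, simplified to a tag comparison only.
    if p["tag"] != tag:
        raise GuardError("unexpected tag")


def check_len(p, i):
    # Counterpart of CHECKLEN(pk, il): the index must lie inside the array.
    if not 0 <= i < len(p["elems"]):
        raise GuardError("index out of bounds")


def guarded_load(p, tag, i):
    """Array read protected by all three guards."""
    check_not_null(p)
    check_tag(p, tag)
    check_len(p, i)
    return p["elems"][i]


array = {"tag": 7, "elems": [10, 20, 30]}
element = guarded_load(array, 7, 1)
```

When a static check already establishes one of these facts, the corresponding call could be omitted, which is the situation the last bullet describes.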
16. An Example mvm Program
17. Type Safety in mvm Programs
- Operations on primitives are all type-safe
  - Because the registers that store values of each type are distinct
- Type-safety proofs are needed only for non-primitive operations
  - Pointers, arrays and records
  - For every pointer operation, check that the result pointer
    - Points to the beginning of an array, record or value
    - Points to an object of the correct type
18. Jinja [Klein et al. 2006]
- A Java-like programming language built on Isabelle/HOL
  - Formal descriptions of the Jinja language, the Jinja VM and the compiler are given
- Several properties were machine-checked
  - Big-step evaluation and small-step evaluation (atomic operations) are equivalent
  - Compiler correctness
[2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler. TOPLAS 28(4), 2006.
19. Jinja Language
- Jinja is not Java
- Object-oriented language with exceptions
  - A program is a set of class definitions, and a class consists of several fields and methods
  - A method body is an expression
  - Overriding is supported as in Java
    - But not overloading, because it is complicated
- The language is statically typed
  - The type system ensures that the execution of a well-typed program never gets stuck
20. Jinja Language: Language Elements
- Values
  - Boolean Bool b, Integer Intg i, Reference Addr a
  - Null reference Null, Dummy value Unit
- Expressions
  - Val v, binary operations e1 op e2, Var V, V := e, e1; e2
  - Conditional if (e) e1 else e2, while (e) e, Block {V:T; e}
  - Object construction new C
  - Casting Cast C e
  - Field access e.F{D}, e.F{D} := e
    - D is an annotation added in preprocessing (e.g., by the typechecker)
  - Method call e.M(e, e, ...)
  - Exception throw e, try e1 catch(C V) e2
21. Jinja Language: Semantics
- Big-step semantics
  - Typical operational semantics
  - State = <Heap, Local Variables>
  - Details omitted because there is nothing special
- Small-step semantics
  - Finer-grained semantics
  - One-step evaluation
  - Useful for formalizing parallelism (?)
    - Each (small) operation is considered atomic
    - Not discussed in the paper
22. Jinja Language: Semantics
- Big-step and small-step semantics are proven to be equivalent
- wwf-J-prog means weak well-formedness, which is defined by the following properties
  - The number of parameter types and the number of parameter names are equal
  - this is not included in the parameter list
  - Free variables in the method body only refer to this or these parameters
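The three conditions listed for weak well-formedness can be checked mechanically. The following Python sketch applies them to a single method; the tuple encoding of expressions, the tiny expression subset (values, variables, binary operations) and the function names are assumptions for illustration, not the Isabelle/HOL definitions.

```python
def free_vars(expr):
    """Free variables of an expression in a tiny, hypothetical tuple encoding."""
    kind = expr[0]
    if kind == "val":        # ("val", v)
        return set()
    if kind == "var":        # ("var", name)
        return {expr[1]}
    if kind == "binop":      # ("binop", op, e1, e2)
        return free_vars(expr[2]) | free_vars(expr[3])
    raise ValueError(f"unknown expression kind: {kind}")


def weakly_well_formed(param_types, param_names, body):
    """Check the three listed conditions for one method."""
    same_length = len(param_types) == len(param_names)
    no_this_param = "this" not in param_names
    closed = free_vars(body) <= set(param_names) | {"this"}
    return same_length and no_this_param and closed


body = ("binop", "+", ("var", "x"), ("val", 1))
ok = weakly_well_formed(["int"], ["x"], body)
```

A whole program would be checked by applying such a test to every method of every class.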
23. Jinja VM
- Similar to Java VM
  - Stack-based machine with heap
- State = <addr option, heap, frame list>
  - The first element is a possibly generated exception
  - The third element is a call stack
- Frame = <stack, registers, cname, mname, pc>
  - where stack = value list, registers = value list
  - Evaluation of operands is done on the stack
  - Registers are for storing local variables
24. Jinja VM: Instructions
- Basic operations
  - Push v / Pop
  - Register operations: Load n / Store n
  - Arithmetic: IAdd, ...
  - Logical operations: CmpEq, ...
- Object manipulation
  - Construction: New cname
  - Casting: Checkcast cname
  - Field access: Getfield vname cname / Putfield vname cname
  - Method invocation: Invoke mname n
25. Jinja VM: Instructions
- Control flow operations
  - Branching: Goto n / IfFalse n
    - n is a relative offset from the instruction
  - Exit from a method: Return
- Exceptions
  - Throwing an exception: Throw
  - Information about exception handlers (try-catch) is attached to method declarations
    - The handler is retrieved from there when needed
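To make the stack discipline and the relative branch offsets concrete, here is a minimal Python sketch of an interpreter for a few of the instructions named above, run on a single frame. The tuple encoding of instructions and the function name run are assumptions; objects, method invocation and exceptions are left out.

```python
def run(code, registers):
    """Execute a list of instruction tuples on one frame and return the result."""
    stack, pc = [], 0
    while True:
        op, *args = code[pc]
        if op == "Push":
            stack.append(args[0]); pc += 1
        elif op == "Pop":
            stack.pop(); pc += 1
        elif op == "Load":                 # register -> stack
            stack.append(registers[args[0]]); pc += 1
        elif op == "Store":                # stack -> register
            registers[args[0]] = stack.pop(); pc += 1
        elif op == "IAdd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b); pc += 1
        elif op == "CmpEq":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b); pc += 1
        elif op == "Goto":                 # offset is relative to this instruction
            pc += args[0]
        elif op == "IfFalse":              # branch only when the popped value is False
            pc += args[0] if stack.pop() is False else 1
        elif op == "Return":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")


# if (r0 == 0) return 100 else return r0 + 1
code = [
    ("Load", 0), ("Push", 0), ("CmpEq",),
    ("IfFalse", 3),                        # index 3: jumps to index 6
    ("Push", 100), ("Return",),
    ("Load", 0), ("Push", 1), ("IAdd",), ("Return",),
]
```

Calling run(code, [0]) takes the fall-through path and run(code, [5]) takes the branch; the interpreter itself performs no type checks on the operands.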
26. Jinja VM: Semantics
- Please refer to the paper for details
  - Basically, straightforward and intuitive
- At this level, there are no runtime checks
  - For example, IAdd (integer addition) does not check whether its arguments are really integers
    - If they are not, the result is unspecified
  - This kind of check is performed by a bytecode verifier
27. Jinja VM: Bytecode Verification
- The Jinja VM relies on the following assumptions
  - Types are correct
  - No overflow or underflow in the stack
  - Code containment
  - Register initialization before use
  - Just the same as Java VM
- The bytecode verifier statically ensures these assumptions
28. Jinja VM: Bytecode Verification
- Abstract interpretation
  - Instead of values, consider only types
[Figure: State abstracted to State Type]
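The idea of executing on types instead of values can be sketched in Python for straight-line code. This is only an illustration under strong assumptions: the instruction encoding, the type names and VerifyError are hypothetical, and branches are not handled, so there is none of the merging of state types at join points that a real bytecode verifier needs.

```python
class VerifyError(Exception):
    """Raised when the abstract execution detects a violated assumption."""


def type_of(value):
    if isinstance(value, bool):            # test bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    raise VerifyError("unsupported constant")


def verify(code, reg_types, max_stack):
    """Run the code on a state type (stack types, register types)."""
    stack, regs = [], list(reg_types)      # None marks an uninitialized register
    for op, *args in code:
        if op == "Push":
            stack.append(type_of(args[0]))
        elif op == "Load":
            if regs[args[0]] is None:
                raise VerifyError("register used before initialization")
            stack.append(regs[args[0]])
        elif op == "Store":
            if not stack:
                raise VerifyError("stack underflow")
            regs[args[0]] = stack.pop()
        elif op in ("IAdd", "CmpEq"):
            if len(stack) < 2:
                raise VerifyError("stack underflow")
            b, a = stack.pop(), stack.pop()
            if op == "IAdd" and (a, b) != ("Integer", "Integer"):
                raise VerifyError("IAdd expects two integers")
            stack.append("Integer" if op == "IAdd" else "Boolean")
        elif op == "Return":
            if not stack:
                raise VerifyError("stack underflow")
            return stack.pop()             # type of the returned value
        else:
            raise VerifyError(f"unsupported instruction: {op}")
        if len(stack) > max_stack:
            raise VerifyError("stack overflow")
    raise VerifyError("execution falls off the end of the code")


result_type = verify([("Load", 0), ("Push", 1), ("IAdd",), ("Return",)], ["Integer"], 2)
```

Each error case corresponds to one of the assumptions a verifier has to establish: correct types, no stack overflow or underflow, code containment, and register initialization before use.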
29. Jinja Compiler and its Correctness
- Two-stage compilation
- Map parameter names to register indices
  - Assign local variables to registers
  - Gather variable occurrences and use them for lookup
- Code generation
  - expression -> instruction list (compE2)
    - Straightforward definition
  - Exception table generation (compEx2)
    - Separated from compE2, because the exception table must contain global addresses
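The two stages can be illustrated with a minimal Python sketch on a tiny expression subset. The tuple encoding, the function names resolve and compile_expr, and the convention that this occupies register 0 are assumptions for illustration; this is not the compE2 definition from the paper, and exception tables are omitted.

```python
def resolve(expr, names):
    """Stage 1: replace variable names by register indices."""
    kind = expr[0]
    if kind == "val":
        return expr
    if kind == "var":
        return ("var", names.index(expr[1]))
    if kind in ("add", "if"):
        return (kind,) + tuple(resolve(e, names) for e in expr[1:])
    raise ValueError(f"unknown expression kind: {kind}")


def compile_expr(expr):
    """Stage 2: translate a resolved expression into an instruction list."""
    kind = expr[0]
    if kind == "val":
        return [("Push", expr[1])]
    if kind == "var":
        return [("Load", expr[1])]
    if kind == "add":
        return compile_expr(expr[1]) + compile_expr(expr[2]) + [("IAdd",)]
    if kind == "if":
        cond, then, els = (compile_expr(e) for e in expr[1:])
        # Offsets are relative to the branch instruction itself.
        return (cond + [("IfFalse", len(then) + 2)]
                + then + [("Goto", len(els) + 1)] + els)
    raise ValueError(f"unknown expression kind: {kind}")


names = ["this", "x"]                      # assumed layout: register 0 holds this
source = ("add", ("var", "x"), ("val", 1))
code = compile_expr(resolve(source, names))
```

Because branch targets are relative, the code for a subexpression can be placed anywhere in the instruction list, which keeps the definition compositional; absolute addresses are only needed for something like an exception table.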
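30. Jinja Compiler and its Correctness
- Correctness of compilation
  - If a program is weakly well-formed, then the relation shown in the figure holds
[Figure: diagram with labels Jinja program, JVM bytecode and compilation; states <Heap, Vars> on the Jinja side and <Heap, Frame> on the JVM side]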
31. Implementation of Jinja
- http://afp.sourceforge.net/entries/Jinja.shtml
- About 20 kLoC in Isabelle/HOL
- Over 1,000 theorems are defined
- It takes about 25 min. to process these proofs on a 3 GHz Pentium 4 machine with 1 GB RAM
32. Summary of Today's Talk
- We need a secure low-level language as the target of a secure compiler
  - Minimize the machine-dependent part to reduce implementation cost
  - Also reduce the cost of proof generation
- To answer this, surveyed two VM projects
  - mvm
    - Aimed to find the sweet spot that reconciles high performance and small type-safety proofs
  - Jinja
    - Constructed a unified formal model of a Java-like language, the underlying VM and the compiler
    - In contrast to mvm, this research is oriented toward higher-level languages and compilers' properties
33. How about ADL [Yoshino 2006]?
- The position of ADL is close to mvm
  - To provide a common basis for implementing verifiers for low-level languages
- The assumed translation direction is opposite
  - mvm is an intermediate code of compilation
  - ADL is designed to simulate real machines
[Figure: labels JVM, mvm; Machine Code; ADL; Secure L3]
34. How about ADL [Yoshino 2006]?
- ADL takes a minimalist approach
  - Only 7 kinds of commands
  - Instead, an expression-based design allows complex formulae to be written easily
- Can ADL be used as an intermediate language?
  - Probably some modification is needed
    - Register allocation is done, but except for variables
  - Minimalist design, however, may increase complexity in constructing a verification logic
    - Abstract interpretation is often not sufficient, so a verification logic may want to calculate exact values
35. More References
- LLVM Project [Lattner 2000]
  - http://www.llvm.org/
  - Uses a VM for interprocedural optimization
- SafeTSA [Amme et al. 2001]
  - SSA-based language for mobile code security
- Dis virtual machine [Winterbottom et al. 1997]
- Omniware system [Adl-Tabatabai et al. 1996]
36. (Typical) Compiler Construction and Several Intermediate Languages
[Figure: compiler pipeline with stages Lexing / Parsing, Type Checking, Normalize (SSA, etc.), Optimize, Intermediate Code Generation, Register Allocation, Target Code Generation and Pretty Printing, annotated with intermediate languages: LLVM, SafeTSA; Java, CIL, Jinja (VM); TAL, PCC; mvm; ADL]