Title: Practical Data Confinement
Andrey Ermolinskiy, Sachin Katti, Scott Shenker, Lisa Fowler, Murphy McCauley
Introduction
- Controlling the flow of sensitive information is one of the central challenges in managing an organization
  - Preventing exfiltration (theft) by malicious entities
  - Enforcing dissemination policies
Why is it so hard to secure sensitive data?
- Modern software is rife with security holes that can be exploited for exfiltration
- Users must be trusted to remember, understand, and obey dissemination restrictions
- In practice, users are careless and often inadvertently allow data to leak
  - E-mail sensitive documents to the wrong parties
  - Transfer data to insecure machines and portable devices
Our Goal
- Develop a practical data confinement solution
- Key requirement: compatibility with existing infrastructure and patterns of use
  - Support current operating systems, applications, and means of communication
    - Office productivity apps: word processing, spreadsheets, …
    - Communication: e-mail, IM, VoIP, FTP, DFS, …
- Avoid imposing restrictions on user behavior
  - Allow access to untrusted Internet sites
  - Permit users to download and install untrusted applications
Our Assumptions and Threat Model
- Users
  - Benign, do not intentionally exfiltrate data
  - Make mistakes, inadvertently violate policies
- Software platform (productivity applications and OS)
  - Non-malicious, does not exfiltrate data in its pristine state
  - Vulnerable to attacks if exposed to external threats
- Attackers
  - Malicious external entities seeking to exfiltrate sensitive data
  - Penetrate security barriers by exploiting vulnerabilities in the software platform
Central Design Decisions
- Policy enforcement responsibilities
  - Cannot rely on human users
  - The system must track the flow of sensitive information and enforce restrictions when the data is externalized
- Granularity of information flow tracking (IFT)
  - Need fine-grained, byte-level tracking and policy enforcement to prevent accidental partial exfiltrations
Central Design Decisions
- Placement of functionality
  - PDC inserts a thin software layer (hypervisor) between the OS and hardware
  - The hypervisor implements byte-level IFT and policy enforcement
  - A hypervisor-level solution
    - Retains compatibility with existing OSes and applications
    - Has sufficient control over hardware
- Resolving the tension between safety and user freedom
  - Partition the application environment into two isolated components: a Safe world and a Free world
Partitioning the User Environment
[Diagram: a Safe VM (access to sensitive data) and an Unsafe VM (unrestricted communication and execution of untrusted code) run side by side on the hypervisor, which performs IFT and policy enforcement; below sits the hardware (CPU, memory, disk, NIC, USB, printer, …).]
Partitioning the User Environment
[Diagram legend: sensitive data; non-sensitive data; trusted code/data; untrusted (potentially malicious) code/data; exposure to the threat of exfiltration.]
PDC Use Cases
- Logical air gaps for high-security environments
  - VM-level isolation obviates the need for multiple physical networks
- Preventing information leakage via e-mail
  - "Do not disseminate the attached document"
- Digital rights management
  - Keeping track of copies; document self-destruct
- Auto-redaction of sensitive content
Talk Outline
- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
- Preliminary Performance Evaluation
- Current Status and Future Work
PDC Architecture: Hypervisor
- PDC uses an augmented hypervisor to
  - Ensure isolation between safe and unsafe VMs
  - Track the propagation of sensitive data in the safe VM
  - Enforce security policy at exit points
    - Network I/O, removable storage, printer, etc.
PDC Architecture: Tag Tracking in the Safe VM
- PDC associates an opaque 32-bit sensitivity tag with each byte of virtual hardware state
  - User-accessible CPU registers
  - Volatile memory
  - Files on disk
PDC Architecture: Tag Tracking in the Safe VM
- These tags are viewed as opaque identifiers
  - The semantics can be tailored to fit the specific needs of administrators/users
- Tags can be used to specify
  - Security policies
  - Levels of security clearance
  - High-level data objects
  - High-level data types within an object
PDC Architecture: Tag Tracking in the Safe VM
- An augmented x86 emulator performs fine-grained instruction-level tag tracking (the current implementation is based on QEMU)
- PDC tracks explicit data flows (variable assignments, arithmetic operations)
  - Example: add eax, ebx merges the tag of ebx into eax
PDC Architecture: Tag Tracking in the Safe VM
- An augmented x86 emulator performs fine-grained instruction-level tag tracking (the current implementation is based on QEMU)
- PDC also tracks flows resulting from pointer dereferencing
  - Example: mov eax, (ebx) merges the tags of the referenced memory bytes and of the pointer register ebx into eax
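The two propagation rules above can be sketched as shadow "tag" state updated alongside each emulated instruction. This is a minimal illustration, not the PDC code: the register/memory tables and the merge-as-set-union policy are assumptions for the example.

```python
regs = {"eax": set(), "ebx": set()}   # tag sets per emulated register
mem_tags = {}                         # machine address -> tag set

def merge(a, b):
    return a | b

def add_reg_reg(dst, src):
    # add eax, ebx: the destination tag becomes the merge of both operands
    regs[dst] = merge(regs[dst], regs[src])

def load_indirect(dst, ptr_reg, addr):
    # mov eax, (ebx): merge the tags of the referenced memory location
    # and of the pointer register itself into the destination
    regs[dst] = merge(mem_tags.get(addr, set()), regs[ptr_reg])

mem_tags[0x1000] = {42}               # a sensitive value lives at 0x1000
load_indirect("eax", "ebx", 0x1000)   # eax picks up tag 42
add_reg_reg("ebx", "eax")             # the tag flows onward into ebx
print(regs["eax"], regs["ebx"])       # both now carry tag 42
```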
Challenges
- Tag storage overhead in memory and on disk
  - A naïve implementation would incur a 400% overhead (a 32-bit tag for every byte means 4 bytes of metadata per byte of data)
- Computational overhead of online tag tracking
- Tag explosion
  - Tag tracking across pointer dereferences exacerbates the problem
- Tag erosion due to implicit flows
- Bridging the semantic gap between application data units and low-level machine state
- Impact of VM-level isolation on user experience
Talk Outline
- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
- Storing sensitivity tags in memory and on disk
- Fine-grained tag tracking in QEMU
- On-demand emulation
- Policy enforcement
- Performance Evaluation
- Current Status and Future Work
PDC Implementation: The Big Picture
[Architecture diagram: Dom0 hosts QEMU with the tag tracker (holding PageTag descriptors and the emulated safe VM), a network daemon attached to the NIC, and an NFS server backed by PDC-ext3. The safe VM runs applications (App1, App2) over VFS and an NFS client, and communicates with Dom0 via Xen-RPC, an event channel, and a shared ring buffer. PDC-Xen (ring 0) maintains the shadow page tables, the safe VM page tables, and the PageTag mask; the CPU's CR3 points at the active page tables.]
Storing Tags in Volatile Memory
- PDC maintains a 64-bit PageTagSummary for each page of machine memory
- Uses a 4-level tree data structure to keep PageNumber → PageTagSummary mappings
[Diagram: the bits of the PageNumber are split into four fields that index successive levels of the tree; the leaves are arrays of 64-bit PageTagSummary structures.]
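A multi-level lookup tree like the one above can be sketched as a small radix tree. The field widths (four levels of 8 bits each) and the dict-of-dicts representation are illustrative assumptions; the slide's exact bit split is not reproduced here.

```python
LEVEL_BITS = 8
LEVELS = 4

def insert(root, page_number, summary):
    # Walk three interior levels, creating nodes on demand
    node = root
    for level in range(LEVELS - 1, 0, -1):
        idx = (page_number >> (level * LEVEL_BITS)) & 0xFF
        node = node.setdefault(idx, {})
    node[page_number & 0xFF] = summary   # leaf: 64-bit PageTagSummary

def lookup(root, page_number):
    node = root
    for level in range(LEVELS - 1, 0, -1):
        idx = (page_number >> (level * LEVEL_BITS)) & 0xFF
        node = node.get(idx)
        if node is None:
            return None                  # no summary: untagged page
    return node.get(page_number & 0xFF)

root = {}
insert(root, 0x00ABCDEF, 0xDEADBEEF)
print(hex(lookup(root, 0x00ABCDEF)))
```

A sparse tree like this only allocates interior nodes along paths that actually contain tagged pages, which keeps the metadata cost proportional to the amount of sensitive data.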
Storing Tags in Volatile Memory
- A PageTagSummary holds either a page-wide tag (for uniformly-tagged pages) or a pointer to a PageTagDescriptor
- A PageTagDescriptor stores fine-grained (byte-level) tags within a page in one of two formats
  - Linear array of tags (indexed by page offset)
  - RLE encoding
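The two descriptor formats above trade space for lookup speed: a linear array gives O(1) access by page offset, while run-length encoding is compact when tags occur in contiguous runs. A minimal sketch, with a tuple-based RLE layout assumed for illustration:

```python
PAGE_SIZE = 4096

def rle_encode(tags):
    # tags: list of per-byte tag values for one page (0 = untagged)
    runs = []
    for t in tags:
        if runs and runs[-1][0] == t:
            runs[-1][1] += 1             # extend the current run
        else:
            runs.append([t, 1])          # start a new (tag, count) run
    return [(t, n) for t, n in runs]

def rle_decode(runs):
    out = []
    for t, n in runs:
        out.extend([t] * n)
    return out

# A page where bytes 100..199 carry tag 7 and the rest are untagged
linear = [0] * PAGE_SIZE
linear[100:200] = [7] * 100

runs = rle_encode(linear)
print(runs)                              # [(0, 100), (7, 100), (0, 3896)]
assert rle_decode(runs) == linear
```

Here the mostly-untagged page compresses from 4096 tag entries to three runs, which is why RLE pays off when tag fragmentation is low.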
Storing Tags on Disk
- PDC-ext3 provides persistent storage for the safe VM
  - New i-node field for file-level tags
  - Leaf indirect blocks store pointers to BlockTagDescriptors
  - A BlockTagDescriptor holds the byte-level tags within a block (as a linear array or RLE-encoded)
[Diagram: an i-node carries a FileTag; its indirect blocks lead to leaf indirect blocks, whose entries point to both data blocks and their BlockTagDescriptors.]
Back to the Big Picture
[The architecture diagram is shown again, highlighting the emulated CPU context inside QEMU.]
Fine-Grained Tag Tracking
- A modified version of QEMU emulates the safe VM and tracks the movement of sensitive data
  - QEMU relies on runtime binary recompilation to achieve reasonably efficient emulation
- We augment the QEMU compiler to generate a tag tracking instruction stream from the input stream of x86 instructions
[Diagram: an input x86 block is translated (stage 1) into the intermediate representation (TCG), which is compiled (stage 2) into a host machine code block (x86) and a parallel tag tracking code block.]
Fine-Grained Tag Tracking
- Tag tracking instructions manipulate the tag status of emulated CPU registers and memory
  - Basic instruction format: an action (Clear, Set, Merge), a destination operand (Reg, Mem), and a source operand (Reg, Mem)
- The tag tracking instruction stream executes asynchronously in a separate thread
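The instruction format above can be illustrated with a tiny interpreter over shadow tag state. The operand encoding and the exact semantics of Set (copy the source tags to the destination) are assumptions for this sketch, not PDC's definitions.

```python
reg_tags = {}   # register name -> tag set
mem_tags = {}   # machine address -> tag set

def read(op):
    kind, key = op
    table = reg_tags if kind == "reg" else mem_tags
    return table.get(key, set())

def write(op, tags):
    kind, key = op
    table = reg_tags if kind == "reg" else mem_tags
    table[key] = tags

def execute(action, dst, src=None):
    if action == "Clear":
        write(dst, set())                    # e.g., mov reg, immediate
    elif action == "Set":
        write(dst, read(src))                # e.g., plain data copy
    elif action == "Merge":
        write(dst, read(dst) | read(src))    # e.g., arithmetic on two operands

mem_tags[0x2000] = {9}
execute("Set", ("reg", "eax"), ("mem", 0x2000))   # load: copy tags
execute("Merge", ("reg", "ebx"), ("reg", "eax"))  # arithmetic: merge tags
execute("Clear", ("reg", "eax"))                  # overwrite with a constant
print(reg_tags)
```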
Fine-Grained Tag Tracking
- Problem: some of the instruction arguments are not known at compile time
  - Example: mov eax, (ebx) — the source memory address is not known
- The main emulation thread writes the values of these arguments to a temporary log (a circular memory buffer) at runtime
- The tag tracker fetches unknown values from this log
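The runtime argument log described above is essentially a bounded single-producer/single-consumer ring buffer: the emulation thread appends resolved operand values (e.g., machine addresses), and the tag tracker consumes them in program order. A minimal single-threaded sketch, with the head/tail index scheme assumed for illustration:

```python
class ArgLog:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0            # next slot to write
        self.tail = 0            # next slot to read
        self.capacity = capacity

    def put(self, value):        # called by the main emulation thread
        if (self.head + 1) % self.capacity == self.tail:
            raise RuntimeError("log full: tag tracker fell behind")
        self.buf[self.head] = value
        self.head = (self.head + 1) % self.capacity

    def get(self):               # called by the tag tracker thread
        if self.tail == self.head:
            raise RuntimeError("log empty")
        value = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.capacity
        return value

log = ArgLog(capacity=8)
log.put(0xBFFFF000)              # e.g., the resolved address of a logged store
log.put(0x08049ABC)
print(hex(log.get()), hex(log.get()))
```

Because the consumer reads entries in the same order the producer wrote them, each logged value lines up with the tag tracking instruction that needs it.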
Binary Recompilation (Example)

Input x86     | Intermediate representation (TCG) | Tag tracking instructions
--------------+-----------------------------------+--------------------------
mov eax, 123  | movi_i32 tmp0,123                 | Clear4 eax
              | st_i32 tmp0,env,0x0               |
push ebp      | ld_i32 tmp0,env,0x14              | Set4 mem,ebp,0
              | ld_i32 tmp2,env,0x10              | Merge4 mem,esp,0
              | movi_i32 tmp14,0xfffffffc         |
              | add_i32 tmp2,tmp2,tmp14           |
              | qemu_st_logaddr tmp0,tmp2         |
              | st_i32 tmp2,env,0x10              |

Tag tracking argument log: MachineAddr(esp)
Binary Recompilation
- But things get more complex
  - Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
  - Recovering from exceptions in the middle of a translation block
  - Multiple memory addressing modes
  - Repeating instructions
    - rep movs
  - Complex instructions whose semantics are partially determined by the runtime state
[Diagram: iret pops the saved EIP, CS, EFLAGS, ESP, and SS from the stack.]
Back to the Big Picture
[The architecture diagram is shown again.]
On-Demand Emulation
- During virtualized execution, PDC-Xen uses the paging hardware to intercept accesses to sensitive data
  - It maintains shadow page tables in which all memory pages containing tagged data are marked as not present
- An access to a tagged page from the safe VM causes a page fault and a transfer of control to the hypervisor
[Diagram: the shadow page tables, safe VM page tables, and PageTag mask reside in PDC-Xen (ring 0); the PageTag descriptors reside in the QEMU tag tracker.]
On-Demand Emulation
- If the page fault is due to tagged data, PDC-Xen suspends the guest domain and transfers control to the emulator (QEMU)
- QEMU initializes the emulated CPU context from the native processor context (saved upon entry to the page fault handler) and resumes the safe VM in emulated mode
[Diagram: an access to a tagged page in the safe VM traps into the page fault handler; QEMU in Dom0 maps the safe VM's memory and runs an emulated copy of the safe VM's VCPU.]
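The fault-time decision described above can be sketched as follows. All names here are illustrative assumptions; the point is only the control flow: a fault on a not-present tagged page switches the VCPU into emulation with the saved native context, while any other fault is handled natively.

```python
tagged_pages = {0x1A2B}           # pages known to hold sensitive bytes
shadow_present = {0x1A2B: False}  # shadow PTE present bits (tagged: not present)

def handle_page_fault(page, state):
    if page in tagged_pages and not shadow_present.get(page, True):
        # Suspend the guest and hand the saved native CPU context to the
        # emulator, which resumes the safe VM in emulated mode
        state["mode"] = "emulated"
        state["emul_ctx"] = dict(state["ctx"])
    else:
        # Ordinary fault: let the guest OS handle it under native execution
        state["mode"] = "native"

state = {"mode": "native", "ctx": {"eip": 0x8048000}}
handle_page_fault(0x1A2B, state)
print(state["mode"])              # emulated
```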
On-Demand Emulation
- Returning from emulated execution
  - QEMU terminates the main emulation loop and waits for the tag tracker to catch up
  - QEMU then makes a hypercall to PDC-Xen and provides
    - An up-to-date processor context for the safe VM VCPU
    - An up-to-date PageTagMask
  - The hypercall awakens the safe VM VCPU (blocked in the page fault handler)
  - The page fault handler
    - Overwrites the call stack with up-to-date values of CS/EIP, SS/ESP, EFLAGS
    - Restores other processor registers
    - Returns control to the safe VM
On-Demand Emulation - Challenges
- Updating PTEs in read-only page table mappings
  - Solution: QEMU maintains local writable shadow copies and synchronizes them in the background via hypercalls
- Transferring control to the hypervisor during emulated execution (hypercalls and fault handlers)
  - Emulating hypervisor-level code is not an option
  - Solution: transient switch to native execution
    - Resume native execution at the instruction that causes a jump to the hypervisor (e.g., int 0x82 for hypercalls)
On-Demand Emulation - Challenges
- Delivery of timer interrupts (events) in emulated mode
  - Relative to the emulated context, the hardware clock advances faster (each emulated instruction consumes more clock cycles)
  - Xen needs to scale the delivery of timer events accordingly
- Use of the clock cycle counter (rdtsc instruction)
  - The Linux timer interrupt/event handler uses the clock cycle counter to estimate timer jitter
  - After switching from emulated to native execution, the guest kernel observes a sudden jump forward in time
Policy Enforcement
- The policy controller module
  - Resides in dom0 and interposes between the front-end and back-end device drivers
  - Fetches policies from a central policy server
  - Looks up the tags associated with the data in shared I/O request buffers and applies policies
[Diagram: in dom0, the policy controller sits between the network-interface and block-storage back-ends and the corresponding front-ends in the safe VM.]
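The enforcement step above amounts to a lookup-and-decide loop over the tags found in an outgoing request buffer. A minimal sketch; the policy names, the tag-to-policy table, and the deny-by-default rule for unknown tags are assumptions for illustration:

```python
# tag -> policy fetched from the central policy server (illustrative)
policies = {42: "deny-network", 7: "allow"}

def check_request(buffer_tags, exit_point):
    # buffer_tags: the set of sensitivity tags found in the shared
    # I/O request buffer; exit_point: e.g., "network", "usb", "printer"
    for tag in buffer_tags:
        rule = policies.get(tag, "deny")   # unknown tag: fail closed
        if rule != "allow" and exit_point == "network":
            return False                   # block the request
    return True                            # no tag forbids this exit point

print(check_request({7}, "network"))       # True: tag 7 may leave
print(check_request({7, 42}, "network"))   # False: tag 42 is confined
```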
Network Communication
- PDC annotates outgoing packets with PacketTagDescriptors carrying the sensitivity tags
- The current implementation transfers annotated packets via a TCP/IP tunnel
[Diagram: the original packet (EthHdr, IPHdr, TCPHdr, payload) plus a tag annotation is encapsulated as the payload of an outer EthHdr/IPHdr/TCPHdr tunnel header.]
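The encapsulation above can be sketched as a simple wire format: the inner frame and a tag annotation become the payload of the tunnel connection. The byte layout (length-prefixed frame, then a count of (offset, length, tag) runs) is an illustrative assumption, not PDC's actual PacketTagDescriptor format; the outer Eth/IP/TCP headers would be added by the tunnel socket and are elided here.

```python
import struct

def annotate(frame: bytes, tag_runs):
    # tag_runs: list of (payload_offset, length, tag32) for sensitive bytes
    ann = struct.pack("!H", len(tag_runs))
    for off, length, tag in tag_runs:
        ann += struct.pack("!HHI", off, length, tag)
    # length-prefixed inner frame followed by the annotation
    return struct.pack("!H", len(frame)) + frame + ann

def parse(tunnel_payload: bytes):
    (flen,) = struct.unpack_from("!H", tunnel_payload, 0)
    frame = tunnel_payload[2:2 + flen]
    (n,) = struct.unpack_from("!H", tunnel_payload, 2 + flen)
    runs, pos = [], 4 + flen
    for _ in range(n):
        runs.append(struct.unpack_from("!HHI", tunnel_payload, pos))
        pos += 8
    return frame, runs

payload = annotate(b"inner-frame-bytes", [(0, 5, 42)])
frame, runs = parse(payload)
print(frame, runs)
```

The receiver can thus recover both the original packet and the byte ranges that carry each sensitivity tag, which is what makes per-byte policy enforcement possible across the wire.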
Talk Outline
- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
- Preliminary Performance Evaluation
- Application-level performance overhead
- Filesystem performance overhead
- Network bandwidth overhead
- Current Status and Future Work
Preliminary Performance Evaluation
- Experimental setup
  - Quad-core AMD Phenom 9500, 2.33GHz, 3GB of RAM
  - 100Mbps Ethernet
  - PDC hypervisor based on Xen v3.3.0
  - Paravirtualized Linux kernel v2.6.18-8
  - Tag tracker based on QEMU v0.10.0
Application-Level Overhead
- Goal: estimate the overall performance penalty (as perceived by users) in realistic usage scenarios
- First scenario: recursive text search within a directory tree (grep)
  - Input dataset: 1GB sample of the Enron corporate e-mail database (http://www.cs.cmu.edu/~enron)
  - We mark a fraction (F) of the messages as sensitive, assigning them a uniform sensitivity tag
  - We search the dataset for a single-word string and measure the overall running time
Application-Level Overhead
[Chart: running time vs. F (%) for three configurations: PDC-Xen with paravirtualized Linux and tag tracking; standard Xen with paravirtualized Linux; Linux on bare metal.]
Filesystem Performance Overhead
- Configurations
  - C1: Linux on bare metal, standard ext3
  - C2: Xen, paravirt. Linux; dom0 exposes a paravirt. block device; guest domain mounts it as ext3
  - C3: Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/TCP
  - C4: Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/Xen-RPC
  - C5: Xen, paravirt. Linux; dom0 exposes PDC-ext3 to the guest domain via NFS/Xen-RPC
- First experiment: sequential file write throughput
  - Create a file → write 1GB of data sequentially → close → sync
Filesystem Performance Overhead
- Second experiment: metadata operation overhead
  - M1: Create a large directory tree (depth 6, fanout 6)
  - M2: Remove the directory tree created by M1 (rm -rf)
Network Bandwidth Overhead
- We used iperf to measure end-to-end bandwidth between a pair of directly-connected hosts
- Configurations
  - NC1: No packet interception
  - NC2: Interception and encapsulation
  - NC3: Interception, encapsulation, and annotation with sensitivity tags
- The sender assigns sensitivity tags to a random sampling of outgoing packets
- We vary two parameters: Tag Prevalence (P) and Tag Fragmentation (F)
Network Bandwidth Overhead
[Chart: measured bandwidth for configurations NC1-NC3.]
Performance Evaluation - Summary
- Application performance in the safe VM
  - 10x slowdown in the worst-case scenario
  - We expect to reduce this overhead significantly through a number of optimizations
- Disk and network I/O overhead
  - Proportional to the amount of sensitive data and the degree of tag fragmentation
  - 4x overhead in the worst-case scenario (assuming 32-bit tag identifiers)
Summary and Future Work
- PDC seeks a practical solution to the problem of data confinement
  - Defend against exfiltration by outside attackers
  - Prevent accidental policy violations
- The hypervisor-based architecture provides mechanisms for isolation, information flow tracking, and policy enforcement
- Currently working on
  - Improving the stability and performance of the prototype
  - Studying the issue of taint explosion in Windows and Linux environments and its implications for PDC