RAMSES Regeneration And iMmunity SErviceS: A Cognitive Immune System

Transcript and Presenter's Notes

1
RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System
Self Regenerative Systems 18 December 2007

Mark Cornwell
James Just
Nathan Li
Robert Schrag
Global InfoTek, Inc

R. Sekar, Stony Brook University
2
Outline

Overview
Efficient content-based taint identification
Syntax and taint-aware policies
Memory attack detection and response
Testing
Red Team suggestions
Questions
Demo

3
RAMSES Attack Context

Attack target: a program mediating access to protected resources/services
Attack approach: use maliciously crafted input to exert unintended control over protected resource operations
Resource or service uses
Well-defined APIs to access
OS resources
Command interpreters
Database servers
Transaction servers
Internal interfaces
Data structures and functions within program
Used by program components to talk to each other

4
Example 1: SquirrelMail Command Injection
Input Interface
sendto=nobody; rm -rf *
Program
$send_to_list = $_GET['sendto']
$command = "gpg -r $send_to_list 2>&1"
popen($command)
Output Interface
$command = "gpg -r nobody; rm -rf * 2>&1"
popen($command)
Attack: removes all removable files in the web server document tree
5
Example 2: phpBB SQL Injection
Input Interface
topic=-1 UNION SELECT ord(substring(user_password,1,1)) FROM phpbb_users WHERE user_id=3
Program
$topic_id = $_GET['topic']
$sql = "SELECT p.post_id FROM POSTS_TABLE WHERE p.topic_id = $topic_id"
sql_query($sql)
Output Interface
$sql = "SELECT p.post_id FROM POSTS_TABLE WHERE p.topic_id = -1 UNION SELECT ord(substring(user_password,1,1)) FROM phpbb_users WHERE user_id=3"
sql_query($sql)
Attack: steal another user's password
6
Attack Space of Interest (CVE 2006)
Generalized Injection Attacks
7
Detection Approach

Attack: use maliciously crafted input to exert unintended control over output operations
Detect "exertion of control"
Based on taint: the degree to which output depends on input
Detect if control is intended
Requires policies (or training)
Application-independent policies are preferable

8
RAMSES Goals and Approach

Taint analysis: develop efficient and non-invasive alternatives
Analyze observed inputs and outputs
Needs no modifications to program
Language-neutral
Leverage learning to speed up analysis
Attack detection: develop a framework to detect a wide range of attacks, while minimizing policy development effort and FP/FNs
Structure-aware policies: leverage the interplay between taint and structural changes to output requests
Use Address-Space Randomization (ASR) for memory corruption
ASR: efficient, in-band, positive tainting for pointer-valued data
Immunization: filter out future attack instances
Output filters: drop output requests that violate taint-based policies
Input filters: project policies on outputs to those on inputs
Relies on learning relationships between input
and output fields
Network-deployable

9
Efficient Content-Based Taint Identification
10
Steps

Develop efficient algorithms for inferring flow
of input data into outputs
Compare input and output values
Allow for parts of input to flow into parts of
output
Tolerate some changes to input
Changes such as space removal, quoting, escaping,
case-folding are common in string-based
interfaces
Based on approximate substring matching
Leverage learning to speed up taint inference
Even the efficient content-matching algorithms
are too expensive to run on every input/output
Same learning techniques can be used for
detecting attacks using anomaly detection

11
Weighted Substring Edit Distance Algorithm

Maintain a matrix $D_{i,j}$ of minimum edit distance between $p_{1..i}$ and $s_{1..j}$
$D_{i,j} = \min\{\, D_{i-1,j-1} + \mathrm{SubstCost}(p_i, s_j),\ D_{i-1,j} + \mathrm{DeleteCost}(p_i),\ D_{i,j-1} + \mathrm{InsertCost}(s_j) \,\}$
$D_{0,j} = 0$ (no cost for omitting any prefix of $s$)
$D_{i,0} = \mathrm{DeleteCost}(p_1) + \cdots + \mathrm{DeleteCost}(p_i)$
Matches can be reconstructed from the D matrix
Quadratic time and space complexity
Uses $O(|p| \cdot |s|)$ memory and time
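
The recurrence on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name substring_edit_distance, the unit default costs, and the tie-breaking during backtracking are not from the slides, which leave the weights unspecified.

```python
def substring_edit_distance(p, s, subst_cost=None, delete_cost=None, insert_cost=None):
    """Return (cost, start, end) so that s[start:end] is a cheapest match for p."""
    # Assumption: unit costs unless the caller supplies weighted ones.
    subst_cost = subst_cost or (lambda a, b: 0 if a == b else 1)
    delete_cost = delete_cost or (lambda a: 1)
    insert_cost = insert_cost or (lambda b: 1)
    m, n = len(p), len(s)
    # Row 0 stays at zero: omitting any prefix of s is free.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + delete_cost(p[i - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + subst_cost(p[i - 1], s[j - 1]),
                D[i - 1][j] + delete_cost(p[i - 1]),
                D[i][j - 1] + insert_cost(s[j - 1]),
            )
    # Any suffix of s may be skipped as well, so pick the cheapest cell in the last row.
    end = min(range(n + 1), key=lambda j: D[m][j])
    # Reconstruct the match start by walking back through the matrix.
    i, j = m, end
    while i > 0:
        if j > 0 and D[i][j] == D[i - 1][j - 1] + subst_cost(p[i - 1], s[j - 1]):
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] + delete_cost(p[i - 1]):
            i -= 1
        else:
            j -= 1
    return D[m][end], j, end
```

A weighted substitution cost is one way to tolerate transformations such as case-folding: passing subst_cost=lambda a, b: 0 if a.lower() == b.lower() else 1 makes "DROP" match "drop" at zero cost.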

12
Improving performance

Quadratic complexity algorithms can be too
expensive for large s, e.g., HTML outputs
Storage requirements are even more problematic
Solution: use a linear-time coarse filtering algorithm
Approximate D by FD, defined on substrings of s of length $|p|$
Let P (and S) denote a multiset of characters in p (resp., s)
$FD(p, s) = \min(|P - S|, |S - P|)$
Slide a window of size $|p|$ over s, compute FD incrementally
Prove: $D(p, r) < t \Rightarrow FD(p, r) < t$ for all substrings r of s
Result: $O(|p|^2)$ space and time complexity in practice
Implementation results
Typically 30x improvement in speed
200x to 1000x reduction in space
Preliminary performance measurements: 40MB/sec
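
The sliding-window filter can be sketched as follows. This is a rough illustration, not the RAMSES implementation: the name coarse_filter_candidates is hypothetical, and unit edit costs are assumed so that the multiset difference is a lower bound on the edit distance for each window.

```python
from collections import Counter

def coarse_filter_candidates(p, s, t):
    """Return start offsets of length-|p| windows of s whose FD to p is below t."""
    m = len(p)
    if m == 0 or len(s) < m:
        return []
    # diff[c] = (count of c in p) - (count of c in the current window).
    diff = Counter(p)
    diff.subtract(Counter(s[:m]))
    # For equal-length strings |P - R| == |R - P|, so one side is enough.
    fd = sum(v for v in diff.values() if v > 0)
    hits = [0] if fd < t else []
    for start in range(1, len(s) - m + 1):
        leaving, entering = s[start - 1], s[start + m - 1]
        # A character leaves the window: p may now be missing one more copy of it.
        diff[leaving] += 1
        if diff[leaving] > 0:
            fd += 1
        # A character enters the window: it may cover a copy p was missing.
        if diff[entering] > 0:
            fd -= 1
        diff[entering] -= 1
        if fd < t:
            hits.append(start)
    return hits
```

Only the windows returned here would need the quadratic edit-distance computation; each window update costs constant time, which is what keeps the filter linear in the length of s.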

13
Efficient online operation

Weighted edit-distance algorithms are still too
expensive if applied to every input/output
Need to run for every input parameter and output
Key idea
Use learning to construct a classifier for
outputs
Each class consists of similarly tainted outputs
taint identified quickly, once the class is
known
Classifying strings is difficult
Our technique operates on parse trees of output
For ease of development, generality, and
tolerance to syntax errors, we use a rough
parser
Classifier is a decision tree that inspects parse
tree nodes in an order that leads to good
decisions

14
Decision Tree Construction

Examines the nodes of syntax tree in some order
The order of examination is a function of the set
of syntax trees
Chooses nodes that are present in all candidate
syntax trees
Avoids tests on tainted data, as they can vary
Avoids tests that don't provide a significant
degree of discrimination
Similar-valued fields will be collected
together and generalized, instead of storing
individual values
Incorporates a notion of suitability for each
field or subtree in the syntax tree
Takes into account approximations made in parsing

15
Example of a Decision Tree

1. SELECT * FROM phpbb_config
2. SELECT u.*,s.* FROM phpbb_sessions s,phpbb_users u WHERE s.session_id='a3523d78160efdafe63d8db1ce5cb0ba' AND u.user_id=s.session_user_id
3. SELECT * FROM phpbb_themes WHERE themes_id=1
4. SELECT c.cat_id,c.cat_title,c.cat_order FROM phpbb_categories c,phpbb_forums f WHERE f.cat_id=c.cat_id GROUP BY c.cat_id,c.cat_title,c.cat_order ORDER BY c.cat_order
5. SELECT * FROM phpbb_forums ORDER BY cat_id,forum_order
switch (1)
  case ROOT: switch (1.1)
    case CMD: switch (1.1.2)
      case c: FINAL @1.1.1=SELECT @1.1.3=.cat_id,c.cat_title,c.cat_order FROM phpbb_categories c,phpbb_forums f WHERE f.cat_id=c.cat_id GROUP BY c.cat_id,c.cat_title,c.cat_order ORDER BY c.cat_order
      case u: FINAL @1.1.1=SELECT @1.1.3=.*,s.* FROM phpbb_sessions s,phpbb_users u WHERE s.session_id='a3523d78160efdafe63d8db1ce5cb0ba' AND u.user_id=s.session_user_id
      case *: FINAL @1.1.1=SELECT @1.1.3=FROM phpbb_??????

16
Implementation Status and Next Steps

Rough parsers implemented for
HTML/XML
Shell-like languages (including Perl/PHP)
SQL
Preliminary performance measurements
Construction of decision trees: 3MB/sec
Classification only: 15MB/sec
Significant improvements expected with some
performance tuning
Next steps
Develop better clustering/classification
algorithms based on tree edit-distance
Current algorithm is based entirely on a top-down
traversal, and fails to exploit similarities
among subtrees

17
Syntax and taint-aware policies
18
Overview of Policies

Leverage structure + taint to simplify/generalize
policy
Policy structure mirrors that of parse trees
And-Or trees with cycles
Can specify constraints on values (using regular
expressions) and taint associated with a parse
tree node
Most attacks detected using one basic policy
Controlling commands vs command parameters
Controlling pointers vs data

19
Controlling commands vs parameters

Observation: parameters don't alter the syntactic
structure of the victim's requests
Policy: the structure of the parse tree for the victim's
request should not be controlled by untrusted
input ("tainted data")
Alternate formulation: tainted data shouldn't
span multiple fields or tokens in the victim's
request

20
Policy prohibiting structure changes

Define structure change without using a
reference
Avoids need for training and associated FP issues
Policy 1
Tainted data cannot span multiple nodes
for binary data, it should not span multiple
fields
Policy 2
Tainted data cannot straddle multiple subtrees
Tainted data spans two adjacent subtrees, and at
least one of them is not fully tainted
Tainted data overflowed beyond the end of one
subtree and resulted in a second subtree
Both policies can be further refined to constrain
the node types and children subtrees of the nodes
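
Policy 1 can be illustrated with a small check. This is a sketch under loud assumptions: whitespace tokenization stands in for the rough parser, the tainted span is given as character offsets, the function names are hypothetical, and the sample command is a reconstruction of the SquirrelMail example.

```python
import re

def tokens_with_spans(text):
    """Split on whitespace and keep each token's [start, end) offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def violates_policy_1(output, taint_start, taint_end):
    """True if the tainted span [taint_start, taint_end) overlaps more than one token."""
    touched = [
        tok
        for tok, start, end in tokens_with_spans(output)
        if start < taint_end and taint_start < end
    ]
    return len(touched) > 1

# Attack request: tainted input supplies a separator and a second command.
attack = "gpg -r nobody; rm -rf * 2>&1"
a_start = attack.index("nobody")
a_end = a_start + len("nobody; rm -rf *")

# Benign request: tainted input stays inside a single parameter token.
benign = "gpg -r alice@example.com 2>&1"
b_start = benign.index("alice")
b_end = b_start + len("alice@example.com")

print(violates_policy_1(attack, a_start, a_end))
print(violates_policy_1(benign, b_start, b_end))
```

No reference output or training set is needed here: the decision depends only on how the tainted span lines up with token boundaries in the request itself.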

21
Commands vs parameters: Example 2

Memory corruption attack overflowing stack buffer
For binary data, we talk about message fields
rather than parse trees
Violation: tainted data spans multiple stack
fields
Heap overflows involve tainted data spanning
across multiple heap blocks

22
Attacks Detected by "No structure change" Policy

Various forms of script or command injection
SQL injection
XPath injection
Format string attacks
HTTP response splitting
Log injection
Stack overflow and heap overflow

23
Application-specific policies

Not all attacks have the flavor of command
injection
Develop application-specific policies to detect
such attacks
Policy 3 (cross-site scripting): no tainted
scripts in HTML data
Policy 4 (path traversal): tainted file names
cannot access data outside of a certain document
tree
Other examples
Policy 5: no tainted CMD_NAME or CMD_SEPARATOR
nodes in shell or SQL commands
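
A path traversal policy of the Policy 4 kind can be sketched with lexical path normalization. The function name escapes_document_tree and the example root are hypothetical, and the check is purely lexical: it assumes POSIX-style paths and does not resolve symbolic links.

```python
import posixpath

def escapes_document_tree(doc_root, tainted_name):
    """True if a tainted file name resolves to a path outside doc_root."""
    root = posixpath.normpath(doc_root)
    # join() discards root when tainted_name is absolute, which is what we want to catch.
    target = posixpath.normpath(posixpath.join(root, tainted_name))
    return posixpath.commonpath([root, target]) != root

print(escapes_document_tree("/var/www/html", "images/logo.png"))
print(escapes_document_tree("/var/www/html", "../../etc/passwd"))
```

Normalizing before comparing matters: a name such as images/../index.html contains ".." but still stays inside the tree, so rejecting on the substring alone would produce false positives.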

24
Implementation status

Four test applications
phpBB
SquirrelMail
PHP/XMLRPC
WebGoat (J2EE)
Detects following attacks without FPs
Command injection (Policies 1, 2, 5)
SQL injection (1, 2, 5)
XSS (3)
HTTP Response splitting (2)
Path traversal (4)
Memory corruption detected using ASR
Should be able to detect many other attacks
easily
XPath injection (1, 2), Format-string (1, 2), Log
injection (1, 2)

25
Memory Attack Discussion
26
Memory Error Based Remote Attack

Attacker's goal
Overwrite target of interest to take over
instruction execution
Attacker's approach
Propagate attacker-controlled input to target of
interest
Violate certain structural constraints in the
propagation process

27
Stack Frame Structural Violation
Stack layout (high addresses first, low addresses last)
A's stack frame
Function arguments
Return address
Previous stack frame
Exception Registration Record
Local variables
B's stack frame
Function arguments
Return address (to A)
Previous stack frame
Local variables
C's stack frame
Function arguments
Return address (to B)
Previous stack frame (EBP points here)
Exception Registration Record (FS:[0] points here)
Local variables (ESP points to the top of the stack)
28
Heap Block Structural Violation
Windows Free Heap Block Header Structure
Size, Previous Size
Segment Index, Flags, Unused, Tag Index
FLink
BLink

Happens when removing a free block from the
doubly linked list
Gives the ability to write 4 bytes to any address,
usually a well-known address such as a function
pointer, return address, SEH, etc.

29
ASLR and Crash Analysis

ASLR randomizes the addresses of targets of
interest
Memory attack using the original address will
miss and cause crash (exception).
Crash analysis tracks back to vulnerability,
which enables accurate signature generation
Structural information usually retrievable at
runtime, thanks to enhanced debugging technology
Crash analysis aided by JIT (Just-In-Time
Tracing)
JIT triggered at certain events
Suspicious network inputs, e.g. sensitive JMP
address
Attach/detach JIT monitor at event of interest
Memory can be dumped at the right granularity,
with log info ranging from a few KB to 2GB

30
Crash Root Cause Analysis
Root Cause Analysis
Exception Record/Context, Faulting thread/Instructions/Registers, Stack trace/Heap/Module/Symbols
Stack Corruption
Heap Corruption
Read Access Violation: Bad EIP (Corrupted Return Address or SEH)
Read Access Violation: Bad Dereference (Corrupted Local Variables/passing parameters)
Write Access Violation (Address to write, Value to write)
31
Stack-based Overflow Analysis

Target driven analysis
The goal of attack string is to overwrite target
of interest on stack, e.g., return address, SEH
handler.
Start matching target values from crash dump to
input, like EIP, EBP and SEH handler
More efficient than pattern match in the whole
address space
If any targets are matched in input, expand in
both directions to find LCS
A match usually indicates the input size needed
to overflow certain targets
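
The target-driven matching described on this slide can be sketched as below. Everything here is an illustrative assumption: the helper names, the little-endian 32-bit encoding of register values (x86), and the synthetic input and dump used in the usage lines.

```python
import struct

def bytes_to_overwrite(input_bytes, target_value):
    """Input length needed to reach and overwrite a 32-bit target, or None if absent."""
    needle = struct.pack("<I", target_value)  # assumption: little-endian x86 value
    offset = input_bytes.find(needle)
    return None if offset < 0 else offset + len(needle)

def expand_match(data, dump, data_off, dump_off, length):
    """Grow a known match in both directions; return (start in data, total length)."""
    left = 0
    while (data_off - left > 0 and dump_off - left > 0
           and data[data_off - left - 1] == dump[dump_off - left - 1]):
        left += 1
    right = 0
    while (data_off + length + right < len(data)
           and dump_off + length + right < len(dump)
           and data[data_off + length + right] == dump[dump_off + length + right]):
        right += 1
    return data_off - left, length + left + right

# Synthetic example: the crash EIP value appears 124 bytes into the input.
crash_eip = 0x0018759F
payload = b"A" * 124 + struct.pack("<I", crash_eip) + b"B" * 8
print(bytes_to_overwrite(payload, crash_eip))
```

Searching the input for a handful of crash values (EIP, EBP, SEH handler) is far cheaper than pattern matching the input against the whole address space, and the expansion step is only run once a target has been found.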

32
SEH Overflow and Analysis

A unique approach for Windows exploits
SEH stands for Structured Exception Handler
Windows puts the EXCEPTION_REGISTRATION_RECORD chain
on the stack, with the SEH in each record.
More reliable and powerful than overwriting the return
address
More JMP addresses to use (pop/pop/ret)
An exception (accidental/intentional) is desired
Can bypass /GS buffer check
SEH crash analysis
Catch the first exception as well as the second
one (caused by ASR)
Locate the SEH chain head from first dump,
usually overwritten by input
Usually first exception is enough, second
exception can be used for confirmation

33
Heap Overflow Analysis

How to analyze a heap overflow attack?
Exploit happens in the free block's unlink
Multiple ways to trigger
Write Access Violation with ASR
when overwriting at an invalid address
Overwrite a 4-byte value at an arbitrary address
Targets of interest include return address, SEH,
PEB and UEF
Exploit contains the pair (Address to Write,
Value to Write)
Appears in the overflowed heap blocks
Usually contained in registers
Should be provided from input by attacker
Match found in synthetic heap exploits
The value pair needs to be at a fixed offset
for a given heap overflow vulnerability
to enable overwriting the right address with the
desired value

34
Case Studies
35
Case Study: RPC DCOM

Step 1: Exception Analysis
FAULTING_IP: 18759f
ExceptionCode: c0000005 (Access violation)
Attempt to read from address 0018759f
PROCESS_NAME: svchost.exe
FAULTING_THREAD: 00000290
PRIMARY_PROBLEM_CLASS: STACK_CORRUPTION
Step 2: Target and Input correlation
StackBase: 0x6c0000, StackLimit: 0x6bc000, Size: 0x4000
Begin analyze on Target Overwrite and Input
Correlation
Analyze crash EIP
Find EIP pattern at socket input
Bytes size to overwrite EIP: 128
Analyze crash EIP done!
Analyze SEH
Find SEH byte at socket input
Bytes size to overwrite SEH handler: 1588
Analyze SEH done!

36
Signature Generation

Signature generation
Signature captures the vulnerability
characteristics
Minimum size to overwrite certain target(s)
Use contexts to reduce false positives
Using the incoming input's calling stack
Stack offset can uniquely identify the context
Using incoming input semantic context
Message format like HTTP url/parameter
Binary message field
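
A signature of this shape might be represented as a context plus a minimum overwrite size. The class below is a hypothetical sketch: the name OverflowSignature, its fields, and the context string "rpc.field" are made up for illustration and are not the RAMSES signature format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OverflowSignature:
    context: str            # hypothetical label for a message field or call-stack context
    min_overwrite_len: int  # smallest input size that reaches the target

    def matches(self, context, field_value):
        """Flag inputs in the same context that are long enough to hit the target."""
        return context == self.context and len(field_value) >= self.min_overwrite_len

# Assumed values for illustration only.
sig = OverflowSignature(context="rpc.field", min_overwrite_len=128)
print(sig.matches("rpc.field", b"A" * 200))
print(sig.matches("rpc.field", b"short"))
print(sig.matches("http.url", b"A" * 200))
```

Tying the length test to a context is what keeps long but legitimate inputs elsewhere in the protocol from being flagged.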

37
Components Implementation

RAMSES Crash Monitor
Catches only exceptions of interest
Snapshots for a given period
Self healer

RAMSES Crash Analyzer
Fault type detection
Security-oriented analysis
Feedback

Infrastructure (uses Windows Debug Engine)
Save Crash Dump
Extract Relevant Info
Search/Match
Disassemble

Diagram flow labels (numbered 1 to 5 in the original figure): Protected Application, Crash (Exception), Generate Crash Dump, Provide Input History, Analyze, Signature

A crash dump provides the same interface as a LIVE
process, so the Crash Analyzer does NOT actually
have to work on a saved crash dump file.
38
Testing
39
Test Attacks and Applications

Baseline Applications
phpBB (PHP)
SquirrelMail (PHP)
WebGoat (Java)
hMailServer (C++)

Many sub-languages: SQL, XML, JavaScript, HTML,
HTTP, JSON, shell, cmd, path
40
Possible Testbed Configurations
41
Traffic Generation

Purpose
Coverage of legitimate structural variation in
monitored structures
SQL, command strings, call parameters
Stress of log complexity for practicality
Multiple users, multiple sessions
Performance measurements
Program performance metrics
Quantify performance impact

42
Traffic Generation to Web Sites

Approaches
Simple Record/Playback (basic)
with minor substitutions (cookies, IPs)
shell scripts, netcat, MaxQ (Jython-based)
Custom DOM/Ajax scripting (learning)
Can access dynamically generated browser content
after (or during) client-side script evaluation
Automated site crawls of URLs
Automated form contents (site specific metadata)
COTS tools
Load testing and metrics

43
(No Transcript)
44
Red Team Suggestions
45
Suggested Red Team ROEs

Initial telecons held in Fall
Claim: RAMSES will defeat most generalized
injection attacks on protected applications
Red Team should target our current and planned
applications rather than new ones (unless new
application, sample attacks and complete traffic
generator can be provided to RAMSES far enough in
advance for learning and testing)
Remote network access to the targeted application
Attack designated application suite
Required instrumentation yet to be determined
Red Team exercise start 15 April or later

46
RAMSES Project Schedule
Baseline Tasks
1. Refine RAMSES Requirements
2. Design RAMSES
3. Develop Components
4. Integrate System
5. Analyze Test RAMSES
6. Coordinate Rept
Prototypes
Optional Tasks
O.3 Cross-Area Exper
Timeline axis: CY06 through CY09, by quarter (Gantt chart not transcribed)
Chart markers: 1, 2, 3, Red Team Exercise
Today 11 September 2007
47
Next Steps
48
Plans

Develop input filters from output policies
Extend memory error analyzer
Demonstrate RAMSES on more applications and
attack types
Native C/C++ app (most likely app is
hMailServer)
Java
Integrate components
Performance and false positive testing
Red Team exercise

49
Questions?
50
Backup
51
Tokenizing and Parsing

Focus on rough parsing that reveals approximate
structure, but not necessarily all the details
Accurate parsers are time-consuming to write
More importantly, they may not gracefully handle errors
(common in HTML) or language extensions and
variations (different shells, different flavors
of SQL)
Implemented using Flex/Bison
Currently done for SQL and shell command
languages
Parse into a sequence of statements, each
statement consisting of a command name and
parameters
Incorporates a notion of confidence to deal with
complex language features, e.g., variable
substitutions in shell
Modest effort for adding additional languages,
but substantially simplifies subsequent learning
tasks
Don't anticipate significant additions to this
language list (other than HTML/XML)

52
Taint inference vs taint-tracking

Disadvantages of learning
False negatives if inputs transformed before use
Low likelihood for most web apps
False positives due to coincidence
Mitigated using statistical information
Plan to evaluate these experimentally
Benefits of learning
Low performance overhead
Some significant implicit flows handled without
incurring high false positives
Can address multi-step attacks where
tainted data is first stored in a file/database
before use
More generally, in dealing with information flow
that crosses module boundaries

53
Attack Coverage 2004
(Stack-smashing, heap overflow, integer overflow,
data attacks)
Generalized Injection Attacks
CVE Vulnerabilities (Ver. 20040901)
54
RAMSES System Concept
Protected System
Web Server (IIS/Apache)
Web App (PHP/ ASP)
SQL Database (MySQL)
Network/App Firewall (e.g. mod_security)

OS DLLs
Application DLLs
Network DLLs

Key research problems
Learn taint propagation
Identify tainted components in output, generate
filtering criteria
Learn input/output transformation
Use transformation to project output filters to
input

55
Advantages of RAMSES Filters

Filters easily sharable
Complements Application Community focus on end
user applications
Filters are human readable
Filter generation algorithms can be enhanced to
address privacy concerns wrt sharing

56
Filter types

Filter Criteria
Correlative filters
Equality-based filter
Structure-based filter
Statistical filter
Causal filters
Filtering criteria derived from attack detection
criteria (policy or anomaly)

Filter Location
Input filter
Easier to deploy but harder to synthesize
Output filter (precedes sensitive operation)
Easier to synthesize than input filter, but
deployment needs deeper instrumentation
May be too late for some attacks (memory
corruption)

Note All filters evaluated using large number of
benign samples and ?1 attack sample
