Intelligent Detection of Malicious Script Code - PowerPoint PPT Presentation

1
Intelligent Detection of Malicious Script Code
  • CS194, 2007-08
  • Benson Luk
  • Eyal Reuveni
  • Kamron Farrokh
  • Advisor: Adnan Darwiche
  • Sponsored by Symantec

2
Outline for Project
  • Phase I: Setup
  • Set up machine for testing environment
  • Ensure that whitelist is clean
  • Phase II: Crawling
  • Modify crawler to output only necessary data. This means:
  • Grab only necessary information from webcrawling results
  • Listen in on Internet Explorer's JavaScript interpreter and output relevant behavior
  • Phase III: Database
  • Research and develop an effective structure for storing data and link it to the webcrawler
  • Phase IV: Analysis
  • Research trends for normalcy and investigate possible heuristics

3
Approach to Project
  • First Quarter: Infrastructure
  • Second Quarter: Data Gathering
  • Third Quarter: Data Analysis
  • (Note: some overlap between quarters)

4
Infrastructure
  • Internet Explorer 7, Windows XP SP2 Professional
  • Main testing environment
  • Norton Antivirus
  • Protects against malicious files and scripts
  • Can access logs to determine which sites
    launched attacks
  • Integrated into automated site visiting

5
Infrastructure
  • CanaryCallback.dll
  • Plug-in for Internet Explorer
  • Able to access most data received by the low-level JavaScript interpreter:
  • The function being called (DISPID)
  • The class that the function belongs to (GUID)
  • The list of types and values of parameters passed into the function. Examples:
  • VT_I4: 4-byte integer
  • VT_BSTR: byte string
  • VT_DISPATCH: object
  • A large part of the first and second quarters was spent programming, debugging, and maintaining the functions that would handle the data:
  • Functions to grab data types
  • Functions to parse data values (some stored in bitstreams)
  • Functions to output data to file
  • If types did not have an obvious output format (e.g. VT_DISPATCH), we had to create one that would accurately represent as many components of the data as possible
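The output-format problem above can be sketched as follows. This is an illustrative Python sketch, not the project's actual C++ DLL code; the type tags, the `len=` convention for byte strings, and the DISPATCH rendering are all assumptions.

```python
def format_argument(vt_type, value):
    """Render one captured argument as a loggable string."""
    if vt_type == "VT_I4":       # 4-byte integer
        return "I4:%d" % value
    if vt_type == "VT_BSTR":     # byte string: log length as well as content
        return "BSTR(len=%d):%s" % (len(value), value)
    if vt_type == "VT_DISPATCH": # object: no obvious text form, so invent one
        # that represents as many components of the object as possible
        return "DISPATCH{%s}" % ",".join(sorted(value.keys()))
    return "UNKNOWN(%s)" % vt_type

def format_call(guid, dispid, args):
    """One line per call: class GUID, function DISPID, formatted arguments."""
    rendered = ";".join(format_argument(t, v) for t, v in args)
    return "%s %d %s" % (guid, dispid, rendered)
```

Logging the string length alongside the content is what makes the later byte-string length analysis possible without re-parsing values.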

6
Infrastructure
  • Python
  • Scripting language
  • Designed to handle parsing with ease
  • The infrastructure script performed three tasks:
  • Launch Internet Explorer (using the cPAMIE engine), load the website, then close Internet Explorer
  • Access and parse Norton's web-attack logs for any attacks launched by the website
  • Sort script data from the CanaryCallback DLL based on DLL data and attack logs (Was there an attack? Did any scripts run? Etc.)
  • Heritrix
  • Open-source webcrawler with high customizability
  • Can run specific crawls that target a set of domains and output minimal information
  • Uses HTTP requests; does not render crawled sites
  • The purpose is to gather as many URLs with scripts as possible for a large sample base

7
Infrastructure: Crawler
Step 0: URL queue is seeded with the domain list.
Step 1: Crawler grabs a URL from the queue.
Step 2: Crawler grabs the source from the URL (WWW).
Step 3: URLs are appended to the Heritrix raw data and the URL queue iff they satisfy our set of rules.
Step 4: A Python parser gets rid of excess data, leaving only URL information for each site, and outputs it to a new file (the Heritrix parsed data).
Repeat steps 1-4 until the crawl limit is reached.
8
Infrastructure: Gatherer
Step 1: Python script grabs a site from the crawl data (Heritrix parsed data).
Step 2: The cPAMIE component loads IE and sends it to the specified site.
Step 3: IE7's JavaScript interpreter outputs to a file containing all DLL data (CanaryCallback data).
Step 4: IE7 informs PAMIE that it is finished; Python kills IE7.
Step 5: Python analyzes the callback data and Norton Antivirus logs to decide whether a site is clean, dirty, or has no scripts.
Step 6: Python outputs sorted and formatted data to relevant files for future analysis (formatted output).
Repeat steps 1-6 until the URL list is exhausted.
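The clean/dirty/no-scripts decision in step 5 can be sketched as follows. This is a minimal Python sketch; the data shapes (a list of captured calls per visit and a set of attacked URLs parsed from Norton's logs) are assumptions, not the project's real file formats.

```python
def classify_site(url, callback_lines, attacked_urls):
    """Sort one visited site for later analysis.

    callback_lines: captured script calls for this visit (may be empty).
    attacked_urls:  set of URLs that Norton's web-attack log flagged.
    """
    if url in attacked_urls:
        return "dirty"          # Norton saw an attack launched by this site
    if not callback_lines:
        return "no scripts"     # the interpreter was never invoked
    return "clean"              # scripts ran, no attack was logged
```

Sites classified this way are then written out to separate files, which is what the later per-category analyses consume.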
9
Data gathering
  • Heritrix crawls
  • First crawl: 5 seeds, depth 5
  • 5 million sites found
  • Second crawl: 10 seeds, depth 3
  • 3 million sites found
  • Third crawl: 200 seeds, depth 1
  • 18,500 sites found
  • Fourth crawl: 200 seeds, depth 2
  • 3 million sites found
  • The first two crawls produced data that was biased towards large, interlinked sites; the last two broad crawls were run to remedy this.
  • CanaryCallback gathering
  • For the first and second crawls, a chosen set of roughly 1,000 sites was run through the gatherer component.
  • For the third crawl, all 18,500 sites were processed by the gatherer.
  • For the fourth crawl, several tasks were performed:
  • 20,000 sites were processed by the gatherer
  • The same 1,000 sites were processed 28 times (about 4 times per day) from May 7 to May 13

10
Data analysis setup
  • CanaryCallback data analysis
  • The main choice for parsing the data was the Python scripting language
  • Too much data for MS Access or even MySQL
  • Python scripts were developed to facilitate analysis in a manner similar to SQL:
  • Scripts to aggregate data sets and frequencies
  • Scripts to calculate various metrics of data sets, such as:
  • Smallest data point
  • Largest data point
  • Average data point
  • Variance of data points
  • Total data points
  • Sum of data points
  • Scripts to output to Excel-readable spreadsheet files (CSV) for deeper analysis
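A minimal sketch of the aggregation scripts described above, assuming illustrative field names; it computes the listed metrics for each data set and emits CSV for spreadsheet analysis.

```python
import csv
import io

def metrics(values):
    """Compute the metrics listed above for one data set."""
    n = len(values)
    total = sum(values)
    mean = total / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n)
    return {"min": min(values), "max": max(values), "avg": mean,
            "var": variance, "count": n, "sum": total}

def to_csv(named_sets):
    """One CSV row per named data set, ready for Excel."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["set", "min", "max", "avg", "var", "count", "sum"])
    for name, values in named_sets:
        m = metrics(values)
        writer.writerow([name, m["min"], m["max"], m["avg"],
                         m["var"], m["count"], m["sum"]])
    return out.getvalue()
```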

11
Individual data analysis
  • The third quarter and the last half of the second quarter were spent focusing on as wide a range of data as possible
  • To accomplish this, our group split up and each member pursued a different line of research individually
  • Individual presentations will follow:
  • Eyal: Activity categorization
  • Benson: Integer argument trend analysis
  • Kamron: Byte string argument trend analysis

12
Activity Categorization
13
Activity Analysis
  • There is an obvious connection between a function and the site using it
  • Is it possible to quantify this relationship, and establish whether certain functions are used in a specific kind of site?
  • Characterize a site based on how active it is, i.e., how many function calls are made while the site is loaded
  • Does there exist a pattern in the data that can distinguish an abnormal usage of any function based on the characteristics of the site?

14
Site Function Usage Statistics
  • Total number of sites: 14,848
  • Average function calls per site: 5,777
  • Average function calls per function: 1,984
  • Standard deviation of function calls per function: 25,493
  • Standard deviation of function calls per site: 14,181
  • Minus outliers: none
  • Three standard deviations below: 0
  • Two standard deviations below: 0
  • One standard deviation below: 12,086
  • One standard deviation above: 1,633
  • Two standard deviations above: 510
  • Three standard deviations above: 296
  • Normal distribution outliers: 323
  • Median: 1,456
  • First quartile: 438
  • Third quartile: 4,029
  • Interquartile range: 3,591
  • Minus outliers: none
  • Lower whisker starts at 0
  • Upper whisker ends at 9,365
  • Box-and-whisker outliers: 2,048
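The two outlier rules behind these counts can be sketched as follows (a hedged Python sketch, not the project's script). As a consistency check: with Q1 = 438 and Q3 = 4,029 the upper Tukey fence is 4,029 + 1.5 * 3,591 = 9,415.5, and the upper whisker ends at the largest observation inside that fence (9,365 here).

```python
def normal_outliers(values, mean, std):
    """Normal-distribution rule: points beyond three standard deviations."""
    lo, hi = mean - 3 * std, mean + 3 * std
    return [v for v in values if v < lo or v > hi]

def box_whisker_outliers(values, q1, q3):
    """Box-and-whisker rule: points beyond 1.5 IQR past the quartiles."""
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]
```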

15
Correlation analysis
  • Related each function to the sites calling it, using the number of function calls on each site
  • Each tuple consisted of the number of times a function was called at a particular site, and the total number of function calls made at that site
  • The correlation between the two variables in the tuple was computed for each individual function
  • Many functions were not common, so not enough data was available to draw a conclusion about them
  • For the functions that had enough sites calling them (over 100), the correlation values were between .004 and -.01, showing no correlation between the function and the script activity of the site calling it
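The per-function correlation described above amounts to a Pearson coefficient over (calls to this function at a site, total calls at that site) pairs; a minimal sketch:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / float(n), sum(ys) / float(n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near zero, as reported above (.004 to -.01), mean knowing how script-heavy a site is tells you essentially nothing about how often it calls a given function.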

16
Function Usage Amount
  • An interesting trend arose when analyzing the correlation data
  • There are functions that are called hundreds or thousands of times
  • Despite this, individual sites seem to call a specific function only a couple of times.
  • Example:
  • GUID 3050f3fd-98b5-11cf-bb82-00aa00bdec0b, DISPID 1
  • Called 346 times; in only 11 sites (3.2%) is it called more than 3 times

17
(No Transcript)
18
Categorization Approach
  • Since no correlation was found, another approach was taken
  • According to trends in the script activity data, divide the sites into distinct categories
  • Examine the function behavior in each category, as opposed to individual sites
  • Three categories were chosen, split roughly at the median and the end of the third quartile
  • This gave one category 50% of the data, while the other two each had 25% of the data
  • An attempt to avoid bias toward the extremely script-heavy sites

19
Categorization Heuristic
  • A heuristic was developed to determine whether a function would be more likely to appear in a certain category:
  • F = [(avgl - avgsite)(L - avgfunc) + (avgm - avgsite)(M - avgfunc) + (avgh - avgsite)(H - avgfunc)] / 3
  • avgl, avgm, and avgh are the average number of function calls per category (542, 2,882, and 22,745 respectively)
  • avgsite is the overall average number of function calls per site (5,777)
  • avgfunc is the average number of function calls per function (1,984)
  • L, M, and H are the specific number of times the function was called in the low, medium, and high categories
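Assuming the operators stripped from the formula were plus signs (making F a covariance-style score across the three categories), the heuristic can be sketched as:

```python
# Constants from the slides above.
AVG_SITE = 5777.0                 # average function calls per site
AVG_FUNC = 1984.0                 # average function calls per function
AVG_LOW, AVG_MID, AVG_HIGH = 542.0, 2882.0, 22745.0

def heuristic(L, M, H):
    """F score for one function; L, M, H are its call counts in the
    low, medium, and high categories."""
    return ((AVG_LOW - AVG_SITE) * (L - AVG_FUNC)
            + (AVG_MID - AVG_SITE) * (M - AVG_FUNC)
            + (AVG_HIGH - AVG_SITE) * (H - AVG_FUNC)) / 3.0
```

Because only the high category's average activity exceeds the per-site average, a function called mostly in the high category scores positive, while one concentrated in the low category scores negative, which matches the separation described on the next slide.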

20
Statistical Variation Among Categories
  • The heuristic separated out the functions into
    three distinct sections
  • Along the higher values were mostly functions
    that had few arguments supplied
  • In the middle, there were whole objects
    represented (a GUID, and all of its related
    function calls)
  • At the lowest negative values were functions
    that were commonly called with arguments

21
Argument Distributions
  • A further analysis was done on whether there exists a difference in the behavior of a function across the separate categories
  • The distributions of BSTR (byte string) lengths and I4 (4-byte integer) values were considered
  • Several functions were examined, but this specific one (referred to as "Second," as it had the second-highest heuristic value) is exemplary of the trends noticed
  • The argument type frequency of Second:

Category     0 args     I4    BSTR   DISPATCH   NULL   BOOL
LOW          20,713      0   2,634         14      0      0
MID         170,861      0   9,888          1      0      0
HIGH      1,215,964      0   9,447         19      0      0
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
Conclusions of Approach
  • The trend seen is that there is no major statistical difference in the argument value distribution among the categories, but there are distinct characteristic differences
  • Functions that appear more commonly in less-active sites tend to have arguments supplied to them
  • No general correlation exists between a function and how active the site calling it is
  • There may exist a correlation in some other characteristic, however

26
Integer analysis
27
Functions through Three Sets
  • Looked through 3 of the runs:
  • 5 seeds, depth 5: 1,324 sites
  • 10 seeds, depth 3: 1,184 sites
  • 200 seeds, depth 1: 15,790 sites
  • Picked the three most common functions with integer arguments from the first run to analyze
  • Goal: look for consistency in function behavior across differing sets of sites

28
Functions through Three Sets
  • In all three data sets, the values of the argument had a very large range, from 0 to the millions or billions
  • Distributions did not stay consistent across sets; each had different commonly occurring values

29
Functions through Three Sets
  • Similar pattern in all 3 sets
  • Low values were used
  • Numbers near 0 were most common; occurrences drop off as values get larger

30
Functions through Three Sets
  • Values range from 0 into the hundreds
  • The second data set did not have enough data
  • Similar common numbers in both remaining sets: 3, 300, and 728

31
Patterns in DISPID Usage
  • Looked at which DISPIDs were used, without regard to the GUIDs of the calling classes
  • DISPIDs had a large range, from lows of less than -2 billion to highs of over 3 million
  • Out of 743,270 functions analyzed, the vast majority had DISPIDs within 4 distinct ranges
  • 205 of the functions did not fall within these ranges, and instead used one of 6 other numbers
  • Within each of the four ranges, occurrences at specific numbers formed patterns

32
DISPID Usage First Range
  • The most common range for DISPIDs: 3,000,000-3,001,286
  • 490,201 functions, about 66%
  • 1,067 out of 1,286 different numbers used
  • Numbers nearer to 3 million are most common; higher numbers were used less

33
DISPID Usage Second Range
  • Second most common range for DISPIDs: 0-2,313
  • 164,224 functions, about 22%
  • 39 numbers in this range were used
  • 0 and 1,103 were the most common
  • Numbers clumped around 5 groups: 0-9, 127-154, 1002-1168, 1500-1504, and 2001-2015, with 2313 being an exception

34
DISPID Usage Third Range
  • Third range for DISPIDs: -2,147,417,109 to -2,147,411,105
  • 50,541 functions, about 7%
  • 55 numbers in this range were used
  • Most occurrences were around numbers ending in round thousands

35
DISPID Usage Fourth Range
  • Fourth range for DISPIDs: 10,001-10,087
  • 38,099 functions, about 5%
  • 75 numbers out of the range were used
  • Uniquely used by 3050f55d-98b5-11cf-bb82-00aa00bdce0b
  • DISPIDs 10,001-10,007 are most common

36
Patterns in DISPID Usage
  • (Recap of slide 31.)

37
Function with Multiple Integers
  • Looked for patterns in the relations among the integer arguments of functions taking multiple arguments
  • Not very many functions in this category
  • One took two arguments; the first was always 0
  • One took two arguments that were always the same. Arguments were all from (1,1) to (31,31) and (1908,1908) to (1908)
  • All came from 2 signup sites on a particular website
  • Two took two differing arguments; could not find a relation between the arguments
  • Other functions did not have a large enough sample size

38
Functions with Multiple Integers
  • The function itself had consistent patterns in the values it took: 95% of arguments were (1,1) or (3,2)
  • No consistent relations between arguments

39
Function Pairs
  • Examined:
  • GUID 3050f55d-98b5-11cf-bb82-00aa00bdce0b
  • DISPIDs 10001-10062
  • Out of 38,099 occurrences, 3,595 were followed by:
  • GUID c59c6b12-f6c1-11cf-8835-00a0c911e8b2
  • DISPID 0
  • The second function had no independent occurrences
  • Similar arguments:
  • The first function took a variety of numbers and types of arguments
  • The second function always took a DISPATCH argument, followed by the same arguments as the first function

40
Conclusions of Approach
  • Function arguments across sets:
  • There seem to be consistent patterns in certain functions
  • Range, values taken, common values, value distribution
  • DISPID usage:
  • 4 ranges with very few exceptions
  • Common subranges or distribution patterns within each range
  • Multiple arguments:
  • An uncommon type of function
  • No noticeable relations in arguments
  • Function pairs:
  • Dependent functions have clear patterns
  • Function position
  • Argument types and values
  • Only one example; do more exist?

41
Byte string analysis
42
Byte String Analysis
  • Buffer overflows are a common method of exploiting a targeted system
  • One method: create a very long string to break boundary checking, then append shellcode at the end to be injected as executable code
  • We are interested in the lengths of the BSTR objects fed into given functions
  • For any given API, what is considered a normal string length?

43
Class-based analysis
  • Initial analyses were done on a class-by-class basis
  • Samples were grouped together and analyzed according to GUID
  • Byte strings are typically very small:
  • More than 70% of the commonly called JavaScript classes typically received byte strings of less than length 20 (39 out of 55 functions from this crawl)
  • Less than 10% of these ever received a string greater than 5,000 characters in length (4 out of 55 functions from this crawl)

44
Class-based analysis
  • Analysis of individual classes shows the same trend toward smaller strings
  • However, analyzing by class groups the byte strings of all of a class's functions together, which results in inaccuracy and lost information

45
Parameter-based analysis
  • A second analysis split the samples into the individual arguments of the unique functions of each class
  • Given a sample set with values in the interval (a, b) with average µ and standard deviation s, we expect values to largely lie within the interval (µ - s, µ + s)
  • We also expect (µ - s, µ + s) to be smaller than (a, b)
  • The smaller (µ - s, µ + s) is in proportion to (a, b), the more well-defined our sample set becomes

46
Parameter-based analysis
  • Length of the expected interval: 2s
  • Length of the entire interval: n = b - a + 1
  • 2s/n represents the ratio of the expected interval to the entire interval
  • Since 2s < n, 0 < 2s/n < 1
  • When 2s/n = 0, s = 0 and all values in the data set are equal
  • When 2s/n = 1, s = n/2 and all values in the data equal either a or b
  • As 2s/n goes from 0 to 1, the shape of the graph begins to shift
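The ratio defined above can be sketched directly (a minimal Python sketch; a population standard deviation is assumed):

```python
import math

def ratio(lengths):
    """2s/n for one function argument's observed byte-string lengths:
    s is the standard deviation, n = b - a + 1 the observed interval size."""
    m = sum(lengths) / float(len(lengths))
    s = math.sqrt(sum((v - m) ** 2 for v in lengths) / float(len(lengths)))
    n = max(lengths) - min(lengths) + 1
    return 2.0 * s / n
```

A small non-zero ratio means the lengths cluster tightly inside their observed range, i.e. the argument is well-defined in the sense used on the following slides.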

47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
(No Transcript)
51
(No Transcript)
52
(No Transcript)
53
(No Transcript)
54
  • When the ratio is 0, the number of strings is typically low
  • Otherwise, the ratio increases as the number of strings decreases
  • The function arguments with the smallest non-zero ratio are the most well-defined

55
(No Transcript)
56
Analysis with pruning
  • Only function arguments that see 9 or fewer strings are removed; even so:
  • Most zero-ratio functions are pruned (2,607 to 731)
  • Many functions with ratio > 0.5 are pruned (1,540 to 883)
  • Functions with ratio < 0.5 are affected minimally (1,442 to 1,332)

57
Analysis with pruning
  • Only function arguments that see 99 or fewer strings are removed; even so:
  • Almost all zero-ratio functions are pruned (731 to 232)
  • Almost all functions with ratio > 0.5 are pruned (883 to 266)
  • Only some functions with ratio < 0.5 are affected (1,332 to 979)

58
Analysis with pruning
As a function is seen in the wild more
frequently, the byte string lengths it takes in
begin to fall into specific intervals. Functions
with substantial evidence are well-defined in the
lengths of byte strings they tend to receive!
59
Comparing w/malicious data
  • Symantec provided us with test samples used for Canary testing
  • These samples trigger a browser exploit but do not inject actual shellcode
  • The worst thing they can do is crash the browser
  • Malicious samples fell into one of three categories:
  • Bad BSTR
  • Bad I4
  • Bad DISPATCH (object)
  • Example: MSIE Popup Window Address Bar Spoofing Weakness
  • Callback data:
  • Compare with data from the May crawl
  • 491 strings seen over the 20,416 websites visited during that crawl
  • Smallest: 70
  • Largest: 80
  • Average: 76.32
  • Standard deviation: 2.33
  • Expected interval: (73.99, 78.65)
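Comparing a sample against the crawl baseline then reduces to checking whether its byte-string length falls outside the expected interval; a hedged sketch (not the project's code) using the May-crawl numbers above:

```python
def expected_interval(mean, std):
    """Expected interval (mean - std, mean + std) for one function argument."""
    return (mean - std, mean + std)

def is_suspicious(length, mean, std):
    """Flag a byte-string length outside the baseline's expected interval."""
    lo, hi = expected_interval(mean, std)
    return length < lo or length > hi

# Baseline from the May crawl: mean 76.32, std 2.33
# gives the expected interval (73.99, 78.65).
```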

60
Trend volatility
  • How does web activity change over time?
  • 28 crawls of 1,000 sites were performed from May 7 to May 13 to investigate this
  • Each crawl differs by several hundred thousand DLL calls
  • The number of sites with actual scripts changes

61
Trend volatility
  • These runs were done 5.5 hrs apart
  • Change is very slight
  • Zero-ratio functions increase
  • High-ratio functions decrease

62
Trend volatility
  • These runs were done 1 day apart
  • Change is also very slight
  • Zero-ratio functions decrease
  • Mid-ratio functions (R ≈ 0.5) increase

63
Trend volatility
  • These runs were done 6 days apart
  • Change is a little more apparent
  • Zero-ratio functions decrease
  • Mid-ratio functions (R ≈ 0.5) increase

64
Trend volatility
  • The state of JavaScript activity on the Web is constantly changing
  • The changes are somewhat unpredictable (and entirely dependent on the decisions of webmasters)
  • In the long run these changes are not major; however, they still exist and need to be addressed

65
Conclusions of Approach
  • Substantial evidence in favor of existing trends for byte string arguments
  • This approach can be adapted to anything that can be quantified as a number
  • Changes in the state of the Web will require any heuristic developed to have at least a basic learning capability
  • Plan to continue research over the summer