Title: Intrusion Prevention and Detection for Web Databases
1Intrusion Prevention and Detection for Web
Databases
- by
- Ehud Gudes and Alex Roichman
- June 2008
2Outline
- Web Applications and Databases
- Architecture
- Attack surface
- Multilayered security solution for web databases
- Parameterized View prevention method
- Session Key parameter
- Rolling Key parameter
- Implementation and analysis
- DIWeDa intrusion detection method
- Architecture
- Profile building algorithm
- Intrusion detection algorithm
- Implementation and evaluation
- Summary
3Traditional DB
- Two-tier architecture
- Users are fixed and known
- Users number is limited
- All transactions are passed via the same
connection and users connect directly to the DB
4Traditional DB Access Control
- DB can identify and authenticate its users
- DB can authorize users by traditional
user/role-based access control - It is quite easy to follow up single user
transactions to seek signs of intrusion - Views can be used to determine for a user the
only part of the database that interests her
5Web Databases
- Three-tier architecture
- Users are casual and unknown
- Users number is not limited
- Users do not connect directly to the DB
6Connection Pooling
- Different web users can run their SQL statements
on the same DB connection and one user can run
her SQL statements on different connections - This technique contributes to application
efficiency since the time to open and close the
connection is saved per each request - It has serious implications on the databases
access control mechanism
7Web DB Access Control
- Web DB does not identify the real application
user who accesses it - The only user accessing the database is the user
of the web application server most often with
full access privileges (administrator or
super-user) - No user-based access control can be applied
- No or very limited RBAC can be applied
- The principle of minimal privilege is violated
- No more fine-grained access control to the web DB
exists
8Web DB is vulnerable
- There is no way to limit the web user privileges
at the database level of the web databases. - There are still many secured web applications,
but their security can be achieved only by
application means (writing lines of code that
implement security policies) and not by database
access control means (GRANT/REVOKE and VIEWS) - This situation is very problematic
- Implementing access control by writing code is a
time consuming task - Enterprise may have one DB but many applications,
then access control is distributed and in many
cases not consistent - Programmers must be security specialists
- There is no confidence that all accesses to DB
are authorized - As a result "70 of websites are at immediate
risk of being hacked!" (http//www.acunetix.com)
9Attack example SQL Injection
- In many Web application the SQL sentence is
structured as a string and the users parameter
is concatenated to this string
- strSQL "SELECT Salary FROM Salary_Table
WHERE Employee_No 123 - AND Salary_Date '" dateParam "'"
- But the intruder can type 01.2007' or '1' '1
and retrieve the entire table
SELECT Salary FROM Salary_Table WHERE
Employee_No 123 AND Salary_Date
'01/2007' OR '1' '1'
10Attack example Business Logic Violation
- Another attack on web databases is the Business
Logic Violation (BLV) Attack. - For example an enterprise can require that each
buyer must key up her paying details prior to
buying some book in an online bookstore
application. Buying without entering a payment
details will violate a business rule of an
enterprise.
11Web attacks reports
- Many web applications have serious security holes
and vulnerability reports found on the Internet
show how the existing situation is critical - On November 01, 2005 a high school student used
SQL injection to break into the site of a
Taiwanese information security magazine and steal
customer's information (http//www.webappsec.org/p
rojects/whid/list_id_2005-46.shtml). - On January 13, 2006 hackers broke into a Rhode
Island government web site and stole credit card
data from individuals who have done business
online with state agencies (http//www.webappsec.o
rg/projects/whid/list_id_2006-3.shtml). - On March 29, 2007 a SQL injection flaw in
official Indian government site was discovered
(http//www.webappsec.org/projects/whid/list_id_20
07-12.shtml). - On June 29, 2007 hackers defaces Microsoft U.K.
web page by using an SQL injection attack
(http//www.cgisecurity.com/2007/06/27). - On August 12, 2007 the United Nations web site
was defaced by using SQL injection
(http//hackademix.net/2007/08/12/united-nations-v
s-sql-injections/). -
12Web attacks reports SQL Injection is very
popular
- SQL injection attacks are extremely prevalent,
and ranked as the second most common form of
attack on web applications in 2006 in CVE (Common
Vulnerabilities and Exposures list). - The percentage of these attacks among the overall
number of attacks reported rose from 5.5 in 2004
to 14 in 2006 (http//cve.mitre.org). - Recent reports suggest that a large number of
applications on the web are vulnerable to SQL
injection attacks (http//www.darkreading.com/docu
ment.asp?docid103774WT.svlnews1_3), - The number of attacks are on the increase, and
SQL Injection is on the list of most prevalent
forms of attack (http//www.computerweekly.com/Art
icles/2004/04/16/201840/Topfivethreats.htm) - The recent SQL Injection attack on CardSystems
Solutions that exposed several hundreds of
thousands of credit card numbers is an example of
how such attack can victimize an organization and
members of the general public (http//www.securewo
rks.com/press/20060718-sql.html).
13Related work
- Conventional methods for providing protection in
databases rely heavily on the identity of the
entity accessing the database - Several suggestions were published in the
literature to prevent SQL injection attacks - There is research that proposes to refer to the
structure of each SQL statement possible in the
system and to fingerprint that structure - An additional approach is to refer to some
interesting properties of each SQL sentence such
as referenced tables and fields - Another approach to database IDS is not to build
profiles for each SQL sentence, but for each
database user - Another approach is to search for data
dependencies among the data items in a database
14Related work criticism
- The proposed prevention methods are applicative
and cannot be implemented by the database
built-in mechanisms they try to shield databases
from attacks instead of making databases
resistant to them - The proposed protection methods do not fit web
databases since they do not propose any algorithm
for dealing with web session traceability
problem they do not work at the session level
and their approach is not role-based thus they
cannot detect many attacks from web applications
15Our Approach
- We propose the two layered security model for web
databases - The purpose of the first line of defense is to
prevent intrusions. - The purpose of the second line of defense is to
detect attacks which elude the prevention layer. - We use the Parameterized Views method - a novel
fine-grained access control mechanism at the
first prevention line of defense. - We use DIWeDa (Detecting Intrusion in Web
Databases) as our method of intrusion detection
at the second line of defense. - Two layers work together to achieve the highest
level of web database protection.
16Prevention - the first line of defense
- The Parameterized View method solves the problems
of absence of web database internal access
control and enable using of views as a means of
access restriction - This method is capable of preventing attacks to
the DB by the existing native database protection
mechanisms - It is attack-independent and significantly
decreases the database attack surface - It enables real-life development of secure
Internet database by using commercially available
tools
17Parameterized View
- Our concept is based on the use of parameterized
views as the means to supervise the accesses to
the DB from web systems - The parameterized view transfers the user's
identity to the DB and the view displays the
relevant data to this user accordingly, thus
providing fine-grained access control to web DB - Example of the Parameterized View definition
CREATE VIEW Student_Marks_View WITH pStudent_No
SELECT FROM Student_Marks_Table WHERE
Student_No pStudent_No
18Traditional Parameterized Views without random
parameters
- Example of select from the parameterized view
- But if instead of the course number an intruder
types 12345
UNION Select From Student_Marks_View(2)
then the SQL
injection will still be possible - The solution to this problem is not to use the
explicit user identity in the SQL statement but
instead use a run-time generated identifier which
will be very difficult to fake.
SELECT FROM Student_Marks_View(1) WHERE Cour
se_No 12345
SELECT FROM Student_Marks_View(1) WHERE
Course_No 12345 UNION SELECT
FROM Student_Marks_View(2)
19Parameterized views flow of events
20Our Parameterized View
- The main requirement is that the parameter will
be difficult to fake - The application will be authorized to
parameterized views only - In the parameterized view, the parameter will
contain the identity of the user - We present two design solutions for the
parameterized views - Application Session Key parameter
- Rolling Key parameter
- For details See our SACMAT2007 paper
21Detection - the second line of defense
- Although access control can prevent many attacks
on web databases, it cannot prevent all of them. - One kind of an attack on web databases which
cannot be prevented is the Business Logic
Violation(BLV) - This attack cannot be prevented ether by a
database not by the parameterized views since the
existing database/views access control can grant
or revoke access to resources only according to
the accesor identity/role/parameter and cannot
rely on the business logic of an enterprise. - If many data-centric attacks cannot be
affectively prevented, it make a sense at least
to detect them.
22Detecting BLV at the DB level vs. Application
level
- BLV can be detected either at the application or
at the database level, but- - An enterprise in general has several
applications, but one database. - Applications are changed frequently, but business
rules which are seen at the database level are
stable. - Thus it is preferable to have only one IDS at the
database level to enforce stable business rules
and not to have IDS for each application and to
cope with continues application's changes. - The conclusion it is preferable to detect BLV
at the database level by a web database Intrusion
Detection System.
23DIWeDa our solution for Web Database Intrusion
Detection Systems
- Intrusion Detection Systems (IDS) for network and
operating systems have existing for over 20
years. - IDS for databases is relatively new field of
research that has surfaced in the last few years. - IDS for web databases is almost not exist.
- We present DIWeDa, which is able to detect not
only most known attacks such as SQL injections,
but also more complicated and complex kinds of
attacks such as business logic violations.
24DIWeDa role based approach
- Since different roles of an enterprise have
different authorizations, the best strategy for
web database IDS is to build profiles not per an
enterprise but per enterprise roles. - Role-based models are widely used for old desktop
applications, but most of the web applications do
not use roles - the real user of a web session is unknown at the
database level due to connection pooling issues,
thus it is impossible to apply role base access
control to web databases - sometimes the actual role a user uses is
determined dynamically only at run time. - DIWeDa identifies database roles from the learnt
profiles and look for intrusions from one role to
another.
25Employing many intrusion classifiers
- We use data mining techniques to detect
intrusions and employ two different intrusion
classifiers - SQL content classifier (this paper)
- SQL parameters classifier
- The combination of intrusion classifiers help us
to achieve very high accuracy and preciseness in
detecting of attacks. - The combination of intrusion classifiers was
previously used for network or operation systems
IDS, but we first use it for detecting intrusions
for web databases.
26DIWeDas System Architecture
27Profile Builder algorithm
- During the training phase DIWeDa learns an
application behavior and builds profiles which
will be used later during the detection phase. - Each DIWeDas profile consists of
- the SQL Fingerprint-Set that represents the
enterprise SQL space - Statistical characteristics of parameters for
each fingerprint - The Cluster-Set that represents the SQL content
of each enterprise role
28Profile Builder Fingerprint-Set
- SQL fingerprint represents the structure of a SQL
sentence. - DIWeDa learns all possible fingerprints of an
application and save them in the Fingerprint-Set. - Some attacks, such as SQL injection, always
change the structure of SQL, so if a session
contains an SQL statement without known
fingerprint this session might be under attack. - But more complicated attacks such as logic
violations cannot be detected only by looking for
a statement level SQL structure anomaly- we also
must look for a session level SQL content anomaly.
29Profile Builder Session Vector
- Each application session can be presented by its
SQL Session Vector. - A Session Vector is a binary vector SV with the
length equal to the number of fingerprints in the
application, where the ith bit is 1 if the
application session submits SQL with the ith
fingerprint, else bit i is 0. (note order is
immaterial) - For example, if some application session runs SQL
statements with corresponding fingerprints 1, 3,
5 and there are 7 different fingerprints at all,
we have the following session vector - SV 1, 0, 1, 0, 1, 0, 0
30Profile Builder SQL space
- The Session Vector enables us to formally define
the session SQL contents. - We can think about the application SQL space as
an n-dimensional space, where n is the number of
fingerprints for the application. - Then each session's SQL content can be seen as a
vector (or a point with n coordinates) in the
n-dimensional space.
31How Sessions are identified?
- Connection pooling makes identifying and
separating under sessions difficult - But with parameterized views the SQL statements
issued by a user are uniquely identified by the
unique random key of the parameterized view.
Thus, sessions can be identified and monitored.
32Profile Builder distribution in SQL space
- If a session's SQL content was
- absolutely random, then the distribution
- of vectors in the space should be uniform.
- But in reality, a session's SQL content
- is not random- this means that
- a session can submit only SQL statements
- that a session's role is supposed to submit.
- The groups of points in the last figure represent
different access roles and show that SQL contents
of sessions from the same role are very similar,
while SQL contents of sessions from different
roles are far apart.
33Profile Builder looking for intrusions
- We will find clusters (application roles) by
using data mining techniques - We will look for intrusion by looking for
sessions with vectors which are far from the
found clusters
34Profile Builder Cluster
- The Cluster is a group of highly similar Session
Vectors which belong to the same role. - The Cluster Centroid is a vector CC with vector
values that are the respective means of the
cluster vectors. - The distance D between two clusters represented
by their centroids is computed as
35Profile Builder Clustering Algorithm
- Now we can use the data mining technique
(hierarchical clustering algorithm) and cluster
application sessions of the same role. - This algorithm builds a tree which is called the
cluster Dendrogram tree.
36Profile Builder Building Dendrogram
- Hierarchical_Clustering_Algorithm
- //Building Dendrogram tree
-
- For each application session build its Session
Vector. - Start with each Session Vector as a separate
cluster. - Save all clusters received at this stage in a
Cluster-Set and initialize i to 1. - Select two closest clusters to merge into a
single cluster. - Compute the new cluster centroid for the merged
cluster. - Save all clusters received at this stage in a
corresponding Cluster-Set and advance i by 1. - Repeat steps 4-6 until we get a single cluster.
37Profile Builder Cluster-Set
- Now we will choose the best Cluster-Set (layer of
the dendrogram tree) to serve as a profile for
DIWeDa. - We recommend to choose a Cluster-Set with the
Minimum Total Distance to serve as the profile
for DIWeDa. - This approach finds a cluster-set with small
specific clusters that are far from a global
centroid. - By using this approach, DIWeDa will find specific
separated roles (clusters).
38Profile Builder Cluster-Set with Minimum Total
Distance
- The Minimum Total Distance is computed by the
following formula - where Intra-Cluster Distance is computed by
- and Inter-Cluster Distance is computed by
39Profile Builder Algorithm
- To summarize, our algorithm of building
cluster-based profile is as follows - Build_Cluster_Based_Profile
-
- Find all application fingerprints with support gt
SIR and save them in Fingerprint-Set - Run Build_Dendrogram
- Select the appropriate set of clusters (layer in
Dendrogram) with the Minimum Total Distance and
save it in Cluster-Set - Delete Clusters from Cluster-Set with a small
Support -
40Detection Phase Session Anomaly by SQL content
- The probability of an analyzed session
represented by its session vector SV to have an
abnormal SQL content is defined by the formula -
- where MD (Minimum Distance) is the distance
between SV and the closest cluster centroid from
the cluster-set and NUS is the number of
unexpected statements in the analyzed session. - This formula is based on the following
assumption - The further a session vector is from any existing
cluster, the more abnormal a session is. - The more unexpected statement a session has, the
more abnormal a session is.
41Detection Phase Intrusion Threshold
- Any session with probability to be abnormal
greater than the Session Intrusion Threshold will
be classified as intrusion. - The Session Intrusion Threshold is the IDS
sensitivity and is represented by a number from
the range 0, 1, where each session with the
probability to be abnormal greater than this
threshold will be classified as intrusion. - The interesting thing that was learned from our
evaluation is that the best threshold is equal to
the Cluster-Set Maximum Radius.
42Detection Phase Maximum Radius
- The Cluster-Set Maximum Radius as the maximum
between Cluster Maximum Distances over all
clusters in the cluster-set.
- The idea is that any session with a vector inside
a cluster will be classified as legal and outside
a cluster as intrusion
43DIWeDa Implementation and Evaluation
- We implemented a prototype of the proposed system
and used it to evaluate the system. - The prototype was developed with C and SQL
Server 2005. - The online bookstore application was used as a
web application benchmark. - The profiles were built by manually operating
this application. - The analyzed sessions were created by synthetic
data
44Inputs to our system - DB log
45Test case patterns results
of FN of FP of TN of TP of instances Pattern Desc Pattern
0 9 191 0 200 Legal sessions 1
0 0 0 5 5 Where clause modification 2
0 0 0 5 5 Field clause modification 3
0 0 0 5 5 From clause modification 4
0 0 0 10 10 SQL randomization 5
0 0 0 5 5 Business logic escalation 1 (mix of different roles SQL contents) 6
0 0 0 5 5 Business logic escalation 2 (mix of randomly chosen legal SQLs) 7
5 0 0 0 5 Business logic escalation 3 (sessions with random order of SQLs) 8
0 0 0 15 15 Business logic escalation 4 (sessions built on SQLs without their original contents) 9
0 0 0 5 5 Complex attacks scenario 10
46DIWeDa Implementation and Evaluation
- We found that our proposal for the system
threshold of 0.07 was the best as can be seen
from the following ROC. - At this threshold we achieved TPR 92.5 and
FPR5. - The following table shows the evaluation summary
of FN of FP of TN of TP of instances Pattern
0 9 191 0 200 1
0 0 0 5 5 2
0 0 0 5 5 3
0 0 0 5 5 4
0 0 0 10 10 5
0 0 0 5 5 6
0 0 0 5 5 7
5 0 0 0 5 8
0 0 0 15 15 9
0 0 0 5 5 10
47Evaluation summary
- Our True Positive Rate was about 92.5
- Our False Positive Rate was about 5
48Future work
- Improving accuracy
- increasing TPR and decreasing FPR may be achieved
by employing more than two intrusion classifiers
and by finding a smart algorithm of classifiers
fusion - Adapting the proposed algorithms to new kind of
web attacks
49Summary
- Multilayered security model for web databases
- the novel fine-grained access control mechanism
as the first prevention line of defense - general attack independent prevention mechanism
- prevention at DB layer
- user/session traceability at DB layer
- the session anomaly detection mechanism as the
second intrusion detection line of defense - role-based approach
- detection of attacks which can be seen only at
session level - detection accuracy due to the combination of
different intrusion classifiers - observing the normal working application with no
assumption that the learning period is clear from
attacks - Our solution proposes a very high protection
level for existing web database environments, and
this is the main significant advantage of this
research.
50More on Parameterized Views
51Our Parameterized View
- The main requirement is that the parameter will
be difficult to fake - The application will be authorized to
parameterized views only - In the parameterized view, the parameter will
contain the identity of the user - We present two design solutions for the
parameterized views - Application Session Key parameter
- Rolling Key parameter
- For details See our SACMAT2007 paper
52Application Session Key
- A user requests to perform a process of
identification when she provides a username and a
password to the application - The application runs a DB stored procedure that
accepts the user's username and password and
returns a random number (AS_key) - DB stores the random number in a table of active
users - The application knows the user who works with it,
and stores the AS_key as well. Each SQL sentence
that will be run on behalf of the user will be
run with a parameter of user's corresponding
AS_key - The AS_key is cleared when a user disconnects
from the application
53Example of Application Session Key
- When Alice entering the system, she is
authenticated and receives a random key like
11011 - The application is authorized to access only the
parameterized views - In order the view will return the data, it must
get a valid parameter - Now Alice may access information about her salary
from different months, but not the salary of
different employees.
54Example of Application Session Key
- Each SQL of Alice will look as follows
- And the definition of the view looks as follows
SELECT Salary FROM Salary_View(110011) WHERE
Salary_Date '01/2007'
CREATE VIEW Salary_View WITH pAS_key SELECT
FROM Salary_Table WHERE Employee_No IN
( SELECT Employee_No FROM Users_Table
WHERE Users_Table.AS_keypAS_key )
55Rolling Key
- The first three steps are similar to the previous
parameter method based on the application session
key, except that now AS_key also serves as the
seed of the rolling key - DB and the application agree on a common
encryption key (Enc_key). This encryption key is
used to generate the next rolling key from the
current one - DB stores the AS_key, Enc_key and the current
Roll_Key in a table of active users
56Rolling Key
- Now, when the SQL sentence arrives, it contains a
request for a parameterized view with 2
parameters the AS_key and the rolling key. As a
result, two things occur - The view returns the filtered data that belongs
only to the user that the session and rolling
keys belong to - The rolling key is advanced to the next number
both in the application and in the DB
57Example of Rolling Key
- When Alice entering the system, she is
authenticated and receives two random keys - Now she can run a statement like
SELECT Salary FROM Salary_View(110111, 000111)
WHERE Salary_Date '01/2007'
- But if she wants to run another statement, she
will submit
SELECT Salary FROM Salary_View(110111, 111010)
WHERE Salary_Date '02/2007'
58Comparison of our two methods
- Both approaches significantly decrease the attack
surface of web DB - The safety of the parameter method rests with the
difficulties of guessing the various keys - Example of SQL Injection into the parameterized
view
SELECT Salary FROM Salary_View(11011) WHERE
Salary_Date '01/2007' OR '1' '1'
Because Salary_View returns only the data of the
attacker, she may access information about her
salary from different months, but not the salary
of different employees.
59Comparison of our two methods
- AS key method is preferable from the efficiency
point of view the calculation of the next key in
the rolling key method is expensive - Rolling key method is preferable from security
point of view the rolling key method is
resistant to replay attacks and a correct guess
of a code is only valid for running one sentence - Both methods allow the identification of a real
user at the DB layer, thus fine-grained access
control can be applied at the DB layer
60Implementation
- The parameterized views are not yet the part of
the SQL standard - But we can implement our method with existing
functions stored in DB (table functions) - These functions are supported in most databases
and they can be used in standard DML sentences
SELECT FROM Student_Marks_Func(10101)
- The advantage of table functions is their ability
to support delete and update actions, while
limiting the activity of them to the return value
of a function that depends on the parameter passed
61More on Parameters Classifier
62Profile Builder - Characterizing SQL Parameters
- Detecting anomalous SQL sentences cannot rely
only on the SQL structure (fingerprints), we also
want to characterize the SQL parameter values. - Our goal is to capture normal characteristics of
parameters and then use these characteristics
during the detecting phase of DIWeDa to discover
anomalies in user parameters that are applied to
SQL statements. - We use two statistical models for characterizing
parameters parameter length and parameter
character distribution.
63Profile Builder - Characterizing SQL Parameters
- By computing the parameter length mean and
variance, we can find anomalies in a parameter,
when a malicious input is passed. - Legal SQL parameters have regular structures and
contain only human-readable characters. - In the case of attacks, such as the parameter
buffer overflow or parameter guessing by a brute
force attack, the character distribution is
changed dramatically.
64Profile Builder - Characterizing SQL Parameters
65Profile Builder - Characterizing SQL Parameters
- Characterize_Parameters
-
- For each fingerprint F from the Fingerprint Set
do - For each parameter P from F do
- Find characteristics of P (AVG, STD and
- Character Distribution) and save these
- characteristics under the indices of F
and P
66Detection Phase Session Anomaly by Parameters
- Testing parameter length by the following
formula - Testing character distribution by Pearson test
- Combining the two tests
67Detection Phase Classifiers Combiner
- Our method compute the probability of a session
to be legal based on following features - Session SQL content anomaly
- Session parameter anomaly
- Two classifiers support us with a value in the
same range 0, 1 that can be used as evidence to
what degree the analyzed session is abnormal. - Now we combine them. We rely on the fact that our
two classifiers are independent values of
parameters do not influence SQL structures and
vise versa, thus we can compute