Title: MAIDS (Mining Alarming Incidents in Data Streams) Implementation Discussion
1 MAIDS (Mining Alarming Incidents in Data Streams): Implementation Discussion
- MAIDS group
- NCSA and Dept. of CS
- University of Illinois at Urbana-Champaign
- www.maids.ncsa.uiuc.edu
2 Implementation Essentials
- Framework: tilted time window
- Extended FP-growth for mining frequent patterns in data streams
- Extended Naïve Bayes for mining classification models in data streams
- Extended k-means, integrating micro-clustering and macro-clustering, for cluster analysis in data streams
- Extended H-tree cubing method for multi-dimensional query answering in data streams
- Application development and testing
3 Framework: Tilted Time Window (1)
- Natural tilted time frame window
  - Example: minimal unit is a quarter; then 4 quarters → 1 hour, 24 hours → 1 day, ...
- Logarithmic tilted time frame window
  - Example: minimal unit is 1 minute; then slots of 1, 2, 4, 8, 16, 32, ... minutes (see the sketch below)
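A minimal Python sketch of one way counts might be kept in a logarithmic tilted time frame; the class name, slot layout, and merge policy are illustrative assumptions rather than the MAIDS implementation.

```python
class LogTiltedWindow:
    """Counts kept at granularities of 1, 2, 4, 8, ... base windows.

    pending[d] buffers one finished unit of 2**d base windows, waiting for a
    second unit of the same size before being merged one level up.
    """

    def __init__(self, levels=8):
        self.current = 0                 # count for the still-open base window
        self.pending = [None] * levels   # one buffered unit per granularity

    def add(self, n=1):
        self.current += n                # always add to the newest (finest) slot

    def close_base_window(self):
        # The finest window just ended: push its count up the frame,
        # merging two units of equal size into one coarser unit.
        carry, self.current = self.current, 0
        for d in range(len(self.pending)):
            if self.pending[d] is None:
                self.pending[d] = carry
                return
            carry += self.pending[d]     # merge into a 2**(d+1)-window unit
            self.pending[d] = None
        # anything older than the coarsest level is dropped (approximation)

# Example: 1-minute base windows; after 5 minutes the frame holds
# counts at 1-, 2-, and 4-minute granularities.
w = LogTiltedWindow()
for minute_count in [3, 1, 4, 1, 5]:
    w.add(minute_count)
    w.close_base_window()
print(w.pending[:3])   # [5, None, 9]: one 1-minute unit, no 2-minute unit, one 4-minute unit
```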
4 Framework: Tilted Time Window (2)
- Pyramidal tilted time frame window
  - Example: suppose there are 5 frames and each frame keeps at most 3 snapshots
  - Given a snapshot number N, if N mod 2^d = 0, insert it into frame number d; if the frame then holds more than 3 snapshots, kick out the oldest one (a sketch of this rule follows)
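A small Python sketch of the snapshot-placement rule described above (5 frames, at most 3 snapshots per frame). Placing each snapshot only in the highest-order frame it qualifies for is an assumption; the structure and function names are illustrative.

```python
def insert_snapshot(frames, n, max_per_frame=3):
    """Place snapshot number n into the pyramidal frame structure.

    frames: list of lists; frames[d] holds the snapshot numbers kept at level d.
    Rule from the slide: if n mod 2**d == 0, n belongs to frame d; here we use
    the largest such d, so each snapshot lives in exactly one frame (assumption).
    """
    d = 0
    while d + 1 < len(frames) and n % (2 ** (d + 1)) == 0:
        d += 1
    frames[d].append(n)
    if len(frames[d]) > max_per_frame:
        frames[d].pop(0)          # kick out the oldest snapshot in that frame

frames = [[] for _ in range(5)]   # 5 frames, as in the example
for n in range(1, 33):
    insert_snapshot(frames, n)
print(frames)
# [[27, 29, 31], [22, 26, 30], [12, 20, 28], [8, 24], [16, 32]]
```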
5 Frequent Pattern Finder
- Frequent Pattern Finder = Tilted Window + FP-growth
- A tilted time frame
  - Different time granularities (natural vs. pyramidal)
  - second, minute, quarter, hour, day, week, ...
- Targeted items
  - User- or expert-selected items serve as targeted items
  - Trace targeted items and their combinations using an FP-tree
- The FP-tree registers items with tilted time window information
- Mining is based on the extended FP-growth algorithm over the tilted windows
6 FP-Growth (1): FP-Tree Construction

TID | Items bought            | (Ordered) frequent items
100 | f, a, c, d, g, i, m, p  | f, c, a, m, p
200 | a, b, c, f, l, m, o     | f, c, a, b, m
300 | b, f, h, j, o, w        | f, b
400 | b, c, k, s, p           | c, b, p
500 | a, f, c, e, l, p, m, n  | f, c, a, m, p

min_support = 3
- Scan the DB once, find frequent 1-itemsets
- Sort frequent items in frequency-descending order to get the f-list
- Scan the DB again, construct the FP-tree
F-list: f-c-a-b-m-p
7 FP-Growth (2): FP-Tree Mining
- Start at the frequent-item header table of the FP-tree
- Traverse the FP-tree by following the node links of each frequent item p
- Accumulate all of the transformed prefix paths of item p to form p's conditional pattern base (a construction-and-mining sketch follows the table below)

Conditional pattern bases:
item | cond. pattern base
c    | f:3
a    | fc:3
b    | fca:1, f:1, c:1
m    | fca:2, fcab:1
p    | fcam:2, cb:1
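A minimal, self-contained Python sketch of the two scans and the conditional-pattern-base step on the example database above; the FPNode class and function names are illustrative, not the MAIDS code.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Scan 1: count items, keep those meeting min_support, sort into the f-list.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    f_list = [i for i, c in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
              if c >= min_support]
    rank = {item: r for r, item in enumerate(f_list)}

    # Scan 2: insert each transaction's frequent items in f-list order.
    root, header = FPNode(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, f_list

def conditional_pattern_base(item, header):
    # Follow the node links of `item`; each prefix path, weighted by the
    # node's count, contributes one entry to the conditional pattern base.
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append(("".join(reversed(path)), node.count))
    return base

db = [list("facdgimp"), list("abcflmo"), list("bfhjow"),
      list("bcksp"), list("afcelpmn")]
root, header, f_list = build_fp_tree(db, min_support=3)
print("-".join(f_list))                        # f-c-a-b-m-p
print(conditional_pattern_base("p", header))   # [('fcam', 2), ('cb', 1)]
print(conditional_pattern_base("m", header))   # [('fca', 2), ('fcab', 1)]
```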
8 FP-Tree with Tilted Window
- Use a fixed order of items (could be based on sampling, or on alphabetic ordering)
- Construct the FP-tree while scanning the data stream
- Each node contains a tilted time frame for count accumulation
  - Add new counts to the newest slot
  - Propagate counts to older slots when needed (see the sketch below)
F-list: f-c-a-b-m-p
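A brief sketch, under illustrative assumptions (a fixed number of slots and a plain shift instead of the logarithmic merge shown earlier), of how each FP-tree node could carry a tilted window of counts and add new occurrences to the newest slot.

```python
class StreamFPNode:
    """FP-tree node whose count is a list of tilted-window slots (slot 0 = newest)."""

    def __init__(self, item, parent, num_slots=8):
        self.item, self.parent = item, parent
        self.children = {}
        self.slots = [0] * num_slots

    def add(self, n=1):
        self.slots[0] += n          # accumulate into the newest slot

def insert_transaction(root, ordered_items):
    # Register one (already f-list-ordered) transaction: every node on the
    # path gets one count in its newest slot.
    node = root
    for item in ordered_items:
        node = node.children.setdefault(item, StreamFPNode(item, node))
        node.add()

def shift_all(node):
    # At a window boundary, move every node's counts one slot older
    # (the oldest slot is dropped) and reopen the newest slot.
    node.slots[1:] = node.slots[:-1]
    node.slots[0] = 0
    for child in node.children.values():
        shift_all(child)
```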
9 Mining Frequent Patterns in Dynamic Data Streams
- Mining happens when a user submits a mining query (on-demand)
- Mining on the FP-tree uses FP-growth over the data in the corresponding time windows (see the sketch below)
  - To mine frequent patterns in the last 30 minutes, use the slots covering the last 30 minutes
  - To mine frequent patterns between 6am and 8am, use the slots covering that interval
- We may compare what has changed in the last 24 hours by comparing frequent patterns, i.e., mining the current patterns, mining the patterns of 24 hours ago, and comparing the two
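A small helper illustrating the "corresponding time windows" idea: the support of a pattern over a query interval is obtained by summing the counts of the slots that overlap it. The (start, end, count) slot representation is an assumption for illustration; the answer is approximate when a coarse slot only partially overlaps the query range.

```python
def support_in_range(slots, query_start, query_end):
    """slots: list of (start, end, count) tuples for one pattern, newest first,
    with start/end measured in minutes-ago. Returns the summed count of all
    slots overlapping [query_start, query_end)."""
    return sum(count for start, end, count in slots
               if start < query_end and end > query_start)

# Example: per-slot counts of one pattern over the last 32 minutes.
slots = [(0, 1, 2), (1, 2, 1), (2, 4, 5), (4, 8, 7), (8, 16, 9), (16, 32, 20)]
# "last 30 minutes": the 16-32 slot only partially overlaps the range,
# so its 20 occurrences make the answer an over-count.
print(support_in_range(slots, 0, 30))   # 44
```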
10 Classification for Dynamic Data Streams
- Methodology: Naïve Bayes + Tilted Time Windows
- Tilted time framework as shown above (natural vs. pyramidal)
- Instead of decision trees, consider models that do not change drastically
  - Naïve Bayes with boosting is a good approach
- Major advantages
  - Store statistical information related to each variable
  - Model construction and prediction
  - Incremental updating and dynamic maintenance
- Advanced task: compare models to find changes
11 Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured
12 Bayesian Theorem: Basics
- Let X be a data sample whose class label is unknown
- Let H be the hypothesis that X belongs to class C
- For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X
- P(H): prior probability of hypothesis H (i.e., the initial probability before we observe any data; it reflects the background knowledge)
- P(X): probability that the sample data is observed
- P(X|H): probability of observing sample X, given that the hypothesis holds
13 Bayesian Theorem
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = P(X|H) P(H) / P(X)
- Informally, this can be written as
  posterior = likelihood x prior / evidence
- MAP (maximum a posteriori) hypothesis: h_MAP = argmax_h P(h|X) = argmax_h P(X|h) P(h)
- Practical difficulty: requires initial knowledge of many probabilities and significant computational cost
14 Naïve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent
- The probability of occurrence of, say, two elements y1 and y2, given that the current class is C, is the product of the probabilities of each element taken separately, given the same class: P(y1, y2 | C) = P(y1 | C) P(y2 | C)
- No dependence relations between attributes
- Greatly reduces the computation cost: only the class distributions need to be counted
- Once the probability P(X|Ci) is known, assign X to the class with maximum P(X|Ci) P(Ci)
15Training dataset
Class C1buys_computer yes C2buys_computer
no Data sample X (agelt30, Incomemedium, Stud
entyes Credit_rating Fair)
16 Naïve Bayesian Classifier: Example
- Compute P(X|Ci) for each class
  - P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
  - P(age <= 30 | buys_computer = no) = 3/5 = 0.6
  - P(income = medium | buys_computer = yes) = 4/9 = 0.444
  - P(income = medium | buys_computer = no) = 2/5 = 0.4
  - P(student = yes | buys_computer = yes) = 6/9 = 0.667
  - P(student = yes | buys_computer = no) = 1/5 = 0.2
  - P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
  - P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4
- X = (age <= 30, income = medium, student = yes, credit_rating = fair)
- P(X|Ci):
  - P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  - P(X | buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
- P(X|Ci) P(Ci):
  - P(X | buys_computer = yes) P(buys_computer = yes) = 0.028
  - P(X | buys_computer = no) P(buys_computer = no) = 0.007
- X belongs to class buys_computer = yes
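The arithmetic above can be checked with a few lines of Python. The conditional probabilities come from the slide; the class priors 9/14 and 5/14 are inferred from the 2/9 and 3/5 denominators (an assumption, since the training table itself is not reproduced here).

```python
# Conditional probabilities from the slide, for
# X = (age <= 30, income = medium, student = yes, credit_rating = fair)
p_x_given_yes = (2/9) * (4/9) * (6/9) * (6/9)     # ~0.044
p_x_given_no  = (3/5) * (2/5) * (1/5) * (2/5)     # ~0.019
p_yes, p_no = 9/14, 5/14                           # class priors inferred from the counts

score_yes = p_x_given_yes * p_yes                  # ~0.028
score_no  = p_x_given_no  * p_no                   # ~0.007
print(round(score_yes, 3), round(score_no, 3))     # 0.028 0.007
print("yes" if score_yes > score_no else "no")     # X is classified as buys_computer = yes
```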
17 Naïve Bayes for Data Streams
- Store single-variable statistics (Attribute-Value-ClassLabel, AVC-lists) in tilted time windows
- Incremental update based on count propagation in the tilted time window
- For computing accuracy, partition the data into a training set and a testing set, and derive prediction accuracy as for non-stream data
- Boosting: based on the testing data, put more weight on the data whose prediction is incorrect
- Advanced task: compare models to find changes (see the sketch below)
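A minimal sketch of keeping AVC (Attribute-Value-ClassLabel) statistics per window and pooling them for prediction. The class and method names are illustrative, the window handling is simplified to a plain list of per-window count tables rather than a tilted frame, and add-one smoothing is used for the conditional probabilities.

```python
from collections import defaultdict
from math import prod

class StreamingNaiveBayes:
    """Per-window AVC counts; prediction pools the selected windows."""

    def __init__(self):
        self.windows = []                     # newest window last

    def open_window(self):
        # counts[(attribute, value, class_label)] and class_counts[class_label]
        self.windows.append((defaultdict(int), defaultdict(int)))

    def update(self, x, label):
        counts, class_counts = self.windows[-1]
        class_counts[label] += 1
        for attr, value in x.items():
            counts[(attr, value, label)] += 1

    def predict(self, x, last_k=None):
        # Pool the AVC statistics of the chosen windows (all, or the last k).
        selected = self.windows[-last_k:] if last_k else self.windows
        pooled_avc, pooled_cls = defaultdict(int), defaultdict(int)
        for counts, class_counts in selected:
            for key, c in counts.items():
                pooled_avc[key] += c
            for label, c in class_counts.items():
                pooled_cls[label] += c
        total = sum(pooled_cls.values())

        def score(label):
            prior = pooled_cls[label] / total
            likelihood = prod((pooled_avc[(a, v, label)] + 1) /   # add-one smoothing
                              (pooled_cls[label] + 1) for a, v in x.items())
            return prior * likelihood

        return max(pooled_cls, key=score)

# Tiny usage example with hypothetical attribute values.
nb = StreamingNaiveBayes()
nb.open_window()
nb.update({"age": "<=30", "student": "yes"}, "yes")
nb.update({"age": ">40", "student": "no"}, "no")
print(nb.predict({"age": "<=30", "student": "yes"}))   # 'yes'
```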
18 Training and Testing for Data Streams
- Two classes of models for prediction: peer vs. future
- We study how to predict the future class
  - Take the data in the current window as the testing data
  - Take the data in the previous windows as the training set
  - Derive models based on different weighting schemes (e.g., uniform, linearly decreasing, logarithmically decreasing, etc.); a sketch of these schemes follows below
  - Test and select the best model
  - Then, based on this modeling scheme, construct the model again, including the current window data as new training data
- To predict the peer class
  - The training and test partition is along the same time framework
  - There is no retraining process
19 www.cs.uiuc.edu/hanj