Shailendra Mishra - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Shailendra Mishra
  • Director (CEP)

2
Case Study: Collusion Detection in Multi-player
Games
  • Consider the following problem:
  • (A) commits an identity theft.
  • (A) acquires (n) credit cards as a result of the
    identity theft.
  • (A) goes to an online gaming site and uses the
    credit cards to play online poker with his (s)
    friends.
  • (A) loses all his money to his (s) friends.
  • The online gaming company now has to pay his
    friends.
  • Analysis of the problem:
  • Assume, for a moment, that (A) didn't commit
    identity theft.
  • (A) is playing a game with his friends, fairly or
    otherwise.
  • The results of this game generate a stream of
    outcomes: wins and losses by (A) to any of his
    friends i, where $1 \le i \le s$.
  • The problem is to detect whether the pattern of
    wins and losses is genuine or not.
  • More formally, we are asking:
  • When is a certain number of occurrences of a
    particular subsequence unlikely to be fortuitous?

3
Modeling the Collusion Detection Problem
  • Let T be an ordered sequence of events.
  • Let W be the window of observation, of size w,
    within which the analysis is confined.
  • Formally, consider an alphabet $\Sigma$ of
    cardinality $|\Sigma|$.
  • Consider an event sequence
    $T = t_1, t_2, \ldots, t_n$ of length n over
    $\Sigma$.
  • We then define an episode over $\Sigma$ as one of
    the following:
  • A single pattern $S = s_1 s_2 s_3 \cdots s_m$ of
    length m.
  • A set of patterns $\{S_1, S_2, \ldots, S_d\}$.
  • The set of all distinct permutations of S, where
    ordering within the window of observation doesn't
    matter.
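
The three kinds of episode listed above can be sketched as membership tests on a single window. This is only an illustration: the function names and the win/loss encoding ("W", "L") are assumptions, not part of the slides.

```python
from itertools import permutations


def is_subsequence(pattern, window):
    """True if pattern occurs in window as a (not necessarily contiguous) subsequence."""
    remaining = iter(window)
    # `symbol in remaining` advances the iterator, so order is respected.
    return all(symbol in remaining for symbol in pattern)


def matches_single(pattern, window):
    """Episode type 1: a single ordered pattern S."""
    return is_subsequence(pattern, window)


def matches_any(patterns, window):
    """Episode type 2: a set of patterns; any one of them may occur."""
    return any(is_subsequence(p, window) for p in patterns)


def matches_any_order(pattern, window):
    """Episode type 3: every distinct permutation of S counts as an occurrence."""
    return matches_any(set(permutations(pattern)), window)
```

With a hypothetical window "LLW", `matches_single("WLL", "LLW")` is false, while `matches_any_order("WLL", "LLW")` is true because the order of symbols no longer matters.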

4
Formal Statement of the Problem
  • Assume the event sequence is generated by a
    memoryless (Bernoulli) source or a Markov source.
  • Let's restate our problem formally: we are
    interested in finding $\Omega^{\exists}(n, w, m)$,
    the number of windows containing at least one
    occurrence of S when sliding the window over n
    events of T.
  • To address this:
  • Compute the expected value
    $E[\Omega^{\exists}(n, w, m)]$.
  • Compute the variance
    $\mathrm{Var}[\Omega^{\exists}(n, w, m)]$.
  • Show that $\Omega^{\exists}(n, w, m)$ converges to
    a normal distribution.
  • This allows us to set a threshold $\tau(n, w, m)$
    such that, for a given confidence level $\beta$,
    $P\{\Omega^{\exists}(n, w, m) > \tau(n, w, m)\} \le \beta$.
  • This implies that when more than $\tau(n, w, m)$
    such windows are observed, it is highly unlikely
    that the count was generated by randomness.

5
Formulation of the Equivalent Pattern Matching Problem
  • Given an alphabet
    $\Sigma = \{a_1, a_2, \ldots, a_{|\Sigma|}\}$ and a
    pattern $S = s_1 s_2 \cdots s_m$ of length m.
  • Search for occurrences of S as a subsequence
    within a window W of size w in another sequence,
    known as the event sequence
    $T = t_1 t_2 \cdots t_n$ of length n.
  • A valid occurrence of S in T corresponds to a
    set of integers $i_1, i_2, \ldots, i_m$ such that
    the following hold:
  • $1 \le i_1 < i_2 < \cdots < i_m \le n$
  • $t_{i_1} = s_1,\ t_{i_2} = s_2,\ \ldots,\ t_{i_m} = s_m$
  • $i_m - i_1 < w$
  • We now estimate
    $\Omega^{\exists}(n, w, m, S, \Sigma)$, which
    represents the number of windows that contain at
    least one occurrence of S when sliding the window
    over n consecutive events in the event sequence T
    over alphabet $\Sigma$.
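
A minimal sketch of the quantity being estimated, counted directly on an observed sequence. It assumes one window per start position (so a sequence of length n yields n - w + 1 windows); the slides do not spell out the boundary convention, and the function names are hypothetical.

```python
def contains_subsequence(pattern, window):
    """True if there are indices i_1 < ... < i_m in window matching pattern in order."""
    remaining = iter(window)
    return all(symbol in remaining for symbol in pattern)


def count_windows(events, pattern, w):
    """Count the length-w windows of events that contain pattern as a subsequence.

    An occurrence inside a window of w consecutive events automatically
    satisfies the span condition i_m - i_1 < w.
    """
    return sum(
        contains_subsequence(pattern, events[start:start + w])
        for start in range(len(events) - w + 1)
    )
```

For example, `count_windows("LWLLW", "LL", 3)` inspects the windows "LWL", "WLL" and "LLW", each of which contains two losses in order.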

6
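Theorems and Results: Gwadera, Atallah,
Szpankowski (Purdue)
  • Consider a memoryless source with $p_i$ being the
    probability of generating symbol $a_i \in \Sigma$.
  • Also, assume $P(S) = \prod_{i=1}^{m} p_i$, where
    $p_i$ here is the probability of the i-th pattern
    symbol $s_i$.
  • Result 1: Probability that a window of size w
    contains at least one occurrence of episode S.
  • For all m and $w \ge m$ we have
  • $P^{\exists}(w, m) = P(S) \sum_{i=0}^{w-m} \sum_{\sum_{k=1}^{m} n_k = i} \prod_{k=1}^{m} q_k^{n_k}$
  • where $q_k = 1 - p_k$.
  • Result 2:
  • Let now m be fixed and
    $i \ne j \Rightarrow p_i \ne p_j$; then for any
    $\varepsilon > 0$
  • $P^{\exists}(w, m) = 1 - P(S) \sum_{i=1}^{m} \frac{(1 - p_i)^{w}}{p_i} \prod_{j \ne i} \frac{1}{p_j - p_i} + O(\varepsilon^{w})$
  • as $w \to \infty$.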
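
As a numerical cross-check of the two results, the sketch below evaluates the window probability in two ways. The dynamic program is a swapped-in equivalent of the Result 1 sum, not the authors' procedure: it tracks how many pattern symbols a greedy left-to-right match has consumed. The second function evaluates only the leading term of Result 2 and assumes pairwise distinct probabilities. Function names and the list `p` (probability of each pattern symbol, in order) are assumptions.

```python
from math import prod


def p_exists_dp(p, w):
    """Probability that a window of w i.i.d. symbols contains S as a subsequence.

    p[k] is the probability of the (k+1)-th pattern symbol.
    """
    m = len(p)
    # state[j] = probability that exactly the first j pattern symbols are matched
    state = [1.0] + [0.0] * m
    for _ in range(w):
        nxt = [0.0] * (m + 1)
        for j in range(m):
            nxt[j] += state[j] * (1 - p[j])  # next event is not the awaited symbol
            nxt[j + 1] += state[j] * p[j]    # next event is the awaited symbol
        nxt[m] += state[m]                   # pattern already complete
        state = nxt
    return state[m]


def p_exists_closed_form(p, w):
    """Leading term of Result 2; requires p[i] != p[j] for i != j."""
    m = len(p)
    total = 0.0
    for i in range(m):
        denom = prod(p[j] - p[i] for j in range(m) if j != i)
        total += (1 - p[i]) ** w / (p[i] * denom)
    return 1 - prod(p) * total
```

For a single-symbol pattern both functions reduce to $1 - (1 - p_1)^w$, which is a quick sanity check on the reconstruction of the formulas.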

7
Computation of Bounds
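  • Assume a memoryless source; then for $x = O(1)$
    we have
  • $\lim_{n \to \infty} P\left\{ \frac{\Omega^{\exists}(n, w, m) - E[\Omega^{\exists}(n, w, m)]}{\sqrt{\mathrm{Var}[\Omega^{\exists}(n, w, m)]}} \le x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$
    for fixed m and w.
  • Now let's establish the threshold $\tau(n, w, m)$.
  • First we find an $\alpha_0$ for a given $\beta$
    such that
  • $\beta = \frac{1}{\sqrt{2\pi}} \int_{\alpha_0}^{\infty} e^{-t^2/2}\,dt = P\{N(0, 1) > \alpha_0\}$
  • where N(0, 1) is the standard normal
    distribution.
  • We set the threshold
  • $\tau(n, w, m) = E[\Omega^{\exists}(n, w, m)] + \alpha_0 \sqrt{\mathrm{Var}[\Omega^{\exists}(n, w, m)]}$
  • As long as we are in the region where the central
    limit theorem applies,
  • $P\{\Omega^{\exists}(n, w, m) > \tau(n, w, m)\} \le \beta$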
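
The threshold step can be sketched with the standard library's normal distribution. The mean and variance of the window count are taken as inputs here, since the slides do not give their closed forms; the function name and the sample numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def threshold(mean, variance, beta):
    """Threshold tau with P{count > tau} <= beta under the normal approximation.

    mean and variance are E and Var of the window count, supplied by the caller.
    """
    # alpha0 is the upper-tail quantile: P{N(0, 1) > alpha0} = beta
    alpha0 = NormalDist().inv_cdf(1 - beta)
    return mean + alpha0 * sqrt(variance)
```

For instance, with an assumed mean of 100 windows, variance 25 and beta = 0.025, the threshold lands roughly 1.96 standard deviations above the mean, so observing more matching windows than that would be flagged as unlikely to be fortuitous.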

10
SQL Standards update
  • Pattern Matching Proposal: Version 12 of the
    review draft has been circulated.
  • Participants: Coral8 <some parts>, IBM, Oracle,
    Streambase.
  • BEA Systems also reviewed the draft.
  • Status: the 12th version of the draft is ready
    and has been circulated.
  • Objective: submit a working draft to ANSI SQL.
  • Discussing a streams language proposal with IBM.
  • Participants: IBM, Oracle.
  • Status: exchanged documents regarding language
    specifications.
  • Objective: submit a working draft to ANSI SQL.
  • Discussing a convergence language proposal with
    Streambase.
  • Participants: IBM, Streambase.
  • Status: discussing the convergence proposal for
    the last 6 months.
  • Objective: submit a paper to ACM Transactions on
    Database Systems (TODS).

11
Pattern Query With ONE ROW PER MATCH
    SELECT a_symbol,
           a_tstamp,      /* start time */
           a_price,       /* start price */
           max_c_tstamp,  /* inflection time */
           last_c_price,  /* low price */
           max_f_tstamp,  /* end time */
           last_c_price,  /* end price */
           matchno
    FROM Ticker MATCH_RECOGNIZE (
        PARTITION BY Symbol
        MEASURES A.Symbol AS a_symbol,
                 A.Tstamp AS a_tstamp,
                 A.Price AS a_price,
                 MAX(C.Tstamp) AS max_c_tstamp,
                 LAST(C.Price) AS last_c_price,
                 MAX(F.Tstamp) AS max_f_tstamp,
                 MATCH_NUMBER AS matchno
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        MAXIMAL MATCH
        PATTERN (A B C D E F)
        DEFINE B AS (B.Price < PREV(B.Price)),
               C AS (C.Price < PREV(C.Price)),
               D AS (D.Price > PREV(D.Price)),
               E AS (E.Price > PREV(E.Price)),
               F AS (F.Price > PREV(F.Price)
                     AND F.Price > A.Price))
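
To make the matching semantics concrete, here is a rough Python emulation of the pattern above for a single symbol's price series. It is a sketch of what the query describes (two falls, three rises, final price above the starting price), not an implementation of the draft proposal; the function name and list-based input are assumptions, and PARTITION BY is ignored.

```python
def find_v_patterns(prices):
    """Return (start, end) index pairs of non-overlapping A B C D E F matches."""
    matches = []
    i = 0
    while i + 5 < len(prices):
        a, b, c, d, e, f = prices[i:i + 6]
        falls = b < a and c < b            # B and C: price below previous row
        rises = d > c and e > d and f > e  # D, E, F: price above previous row
        if falls and rises and f > a:      # F must also exceed the start price
            matches.append((i, i + 5))     # one result row per match
            i += 6                         # AFTER MATCH SKIP PAST LAST ROW
        else:
            i += 1
    return matches
```

Each tuple plays the role of one output row: the first index corresponds to the start time and the second to the end time of the match.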

12
Pattern Query With ALL ROWS PER MATCH
    SELECT a_symbol,
           a_tstamp,      /* start time */
           a_price,       /* start price */
           max_c_tstamp,  /* inflection time */
           last_c_price,  /* low price */
           max_f_tstamp,  /* end time */
           last_c_price,  /* end price */
           matchno
    FROM Ticker MATCH_RECOGNIZE (
        PARTITION BY Symbol
        MEASURES A.Symbol AS a_symbol,
                 A.Tstamp AS a_tstamp,
                 A.Price AS a_price,
                 MAX(C.Tstamp) OVER () AS max_c_tstamp,
                 LAST(C.Price) OVER () AS last_c_price,
                 MAX(F.Tstamp) OVER () AS max_f_tstamp,
                 MATCH_NUMBER AS matchno,
                 CLASSIFIER AS classy
        ALL ROWS PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        MAXIMAL MATCH
        PATTERN (A B C D E F)
        DEFINE B AS (B.Price < PREV(B.Price)),
               C AS (C.Price < PREV(C.Price)),
               D AS (D.Price > PREV(D.Price)),
               E AS (E.Price > PREV(E.Price)),
               F AS (F.Price > PREV(F.Price)
                     AND F.Price > A.Price))

13
MATCH_RECOGNIZE Syntax
The full syntax of the MATCH_RECOGNIZE clause is
as follows:
  • PARTITION BY: optional
  • MEASURES: optional, but we expect this will
    always be used
  • ONE ROW | ALL ROWS PER MATCH: defaults to ONE ROW
  • AFTER MATCH SKIP TO NEXT ROW | PAST LAST ROW |
    TO <variable> | TO LAST <variable> |
    TO FIRST <variable>: defaults to AFTER MATCH
    SKIP PAST LAST ROW
  • MAXIMAL | INCREMENTAL MATCH: defaults to MAXIMAL
    MATCH
  • PERMUTE: optional
  • PERMUTE EXPAND: optional
  • PATTERN: mandatory
  • SUBSET: optional
  • DEFINE: mandatory
  • CLASSIFIER: optional (ALL ROWS PER MATCH only)
  • MATCH_NUMBER: optional