Title: Overview
1 Overview
- A Quantum Computation Simulation Language
- Anomaly Detection in the Windows Registry
- Detecting Splice Sites in Genes
- Rotationally Invariant Face Detection
2 Q-HSK
- A Quantum Programming Language and Compiler
Katherine Heller, Krysta Svore, Maryam Kamvar (Al
Aho)
3 What is Q-HSK?
- Quantum Computation Simulation Language
- Quantum Compiler
- Q-HSK enables simplified programming of quantum
algorithms with built-in graphics
4 Many Worlds Interpretation
- One formulation of quantum theory
- Each universe has a corresponding amplitude
(i.e. a complex number)
- $|\text{amplitude}|^2$ gives the probability of existence
[Figure: branching diagram with labels x, u1, u2, u3, u4]
5 Qubits
- Quantum analogue of a classical bit
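- Takes on values $|0\rangle$, $|1\rangle$, or a superposition of states
- $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$
- $|\psi\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$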
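The angle parametrization of a qubit can be checked numerically. The following is a minimal sketch in Python, not Q-HSK code; the function names `bloch_qubit` and `probabilities` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def bloch_qubit(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def probabilities(alpha, beta):
    """Probabilities of measuring 0 and 1: the squared magnitudes of the amplitudes."""
    return abs(alpha) ** 2, abs(beta) ** 2


# An equal superposition with a relative phase of pi/4.
alpha, beta = bloch_qubit(math.pi / 2, math.pi / 4)
p0, p1 = probabilities(alpha, beta)
print(p0, p1, p0 + p1)
```

Any choice of the two angles yields probabilities that sum to 1, which is why this form needs no separate normalization step.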
6 Quantum Gates
- Reversible: all are unitary operators ($U^\dagger U = I$)
- Universal quantum gates: U2, XOR, Toffoli
- Some common gates: Hadamard, QFT, CNOT
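[Figure: Hadamard gate H acting on basis states 0 and 1, with output of the form $\frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$]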
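To make the unitarity requirement concrete, here is a small Python sketch that applies the standard Hadamard matrix to a basis state and checks $U^\dagger U = I$. The helper names `apply_gate` and `is_unitary` are illustrative assumptions, not part of Q-HSK.

```python
import math

# Standard single-qubit Hadamard matrix.
S = 1.0 / math.sqrt(2.0)
H = [[S, S],
     [S, -S]]


def apply_gate(gate, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(gate[r][c] * state[c] for c in range(len(state)))
            for r in range(len(gate))]


def is_unitary(gate, tol=1e-12):
    """Check that (U^dagger U) equals the identity, entry by entry."""
    size = len(gate)
    for r in range(size):
        for c in range(size):
            entry = sum(complex(gate[k][r]).conjugate() * gate[k][c]
                        for k in range(size))
            expected = 1.0 if r == c else 0.0
            if abs(entry - expected) > tol:
                return False
    return True


plus = apply_gate(H, [1.0, 0.0])   # H|0> = (|0> + |1>)/sqrt(2)
minus = apply_gate(H, [0.0, 1.0])  # H|1> = (|0> - |1>)/sqrt(2)
print(plus, minus, is_unitary(H))
```

Applying H twice returns the original state, a direct consequence of reversibility.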
7 Key Features of the Q-HSK Compiler
- Familiar C-style syntax
- Matrix operations via CBLAS
- Complex and real data types
- A quantum type qreg
- A graphical view of quantum algorithms
- Lucid representation of qubits, registers, and gates
- Interactive user options (start, stop, pause,
change animation rate)
- Detailed text output to trace algorithm
8 A Simple Example
    int main( )
    {
        int a, i;
        qreg q;
        qcreate(5);
        i = 0;
        while (i < 5)
        {
            q[i] = (0.0, 0.0);
            i = i + 1;
        }
        q = computeHadamard(q);
        a = Measure(q);
        printf("This is the measure %d", a);
        return 0;
    }
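[Figure: circuit for the example: register q of five qubits, each initialized to 0, passes through a Hadamard gate H and a measurement M]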
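The example program can be mimicked with a plain state-vector simulation. This Python sketch is not how the Q-HSK compiler works internally. It assumes that `computeHadamard` applies a Hadamard gate to every qubit of the register and that `Measure` returns the index of the observed basis state. The function names below are hypothetical.

```python
import math
import random


def zero_register(num_qubits):
    """State vector for num_qubits qubits, all in state 0."""
    state = [0j] * (2 ** num_qubits)
    state[0] = 1 + 0j
    return state


def hadamard_on_qubit(state, target):
    """Apply the Hadamard gate to one qubit of the state vector."""
    scale = 1.0 / math.sqrt(2.0)
    result = list(state)
    bit = 1 << target
    for index in range(len(state)):
        if (index & bit) == 0:
            low, high = state[index], state[index | bit]
            result[index] = scale * (low + high)
            result[index | bit] = scale * (low - high)
    return result


def hadamard_all(state, num_qubits):
    """Assumed meaning of computeHadamard: H on every qubit."""
    for target in range(num_qubits):
        state = hadamard_on_qubit(state, target)
    return state


def measure(state, rng):
    """Sample a basis state with probability |amplitude|^2."""
    weights = [abs(amplitude) ** 2 for amplitude in state]
    return rng.choices(range(len(state)), weights=weights)[0]


state = hadamard_all(zero_register(5), 5)
outcome = measure(state, random.Random(0))
print("This is the measure %d" % outcome)
```

After the Hadamard step all 32 basis states carry the same amplitude, so the measurement is a uniform draw from 0 to 31.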
9 Shor's Algorithm
- Factors large numbers
- n: number to factorize
- x: random number
- a ranges from 0 to q-1
- $n^2 < q < 2n^2$
- r: period of $x^a \pmod{n}$ (exponential classically)
- One factor of n is $\gcd(x^{r/2} - 1, n)$ (fast
classically)
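The classical post-processing step can be sketched in a few lines of Python. The period search below is a brute-force stand-in for the quantum part of the algorithm, so it takes exponential time. The names `find_period` and `factor_from_period` are hypothetical.

```python
from math import gcd


def find_period(x, n):
    """Smallest r > 0 with x^r = 1 (mod n); assumes gcd(x, n) == 1."""
    r = 1
    value = x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factor_from_period(x, n):
    """Try to extract a nontrivial factor of n from the period of x^a mod n."""
    shared = gcd(x, n)
    if shared != 1:
        return shared                # lucky guess: x already shares a factor
    r = find_period(x, n)
    if r % 2 != 0:
        return None                  # odd period: pick another x
    candidate = gcd(pow(x, r // 2, n) - 1, n)
    if candidate in (1, n):
        return None                  # trivial factor: pick another x
    return candidate


print(factor_from_period(7, 15))
```

For n = 15 and x = 7 the period is 4, and gcd(7^2 - 1, 15) = 3. When the function returns `None`, the usual remedy is to retry with a different random x.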
10 Graphical Interface
11 Architecture of Q-HSK Compiler
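[Figure: compiler architecture diagram. Stages: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Translator. Files: Program.q (input), lex.yy.c, y.tab.c, translate.c, Program.cpp (compiled with g++ to an Executable). Graphics: Java, compiled with javac.]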
12 One Class Support Vector Machines for Detecting
Anomalous Windows Registry Accesses
Collaborators: Krysta Svore, Angelos Keromytis,
Sal Stolfo
13 Host Based Intrusion Detection Systems
- Microsoft Windows most often attacked
- Current method to combat attacks
- Virus Scanners and Security Patches
- Problem: these do not combat unknown attacks, so
frequent updates are needed
- Host based IDS
- Monitor system accesses to detect intrusions
- Application of data mining techniques
14 The Windows Registry and RAD
- Windows Registry
- Stores configuration settings for system
parameters, security information, programs, etc.
- Programs query the registry for information
- Registry Anomaly Detection
- audit sensor
- model generator
- anomaly detector
Example registry access record:
Process: EXPLORER.EXE
Query: OpenKey
Key: HKCR\CKSUD\B41DB860-8EE4-11D2-9906-EA9FADC173CA\shellex\MayChangeDefaultMenu
Response: SUCCESS
ResultValue: NOTFOUND
15 Probabilistic Anomaly Detection Algorithm
- Computes 25 consistency checks
- $P(X_i)$ and $P(X_i \mid X_j)$
- Multinomial with Hierarchical Prior
- For observed elements i
- $P(X = i) = C \, \frac{N_i + \alpha}{k^0 \alpha + N}$
- where N: total number of observations
- $N_i$: number of observations of symbol i
- $\alpha$: pseudo count for each observed symbol
- $k^0$: number of observed symbols
- L: number of possible symbols
- For unobserved elements i
- $P(X = i) = (1 - C) \, \frac{1}{L - k^0}$
- $C = \frac{N}{N + L - k^0}$
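The estimator above translates directly into code. The following Python sketch follows the slide's formulas; the function name `make_estimator` and the default pseudo count of 1.0 are assumptions made for illustration.

```python
from collections import Counter


def make_estimator(observations, num_possible, alpha=1.0):
    """Build P(X = symbol) from observed symbols.

    num_possible is L, the number of possible symbols;
    alpha is the pseudo count (value assumed here).
    """
    counts = Counter(observations)
    n = len(observations)                 # N: total observations
    k0 = len(counts)                      # k0: distinct observed symbols
    c = n / (n + num_possible - k0)       # C = N / (N + L - k0)

    def prob(symbol):
        if symbol in counts:
            return c * (counts[symbol] + alpha) / (k0 * alpha + n)
        if num_possible == k0:
            return 0.0                    # every possible symbol was observed
        return (1.0 - c) / (num_possible - k0)

    return prob


prob = make_estimator(["a", "a", "b"], num_possible=5)
print(prob("a"), prob("b"), prob("c"))
```

The observed symbols share total mass C and the unobserved ones share 1 - C, so the probabilities over all L symbols sum to 1. An access whose symbol has never been seen therefore still receives a small nonzero probability.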
16 One Class SVMs
- Analogous to a two class SVM where all data lies
in the first class and the origin is the sole member
of the second class
- Solve optimization problem to find rule f with
maximal margin
- $f(x) = \langle w, x \rangle + b$
- Equivalent to solving the dual quadratic
programming problem
- $\min_\alpha \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$ s.t.
$0 \le \alpha_i \le \frac{1}{\nu l}$, $\sum_i \alpha_i = 1$
- Kernel function projects input vectors into a
feature space allowing for non-linear decision
boundaries
- $\Phi: X \to \mathbb{R}^N$, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
17 Experiments
- Kernels
- Linear: $K(x, y) = (x \cdot y)$
- Polynomial: $K(x, y) = (x \cdot y + 1)^d$
- Gaussian: $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
- Feature Vectors
- Binary
- Frequency-based
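The three kernels are short enough to write out directly. This is a plain-Python sketch on list vectors. The default values for `d` and `sigma` are placeholders, not the settings used in the experiments.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    return (dot(x, y) + 1) ** d


def gaussian_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two binary feature vectors (1 = feature present in the record).
u = [1, 0, 1]
v = [1, 1, 0]
print(linear_kernel(u, v), polynomial_kernel(u, v), gaussian_kernel(u, v))
```

With binary feature vectors the linear kernel simply counts the features two records share, while frequency-based vectors weight each shared feature by how often it occurs.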
18 Results
19 Sequence Information for the Splicing of Human
Pre-mRNA Identified by Support Vector Machine
Classification
Collaborators: Xiang Zhang, Ilana Hefter,
Christina Leslie, Larry Chasin
20 What Is Splicing?
[Figure: splicing diagram with labels DNA and mRNA]
21 Pseudo Exons
- Consensus Sequences
- Donor site: MAGgtragt (M = A/C, r = a/g)
- Acceptor site: (y)10ncagG (y = c/t, n = a/c/g/t)
- Donor and acceptor sites scored based on
closeness to consensus
- Identifying Pseudo Exons
- Intronic segments
- Have high scoring donor and acceptor sites
- We look for discriminative signals in intronic
regions near real and pseudo exons
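One simple way to score closeness to a consensus is the fraction of positions that match, with the ambiguity codes expanded. The slides do not say which scoring scheme was used, so the Python sketch below is only an assumed stand-in; the names `AMBIGUITY` and `consensus_score` are hypothetical.

```python
# Ambiguity codes from the consensus definitions above.
AMBIGUITY = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac",    # M = A/C
    "r": "ag",    # r = a/g
    "y": "ct",    # y = c/t
    "n": "acgt",  # n = a/c/g/t
}

DONOR_CONSENSUS = "MAGgtragt"
ACCEPTOR_CONSENSUS = "y" * 10 + "ncagG"


def consensus_score(site, consensus):
    """Fraction of positions in site compatible with the consensus (case-insensitive)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    matches = sum(1 for base, code in zip(site.lower(), consensus.lower())
                  if base in AMBIGUITY[code])
    return matches / len(consensus)


print(consensus_score("CAGgtaagt", DONOR_CONSENSUS))
print(consensus_score("TTTgtaagt", DONOR_CONSENSUS))
```

Under this scheme an intronic segment would count as a pseudo exon candidate when both of its flanking sites score highly.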
22 String Kernels
- Feature map: number of times each k-length
(contiguous) string occurs in the sequence
- Dimension of feature space is $N^k$
Example: k = 2, sequence ACCTGGTG; the feature for AC has value 1
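The feature map can be computed by sliding a window of length k over the sequence. This Python sketch assumes the DNA alphabet ACGT, so N = 4. The names `spectrum_features` and `spectrum_kernel` are illustrative.

```python
from collections import Counter
from itertools import product


def spectrum_features(sequence, k, alphabet="ACGT"):
    """Count every contiguous k-length string; absent strings get 0."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {"".join(p): counts.get("".join(p), 0)
            for p in product(alphabet, repeat=k)}


def spectrum_kernel(s, t, k, alphabet="ACGT"):
    """Inner product of the two k-spectrum feature vectors."""
    fs = spectrum_features(s, k, alphabet)
    ft = spectrum_features(t, k, alphabet)
    return sum(fs[key] * ft[key] for key in fs)


features = spectrum_features("ACCTGGTG", 2)
print(features["AC"], features["TG"], len(features))
```

For the slide's example the window yields AC, CC, CT, TG, GG, GT, TG, so AC has value 1 and TG has value 2 in a 16-dimensional feature space.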
23 Splice Kernels
- Hypothesis: false splice sites are intrinsically
defective due to bad internal nucleotide (nt) combinations
- All possible size k internal nt combinations are
features
- Example (k = 2): if the internal combination
(3g, 5a) occurs, that feature value is 1,
otherwise it is 0
24 Recursive Feature Selection
- Normal vector to the hyperplane
- $w = \sum_{i=1}^{m} y_i \alpha_i x_i$
- If $w_j$ is large in absolute value, the jth feature
is important for SVM discrimination
- Approximation due to degree 2 polynomial kernel:
calculate $w_{up}$ and $w_{down}$ separately, then
eliminate the bottom 50% of features for each
- Stop when the ROC score drops below 90% of its original
value on an untouched test set
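One elimination round can be sketched as follows. This Python example computes w in the linear case only; the degree 2 polynomial approximation and the separate $w_{up}$ and $w_{down}$ passes mentioned above are not reproduced. The function names and the toy data are hypothetical.

```python
def weight_vector(alphas, labels, vectors):
    """w = sum_i y_i * alpha_i * x_i over the training vectors."""
    w = [0.0] * len(vectors[0])
    for a, y, x in zip(alphas, labels, vectors):
        for j, value in enumerate(x):
            w[j] += y * a * value
    return w


def keep_top_half(w):
    """Indices of the features that survive: the half with the largest |w_j|."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = (len(ranked) + 1) // 2
    return sorted(ranked[:keep])


# Toy data: two training vectors with four features each.
alphas = [0.5, 0.5]
labels = [1, -1]
vectors = [[1, 0, 3, 0],
           [0, 1, 1, 2]]

w = weight_vector(alphas, labels, vectors)
print(w, keep_top_half(w))
```

In a full run the SVM would be retrained on the surviving features after each round, and the loop would stop once the ROC score on the held-out test set falls below 90% of its starting value.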
25 Results
26 Rotationally Invariant Face Detection Using
Multi-Resolution Histograms
Collaborators: Shikher Bisaria, Tony Jebara
27 Face Detection
- Given a picture with faces, how do we determine
where the faces are in the image? Which pixels
are face pixels?
- We would like to determine this with a system
that
- Runs in real time
- Recognizes rotations of faces
(e.g. when someone tilts their head to one side)
28 Gaussian Blurring
- Face images are greyscale (.pgm files)
- Successive levels of blur are obtained by
reconvolving the previous level of blurred images with a
2 dimensional Gaussian function
- Mathematically equivalent to two passes of a one
dimensional Gaussian function
- $g(i,j) = \frac{1}{2\pi\sigma^2} \sum_m \sum_n e^{-(m^2+n^2)/(2\sigma^2)} f(i-m, j-n)$
$= \frac{1}{2\pi\sigma^2} \sum_m e^{-m^2/(2\sigma^2)} \sum_n e^{-n^2/(2\sigma^2)} f(i-m, j-n)$
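The two-pass form can be sketched in plain Python. Two choices below are assumptions rather than details from the slides: the 1D kernel is truncated at a fixed radius and normalized to sum to 1 (instead of using the $1/(2\pi\sigma^2)$ prefactor), and pixels beyond the border are clamped to the edge.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Truncated 1D Gaussian weights, normalized to sum to 1."""
    weights = [math.exp(-(m * m) / (2.0 * sigma * sigma))
               for m in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for offset, weight in enumerate(kernel):
                col = min(max(j + offset - radius, 0), width - 1)
                acc += weight * image[i][col]
            out[i][j] = acc
    return out


def transpose(image):
    return [list(row) for row in zip(*image)]


def gaussian_blur(image, sigma=1.0, radius=2):
    """2D Gaussian blur as a row pass followed by a column pass."""
    kernel = gaussian_kernel_1d(sigma, radius)
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))


impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
blurred = gaussian_blur(impulse)
print(round(blurred[3][3], 4))
```

The two-pass version costs on the order of 2(2r+1) multiplications per pixel instead of (2r+1)^2 for a direct 2D convolution, which matters for the real-time goal.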
29 Multi-Resolution Histograms
- Histogram equalize the image
- Concatenate histograms of image together after
successive levels of gaussian blurring
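The two steps can be sketched in Python as below. The blur itself is left out: `multires_histogram` takes a list of images that are assumed to be already blurred to successive levels. The bin count and the 256 grey levels are assumed values, and all function names are hypothetical.

```python
def equalize(image, levels=256):
    """Histogram-equalize a greyscale image using its cumulative distribution."""
    flat = [int(p) for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(flat)
    return [[(levels - 1) * cdf[int(p)] // total for p in row] for row in image]


def histogram(image, bins=8, levels=256):
    """Count pixels per intensity bin."""
    counts = [0] * bins
    for row in image:
        for pixel in row:
            counts[min(int(pixel) * bins // levels, bins - 1)] += 1
    return counts


def multires_histogram(blur_levels, bins=8):
    """Concatenate the histograms of successive blur levels into one vector."""
    features = []
    for image in blur_levels:
        features.extend(histogram(image, bins))
    return features


image = [[0, 255], [0, 255]]
equalized = equalize(image)
print(equalized, multires_histogram([image, image], bins=2))
```

A histogram ignores where pixels are located, which is what makes the representation insensitive to rotation. The blur levels reintroduce some spatial information, because blurring changes the histogram in a way that depends on image structure.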
30 Average Histograms
- Compute average face and non-face
multi-resolution histograms from training set
[Figures: average non-face histogram; average face histogram]
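31 Optimization Problem
- $C(\alpha) = \min_\alpha \|H_F^{AVG} - h_F\|^2 + \|H_{NF}^{AVG} - h_{NF}\|^2$
- where $h_F = \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i h_i$
- $h_{NF} = \frac{1}{\sum_i (1 - \alpha_i)} \sum_i (1 - \alpha_i) h_i$
- such that $0 \le \alpha_i \le 1$, $\sum_i \alpha_i = 1$
- Let $\beta_i = 1 - \alpha_i$
- $Q_{ij} = \langle h_i, h_j \rangle$
- $c_{\alpha i} = \langle h_i, H_F^{AVG} \rangle$ (constant)
- $c_{\beta i} = \langle h_i, H_{NF}^{AVG} \rangle$ (constant)
- $\min_{\alpha,\beta} \; \alpha^T Q \alpha + \frac{1}{(N-1)^2} \beta^T Q \beta - 2 c_\alpha^T \alpha - \frac{2}{N-1} c_\beta^T \beta$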
32 Solve Using SMO
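- $\alpha_i^{NEW} = \text{num} / \text{den}$, where
- $\text{num} = \frac{1}{(N-1)^2} Q_{ii} - \frac{1}{(N-1)^2} \left(\sum_{k \ne i,j} \alpha_k\right) Q_{jj} + \left(1 - \sum_{k \ne i,j} \alpha_k\right) Q_{jj} - \left(1 - \sum_{k \ne i,j} \alpha_k\right) Q_{ij} + \frac{1}{(N-1)^2} \left(\sum_{k \ne i,j} \alpha_k\right) Q_{ij} - \frac{1}{(N-1)^2} Q_{ij} + (c_{\alpha i} - c_{\beta i}) - (c_{\alpha j} - c_{\beta j}) - \left[\sum_{k \ne i,j} \alpha_k Q_{ik} - \sum_{k \ne i,j} \alpha_k Q_{jk}\right] - \frac{1}{(N-1)^2} \sum_{k \ne i,j} \alpha_k Q_{ik} + \frac{1}{(N-1)^2} \sum_{k \ne i,j} \alpha_k Q_{jk}$
- $\text{den} = Q_{ii} + Q_{jj} - 2 Q_{ij} + \frac{1}{(N-1)^2} Q_{ii} + \frac{1}{(N-1)^2} Q_{jj} - \frac{2}{(N-1)^2} Q_{ij}$
- Bounds for $\alpha_i^{NEW}$
- $L = 0$
- $H = 1 - \sum_{k \ne i,j} \alpha_k$
- $\alpha_j^{NEW} = \left(1 - \sum_{k \ne i,j} \alpha_k\right) - \alpha_i^{NEW}$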
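The pair update and its bounds can be illustrated without the closed-form expression. The Python sketch below evaluates the objective from the Optimization Problem slide directly, under the constraint that the weights sum to 1, and minimizes it along the direction that raises $\alpha_i$ and lowers $\alpha_j$ by the same amount. Because the objective is quadratic, three evaluations determine the minimum exactly. This is a numerical stand-in, not the authors' implementation; the function names and toy histograms are hypothetical.

```python
def objective(alpha, hists, face_avg, nonface_avg):
    """||H_F - h_F||^2 + ||H_NF - h_NF||^2 with sum(alpha) = 1 assumed."""
    n, dim = len(hists), len(face_avg)
    h_face = [sum(alpha[i] * hists[i][d] for i in range(n)) for d in range(dim)]
    h_non = [sum((1.0 - alpha[i]) * hists[i][d] for i in range(n)) / (n - 1)
             for d in range(dim)]
    return (sum((face_avg[d] - h_face[d]) ** 2 for d in range(dim))
            + sum((nonface_avg[d] - h_non[d]) ** 2 for d in range(dim)))


def pair_update(alpha, i, j, hists, face_avg, nonface_avg):
    """Optimize alpha[i] and alpha[j] jointly, keeping their sum fixed."""
    def shifted(delta):
        trial = list(alpha)
        trial[i] += delta
        trial[j] -= delta
        return objective(trial, hists, face_avg, nonface_avg)

    # Fit J(delta) = J0 + slope * delta + curvature * delta^2 from three points.
    j0, j_plus, j_minus = shifted(0.0), shifted(1.0), shifted(-1.0)
    curvature = (j_plus + j_minus - 2.0 * j0) / 2.0
    slope = (j_plus - j_minus) / 2.0
    delta = -slope / (2.0 * curvature) if curvature > 1e-12 else 0.0

    high = alpha[i] + alpha[j]           # H = 1 - sum of the other weights
    new_i = min(max(alpha[i] + delta, 0.0), high)   # clip to [L, H] with L = 0
    updated = list(alpha)
    updated[i] = new_i
    updated[j] = high - new_i            # alpha_j^NEW = H - alpha_i^NEW
    return updated


hists = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
face_avg, nonface_avg = [1.0, 0.0], [0.25, 0.75]
start = [1.0 / 3, 1.0 / 3, 1.0 / 3]
after = pair_update(start, 0, 1, hists, face_avg, nonface_avg)
print(after, objective(after, hists, face_avg, nonface_avg))
```

Repeating this update over different pairs until the objective stops decreasing gives a simple SMO-style solver; each step keeps every weight in [0, 1] and the weights summing to 1.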
33 Results