Title: Markov Decision Processes: Approximate Equivalence
- Michel de Rougemont
- Université Paris II LRI
- http://www.lri.fr/mdr/
The world of MDPs
- Follow-up of "On the complexity of partially observed Markov decision processes", D. Burago, M. de Rougemont, A. Slissenko, 1996
- What is robustness? Deviation model in the 1990s
- Distance on runs in the 2000s
- Efficient distance from a run to an MDP
- Approximate comparison of MDPs
- Statistics Analysis of Probabilistic Processes (LICS 2009, with Mathieu Tracol)
M.D.P.
- S: states s, t, u, v
- Σ: actions a, b, c
- P(u | t, b) = 0.…
- A policy σ resolves the non-determinism.
- Example: σ(t) = b, σ(v) = c
- Run s,t,a,u,a,v
- Trace aba
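A minimal sketch of how a stationary deterministic policy resolves the non-determinism and produces a run and its trace. Only the state names, the action names and σ(t) = b, σ(v) = c come from the slide; the transition table `P`, its probability values, the choices σ(s) = a and σ(u) = a, and the function `sample_run` are hypothetical, invented for illustration.

```python
import random

# Hypothetical MDP. States, actions and sigma(t) = b, sigma(v) = c follow the
# slide; every probability below and sigma(s), sigma(u) are invented.
# P[state][action] maps each successor state to its probability.
P = {
    "s": {"a": {"t": 1.0}},
    "t": {"a": {"u": 1.0}, "b": {"u": 0.5, "v": 0.5}},
    "u": {"a": {"v": 1.0}},
    "v": {"a": {"t": 1.0}, "c": {"s": 1.0}},
}

# Stationary deterministic (SD) policy: exactly one action per state.
sigma = {"s": "a", "t": "b", "u": "a", "v": "c"}

def sample_run(start, steps, rng):
    """Return (run, trace): the run alternates states and actions,
    the trace keeps only the actions."""
    run, trace = [start], []
    state = start
    for _ in range(steps):
        action = sigma[state]                  # the policy picks the action
        succ = P[state][action]                # distribution over successors
        state = rng.choices(list(succ), weights=list(succ.values()))[0]
        run += [action, state]
        trace.append(action)
    return run, "".join(trace)

run, trace = sample_run("s", 3, random.Random(0))
print(run, trace)
```

With this hypothetical table, a 3-step run from s has trace aba (if b leads from t to u) or abc (if it leads to v).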
This talk
- Approximation of decision problems: property testing
- Nondeterministic automata: testers for membership and equivalence
- Markov decision processes: testers for the existence of strategies, and for equivalence
1. Testers on a class K
- Let F be a property on a class K of structures U.
- An ε-tester for F is a probabilistic algorithm A such that:
  - if U satisfies F, A accepts;
  - if U is ε-far from F, A rejects with high probability.
- F is testable if there is a probabilistic algorithm A such that:
  - A is an ε-tester for all ε;
  - Time(A) is independent of n = size(U).
- Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
- Property Testing and its connection to Learning and Approximation, O. Goldreich, S. Goldwasser, D. Ron, 1996.
- A tester usually implies a linear-time corrector.
- (ε1, ε2)-tolerant tester: accepts with high probability if U is ε1-close to F, and rejects with high probability if U is ε2-far from F.
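To make the definition concrete, here is a toy ε-tester that is not taken from the talk: the property F is "the word contains only 0s", and a word is ε-far from F when more than εn of its letters are 1s. The function name and the sample size ⌈2/ε⌉ are illustrative choices. A word in F is always accepted; a word that is ε-far survives all samples with probability at most (1 − ε)^⌈2/ε⌉ ≤ e^(−2) < 0.14.

```python
import math
import random

def all_zeros_tester(word, eps, rng):
    """Toy eps-tester for F = "word contains only 0s" (illustrative example).
    The number of queries depends on eps only, never on n = len(word)."""
    m = math.ceil(2 / eps)                      # sample size, independent of n
    for _ in range(m):
        if word[rng.randrange(len(word))] == "1":
            return False                        # found a witness: reject
    return True                                 # no witness found: accept
```

The running time is O(1/ε) whatever the length of the word, which is exactly the requirement that Time(A) be independent of n.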
Edit Distances with Moves on Strings
- Classical edit distance: insertions, deletions, modifications
- Edit distance with moves dist(w, w′): moving a whole block also costs one operation. Here w′ is obtained from w by a single block move:
  w  = 0111000011110011001
  w′ = 0111011110000011001
- Edit distance with moves generalizes to ordered trees
Uniform statistics (k-grams)
- w = 001010101110, of length n
- u.stat(w): for each subword u of length k, the fraction of the n − k + 1 blocks (shingles) of w that are equal to u
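A minimal sketch of u.stat for the example word, assuming k ≤ n and exact rational frequencies; the function name `ustat` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def ustat(w, k):
    """Uniform k-gram statistics: for each subword u of length k, the
    fraction of the n - k + 1 blocks (shingles) of w that are equal to u."""
    blocks = [w[i:i + k] for i in range(len(w) - k + 1)]
    counts = Counter(blocks)
    return {u: Fraction(c, len(blocks)) for u, c in counts.items()}

print(ustat("001010101110", 2))
# {'00': Fraction(1, 11), '01': Fraction(4, 11), '10': Fraction(4, 11), '11': Fraction(2, 11)}
```

With n = 12 and k = 2 there are 11 blocks, so every frequency is a multiple of 1/11.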
Tester for equality
- Edit distance with moves is NP-complete, but approximable in constant time with additive error, using uniform statistics.
- Theorem 1. ‖u.stat(w) − u.stat(w′)‖ approximates dist(w, w′)/n.
- Sample N subwords of length k, and compute their empirical statistics Y(w) and Y(w′).
- Lemma (Chernoff). Y(w) approximates u.stat(w).
- Corollary. ‖Y(w) − Y(w′)‖ approximates dist(w, w′)/n.
- Tester 1: if ‖Y(w) − Y(w′)‖ ≤ ε, accept; else reject.
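Tester 1 can be sketched as follows, assuming the L1 norm for ‖·‖ and leaving k, N and ε as free parameters (the theorem ties them to ε, but that dependence is not reproduced here). The names `sampled_stat`, `l1` and `equality_tester` are hypothetical.

```python
import random
from collections import Counter

def sampled_stat(w, k, N, rng):
    """Y(w): empirical k-gram distribution from N uniformly sampled blocks."""
    starts = [rng.randrange(len(w) - k + 1) for _ in range(N)]
    counts = Counter(w[i:i + k] for i in starts)
    return {u: c / N for u, c in counts.items()}

def l1(p, q):
    """L1 distance between two sparse distributions given as dicts."""
    return sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in set(p) | set(q))

def equality_tester(w1, w2, k, N, eps, rng):
    """Accept iff the L1 distance between Y(w1) and Y(w2) is at most eps."""
    return l1(sampled_stat(w1, k, N, rng), sampled_stat(w2, k, N, rng)) <= eps
```

Only N blocks of each word are read, so the cost depends on N and k but not on the length of the words.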
Tester for w ∈ r (regular language)
- H = {u.stat(w) : w ∈ r} is a union of polytopes.
- [Figure: two polytopes for r, and the point Y(w)]
- Membership tester: accept if Y(w) is ε-close to H, else reject.
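A sketch of the membership tester under two stated substitutions: each polytope is given by its list of vertices (a hypothetical input format), and the distance from Y(w) to a polytope is approximated with the Frank-Wolfe method in the Euclidean norm, instead of the linear program used in the talk. The names `dist_to_hull` and `membership_tester` are hypothetical.

```python
def dist_to_hull(y, vertices, iters=2000):
    """Euclidean distance from point y to the convex hull of `vertices`,
    approximated by Frank-Wolfe iterations with exact line search."""
    x = list(vertices[0])
    for _ in range(iters):
        g = [xi - yi for xi, yi in zip(x, y)]        # gradient of |x - y|^2 / 2
        # Linear minimization step: the vertex most aligned with -g.
        v = min(vertices, key=lambda p: sum(gi * pi for gi, pi in zip(g, p)))
        d = [vi - xi for vi, xi in zip(v, x)]
        dd = sum(di * di for di in d)
        if dd == 0:
            break
        gamma = max(0.0, min(1.0, -sum(gi * di for gi, di in zip(g, d)) / dd))
        if gamma == 0:                               # zero duality gap: optimal
            break
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def membership_tester(y, polytopes, eps):
    """Accept iff the statistics vector y is eps-close to some polytope."""
    return min(dist_to_hull(y, P) for P in polytopes) <= eps
```

For example, with the unit square as the only polytope, the point (2, 0.5) is at distance 1 and is rejected for ε = 0.1.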
2. Equivalence Tester for regular properties
- Time polynomial in m = max(|A|, |B|).
- Exact equivalence is PSPACE-complete.
3. Markov Decision Processes
- Policies σ:
  - HR: history-dependent and randomized
  - MR(k): memory k, randomized
  - SD: stationary deterministic
- Communicating MDP: every state can reach every other state under some policy
- SD policy σ(t) = b, σ(v) = c:
  - Trace 1: abac ab abac ab …
  - Trace 2: ab abac ab abac …
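The two traces are different words, but their k-gram statistics are almost identical, which is what the statistical testers compare. A quick check, assuming each trace is its 6-letter period repeated 100 times and k = 2 (both are illustrative choices; `ustat` and `l1` are hypothetical helper names):

```python
from collections import Counter

def ustat(w, k):
    """Fraction of the len(w) - k + 1 blocks of w equal to each k-gram."""
    blocks = [w[i:i + k] for i in range(len(w) - k + 1)]
    return {u: c / len(blocks) for u, c in Counter(blocks).items()}

def l1(p, q):
    """L1 distance between two sparse distributions given as dicts."""
    return sum(abs(p.get(u, 0) - q.get(u, 0)) for u in set(p) | set(q))

trace1 = ("abac" + "ab") * 100
trace2 = ("ab" + "abac") * 100
d = l1(ustat(trace1, 2), ustat(trace2, 2))
print(d)  # about 0.0033 (= 2/599): only the "ba" and "ca" counts differ, by one each
```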
Classical results (k = 1)
- State-action frequencies
- For a class K of strategies
- Theorem (Puterman, Derman, Tsitsiklis)
- For a communicating MDP,
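The state-action frequency vector of a finite run can be sketched as below; the theorem concerns the limits of these vectors as runs grow, which this snippet does not compute. The run format (alternating states and actions, as in s, a, t, b, u, a, v) and the function name are assumptions.

```python
from collections import Counter

def state_action_frequencies(run):
    """Empirical state-action frequencies of a run s0, a0, s1, a1, ..., sT:
    x(s, a) = (number of steps where action a is taken in state s) / T."""
    pairs = list(zip(run[0:-1:2], run[1::2]))   # (state, action) at each step
    return {sa: c / len(pairs) for sa, c in Counter(pairs).items()}

print(state_action_frequencies(["s", "a", "t", "b", "u", "a", "v"]))
```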
Generalization
- Theorem. For a communicating MDP
[Figure: the set H and a point x]
Existence of a strategy
- Input MDP, wn ,d, ?
- Theorem: the existence of a strategy is PSPACE-hard, but testable.
- Tester:
  - sample wn;
  - estimate the distance to H (linear program).
[Figure: the set H and a point x]
General MDPs
- Union of polytopes: each H can be computed by a linear program.
- Threshold value for each component.
[Figure: components H1 (.4) and H2 (.6)]
Equivalence of MDPs
- Decide whether the polytopes are identical, with identical threshold values.
- Equivalence tester: discretize the polytopes with an ε-grid, and check mutual inclusion.
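A brute-force sketch of the ε-grid equivalence test, assuming each polytope is given in H-representation (A, b), i.e. as {x : Ax ≤ b}, and lies in the unit box [0, 1]^d; threshold values are not modelled. The function names are hypothetical. The grid has (1/ε + 1)^d points, so this is only practical for small d.

```python
import itertools

def inside(A, b, x, tol=1e-12):
    """True iff x satisfies every inequality row . x <= bound of (A, b)."""
    return all(sum(r * xi for r, xi in zip(row, x)) <= bi + tol
               for row, bi in zip(A, b))

def grid_points(dim, eps):
    """All points of the eps-grid of the unit box [0, 1]^dim."""
    steps = round(1 / eps)
    ticks = [i / steps for i in range(steps + 1)]
    return itertools.product(ticks, repeat=dim)

def approx_equivalent(P, Q, dim, eps):
    """Mutual inclusion on the grid: accept iff every grid point lies
    either in both polytopes or in neither."""
    for x in grid_points(dim, eps):
        if inside(*P, x) != inside(*Q, x):
            return False
    return True
```

For instance, the triangle x + y ≤ 1, x ≥ 0, y ≥ 0 is accepted as equivalent to the same triangle with the redundant constraint x ≤ 1 added, and rejected against the smaller triangle x + y ≤ 0.5.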
Conclusion
- Testers for MDPs, to verify properties such as:
  - almost surely, there are fewer than 10 a's;
  - after an a, there is a b.
- 2. Testers for probabilistic systems
  - Approximate probabilistic membership
  - Approximate equivalence
- 3. VERAP: http://www.lri.fr/mdr/verap/