Markov Decision Processes: Approximate Equivalence


1
Markov Decision Processes: Approximate Equivalence

Michel de Rougemont
Université Paris II LRI
http://www.lri.fr/mdr/

2
The world of MDPs

Follow-up of "On the complexity of partially observed Markov decision processes", 1996, D. Burago, A. Slissenko, M. de Rougemont
What is robustness? Deviation model in the 1990s. Distance on runs in the 2000s.
Efficient distance of a run to an MDP
Approximate comparison of MDPs
Statistics Analysis of Probabilistic Processes (LICS 2009, with Mathieu Tracol)

3
M.D.P

S: states s, t, u, v
Σ: actions a, b, c
P(u | t, b) = 0.
Policy σ resolves the non-determinism.
Example: σ(t) = b, σ(v) = c
Run: s, t, a, u, a, v
Trace: aba
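As a minimal sketch of the definitions on this slide, the Python below samples a run of an MDP under a stationary policy and extracts its trace. The transition probabilities, the start state, the alternating state/action layout of the run, and the names `P`, `policy`, `sample_run`, and `trace_of` are all illustrative assumptions; the slide's own figure and probability values are not recoverable from the transcript.

```python
import random

# Hypothetical MDP: P[(state, action)] maps successor states to probabilities.
# The numbers are invented for illustration; they are not the slide's values.
P = {
    ("s", "a"): {"t": 1.0},
    ("t", "a"): {"u": 0.5, "v": 0.5},
    ("t", "b"): {"u": 0.3, "v": 0.7},
    ("u", "a"): {"v": 1.0},
    ("v", "c"): {"s": 1.0},
}

# A policy picks one action per state, which resolves the non-determinism.
policy = {"s": "a", "t": "b", "u": "a", "v": "c"}

def sample_run(P, policy, start, steps, rng):
    """Return a run as an alternating list: state, action, state, ..."""
    run = [start]
    state = start
    for _ in range(steps):
        action = policy[state]
        successors = P[(state, action)]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        run += [action, state]
    return run

def trace_of(run):
    """The trace keeps only the actions (odd positions of the run)."""
    return "".join(run[1::2])

run = sample_run(P, policy, "s", 4, random.Random(0))
print(run, trace_of(run))
```

Once the policy is fixed, the only remaining randomness is the choice of successor state, so the sampled traces follow the probability distribution that the policy induces.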

4
This talk

Approximation of decision problems: property testing
Nondeterministic automata: tester for membership and equivalence
Markov Decision Processes: tester for the existence of strategies, and for equivalence

5
1. Testers on a class K

Let F be a property on a class K of structures U.
An ε-tester for F is a probabilistic algorithm A such that:
If U satisfies F, A accepts.
If U is ε-far from F, A rejects with high probability.
F is testable if there is a probabilistic algorithm A such that:
A is an ε-tester for all ε.
Time(A) is independent of n = size(U).
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
Property Testing and its connection to Learning and Approximation, O. Goldreich, S. Goldwasser, D. Ron, 1996.
A tester usually implies a linear-time corrector.
(ε1, ε2)-tolerant tester

6
Edit Distances with Moves on Strings

1. Classical edit distance: insertions, deletions, modifications
2. Edit distance with moves: dist(w, w')
w = 0111000011110011001
w' = 0111011110000011001
(moving the block 1111 in front of the block 000 turns w into w')
3. Edit distance with moves generalizes to ordered trees

7
Uniform statistics: k-grams

W = 001010101110, of length n.
u.stat(W): statistics of the subwords of length k; there are n-k+1 such blocks (shingles).
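A small Python sketch of the k-gram statistics, assuming that u.stat(W) is the vector of relative frequencies of the length-k subwords over the n-k+1 positions (the slide's exact formula is lost in the transcript). The function name `ustat` and the lexicographic ordering of the coordinates are illustrative choices.

```python
from collections import Counter
from itertools import product

def ustat(w, k, alphabet="01"):
    """Frequency vector of the length-k subwords (shingles) of w."""
    blocks = len(w) - k + 1          # number of positions where a k-gram can start
    if blocks <= 0:
        raise ValueError("k must not exceed len(w)")
    counts = Counter(w[i:i + k] for i in range(blocks))
    # One coordinate per possible k-gram, in lexicographic order.
    return [counts["".join(g)] / blocks for g in product(alphabet, repeat=k)]

W = "001010101110"
print(ustat(W, 2))   # coordinates for 00, 01, 10, 11
```

For the slide's word W and k = 2 there are 11 blocks, and the 2-grams 00, 01, 10, 11 occur 1, 4, 4 and 2 times respectively.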

8
Tester for equality

Edit distance with moves: an NP-complete problem, but approximable in constant time with additive error.
Uniform statistics ( ): W = 001010101110
Theorem 1. |u.stat(w) - u.stat(w')| approximates dist(w, w')/n.
Sample N subwords of length k, compute Y(w) and Y(w').
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. |Y(w) - Y(w')| approximates dist(w, w')/n.
Tester 1: If |Y(w) - Y(w')| ≤ ε, accept; else reject.
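The sampling tester above can be sketched in Python as follows. This is an illustration under assumptions, not the talk's implementation: the L1 norm, the sparse-dictionary representation of Y(w), and the names `sampled_stat`, `l1_distance`, and `equality_tester` are all hypothetical, and the values of k, N and ε in the example are arbitrary.

```python
import random
from collections import Counter

def sampled_stat(w, k, n_samples, rng):
    """Y(w): empirical frequencies of n_samples random length-k subwords."""
    positions = len(w) - k + 1
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randrange(positions)
        counts[w[i:i + k]] += 1
    return {g: c / n_samples for g, c in counts.items()}

def l1_distance(y1, y2):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(y1.get(g, 0.0) - y2.get(g, 0.0)) for g in set(y1) | set(y2))

def equality_tester(w1, w2, k, n_samples, eps, rng):
    """Accept iff the sampled statistics are within eps of each other."""
    y1 = sampled_stat(w1, k, n_samples, rng)
    y2 = sampled_stat(w2, k, n_samples, rng)
    return l1_distance(y1, y2) <= eps

rng = random.Random(1)
w = "0111000011110011001" * 50
w_moved = "0111011110000011001" * 50   # same blocks, rearranged by moves
print(equality_tester(w, w_moved, k=3, n_samples=2000, eps=0.2, rng=rng))
```

The number of samples depends only on k and on the desired accuracy, not on the length of the words, which is what makes the running time independent of n.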

9
Tester for W ∈ r (regular language)

H = {u.stat(W) : W ∈ r} is a union of polytopes.
[Figure: 2 polytopes for r, with the point Y(w)]
Membership Tester

10
2. Equivalence Tester for regular properties

Time polynomial in m = Max(|A|, |B|).
The exact equivalence is PSPACE-complete.

11
3. Markov Decision Processes

Policies σ:
HR: history-dependent and randomized
MR(k): memory k, randomized
SD: stationary deterministic
Communicating MDP

SD: σ(t) = b, σ(v) = c
Trace 1: abac ab abac ab.
Trace 2: ab abac ab abac.

12
Classical results: k = 1

State-action frequencies
For a class K of strategies
Theorem (Puterman, Derman, Tsitsiklis)
For a communicating MDP,

13
Generalization

Theorem: For a communicating MDP

[Figure: polytope H and a point x]

14
Existence of a strategy

Input: MDP, w_n, d, ?
Theorem: Existence of a strategy is PSPACE-hard but testable.
Tester: sample w_n, then estimate the distance to H (linear program).

[Figure: polytope H and a point x]

15
General MDPs

Union of polytopes: each H can be computed by a linear program.
Threshold value for each component.

[Figure: components H1, labeled .4, and H2, labeled .6]

16
Equivalence of MDPs

Decide if the polytopes are identical, with identical threshold values.
Equivalence tester: discretize the polytopes with an ε-grid and check mutual inclusion.
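One way to picture the grid-based check is the Python sketch below. It rests on several assumptions that the slide does not state: each polytope is given as a list of half-spaces a·x ≤ b, the grid covers the unit cube [0, 1]^dim, inclusion is tested with a slack of ε on each constraint, and threshold values are ignored. The function names and the example polytopes are hypothetical.

```python
from itertools import product

def inside(point, halfspaces, slack=0.0):
    """True if a.x <= b + slack for every half-space (a, b)."""
    return all(
        sum(ai * xi for ai, xi in zip(a, point)) <= b + slack
        for a, b in halfspaces
    )

def grid_points(halfspaces, eps, dim):
    """Points of the eps-grid on [0, 1]^dim that lie in the polytope."""
    steps = round(1 / eps)
    for idx in product(range(steps + 1), repeat=dim):
        point = tuple(i / steps for i in idx)
        if inside(point, halfspaces, slack=1e-9):   # tolerate rounding error
            yield point

def approx_included(p, q, eps, dim):
    """Every grid point of p lies in q, up to a slack of eps per constraint."""
    return all(inside(x, q, slack=eps) for x in grid_points(p, eps, dim))

def approx_equivalent(p, q, eps, dim):
    """Mutual approximate inclusion of the two discretized polytopes."""
    return approx_included(p, q, eps, dim) and approx_included(q, p, eps, dim)

# Hypothetical polytopes inside the unit square, one constraint each.
triangle = [((1.0, 1.0), 1.0)]    # x + y <= 1
wider = [((1.0, 1.0), 1.05)]      # x + y <= 1.05
half = [((1.0, 1.0), 0.5)]        # x + y <= 0.5
print(approx_equivalent(triangle, wider, eps=0.1, dim=2))
print(approx_equivalent(triangle, half, eps=0.1, dim=2))
```

The grid has (1/ε + 1)^dim points, so the cost of this sketch depends on ε and on the dimension but not on the size of any sampled run.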

17
Conclusion

1. Testers for MDPs. Verify properties such as:
Almost surely there are fewer than 10 a's
After an a, there is a b
2. Testers for probabilistic systems
Approximate probabilistic membership
Approximate equivalence
3. VERAP: http://www.lri.fr/mdr/verap/
