Title: Near-optimal Observation Selection using Submodular Functions
1. Near-optimal Observation Selection using Submodular Functions
- Andreas Krause
- joint work with Carlos Guestrin (CMU)
2. River monitoring
Mixing zone of San Joaquin and Merced rivers
- Want to monitor the ecological condition of rivers and lakes
- Which locations should we observe?
3. Water distribution networks
- Pathogens in water can affect thousands (or millions) of people
- Currently: add chlorine to the source and hope for the best
- Sensors in pipes could detect pathogens quickly
- 1 sensor: $5,000 (just for chlorine), plus deployment and maintenance
- ⇒ Must be smart about where to place sensors
- Battle of the Water Sensor Networks challenge
- Get model of a metropolitan area water network
- Simulator of water flow provided by the EPA
- Competition for best placements
- Collaboration with VanBriesen et al. (CMU Civil Engineering)
4. Fundamental question: Observation selection
- Where should we observe to monitor complex phenomena?
- Salt concentration / algae biomass
- Pathogen distribution
- Temperature and light field
- California highway traffic
- Weblog information cascades
- ...
5. Spatial prediction
[Figure: pH value vs. horizontal position. Observations $A \subseteq V$; prediction at unobserved locations $V \setminus A$; unobserved process (one pH value per location $s \in V$)]
- Gaussian processes
- Model many spatial phenomena well [Cressie '91]
- Allow us to estimate uncertainty in prediction
- Want to select observations minimizing uncertainty
- How do we quantify informativeness / uncertainty?
6. Mutual information [Caselton & Zidek '84]
- Finite set V of possible locations
- Find $A^* \subseteq V$ maximizing mutual information: $A^* = \arg\max_{A \subseteq V} MI(A)$
- Often, observations A are expensive
- ⇒ constraints on which sets A we can pick
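The slide does not spell out how MI(A) is computed. As a minimal sketch, assume the locations follow a joint Gaussian with a known covariance matrix and take MI(A) to be the mutual information between the observed locations A and the remaining locations V \ A, which for Gaussians reduces to log-determinants. The helper names (`log_det`, `gaussian_mi`), the squared-exponential kernel, and its length scale and noise term are all assumptions made for illustration.

```python
import math

def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total

def submatrix(cov, idx):
    """Rows and columns of cov restricted to the indices in idx."""
    return [[cov[i][j] for j in idx] for i in idx]

def gaussian_mi(cov, A):
    """Mutual information between X_A and the remaining variables of a Gaussian.

    Uses MI = 0.5 * (log det cov_AA + log det cov_rest - log det cov).
    """
    n = len(cov)
    A = sorted(set(A))
    rest = [i for i in range(n) if i not in A]
    return 0.5 * (log_det(submatrix(cov, A))
                  + log_det(submatrix(cov, rest))
                  - log_det(cov))

# Hypothetical 1-D example: six locations, squared-exponential kernel plus noise.
positions = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
cov = [[math.exp(-(p - q) ** 2 / (2 * 0.3 ** 2)) + (0.1 if i == j else 0.0)
        for j, q in enumerate(positions)]
       for i, p in enumerate(positions)]

mi_middle = gaussian_mi(cov, [2])  # information one middle sensor gives about the rest
```

Under this definition MI is symmetric in A and V \ A, so observing everything is worth no more than observing nothing. That is one reason the objective is not monotone in general.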
7. Constraints for observation selection
- $\max_A MI(A)$ subject to some constraints on A
- What kind of constraints do we consider?
- Want to place at most k sensors: $|A| \le k$
- or more complex constraints:
- Sensors need to communicate (form a tree)
- Multiple robots (collection of paths)
- All these problems are NP-hard. Can only hope for approximation guarantees!
8. The greedy algorithm
- Want to find $A^* = \arg\max_{|A| \le k} MI(A)$
- Greedy algorithm:
- Start with $A = \emptyset$
- For $i = 1$ to $k$:
- $s^* = \arg\max_{s} MI(A \cup \{s\})$
- $A = A \cup \{s^*\}$
- Problem is NP-hard! How well can this simple heuristic do?
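The greedy loop above can be sketched in a few lines of Python. This version takes an arbitrary set function as the objective. To keep the example self-contained it uses a toy coverage function as a stand-in for MI; the sensor names and covered regions are hypothetical.

```python
def greedy_select(candidates, objective, k):
    """Pick up to k elements, each time adding the one that maximizes the objective."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # s* = argmax_s F(A ∪ {s})
        best = max(remaining, key=lambda s: objective(selected + [s]))
        selected.append(best)      # A = A ∪ {s*}
        remaining.remove(best)
    return selected

# Hypothetical toy data: each candidate sensor covers a set of region ids.
regions = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 7},
}

def coverage(sensors):
    """Number of distinct regions covered by the chosen sensors (stand-in for MI)."""
    covered = set()
    for s in sensors:
        covered |= regions[s]
    return len(covered)

placement = greedy_select(regions, coverage, k=2)
```

Each round evaluates the objective once per remaining candidate, so k rounds cost roughly k times the number of candidates in objective evaluations.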
9. Performance of greedy
[Figure: curves for Optimal and Greedy on temperature data from a sensor network]
- Greedy is empirically close to optimal. Why?
10. Key observation: Diminishing returns
- Placement A = {S1, S2}: adding a new sensor S will help a lot!
- Larger placement $B \supseteq A$: adding S doesn't help much
Theorem [UAI 2005, M. Narasimhan, J. Bilmes]: Mutual information is submodular. For $A \subseteq B$, $MI(A \cup \{S\}) - MI(A) \ge MI(B \cup \{S\}) - MI(B)$
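The diminishing-returns inequality can be checked by brute force on a small ground set. The sketch below is an illustration rather than a proof: it enumerates all pairs A ⊆ B and every element s outside B. The function name `is_submodular` and the toy coverage data are assumptions, not part of the talk.

```python
from itertools import combinations

def is_submodular(ground, F, tol=1e-12):
    """Check F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B) for all A ⊆ B and s not in B."""
    ground = list(ground)
    subsets = [frozenset(c)
               for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for s in ground:
                if s in B:
                    continue
                gain_small = F(A | {s}) - F(A)
                gain_large = F(B | {s}) - F(B)
                if gain_small < gain_large - tol:
                    return False
    return True

# Hypothetical coverage data: sensor name -> region ids it covers.
areas = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3, 4}}

def covered_area(sensors):
    """Number of distinct regions covered by the given sensors."""
    return len(set().union(*(areas[s] for s in sensors)))

coverage_ok = is_submodular(areas, covered_area)                  # coverage has diminishing returns
squared_ok = is_submodular(["a", "b", "c"], lambda S: len(S) ** 2)  # gains grow, so this fails
```

The enumeration is exponential in the size of the ground set, so it is only useful as a sanity check on tiny examples.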
11. Cardinality constraints
- Theorem [ICML 2005, with Carlos Guestrin, Ajit Singh]:
- Greedy MI algorithm provides a constant-factor approximation for placing k sensors, for all $\varepsilon > 0$
Proof invokes a fundamental result by Nemhauser et al. '78 on the greedy algorithm for submodular functions
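The exact inequality on this slide did not survive extraction. As a hedged reminder, the classical result of Nemhauser et al. for a monotone submodular set function F with F(∅) = 0 can be written as below. The symbols F and A_greedy are notation assumed for this sketch, and the ε on the slide presumably reflects that mutual information is only approximately monotone.

```latex
F(A_{\text{greedy}}) \;\ge\; \left(1 - \frac{1}{e}\right) \max_{|A| \le k} F(A)
```

Since 1 - 1/e ≈ 0.63, greedy selection achieves at least about 63% of the optimal value under these assumptions.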
12. Myopic vs. nonmyopic
- Approaches to observation selection:
- Myopic: only plan ahead for the next observation
- Nonmyopic: look for the best set of observations
- For finding the best k observations, the myopic greedy algorithm gives near-optimal nonmyopic results!
- What about more complex constraints?
- Communication constraints
- Path constraints
13. Communication constraints
Wireless sensor placements should
- be very informative (high mutual information)
- Low uncertainty at unobserved locations
- have low communication cost
- Minimize the energy spent for communication
14. Naive, myopic approach: Greedy-connect
- Simple heuristic: greedily optimize information
- Then connect nodes to minimize communication cost
[Figure labels: most informative, second most informative, relay nodes; efficient communication but not very informative]
- Want to find the optimal tradeoff between information and communication cost
- Greedy-connect can select sensors far apart
15. The pSPIEL algorithm [with Guestrin, Gupta, Kleinberg, IPSN '06]
- pSPIEL: efficient nonmyopic algorithm
- (padded Sensor Placements at Informative and cost-Effective Locations)
- In expectation, both mutual information and communication cost will be close to optimum
16. Our approach: pSPIEL
- Decompose sensing region into small, well-separated clusters
- Solve cardinality-constrained problem per cluster
- Combine solutions using k-MST algorithm
[Figure: clusters C1, C2, C3, C4]
17. Guarantees for pSPIEL
Theorem [IPSN '06, with Carlos Guestrin, Anupam Gupta, Jon Kleinberg]: pSPIEL finds a tree T with mutual information $MI(T) \ge \Omega(1) \cdot OPT_{MI}$ and communication cost $C(T) \le O(\log |V|) \cdot OPT_{cost}$
18. Prototype implementation
- Implemented on Tmote Sky motes from MoteIV
- Collect measurement and link information and send to base station
19. Proof of concept study
- Learned model from short deployment of 46 sensors at the Intelligent Workplace
- Manually selected 20 sensors
- Used pSPIEL to place 12 and 19 sensors
- Compared prediction accuracy
[Figure: timeline showing initial deployment and validation set]
23. Proof of concept study
[Figure: comparison plot; one axis is communication cost (ETX), with "better" direction markers]
24. Path constraints
[Figure: outline of Lake Fulmor]
- Want to plan informative paths
- Find collection of paths $P_1, \dots, P_k$ s.t.
- $MI(P_1 \cup \dots \cup P_k)$ is maximized
- $Length(P_i) \le B$
25. Naïve, myopic algorithm
[Figure: reaching the most informative observation wastes (almost) all fuel; have to go back without further observations]
- Go to most informative reachable observations
- Again, the naïve myopic approach can fail badly!
- Looking at the benefit-cost ratio doesn't help either
- Can get a nonmyopic approximation algorithm [with Amarjeet Singh, Carlos Guestrin, William Kaiser, IJCAI '07]
26. Comparison with heuristic
[Figure: submodular path planning compared with the known heuristic [Chao et al. '96]; axis direction labeled "more informative"]
- Approximation algorithm outperforms state-of-the-art heuristic for orienteering
27. Submodular observation selection
- Many other submodular objectives (other than MI):
- Variance reduction: $F(A) = Var(Y) - Var(Y \mid A)$
- (Geometric) coverage: $F(A)$ = area covered
- Influence in social networks (viral marketing)
- Size of information cascades in blog networks
- ...
- Key underlying problem: constrained maximization of submodular functions
- Our algorithms work for any submodular function!
28. Water networks
- 12,527 junctions
- 3.6 million contamination events
- Place 20 sensors to:
- Maximize detection likelihood
- Minimize detection time
- Minimize population affected
- ...
Theorem: All these objectives are submodular!
29. Bounds on optimal solution
[Figure: penalty reduction (higher is better)]
- Submodularity gives online bounds on the performance of any algorithm
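One standard way to obtain such a data-dependent bound, sketched here under the assumption that F is monotone and submodular, is to take any placement A, compute the marginal gain of every other candidate with respect to A, and add the k largest gains to F(A). The result upper-bounds the best achievable value with k sensors. The function name `online_bound` and the toy detection data are hypothetical.

```python
def online_bound(candidates, F, A, k):
    """Upper bound on max_{|A'| <= k} F(A') for monotone submodular F, given any set A."""
    base = F(A)
    gains = sorted(
        (F(list(A) + [s]) - base for s in candidates if s not in A),
        reverse=True,
    )
    return base + sum(gains[:k])

# Hypothetical toy data: sensor name -> contamination events it detects.
detects = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5, 6},
    "c": {5, 6, 7, 8},
    "d": {1, 2, 7, 8},
}

def detected(sensors):
    """Number of distinct events detected by the chosen sensors."""
    events = set()
    for s in sensors:
        events |= detects[s]
    return len(events)

placement = ["a", "b"]                               # any placement, e.g. from greedy
bound = online_bound(detects, detected, placement, k=2)
certified_ratio = detected(placement) / bound        # fraction of optimum guaranteed
```

The bound needs only one objective evaluation per candidate, and it applies to a placement produced by any algorithm, not just greedy.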
30. Results of BWSN [Ostfeld et al.]
- Multi-criterion optimization
- [Ostfeld et al. '07] count the number of non-dominated solutions
Author                    Non-dominated (out of 30)
Krause et al.             26
Berry et al.              21
Dorini et al.             20
Wu and Walski             19
Ostfeld and Salomons      14
Propato and Piller        12
Eliades and Polycarpou    11
Huang et al.              7
Guan et al.               4
Ghimire and Barkdoll      3
Trachtman                 2
Gueli                     2
Preis and Ostfeld         1
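The table counts non-dominated solutions. The competition's exact scoring procedure is not described on the slide, so the sketch below shows only the generic notion of Pareto dominance for criteria where lower is better. The placement names and scores are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every criterion and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(solutions):
    """Names of solutions that no other solution dominates."""
    return {
        name
        for name, score in solutions.items()
        if not any(dominates(other, score)
                   for other_name, other in solutions.items()
                   if other_name != name)
    }

# Hypothetical scores: (detection time, population affected), lower is better.
scores = {
    "P1": (10, 500),
    "P2": (12, 300),
    "P3": (15, 600),  # worse than P1 on both criteria
}

front = non_dominated(scores)
```

Here P1 and P2 trade off the two criteria against each other, so neither dominates the other, while P3 is dominated by P1.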
31. Conclusions
- Observation selection is an important AI problem
- Key algorithmic problem: constrained maximization of submodular functions
- For budgeted placements, greedy is near-optimal!
- For more complex constraints (paths, etc.):
- Myopic (greedy) algorithms fail
- Presented near-optimal nonmyopic algorithms
- Algorithms perform well on several real-world observation selection problems