Title: How Optimized Environmental Sensing helps address Information Overload
1How Optimized Environmental Sensing helps
address Information Overload
2THANK YOU!!!!
- Coauthors
- Daphne Koller, Ron Parr, Joe Hellerstein, CMU colleagues
- IJCAI and AI researchers
- Funding: NSF, Intel, ONR, ARO, DARPA, Sloan Foundation, IBM
- Kristina Woods, Family, Friends
4We are having a devastating effect on our
environment
AI: how can we help tackle these challenges?
5Monitoring algal blooms
- Algal blooms threaten freshwater
- 4 million people without water
- 1300 factories shut down
- $14.5 billion to clean up
- Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany
- Growth processes still unclear [Carmichael]
Tai Lake, China, 10/07 (MSNBC)
6Monitoring rivers and lakes
[Singh, Krause, G., Kaiser 07]
- Monitor large spatial phenomena
- NIMS robotic sensor [Kaiser et al., UCLA] can only make a few observations
Where should we sense to get the most accurate predictions?
7Water distribution networks
- Water distribution in a city → very complex system
- Pathogens in water can affect thousands (or millions) of people
- Currently: add chlorine to the source and hope for the best
Simulator from EPA
8Monitoring water networks [Krause, Leskovec, G., Faloutsos, VanBriesen 08]
Sensors
Simulator from EPA
14K
- Place sensors to detect contaminations
- Battle of the Water Sensor Networks competition
Where should we place sensors to detect contaminations quickly?
9Think globally, Act locally
must get informed!
must make better personal choices!
10Think globally, Act locally
information overload!!! → where do I start? what should I read?
11Sensing problems
- Want to learn something about the world
- predict algal blooms, decrease carbon footprint,
- Can choose (partial) observations
- make measurements, read papers,
- but they are expensive / limited
- hardware cost, power consumption, personal time
- Want to get the most useful information at the lowest cost!
Fundamental AI problem: What information should I use to learn?
12Many apps for optimizing info
each app requires its own AI hero → designing an AI hero is time-consuming → doesn't scale!
13The quest for the optimization narrow waist
- reducing energy consumption
- finding the right information
- detecting contaminations
- predicting postures
- monitoring algal blooms
Narrow waist: common framework addressing all apps
14Related work
- Sensing problems considered in
- AI: Value of Information (Howard 66, ...)
- Experimental design (Lindley 56, Robbins 52)
- Spatial statistics (Cressie 91, ...)
- Machine Learning: active learning (MacKay 92, ...)
- Robotics (Feder et al. 99, ...)
- Sensor Networks (Zhao et al. 04, ...)
- Operations Research (Nemhauser 78, ...)
- Existing algorithms typically
- Heuristics: No guarantees! Can do arbitrarily badly.
- Find optimal solutions (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
leading to many point solutions rather than a narrow waist
15This work
Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems
Applied: Empirical studies with real deployments and large datasets
16Model-based sensing
- For each subset $A \subseteq V$ compute sensing quality $F(A)$
- Set $V$ of all network junctions
- Model predicts high-, medium-, and low-impact contamination locations
- Sensor reduces impact through early detection!
[Figure: two placements of sensors S1 to S4 in the network; one has low sensing quality, $F(A) = 0.01$, the other high sensing quality, $F(A) = 0.9$]
17The quest for the optimization narrow waist
18Sensor placement
- Given: set $V$ of locations, sensing quality $F$
- Want: $A^* \subseteq V$ such that $A^* = \arg\max_{|A| \le k} F(A)$
- Typically NP-hard!
- Greedy algorithm:
  Start with $A = \emptyset$
  For $i = 1$ to $k$:
    $s_i = \arg\max_s F(A \cup \{s\})$
    $A = A \cup \{s_i\}$
- How well can this simple heuristic do?
General constrained optimization is too broad: an intractable narrow waist → must discover more useful structure
[Figure: candidate sensor locations S1 to S6]
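The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: the `DETECTS` table and the `coverage` function are hypothetical stand-ins for the sensing quality $F$ (here, the number of contamination events a placement detects).

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical toy data: which contamination events each location would detect.
DETECTS = {
    "S1": {1, 2, 3, 4},
    "S2": {3, 4, 5},
    "S3": {5, 6},
    "S4": {1, 2},
}


def coverage(placement: Iterable[str]) -> int:
    """Toy sensing quality F(A): number of distinct events detected by A."""
    covered = set()
    for sensor in placement:
        covered |= DETECTS[sensor]
    return len(covered)


def greedy_placement(
    candidates: Iterable[str],
    quality: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Start with A = {} and add, k times, the location that maximizes F(A + s)."""
    selected: List[str] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        current = frozenset(selected)
        best = max(remaining, key=lambda s: quality(current | {s}))
        selected.append(best)
        remaining.remove(best)
    return selected


placement = greedy_placement(DETECTS, coverage, k=2)
print(placement, coverage(placement))
```

On this toy instance greedy first takes S1 (4 events) and then S3, which adds the two events S1 misses.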
19Performance of greedy algorithm
[Plot: population protected (higher is better) vs. number of sensors placed, for optimal and greedy, on a small subset of the water networks data]
- greedy score empirically close to optimal
- why?
20Key property: diminishing returns
- Placement A = {S1, S2}: adding new sensor S will help a lot! (large improvement)
- Placement B = {S1, S2, S3, S4}: adding S doesn't help much (small improvement)
Theorem [Krause, Leskovec, G., Faloutsos, VanBriesen 08]: sensing quality $F(A)$ in water networks is submodular!
Submodularity: for $A \subseteq B$, $F(A \cup \{S\}) - F(A) \ge F(B \cup \{S\}) - F(B)$
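To make the diminishing-returns inequality concrete, here is a small brute-force check on a toy coverage function. Everything in it is an assumption for illustration: `DETECTS` and `coverage` are hypothetical stand-ins for the water-network sensing quality, and the exhaustive check only works for tiny ground sets.

```python
from itertools import combinations

# Hypothetical toy data: events detected by each sensor location.
DETECTS = {
    "S1": {1, 2, 3, 4},
    "S2": {3, 4, 5},
    "S3": {5, 6},
    "S4": {1, 2, 7},
    "S": {6, 7, 8},  # the new sensor from the slide
}


def coverage(placement):
    """Toy sensing quality F(A): number of distinct events detected by A."""
    covered = set()
    for sensor in placement:
        covered |= DETECTS[sensor]
    return len(covered)


def marginal_gain(quality, placement, new_sensor):
    """F(A + s) - F(A)."""
    return quality(placement | {new_sensor}) - quality(placement)


def is_submodular(quality, ground_set):
    """Exhaustively test F(A+s)-F(A) >= F(B+s)-F(B) for all A <= B, s not in B."""
    elements = list(ground_set)
    subsets = [
        frozenset(c)
        for r in range(len(elements) + 1)
        for c in combinations(elements, r)
    ]
    for small in subsets:
        for large in subsets:
            if not small <= large:
                continue
            for s in elements:
                if s in large:
                    continue
                if marginal_gain(quality, small, s) < marginal_gain(quality, large, s):
                    return False
    return True


A = frozenset({"S1", "S2"})
B = frozenset({"S1", "S2", "S3", "S4"})
gain_A = marginal_gain(coverage, A, "S")  # large improvement
gain_B = marginal_gain(coverage, B, "S")  # small improvement
print(gain_A, gain_B, is_submodular(coverage, DETECTS))
```

In this toy example, adding S to the smaller placement A detects three new events, while adding it to B detects only one.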
21One reason submodularity is useful
- Theorem [Nemhauser et al. 78]
- Greedy algorithm gives constant-factor approximation: $F(A_{greedy}) \ge (1 - 1/e)\, F(A_{opt})$
- Greedy algorithm gives near-optimal solution!
- Guarantees best possible unless P = NP!
- Many more reasons, sit back and relax
22Building a sensing chair [Mutlu, Krause, Forlizzi, G., Hodgins 07]
- People sit a lot
- Activity recognition in assistive technologies
- Seating pressure as user interface
Equipped with 1 sensor per cm²!
82% accuracy on 10 postures! [Tan et al.]
Sheet costs $6,000!
Can we get similar accuracy with fewer, cheaper sensors?
23How to place sensors on a chair?
- Predict posture Y; possible locations V
- Goal: minimize uncertainty in prediction → maximize information gain (IG)
Theorem [Krause, G. 05]: information gain is submodular!
Placed sensors, did a user study:
- random placement: 53%
- uniform placement: 73%
- similar accuracy at <2% of cost!
See store for details
24An efficient optimization narrow waist
some powerful structure!!!
25Battle of the Water Sensor Networks Competition
- Real metropolitan area network (12,527 nodes)
- Water flow simulator provided by EPA
- 3.6 million contamination events
- Place sensors that detect well on average
26BWSN competition results [Krause, Leskovec, G., Faloutsos, VanBriesen 08]
- 13 participants
- Performance measured in 30 different criteria
- G: Genetic algorithm
- D: Domain knowledge
- H: Other heuristic
- E: Exact method (MIP)
24% better performance than runner-up!
27Not just about theorem
- 3.6M contaminations → evaluation of F(A) very slow
- 30 hours/20 sensors → 6 weeks for all 30 settings
- CELF algorithm using lazy evaluations: same theoretical guarantees, 1 hour/20 sensors → done after 2 days!
[Plot: running time, lower is better]
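The lazy-evaluation idea can be sketched as follows. This is a simplified illustration in the spirit of CELF, not the authors' implementation: it relies on submodularity (a marginal gain computed earlier is an upper bound on the current gain), keeps those stale bounds in a priority queue, and re-evaluates only the top candidate. The `DETECTS` data and `coverage` function are hypothetical.

```python
import heapq

# Hypothetical toy data: events detected by each sensor location.
DETECTS = {
    "S1": {1, 2, 3, 4},
    "S2": {3, 4, 5},
    "S3": {5, 6},
    "S4": {1, 2},
}


def coverage(placement):
    """Toy sensing quality F(A): number of distinct events detected by A."""
    covered = set()
    for sensor in placement:
        covered |= DETECTS[sensor]
    return len(covered)


def lazy_greedy(candidates, quality, k):
    """Greedy selection that re-evaluates F only for the most promising candidate."""
    selected = frozenset()
    base = quality(selected)
    # Max-heap (via negated gains) of upper bounds on each marginal gain.
    # The index i breaks ties so that element names are never compared.
    heap = [
        (-(quality(frozenset({s})) - base), i, s)
        for i, s in enumerate(candidates)
    ]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < k:
        _, i, s = heapq.heappop(heap)
        gain = quality(selected | {s}) - base  # fresh marginal gain
        if not heap or gain >= -heap[0][0]:
            # Fresh gain beats every other (possibly stale) upper bound,
            # so s is the true greedy choice for this round.
            selected = selected | {s}
            base += gain
            order.append(s)
        else:
            heapq.heappush(heap, (-gain, i, s))
    return order


placement = lazy_greedy(DETECTS, coverage, k=2)
print(placement, coverage(placement))
```

Because stale bounds never underestimate the true gain for a submodular F, this returns the same kind of solution as plain greedy; the savings only show up when F is expensive and the ground set is large.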
28Robustness against adversaries
effective protection against random contaminations!
optimize robot path to monitor algal blooms
sensor locations known → adversary attacks vulnerable spot
model incorrect or dynamics change → must be robust to changes
29Optimizing for the worst case
- Utility function $F_i$ for each contamination location $i$
- Contamination at location 1: $F_1(A)$ is high, $F_1(B)$ is high
- Contamination at location 2: $F_2(A)$ is low, $F_2(B)$ is high
- Worst-case score: $\min\{F_1(A), F_2(A)\}$ low; $\min\{F_1(B), F_2(B)\}$ high
How can we solve this robust sensing problem?
30How does the greedy algorithm do?
- Example: $V$ consists of three elements (icons on the slide); can only buy $k = 2$
- Greedy picks one element first; then it can choose only one of the other two
- Greedy score is far below the optimal score of 1: huge gap
- → Greedy arbitrarily bad. Something better?
Theorem: the problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP
Hence we can't find any approximation algorithm. Or can we?
31Alternative formulation
What if we are told the optimal value $c$?
Easier? Yes, if we relax the constraint $|A| \le k$
32Solving alternative formulation
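Truncate each objective at $c$: $F_{i,c}(A) = \min\{F_i(A), c\}$, so always $F_{i,c}(A) \le c$
Average of the truncated objectives: $F_{avg,c}(A) = \frac{1}{m} \sum_i F_{i,c}(A)$
$F_{avg,c}(A) = c \Leftrightarrow$ every $F_i(A) \ge c$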
33Solving alternative formulation
34Solving alternative formulation
submodular!!! → can use greedy alg.!
equivalent
35Back to our example
- Guess $c = 1$
- First pick
- Then pick
- → Optimal solution!
How do we find $c$? Binary search!
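A minimal sketch of this idea: binary search over the guess $c$, and for each guess run greedy on the average of the objectives truncated at $c$. It is an illustration of the approach described on these slides, not the authors' SATURATE code. The instance (`f1`, `f2`, `EPS`) is a hypothetical three-element example, and `alpha`, `c_max`, and `iterations` are assumed parameters.

```python
EPS = 0.01  # hypothetical small utility of the "decoy" element s3


def f1(placement):
    """Utility if the contamination is at location 1 (toy values)."""
    return 1.0 if "s1" in placement else (EPS if "s3" in placement else 0.0)


def f2(placement):
    """Utility if the contamination is at location 2 (toy values)."""
    return 1.0 if "s2" in placement else (EPS if "s3" in placement else 0.0)


def truncated_average(objectives, c):
    """Average of min(F_i(A), c); equals c exactly when every F_i(A) >= c."""
    def f(placement):
        return sum(min(fi(placement), c) for fi in objectives) / len(objectives)
    return f


def greedy_cover(candidates, f, target, max_size, tol=1e-9):
    """Greedily add elements until f(A) reaches target or the size cap is hit."""
    selected = frozenset()
    remaining = set(candidates)
    while f(selected) < target - tol and remaining and len(selected) < max_size:
        best = max(sorted(remaining), key=lambda s: f(selected | {s}))
        selected = selected | {best}
        remaining.discard(best)
    return selected if f(selected) >= target - tol else None


def saturate(candidates, objectives, k, alpha=1.0, c_max=1.0, iterations=30):
    """Binary search on c; keep the largest c that greedy can saturate."""
    lo, hi = 0.0, c_max
    best = frozenset()
    for _ in range(iterations):
        c = (lo + hi) / 2
        placement = greedy_cover(
            candidates, truncated_average(objectives, c), c, int(alpha * k)
        )
        if placement is None:
            hi = c  # guess too ambitious for this budget
        else:
            lo, best = c, placement
    return best, lo


solution, value = saturate(["s1", "s2", "s3"], [f1, f2], k=2)
print(sorted(solution), min(f1(solution), f2(solution)))
```

On this toy instance a greedy run directly on the worst-case score would take s3 first, while the truncated-average formulation selects s1 and s2 and reaches worst-case score 1.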
36Theoretical guarantees
[Krause, McMahan, G., Gupta 07]
Theorem: the problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP
Theorem: SATURATE finds a solution $A_S$ such that $\min_i F_i(A_S) \ge OPT_k$ and $|A_S| \le \alpha k$, where $\alpha = O(\log |V|)$
- $OPT_k$: optimal value for $k$ sensors
- $A_S$: uses $\alpha k$ sensors
Theorem: no algorithm achieves a better factor $\beta < \alpha$, unless $NP \subseteq DTIME(n^{\log \log n})$
37Example: Minimax Kriging for lake monitoring
- Monitor pH values using robotic sensor
- Optimizing for average error can leave high error in some regions, but those may be important to understand the algal bloom
- Optimize WORST-CASE error throughout lake → guarantees uniformly good predictions
A ROBUST SUBMODULAR SENSING PROBLEM!
38Comparison with state of the art: Minimax Kriging
- Algorithm used in geostatistics: simulated annealing [Sacks & Schiller 88, van Groeningen & Stein 98, Wiens 05, ...]
- 7 parameters that need to be fine-tuned
[Plots: environmental monitoring and precipitation data]
Saturate is competitive and 10x faster. No parameters to tune!
39Results on water networks
[Plot: maximum detection time in minutes (lower is better) vs. number of sensors, water networks; annotation: no decrease until all contaminations detected!]
- 60% lower worst-case detection time!
40Reduction to submodular optimization
41An efficient optimization narrow waist
Shameless plug: www.submodularity.org (tutorial yesterday!! slides, video, and Matlab toolbox available)
All these apps involve physical sensing. Now for something completely different: let's jump from water
42 to the Web!
77% read blogs; 184M blogs [Universal McCann, 08]
- you have 10 minutes a day to read blogs / news
- which of the million blogs should you read?
43Information Cascades [Leskovec, Krause, G., Faloutsos, VanBriesen, Glance 07]
learn about story after us!
time
information cascade
a good blog learns about big cascades early on
44Water vs. Web
placing sensors in water networks vs. selecting informative blogs
- want to pick nodes to detect big cascades early
in both apps, utility functions are submodular
45Performance on Blog selection
Blog selection: 45k blogs
- outperforms state-of-the-art heuristics
- 700x speedup using submodularity!
46Taking attention into account
- Naïve approach: just pick 10 best blogs
- Selects big, well-known blogs (Instapundit, etc.)
- These contain many posts, take long to read!
[Plot: cost/benefit analysis vs. ignoring cost; axis scale x 10^4]
Cost-benefit optimization picks summarizer blogs!
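The contrast between ignoring cost and cost-benefit selection can be sketched with a budgeted greedy rule that ranks candidates by marginal gain per unit cost. This is an illustrative simplification, not the method from the talk: the blog names, the stories each blog covers, and the reading costs (in minutes) are all hypothetical.

```python
# Hypothetical toy data: stories covered by each blog and minutes needed to read it.
STORIES = {
    "BigBlog": {1, 2, 3, 4, 5},
    "Summarizer": {1, 2, 3, 4},
    "Niche": {5, 6},
}
COSTS = {"BigBlog": 10.0, "Summarizer": 2.0, "Niche": 2.0}


def stories_covered(blogs):
    """Toy utility: number of distinct stories covered by the chosen blogs."""
    covered = set()
    for blog in blogs:
        covered |= STORIES[blog]
    return len(covered)


def budgeted_greedy(costs, quality, budget, use_ratio):
    """Greedy under a budget; ranks by gain/cost if use_ratio, else by gain alone."""
    selected, spent = frozenset(), 0.0
    remaining = set(costs)
    while True:
        base = quality(selected)
        affordable = [s for s in sorted(remaining) if spent + costs[s] <= budget]
        if not affordable:
            break

        def score(s):
            gain = quality(selected | {s}) - base
            return gain / costs[s] if use_ratio else gain

        best = max(affordable, key=score)
        if quality(selected | {best}) - base <= 0:
            break  # nothing affordable adds any value
        selected = selected | {best}
        spent += costs[best]
        remaining.discard(best)
    return selected


ignoring_cost = budgeted_greedy(COSTS, stories_covered, budget=10.0, use_ratio=False)
cost_benefit = budgeted_greedy(COSTS, stories_covered, budget=10.0, use_ratio=True)
print(sorted(ignoring_cost), stories_covered(ignoring_cost))
print(sorted(cost_benefit), stories_covered(cost_benefit))
```

With a 10-minute budget, the cost-blind rule spends everything on the one big blog, while the ratio rule picks the two cheap blogs and covers more stories in less time.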
47No particular blogs are good for me
184M blogs → submodular optimization → k important blogs
- many topics in blogosphere
- a set of blogs may not represent your interests
48Do I care about the most common stories?
- 1M blog posts published every day
- Some stories become disproportionately popular
- Hard to find information you care about
Rather than selecting blogs: if you have 10 minutes a day, which stories should you read?
49Our goal coverage
- Turn down the noise in the blogosphere
- select a small set of posts that covers the most
important stories
January 17, 2009
51Our goal personalization
- Tailor post selection to user tastes
posts selected without personalization
posts selected after personalization based on Zidane's feedback
52Personalize postings [El-Arini, Veda, Shahaf, G. 09]
Blogosphere → personalized coverage fn. → personalized post selection
personalization
sit back and enjoy → it's a submodular optimization problem!
www.TurnDownTheNoise.com
53The power of the efficient narrow waist
54Finding and exploiting structure in AI
narrow waist in AI → discovering key structure → scale-up to real-world
motivated by real-world problems
challenge problems [Stone 07]
55Structural insights for challenges of next decade
what are the challenges for AI???
your structural insight here submodularity ?
56Building up AI
The Foundations: computing, sensing
57The basic foundations of AI are changing
[Plot: processor speed (GHz) vs. release date; label: exponentially increasing]
58The basic foundations of AI are changing
unique new challenges →
- e.g., parallel probabilistic inference [Gonzalez, Low, G. 09]
- e.g., distributed algorithms for learning over massive datasets on huge computer clusters
but, enable new apps → new intelligence, impact our daily lives!
new structure, new pillars
59Opportunity for new applications of AI
are there more significant challenges for AI in
the coming decade?
60information overload!!!
it's about the democratic process
how can we understand our impact on the world, and make better decisions?
and it's about science
2000 in US
it's not (just) about the web
61The explosion of AI research
exploding number and complexity of AI papers
- 2500 AI-like conferences
- 2.8M CS papers
who can keep up with all these results???
IJCAI 69
"As long as the centuries unfold, the number of books will grow continually ... as convenient to search for a bit of truth concealed in nature as to find it hidden away in an immense multitude of bound volumes" - Denis Diderot, Encyclopédie (1755)
62Keyword search is not enough
AI can help detect and represent structure in information, while providing natural access to sources
63The research landscape
can AI provide even more structure?
64An example of a structured view
65An example of a structured view
deeper understanding → address information overload
challenge for AI: build structured view automatically!
is supported by
is disputed by
66Today, the narrow waist
optimize information gathering
submodularity: efficient optimization narrow waist
motivated by real-world problems
67A step towards huge AI challenges for next decade