Title: ICS280 Presentation by Suraj Nagasrinivasa
1ICS280 Presentation by Suraj Nagasrinivasa
- (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003)
- by R Cheng, D Kalashnikov, S Prabhakar
- (2) Model-Driven Data Acquisition in Sensor Networks (VLDB 2004)
- by A Deshpande, C Guestrin, J Hellerstein, W Hong, S Madden
- Acknowledgements: Dmitri Kalashnikov and Michal Kapalka
2In typical sensor applications...
- Sensors monitor external environment continuously
- Sensor readings are sent back to the application
- Decisions are often made based on these readings
3However, we face uncertainty
- Typically, DB/server collects sensor readings
- DB cannot store true sensor value at all points in time
- Scarce battery power
- Limited network bandwidth
- So, readings recorded at discrete time points
- Value of phenomenon continuously changing
- As a result, DB stored reading is mostly obsolete
4Scenario: Answering Minimum Query with discrete DB stored readings
[Figure: recorded temperatures x0, y0 and current temperatures x1, y1 of sensors x and y]
- x0 < y0: x is minimum
- y1 < x1: y is minimum
- Wrong query result
5Scenario: Answering Minimum Query with error-bound readings I
[Figure: recorded temperatures x0, y0 and bounds for the current temperature of sensors x and y]
- x certainly gives the minimum temperature reading
6Scenario: Answering Minimum Query with error-bound readings II
[Figure: recorded temperatures x0, y0 and bounds for the current temperature of sensors x and y]
- Both x and y have a chance of yielding the minimum value
- Which one has a higher probability?
7Probabilistic Queries
- Based on variation characteristics of sensor value over time
- Bounds can be estimated for possible values
- Probability distribution of values defined within bounds
- Evaluate probability for query answers
- Probabilistic queries give a correct answer, instead of a potentially incorrect answer
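The question of which sensor is more likely to hold the minimum can be made concrete. What follows is a minimal sketch, assuming each sensor's current value is independent and uniformly distributed within its bound. The uniform pdf, the function names and the example bounds are illustrative assumptions, not taken from the paper.

```python
def uniform_sf(x, lo, hi):
    """P(Y > x) for Y uniformly distributed on [lo, hi]."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def prob_x_is_min(x_bounds, y_bounds, steps=10_000):
    """P(X < Y) for independent uniform X and Y, via the midpoint rule.

    Integrates f_X(x) * P(Y > x) over the bound of X; since f_X is
    constant, this is just the average of P(Y > x) over the bound.
    """
    lx, ux = x_bounds
    ly, uy = y_bounds
    width = (ux - lx) / steps
    total = 0.0
    for k in range(steps):
        x = lx + (k + 0.5) * width  # midpoint of the k-th cell
        total += uniform_sf(x, ly, uy)
    return total / steps
```

With bounds [10, 14] for x and [12, 16] for y, `prob_x_is_min((10, 14), (12, 16))` comes out at about 0.875, so x is the more likely minimum but y cannot be ruled out.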
8Rest of the paper
- Notation and Uncertainty Model
- Classification of Probabilistic Queries
- Evaluating Probabilistic Queries
- Quality of Probabilistic Queries
- Object Refreshment Policies
- Experimental Results
9Notation
- T: A set of DB objects (e.g. sensors)
- a: Dynamic attribute (e.g. pressure)
- Ti: ith object of T
- Ti.a(t): Value of a in Ti at time t
10Uncertainty Model
[Figure: uncertainty pdf fi(x,t) of Ti.a(t), defined over the uncertainty interval Ui(t) between li(t) and ui(t)]
- Can be extended in n dimensions
11Classification of Probabilistic Queries
- Type of Result
- Value-based: returns single value
- E.g. Minimum query (l, u, pdf)
- Entity-based: returns set of objects
- E.g. Range query ((Ti, pi), pi > 0)
- Aggregation
- Non-Aggregate: query result for an object is independent of other objects
- E.g. Range query
- Aggregate: query result computed from set of objects
- E.g. Nearest Neighbor query
12Classification of Probabilistic Queries
              | Value-based answer | Entity-based answer
Non-aggregate | VSingleQ: What is the temperature of sensor x? | ERQ: Which sensor has temperature between 10F and 30F?
Aggregate     | VAvgQ, VSumQ, VMinQ, VMaxQ: What is the average temperature of the sensors? | ENNQ, EMinQ, EMaxQ: Which sensor gives the highest temperature?
- Query evaluation algorithms and quality metrics are developed for each class
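As an illustration of the entity-based, non-aggregate class (ERQ), here is a minimal sketch, assuming every sensor value is uniformly distributed over its uncertainty interval. Under that assumption the probability for a sensor is the fraction of its interval that overlaps the query range. The sensor names, intervals and function names are hypothetical.

```python
def range_probability(bounds, query):
    """P(l <= value <= u) for a value uniform on bounds = (lo, hi)."""
    lo, hi = bounds
    l, u = query
    overlap = min(hi, u) - max(lo, l)
    return max(overlap, 0.0) / (hi - lo)


def erq(sensors, query):
    """Return (name, p) for every sensor whose probability p is above 0."""
    result = []
    for name, bounds in sensors.items():
        p = range_probability(bounds, query)
        if p > 0:
            result.append((name, p))
    return result


# Hypothetical uncertainty intervals, in degrees F.
sensors = {"T1": (5.0, 15.0), "T2": (20.0, 25.0), "T3": (40.0, 50.0)}
answer = erq(sensors, (10.0, 30.0))
```

Each sensor is evaluated on its own, which is what makes the query non-aggregate: T3 drops out without looking at T1 or T2.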
13ENNQ algorithm: Projection, Pruning, Bounding, Evaluation
14ENNQ algorithm
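The slides only name the phases of the ENNQ algorithm. What follows is a minimal sketch of the pruning phase alone, assuming one-dimensional values and the rule that an object can be dropped when its nearest possible distance to the query point exceeds the smallest farthest possible distance among all objects. The rule as stated here, the function name and the example intervals are assumptions for illustration.

```python
def prune(objects, q):
    """Keep only objects that could still be the nearest neighbour of q.

    objects maps a name to its uncertainty interval (lo, hi).
    """
    def near(lo, hi):
        # Smallest possible distance from q to a value in [lo, hi].
        return 0.0 if lo <= q <= hi else min(abs(q - lo), abs(q - hi))

    def far(lo, hi):
        # Largest possible distance from q to a value in [lo, hi].
        return max(abs(q - lo), abs(q - hi))

    # Some object is guaranteed to lie within distance f of q.
    f = min(far(lo, hi) for lo, hi in objects.values())
    return {name: (lo, hi)
            for name, (lo, hi) in objects.items()
            if near(lo, hi) <= f}


survivors = prune({"A": (8, 12), "B": (11, 15), "C": (14, 20)}, q=10)
```

Here C is pruned because even its closest possible value is farther from q than the farthest possible value of A.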
15Quality of Probabilistic Result
- Introduce a notion of quality of answer
- Proposed metrics for different classes of queries
- regular range query
- "yes" or "no" with 100%
- probabilistic query ERQ
- yes with pi = 95%: OK
- yes with pi = 5%: OK (95% it is not in [l, u])
- yes with pi = 50%: NOT OK (not certain!)
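The three cases above suggest a score that is high when pi is close to 0 or 1 and lowest at 0.5. Here is a minimal sketch, assuming the score of one answer is |pi - 0.5| / 0.5 and the score of a result set is the average over its answers. This form is consistent with the cases on the slide but should be checked against the paper before relying on it.

```python
def erq_quality(probabilities):
    """Average of |p - 0.5| / 0.5 over the result set (1 = certain, 0 = coin flip)."""
    if not probabilities:
        raise ValueError("result set is empty")
    return sum(abs(p - 0.5) / 0.5 for p in probabilities) / len(probabilities)
```

Under this assumed score, answers with pi = 95% and pi = 5% both score about 0.9, while pi = 50% scores 0.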
16Quality for Entity-Aggregate Queries
- "Which sensor, among n, has the minimum reading?"
- Recall
- Result set R = (Ti, pi)
- e.g. (T1, 30%), (T2, 40%), (T3, 30%)
- B is interval, bounding all possible values
- e.g. minimum is somewhere in B = [10, 20]
- Our metrics for aggregate queries: Min, Max, NN
- objects cannot be treated independently as in ERQ metric
- uniform distribution (in result set) is the worst case
- metrics are based on entropy
17Quality for Entity-Aggregate Queries
- H(X): entropy of random variable X (X1, ..., Xn with p(X1), ..., p(Xn))
- entropy is smallest (i.e., 0) iff there exists i with p(Xi) = 1
- entropy is largest (i.e., log2(n)) iff all Xi's are equally likely
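The two extremes of the entropy can be checked numerically. This is a minimal sketch using the standard Shannon entropy in bits; how the paper combines H(X) with the interval B into a final score is not shown here.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


certain = entropy([1.0, 0.0, 0.0])        # one object is surely the answer
uniform = entropy([1 / 3, 1 / 3, 1 / 3])  # worst case for three objects
example = entropy([0.3, 0.4, 0.3])        # a result set such as 30%, 40%, 30%
```

The result set 30%, 40%, 30% has an entropy of about 1.571 bits, only slightly below the worst case log2(3), so it is a low-quality answer.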
18Improving Answer Quality
- It is important to pick the right update policies that will help improve answer quality
- Global Choice
- Glb_RR (pick random)
- Local Choice
- Loc_RR (pick random)
- MaxUnc (heuristic: chooses max. uncertainty interval)
- MinExpEntropy (heuristic: chooses object with minimum expected entropy)
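Of the policies listed, MaxUnc is the simplest to write down. Here is a minimal sketch, assuming each candidate object is described by its current uncertainty interval and that the caller has already decided which objects are candidates. The names and intervals are hypothetical.

```python
def max_unc(intervals):
    """Return the object whose uncertainty interval is widest.

    intervals maps an object name to (lower, upper).
    """
    return max(intervals, key=lambda name: intervals[name][1] - intervals[name][0])


# Hypothetical candidates: T2 has the widest interval, so it is refreshed first.
choice = max_unc({"T1": (10.0, 12.0), "T2": (8.0, 15.0), "T3": (11.0, 11.5)})
```

MinExpEntropy would need the query evaluation itself to estimate the entropy after each possible refresh, so it is not sketched here.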
19Experiments: Simulation Set-up
- 1 server, 1000 sensors, limited network bandwidth, Min queries tested
- Query arrivals follow a Poisson distribution
- Each query over a random set of 100 sensors
20Results
21Conclusions
- Probabilistic Querying for handling inherent uncertainty in sensor DBs
- Classification, Algorithms and Quality of Answer metrics for various query types
- Very general model of uncertainty, which makes the algorithms not directly implementable in any sensor network
- Besides, to achieve any reasonable energy-efficiency in sensor networks, the application and network requirements that dictate when sensor nodes must be awake have to be tightly coordinated, especially in the case of multi-hop routing
22Outline for Model-Driven Data Acquisition in Sensor Networks
- Introduction
- Motivation for Model-Based Queries
- Framework Concept
- Model Example: Multivariate Gaussian
- Algorithm
- Resolving Model-Based Queries
- Incorporating Dynamicity
- Observation Plan / Cost model
- Experiments
- BBQ System
- Results
- Conclusions
23Motivation for Model-Based Queries
- Declarative Queries adopted as key programming paradigm for large sensor nets
- However, interpreting sensor nets as databases results in two major problems
- Misinterpretation of Data
- Physically observable world is a set of continuous phenomena in both time and space
- Sensor readings are UNLIKELY to be random samples
- Inefficient approximate queries
- If sensor readings are not true values, uncertainty needs to be quantified to provide reliable answers
24Motivation for Model-Based Queries
- Paper Contribution: To incorporate statistical models of real-world processes into sensor net query processing architecture
- Models help in
- Accounting for biases in spatial sampling
- Identifying sensors providing faulty data
- Extrapolating values for missing sensors
25Framework Concept
- Goal: Given a query and model, to devise an efficient data acquisition plan to provide best possible answer
- Major dependencies
- Correlations between sensors captured by the statistical model
- Correlation between attributes for given sensor
- Correlation between sensors for given attribute
- Specific connectivity of the wireless network
26Framework Concept: Observation Plan parameters
- Correlations in Value
- Cost Differential
27Framework Concept
28Model Example: Multivariate Gaussian
29Resolving Model-Based Queries (Range Queries)
30Resolving Model-Based Queries (Value Queries)
- To compute value of Xi with maximum error e and confidence 1-delta
- Compute mean of Xi given the observations o
- As in range queries, find the probability that Xi lies within e of this mean
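The value-query steps can be sketched for the smallest possible model. This is a minimal sketch, assuming a bivariate Gaussian over two attributes in which X2 has been observed and X1 is queried. The conditioning formulas are the standard ones for a Gaussian; the numbers, names and the choice of attributes are hypothetical.

```python
import math


def condition_bivariate(mean, cov, observed_x2):
    """Mean and variance of X1 given X2 = observed_x2 (bivariate Gaussian)."""
    mu1, mu2 = mean
    (s11, s12), (_, s22) = cov
    cond_mean = mu1 + s12 / s22 * (observed_x2 - mu2)
    cond_var = s11 - s12 * s12 / s22
    return cond_mean, cond_var


def within_error_probability(variance, e):
    """P(|X - E[X]| <= e) for a Gaussian X with the given variance."""
    return math.erf(e / math.sqrt(2.0 * variance))


# Hypothetical model: X1 = temperature, X2 = voltage, positively correlated.
mean = (20.0, 3.0)
cov = ((4.0, 1.2), (1.2, 1.0))
cond_mean, cond_var = condition_bivariate(mean, cov, observed_x2=3.5)
confidence = within_error_probability(cond_var, e=1.6)
```

If `confidence` is at least 1-delta the conditional mean can be returned as the answer; otherwise more observations are needed. Observing the correlated attribute shrinks the variance of X1, which is why a cheap observation can stand in for an expensive one.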
31Range Queries for Gaussian
- Projection for Gaussian is simple: just drop unnecessary values from mean and variance matrix
- The integral of the projected density over the query range still has to be computed
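For a single attribute the projection and the integral are both short. Here is a minimal sketch, assuming the query asks for the probability that one attribute Xi lies in [a, b]; the model values and function names are hypothetical, and the integral is evaluated through the Gaussian CDF.

```python
import math


def marginal(mean, cov, i):
    """Project onto attribute i: keep its mean and its diagonal variance."""
    return mean[i], cov[i][i]


def normal_cdf(x, mu, var):
    """CDF of a Gaussian with mean mu and variance var."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def gaussian_range_probability(mean, cov, i, a, b):
    """P(a <= Xi <= b) under the multivariate Gaussian (mean, cov)."""
    mu, var = marginal(mean, cov, i)
    return normal_cdf(b, mu, var) - normal_cdf(a, mu, var)


# Hypothetical two-attribute model; query attribute 0 on the range [18, 22].
p = gaussian_range_probability([20.0, 25.0], [[4.0, 1.0], [1.0, 9.0]], 0, 18.0, 22.0)
```

The range [18, 22] is one standard deviation either side of the mean of attribute 0, so `p` is about 0.68.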
32Incorporating Dynamicity
- Use historical measurements to improve confidence of answers
- Given pdf at time t
- Compute pdf at time t+1
33Incorporating Dynamicity
- Assumption: Markovian Model
- Dynamicity summarized by transition model
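One common way to realise a Markovian transition model for a Gaussian is a linear-Gaussian step. This is a minimal sketch for a single attribute, assuming the transition x(t+1) = a * x(t) + b + noise with noise variance q. The parameters a, b and q are hypothetical and would be learned from historical data.

```python
def predict(mean, var, a, b, q):
    """Push a Gaussian (mean, var) at time t through one transition to t+1."""
    return a * mean + b, a * a * var + q


# Hypothetical hourly drift of +0.5 degrees with noise variance 0.25.
mean_t, var_t = 20.0, 1.0
mean_t1, var_t1 = predict(mean_t, var_t, a=1.0, b=0.5, q=0.25)
```

Without new observations the variance grows at every step, so confidence decays over time until a query forces fresh readings.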
34Observation Plan / Cost Model
- What is the cost of making o observations?
- C(o) = acquisition cost + transmission cost
- Acquisition cost constant for each attribute
- Transmission cost
- Network graph
- Edge weights (link quality)
- Paths taken could be sub-optimal
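The cost decomposition can be sketched as follows. This is a minimal sketch, assuming the plan visits the chosen sensors in a fixed order starting and ending at a base station, and that each directed link has a known cost derived from its quality. The data layout, node names and numbers are hypothetical.

```python
def observation_cost(order, acquisition, link_cost, base="base"):
    """Acquisition cost of each visited sensor plus the cost of every hop."""
    acquire = sum(acquisition[s] for s in order)
    path = [base, *order, base]
    transmit = sum(link_cost[(u, v)] for u, v in zip(path, path[1:]))
    return acquire + transmit


# Hypothetical two-sensor network with symmetric link costs.
acquisition = {"s1": 1.0, "s2": 2.0}
link_cost = {
    ("base", "s1"): 3.0, ("s1", "base"): 3.0,
    ("base", "s2"): 4.0, ("s2", "base"): 4.0,
    ("s1", "s2"): 1.0, ("s2", "s1"): 1.0,
}
cost = observation_cost(["s1", "s2"], acquisition, link_cost)
```

Here the cost is 3.0 for acquisition plus 8.0 for the three hops. Because the visiting order changes the transmission term, choosing which sensors to observe and in what order are coupled decisions.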
35Observation Plan / Cost Model
- A set of attributes (theta) to observe is determined by computing expected benefit
- And finding
- This, being similar to the traveling salesman problem, is best dealt with using heuristic algorithms
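One simple heuristic of the kind the slide alludes to is greedy selection. This is a minimal sketch, assuming the caller supplies a benefit function and a cost function over sets of attributes and a target benefit to reach. It is a generic greedy rule for illustration, not necessarily the heuristic used in the paper, and all names and numbers are hypothetical.

```python
def greedy_plan(candidates, benefit, cost, target):
    """Repeatedly add the attribute with the best benefit gain per extra cost."""
    chosen = []
    remaining = list(candidates)
    while remaining and benefit(chosen) < target:
        def ratio(c):
            gain = benefit(chosen + [c]) - benefit(chosen)
            extra = cost(chosen + [c]) - cost(chosen)
            return gain / extra if extra > 0 else float("inf")

        best = max(remaining, key=ratio)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy additive benefit and cost, purely for illustration.
gains = {"a": 0.5, "b": 0.3, "c": 0.1}
costs = {"a": 2.0, "b": 1.0, "c": 1.0}
plan = greedy_plan(
    ["a", "b", "c"],
    benefit=lambda s: min(1.0, sum(gains[x] for x in s)),
    cost=lambda s: sum(costs[x] for x in s),
    target=0.7,
)
```

In this toy case the plan picks b first (best gain per unit cost), then a, and stops once the target is met. A greedy rule gives no optimality guarantee, which is the usual trade-off for avoiding an exhaustive search.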
36BBQ System
- BBQ: A Tiny-Model Query System
- Uses Multivariate Gaussians
- Has 24 transition models, one for each hour of the day
37Results
- Experiment: 11 sensors on a tree, 83000 measurements, 2/3 used for training and 1/3 for tests
- Methodology
- BBQ builds a model based on training data
- One random query per hour: the possible observations are taken and the model is updated
- The answer is compared to the measured value
- Compare with two other methods
- TinyDB: Each query is broadcast over the sensor network using an overlay tree
- Approximate-Caching: Base station maintains a view of the sensor readings
38Results
39Results
40Conclusion
- Approximate queries can be well optimized, but a model of the physical phenomenon is needed
- Defining an appropriate model is a challenge
- The framework works well for fairly steady sensor data values
- Statistical model is largely static, with refinements to the model based on incoming queries and observations made as a result