ICS280 Presentation by Suraj Nagasrinivasa

Description:

ICS280 Presentation by Suraj Nagasrinivasa (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R Cheng, D Kalashnikov, S Prabhakar – PowerPoint PPT presentation

Slides: 41

Provided by: ALEX1167

Learn more at: https://ics.uci.edu

1
ICS280 Presentation by Suraj Nagasrinivasa

(1) Evaluating Probabilistic Queries over
Imprecise Data (SIGMOD 2003)
by R Cheng, D Kalashnikov, S Prabhakar
(2) Model-Driven Data Acquisition in Sensor
Networks (VLDB 2004)
by A Deshpande, C Guestrin, J Hellerstein, W
Hong, S Madden
Acknowledgements: Dmitri Kalashnikov and Michal
Kapalka

2
In typical sensor applications...

Sensors monitor external environment continuously
Sensor readings are sent back to the application
Decisions are often made based on these readings

3
However, we face uncertainty

Typically, DB/server collects sensor readings
DB cannot store true sensor value at all points
in time
Scarce battery power
Limited network bandwidth
So, readings recorded at discrete time points
Value of phenomenon continuously changing
As a result, DB stored reading is mostly obsolete

4
Scenario: Answering Minimum Query with discrete
DB stored readings

[Figure: recorded temperatures x0, y0 and current temperatures x1, y1 for sensors x and y]
Recorded temperature: x0 < y0, so x is minimum
Current temperature: y1 < x1, so y is minimum
Wrong query result

5
Scenario: Answering Minimum Query with
error-bound readings I

[Figure: recorded temperatures x0, y0 and bounds for the current temperature of sensors x and y]
x certainly gives the minimum temperature reading

6
Scenario: Answering Minimum Query with
error-bound readings II

[Figure: recorded temperatures x0, y0 and overlapping bounds for the current temperature of sensors x and y]
Both x and y have a chance of yielding the
minimum value
Which one has a higher probability?

7
Probabilistic Queries

Based on variation characteristics of sensor
value over time
Bounds can be estimated for possible values
Probability distribution of values defined within
bounds
Evaluate probability for query answers
Probabilistic queries give a correct answer,
instead of a potentially incorrect answer
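
The question from the previous scenario ("which one has a higher probability?") can be made concrete with a small sketch. It assumes a uniform pdf inside each bound, which is only one possible choice, and the function names are illustrative rather than taken from the paper. For each object it numerically integrates the chance that its value is the smallest.

```python
def uniform_cdf(v, lo, hi):
    """CDF of a uniform distribution on [lo, hi] (assumed pdf shape)."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)


def min_probabilities(intervals, steps=2000):
    """Return P(object i holds the minimum) for each (lo, hi) bound.

    Uses the midpoint rule on: integral of f_i(v) * prod_{j != i} (1 - F_j(v)) dv.
    Assumes every interval has non-zero width.
    """
    probs = []
    for i, (lo, hi) in enumerate(intervals):
        width = (hi - lo) / steps
        total = 0.0
        for k in range(steps):
            v = lo + (k + 0.5) * width
            others_larger = 1.0
            for j, (lo_j, hi_j) in enumerate(intervals):
                if j != i:
                    others_larger *= 1.0 - uniform_cdf(v, lo_j, hi_j)
            # f_i(v) = 1 / (hi - lo) for a uniform pdf
            total += others_larger * width / (hi - lo)
        probs.append(total)
    return probs


# Hypothetical bounds: x in [10, 20], y in [15, 25]
p_x, p_y = min_probabilities([(10.0, 20.0), (15.0, 25.0)])
```

With these made-up bounds x is the likelier minimum, but y still has a non-zero chance, which is exactly the information a probabilistic query reports.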

8
Rest of the paper

Notation and Uncertainty Model
Classification of Probabilistic Queries
Evaluating Probabilistic Queries
Quality of Probabilistic Queries
Object Refreshment Policies
Experimental Results

9
Notation

T: a set of DB objects (e.g. sensors)
a: a dynamic attribute (e.g. pressure)
Ti: the ith object of T
Ti.a(t): value of a in Ti at time t

10
Uncertainty Model
Uncertainty interval Ui(t) = [li(t), ui(t)], which contains Ti.a(t)
Uncertainty pdf fi(x,t), defined over Ui(t)

Can be extended to n dimensions

11
Classification of Probabilistic Queries

Type of Result
Value-based: returns a single value
E.g. Minimum query: ([l, u], pdf)
Entity-based: returns a set of objects
E.g. Range query: {(Ti, pi) : pi > 0}
Aggregation
Non-aggregate: query result for an object is
independent of other objects
E.g. Range query
Aggregate: query result is computed from a set of
objects
E.g. Nearest Neighbor query

12
Classification of Probabilistic Queries
              | Value-based answer | Entity-based answer
--------------|--------------------|--------------------
Non-aggregate | VSingleQ: What is the temperature of sensor x? | ERQ: Which sensor has temperature between 10F and 30F?
Aggregate     | VAvgQ, VSumQ, VMinQ, VMaxQ: What is the average temperature of the sensors? | ENNQ, EMinQ, EMaxQ: Which sensor gives the highest temperature?

Query evaluation algorithms and quality metrics
are developed for each class
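
As an illustration of the simplest entity-based class, here is a minimal sketch of an ERQ. It assumes each sensor's value is uniformly distributed over its uncertainty interval (an assumption made for brevity; the model allows any pdf), so the probability of lying in the query range is just the overlapping fraction of the interval. The function and sensor names are hypothetical.

```python
def range_query(objects, lo, hi):
    """Return [(name, p)] for objects whose value may lie in [lo, hi].

    objects maps a name to its uncertainty interval (l_i, u_i).
    Assumes a uniform pdf over each interval with non-zero width,
    so p is the overlap length divided by the interval length.
    Only objects with p > 0 are returned.
    """
    results = []
    for name, (l_i, u_i) in objects.items():
        overlap = min(hi, u_i) - max(lo, l_i)
        if overlap > 0:
            results.append((name, overlap / (u_i - l_i)))
    return results


# Hypothetical temperature bounds in degrees F
sensors = {"T1": (5.0, 15.0), "T2": (12.0, 28.0), "T3": (40.0, 50.0)}
answer = range_query(sensors, 10.0, 30.0)
```

Each object is evaluated on its own, which is what makes this query non-aggregate.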

13
ENNQ algorithm: Projection, Pruning, Bounding,
Evaluation
14
ENNQ algorithm
15
Quality of Probabilistic Result

Introduce a notion of quality of answer
Proposed metrics for different classes of queries

Regular range query
"yes" or "no" with 100% certainty
Probabilistic query (ERQ)
yes with pi = 95%: OK
yes with pi = 5%: OK (95% chance it is not in [l, u])
yes with pi = 50%: NOT OK (not certain!)

16
Quality for Entity-Aggregate Queries

"Which sensor, among n, has the minimum reading?"
Recall
Result set R = {(Ti, pi)}
e.g. {(T1, 30%), (T2, 40%), (T3, 30%)}
B is an interval bounding all possible values
e.g. minimum is somewhere in B = [10, 20]
Our metrics for aggregate queries (Min, Max, NN)
objects cannot be treated independently, as they are in the ERQ
metric
uniform distribution (in result set) is the worst
case
metrics are based on entropy

17
Quality for Entity-Aggregate Queries

H(X): entropy of random variable X (values X1, ..., Xn with
probabilities p(X1), ..., p(Xn))
entropy is smallest (i.e., 0) iff there exists an i with p(Xi) = 1
entropy is largest (i.e., log2(n)) iff all Xi's
are equally likely
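
The two extreme cases can be checked with a short sketch of the standard Shannon entropy, H(X) = sum of p(Xi) * log2(1 / p(Xi)). The function name is illustrative, and the input is assumed to be a list of probabilities that sum to 1.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


certain = entropy([1.0, 0.0, 0.0])    # one answer is certain
uniform = entropy([0.25] * 4)         # four equally likely answers
example = entropy([0.3, 0.4, 0.3])    # a result set with probabilities 30%, 40%, 30%
```

A result set close to uniform, such as 30%, 40%, 30%, scores near the maximum log2(3), which is why it counts as a low-quality answer.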

18
Improving Answer Quality

It is important to pick the right update policies
to help improve answer quality
Global Choice
Glb_RR (pick random)
Local Choice
Loc_RR (pick random)
MaxUnc (heuristic: choose the object with the
maximum uncertainty interval)
MinExpEntropy (heuristic: choose the object with
minimum expected entropy)
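
A minimal sketch of the MaxUnc idea, assuming the server keeps one (lower, upper) bound per object relevant to the query. The function name and the data layout are illustrative, not the paper's.

```python
def pick_max_unc(intervals):
    """Return the name of the object with the widest uncertainty interval.

    intervals maps an object name to its (lower, upper) bound.
    Ties are broken by dictionary order.
    """
    return max(intervals, key=lambda name: intervals[name][1] - intervals[name][0])


# Hypothetical bounds for three sensors involved in a query
bounds = {"T1": (10.0, 12.0), "T2": (5.0, 20.0), "T3": (8.0, 9.0)}
to_refresh = pick_max_unc(bounds)
```

Refreshing the chosen object collapses its interval to a fresh reading, so the widest interval is a cheap proxy for the largest expected gain in answer quality. MinExpEntropy would instead score each candidate by the entropy expected after its refresh, which costs more to compute.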

19
Experiments Simulation Set-up

1 server, 1000 sensors, limited network
bandwidth, Min queries tested
Query arrivals follow a Poisson distribution
Each query over a random set of 100 sensors

20
Results
21
Conclusions

Probabilistic Querying for handling inherent
uncertainty in sensor DBs
Classification, Algorithms and Quality of Answer
metrics for various query types
The very general model of uncertainty means the
algorithms are not directly implementable in any
sensor network
Besides, to achieve reasonable energy efficiency in
sensor networks, the application and network
requirements that dictate when sensor nodes must be
awake have to be tightly coordinated, especially
in the case of multi-hop routing

22
Outline for Model-Driven Data Acquisition in
Sensor Networks

Introduction
Motivation for Model-Based Queries
Framework Concept
Model Example: Multivariate Gaussian
Algorithm
Resolving Model-Based Queries
Incorporating Dynamicity
Observation Plan / Cost model
Experiments
BBQ System
Results
Conclusions

23
Motivation for Model-Based Queries

Declarative Queries adopted as key programming
paradigm for large sensor nets
However, interpreting sensor nets as databases
results in two major problems
Misinterpretation of Data
The physically observable world is a set of
phenomena that are continuous in both time and space
Sensor readings are UNLIKELY to be random samples
Inefficient approximate queries
If sensor readings are not true values, the
uncertainty must be quantified to provide reliable
answers

24
Motivation for Model-Based Queries

Paper Contribution: To incorporate statistical
models of real-world processes into the sensor net
query processing architecture
Models help in
Accounting for biases in spatial sampling
Identifying sensors providing faulty data
Extrapolating values for missing sensors

25
Framework Concept

Goal: Given a query and a model, devise an
efficient data acquisition plan that provides the
best possible answer
Major dependencies
Correlations between sensors captured by the
statistical model
Correlation between attributes for given sensor
Correlation between sensors for given attribute
Specific connectivity of the wireless network

26
Framework Concept: Observation Plan parameters
Correlations in Value, Cost Differential
27
Framework Concept
28
Model Example: Multivariate Gaussian
29
Resolving Model-Based Queries (Range Queries)
30
Resolving Model-Based Queries (Value Queries)

To compute the value of Xi with maximum error e and
confidence 1 - delta
Compute the mean of Xi given the observations o
As in range queries, find the probability that Xi
lies within e of that mean

31
Range Queries for Gaussian

Projection for a Gaussian is simple: just drop the
unnecessary entries from the mean vector and covariance matrix
The integral of the resulting density over the query range
has to be computed.
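
For a single attribute the integral reduces to a difference of two normal CDF values. The sketch below assumes a two-variable Gaussian so that conditioning on one observed attribute can be written out by hand; all names and numbers are illustrative and not taken from the BBQ implementation.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a univariate Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def condition_on_x2(mu1, mu2, var1, var2, cov12, x2_obs):
    """Mean and variance of X1 after observing X2 = x2_obs (bivariate case)."""
    mean = mu1 + cov12 / var2 * (x2_obs - mu2)
    var = var1 - cov12 ** 2 / var2
    return mean, var


def range_probability(mean, var, a, b):
    """P(a <= X <= b) for X ~ N(mean, var)."""
    std = math.sqrt(var)
    return normal_cdf(b, mean, std) - normal_cdf(a, mean, std)


# Hypothetical model: X1 = temperature, X2 = a cheaper correlated attribute
mean, var = condition_on_x2(mu1=20.0, mu2=22.0, var1=4.0, var2=4.0,
                            cov12=3.0, x2_obs=24.0)
p_in_range = range_probability(mean, var, a=19.0, b=24.0)
# For a value query with error e, the same call is used with a = mean - e, b = mean + e.
```

If the resulting probability already meets the requested confidence 1 - delta, the query can be answered from the model alone; otherwise more observations are needed.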

32
Incorporating Dynamicity

Use historical measurements to improve confidence
of answers
Given the pdf at time t
Compute the pdf at time t+1

33
Incorporating Dynamicity

Assumption: Markovian Model
Dynamicity summarized by transition model
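
One way to picture a transition model is a scalar linear-Gaussian step, sketched below. The linear form and the parameter names a, b and q are assumptions made for illustration; the slides only state that a Markovian transition model is used.

```python
def predict(mean, var, a, b, q):
    """Push a Gaussian belief N(mean, var) through one time step.

    Assumed (hypothetical) transition: X(t+1) = a * X(t) + b + noise,
    where the noise is Gaussian with variance q.
    """
    return a * mean + b, a * a * var + q


# Belief at time t: temperature ~ N(20.0, 1.0); hypothetical hourly drift of +0.5
mean_next, var_next = predict(20.0, 1.0, a=1.0, b=0.5, q=0.25)
```

Without new observations the variance grows at every step, so confidence in model-only answers decays over time until a sensor is sampled again.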

34
Observation Plan / Cost Model

What is the cost of making o observations?
C(o) = acquisition cost + transmission cost
Acquisition cost constant for each attribute
Transmission cost
Network graph
Edge weights (link quality)
Paths taken could be sub-optimal
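
A rough sketch of such a cost function, under simplifying assumptions that are not from the paper: the network is a routing tree rooted at the base station, each edge is paid for once per plan, and the data structures and names are hypothetical.

```python
def transmission_cost(parent, edge_cost, nodes):
    """Sum the cost of every tree edge needed to reach the given nodes.

    parent maps each node to its parent (the root has no entry).
    edge_cost maps a node to the cost of the edge to its parent.
    """
    used = set()
    for node in nodes:
        while node in parent:      # walk up until the root is reached
            used.add(node)         # the edge is keyed by its child node
            node = parent[node]
    return sum(edge_cost[child] for child in used)


def observation_cost(observations, acquisition_cost, parent, edge_cost):
    """C(o) = acquisition cost + transmission cost for (node, attribute) pairs."""
    acquire = sum(acquisition_cost[attr] for _, attr in observations)
    nodes = {node for node, _ in observations}
    return acquire + transmission_cost(parent, edge_cost, nodes)


# Hypothetical three-sensor tree: base <- s1 <- {s2, s3}
parent = {"s1": "base", "s2": "s1", "s3": "s1"}
edge_cost = {"s1": 2.0, "s2": 1.0, "s3": 4.0}
acquisition_cost = {"temp": 0.5, "voltage": 0.1}
plan = [("s2", "temp"), ("s2", "voltage"), ("s3", "temp")]
cost = observation_cost(plan, acquisition_cost, parent, edge_cost)
```

Edge weights here stand in for link quality; a poor link would be modeled as a higher cost because of expected retransmissions.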

35
Observation Plan / Cost Model

The set of attributes (theta) to observe is
determined by computing the expected benefit
And finding
This, being similar to the traveling salesman
problem, is best dealt with using heuristic algorithms

36
BBQ System

BBQ: A Tiny-Model Query System
Uses Multivariate Gaussians
Has 24 transition models, one for each hour of the
day

37
Results

Experiment: 11 sensors on a tree, 83000
measurements, 2/3 used for training and 1/3 for
tests
Methodology
BBQ builds a model based on training data
One random query per hour; the possible
observations are taken and the model is updated
The answer is compared to the measured value
Compare with two other methods
TinyDB: each query is broadcast over the sensor
network using an overlay tree
Approximate-Caching: the base station maintains a
view of the sensor readings

38
Results
39
Results
40
Conclusion

Approximate queries can be well optimized, but
model of physical phenomenon is needed
Defining an appropriate model is a challenge
The framework works well for fairly steady
sensor data values
Statistical model is largely static with
refinements to the model based on incoming
queries and observations made as a result
