Data Quality and Query Cost in Pervasive Sensing Systems

1
Data Quality and Query Cost in Pervasive Sensing
Systems
  • David J. Yates

Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts, USA
dyates_at_bentley.edu
2
Joint Work With
  • Erich Nahum
  • IBM T.J. Watson Research Center
  • 19 Skyline Drive
  • Hawthorne, New York, USA
  • James Kurose and Prashant Shenoy
  • Dept. of Computer Science
  • University of Massachusetts
  • Amherst, Massachusetts, USA

3
Talk Outline
  • Data quality and query cost for pervasive sensing
    systems
  • Motivation and introduction
  • Pervasive sensing applications
  • Resource-constrained sensor fields
  • Sensor networks and backbone networks
  • Data management techniques to conserve resources
  • Sensor network data server and cache
  • Query cost, data quality, delay, value deviation
  • Cost and quality performance
  • Summary and Conclusions

4
Research Contributions
  • Define and quantify data quality and query cost
    performance in pervasive sensing systems
  • Develop policies that approximate sensor field
    values using cached values for nearby locations
  • Prove analytic upper bound on sensor field query
    rate
  • Show cost and quality win-win for pervasive
    sensing applications for which response time is
    most important
  • Show cost vs. quality tradeoff for sensing
    applications for which accuracy is most important
  • Results are robust with respect to the manner in
    which the query workload changes

5
Pervasive Sensing Applications
  • Microsensors, on-board processing, and wireless
    interfaces are feasible at very small scale and can
    monitor phenomena up close
  • Enables spatially and temporally dense monitoring
    and control
  • Pervasive sensing will reveal previously
    unobservable phenomena

Example applications: data center management,
manufacturing engineering, environmental monitoring,
natural disaster response
Embedded, energy-constrained (wireless, small
form-factor), unattended systems
6
Sensors Embedded in Infrastructure
  • The day after a moderate earthquake jolts the
    city of San Francisco, building inspectors check
    on the structural integrity of an office building
    in the financial district. Sensors embedded in
    the walls of the building to monitor and record
    vibration data confirm that the structure is safe
    to enter. (Intel 2005)

7
From Sensor Networks to Applications
  • Sensor fields (blue), backbone (yellow),
    monitoring and control applications (red)
  • Queries submitted from sensing applications
  • Replies received from sensor fields
  • Our focus: data management at the data server

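[Figure: a sensing application connects through routers
and switches to a data server / gateway (and cache) that
fronts sensor fields measuring sound and light. The
sensor fields are embedded, energy-constrained (wireless,
small form-factor), unattended systems.]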
8
Data Server Node Without Cache
[Figure: data server node without cache. Queries enter
the sensor field through the sensor network query queue;
replies leave through the gateway reply queue. Sensors
(s) are shown at locations l_1 and l_2 with timestamps
t_1 and t_2.]
l_i: query location i
t_i: timestamp associated with value sampled in
sensor field at location i
s: sensor
9
Data Server Node Without Cache
End-to-end delay occurs between Query_m and
Reply_m. Value deviation is between the value in
Reply_m and the value at l_i as Reply_m leaves the
gateway reply queue.
[Figure: data server node without cache. Query_m enters
the sensor field through the sensor network query queue;
Reply_m leaves through the gateway reply queue. Sensors
(s) are shown at locations l_1 and l_2 with timestamps
t_1 and t_2.]
l_i: query location i
t_i: timestamp associated with value sampled in
sensor field at location i
s: sensor
10
Data Server Node With Cache
For a cache hit or a miss, end-to-end delay
occurs between Query_m and Reply_m. Also, value
deviation is between the value in Reply_m and the
value at l_i as Reply_m leaves the gateway reply
queue.
[Figure: data server node with cache. Query_m arrives at
the gateway query queue. A hit is answered from the
cache; a miss or prefetch goes to the sensor field
through the sensor network query queue. Updates and
replies return through the cache update queue and the
gateway reply queue. Sensors (s) are shown at locations
l_1, l_2, and l_3.]
l_i: query location i
e_li: cache entry for query location i, e_li = (l_i, v_i, t_i)
v_i: value in cache associated with location i
t_i: timestamp of value associated with location i
s: sensor
Locations l_1 and l_2 are cached in entries e_l1 and
e_l2
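The hit and miss paths on this slide can be sketched in Python as below. This is only an illustration: the class and function names are hypothetical, the freshness test against an age parameter T is an assumption borrowed from the caching policies described later in the talk, and prefetching and the queues are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class CacheEntry:
    """One cache entry, mirroring the (location, value, timestamp) triple."""
    location: Hashable  # l_i
    value: float        # v_i
    timestamp: float    # t_i, taken here as the time the value was fetched


class GatewayCache:
    """Hypothetical data server cache; not the authors' implementation."""

    def __init__(self, max_age: float,
                 query_sensor_field: Callable[[Hashable], float]) -> None:
        self.max_age = max_age                        # assumed age parameter T
        self.query_sensor_field = query_sensor_field  # stand-in for the sensor field
        self.entries: Dict[Hashable, CacheEntry] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, location: Hashable, now: float) -> float:
        entry = self.entries.get(location)
        # Hit: an entry exists and is younger than T. With T = 0 nothing
        # is ever fresh; with T = infinity every cached entry stays fresh.
        if entry is not None and now - entry.timestamp < self.max_age:
            self.hits += 1
            return entry.value
        # Miss: query the sensor field and update the cache.
        self.misses += 1
        value = self.query_sensor_field(location)
        self.entries[location] = CacheEntry(location, value, now)
        return value
```

For example, `GatewayCache(90.0, read_sensor).lookup("l1", now=0.0)` misses and fills the entry, and a second lookup of "l1" within 90 seconds is served from the cache (`read_sensor` is a placeholder for whatever function reaches the sensor field).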
11
Query Cost and Data Quality
Cost to query location li is normalized such that
Normalized quality using softmax normalization
12
Caching and Lookup Policies
  • All hits
  • All misses
  • Simple lookup
  • Piggyback queries
  • Greedy age-based lookup
  • Greedy distance-based lookup
  • Median-of-3 lookup

approximate lookups and queries
Policies incorporate an age parameter T. T can be
0, finite, or infinite.
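The slide names the approximate policies without defining them, so the following Python sketch is only one plausible reading. It assumes that a greedy distance-based lookup returns the value of the nearest cached location whose entry is younger than T, and that median-of-3 returns the median value of the three nearest such entries. The 2-D coordinates, function names, and the use of None to signal "fall back to a sensor field query" are all hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # assumed 2-D coordinates of a query location
Entry = Tuple[float, float]     # (value v_i, timestamp t_i)


def fresh_by_distance(cache: Dict[Location, Entry], target: Location,
                      now: float, max_age: float) -> List[Tuple[float, float]]:
    """Return (distance, value) pairs for entries younger than max_age, nearest first."""
    fresh = [(math.dist(loc, target), value)
             for loc, (value, ts) in cache.items()
             if now - ts < max_age]
    fresh.sort(key=lambda pair: pair[0])
    return fresh


def greedy_distance_lookup(cache: Dict[Location, Entry], target: Location,
                           now: float, max_age: float) -> Optional[float]:
    """Approximate the target with the nearest fresh cached value, if any."""
    fresh = fresh_by_distance(cache, target, now, max_age)
    return fresh[0][1] if fresh else None


def median_of_3_lookup(cache: Dict[Location, Entry], target: Location,
                       now: float, max_age: float) -> Optional[float]:
    """Approximate the target with the median of the three nearest fresh values."""
    fresh = fresh_by_distance(cache, target, now, max_age)
    if len(fresh) < 3:
        return None  # caller would query the sensor field instead
    return median(value for _, value in fresh[:3])
```

A greedy age-based variant would sort the same candidates by age rather than by distance.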
13
Research Contributions
  • Defined and quantified data quality and query
    cost performance in pervasive sensing systems
  • Developed policies that approximate sensor field
    values using cached values for nearby locations
  • Prove analytic upper bound on sensor field query
    rate
  • Show cost and quality win-win for pervasive
    sensing applications for which response time is
    most important
  • Show cost vs. quality tradeoff for sensing
    applications for which accuracy is most important
  • Results are robust with respect to the manner in
    which the query workload changes

14
Lab Trace Data
Trace data from multi-sensor motes deployed at
Intel Berkeley lab (Deshpande 2004)
15
Lab Environment and Workload
  • 2.3 million readings taken over 35 days
  • Use readings with largest changes in value in our
    simulator (light measured in Lux)
  • Changes occur slowly relative to correlated
    changes (about 1 location every 1.4 seconds)
  • But, range of values is large
  • Applications determine values for A and T

16
Bounded Resource Consumption
  • N is the set of locations in the sensor field
  • The cache entry for each location is used by multiple
    queries for periods of T seconds (requires
    blocking behind pending queries)
  • Sensor field query rate can be bounded by
    |N| / T queries per second
  • Proof: induction on the size of N
  • Sensor field transmissions dominate resource
    consumption
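A small Python sketch can illustrate why reusing each cache entry for T seconds caps the load on the sensor field. It assumes the bound takes the form |N| / T, which follows if each location is fetched at most once every T seconds; over a finite window of D seconds that gives at most |N| * ceil(D / T) fetches. Blocking behind a pending query is modeled as an instantaneous fetch, and all names are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, Tuple


def sensor_field_queries(queries: Iterable[Tuple[float, int]],
                         max_age: float) -> int:
    """Count queries forwarded to the sensor field.

    `queries` is a time-ordered sequence of (arrival time, location).
    A location is fetched again only once its cache entry is at least
    `max_age` (T) seconds old.
    """
    last_fetch: Dict[int, float] = {}
    forwarded = 0
    for now, location in queries:
        fetched_at = last_fetch.get(location)
        if fetched_at is None or now - fetched_at >= max_age:
            last_fetch[location] = now
            forwarded += 1
    return forwarded


def window_bound(num_locations: int, max_age: float, duration: float) -> int:
    """Upper bound on fetches in a window of `duration` seconds."""
    return num_locations * math.ceil(duration / max_age)


# Illustrative run: 50 locations, T = 9 s, 90 queries/s for 100 s.
rng = random.Random(0)
arrival_times = sorted(rng.uniform(0.0, 100.0) for _ in range(9000))
workload = [(t, rng.randrange(50)) for t in arrival_times]
forwarded = sensor_field_queries(workload, max_age=9.0)
print(forwarded, "forwarded; window bound", window_bound(50, 9.0, 100.0))
```

However high the incoming query rate, the forwarded count stays under the window bound, which is the point of the slide.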

17
Data Quality Driven by Response Time
Picking a large value of A means delay is more
important than value deviation. Consider
normalized quality when A = 0.9.
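The transcript does not preserve the quality formula, so the Python sketch below rests on an assumption: quality is one minus a convex combination of normalized delay and normalized value deviation, with weight A on delay and 1 - A on value deviation. The function name and the example numbers are hypothetical.

```python
def normalized_quality(norm_delay: float, norm_deviation: float, a: float) -> float:
    """Assumed quality score in [0, 1]; higher is better.

    norm_delay and norm_deviation are already normalized to [0, 1],
    where larger means worse. `a` is the weight A on delay.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("A must lie in [0, 1]")
    penalty = a * norm_delay + (1.0 - a) * norm_deviation
    return 1.0 - penalty


# A fast but approximate reply versus a slow but exact one (made-up values).
fast_approximate = (0.1, 0.6)  # (normalized delay, normalized value deviation)
slow_exact = (0.8, 0.0)

for a in (0.9, 0.1):
    print(a,
          normalized_quality(*fast_approximate, a),
          normalized_quality(*slow_exact, a))
```

Under this assumption, a large A favors the fast approximate reply and a small A favors the slow exact one, which matches the two regimes the talk examines.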
18
Cost and Quality Performance when Response Time
drives Quality
Trace-driven changes: A = 0.9, T = 90 sec, query
rate = 0.9 lps, change rate = 1.4 lps
Approximate greedy lookups outperform other
policies. There is a win-win here!
19
Delay when Response Time drives Quality
Trace-driven Changes
20
Research Contributions
  • Defined and quantified data quality and query
    cost performance in pervasive sensing systems
  • Developed policies that approximate sensor field
    values using cached values for nearby locations
  • Proved analytic upper bound on sensor field query
    rate
  • Showed cost and quality win-win for pervasive
    sensing applications for which response time is
    most important
  • Show cost vs. quality tradeoff for sensing
    applications for which accuracy is most important
  • Results are robust with respect to the manner in
    which the query workload changes

21
Data Quality Driven by Accuracy
Choosing a small value of A means value deviation
is more important to data quality than delay. For
example, consider normalized quality when A = 0.1.
22
Cost vs. Quality when Accuracy drives Quality
Trace-driven changes: A = 0.1, T = 90 sec, query
rate = 0.9 lps, change rate = 1.4 lps
There is a tradeoff between cost and quality here
23
Value Deviation when Accuracy drives Quality
Trace-driven Changes
Significant differences in accuracy between
policies
24
Cost and Quality Trends when Response Time
drives Quality
Trace-driven changes: A = 0.9, T = 9 sec, query
rate = 90, 9, and 0.9 lps. Again, there is a
win-win here!
25
Cost vs. Quality Trends when Accuracy drives
Quality
Trace-driven changes: A = 0.1, T = 9 sec, query
rate = 90, 9, and 0.9 lps. Same relative
performance.
26
Talk Summary
  • Define and quantify data quality and query cost
    performance in pervasive sensing systems
  • Develop policies that approximate sensor field
    values using cached values for nearby locations
  • Prove analytic upper bound on sensor field query
    rate
  • Show cost and quality win-win for pervasive
    sensing applications for which response time is
    most important
  • Show cost vs. quality tradeoff for sensing
    applications for which accuracy is most important
  • Results are robust with respect to the manner in
    which the query workload changes

27
Thank You!
  • Further questions ???

David J. Yates
Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts, USA
dyates_at_bentley.edu
28
Emergency Response Applications
  • Fire erupts in a warehouse in an industrial
    section of town. A sensing system installed in
    the building feeds detailed data to fire crews
    arriving on the scene, describing the location,
    characteristic and etiology of the fire, and
    predicting its future path. The result:
    firefighters are able to work quickly and safely
    to bring the blaze under control. (Intel 2005)

29
Technology Market Trends
  • Three of the 7 companies named by Gartner as
    Cool Vendors in Emerging Trends and
    Technologies in 2005 produced hardware and/or
    software for sensor networks (Reynolds et al.
    2005)
  • IDC has identified supply chain management as the
    largest sensor network market in the short-term
    and predicts that the domestic market for RFID
    sensors will exceed $1 billion in 2007 (C. Boone
    2003)

30
Data Quality and Query Cost Research Issues
  • What form do data quality and query cost
    performance take?
  • Can we bound resource consumption?
  • Which policies provide best cost and quality when
    value deviation is more important than delay?
  • Which policies provide best performance when
    delay is more important than value deviation?
  • How does the manner in which the environment
    changes impact performance?

31
Softmax Normalization
  • Requires that we know only the mean and standard
    deviation for our system delays and value
    deviations
  • Normalization makes transformed values lie in the
    range [0, 1]
  • Used in neural networks and data mining for
    pattern recognition and data classification
    (Bridle 1990, Bishop 1995, Han and Kamber 2000)
  • Reaches softly towards maximum and minimum
    values, never quite getting there (Rodríguez
    2004)
  • Transformation is more or less linear in the
    middle range, and has a nonlinearity at both ends
    (Rodríguez 2004)
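The slide does not show the formula. One commonly cited form of softmax normalization is the logistic function applied to a value's distance from the mean in units of standard deviation, and the Python sketch below assumes that form; the variant used in the talk may include an extra scaling parameter. The function name and sample delays are hypothetical.

```python
import math
from statistics import mean, pstdev


def softmax_normalize(x: float, mu: float, sigma: float) -> float:
    """Map x toward the range 0..1 using only the mean and standard deviation.

    Computes 1 / (1 + exp(-z)) with z = (x - mu) / sigma, written with
    tanh so that extreme inputs do not overflow.
    """
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.tanh(z / 2.0))


# Made-up delays in seconds; only their mean and standard deviation are used.
delays = [0.2, 0.4, 0.5, 0.6, 0.9, 4.0]
mu, sigma = mean(delays), pstdev(delays)
for d in delays:
    print(d, round(softmax_normalize(d, mu, sigma), 3))
```

Values near the mean are transformed almost linearly, while outliers such as the 4.0 second delay are squashed toward the upper end of the range instead of dominating the scale.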

32
Query Workload Model
  • Query workload consists of polling component and
    random component
  • Parameterize to yield many workloads proposed by
    others, e.g., Madd03 (Berkeley), Lu02 (Virginia),
    Deme03 (Cornell), Jami03 (MIT), Inta03 and
    Zhao03 (USC), Desh03 (CMU), Olst03 (Stanford)
  • These components are specified using two
    parameters:
  • the period of the polling component
  • the average query arrival rate for a process that
    represents the random component
  • Example: 9 queries with fixed interarrival times +
    81 queries with exponentially distributed
    interarrival times = 90 queries / second
  • All locations are equally likely to be queried
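The two-component workload can be sketched in Python as follows. The sketch assumes the polling component issues one query per fixed period and the random component is a Poisson process (exponentially distributed interarrival times), with a uniformly random location for every query. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Query = Tuple[float, int, str]  # (arrival time in seconds, location, component)


def generate_workload(duration: float, poll_period: float, random_rate: float,
                      locations: Sequence[int], seed: int = 0) -> List[Query]:
    """Merge a fixed-interarrival stream with an exponential-interarrival stream."""
    rng = random.Random(seed)
    arrivals: List[Tuple[float, str]] = []

    # Polling component: one query every poll_period seconds.
    k = 0
    while k * poll_period < duration:
        arrivals.append((k * poll_period, "poll"))
        k += 1

    # Random component: interarrival times drawn from an exponential
    # distribution with mean 1 / random_rate.
    t = rng.expovariate(random_rate)
    while t < duration:
        arrivals.append((t, "random"))
        t += rng.expovariate(random_rate)

    arrivals.sort()
    # All locations are equally likely to be queried.
    return [(t, rng.choice(locations), kind) for t, kind in arrivals]


# Roughly the slide's example: about 9 polled plus 81 random queries per second.
workload = generate_workload(duration=10.0, poll_period=1.0 / 9.0,
                             random_rate=81.0, locations=range(50))
print(len(workload) / 10.0, "queries per second on average")
```

Setting the random rate very low leaves an almost pure polling workload, and a very long polling period leaves an almost pure random one, which is how the two parameters span the workloads listed above.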

33
Models for Changes to Environment
  1. Changes at each location are independent
  2. Changes at each location are correlated in space and
     time: models developed at USC (Jindal 2004)
  3. Changes taken from real-world sensor readings at
     Intel Berkeley lab (Deshpande 2004)
  • Our focus: Models 2 and 3

34
Delay when Accuracy drives Quality
Correlated Changes
Trace-driven Changes
Large all-misses delay has an important impact on
quality, but is discounted by the choice of A = 0.1
35
Results from Two Models
  • For correlated and trace-driven sensor network
    models
  • When delay is more important than value
    deviation, policies that approximate values using
    cached values for nearby locations provide best
    cost and best quality performance
  • When value deviation is more important than
    delay, there is a cost vs. quality tradeoff
  • Policies that always query (and cache) the
    specified location provide the best quality
    performance
  • Policies that approximate values using cached
    values for nearby locations provide best cost
    performance
  • What happens if we vary the query rate relative
    to the rate at which the environment changes?

36
Summary and Conclusions
  • Data Quality and Query Cost in Pervasive Sensing
    Systems
  • Define and quantify data quality and query cost
    performance in Pervasive Sensing Systems
  • Blocking behind pending queries bounds sensor
    field query rate
  • When delay is more important than value
    deviation, policies that approximate values using
    cached values for nearby locations provide best
    cost and quality performance
  • When value deviation is more important than
    delay, there is a cost vs. quality tradeoff
  • Policies that always query (and cache) the
    specified location provide the best quality
    performance
  • Policies that approximate values using cached
    values for nearby locations provide best cost
    performance
  • Results are robust with respect to the manner in
    which the environment changes

37
References I
  • (Christopher Bishop 1995) Neural Networks for
    Pattern Recognition. Oxford University Press,
    Oxford.
  • (C. Boone 2003) U.S. RFID for the Retail Supply
    Chain Spending Forecast and Analysis, 2003-2008,
    IDC, December 2003.
  • (John Bridle 1990) Probabilistic interpretation
    of feed-forward classification network outputs,
    with relationships to statistical pattern
    recognition, In Neurocomputing: Algorithms,
    Architectures and Applications, Volume 6,
    Springer-Verlag, Berlin.
  • (A. Deshpande, C. Guestrin, S. Madden, J.M.
    Hellerstein, W. Hong 2004) Model-Driven Data
    Acquisition in Sensor Networks, In International
    Conference on Very Large Data Bases (VLDB),
    Toronto, August 2004.

38
References II
  • (J. Han and M. Kamber 2000) Data Mining: Concepts
    and Techniques. Morgan Kaufmann Publishers, San
    Francisco, California.
  • (Intel 2005) Intel Corp., Expanding Usage Models
    for Pervasive Sensing Systems, Technology@Intel
    Magazine, August 2005.
  • (A. Jindal and K. Psounis 2004) Modeling
    spatially-correlated sensor network data, In
    IEEE International Conference on Sensor and Ad
    hoc Communications and Networks (SECON), Santa
    Clara, California, October 2004.
  • (Reynolds et al. 2005) Martin Reynolds, Alan Mac
    Neela, Carol Rozwell, and Anne-Marie Roussel,
    Cool Vendors in Emerging Trends and
    Technologies, Gartner Research Report, March
    2005.

39
References III
  • (Caroline Rodríguez 2004) A computational
    environment for data preprocessing in supervised
    classification, M.Sc. Thesis, Department of
    Mathematics, University of Puerto Rico, Mayagüez,
    July 2004.
  • (J. Sikander 2004), Microsoft RFID Technology
    Overview, Microsoft Corp., November 2004.