Cost Conscious Cleaning of Massive RFID Data Sets

1
Cost Conscious Cleaning of Massive RFID Data Sets

Hector Gonzalez, Jiawei Han, Xuehua Shen
University of Illinois at Urbana Champaign
Department of Computer Science
The Database and Information System Laboratory
ICDE 2007

2
Outline

Motivation
RFID Data
Error Sources
Cleaning Methods
  Smoothing methods
  Rule based methods
  DBN based methods
Cost conscious cleaning
  Architecture
  Cleaning Methods
  Cleaning Sequence
  Cleaning Plan Induction
Performance Study

3
Motivation

The reliability of current RFID systems is far
from optimal.
Under a wide variety of environments more than
50% of all tag readings are missed.
The volume of data generated by RFID systems is
huge.
A large retailer with hundreds of readers can
generate thousands of tag readings every second.
An accurate and efficient cleaning process is
essential to the successful implementation of
RFID technology.

4
Example

A large retailer with RFID readers at warehouses,
distribution centers, and store backrooms.
A variety of factors impact correct tag
detections:
Diverse reader/tag manufacturers and generations
Moving (conveyor belts, doors) and static tags
(shelves).
Different levels of RF noise caused by metal or
water in the environment or in products.
No single method can efficiently clean such a
large volume of data, generated under such
diverse circumstances.

5
RFID Data

Readers conduct interrogation cycles at periodic
intervals:
An RF signal is issued
Tags awake and transmit via RF their EPC
(electronic product code)
A singulation protocol is used to prevent tag
collisions
In order to improve accuracy, during a read cycle
multiple interrogation cycles are issued.
Readings of the form (EPC, Reader, Time) are
generated at the end of each read cycle. Readings
have extra information such as:
Total responses obtained during the read cycle
Antenna used for detection, tag type, and signal
strength

6
Error Sources

There are two types of errors:
False Negatives: A tag that is present is not
detected.
False Positives: A tag not present is detected.
Why do we observe errors?
Collisions: Multiple tags transmit
simultaneously, or multiple readers transmit
simultaneously.
Environment Interference: Metal or water near the
tag or reader causes RF interference.
Physical Configuration: The tag moves too
quickly, or is located in a blind spot.
Logical Errors: A door reader detects tags that
go nearby but not through.

7
Cleaning Methods

A cleaning method M is a classifier that assigns
a location to tag cases.
A tag case is a tuple of the form
(<EPC, time>, <f1, f2, ..., fk>)
where each f_i is a feature (e.g. tag type,
signal strength).
The label assigned to each tag case is of the
form
(<EPC, time>, location, confidence)
where confidence is in the range of 0.0 to 1.0 and
indicates the level of certainty about the
location.
Terminal classifiers do not provide confidence
values.
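
The two tuple shapes above can be written down as small record types. This is only an illustrative sketch: the class names `TagCase` and `Label`, the field types, and the sample EPC string are assumptions made here, not names from the presentation. A terminal classifier is modeled by leaving the confidence unset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TagCase:
    """One tag case: the (EPC, time) key plus its feature values f1..fk."""
    epc: str
    time: int
    features: Tuple[object, ...]


@dataclass(frozen=True)
class Label:
    """Output of a cleaning method for one tag case."""
    epc: str
    time: int
    location: str
    confidence: Optional[float] = None  # None models a terminal classifier

    def __post_init__(self):
        # Confidence, when given, must lie in the range 0.0 to 1.0.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0.0, 1.0]")


# Hypothetical example values, for illustration only.
case = TagCase(epc="urn:epc:example:1", time=17, features=("passive", -48.5))
label = Label(case.epc, case.time, location="dock_door", confidence=0.85)
```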

8
Fixed size smoothing window

Fixed window smoothing
The window is made up of the last k (fixed) read
cycles.
If there is any reading inside the window, mark
the tag as present.
Problems
Difficult to define the best window size for
different conditions
Benefit
Cheap method to apply: the only requirement is to
remember the last k readings

[Figure: timelines labeled Truth, Readings, and Smooth]
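
A minimal sketch of fixed window smoothing for a single tag, assuming the readings arrive as one 0/1 flag per read cycle. The function name `fixed_window_smooth` and the input encoding are illustrative choices, not part of the presentation.

```python
from collections import deque


def fixed_window_smooth(readings, k):
    """Mark the tag present in a cycle if it was read in any of the last k cycles.

    readings: iterable of 0/1 flags, one per read cycle (1 = tag was read).
    k: window size in read cycles (must be >= 1).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    window = deque(maxlen=k)  # only the last k readings are remembered
    smoothed = []
    for r in readings:
        window.append(r)
        smoothed.append(1 if any(window) else 0)
    return smoothed


raw = [1, 0, 0, 1, 0, 0, 0, 0, 1]
smoothed = fixed_window_smooth(raw, 3)
```

With k = 3 the short gaps in `raw` are filled, while the longer gap of four missed cycles still produces two cycles marked absent. This is the trade-off that makes a single fixed k hard to choose.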

9
Adaptive window size smoothing

Adaptive window size smoothing
Change the size of the window according to the
observed probability of tag detections
Use a binomial model
Let p be the probability of detecting a present
tag
Choose window size w such that
(1 - p)^w < threshold
Benefits: We adapt the window size to the current
conditions
Problems:
It can be expensive to store and maintain p for
every single tag in the system
All readings inside the window have the same
weight; it is better to put more weight on recent
readings
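
Under the binomial model, (1 - p)^w is the probability that a present tag is missed in all w cycles of the window, assuming independent cycles. The sketch below picks the smallest such w; the function name `window_size` and the sample values of p and threshold are assumptions for illustration.

```python
import math


def window_size(p, threshold):
    """Smallest integer w >= 1 such that (1 - p) ** w < threshold."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    if p == 1.0:
        return 1  # a present tag is always detected
    # Closed-form starting point: w > log(threshold) / log(1 - p).
    w = max(1, math.ceil(math.log(threshold) / math.log(1.0 - p)))
    # Adjust for the strict inequality and for floating-point rounding.
    while w > 1 and (1.0 - p) ** (w - 1) < threshold:
        w -= 1
    while (1.0 - p) ** w >= threshold:
        w += 1
    return w


w_noisy = window_size(0.5, 0.05)   # low detection rate needs a wider window
w_clean = window_size(0.9, 0.05)   # high detection rate needs a narrower one
```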

10
Rule Based Cleaning

Rules can be derived from the data or given by a
domain expert.
A door reader should only recognize tags with a
bell shaped signal strength.
The shelf reader should only recognize items that
stay there for more than 5 minutes
Rules can be derived from an RFID warehouse
Flowgraphs can be used to complete missing
readings and to decide location conflicts
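
As a hedged illustration of the shelf rule above, the sketch below keeps only the tags whose readings at a shelf reader span more than 5 minutes. Treating the time between a tag's first and last reading as its stay is a simplifying assumption, and the function name `shelf_rule` and the input format are hypothetical.

```python
def shelf_rule(readings, min_dwell_seconds=300):
    """Return the EPCs whose first-to-last reading span exceeds min_dwell_seconds.

    readings: iterable of (epc, timestamp_in_seconds) pairs from one shelf reader.
    """
    first, last = {}, {}
    for epc, ts in readings:
        first[epc] = min(ts, first.get(epc, ts))
        last[epc] = max(ts, last.get(epc, ts))
    return {epc for epc in first if last[epc] - first[epc] > min_dwell_seconds}


shelf_readings = [("A", 0), ("B", 10), ("B", 100), ("A", 400)]
accepted = shelf_rule(shelf_readings)
```

Here tag "A" is seen over 400 seconds and is accepted, while tag "B" is seen for only 90 seconds and is treated as passing by.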

11
DBN Based Cleaning

We can model tag detections using a dynamic but
hidden process
Tag readings correspond to noisy observations on
the true, but hidden, location of the tag
We dynamically update our belief on tag location
based on the sequence of observations
More recent observations weigh more heavily in our
belief
DBNs allow us to differentiate between the
following two cases:

[Figure: two detection sequences, captioned "No question, detect tag!" and "Should we detect this tag?"]
12
DBN Structure

We learn the transition and observation models
from data
DBNs can represent complex structures, e.g.,
variables for RF noise and tag speed interacting
with detection

[Figure: two-slice DBN. Hidden nodes: Present t-1, Present t (Transition Model). Observed nodes: Detect t-1, Detect t (Observation Model).]
13
Belief state updating
[Equation: Belief Update Equation, with its terms annotated as follows]
Observation Model
Belief at time t+1 given all evidence up to t+1
Update to belief state given transition model
Belief state is updated dynamically, we do not
need to remember a window of observations, we
only need to keep the latest belief state.
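
The annotations above match the standard forward filtering recursion for a hidden Markov process: predict with the transition model, then reweight by the observation model and normalize. The sketch below applies that textbook recursion to a two-state "present / absent" variable. It is not taken from the slides, and all four probabilities are made-up illustrative values, whereas the presentation learns its models from data.

```python
def update_belief(belief, detected,
                  stay=0.9,       # assumed P(present at t+1 | present at t)
                  arrive=0.1,     # assumed P(present at t+1 | absent at t)
                  p_detect=0.6,   # assumed P(detected | present)
                  p_false=0.05):  # assumed P(detected | absent)
    """Return P(present at t+1 | evidence up to t+1) from the previous belief."""
    # Prediction step: push the old belief through the transition model.
    prior = stay * belief + arrive * (1.0 - belief)
    # Correction step: weight by the observation model and normalize.
    like_present = p_detect if detected else 1.0 - p_detect
    like_absent = p_false if detected else 1.0 - p_false
    numerator = like_present * prior
    return numerator / (numerator + like_absent * (1.0 - prior))


belief = 0.5
for detected in [True, False, False, True]:
    belief = update_belief(belief, detected)  # only the latest belief is kept
```

Only the scalar `belief` is carried from cycle to cycle, which is the storage advantage over window based smoothing noted on the slide.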
14
Cost Conscious Cleaning

Given a collection of cleaning methods, a set of
labeled tag cases, and a cost model for each
cleaning method, design an efficient cleaning
plan that defines the conditions under which each
cleaning method or sequence of cleaning methods
should be applied

15
System Architecture
[Figure: system architecture with components RFID Stream, Labeled Data, Cleaning Plan Induction, Apply Plan, Cleaning Methods, and Clean Data, divided into Online and Offline parts]
16
Cost Model

In order to apply a cleaning method to a tag case
we need to incur a cost.
Classification cost: cost in terms of CPU and
storage of labeling each tag case.
Amortized per tuple training cost: cost to train
the cleaning method, e.g., in a DBN we need to
learn the transition and observation models.
Error cost:
When we make an error in deciding the location of
a tag, a cost is incurred.
Error cost can be a scalar or a function of the
distance of the correct location to the predicted
one, or even the price of the item.

17
Cleaning Sequence

Ordered application of cleaning methods from a set
M to a set of tag cases D
S_{D,M} = M_s1 → M_s2 → ... → M_sk
Apply M_s1 to the entire data set D
Apply M_s2 to the cases that M_s1 failed to
classify
Apply M_sk to the cases that every other method
failed to classify
The cost of applying a cleaning sequence, C(S_{D,M}),
is the cost of applying each method as described
plus the error cost on tag cases misclassified by
every method
Optimal cleaning sequence S_{D,M} is the cleaning
sequence with minimal cost among all possible
cleaning sequences given D and M
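
The cascade described above can be sketched on labeled data as follows. One assumption is made explicit here: a method "fails to classify" a labeled case when its prediction disagrees with the known label. The function name `sequence_cost`, the toy methods, and their costs are all hypothetical.

```python
def sequence_cost(cases, sequence, error_cost):
    """Average per-case cost of applying a cleaning sequence to labeled cases.

    cases: list of (features, true_label) pairs.
    sequence: list of (predict, cost) pairs, applied in order.
    error_cost: cost charged for each case that no method labels correctly.
    """
    if not cases:
        return 0.0
    remaining = list(cases)
    total = 0.0
    for predict, cost in sequence:
        total += cost * len(remaining)  # pay for every case the method sees
        # Cases the method gets wrong are passed on to the next method.
        remaining = [(x, y) for x, y in remaining if predict(x) != y]
    total += error_cost * len(remaining)  # cases misclassified by every method
    return total / len(cases)


# Toy data: the feature is a single number and every tag is truly present (1).
toy_cases = [(0, 1), (1, 1), (2, 1), (3, 1)]
cheap = (lambda x: 1 if x < 2 else 0, 1.0)   # correct on 2 of 4 cases
costly = (lambda x: 1 if x < 3 else 0, 2.0)  # correct on 3 of 4 cases
avg_cost = sequence_cost(toy_cases, [cheap, costly], error_cost=5.0)
```

In the toy run the cheap method sees all 4 cases, the costly one sees the 2 that are left, and 1 case is charged the error cost, giving (4 × 1.0 + 2 × 2.0 + 1 × 5.0) / 4 per case.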

18
Cleaning Sequence Approximation
[Figure: Step 1, Step 2, and Step 3 of building the sequence from M1, M2, and M3; label: Accuracy Adjusted Cleaning Cost]
C(M1) = 1, C(M2) = 1.5, C(M3) = 0.5, Error = 5
S_{D,M} = M1 → M3 → M2
C(S_{D,M}) = 1 + 0.52 × 0.5 + 0.43 × 1.5 + 0.19 × 5
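
The cost expression above can be checked with a few lines of arithmetic. The reading assumed here is that 0.52, 0.43, and 0.19 are the fractions of tag cases still unresolved after M1, M3, and M2 respectively; the function name `expected_sequence_cost` is illustrative.

```python
def expected_sequence_cost(method_costs, unresolved_fractions, error_cost):
    """Expected per-case cost of a cleaning sequence.

    method_costs: cost of each method, in application order.
    unresolved_fractions: fraction of all cases still unresolved after each
        method, in the same order (assumed interpretation of the slide).
    """
    cost = 0.0
    reach = 1.0  # the first method is applied to every case
    for method_cost, unresolved in zip(method_costs, unresolved_fractions):
        cost += reach * method_cost
        reach = unresolved
    return cost + reach * error_cost  # leftover cases pay the error cost


# M1 -> M3 -> M2 with C(M1) = 1, C(M3) = 0.5, C(M2) = 1.5, Error = 5.
total = expected_sequence_cost([1.0, 0.5, 1.5], [0.52, 0.43, 0.19], 5.0)
```

With these inputs the four terms are 1, 0.26, 0.645, and 0.95, which sum to 2.855.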
19
Cleaning Plan

Input
D: Set of labeled tag cases <(EPC, time), (features)>
M: Set of available cleaning methods
M1, M2, ..., Mk
C: Cost model C(M1), C(M2), ..., C(Mk), C(Error)
Output
A decision tree that splits D according to
feature values.
For each leaf in the tree a cleaning sequence is
defined.
Application
For each test case,
use feature values to get to appropriate leaf.
apply cleaning sequence defined in the leaf.
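
A minimal sketch of the application step, assuming the plan is stored as nested dictionaries whose leaves are lists of method names. The tree layout, the feature names, and the stub methods are illustrative assumptions; each stub returns a label, or None when it cannot classify the case.

```python
def find_sequence(node, features):
    """Follow feature values down the tree until a leaf (a list) is reached."""
    while isinstance(node, dict):
        value = features[node["feature"]]
        node = node["children"][value]
    return node


def apply_plan(plan, methods, case):
    """Apply the leaf's cleaning sequence; return the first label produced."""
    for name in find_sequence(plan, case["features"]):
        label = methods[name](case)
        if label is not None:
            return label
    return None  # every method in the sequence failed to classify


# Hypothetical plan: split on reader type, then on presence of metal.
plan = {
    "feature": "reader",
    "children": {
        "door": ["pat"],
        "shelf": {
            "feature": "metal",
            "children": {"yes": ["fix_1"], "no": ["dbn", "fix_1"]},
        },
    },
}

# Stub cleaning methods standing in for real ones.
methods = {
    "pat": lambda case: 1,
    "dbn": lambda case: None,  # pretend the DBN is not confident enough
    "fix_1": lambda case: case["last_reading"],
}

test_case = {"features": {"reader": "shelf", "metal": "no"}, "last_reading": 0}
result = apply_plan(plan, methods, test_case)
```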

20
Available Features

Tag features
Communications protocol, Manufacturer, price,
quality
Detection history
Reader features
Number of antennas
Protocol, price, vendor
Location features
type of area being covered (e.g. door, shelf,
conveyor belt)
Interference level (e.g. presence of metal or
water)
Item features
Type of item, contents, price

21
Cleaning Plan Induction

Use a traditional Top Down Induction of Decision
Trees (TDIDT) algorithm
Node splitting criteria
Split nodes based on expected cost reduction:
cost reduction = (cleaning sequence cost before the
split) - (average cost of the cleaning sequences
after the split, weighted by the fraction of cases
in each branch)
22
Example
[Table: labeled tag cases]
Cleaning Plan
reader = door: Method pat, Accuracy 100%
reader = shelf:
  metal = yes: Method fix_1, Accuracy 75%
  metal = no: Method dbn, Accuracy 100%

We use 3 cleaning methods: fix_1, DBN, and pat
The label is 1 if the tag is indeed present, and
0 otherwise
Each method predicts 1 for present, 0 for absent
The cleaning plan selects when to apply each
method, e.g. we should use fix_1 to clean cases
from shelf readers when there is metal

23
Example of node splitting
C(fix_1) = 1.0, C(DBN) = 2.0, C(PAT) = 2.0,
C(Error) = 5.0
Cleaning sequence for D: DBN → pat
Cost: 2.0 + 0.33 × 2.0 + 0.11 × 5.0 = 3.21
After the split:
DBN → fix_1: 2.0 + 0.16 × 1.0 = 2.16
pat: 2.0
Cost Reduction = 3.21 - (2.16 × 0.67 + 2.0 × 0.33)
= 1.11
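
The split criterion can be sketched as the cost before the split minus the case-weighted average of the branch costs. The function name `cost_reduction` is illustrative. Evaluated on the rounded figures from the slide it gives about 1.10; if the rounded figures are assumed to stand for 1/3, 1/6, and 1/9 (an assumption, not stated in the slides), the exact result is 10/9, which rounds to the 1.11 reported.

```python
from fractions import Fraction


def cost_reduction(cost_before, branches):
    """Expected cost reduction of a split.

    cost_before: cost of the best cleaning sequence before the split.
    branches: list of (fraction_of_cases, sequence_cost) pairs, one per branch.
    """
    cost_after = sum(fraction * cost for fraction, cost in branches)
    return cost_before - cost_after


# Using the rounded numbers shown on the slide.
rounded = cost_reduction(3.21, [(0.67, 2.16), (0.33, 2.0)])

# Using exact fractions (assumed to be what the rounded numbers stand for).
before = 2 + Fraction(1, 3) * 2 + Fraction(1, 9) * 5  # DBN -> pat on all of D
shelf = 2 + Fraction(1, 6) * 1                        # DBN -> fix_1 branch
exact = cost_reduction(before, [(Fraction(2, 3), shelf), (Fraction(1, 3), Fraction(2))])
```

A split is worth making only when this value is positive, and the induction picks the feature with the largest reduction.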
24
Experimental Setup

Data generator simulates a complex RFID system
with multiple locations, readers, and tag types.
The simulation is controlled by several
parameters
Item flow characteristics, paths traversed,
predictable vs random movements
Item speed
Reader, tag, and item characteristics (e.g.
protocol, manufacturer, item contents).
Location characteristics and RF noise levels
The simulation is run for a number of read
cycles; at each cycle the probability of tag
detection is a function of reader, item, tag, and
location characteristics.

25
Cleaning methods

We compare the performance of several cleaning
methods:
DBN: a DBN based cleaner
var: adaptive window smoothing
fix_k: fixed (size k) window smoothing
Rule based methods:
pat: a pattern recognition method based on signal
strength shape
maj: used to resolve multi reader detection
conflicts by majority voting
Cost Model
C(DBN) = 2.0, C(var) = 2.0, C(fix_1) = 1.0,
C(fix_3) = 1.4, C(pat) = 2.5, C(maj) = 1.4, C(Error) = 10
All methods are terminal except for DBN, which
uses P(tag present) as confidence
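
As a rough illustration of the maj method, the sketch below resolves a conflict among readers by picking the location reported most often for one tag in one time window. The function name `majority_location`, the input format, and the choice to leave ties unresolved are assumptions made here; the slides do not say how ties are handled.

```python
from collections import Counter


def majority_location(detections):
    """Return the most frequently reported location, or None if unresolved.

    detections: list of location names, one per reader detection of the tag.
    """
    if not detections:
        return None
    (top, top_count), *rest = Counter(detections).most_common(2)
    if rest and rest[0][1] == top_count:
        return None  # tie between two locations: leave the conflict unresolved
    return top


winner = majority_location(["dock_door", "dock_door", "shelf_3"])
```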

26
Complex Setup I

readers in low noise
readers detecting far away tags
readers with variable detection rates
reader at a conveyor belt
readers detecting tags with water and metal
In some areas there can be conflict of multiple
patterns detecting the same tag

27
Complex Setup II
28
Reader Setup
29
Noise Level
30
Conclusions

A cost conscious cleaning framework for RFID data
can increase accuracy at lower cost than any
single cleaning method.
The cleaning plan can be efficiently learnt from
data by applying the idea of cleaning cost
reduction to node splitting.
DBN based cleaning methods capture the intrinsic
dynamic behavior of tag detection, and deliver
high accuracy at lower costs than smoothing
window based techniques.

31
Thanks
