Cost Conscious Cleaning of Massive RFID Data Sets

1
Cost Conscious Cleaning of Massive RFID Data Sets
  • Hector Gonzalez, Jiawei Han, Xuehua Shen
  • University of Illinois at Urbana-Champaign
  • Department of Computer Science
  • The Database and Information System Laboratory
  • ICDE 2007

2
Outline
  • Motivation
  • RFID Data
  • Error Sources
  • Cleaning Methods
      • Smoothing methods
      • Rule based methods
      • DBN based methods
  • Cost conscious cleaning
      • Architecture
      • Cleaning Methods
      • Cleaning Sequence
      • Cleaning Plan Induction
  • Performance Study

3
Motivation
  • The reliability of current RFID systems is far
    from optimal.
      • Under a wide variety of environments, more than
        50% of all tag readings are missed.
  • The volume of data generated by RFID systems is
    huge.
      • A large retailer with hundreds of readers can
        generate thousands of tag readings every second.
  • An accurate and efficient cleaning process is
    essential to the successful implementation of
    RFID technology.

4
Example
  • A large retailer with RFID readers at warehouses,
    distribution centers, and store backrooms.
  • A variety of factors impact correct tag
    detections:
      • Diverse reader/tag manufacturers and generations
      • Moving tags (conveyor belts, doors) and static
        tags (shelves)
      • Different levels of RF noise caused by metal or
        water in the environment or in products
  • No single method can efficiently clean such a
    large volume of data, generated under such
    diverse circumstances.

5
RFID Data
  • Readers conduct interrogation cycles at periodic
    intervals:
      • An RF signal is issued
      • Tags awake and transmit their EPC (electronic
        product code) via RF
      • A singulation protocol is used to prevent tag
        collisions
  • In order to improve accuracy, multiple
    interrogation cycles are issued during a read cycle.
  • Readings of the form (EPC, Reader, Time) are
    generated at the end of each read cycle. Readings
    carry extra information such as:
      • Total responses obtained during the read cycle
      • Antenna used for detection, tag type, and signal
        strength

6
Error Sources
  • There are two types of errors:
      • False negatives: a tag that is present is not
        detected.
      • False positives: a tag that is not present is
        detected.
  • Why do we observe errors?
      • Collisions: multiple tags transmit
        simultaneously, or multiple readers transmit
        simultaneously.
      • Environment interference: metal or water near
        the tag or reader causes RF interference.
      • Physical configuration: the tag moves too
        quickly, or is located in a blind spot.
      • Logical errors: a door reader detects tags that
        go nearby but not through.

7
Cleaning Methods
  • A cleaning method M is a classifier that assigns
    a location to tag cases.
  • A tag case is a tuple of the form
    (<EPC, time>, <f_1, f_2, ..., f_k>),
    where each f_i is a feature (e.g. tag type,
    signal strength).
  • The label assigned to each tag case is of the
    form (<EPC, time>, location, confidence),
    where confidence is in the range 0.0 to 1.0 and
    indicates the level of certainty about the
    location.
  • Terminal classifiers do not provide a confidence
    value.

8
Fixed size smoothing window
  • Fixed window smoothing
      • The window is made up of the last k (fixed) read
        cycles.
      • If there is any reading inside the window, mark
        the tag as present.
  • Problems
      • It is difficult to define the best window size
        for different conditions.
  • Benefit
      • Cheap method to apply; the only requirement is
        to remember the last k readings.

[Figure: three aligned timelines labeled Truth, Readings, and Smooth.]
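
The fixed window rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name and the boolean-per-read-cycle input format are assumptions made for the example.

```python
from collections import deque

def fixed_window_smooth(readings, k):
    """Return a smoothed presence flag for each read cycle.

    readings: iterable of booleans, True when the tag was read in that cycle
              (assumed input format).
    k: fixed window size, in read cycles.
    """
    window = deque(maxlen=k)  # remembers only the last k readings
    smoothed = []
    for seen in readings:
        window.append(seen)
        # The tag is marked present if any reading falls inside the window.
        smoothed.append(any(window))
    return smoothed

raw = [True, False, False, True, False, False, False, False]
print(fixed_window_smooth(raw, k=3))
# [True, True, True, True, True, True, False, False]
```

With k = 3 the two isolated misses after each reading are filled in, but the tag is also reported present for two cycles after its last reading, which is the window size trade-off the slide mentions.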

9
Adaptive window size smoothing
  • Adaptive window size smoothing
      • Change the size of the window according to the
        observed probability of tag detections.
      • Use a binomial model: let p be the probability
        of detecting a present tag, and choose the
        window size w such that (1 - p)^w < threshold.
  • Benefits
      • We adapt the window size to the current
        conditions.
  • Problems
      • It can be expensive to store and maintain p for
        every single tag in the system.
      • All readings inside the window have the same
        weight; it would be better to put more weight on
        recent readings.
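
The window size condition can be solved for w directly. The sketch below assumes p has already been estimated for the tag (how p is estimated is not covered here) and returns the smallest integer w that satisfies the inequality; the function name is hypothetical.

```python
import math

def window_size(p, threshold):
    """Smallest integer w >= 1 such that (1 - p) ** w < threshold."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # Closed-form starting point: w > log(threshold) / log(1 - p).
    w = max(1, math.ceil(math.log(threshold) / math.log(1.0 - p)))
    # Correct for floating point rounding and for exact equality.
    while (1.0 - p) ** w >= threshold:
        w += 1
    while w > 1 and (1.0 - p) ** (w - 1) < threshold:
        w -= 1
    return w

print(window_size(0.5, 0.05))  # 5, since 0.5**5 = 0.03125 < 0.05 <= 0.5**4
print(window_size(0.9, 0.05))  # 2, since 0.1**2 = 0.01 < 0.05 <= 0.1
```

A tag that is detected reliably (p = 0.9) gets a short window, while a tag that is missed half of the time needs a longer one.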

10
Rule Based Cleaning
  • Rules can be derived from the data or given by a
    domain expert.
      • A door reader should only recognize tags with a
        bell-shaped signal strength.
      • The shelf reader should only recognize items
        that stay there for more than 5 minutes.
  • Rules can be derived from an RFID warehouse.
      • Flowgraphs can be used to complete missing
        readings and to decide location conflicts.

11
DBN Based Cleaning
  • We can model tag detections using a dynamic but
    hidden process
  • Tag readings correspond to noisy observations on
    the true, but hidden, location of the tag
  • We dynamically update our belief on tag location
    based on the sequence of observations
  • More recent observations weigh more heavily in our
    belief
  • DBNs allow us to differentiate between the
    following two cases:

[Figure: two reading sequences, captioned "No question, detect tag!" and "Should we detect this tag?"]

12
DBN Structure
  • We learn the transition and observation models
    from data
  • DBNs can represent complex structures, e.g.,
    variables for RF noise and tag speed interacting
    with detection

[Figure: two-slice DBN. The hidden nodes Present(t-1) and Present(t) are linked by the transition model; the observed nodes Detect(t-1) and Detect(t) are linked to the hidden nodes by the observation model.]

13
Belief state updating
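Belief Update Equation
P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
  • X is the hidden state (Present), e the observed
    evidence (Detect), and α a normalizing constant
  • Left-hand side: belief at time t+1 given all
    evidence up to t+1
  • P(e_{t+1} | X_{t+1}): observation model
  • The sum: update to the belief state given the
    transition model
  • The belief state is updated dynamically: we do not
    need to remember a window of observations, we
    only need to keep the latest belief state.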
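
For the simplest case of a single hidden variable with two states (tag present or absent), one belief update step can be sketched as follows. This is the standard filtering recursion (predict through the transition model, weight by the observation model, normalize), not necessarily the exact model used by the authors. The class name and all four probabilities are illustrative assumptions; in the deck these models are learned from data.

```python
from dataclasses import dataclass

@dataclass
class TagModel:
    # All four values are illustrative assumptions, not learned parameters.
    stay: float = 0.9        # P(present at t+1 | present at t)
    arrive: float = 0.1      # P(present at t+1 | absent at t)
    detect: float = 0.6      # P(detected | present)
    false_pos: float = 0.01  # P(detected | absent)

def update_belief(belief, detected, model):
    """One filtering step: return P(present at t+1 | evidence up to t+1).

    belief: P(present at t | evidence up to t), the only state we keep.
    detected: whether the tag was read in cycle t+1.
    """
    # Prediction through the transition model.
    predicted = model.stay * belief + model.arrive * (1.0 - belief)
    # Likelihood of the new observation under each hidden state.
    like_present = model.detect if detected else 1.0 - model.detect
    like_absent = model.false_pos if detected else 1.0 - model.false_pos
    # Weight by the observation model and normalize.
    numerator = like_present * predicted
    return numerator / (numerator + like_absent * (1.0 - predicted))

model = TagModel()
belief = 0.5
for detected in [True, False, False, True]:
    belief = update_belief(belief, detected, model)
    print(f"detected={detected!s:5}  P(present)={belief:.3f}")
```

Only the scalar `belief` is carried from one read cycle to the next, which is the storage advantage over window-based smoothing noted above.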
14
Cost Conscious Cleaning
  • Given a collection of cleaning methods, a set of
    labeled tag cases, and a cost model for each
    cleaning method, design an efficient cleaning
    plan that defines the conditions under which each
    cleaning method or sequence of cleaning methods
    should be applied

15
System Architecture
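[Figure: system architecture. Offline: Labeled Data and Cleaning Methods feed Cleaning Plan Induction. Online: the RFID Stream goes through Apply Plan, which uses the Cleaning Methods and produces Clean Data.]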
16
Cost Model
  • In order to apply a cleaning method to a tag case
    we need to incur a cost:
      • Classification cost: the cost, in terms of CPU
        and storage, of labeling each tag case.
      • Amortized per-tuple training cost: the cost to
        train the cleaning method, e.g., in a DBN we
        need to learn the transition and observation
        models.
  • Error cost
      • When we make an error in deciding the location
        of a tag, a cost is incurred.
      • Error cost can be a scalar or a function of the
        distance from the correct location to the
        predicted one, or even the price of the item.

17
Cleaning Sequence
  • Ordered application of cleaning methods from a set
    M to a set of tag cases D:
    S_{D,M} = M_{s1} → M_{s2} → ... → M_{sk}
      • Apply M_{s1} to the entire data set D
      • Apply M_{s2} to the cases that M_{s1} failed to
        classify
      • ...
      • Apply M_{sk} to the cases that every other
        method failed to classify
  • The cost of applying a cleaning sequence,
    C(S_{D,M}), is the cost of applying each method as
    described plus the error cost on tag cases
    misclassified by every method.
  • Optimal cleaning sequence S_{D,M}: the cleaning
    sequence with minimal cost among all possible
    cleaning sequences given D and M.

18
Cleaning Sequence Approximation
[Figure: Steps 1 to 3 select among the methods M1, M2, and M3 using the accuracy adjusted cleaning cost.]
  • C(M1) = 1, C(M2) = 1.5, C(M3) = 0.5, C(Error) = 5
  • S_{D,M} = M1 → M3 → M2
  • C(S_{D,M}) = 1 + 0.52 × 0.5 + 0.43 × 1.5 + 0.19 × 5
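
The cost of this sequence can be worked out with a short sketch. It reads the numbers on the slide as follows, which is an interpretation rather than something the slide states outright: M1 is applied to every case, M3 to the 52% of cases M1 failed to classify, M2 to the 43% still unclassified after M3, and 19% of cases are misclassified by every method. The function name is hypothetical.

```python
def sequence_cost(steps, error_fraction, error_cost):
    """Average per-case cost of a cleaning sequence.

    steps: list of (fraction_of_cases_reaching_method, method_cost),
           in the order the methods are applied.
    error_fraction: fraction of cases misclassified by every method.
    error_cost: C(Error).
    """
    method_part = sum(fraction * cost for fraction, cost in steps)
    return method_part + error_fraction * error_cost

# M1 -> M3 -> M2 with C(M1) = 1, C(M3) = 0.5, C(M2) = 1.5, C(Error) = 5
cost = sequence_cost([(1.0, 1.0), (0.52, 0.5), (0.43, 1.5)], 0.19, 5.0)
print(round(cost, 3))  # 2.855
```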
19
Cleaning Plan
  • Input
      • D: set of labeled tag cases
        <(EPC, time), (features)>
      • M: set of available cleaning methods
        M1, M2, ..., Mk
      • C: cost model C(M1), C(M2), ..., C(Mk), C(Error)
  • Output
      • A decision tree that splits D according to
        feature values.
      • For each leaf in the tree a cleaning sequence is
        defined.
  • Application
      • For each test case, use feature values to get to
        the appropriate leaf, then apply the cleaning
        sequence defined in the leaf.
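
Applying a cleaning plan amounts to a tree lookup followed by a run through the leaf's cleaning sequence. The sketch below is one possible encoding, not the authors' data structure. The class names, feature names, thresholds, and the three stub methods are all hypothetical, and a method returning None stands in for "failed to classify".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

Label = Optional[int]            # 1 = present, 0 = absent, None = not classified
Method = Callable[[dict], Label]

@dataclass
class Leaf:
    sequence: List[Method]       # cleaning sequence defined for this leaf

@dataclass
class Split:
    feature: str                 # feature tested at this node
    children: Dict[str, Union["Split", Leaf]]

def find_leaf(node, case):
    """Follow the case's feature values down to a leaf."""
    while isinstance(node, Split):
        node = node.children[case[node.feature]]
    return node

def clean(plan, case):
    """Apply the leaf's methods in order until one classifies the case."""
    for method in find_leaf(plan, case).sequence:
        label = method(case)
        if label is not None:
            return label
    return None

# Hypothetical stand-ins for real cleaning methods.
def pat(case):
    return 1 if case["bell_shaped"] else 0

def fix_1(case):
    return 1 if case["read_now"] else 0

def dbn(case):
    p = case["p_present"]        # assumed to be computed elsewhere
    if p >= 0.8:
        return 1
    if p <= 0.2:
        return 0
    return None                  # not confident enough, pass to next method

plan = Split("reader", {
    "door": Leaf([pat]),
    "shelf": Split("metal", {"yes": Leaf([fix_1]), "no": Leaf([dbn, fix_1])}),
})

print(clean(plan, {"reader": "door", "bell_shaped": True}))  # 1
print(clean(plan, {"reader": "shelf", "metal": "no",
                   "p_present": 0.5, "read_now": False}))    # 0
```

In the second call the DBN stub declines to classify, so the case falls through to the next method in the leaf's sequence.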

20
Available Features
  • Tag features
      • Communications protocol, manufacturer, price,
        quality
      • Detection history
  • Reader features
      • Number of antennas
      • Protocol, price, vendor
  • Location features
      • Type of area being covered (e.g. door, shelf,
        conveyor belt)
      • Interference level (e.g. presence of metal or
        water)
  • Item features
      • Type of item, contents, price

21
Cleaning Plan Induction
  • Use a traditional Top Down Induction of Decision
    Trees (TDIDT) algorithm
  • Node splitting criterion
      • Split nodes based on expected cost reduction:
        the cleaning sequence cost before the split
        minus the average cost of the cleaning
        sequences after the split.
22
Example
[Figure: labeled tag cases and the resulting cleaning plan. The plan splits on reader: door uses method pat (accuracy 100%); shelf splits on metal, where yes uses method fix_1 (accuracy 75%) and no uses method dbn (accuracy 100%).]
  • We use 3 cleaning methods: fix_1, DBN, and pat
  • The label is 1 if the tag is indeed present, and
    0 otherwise
  • Each method predicts 1 for present, 0 for absent
  • The cleaning plan selects when to apply each
    method, e.g. we should use fix_1 to clean cases
    from shelf readers when there is metal

23
Example of node splitting
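  • C(fix_1) = 1.0, C(DBN) = 2.0, C(pat) = 2.0,
    C(Error) = 5.0
  • Cleaning sequence for D (before the split):
      • DBN → pat: 2.0 + 0.33 × 2.0 + 0.11 × 5.0 = 3.21
  • Cleaning sequences after the split:
      • DBN → fix_1: 2.0 + 0.16 × 1.0 = 2.16
      • pat: 2.0
  • Cost reduction = 3.21 - (2.16 × 0.67 + 2.0 × 0.33)
    ≈ 1.11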
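
The node splitting arithmetic can be checked with a short sketch. It assumes that the rounded figures on the slide (0.33, 0.11, 0.16, 0.67) stand for the exact fractions 1/3, 1/9, 1/6, and 2/3; that reading is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction as F

def expected_cost_reduction(parent_cost, children):
    """Cost before the split minus the weighted cost after it.

    children: list of (fraction_of_cases_in_branch, sequence_cost).
    """
    after = sum(weight * cost for weight, cost in children)
    return parent_cost - after

# Costs: C(fix_1) = 1, C(DBN) = 2, C(pat) = 2, C(Error) = 5
parent = 2 + F(1, 3) * 2 + F(1, 9) * 5   # DBN -> pat on all of D
left = 2 + F(1, 6) * 1                   # DBN -> fix_1 on 2/3 of D
right = F(2)                             # pat alone on 1/3 of D

reduction = expected_cost_reduction(parent, [(F(2, 3), left), (F(1, 3), right)])
print(f"{float(reduction):.2f}")  # 1.11
```

Plugging in the rounded two-decimal figures instead gives about 1.10, which suggests the value on the slide was computed from unrounded fractions.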
24
Experimental Setup
  • Data generator simulates a complex RFID system
    with multiple locations, readers, and tag types.
  • The simulation is controlled by several
    parameters:
      • Item flow characteristics, paths traversed,
        predictable vs. random movements
      • Item speed
      • Reader, tag, and item characteristics (e.g.
        protocol, manufacturer, item contents)
      • Location characteristics and RF noise levels
  • The simulation is run for a number of read
    cycles; at each cycle the probability of tag
    detection is a function of reader, item, tag, and
    location characteristics.

25
Cleaning methods
  • We compare the performance of several cleaning
    methods:
      • DBN: a DBN based cleaner
      • var: adaptive window smoothing
      • fix_k: fixed (size k) window smoothing
      • Rule based methods
          • pat: a pattern recognition method based on
            signal strength shape
          • maj: used to resolve multi-reader detection
            conflicts by majority voting
  • Cost model
      • C(DBN) = 2.0, C(var) = 2.0, C(fix_1) = 1.0,
        C(fix_3) = 1.4, C(pat) = 2.5, C(maj) = 1.4,
        C(Error) = 10
  • All methods are terminal except for DBN, which
    uses P(tag present) as confidence
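
The deck does not spell out how maj votes. One plausible reading, sketched below purely as an assumption, is that every detection of the same tag within a short period casts one vote for the location of the reader that produced it, and the tag is assigned to the location with the most votes. The function name and location labels are hypothetical.

```python
from collections import Counter

def majority_location(detections):
    """Pick the location that detected the tag most often.

    detections: list of location names, one entry per detection of the
                same tag in the period being resolved (assumed input format).
    Returns None when there are no detections.
    """
    if not detections:
        return None
    # most_common(1) returns the single (location, count) pair with the
    # highest count.
    return Counter(detections).most_common(1)[0][0]

print(majority_location(["dock_door", "shelf_3", "dock_door"]))  # dock_door
```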

26
Complex Setup I
  • readers in low noise
  • readers detecting far away tags
  • readers with variable detection rates
  • reader at a conveyor belt
  • readers detecting tags with water and metal
  • In some areas there can be conflict of multiple
    patterns detecting the same tag

27
Complex Setup II
28
Reader Setup
29
Noise Level
30
Conclusions
  • A cost conscious cleaning framework for RFID data
    can increase accuracy at lower cost than any
    single cleaning method.
  • The cleaning plan can be efficiently learnt from
    data by applying the idea of cleaning cost
    reduction to node splitting.
  • DBN based cleaning methods capture the intrinsic
    dynamic behavior of tag detection, and deliver
    high accuracy at lower costs than smoothing
    window based techniques.

31
Thanks