Matching - PowerPoint PPT Presentation

Provided by: chr1

1
Matching
  • Lecture 10

2
Topics
  • ID parade Frames
  • Matching Examples
  • Fuzzy Matching
  • Scales of measurement

3
ID Parade Frames
  • Classifying volunteers as clean
  • Matching suspect to volunteers
  • Reservation of parade facility, officers,
    volunteers
  • Managing long-running process from decision to
    hold parade to payment of volunteers
  • Accounting: payment to volunteers and billing of
    police authorities
  • Historical record and analysis

4
Merging multiple frames
  • Each frame produces its own model of the actors.
  • E.g. Models of volunteer
  • For matching with suspect
  • For classification
  • For payment
  • For reservation
  • For databases, this problem is called view
    integration

5
Miscellaneous Matching applications
  • Many systems have a matching task at their core
  • Shazam sound sample matching
  • De-duping mailing lists
  • CD DB - CD recognition
  • COTS selection
  • IS development selection
  • fingerprint matching
  • patient/donor matching for transplant surgery
  • blood typing and matching
  • patients to clinical trials
  • interns to placements in hospitals
  • DNA samples
  • search request to locate relevant documents
  • incoming news items to information subscribers
  • number plate recognition in London's
    Congestion Charging System
  • speech and writing recognition
  • patterns to material to minimise wastage

6
Shazam - 2580
  • Shazam is a mobile phone application
  • It can recognise 1.7 million tracks from a 30 sec
    sample; new tracks are added at 5,000 a week
  • The track details are texted back within about
    30 secs
  • It costs 50p + 9p call charge (surcharge only if
    successful)
  • Your personal page shows the tracks you have
    tagged
  • www.shazam.com

7
De-duping
A catalogue from O'Reilly
  • C Wallace, West England University, Coldharbour
    Lane, Frenchay, Bristol BS16 1QY
  • Ms C Wallace, Univ. of the West of England,
    Frenchay Campus, Coldharbour Lane, Bristol BS16 1QY
One person or two?
Mailing lists are reported with 25-40% duplicates.
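
One simple way to flag likely duplicates such as the two labels above is to compare the sets of words they contain. The sketch below is only an illustration, not a method from the lecture: the abbreviation table, the list of ignored words and the idea of using word overlap (Jaccard similarity) are all assumptions.

```python
import re

# Assumed helper tables for the example; a real list would be much longer.
ABBREVIATIONS = {"univ": "university"}
IGNORED = {"mr", "mrs", "ms", "dr", "of", "the"}


def tokens(address):
    """Lower-case the address, expand abbreviations and drop noise words."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return {w for w in words if w not in IGNORED}


def similarity(a, b):
    """Jaccard similarity of the two word sets: 1.0 means identical sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


LABEL_1 = ("C Wallace West England University Coldharbour Lane "
           "Frenchay Bristol BS16 1QY")
LABEL_2 = ("Ms C Wallace Univ. of the West of England Frenchay Campus "
           "Coldharbour Lane Bristol BS16 1QY")

print(round(similarity(LABEL_1, LABEL_2), 2))
```

A threshold on this score (chosen by judgement) would decide which pairs are passed to a person for checking.
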
8
CD DB
  • Database of 2.5 million CDs, track details and
    supporting matter run by Gracenote
    (www.gracenote.com)
  • Used by media players to obtain track info
  • Player sends a signature of the CD (the sequence of
    track lengths in 1/4 sec units) to match against
    the database (via HTTP)
  • Application searches DB for best match and
    returns track info to media player.
  • Matching algorithm described in US Patent
    6,061,680
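
As a rough illustration of signature matching (not the algorithm in the patent, which is not reproduced here), the sketch below compares a disc's sequence of track lengths with a small in-memory database and returns the closest entry. The database contents, the tolerance and the function names are invented for the example.

```python
def signature_distance(a, b):
    """Total absolute difference between two track-length sequences.

    Lengths are in 1/4 sec units. Discs with a different number of tracks
    are treated as non-matching and give None.
    """
    if len(a) != len(b):
        return None
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(signature, database, tolerance=4):
    """Return the title of the closest disc, or None if none is close enough."""
    best_title, best_dist = None, None
    for title, stored in database.items():
        dist = signature_distance(signature, stored)
        if dist is None:
            continue
        if best_dist is None or dist < best_dist:
            best_title, best_dist = title, dist
    if best_dist is not None and best_dist <= tolerance:
        return best_title
    return None


# Hypothetical database: title -> track lengths in quarter seconds.
DATABASE = {
    "Album A": [800, 960, 1012, 744],
    "Album B": [802, 955, 1300, 744],
    "Album C": [640, 700, 910],
}

print(best_match([801, 960, 1011, 744], DATABASE))
```
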

9
Commercial Off-the-Shelf Software (COTS)
  • Software exists for most business needs
  • payroll
  • order processing
  • general ledger
  • human resources
  • e-commerce
  • e.g. SAP, SAGE
  • but analysts need to match business needs to COTS
    capability, and customise generic software for
    local business rules.

10
Method selection
  • DSDM Product Design Assistant
  • The Product Design Assistant (PDA) gives the
    practitioner an approach for determining which
    mechanisms and techniques are appropriate for
    their project
  • Table 1 Mechanism selection
  • Table 2 Technique selection

11
Police ID parade
  • Currently
  • Suspect matched to Volunteers visually by officer
  • Information System
  • Suspect and Volunteers modelled in database
  • System provides list of matching volunteers

12
Matching in general
  • Matching tasks typically involve
  • two sets of individuals, e.g.
  • the suspect / sampled track / DNA sample - the
    Requirement
  • the volunteers / 1.7 million stored tracks / DNA
    on file - the Resource
  • adequate representations of both
  • a fitness function which calculates how well
    matched a Resource is to the Requirement
  • Matching processes
  • Single or Batch?
  • Single: one Requirement to many Resources
  • Batch: many Requirements to many Resources (e.g.
    cutting)
  • Automatic, Interactive, Assistive
  • Automatic: matching fully automated
  • Interactive: user makes final selection, adjusts
    weights
  • Assistive: computer produces analyses which aid
    human selection

13
Single Allocation
  • Allocation to a single Requirement
  • long list the Resources - eliminate the
    obviously unsuitable
  • compute fitness between Requirement and each
    remaining Resource
  • rank the Resources in fitness order for a short
    list
  • user selection from short list on basis of
    additional information unknown to system
  • Interactive
  • User adjusts
  • description of Requirement (e.g. the search term
    in Google)
  • fitness function (e.g. the weights in the ID
    parade)
  • and retries
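
The long list / fitness / short list steps above can be sketched as a small function. This is a minimal illustration under stated assumptions: the volunteer records, the `same_gender` filter and the `age_gap` fitness are invented, and lower fitness values are taken to mean a better match, as in the later slides.

```python
def shortlist(requirement, resources, is_plausible, fitness, size=3):
    """Long-list, score and rank Resources against one Requirement."""
    # Long list: eliminate the obviously unsuitable.
    long_list = [r for r in resources if is_plausible(requirement, r)]
    # Rank the remainder by fitness (lower is better) and cut to a short list.
    ranked = sorted(long_list, key=lambda r: fitness(requirement, r))
    return ranked[:size]


def same_gender(req, res):
    return req["gender"] == res["gender"]


def age_gap(req, res):
    return abs(req["age"] - res["age"])


# Hypothetical data for the example.
suspect = {"gender": "M", "age": 30}
volunteers = [
    {"name": "V1", "gender": "M", "age": 45},
    {"name": "V2", "gender": "F", "age": 30},
    {"name": "V3", "gender": "M", "age": 28},
    {"name": "V4", "gender": "M", "age": 33},
]

names = [v["name"]
         for v in shortlist(suspect, volunteers, same_gender, age_gap, size=2)]
print(names)
```

In the interactive case the user would change `suspect`, the filter or the fitness function and call `shortlist` again.
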

14
Single attribute Matching
  • Fuzzy String Matching
  • Levenshtein distance
  • Soundex and Metaphone
  • Age difference
  • Scales of measurement

15
Fuzzy String Matching
  • How close are two strings (words, DNA sequences)?
  • Levenshtein distance
  • is the number of single character edits required
    to change one to the other
  • insert a letter
  • delete a letter
  • replace a letter
  • E.g.
  • receipt to reciept, receipt to tecept - distance 2
    in both cases
  • Need a theory of why the strings are different
  • Better theory for typing would be to count
    transposition as 1 edit instead of 2
  • mutations in DNA matching
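
A minimal sketch of the distance calculation, using the usual dynamic-programming table. The optional `transpositions` flag is an assumption added to illustrate the "better theory for typing": it counts a swap of two adjacent characters as one edit (the variant often called optimal string alignment).

```python
def edit_distance(a, b, transpositions=False):
    """Levenshtein distance between strings a and b.

    With transpositions=True, swapping two adjacent characters
    counts as a single edit instead of two.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a letter
                d[i][j - 1] + 1,         # insert a letter
                d[i - 1][j - 1] + cost,  # replace a letter (or keep it)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("receipt", "reciept"))                       # 2
print(edit_distance("receipt", "tecept"))                        # 2
print(edit_distance("receipt", "reciept", transpositions=True))  # 1
```

With the transposition rule the typing slip "reciept" becomes closer to "receipt" than "tecept" is, which plain Levenshtein distance cannot express.
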

16
Soundex and Metaphone
  • Surnames in English have multiple spellings for
    similar sounds
  • Wallace and Wallis, Smith and Smythe
  • Errors caused by similar phonetics having
    different spelling
  • Useful where sound-text transliteration occurs in
    data capture
  • e.g. Smith and Smythe
  • Soundex (Odell and Russell 1922) reduces every
    word to a letter and 3 digits: S530 for both
  • Metaphone (Philips 1990) is smarter about English
    phonetics: SM0 for both
  • Not perfect
  • Kris (K620 and KRS)
  • Chris (C620 and XRS)
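
A minimal sketch of the commonly described American Soundex rules, written for illustration rather than taken from the lecture: keep the first letter, code the remaining consonants as digits, skip vowels, collapse runs of the same digit, and pad or cut to four characters. Treating H and W as transparent between equal codes is one common convention and is assumed here.

```python
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of name ('' for an empty name)."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    previous = CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of identical codes
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            result += digit
        previous = digit  # a vowel (no code) resets the run
    return (result + "000")[:4]


for surname in ("Smith", "Smythe", "Wallace", "Wallis", "Kris", "Chris"):
    print(surname, soundex(surname))
```

Smith and Smythe share a code, as do Wallace and Wallis, while Kris and Chris differ because Soundex always keeps the first letter.
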

17
Fuzzy matching
  • How close are two ages?
  • Is the answer different for the identity parade
    and a dating agency?

[Figure: non-fitness (from 0.0) plotted against age; labels: Suspect, Volunteer, Ideal Person, Date]
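
One way to make "how close are two ages?" concrete is a non-fitness function that is 0.0 for identical ages and rises to 1.0 once the gap reaches a tolerance. The linear shape and the tolerance values below are assumptions for illustration only; the point is that the same age gap can score differently in different applications.

```python
def age_non_fitness(target_age, candidate_age, tolerance):
    """0.0 for identical ages, rising linearly to 1.0 at `tolerance` years."""
    gap = abs(target_age - candidate_age)
    return min(gap / tolerance, 1.0)


# Assumed tolerances: strict for an identity parade, looser for a dating agency.
PARADE_TOLERANCE = 5
DATING_TOLERANCE = 15

print(age_non_fitness(30, 33, PARADE_TOLERANCE))
print(age_non_fitness(30, 33, DATING_TOLERANCE))
```
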
18
Multi-attribute Matching
  • How to combine multiple attributes to create a
    single fitness measure?
  • Age and Height are different to Build,
    Eye-colour, Gender and Ethnic origin.
  • Distance in 2-D space

[Figure: right-angled triangle with sides dx (along x) and dy (along y); distance = sqrt(dx^2 + dy^2)]
19
Multi-attribute matching
  • Extract shows a simple Excel spreadsheet
    containing a suspect's age, height and gender, and
    the same attributes for 10 volunteers
  • Representation
  • Age is measured in years
  • Height in cm
  • Gender is M or F
  • Fitness function
  • Calculate difference between suspect and
    volunteer attributes
  • Normalise differences to the range 0-1
  • Multiply by weights to express importance of each
    attribute
  • Sum of squared differences as Fitness function
  • Best fit volunteer has minimum value for Fitness
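
The fitness function described above can be sketched in a few lines. The spreadsheet itself is not reproduced in this transcript, so the data, ranges and weights below are invented, and applying the weight before squaring is an assumption about the order of the steps.

```python
def fitness(suspect, volunteer, ranges, weights):
    """Sum of squared, weighted, normalised differences (lower is better)."""
    total = 0.0
    for attr, weight in weights.items():
        if attr in ranges:
            # Numeric attribute: scale the absolute difference into 0..1.
            diff = min(abs(suspect[attr] - volunteer[attr]) / ranges[attr], 1.0)
        else:
            # Categorical attribute: 0 if equal, 1 otherwise.
            diff = 0.0 if suspect[attr] == volunteer[attr] else 1.0
        total += (weight * diff) ** 2
    return total


# Hypothetical values for the example.
RANGES = {"age": 50, "height": 50}             # years, cm
WEIGHTS = {"age": 1.0, "height": 1.0, "gender": 2.0}

suspect = {"age": 30, "height": 180, "gender": "M"}
volunteers = {
    "A": {"age": 32, "height": 178, "gender": "M"},
    "B": {"age": 30, "height": 180, "gender": "F"},
    "C": {"age": 45, "height": 165, "gender": "M"},
}

best = min(volunteers,
           key=lambda name: fitness(suspect, volunteers[name], RANGES, WEIGHTS))
print(best)
```
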

20
Scales of Measurement
  • Nominal - names or categories
  • E.g. Eye-colour, Ethnic origin, Telephone number,
    ISBN
  • Valid operations: =, not =
  • Partly Ordered Scales e.g. grandparent, parent,
    uncle, child, cousin
  • Pairs are ordered but no overall ordering
  • Ordinal - ranks
  • E.g. 1, 2, 3 in Derby; 1st, 2.1, 2.2, 3rd class;
    slight, medium, heavy build
  • Valid operations: <, =, >
  • Invalid operations: +, - (the gap between 1 and
    2 is not the same as between 2 and 3)
  • Non-parametric statistics may apply
  • Interval - arbitrary zero value
  • E.g. Temperature in degrees F, date in Julian
    Calendar
  • Valid op: - (minus)
  • Invalid: ×, / (but differences are Ratio)
  • Ratio
  • E.g. Length, age
  • Valid ops: +, ×, /, standard statistical
    operations
  • Multi-dimensional scales (index numbers)
  • E.g. Miles/gallon, IQ

21
Suspect/Volunteer attributes
  • Nominal - names or codes
  • Ordinal - ranks
  • Interval - no true zero value
  • Ratio

22
Transforming and Scaling
  • To combine different attributes, we need to
    transform Nominal, Ordinal and Interval values to
    Ratio scales
  • This cannot be done objectively, so judgement
    involved
  • Scaling and weights need to be adjustable to
    fine-tune matching
  • -> Learning Frame (later)
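
A small sketch of what such a transformation might look like. Every number here is a judgement call, which is the point of the slide: the spacing of the build scale and the eye-colour distances are invented and would need tuning.

```python
# Ordinal attribute: assumed positions on a 0..1 scale.
BUILD_SCALE = {"slight": 0.0, "medium": 0.5, "heavy": 1.0}

# Nominal attribute: assumed distances between pairs of categories.
# Pairs not listed are treated as completely different (1.0).
EYE_DISTANCE = {
    frozenset(("blue", "grey")): 0.3,
    frozenset(("brown", "hazel")): 0.4,
}


def build_difference(a, b):
    """Difference between two builds after mapping them to numbers."""
    return abs(BUILD_SCALE[a] - BUILD_SCALE[b])


def eye_difference(a, b):
    """Difference between two eye colours using the lookup table."""
    if a == b:
        return 0.0
    return EYE_DISTANCE.get(frozenset((a, b)), 1.0)


print(build_difference("slight", "medium"))
print(eye_difference("grey", "blue"))
```

Keeping the scales in tables rather than in code makes them easy to adjust when fine-tuning the matching.
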

23
Sensitivity Analysis
  • Arbitrary weights can be adjusted to see what
    effect their variation has on the final selection
  • How much would each weight have to change
    before the first choice is demoted?
  • Excel analysis
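
The same question can be explored outside Excel. The sketch below scales one weight by a series of trial factors and reports the first factor at which the best candidate changes. The candidates, their normalised differences, the weights and the trial factors are all invented for the example, and the score is assumed to be a sum of squared weighted differences.

```python
def score(diffs, weights):
    """Sum of squared weighted differences (lower is better)."""
    return sum((weights[attr] * d) ** 2 for attr, d in diffs.items())


def best(candidates, weights):
    """Name of the candidate with the lowest score."""
    return min(candidates, key=lambda name: score(candidates[name], weights))


def demotion_factor(candidates, weights, attr, factors):
    """First factor in `factors` that changes the winner when applied to
    weights[attr], or None if the first choice survives them all."""
    baseline = best(candidates, weights)
    for factor in factors:
        trial = {**weights, attr: weights[attr] * factor}
        if best(candidates, trial) != baseline:
            return factor
    return None


# Hypothetical normalised differences from the suspect (0..1).
CANDIDATES = {
    "A": {"age": 0.1, "height": 0.4},
    "B": {"age": 0.3, "height": 0.1},
}
WEIGHTS = {"age": 1.0, "height": 1.0}

print(best(CANDIDATES, WEIGHTS))
print(demotion_factor(CANDIDATES, WEIGHTS, "age", [1.25, 1.5, 1.75, 2.0]))
```

A weight whose demotion factor is close to 1 is one the final selection is sensitive to, so its value deserves the most scrutiny.
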

24
Tutorial Questions
  • Explain the user's interaction with Shazam to tag
    a track using a sequence diagram.
  • Choose a matching problem with which you are
    familiar (or choose one from the list)
  • Identify the requirement and the resources and
    suggest appropriate representations of each
  • Identify a suitable fitness function for this
    problem