Disambiguation - PowerPoint PPT Presentation

About This Presentation
Slides: 17
Provided by: ting3
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Disambiguation
  • March 7, 2003

2
Problem
  • Many people have the same name.
  • Example: Michael Jordan, basketball star or
    professor?
  • Relying on prior knowledge about each person is
    not feasible.
  • Instead, disambiguate based on context.
  • Example: Scottie Pippen, Dennis Rodman, Phil
    Jackson (basketball star)
  • Example: U.C. Berkeley, David Cohn (professor)

3
Graph
[Figure: graph of Michael Jordan with Scottie Pippen, Phil Jackson, Dennis Rodman, U.C. Berkeley, and David Cohn]

4
Graph
[Figure: the same graph with Michael Jordan split into two nodes, one with David Cohn and U.C. Berkeley, the other with Scottie Pippen, Phil Jackson, and Dennis Rodman]

5
Algorithm
  • Choose the people most relevant to Michael
    Jordan.
  • Relevance is measured by $P(\mathrm{MJ} \mid p)$ for
    each person $p$.
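The slides do not say how $P(\mathrm{MJ} \mid p)$ is estimated. A minimal sketch, assuming it is the fraction of the links (for example, movies) containing $p$ that also contain MJ; the toy `LINKS` data and the `relevance` helper are hypothetical:

```python
from collections import Counter

# Hypothetical toy data: each link is the set of names that appear together.
# "MJ" is the ambiguous name.
LINKS = [
    {"MJ", "Scottie Pippen", "Phil Jackson"},
    {"MJ", "Scottie Pippen", "Dennis Rodman"},
    {"MJ", "U.C. Berkeley", "David Cohn"},
    {"Scottie Pippen", "Phil Jackson"},
    {"U.C. Berkeley", "David Cohn"},
    {"David Cohn"},
]


def relevance(links, target="MJ"):
    """Estimate P(target | p) for every other name p from co-occurrence counts."""
    with_p = Counter()     # number of links that contain p
    with_both = Counter()  # number of links that contain p and the target
    for link in links:
        for p in link - {target}:
            with_p[p] += 1
            if target in link:
                with_both[p] += 1
    return {p: with_both[p] / with_p[p] for p in with_p}


for person, score in sorted(relevance(LINKS).items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

On this toy data Dennis Rodman scores 1.00 (his only link contains MJ), while David Cohn scores 0.33.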

6
Choosing Seed Values
  • We need a starting point: seeds.
  • Seeds are people who correspond to the senses of
    MJ.
  • How well do the seeds separate people into
    camps?
  • Search exhaustively through all pairs of people.
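The exhaustive search over seed pairs can be sketched as below. The scoring function is a hypothetical stand-in (the number of people who share a link with exactly one of the two seeds), not the score defined on the slides; the toy `LINKS` data and all helper names are assumptions:

```python
from itertools import combinations

# Hypothetical toy links: each is the set of names appearing together.
LINKS = [
    {"MJ", "Scottie Pippen", "Phil Jackson"},
    {"MJ", "Scottie Pippen", "Dennis Rodman"},
    {"MJ", "U.C. Berkeley", "David Cohn"},
    {"Scottie Pippen", "Phil Jackson"},
    {"U.C. Berkeley", "David Cohn"},
    {"David Cohn"},
]


def neighbours(links, name):
    """Everyone who shares at least one link with `name`."""
    out = set()
    for link in links:
        if name in link:
            out |= link
    out.discard(name)
    return out


def separation(links, a, b, target="MJ"):
    """Hypothetical proxy score: people linked to exactly one of the two seeds."""
    return len((neighbours(links, a) ^ neighbours(links, b)) - {a, b, target})


def best_seed_pair(links, score=separation, target="MJ"):
    """Score every unordered pair of people and keep the best one."""
    people = sorted(set().union(*links) - {target})
    return max(combinations(people, 2), key=lambda pair: score(links, *pair))


print(best_seed_pair(LINKS))
```

On the toy data this picks `('David Cohn', 'Scottie Pippen')`, one name from each camp. The search is quadratic in the number of people, which is why the scoring step needs to be cheap.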

7
Good Seeds
[Figure: good seeds, shown on the graph of David Cohn, Scottie Pippen, Phil Jackson, U.C. Berkeley, and Dennis Rodman]

8
Bad Seeds
[Figure: bad seeds, shown on the same graph of David Cohn, Scottie Pippen, Phil Jackson, U.C. Berkeley, and Dennis Rodman]

9
Choosing Seeds I
  • Let $S_j$ be the $j$th sense. Denote $S_0$ as the
    basketball star and $S_1$ as the professor (the
    labels are interchangeable because we have no
    prior knowledge).
  • In the exhaustive search, we arbitrarily pick
    one person to be $\text{seed}_0$ and another to be
    $\text{seed}_1$, where $\text{seed}_0$ corresponds to $S_0$
    and $\text{seed}_1$ to $S_1$.
  • Let $P(\mathrm{MJ} = S_1 \mid \mathrm{MJ}, \text{seed}_1) = 1$ and
    $P(\mathrm{MJ} = S_0 \mid \mathrm{MJ}, \text{seed}_1) = 0$, and vice
    versa for $\text{seed}_0$. These probabilities could be
    wrong, but they are just an arbitrary assignment.
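The arbitrary starting assignment is just a small table. A sketch; `init_seed_probs` is a hypothetical helper, and each list holds [P(MJ = S_0 | MJ, seed), P(MJ = S_1 | MJ, seed)]:

```python
def init_seed_probs(seed0, seed1):
    """Arbitrary start: each seed is certain of its own sense.

    Maps seed -> [P(MJ = S_0 | MJ, seed), P(MJ = S_1 | MJ, seed)].
    """
    return {seed0: [1.0, 0.0], seed1: [0.0, 1.0]}


probs = init_seed_probs("Scottie Pippen", "David Cohn")
swapped = init_seed_probs("David Cohn", "Scottie Pippen")

# Swapping which person is seed_0 only relabels the senses.
for seed in probs:
    assert probs[seed] == swapped[seed][::-1]
print(probs)
```

The final loop makes the "interchangeable" remark concrete: choosing the other person as $\text{seed}_0$ mirrors every probability without changing the split.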

10
Choosing Seeds II
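  • For each person $p$ and sense $S_j$, compute
    $P(\mathrm{MJ} = S_j \mid \mathrm{MJ}, p) \propto n(\text{seed}_j, p)\, P(\mathrm{MJ} \mid \text{seed}_j)$,
    normalized over the two senses.
  • A person belongs to camp $S_j$ only if
    $P(\mathrm{MJ} = S_j \mid \mathrm{MJ}, p) \ge 0.95$.
  • Use the harmonic mean to score how well $\text{seed}_0$
    and $\text{seed}_1$ assign people to camps.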
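A sketch of scoring one seed pair, under several assumptions the slides do not state: $n(\text{seed}_j, p)$ is the number of links containing both names, $P(\mathrm{MJ} \mid \text{seed}_j)$ is the fraction of $\text{seed}_j$'s links that contain MJ, the products are normalized over the two senses, and the harmonic mean is taken over the two camp sizes. The toy data and helper names are hypothetical:

```python
# Hypothetical toy links: each is the set of names appearing together.
LINKS = [
    {"MJ", "Scottie Pippen", "Phil Jackson"},
    {"MJ", "Scottie Pippen", "Dennis Rodman"},
    {"MJ", "U.C. Berkeley", "David Cohn"},
    {"Scottie Pippen", "Phil Jackson"},
    {"U.C. Berkeley", "David Cohn"},
    {"David Cohn"},
]


def cooccur(links, a, b):
    """Assumed n(a, b): number of links containing both names."""
    return sum(1 for link in links if a in link and b in link)


def relevance(links, p, target="MJ"):
    """Assumed estimate of P(MJ | p): share of p's links that contain MJ."""
    total = sum(1 for link in links if p in link)
    return cooccur(links, p, target) / total if total else 0.0


def camp_probs(links, seeds, p, target="MJ"):
    """P(MJ = S_j | MJ, p) proportional to n(seed_j, p) * P(MJ | seed_j)."""
    raw = [cooccur(links, s, p) * relevance(links, s, target) for s in seeds]
    z = sum(raw)
    return [r / z for r in raw] if z else None  # None: no link to either seed


def seed_score(links, seeds, threshold=0.95, target="MJ"):
    """Harmonic mean of the two camp sizes (assumed scoring rule)."""
    camps = [0, 0]
    for p in set().union(*links) - {target, *seeds}:
        probs = camp_probs(links, seeds, p, target)
        if probs is None:
            continue
        for j, prob in enumerate(probs):
            if prob >= threshold:
                camps[j] += 1
    a, b = camps
    return 2 * a * b / (a + b) if a + b else 0.0


print(seed_score(LINKS, ("Scottie Pippen", "David Cohn")))  # good seeds
print(seed_score(LINKS, ("Dennis Rodman", "Phil Jackson")))  # bad seeds
```

With the good pair, the camps have sizes 2 and 1; with the bad pair, Scottie Pippen splits 0.5/0.5 and nobody else shares a link with either seed, so both camps are empty and the score is 0. The harmonic mean rewards pairs that fill both camps rather than one.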

11
Iteration I
  • Now that we have the best seeds, we assign
    $P(\mathrm{MJ} = S_j \mid p)$ for each person $p$.
  • Step 1: Put every person except the seeds in the
    unknown set.
  • Step 2: For each person $p$ in the unknown set and
    each sense $S_j$, calculate
    $P(\mathrm{MJ} = S_j \mid p) = P(\mathrm{MJ} \mid p)\, P(\mathrm{MJ} = S_j \mid \mathrm{MJ}, p)$.
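The Step 2 formula is the product rule: the event $\mathrm{MJ} = S_j$ already implies that MJ occurs, so

```latex
\begin{aligned}
P(\mathrm{MJ} = S_j \mid p)
  &= P(\mathrm{MJ},\ \mathrm{MJ} = S_j \mid p)
     && \text{since } \mathrm{MJ} = S_j \text{ implies that MJ occurs} \\
  &= P(\mathrm{MJ} \mid p)\, P(\mathrm{MJ} = S_j \mid \mathrm{MJ}, p)
     && \text{product rule}
\end{aligned}
```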

12
Iteration II
  • Step 3: For each sense $S_j$, take the person $p$
    with the highest $P(\mathrm{MJ} = S_j \mid p)$ out of the
    unknown set.
  • Step 4: Repeat steps 2 and 3 until no one is left
    in the unknown set.
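Steps 1 to 4 can be sketched as a greedy loop. The slides do not say how $P(\mathrm{MJ} = S_j \mid \mathrm{MJ}, p)$ is updated once people leave the unknown set. This sketch assumes it generalizes the seed formula: co-occurrence counts with every already-assigned person $q$ are weighted by that person's $P(\mathrm{MJ} = S_j \mid q)$ and normalized over senses. With only the seeds assigned, this reduces to $n(\text{seed}_j, p)\, P(\mathrm{MJ} \mid \text{seed}_j)$. The data and helper names are hypothetical:

```python
# Hypothetical toy links: each is the set of names appearing together.
LINKS = [
    {"MJ", "Scottie Pippen", "Phil Jackson"},
    {"MJ", "Scottie Pippen", "Dennis Rodman"},
    {"MJ", "U.C. Berkeley", "David Cohn"},
    {"Scottie Pippen", "Phil Jackson"},
    {"U.C. Berkeley", "David Cohn"},
    {"David Cohn"},
]


def cooccur(links, a, b):
    """Assumed n(a, b): number of links containing both names."""
    return sum(1 for link in links if a in link and b in link)


def relevance(links, p, target="MJ"):
    """Assumed estimate of P(MJ | p): share of p's links that contain MJ."""
    total = sum(1 for link in links if p in link)
    return cooccur(links, p, target) / total if total else 0.0


def iterate(links, seeds, target="MJ"):
    """Greedy assignment; returns {person: [P(MJ = S_j | p) for each sense j]}."""
    k = len(seeds)
    # Seeds: P(MJ = S_j | seed_j) = P(MJ | seed_j) * 1, and 0 for the other sense.
    assigned = {s: [relevance(links, s, target) if i == j else 0.0 for i in range(k)]
                for j, s in enumerate(seeds)}
    # Step 1: everyone except the seeds (and MJ itself) starts in the unknown set.
    unknown = set().union(*links) - {target, *seeds}
    while unknown:
        # Step 2: P(MJ = S_j | p) = P(MJ | p) * P(MJ = S_j | MJ, p).
        scores = {}
        for p in unknown:
            raw = [sum(cooccur(links, q, p) * probs[j] for q, probs in assigned.items())
                   for j in range(k)]
            z = sum(raw)
            rel = relevance(links, p, target)
            scores[p] = [rel * r / z if z else 0.0 for r in raw]
        # Step 3: each sense takes its highest-scoring person out of the unknown set.
        for j in range(k):
            if not unknown:
                break
            best = max(sorted(unknown), key=lambda p: scores[p][j])
            assigned[best] = scores[best]
            unknown.remove(best)
        # Step 4: the while loop repeats steps 2 and 3 until the unknown set is empty.
    return assigned


for person, probs in iterate(LINKS, ("Scottie Pippen", "David Cohn")).items():
    print(person, [round(x, 2) for x in probs])
```

Sorting the unknown set before `max` only makes tie-breaking deterministic. Recomputing every score each round is the simple, quadratic choice; an incremental update would be faster on a database the size of IMDB.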

13
Prediction
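  • Given a link, add up $P(\mathrm{MJ} = S_j \mid p)$ over
    all the names $p$ in the link, separately for each
    sense.
  • The MJ in the link is then labeled $S_0$ or $S_1$,
    whichever sense has the larger total. The
    algorithm knows nothing about basketball stars or
    professors; it only separates the two senses.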
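A sketch of the prediction step. The probability table is made-up illustrative input, not output reported on the slides. Names without a probability contribute nothing, and ties fall to $S_0$; both are choices made for this sketch:

```python
# Illustrative (made-up) per-person probabilities [P(MJ = S_0 | p), P(MJ = S_1 | p)].
PROBS = {
    "Scottie Pippen": [0.67, 0.0],
    "Dennis Rodman": [1.0, 0.0],
    "Phil Jackson": [0.5, 0.0],
    "David Cohn": [0.0, 0.33],
    "U.C. Berkeley": [0.0, 0.5],
}


def predict(link, probs, target="MJ", senses=2):
    """Sum each sense's probability over the names in the link; return the best sense."""
    totals = [0.0] * senses
    for name in link:
        if name == target:
            continue
        for j, pr in enumerate(probs.get(name, [0.0] * senses)):  # unseen names add 0
            totals[j] += pr
    best = max(range(senses), key=lambda j: totals[j])  # ties go to S_0
    return best, totals


print(predict({"MJ", "Phil Jackson", "David Cohn"}, PROBS))
print(predict({"MJ", "U.C. Berkeley", "David Cohn"}, PROBS))
```

The first link totals 0.5 for $S_0$ against 0.33 for $S_1$, so it is labeled $S_0$. The second link is labeled $S_1$.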

14
Dataset
  • Movie database from IMDB
  • 230,000 actors
  • 40,000 movies
  • Randomly pick two actors from the 4,000 actors
    who appeared in 15 or more movies.
  • Treat them as the same person (one shared name).
    Run the algorithm and see which sense each movie
    is assigned to.
  • Repeat 100 times.
  • Average accuracy: 75%
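One trial of this evaluation can be sketched as follows. The sketch assumes `movies` maps each title to its cast (a set of actor names). Because the sense labels are interchangeable, it also assumes accuracy is taken under whichever labeling of the two senses scores better. `make_trial`, `accuracy`, and the `"MERGED"` placeholder name are hypothetical, and the disambiguation run itself is left out:

```python
import random


def make_trial(movies, min_movies=15, rng=random):
    """Pick two actors with >= min_movies credits and merge them under one name.

    `movies` maps title -> set of actor names (hypothetical format).
    Returns (actor_a, actor_b, links, truth), where truth[title] is 0 for
    actor_a's movies and 1 for actor_b's.
    """
    counts = {}
    for cast in movies.values():
        for actor in cast:
            counts[actor] = counts.get(actor, 0) + 1
    pool = sorted(a for a, c in counts.items() if c >= min_movies)
    a, b = rng.sample(pool, 2)
    links, truth = {}, {}
    for title, cast in movies.items():
        if a in cast or b in cast:
            truth[title] = 0 if a in cast else 1  # a film with both is labeled 0 here
        links[title] = {"MERGED" if x in (a, b) else x for x in cast}
    return a, b, links, truth


def accuracy(pred, truth):
    """Share of the merged name's movies given the right sense.

    Sense labels are arbitrary, so take the better of the two labelings.
    """
    hits = sum(pred[t] == s for t, s in truth.items())
    return max(hits, len(truth) - hits) / len(truth)


movies = {
    "m1": {"A", "X"}, "m2": {"A", "Y"},
    "m3": {"B", "Z"}, "m4": {"B", "W"},
}
a, b, links, truth = make_trial(movies, min_movies=2, rng=random.Random(0))
print(a, b, links)
print(accuracy({"m1": 1, "m2": 1, "m3": 0, "m4": 0}, {"m1": 0, "m2": 0, "m3": 1, "m4": 1}))  # 1.0 after swapping labels
```

Averaging `accuracy` over 100 such trials gives a figure comparable to the one on the slide.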

15
Good Example
  • Blandick__Clara (38 movies) vs Gibson__Henry
    (19 movies)
    final score: 0.982456
    38 out of 38 correct; Blandick__Clara has seed
    Phelps__Lee
    18 out of 19 correct; Gibson__Henry has seed
    Davies__John__IV_
  • Clara Blandick appeared in movies from the 1910s
    to the 1950s.
  • Lee Phelps, from the same era, appeared in 6
    movies with Clara.
  • Henry Gibson appeared in movies from the 1960s
    to the 2000s.
  • John Davies IV, from the same era, appeared in 2
    movies with Henry.

16
Bad Example
  • Marsh__Mae (25 movies) vs Moorehead__Agnes
    (19 movies)
    final score: 0.500000
    16 out of 25 correct; Marsh__Mae has seed
    Morin__Alberto__I_
    6 out of 19 correct; Moorehead__Agnes has seed
    Wolfe__Ian
  • Mae Marsh, Agnes Moorehead, Alberto Morin, and
    Ian Wolfe all appeared in movies from the 1940s
    to the 1970s, so the seeds' eras overlap and
    cannot tell the two actors apart.
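Both final scores match pooled accuracy over all of the merged name's movies: (38 + 18) / (38 + 19) = 56/57 and (16 + 6) / (25 + 19) = 22/44. A quick check, assuming that is how the score is defined (`final_score` is a hypothetical helper):

```python
from fractions import Fraction


def final_score(correct_a, total_a, correct_b, total_b):
    """Pooled accuracy over all movies of both merged actors (assumed definition)."""
    return Fraction(correct_a + correct_b, total_a + total_b)


good = final_score(38, 38, 18, 19)  # Blandick__Clara vs Gibson__Henry
bad = final_score(16, 25, 6, 19)    # Marsh__Mae vs Moorehead__Agnes
print(good, f"{float(good):.6f}")  # 56/57 0.982456
print(bad, f"{float(bad):.6f}")    # 1/2 0.500000
```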