EPCA Integration - PowerPoint PPT Presentation

About This Presentation
Title:

EPCA Integration

Description:

people.cs.umass.edu – PowerPoint PPT presentation

Slides: 34
Provided by: Team153

Transcript and Presenter's Notes

Title: EPCA Integration


1
People in CALO's World: Contact Info, Expertise,
Groups and Roles
Information Extraction, Coreference, Group/Topic Models
Andrew McCallum, Aron Culotta, Xuerui Wang,
Charles Sutton, Wei Li
UMass Amherst
2
The Application
Workplace effectiveness: ability to leverage a
network of acquaintances. The power of your
little black book. But filling the Contacts DB by
hand is tedious and incomplete.
[Diagram: Contacts DB filled automatically from Email Inbox and WWW]
3
DEX Overview
[Diagram labels: CRF, WWW, Email, names]
4
DEX Example
To: Andrew McCallum mccallum_at_cs.umass.edu
Subject: ...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governors Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,
Key Words: Information extraction, social network,
Search for new people
5
Summary of Results
Example keywords extracted (Person: Keywords)
William Cohen: Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller: Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness: Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell: Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
Contact info and name extraction performance (25 fields)
       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76
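As a quick arithmetic check on the CRF figures above, field F1 is the harmonic mean of field precision and field recall. The sketch below recomputes it from the reported numbers; the function name is illustrative and not part of the DEX system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Field precision and recall from the CRF row above, in percent.
crf_f1 = f1_score(85.73, 76.33)
print(round(crf_f1, 2))  # 80.76, matching the reported Field F1
```

The gap between token accuracy (94.50) and field F1 (80.76) is expected: a field only counts as correct when all of its tokens and its boundaries are right.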
  1. Expert Finding: When solving some task, find
    friends-of-friends with relevant expertise.
    Avoid stove-piping in large orgs by
    automatically suggesting collaborators. Given a
    task, automatically suggest the right team for
    the job. (Hiring aid!)
  2. Social Network Analysis: Understand the social
    structure of your organization. Suggest
    structural changes for improved efficiency.

6
Outline
  • Information Extraction
  • Learning in the wild
  • Transfer learning
  • Identity Uncertainty
  • Modeling Groups, Roles and Topics

8
0. Segmenting and labeling sequence data:
Linear-chain CRFs
Lafferty, McCallum, Pereira 2001
y (named entity labels): PER O O TIME O O ORG O LOC ...
x (CALO email words): Dave , The Friday meeting with Tembec in NY ...

Leveraging data from KnowItAll (Etzioni et al,
2004), UPenn help.
Enron email labeled by Michael Collins, et
al.: 1200 entities
Field          F1
DATE           0.8483
TIME           0.7939
LOCATION       0.6476
PERSON         0.8439
ORGANIZATION   0.5987
ACRONYM        0.2804
PHONE          0.7943
MONEY          0.7143
PERCENT        0.9091
OVERALL        0.7282
From: monika.causholli_at_enron.com
Dave, The Friday meeting with Tembec in NY has been
postponed until next week. Attached is the
information you requested. Let me know if you
need anything else. Also did Doug give you the
data about consumer products? Cheers, Monica
Li, McCallum, unpublished, 2004
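A linear-chain model labels a word sequence by choosing the label path with the highest total score, where the score adds per-token terms and label-to-label transition terms. A minimal sketch of that decoding step (Viterbi) follows. The scores are made-up toy numbers, not learned CRF weights, and the function and variable names are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def viterbi(tokens: List[str], labels: List[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Best label path for a non-empty token list under additive scores.

    emit[(token, label)] and trans[(prev_label, label)] are log-potentials;
    missing entries score 0.
    """
    # best[i][y]: score of the best path that ends in label y at position i.
    best = [{y: emit.get((tokens[0], y), 0.0) for y in labels}]
    back = []  # back[i][y]: best previous label for y at position i + 1
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + trans.get((prev, y), 0.0)
                         + emit.get((tok, y), 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]


tokens = ["Dave", ",", "The", "Friday", "meeting", "with", "Tembec", "in", "NY"]
labels = ["O", "PER", "TIME", "ORG", "LOC"]
# Toy scores: each token prefers one label; O-to-O transitions get a small bonus.
entity = {"Dave": "PER", "Friday": "TIME", "Tembec": "ORG", "NY": "LOC"}
emit = {(t, entity.get(t, "O")): 2.0 for t in tokens}
trans = {("O", "O"): 0.5}

decoded = viterbi(tokens, labels, emit, trans)
print(decoded)
# ['PER', 'O', 'O', 'TIME', 'O', 'O', 'ORG', 'O', 'LOC']
```

In a trained CRF the two score tables would come from weighted feature functions; only the decoding recursion is shown here.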
9
User feedback in the wild as labeling
Labeling for Classification
Seminar: How to Organize your Life, by Jane
Smith, Stevenson Smith Mezzanine Level,
Papadopoulos Sq, 3:30 pm Thursday March 31. In
this seminar we will learn how to use CALO to...
Seminar announcement
Todo request
Other
Easy. Often found in user interfaces, e.g. CALO
IRIS, Apple Mail.
10
Multiple-choice Annotation for Learning
Extractors in the wild
Culotta, McCallum 2005
Task: Information extraction. Fields: NAME,
COMPANY, ADDRESS (and others)
Jane Smith , Stevenson Smith , Mezzanine Level,
Papadopoulos Sq.
11
Multiple-choice Annotation for Learning
Extractors in the wild
Culotta, McCallum 2005
Task: Information extraction. Fields: NAME,
COMPANY, ADDRESS (and others)
Jane Smith , Stevenson Smith , Mezzanine Level,
Papadopoulos Sq.
Interface presents top hypothesized segmentations
[Figure: three alternative segmentations of "Jane Smith , Stevenson Smith Mezzanine Level , Papadopoulos Sq."]
User corrects labels, not segmentations.
12
Multiple-choice Annotation for Learning
Extractors in the wild
Culotta, McCallum 2005
Task: Information extraction. Fields: NAME,
COMPANY, ADDRESS (and others)
Jane Smith , Stevenson Smith , Mezzanine Level,
Papadopoulos Sq.
Interface presents top hypothesized segmentations
[Figure: three alternative segmentations of "Jane Smith , Stevenson Smith Mezzanine Level , Papadopoulos Sq."]
29 percent reduction in user actions needed to
train
13
Outline
  • Information Extraction
  • Learning in the wild
  • Transfer learning
  • Identity Uncertainty
  • Modeling Groups, Roles and Topics

14
Piecewise Training in Factorial CRFs for Transfer
Learning
Sutton, McCallum, 2005
Emailed seminar announcement entities
Email English words
60k words training.
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell
School of Computer Science
Carnegie Mellon University
3:30 pm, 7500 Wean Hall
Machine learning has evolved from obscurity
in the 1970s into a vibrant and popular
discipline in artificial intelligence during the
1980s and 1990s. As a result of its success and
growth, machine learning is evolving into a
collection of related disciplines: inductive
concept acquisition, analytic learning in problem
solving (e.g. analogy, explanation-based
learning), learning theory (e.g. PAC learning),
genetic algorithms, connectionist learning,
hybrid systems, and so on.
Too little labeled training data.
15
Piecewise Training in Factorial CRFs for Transfer
Learning
Sutton, McCallum, 2005
Train on related task with more data.
Newswire named entities
Newswire English words
200k words training.
CRICKET - MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side Boland
said on Thursday they had signed Leicestershire
fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in
1992, replaces former England all-rounder Phillip
DeFreitas as Boland's overseas professional.
16
Piecewise Training in Factorial CRFs for Transfer
Learning
Sutton, McCallum, 2005
At test time, label email with newswire NEs...
Newswire named entities
Email English words
17
Piecewise Training in Factorial CRFs for Transfer
Learning
Sutton, McCallum, 2005
...then use these labels as features for final task
Emailed seminar announcement entities
Newswire named entities
Email English words
18
Piecewise Training in Factorial CRFs for Transfer
Learning
Sutton, McCallum, 2005
Use joint inference at test time.
Seminar Announcement entities
Newswire named entities
English words
An alternative to hierarchical Bayes. Needn't
know anything about parameterization of subtask.
Accuracy: No transfer < Cascaded Transfer <
Joint Inference Transfer
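The cascaded variant described on the preceding slides (label the email with a newswire named-entity model, then use those labels as features for the final task) can be sketched as a feature-building step. This is only an illustration under stated assumptions: `newswire_tagger_stub` is a hypothetical dictionary lookup standing in for a model trained on newswire, and the feature names are invented for the example. It does not show the joint-inference variant.

```python
from typing import Callable, Dict, List


def newswire_tagger_stub(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a named-entity model trained on newswire."""
    gazetteer = {"Jaime": "PER", "Carbonell": "PER",
                 "Carnegie": "ORG", "Mellon": "ORG", "University": "ORG"}
    return [gazetteer.get(tok, "O") for tok in tokens]


def cascaded_features(tokens: List[str],
                      tagger: Callable[[List[str]], List[str]]
                      ) -> List[Dict[str, float]]:
    """Per-token features for the final task, including the upstream NE label."""
    ne_labels = tagger(tokens)
    features = []
    for tok, ne in zip(tokens, ne_labels):
        features.append({
            "word=" + tok.lower(): 1.0,
            "is_capitalized": float(tok[:1].isupper()),
            "newswire_ne=" + ne: 1.0,  # the transferred signal
        })
    return features


email_tokens = ["Jaime", "Carbonell", "speaks", "at", "Wean", "Hall"]
feats = cascaded_features(email_tokens, newswire_tagger_stub)
print(feats[0])
```

A seminar-announcement extractor trained on these features can learn, for example, that a newswire PER label is evidence for the speaker field, even with little labeled email data. In the cascade the upstream label is fixed before the final task runs, whereas joint inference lets the two label layers influence each other.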
19
CRF Transfer Learning Results
Sutton, McCallum, 2005
Seminar Announcements Dataset (Freitag 1998)
CRF                 location  speaker  stime  etime  overall
No transfer         73.7      81.0     99.1   97.3   87.8
Cascaded transfer   74.2      84.3     99.2   96.0   88.4
Joint transfer      76.3      85.3     99.1   96.0   89.2
New best published accuracy on common dataset
20
Outline
  • Information Extraction
  • Learning in the wild
  • Transfer learning
  • Identity Uncertainty
  • Modeling Groups, Roles and Topics

21
Joint Co-reference Decisions, Discriminative Model
Culotta, McCallum 2005
[Diagram: People mentions (Stuart Russell, Stuart Russell, S. Russel), with a Y/N coreference decision for each pair of mentions]
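To make the pairwise Y/N decisions concrete, here is a minimal sketch that turns a set of "Y" decisions into entity clusters by transitive closure (union-find). It is a simpler stand-in, not the joint discriminative model from the slide: the decisions are supplied by hand rather than scored by a model, and the function name is an assumption.

```python
from typing import List, Tuple


def cluster_mentions(mentions: List[str],
                     yes_pairs: List[Tuple[int, int]]) -> List[List[str]]:
    """Group mentions into entities from pairwise 'Y' decisions (by index)."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in yes_pairs:
        parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    return list(clusters.values())


mentions = ["Stuart Russell", "Stuart Russell", "S. Russel"]
# Suppose pairs (0,1) and (1,2) are judged Y while (0,2) is judged N.
merged = cluster_mentions(mentions, [(0, 1), (1, 2)])
print(merged)
# [['Stuart Russell', 'Stuart Russell', 'S. Russel']]
```

The example shows why independent pairwise decisions are awkward: Y, Y, N is inconsistent, and transitive closure silently overrides the N. Making the decisions jointly lets the model weigh such conflicts instead.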
22
Co-reference for Multiple Entity Types
Culotta, McCallum 2005
[Diagram: People mentions (Stuart Russell, Stuart Russell, S. Russel) and Organizations mentions (University of California at Berkeley, Berkeley, Berkeley), with a Y/N coreference decision for each pair of mentions of the same type]
23
Joint Co-reference of Multiple Entity Types
Culotta, McCallum 2005
[Diagram: People mentions (Stuart Russell, Stuart Russell, S. Russel) and Organizations mentions (University of California at Berkeley, Berkeley, Berkeley), with a Y/N coreference decision for each pair of mentions of the same type]
Reduces error by 22%
24
Joint Co-reference Experimental Results
Culotta, McCallum 2005
CiteSeer Dataset: 1500 citations, 900 unique
papers, 350 unique venues
                Paper           Venue
                indep   joint   indep   joint
constraint      88.9    91.0    79.4    94.1
reinforce       92.2    92.2    56.5    60.1
face            88.2    93.7    80.9    82.8
reason          97.4    97.0    75.6    79.5
Micro Average   91.7    93.4    73.1    79.1
                Δerror = 20%    Δerror = 22%
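The two error-reduction figures follow from the micro-average rows. The sketch below recomputes them, assuming the scores are percentages where 100 is perfect, so that error is 100 minus the score. The function name is illustrative.

```python
def relative_error_reduction(score_before: float, score_after: float) -> float:
    """Percent of the remaining error removed when a score improves."""
    err_before = 100.0 - score_before
    err_after = 100.0 - score_after
    return 100.0 * (err_before - err_after) / err_before


paper = relative_error_reduction(91.7, 93.4)  # indep -> joint, papers
venue = relative_error_reduction(73.1, 79.1)  # indep -> joint, venues
print(round(paper), round(venue))  # 20 22
```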
25
Outline
  • Information Extraction
  • Learning in the wild
  • Transfer learning
  • Identity Uncertainty
  • Modeling Groups, Roles and Topics

26
Social network from my email
27
Clustering words into topics with Latent
Dirichlet Allocation
Blei, Ng, Jordan 2003
28
Example topics induced from a large collection of
text
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
Tenenbaum et al
30
From LDA to Author-Recipient-Topic (ART)
31
Inference and Estimation
  • Gibbs Sampling
  • Easy to implement
  • Reasonably fast
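To illustrate why Gibbs sampling counts as easy to implement, here is a minimal collapsed Gibbs sampler for plain LDA, using only the Python standard library. It is a sketch under stated assumptions, not the ART sampler from this talk: it has no authors or recipients, and the hyperparameter values, the toy corpus, and all names are invented for the example.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], num_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 50, seed: int = 0
              ) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assign = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assign, doc_topic, topic_word


# Toy corpus over a 4-word vocabulary: 0=job, 1=career, 2=ball, 3=game.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
assign, doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=4)
print(doc_topic)
```

Each sweep only decrements three counts, samples from a small discrete distribution, and increments three counts, which is what keeps the method simple and reasonably fast.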

32
Enron Email Corpus
  • 250k email messages
  • 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere_at_enron.com
To: steve.hooser_at_enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has
requested an electronic copy of our final draft?
Are you OK with this? If so, the only version I
have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin_at_enron.com
33
Topics, and prominent sender/receivers discovered
by ART
Titles chosen by me
34
Topics, and prominent sender/receivers discovered
by ART
Beck: Chief Operations Officer
Dasovich: Government Relations Executive
Shapiro: Vice President of Regulatory Affairs
Steffes: Vice President of Government Affairs
35
Comparing Role Discovery
Traditional SNA
Author-Topic
ART
connection strength (A,B)
distribution over recipients
distribution over authored topics
distribution over authored topics
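Each approach above describes a person by a probability distribution, so two people's roles can be compared by measuring how far apart their distributions are. The slide does not name the measure; the sketch below assumes Jensen-Shannon divergence as one reasonable choice, and the three topic distributions are hypothetical.

```python
import math
from typing import List


def kl(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; skips zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence: symmetric, 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical topic distributions for three people.
person_a = [0.7, 0.2, 0.1]
person_b = [0.7, 0.2, 0.1]
person_c = [0.1, 0.2, 0.7]

same_role = js_divergence(person_a, person_b)
diff_role = js_divergence(person_a, person_c)
print(same_role, diff_role)
```

A small divergence suggests similar roles and a large one suggests different roles. The same function applies whether the vectors are distributions over recipients or over topics.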
36
Comparing Role Discovery: Tracy Geaconne vs. Dan
McCarty
Traditional SNA
Author-Topic
ART
Different roles
Different roles
Similar roles
Geaconne: Secretary; McCarty: Vice President
37
Comparing Role Discovery: Tracy Geaconne vs. Rod
Hayslett
Traditional SNA
Author-Topic
ART
Very similar
Not very similar
Different roles
Geaconne: Secretary; Hayslett: Vice President
and CTO
38
Comparing Role Discovery: Lynn Blair vs. Kimberly
Watson
Traditional SNA
Author-Topic
ART
Very different
Very similar
Different roles
Blair: Gas pipeline logistics; Watson:
Pipeline facilities planning
39
Comparing Group Discovery: Enron TransWestern
Division
Traditional SNA
Author-Topic
ART
Not
Not
Block structured
40
McCallum Email Corpus 2004
  • January - October 2004
  • 23k email messages
  • 825 people

From: kate_at_cs.umass.edu
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: mccallum_at_cs.umass.edu
There is pertinent stuff
on the first yellow folder that is completed
either travel or other things, so please sign
that first folder anyway. Then, here is the
reminder of the things I'm still waiting
for: NIPS registration receipt. CALO
registration receipt. Thanks, Kate
41
McCallum Email Blockstructure
42
Four most prominent topics in discussions with
____?
44
Two most prominent topics in discussions with
____?
45
Topic 37
46
Topic 40
48
Pairs with highest rank difference between ART
and SNA
5 other professors, 3 other ML researchers
49
Role-Author-Recipient-Topic Models
50
Year Three Plans: People
  • Extraction, for Expert-finding and Group/Role
    Analysis
      • Make learning-in-the-wild practical for
        extraction.
      • Transfer from noisy/incomplete databases to
        improve IE.
      • Support questions about contact info,
        organizational affiliation, etc.
  • Identity Uncertainty
      • Central problem for going from text to knowledge
        base.
      • Many interacting entity types, relationships.
  • Group/Role/Topic Analysis
      • Explicit topic models of groups, roles,
        expertise, tasks, and their interaction with
        extraction...
      • Support Qs about topical expertise, forwarding
        messages, team building.
  • Etc.
      • Continue to support and enhance MALLET toolkit,
        in collaboration with UPenn and others.