MASS COLLABORATION AND DATA MINING - PowerPoint PPT Presentation

Slides: 42

Provided by: JoshuaS97

Learn more at: https://pages.cs.wisc.edu

Transcript and Presenter's Notes

Title: MASS COLLABORATION AND DATA MINING

1
MASS COLLABORATION AND DATA MINING

Raghu Ramakrishnan
Founder and CTO, QUIQ
Professor, University of Wisconsin-Madison
Keynote Talk, KDD 2001, San Francisco

2
DATA MINING
Extracting actionable intelligence from large
datasets

Is it a creative process requiring a unique
combination of tools for each application?
Or is there a set of operations that can be
composed using well-understood principles to
solve most target problems?
Or perhaps there is a framework for addressing
large classes of problems that allows us to
systematically leverage the results of mining.

3
MINING APPLICATION CONTEXT

Scalability is important.
But when is 2x speed-up or scale-up important?
When is 10x unimportant?
What is the appropriate measure, model?
Recall, precision
MT for search vs. MT for content conversion

Answers to these questions come from the context
of the application.
4
TALK OUTLINE

A New Approach to Customer Support
Mass Collaboration
Technical challenges
A framework and infrastructure for P2P knowledge
capture and delivery
Role of data mining
Confluence of DB, IR, and mining

5
TYPICAL CUSTOMER SUPPORT
Web Support KB
Customer
Support Center
6
TRADITIONAL KNOWLEDGE MANAGEMENT
QUESTION
KNOWLEDGE BASE
ANSWER
EXPERTS
Knowledge created and structured by trained
experts using a rigorous process.
CONSUMERS
7
MASS COLLABORATION
QUESTION
KNOWLEDGE BASE
People using the web to share knowledge and help
each other find solutions
SELF SERVICE
Answer added to power self service
MASS COLLABORATION
ANSWER
-Experts -Partners -Customers -Employees
8
TIMELY ANSWERS
77% of answers are provided within 24h

No effort to answer each question
No added experts
No monetary incentives for enthusiasts

Questions: 6,845 (74% answered)
Answers provided in 3h: 40% (2,057)
Answers provided in 12h: 65% (3,247)
Answers provided in 24h: 77% (3,862)
Answers provided in 48h: 86% (4,328)
9
MASS CONTRIBUTION
Users who on average provide only 2 answers
provide 50% of all answers
Answers: 100% (6,718)
Answers contributed by mass of users: 50% (3,329)
Contributing users who are top users: 7% (120)
Contributing users in the mass of users: 93% (1,503)
10
POWER OF KNOWLEDGE CREATION
SUPPORT
Knowledge Creation
SHIELD 1: Self-Service *) -85%
SHIELD 2: Customer Mass Collaboration *) -64%
Agent Cases: 5% of Support Incidents
*) Averages from QUIQ implementations
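As a rough check of the figures on this slide, the two reductions can be compounded. This is a minimal sketch that assumes each percentage applies to the incidents left over after the previous shield; the slide does not state this, and the function name is hypothetical.

```python
def remaining_after_shields(incidents, reductions):
    """Apply each fractional reduction to what is left after the previous one."""
    remaining = incidents
    for reduction in reductions:
        remaining *= (1.0 - reduction)
    return remaining


# 100 support incidents; self-service deflects 85%,
# mass collaboration deflects 64% of the remainder (assumed order).
agent_cases = remaining_after_shields(100.0, [0.85, 0.64])
print(round(agent_cases, 1))
```

Under that assumption the result comes to roughly 5 agent cases per 100 support incidents, which agrees with the 5% shown.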
11
TYPICAL SERVICE CHAIN
Shares shown: 40%, 50%, 10%
Self Service: Knowledge base, FAQ
Auto Email
Manual Email
Chat
Call Center
2nd Tier Support

QUIQ SERVICE CHAIN
Shares shown: 80%, 15%, 5%
QUIQ: Self Service, Mass Collaboration
Manual Email
Chat
Call Center
2nd Tier Support


12
CASE STUDIES: COMPAQ
In newsgroups, conversations disappear and you
have to ask the same question over and over
again. The thing that makes the real difference
is the ability for customers to collaborate and
have information be persistent. That's how we
found QUIQ. It's exactly the philosophy we're
looking for.
Tech support people can't
keep up with generating content and are not
experts on how to effectively utilize the product.
Mass Collaboration is the next step in Customer
Service.
Steve Young, VP of Customer Care, Compaq
13
ASP 2001 Top Ten Support Site
Austin-based National Instruments deployed a
Network to capture the specialized knowledge of
its clients and take the burden off its costly
support engineers, and is pleased with the
results. QUIQ increased customer participation,
flattened call volume and continues to do the
work of 50 support engineers.

David Daniels, Jupiter Media Metrix
14
MASS COLLABORATION
Internet-scale P2P knowledge sharing
Communities Knowledge Management Service
Workflows
Mass Collaboration
Many Experts
Support Newsgroups
Few Experts
Support Knowledge Base
Call Center
Solutions
Interactions
15
CORPORATE MEMORY: Untapped Knowledge in Extended
Business Community
16
User-to-User Exchange
User-to-Enthusiast
Structured User Forum
User-to-Expert
Self-Organizing
Incentive to Participate
User Acquisition
Web Site
Areas of Interest
17
GOALS AND ISSUES

Interactions must be structured to encourage
creation of solutions
Resolve issue; escalate if necessary
Capture knowledge from interactions
Encourage participation
Sociology
Privacy, security
Credibility, authority, history
Accountability, incentives

18
REQUIRED CAPABILITIES

Roles: Credibility, administration
Moderators, experts, editors, enthusiasts
Groups: Privacy, security, entitlements
Departments, gold customers
Workflow: QoS, validation, escalation

19
TECHNICAL CHALLENGES
20
SEARCHING PEOPLE-BASES
ROUTING, NOTIFICATION
?
SEARCH
If it's not there, find someone who knows, and
get it there (knowledge creation)!
21
QUIQ, the Best in Class Support Channel
Figure: support incidents vs. agent cases for each support channel
SUPPORT
Email Support, Call Center with Automated Emails 1): -20% (100 support incidents, 80 agent cases)
Web Self-Service: Self-Service 2) -42%
Mass Collaboration (Knowledge Creation): Self-Service -85%, Customer Mass Collaboration -64% (5 agent cases)
Other value shown in the figure: 68
1) Source: QUIQ Client Information 2) Source:
Association of Support Professionals
22
SEARCH AND INDEXING

User types in "How can I configure the IP address
on my Presario?"
Need to find most relevant content that is of
high quality and is approved for external
viewing, and that this user is entitled to see
based on her roles, groups, and service levels.
User decides to post question because no good
answer was found in the KB.
Search controls when experts and other users will
see this new question; need to make this
real-time.
Concurrency, recovery issues!

23
SEARCH AND INDEXING

Data is organized into tabular channels:
Questions, responses, users, ...
Each item has several fields, e.g., a question:
Author id, author status, service level, item
popularity metrics, rating metrics, answer
status, approval status, visibility group, update
timestamp, notification timestamp, usage
signature, category, relevant products, relevant
problems, subject, body, responses

Which 5 items should be returned?
24
RUNTIME ARCHITECTURE
Web server
Web server
Hive Manager
Email
Real-time Indexing, Caching, Alerts
Cache
Alerts
Indexer
Files, Logs
DBMS
Warehouse
RAID STORAGE
25
LEARNING FROM ACTIVITY DATA TO KNOWLEDGE
Periodic offline activity
Miner
Indexer
Large R/W
Small reads
Files, Logs
DBMS
Warehouse
RAID STORAGE
26
SEARCH AND INDEXING
Which 5 items should be returned?

Question text, user attributes, system policies
IR-style ranked output
Search constraints
Show matches; subject match twice as important
Show only approved answers to non-editors
Give preference to category Laptop
Give preference to recent solutions
Weight quality of solution

27
VECTOR SPACE MODEL

Documents, queries are vectors in term space
Vector distance from the query is used to rank
retrieved documents

$Q = (w_{11}, w_{12}, \ldots, w_{1t})$
$D = (w_{21}, w_{22}, \ldots, w_{2t})$
$sim(Q, D) = \sum_{i=1}^{t} w_{1i} \cdot w_{2i}$ (unnormalized)

The i-th term in the summation can be seen as the
relevance contribution of term i
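The unnormalized similarity on this slide is a plain dot product over the term space. The sketch below illustrates it with made-up weights; the vectors, document names, and function names are assumptions for illustration, not values from the talk.

```python
def term_contributions(q, d):
    """Per-term relevance contributions w_1i * w_2i."""
    if len(q) != len(d):
        raise ValueError("vectors must share the same term space")
    return [wq * wd for wq, wd in zip(q, d)]


def similarity(q, d):
    """Unnormalized similarity: the sum of the per-term contributions."""
    return sum(term_contributions(q, d))


# Hypothetical weights over a three-term space.
query = [1.0, 0.0, 2.0]
docs = {"doc_a": [0.5, 3.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}

# Rank documents by decreasing similarity to the query.
ranked = sorted(docs, key=lambda name: similarity(query, docs[name]), reverse=True)
print(ranked)
```

Because the score is not normalized, long documents with large weights are favored; a cosine measure would divide by the vector lengths.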
28
TF-IDF DOCUMENT VECTOR
29
A HYBRID DB-IR SYSTEM

Searches are queries with three parts:
Filter: DB-style yes/no criteria
Match: TF-IDF relevance based on a combination of fields
Quality: Relevance boost based on a policy
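One way to picture the three parts is a single pass that filters, scores the match, and then applies a quality boost. This is only a sketch under stated assumptions: the item fields, the smoothed IDF variant, the 2x subject weight (borrowed from the search constraints slide), and the rating-based boost are all illustrative choices, not the QUIQ implementation.

```python
import math
from collections import Counter

# Hypothetical items; field names are illustrative only.
ITEMS = [
    {"id": 1, "approved": True, "subject": "configure ip address",
     "body": "open network settings on the presario", "rating": 0.9},
    {"id": 2, "approved": False, "subject": "ip address conflict",
     "body": "two machines share an ip address", "rating": 0.8},
    {"id": 3, "approved": True, "subject": "battery life",
     "body": "reduce screen brightness", "rating": 0.5},
]


def tokens(text):
    return text.lower().split()


def idf(term, items, field):
    # Smoothed inverse document frequency (an assumed variant).
    n = sum(1 for it in items if term in tokens(it[field]))
    return math.log((1 + len(items)) / (1 + n)) + 1.0


def field_score(query, item, items, field):
    # TF-IDF match of the query against one field of one item.
    tf = Counter(tokens(item[field]))
    return sum(tf[t] * idf(t, items, field) for t in tokens(query))


def search(query, items, viewer_is_editor=False, k=5):
    results = []
    for it in items:
        # Filter: yes/no criterion (only approved answers for non-editors).
        if not viewer_is_editor and not it["approved"]:
            continue
        # Match: subject counted twice as heavily as body (assumed weights).
        match = (2.0 * field_score(query, it, items, "subject")
                 + field_score(query, it, items, "body"))
        if match == 0:
            continue
        # Quality: policy-driven boost, here based on a rating in [0, 1].
        results.append((it["id"], match * (1.0 + it["rating"])))
    return sorted(results, key=lambda r: r[1], reverse=True)[:k]


print(search("ip address", ITEMS))
print(search("ip address", ITEMS, viewer_is_editor=True))
```

The same query returns different result lists for an editor and a non-editor, which is the point of keeping the filter separate from the relevance computation.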

30
A HYBRID DB-IR SYSTEM

A query is built up from atomic constraints using
Boolean operators.
Atomic constraint:
value op term, constraint-type
Terms are drawn from discrete domains and are of
two types: hierarchy and scalar
Constraint-type is exact or approximate

31
A HYBRID DB-IR SYSTEM

Applying an atomic constraint to a set of items
returns a tagged result set
The result inherits the constraint-type
Each result item has a (TF-IDF) relevance score
(0 for exact)
Combining two tagged item sets using Boolean
operators yields a tagged set
The result type is exact if both inputs are
exact, and approximate otherwise
Result contains intersection of input item sets
if either input is exact; union otherwise
Each result item is tagged with a combined
relevance
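The combination rules on this slide can be sketched as a small function over tagged sets. The slide does not say how the combined relevance is computed, so the sum used here is an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaggedSet:
    exact: bool               # constraint-type inherited by the result
    scores: Dict[int, float]  # item id -> relevance (0.0 for exact constraints)


def combine(a: TaggedSet, b: TaggedSet) -> TaggedSet:
    # Result type: exact only if both inputs are exact.
    exact = a.exact and b.exact
    # Item set: intersection if either input is exact, union otherwise.
    if a.exact or b.exact:
        ids = a.scores.keys() & b.scores.keys()
    else:
        ids = a.scores.keys() | b.scores.keys()
    # Assumption: combined relevance is the sum of the input relevances.
    scores = {i: a.scores.get(i, 0.0) + b.scores.get(i, 0.0) for i in ids}
    return TaggedSet(exact, scores)


# Hypothetical tagged sets for three atomic constraints.
approved = TaggedSet(True, {1: 0.0, 3: 0.0})
about_ip = TaggedSet(False, {1: 1.0, 2: 0.5})
laptop = TaggedSet(False, {1: 0.5, 3: 0.25})

left = combine(combine(approved, about_ip), laptop)
right = combine(approved, combine(about_ip, laptop))
print(left == right, left)
```

On this example both groupings produce the same tagged set, in line with the associativity claimed for the semantics.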

32
A HYBRID DB-IR SYSTEM

Semantics of Boolean expressions over constraints
is associative and commutative
Evaluating exact constraints and approximate
constraints separately (in DB and IR subsystems)
is a special case. Additionally:
Uniform handling of relevance contributions of
categories, popularity metrics, recency, etc.
Absolute and relative relevance modifiers can be
introduced for greater flexibility.

33
CONCURRENCY, RECOVERY, PARALLELISM

Concurrency
Index is updated in real-time
Automatic partitioning, two-step locking protocol
result in very low overhead
Relies upon post-processing to address some
anomalies
Recovery
Partitioning is again the key
Leverages recovery guarantees of DBMS
Approach also supports efficient refresh of
global statistics
Parallelism
Hash based partitioning
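A minimal sketch of hash-based partitioning follows. The slides do not say which key or hash function is used, so the item id key, the CRC32 hash, and the function names are assumptions for illustration.

```python
import zlib


def partition_for(item_id: str, num_partitions: int) -> int:
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(item_id.encode("utf-8")) % num_partitions


def partition_items(item_ids, num_partitions):
    """Assign every item id to exactly one partition."""
    parts = [[] for _ in range(num_partitions)]
    for item_id in item_ids:
        parts[partition_for(item_id, num_partitions)].append(item_id)
    return parts


# Hypothetical item ids.
ids = [f"question-{n}" for n in range(20)]
parts = partition_items(ids, 4)
print([len(p) for p in parts])
```

Since each item maps to one partition, an index update only needs to lock that partition, which is one plausible reading of why partitioning helps with concurrency and recovery as well as parallelism.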

34
NOTIFICATION

Extension of search: each user can define one or
more standing searches, and request instant or
periodic notification.
Boolean combinations of atomic constraints.
Major challenges
Scaling with number of standing searches.
Requires multiple timestamps, indexing searches.
Exactly-once delivery property.
Many subtleties center around notifiability of
updates!
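The role of timestamps in avoiding duplicate notifications can be sketched with a single standing search that remembers when it last notified. This is an in-memory illustration only: the class, field names, and polling function are hypothetical, and it does not address scaling to many standing searches or delivery across failures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StandingSearch:
    user: str
    predicate: Callable[[Dict], bool]  # stands in for a Boolean constraint query
    last_notified: float = 0.0         # notification timestamp


def poll(search: StandingSearch, items: List[Dict], now: float) -> List[int]:
    """Return ids of matching items updated since the last notification."""
    fresh = [
        it["id"]
        for it in items
        if search.last_notified < it["updated"] <= now and search.predicate(it)
    ]
    # Advance the notification timestamp so the same update is not re-sent.
    search.last_notified = now
    return fresh


# Hypothetical items with update timestamps.
items = [
    {"id": 1, "category": "Laptop", "updated": 10.0},
    {"id": 2, "category": "Printer", "updated": 12.0},
]
laptops = StandingSearch("alice", lambda it: it["category"] == "Laptop")

first = poll(laptops, items, now=15.0)
second = poll(laptops, items, now=20.0)
items[0]["updated"] = 25.0
third = poll(laptops, items, now=30.0)
print(first, second, third)
```

Each update is reported once per standing search as long as timestamps only move forward, which hints at why notifiability of updates is subtle in a concurrent system.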

35
ROLE OF DATA MINING
36
DATA MINING TASKS

There is a lot of insight to be gained by
analyzing the data.
What will help the user with her problem?
Who does a given user trust?
Characteristic metrics for high-quality content.
Identify helpful content in similar, past
queries.
Summarize content.
Who can answer this question?

37
LEVERAGING DATA MINING

How do we get at the data?
Relevant information is distributed across
several sources, not just the DBMS.
Aggregated in a warehouse.
How do we incorporate the insights obtained by
mining into the search phase?
Need to constantly update info about every piece
of content (Qs, As, users)

38
LEVERAGING DATA MINING

Three-step approach:
Off-line analysis to gather new insight
Periodically refresh indexes
Use insight (from KB/index) to improve search
using the extended DB/IR query framework

Use mining to create useful metadata
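The three steps can be sketched end to end with a toy quality metric. Everything specific here is an assumption for illustration: the activity log format, the helpful-vote ratio as the mined metric, the index layout, and the multiplicative boost.

```python
from collections import defaultdict


def mine_quality(activity_log):
    """Step 1 (offline): fraction of helpful votes per item."""
    votes = defaultdict(lambda: [0, 0])  # item id -> [helpful, total]
    for item_id, helpful in activity_log:
        votes[item_id][1] += 1
        if helpful:
            votes[item_id][0] += 1
    return {item_id: h / t for item_id, (h, t) in votes.items()}


def refresh_index(index, quality):
    """Step 2 (periodic): store the mined metric as metadata in the index."""
    for item_id, q in quality.items():
        if item_id in index:
            index[item_id]["quality"] = q


def rank(index, match_scores):
    """Step 3 (search time): boost match relevance by the stored quality."""
    def boosted(item_id):
        return match_scores[item_id] * (1.0 + index[item_id].get("quality", 0.0))
    return sorted(match_scores, key=boosted, reverse=True)


# Hypothetical activity log: (item id, was the answer marked helpful?)
log = [(1, True), (1, True), (2, False), (2, True)]
index = {1: {}, 2: {}}
match_scores = {1: 1.0, 2: 1.2}

before = rank(index, match_scores)
refresh_index(index, mine_quality(log))
after = rank(index, match_scores)
print(before, after)
```

Item 2 matches the query text slightly better, but once the mined metadata is in the index, item 1 ranks first.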
39
SOME UNIQUE TWISTS

Identify the kinds of feedback that would be
helpful in refining a search.
I.e., not just specific terms, but the types of
concepts that would be useful discriminators
(e.g., a good hierarchy of feedback concepts)
Metrics of quality
Link-analysis is a good example, but what are the
links here?
Self-tuning searches
The more the knobs, the more the choices
Next step: self-personalizing searches?