Mining Sequential Patterns - PowerPoint PPT Presentation

About This Presentation

Title:

Mining Sequential Patterns

Author:

Mika Klemettinen

Slides: 18

Transcript and Presenter's Notes

Title: Mining Sequential Patterns

1
Course on Data Mining (581550-4) Seminar
Meetings
[Course overview diagram. Topics shown: Association Rules, Clustering, Episodes, KDD Process, Text Mining, Home Exam]
2
Course on Data Mining (581550-4) Seminar
Meetings
Today 16.11.2001

R. Feldman, M. Fresko, H. Hirsh, et al.: "Knowledge Management: A Text Mining Approach", Proc. of the 2nd Int'l Conf. on Practical Aspects of Knowledge Management (PAKM98), 1998.
B. Lent, R. Agrawal, R. Srikant: "Discovering Trends in Text Databases", Proc. of the 3rd Int'l Conference on Knowledge Discovery in Databases and Data Mining, 1997.

3
Course on Data Mining (581550-4) Seminar
Meetings
Good to Read as Background

Both papers refer to the Agrawal and Srikant paper we had last week:
Rakesh Agrawal and Ramakrishnan Srikant: "Mining Sequential Patterns", Int'l Conference on Data Engineering, 1995.

4
Knowledge Management: A Text Mining Approach

R. Feldman, M. Fresko, H. Hirsh, et al.
Bar-Ilan University and Instinct Software, Israel
Rutgers University, USA; LIA-EPFL, Switzerland
Published in PAKM'98 (Int'l Conf. on Practical Aspects of Knowledge Management)
Data Mining course, Autumn 2001, University of Helsinki
Summary by Mika Klemettinen

5
KM: A Text Mining Approach

Basic idea (see selected phases on the next slides):
1. Get input data in SGML (or XML) format. Select only the contents of the desired elements (title, abstract, etc.).
2. Do linguistic preprocessing:
2.1 Term extraction (use linguistic software for this)
2.2 Term generation (combine adjacent terms into morpho-syntactic patterns such as "noun-noun" or "adj.-noun" by calculating association coefficients)
2.3 Term filtering (select only the top M most frequent terms)
3. Create taxonomies (there is a tool for this)
4. Generate associations (you may constrain the generation)
5. Visualize/explore the results
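
Steps 2.2 and 2.3 above can be sketched roughly as follows. This is only an illustration, not the tool used by Feldman et al.: the Dice coefficient is an assumed stand-in for their association coefficients, the morpho-syntactic pattern check (noun-noun, adj.-noun) is left out, and the function names, the threshold and the sample documents are hypothetical.

```python
from collections import Counter

def generate_terms(docs, min_assoc=0.6):
    """Step 2.2 (sketch): combine adjacent words into two-word candidate terms.

    docs is a list of token lists. The Dice coefficient used here is an
    assumption; the slides do not say which association coefficient is used.
    Returns {(word_a, word_b): frequency} for pairs that pass the threshold.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    kept = {}
    for (a, b), n_ab in bigrams.items():
        dice = 2 * n_ab / (unigrams[a] + unigrams[b])
        if dice >= min_assoc:
            kept[(a, b)] = n_ab
    return kept

def filter_terms(term_counts, top_m):
    """Step 2.3 (sketch): keep only the top M most frequent terms."""
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_m]]

# Hypothetical toy corpus, already tokenized.
docs = [
    ["text", "mining", "finds", "patterns"],
    ["text", "mining", "uses", "taxonomies"],
    ["data", "mining", "finds", "rules"],
]
terms = generate_terms(docs)
top_terms = filter_terms(terms, top_m=2)
print(top_terms)
```

In this toy corpus the pair ("data", "mining") is dropped by the association threshold because "mining" mostly occurs with other neighbours, while the frequency filter then keeps the two most common surviving pairs.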

6
2.1 Term Extraction
7
3 Taxonomy Construction
8
4 Association Rule Generation
9
4 Association Rule Generation
10
5.1 Visualization/Exploration
11
5.2 Visualization/Exploration
12
Discovering Trends in Text Databases

Brian Lent, Rakesh Agrawal and Ramakrishnan Srikant
IBM Almaden Research Center, USA
Published in KDD'97
Data Mining course, Autumn 2001, University of Helsinki
Summary by Mika Klemettinen

13
Discovering Trends in Text Databases

Basic ideas:
Identify frequent phrases using sequential pattern mining (see the slide summaries of the Agrawal et al. paper "Mining Sequential Patterns" (MSP))
Generate histories of phrases
Find phrases that satisfy a specified trend
Definitions:
Phrase: a phrase p is <(w1)(w2)...(wn)>, where each w is a word
1-phrase: <<(IBM)> <(data)(mining)>>
2-phrase: <<(IBM)> <(data)(mining)>> <<(Anderson)(Consulting)> <(decision)(support)>>
Itemset, sequence, "is contained", etc. are defined as in the MSP paper

14
Discovering Trends in Text Databases

Gaps: minimum and maximum gaps between adjacent words identify relations between words/phrases inside sentences/paragraphs, between words/phrases in different paragraphs, between words/phrases in different sections, etc.
Sentence boundary: 1000
Paragraph boundary: 100,000
Section boundary: 10,000,000
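
One plausible reading of the boundary values above is that every word gets a running position number, and the counter jumps by the given amount at each boundary, so the gap between two words tells which kind of boundary lies between them. The sketch below illustrates this reading only. The nested-list document layout, the function names and the sample text are hypothetical, and it assumes fewer than 1000 words per sentence and fewer than 100 sentences per paragraph so that the ranges do not overlap.

```python
SENTENCE_GAP = 1_000
PARAGRAPH_GAP = 100_000
SECTION_GAP = 10_000_000

def assign_positions(sections):
    """Number the words of a document, jumping at each boundary.

    sections: list of sections, each a list of paragraphs, each a list of
    sentences, each a list of words (an assumed layout for this sketch).
    Returns a list of (word, position) pairs.
    """
    result = []
    pos = 0
    for s_idx, section in enumerate(sections):
        if s_idx > 0:
            pos += SECTION_GAP
        for p_idx, paragraph in enumerate(section):
            if p_idx > 0:
                pos += PARAGRAPH_GAP
            for t_idx, sentence in enumerate(paragraph):
                if t_idx > 0:
                    pos += SENTENCE_GAP
                for word in sentence:
                    pos += 1
                    result.append((word, pos))
    return result

def relation(pos_a, pos_b):
    """Classify two word positions by the size of the gap between them."""
    gap = abs(pos_b - pos_a)
    if gap < SENTENCE_GAP:
        return "same sentence"
    if gap < PARAGRAPH_GAP:
        return "same paragraph"
    if gap < SECTION_GAP:
        return "same section"
    return "different sections"

# Hypothetical document: two sections, the first with two paragraphs.
document = [
    [[["data", "mining"], ["finds", "trends"]], [["patents", "grow"]]],
    [[["new", "section"]]],
]
positions = assign_positions(document)
print(positions)
```

With such a numbering, a maximum gap below 1000 restricts a phrase to words within one sentence, while a minimum gap of 1000 or more forces its parts into different sentences.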
Phases:
Partition the data/documents based on their time stamps and create phrases for each partition (Lent et al. use patent documents)
Select the frequent phrases and save their frequencies
Define shape queries using SDL (Shape Definition Language)
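
The three phases can be sketched in a simplified form as follows. This is not the method of Lent et al.: contiguous word pairs stand in for phrases found by sequential pattern mining, and a plain "strictly increasing support" check stands in for an SDL shape query. All function names, the support threshold and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict

def phrase_histories(docs, n=2, min_support=0.5):
    """Partition documents by year and record each phrase's support per year.

    docs: list of (year, tokens) pairs. Support is the fraction of the
    year's documents that contain the phrase. Contiguous n-grams are an
    assumed simplification of sequential-pattern phrases.
    """
    by_year = defaultdict(list)
    for year, tokens in docs:
        by_year[year].append(tokens)
    histories = defaultdict(dict)
    for year, token_lists in by_year.items():
        counts = Counter()
        for tokens in token_lists:
            # set() so that each document counts a phrase at most once
            counts.update(set(zip(*(tokens[i:] for i in range(n)))))
        for phrase, count in counts.items():
            histories[phrase][year] = count / len(token_lists)
    # Keep only phrases that are frequent in at least one partition.
    return {p: h for p, h in histories.items() if max(h.values()) >= min_support}

def upward_trend(history, years):
    """Stand-in for a shape query: support rises strictly from year to year."""
    values = [history.get(year, 0.0) for year in years]
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical time-stamped documents.
docs = [
    (1995, ["neural", "network", "chip"]),
    (1995, ["disk", "drive", "cache"]),
    (1995, ["disk", "drive", "motor"]),
    (1996, ["neural", "network", "chip"]),
    (1996, ["neural", "network", "model"]),
    (1996, ["disk", "drive", "cache"]),
    (1997, ["neural", "network", "chip"]),
    (1997, ["neural", "network", "model"]),
    (1997, ["neural", "network", "cache"]),
]
years = [1995, 1996, 1997]
histories = phrase_histories(docs)
trending = sorted(p for p, h in histories.items() if upward_trend(h, years))
print(trending)
```

Saving the per-partition support values as a history is what makes the final step cheap: a trend query only has to scan short numeric sequences, not the documents themselves.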