Title: Data Mining
1. Data Mining / Machine Learning: Introduction
- Intelligent Systems Lab.
- Soongsil University
Thanks to Raymond J. Mooney at the University of Texas at Austin, and Isabelle Guyon
2. Artificial Intelligence (AI): Research Areas
(Diagram: Artificial Intelligence viewed along three axes)
- Research: Learning Algorithms, Inference Mechanisms, Knowledge Representation, Intelligent System Architecture
- Application: Intelligent Agents, Information Retrieval, Electronic Commerce, Data Mining, Bioinformatics, Natural Language Processing, Expert Systems
- Paradigm: Rationalism (Logical), Empiricism (Statistical), Connectionism (Neural), Evolutionary (Genetic), Biological (Molecular)
3. Artificial Intelligence (AI): Paradigms
4. What is Machine Learning?
(Diagram: TRAINING DATA → Trained machine; Query → Trained machine → Answer)
5. Definition of Learning
- Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
(Diagram: Experience E and Task T go into a Learning Program, which produces a Learned Program; the Learned Program's Performance P is measured on the task.)
6. What is Learning?
- Herbert Simon: "Learning is any process by which a system improves performance from experience."
7. Machine Learning
- Supervised Learning
- Estimate an unknown mapping from known input-output pairs
- Learn f_w from a training set D = {(x, y)} s.t. f_w(x) ≈ y
- Classification: y is discrete
- Regression: y is continuous
- Unsupervised Learning
- Only input values are provided
- Learn f_w from D = {x}
- Clustering
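A minimal sketch of the three settings above, using scikit-learn on synthetic data (the dataset, models, and hyperparameters here are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # inputs x

# Supervised, discrete y -> classification
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)

# Supervised, continuous y -> regression
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_reg)

# Unsupervised, only x -> clustering
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(clf.predict(X[:3]), reg.predict(X[:3]), clusters[:3])
```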
8. Why Machine Learning?
- Recent progress in algorithms and theory
- Growing flood of online data
- Computational power is available
- Knowledge engineering bottleneck: develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task
- Budding industry
9. Niches Using Machine Learning
- Data mining from large databases
- Market basket analysis (e.g., diapers and beer)
- Medical records → medical knowledge
- Software applications we can't program by hand
- Autonomous driving
- Speech recognition
- Self-customizing programs for individual users
- Spam mail filter
- Personalized tutoring
- Newsreader that learns user interests
10. Trends Leading to the Data Flood
- More data is generated:
- Bank, telecom, other business transactions ...
- Scientific data: astronomy, biology, etc.
- Web, text, and e-commerce
11. Big Data Examples
- Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
- Storage and analysis are a big problem
- AT&T handles billions of calls per day
- So much data that it cannot all be stored; analysis has to be done on the fly, on streaming data
12. Largest Databases in 2007
- Commercial databases:
- AT&T: 312 TB
- World Data Centre for Climate: 220 TB
- YouTube: 45 TB of videos
- Amazon: 42 TB (250,000 full textbooks)
- Central Intelligence Agency (CIA): ?
13. Data Growth
In 2 years, the size of the largest database
TRIPLED!
14. Machine Learning / Data Mining Application Areas
- Science
- Astronomy, bioinformatics, drug discovery, ...
- Business
- CRM (Customer Relationship Management), fraud detection, e-commerce, manufacturing, sports/entertainment, telecom, targeted marketing, health care, ...
- Web
- Search engines, advertising, web and text mining, ...
- Government
- Surveillance, crime detection, profiling tax cheaters, ...
15. Data Mining for Customer Modeling
- Customer Tasks
- attrition prediction
- targeted marketing
- cross-sell, customer acquisition
- credit-risk
- fraud detection
- Industries
- banking, telecom, retail sales,
16. Customer Attrition: Case Study
- Situation: The attrition rate for mobile phone customers is around 25-30% a year!
- With this in mind, what is our task?
- Assume we have customer information for the past N months.
17. Customer Attrition: Case Study
- Task:
- Predict who is likely to attrite next month.
- Estimate customer value and determine the cost-effective offer to make to this customer.
18. Customer Attrition: Results
- Verizon Wireless built a customer data warehouse
- Identified potential attriters
- Developed multiple, regional models
- Targeted customers with a high propensity to accept the offer
- Reduced the attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)
- (Reported in 2003)
19. Assessing Credit Risk: Case Study
- Situation: A person applies for a loan.
- Task: Should a bank approve the loan?
- Note: People who have the best credit don't need the loans, and people with the worst credit are not likely to repay. The bank's best customers are in the middle.
20. Credit Risk: Results
- Banks develop credit models using a variety of machine learning methods.
- Mortgage and credit card proliferation are the results of being able to successfully predict if a person is likely to default on a loan.
- Widely deployed in many countries
21. Successful e-commerce: Case Study
- Task: Recommend other books (products) this person is likely to buy
- Amazon does clustering based on books bought:
- Customers who bought "Advances in Knowledge Discovery and Data Mining" also bought "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations"
- The recommendation program is quite successful
22. Security and Fraud Detection: Case Study
- Credit card fraud detection
- Detection of money laundering
- FAIS (US Treasury)
- Securities fraud
- NASDAQ KDD system
- Phone fraud
- AT&T, Bell Atlantic, British Telecom/MCI
- Bio-terrorism detection at the Salt Lake Olympics 2002
23. Example Problem: Handwritten Digit Recognition
Handcrafted rules will result in a large number of rules and exceptions. Better to have a machine that learns from a large training set. Wide variability of the same numeral.
24. Chess Game
In 1997, Deep Blue (IBM) beat Garry Kasparov. It raised IBM's stock value by $18 billion that year.
25. Some Successful Applications of Machine Learning
- Learning to drive an autonomous vehicle
- Train computer-controlled vehicles to steer correctly
- Drive at 70 mph for 90 miles on public highways
- Associate steering commands with image sequences
- 1200 computer-generated images as training examples
- Half-hour training
- Additional information from the previous image indicates the darkness or lightness of the road
26. Some Successful Applications of Machine Learning
- Learning to recognize spoken words
- Speech recognition/synthesis
- Natural language understanding/generation
- Machine translation
27. Example 1: Visual Object Categorization
- A classification problem: predict category y based on image x.
- Little chance to hand-craft a solution without learning.
- Applications: robotics, HCI, web search (a real image Google)
28. Face Recognition - 1
Given multiple angles/views of a person, learn to identify them. Learn to distinguish male from female faces.
29. Face Recognition - 2
Learn to recognize emotions and gestures [Li, Ye, Kambhametta, 2003]
30. Robot
Sony AIBO robot: available on June 1, 1999. Weight: 1.6 kg. Adaptive learning and growth capabilities. Simulates emotions such as happiness and anger.
31. Robot
Honda ASIMO (Advanced Step in Innovative Mobility): born on 31 October 2001. Height: 120 cm, Weight: 52 kg.
http://blog.makezine.com/archive/2009/08/asimo_avoids_moving_obstacles.html?CMP=OTC-0D6B48984890
32. Biomedical / Biometrics
- Medicine
- Screening
- Diagnosis and prognosis
- Drug discovery
- Security
- Face recognition
- Signature / fingerprint
- DNA fingerprinting
33. Computer / Internet
- Computer interfaces
- Troubleshooting wizards
- Handwriting and speech
- Brain waves
- Internet
- Spam filtering
- Text categorization
- Text translation
- Recommendation
34. Classification
- Assign an object/event to one of a given finite set of categories.
- Medical diagnosis
- Credit card applications or transactions
- Fraud detection in e-commerce
- Worm detection in network packets
- Spam filtering in email
- Recommended articles in a newspaper
- Recommended books, movies, music, or jokes
- Financial investments
- DNA sequences
- Spoken words
- Handwritten letters
- Astronomical images
35. Problem Solving / Planning / Control
- Performing actions in an environment in order to achieve a goal.
- Solving calculus problems
- Playing checkers, chess, or backgammon
- Driving a car or a jeep
- Flying a plane, helicopter, or rocket
- Controlling an elevator
- Controlling a character in a video game
- Controlling a mobile robot
36. Applications
37. Disciplines Related to Machine Learning
- Artificial intelligence
- Learning symbolic representations of concepts; machine learning as a search problem; learning as a way to improve problem solving; using prior knowledge together with training data
- Bayesian methods
- Bayes' theorem as the basis for calculating probabilities of hypotheses; the naive Bayes classifier; estimating the values of unobserved variables
- Computational complexity theory
- Theoretical bounds on the inherent complexity of different learning tasks, measured in terms of computational effort, number of training examples, number of mistakes, etc.
- Control theory
- Procedures that learn to control processes in order to optimize predefined objectives and to predict the next state of the process being controlled
38. Disciplines Related to Machine Learning (2)
- Information theory
- Measures of entropy and information content; Minimum Description Length approaches; relationship between optimal codes and optimal training sequences
- Philosophy
- Occam's razor (the simplest hypothesis is the best); justification for generalizing beyond observed data
- Psychology and neurobiology
- Neural network models
- Statistics
- Characterization of the errors made when estimating the accuracy of a hypothesis from a limited sample of data; confidence intervals; statistical tests
39. Definition of Learning
- Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
40. Example: Checkers
Task T: playing checkers.
Performance measure P: % of games won.
Training experience E: practice games by playing against itself.
41. Example: Recognizing Handwritten Letters
Task T: recognizing and classifying handwritten words within images.
Performance measure P: % of words correctly classified.
Training experience E: a database of handwritten words with given classifications.
42. Example: Robot Driving
Task T: driving on a public four-lane highway using vision sensors.
Performance measure P: average distance traveled before an error (as judged by a human overseer).
Training experience E: a sequence of images and steering commands recorded while observing a human driver.
43. Designing a Learning System
Task T: playing checkers.
Performance measure P: % of games won.
Training experience E: practice games by playing against itself.
What does this mean, and what can we learn from it?
44. Measuring Performance
- Classification Accuracy
- Solution correctness
- Solution quality (length, efficiency)
- Speed of performance
45. Designing a Learning System
- 1. Choose the training experience.
- 2. Choose exactly what is to be learned, i.e., the target function.
- 3. Choose how to represent the target function.
- 4. Choose a learning algorithm to infer the target function from the experience.
(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
46. Designing a Learning System: 1. Choosing the Training Experience
- Key attributes:
- Direct vs. indirect feedback
- Direct feedback: checkers board states and the correct moves
- Indirect feedback: move sequences and final outcomes
- Credit assignment problem
- Degree of control over the sequence of training examples
- Does the learner select its own training examples, or does a teacher provide them?
- Distribution of examples
- How similar is the distribution of training examples to that of test examples?
- Learning is most reliable when the training examples follow a distribution similar to that of future test examples
- In practice this may not hold (e.g., the checkers world champion may play board positions the self-trained program has never encountered)
47. Training vs. Test Distribution
- Generally assume that the training and test examples are independently drawn from the same overall distribution of data.
- IID: independently and identically distributed
- If examples are not independent, collective classification is required
- (e.g., data from communication networks, financial transaction networks, or social networks)
- If the test distribution is different, transfer learning is required, that is, achieving cumulative learning
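A small sketch of the IID assumption in practice: train and test sets come from one random split of the same data (array names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X.sum(axis=1) > 0).astype(int)   # labels from the same underlying process

# Random split: both subsets are drawn IID from the same distribution,
# so test accuracy estimates performance on future data from that distribution.
perm = rng.permutation(len(X))
train_idx, test_idx = perm[:800], perm[800:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# If X_test instead came from a different source or time period, this
# assumption breaks and transfer learning would be needed.
```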
48. Transfer Learning
- Transfer learning is what happens when someone finds it much easier to learn to play chess having already learned to play checkers
- or to recognize tables having already learned to recognize chairs
- or to learn Spanish having already learned Italian.
- Achieving significant levels of transfer learning across tasks -- that is, achieving cumulative learning -- is perhaps the central problem facing machine learning.
49. Training Experience
- Direct experience: given sample input and output pairs for a useful target function.
- Checker boards labeled with the correct move, e.g., extracted from records of expert play
- Indirect experience: given feedback which is not direct I/O pairs for a useful target function.
- Potentially arbitrary sequences of game moves and their final game results.
- Credit/Blame Assignment Problem: how to assign credit/blame to individual moves given only indirect feedback?
50. Source of Training Data
- Random examples provided outside of the learner's control
- Negative examples available, or only positive?
- Good training examples selected by a benevolent teacher
- "Near miss" examples
- The learner can query an oracle about the class of an unlabeled example in the environment
- The learner can construct an arbitrary example and query an oracle for its label
- The learner can design and run experiments directly in the environment without any human guidance
51. Designing a Learning System
- 1. Choose the training experience.
- 2. Choose exactly what is to be learned, i.e., the target function.
- 3. Choose how to represent the target function.
- 4. Choose a learning algorithm to infer the target function from the experience.
(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
52. Designing a Learning System: 2. Choosing a Target Function
- What type of knowledge will be learned, and how will the performance program use it?
- In the checkers example, the program needs to learn how to choose the best move from among the legal moves for any given board state.
- Could learn a function:
- 1. ChooseMove: B → M (choose a move directly)
- Or
- 2. Evaluation function, V: B → R
- This function assigns a numerical score to any given board state.
- Given V, the best move is selected by evaluating the successor states and choosing the one with the highest value.
- This target function is easier to learn from the kind of indirect experience available here.
53. Designing a Learning System: 2. Choosing the Target Function
- A function that chooses the best move M for any board state B
- ChooseMove: B → M
- Difficult to learn
- It is useful to reduce the problem of improving performance P at task T to the problem of learning some particular target function.
- An evaluation function that assigns a numerical score to any B
- V: B → R
54. The Start of the Learning Work
- Instead of learning ChooseMove, we establish a value function
- Target function: V: B → R
- that maps any legal board state in B to some real value in R.
- Starting from the current position, assign each position a score so that the move leading to the best-scoring position can be chosen.
- 1. If b is a final board state that is won, then V(b) = 100.
- 2. If b is a final board state that is lost, then V(b) = -100.
- 3. If b is a final board state that is drawn, then V(b) = 0.
- 4. If b is not a final board state, then V(b) = ...
55. The Start of the Learning Work
- Instead of learning ChooseMove, we establish a value function
- Target function: V: B → R
- that maps any legal board state in B to some real value in R.
- Starting from the current position, assign each position a score so that the move leading to the best-scoring position can be chosen.
- 1. If b is a final board state that is won, then V(b) = 100.
- 2. If b is a final board state that is lost, then V(b) = -100.
- 3. If b is a final board state that is drawn, then V(b) = 0.
- 4. If b is not a final board state, then V(b) = V(b'),
- where b' is the best final board state that can be achieved starting from b
- (assuming both players play optimally until the end of the game).
- Unfortunately, this did not take us any further!
56. Approximating V(b)
- Computing V(b) is intractable since it involves searching the complete exponential game tree.
- Therefore, this definition is said to be non-operational.
- An operational definition can be computed in reasonable (polynomial) time.
- Need to learn an operational approximation to the ideal evaluation function.
57. Designing a Learning System
- 1. Choose the training experience.
- 2. Choose exactly what is to be learned, i.e., the target function.
- 3. Choose how to represent the target function.
- 4. Choose a learning algorithm to infer the target function from the experience.
(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
58. 3. Choosing a Representation for the Target Function
- Describing the function:
- Tables
- Rules
- Polynomial functions
- Neural nets
- Trade-off in choice:
- Expressive power
- Size of training data
- A more expressive representation can represent a closer approximation to the ideal target function.
- However, the more expressive the representation, the more training data is required to reliably choose among the hypotheses it can represent.
59. Approximate Representation
w1 - w6: weights on the board features x1 - x6 (e.g., piece counts)
60. Linear Function for Representing V(b)
- Use a linear approximation of the evaluation function:
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
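A minimal sketch of this linear evaluation function (the feature values and weights below are illustrative):

```python
import numpy as np

def v_hat(board_features: np.ndarray, weights: np.ndarray) -> float:
    """Linear evaluation: V^(b) = w0 + w1*x1 + ... + w6*x6."""
    return float(weights[0] + np.dot(weights[1:], board_features))

x = np.array([3, 0, 1, 0, 0, 0], dtype=float)          # x1..x6 for some board state b
w = np.array([0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5])   # w0..w6
print(v_hat(x, w))                                      # V^(b) for this board
```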
61. Designing a Learning System
- 1. Choose the training experience.
- 2. Choose exactly what is to be learned, i.e., the target function.
- 3. Choose how to represent the target function.
- 4. Choose a learning algorithm to infer the target function from the experience.
(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
62. 4. Choosing a Function Approximation Algorithm
- A training example is represented as an ordered pair <b, Vtrain(b)>
- b: board state
- Vtrain(b): training value for b
- Instance: black has won the game
- <<x1=3, x2=0, x3=1, x4=0, x5=0, x6=0>, +100>
- (x2 = 0 indicates that white has no remaining pieces.)
- Estimating training values for intermediate board states:
- Vtrain(b) ← V̂(Successor(b))
- V̂: the current approximation to V (i.e., the learned function, the hypothesis)
- Successor(b): the next board state, i.e., the b+1 state at which it is again the program's turn to move
- That is, the training value for the current board state b is estimated from the learned function's value for the later state (b+1).
63. Designing a Learning System: Estimating Training Values
64. Temporal Difference Learning
- Estimate training values for intermediate (non-terminal) board positions by the estimated value of their successor in an actual game trace:
- Vtrain(b) ← V̂(Successor(b))
- where Successor(b) is the next board position at which it is the program's turn to move in actual play.
- Values towards the end of the game are initially more accurate, and continued training slowly backs up accurate values to earlier board positions (see the sketch below).
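A sketch of this estimation step over one recorded game, reusing the v_hat helper from the linear-function sketch above (the trace format and final reward values are illustrative assumptions):

```python
def training_values(game_trace, weights, final_value):
    """Build <b, V_train(b)> pairs from the program's board states in one game.

    game_trace: feature vectors for successive states where the program moves.
    final_value: +100 (win), -100 (loss), or 0 (draw) for the terminal state.
    """
    examples = []
    for i, b in enumerate(game_trace):
        if i + 1 < len(game_trace):
            v_train = v_hat(game_trace[i + 1], weights)  # V_train(b) <- V^(Successor(b))
        else:
            v_train = final_value                         # terminal state: true outcome
        examples.append((b, v_train))
    return examples
```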
65. How to learn?
66. How to learn?
67. How to change the weights?
68. How to change the weights?
69. Obtaining Training Values
- Direct supervision may be available for the target function.
- With indirect feedback, training values can be estimated using temporal difference learning (used in reinforcement learning, where supervision is a delayed reward).
70. Learning Algorithm
- Uses training values for the target function to induce a hypothesized definition that fits these examples and hopefully generalizes to unseen examples.
- In statistics, learning to approximate a continuous function is called regression.
- Attempts to minimize some measure of error (a loss function), such as the mean squared error shown below.
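Following Mitchell's checkers formulation, the error to minimize is the squared error between the training values and the values predicted by the current hypothesis:

```latex
E \;\equiv\; \sum_{\langle b,\, V_{\mathrm{train}}(b)\rangle \,\in\, \text{training examples}} \bigl(V_{\mathrm{train}}(b) - \hat{V}(b)\bigr)^{2}
```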
71. The LMS (Least Mean Squares) Weight Update Rule
- A gradient-descent argument on the squared error leads to the following update rule.
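In Mitchell's formulation the rule is: for each training example <b, Vtrain(b)>, compute V̂(b) with the current weights, then update each weight as wi ← wi + η (Vtrain(b) − V̂(b)) xi, where η is a small learning rate (e.g., 0.1). A minimal sketch in Python (variable names are illustrative):

```python
import numpy as np

def lms_update(weights, board_features, v_train, lr=0.1):
    """One LMS step: w_i <- w_i + lr * (V_train(b) - V^(b)) * x_i."""
    x = np.concatenate(([1.0], board_features))     # x_0 = 1 pairs with w_0
    error = v_train - float(np.dot(weights, x))     # V_train(b) - V^(b)
    return weights + lr * error * x

# Hypothetical example: start from zero weights, one training example
w = np.zeros(7)
b = np.array([3, 0, 1, 0, 0, 0], dtype=float)       # x1..x6 for a board state
w = lms_update(w, b, v_train=100.0)
```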
72. LMS Discussion
- Intuitively, LMS executes the following rules:
- If the output for an example is correct (the error is zero), no weights are changed.
- If the output is too low (the error is positive), the weights are increased in proportion to the values of the corresponding features; this raises the output and reduces the error.
- If the output is too high (the error is negative), the weights are decreased in proportion to the values of the corresponding features; this lowers the output and reduces the error.
- Under the proper weak assumptions, LMS can be proven to eventually converge to a set of weights that minimizes the mean squared error.
73. Lessons Learned about Learning
- Learning =
- acquiring an approximation of an unknown target function from direct or indirect experience.
- Function approximation =
- searching a space of hypotheses for the hypothesis that best fits the training data.
- Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.
74. Various Function Representations
- Numerical functions
- Linear regression
- Neural networks
- Support vector machines
- Symbolic functions
- Decision trees
- Rules in propositional logic
- Rules in first-order predicate logic
- Instance-based functions
- Nearest-neighbor
- Case-based
- Probabilistic Graphical Models
- Naïve Bayes
- Bayesian networks
- Hidden-Markov Models (HMMs)
- Probabilistic Context Free Grammars (PCFGs)
- Markov networks
75. Various Search Algorithms
- Gradient descent
- Perceptron
- Backpropagation
- Dynamic Programming
- HMM Learning
- Probabilistic Context-Free Grammars (PCFGs) Learning
- Divide and Conquer
- Decision tree induction
- Rule learning
- Evolutionary Computation
- Genetic Algorithms (GAs)
- Genetic Programming (GP)
- Neuro-evolution
76. Evaluation of Learning Systems
- Experimental
- Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets (see the sketch after this list).
- Gather data on their performance, e.g., test accuracy, training time, testing time.
- Analyze differences for statistical significance.
- Theoretical
- Analyze algorithms mathematically and prove theorems about their:
- Computational complexity
- Ability to fit training data
- Sample complexity (number of training examples needed to learn an accurate function)
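A minimal sketch of such an experimental comparison (the models, data, and fold count are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # synthetic benchmark task

for model in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold test accuracy
    print(type(model).__name__, scores.mean().round(3), scores.std().round(3))

# A paired statistical test on the per-fold scores (e.g., a paired t-test)
# then checks whether the difference between the two methods is significant.
```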
77. Core Parts of Machine Learning
(Diagram: the four generic modules, with an initial game board and the game history as inputs)
Many machine learning systems can be usefully characterized in terms of these four generic modules.
78. Four Components of a Learning System (1)
- Performance system
- Solves the given performance task
- Uses the learned target function
- New problem → trace of its solution
- Critic
- Outputs a set of training examples of the target function
79. Four Components of a Learning System (2)
- Generalizer
- Input: training examples
- Output: hypothesis (estimate of the target function)
- Generalizes from the specific training examples
- Hypothesizes a general function
- Experiment generator
- Input: current hypothesis
- Output: a new problem
- Picks a new practice problem that maximizes the learning rate (see the sketch of the full loop below)
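A sketch of how these four modules fit together in one training loop (the function signatures are illustrative placeholders, not the slides' actual checkers code):

```python
def run_learning_loop(n_games, hypothesis,
                      experiment_generator, performance_system, critic, generalizer):
    """Iterate the four generic modules of a learning system."""
    for _ in range(n_games):
        problem = experiment_generator(hypothesis)        # e.g., an initial game board
        trace = performance_system(problem, hypothesis)   # solve it using the learned V^
        examples = critic(trace, hypothesis)              # training examples <b, V_train(b)>
        hypothesis = generalizer(examples)                # new estimate of the target function
    return hypothesis
```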
80. History of Machine Learning
- 1950s
- Samuel's checker player
- Selfridge's Pandemonium
- 1960s
- Neural networks: Perceptron
- Pattern recognition
- Learning in the limit theory
- Minsky and Papert prove limitations of Perceptron
- 1970s
- Symbolic concept induction
- Winston's arch learner
- Expert systems and the knowledge acquisition bottleneck
- Quinlan's ID3
- Michalski's AQ and soybean diagnosis
- Scientific discovery with BACON
- Mathematical discovery with AM
81. History of Machine Learning (cont.)
- 1980s
- Advanced decision tree and rule learning
- Explanation-based Learning (EBL)
- Learning and planning and problem solving
- Utility problem
- Analogy
- Cognitive architectures
- Resurgence of neural networks (connectionism, backpropagation)
- Valiant's PAC Learning Theory
- Focus on experimental methodology
- 1990s
- Data mining
- Adaptive software agents and web applications
- Text learning
- Reinforcement learning (RL)
- Inductive Logic Programming (ILP)
- Ensembles: Bagging, Boosting, and Stacking
- Bayes Net learning
82. History of Machine Learning (cont.)
- 2000s
- Support vector machines
- Kernel methods
- Graphical models
- Statistical relational learning
- Transfer learning
- Sequence labeling
- Collective classification and structured outputs
- Computer Systems Applications
- Compilers
- Debugging
- Graphics
- Security (intrusion, virus, and worm detection)
- E-mail management
- Personalized assistants that learn
- Learning in robotics and vision
83. Reminder
- Learning as search in a space of possible hypotheses
- Learning methods are characterized by their search strategies and by the underlying structure of the search spaces.
84. Issues in Machine Learning
- What algorithms exist for learning general target functions from specific training examples?
- How much training data is sufficient?
- When and how can prior knowledge held by the learner guide the process of generalizing from examples?
- What is the best strategy for choosing the next training experience, and how does this choice alter the complexity of the learning problem?
- What is the best way to reduce the learning task to one or more function approximation problems?
- How can the learner automatically alter its representation to improve its ability to represent and learn the target function?