Title: Database Management System Recent Advances
1. Database Management System: Recent Advances
- By Prof. Dr. O.P. Vyas
- M.Tech. (CS), Ph.D. (I.I.T. Kharagpur)
- DAAD Fellow (Germany), AOTS Fellow (Japan)
- Professor & Head (Computer Science)
- Pt. R.S. University, Raipur (CG)
- Visiting Prof., Rostock University, Germany
2. Contents: ADBMS
- Concepts of Association Rule Mining
- ARM basics
- Problems with Apriori
- Apriori vs. FP-tree
- ARM variants
- Classification Rule Mining
- Classification techniques
- Classifiers
- Various classifiers
- Classification & Prediction
- Classification accuracy
- Mining complex data types
- Complex data types
- Data mining process integration with existing technology
3. Time Line of Data Mining Development

Time        | Area | Contribution
Late 1700s  | Stat | Bayes theorem of probability
Early 1900s | Stat | Regression analysis
Early 1920s | Stat | Maximum likelihood estimate
Early 1940s | AI   | Neural networks
Early 1950s |      | Nearest neighbor
Early 1950s |      | Single link
Late 1950s  | AI   | Perceptron
Late 1950s  | Stat | Resampling, bias reduction, jackknife estimator
Early 1960s | AI   | Machine learning started
Early 1960s | DB   | Batch reports
Mid 1960s   |      | Decision trees
Mid 1960s   | Stat | Linear models for classification
Mid 1960s   | IR   | Similarity measures
Mid 1960s   | IR   | Clustering
Mid 1960s   | Stat | Exploratory data analysis (EDA)
Late 1960s  | DB   | Relational data model
Early 1970s | IR   | SMART IR systems
Mid 1970s   | AI   | Genetic algorithms
Late 1970s  | Stat | Estimation with incomplete data (EM algorithm)
Late 1970s  | Stat | K-means clustering
Early 1980s | AI   | Kohonen self-organizing map
Mid 1980s   | AI   | Decision tree algorithms
Early 1990s | DB   | Association rule algorithms
1990s       |      | Web and search engines
1990s       | DB   | Data warehousing
1990s       | DB   | Online analytic processing (OLAP)
4. Data Mining Functionalities
5. Association Rules
- Retail shops are often interested in associations between different items that people buy.
- Someone who buys bread is quite likely also to buy milk.
- A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.
- Association information can be used in several ways; e.g., when a customer buys a particular book, an online shop may suggest associated books.
- Association rules:
  - bread → milk
  - DB-Concepts, OS-Concepts → Networks
- The left-hand side is the antecedent; the right-hand side is the consequent.
- An association rule is a pattern stating that when the antecedent occurs, the consequent occurs with a certain probability.
6. Association Rules (Cont.)
- Rules have an associated support, as well as an associated confidence.
- Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
- E.g., suppose only 0.001 percent of all purchases include both milk and screwdrivers. The support for the rule milk → screwdrivers is then low.
- We usually want rules with reasonably high support; rules with low support are usually not very useful.
- Confidence is a measure of how often the consequent is true when the antecedent is true.
- E.g., the rule bread → milk has a confidence of 80 percent if 80 percent of the purchases that include bread also include milk.
- We usually want rules with reasonably high confidence; a rule with low confidence is not meaningful.
- Note that the confidence of bread → milk may be very different from the confidence of milk → bread, although both have the same support.
7. A.R.M. Model: Data
- A.R.M. was initially applied to Market Basket Analysis on transaction data of supermarket sales.
- I = {i1, i2, ..., im}: a set of items.
- Transaction t:
  - t is a set of items, with t ⊆ I.
- Transaction database T: a set of transactions T = {t1, t2, ..., tn}.
8. Transaction Data: Supermarket Data
- Market basket transactions:
  - t1: {bread, cheese, milk}
  - t2: {apple, eggs, salt, yogurt}
  - ...
  - tn: {biscuit, eggs, milk}
- Concepts:
  - An item: an item/article in a basket.
  - I: the set of all items sold in the store.
  - A transaction: the items purchased in a basket; it may have a TID (transaction ID).
  - A transactional dataset: a set of transactions.
9. Transaction Data: A Set of Documents
- A text document data set; each document is treated as a bag of keywords:
  - doc1: {Student, Teach, School}
  - doc2: {Student, School}
  - doc3: {Teach, School, City, Game}
  - doc4: {Baseball, Basketball}
  - doc5: {Basketball, Player, Spectator}
  - doc6: {Baseball, Coach, Game, Team}
  - doc7: {Basketball, Team, City, Game}
10. The Model: Rules
- A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
- An association rule is an implication of the form X → Y, where X, Y ⊂ I and X ∩ Y = ∅.
- An itemset is a set of items.
  - E.g., X = {milk, bread, cereal} is an itemset.
- A k-itemset is an itemset with k items.
  - E.g., {milk, bread, cereal} is a 3-itemset.
11. Rule Strength Measures
- Support: the rule holds with support sup in T (the transaction data set) if sup% of the transactions contain X ∪ Y.
  - sup = Pr(X ∪ Y)
- Confidence: the rule holds in T with confidence conf if conf% of the transactions that contain X also contain Y.
  - conf = Pr(Y | X)
- An association rule is a pattern stating that when X occurs, Y occurs with a certain probability. (A small computational sketch follows.)
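To make the two measures concrete, here is a minimal Python sketch that computes support and confidence by direct counting. The transactions, item names, and function names are illustrative, not from any particular dataset:

    # Toy transaction database in the style of slide 8.
    transactions = [
        {"bread", "cheese", "milk"},
        {"apple", "eggs", "salt", "yogurt"},
        {"biscuit", "eggs", "milk"},
        {"bread", "milk"},
    ]

    def support(itemset, transactions):
        """Fraction of transactions containing every item in `itemset`."""
        itemset = set(itemset)
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(X, Y, transactions):
        """Pr(Y | X) = support(X ∪ Y) / support(X)."""
        return support(set(X) | set(Y), transactions) / support(X, transactions)

    print(support({"bread", "milk"}, transactions))       # 0.5
    print(confidence({"bread"}, {"milk"}, transactions))  # 1.0

Here Pr(X ∪ Y) is the fraction of transactions containing all items of X and Y together, matching the definition above.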
12. Mining Association Rules: An Example
Let us take min. support = 50% and min. confidence = 50%.
- For the rule A → C:
  - support = support({A, C}) = 50%
  - confidence = support({A, C}) / support({A}) = 66.6%
- A → C (50%, 66.6%)
- C → A (50%, 100%)
- The Apriori principle:
  - Any subset of a frequent itemset must be frequent.
13. The Apriori Algorithm
- Join step: Ck is generated by joining Lk-1 with itself.
- Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
- Pseudo-code (a runnable Python version follows):
  - Ck: candidate itemsets of size k
  - Lk: frequent itemsets of size k
  - L1 = {frequent items};
  - for (k = 1; Lk != ∅; k++) do begin
  -   Ck+1 = candidates generated from Lk;
  -   for each transaction t in the database do
  -     increment the count of all candidates in Ck+1 that are contained in t;
  -   Lk+1 = candidates in Ck+1 with min_support;
  - end
  - return ∪k Lk;
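The pseudo-code translates almost line for line into Python. The sketch below (function and variable names are my own) holds the whole database in memory for simplicity, whereas Apriori as described is designed to work with data on disk:

    from itertools import combinations

    def apriori(transactions, minsup):
        """Level-wise Apriori: returns {frozenset: support} for all
        frequent itemsets. `minsup` is a fraction, e.g. 0.5."""
        n = len(transactions)
        transactions = [frozenset(t) for t in transactions]

        def sup(itemset):
            return sum(itemset <= t for t in transactions) / n

        # L1: frequent 1-itemsets
        items = {i for t in transactions for i in t}
        Lk = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
        frequent = {s: sup(s) for s in Lk}
        k = 1
        while Lk:
            # Join step: unite pairs of Lk members that differ in one item.
            Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # Prune step: drop candidates having an infrequent k-subset.
            Ck1 = {c for c in Ck1
                   if all(frozenset(s) in Lk for s in combinations(c, k))}
            # One pass over the data to keep candidates with min_support.
            Lk = {c for c in Ck1 if sup(c) >= minsup}
            frequent.update({s: sup(s) for s in Lk})
            k += 1
        return frequent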
14. The Apriori Algorithm: Example
[Figure: database D is scanned to count the candidate 1-itemsets C1, yielding the frequent itemsets L1; L1 is joined to form C2, and a second scan of D yields L2; L2 is joined to form C3, and a third scan of D yields L3.]
15. Generating Rules from Frequent Itemsets
- Frequent itemsets ≠ association rules.
- One more step is needed to generate association rules:
- For each frequent itemset X,
  - for each proper nonempty subset A of X:
    - let B = X - A;
    - A → B is an association rule if confidence(A → B) ≥ minconf,
    - where support(A → B) = support(A ∪ B) = support(X),
    - and confidence(A → B) = support(A ∪ B) / support(A).
16. Generating Association Rules (Cont.)
- Once the frequent itemsets from transactions in a database D have been found, it is straightforward to generate strong association rules from them (where strong association rules satisfy both minimum support and minimum confidence).
- To recap: in order to obtain A → B, we need support(A ∪ B) and support(A).
- All the information required for the confidence computation has already been recorded during itemset generation, so there is no need to read the data T again.
- This step is not as time-consuming as frequent itemset generation. (A rule-generation sketch follows.)
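A possible rule-generation step in Python, reusing the support map returned by the apriori() sketch above (names are illustrative); note that only the recorded supports are used, never the data itself:

    from itertools import combinations

    def generate_rules(frequent, minconf):
        """Derive rules A -> B from a {itemset: support} map."""
        rules = []
        for X, supX in frequent.items():
            if len(X) < 2:
                continue
            for r in range(1, len(X)):            # all proper nonempty subsets A
                for A in map(frozenset, combinations(X, r)):
                    conf = supX / frequent[A]     # support(X) / support(A)
                    if conf >= minconf:
                        rules.append((set(A), set(X - A), supX, conf))
        return rules

    # Usage with the apriori() sketch above:
    # rules = generate_rules(apriori(transactions, 0.5), 0.5)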
17. Goal and Key Features
- Goal: find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
- Key features:
  - Completeness: find all rules.
  - No target item(s) on the right-hand side.
  - Mining with data on hard disk (not in memory).
18. Mining Association Rules in Large Databases
- Association rules can be classified into categories based on different criteria, such as:
- 1. Based on the types of values handled in the rule, associations can be classified into Boolean vs. quantitative. A Boolean association shows relationships between discrete (categorical) objects. A quantitative association is a multidimensional association. An example of a quantitative association rule, where X is a variable representing a customer:
  - age(X, "30..39") ∧ income(X, "42K..48K") → buys(X, "high resolution TV")
  - Note that the quantitative attributes age and income have been discretized.
- 2. Based on the dimensions of data involved in the rule.
  - E.g., purchase(X, "computer") → purchase(X, "financial software") is a single-dimensional association rule; if the date/time of purchase is added, it becomes multidimensional.
- 3. Multilevel association rule mining.
- 4. Multidimensional A.R.M.
19. Mining Multiple-Level Association Rules
- Items often form hierarchies.
- Flexible support settings: items at the lower level are expected to have lower support.
- Exploration of shared multi-level mining (Agrawal & Srikant @ VLDB'95, Han & Fu @ VLDB'95).
20. Multi-level Association: Redundancy Filtering
- Some rules may be redundant due to ancestor relationships between items.
- Example:
  - milk → wheat bread [support = 8%, confidence = 70%]
  - 2% milk → wheat bread [support = 2%, confidence = 72%]
- We say the first rule is an ancestor of the second rule.
- A rule is redundant if its support is close to the expected value based on the rule's ancestor. Here, if 2% milk accounts for about a quarter of all milk purchases, the second rule's 2% support is exactly what the first rule predicts, so it adds no new information.
21. Mining Multi-Dimensional Association
- Single-dimensional rules:
  - buys(X, "milk") → buys(X, "bread")
- Multi-dimensional rules: ≥ 2 dimensions or predicates.
  - Inter-dimension assoc. rules (no repeated predicates):
    - age(X, "19-25") ∧ occupation(X, "student") → buys(X, "coke")
  - Hybrid-dimension assoc. rules (repeated predicates):
    - age(X, "19-25") ∧ buys(X, "popcorn") → buys(X, "coke")
- Categorical attributes: finite number of possible values, no ordering among values; data cube approach.
- Quantitative attributes: numeric, implicit ordering among values; discretization, clustering, and gradient approaches.
22. Mining Association Rules in Large Databases
- Mining single-dimensional Boolean association rules from transactional databases.
- The Apriori algorithm: an influential algorithm for mining frequent itemsets for Boolean association rules; it uses prior knowledge of frequent itemset properties.
- Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
- First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find the set of frequent 2-itemsets, L2, and so on, until no more frequent k-itemsets can be found.
- Finding each Lk requires one full scan of the database.
23. Many ARM Algorithms
- There are a large number of them!
- They use different strategies and data structures.
- Their resulting sets of rules are all the same: given a transaction data set T, a minimum support and a minimum confidence, the set of association rules existing in T is uniquely determined.
- Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ. We study only one: the Apriori algorithm.
24. On the Apriori Algorithm
- It seems very expensive: a level-wise search.
- Let K be the size of the largest itemset; Apriori makes at most K passes over the data.
- In practice, K is bounded (often around 10).
- The algorithm is very fast; under some conditions, all rules can be found in linear time.
- It scales up to large data sets.
- Clearly, the space of all association rules is exponential, O(2^m), where m is the number of items in I.
- The mining exploits sparseness of data, and high minimum support and high minimum confidence values.
- Still, it can produce a huge number of rules: thousands, tens of thousands, millions, ...
25. UCI KDD Archive: http://kdd.ics.uci.edu
- This is an online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas.
- The primary role of this repository is to enable researchers in knowledge discovery and data mining to scale existing and future data analysis algorithms to very large and complex data sets.
- The archive is intended to serve as a permanent repository of publicly accessible data sets for research in KDD and data mining. It complements the original UCI Machine Learning Archive, which typically focuses on smaller classification-oriented data sets.
26. ARM Implementations
- Many implementations of the Apriori algorithm are available:
- http://www.cs.bme.hu/bodon/en/apriori/ (Apriori implementation of Ferenc Bodon)
- http://www.csc.liv.ac.uk/frans/KDD/Software/Apriori-T_GUI/aprioriT_GUI.html
  - Apriori-T (Apriori Total) is an Association Rule Mining (ARM) algorithm developed by the LUCS-KDD research team. The code obtainable from this page is a GUI version that includes (for comparison purposes) implementations of Brin's DIC algorithm (Brin et al. 1997) and Toivonen's negative border ARM approach (Toivonen 1996).
- http://www.csc.liv.ac.uk/frans/KDD/Software/FPgrowth/fpGrowth.html (implementation of the FP-growth method)
- DBMiner is a data mining system which runs on top of the Microsoft SQL Server 7.0 Plato system.
27. A.R.M. Implementations: Example
- In DBMiner, three kinds of associations can be mined:
- 1. Inter-dimensional association: associations among or across two or more dimensions.
  - Customer-Country("Canada") => Product-SubCategory("Coffee"), i.e., Canadian customers are likely to buy coffee.
- 2. Intra-dimensional association: associations present within one dimension, grouped by one or several other dimensions. For example, to find out which products customers in Canada are likely to purchase together:
  - Within Customer-Country("Canada"): Product-ProductName("CarryBags") => Product-ProductName("Tents"), i.e., customers in Canada who buy carry-bags are also likely to buy tents.
- 3. Hybrid association: associations combining elements of both inter- and intra-dimensional association mining. For example:
  - Within Customer-Country("Canada"): Product("Carry Bags") => Product("Tents"), Time("Q3"), i.e., customers in Canada who buy carry-bags also tend to buy tents, and do so most often in the 3rd quarter of the year (Jul, Aug, Sep).
28. Visualization of Association Rules: Plane Graph
29. Problems with Association Mining
- Single minsup: it assumes that all items in the data are of the same nature and/or have similar frequencies.
- Not true: in many applications, some items appear very frequently in the data, while others rarely appear.
- E.g., in a supermarket, people buy food processors and cooking pans much less frequently than they buy bread and milk.
30. Rare Item Problem
- If the frequencies of items vary a great deal, we encounter two problems:
- If minsup is set too high, the rules that involve rare items will not be found.
- To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause combinatorial explosion, because the frequent items will be associated with one another in all possible ways.
31. Is Apriori Fast Enough? Performance Bottlenecks
- The core of the Apriori algorithm:
  - Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets.
  - Use database scans and pattern matching to collect counts for the candidate itemsets.
- The bottleneck of Apriori: candidate generation.
  - Huge candidate sets:
    - 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets.
    - To discover a frequent pattern of size 100, e.g., {a1, a2, ..., a100}, one needs to generate 2^100 ≈ 10^30 candidates.
  - Multiple scans of the database:
    - Needs (n + 1) scans, where n is the length of the longest pattern.
32. Mining Frequent Patterns Without Candidate Generation
- The FP-Tree (Frequent Pattern Tree) algorithm.
- To break the two bottlenecks of the Apriori family of algorithms, several association rule mining methods using tree structures have been designed. FP-Tree [Han et al. 2000], frequent pattern mining, is another milestone in the development of association rule mining; it breaks the two bottlenecks of Apriori.
- The frequent itemsets are generated with only two passes over the database and without any candidate generation process. FP-Tree was introduced by Han et al. in [Han et al. 2000].
- By avoiding the candidate generation process and making fewer passes over the database, FP-Tree is an order of magnitude faster than the Apriori algorithm. The frequent pattern generation process includes two sub-processes: constructing the FP-Tree, and generating frequent patterns from the FP-Tree.
33. FP-Tree
- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure:
  - highly condensed, but complete for frequent pattern mining;
  - avoids costly database scans.
- Develop an efficient, FP-tree-based frequent pattern mining method:
  - a divide-and-conquer methodology: decompose mining tasks into smaller ones;
  - avoid candidate generation: sub-database test only!
- Some researchers have identified that when the dataset is very sparse, the FP-Tree has shown bottlenecks and Apriori has given comparatively better performance!
34. Construct FP-tree from a Transaction DB

TID | Items bought             | (Ordered) frequent items
100 | f, a, c, d, g, i, m, p   | f, c, a, m, p
200 | a, b, c, f, l, m, o      | f, c, a, b, m
300 | b, f, h, j, o            | f, b
400 | b, c, k, s, p            | c, b, p
500 | a, f, c, e, l, p, m, n   | f, c, a, m, p

min_support = 0.5

- Steps (a construction sketch follows):
  1. Scan the DB once, find frequent 1-itemsets (single item patterns).
  2. Order the frequent items in frequency-descending order.
  3. Scan the DB again, constructing the FP-tree.
35. Benefits of the FP-tree Structure
- Completeness:
  - never breaks a long pattern of any transaction;
  - preserves complete information for frequent pattern mining.
- Compactness:
  - reduces irrelevant information: infrequent items are gone;
  - frequency-descending ordering: more frequent items are more likely to be shared;
  - never larger than the original database (if node-links and counts are not counted).
  - Example: for the Connect-4 DB, the compression ratio could be over 100.
36. Mining Frequent Patterns Using the FP-tree
- General idea (divide-and-conquer): recursively grow frequent pattern paths using the FP-tree.
- Method:
  - For each item, construct its conditional pattern-base, and then its conditional FP-tree.
  - Repeat the process on each newly created conditional FP-tree,
  - until the resulting FP-tree is empty or contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern).
37. Market Basket Analysis: Purpose
- When the supermarket revolution first sparked off in the 1920s, one could not even dream of retailing as it exists today. By the 1950s it had won acclaim and acceptance almost globally. This is one retailing sector that is spreading very fast in India, yet the majority of the retailing sector, including this one, is still not properly managed.
- Retailing management has long been a focus for marketing strategists, as organized retailing is attracting significant attention. M.B.A. (Market Basket Analysis) is one such effort.
- In supermarket retailing, MBA endeavors to study and analyze the combinations of various items accumulated in a shopping basket, with the intent of establishing associationships between the various items bought by the customer.
- Market basket analysis is a generic term for methodologies that study the composition of a basket of products (i.e., a shopping basket) purchased by a household during a single shopping trip.
- The idea is that market baskets reflect interdependencies between products or purchases made in different product categories, and that these interdependencies can be useful to support retail marketing decisions.
38. MBA
- Our data mining approach to supermarket business data records all the supermarket transactions in tabular form, and an appropriate algorithm processes the transaction data to provide significant associationships among items.
- From a marketing perspective, the research is motivated by the fact that some recent trends in retailing pose important challenges to retailers who want to stay competitive. In fact, at the level of the retailer, a number of trends can be identified, including concentration, internationalization, decreasing profit margins, and an increase in discounting.
- Recently, a number of advances in data mining (association rules) and statistics offer new opportunities to analyze such data.
39. Data Mining Functionalities
40. Data Mining
[Diagram: data mining branches into clustering, association mining, and classification; each is realized by techniques and applied in an application domain.]
- Classification mining analyzes a set of training data (i.e., a set of objects whose class labels are known) and constructs a model for each class based on the features in the data. A set of classification rules is generated by the classification process; these can be used to classify future data, as well as to develop a better understanding of each class in the database.
41. Data Mining
[Diagram: associative classification combines association mining and classification techniques; representative algorithms are CBA, CMAR, CPAR and MCLP, obtained by modifying the base algorithms.]
42. Supervised vs. Unsupervised Learning
- Learning: training data are analyzed by a classification algorithm.
- Supervised learning (classification): learning of the model is supervised in that the algorithm is told to which class each training sample belongs.
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  - New data are classified based on the training set.
- Unsupervised learning (clustering):
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
43. Classification vs. Prediction
- Classification:
  - classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data;
  - predicts categorical class labels.
- Prediction:
  - can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value ranges of an attribute that a given sample is likely to have;
  - models continuous-valued functions, i.e., predicts unknown or missing values.
- CLASSIFICATION and REGRESSION are two prediction methods (discrete vs. continuous).
- Typical applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis.
44. Data Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes (learning).
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set (the given data).
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Model usage: classifying future or unknown objects (classification).
  - Estimate the accuracy of the model (a small sketch follows):
    - the known label of each test sample is compared with the model's classification of that sample;
    - the accuracy rate is the percentage of test set samples that are correctly classified by the model;
    - the test set must be independent of the training set, otherwise over-fitting will occur.
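As a small illustration of the accuracy estimate, here is a hypothetical Python sketch; the model, the rule, and the test tuples are invented for illustration (the rule anticipates the example on slide 46):

    def accuracy(model, test_set):
        """Accuracy rate: fraction of test samples whose known label
        matches the label the model assigns."""
        correct = sum(model(x) == label for x, label in test_set)
        return correct / len(test_set)

    # A toy 'model' mapping (rank, years) -> tenured:
    tenure_rule = lambda x: "yes" if x[0] == "professor" or x[1] > 6 else "no"
    test = [(("professor", 3), "yes"), (("assistant prof", 7), "yes"),
            (("associate prof", 2), "no")]
    print(accuracy(tenure_rule, test))  # 1.0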
45. Illustrating Classification Task
46. Classification Process (1): Model Construction (Learning)
[Diagram: the training data are fed to a classification algorithm, which produces the classifier (model).]
- Classification rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
47. Classification Process (2): Use the Model in Prediction (Classification)
[Diagram: the classifier is applied to test data and then to new data, e.g., (Jeff, Professor, 4) → Tenured?]
48. Examples of Classification Tasks
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
58. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by backpropagation
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary
59. Decision Tree Classifiers: Survey Paper
60. Bayesian Classification
- Bayesian classifiers are statistical classifiers which predict class membership probabilities, such as the probability that a given sample belongs to a particular class.
- Bayesian classification is based on Bayes' theorem, and a simple Bayesian classifier known as the naïve Bayesian classifier has been observed to be comparable in performance with decision tree and neural network classifiers.
- Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes (conditional independence).
- Bayesian belief networks are graphical models which, unlike naïve Bayesian classifiers, allow the representation of dependencies among subsets of attributes. They can also be used for classification.
61. Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
- Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.
62. Bayes' Theorem
- Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem, and the MAP (maximum a posteriori) hypothesis is the one that maximizes it (formulas below).
- Practical difficulty: requires initial knowledge of many probabilities, and has significant computational cost.
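In standard notation, Bayes' theorem and the MAP hypothesis read:

    P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}

    h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)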
63. Naïve Bayes Classifier (I)
- A simplifying assumption: attributes are conditionally independent given the class.
- This greatly reduces the computation cost: only count the class distribution.
64. Naïve Bayesian Classifier (II)
- Given a training set, we can compute the required probabilities.
65. Naïve Bayesian Classification
- Naïve assumption: attribute independence.
  - P(x1, ..., xk | C) = P(x1 | C) · ... · P(xk | C)
- If the i-th attribute is categorical: P(xi | C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C.
- If the i-th attribute is continuous: P(xi | C) is estimated through a Gaussian density function.
- Computationally easy in both cases. (A sketch follows.)
66. The Independence Hypothesis
- It makes computation possible.
- It yields optimal classifiers when satisfied.
- But it is seldom satisfied in practice, as attributes (variables) are often correlated.
- Attempts to overcome this limitation:
  - Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes;
  - decision trees, which reason on one attribute at a time, considering the most important attributes first.
67. Bayesian Belief Networks (I)
[Network: Family History and Smoker are the parents of LungCancer; Smoker is also the parent of Emphysema; LungCancer points to PositiveXRay and Dyspnea.]

The conditional probability table for the variable LungCancer:

      (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S)
LC      0.8   |   0.5    |   0.7    |   0.1
~LC     0.2   |   0.5    |   0.3    |   0.9
68. Bayesian Belief Networks (II)
- A Bayesian belief network allows a subset of the variables to be conditionally independent.
- It is a graphical model of causal relationships.
- Several cases of learning Bayesian belief networks:
  - given both the network structure and all the variables: easy;
  - given the network structure but only some variables;
  - when the network structure is not known in advance.
69. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by backpropagation
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary
70. Classification by Backpropagation
- Backpropagation has been considered an effective mechanism in the field of classification. The backpropagation algorithm was presented by Rumelhart, Hinton, and Williams [RHW86]. It is the most popularly used neural network learning algorithm.
- In Freud's theory of psychodynamics, the human brain (about 10^11 neurons) was described as a neural network, and recent investigations have corroborated this view.
- This analogy offers an interesting model for the creation of more complex learning machines, and has led to the creation of ANNs.
- Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques.
- A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections for new situations of interest and to answer "what if" questions.
72. Neural Networks
- An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system.
- It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example.
- An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.
73. ANN Advantages
- Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
- Self-organisation: an ANN can create its own organisation or representation of the information it receives during learning time.
- Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
- Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.
74. ANN vs. Conventional Computing Approach
- Neural networks take a different approach to problem solving than conventional computers. Conventional computers use an algorithmic approach: the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don't exactly know how to do.
- Neural networks process information in a way similar to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task.
- The examples must be selected carefully, otherwise useful time is wasted, or even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.
75. ANN vs. Conventional (Cont.)
- On the other hand, conventional computers use a cognitive approach to problem solving: the way the problem is to be solved must be known and stated in small, unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.
- Neural networks and conventional algorithmic computers are not in competition, but complement each other. There are tasks more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.
- Neural networks do not perform miracles. But if used sensibly, they can produce some amazing results.
76. ANN: An Engineering Approach
- A simple neuron: an artificial neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode.
- In the training mode, the neuron can be trained to fire (or not) for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.
77. A Neuron
- The n-dimensional input vector x is mapped into the variable y by means of a scalar product and a nonlinear function mapping, as written out below.
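In the usual notation (assuming weights w_i, a bias/threshold μ_k, and a nonlinear activation f such as the sign or sigmoid function), this mapping is:

    y = f\Big( \sum_{i=0}^{n-1} w_i x_i - \mu_k \Big)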
78. Network Layers
- The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units.
- The activity of the input units represents the raw information that is fed into the network.
- The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
- The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
- This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
79. Architecture of Neural Networks
Feed-forward networks: feed-forward ANNs allow signals to travel one way only, from input to output. There is no feedback (no loops), i.e., the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organization is also referred to as bottom-up or top-down.
80. ANN Architecture
Feedback networks: feedback networks can have signals traveling in both directions by introducing loops into the network. Feedback networks are very powerful and can get extremely complicated. They are dynamic: their 'state' changes continuously until they reach an equilibrium point, where they remain until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.
81. ANN
- There are different architectures for neural networks, each utilizing different wiring and learning strategies (e.g., the backpropagation algorithm of the 1980s).
- Advantages:
  - prediction accuracy is generally high;
  - robust: works when training examples contain errors;
  - output may be discrete, real-valued, or a vector of several discrete or real-valued attributes;
  - fast evaluation of the learned target function.
- Criticism:
  - long training time;
  - difficult to understand the learned function (weights);
  - not easy to incorporate domain knowledge.
82. Network Training
- The ultimate objective of training is to obtain a set of weights that makes almost all the tuples in the training data classify correctly.
- Steps (a sketch follows):
  - Initialize the weights with random values.
  - Feed the input tuples into the network one by one.
  - For each unit:
    - compute the net input to the unit as a linear combination of all the inputs to the unit;
    - compute the output value using the activation function;
    - compute the error;
    - update the weights and the bias.
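These steps amount to gradient-descent training. Below is a minimal Python sketch for a single sigmoid unit only (names and the toy AND dataset are my own); full backpropagation repeats the error and update computations layer by layer, from the output back to the input:

    import math, random

    def train_unit(data, epochs=100, lr=0.5):
        """Train one sigmoid unit by stochastic gradient descent.
        `data` is a list of (inputs, target) with target in {0, 1}."""
        n = len(data[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]  # random init
        b = random.uniform(-0.5, 0.5)
        for _ in range(epochs):
            for x, t in data:                              # one tuple at a time
                net = sum(wi * xi for wi, xi in zip(w, x)) + b  # net input
                o = 1.0 / (1.0 + math.exp(-net))           # sigmoid activation
                err = (t - o) * o * (1 - o)                # error term
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]  # weight update
                b += lr * err                              # bias update
        return w, b

    # Learn the AND of two inputs:
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b = train_unit(data, epochs=2000)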
83. Applications of ANN
- Classification: a neural network can discover the distinguishing features needed to perform a classification task; it can take an object and assign the appropriate class label to it. ANNs have been used in many classification tasks, including:
  - recognition of printed or handwritten characters;
  - classification of SONAR and RADAR signals;
  - speech recognition: a very significant area of interest, involving three modules, namely the front end, which samples the speech signals and extracts the data; the word processor, which finds the probability of words in the vocabulary that match the features of the spoken words; and the sentence processor, which determines whether the recognized word makes sense in the sentence.
84. Multi-Layer Perceptron
[Diagram: the input vector xi enters the input nodes; weights wij connect them to the hidden nodes, which connect to the output nodes producing the output vector.]
85. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by backpropagation
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary
86. What Is Prediction?
- Prediction is similar to classification:
  - first, construct a model;
  - second, use the model to predict unknown values.
- The major method for prediction is regression:
  - linear and multiple regression;
  - non-linear regression.
- Prediction is different from classification:
  - classification refers to predicting a categorical class label;
  - prediction models continuous-valued functions.
87. Predictive Modeling in Databases
- Predictive modeling: predict data values or construct generalized linear models based on the database data.
- One can only predict value ranges or category distributions.
- Method outline:
  - minimal generalization;
  - attribute relevance analysis;
  - generalized linear model construction;
  - prediction.
- Determine the major factors which influence the prediction:
  - data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
- Multi-level prediction: drill-down and roll-up analysis.
- www.sas.com, www.spss.com, www.mathsoft.com
88. Association-Based Classification
- Can ideas from association rule mining be applied to classification?
- Several methods for association-based classification exist:
- ARCS (Association Rule Clustering System): quantitative association mining and clustering of association rules (Lent et al. '97) (pp. 310, 254).
  - It beats C4.5 in (mainly) scalability and also accuracy.
- Associative classification (Liu et al. '98):
  - It mines high-support, high-confidence rules of the form cond_set => y, where y is a class label.
- CAEP (Classification by Aggregating Emerging Patterns) (Dong et al. '99):
  - Emerging patterns (EPs): itemsets whose support increases significantly from one class to another.
  - Mine EPs based on minimum support and growth rate.
89. Assignment 1
- Suppose there are two classification rules, one that says people with salaries between 10,000 and 20,000 have a credit rating of good, and another that says that people with salaries between 20,000 and 30,000 have a credit rating of good. Under what conditions can the rules be replaced, without any loss of information, by a single rule that says that people with salaries between 10,000 and 30,000 have a credit rating of good?

No. | Rule                                                            | Conf.
1   | For all persons P, 10000 < P.salary < 20000 => P.credit = good | 60%
2   | For all persons P, 20000 < P.salary < 30000 => P.credit = good | 90%
90. Assignment 1 (Solutions)
- Suppose there are two classification rules, one that says people with salaries between 10,000 and 20,000 have a credit rating of good, and another that says that people with salaries between 20,000 and 30,000 have a credit rating of good. Under what conditions can the rules be replaced, without any loss of information, by a single rule that says that people with salaries between 10,000 and 30,000 have a credit rating of good?
- Solution: consider the following pair of rules and their confidence levels:

No. | Rule                                                            | Conf.
1   | For all persons P, 10000 < P.salary < 20000 => P.credit = good | 60%
2   | For all persons P, 20000 < P.salary < 30000 => P.credit = good | 90%

- The new rule has to be assigned a confidence level which lies between the confidence levels of rules 1 and 2. Replacing the original rules by the new rule will result in a loss of confidence-level information for classifying persons, since we can no longer distinguish the confidence levels of people earning between 10000 and 20000 from those of people earning between 20000 and 30000. Therefore, we can combine the two rules without loss of information only if their confidences are the same.
91. Assignment 2
- Suppose half of all the transactions in a clothes shop purchase jeans, and one third of all transactions in the shop purchase T-shirts. Suppose also that half of the transactions that purchase jeans also purchase T-shirts. Write down all the non-trivial association rules you can deduce from the above information, giving the support and confidence of each rule.
92. Assignment 2 (Solutions)
- Suppose half of all the transactions in a clothes shop purchase jeans, and one third of all transactions in the shop purchase T-shirts. Suppose also that half of the transactions that purchase jeans also purchase T-shirts. Write down all the non-trivial association rules you can deduce from the above information, giving the support and confidence of each rule.
- Solution: the rules are as follows; the last rule can be deduced from the previous ones. Since support(jeans) = 50% and half of the jeans transactions also buy T-shirts, support(jeans ∧ t-shirts) = 25%; the confidence of t-shirts => jeans is therefore 25% / 33% ≈ 75%.

Rule                                                        | Support | Conf.
For all transactions T, true => buys(T, jeans)              | 50%     | 50%
For all transactions T, true => buys(T, t-shirts)           | 33%     | 33%
For all transactions T, buys(T, jeans) => buys(T, t-shirts) | 25%     | 50%
For all transactions T, buys(T, t-shirts) => buys(T, jeans) | 25%     | 75%
97. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by backpropagation
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary
98. Other Classification Methods
- k-nearest neighbor classifier
- Case-based reasoning
- Genetic algorithms
- Rough set approach
- Fuzzy set approaches
99. Instance-Based Methods
- Instance-based learning (less commonly used commercially):
  - store (all) training examples and delay the processing ("lazy evaluation") until a new instance must be classified.
- Typical approaches:
  - k-nearest neighbor approach: instances are represented as points in a Euclidean space;
  - locally weighted regression: constructs a local approximation;
  - case-based reasoning: uses symbolic representations and knowledge-based inference.
100. The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space.
- The nearest neighbors are defined in terms of Euclidean distance.
- The target function may be discrete- or real-valued.
- For discrete-valued functions, k-NN returns the most common value among the k training examples nearest to xq.
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples (pg. 314).
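A brute-force k-NN sketch in Python (the points, labels, and names are illustrative; practical implementations use index structures rather than scanning all training examples):

    import math
    from collections import Counter

    def knn_classify(query, examples, k=3):
        """Classify `query` by majority vote among its k nearest training
        examples under Euclidean distance. `examples` is a list of
        (point, label) pairs."""
        nearest = sorted(examples, key=lambda e: math.dist(e[0], query))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]       # most common class value

    # Toy 2-D example:
    train = [((1, 1), "+"), ((1, 2), "+"), ((5, 5), "-"), ((6, 5), "-")]
    print(knn_classify((2, 1), train, k=3))  # '+'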