Title: Introduction to Data Mining
Why Data Mining? Potential Applications
- Direct marketing
  - Identify which prospects should be included in a mailing list
- Market segmentation
  - Identify common characteristics of customers who buy the same products
- Market basket analysis
  - Identify which products are likely to be bought together
- Insurance claims analysis
  - Discover patterns of fraudulent transactions
  - Compare current transactions against those patterns
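Market basket analysis often starts by counting how often items appear together. The Python sketch below computes the support of each item pair over a small transaction list. Support is the fraction of baskets that contain both items. The transactions and the `frequent_pairs` helper are hypothetical. Real systems typically use algorithms such as Apriori or FP-growth to find larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the contents of one shopping basket.
transactions = [
    {"pizza", "beer", "chips"},
    {"pizza", "beer"},
    {"bread", "milk"},
    {"pizza", "beer", "milk"},
    {"bread", "milk", "chips"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets containing both) >= min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so ("beer", "pizza")
        # and ("pizza", "beer") are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, min_support=0.4))
```

Raising `min_support` keeps only the most common combinations. Lowering it surfaces rarer pairs at the cost of more noise.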
What Is Data Mining?
- A combination of AI and statistical analysis used to discover information that is hidden in the data:
  - Associations (e.g., linking the purchase of pizza with beer)
  - Sequences (e.g., tying events together: marriage and the purchase of furniture)
  - Classifications (e.g., recognizing patterns such as the attributes of employees who are most likely to quit)
  - Forecasting (e.g., predicting customers' buying habits based on past patterns)
- Implemented with expert systems or small ML/statistical programs
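The "sequences" idea can be made concrete by measuring how often one event precedes another. The example above is marriage followed by a furniture purchase. This Python sketch assumes a hypothetical per-customer event log and a hypothetical `sequence_support` helper. It checks only the simple pattern "A, later followed by B" rather than general sequential patterns.

```python
# Hypothetical event logs: customer id -> list of (day, event) pairs.
event_logs = {
    "c1": [(1, "marriage"), (40, "furniture purchase")],
    "c2": [(5, "furniture purchase"), (90, "marriage")],
    "c3": [(3, "marriage"), (10, "car purchase"), (60, "furniture purchase")],
    "c4": [(7, "car purchase")],
}

def sequence_support(logs, first, then):
    """Fraction of customers whose log contains `first` followed later by `then`."""
    matches = 0
    for events in logs.values():
        seen_first = False
        for _, event in sorted(events):  # walk the events in time order
            if event == first:
                seen_first = True
            elif event == then and seen_first:
                matches += 1
                break  # count each customer at most once
    return matches / len(logs)

print(sequence_support(event_logs, "marriage", "furniture purchase"))  # 0.5
```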
What can data mining do?
- Classification
  - Classify credit applicants as low, medium, or high risk
  - Classify insurance claims as normal or suspicious
- Estimation
  - Estimate the probability of a response to a direct mailing
  - Estimate the lifetime value of a customer
- Prediction
  - Predict which customers will leave within six months
  - Predict the size of the balance that a credit card prospect will transfer
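As an illustration of classification, this Python sketch assigns a credit-risk class using k-nearest neighbours. That is one common classification technique, chosen here for brevity; the slides do not prescribe a method. The applicant features, their scaling, the labels and the `knn_classify` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled applicants:
# (annual income in $100,000s, debt-to-income ratio) -> risk class.
# Both features lie roughly in [0, 1], so neither dominates the distance.
training = [
    ((0.85, 0.10), "low"),
    ((0.70, 0.15), "low"),
    ((0.50, 0.35), "medium"),
    ((0.45, 0.30), "medium"),
    ((0.25, 0.60), "high"),
    ((0.30, 0.55), "high"),
]

def knn_classify(point, examples, k=3):
    """Return the majority class among the k nearest examples (Euclidean distance)."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((0.80, 0.12), training))  # low
print(knn_classify((0.28, 0.58), training))  # high
```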
What can data mining do? (contd)
- Association
  - Find items that customers are likely to buy together
  - Find which books to recommend to Amazon.com users
- Clustering
  - Difference from classification: the classes are not known in advance!
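To show how clustering differs from classification, this Python sketch runs plain k-means on unlabelled points. k-means is a standard clustering algorithm, used here only as an example. No classes are supplied; the groups emerge from the distances between points. The points, the number of clusters and the `kmeans` helper are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: no labels are given; clusters are formed from distances alone."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled customers, e.g. (visits per month, spend per visit).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

Unlike the classifier, k-means never sees a label. Interpreting each cluster ("model customers", for instance) is left to the analyst.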
Market Analysis and Management
- Where are the data sources for analysis?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of "model" customers who share the same characteristics: interests, income level, spending habits, etc.
- Determine customer purchasing patterns over time
  - E.g., conversion of a single bank account to a joint one after marriage
- Cross-market analysis
  - Associations/correlations between product sales
  - Prediction based on the association information
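Cross-market analysis usually scores an association such as "pizza → beer" with three measures: support, confidence and lift. A rule with high confidence and a lift above 1 suggests that buying the first product makes the second more likely. That is the basis for prediction from association information. This Python sketch computes the three measures on hypothetical transactions; `rule_metrics` is an illustrative helper, not a library function.

```python
# Hypothetical transactions (sets of products bought together).
transactions = [
    {"pizza", "beer", "chips"},
    {"pizza", "beer"},
    {"pizza", "soda"},
    {"beer", "chips"},
    {"bread", "milk"},
]

def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent <= b and consequent <= b)
    ante = sum(1 for b in baskets if antecedent <= b)
    cons = sum(1 for b in baskets if consequent <= b)
    support = both / n                               # P(A and B)
    confidence = both / ante if ante else 0.0        # P(B | A)
    lift = confidence / (cons / n) if cons else 0.0  # P(B | A) / P(B)
    return support, confidence, lift

print(rule_metrics(transactions, {"pizza"}, {"beer"}))
```

Here pizza buyers purchase beer two times in three, against a baseline rate of 3/5 for beer across all baskets. That gives a lift of about 1.11.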
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
Data Mining: On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced databases and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series and temporal data
  - Text and multimedia databases
  - Heterogeneous and legacy databases
  - WWW
Data Mining Process
[Figure: stages of the data mining process: understanding of the business, problem identification, collecting relevant data, model building, business strategy and evaluation, action, and learning]
Requirements/Challenges in Data Mining
- User interface
- Mining methodology
- Performance
- Data source
- Social issues and security
Requirements/Challenges in Data Mining (2)
- User interface
  - Data visualization
    - Understandability and interpretation of results
    - Information representation and rendering
    - Screen real estate
  - Interactivity
    - Manipulation of mined knowledge
    - Focusing and refining mining tasks
    - Focusing and refining mining results
Requirements/Challenges in Data Mining (3)
- Mining methodology
  - Mining different kinds of knowledge in databases
  - Interactive mining of knowledge at multiple levels of abstraction
  - Incorporation of background knowledge
  - Query languages
  - Expression and visualization of results
  - Handling noise and incomplete data
  - Pattern evaluation
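Handling noise and incomplete data can be as simple as treating implausible values as missing and then filling the gaps. The Python sketch below applies a range check followed by median imputation. That is one of many possible strategies, assumed here for illustration. The age data, the valid range and the `clean` helper are hypothetical.

```python
from statistics import median

# Hypothetical customer ages: None marks a missing value, 230 is a data-entry error.
ages = [34, None, 29, 41, None, 230, 38]

def clean(values, low, high):
    """Treat out-of-range values as missing, then fill gaps with the median of valid values."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = median(valid)  # the median is less sensitive to outliers than the mean
    return [v if v is not None and low <= v <= high else fill for v in values]

print(clean(ages, low=0, high=120))  # [34, 36.0, 29, 41, 36.0, 36.0, 38]
```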
Requirements/Challenges in Data Mining (4)
- Performance
  - Efficiency and scalability of data mining algorithms
    - Algorithms whose running time grows linearly with data size are needed
    - Parallel and distributed methods
    - Incremental methods
    - Divide and conquer?
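Divide-and-conquer, parallel and incremental methods often rely on statistics that can be computed per partition and then merged. The Python sketch below counts items in separate partitions and combines the partial counts. It then folds in a new batch without recounting the old data. The partitions are processed sequentially here, but in practice each could run on a separate worker. The transactions and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical transactions, split into partitions of two baskets each.
transactions = [
    {"pizza", "beer"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"pizza", "milk"},
    {"beer"},
]
partitions = [transactions[i:i + 2] for i in range(0, len(transactions), 2)]

def count_items(baskets):
    """Count how many baskets contain each item."""
    counts = Counter()
    for basket in baskets:
        counts.update(basket)  # adds 1 for every item in this basket
    return counts

def merge_counts(partial_counts):
    """Combine per-partition counts by adding them together."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

# Divide and conquer: count each partition independently, then merge.
total = merge_counts(count_items(p) for p in partitions)
assert total == count_items(transactions)  # same result as one pass over all data

# Incremental update: fold in a new batch without re-reading old transactions.
new_batch = [{"beer", "pizza"}]
total.update(count_items(new_batch))
print(total["beer"], total["pizza"])  # 4 3
```

The approach works because counts are additive. Statistics that cannot be merged this way, such as an exact median, need different parallel or incremental algorithms.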
Requirements/Challenges in Data Mining (5)
- Data source
  - Diversity of data types
    - Handling complex types of data
    - Mining information from heterogeneous databases or information repositories
    - Can we expect a DM algorithm to perform well on all types of data?
  - Data glut
    - Are we collecting the right data to answer the right questions?
    - Distinguishing between important and unimportant data
Requirements/Challenges in Data Mining (6)
- Social issues and security
  - Social impact
    - Private and sensitive data is gathered and mined without individuals' knowledge and/or consent
    - Appropriate use and distribution of discovered knowledge
  - Regulations
    - Need for privacy and DM policies
Data Mining Tools
Summary
- Knowing one's business is critical, and technologies are coming together to support data mining.
- Data mining is the process and result of knowledge production, knowledge discovery, and knowledge management.