Title: Data Mining in SQL Server 2000 and Yukon
1Data Mining in SQL Server 2000 and Yukon
- Richard Lees
- EasternMining_at_Hotmail.com
- RichardLees.com.au
2Agenda
- What isn't Data Mining
- Demo
- What is Data Mining
- Demo
- Create a data mine
- 4 ways to view data mine
- What's Coming in Yukon
- Demo
- Questions (throughout)
3Which Questions are Data Mining?
- Who are our biggest customers?
- What are customers buying with cigars?
- What are the customer retention levels of our branches?
- Which customers have bought olives and feta cheese but no ciabatta bread?
- Which regions have the highest male/female ratio of single 20-somethings?
- Which region has the lowest customer retention levels, and which customers has it lost?
4Demonstration
- Ad hoc query
- Drill through to details
- Business Intelligence tool
5History of OLAP and Data Mining
Timeline: 19xx, 1993, 1998, 1999, 2000, Future
Custom Data Mining available to Fortune 100
Codd defined 12 rules for OLAP
- OLAP on the Web
- ThinSlicer
- Many others
- Data Mining V2
- SQL 2005
- BI Tools
- Microsoft
- SQL 2000
- OLAP v2
- Data Mining
- English Query
SAS and SPSS offer data mining tools to those who can afford them
6Sample Data I Will be Using
- Wellington Libraries Loan DB
- We wanted sample data for data mining
- They were just writing off a data warehouse project
- The experts have spent 12 months trying to import data!
- How could Microsoft help us?
- The data are in IBM databases!
7What is Data Mining?
Data mining is the use of powerful software tools to discover significant traits or relationships in databases or data warehouses, often in order to predict future events
- It exploits statistical algorithms such as decision trees, clustering, sequence clustering, association, Naïve Bayes, neural networks and time series
- Once the knowledge is extracted, it
- Can be used to discover traits and relationships
- Can be used to predict values of other cases
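As a rough illustration of the association idea (the "what are customers buying with cigars?" kind of question), the sketch below counts which items appear in the same basket as a chosen item and reports support and confidence. The baskets, item names and the `rules_for` helper are all hypothetical, and this is a minimal sketch of the general technique, not the algorithm SQL Server uses.

```python
from collections import Counter

def rules_for(baskets, antecedent):
    """Support and confidence of 'antecedent -> item' for each co-purchased item."""
    containing = [b for b in baskets if antecedent in b]
    co_counts = Counter(
        item for b in containing for item in b if item != antecedent
    )
    return {
        item: {
            "support": n / len(baskets),        # share of all baskets with both items
            "confidence": n / len(containing),  # share of antecedent baskets with item
        }
        for item, n in co_counts.items()
    }

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"cigars", "whisky"},
    {"cigars", "whisky", "matches"},
    {"cigars", "matches"},
    {"bread", "milk"},
]

rules = rules_for(baskets, "cigars")
for item, stats in sorted(rules.items()):
    print(item, stats)
```

The discovered rules describe the existing cases; the confidence figure can then be used as a rough prediction for a new basket that contains cigars.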
8OLAP versus Data Mining
- OLAP
- Is about fast ad hoc querying
- Analysis by dimensions and measures
- Gives precise answers
- Data Mining
- May use an RDBMS or OLAP source
- Is about discovering and predicting
- Gives imprecise answers
- OLAP is not a prerequisite for data mining, but it almost always comes first (like learning to ride a bike before driving a car)
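The precise/imprecise contrast can be sketched in a few lines. The first function answers an OLAP-style question with an exact count per dimension member; the second gives a mining-style estimate for a new case. The loan records, branch names and both function names are hypothetical, and the smoothed frequency is only a stand-in for a real mining model.

```python
from collections import defaultdict

# Hypothetical loan records: (branch, member age band, loan was renewed)
loans = [
    ("Central", "20-29", True),
    ("Central", "20-29", False),
    ("Central", "60+", True),
    ("Karori", "20-29", False),
    ("Karori", "60+", True),
    ("Karori", "60+", True),
]

def loans_by_branch(rows):
    """OLAP-style: an exact measure (loan count) for each member of a dimension."""
    totals = defaultdict(int)
    for branch, _age, _renewed in rows:
        totals[branch] += 1
    return dict(totals)

def renewal_probability(rows, age_band):
    """Mining-style: an estimate for an unseen case, with Laplace smoothing."""
    matching = [renewed for _branch, age, renewed in rows if age == age_band]
    return (sum(matching) + 1) / (len(matching) + 2)

branch_counts = loans_by_branch(loans)
p_young = renewal_probability(loans, "20-29")
p_old = renewal_probability(loans, "60+")
print(branch_counts, p_young, p_old)
```

The count is a fact about the data; the probability is a guess about a case that has not happened yet, which is why it can only be approximately right.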
9Clusters
[Chart: clusters plotted by Age and Annual Income]
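To make the cluster picture concrete, here is a minimal k-means sketch over (age, annual income) pairs. k-means is used as a familiar stand-in and is not necessarily what the SQL Server clustering algorithm does; the data points, the starting centroids and the `kmeans` function are assumptions made for illustration. Income is expressed in thousands so that both axes are on a comparable scale.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group;
        # keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

# Hypothetical customers: (age, annual income in thousands)
points = [(22, 25), (25, 30), (27, 28), (48, 90), (52, 95), (55, 85)]

# Start from two of the points; a real tool would pick starting centroids itself.
centroids, groups = kmeans(points, [points[0], points[3]])
print(centroids)
print(groups)
```

With this toy data the younger, lower-income customers and the older, higher-income customers fall into separate groups after the first pass.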
10Library Clusters
11Decision Trees
- Input data
- About cases
- Discovering relationships
- Predicting outcomes
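The core step in building a decision tree is choosing which input attribute to split the cases on. The sketch below picks the attribute with the highest information gain, which is one common criterion and not necessarily the one SQL Server applies. The cases, attribute names and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(cases, attribute, target):
    """Reduction in entropy of the target after splitting on one attribute."""
    base = entropy([c[target] for c in cases])
    remainder = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c[target] for c in cases if c[attribute] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return base - remainder

def best_split(cases, attributes, target):
    """The attribute whose split tells us most about the target."""
    return max(attributes, key=lambda a: information_gain(cases, a, target))

# Hypothetical library members (the cases) with two input attributes.
cases = [
    {"age_band": "young", "branch": "Central", "borrows_dvds": "yes"},
    {"age_band": "young", "branch": "Karori", "borrows_dvds": "yes"},
    {"age_band": "old", "branch": "Central", "borrows_dvds": "no"},
    {"age_band": "old", "branch": "Karori", "borrows_dvds": "no"},
]

split = best_split(cases, ["age_band", "branch"], "borrows_dvds")
print(split)
```

A full tree repeats this choice inside each branch until the cases in a leaf are (nearly) all of one class; the leaf's majority class is then the predicted outcome for new cases that land there.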
12Data Mining
- Demo with real data
- Build a data mine
- View data mine
- Browse dependencies
- Browse decision trees
- Query using MDX
- Query using ThinMiner
- Batch update
- Uses of Data Mining
- Risk assessment
- Claim likelihood
- Customer profitability predictions
- Fraud detection
- Treatment efficacy
- Product suggestions
- Web shopping
- Call centre tool
13Successful Data Mining Projects
- Two additional Critical Success Factors
- Discover something interesting
- Profit from discovery
- For example
- ComputerFleet
- (Localhost)
14What's Coming in Yukon
Decision Trees
Confusion Matrix
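A confusion matrix simply tallies how often each actual value was predicted as each possible value. The sketch below builds one from two lists; the labels and the `confusion_matrix` helper are made up for illustration and say nothing about how the Yukon viewer computes or displays it.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count cases per (actual, predicted) pair."""
    return Counter(zip(actual, predicted))

# Hypothetical held-out cases: what really happened vs. what the model said.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(actual, predicted)

# Accuracy is the share of cases on the diagonal (actual == predicted).
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(actual)

for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {n}")
print(accuracy)
```

The off-diagonal cells matter as much as the accuracy figure, because the two kinds of mistake (a missed "yes" and a false "yes") usually have different costs.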
15Naïve Bayes
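For readers who have not met Naïve Bayes, here is a minimal sketch for categorical attributes. It multiplies the prior for each class by the smoothed frequency of each attribute value within that class, treating the attributes as independent. The cases, attribute names, the `train` and `predict` functions and the `n_values` parameter are assumptions for illustration, not the Yukon implementation.

```python
from collections import Counter, defaultdict

def train(cases, target):
    """Count classes, and attribute values within each class."""
    class_counts = Counter(c[target] for c in cases)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for c in cases:
        for attr, value in c.items():
            if attr != target:
                value_counts[(attr, c[target])][value] += 1
    return class_counts, value_counts

def predict(model, case, n_values=2):
    """Class probabilities for one case.

    n_values is the assumed number of distinct values per attribute,
    used for Laplace smoothing so unseen values do not zero the score.
    """
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior
        for attr, value in case.items():
            score *= (value_counts[(attr, cls)][value] + 1) / (n + n_values)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical library members.
cases = [
    {"age_band": "young", "member_type": "student", "renews": "yes"},
    {"age_band": "young", "member_type": "student", "renews": "yes"},
    {"age_band": "young", "member_type": "adult", "renews": "no"},
    {"age_band": "old", "member_type": "adult", "renews": "no"},
]

model = train(cases, "renews")
probs = predict(model, {"age_band": "young", "member_type": "student"})
print(probs)
```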
16Demonstration
- Yukon
- Development
- New algorithms
- Lift chart
- Profit curve
- Query tool
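The idea behind a lift chart can be sketched as follows: rank the cases by the model's predicted score, then ask what share of the real positives is captured in the top fraction of that ranking, compared with picking cases at random. The scores, outcomes and the `cumulative_gains` function are hypothetical, and the Yukon chart may be computed differently.

```python
def cumulative_gains(scores, outcomes, fractions=(0.25, 0.5, 0.75, 1.0)):
    """For each fraction of cases contacted (best scores first), return
    (fraction, share of positives captured, lift over random selection)."""
    ranked = [
        outcome
        for _score, outcome in sorted(
            zip(scores, outcomes), key=lambda pair: pair[0], reverse=True
        )
    ]
    total_positives = sum(ranked)
    points = []
    for f in fractions:
        cutoff = round(f * len(ranked))
        captured = sum(ranked[:cutoff]) / total_positives
        points.append((f, captured, captured / f))
    return points

# Hypothetical model scores and what actually happened (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

points = cumulative_gains(scores, outcomes)
for fraction, captured, lift in points:
    print(fraction, captured, lift)
```

A lift above 1 for the top fractions means the model is ranking likely responders ahead of the rest; a profit curve takes the same ranking and applies a cost per contact and a revenue per response.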
17Questions
References
Microsoft Research: http://Research.Microsoft.com/research/pubs
Richard Lees: EasternMining_at_Hotmail.com, http://RichardLees.com.au