Title: Data Mining I: KnowledgeSEEKER
1. Data Mining I: KnowledgeSEEKER
- Jennifer Davis
- Kelly Davis
- Saurabh Gupta
- Chris Mathews
- Shantea Stanford
2. Overview of Presentation
- Introduction to Data Mining Methods and Products
- Tutorial: How to Use KnowledgeSEEKER
- Exercises: How Much Did You Learn?
3. What Is Data Mining?
- Filtering large amounts of data
- Searching for hidden patterns and/or trends
- Predicting future results
- Creating a competitive advantage and improving decision making
- Data mining is a form of artificial intelligence, but is very different from other BI tools.
- Discovery versus verification
4. What Sparked Data Mining?
- Motivated by business need, large amounts of available data, and humans' limited cognitive processing abilities
- Enabled by data warehousing, parallel processing, and data mining algorithms
- Source: Dr. Hugh Watson
5. Popular Data Mining Methods
- Neural networks: learning from data patterns and predicting new data
- Genetic algorithms: optimization techniques
- Decision trees: rules for classifying data
- Regression analysis: statistical technique
- K-nearest neighbor: classifying and clustering technique based on weighting of selected variables
- Data visualization: visually showing patterns
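As a minimal sketch of the k-nearest neighbor idea listed above: a new record takes the majority label of the closest known records, and per-variable weights control how much each variable counts. The records, the `knn_predict` name, and the weights are hypothetical and chosen only for illustration; they do not come from any product named in these slides.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3, weights=None):
    """Label `query` by majority vote among its k nearest training points."""
    weights = weights or [1.0] * len(query)

    def distance(point):
        # Weighted Euclidean distance; a larger weight makes a variable count more.
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, point, query)))

    nearest = sorted(train, key=lambda row: distance(row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (age, drinks per week) -> blood pressure label.
train = [
    ((25, 1), "normal"),
    ((30, 2), "normal"),
    ((35, 0), "normal"),
    ((55, 9), "high"),
    ((60, 12), "high"),
    ((65, 8), "high"),
]

print(knn_predict(train, (58, 10)))  # the three closest records are all "high"
```

Passing `weights=[1.0, 0.0]` would make the vote depend on age alone, which is one simple way to express the weighting of selected variables.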
6. Types of Data Mining
- Association: identifies relationships
- Sequential pattern: identifies sequencing
- Classifying: identifies potential outcomes for predetermined categories
- Clustering: identifies categories
- Prediction: estimates future values or forecasts
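To make the association type concrete, relationships between items are commonly scored with support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below assumes a tiny hypothetical set of shopping baskets; the item names and helper names are illustrative only.

```python
# Hypothetical transactions; each basket is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

print(support({"bread", "milk"}, baskets))       # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}, baskets))  # 2 of the 3 bread baskets
```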
7. Data Mining Process
- Requires personnel with domain, data warehousing, and data mining expertise
- Requires data selection, data extraction, data cleansing, and data transformation
- Most data mining tools work with highly granular flat files
- Is an iterative and interactive process
- Source: Dr. Hugh Watson
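The preparation steps above (selection, cleansing, transformation, then a flat file) can be sketched roughly as follows. The raw records, field names, and helper functions are hypothetical, and dropping rows with a missing age is only one possible cleansing policy, not a recommendation from the source.

```python
import csv
import io

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"id": "1", "age": "54", "smoker": " Regular ", "notes": "call back"},
    {"id": "2", "age": "", "smoker": "never", "notes": ""},
    {"id": "3", "age": "41", "smoker": "OCCASIONAL", "notes": "n/a"},
]

def age_band(age):
    """Transform a numeric age into a coarse bracket."""
    for low, high in ((32, 50), (51, 62), (63, 72)):
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

def prepare(records):
    """Select, cleanse, and transform records into flat rows."""
    rows = []
    for rec in records:
        if not rec["age"].strip():  # cleansing: skip rows with a missing age
            continue
        rows.append({
            "id": rec["id"],                          # selection: keep only needed fields
            "age_band": age_band(int(rec["age"])),    # transformation: age -> bracket
            "smoker": rec["smoker"].strip().lower(),  # cleansing: normalize text
        })
    return rows

def to_flat_file(rows):
    """Serialize rows as CSV text, one record per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "age_band", "smoker"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_flat_file(prepare(RAW)))
```

Because the process is iterative, functions like these would typically be rerun as the brackets or cleansing rules are revised.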
8. How Is Data Mining Used?
- CRM: Research, churn, and promotional management.
- Process Mgmt: Reduce operational delays.
- Analysis: Develop forecasting models and fraud prevention.
- Predictive Capabilities: Develop rules for queries or expert systems and oil exploration.
- Health Care: Medical research and trends.
- Banking: Identify bank locations.
- Sports: Guide movement of players.
9. Data Mining Products
- See product list, http://www.xore.com/prodtable.html
- According to Jackie Sweeney, International Data Corporation, "Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets."
10. Data Mining Products
- Off-the-shelf applications and bundling are becoming more common.
- Wide range of pricing
- SAS Institute's Enterprise Miner: $80k
- IBM Intelligent Miner: $60k
- Angoss KnowledgeSEEKER: $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price.
- Desktop products start at a few hundred dollars
11. Selection Process: Questions to Ask
- Are the data and variables currently available?
- Will mining involve numerical and nominal data?
- Can the tool build models, predict outcomes and verify results?
- Can it process the amount of data required?
- Can the tool handle incomplete data?
- Can the tool process noisy data?
- Can it provide the degree of granularity desired?
- How much technical knowledge is required?
12. KnowledgeSEEKER by Angoss
- Angoss Software Corp: Canadian public company specializing in data mining solutions
- Decision tree modeling
- Fully scalable and easy to use
- Specifications
- Operating Systems: Unix, Windows 3.1, 95, 98 and NT.
- Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS.
13. Users of KnowledgeSEEKER
- IRS: fraud detection
- University of Rochester: cancer research
- Hewlett Packard: process and quality control
- Reader's Digest: market segmentation
- MGM Grand: survey analysis
14. Sources
- Angoss Whitepaper, http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html
- "Data Mining for Golden Opportunities," Smart Computing, January 2000
- "Your Business Intelligence Arsenal," Telephony, Chicago, Apr 24, 2000, Douglas Hackney
- Examples and testimonials, http://www.data-mining-software.com/data_mining_examples.htm
- Data Management, Richard T. Watson, 2002
- http://www.xore.com/prodtable.html (Data Mining Products)
- Dr. Hugh Watson's slide
- "Data Mining Gets Real," Enterprise Systems Journal, April 1999, Jon William Toigo
- http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of Data Mining uses)
15. KnowledgeSEEKER Tutorial
16. KnowledgeSEEKER Exercises
- According to KnowledgeSEEKER, which is the most important variable influencing hypertension for those between the ages of 51-62 who are regular or occasional smokers?
- Answer: Cheese Last Week
17. KnowledgeSEEKER Exercises
- What is the total number of 51-62 year olds who have identified themselves as former/never smokers and have an eating pattern that includes a lot/moderate salt?
- Answer: 32
18. KnowledgeSEEKER Exercises
- What percent of women between the ages of 32-50 who occasionally drink have high hypertension?
- Answer: 28.6%
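A percentage question like this one amounts to filtering the records down to one subgroup and counting the outcome inside it. The sketch below assumes hypothetical records and field names; it does not use the tutorial's sample data, so its result is unrelated to the answer above.

```python
# Hypothetical survey records; not the tutorial's actual sample data.
records = [
    {"gender": "F", "age": "32-50", "drinks": "occasional", "hypertension": "high"},
    {"gender": "F", "age": "32-50", "drinks": "occasional", "hypertension": "low"},
    {"gender": "F", "age": "32-50", "drinks": "occasional", "hypertension": "normal"},
    {"gender": "F", "age": "32-50", "drinks": "occasional", "hypertension": "low"},
    {"gender": "M", "age": "32-50", "drinks": "occasional", "hypertension": "high"},
    {"gender": "F", "age": "51-62", "drinks": "regular", "hypertension": "high"},
]

def percent_high(records, **filters):
    """Percent of records matching `filters` whose hypertension is 'high'."""
    subset = [r for r in records if all(r[k] == v for k, v in filters.items())]
    if not subset:
        return None  # no matching records, so no percentage to report
    high = sum(r["hypertension"] == "high" for r in subset)
    return round(100 * high / len(subset), 1)

# 1 of the 4 matching hypothetical records is "high".
print(percent_high(records, gender="F", age="32-50", drinks="occasional"))
```

Each keyword filter plays the role of one branch taken in the tree, so adding a filter corresponds to moving one level deeper.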
19. KnowledgeSEEKER Exercises
- What is the percent of people in income groups 4, 5, 7, and 8, age bracket 32-50, who have high hypertension?
20. KnowledgeSEEKER Exercises
- In the sample data, how many people have never smoked before?
21. KnowledgeSEEKER Exercises
- What is the most important factor contributing to hypertension according to KnowledgeSEEKER for those in the 51-62 age bracket?
- Next, by right-clicking and selecting Go to Split, find the 4th most important factor from the table.
- Answer: Deep fried last week
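One way to picture a ranked table of candidate splits is to score every variable by how well it separates the dependent variable and then sort the scores. These slides do not describe how KnowledgeSEEKER scores its splits, so the sketch below substitutes information gain, a common decision-tree criterion, as an assumption. The records and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Reduction in target entropy obtained by splitting `rows` on `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def rank_splits(rows, features, target):
    """Return features ordered from most to least informative."""
    return sorted(features, key=lambda f: information_gain(rows, f, target), reverse=True)

# Hypothetical records: (smoker, salt, gender, hypertension).
data = [
    ("regular", "a lot", "F", "high"), ("regular", "a lot", "F", "high"),
    ("regular", "a lot", "M", "high"), ("regular", "a lot", "M", "high"),
    ("never", "a lot", "F", "low"), ("never", "a lot", "F", "low"),
    ("never", "low", "M", "low"), ("never", "low", "M", "low"),
]
rows = [dict(zip(("smoker", "salt", "gender", "hypertension"), r)) for r in data]

print(rank_splits(rows, ["gender", "salt", "smoker"], "hypertension"))
```

In this toy data the smoking variable separates the outcome perfectly, salt helps somewhat, and gender not at all, so the ranking comes out in that order.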
22. KnowledgeSEEKER Exercises
- What is the percentage of males who are regular smokers among all male participants?
23. KnowledgeSEEKER Exercises
- Create a graph of the distribution of smoking males.
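Outside the tool, the same kind of distribution can be tallied and drawn as a simple text bar chart. This is only a sketch with hypothetical records and category names; it does not reproduce KnowledgeSEEKER's graphing feature.

```python
from collections import Counter

# Hypothetical participants; not the tutorial's sample data.
people = [
    {"gender": "M", "smoker": "regular"},
    {"gender": "M", "smoker": "regular"},
    {"gender": "M", "smoker": "regular"},
    {"gender": "M", "smoker": "occasional"},
    {"gender": "M", "smoker": "former/never"},
    {"gender": "M", "smoker": "former/never"},
    {"gender": "F", "smoker": "regular"},
]

def smoking_distribution(records, gender):
    """Count smoking categories among records of the given gender."""
    return Counter(r["smoker"] for r in records if r["gender"] == gender)

def text_bar_chart(counts):
    """Render one '#' per record for each category, sorted by category name."""
    width = max(len(name) for name in counts)
    return "\n".join(
        f"{name.ljust(width)} | {'#' * n} {n}" for name, n in sorted(counts.items())
    )

print(text_bar_chart(smoking_distribution(people, "M")))
```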
24. KnowledgeSEEKER Exercises
- Complete the following steps:
- Dependent variable: Hypertension
- Click on Grow / Automatic
- What is the total number of males between the ages of 63-72 who had fish last week?
- Answer: 24
25. KnowledgeSEEKER Exercises
- What is the next split after age that has the highest effect on hypertension according to KnowledgeSEEKER?
26. KnowledgeSEEKER Exercises
- Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension?
27. KnowledgeSEEKER Exercises
- According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62?
- How is this different from males age 51-62?
- Answer: Women: weight; Men: drinking pattern