Data Mining I: KnowledgeSEEKER - PowerPoint PPT Presentation

1
Data Mining I: KnowledgeSEEKER
  • Jennifer Davis
  • Kelly Davis
  • Saurabh Gupta
  • Chris Mathews
  • Shantea Stanford

2
Overview of Presentation
  • Introduction to Data Mining Methods and Products
  • Tutorial: How to Use KnowledgeSEEKER
  • Exercises: How Much Did You Learn?

3
What is Data Mining?
  • Filtering large amounts of data
  • Searching for hidden patterns and/or trends
  • Predicting future results
  • Creating a competitive advantage and improving
    decision making
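  • Data mining is a form of artificial intelligence,
    but it is very different from other BI tools.
  • Discovery versus verification: data mining
    discovers patterns without a prior hypothesis,
    whereas most other BI tools verify hypotheses
    the user already has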

4
What Sparked Data Mining?
  • Motivated by business need, large amounts of
    available data, and humans' limited cognitive
    processing abilities
  • Enabled by data warehousing, parallel processing,
    and data mining algorithms
  • Source: Dr. Hugh Watson

5
Popular Data Mining Methods
  • Neural networks: learning from data patterns and
    predicting new data
  • Genetic algorithms: optimization techniques
  • Decision trees: rules for classifying data
  • Regression analysis: statistical modeling of
    relationships between variables
  • K-nearest neighbor: classification and clustering
    technique based on weighting of selected
    variables
  • Data visualization: visually showing patterns
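The k-nearest neighbor bullet describes classification based on weighted variables. A minimal sketch, assuming numeric variables, a Euclidean distance in which each variable's squared difference is multiplied by a user-chosen weight, and a simple majority vote among the k closest records (the function names, weights, and toy records are hypothetical, not taken from any product):

```python
from collections import Counter
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each variable's squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_classify(query, records, labels, weights, k=3):
    """Return the majority label among the k records closest to `query`."""
    # Sort record indices by weighted distance to the query point.
    ranked = sorted(range(len(records)),
                    key=lambda i: weighted_distance(query, records[i], weights))
    nearest = [labels[i] for i in ranked[:k]]
    # Majority vote; Counter.most_common breaks ties by first occurrence.
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical toy records: (age, weight in kg) with made-up labels.
records = [(30, 60), (35, 65), (55, 90), (60, 95), (58, 85)]
labels = ["normal", "normal", "high", "high", "high"]
print(knn_classify((57, 88), records, labels, weights=(1.0, 0.5), k=3))  # prints "high"
```

Giving weight a smaller coefficient than age here simply shows how a tool could emphasize some selected variables over others; real products choose and scale variables in their own ways.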

6
Types of Data Mining
  • Association: identifies relationships
  • Sequential pattern: identifies sequencing
  • Classification: identifies potential outcomes for
    predetermined categories
  • Clustering: identifies categories
  • Prediction: estimates future values or forecasts
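Association rules are commonly scored by support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). A minimal sketch, assuming each transaction is a set of item names; the baskets are invented for illustration:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(X and Y) / support(X)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(support(baskets, {"bread", "milk"}))      # prints 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
```

A rule such as bread → milk is usually reported only when both measures exceed thresholds chosen by the analyst.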

7
Data Mining Process
  • Requires personnel with domain, data
    warehousing, and data mining expertise
  • Requires data selection, data extraction, data
    cleansing, and data transformation
  • Most data mining tools work with highly granular
    flat files
  • Is an iterative and interactive process
  • Source: Dr. Hugh Watson
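The selection, cleansing, and transformation steps that produce a granular flat file can be sketched as follows. This is an illustration under assumptions, not how any particular tool works: the field names and records are hypothetical, missing values are handled by simply dropping the record, and the age brackets mirror the groups that appear in the exercises later in the deck.

```python
import csv
import io

# Hypothetical age brackets, modeled on the groups used in the exercises.
BRACKETS = [(32, 50), (51, 62), (63, 72)]


def age_bracket(age):
    """Transformation: map a numeric age to a bracket label such as '51-62'."""
    for lo, hi in BRACKETS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return "other"


def prepare(raw_rows, fields):
    """Select fields, drop incomplete rows, and bin age (assumes 'age' is in fields)."""
    out = []
    for row in raw_rows:
        selected = {f: row.get(f) for f in fields}  # selection
        if any(v is None or v == "" for v in selected.values()):
            continue  # cleansing: skip records with missing values
        selected["age"] = age_bracket(int(selected["age"]))  # transformation
        out.append(selected)
    return out


def to_flat_file(rows, fields):
    """Write the prepared rows as a flat CSV file, returned as a string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


raw = [
    {"id": 1, "age": "55", "smoker": "regular", "hypertension": "high"},
    {"id": 2, "age": "40", "smoker": "", "hypertension": "normal"},  # missing value
    {"id": 3, "age": "66", "smoker": "never", "hypertension": "normal"},
]
fields = ["age", "smoker", "hypertension"]
print(to_flat_file(prepare(raw, fields), fields))
```

Dropping incomplete records is only one cleansing choice; imputing missing values is a common alternative when records are scarce.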

8
How Is Data Mining Used?
  • CRM: research, churn, and promotional management
  • Process management: reducing operational delays
  • Analysis: developing forecasting models and fraud
    prevention
  • Predictive capabilities: developing rules for
    queries or expert systems, and oil exploration
  • Health care: medical research and trends
  • Banking: identifying bank locations
  • Sports: guiding the movement of players

9
Data Mining Products
  • See the product list at http://www.xore.com/prodtable.html
  • According to Jackie Sweeney of International Data
    Corporation, "Data mining has matured, producing
    fortunes for the Big Three vendors - SPSS, IBM
    and SAS Institute - and robust revenues for a
    number of smaller vendors who market solutions
    tailored to vertical markets."

10
Data Mining Products
  • Off-the-shelf applications and bundling are
    becoming more common.
  • Wide range of pricing:
  • SAS Institute's Enterprise Miner: $80K
  • IBM Intelligent Miner: $60K
  • Angoss KnowledgeSEEKER: $4,750 per license,
    including upgrades and unlimited tech support for
    1 year. Annual license renewal fees are 20% of
    the list price.
  • Desktop products start at a few hundred dollars.
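The KnowledgeSEEKER renewal terms translate into simple arithmetic. A minimal sketch, assuming the 20% renewal fee applies to the 4,750 list price for every year after the first and ignoring taxes, discounts, and price changes (the function name `license_cost` is hypothetical):

```python
def license_cost(years, list_price=4750.0, renewal_rate=0.20):
    """Cumulative cost of one license over `years` years.

    Assumes the first year costs the list price (upgrades and support included)
    and each later year costs renewal_rate * list_price.
    """
    if years < 1:
        return 0.0
    return list_price + (years - 1) * renewal_rate * list_price


for y in (1, 2, 3):
    print(y, license_cost(y))
```

Under these assumptions each additional year adds 950 (20% of 4,750), so three years come to 6,650 per license.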

11
Selection Process: Questions to Ask
  • Are the data and variables currently available?
  • Will mining involve numerical and nominal data?
  • Can the tool build models, predict outcomes and
    verify results?
  • Can it process the amount of data required?
  • Can the tool handle incomplete data?
  • Can the tool process noisy data?
  • Can it provide the degree of granularity desired?
  • How much technical knowledge is required?

12
KnowledgeSEEKER by Angoss
  • Angoss Software Corp.: Canadian public company
    specializing in data mining solutions
  • Decision tree modeling
  • Fully scalable and easy to use
  • Specifications:
      • Operating systems: Unix, Windows 3.1, 95, 98,
        and NT
      • Databases: Access, dBase II, III, and IV, ODBC,
        SAS, and SPSS
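The slides do not say which split criterion KnowledgeSEEKER uses. As a hedged illustration of how a decision-tree tool can rank candidate variables (the "most important variable" the exercises ask about), the sketch below assumes a Pearson chi-square test of independence between each categorical predictor and the dependent variable, the idea behind CHAID-style trees. The variables and records are hypothetical, and the ranking uses the raw statistic for simplicity.

```python
from collections import Counter


def chi_square(records, predictor, target):
    """Pearson chi-square statistic for the predictor x target contingency table."""
    n = len(records)
    joint = Counter((r[predictor], r[target]) for r in records)
    row_tot = Counter(r[predictor] for r in records)
    col_tot = Counter(r[target] for r in records)
    stat = 0.0
    for a in row_tot:
        for b in col_tot:
            expected = row_tot[a] * col_tot[b] / n  # always > 0 for observed categories
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def rank_splits(records, predictors, target):
    """Order candidate split variables from strongest to weakest association.

    Simplification: compares raw statistics; CHAID proper compares adjusted p-values.
    """
    scores = {p: chi_square(records, p, target) for p in predictors}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records.
data = [
    {"smoker": "yes", "salt": "low", "hypertension": "high"},
    {"smoker": "yes", "salt": "high", "hypertension": "high"},
    {"smoker": "yes", "salt": "low", "hypertension": "high"},
    {"smoker": "no", "salt": "high", "hypertension": "normal"},
    {"smoker": "no", "salt": "low", "hypertension": "normal"},
    {"smoker": "no", "salt": "high", "hypertension": "normal"},
]
print(rank_splits(data, ["smoker", "salt"], "hypertension"))
```

In this toy data smoking separates the hypertension classes perfectly, so it ranks first; a tree grower would split on it and then repeat the ranking inside each branch.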

13
Users of KnowledgeSEEKER
  • IRS: fraud detection
  • University of Rochester: cancer research
  • Hewlett-Packard: process and quality control
  • Reader's Digest: market segmentation
  • MGM Grand: survey analysis

14
Sources
  • Angoss whitepaper,
    http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html
  • "Data Mining for Golden Opportunities," Smart
    Computing, January 2000
  • Douglas Hackney, "Your Business Intelligence
    Arsenal," Telephony, Chicago, April 24, 2000
  • Examples and testimonials,
    http://www.data-mining-software.com/data_mining_examples.htm
  • Richard T. Watson, Data Management, 2002
  • Data mining products,
    http://www.xore.com/prodtable.html
  • Dr. Hugh Watson's slide
  • Jon William Toigo, "Data Mining Gets Real,"
    Enterprise Systems Journal, April 1999
  • Examples of data mining uses,
    http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

15
KnowledgeSEEKER Tutorial

16
KnowledgeSEEKER Exercises
  • According to KnowledgeSEEKER, which is the most
    important variable influencing hypertension for
    those between the ages of 51-62 who are
    regular or occasional smokers?
  • Answer: Cheese Last Week

17
KnowledgeSEEKER Exercises
  • What is the total number of 51-62 year olds who
    have identified themselves as former/never
    smokers and have an eating pattern that includes
    a lot/moderate salt?
  • Answer: 32

18
KnowledgeSEEKER Exercises
  • What percentage of women aged 32-50 who
    occasionally drink have high hypertension?
  • Answer: 28.6%

19
KnowledgeSEEKER Exercises
  • What percentage of people in income groups
    4, 5, 7, and 8, age bracket 32-50, have high
    hypertension?
  • Answer: 11.8%

20
KnowledgeSEEKER Exercises
  • In the sample data, how many people have never
    smoked?
  • Answer: 94

21
KnowledgeSEEKER Exercises
  • What is the most important factor contributing to
    hypertension, according to KnowledgeSEEKER, for
    those in the 51-62 age bracket?
  • Answer: Smoking
  • Next, right-click and select Go to Split, then
    find the fourth most important factor in the
    table.
  • Answer: Deep fried last week

22
KnowledgeSEEKER Exercises
  • What is the percentage of males who are regular
    smokers among all male participants? 
  • Answer: 30.8%
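Several exercises, such as the one above, ask for a percentage within a subgroup. That is a conditional proportion: the number of records meeting both conditions divided by the number meeting the group condition. A minimal sketch, with hypothetical field names and records (not the deck's sample data) and a hypothetical helper `percent_within`:

```python
def percent_within(records, group, condition):
    """Percentage of records satisfying `group` that also satisfy `condition`."""
    in_group = [r for r in records if group(r)]
    if not in_group:
        return 0.0  # avoid dividing by zero for an empty subgroup
    hits = sum(1 for r in in_group if condition(r))
    return 100.0 * hits / len(in_group)


# Hypothetical participants.
people = [
    {"sex": "male", "smoker": "regular"},
    {"sex": "male", "smoker": "never"},
    {"sex": "male", "smoker": "occasional"},
    {"sex": "male", "smoker": "regular"},
    {"sex": "female", "smoker": "regular"},
]
share = percent_within(people,
                       group=lambda r: r["sex"] == "male",
                       condition=lambda r: r["smoker"] == "regular")
print(f"{share:.1f}%")  # prints "50.0%"
```

In a decision-tree view the same figure is read off a node: the count for the category of interest divided by the node's total.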

23
KnowledgeSEEKER Exercises
  • Create a graph of the smoking distribution among
    males.

24
KnowledgeSEEKER Exercises
  • Complete the following steps:
      • Set the dependent variable to Hypertension.
      • Click Grow / Automatic.
  • What is the total number of males between the
    ages of 63-72 who had fish last week?
  • Answer: 24

25
KnowledgeSEEKER Exercises
  • After age, which split has the highest effect on
    hypertension, according to KnowledgeSEEKER?
  • Answer: Height

26
KnowledgeSEEKER Exercises
  • Among 32-50 year olds who report a drink pattern
    of former/never, how many have high
    hypertension? 
  • Answer: 0

27
KnowledgeSEEKER Exercises
  • According to KnowledgeSEEKER, what is the most
    important variable influencing hypertension for
    women between the ages of 51-62?
  • How does this differ for men aged 51-62?
  • Answer: Women: weight; men: drinking pattern