Data Mining - PowerPoint PPT Presentation

Slides: 30
Provided by: Erta2

Title: Data Mining


1
Data Mining
  • Lecture 1

2
Instructor Info
  • Name: Ertan Karakurt
  • Contact: ertankarakurt_at_akillisistemler.com.tr
  • 10 years of experience in Data Mining and
    Intelligent Applications Development
  • General Purpose Data Mart Development for
    Financial Modeling
  • Behavioral Clustering of Retail Customers in
    Banking Sector
  • Propensity Modeling for Cross Selling
  • Attrition/Retention Modeling
  • Modeling Algorithms Library Development for
    Defense

3
Instructor Info
  • Ertan Karakurt
  • Founder of Izmir-based Akilli Sistemler
  • Fuzzy/exact searching and matching engine for
    databases
  • Search space analysis and learning
  • Algorithm space analysis and learning
  • Parallelization architecture

4
Course Objective
  • stimulate university and industry cooperation
  • create an opportunity to work with real life
    applications and problems in Data Mining
  • case studies on data dictionaries
  • case studies on physically built data mining
    models
  • adjusting/utilizing the balance point between
    theory and application in Data Mining

5
Course Syllabus
  • Course topics
  • Introduction (Week 1-Week 2)
  • What is Data Mining?
  • Data Collection and Data Management Fundamentals
  • The Essentials of Learning
  • The Emerging Needs for Different Data Analysis
    Perspectives
  • Data Management and Data Collection Techniques
    for Data Mining Applications (Week 3-Week 4)
  • Data Warehouses: Gathering raw data from
    relational databases and transforming it into
    information
  • Information Extraction and Data Processing
    Techniques
  • Data Marts: The need for building highly
    specialized data stores for data mining
    applications

6
Course Syllabus
  • Case Study 1: Working with the properties of
    the Retail Banking Data Mart (Week 4,
    Assignment 1)
  • Data Analysis Techniques (Week 5)
  • Statistical Background
  • Trends/Outliers/Normalizations
  • Principal Component Analysis
  • Discretization Techniques
  • Case Study 2: Working with the properties of
    the discretization infrastructure of the Retail
    Banking Data Mart (Week 5, Assignment 2)
  • Lecture Talk: In-class Discussion

7
Course Syllabus
  • Clustering Techniques (Week 6)
  • K-Means Clustering
  • Condorcet Clustering
  • Other Clustering Techniques
  • Case Study 3: Working with the properties of
    the clustering infrastructure for Retail
    Banking (Week 6, Assignment 3)
  • Lecture Talk: In-class Discussion

8
Course Syllabus
  • Classification Techniques (Week 7-Week 9)
  • Inductive Learning
  • Decision Tree Learning
  • Association Rules
  • Regression
  • Probabilistic Reasoning
  • Bayesian Learning
  • Case Study 4: Working with the properties of
    the classification infrastructure of the
    Propensity Score Card System for Retail
    Banking (Week 9, Assignment 4)

9
Course Syllabus
  • Prediction Techniques (Week 10-Week 11)
  • Neural Networks
  • Radial Basis Networks
  • Reinforcement Learning
  • Case Study 5: Working with the properties of
    the prediction infrastructure of the
    Propensity Score Card System for Retail
    Banking (Week 11, Assignment 5)
  • Other Classification and Prediction Techniques
    (Week 12-Week 13)
  • Text Mining and Web Mining
  • Explanation Based Learning
  • Rule Based Learning
  • Genetic Algorithms
  • Recurrent Networks
  • Case Study 6: Working with the properties of
    the Genetic Algorithms infrastructure for
    Neural Network Topology Estimation
    (Week 13, Assignment 6)

10
Course Syllabus
  • Assessment
  • One midterm examination (35%)
  • One final examination (55%)
  • In-class reviewed, case-study-based assignments
    (10%)
  • There will be six assignments, one for each
    reviewed case study. Students are encouraged to
    do the assignments in groups of two or three

11
Course Syllabus
  • Textbook
  • Jiawei Han and Micheline Kamber, Data Mining:
    Concepts and Techniques, 2nd ed., Morgan
    Kaufmann, 2006.
  • Supplementary Books
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning: Data Mining,
    Inference, and Prediction, Springer-Verlag, 2001.
  • P.-N. Tan, M. Steinbach, and V. Kumar,
    Introduction to Data Mining, Addison-Wesley,
    2006. ISBN 0-321-32136-7
  • Tom M. Mitchell, Machine Learning, McGraw-Hill,
    1997.
  • C. M. Bishop, Pattern Recognition and Machine
    Learning, Springer, 2007.
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, 2nd ed., Wiley-Interscience, 2001.

12
Week1- What Is Data Mining?
  • "Drowning in Data yet Starving for Knowledge"
  • ???
  • "Computers have promised us a fountain of wisdom
    but delivered a flood of data"
    (William J. Frawley, Gregory Piatetsky-Shapiro,
    and Christopher J. Matheus)

13
Week1-What Is Data Mining?
  • Data flood
  • Information society produces vast amounts of data
  • Data are generated by
  • Bank, telecom, other business transactions ...
  • Scientific data: astronomy, biology, etc.
  • Web, text, image, and e-commerce

14
Week1-What Is Data Mining?
  • AT&T handles billions of calls per day
  • As of 2003, according to the Winter Corp. survey,
    AT&T has a 26 TB decision-support database
  • Web
  • 1998: 26 million pages
  • 2003: Google searches 4 billion pages, many
    hundreds of TB
  • 2005: Google searches 8 billion pages
  • 2008: 1 trillion (1,000,000,000,000) pages

15
Week1-What Is Data Mining?
  • UC Berkeley 2003 estimate
  • 5 exabytes (5 million terabytes) of new data were
    created in 2002
  • Twice as much information was created in 2002 as
    in 1999 (growth rate of about 30% a year)
  • Other growth rate estimates are even higher
  • Very few data will ever be looked at by a human
  • Tools are needed to make sense and use of data
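
The growth figure on this slide can be checked with a little arithmetic. The sketch below assumes that "twice as much in 2002 as in 1999" means a doubling over three years of compound growth; the function name is illustrative and not part of the lecture.

```python
def annual_growth_rate(factor: float, years: int) -> float:
    """Compound annual growth rate implied by growing `factor`-fold over `years` years."""
    return factor ** (1.0 / years) - 1.0

# Doubling between 1999 and 2002 (three years), as quoted on the slide.
rate = annual_growth_rate(2.0, 3)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the implied rate is roughly 26 percent a year, which is consistent with the slide's rounded figure of about 30 percent.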

16
Week1-What Is Data Mining?
  • Data
  • raw
  • atomic
  • Information
  • processed
  • re-organized
  • grouped
  • Knowledge
  • patterns, models, findings behind Information
  • Wisdom
  • perfect orchestration of Knowledge

[Figure labels: Data (Operation), Information (Analytic), Knowledge, Wisdom]

"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
T. S. Eliot

17
Week1-What Is Data Mining?
  • Hypothesis
  • current databases contain a lot of potentially
    important knowledge that can be used for
    wise decision making
  • Mission of DM
  • find it !!!

18
Week1-What Is Data Mining?
  • Data Mining (alternative name: Knowledge
    Discovery in Databases, KDD) definitions
  • mining knowledge from data
  • process of extracting interesting (non-trivial,
    implicit, previously unknown and potentially
    useful) knowledge or patterns from data in large
    databases
  • discover knowledge that characterizes general
    properties of data
  • discover patterns in past and current data in
    order to make predictions on future data

19
Week1-What Is Not Data Mining?
  • "Torturing data until it confesses ... and if you
    torture it enough, it will confess to anything"
    (Jeff Jonas, IBM)
  • "An unethical econometric practice of massaging
    and manipulating the data to obtain the desired
    results" (W. S. Brown, Introducing Econometrics)
  • "A buzz word for what used to be known as DBMS
    reports" (an anonymous data mining skeptic)

20
Week1-What Is Data Mining?

21
Week1-What Is Data Mining?
  • Data Mining: an interdisciplinary field
  • Databases
  • Statistics
  • High Performance Computing
  • Machine Learning
  • Visualization
  • Mathematics

22
Week1-What Is Data Mining?
  • Data Mining: an interdisciplinary field
  • Large Data sets in Data Mining
  • Efficiency of Algorithms is important
  • Scalability of Algorithms is important
  • Real World Data
  • Lots of Missing Values
  • Pre-existing data - not synthetic
  • Data not static - prone to updates
  • Domain Knowledge in the form of integrity
    constraints available.
  • Exploratory data analysis

23
Week1-Data Mining Application Examples
  • Credit Assessment
  • Stock Market Prediction
  • Fault Diagnosis in Production Systems
  • Medical Discovery
  • Fraud Detection
  • Hazard Forecasting
  • Buying Trends Analysis
  • Organizational Restructuring
  • Target Mailing
  • ---

25
Week1-Data Mining Application Examples
  • Can I develop a general characterization/profile
    of different investor types? (characterization)
  • What characteristics distinguish between Online
    and Broker investors? (classification)
  • Can I develop a model which will predict the
    average trades/month for a new investor?
    (regression)
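
The three questions above map onto three different tasks. The following is a minimal sketch on invented toy records: the field choices, the one-rule classifier, and the helper names are assumptions made for illustration, not the lecture's method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (channel, age, trades_per_month). Invented for illustration.
investors = [
    ("online", 28, 12.0),
    ("online", 35, 9.0),
    ("online", 41, 7.0),
    ("broker", 52, 3.0),
    ("broker", 60, 2.0),
    ("broker", 47, 4.0),
]

# Characterization: summarize each investor type by its average profile.
groups = defaultdict(list)
for channel, age, trades in investors:
    groups[channel].append((age, trades))
profiles = {
    channel: {
        "mean_age": mean(a for a, _ in rows),
        "mean_trades": mean(t for _, t in rows),
    }
    for channel, rows in groups.items()
}

# Classification: a one-rule classifier that splits on age,
# using the midpoint between the two group means as the threshold.
threshold = (profiles["online"]["mean_age"] + profiles["broker"]["mean_age"]) / 2

def classify(age: float) -> str:
    return "online" if age < threshold else "broker"

# Regression: ordinary least squares fit of trades/month on age.
ages = [a for _, a, _ in investors]
trades = [t for _, _, t in investors]
mean_age, mean_trades = mean(ages), mean(trades)
slope = sum((a - mean_age) * (t - mean_trades) for a, t in zip(ages, trades)) / sum(
    (a - mean_age) ** 2 for a in ages
)
intercept = mean_trades - slope * mean_age

def predict_trades(age: float) -> float:
    return intercept + slope * age
```

Characterization only describes the groups, classification assigns a new investor to a group, and regression predicts a numeric value for that investor.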

26
Week1-Data Mining Application Examples
  • the natural question is to predict the Diagnosis
    from the symptoms (Medical Diagnosis Prediction)

27
Week1-Data Mining Application Examples
  • Assessing Credit Risk
  • Situation: A person applies for a loan
  • Task: Should the bank approve the loan?
  • Need to predict the person's credit risk: people
    with bad credit are not likely to repay
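
One way to frame this task in code is to predict repayment from similar past borrowers. The sketch below uses a k-nearest-neighbour vote, which is a swapped-in technique chosen for brevity; the features, the scaling, and all records are invented assumptions.

```python
from collections import Counter
from math import dist

# Hypothetical past loans: ((income in thousands, debt-to-income ratio), repaid?).
# Invented toy data for illustration only.
history = [
    ((60.0, 0.10), True),
    ((55.0, 0.20), True),
    ((70.0, 0.15), True),
    ((25.0, 0.60), False),
    ((30.0, 0.55), False),
    ((20.0, 0.70), False),
]

def scale(features):
    # Put both features on a roughly comparable 0-1 scale before measuring distance.
    income, debt_ratio = features
    return (income / 100.0, debt_ratio)

def predict_repayment(applicant, k=3):
    """Majority vote among the k most similar past borrowers."""
    nearest = sorted(history, key=lambda rec: dist(scale(rec[0]), scale(applicant)))[:k]
    votes = Counter(repaid for _, repaid in nearest)
    return votes.most_common(1)[0][0]
```

A call such as `predict_repayment((65.0, 0.12))` votes among the three closest historical borrowers; a real credit model would need far more data and careful validation.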

28
Week1-Data Mining Application Examples
  • A person buys a book (product) at amazon.com.
  • Task: Recommend other books (products) this
    person is likely to buy
  • Amazon does clustering based on books bought
  • Customers who bought Advances in Knowledge
    Discovery and Data Mining also bought Data
    Mining: Practical Machine Learning Tools and
    Techniques with Java Implementations
  • Recommendation program is quite successful
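
The "customers who bought X also bought Y" idea can be sketched with simple co-purchase counting. This is a simplified stand-in for the clustering the slide mentions and makes no claim about how Amazon actually does it; the baskets and placeholder titles are invented.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, one set of titles per customer. Invented for illustration.
baskets = [
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book B"},
    {"Book A", "Book D"},
    {"Book C", "Book D"},
]

# Count, for every title, how often each other title appears in the same basket.
co_bought = {}
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_bought.setdefault(a, Counter())[b] += 1

def recommend(title, n=2):
    """Return up to n titles most often bought together with `title`."""
    return [other for other, _ in co_bought.get(title, Counter()).most_common(n)]
```

Here `recommend("Book A", 1)` returns the title that shares the most baskets with "Book A"; a title with no purchase history yields an empty list.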

29
Week 1-End
  • Reading: Course Textbook, Chapter 1