Data Mining

Description:

Title: CSE591 Data Mining Last modified by: Lijun Created Date: 9/30/1996 6:28:10 PM Document presentation format: On-screen Show Other titles: Times New Roman Tahoma ... – PowerPoint PPT presentation

Slides: 20
Provided by: cecsWrig7
Learn more at: http://cecs.wright.edu

Transcript and Presenter's Notes


1
CS499/699-10 Data Mining
  • Fall 2003
  • Professor Guozhu Dong
  • Computer Science and Engineering
  • WSU

2
Introduction
  • Introduction to this Course
  • Introduction to Data Mining

3
Introduction to the Course
  • First, about you - why take this course?
  • Your background and strength
  • AI, DBMS, Statistics, Biology, Business, ...
  • Your interests and requests
  • What is this course about?
  • Problem solving
  • Handling data
  • transform raw data into workable data
  • Mining data
  • turn data into knowledge
  • validation and presentation of knowledge

4
This course
  • What can you expect from this course?
  • Knowledge and experience about DM
  • Problem solving skills
  • How is this course conducted?
  • Homework, projects, exams, classes
  • Course Format
  • Individual Projects: 30%
  • Exams and/or quizzes: 60%
  • Homework: 10%

5
Course Web Site
  • cs.wright.edu/gdong/mining03/WSUCS499DataMining.htm
  • My office and office hours
  • RC 430
  • 4:30-5:30, T Th
  • My email: gdong_at_cs.wright.edu
  • Slides and relevant information will be made
    available at the course web site

6
Any questions and suggestions?
  • Your feedback is most welcome!
  • I need it to adapt the course to your needs.
  • Please feel free to provide yours anytime.
  • Share your questions and concerns with the class;
    very likely others have the same ones.
  • No pain, no gain: there is no magic in data mining.
  • The more you put in, the more you get
  • Your grades are proportional to your efforts.

7
Introduction to Data Mining
  • Definitions
  • Motivations of DM
  • Interdisciplinary Links of DM

8
What is DM?
  • Or more precisely KDD (knowledge discovery from
    databases)?
  • Many definitions
  • An iterative process, not plug-and-play
  • raw data → transformed data → preprocessed data →
    data mining → post-processing → knowledge
  • One definition is:
  • A non-trivial process of identifying valid,
    novel, useful and ultimately understandable
    patterns in data
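
The chain of stages on this slide can be sketched as a small pipeline. This is only an illustration: the stage functions (transform, preprocess, mine, postprocess), the toy shopping records, and the pair-counting "mining" step are all assumptions made for the example and do not come from the course material.

```python
from collections import Counter

def transform(raw_rows):
    """Parse raw comma-separated strings into lists of fields."""
    return [row.split(",") for row in raw_rows]

def preprocess(rows):
    """Normalize case and whitespace, and drop records with missing fields."""
    cleaned = []
    for fields in rows:
        fields = [f.strip().lower() for f in fields]
        if all(fields):
            cleaned.append(fields)
    return cleaned

def mine(rows):
    """Count how often each pair of items occurs together in a record."""
    counts = Counter()
    for fields in rows:
        items = sorted(set(fields))
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                counts[(items[i], items[j])] += 1
    return counts

def postprocess(counts, min_count):
    """Keep only the pairs seen at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical raw records, including one with a missing field.
raw_data = ["Bread, Milk", "bread,milk,eggs", "milk, ", "Eggs,Bread"]
knowledge = postprocess(mine(preprocess(transform(raw_data))), min_count=2)
print(knowledge)
```

In practice the process is iterative: if the result is empty or uninteresting, one would go back and change the cleaning rules or the min_count threshold and run the chain again.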

9
Need for Data Mining
  • Data accumulate and double every 9 months
  • There is a big gap between stored data and knowledge,
    and the transition won't occur automatically.
  • Manual data analysis is not new but a bottleneck
  • Fast developing Computer Science and Engineering
    generates new demands
  • Seeking knowledge from massive data
  • Any personal experience?

10
When is DM useful
  • Data rich world
  • Large data (dimensionality and size)
  • Image data (size)
  • Gene chip data (dimensionality)
  • Little knowledge about data (exploratory data
    analysis)
  • What if we have some knowledge?

11
DM perspectives
  • KDD goals: prediction, description,
    explanation, optimization, and exploration
  • Knowledge forms: patterns vs. models
  • Understandability and representation of knowledge
  • Some applications
  • Business intelligence (CRM)
  • Security (Info, Comp Systems, Networks, Data,
    Privacy)
  • Scientific discovery (bioinformatics, medicine)

12
Challenges
  • Increasing data dimensionality and data size
  • Various data forms
  • New data types
  • Streaming data, multimedia data
  • Efficient search and access to data/knowledge
  • Intelligent update and integration

13
Interdisciplinary Links of DM
  • Statistics
  • Databases
  • AI
  • Machine Learning
  • Visualization
  • High Performance Computing
  • supercomputers, distributed/parallel/cluster
    computing

14
Statistics
  • Discovery of structures or patterns in data sets
  • hypothesis testing, parameter estimation
  • Optimal strategies for collecting data
  • efficient search of large databases
  • Static data
  • constantly evolving data
  • Models play a central role
  • algorithms are of a major concern
  • patterns are sought

15
Relational Databases
  • A relational database can contain several tables
  • Tables and schemas
  • The goal in data organization is to maintain data
    and quickly locate the requested data
  • Queries and index structures
  • Query execution and optimization
  • Query optimization is to find the best possible
    evaluation method for a given query
  • Providing fast, reliable access to data for data
    mining
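
As a rough illustration of how an index changes the way a query is evaluated, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The sales table, its columns, and the index name are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(1000)],
)

query = "SELECT amount FROM sales WHERE customer = 'c7'"

def plan(sql):
    """Return SQLite's description of how it would evaluate the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the detail text

plan_without_index = plan(query)   # expected to be a full table scan
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer)")
plan_with_index = plan(query)      # expected to use the new index

rows_found = conn.execute(query).fetchall()
print(plan_without_index)
print(plan_with_index)
print(len(rows_found))
```

The query returns the same rows either way; only the evaluation method chosen by the optimizer changes, which is the point of the slide.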

16
AI
  • Intelligent agents
  • Perception-Action-Goal-Environment
  • Search
  • Uniform cost and informed search algorithms
  • Knowledge representation
  • FOL, production rules, frames with semantic
    networks
  • Knowledge acquisition
  • Knowledge maintenance and application
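
Uniform cost search, mentioned above, can be sketched in a few lines: it always expands the frontier node with the lowest path cost so far. The graph, node names, and edge costs below are made up for the example.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) for a cheapest path from start to goal, or None."""
    frontier = [(0, start, [start])]  # priority queue ordered by path cost
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for neighbor, step_cost in graph.get(node, {}).items():
            new_cost = cost + step_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical weighted graph: graph[u][v] is the cost of the edge u -> v.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 6},
    "C": {"D": 1},
    "D": {},
}
result = uniform_cost_search(graph, "A", "D")
print(result)
```

An informed search such as A* would add a heuristic estimate of the remaining cost to the priority; with a heuristic of zero it reduces to the sketch above.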

17
Machine Learning
  • Focusing on complex representations,
    data-intensive problems, and search-based methods
  • Flexibility with prior knowledge and collected
    data
  • Generalization from data and empirical validation
  • statistical soundness and computational
    efficiency
  • constrained by finite computing and data resources
  • Challenges from KDD
  • scaling up, cost info, auto data preprocessing,
    more knowledge types
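
Generalization and empirical validation can be illustrated with a holdout test: learn from part of the data, then measure accuracy on examples the learner has not seen. The sketch below assumes a one-nearest-neighbor classifier and a made-up one-dimensional data set; neither is taken from the course.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of the training example whose feature is closest to x."""
    closest = min(train, key=lambda example: abs(example[0] - x))
    return closest[1]

# Toy data as (feature, label) pairs; the values are invented for illustration.
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (2.5, "low"),
        (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.5, "high")]

# Hold out every fourth example for validation and learn from the rest.
held_out = data[3::4]
train = [example for i, example in enumerate(data) if i % 4 != 3]

correct = sum(
    nearest_neighbor_predict(train, x) == label for x, label in held_out
)
accuracy = correct / len(held_out)
print(accuracy)
```

With so few examples the estimate is very coarse; a real evaluation would use more data and typically repeat the split, for example with cross-validation.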

18
Visualization
  • Producing a visual display with insights into the
    structure of the data with interactive means
  • zoom in/out, rotating, displaying detailed info
  • Various types of visualization methods
  • show summary properties and explore relationships
    between variables
  • investigate large DBs and convey lots of
    information
  • analyze data with geographic/spatial location
  • A pre- and post-processing tool for KDD

19
Bibliography
  • J. Han and M. Kamber. Data Mining: Concepts and
    Techniques. 2001. Morgan Kaufmann.
  • D. Hand, H. Mannila, and P. Smyth. Principles of Data
    Mining. 2001. MIT Press.
  • W. Klosgen and J.M. Zytkow, editors. Handbook
    of Data Mining and Knowledge Discovery. 2001.