Data Mining Engineering - PowerPoint PPT Presentation

About This Presentation
Title:

Data Mining Engineering

Description:

Title: Data Mining Engineering Author: Peter Brezany Last modified by: Peter Brezany Created Date: 8/2/1995 10:08:02 AM Document presentation format – PowerPoint PPT presentation

Slides: 27
Provided by: PeterB233

Transcript and Presenter's Notes


1
Data Analysis for Decision and Management Processes
Univ.-Prof. Dr. Peter Brezany
Institute of Scientific Computing
Faculty for Information Science
University of Vienna
E-mail: brezany_at_par.univie.ac.at
WWW: http://www.par.univie.ac.at/brezany
http://artemis.wszib.edu.pl/brezany/
2
Institute of Scientific Computing: Research Profile
The primary objectives of the Institute are
- to conduct research in high-performance advanced data analysis,
  knowledge management, programming languages, compilers,
  programming environments and software tools for high performance
  computing systems,
- to actively contribute to a transfer of technology to industry,
- to disseminate knowledge in the fields of parallel and distributed
  computing and software technology.
3
Institute for Software Science: Main Research Projects and Cooperations
- Participation in 14 EU projects (coordination of 1 project)
- The European Centre of Excellence for Parallel Computing, a
  department of the Institute, founded by the EU
- Coordination of the CEI-PACT project (Austria, Slovakia, Czech
  Republic, Poland, Italy, Hungary, Slovenia)
- Special Research Program AURORA of the Austrian Science Fund
  (1997-2007)
- Many international cooperations (NASA, CalTech, CERN, ...)
4
New Research Field: GRID COMPUTING
The Grid is a new distributed computing
infrastructure for science and engineering. The
Grid consists of physical resources (computers,
disks, networks, databases, sensors,
laboratory equipment) and middleware software
that ensures access to and the coordinated use
of such resources.
5
Media That Radically Influenced Society
1500s Printing Press
1840s Penny Post
1850s Telegraph
1920s Telephone
1930s Radio
1950s TV
1990s Web
20xx Grid
6
Outline
  • Business Intelligence, knowledge management
  • Relation: data, information, knowledge
  • Knowledge discovery process, system
    architectures
  • Data warehousing and data webhousing
  • Data preparation: selection, preprocessing
    (cleaning, transformation), integration
  • Data mining techniques: association rules,
    sequences, classification, prediction, neural
    networks, clustering, meta-learning
  • Advanced topics:
  • Multi-agent and mobile agent systems
  • Web mining
  • Intelligent search engines
  • Semantic web
  • Information and knowledge management on computing
    grids
  • Security issues

7
Basic Literature
Mark and Mary Whitehorn: Business Intelligence: The IBM Solution. Springer-Verlag, 2000.
R. Kimball: The Data Warehouse Toolkit. John Wiley, 1996.
J. Han, M. Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
M. Ester, J. Sander: Knowledge Discovery in Databases. Springer-Verlag, 2000 (in German).
I.H. Witten, E. Frank: Data Mining (Practical Machine Learning Tools and Techniques with Java Implementations). Morgan Kaufmann Publishers, 2000.
8
Time Schedule
  • Monday, Feb 27: 17.15 -- 20.30 (4 hours)
  • Tuesday, Feb 28: 10.00 -- 13.15 (4 hours)
  • Wednesday, Mar 01: 15.30 -- 18.45 (3 hours)
  • Thursday, Mar 02: 16.00 -- 18.15 (3 hours)
  • Location: s.1 AK4

9
Business Intelligence
Definition: Business Intelligence is an
umbrella term, broadly covering the processes
involved in extracting valuable business
information and knowledge from the mass of data
that exists within a typical enterprise, and
knowledge management (knowledge storage in an
appropriate form and knowledge distribution).
What is meant by information and knowledge? This is
best understood by imagining a chain linking
data → information → knowledge.
10
Data → Information → Knowledge
  • Data are the facts about events or processes.
  • Information is the organization of, associations
    between, and constraints upon data that allow it
    to be used by a user or a machine.
  • Knowledge is the interpretation of information
    and its use in a problem solving context.
    Knowledge can lead to new insights, which in turn
    lead to new innovations and ultimately to wealth
    creation and improvements in the quality of life.
  • Wisdom arises when one understands the
    foundational principles responsible for the
    patterns representing knowledge. (She or he can
    answer questions like "Why ...?" and knows how
    to find or derive new knowledge.)

11
Data
Example: When a customer visits a gas station
and buys petrol, this transaction can be described
with the following data: date/time,
volume, price. However, these data do not say
why the customer chose this station and not
another, and it is not possible to find out
from these data whether he will come again, or
whether the station is good or bad. Data alone
possess almost no meaning or purpose. They
are the base material for obtaining information.
12
Information
  • A piece of information can be described as a
    message.
  • Like all messages, information has a sender and
    a receiver.
  • Information is meant to shape the opinion or
    attitude of the receiver toward a problem and to
    influence his behavior.
  • We can also think of information as data that
    changes, forms, or influences something.
  • The word "inform" originally meant "to give
    form to" a thing or person.

13
Information (2)
  • Data become information when the receiver adds
    some meaning to the data. Such data upgrading can
    be done in different ways, for example:
  • Contextualizing: we know for what purpose the
    data was collected.
  • Calculation: the data can be mathematically
    analyzed and statistically enriched.
  • Correction: errors are removed from the data
    material.
  • Condensing: the data is transformed into a more
    compact form; the main components of the data
    material have to be identified.

14
Information Management
Information management: all management tasks
that deal with information and communication in
an enterprise.
15
Knowledge
  • Knowledge is the production factor of the future,
    which will replace energy and materials.
  • Knowledge is produced by mental activity and by
    processes that model mental activity.
  • Transformation process: Information → Knowledge
  • Comparison: How should I assess information
    about the current situation in comparison to
    other known situations?
  • Consequence: How will the information influence
    decisions and activities?
  • Connection: Which relations exist between one
    concrete information element and another one?
  • Conversation: What do other people think about a
    certain piece of information?

16
Knowledge (2)
  • People gain knowledge through experience: they
    see, hear, touch, and taste the world around
    them.
  • We can associate something we see with something
    we hear, thereby gaining new knowledge about the
    world.
  • Suppose we know that the sun is hot, balls are
    round, and the sky is blue. These facts are
    knowledge about the world. How do we store this
    knowledge in our brain? How could we store this
    knowledge in a computer?
  • This problem, called knowledge representation, is
    one of the first, most fundamental issues that
    researchers in artificial intelligence had to
    face.

17
Knowledge Pyramid
[Figure: pyramid with the levels, from top to bottom, Decision,
Action, Knowledge, Information, Data, Characters. Side labels:
Pragmatics (associated with context and experience), Semantics
(meaning), Syntax.]
Knowledge has 3 dimensions: Syntax, Semantics,
and Pragmatics.
18
Example
  • Characters: t i s n i o r o l a n i l w
  • Data: With the right syntax (here, the sequence
    of letters), the above characters give the
    statement "It will rain soon."
  • Information: The above statement means "Water
    drops fall from the sky."
  • Knowledge: The information "Water drops fall
    from the sky" is connected with experience and
    expectations such as "One can get wet" or "It
    can rain into the flat."
  • Action: Based on this knowledge, activities are
    developed: "I will take an umbrella," "I will
    close the window," etc.

19
Knowledge Management
Knowledge management: all management tasks of
the enterprise that deal with obtaining,
utilizing, and further developing
knowledge.
20
Knowledge Representation
  • Procedural representation: Perhaps the most
    common technique for representing knowledge in
    computers is to use procedural knowledge.
    Procedural code not only encodes facts
    (constants and variables) but also defines a
    sequence of operations for using and
    manipulating those facts. Thus, program code is a
    perfectly natural way of encoding procedural
    knowledge. This hardcoded logic is typically
    not considered to be part of artificial
    intelligence per se.
  • Declarative representation: A user simply states
    facts, rules, and relationships. However,
    declarative knowledge must be processed by some
    procedural code. Most of the knowledge
    representation techniques studied in artificial
    intelligence are declarative. Some of them are
    shown on the following slides.
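
A minimal sketch of the difference, reusing the facts from the Knowledge (2) slide (the sun is hot, balls are round, the sky is blue). The function names and the triple layout are illustrative assumptions, not part of the lecture material.

```python
# Procedural: the fact and the way it is used are fused into program code.
def is_hot_procedural(thing):
    if thing == "sun":
        return True
    return False


# Declarative: facts are stated as plain data (subject, relation, value).
FACTS = {
    ("sun", "is", "hot"),
    ("ball", "is", "round"),
    ("sky", "is", "blue"),
}


def holds(subject, relation, value, facts=FACTS):
    """Generic procedural code that processes the declarative facts."""
    return (subject, relation, value) in facts
```

Adding a new fact to the declarative version only changes the data; the procedural version needs new code for every fact.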

21
Knowledge Representation - Rules
  • General form of a predicate logic rule:
  • if antecedent(s) then consequent(s)
  • (Instead of antecedent, other names, e.g.,
    precondition, are used. Instead of consequent,
    other names, e.g., conclusion, action,
    hypothesis, are used.)
  • Rules can have the following forms:
  • if P then Q
  • if P1 and P2 and ... and Pn then Q1 and Q2 and
    ... and Qm
  • if P1 and P2 or ... or Pn then Q
  • Rules that produce new facts are called
    production rules.

22
Rules (2): Architecture of a Production System
[Figure: the inference mechanism runs a Recognize, Select, Act cycle
over the knowledge base (rules) and the fact base (facts).]
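
The recognize, select, act cycle of a production system can be sketched as a small forward-chaining loop. This is only an illustration under stated assumptions: rules are restricted to the conjunctive form (if P1 and ... and Pn then Q1 and ... and Qm), the selection strategy is simply "first matching rule", and the rule texts (including the fact "I go outside") are made up for the example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    antecedents: frozenset  # all must be present in the fact base
    consequents: frozenset  # facts added when the rule fires


def run(rules, facts):
    """Forward-chain until no rule adds a new fact; return (facts, fired rule names)."""
    facts = set(facts)
    fired = []
    while True:
        # Recognize: rules whose antecedents hold and that would add something new.
        conflict_set = [
            r for r in rules
            if r.antecedents <= facts and not r.consequents <= facts
        ]
        if not conflict_set:
            return facts, fired
        # Select: simplest possible strategy, take the first matching rule.
        rule = conflict_set[0]
        # Act: add the consequents of the selected rule to the fact base.
        facts |= rule.consequents
        fired.append(rule.name)


# Hypothetical knowledge base and fact base.
RULES = [
    Rule("wet", frozenset({"it will rain soon", "I go outside"}),
         frozenset({"I can get wet"})),
    Rule("umbrella", frozenset({"I can get wet"}),
         frozenset({"take an umbrella"})),
]

facts, fired = run(RULES, {"it will rain soon", "I go outside"})
```

The loop terminates because a rule is only selected when it adds at least one new fact to a finite set of possible facts.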
23
Semantic Nets
Semantic nets are used to define the meaning of a
concept by its relationships to other
concepts. A graph data structure is used, with
nodes holding concepts and links with natural
language labels showing the relationships. A
portion of a semantic net representation of the
vehicle domain is shown in the next slide.
Remark: The standard relationships such
as is-a, has-part, and instance should
be familiar to readers with object-oriented
design experience.
24
A Semantic Net Example
[Figure: semantic net of the vehicle domain. Nodes: Vehicle, Wheels,
Motor, Automobile, Doors, Sports Car, Corvette, with the values 4, 2,
and Small. Link labels: has-part, is-a, num-wheels, num-doors, size,
instance.]
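
A semantic net like the one on this slide can be stored as a list of labeled edges. The sketch below is an assumption-laden illustration: the figure's exact wiring is not recoverable from the transcript, so the edges are one plausible reading of the node and link labels, and the `lookup` helper is hypothetical.

```python
# Edges as (source, label, target). This wiring is an assumed reading of the figure.
EDGES = [
    ("Vehicle", "has-part", "Wheels"),
    ("Vehicle", "has-part", "Motor"),
    ("Automobile", "is-a", "Vehicle"),
    ("Automobile", "has-part", "Doors"),
    ("Automobile", "num-wheels", 4),
    ("Sports Car", "is-a", "Automobile"),
    ("Sports Car", "size", "Small"),
    ("Sports Car", "num-doors", 2),
    ("Corvette", "instance", "Sports Car"),
]


def lookup(node, label, edges=EDGES):
    """Collect values of `label` for `node`, inheriting along instance and is-a links."""
    values = []
    seen = set()
    frontier = [node]
    while frontier:
        current = frontier.pop(0)
        if current in seen:
            continue
        seen.add(current)
        for source, edge_label, target in edges:
            if source != current:
                continue
            if edge_label == label:
                values.append(target)
            elif edge_label in ("is-a", "instance"):
                # Follow the link so more general concepts can contribute values.
                frontier.append(target)
    return values
```

For example, `lookup("Corvette", "num-wheels")` walks from Corvette to Sports Car to Automobile, which mirrors how a subclass inherits attributes in object-oriented design.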
25
Business Intelligence Tools
  • Data warehouses
  • OLAP (On-Line Analytical Processing) tools
  • Data mining tools
  • Text mining tools
  • Data joiners
  • Business Intelligence portals, etc.

26
Business Intelligence Tools (cont.)
  • Data warehouse - a repository of multiple
    heterogeneous data sources, organized under a
    unified schema at a single site in order to
    facilitate management decision making.
  • OLAP - analysis techniques with functionalities
    such as summarization, consolidation, and
    aggregation, as well as the ability to view
    information from different angles.
  • Data mining - extracting or "mining" knowledge
    from large data sets.
  • Text mining - mining large textual (document)
    databases. Related term: web mining.
  • Data joiner - working with data from disparate,
    heterogeneous data sources.
  • Business Intelligence portal - a Web site
    designed to be the first point of entry for
    visitors to information about a company. With
    the help of the portal's personalization
    functions, the user can choose the information
    sources that he needs for performing a specific
    task. The portal allows trouble-free access to
    valuable information and data analyses, so that
    the basis for sound decisions is improved.
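
The OLAP idea of summarizing a measure and viewing it from different angles can be illustrated with a tiny roll-up. This is a sketch only: the sales rows, the dimension names, and the `roll_up` helper are hypothetical and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
SALES = [
    ("North", "Petrol", "Q1", 120.0),
    ("North", "Diesel", "Q1", 80.0),
    ("South", "Petrol", "Q1", 200.0),
    ("North", "Petrol", "Q2", 150.0),
    ("South", "Diesel", "Q2", 50.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, by):
    """Sum the sales measure, grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]
    return dict(totals)


by_region = roll_up(SALES, ["region"])
by_product_quarter = roll_up(SALES, ["product", "quarter"])
```

Choosing a different `by` list corresponds to looking at the same data from a different angle; an empty list aggregates everything into a single total.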