Title: Data Mining Engineering
1. Data Analysis for Decision and Management Processes
Univ.-Prof. Dr. Peter Brezany
Institute of Scientific Computing, Faculty for Information Science, University of Vienna
E-mail: brezany_at_par.univie.ac.at
WWW: http://www.par.univie.ac.at/brezany, http://artemis.wszib.edu.pl/brezany/
2. Institute of Scientific Computing: Research Profile
The primary objectives of the Institute are:
- to conduct research in high-performance advanced data analysis, knowledge management, programming languages, compilers, programming environments, and software tools for high-performance computing systems;
- to actively contribute to a transfer of technology to industry;
- to disseminate knowledge in the fields of parallel and distributed computing and software technology.
3. Institute for Software Science: Main Research Projects and Cooperations
- Participation in 14 EU projects (coordination of 1 project)
- The European Centre of Excellence for Parallel Computing, a department of the Institute, founded by the EU
- Coordination of the CEI-PACT project (Austria, Slovakia, Czech Republic, Poland, Italy, Hungary, Slovenia)
- Special Research Program AURORA of the Austrian Science Fund (1997-2007)
- Many international cooperations (NASA, CalTech, CERN, ...)
4. New Research Field: GRID COMPUTING
The Grid: a new distributed computing infrastructure for science and engineering. The Grid consists of physical resources (computers, disks, networks, databases, sensors, laboratory equipment) and middleware software that ensures access to and coordinated use of such resources.
5. Media That Radically Influenced Society
- 1500s: Printing Press
- 1840s: Penny Post
- 1850s: Telegraph
- 1920s: Telephone
- 1930s: Radio
- 1950s: TV
- 1990s: Web
- 20xx: Grid
6. Outline
- Business intelligence, knowledge management
- Relationship between data, information, and knowledge
- Knowledge discovery process
- System architectures
  - data warehousing and data webhousing
- Data preparation
  - selection, preprocessing (cleaning, transformation), integration
- Data mining techniques
  - association rules, sequences, classification, prediction, neural networks, clustering, meta-learning
- Advanced topics
  - multi-agent and mobile agent systems
  - Web mining
  - intelligent search engines
  - semantic web
  - information and knowledge management on computing grids
  - security issues
7. Basic Literature
- Mark and Mary Whitehorn: Business Intelligence: The IBM Solution. Springer-Verlag, 2000.
- R. Kimball: The Data Warehouse Toolkit. John Wiley, 1996.
- J. Han, M. Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
- M. Ester, J. Sander: Knowledge Discovery in Databases. Springer-Verlag, 2000 (in German).
- I.H. Witten, E. Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.
8. Time Schedule
- Monday, Feb 27: 17.15-20.30 (4 hours)
- Tuesday, Feb 28: 10.00-13.15 (4 hours)
- Wednesday, Mar 01: 15.30-18.45 (3 hours)
- Thursday, Mar 02: 16.00-18.15 (3 hours)
- Location: s.1 AK4
9. Business Intelligence
Definition: Business Intelligence is an umbrella term broadly covering the processes involved in extracting valuable business information and knowledge from the mass of data that exists within a typical enterprise, together with knowledge management (storing knowledge in an appropriate form and distributing it). What is meant by information and knowledge? This is best understood by imagining a chain linking data → information → knowledge.
10. Data → Information → Knowledge
- Data are the facts about events or processes.
- Information is the organization of, associations between, and constraints upon data that allow them to be used by a user or a machine.
- Knowledge is the interpretation of information and its use in a problem-solving context. Knowledge can lead to new insights, which in turn lead to new innovations and ultimately to wealth creation and improvements in the quality of life.
- Wisdom arises when one understands the foundational principles responsible for the patterns representing knowledge (he or she can answer questions like "Why ...?" and knows how to find or derive new knowledge).
11. Data
Example: When a customer visits a gas station and buys petrol, this transaction can be described with the following data: date/time, volume, price. However, these data do not say why this customer chose this station and not another one, and it is not possible to find out from these data whether he will come again or whether this station is good or bad. Data alone possess almost no meaning or purpose. They are the raw material for obtaining information.
12. Information
- A piece of information can be described as a message.
- Like all messages, information has a sender and a receiver.
- Information is meant to form the receiver's opinion of or attitude toward a problem and to influence his behavior.
- We can also think of information as data that change, form, or influence something.
- The word "inform" originally meant "to give form to" a thing or person.
13. Information (2)
- Data become information when the receiver adds meaning to them. Such upgrading of data can be done in different ways, for example:
  - Contextualizing: we know for what purpose the data were collected.
  - Calculation: the data can be analyzed mathematically and enriched statistically.
  - Correction: errors are removed from the data material.
  - Condensing: the data are transformed into a more compact form; the main components of the data material have to be identified.
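To make these upgrading steps concrete, here is a minimal Python sketch applied to petrol-station transactions like those in the gas station example (date/time, volume, price). The records, the stated purpose, and the plausibility rule used for correction (volume and price must be positive) are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Transaction:
    timestamp: str   # date/time of the sale
    volume: float    # litres sold
    price: float     # total price paid

# Hypothetical raw data, including one erroneous record.
raw = [
    Transaction("2006-02-27 08:05", 40.0, 44.0),
    Transaction("2006-02-27 08:40", -5.0, 10.0),   # impossible volume
    Transaction("2006-02-27 17:30", 30.0, 33.6),
    Transaction("2006-02-28 07:55", 50.0, 55.0),
]

# Contextualizing: record why the data were collected (assumed purpose).
context = {"station": "hypothetical station A", "purpose": "daily sales review"}

# Correction: drop records that violate a simple (assumed) plausibility rule.
clean = [t for t in raw if t.volume > 0 and t.price > 0]

# Calculation: enrich each record with a derived value (price per litre).
price_per_litre = [t.price / t.volume for t in clean]

# Compact form: reduce the records to a few summary figures per day.
def summarize(transactions):
    by_day = {}
    for t in transactions:
        day = t.timestamp.split(" ")[0]
        by_day.setdefault(day, []).append(t.volume)
    return {day: {"sales": len(v), "litres": sum(v)} for day, v in by_day.items()}

summary = summarize(clean)
print(context["purpose"], summary, round(mean(price_per_litre), 3))
```

Each step only adds meaning to the same three raw fields; none of them can answer the "why did the customer choose this station?" question, which stays out of reach of the data alone.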
14. Information Management
Information management: all management tasks that deal with information and communication within an enterprise.
15. Knowledge
- Knowledge is the production factor of the future, which will replace energy and materials.
- Knowledge is produced by mental activity and by processes that model mental activity.
- Transformation process information → knowledge:
  - Comparison: how should I assess information about the current situation in comparison with other known situations?
  - Consequences: how will the information influence decisions and activities?
  - Connections: which relations exist between one concrete information element and another one?
  - Conversation: what do other people think about a certain piece of information?
16. Knowledge (2)
- People gain knowledge through experience: they see, hear, touch, and taste the world around them.
- We can associate something we see with something we hear, thereby gaining new knowledge about the world.
- Suppose we know that the sun is hot, balls are round, and the sky is blue. These facts are knowledge about the world. How do we store this knowledge in our brain? How could we store this knowledge in a computer?
- This problem, called knowledge representation, is one of the first and most fundamental issues that researchers in artificial intelligence had to face.
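One simple answer to "how could we store this knowledge in a computer?" is a set of (subject, attribute, value) triples. The Python sketch below is illustrative only; the triple layout, the attribute names, and the helper `query` are assumptions, not a technique prescribed by the slides.

```python
# Facts from the slide, stored as (subject, attribute, value) triples.
facts = {
    ("sun", "temperature", "hot"),
    ("ball", "shape", "round"),
    ("sky", "color", "blue"),
}

def query(subject=None, attribute=None, value=None):
    """Return all facts matching the given fields; None acts as a wildcard."""
    return sorted(
        f for f in facts
        if (subject is None or f[0] == subject)
        and (attribute is None or f[1] == attribute)
        and (value is None or f[2] == value)
    )

print(query(subject="sky"))   # what do we know about the sky?
print(query(value="round"))   # which things are round?
```

The same store can be queried from either direction (by subject or by value), which is one reason triples are a common starting point for richer representations such as semantic nets.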
17. Knowledge Pyramid
[Figure: pyramid with the levels Decision, Action, Knowledge, Information, Data, and Characters (top to bottom); side labels Syntax, Semantics (Meaning), and Pragmatics (Associated with Context and Experience).]
Knowledge has three dimensions: syntax, semantics, and pragmatics.
18. Example
- Characters: t i s n i o r o l a n i l w
- Data: with the right syntax (here, the sequence of letters), the characters above form the statement "It will rain soon."
- Information: the statement above means "Water drops fall from the sky."
- Knowledge: the information "Water drops fall from the sky." is connected with experience and expectations such as "One can get wet" or "It can rain into the flat."
- Action: based on this knowledge, activities are developed: "I will take an umbrella", "I will close the window", etc.
19. Knowledge Management
Knowledge management: all management tasks of the enterprise that deal with obtaining, utilizing, and further developing knowledge.
20. Knowledge Representation
- Procedural representation
  - Perhaps the most common technique for representing knowledge in computers is to use procedural knowledge. Procedural code not only encodes facts (constants and variables) but also defines a sequence of operations for using and manipulating those facts. Thus, program code is a perfectly natural way of encoding procedural knowledge. This hardcoded logic is typically not considered part of artificial intelligence per se.
- Declarative representation
  - A user simply states facts, rules, and relationships. However, declarative knowledge must be processed by some procedural code. Most of the knowledge representation techniques studied in artificial intelligence are declarative. Some of them are shown on the following slides.
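The contrast between the two styles can be sketched in Python. In the procedural version the knowledge is buried in control flow; in the declarative version the same knowledge is stated as data and interpreted by a small generic procedure, illustrating the point that declarative knowledge must still be processed by some procedural code. The discount rule and the names `RULES`, `discount_procedural`, and `discount_declarative` are hypothetical.

```python
# Procedural: the knowledge "gold customers with orders over 100 get 10% off"
# is hardcoded in the control flow of the function.
def discount_procedural(customer_level: str, order_total: float) -> float:
    if customer_level == "gold" and order_total > 100:
        return 0.10
    return 0.0

# Declarative: the same knowledge stated as data (hypothetical rule table).
RULES = [
    {"level": "gold", "min_total": 100, "discount": 0.10},
]

# Generic procedural code that interprets the declarative rules.
def discount_declarative(customer_level: str, order_total: float) -> float:
    for rule in RULES:
        if customer_level == rule["level"] and order_total > rule["min_total"]:
            return rule["discount"]
    return 0.0
```

Adding a new discount in the declarative version means appending an entry to `RULES`; the interpreting function stays unchanged, whereas the procedural version would need new code.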
21. Knowledge Representation - Rules
- General form of a predicate logic rule:
  - if antecedent(s) then consequent(s)
  - (Instead of "antecedent", other names, e.g., "precondition", are used. Instead of "consequent", other names, e.g., "conclusion", "action", or "hypothesis", are used.)
- Rules can have the following forms:
  - if P then Q
  - if P1 and P2 and ... and Pn then Q1 and Q2 and ... and Qm
  - if P1 and P2 or ... or Pn then Q
- Rules that produce new facts are called production rules.
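The rule forms above can be encoded uniformly by writing each antecedent as an "or" of "and"-groups (disjunctive normal form) and each consequent as a list of facts. This Python sketch is one possible encoding, chosen for illustration; the class and field names are hypothetical. The mixed form "if P1 and P2 or ... or Pn" is read here with the usual precedence, i.e. (P1 and P2) or ... or Pn.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    antecedent: list[set[str]]   # OR over AND-groups (disjunctive normal form)
    consequent: list[str]        # facts asserted when the rule fires

    def applies(self, facts: set[str]) -> bool:
        # True if at least one AND-group is completely contained in the facts.
        return any(group <= facts for group in self.antecedent)

# if P then Q
r1 = Rule("r1", [{"P"}], ["Q"])
# if P1 and P2 and P3 then Q1 and Q2
r2 = Rule("r2", [{"P1", "P2", "P3"}], ["Q1", "Q2"])
# if P1 and P2 or P3 or P4 then Q   (read as (P1 and P2) or P3 or P4)
r3 = Rule("r3", [{"P1", "P2"}, {"P3"}, {"P4"}], ["Q"])

facts = {"P1", "P2"}
print([r.name for r in (r1, r2, r3) if r.applies(facts)])
```

With the facts P1 and P2, only r3 applies: r2 additionally needs P3, and r1 needs P.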
22. Rules (2): Architecture of a Production System
[Figure: inference mechanism (Recognize, Select, Act) connected to the knowledge base (rules) and the fact base (facts).]
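The recognize-select-act cycle of a production system can be sketched as a small forward-chaining loop in Python. Assumptions: rules are simple "if all antecedents then add consequents" triples, the Select step uses the simplest conflict-resolution strategy (take the first matching rule in knowledge-base order), and a rule may not fire again once it has fired (refraction). The rule names and facts are hypothetical, loosely following the rain example.

```python
# Knowledge base: (name, antecedents, consequents); facts are plain strings.
KNOWLEDGE_BASE = [
    ("r1", {"it will rain soon"}, {"water drops fall from the sky"}),
    ("r2", {"water drops fall from the sky", "going outside"}, {"take umbrella"}),
    ("r3", {"water drops fall from the sky", "window open"}, {"close window"}),
]

def run(fact_base: set[str]) -> tuple[list[str], set[str]]:
    facts = set(fact_base)
    fired: list[str] = []
    while True:
        # Recognize: collect rules whose antecedents hold and that have not fired yet.
        conflict_set = [r for r in KNOWLEDGE_BASE
                        if r[1] <= facts and r[0] not in fired]
        if not conflict_set:
            break
        # Select: first rule in knowledge-base order (assumed conflict resolution).
        name, _, consequents = conflict_set[0]
        # Act: add the rule's conclusions to the fact base.
        facts |= consequents
        fired.append(name)
    return fired, facts

fired, facts = run({"it will rain soon", "window open"})
print(fired)
```

Starting from "it will rain soon" and "window open", r1 fires first and adds the new fact that enables r3; r2 never fires because "going outside" is not in the fact base.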
23. Semantic Nets
Semantic nets are used to define the meaning of a concept by its relationships to other concepts. A graph data structure is used, with nodes holding concepts and links with natural-language labels showing the relationships. A portion of a semantic net representation of the vehicle domain is shown on the next slide. Remark: the standard relationships such as is-a, has-part, and instance should be familiar to readers with object-oriented design experience.
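24. A Semantic Net Example
[Figure: semantic net of the vehicle domain with the nodes Vehicle, Wheels, Automobile, Motor, Doors, Sports Car, and Corvette, connected by is-a, has-part, and instance links and annotated with the attributes num-wheels (4), num-doors (2), and size (Small).]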
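A semantic net maps naturally onto a graph of labelled edges, and inheritance along is-a and instance links lets a specific node such as Corvette answer questions about properties stored higher up. The Python sketch below assumes one plausible reading of the vehicle-domain figure (Vehicle has-part Wheels; Automobile is-a Vehicle, has-part Motor and Doors, num-wheels 4; Sports Car is-a Automobile, num-doors 2, size Small; Corvette is an instance of Sports Car). The exact edge placement and the helpers `lookup`, `parts`, and `is_a` are assumptions, not part of the slides.

```python
# Edges: (node, link label) -> list of targets; an assumed reading of the figure.
NET = {
    ("Vehicle", "has-part"): ["Wheels"],
    ("Automobile", "is-a"): ["Vehicle"],
    ("Automobile", "has-part"): ["Motor", "Doors"],
    ("Automobile", "num-wheels"): [4],
    ("Sports Car", "is-a"): ["Automobile"],
    ("Sports Car", "num-doors"): [2],
    ("Sports Car", "size"): ["Small"],
    ("Corvette", "instance"): ["Sports Car"],
}

def parents(node):
    """Follow instance and is-a links one step up the hierarchy."""
    return NET.get((node, "instance"), []) + NET.get((node, "is-a"), [])

def lookup(node, link):
    """Return the values of `link` for `node`; the nearest ancestor wins."""
    if (node, link) in NET:
        return NET[(node, link)]
    for parent in parents(node):
        values = lookup(parent, link)
        if values:
            return values
    return []

def parts(node):
    """Collect has-part values from the node and all of its ancestors."""
    found = list(NET.get((node, "has-part"), []))
    for parent in parents(node):
        found += [p for p in parts(parent) if p not in found]
    return found

def is_a(node, category):
    """True if `category` is reachable from `node` via instance/is-a links."""
    return node == category or any(is_a(p, category) for p in parents(node))

print(lookup("Corvette", "num-wheels"), parts("Corvette"), is_a("Corvette", "Vehicle"))
```

Attributes such as num-wheels are inherited with the most specific value winning, while parts accumulate along the hierarchy; this mirrors how fields and inheritance behave in object-oriented designs.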
25. Business Intelligence Tools
- Data warehouses
- OLAP (On-Line Analytical Processing) tools
- Data mining tools
- Text mining tools
- Data joiners
- Business Intelligence portals, etc.
26. Business Intelligence Tools (cont.)
- Data warehouse: a repository of multiple heterogeneous data sources, organized under a unified schema at a single site in order to facilitate management decision making.
- OLAP: analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as the ability to view information from different angles.
- Data mining: extracting or "mining" knowledge from large data sets.
- Text mining: mining large textual (document) databases. Related term: Web mining.
- Data joiner: working with data from disparate, heterogeneous data sources.
- Business Intelligence portal: a Web site designed to be the first point of entry for visitors to information about a company. With the help of the portal's personalization functions, the user can choose the information sources he needs for performing a specific task. The portal allows easy access to valuable information and data analyses, so that the basis for competent decisions is optimized.
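OLAP-style summarization and "viewing information from different angles" can be illustrated without an OLAP server. The Python sketch below stores a few hypothetical sales facts with the dimensions region, product, and quarter, and aggregates the measure along whichever dimensions the user chooses (a roll-up). It uses only the standard library; the data and the function name `roll_up` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact table: dimensions (region, product, quarter) and a measure (sales).
FACTS = [
    {"region": "Vienna", "product": "petrol", "quarter": "Q1", "sales": 120},
    {"region": "Vienna", "product": "diesel", "quarter": "Q1", "sales": 80},
    {"region": "Graz", "product": "petrol", "quarter": "Q1", "sales": 60},
    {"region": "Graz", "product": "petrol", "quarter": "Q2", "sales": 70},
    {"region": "Vienna", "product": "petrol", "quarter": "Q2", "sales": 110},
]

def roll_up(facts, dimensions):
    """Sum the sales measure grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales"]
    return dict(totals)

by_region = roll_up(FACTS, ["region"])                        # one angle
by_product_quarter = roll_up(FACTS, ["product", "quarter"])   # another angle
grand_total = roll_up(FACTS, [])                              # full consolidation
print(by_region, by_product_quarter, grand_total)
```

Each call views the same facts from a different angle; passing an empty dimension list consolidates everything into a single grand total. A real OLAP tool precomputes and indexes such aggregates instead of scanning the facts on every query.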