Voislav Galic, vgalic_at_bitsyu.net
Dušan Zecevic, zdusan_at_softhome.net
Ðorde Ðurdevic, madcat_at_tesla.rcub.bg.ac.yu
Veljko Milutinovic, vm_at_etf.bg.ac.yu
- Introduction to Virtual Presence
- Data Mining for Virtual Presence
- A New Software Paradigm
- Selected Case Studies
- Definitions
- VP applications
- Psychological aspects
- Definitions
- What can Data Mining do?
- Growing popularity of Data Mining
- Algorithms
- A new software paradigm
- Standardization
- FIPA specifications
- Agent management
- Agent Communication Language
- GoodNews (CMU)
- Categorization of financial news articles
- iMatch (MIT)
- help students find resources they need
- advanced, agent-based system architecture
- Tourist city in the future (ETF)
- represents a qualitative step forward in the domain of maximization of customer satisfaction
- technologies
- Data Mining
- Software Agents (mobile)
Carnegie Mellon University, Pittsburgh, USA
Massachusetts Institute of Technology, USA
Faculty of Electrical Engineering, University of Belgrade, Serbia and Montenegro
- This tutorial will attempt to familiarize you with
- The concept of VP (Virtual Presence) as a new technological challenge
- The new paradigms and technologies that will bring VP to everyday life
- Data Mining
- Software Agents
- Virtual presence will arguably be one of the
most important aspects of personal communication
in the twenty-first century
Virtual presence is a term with various shades of meaning in different industries, but its essence remains constant: it is a new tool that enables some form of telecommunication in which individuals may substitute their physical presence with an alternate, typically electronic, presence
How to Accomplish it?
- The presence is accomplished through the Internet, video, or other communications, perhaps even psychically one day
- Technological advances will make virtual presence more sophisticated, altering the very meaning of the word presence
- The ability to conduct everyday tasks by being virtually or electronically present
VP Applications
- in government
- Sunshine laws
- Voting
- in business
- Online board meetings
- Shareholder voting online
- in education
- interactive lectures and courses
- in medicine
- Telemedicine (Diagnostics, Remote surgery)
- Risks (Privacy)
- in everyday life
- Telecommuting/Telework
- Software agents as our virtual shadows
Psychological Aspects
- Cyberspace and Mind
- Presence in Virtual Space
- Knowledge discovery is a non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data
Many Definitions
- Data mining is also called data or knowledge discovery
- It is a process of inferring knowledge from large oceans of data
- Search for valuable information in large volumes of data
- Analyzing data from different perspectives and summarizing it into useful information
What Can Data Mining Do?
- DM allows you to extract knowledge from historical data and predict outcomes of future situations
- Optimize business decisions and improve customer satisfaction with your services
- Analyze data from many different angles, categorize it, and summarize the relationships identified
- Reveal knowledge hidden in data and turn this knowledge into a crucial competitive advantage
- Predict cross-sell opportunities and make recommendations
- etc.
The Power of Data Mining
- Having a database is one thing, making sense of it is quite another
- It does not rely on narrow human queries to produce results, but instead uses AI-related technology and algorithms
- Data mining usually produces more general (more powerful) results than those obtained by traditional techniques
- Using more than one type of algorithm to search for patterns in data
Reasons for the Growing Popularity of Data Mining
- Growing Data Volume
- Low Cost of Machine Learning
- Limitations of Human Analysis
Tasks Solved by Data Mining
- Predicting
- Classification
- Detection of relations
- Explicit modeling
- Clustering
- Market basket analysis
- Deviation detection
- Data mining includes three major components, with corresponding algorithms
- Clustering (Classification)
- Association Rules
- Sequential Analysis
Classification Algorithms
- Statistical algorithms
- Neural networks algorithms
- Genetic algorithms
- Nearest neighbor method
- Rule induction
- Data visualization
- Decision tree building algorithms
- Parallel algorithms
Association Rule Algorithms
- An association rule implies a certain association relationship among a set of objects in a database
- These objects occur together, or one implies the other
- Formally X ⇒ Y, where X and Y are sets of items (itemsets)
- Key terms
- Confidence
- Support
- The goal is to find all association rules that satisfy user-specified minimum support and minimum confidence constraints
- Apriori algorithm and its variations
- Distributed / Parallel algorithms
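The two key terms can be made concrete with a small sketch. It assumes the common definitions: support of an itemset is the fraction of transactions that contain it, and confidence of X ⇒ Y is support(X ∪ Y) divided by support(X). The transactions, the function names, and the brute-force search over single-item rules are all invented for illustration; this is not Apriori, which prunes candidates far more cleverly.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical market-basket transactions, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """Confidence of the rule X => Y: support(X u Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def find_rules(transactions, min_support, min_confidence):
    """Brute-force search over rules {x} => {y} between single items."""
    items = sorted(set().union(*transactions))
    found = []
    for x, y in permutations(items, 2):
        if (support({x, y}, transactions) >= min_support
                and confidence({x}, {y}, transactions) >= min_confidence):
            found.append((x, y))
    return found


rules = find_rules(transactions, Fraction(3, 5), Fraction(3, 4))
```

Exact fractions are used instead of floats so that threshold comparisons such as "at least 3/4" are not affected by rounding.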
Sequential Analysis
- Sequential Patterns
- The problem: finding all sequential patterns with user-specified minimum support
- Elements of a sequential pattern need not be
- consecutive
- simple items
- Algorithms for finding sequential patterns
- count-all algorithms
- count-some algorithms
- Various applications (market, banking, sports)
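The two properties above (elements need not be consecutive, and need not be simple items) can be illustrated with a short support check. This is a sketch under assumptions: each customer history is a list of itemsets ordered in time, a pattern element matches any later itemset that contains it, and support is the fraction of histories containing the pattern. The data and function names are hypothetical, and this is only the counting step, not a count-all or count-some algorithm.

```python
# Hypothetical customer histories: each is a time-ordered list of itemsets.
sequences = [
    [{"tv"}, {"dvd"}],
    [{"tv", "cable"}, {"snacks"}, {"dvd", "remote"}],
    [{"dvd"}, {"tv"}],
    [{"tv"}, {"snacks"}],
]


def contains(sequence, pattern):
    """True if the pattern's elements occur in order in `sequence`.

    Gaps between matched elements are allowed, and a pattern element
    (itself a set of items) matches any itemset that includes it.
    """
    remaining = iter(sequence)
    # Each `any` consumes the shared iterator, so matches stay in order.
    return all(any(element <= itemset for itemset in remaining)
               for element in pattern)


def sequence_support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


tv_then_dvd = [{"tv"}, {"dvd"}]
share = sequence_support(tv_then_dvd, sequences)
```

A pattern would be reported when `share` meets the user-specified minimum support.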
- Drawbacks of existing algorithms
- Data size
- Data noise
- Query complexity
- The infrastructure has to be significantly enhanced to support larger applications
- Solutions
- Adding extensive indexing capabilities
- Using new HW architectures to achieve
improvements in query time
- All software agents are programs, but not all
programs are agents
Many Definitions
- Computational systems that inhabit some dynamic environment, sense and act autonomously, and realize a set of goals or tasks for which they are designed
- Hardware or (more usually) software-based computer system that enjoys the following properties
- Reactive (sensing and acting)
- Autonomous
- Goal-oriented (pro-active, purposeful)
- Temporally continuous
- Communicative (socially able)
- Learning (adaptive)
- Mobile
- Flexible
What Problems do Agents Solve?
- Client/server network bandwidth problem
- In the design of a client/server architecture
- The problems created by intermittent or unreliable network connections
- Attempts to get computers to do real thinking for
The New Software Paradigm
- Unless special care has been taken in the design of the code, two software programs cannot interoperate
- The promise of agent technology is to move the burden of interoperability from software programmers to the programs themselves
- This can happen if two conditions are met
- A common language (Agent Communication Language, ACL)
- An appropriate architecture
- They draw on and integrate many diverse
disciplines of computer science and other areas
FIPA Specifications
- The Foundation for Intelligent Physical Agents (FIPA), established in 1996 in Geneva
- FIPA specifications
- Agent Management
- Agent Communication Language
- Agent/Software Integration
- Agent Management Support for Mobility
- Human-Agent Interaction
- Agent Security Management
- Agent Naming
- FIPA Architecture
- Agent Message Transport
- etc.
Agent Management
- Provides the normative framework within which FIPA agents exist and operate
- Establishes the logical reference model for the creation, registration, location, communication, migration and retirement of agents
- The entities contained in the reference model are logical capability sets and do not imply any physical configuration
- Additionally, the implementation details of individual agent platforms (APs) and agents are the design choices of the individual agent system developers
Components of the Model
- Agent
- computational process
- fundamental actor on an AP
- as a physical software process, has a life cycle that has to be managed by the AP
- Directory Facilitator (DF)
- provides yellow pages services to other agents
- supported functions are
- register
- deregister
- modify
- search
- Agent Management System (AMS)
- provides white pages services to other agents
- maintains a directory of AIDs (agent identifiers), which contain transport addresses
- supported functions are
- register
- deregister
- modify
- search
- get-description
- operations for underlying AP
- Message Transport Service
- communication method between agents
- Agent Platform (AP)
- physical infrastructure in which agents can be deployed
- Software
- all non-agent, executable collections of instructions accessible through an agent
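The four yellow pages functions (register, deregister, modify, search) can be pictured as operations on a small in-memory directory. The class below is a toy sketch, not a FIPA-compliant implementation: the class name, the method signatures, and the choice to describe an agent by a plain set of service names are all assumptions made for illustration.

```python
class DirectoryFacilitator:
    """Toy yellow pages: maps an agent identifier to its advertised services."""

    def __init__(self):
        self._entries = {}

    def register(self, aid, services):
        """Add a new agent description; registering twice is an error."""
        if aid in self._entries:
            raise ValueError(f"{aid} is already registered")
        self._entries[aid] = set(services)

    def deregister(self, aid):
        """Remove an agent description; unknown agents are an error."""
        if aid not in self._entries:
            raise KeyError(aid)
        del self._entries[aid]

    def modify(self, aid, services):
        """Replace the services advertised by an already registered agent."""
        if aid not in self._entries:
            raise KeyError(aid)
        self._entries[aid] = set(services)

    def search(self, service):
        """Return the identifiers of all agents advertising `service`."""
        return sorted(aid for aid, offered in self._entries.items()
                      if service in offered)


df = DirectoryFacilitator()
df.register("agent-j", {"reserve-ticket", "weather"})
df.register("agent-k", {"weather"})
df.modify("agent-k", {"auction"})
weather_agents = df.search("weather")
```

A white pages directory would look similar, except that it would map each identifier to transport addresses rather than to services.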
Agent Life Cycle
- FIPA agents exist physically on an AP and utilize the facilities offered by the AP for realising their functionalities
- In this context, an agent, as a physical software process, has a physical life cycle that has to be managed by the AP
- The state transitions of agents can be described as
- create
- invoke
- destroy
- quit
- suspend
- resume
- wait
- wake up
- move
- execute
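One way to read the transition list is as a small state machine that the AP enforces. The slides name only the transitions, so the state names used below (initiated, active, suspended, waiting, transit, terminated) and the pairing of each transition with a source and target state are assumptions made for this sketch.

```python
# (current state, transition) -> next state. State names are assumptions.
TRANSITIONS = {
    ("initiated", "invoke"): "active",
    ("active", "suspend"): "suspended",
    ("suspended", "resume"): "active",
    ("active", "wait"): "waiting",
    ("waiting", "wake up"): "active",
    ("active", "move"): "transit",      # mobile agents only
    ("transit", "execute"): "active",   # mobile agents only
}
# Assumed here to be allowed from any live state.
TERMINATING = {"destroy", "quit"}


class Agent:
    """Agent whose life cycle is managed through explicit transitions."""

    def __init__(self, name):
        self.name = name
        self.state = "initiated"  # the result of "create"

    def apply(self, transition):
        """Apply a named transition, rejecting ones invalid in this state."""
        if self.state == "terminated":
            raise ValueError(f"{self.name} has already terminated")
        if transition in TERMINATING:
            self.state = "terminated"
            return self.state
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {transition!r} while {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


agent = Agent("j")
history = [agent.apply(t) for t in ("invoke", "move", "execute", "suspend")]
```

Keeping the transitions in a table makes it easy to see at a glance which actions the platform would reject in a given state.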
Agent Communication Language
- The specification consists of a set of message types and the description of their meanings
- Implementing a subset of the pre-defined message types and protocols
- Sending and receiving the not-understood message
- Correct implementation of communicative acts defined in the specification
- Freedom to use communicative acts with other names, not defined in the specification
- Obligation of correctly generating messages in the transport form
- Language must be able to express propositions, objects and actions
- The use of Agent Management Content Language and
- Pre-defined message parameters: sender, receiver, content, reply-with, in-reply-to, language, ontology, reply-by, protocol
- Pre-defined message types: confirm, disconfirm, inform, not-understood, query-if, query-ref, refuse, etc.
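A message is essentially a message type followed by named parameters. The helper below is a minimal sketch that assembles such a message as a parenthesised string. The function name, the colon-prefixed parameter keywords in the output, and the fixed parameter order are assumptions for illustration, and the set of message types covers only those named in the list above (which itself ends in "etc.").

```python
# Parameter and message-type names are taken from the list above.
PARAMETERS = (
    "sender", "receiver", "content", "reply-with", "in-reply-to",
    "language", "ontology", "reply-by", "protocol",
)
MESSAGE_TYPES = {
    "confirm", "disconfirm", "inform", "not-understood",
    "query-if", "query-ref", "refuse",
}


def acl_message(message_type, **params):
    """Render a message as '(type :param value ...)'.

    Keyword arguments use underscores in place of hyphens,
    for example reply_with="r09" becomes ':reply-with r09'.
    """
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type}")
    named = {key.replace("_", "-"): value for key, value in params.items()}
    unknown = set(named) - set(PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    parts = [message_type]
    for name in PARAMETERS:  # emit parameters in one fixed order
        if name in named:
            parts.append(f":{name} {named[name]}")
    return "(" + " ".join(parts) + ")"


query = acl_message(
    "query-if",
    sender="i",
    receiver="j",
    content="(registered (server d1) (agent j))",
    reply_with="r09",
)
```

An agent receiving a message type it does not implement would answer with a not-understood message instead of raising an error as this sketch does.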
Communication Examples
- Agent i asks agent j for its available services
  (query-ref
    :sender i
    :receiver j
    :content (iota ?x (available-services j ?x)))
- Agent j refuses to reserve a ticket for i, since there are insufficient funds in i's account
  (refuse
    :sender j
    :receiver i
    :content ((action j (reserve-ticket LHR, MUC, 27-sept-97))
              (insufficient-funds ac12345))
    :language sl)
- Agent i, believing that agent j thinks that a shark is a mammal, attempts to change j's belief
  (disconfirm
    :sender i
    :receiver j
    :content (mammal shark))
- Agent i asks agent j if j is registered with domain server d1
  (query-if
    :sender i
    :receiver j
    :content (registered (server d1) (agent j))
    :reply-with r09)
  ...
  (inform
    :sender j
    :receiver i
    :content (not (registered (server d1) (agent j)))
    :in-reply-to r09)
- Auction bid
  (inform
    :sender agent_X
    :receiver auction_server_Y
    :content (price (bid good02) 150)
    :in-reply-to round-4
    :reply-with bid04
    :language sl
    :ontology auction)
- Agent j replies that it can reserve trains, planes and automobiles
  (inform
    :sender j
    :receiver i
    :content ( (iota ?x (available-services j ?x))
               ((reserve-ticket train)
                (reserve-ticket plane)
                (reserve automobile)) ))
- Agent i did not understand a query-if message because it did not recognize the ontology
  (not-understood
    :sender i
    :receiver j
    :content ((query-if :sender j :receiver i )
              (unknown (ontology www)))
    :language sl)
- Agent i confirms to agent j that it is, in fact, true that it is snowing today
  (confirm
    :sender i
    :receiver j
    :content "weather( today, snowing )"
    :language Prolog)
- A system that automatically categorizes news reports that reflect positively or negatively on a company's financial outlook
- Correlation between news reports on a company's financial outlook and its attractiveness as an investment
- Text categorization: a very difficult domain for the use of machine learning
- Very large number of input features
- High level of noise (metaphors, irony, ...)
- Large percentage of irrelevant features
- A new text classification algorithm: Domain Experts
- Two types of data
- (Human-)labeled
- Unlabeled
- The algorithm classifies financial news into five predefined categories
- FCP (Frequently Co-located Phrase): the building element for the categorization algorithm
- The algorithm categorizes each given news article into the predefined categories
- GOOD: strong and explicit evidence of the company's financial status
- "shares of ABC company rose 2 percent"
- GOOD, UNCERTAIN: predictions and forecasts of future profitability
- "ABC company predicts fourth-quarter earnings will be high"
- NEUTRAL: nothing is mentioned about the financial well-being of the company
- "ABC announced plans to focus on products based on recycled materials"
- BAD, UNCERTAIN: predictions of future losses
- "ABC announced today that fourth-quarter results could fall short of expectations"
- BAD: explicitly bad evidence
- "shares of ABC fell 0.57 to 44.65 in early NY"
Co-located Phrase
- The proposed algorithm labels the unlabeled news articles through a voting process among experts, which are FCPs
- Definition: a co-located phrase is a sequence of nearby, but not necessarily consecutive, words
- "shares of ABC rose 8.5": (shares, rose), GOOD
- "ABC presented its new product": (present, product), NEUTRAL
class   selected FCP
+       share gains rose, profit revenue rose
+/?     except forecasts earnings
+/-     alliance company, deal present product
-/?     short expectation
-       share down lost, profit sales decrease
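The voting idea can be sketched in a few lines. This is an illustration only, not the Domain Experts algorithm itself: the FCP table, the five-token window used to decide what counts as "nearby", the exact-word matching without stemming, and the fallback to NEUTRAL when no phrase matches are all assumptions.

```python
from collections import Counter

# Hypothetical experts: each co-located phrase votes for one category.
FCPS = {
    ("shares", "rose"): "GOOD",
    ("profit", "rose"): "GOOD",
    ("shares", "fell"): "BAD",
    ("short", "expectations"): "BAD, UNCERTAIN",
}


def matches(phrase, tokens, window=5):
    """True if the phrase's words occur in order within `window` tokens."""
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        position = start
        found_all = True
        for word in phrase[1:]:
            # Look only after the previous word and inside the window.
            segment = tokens[position + 1:start + window]
            if word not in segment:
                found_all = False
                break
            position += 1 + segment.index(word)
        if found_all:
            return True
    return False


def classify(text):
    """Let every matching phrase cast one vote; the most common label wins."""
    tokens = text.lower().split()
    votes = Counter(label for phrase, label in FCPS.items()
                    if matches(phrase, tokens))
    if not votes:
        return "NEUTRAL"  # assumed fallback when no expert fires
    return votes.most_common(1)[0][0]


label = classify("Shares of ABC rose 2 percent")
```

Because the words only need to be nearby, "shares" and "rose" still match with "of ABC" between them.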
- Problems with construction of the training (i.e. labeled) data set: inter-indexer inconsistency
- Problems with small sets of labeled (training) data
- Labeled data are very expensive, while unlabeled data are cheaply available
- The accuracy is around 75% (total of 2000 news articles)
- Comparison of a few different methods (picture): Naive-Bayes vs. Domain Experts
- The vision of each MIT student having a personal software agent, which helps to manage its owner's academic life
- The aim: bring together MIT students and staff who may usefully collaborate with each other
- completing final projects
- studying for exams
- tutoring one another
- Facilitate students and faculty matching for
- Research
- Teaching
- Internship
Ceteris Paribus Preference
- Ceteris paribus relations express a preference over sets of possible outcomes
- All possible outcomes are considered to be describable by some (large) set of binary features (true or false)
- The specified features are instantiated to either true or false
- Other features are ignored
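A ceteris paribus ("all else equal") rule can be sketched as a comparison between two outcomes, each described by binary features. The reading used below is an assumption: a rule names the feature values it prefers and the values it disprefers, and it applies to a pair of outcomes only when they agree on every feature the rule does not mention, so the actual values of those other features do not matter. The feature names and the function name are hypothetical.

```python
def prefers(rule, a, b):
    """True if `rule` says outcome `a` is preferred to outcome `b`.

    `rule` is a pair (better, worse) of partial feature assignments.
    `a` and `b` are complete outcomes: dicts mapping feature to bool.
    """
    better, worse = rule
    mentioned = set(better) | set(worse)
    a_matches = all(a.get(f) == v for f, v in better.items())
    b_matches = all(b.get(f) == v for f, v in worse.items())
    # All else equal: the unmentioned features must agree between outcomes.
    others_equal = all(a.get(f) == b.get(f)
                       for f in set(a) | set(b) if f not in mentioned)
    return a_matches and b_matches and others_equal


# Hypothetical rule: a quiet study space is preferred, all else being equal.
quiet_rule = ({"quiet": True}, {"quiet": False})

library = {"quiet": True, "near_lab": True}
cafe = {"quiet": False, "near_lab": True}
lounge = {"quiet": False, "near_lab": False}

library_over_cafe = prefers(quiet_rule, library, cafe)
library_over_lounge = prefers(quiet_rule, library, lounge)
```

The rule says nothing about library versus lounge because the two differ on a feature the rule does not mention.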
CPP Agent Configuration
- Specify a domain for preference
- Agent methods of communication and notification
- Different security settings of different servers
- Preference statements themselves
- How to get users to easily adjust C.P. rules (graphical interface)
- Pose hypothetical preference questions to the user to help complete the preferences of an ambivalent user
- People will put down their true profile only if they know that the system is secure
- Benefit MIT students by matching them to appropriate resources
- Static interest matching
- Group together similar users for a specific context
- This enables viewing a human user as a resource for dynamic resource discovery (locate experts, enthusiasts, ...)
- Dynamic interest matching
- Location and/or temporal specific resource matching: as students and their agents move from one physical location to another, iMatch services for matching the closest resources can be offered
- Help students manage their lives
The near future
- The focus of the research is on e-tourism after
the year 2005, but the applications of the
proposed infrastructure are multifold
- The assumptions
- after the year 2005, each tourist in Europe will be equipped with a cell phone with the same or better computing power than a Pentium IV
- whenever a tourism-based service or product is purchased, a mobile agent is assigned to that cell phone PC, to monitor the behaviour of the customer
- all tourist cell phone PCs create an ad hoc network around the points of touristic attractions, and link to a data mine that collects all information of interest
How to accomplish it?
- The information of interest is not collected by asking the customer to fill out forms, but by monitoring the behaviour of the customer
- The collected information, sorted in the data mine, is made available to other tourists, as an on-line owner-independent source of information about the given services and/or products
What can it do
- If a tourist would like to know, at that very moment, which restaurant has good food/atmosphere and happy customers, he/she can access the data mine (via the Internet) and obtain information that is linked to that very moment, and is created not by the owner of the business, but by the customers
- Accessing the given restaurant's website has two drawbacks
- the information is not fresh (only periodically updated)
- the information is made by the owner of the restaurant, and therefore not completely objective
- Consequently, the proposed approach should work better than relying on the owner's website, and represents a qualitative step forward in the domain of maximization of customer satisfaction
- This may mean that the privacy of the customers is jeopardized; however, if the monitored behaviour is non-personalized, and if the customer obtains a discount based on the fact that mobile agents are welcome, privacy ceases to be an issue, and people will sign up
