Title: A MultiAgent System for Personalized Press Reviews
1A MultiAgent System for Personalized Press Reviews
- A. Addis, G. Cherchi, A. Manconi, and E. Vargiu
- Intelligent Agents and Soft-Computing Group
- Dept. of Electrical and Electronic Engineering
- University of Cagliari
- (Italy)
2Outline
- Introduction
- The Proposed MAS
- The Abstract Architecture
- The Concrete Architecture
- Experimental Results
- Conclusions and Future Work
3Introduction
4Motivations
- The Internet offers a growing amount of data spread across heterogeneous sources (web services, distributed databases, web pages, etc.)
- It is very difficult for users to select contents according to their personal interests
5Motivations
I'm interested in music concerts in Sardinia
6Motivations
- Support the user through an automated system able to
- Retrieve and extract information from heterogeneous sources
- Select the contents deemed relevant for the user, according to her/his personal interests
7The Proposed Approach
- A multiagent system able to
- take into account users' needs and preferences (Personalization)
- adapt to changes occurring in the environment (Adaptation)
- interact with other agents and the user (Cooperation)
8The Proposed Approach
- Why a multiagent system?
- a centralized classification system may be quickly overwhelmed by a large and dynamic document stream (such as daily-updated news)
- the Internet is intrinsically large and distributed, so it offers the opportunity to take advantage of distributed computing paradigms and resources
9The Proposed MAS
- The Abstract Architecture
10The Abstract Architecture
- Information Extraction
- Text Categorization
- User Feedback
11Information Extraction
- The information extraction module extracts data from web sources through specialized wrappers
- Two kinds of wrappers have been implemented
- HTML/XHTML wrapper
- RSS wrapper
12Information Extraction: the HTML wrapper
- The process of extracting data from HTML typically consists of two steps
- learning the page structure to detect the tags containing the objects
- performing structured data extraction by applying the mapping function to populate the corresponding data repository
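The second step can be sketched as follows. This is a minimal illustration, not the wrapper used in the system: it assumes the structure learned in step one can be written as a mapping from (tag, class) pairs to field names. The mapping, the class names, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical outcome of step 1: which (tag, class) pair holds which field.
MAPPING = {("h2", "headline"): "title", ("p", "summary"): "abstract"}


class MappingWrapper(HTMLParser):
    """Step 2: apply a (tag, class) -> field mapping to collect records."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.records = []    # stands in for the data repository
        self._field = None   # field currently being filled, if any
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self._field = self.mapping.get((tag, css_class))

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()
            # A record is complete once every mapped field has a value.
            if len(self._current) == len(self.mapping):
                self.records.append(self._current)
                self._current = {}

    def handle_endtag(self, tag):
        self._field = None


def extract(html, mapping=MAPPING):
    parser = MappingWrapper(mapping)
    parser.feed(html)
    return parser.records


PAGE = """
<div><h2 class="headline">Concert in Cagliari</h2>
<p class="summary">A jazz festival opens tonight.</p></div>
<div><h2 class="headline">Election results</h2>
<p class="summary">Turnout was high.</p></div>
"""
records = extract(PAGE)
```

Keeping the mapping as data rather than code means a change in a newspaper's layout only requires relearning the mapping, not rewriting the wrapper.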
13Information Extraction: the RSS wrapper
- Extracts information from online newspaper articles in RSS format
- Since RSS is a well-structured format, RSS pages are very simple to process
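To illustrate how little work a well-structured format requires, here is a minimal sketch that reads the items of an RSS 2.0 feed with the Python standard library. The feed content and the choice of fields are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return one dict per <item> element of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return articles


# Hypothetical feed used only for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example News</title>
<item><title>Budget approved</title><link>http://example.com/1</link>
<description>Parliament approved the budget.</description></item>
</channel></rss>"""
articles = parse_rss(FEED)
```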
14Text Categorization
- The Text Categorization module progressively filters information flowing from external sources to the end user
- high-level text categorization using a high-level taxonomy
- personalized text categorization considering the user's needs and preferences
15High Level Text Categorization
- A fragment of the adopted taxonomy (*)
- economy, business and finance
  - agriculture
  - industry and energy
  - computing and information technology
  - business and financial services
  - macroeconomics
  - stock exchange and financial markets
  - transport
- politics
  - defense
  - elections and referendums
  - government
  - parliament
  - parties and movements
  - constitution
  - domestic politics
(*) a subset of the one proposed by the International Press Telecommunications Council (IPTC)
16High Level Text Categorization
- The classifiers devised to perform text categorization have been implemented using the kNN and w-kNN algorithms
- These techniques do not require specific training and are very robust with respect to the impact of noisy data
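The following sketch shows why no separate training phase is needed: classification is just a vote among the most similar labelled documents. It assumes cosine similarity over sparse term vectors and reads w-kNN as similarity-weighted voting; both choices, and the toy data, are assumptions rather than details given in the slides.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(doc, labelled, k=7, weighted=False):
    """Label doc by a vote among its k most similar labelled documents.

    weighted=False: plain kNN, each neighbour counts once.
    weighted=True: each neighbour's vote is scaled by its similarity.
    """
    neighbours = sorted(
        ((cosine(doc, vec), label) for vec, label in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical labelled documents.
labelled = [
    ({"army": 1, "nato": 1}, "defense"),
    ({"army": 1, "troops": 1}, "defense"),
    ({"minister": 1, "cabinet": 1}, "government"),
    ({"cabinet": 1, "decree": 1}, "government"),
    ({"minister": 1, "decree": 1}, "government"),
]
doc = {"army": 1, "troops": 1}
```

With k = 5 every labelled document votes, so plain kNN follows the majority class even though those neighbours share no terms with the document, while the weighted variant ignores them. This is one reason to prefer weighted voting when k is large relative to the class sizes.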
17High Level Text Categorization
- A suitable encoding has been adopted
- all non-informative words are removed using a stop word list
- a standard stemming algorithm removes the most common morphological and inflectional suffixes
- for each class of the taxonomy, feature selection based on the information gain statistic has been enforced
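As a rough illustration of per-class feature selection, the sketch below scores each term by its information gain with respect to one class (document in the class or not) and keeps the top-ranked terms. The document representation, function names, and toy data are assumptions made for the example.

```python
import math


def entropy(pos, neg):
    """Entropy, in bits, of a two-way split with pos and neg members."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(term, docs):
    """docs: list of (set_of_terms, in_class) pairs, in_class being a bool."""
    def h(labels):
        return entropy(sum(labels), len(labels) - sum(labels)) if labels else 0.0

    all_labels = [in_class for _, in_class in docs]
    with_t = [in_class for terms, in_class in docs if term in terms]
    without_t = [in_class for terms, in_class in docs if term not in terms]
    n = len(docs)
    # Class entropy minus the expected entropy after observing the term.
    return h(all_labels) - (len(with_t) / n * h(with_t)
                            + len(without_t) / n * h(without_t))


def select_features(docs, n_features):
    """Keep the n_features terms with the highest information gain."""
    vocabulary = set().union(*(terms for terms, _ in docs))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(t, docs), t))
    return ranked[:n_features]


# Hypothetical documents for the class "defense".
docs = [
    ({"army", "budget"}, True),
    ({"army", "nato"}, True),
    ({"budget", "tax"}, False),
    ({"tax", "market"}, False),
]
features = select_features(docs, 2)
```

Here "budget" appears equally often inside and outside the class, so it carries no information and is discarded, while "army" and "tax" separate the two groups perfectly.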
18Personalized Text Categorization
- A set of arguments of interest for the user can be obtained by composing generic topics with suitable logical operators (and, or, not)
- For instance, a user might be interested in being kept informed about all articles that involve both defense and government
19Personalized Text Categorization
- Two ways of combining classifiers
- Horizontal composition (combination)
- Vertical composition (pipeline)
20Personalized Text Categorization
- The combination is evaluated using P-norms
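The slides do not give the formula, so the sketch below assumes the standard P-norm operators of the extended Boolean model, applied to classifier scores in [0, 1]. The function names and the sample scores are hypothetical.

```python
def p_or(scores, p=2.0):
    """P-norm OR: high when at least one score is high."""
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def p_and(scores, p=2.0):
    """P-norm AND: high only when all scores are high."""
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)


def p_not(score):
    """Negation of a score in [0, 1]."""
    return 1.0 - score


# Hypothetical scores assigned to one article by two topic classifiers.
defense, government = 0.9, 0.6
both = p_and([defense, government])     # "defense and government"
either = p_or([defense, government])    # "defense or government"
```

With p = 1 both operators reduce to the arithmetic mean, and as p grows they approach strict Boolean behaviour (minimum for AND, maximum for OR), so p controls how strictly the user's logical expression is interpreted.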
21Personalized Text Categorization
- The composition exploits a pipeline of classifiers
[Figure: pipeline of classifiers C1, C2, ..., Cn]
22Personalized Text Categorization
- Particular care has been taken in limiting the phenomenon of false negatives
- At the composition level, the impact of false positives decreases exponentially with the number of agents involved
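A small sketch can make the exponential effect concrete. It assumes that a document is accepted only if every stage of the pipeline accepts it and that the stages err independently; under those assumptions the false positive rates multiply. The keyword-based stages are hypothetical stand-ins for the real classifiers.

```python
import math


def pipeline(classifiers):
    """Vertical composition: a document passes only if every stage accepts it."""
    def accept(doc):
        return all(clf(doc) for clf in classifiers)
    return accept


def combined_false_positive_rate(rates):
    """Probability that an irrelevant document survives all stages,
    assuming the stages make errors independently of each other."""
    return math.prod(rates)


# Hypothetical stages: each one checks for a single keyword.
stages = [lambda doc: "government" in doc, lambda doc: "defense" in doc]
accept = pipeline(stages)

# Three stages with a 20% false positive rate each: 0.2 ** 3 = 0.008.
overall = combined_false_positive_rate([0.2, 0.2, 0.2])
```

The same structure explains the emphasis on false negatives: a relevant document rejected by any single stage is lost for good, so misses accumulate along the pipeline while false alarms are filtered out.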
23User Feedback
- The user feedback module handles any feedback provided by the user
- Two solutions have been implemented
- training an ANN
- using a kNN classifier
24The Proposed MAS
- The Concrete Architecture
25The PACMAS Architecture
- A multiagent architecture designed to support the development of applications aimed at
- Retrieving heterogeneous data spread among different sources
- Filtering and organizing them according to the personal interests explicitly stated by each user
- Providing adaptation techniques to improve and refine the user profile
26The PACMAS Architecture
[Figure: the PACMAS architecture; labels: Information Sources, Mid-span Levels]
27Implementation
28Information Level
- Information agents are aimed at performing information extraction
- a set of agents wraps Italian online newspapers containing articles in RSS and HTML format
- an agent wraps the adopted generic taxonomy
29Filter Level
- Filter agents are aimed at performing text categorization
- removing all non-informative words
- removing the most common morphological and inflectional suffixes
- weighting terms using TF-IDF
- selecting the relevant features
- generating a feature vector for each document
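The steps above can be sketched as a small encoding function. This is only an illustration: the stop word list is a tiny hypothetical one, stemming is omitted for brevity, the selected features are given by hand, and the weight is the plain term frequency times log(N / df) variant of TF-IDF, which may differ from the variant used by the filter agents.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and"}  # hypothetical, tiny stop list


def tokenize(text):
    """Lowercase, split on whitespace, and drop non-informative words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_vectors(texts, features):
    """Return one TF-IDF vector per text, restricted to the selected features."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Document frequency: number of texts containing each feature.
    df = {f: sum(1 for d in docs if f in d) for f in features}
    vectors = []
    for d in docs:
        vectors.append([
            d[f] * math.log(n / df[f]) if df[f] else 0.0
            for f in features
        ])
    return vectors


# Hypothetical texts and selected features.
texts = [
    "the government of italy",
    "the army and the government",
    "markets in italy",
]
features = ["government", "army", "italy"]
vectors = tfidf_vectors(texts, features)
```

Each vector has one component per selected feature, so documents of any length are mapped to points in the same space and can be compared directly by the classifiers.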
30Task Level
- Task agents are aimed at performing text categorization
- Agents perform
- High-level text categorization
- embody a kNN classifier
- measure the classification accuracy
- Personalized text categorization
- take into account users' preferences by automatically composing topics
31Interface Level
- Interface agents are aimed at interacting with the user and handling the user's feedback
32Experimental Results
33Experimental Results
- Task agents have been trained on a set of newspaper articles previously classified by domain experts
- For each item of the taxonomy
- a set of 200 documents has been selected
- the kNN algorithm (with k = 7) has been adopted
- 150 features have been used
- random datasets have been generated
34Experimental Results
- For each item of the taxonomy, several tests have been conducted
- a set of 500 documents has been selected
- the kNN algorithm (with k = 7) has been adopted
- random datasets have been generated
- The average accuracy of the system is 80.05%
35Experimental Results
36Conclusions and Future Work
37Conclusions
- We presented a system aimed at
- retrieving articles from Italian online newspapers
- classifying them using suitable machine learning techniques
- The system has been built upon PACMAS, a support for implementing Personalized, Adaptive, and Cooperative MultiAgent Systems
38Future Work
- A new release of the system will be improved by adopting different classification algorithms
- We are investigating how to enhance the feedback-related functionalities according to an evolutionary computation framework
39That's all folks!