A MultiAgent System for Personalized Press Reviews

Provided by: crs4

1
A MultiAgent System for Personalized Press Reviews

A. Addis, G. Cherchi, A. Manconi, and E. Vargiu
Intelligent Agents and Soft-Computing Group
Dept. of Electrical and Electronic Engineering
University of Cagliari
(Italy)

2
Outline

Introduction
The Proposed MAS
The Abstract Architecture
The Concrete Architecture
Experimental Results
Conclusions and Future Work

3
Introduction
4
Motivations

The Internet offers a growing amount of data spread
across heterogeneous sources (web services,
distributed databases, web pages, etc.)
It is very difficult for users to select content
according to their personal interests

5
Motivations
"I'm interested in music concerts in Sardinia"
6
Motivations

Support the user through an automated system,
able to
Retrieve and extract information from
heterogeneous sources
Select the contents really deemed relevant for
the user, according to her/his personal interests.

7
The Proposed Approach

A multiagent system able to
take into account users' needs and
preferences (Personalization)
adapt to changes occurring in the
environment (Adaptation)
interact with other agents and the
user (Cooperation)

8
The Proposed Approach

Why a multiagent system?
a centralized classification system may be
quickly overwhelmed by a large and dynamic
document stream (like daily-updated news)
the Internet is intrinsically large and distributed,
thus it gives the opportunity to take advantage
of distributed computing paradigms and resources

9
The Proposed MAS

The Abstract Architecture

10
The Abstract Architecture

Information Extraction
Text Categorization
User Feedback

11
Information Extraction

The information extraction module extracts data
from web sources through specialized wrappers
Two kinds of wrappers have been implemented
HTML/XHTML wrapper
RSS wrapper

12
Information Extraction: the HTML wrapper

The process of extracting data from HTML
typically consists of two steps
learning the page structure to detect the tags
containing the objects of interest
performing structured data extraction by applying
the mapping function to populate the
corresponding data repository
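
The two steps above can be sketched as follows. This is only an illustration, not the authors' wrapper: the learning step is replaced by a hand-written table (`MAPPING`, a hypothetical name) that maps tag/class pairs to record fields, and the sample page and its class names are invented.

```python
from html.parser import HTMLParser

# Stand-in for the learning step (assumption): which (tag, class) holds which field.
MAPPING = {("h1", "headline"): "title", ("div", "body"): "text"}

class ArticleWrapper(HTMLParser):
    """Applies a (tag, class) -> field mapping and collects the text of mapped tags."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.record = {}      # the populated "data repository" entry
        self._field = None    # field currently being filled, if any
        self._tag = None      # tag that opened the current field
        self._depth = 0       # nesting depth of that tag

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1   # same tag nested inside the mapped one
            return
        field = self.mapping.get((tag, dict(attrs).get("class")))
        if field is not None:
            self._field, self._tag, self._depth = field, tag, 1
            self.record[field] = ""

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.record[self._field] = self.record[self._field].strip()
                self._field = self._tag = None

def extract(html, mapping=MAPPING):
    """Runs the wrapper over one page and returns the extracted record."""
    wrapper = ArticleWrapper(mapping)
    wrapper.feed(html)
    wrapper.close()
    return wrapper.record

# Invented sample page, for illustration only.
SAMPLE_HTML = (
    '<html><body><h1 class="headline">Concert in Cagliari</h1>'
    '<div class="body"><p>A jazz festival opens <b>tonight</b>.</p></div>'
    '</body></html>'
)

record = extract(SAMPLE_HTML)
print(record)
```

In a real wrapper the mapping would be produced by the structure-learning step rather than written by hand, and a separate mapping would probably be needed for each newspaper layout.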

13
Information Extraction: the RSS wrapper

Extracts information from online newspaper
articles in RSS format
Because RSS is a well-structured format,
RSS pages are very simple to process
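
To illustrate why RSS is simple to process, here is a minimal sketch that reads the items of an RSS 2.0 feed with Python's standard library. The feed content, the URLs, and the function name `parse_rss` are invented for the example; the slides do not show how the actual wrapper is written.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example newspaper</title>
    <item>
      <title>Regional elections announced</title>
      <link>http://example.org/news/1</link>
      <description>The vote will be held in spring.</description>
    </item>
    <item>
      <title>Stock market closes higher</title>
      <link>http://example.org/news/2</link>
      <description>Energy shares led the gains.</description>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Returns one dict per <item> with its title, link, and summary."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "summary": item.findtext("description", default="").strip(),
        })
    return articles

articles = parse_rss(SAMPLE_FEED)
for article in articles:
    print(article["title"], "-", article["link"])
```

Every article sits in an `item` element with named children, so no structure learning is needed, unlike the HTML case.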

14
Text Categorization

The Text Categorization module progressively
filters information flowing from external sources
to the end user
high-level text categorization using a
high-level taxonomy
personalized text categorization considering the
user's needs and preferences

15
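High-Level Text Categorization

A fragment of the adopted taxonomy (*)

economia, affari e finanza (economy, business and finance)
  agricoltura (agriculture)
  industria ed energia (industry and energy)
  computer e information technology (computing and information technology)
  affari e servizi finanziari (business and financial services)
  macroeconomia (macroeconomics)
  borsa e mercati finanziari (stock exchange and financial markets)
  trasporti (transport)
politica (politics)
  difesa (defense)
  elezioni e referendum (elections and referendums)
  governo (government)
  parlamento (parliament)
  partiti e movimenti (parties and movements)
  costituzione (constitution)
  politica interna (domestic politics)

(*) a subset of the one proposed by the
International Press Telecommunications Council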
16
High-Level Text Categorization

Classifiers devised to perform text
categorization have been implemented using the kNN
and w-kNN (weighted kNN) algorithms
These techniques do not require a specific training phase
and are very robust with respect to noisy data
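
A minimal sketch of the two classifiers follows. Several details are assumptions, since the slides do not give them: documents are sparse term-weight dictionaries, similarity is the cosine, and the weighted variant lets each neighbour vote with its similarity to the query (one common weighting choice). The toy vectors are invented.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(query, examples, k=7, weighted=False):
    """examples is a list of (vector, label) pairs.

    weighted=False: each of the k nearest neighbours casts one vote (kNN).
    weighted=True:  each neighbour votes with its similarity (assumed w-kNN rule).
    """
    scored = [(cosine(query, vec), label) for vec, label in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity if weighted else 1.0
    return max(votes, key=votes.get)

# Invented toy data: one very close "politica" example, two distant "economia" ones.
examples = [
    ({"governo": 1.0, "voto": 1.0}, "politica"),
    ({"voto": 1.0, "borsa": 3.0}, "economia"),
    ({"governo": 1.0, "mercati": 3.0}, "economia"),
    ({"trasporti": 1.0}, "economia"),
]
query = {"governo": 1.0, "voto": 1.0}

plain = knn_classify(query, examples, k=3)
weighted = knn_classify(query, examples, k=3, weighted=True)
print(plain, weighted)
```

There is no model-fitting step: the labelled examples are stored and compared with each query at classification time, which is the sense in which no specific training is required. On the toy data the two variants disagree, because the weighted vote favours the single very similar neighbour over two weakly similar ones.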

17
High-Level Text Categorization

A suitable encoding has been adopted
all non-informative words are removed using a
stop word list
a standard stemming algorithm removes the most
common morphological and inflectional suffixes
for each class of the taxonomy, feature
selection based on the information gain
statistic has been applied
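
The encoding steps can be sketched as below, under explicit assumptions: the stop word list is a tiny invented one, the suffix stripper is a toy stand-in for a standard stemmer (not the algorithm the authors used), and information gain is computed for a binary "document belongs to the class or not" split. Sample documents are in English for readability.

```python
import math
import re

STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # toy suffix stripping, NOT a real stemmer

def encode(text):
    """Lowercases, tokenizes, drops stop words, and strips one suffix per token."""
    terms = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[:-len(suffix)]
                break
        terms.add(token)
    return terms

def entropy(positives, total):
    """Binary entropy (in bits) of a set with `positives` members of the class."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(term, docs):
    """docs is a list of (set_of_terms, in_class) pairs, in_class being a bool."""
    n = len(docs)
    with_term = [in_class for terms, in_class in docs if term in terms]
    without_term = [in_class for terms, in_class in docs if term not in terms]
    h_class = entropy(sum(in_class for _, in_class in docs), n)
    h_given_term = (
        len(with_term) / n * entropy(sum(with_term), len(with_term))
        + len(without_term) / n * entropy(sum(without_term), len(without_term))
    )
    return h_class - h_given_term

# Invented corpus; True marks documents of the class "politics".
corpus = [
    ("The elections are approaching", True),
    ("Parties campaign in the election", True),
    ("Markets closed higher", False),
    ("The stock market is rising", False),
]
docs = [(encode(text), label) for text, label in corpus]
vocabulary = sorted(set().union(*(terms for terms, _ in docs)))
ranked = sorted(vocabulary, key=lambda t: information_gain(t, docs), reverse=True)
print(ranked[:3])
```

Feature selection then keeps only the top-ranked terms for each class.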

18
Personalized Text Categorization

A set of topics of interest for the user can
be obtained by composing generic topics with
suitable logical operators (i.e., and, or, not)
For instance, a user might be interested in being
kept informed about all articles that involve
both defense and government

19
Personalized Text Categorization

Two ways of combining classifiers
Horizontal composition (combination)
Vertical composition (pipeline)

20
Personalized Text Categorization

The combination is evaluated using P-norms
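
The slide does not give the formulas. As a hedged illustration, the sketch below uses the unweighted p-norm operators of the extended Boolean retrieval model, which is an assumption about what "P-norms" refers to here. Each input score is the degree, in [0, 1], to which a classifier considers the document relevant to its topic; the function names and the sample scores are invented.

```python
def p_or(scores, p=2.0):
    """p-norm OR: high when at least one score is high."""
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)

def p_and(scores, p=2.0):
    """p-norm AND: high only when all scores are high."""
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)

def p_not(score):
    """Negation of a relevance score."""
    return 1.0 - score

# Hypothetical classifier outputs for one article.
defense, government = 0.9, 0.6

both = p_and([defense, government])      # "defense AND government"
either = p_or([defense, government])     # "defense OR government"
print(round(both, 3), round(either, 3))
```

With p = 1 both operators reduce to the arithmetic mean of the scores, while larger values of p move AND toward the minimum and OR toward the maximum, that is, toward strict Boolean behaviour.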

21
Personalized Text Categorization

The composition exploits a pipeline of
classifiers

C1 -> C2 -> ... -> Cn
22
Personalized Text Categorization

Particular care has been taken to limit
false negatives
At the composition level, the impact of false positives
decreases exponentially with the
number of agents involved
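
A rough sketch of why this holds, under a simplifying assumption that the slides do not state: the classifiers in the pipeline err independently. A document is a false positive of the whole pipeline only if every stage wrongly accepts it, so the per-stage rates multiply. The keyword-based stages below are hypothetical stand-ins for trained classifiers.

```python
def pipeline_accepts(document, classifiers):
    """A document passes the pipeline only if every classifier accepts it."""
    return all(classifier(document) for classifier in classifiers)

def pipeline_false_positive_rate(stage_rates):
    """Overall false positive rate, ASSUMING the stages err independently."""
    rate = 1.0
    for stage_rate in stage_rates:
        rate *= stage_rate
    return rate

def keyword_classifier(keyword):
    """Hypothetical stand-in for a trained classifier: accepts on a keyword match."""
    return lambda text: keyword in text.lower()

stages = [keyword_classifier("difesa"), keyword_classifier("governo")]
accepted = pipeline_accepts("Il governo aumenta la spesa per la difesa", stages)
rejected = pipeline_accepts("Il governo approva il bilancio", stages)

# With a 10% false positive rate per stage, three stages give about 0.1%.
overall = pipeline_false_positive_rate([0.1, 0.1, 0.1])
print(accepted, rejected, overall)
```

The same structure explains the care taken over false negatives: a relevant document rejected by any single stage is lost for the whole pipeline, so misses accumulate as stages are added.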

23
User Feedback

The user feedback module handles
any feedback provided by the user
Two solutions have been implemented
training an ANN
using a kNN classifier

24
The Proposed MAS

The Concrete Architecture

25
The PACMAS Architecture

A multiagent architecture designed to support the
development of applications aimed at
Retrieving heterogeneous data spread among
different sources
Filtering and organizing them according to the personal
interests explicitly stated by each user
Providing adaptation techniques to improve and
refine the user profile

26
The PACMAS Architecture
[Figure: the PACMAS architecture; diagram labels: Information Sources, Mid-span Levels]
27
Implementation
28
Information Level

Information agents are aimed at performing
information extraction
a set of agents wraps Italian online newspapers
containing articles in RSS and HTML format
an agent wraps the adopted generic taxonomy

29
Filter Level

Filter agents encode documents for text
categorization by
removing all non-informative words
removing the most common morphological and
inflectional suffixes
weighting terms using TF-IDF
selecting the relevant features
generating a feature vector for each document
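
The last three steps can be sketched as follows. The slides do not say which TF-IDF variant is used, so the example assumes the raw term count multiplied by log(N / df); the documents are assumed to be already tokenized, stop-word filtered, and stemmed, and the feature list is assumed to come from a prior selection step. All data and names are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs, features):
    """Builds one TF-IDF vector per document over the selected features.

    docs is a list of token lists; features is the ordered list of selected terms.
    Assumed weighting: tf (raw count) * log(N / df).
    """
    n = len(docs)
    document_frequency = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_frequency = Counter(doc)
        vectors.append([
            term_frequency[term] * math.log(n / document_frequency[term])
            if document_frequency[term] else 0.0
            for term in features
        ])
    return vectors

# Invented, already preprocessed documents.
docs = [
    ["governo", "voto", "governo"],
    ["borsa", "mercati"],
    ["governo", "borsa"],
]
features = ["governo", "borsa", "voto"]

vectors = tfidf_vectors(docs, features)
for vector in vectors:
    print([round(weight, 3) for weight in vector])
```

A term that occurs in every document gets weight zero under this variant, since log(N / df) is then zero.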

30
Task Level

Task agents are aimed at performing text
categorization
Agents that perform high-level text categorization
embody a kNN classifier
measure the classification accuracy
Agents that perform personalized text categorization
take into account users' preferences by automatically
composing topics

31
Interface Level

Interface agents are aimed at interacting with
the user and handling the user's feedback

32
Experimental Results
33
Experimental Results

Task agents have been trained on a set of
newspaper articles previously classified by
domain experts
For each item of the taxonomy
a set of 200 documents has been selected
the kNN algorithm (with k = 7) has been adopted
150 features have been used
random datasets have been generated

34
Experimental Results

For each item of the taxonomy, several tests have
been conducted
a set of 500 documents has been selected
the kNN algorithm (with k = 7) has been adopted
random datasets have been generated
The average accuracy of the system is 80.05%

35
Experimental Results
36
Conclusions and Future Work
37
Conclusions

We presented a system aimed at
retrieving articles from Italian online
newspapers
classifying them using suitable machine learning
techniques
The system has been built upon PACMAS, an architecture
that supports the implementation of Personalized, Adaptive, and
Cooperative MultiAgent Systems

38
Future Work

A new release of the system will adopt
different classification algorithms
We are investigating how to enhance the
feedback-related functionalities according to an
evolutionary computation framework

39
That's all, folks!
