A MultiAgent System for Personalized Press Reviews

Description:

Distributed Agent-based Retrieval Tools The Future of Search Engine's Technology. ... HTML/XHTML wrapper. RSS wrapper. Information Extraction: the HTML wrapper ...

Provided by: crs4

1
A MultiAgent System for Personalized Press Reviews
  • A. Addis, G. Cherchi, A. Manconi, and E. Vargiu
  • Intelligent Agents and Soft-Computing Group
  • Dept. of Electrical and Electronic Engineering
  • University of Cagliari
  • (Italy)

2
Outline
  • Introduction
  • The Proposed MAS
    • The Abstract Architecture
    • The Concrete Architecture
  • Experimental Results
  • Conclusions and Future Work

3
Introduction
4
Motivations
  • The Internet offers a growing amount of data spread
    over heterogeneous sources (web services,
    distributed databases, web pages, etc.)
  • It is very difficult for users to select contents
    according to their personal interests

5
Motivations
"I'm interested in music concerts in Sardinia"
6
Motivations
  • Support the user through an automated system
    able to:
    • retrieve and extract information from
      heterogeneous sources
    • select the contents deemed truly relevant for
      the user, according to her/his personal interests

7
The Proposed Approach
  • A multiagent system able to:
    • take into account users' needs and
      preferences (Personalization)
    • adapt to changes occurring in the
      environment (Adaptation)
    • interact with other agents and the
      user (Cooperation)

8
The Proposed Approach
  • Why a multiagent system?
    • a centralized classification system may be
      quickly overwhelmed by a large and dynamic
      document stream (such as daily-updated news)
    • the Internet is intrinsically large and distributed,
      so it offers the opportunity to take advantage
      of distributed computing paradigms and resources

9
The Proposed MAS
  • The Abstract Architecture

10
The Abstract Architecture
  • Information Extraction
  • Text Categorization
  • User Feedback

11
Information Extraction
  • The information extraction module extracts data
    from web sources through specialized wrappers
  • Two kinds of wrappers have been implemented:
    • HTML/XHTML wrapper
    • RSS wrapper

12
Information Extraction: the HTML wrapper
  • The process of extracting data from HTML
    typically consists of two steps:
    • learning the page structure to detect the tags
      containing the objects
    • performing structured data extraction, applying
      the mapping function to populate the
      corresponding data repository
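
The second step can be sketched as follows. This is only an illustration: the slides do not describe the actual wrapper, so the mapping from field names to (tag, class) pairs, the class name MappedExtractor and the sample page are all hypothetical, and the mapping is assumed to be the output of the learning step.

```python
from html.parser import HTMLParser


class MappedExtractor(HTMLParser):
    """Collects the text of elements matched by a (tag, class) mapping.

    Assumption: target elements do not contain nested elements with the
    same tag name, so the first matching end tag closes the capture.
    """

    def __init__(self, mapping):
        super().__init__()
        # mapping: field name -> (tag, class attribute), assumed to come
        # from the structure-learning step
        self.targets = {pair: field for field, pair in mapping.items()}
        self.record = {}
        self._field = None  # field currently being captured, if any
        self._tag = None    # tag that opened the current capture

    def handle_starttag(self, tag, attrs):
        key = (tag, dict(attrs).get("class"))
        if self._field is None and key in self.targets:
            self._field = self.targets[key]
            self._tag = tag
            self.record.setdefault(self._field, "")

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self.record[self._field] = self.record[self._field].strip()
            self._field = None


page = (
    '<html><body><h1 class="title">Concert in Cagliari</h1>'
    '<div class="body">Tickets on sale <b>now</b>.</div></body></html>'
)
mapping = {"title": ("h1", "title"), "text": ("div", "body")}

extractor = MappedExtractor(mapping)
extractor.feed(page)
record = extractor.record  # one entry of the data repository
```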

13
Information Extraction: the RSS wrapper
  • Extracts information from online newspaper
    articles in RSS format
  • Since RSS is a well-structured format, RSS pages
    are very simple to process
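
As a rough illustration of how little work a well-structured feed needs, the sketch below reads the items of an RSS 2.0 document with the Python standard library. The function name parse_rss and the sample feed are made up for the example; the original wrapper is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return one dict per <item> element of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return articles


# Hypothetical feed, used only to exercise the function
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example newspaper</title>
<item><title>Concert in Cagliari</title><link>http://example.org/1</link>
<description>Music event this weekend.</description></item>
<item><title>Budget vote</title><link>http://example.org/2</link>
<description>Parliament approves the budget.</description></item>
</channel></rss>"""

articles = parse_rss(SAMPLE_FEED)
```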

14
Text Categorization
  • The Text Categorization module progressively
    filters information flowing from external sources
    to the end user:
    • high-level text categorization, using a
      high-level taxonomy
    • personalized text categorization, considering the
      user's needs and preferences

15
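High-Level Text Categorization
  • A fragment of the adopted taxonomy (*):
    • economy, business and finance (economia, affari e finanza)
      • agriculture (agricoltura)
      • industry and energy (industria ed energia)
      • computing and information technology (computer e information technology)
      • business and financial services (affari e servizi finanziari)
      • macroeconomics (macroeconomia)
      • stock exchange and financial markets (borsa e mercati finanziari)
      • transport (trasporti)
    • politics (politica)
      • defense (difesa)
      • elections and referendums (elezioni e referendum)
      • government (governo)
      • parliament (parlamento)
      • parties and movements (partiti e movimenti)
      • constitution (costituzione)
      • domestic politics (politica interna)

(*) A subset of the taxonomy proposed by the
International Press Telecommunications Council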
16
High-Level Text Categorization
  • The classifiers that perform text categorization
    have been implemented using the kNN
    and w-kNN algorithms
  • These techniques do not require a specific training
    phase and are very robust with respect to noisy data
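
A minimal sketch of both variants follows. It assumes that w-kNN denotes a weighted kNN in which each neighbour votes with its similarity to the query, and that documents are sparse term-weight dictionaries compared by cosine similarity; the slides specify neither choice, and the toy data is hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=7, weighted=False):
    """Label a query by its k most similar training documents.

    training: list of (vector, label) pairs.
    weighted=False: plain majority vote (kNN).
    weighted=True: each neighbour votes with its similarity (assumed w-kNN).
    """
    scored = [(cosine(query, vec), label) for vec, label in training]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim if weighted else 1.0
    return max(votes, key=votes.get)


training = [
    ({"concert": 1.0, "music": 1.0}, "culture"),
    ({"vote": 1.0, "government": 1.0}, "politics"),
    ({"vote": 1.0, "parliament": 1.0}, "politics"),
]
query = {"concert": 1.0, "music": 1.0, "vote": 0.5}

plain = knn_classify(query, training, k=3)
weighted = knn_classify(query, training, k=3, weighted=True)
```

In this toy case the two far "politics" neighbours outvote the single close "culture" neighbour under plain kNN, while the similarity-weighted vote favours "culture".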

17
High-Level Text Categorization
  • A suitable encoding has been adopted:
    • all non-informative words are removed using a
      stop-word list
    • a standard stemming algorithm removes the most
      common morphological and inflectional suffixes
    • for each class of the taxonomy, feature
      selection based on the information gain
      statistic is performed
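
The three encoding steps might look like the sketch below. The stop-word list and the suffix stripper are toy stand-ins (the slides do not name the stemmer or the word list), and information gain is computed per class with binary labels, which is one common formulation and not necessarily the authors' exact one.

```python
import math
import re

STOP_WORDS = {"the", "a", "of", "in", "and", "is", "are", "on", "to"}  # illustrative
SUFFIXES = ("ing", "ed", "es", "s")  # toy stand-in for a standard stemmer


def encode(text):
    """Lower-case, drop stop words, strip one common suffix per token."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems


def entropy(pos, total):
    """Binary entropy (in bits) of `pos` positives out of `total` items."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(term, docs, labels):
    """IG of a term for one class; docs are token sets, labels are booleans."""
    n = len(docs)
    with_t = [lab for d, lab in zip(docs, labels) if term in d]
    without_t = [lab for d, lab in zip(docs, labels) if term not in d]
    conditional = (
        len(with_t) / n * entropy(sum(with_t), len(with_t))
        + len(without_t) / n * entropy(sum(without_t), len(without_t))
    )
    return entropy(sum(labels), n) - conditional


def select_features(docs, labels, top_n):
    """Keep the top_n terms by information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:top_n]


docs = [{"vote", "government"}, {"vote", "parliament"},
        {"concert", "music"}, {"music", "festival"}]
labels = [True, True, False, False]  # True = document belongs to "politics"
features = select_features(docs, labels, top_n=2)
```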

18
Personalized Text Categorization
  • A set of topics of interest for the user can
    be obtained by composing generic topics with
    suitable logical operators (i.e., and, or, not)
  • For instance, a user might want to be
    kept informed about all articles that involve
    both defense and government

19
Personalized Text Categorization
  • Two ways of combining classifiers:
    • horizontal composition (combination)
    • vertical composition (pipeline)

20
Personalized Text Categorization
  • The combination is evaluated using P-norms
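
The slide does not give the formula, so the sketch below assumes the unweighted P-norm operators of the extended Boolean retrieval model, applied to per-topic classifier scores in [0, 1]. The function names and the example scores are hypothetical.

```python
def p_or(scores, p=2.0):
    """P-norm OR: high when at least one score is high."""
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def p_and(scores, p=2.0):
    """P-norm AND: high only when all scores are high."""
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)


def p_not(score):
    """Negation of a single score."""
    return 1.0 - score


# Hypothetical scores of one article for two topics
defense_score, government_score = 0.9, 0.4

both = p_and([defense_score, government_score])    # "defense and government"
either = p_or([defense_score, government_score])   # "defense or government"
```

With p = 1 both operators reduce to the plain average of the scores; as p grows, the OR approaches the maximum and the AND approaches the minimum, so p controls how strictly the logical operators are interpreted.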

21
Personalized Text Categorization
  • The composition exploits a pipeline of
    classifiers

[Figure: pipeline of classifiers C1 → C2 → ... → Cn]
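
One plausible reading of the pipeline is that a document reaches the user only if every classifier in the chain accepts it. The sketch below illustrates that reading; the keyword-based classifiers are hypothetical stand-ins for trained ones.

```python
def make_keyword_classifier(keywords):
    """Hypothetical stand-in for a trained topic classifier."""
    keywords = set(keywords)

    def accepts(tokens):
        return bool(keywords & set(tokens))

    return accepts


def pipeline(classifiers, tokens):
    """Accept a document only if every classifier in the chain accepts it."""
    for accepts in classifiers:
        if not accepts(tokens):
            return False  # rejected documents never reach later stages
    return True


defense = make_keyword_classifier({"army", "defense"})
government = make_keyword_classifier({"minister", "government"})

both = pipeline([defense, government], ["the", "defense", "minister", "spoke"])
only_one = pipeline([defense, government], ["the", "army", "parade"])
```
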
22
Personalized Text Categorization
  • Particular care has been taken to limit
    false negatives
  • The impact of false positives at the composition
    level decreases exponentially with the
    number of agents involved
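
A rough model makes the exponential effect concrete. The assumptions are ours, not the slides': each of the n classifiers in a pipeline independently accepts an irrelevant document with probability f and a relevant one with probability r.

```latex
\[
  P(\text{irrelevant document passes all } n \text{ stages}) = f^{\,n},
  \qquad
  P(\text{relevant document passes all } n \text{ stages}) = r^{\,n}
\]
\[
  f = 0.2,\ n = 3 \;\Rightarrow\; f^{\,n} = 0.008,
  \qquad
  r = 0.95,\ n = 3 \;\Rightarrow\; r^{\,n} \approx 0.857
\]
```

Under the same model the share of relevant documents that survive also shrinks with n, which may be why limiting false negatives at each stage receives particular care.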

23
User Feedback
  • The user feedback module deals
    with any feedback provided by the user
  • Two solutions have been implemented:
    • training an artificial neural network (ANN)
    • using a kNN classifier

24
The Proposed MAS
  • The Concrete Architecture

25
The PACMAS Architecture
  • A multiagent architecture designed to support the
    development of applications aimed at:
    • retrieving heterogeneous data spread among
      different sources
    • filtering and organizing them according to the
      personal interests explicitly stated by each user
    • providing adaptation techniques to improve and
      refine the user profile

26
The PACMAS Architecture
[Figure: the PACMAS architecture; surviving labels: Information Sources, Mid-span Levels]
27
Implementation
28
Information Level
  • Information agents are aimed at performing
    information extraction:
    • a set of agents wraps Italian online newspapers
      containing articles in RSS and HTML format
    • an agent wraps the adopted generic taxonomy

29
Filter Level
  • Filter agents are aimed at performing text
    categorization:
    • removing all non-informative words
    • removing the most common morphological and
      inflectional suffixes
    • weighting terms using TF-IDF
    • selecting the relevant features
    • generating a feature vector for each document
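
The weighting and vector-generation steps can be sketched as below. The sketch uses one common TF-IDF variant (relative term frequency times log(N / document frequency)); the slides do not say which variant the filter agents use, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document.

    docs: list of token lists, assumed already stop-word filtered and stemmed.
    """
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


docs = [
    ["vote", "government", "vote"],
    ["concert", "music"],
    ["vote", "music"],
]
vectors = tfidf_vectors(docs)
```

A term that occurs in every document gets weight 0 under this variant, so it contributes nothing to the similarity between documents.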

30
Task Level
  • Task agents are aimed at performing text
    categorization
  • Agents perform:
    • High-level text categorization
      • embody a kNN classifier
      • measure the classification accuracy
    • Personalized text categorization
      • take into account users' preferences,
        automatically composing topics

31
Interface Level
  • Interface agents are aimed at interacting with
    the user and handling the user's feedback

32
Experimental Results
33
Experimental Results
  • Task agents have been trained on a set of
    newspaper articles previously classified by
    domain experts
  • For each item of the taxonomy:
    • a set of 200 documents has been selected
    • the kNN algorithm (with k = 7) has been adopted
    • 150 features have been used
    • random datasets have been generated

34
Experimental Results
  • For each item of the taxonomy, several tests have
    been conducted:
    • a set of 500 documents has been selected
    • the kNN algorithm (with k = 7) has been adopted
    • random datasets have been generated
  • The average accuracy of the system is 80.05%

35
Experimental Results
36
Conclusions and Future Work
37
Conclusions
  • We presented a system aimed at:
    • retrieving articles from Italian online
      newspapers
    • classifying them using suitable machine learning
      techniques
  • The system has been built upon PACMAS, an
    architecture for implementing Personalized, Adaptive,
    and Cooperative MultiAgent Systems

38
Future Work
  • A new release of the system will adopt
    different classification algorithms
  • We are investigating how to enhance the
    feedback-related functionality within an
    evolutionary computation framework

39
That's all folks!