Semantic Analysis of Text

About This Presentation
Title:

Semantic Analysis of Text

Description:

Documents are a valuable source of information & knowledge in eGov and ... Magpie. Manual Annotation Tools. CRAM. ESF - IT Management. Bratislava, 7-8 August 2006 ...

Transcript and Presenter's Notes

1
Semantic Analysis of Text
  • Michal Laclavík
  • Ústav Informatiky SAV

2
Outline
  • Motivation
  • Graph and Object representations
  • Ontologies & Organizational Memories
  • Context & Content
  • Approaches and Technologies
  • Semantic Annotation
  • Vision of Semantic Web
  • Vision of Semantic Organization

3
Motivation
  • Documents are a valuable source of information &
    knowledge in eGov and also in eBusiness and
    eCommerce
  • Intranet, Internet systems: HTML documents
  • Electronic communication, organizational tasks
    and processes: text
  • Content & Context

4
Structured Information
  • Text is not structured, so it is not
    understandable for computers
  • Goal: to provide structured information
  • Based on structured information it is possible to
    provide knowledge
  • Types of structured knowledge
  • XML-based versus others (object database, RDBMS,
    logic-based, rule-based)
  • RDF
  • OWL ontologies, OWL-DL
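
As a rough illustration of what "structured information" means here, the sketch below stores facts as subject-predicate-object triples, which is the data model RDF uses, and answers a simple query over them. The `ex:` names, the sample facts, and the `match` helper are hypothetical and chosen only for this example; a real system would use an RDF library and an OWL ontology rather than a Python list.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical facts about one document, written as (subject, predicate, object).
triples: list[Triple] = [
    ("ex:doc1", "rdf:type", "ex:JobOffer"),
    ("ex:doc1", "ex:hasTitle", "Java Developer"),
    ("ex:doc1", "ex:hasLocation", "ex:Bratislava"),
    ("ex:Bratislava", "rdf:type", "ex:City"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which resources are job offers?
job_offers = [s for s, _, _ in match(p="rdf:type", o="ex:JobOffer")]
```

Because every fact has the same three-part shape, a program can answer such questions without parsing any free text, which is the point of the slide.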

5
Organizational Memories
  • Metadata about Resources
  • Documents
  • People
  • Customers
  • ...
  • Based on Ontologies & Semantic Web Technologies
  • Tied with Knowledge Management Solution

6
Context
  • We need to detect the context of information &
    knowledge in order to provide it in a user-specific
    context
  • When, by whom, and why the document was created
  • In what activity, what process, what category
  • Not only the text of documents is relevant!
  • Should be tied with user context analysis
  • Workflow, email, file system - computerized
    activities
  • Context detection - from electronic communication
    channels analysis

7
Content
  • We need to detect the content of the document
  • Category
  • What the document talks about
  • Semantics of the document
  • Analysis of Text
  • Finding objects from problem domain in text

8
Existing Solutions
  • Mainly stamping HTML documents for reader
  • Annotea, Ruby annotation, ...
  • Often creating own structured XML data or new
    ontology
  • Specialized Browsers
  • Magpie
  • Manual Annotation Tools
  • CRAM

9
Manual Annotation & Browsing

10
Semi-Automatic Annotation
  • Natural Language Processing: training sets,
    learning
  • Neural Networks, Genetic Algorithms, Hidden
    Markov Models, ...
  • Document Structure Based
  • useful for HTML
  • Wrappers
  • Learning of Structure patterns
  • Pattern Recognition
  • C-PANKOW
  • Ontea

11
C-PANKOW
  • Pattern matching
  • Google API request
  • Creating ontology individual of most typical
    class (type)

[Figure: an "is a" pattern query for "Niger" via the Google API]

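
The three steps on this slide can be sketched roughly as follows. This is a simplified illustration and not the actual C-PANKOW implementation: it uses only a single "is a" pattern, and a hard-coded list of hypothetical text snippets stands in for the results of a search API request. The function names are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Optional


def candidate_classes(instance: str, snippets: list[str]) -> Counter:
    """Count the words that follow '<instance> is a/an' in the snippets."""
    pattern = re.compile(
        rf"\b{re.escape(instance)}\s+is\s+an?\s+(\w+)", re.IGNORECASE
    )
    counts: Counter = Counter()
    for text in snippets:
        for word in pattern.findall(text):
            counts[word.lower()] += 1
    return counts


def most_typical_class(instance: str, snippets: list[str]) -> Optional[str]:
    """Pick the class name seen most often for the instance, if any."""
    counts = candidate_classes(instance, snippets)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical snippets standing in for search results.
snippets = [
    "Niger is a country in West Africa.",
    "The Niger is a river that flows through Mali.",
    "Niger is a country with a hot climate.",
]
best = most_typical_class("Niger", snippets)
```

With these snippets, "country" is matched twice and "river" once, so an individual named Niger would be created in the class "country".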
12
C-PANKOW
  • QTag
  • Detection of Nouns
  • plural proper noun (NPS)
  • plural common noun (NNS)
  • common noun (NN)
  • proper noun (NP)
  • adjective (JJ)
  • interjection (UH)
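
To show how such tags might be used for noun detection, here is a minimal sketch that filters already tagged tokens by the noun tags listed above. It does not call QTag itself; the `(word, tag)` input format and the sample sentence are assumptions made for this example.

```python
# Noun tags from the list above (QTag-style tag names).
NOUN_TAGS = {"NN", "NNS", "NP", "NPS"}
PROPER_NOUN_TAGS = {"NP", "NPS"}


def nouns(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep only the words whose tag marks them as a noun."""
    return [word for word, tag in tagged if tag in NOUN_TAGS]


def proper_nouns(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep only proper nouns, the usual candidates for ontology individuals."""
    return [word for word, tag in tagged if tag in PROPER_NOUN_TAGS]


# Hypothetical tagger output as (word, tag) pairs.
tagged = [("oh", "UH"), ("Niger", "NP"), ("large", "JJ"), ("rivers", "NNS")]
found_nouns = nouns(tagged)
found_proper = proper_nouns(tagged)
```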

13
Ontology-based Text Annotation: OnTeA
Developed at IISAS

14
Motivation & Objectives
  • Detecting metadata from text
  • Preparing improved structured data for later
    computer processing
  • Structured data are based on the application
    ontology model

15
Domain Ontology: Job Offers

16
Ontea Architecture
  • Regular expressions are applied to find individuals
    corresponding to the text
  • Text & class in ontology -> a new individual is
    created (the output individual)
  • The output individual can have properties of
    certain types
  • Individuals found in the text are assigned
    as properties of the output individual
  • Inference is used
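
The flow described above can be sketched as follows, under stated assumptions. This is not the Ontea code: the pattern table, the class names, the `has<Class>` property naming, and the dictionary used to represent the output individual are all hypothetical, and the inference step is left out.

```python
import re

# Hypothetical pattern table: ontology class -> regular expression.
PATTERNS = {
    "Location": re.compile(r"\b(Bratislava|Vienna|Prague)\b"),
    "Skill": re.compile(r"\b(Java|Python|SQL)\b"),
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def annotate(text: str, output_class: str = "JobOffer") -> dict:
    """Create one output individual and attach every match as a property."""
    individual = {"type": output_class, "properties": {}}
    for cls, pattern in PATTERNS.items():
        found = sorted(set(pattern.findall(text)))
        if found:
            # Property name derived from the class, e.g. hasLocation.
            individual["properties"][f"has{cls}"] = found
    return individual


offer = annotate(
    "Java and SQL developer wanted in Bratislava. Contact: jobs@example.org"
)
```

Every document processed this way yields an individual of the same shape, which is what makes the result usable for later processing.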

17
Simple Pattern Ontology
  • Can be used for any Application
  • Can be extended

18
Results - Jobs
  • Creation of Jobs Metadata
  • Same structure for each document
  • Can be used for processing, presentation

19
Success rate
  • 500 documents
  • Found more elements

20
Conclusion on Annotation
  • Text -> representation of the text in the domain
    ontology model
  • Used in a web job offer documents application and
    a flood prediction application
  • We are applying it in an email analysis
    application

21
Vision of Semantic Web
  • The Semantic Web is a mesh of information linked
    up in such a way as to be easily processable by
    machines, on a global scale. You can think of it
    as being an efficient way of representing data on
    the World Wide Web, or as a globally linked
    database. (Source: http://infomesh.net/2001/swintro/
    - The Semantic Web: An Introduction)

22
Vision of Semantic Organization
  • To have all information and data available for
    computer processing via Semantic Web technology
    (XML, RDF, OWL)
  • Ontology translation is not so important within
    one domain
  • Document and Text analysis results using Semantic
    annotation are part of this
  • Conversion or mapping of RDBMS to XML/RDF/OWL is
    another problem
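
To give a feel for the last point, the sketch below maps the rows of a relational table to subject-predicate-object triples in the simplest possible way. It is only an illustration under assumed conventions (one resource per row, one property per column, an `ex:` prefix) and does not follow any particular mapping standard; the table and its contents are hypothetical.

```python
import sqlite3


def rows_to_triples(conn: sqlite3.Connection, table: str, key: str) -> list:
    """Turn each row into one typed resource with one triple per column."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"ex:{table}/{record[key]}"
        triples.append((subject, "rdf:type", f"ex:{table}"))
        for col, value in record.items():
            if col != key and value is not None:
                triples.append((subject, f"ex:{col}", str(value)))
    return triples


# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, title TEXT, city TEXT)")
conn.execute("INSERT INTO job VALUES (1, 'Java Developer', 'Bratislava')")
job_triples = rows_to_triples(conn, "job", "id")
```

Foreign keys, data types, and alignment with an existing ontology are the hard parts of such a conversion and are not handled here.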