CSC 9010: AeroText, Ontologies, AeroDAML

1
CSC 9010: AeroText, Ontologies, AeroDAML
  • Dr. Paula Matuszek
  • Paula_A_Matuszek_at_glaxosmithkline.com
  • (610) 270-6851

2
AeroText
  • Information Extraction tool marketed by Lockheed
    Martin
  • Capabilities similar to GATE
  • Much better developed IDE
  • Less open to extensions of the system itself.
  • Equally steep learning curve for effective use!
  • Lockheed AeroText General Overview
  • Lockheed AeroText White Paper

3
AeroText Demo

4
Ontologies
  • Information Extraction requires modeling
    extensive domain knowledge
  • Other applications of text mining, such as
    document categorization, can also use domain
    information
  • In modeling such knowledge we often create an
    ontology: an explicit formal specification of
    how to represent the objects, concepts, and other
    entities that are assumed to exist in some area
    of interest and the relationships that hold among
    them.

5
A Simple Ontology: Birthdates
  • Objects, concepts, entities
      • months, days, years
      • dates
      • first names
      • last names
      • persons
      • birthdates
  • Relationships between them
      • a date has exactly one month, day, year
      • a birthdate is a date
      • a person has at least 1 first name and exactly 1
        last name
      • a person has a birthdate
      • a birthdate has a person
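
The birthdate ontology above can be sketched in ordinary code. This is a minimal illustration in Python, not part of the original slides: the class and function names (Date, Birthdate, Person, persons_born_on) are assumptions chosen to mirror the bullets, and the range checks on month and day are a simplification that ignores month lengths.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Date:
    """A date has exactly one month, day, and year."""
    month: int
    day: int
    year: int

    def __post_init__(self) -> None:
        # Simplified range checks (assumption: month lengths are ignored).
        if not 1 <= self.month <= 12:
            raise ValueError("month must be between 1 and 12")
        if not 1 <= self.day <= 31:
            raise ValueError("day must be between 1 and 31")


@dataclass(frozen=True)
class Birthdate(Date):
    """A birthdate is a date (modeled here as a subclass)."""


@dataclass(frozen=True)
class Person:
    """A person has at least one first name, exactly one last name, and a birthdate."""
    first_names: Tuple[str, ...]
    last_name: str
    birthdate: Birthdate

    def __post_init__(self) -> None:
        if len(self.first_names) < 1:
            raise ValueError("a person needs at least one first name")


def persons_born_on(people: Iterable[Person], birthdate: Birthdate) -> List[Person]:
    """Inverse relationship: find the person(s) a birthdate belongs to."""
    return [p for p in people if p.birthdate == birthdate]


# Hypothetical example data.
pat = Person(("Pat", "Lee"), "Smith", Birthdate(3, 14, 1980))
print(persons_born_on([pat], Birthdate(3, 14, 1980)))
```

The "has exactly one" constraints fall out of the field declarations, while "at least 1 first name" needs an explicit check. An ontology language states such cardinality constraints declaratively instead.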

6
Who and Why?
  • Many groups are developing ontologies
      • standardize terms and vocabulary
      • facilitate the semantic web
      • improve information integration
      • interested in the domain itself
  • Some ontologies under development
      • Cyc
      • GO (Gene ontology)
      • UMLS (Unified Medical Language System)
      • CIA World Factbook

7
DAML
  • DARPA Agent Markup Language
  • A language for describing ontologies
  • Example: an ontology for dates
  • Extensive information available at www.daml.org.
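
To give a feel for what a DAML description of dates might look like, here is a minimal Python sketch that builds a small RDF/XML fragment using the DAML+OIL vocabulary. It is not the example from the original slides. The class and property names (Date, Birthdate, month, day, year) are assumptions taken from the birthdate slide, and the output has not been validated against any DAML tool.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, RDF Schema, and DAML+OIL (March 2001 release).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("daml", DAML)):
    ET.register_namespace(prefix, uri)


def q(namespace: str, name: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}name form."""
    return f"{{{namespace}}}{name}"


def build_date_ontology() -> ET.Element:
    """Assemble an illustrative date ontology (names are assumptions)."""
    root = ET.Element(q(RDF, "RDF"))

    # Two classes: Date, and Birthdate as a subclass of Date.
    ET.SubElement(root, q(DAML, "Class"), {q(RDF, "ID"): "Date"})
    birthdate = ET.SubElement(root, q(DAML, "Class"), {q(RDF, "ID"): "Birthdate"})
    ET.SubElement(birthdate, q(RDFS, "subClassOf"), {q(RDF, "resource"): "#Date"})

    # Three datatype properties whose domain is Date.
    for part in ("month", "day", "year"):
        prop = ET.SubElement(root, q(DAML, "DatatypeProperty"), {q(RDF, "ID"): part})
        ET.SubElement(prop, q(RDFS, "domain"), {q(RDF, "resource"): "#Date"})

    return root


date_ontology_xml = ET.tostring(build_date_ontology(), encoding="unicode")
print(date_ontology_xml)
```

A fuller ontology would also state that each date has exactly one month, day, and year, which DAML+OIL expresses with cardinality restrictions. Those are omitted here to keep the sketch short.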

8
UBOT
  • UML Based Ontology Toolkit
  • Part of a DARPA project to automatically mark up
    web pages to make them
  • The purpose of DAML is to annotate information on
    the web to make it machine-readable so that
    software agents can interpret it and reason with
    it: the semantic web
  • http://ubot.lockheedmartin.com/ubot/intro/index.html

9
AeroDAML
  • AeroDAML is a web service that takes a web page
    as an input and generates DAML markup.
  • Uses AeroText as the underlying extraction tool.
  • Works with various ontologies.
  • Paper describing system

10
Lab: try out AeroDAML
  • AeroDAML page
  • Choose a news page (www.phillynews.com, Google
    News, ...) and tag it with the Cyc and CIA
    ontologies.
  • How well did each ontology do at picking up
    content? Did they miss things they should have
    found? Was anything tagged incorrectly?
  • Repeat for one of your domain-specific documents,
    or a web page in a specific area. Try a different
    ontology if you think one of the others might be
    more interesting.
  • How was the annotation different?
  • Are we enabling the semantic web?
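
One way to make the lab questions concrete is to compare the tags a tool produced against a small set you annotated by hand. Missed items lower recall, and incorrect tags lower precision. The sketch below assumes each tag is a (text, label) pair. The sample tags are hypothetical and are not actual AeroDAML output.

```python
from typing import Iterable, Tuple

Tag = Tuple[str, str]  # (text span, ontology label)


def precision_recall(predicted: Iterable[Tag], gold: Iterable[Tag]) -> Tuple[float, float]:
    """Compare tool output against hand annotations.

    precision = correct tags / all tags the tool produced
    recall    = correct tags / all tags in the hand annotation
    """
    predicted_set, gold_set = set(predicted), set(gold)
    correct = len(predicted_set & gold_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical hand annotation for one news page.
gold_tags = [
    ("Philadelphia", "City"),
    ("Lockheed Martin", "Organization"),
    ("DARPA", "Organization"),
]

# Hypothetical tool output: one correct tag, one wrong label, one item missed.
tool_tags = [
    ("Philadelphia", "City"),
    ("Lockheed Martin", "Person"),
]

precision, recall = precision_recall(tool_tags, gold_tags)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same comparison for each ontology on the same page gives a rough, like-for-like answer to "how well did each ontology do?".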