CMPS 561 Projects - PowerPoint PPT Presentation

1
CMPS 561 Projects
  • Zonghuan Wu, Shixian Chu, Jinfeng Chen

2
Agenda
  • Project Topics
  • Introduction to IE
  • Introduction to TreeMap
  • Contacts

3
Topics
  • Use GATE to extract and/or Treemap to visualize
    Web data
    • Sports data
    • Entertainment data
    • HTML document
    • Recipes
    • Financial data
    • Coupons
    • Or any other interesting field (approval needed)

4
Challenges
  • Define your topics
  • Find data sources
    • Online databases: IMDB, http://www.databasesports.com/
    • Data dumps (Wikipedia, Open Directory Project)
    • Forums (coupons, recipes)
    • Feeds (e.g. http://finance.yahoo.com/rssindex)
    • Crawler (e.g. http://crawler.archive.org/)
  • Information Extraction
    • GATE
    • HTML Parser (htmlparser.sourceforge.net)
  • Information Visualization
    • Treemap (http://www.cs.umd.edu/hcil/treemap/)
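
As a rough illustration of the Treemap item above: a treemap draws a hierarchy as nested rectangles whose areas are proportional to a size value. The sketch below is a minimal slice-and-dice layout in Python, not the HCIL Treemap tool itself. The Node class, the slice_and_dice function, and the sample sports data are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: leaves carry a size, inner nodes sum their children."""
    name: str
    size: float = 0.0
    children: list["Node"] = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return {leaf name: (x, y, width, height)} for the subtree at node."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: divide the rectangle's width among the children.
            slice_and_dice(child, x + offset, y, w * share, h, depth + 1, out)
            offset += w * share
        else:
            # Odd depth: divide the rectangle's height among the children.
            slice_and_dice(child, x, y + offset, w, h * share, depth + 1, out)
            offset += h * share
    return out


# Hypothetical data: sizes could be, say, numbers of search results.
root = Node("sports", children=[
    Node("football", children=[Node("team_a", 3), Node("team_b", 1)]),
    Node("tennis", 4),
])

rects = slice_and_dice(root, 0, 0, 600, 400)
for name, rect in rects.items():
    print(name, rect)
```

Alternating the split direction at each level is what keeps sibling groups visually distinct. Other layouts trade this simplicity for rectangles that are closer to square.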

5
Treemap based Search Result Visualization
6
Ben Shneiderman and HCIL
7
Treemap Prior Arts
8
Treemap Prior Arts
9
Treemap - Prior Arts
People can analyze, compare, and identify
information 25-50% faster using treemaps than
using tables; people prefer treemaps to tables in
subjective ratings, including ease of use,
clarity, and usefulness; and people can learn to
use treemaps quickly without any training.
10
Treemap - Prior Arts
11
Treemap - Prior Arts
12
Treemap - Prior Arts
  • Circular Treemap

13
Treemap Prior Arts
14
Treemap Prior Arts
15
Treemap Prior Arts
16
Treemap Prior Arts
17
IE
  • Information Extraction (IE) is a technology based
    on analyzing natural language in order to extract
    snippets of information.
  • The process takes texts (and sometimes speech) as
    input and produces fixed-format, unambiguous data
    as output.
  • This data may be used directly for display to
    users, or may be stored in a database or
    spreadsheet for later analysis, or may be used
    for indexing purposes in Information Retrieval
    (IR) applications such as Internet search engines
    like Google.
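
To make the "text in, fixed-format data out" idea concrete, here is a minimal sketch in Python. It uses a single regular expression rather than a real IE system such as GATE. The sample sentences, the RESULT pattern, and the field names are all invented for illustration.

```python
import csv
import io
import re

# Invented sample text; a real project would read crawled pages or feeds.
TEXT = (
    "Tigers beat Lions 3-1 on Tuesday. "
    "Bears beat Wolves 2-0 on Friday."
)

FIELDS = ["winner", "loser", "winner_score", "loser_score", "day"]

# One hand-written pattern stands in for a full extraction grammar.
RESULT = re.compile(
    r"(?P<winner>[A-Z]\w+) beat (?P<loser>[A-Z]\w+) "
    r"(?P<winner_score>\d+)-(?P<loser_score>\d+) on (?P<day>[A-Z]\w+)"
)


def extract_results(text):
    """Turn free text into a list of fixed-format records (dicts)."""
    return [match.groupdict() for match in RESULT.finditer(text)]


def to_csv(records):
    """Serialize the records as CSV text, e.g. for a spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = extract_results(TEXT)
print(to_csv(records))
```

The same records could just as well be inserted into a database table or passed to an indexer. The point is only that every record has the same named fields.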

18
IE vs. IR
  • Function
    • An IR system finds relevant texts and presents
      them to the user.
    • An IE application analyses texts and presents
      only the specific information from them that the
      user is interested in.
  • Pros and cons of IE compared with IR
    • More difficult and knowledge-intensive to build;
      tied to particular domains and scenarios
    • More computationally intensive
    • Potentially more efficient, because it reduces
      the time people spend reading
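
The functional difference can be sketched in a few lines of Python. This is a toy contrast, not how a production search engine or IE system works. The documents, the ir_search and ie_extract functions, and the FIRED_ON pattern are assumptions made up for this example.

```python
import re

# Three tiny "documents"; two reuse sentences from the rocket example.
DOCS = [
    "Dr. Head is a staff scientist at We Build Rockets Inc.",
    "The weather on Tuesday was clear and cold.",
    "The shiny red rocket was fired on Tuesday.",
]


def ir_search(query, docs):
    """IR: return every whole document that mentions the query term."""
    needle = query.lower()
    return [doc for doc in docs if needle in doc.lower()]


# IE needs domain knowledge: here, one pattern for "X was fired on DAY".
FIRED_ON = re.compile(r"The (?P<object>[\w ]+?) was fired on (?P<day>[A-Z]\w+)")


def ie_extract(docs):
    """IE: return only the fields of interest, not the surrounding text."""
    return [m.groupdict() for doc in docs for m in FIRED_ON.finditer(doc)]


ir_hits = ir_search("Tuesday", DOCS)
ie_facts = ie_extract(DOCS)
print(ir_hits)
print(ie_facts)
```

The IR call hands back two full sentences for the user to read, including an irrelevant one about the weather. The IE call returns one small record, but only because a pattern was written for this specific kind of event.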

19
Types of IE
  • Named Entity recognition (NE)
    • Finds and classifies names, places, etc.
  • Coreference resolution (CO)
    • Identifies identity relations between entities.
  • Template Element construction (TE)
    • Adds descriptive information to NE results (using
      CO).
  • Template Relation construction (TR)
    • Finds relations between TE entities.
  • Scenario Template production (ST)
    • Fits TE and TR results into specified event
      scenarios.

20
Types of IE: Examples
  • "The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc."
  • NE finds the entities: the rocket, Tuesday,
    Dr. Head, and We Build Rockets Inc.
  • CO finds which entities and references (such as
    pronouns) refer to the same thing: "it" refers to
    the rocket.
  • TE finds what attributes entities have: the rocket
    is shiny red, and it is Head's brainchild.
  • TR finds what relationships hold between entities:
    Dr. Head works for We Build Rockets Inc.
  • ST finds the events that the entities participate
    in: there was a rocket launching event in which the
    various entities were involved.
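
The five stages can be walked through on the rocket text with a toy Python sketch. Every rule below is hand-written for this one passage and stands in for the trained or grammar-based components a real system such as GATE would use. The function names, labels, and output shapes are all invented for illustration.

```python
import re

TEXT = (
    "The shiny red rocket was fired on Tuesday. "
    "It is the brainchild of Dr. Big Head. "
    "Dr. Head is a staff scientist at We Build Rockets Inc."
)


def ne(text):
    """NE: look up known names in a tiny hand-made gazetteer."""
    gazetteer = {
        "rocket": "ARTIFACT",
        "Tuesday": "DATE",
        "Dr. Head": "PERSON",
        "We Build Rockets Inc.": "ORGANIZATION",
    }
    return {name: label for name, label in gazetteer.items() if name in text}


def co(text):
    """CO: two hard-coded rules that link a mention to its antecedent."""
    links = {}
    if re.search(r"rocket\b.*?\. It\b", text):
        links["It"] = "rocket"
    if "Dr. Big Head" in text and "Dr. Head" in text:
        links["Dr. Head"] = "Dr. Big Head"
    return links


def te(text, links):
    """TE: attach descriptive attributes to entities, using the CO links."""
    attrs = {}
    m = re.search(r"The ([\w ]+) rocket\b", text)
    if m:
        attrs.setdefault("rocket", {})["description"] = m.group(1)
    m = re.search(r"(\w+) is the brainchild of (Dr\. [\w ]+?)\.", text)
    if m:
        subject = links.get(m.group(1), m.group(1))
        attrs.setdefault(subject, {})["brainchild_of"] = m.group(2)
    return attrs


def tr(text, links):
    """TR: find a works-for relation between a person and an organization."""
    m = re.search(r"(Dr\. \w+) is a staff scientist at ([\w ]+ Inc\.)", text)
    if not m:
        return []
    person = links.get(m.group(1), m.group(1))
    return [(person, "works_for", m.group(2))]


def st(text, relations):
    """ST: fill a launch-event template from the earlier results."""
    m = re.search(r"rocket was fired on (\w+)", text)
    if not m:
        return None
    involved = {r[0] for r in relations} | {r[2] for r in relations}
    return {"event": "rocket launch", "vehicle": "rocket",
            "date": m.group(1), "involved": sorted(involved)}


entities = ne(TEXT)
links = co(TEXT)
attributes = te(TEXT, links)
relations = tr(TEXT, links)
scenario = st(TEXT, relations)
for result in (entities, links, attributes, relations, scenario):
    print(result)
```

Each stage consumes the output of the earlier ones: TE and TR both use the CO links, and ST builds on the TR relations. This is why errors in NE or CO tend to propagate downstream.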

23
Contact
  • Shixian Chu
    • shixianchu_at_gmail.com
      or sxc1825_at_cacs.louisiana.edu
  • Jinfeng Chen
    • jxc5466_at_cacs.louisiana.edu
  • Zonghuan Wu
    • 482-1667
    • zwu_at_cacs.louisiana.edu
    • ACTR 206

24
Manipulate Wikipedia with JWPL
  • http://www.ukp.tu-darmstadt.de/software/jwpl/
  • Ali Elshaimaa E eea7236_at_cacs.louisiana.edu
  • Zonghuan Wu zwu_at_cacs.louisiana.edu