GATE, a General Architecture for Text Engineering. Hamish Cunningham, Kalina Bontcheva ... The SW: machine-processable, repurposable data to complement hypertext ...



1
  • GATE, a General Architecture for Text Engineering
  • Hamish Cunningham, Kalina Bontcheva
  • Department of Computer Science, University of Sheffield
  • Wednesday October 30th 2002
  • Next generation web
  • GATE, language technology infrastructure

2
  • A Ubiquitous Permeable Web
  • The next generation of the web must be:
  • ubiquitous: semantics for every device, every
    organisation, every individual
  • permeable: allow contextual data to penetrate
    and persist
  • companionable: able to engage with us via
    multiple natural modalities.
  • Roles for Language Technology
  • discovery of semantics (ubiquity)
  • mediating between context and personal semantic
    memories (permeability)
  • conversing with people and the semantic web
    (companionableness).

3
  • Critical Mass for the Semantic Web
  • The SW: machine-processable, repurposable data
    to complement hypertext
  • But semantics 0.0000000... of the Web
  • How to achieve critical mass? Huge-scale
    automatic annotation. Requirements:
  • Huge scale; freely available to all EU
    citizens; distributed (over a Grid);
    re-purposeable (delivered as Web Services)
  • Portability and robustness via simple and
    therefore shallow HLT methods; +ve and -ve
    learning; analogs of IPSEs for
    computer-literate users

4
  • Motivation for Software Infrastructure for
    Language Engineering
  • Need for scalable, reusable, and portable HLT
    solutions
  • Support for large data, in multiple media,
    languages, formats, and locations
  • Lowering the cost of creation of new language
    processing components
  • Promoting quantitative evaluation metrics via
    tools and a level playing field

5
                                               
                                                  
  • Motivation (II)
  • Software lifecycle in collaborative research
  • Project Proposal: We love each other. We can
    work so well together. We can hold workshops on
    Santorini together. We will solve all the
    problems of AI that our predecessors were too
    stupid to.
  • Analysis and Design: Stop work entirely, for a
    period of reflection and recuperation following
    the stress of attending the kick-off meeting in
    Luxembourg.
  • Implementation: Each developer partner tries to
    convince the others that program X that they just
    happen to have lying around on a dusty disk-drive
    meets the project objectives exactly and should
    form the centrepiece of the demonstrator.
  • Integration and Testing: The lead partner gets
    desperate and decides to hard-code the results
    for a small set of examples into the
    demonstrator, and have a fail-safe crash facility
    for unknown input ("well, you know, it's still a
    prototype...").
  • Evaluation: Everyone says how nice it is, how it
    solves all sorts of terribly hard problems, and
    how if we had another grant we could go on to
    transform information processing the World over
    (or at least the European business travel
    industry).
6
  • GATE, a General Architecture for Text Engineering
  • An architecture: a macro-level organisational
    picture for LE software systems.
  • A framework: for programmers, GATE is an
    object-oriented class library that implements the
    architecture.
  • A development environment: for language
    engineers, computational linguists et al., GATE is
    a graphical development environment bundled with
    a set of tools for doing e.g. Information
    Extraction.
  • Some free components... ...and wrappers for
    other people's components
  • Tools for: evaluation; visualise/edit;
    persistence; IR; IE; dialogue; ontologies; etc.
  • Free software (LGPL). Download at
    http://gate.ac.uk/download/

7
  • Architectural principles
  • Non-prescriptive, theory neutral (strength and
    weakness)
  • Re-use, interoperation, not reimplementation
    (e.g. diverse XML support, integration of tools
    like Protégé, Jena and Weka)
  • (Almost) everything is a component, and
    component sets are user-extendable
  • Component-based development
  • An OO way of chunking software: Java Beans
  • GATE components: CREOLE, modified Java Beans
    (a Collection of REusable Objects for Language
    Engineering)
  • The minimal component: 10 lines of Java, 10
    lines of XML, 1 URL.

8
  • GATE Language Resources
  • GATE LRs are documents, ontologies, corpora,
    lexicons, etc.
  • Documents / corpora
  • GATE documents loaded from local files or the
    web...
  • Diverse document formats: text, HTML, XML,
    email, RTF, SGML.
  • Processing Resources
  • Algorithmic components known as PRs: beans with
    execute methods.
  • All PRs can handle Unicode data by default.
  • Clear distinction between code and data (simple
    repurposing).
  • 20-30 freebies with GATE
  • e.g. Named entity recognition; WordNet; Protégé;
    Ontology; OntoGazetteer; DAML+OIL export;
    Information Retrieval based on Lucene
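
The split between language resources (data) and processing resources (beans with an execute method) can be illustrated with a small sketch. This is plain Python, not GATE's Java API: the names `Annotation`, `Document`, `ProcessingResource`, `WhitespaceTokeniser`, `UpperInitialTagger` and `run_pipeline` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the document text, with a feature map."""
    kind: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """A language resource: text plus annotations, with no algorithms attached."""
    text: str
    annotations: list = field(default_factory=list)


class ProcessingResource:
    """An algorithmic component: set its parameters, then call execute()."""
    document = None

    def execute(self):
        raise NotImplementedError


class WhitespaceTokeniser(ProcessingResource):
    """Toy PR (hypothetical): one Token annotation per run of non-space characters."""

    def execute(self):
        for m in re.finditer(r"\S+", self.document.text):
            self.document.annotations.append(Annotation("Token", m.start(), m.end()))


class UpperInitialTagger(ProcessingResource):
    """Toy PR (hypothetical): records whether each Token starts with an upper-case letter."""

    def execute(self):
        for ann in self.document.annotations:
            if ann.kind == "Token":
                word = self.document.text[ann.start:ann.end]
                ann.features["orthography"] = (
                    "upperInitial" if word[:1].isupper() else "other"
                )


def run_pipeline(resources, document):
    """Run each PR in order over the same document; the data stays separate from the code."""
    for pr in resources:
        pr.document = document
        pr.execute()
    return document
```

Because the document carries only data, the same PRs can be rerun over any other document unchanged, which is the "simple repurposing" point made above.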

9

  • A Language Analysis Example
  • [Figure: architecture diagram. Labels: ANNIE;
    POS tagger; named entity; coreference; event
    extraction; custom application 1; document
    content; document metadata; document format data;
    linguistic data; file storage; relational
    database (Oracle / PostgreSQL)]
10
                                               
                                                
                           
                                                  
                                 
11
  • Building IE Components in GATE (1)
  • The ANNIE system: a reusable and easily
    extendable set of components

12
  • Building IE Components in GATE (2)
  • JAPE: a Java Annotation Patterns Engine
  • Light, robust regular-expression-based
    processing
  • Cascaded finite state transduction
  • Low-overhead development of new components
  • Rule: Company1
  • Priority: 25
  • (
  •   ( {Token.orthography == upperInitial} )+
  •   {Lookup.kind == companyDesignator}
  • ):companyMatch
  • -->
  • :companyMatch.NamedEntity = { kind = "company",
    rule = "Company1" }
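
What the Company1 rule does can be sketched outside JAPE. The following is a minimal Python illustration, not the JAPE engine: it assumes the rule reads as "one or more upper-initial Tokens followed by a companyDesignator Lookup", and it simplifies the input to a flat sequence of annotations (in GATE, Lookup annotations overlap Tokens, and rule priority also plays a part; neither is modelled here). The helper names `tok`, `lookup` and `apply_company1` are hypothetical.

```python
def tok(text):
    """Build a Token annotation with a simple orthography feature (hypothetical helper)."""
    orth = "upperInitial" if text[:1].isupper() else "lowercase"
    return {"type": "Token", "text": text, "orthography": orth}


def lookup(text, kind):
    """Build a gazetteer Lookup annotation (hypothetical helper)."""
    return {"type": "Lookup", "text": text, "kind": kind}


def is_upper_initial_token(a):
    return a["type"] == "Token" and a.get("orthography") == "upperInitial"


def is_company_designator(a):
    return a["type"] == "Lookup" and a.get("kind") == "companyDesignator"


def apply_company1(annotations):
    """Left-hand side: upperInitial Token, one or more times, then a designator Lookup.
    Right-hand side: emit a NamedEntity annotation covering the whole match."""
    entities = []
    i = 0
    while i < len(annotations):
        j = i
        # Greedily consume the run of upper-initial tokens starting at i.
        while j < len(annotations) and is_upper_initial_token(annotations[j]):
            j += 1
        matched = j > i and j < len(annotations) and is_company_designator(annotations[j])
        if matched:
            span = annotations[i:j + 1]
            entities.append({
                "type": "NamedEntity",
                "kind": "company",
                "rule": "Company1",
                "text": " ".join(a["text"] for a in span),
            })
            i = j + 1  # continue after the match
        else:
            i += 1
    return entities
```

Given the sequence `tok("shares"), tok("in"), tok("Acme"), tok("Widgets"), lookup("Ltd", "companyDesignator"), tok("rose")`, the sketch yields a single NamedEntity whose text is "Acme Widgets Ltd". A cascade would pass this output to a later phase as input.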

13
  • Performance Evaluation
  • At document level: annotation diff
  • At corpus level: corpus benchmark tool,
    tracking system performance over time
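
A document-level annotation diff can be sketched as a set comparison between a hand-annotated key and a system response. This is a minimal Python illustration under stated assumptions, not GATE's own tool: annotations are reduced to hypothetical `(start, end, type)` tuples and only exact matches count as correct (partial overlaps are not scored).

```python
def annotation_diff(key, response):
    """Compare key (gold) and response (system) annotations by exact match.

    Each annotation is a (start, end, type) tuple; this is an assumption
    made for the sketch, and partial matches are ignored.
    """
    key_set, response_set = set(key), set(response)
    correct = key_set & response_set
    missing = key_set - response_set      # in the key, not found by the system
    spurious = response_set - key_set     # produced by the system, not in the key

    precision = len(correct) / len(response_set) if response_set else 0.0
    recall = len(correct) / len(key_set) if key_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {
        "correct": correct,
        "missing": missing,
        "spurious": spurious,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Running the same comparison over every document in a corpus, and storing the scores per software version, gives the kind of performance tracking over time that the corpus-level bullet describes.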

14
The Semantic Web and GATE
  • GATE is being used for development of
    (semi-)automatic methods for:
  • linking web pages to ontologies using
    Information Extraction
  • learning and evolving ontologies via IE and
    lexical semantic network traversal.

15
Populating Ontologies with IE
16
Protégé and Ontology Management
17
  • Information Retrieval Support
  • Based on the Lucene IR engine

18
                                               
                                                
  • Processing Multilingual Data
  • All the visualisation and editing tools for ML
    LRs use enhanced Java facilities
19
  • Applications
  • GATE has been used for a variety of applications,
    including:
  • MUMIS: automatic creation of semantic indexes
    for multimedia programme material
  • MUSE: a multi-genre IE system
  • Metadata for Medline (at Merck)
  • ACE: participation in the Automatic Content
    Extraction programme
  • HSE: summarisation of health and safety
    information from company reports
  • OldBaileyIE: NE recognition on 17th century Old
    Bailey Court reports.
  • AKT: language technology in knowledge management
  • AMITIES: call centre automation
  • Various Medical Informatics and database
    technology projects
  • IE in Romanian, Bulgarian, Greek, Bengali,
    Spanish, Swedish, German, Italian, and French
    (Arabic, Chinese and Russian this autumn)

20
  • Conclusion
  • GATE: an infrastructure that lowers the overhead
    of creating and embedding robust NLP components
  • Further information: http://gate.ac.uk/
  • Online demos, tutorials and documentation
  • Software downloads
  • Talks and papers
