1
GATE technical workshop: introduction
http://gate.ac.uk/ http://nlp.shef.ac.uk/
Hamish Cunningham, Sheffield, March 17/18, 2004
2
Agenda
  • Wednesday (G22)
  • 10.15 arrival, setup
  • 10.30 introductions, summary of background /
    skills
  • 10.40 mission, conventions, internal pages, GATE
    intro (hc)
  • 11.30 tools: cvs, jbuilder, tkdiff, building
    GATE (vt)
  • 12.00 break
  • 12.15 intro to the GUI (dm)
  • 1.30 lunch
  • 2.30 ANNIE, JAPE (dm)
  • 4.00 break
  • 4.15 summary of projects (hc)
  • 5.30 close
  • Thursday (G30)
  • 10.30 API, CREOLE lifecycle, Java for JAPE 1
    (vt)
  • 12.00 break
  • 12.15 tests, writing, running API etc. 2 (hc,
    vt)
  • 1.30 lunch
  • 2.30 corpora, evaluation tools (dm, kb)
  • 3.00 machine learning (vt)
  • 4.00 break
  • 4.15 ontologies (kb)
  • 5.15 wrapup
  • 5.30 close

3
Blah
  • mission
  • conventions
  • mailing lists
  • roles and responsibilities

4
GATE (the Volkswagen Beetle of Language
Processing) is:
  • Eight years old (!), with thousands of users at
    hundreds of sites
  • An architecture: a macro-level organisational
    picture for LE software systems.
  • A framework: for programmers, GATE is an
    object-oriented class library that implements the
    architecture.
  • A development environment: for language engineers,
    computational linguists et al., a graphical
    development environment.
  • Some free components... ...and wrappers for other
    people's components
  • Tools for: evaluation, visualise/edit,
    persistence, IR, IE, dialogue, ontologies, etc.
  • Free software (LGPL). Download at
    http://gate.ac.uk/download/

5
A bit of a nuisance (our users)
  • Thousands of users at hundreds of sites. A
    representative sample:
  • the American National Corpus project
  • the Perseus Digital Library project, Tufts
    University, US
  • Longman Pearson publishing, UK
  • Merck KGaA, Germany
  • Canon Europe, UK
  • Knight Ridder, US
  • BBN (leading HLT research lab), US
  • SMEs: Melandra, SG-MediaStyle, ...
  • Imperial College, London, the University of
    Manchester, UMIST, the University of Karlsruhe,
    Vassar College, the University of Southern
    California and a large number of other UK, US and
    EU Universities
  • UK and EU projects inc. MyGrid, CLEF, dotkom,
    AMITIES, CubReporter, Poesia...
  • GATE team projects. Past:
  • Conceptual indexing: MUMIS, automatic semantic
    indices for sports video
  • MUSE: cross-genre entity finder
  • HSL: health-and-safety IE
  • Old Bailey: collaboration with HRI on 17th
    century court reports
  • Multiflora: plant taxonomy text analysis for
    biodiversity research e-science
  • EMILLE: S. Asian language corpus
  • ACE / TIDES: Arabic, Chinese NE
  • JHU summer w/s on semtagging
  • Present:
  • Advanced Knowledge Technologies: 12m UK
    five-site collaborative project
  • ETCSL: Sumerian digital library
  • MiAKT: medical informatics / AKT
  • SEKT: Semantic Knowledge Tech
  • PrestoSpace: AV Preservation
  • KnowledgeWeb, h-TechSight

6
Architectural principles
  • Non-prescriptive, theory neutral (strength and
    weakness)
  • Re-use, interoperation, not reimplementation
    (e.g. diverse XML support, integration of
    Protégé, Jena, Weka...)
  • (Almost) everything is a component, and component
    sets are user-extendable
  • (Almost) all operations are available both from
    API and GUI

7
All the world's a Java Bean....
  • CREOLE: a Collection of REusable Objects for
    Language Engineering
  • GATE components: modified Java Beans with XML
    configuration
  • The minimal component: 10 lines of Java, 10
    lines of XML, 1 URL
  • Why bother?
  • Allows the system to load arbitrary language
    processing components
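
The idea of loading arbitrary components as configured beans can be sketched in plain Java. This is a minimal illustration only, not GATE's actual API: the `TextComponent` interface, the `PrefixTagger` class and the `load` helper are all hypothetical names, and a parameter map stands in for the XML configuration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;

public class ComponentLoaderSketch {

    /** Hypothetical fixed API that every component implements (not GATE's). */
    public interface TextComponent {
        String process(String text);
    }

    /** A minimal bean-style component: no-arg constructor plus one property. */
    public static class PrefixTagger implements TextComponent {
        private String prefix = "";
        public String getPrefix() { return prefix; }
        public void setPrefix(String prefix) { this.prefix = prefix; }
        @Override public String process(String text) { return prefix + text; }
    }

    /**
     * Instantiate a component from its class name, then set any bean
     * property whose name appears in params. The loader never needs to
     * know the concrete class at compile time.
     */
    public static TextComponent load(String className, Map<String, Object> params) {
        try {
            Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
            BeanInfo info = Introspector.getBeanInfo(bean.getClass());
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Object value = params.get(pd.getName());
                if (value != null && pd.getWriteMethod() != null) {
                    pd.getWriteMethod().invoke(bean, value);
                }
            }
            return (TextComponent) bean;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real system the class name and parameters would come from a
        // configuration file; here they are supplied inline for illustration.
        TextComponent c = load(PrefixTagger.class.getName(), Map.of("prefix", "[NE] "));
        System.out.println(c.process("Sheffield")); // prints "[NE] Sheffield"
    }
}
```

Because the loader works only from a class name and a set of property values, a new component can be added without changing or recompiling the loading code.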

8
GATE APIs
[Diagram: Language Resource Layer (LRs): Ontology, Protégé Ontology, WordNet, Gazetteers, ...]
  • NOTES (2)
  • e.g. Protégé: LR and VR both wrapped in the
    Resource (bean) API
  • ontology repositories and inference should be the
    same: KAON, Sesame, Orenge?
  • NOTES
  • everything is a replaceable bean
  • all communication via fixed APIs
  • low coupling, high modularity, high
    extensibility
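
The notes above (replaceable beans, fixed APIs, low coupling) can be illustrated with a small sketch. It assumes a hypothetical `Lookup` interface standing in for a gazetteer-like language resource; none of these names come from GATE itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ReplaceableBeanSketch {

    /** Hypothetical fixed API for a gazetteer-like resource (not GATE's). */
    public interface Lookup {
        boolean contains(String token);
    }

    /** One implementation: exact string match. */
    public static class ExactLookup implements Lookup {
        private final Set<String> entries;
        public ExactLookup(Set<String> entries) { this.entries = new HashSet<>(entries); }
        @Override public boolean contains(String token) { return entries.contains(token); }
    }

    /** A drop-in replacement: same API, case-insensitive match. */
    public static class CaseInsensitiveLookup implements Lookup {
        private final Set<String> entries = new HashSet<>();
        public CaseInsensitiveLookup(Set<String> entries) {
            for (String e : entries) this.entries.add(e.toLowerCase(Locale.ROOT));
        }
        @Override public boolean contains(String token) {
            return entries.contains(token.toLowerCase(Locale.ROOT));
        }
    }

    /** Client code depends only on the Lookup API, never on an implementation. */
    public static List<String> matches(List<String> tokens, Lookup lookup) {
        List<String> found = new ArrayList<>();
        for (String t : tokens) {
            if (lookup.contains(t)) found.add(t);
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> names = Set.of("Sheffield", "GATE");
        List<String> tokens = List.of("sheffield", "GATE", "beetle");
        System.out.println(matches(tokens, new ExactLookup(names)));           // prints [GATE]
        System.out.println(matches(tokens, new CaseInsensitiveLookup(names))); // prints [sheffield, GATE]
    }
}
```

Swapping one implementation for the other changes behaviour without touching `matches`, which is the low-coupling property the notes describe.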

9
Happy Birthday Valy!