Title: ESSLLI 2006 Summer School
1 ESSLLI 2006 Summer School
- Málaga, Spain
- 31 July - 11 August
2 General Comments
- PLUS
- Courses on time
- Proceedings of all courses
- Workshops
- Student sessions
- Internet connection
- MINUS
- Not well organized
- Site not updated on time
- Lunch tickets
3 Courses
- Counting Words: An Introduction to Lexical Statistics
- Formal Ontology for Communicating Agents (Workshop)
- Word Sense Disambiguation
- Introduction to Corpus Resources, Annotation and Access
- An Empirical View on Semantic Roles Within and Across Languages
- Approximate Reasoning for the Semantic Web
4 Counting Words - Marco Baroni and Stefan Evert
- Contents
- Introduction
- Distributions
- Zipf's Law
- The zipfR package
- Practical Consequences and Conclusion
5 Introduction
- The frequency of words plays an important role in corpus linguistics.
- The study of word frequency distributions is called lexical statistics.
- Word frequency distributions seem to be of more interest to theoretical physicists than to theoretical linguists.
- This course introduces some of the empirical phenomena pertaining to word frequency distributions and the classic models that have been proposed to capture them.
6 Distributions: Basic Terminology
- Types: distinct words
- Tokens: instances of all distinct words
- Corpus size (N): number of tokens in the corpus
- Vocabulary size (V): number of types
- Frequency list: a list that reports the number of tokens of each type in the corpus
- Rank/frequency profile: the frequency list with the types replaced by their frequency ranks
- Frequency spectrum: a list reporting how many types in a frequency list have a certain frequency
7 Distributions: Example
- Sample: a b b c a a b a d
- N = 9, V = 4
- Frequency list, rank/frequency profile, and frequency spectrum (a computational sketch follows below):

    Frequency list      Rank/freq. profile      Freq. spectrum
    type   f            rank   f                f     V(f)
    a      4            1      4                1     2
    b      3            2      3                3     1
    c      1            3      1                4     1
    d      1            4      1
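A minimal Python sketch (standard library only) that computes the frequency list, rank/frequency profile and frequency spectrum for the toy sample above:

    from collections import Counter

    tokens = ["a", "b", "b", "c", "a", "a", "b", "a", "d"]

    # Frequency list: number of tokens of each type
    freq_list = Counter(tokens)                      # {'a': 4, 'b': 3, 'c': 1, 'd': 1}

    # Corpus size N and vocabulary size V
    N = sum(freq_list.values())                      # 9
    V = len(freq_list)                               # 4

    # Rank/frequency profile: frequencies in decreasing order, ranks 1..V replace the types
    rank_freq = {rank: f for rank, (_, f) in
                 enumerate(freq_list.most_common(), start=1)}

    # Frequency spectrum: how many types occur with each frequency f
    spectrum = Counter(freq_list.values())           # {4: 1, 3: 1, 1: 2}

    print(N, V, rank_freq, dict(spectrum))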
8 Distributions: Typical Frequency Patterns
- Top ranks are occupied by function words (the, of, and, ...)
- Frequency decreases quite rapidly
- The lowest-frequency elements are content words
9 Zipf's Law
- Frequency is a non-linear, decreasing function of rank.
- Zipf's model: f(w) = C / r(w)^a
- The model predicts a very rapid decrease in frequency among the most frequent words, which becomes slower as the rank grows.
- Mathematical property
- log f(w) = log C - a * log r(w) (a linear function of log rank; see the fitting sketch below)
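A small illustration of the log-linear property, assuming numpy is available; the frequencies below are invented. The slope of a least-squares fit in log-log space estimates the exponent a:

    import numpy as np

    # Hypothetical frequencies, sorted in decreasing order (e.g. taken from a frequency list)
    freqs = np.array([3120, 1550, 1020, 770, 610, 510, 440, 380, 340, 310], dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)

    # Zipf's model f(r) = C / r^a becomes log f = log C - a * log r,
    # so a straight-line fit in log-log space recovers the exponent a.
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
    a = -slope
    C = np.exp(intercept)
    print(f"estimated exponent a = {a:.2f}, constant C = {C:.0f}")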
10 Zipf's Law: Applications and Explanations
- Zipfian distributions are encountered in various phenomena
- City populations
- Incomes in economics
- Frequency of citations of scientific papers
- Visits to web sites
- Least effort principle
11 The zipfR Package
- Statistical package for modeling lexical distributions.
- URL: http://www.purl.org/stefan.evert/zipfR
- Dependency: the R environment
- URL: http://www.r-project.org
- Binaries available for Windows and MacOS.
- Source available for Linux.
- Open-source, GNU-licensed project.
12 Practical Consequences and Conclusion
- The Zipfian nature of word frequency distributions causes data sparseness problems.
- Although V grows with corpus size, we cannot use it as a measure of lexical richness when comparing corpora.
- Interested readers should proceed to Baayen (2001) for a thorough introduction to word frequency distributions with an emphasis on statistical modeling.
13 References
- Abney, Steven (1996), Statistical methods and linguistics. In Klavans, J. / Resnik, P. (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Cambridge, MA: MIT Press, 1-23.
- Baayen, Harald (2001), Word Frequency Distributions. Dordrecht: Kluwer.
- Baldi, Pierre / Frasconi, Paolo / Smyth, Padhraic (2003), Modeling the Internet and the Web. Chichester: Wiley.
- Biber, Douglas / Conrad, Susan / Reppen, Randi (1998), Corpus Linguistics. Cambridge: Cambridge University Press.
- Creutz, Mathias (2003), Unsupervised segmentation of words using prior distributions of morph length and frequency. In Proceedings of ACL 03, 280-287.
- Dalgaard, Peter (2002), Introductory Statistics with R. New York: Springer.
- Evert, Stefan (2004), The Statistics of Word Co-occurrences: Word Pairs and Collocations. PhD thesis, University of Stuttgart/IMS.
14 References
- Evert, Stefan / Baroni, Marco (2006), Testing the extrapolation quality of word frequency models. In Proceedings of Corpus Linguistics 2005, available from http://www.corpus.bham.ac.uk/PCLC
- Li, Wentian (2002), Zipf's Law everywhere. In Glottometrics 5, 14-21.
- Manning, Christopher / Schütze, Hinrich (1999), Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
- McEnery, Tony / Wilson, Andrew (2001), Corpus Linguistics, 2nd edition. Edinburgh: Edinburgh University Press.
- Oakes, Michael (1998), Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
- Sampson, Geoffrey (2002), Review of Harald Baayen, Word Frequency Distributions. In Computational Linguistics 28, 565-569.
- Zipf, George Kingsley (1949), Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
- Zipf, George Kingsley (1965), The Psycho-Biology of Language. Cambridge, MA: MIT Press.
15 Formal Ontology for Communicating Agents (FOCA) Workshop
- Contents
- Introduction
- Communicative acts
- The missing ontological link
- Semantic coordination
- A Communication Acts Ontology for Software Agents Interoperability
- OWL DL as a FIPA ACL Content Language
16 Introduction
- Purpose of the workshop
- To gather contributions that
- Take seriously into account the ontological aspects of communication and interaction
- Use formal ontologies to achieve better semantic coordination between interacting and communicating agents
17 Introduction: Communicative Acts
- According to Austin, three kinds of acts can be performed simultaneously through a single utterance:
- Locutionary act: producing noises that conform to a system
- Illocutionary act: what is performed in saying something
- Perlocutionary act: what is performed by saying something
- An important issue is the distinction between the last two acts.
18 Introduction: The Missing Ontological Link
- Ontological ingredients
- Events, states, actions, speech acts, relations, plans, propositions, arguments, facts, commitments, ...
- Top-level ontologies focus on the sub-domain of concrete entities, like time, space, ...
- There is a need to integrate the large amount of philosophical work on other domains, like that of abstract entities.
19 Introduction: Semantic Coordination
- An important aspect of interaction and communication involves the management of ontologies.
- Scenarios identified w.r.t. semantic coordination:
- With a shared pre-existing ontology
- With different ontologies, but linked to a pre-existing common upper-level ontology
- With different ontologies, but mapped directly onto each other
- When agents are involved, they can
- Keep static ontologies but manage a shared dynamic one
- Create new static ontologies through a negotiation phase
- Modify their ontology during the interaction while maintaining some kind of negotiated meaning
20 A Communication Acts Ontology for Software Agents Interoperability
- Each ACL defines its own classes of communication acts.
- The use of an agreed ontology can open the possibility of real agent interoperation, based on a wide agreement on some classes of communication acts that serve as a bridge among the different ACL islands.
- Main design criterion: follow speech act theory and also embed an approach for expressing the semantics of the communication acts.
- Use the OWL DL language.
21 A Communication Acts Ontology for Software Agents Interoperability
- Upper layer
- CommunicationAct ⊑ ∃hasSender.Actor ⊓ ≤1 hasSender ⊓ ∃hasReceiver.Actor ⊓ ∃hasContent.Content
- Assertive ⊑ CommunicationAct ⊓ ∃hasContent.Proposition ⊓ ∃hasCommit.AssertiveCommitment
- Directive ⊑ CommunicationAct ⊓ ∃hasContent.Action ⊓ ∃hasCommit.DirectiveCommitment
- Commissive ⊑ CommunicationAct ⊓ ∃hasContent.Action ⊓ ∃hasCondition.Proposition ⊓ ∃hasCommit.CommissiveCommitment
- Expressive ⊑ CommunicationAct ⊓ ∃hasContent.Proposition ⊓ ∃hasState.PsyState ⊓ ∃hasCommit.ExpressiveCommitment
- Declarative ⊑ CommunicationAct ⊓ ∃hasContent.Proposition
22 A Communication Acts Ontology for Software Agents Interoperability
- The Standards Layer extends the Upper Layer with terms representing classes of communication acts of general-purpose ACLs, like FIPA-ACL.
- The Applications Layer is the most specific: it defines communication act classes for a specific application.
- Concluding: classes in the upper layer are considered the framework agreement for general communication; classes in the standards layer reflect the classes of communication acts that different standard ACLs define; classes in the applications layer concern the particular communication acts used by each agent system committing to the ontology.
23 References
- J. L. Austin. How to Do Things With Words. Oxford University Press, Oxford, 1962.
- J. R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, New York, 1969.
- M. P. Singh. Agent Communication Languages: Rethinking the Principles. IEEE Computer, vol. 31, num. 12, pp. 40-47, 1998.
- M. Wooldridge. Semantic Issues in the Verification of Agent Communication Languages. Journal of Autonomous Agents and Multi-Agent Systems, vol. 3, num. 1, pp. 9-31, 2000.
- Y. Labrou, T. Finin, Y. Peng. Agent Communication Languages: the Current Landscape. IEEE Intelligent Systems, vol. 14, num. 2, pp. 45-52, 1999.
- M. P. Singh. A Social Semantics for Agent Communication Languages. Issues in Agent Communication, pp. 31-45. Springer-Verlag, 2000.
- FIPA Communicative Act Library Specification. Foundation for Intelligent Physical Agents, 2005. http://www.fipa.org/specs/fipa00037/SC00037J.html
24 References
- N. Asher and A. Lascarides. Logics of Conversation. Cambridge University Press, 2003.
- S. Levinson. Pragmatics. Cambridge University Press, 1983.
- J. R. Searle and D. Vanderveken. Foundations of Illocutionary Logic. Cambridge University Press, 1975.
- J. R. Searle. The Construction of Social Reality. Free Press, New York, 1995.
- R. Stalnaker. Assertion. Syntax and Semantics, 9:315-332, 1978.
- J. Ginzburg. Dynamics and the Semantics of Dialogue. CSLI, Stanford, 1996.
- H. H. Clark. Using Language. Cambridge University Press, 1996.
- S. Carberry. Plan Recognition in Natural Language Dialogue. MIT Press, 1990.
25 OWL DL as a FIPA ACL Content Language
- The FIPA-SL content language is in general undecidable.
- Use OWL DL to enable semantic validation of the content of the ACL message and to separate speech act semantics from content semantics.
- Their ontology formalizes some of the FIPA specifications (message structure, ontology service, content language, communicative act library).
26 OWL DL as a FIPA ACL Content Language
- Advantages
- Application ontologies are domain independent: they can be applied to a MAS in different domains.
- Various application ontologies in OWL DL are available, which shows great potential for reusing already formulated ontologies.
- W3C suggests the use of OWL within agents.
27 References
- Eric Miller et al. Web Ontology Language (OWL), 2004.
- RACER Systems GmbH. The Features of RacerPro Version 1.9, 2005.
- Foundation for Intelligent Physical Agents. FIPA ACL Message Structure Specification, 2002.
- Foundation for Intelligent Physical Agents. FIPA Ontology Service Specification, 2001.
- Foundation for Intelligent Physical Agents. FIPA SL Content Language Specification, 2002.
- Foundation for Intelligent Physical Agents. FIPA Communicative Act Library Specification, 2002.
- Web Ontology Working Group. OWL Web Ontology Language Use Cases and Requirements, 2004.
- Giovanni Caire. JADE Introduction, AAMAS 2005, 2005.
28 Introduction to Corpus Resources, Annotation and Access - Sabine Schulte im Walde and Heike Zinsmeister
- Contents
- Basic definitions
- Corpora
- Annotation
- Tokenization and Morpho-Syntactic Annotation
29 Introduction to Corpus Resources, Annotation and Access
- Basic Definitions
- Linguistics: characterization and explanation of linguistic observations
- Corpus: any collection of more than one text
- Annotation: the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language
30 Corpora
- Corpora give only a partial description of a language
- They are incomplete
- (e.g. the Brown corpus does not include vocabulary related to the WWW and e-mail)
- They are biased
- They include ungrammatical sentences
- (e.g. typos, copy-and-paste errors, conversion errors)
- We have to sample a corpus according to design criteria such that it is balanced and representative for a specific purpose
31 Annotation
- Levels
- POS tags
- Lemmata
- Senses
- Semantic roles
- Named Entities
- Topic
- Coreference
- Principles
- The raw corpus should be recoverable
- Annotation should be extricable from the corpus
- Easy access to documentation
- Annotation scheme
- How, where, by whom the annotation was applied
32 Tokenization and Morpho-Syntactic Annotation
- Tokenization divides the raw input character sequence of a text into sentences, and the sentences into tokens
- Problems:
- Language-dependent task
- Language dependent task
- Sentence boundaries
- Numbers
- Abbreviations
- Capitalization
- Hyphenation
- Multiword expressions
- Clitics
- So we need to apply disambiguation methods (see the sketch below)
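A naive tokenization sketch in Python (regex-based; the abbreviation handling is deliberately simplistic and illustrates why disambiguation methods are needed):

    import re

    # Naive sentence splitter: breaks on ., ! or ? followed by whitespace.
    # It wrongly splits after abbreviations such as "Dr." -- exactly the kind
    # of ambiguity real tokenizers must resolve.
    def split_sentences(text):
        return re.split(r'(?<=[.!?])\s+', text.strip())

    # Naive word tokenizer: keeps decimal numbers and hyphenated words together,
    # separates other punctuation from words (note what happens to the clitic in "Wasn't").
    def tokenize(sentence):
        return re.findall(r"\d+(?:\.\d+)?|\w+(?:-\w+)*|[^\w\s]", sentence)

    text = "Dr. Smith paid 3.50 euros. Wasn't that cheap?"
    for s in split_sentences(text):
        print(tokenize(s))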
33 Tokenization and Morpho-Syntactic Annotation
- Part-of-speech tagging (POS tagging): the task of labeling each word in a sequence of words with its appropriate part of speech (a small tagging example follows below)
- Performs a limited syntactic disambiguation
- Context helps to disambiguate tags
- Tagset: a set of part-of-speech tags
- Classical 8 classes: noun, verb, article, participle, pronoun, preposition, adverb, conjunction
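A short tagging illustration, assuming the NLTK library and its pretrained English tagger data are installed (the output uses the Penn Treebank tagset rather than the classical 8 classes):

    import nltk

    # Requires the 'punkt' and 'averaged_perceptron_tagger' data packages,
    # e.g. installed via nltk.download(...).
    sentence = "The old man the boats."
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # "man" is a verb in this garden-path sentence; many taggers mis-tag it,
    # which shows both the need for and the limits of contextual disambiguation.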
34 Tokenization and Morpho-Syntactic Annotation
- Morphology is concerned with the inner structure of words and the formation of words from smaller units.
- Root: the base morpheme of a word
- Stemming: a process that strips off affixes and leaves the stem.
- Lemmatization: a process that gives the lemma of a word; it includes disambiguation at the level of lexemes, depending on the part of speech. (A stemming vs. lemmatization sketch follows below.)
- Coreference is the reference in one expression to the same referent as in another expression
- Anaphora is coreference of an expression with its antecedent
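A brief sketch of the difference between stemming and lemmatization, assuming NLTK and its WordNet data are available:

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # The lemmatizer requires the 'wordnet' data package.
    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "meeting", "better"]:
        print(word,
              "| stem:", stemmer.stem(word),
              "| lemma as verb:", lemmatizer.lemmatize(word, pos="v"),
              "| lemma as noun:", lemmatizer.lemmatize(word, pos="n"))
    # The lemma depends on the part of speech (e.g. "meeting" as a form of the
    # verb "meet" vs. the noun "meeting"), which is why lemmatization involves
    # disambiguation at the level of lexemes.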
35 References
- Tony McEnery (2003). Corpus Linguistics. In The Oxford Handbook of Computational Linguistics, pp. 448-463. Oxford University Press.
- Tony McEnery and Andrew Wilson (2001). Corpus Linguistics. 2nd edition. Edinburgh University Press, chapter 1.
- Sue Atkins, Jeremy Clear and Nicholas Ostler (1992). Corpus Design Criteria. In Literary and Linguistic Computing, 7(1):1-16.
- Nancy Ide (2004). Preparation and Analysis of Linguistic Corpora. In Schreibman, S., Siemens, R., Unsworth, J., eds., A Companion to Digital Humanities. Blackwell.
- Geoffrey Leech (1997). Introducing Corpus Annotation. In Richard Garside, Geoffrey Leech and Tony McEnery, eds., Corpus Annotation. Longman, pp. 1-18.
- Geoffrey Leech (2005). Adding Linguistic Annotation. In Developing Linguistic Corpora: A Guide to Good Practice, ed. M. Wynne. Oxford: Oxbow Books, pp. 17-29. Available online from http://ahds.ac.uk/linguistic-corpora/
- Gregory Grefenstette and Pasi Tapanainen (1994). What is a word, what is a sentence? Problems of tokenization. In Proceedings of the 3rd Conference on Computational Lexicography and Text Research.
36 References
- Andrei Mikheev (2003). Text segmentation. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics, pp. 376-394. Oxford University Press.
- Helmut Schmid (2007?). Tokenizing. In Anke Lüdeling and Merja Kytö, editors, Corpus Linguistics: An International Handbook. Mouton de Gruyter, Berlin.
- Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing, chapter 10. MIT Press.
- Atro Voutilainen (2003). Part-of-speech tagging. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics, pp. 219-232. Oxford University Press.
- John Carroll, Guido Minnen, and Ted Briscoe (1999). Corpus annotation for parser evaluation. In Proceedings of LINC. Bergen.
- Ruslan Mitkov, Richard Evans, Constantin Orasan, Catalina Barbu, Lisa Jones, and Violeta Sotirova (2000). Coreference and anaphora: developing annotating tools, annotated resources and annotation strategies. In Proceedings of the Discourse, Anaphora and Reference Resolution Conference, pp. 49-58.
- Eva Hajičová, Jarmila Panevová, and Petr Sgall (2000). Coreference in annotating a large corpus. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, pp. 497-500.
37 Approximate Reasoning for the Semantic Web - Frank van Harmelen, Pascal Hitzler and Holger Wache
- Contents
- Semantic Web: the Vision
- Ontologies
- XML
- W3C Stack
- Beyond RDF: OWL
- Why Approximate Reasoning
- Reduction of use-cases to reasoning methods
38 Semantic Web: the Vision
- Semantic Web: a Web of Data
- Set of open, stable W3C standards
- Intelligent things we can't do today:
- Search engines: concepts, not keywords
- Personalization
- Web Services: need semantic characterizations to find them and to combine them
- Requirement: Machine-Accessible Meaning
39 Ontologies
- Ontologies ARE shared models of the world constructed to facilitate communication
- Ontologies ARE NOT definitive descriptions of what exists in the world (this is philosophy)
- What's inside an ontology?
- Classes
- Instances
- Values
- Inheritance
- Restrictions
- Relations
- Properties
- We need a machine representation
40 XML
- What was XML again?

    <country name="Greece">
      <capital name="Athens">
        <areacode>210</areacode>
      </capital>
    </country>

- Why not use XML?
- No agreement on
- Structure
- Is country an object? A class? An attribute? A relation?
- What does nesting mean?
- Vocabulary
- Is country the same as nation?
41 W3C Stack
- XML
- Surface syntax, no semantics
- XML Schema
- Describes structure of XML documents
- RDF
- Data model for relations between things
- RDF Schema
- RDF Vocabulary Definition Language
- OWL
- A more expressive Vocabulary Definition Language
42 Beyond RDF: OWL
- OWL extends RDF Schema to a full-fledged ontology representation language:
- Domain / range
- Cardinality
- Quantifiers
- Enumeration
- Equality
- Boolean algebra (union, complement)
- OWL is essentially the Description Logic SHOIN(D) with an RDF/XML syntax.
- 3 flavors: OWL Lite, OWL DL, OWL Full
43 Why Approximate Reasoning
- Current inference is exact: yes or no
- This was OK, because until now ontologies were clean:
- Hand-crafted, well designed, carefully populated, well maintained
- BUT ontologies will be sloppy
- Made by machines
- (e.g. "almost subClassOf")
- Mapping ontologies is almost always messy
- (e.g. "almost equal")
44 Reduction of Use-Cases to Reasoning Methods
- Realization (member of)
- Subsumption (subclass relation)
- Mapping (similar to)
- Retrieval (has member)
- Classification (locate in hierarchy)
- GOAL
- Find approximation methods for these reasoning methods
- Many reasoning methods can be reduced to satisfiability (e.g. C ⊑ D holds iff C ⊓ ¬D is unsatisfiable)
- GOAL: find approximation methods for satisfiability
45 References
- [Cadoli and Schaerf, 1995] Marco Cadoli and Marco Schaerf. Approximate inference in default reasoning and circumscription. Fundamenta Informaticae, 23:123-143, 1995.
- [Cadoli et al., 1994] Marco Cadoli, Francesco M. Donini, and Marco Schaerf. Is intractability of non-monotonic reasoning a real drawback? In National Conference on Artificial Intelligence, pages 946-951, 1994.
- [Dalal, 1996a] M. Dalal. Semantics of an anytime family of reasoners. In W. Wahlster, editor, Proceedings of ECAI-96, pages 360-364, Budapest, Hungary, August 1996. John Wiley & Sons Ltd.
- [Motik, 2006] B. Motik. Reasoning in Description Logics using Resolution and Deductive Databases. PhD thesis, Universität Karlsruhe, 2006.
- [Schaerf and Cadoli, 1995] Marco Schaerf and Marco Cadoli. Tractable reasoning via approximation. Artificial Intelligence, 74:249-310, 1995.
- [Zilberstein, 1993] S. Zilberstein. Operational rationality through compilation of anytime algorithms. PhD thesis, Computer Science Division, University of California at Berkeley, 1993.
- [Zilberstein, 1996] S. Zilberstein. Using anytime algorithms in intelligent systems. Artificial Intelligence Magazine, Fall:73-83, 1996.
46 Word Sense Disambiguation - Rada Mihalcea
- Outline
- Some Definitions
- Basic Approaches: Intro
- Basic Approaches: In More Detail
- Some Examples
47 Word Sense Disambiguation
- Word Sense Disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities (a sense inventory).
- The sense inventory usually comes from a dictionary.
- Word Sense Discrimination is the problem of dividing the usages of a word into different meanings, without regard to existing predefined possibilities.
48 Word Sense Disambiguation
- Knowledge-Based Disambiguation
- - Machine Readable Dictionaries (e.g. WordNet)
- - Raw Corpora (not manually annotated)
- Supervised Disambiguation
- - Manually Annotated Corpora
- - Input of the learning system is
- 1. a training set of the feature-encoded inputs
- 2. their appropriate sense label
- Unsupervised Disambiguation
- - Unlabelled corpora
- - Input of the learning system is
- 1. a training set of feature-encoded inputs
- 2. NOT their appropriate sense label
49 Word Sense Disambiguation
- Knowledge-Based Disambiguation
- Examples
- - Algorithms based on machine-readable dictionaries (e.g. the Lesk algorithm)
- - Semantic similarity metrics
- - rely on semantic networks, like ontologies
- e.g. Sim(a,b) = -log(Path(a,b) / (2D))
- - may utilize an information content metric
- e.g. Sim(a,b) = IC(LCS(a,b)), where IC(a) = -log(P(a))
- - Heuristic-based methods
- e.g. identify the most often used meaning and use it by default.
50 Word Sense Disambiguation
- Knowledge-Based Disambiguation
- Examples
- Disambiguate "plant" in "plant with flower" (a toy sketch follows below)
- 1. plant, works, industrial plant
- 2. plant, flora, plant life
- Sim(plant#1, flower) = 1.0
- Sim(plant#2, flower) = 1.5, so sense 2 wins
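A toy knowledge-based sketch in Python in the spirit of the Lesk algorithm: the two senses come from a tiny invented sense inventory, and the sense whose gloss shares the most words with the context wins. A real system would use WordNet glosses or the similarity metrics above.

    # Hypothetical mini sense inventory for "plant" (glosses invented for illustration)
    senses = {
        "plant#1 (works, industrial plant)": "buildings for carrying on industrial labor",
        "plant#2 (flora, plant life)": "a living organism such as a tree flower or herb",
    }

    def lesk_style(context, senses):
        """Pick the sense whose gloss shares the most words with the context."""
        context_words = set(context.lower().split())
        overlap = lambda gloss: len(context_words & set(gloss.lower().split()))
        return max(senses, key=lambda s: overlap(senses[s]))

    print(lesk_style("plant with flower", senses))   # -> plant#2 (flora, plant life)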
51 Word Sense Disambiguation
- Supervised Disambiguation
- - Class of methods that induce a classifier from manually sense-tagged text using machine learning techniques (SVM, Naïve Bayes, Neural Networks, ...)
- - Resources
- 1. Sense-tagged text
- 2. Dictionary (source of the sense inventory)
- 3. Syntactic analysis (POS tagger, chunker)
- Example: features of training instances for the target word "bank", with senses bank/SHORE and bank/FINANCE (a small classifier sketch follows below)
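A minimal supervised sketch, assuming scikit-learn is installed: bag-of-words context features feed a Naïve Bayes classifier for the two hypothetical senses of "bank" (the training sentences are invented for illustration).

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny invented training set: context sentences and their sense labels
    contexts = [
        "he sat on the bank of the river watching the water",
        "the muddy bank of the stream was slippery",
        "she opened an account at the bank downtown",
        "the bank approved the loan and the interest rate",
    ]
    labels = ["bank/SHORE", "bank/SHORE", "bank/FINANCE", "bank/FINANCE"]

    # Bag-of-words features + Naive Bayes classifier
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(contexts, labels)

    print(clf.predict(["the fisherman walked along the river bank"]))
    print(clf.predict(["the bank charged a fee on my account"]))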
52 Word Sense Disambiguation
- Unsupervised Disambiguation
- - Identifies patterns and divides the data into clusters, where each member of a cluster has more in common with the members of its own cluster than with those of any other
- - Words with similar meanings tend to occur in similar contexts, so clustering is based on the context
- - Only raw text is available; no external resources or annotations
- - Usual approaches: agglomerative algorithms, LSA
53 Word Sense Disambiguation
- Unsupervised Disambiguation
- Examples
- - Agglomerative Clustering (McQuitty's Similarity Analysis)
- [Slide figures: first-order representation of the target word "bank" in four sentences; similarity matrix and resulting clustering. A small clustering sketch follows below.]
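A minimal unsupervised sketch, assuming scikit-learn is installed: invented contexts of "bank" are turned into bag-of-words vectors and grouped by agglomerative clustering (plain average linkage rather than McQuitty's variant).

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.cluster import AgglomerativeClustering

    # Four invented contexts of the ambiguous word "bank"
    contexts = [
        "fished along the bank of the river",
        "the grassy bank by the stream",
        "deposited money at the bank branch",
        "the bank raised its interest rates",
    ]

    # First-order representation: each context becomes a word-count vector
    X = CountVectorizer().fit_transform(contexts).toarray()

    # Group the contexts into two clusters (ideally SHORE vs. FINANCE usages)
    clustering = AgglomerativeClustering(n_clusters=2, linkage="average")
    print(clustering.fit_predict(X))   # cluster id per context, e.g. [0 0 1 1]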
54 An Empirical View on Semantic Roles Within and Across Languages - Katrin Erk and Sebastian Pado
- Outline
- - The problem
- - Predicate-argument structure
- - A solution
- Proposition Bank (PropBank)
- (http://www.cs.rochester.edu/gildea/PropBank/Sort)
55 An Empirical View on Semantic Roles Within and Across Languages
- The problem
- - Despite the breakthroughs in NLP based on statistical methods and linguistic representations, accurate information extraction was out of reach
- - A critical element was missing: accurate predicate-argument structure
- - The most important factor for improved quality in language translation is accurate predicate-argument structure
- - A complete grammatical parse and vocabulary coverage are not enough
- - Knowledge of the proper constituents of verb arguments is not enough; their proper position is very important
56 An Empirical View on Semantic Roles Within and Across Languages
- Predicate-argument structure
- - Example
- Sentence: John broke the window
- Associated predicate-argument structure: break(John, window)
- - The recognition of this structure is not a trivial problem
- - In natural language there are several lexical items referring to the same type of event, and several syntactic realizations of the same predicate-argument relations
- - Example
- A will meet/visit/consult/debate (with) B
- A and B met/visited/consulted/debated
- There was a meeting/visit/consultation/debate between A and B
- A had a meeting/visit/consultation/debate with B
57 An Empirical View on Semantic Roles Within and Across Languages
- A solution
- - Create a body of publicly available training data that explicitly annotates predicate-argument positions with labels
- - Highest priority was given to predicate-argument structure for verbs
- - The result was the Proposition Bank (PropBank)
58 An Empirical View on Semantic Roles Within and Across Languages
- Proposition Bank (PropBank)
- - 4000 predicates (verbs only)
- - Process
- 1. For a given predicate, a survey is made of its usages
- 2. The usages are divided into senses if they take a different number of arguments (on syntactic grounds, not semantic)
- 3. The expected arguments of each sense are numbered sequentially from Arg0 to Arg5
- - Example
- draw, sense "pull"
- "... the campaign is drawing fire from the anti-smoking advocates ..."
- Arg0: the campaign
- Rel: drawing
- Arg1: fire
- Arg2-from: anti-smoking advocates
59 An Empirical View on Semantic Roles Within and Across Languages
- Proposition Bank (PropBank)
- - Frame Files (developed by a linguist)
- 1. Contain the sense distinctions of predicates (previous slide)
- 2. Contain role sets. A role set of a verb lists the roles which seem to occur most frequently.
- - Example: the role set for the verb "buy" (a machine-readable sketch follows below)
- BUY
- Arg0: buyer
- Arg1: thing bought
- Arg2: seller, bought-from
- Arg3: price paid
- Arg4: benefactive, bought-for
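The same role set written as a small Python data structure, to show how a frame file entry and an annotated instance might be represented in code; the field names are illustrative, not PropBank's actual file format.

    # Illustrative in-memory representation of a PropBank-style role set for "buy"
    buy_roleset = {
        "lemma": "buy",
        "roles": {
            "Arg0": "buyer",
            "Arg1": "thing bought",
            "Arg2": "seller, bought-from",
            "Arg3": "price paid",
            "Arg4": "benefactive, bought-for",
        },
    }

    # An annotated instance in the style of the "draw" example on the previous slide
    annotation = {
        "predicate": "drawing",
        "arguments": {"Arg0": "the campaign",
                      "Arg1": "fire",
                      "Arg2-from": "anti-smoking advocates"},
    }

    for label, filler in annotation["arguments"].items():
        print(label + ": " + filler)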
60 ESSLLI 2006 Summer School
- 18th European Summer School in Logic, Language and Information
- Thanks!