Malcolm Clark - PowerPoint PPT Presentation

About This Presentation
Title:

Malcolm Clark

Description:

Genre Analysis of Structured E-mails for Corpus Profiling ... Information Retrieval (IR), Genre and Perception ... How can genre categorization be performed by ... – PowerPoint PPT presentation

Slides: 31

Transcript and Presenter's Notes

Title: Malcolm Clark


1
Genre Analysis of Structured E-mails for Corpus Profiling
Workshop on Corpus Profiling for NLP/IR
Malcolm Clark
Supervisors: Professor Patrik O'Brian Holt, Dr Ian Ruthven
2
Presentation Outline
  • Introduction
  • The Problems
  • Information Retrieval (IR), Genre and Perception
  • Experiment: Research Questions, Setup, How Do
    People Use Textual Features?
  • Conclusions
  • Contributions and Implications
  • Future Work

3
Introduction
  • Focuses on IR and cognitive psychology.
  • Corpora contain exemplar documents of types called
    genres, which are useful for profiling corpora.
  • E-mail exchanges have socially constructed
    communicative behaviours, which exist to improve
    the efficiency of a community of practice and are
    useful for profiling corpora.
  • Investigate these genres and how people use
    e-mails, in terms of genre and perception, for
    filtering.

4
The Problems
  • Identifying genres for profiling corpora
  • Filtering the correct types of documents to the
    user by genre
  • E-mail filtering
  • Understanding user tasks
  • Can a text be understood rapidly without parsing
    the whole document?

5
The Project Examines
  • The value of structure.
  • How form or layout is perceived in structured
    texts.
  • Whether constructivist (recognition) or
    ecological (action afforded) approaches to
    perception apply, or whether both are used.
  • Whether and how the objects of a community of
    practice (COP) can be comprehended and exploited.
  • How readers react to genre features in document
    collections.

6
Information Retrieval
Division of IR into computer science laboratory
experiments vs. user-orientated social
studies (Järvelin, 2006)
7
Genre Background
[Diagram: TYPICAL GENRE, combining Form (readily
observable features) and Purpose (communicative
purpose). Elements shown: discourse structure,
communication medium, arguments, structural
features, language or symbol system (formality,
specialised vocabulary), themes, topics.
After Orlikowski and Yates (1994).]
8
Corpus Genre Example from E-mail: Call for Papers
[Annotated e-mail, labelled regions: header (title
etc.), abstract, titles and topics (list), dates and
submission]
9
Genre: What Are Communities of Practice (COP)?
  • What?
      • Social institutions/sites.
  • When?
      • Human agents draw on genre rules to engage in
        organizational communication.
  • How?
      • Produced, reproduced, or modified.
  • But how are they perceived and used?

10
Human Perceptual Systems
  • Two prominent fields in perception research
[Diagram: Perceive → Final goal? → Recognition or
Action]
11
Experiment Pilot - Research Questions
  • How do human beings use genre features, and what
    do they perceive?
  • How can genre categorization be performed using
    current skimming methods?
  • How do genres evolve in communities of practice
    (e.g. e-mail)?
  • How are document genres and structural attributes
    used?

12
Experiment Pilot - How do People Use Texts?
  • By eye tracking, i.e. recording the position and
    movement of the eye.
  • Collect and analyse the empirical data produced
    by experiments in an e-mail community of practice.
  • Locate the strategies and features used for
    profiling corpora, e.g. centred blocks of text and
    invariant cues.
  • How do humans view genre?

13
Experiment Pilot
14
Pilot - Setup
  • Method: 4 x 16 image blocks (4 genres in each of
    two blocks).
  • Measurements
      • Number of genres identified correctly (purpose)
      • Structured vs. non-structured form (form)
      • Genre identification response time (form)
      • Strategies and distinguishing features
        (purpose and form)
  • Variables
      • Purpose/type of genre
      • Form in 4 representations

15
CFP - Content AND Structure
16
CFP Structure and No Content
17
CFP Content No Structure
18
CFP No Content AND No Structure
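The four CFP slides show one e-mail in four representations. The slides do not say how the stimuli were produced; the sketch below is only an assumption about how such versions could be generated automatically: "no content" replaces every letter and digit with a placeholder while keeping the layout, and "no structure" collapses line breaks and indentation into a single run of text. The helper names and the `x` mask are hypothetical.

```python
import re
from typing import Dict


def hide_content(text: str, mask: str = "x") -> str:
    """Replace every letter and digit with `mask`, keeping spaces, punctuation and line layout."""
    # [^\W_] matches Unicode letters and digits, but not underscores or whitespace.
    return re.sub(r"[^\W_]", mask, text)


def remove_structure(text: str) -> str:
    """Collapse line breaks, indentation and blank lines into single spaces."""
    return " ".join(text.split())


def make_representations(text: str) -> Dict[str, str]:
    """Return the four hypothetical stimulus conditions for one e-mail."""
    return {
        "content_and_structure": text,
        "structure_no_content": hide_content(text),
        "content_no_structure": remove_structure(text),
        "no_content_no_structure": remove_structure(hide_content(text)),
    }


if __name__ == "__main__":
    # Hypothetical CFP-like sample, not one of the study's stimuli.
    sample = "CALL FOR PAPERS\n\n    Topics:\n    - genre\n    - IR\nDeadline: 1 May"
    for name, version in make_representations(sample).items():
        print(f"--- {name}\n{version}")
```

Masking characters rather than deleting them keeps line lengths, alignment and block shapes intact, which is the kind of layout cue participants reported using.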
19
Setup
  • Task and procedure
      • Participants were shown 64 images.
      • Each image was identified vocally.
      • An eye tracker recorded the features and
        strategies used.
  • Data recorded
      • X/Y locations of saccades and fixations
      • Features and strategies
      • Desktop video recording (Wink)
      • Timed and vocal responses
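The setup records X/Y locations of saccades and fixations but does not say how raw gaze samples are split into the two. One common approach is a velocity-threshold (I-VT) classifier: consecutive samples moving slower than a threshold are grouped into a fixation, and faster movement is treated as a saccade. The sketch below assumes samples arrive as `(t, x, y)` tuples (seconds, pixels) and uses a hypothetical threshold of 1000 px/s; practical systems usually convert to degrees of visual angle first.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time in s, x in px, y in px)


@dataclass
class Fixation:
    start: float  # time of first sample (s)
    end: float    # time of last sample (s)
    x: float      # mean x position (px)
    y: float      # mean y position (px)


def _summarise(group: List[Sample]) -> Fixation:
    xs = [s[1] for s in group]
    ys = [s[2] for s in group]
    return Fixation(group[0][0], group[-1][0], sum(xs) / len(xs), sum(ys) / len(ys))


def ivt_fixations(samples: List[Sample], threshold: float = 1000.0) -> List[Fixation]:
    """Group gaze samples into fixations using a velocity threshold (I-VT).

    A step between two samples is fixational when its velocity is below
    `threshold` (px/s); faster steps are saccadic and end the current fixation.
    """
    fixations: List[Fixation] = []
    current: List[Sample] = []  # samples of the fixation being built
    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        velocity = hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt if dt > 0 else float("inf")
        if velocity < threshold:
            if not current:
                current.append(prev)
            current.append(cur)
        elif current:
            fixations.append(_summarise(current))
            current = []
    if current:
        fixations.append(_summarise(current))
    return fixations


if __name__ == "__main__":
    # Hypothetical 100 Hz samples: a fixation, a jump, then a second fixation.
    gaze = [(0.00, 100, 100), (0.01, 101, 100), (0.02, 100, 101),
            (0.03, 400, 300), (0.04, 401, 300), (0.05, 400, 301)]
    print(f"{len(ivt_fixations(gaze))} fixations")  # 2 fixations
```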

20
Results after 5 Participants
  • Number of genres identified correctly (purpose)
      • 11.5 per block out of 16
      • Unstructured vs. structured: 41.6 vs. 72.9
      • Original (87.5), original with no content (77),
        content with no structure (68), neither (27)
  • Structured vs. non-structured form: average
    response time (s)
      • 2.22 vs. 2.72
  • How was it done? Clues to strategies:
      • Skimmed shape: left (Sem) / centred (CFP)
      • Alignment and blocks of text/numerics
      • With no structure, or with neither structure
        nor content: wide spirals of scanning
        behaviour, possibly looking for keywords?

21
Results: Distinguishing Features
Genre    Features
CFP      Dates, centred blocks
Cinema   Blocks of numerical content
ITS      Inconclusive (participants ignore them?)
Lib      List of book(s), info at bottom
Nl       Paragraph/summary of item, then URL
Ord      Left alignment, currency
Sem      Inconclusive
Spam     Keywords LOTTO/address, uppercase emboldened text
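The features in this table are surface cues that a machine could check without reading a whole e-mail. The sketch below turns some of them into hand-written heuristics; the regular expressions, the thresholds, the rule order and the "centred line" test (72-character width) are all assumptions for illustration, not rules derived from the eye-tracking data. Lib is not modelled, and the inconclusive genres (ITS, Sem) are left out.

```python
import re

# Hypothetical cue detectors based on the distinguishing-features table.
DATE = re.compile(r"\b\d{1,2}\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\b",
                  re.IGNORECASE)
CURRENCY = re.compile(r"[£$€]\s?\d")
URL = re.compile(r"https?://\S+")


def is_centred(line: str, width: int = 72) -> bool:
    """Treat a line as centred if its left and right margins differ by at most 2 characters."""
    stripped = line.strip()
    if not stripped or len(stripped) >= width:
        return False
    left = len(line) - len(line.lstrip())
    right = width - left - len(stripped)
    return left > 0 and abs(left - right) <= 2


def guess_genre(text: str) -> str:
    """Return a genre label from simple layout and keyword cues, or 'Unknown'."""
    lines = text.splitlines()
    words = re.findall(r"[A-Za-z]+", text)
    upper_ratio = sum(w.isupper() and len(w) > 1 for w in words) / max(len(words), 1)
    if "LOTTO" in text or upper_ratio > 0.5:
        return "Spam"    # keywords and uppercase text
    if DATE.search(text) and sum(is_centred(line) for line in lines) >= 2:
        return "CFP"     # dates and centred blocks
    if CURRENCY.search(text):
        return "Ord"     # currency (left alignment is not checked here)
    if lines and URL.search(lines[-1]):
        return "Nl"      # summary of an item followed by a URL
    if sum(c.isdigit() for c in text) / max(len(text), 1) > 0.2:
        return "Cinema"  # blocks of numerical content
    return "Unknown"
```

Checking the cheapest, most distinctive cue (Spam keywords) first mirrors the idea that some genres can be recognised from a glance rather than a full read.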
22
Conclusions
  • Genre has been largely overlooked, but momentum
    is building.
  • Our approach is useful for filtering e-mails and
    identifying features for characterising datasets.
  • Purpose and form are very useful cues when using
    texts.
  • Clues to perception processes were found, but
    familiarity needs to be added to the mix.
  • Could a machine be trained to emulate human
    behaviour and understand textual input without
    reading the whole text?

23
Contributions and Implications
  • Development of a language/perception
    theory/framework of how people use different
    types of texts.
  • Modelling user tasks and behaviour in relation to
    genre and perception.
  • Extending the laboratory IR/user-orientated IR
    approach from algorithms and machines to a
    user-oriented and contextual level.

24
Future Work
  • Narrow down my work domains.
  • Investigate domains:
      • Academic document collections: CSIRO Enterprise
      • Legal documents: Enron
      • Weblogs: TREC Blog
      • Web domains: Wikipedia
  • Consider multi-genre documents, e.g. course books
    and large documents such as social work reports.

25

26
Motivation
  • Useful features for profiling corpora.
  • Adds another type of filtering to large data
    collections that takes advantage of genre, e.g.
    news, biographical, etc.
  • Genre benefits organisations financially and
    administratively, e.g. through rapid retrieval of
    information.
  • Embrace genre and perception to understand and
    examine these structures!

27
Evaluation System
  • Model the findings on FERRET and McFRUMP's
    Predictor and Substantiator.
  • Our system: the Genre Retrieval and Understanding
    Memory Program (GRUMP).
  • Similar features to Clark and Watt (2007)?

28
Skimming Categorisation
  • Skimming
      • Used to identify the main points in a text
        much more quickly than normal reading, without
        having to understand every word.
      • Normally used when a reader has a large amount
        of text to read within a limited time.
  • Categorisation
      • Texts are automatically labelled or classified.
      • No need for manual organisation, labelling or
        sorting.
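To illustrate how skimming and automatic categorisation could be combined, the sketch below "skims" a text by keeping only a few lines (the first few, any all-uppercase heading lines, and the last line) and then labels it with the category whose keyword set overlaps most with the skimmed words. The keyword profiles, the line budget and the selection rules are hypothetical; a real categoriser would learn its profiles from labelled data.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword profiles for two of the genres in the pilot.
PROFILES: Dict[str, Set[str]] = {
    "CFP": {"call", "papers", "submission", "deadline", "workshop"},
    "Spam": {"lotto", "winner", "claim", "prize"},
}


def skim(text: str, head: int = 3) -> List[str]:
    """Keep only the lines a skimmer might look at: the first few, headings, and the last."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    picked = lines[:head]
    picked += [line for line in lines[head:-1] if line.isupper()]
    if len(lines) > head:
        picked.append(lines[-1])
    return picked


def categorise(text: str) -> str:
    """Label a text by keyword overlap with its skimmed lines only."""
    words = set(re.findall(r"[a-z]+", " ".join(skim(text)).lower()))
    scores = {label: len(words & keywords) for label, keywords in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"
```

Only the skimmed lines reach the categoriser, so the body of a long e-mail is never read in full.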

29
Evaluation System: How It Works
[Diagram (figure taken from Mauldin, 1991): Queries →
Query Parser → case frame patterns; Texts → McFRUMP
Parser → Abstracts; Case Frame Matcher → Relevant
Texts. The McFRUMP parser contains the
Predictor/Substantiator, scripts, etc.]
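The case frame matcher in this pipeline compares frames extracted from queries with frames extracted from texts. The sketch below is a toy version under a strong simplifying assumption: each case frame is a flat dictionary of slot names to fillers, and `?` is a wildcard that only requires the slot to be filled. The conceptual representations used by FERRET are richer; the slot names here are hypothetical.

```python
from typing import Dict, List

Frame = Dict[str, str]  # hypothetical flat case frame: slot name -> filler
WILDCARD = "?"


def matches(pattern: Frame, frame: Frame) -> bool:
    """A frame matches when every pattern slot is present with the same filler (or '?')."""
    return all(
        slot in frame and (value == WILDCARD or frame[slot] == value)
        for slot, value in pattern.items()
    )


def relevant_texts(pattern: Frame, abstracts: Dict[str, List[Frame]]) -> List[str]:
    """Return ids of texts whose parsed abstract contains at least one matching frame."""
    return [text_id for text_id, frames in abstracts.items()
            if any(matches(pattern, frame) for frame in frames)]


if __name__ == "__main__":
    abstracts = {
        "story1": [{"act": "ARREST", "actor": "police", "object": "John Doe"}],
        "story2": [{"act": "ROBBERY", "actor": "John Doe", "object": "bank"}],
    }
    query = {"act": "ARREST", "object": "?"}  # "Who was arrested?"
    print(relevant_texts(query, abstracts))  # ['story1']
```

Matching on frames rather than words means a query about an arrest does not retrieve the robbery story, even though both mention John Doe.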
30
Evaluation System Script Example
  • Using Schank's (1981, ch. 3) Conceptual Dependency
    theory of scripts, plans and goals and DeJong's
    (1982) FRUMP to make different genre scripts
  • Example: "John Doe was arrested last Saturday
    morning after holding up the New Haven Savings
    Bank"
  • ARREST SCRIPT
      • Police arrive at suspect location
      • Suspect apprehended
      • Taken to police station
      • Charged
      • Incarcerated or bailed
  • Just as this script format is used to understand
    stories, genre rules/features can be specified in
    scripts to understand texts.

Modify script with genre rules
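A minimal sketch of applying such a script in a FRUMP-like way, under a large simplification: each script step is recognised by keyword cues rather than by conceptual representations. A predictor proposes the steps expected after the last confirmed one, and a substantiator confirms a step only if one of its cues occurs in the text. The cue lists and the CFP genre script are hypothetical, included to show how genre rules could take the place of story events.

```python
import re
from typing import List, Optional, Set, Tuple

Step = Tuple[str, List[str]]  # (step name, hypothetical keyword cues)

ARREST_SCRIPT: List[Step] = [
    ("Police arrive at suspect location", ["police", "arrive"]),
    ("Suspect apprehended", ["arrested", "apprehended", "detained"]),
    ("Taken to police station", ["station"]),
    ("Charged", ["charged"]),
    ("Incarcerated or bailed", ["jailed", "bail", "bailed"]),
]

# Hypothetical genre script: expected genre features instead of story events.
CFP_SCRIPT: List[Step] = [
    ("Call announced", ["call", "papers"]),
    ("Topics listed", ["topics"]),
    ("Dates given", ["deadline", "submission"]),
]


def predictor(script: List[Step], confirmed: List[str]) -> List[Step]:
    """Expect the steps that follow the last confirmed step, in script order."""
    names = [name for name, _ in script]
    start = names.index(confirmed[-1]) + 1 if confirmed else 0
    return script[start:]


def substantiator(step: Step, words: Set[str]) -> bool:
    """Confirm a predicted step if any of its cues occurs in the text."""
    return any(cue in words for cue in step[1])


def apply_script(script: List[Step], text: str) -> List[str]:
    """Return the names of the script steps confirmed by the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    confirmed: List[str] = []
    expected = predictor(script, confirmed)
    while expected:
        # Earlier expected steps may be unmentioned; take the first one with evidence.
        step: Optional[Step] = next((s for s in expected if substantiator(s, words)), None)
        if step is None:
            break
        confirmed.append(step[0])
        expected = predictor(script, confirmed)
    return confirmed


if __name__ == "__main__":
    story = "John Doe was arrested last Saturday morning after holding up the New Haven Savings Bank"
    print(apply_script(ARREST_SCRIPT, story))  # ['Suspect apprehended']
```

Swapping `ARREST_SCRIPT` for a genre script such as `CFP_SCRIPT` leaves the predictor and substantiator unchanged; only the expectations change.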