Title: The Use of Provenance in Information Retrieval
- Simone Stumpf
- Erin Fitzhenry
- Tom Dietterich
Defining Provenance
- To us, provenance concerns:
  - The origin of content within documents
  - The relationships between documents
Why Focus on Provenance for Information Retrieval?
- People remember the relationships between documents!
  - Episodic vs. semantic memory
  - Studies:
    - Blanc-Brude & Scapin (2007)
    - Gonçalves & Jorge (2004)
- No need to formulate keyword queries
- Other common document attributes are often inaccurately remembered (Blanc-Brude & Scapin, 2007):
  - Title (20% false recall)
  - Size (53.8% false recall)
  - Time (47.6% false recall)
Example Use Case: Where did I save that again?
I got an email from Tom
I saved the attachment
And I pasted some information from the attachment into a PowerPoint document
Where did that presentation go??
Requirements for Tracking and Visualizing Provenance
- Instrument all important document provenance events
  - Provenance events are NOT automatically captured by Windows
- Develop a UI enabling users to locate documents via the provenance relationships they remember
- Integrate the UI into the Windows desktop
Capturing Provenance Events with TaskTracer
- TaskTracer is a Personal Information Management system
- The user defines a hierarchy of Projects or Activities
- As the user works, TaskTracer automatically tags the following according to task/project:
  - Files
  - Folders
  - Email messages
  - Email contacts
  - Web pages
Instrumenting TaskTracer to Capture Provenance Events
- TaskTracer already instruments many desktop events:
  - Open, Save, SaveAs, Close
  - EmailArrived, Email Open, Email Close
  - Open URL, Close URL, Follow Hyperlink
- Idea: extend the existing instrumentation to cover key provenance events:
  - CopyPaste, SaveAs, FileCopy/Rename
  - AttachmentAdd, AttachmentOpen, AttachmentSave, EmailForward, EmailReply
  - FileDownload, FileUpload (coming soon)
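One way to picture the extended instrumentation is that each provenance event becomes a directed edge between two resources: where the content came from and where it went. The sketch below is a minimal illustration under stated assumptions: `ProvenanceEvent`, its `source`/`target` fields, and `to_edge` are hypothetical names, not TaskTracer's actual event format, and "FileCopy/Rename" is split into two kinds as a guess.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class EventKind(Enum):
    # Provenance event names from the slide; splitting "FileCopy/Rename"
    # into FileCopy and FileRename is an assumption.
    COPY_PASTE = "CopyPaste"
    SAVE_AS = "SaveAs"
    FILE_COPY = "FileCopy"
    FILE_RENAME = "FileRename"
    ATTACHMENT_ADD = "AttachmentAdd"
    ATTACHMENT_OPEN = "AttachmentOpen"
    ATTACHMENT_SAVE = "AttachmentSave"
    EMAIL_FORWARD = "EmailForward"
    EMAIL_REPLY = "EmailReply"
    FILE_DOWNLOAD = "FileDownload"
    FILE_UPLOAD = "FileUpload"


@dataclass(frozen=True)
class ProvenanceEvent:
    # Hypothetical event record, not TaskTracer's real format.
    kind: EventKind
    source: str  # resource the content came from (file path, email ID, URL)
    target: str  # resource the content went to


Edge = Tuple[str, str, str]


def to_edge(event: ProvenanceEvent) -> Edge:
    """Turn one instrumented event into a directed (source, relation, target) edge."""
    return (event.source, event.kind.value, event.target)


# The "Where did I save that again?" scenario as two events (illustrative names).
events: List[ProvenanceEvent] = [
    ProvenanceEvent(EventKind.ATTACHMENT_SAVE, "email:from-tom", "C:/docs/report.docx"),
    ProvenanceEvent(EventKind.COPY_PASTE, "C:/docs/report.docx", "C:/talks/update.pptx"),
]
edges = [to_edge(e) for e in events]
```

Keeping the event kind as the edge label means the original event type is never lost, so later analyses can still ask which kinds of links users actually remember.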
Instrumenting TaskTracer to Capture Provenance Events (cont.)
- Database of document-to-document provenance relationships
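A document-to-document provenance database can be as simple as one edge table indexed on both endpoints, so that a document's neighbors can be found whichever direction the link points. The following is a hypothetical schema using Python's built-in `sqlite3` module; the table name, columns, and the `record`/`neighbors` helpers are illustrative assumptions, not the schema TaskTracer uses.

```python
import sqlite3

# Hypothetical provenance store; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE provenance (
           source   TEXT NOT NULL,  -- resource the content came from
           relation TEXT NOT NULL,  -- event kind, e.g. 'CopyPaste'
           target   TEXT NOT NULL,  -- resource the content went to
           ts       TEXT NOT NULL   -- when the event was recorded
       )"""
)
# Index both endpoints so lookups are cheap in either direction.
conn.execute("CREATE INDEX idx_prov_source ON provenance(source)")
conn.execute("CREATE INDEX idx_prov_target ON provenance(target)")


def record(source: str, relation: str, target: str, ts: str) -> None:
    """Store one provenance relationship."""
    conn.execute("INSERT INTO provenance VALUES (?, ?, ?, ?)", (source, relation, target, ts))


def neighbors(doc: str) -> list:
    """Return every resource directly linked to `doc`, in either direction."""
    rows = conn.execute(
        "SELECT target FROM provenance WHERE source = ? "
        "UNION SELECT source FROM provenance WHERE target = ?",
        (doc, doc),
    ).fetchall()
    return sorted(row[0] for row in rows)


# Illustrative data only (placeholder timestamps).
record("email:from-tom", "AttachmentSave", "C:/docs/report.docx", "t1")
record("C:/docs/report.docx", "CopyPaste", "C:/talks/update.pptx", "t2")
```

Querying both columns with `UNION` reflects the idea that a user may remember a link from either end: the attachment's neighbors include both the email it came from and the presentation it fed into.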
Developing a User Interface: TaskTrail
- A tool for visualizing provenance
[Annotated TaskTrail screenshot: the user's query; click to expand; mouse over for details; double-click to open]
Integrating TaskTrail into the Windows UI
- Launch a query by right-clicking on an item within:
  - Windows Explorer, Outlook, TaskExplorer
Research Questions
- Does TaskTrail help users find documents more quickly than other methods?
- How should the provenance graph be laid out?
- Which kinds of provenance events do users accurately recall?
- How large are the provenance graphs?
- What patterns, if any, exist in the succession of provenance events?
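Two of these questions, graph size and succession patterns, lend themselves to simple measurements over a provenance log. The sketch below is hypothetical: it takes connected-component size (via union-find) as one possible notion of "how large", and counts adjacent event-kind pairs (bigrams) as one simple way to look for succession patterns. Neither is the analysis the authors describe, and the example data is made up.

```python
from collections import Counter


def component_sizes(edges):
    """Sizes of connected components in the undirected provenance graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for source, _relation, target in edges:
        parent[find(source)] = find(target)  # merge the two components
    roots = [find(node) for node in list(parent)]
    return sorted(Counter(roots).values(), reverse=True)


def event_bigrams(kinds):
    """Count how often one event kind immediately follows another in a time-ordered log."""
    return Counter(zip(kinds, kinds[1:]))


# Illustrative data only.
edges = [
    ("email:from-tom", "AttachmentSave", "C:/docs/report.docx"),
    ("C:/docs/report.docx", "CopyPaste", "C:/talks/update.pptx"),
    ("C:/talks/update.pptx", "SaveAs", "C:/talks/update-v2.pptx"),
    ("C:/misc/notes.txt", "FileCopy", "C:/misc/notes-copy.txt"),
]
log = ["AttachmentSave", "CopyPaste", "SaveAs", "AttachmentSave", "CopyPaste"]

print(component_sizes(edges))               # → [4, 2]
print(event_bigrams(log).most_common(1))    # → [(('AttachmentSave', 'CopyPaste'), 2)]
```

Component sizes suggest how much a single right-click query could reveal, while frequent bigrams such as AttachmentSave followed by CopyPaste hint at recurring workflows worth highlighting in the layout.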
User Studies: Formative
- Observational study (planned)
  - What provenance-related actions do users perform? Which of those do they remember?
  - Observe 12 participants in their workplaces
  - Record the provenance-related actions they perform
  - Interview participants after 1 week to see what they remember:
    - Free recall
    - Cued recall
  - How do users lay out their documents according to what they remember?
User Studies: Summative
- TaskTrail study at Intel (in progress)
  - 4 participants (so far) are using TaskTracer for at least 1 month each
  - They will then use TaskTrail to locate their own documents
- Measures of success:
  - Do users locate more documents using TaskTrail?
  - Do users locate documents more quickly using TaskTrail?
  - Do users prefer using TaskTrail?
Provenance-Related User Studies Are Hard!
- They must be done in the wild
- They involve:
  - Long time-scales, which increase the chances that:
    - Participants will drop out
    - The situation on site will change
  - Potentially sensitive information:
    - Emails to/from users not participating in the study
    - Documents regarding trade secrets
  - Installation of event-tracking software:
    - Software installation/maintenance can introduce compatibility, scheduling, and other problems
Summary: TaskTrail
- Instruments desktop provenance relationships
- Allows users to query by right-clicking objects
- Lets users browse a visualization of provenance relationships to find desired documents
- Exploits human episodic memory to help users find documents