Discovery and Regeneration of Hidden Emails - PowerPoint PPT Presentation


Slides: 26
Provided by: xdz
Transcript and Presenter's Notes


1
Discovery and Regeneration of Hidden Emails
  • Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou,
    Ed Zwart
  • Dept. of Computer Science
  • Univ. of British Columbia

2
What are hidden emails? - a concrete example
Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
> b) Don, can you go directly
> to SOWK 124 ..
Sure.
> d) I will bring the exams with
> me
I can help you carry it.
> f) I will bring classlists with
> me.
Is there a seating plan?
Don
Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
I will go there as well.
> c) Warren and Qiang, can
> you go directly to LSK 201?
No problem.
> f) I will bring classlists with
> me.
Do they need to sign on the list?
- Warren
Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
> b) Don, can you go
> directly to SOWK 124 ...
Don, I'll go with you too.
> e) Students whose last
> names begin with
> f) I will bring classlists
> with me.
Do we have a seating plan as last term?
Cheers, Kevin
  • Deleted messages
  • Subject: Midterm Details
  • a) I need to meet with a faculty recruit at
    lunch tomorrow.
  • b) Don, can you go directly to SOWK 124 ...
  • c) Warren and Qiang, can you go directly to LSK
    201.
  • d) I will bring the exams with me to Sage,
  • e) Students whose last names begin with M-Q will
  • f) I will bring classlists with me.
  • Thanks.
  • -Ed

3
What are hidden emails? (cont.)
  • A hidden email is an email that is quoted by at
    least one email in the folder but is not itself
    present in that folder.
  • Deleted emails (intentionally or accidentally)
  • Forwarded messages
  • Previous discussions before users join

4
Applications
  • Email summarization
  • Threading hierarchy
  • Summarization based on threading hierarchy
  • Existence of hidden emails affects the
    summarization task.
  • Forensics and privacy
  • Reconstruct deleted messages for investigation.
  • Protected information may be leaked via
    quotations.

5
Problem statement
  • Problem statement
  • Given a folder of emails, regenerate all hidden
    emails in this folder.
  • Two sub-problems
  • How to discover hidden emails?
  • How to reconstruct discovered hidden emails for
    the user?

6
Methodology - skeleton
  • Step 1. Discovery of hidden emails
  • 1.1 Identify hidden fragments
  • 1.2 Find overlapping of hidden fragments
  • Step 2. Regeneration of hidden emails
  • 2.1 Build the precedence graph
  • 2.2 Generate bulletized emails

7
Step 1. Discovery of hidden emails
  • Challenges
  • Quotations can be edited as free text.
  • Insertion, deletion
  • Copied messages, forwarded messages
  • Reshuffling
  • Several emails may quote the same original hidden
    email.

8
Methodology - skeleton
  • Step 1. Discovery of hidden emails
  • 1.1 Identify hidden fragments
  • 1.2 Find overlapping of hidden fragments
  • Step 2. Regeneration of hidden emails
  • 2.1 Generate the precedence graph
  • 2.2 Create bulletized emails

9
Identify hidden fragments (cont.)
  • Separate quoted and new fragments.
  • Compare each quoted fragment (F) with all the
    new fragments in the folder.
  • If there is no sufficiently long overlap, F
    is considered a hidden fragment.
  • Otherwise, a sufficiently long overlap exists,
    and the overlapped part is not hidden.

[Figure: Warren's reply]
> a
I will go there as well.
> c
No problem.
> f
Do they need to sign on the list?
- Warren
Quoted fragments (F): > a, > c, > f
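The steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes quoted lines start with ">", uses the longest common substring as the overlap measure, and picks an arbitrary threshold `min_overlap`. The function names are hypothetical. As a simplification, a quoted fragment with a long enough overlap is dropped entirely, whereas the slide treats only the overlapped part as not hidden.

```python
from difflib import SequenceMatcher


def split_fragments(body):
    """Split an email body into (quoted, new) fragment lists.

    Assumption: quoted lines start with '>' and consecutive lines of the
    same kind form one fragment.
    """
    quoted, new = [], []
    current, current_is_quoted = [], None
    for line in body.splitlines():
        stripped = line.strip()
        is_quoted = stripped.startswith(">")
        text = stripped.lstrip("> ").strip() if is_quoted else stripped
        if not text:
            continue  # skip blank lines and bare '>' markers
        if current and is_quoted != current_is_quoted:
            # The kind of line changed: close the fragment collected so far.
            (quoted if current_is_quoted else new).append(" ".join(current))
            current = []
        current.append(text)
        current_is_quoted = is_quoted
    if current:
        (quoted if current_is_quoted else new).append(" ".join(current))
    return quoted, new


def longest_overlap(a, b):
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def hidden_fragments(folder, min_overlap=20):
    """Quoted fragments with no sufficiently long overlap with any new
    fragment in the folder (min_overlap is an illustrative threshold)."""
    split = [split_fragments(body) for body in folder]
    all_new = [frag for _, new in split for frag in new]
    hidden = []
    for quoted, _ in split:
        for frag in quoted:
            if all(longest_overlap(frag, n) < min_overlap for n in all_new):
                if frag not in hidden:
                    hidden.append(frag)
    return hidden


folder = [
    "> I will bring the exams with me to Sage\nI can help you carry it.",
    "> I can help you carry it.\nThanks Don!",
]
print(hidden_fragments(folder))
```

In this toy folder the second email quotes text that exists as a new fragment of the first email, so only the first quotation is reported as hidden.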
10
Methodology - skeleton
  • Step 1. Discovery of hidden emails
  • 1.1 Identify hidden fragments
  • 1.2 Find overlapping of hidden fragments
  • Step 2. Regeneration of hidden emails
  • 2.1 Generate the precedence graph
  • 2.2 Create bulletized emails

11
Overlapping of hidden fragments
Subject: Re: Midterm Details
> a)
> b)
Sure.
> d)
I can help you carry it.
> f)
Is there a seating plan?
Don
Subject: Re: Midterm Details
> a)
I will go there as well.
> c)
No problem.
> f)
Do they need to sign on the list?
- Warren
[Figure: the hidden fragment sequences "a c f" and "a b d f" quoted by the two emails]
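One way to picture this step: hidden fragments quoted by different emails are grouped whenever they overlap, so that fragments a and f from both replies are recognized as the same hidden text. The sketch below is an assumption-laden illustration, not the method from the talk: it uses the longest common substring with an arbitrary threshold and a union-find structure, and all names are hypothetical.

```python
from difflib import SequenceMatcher


def overlaps(a, b, min_overlap=20):
    """True if a and b share a common substring of at least min_overlap chars."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size >= min_overlap


def group_overlapping(fragments_by_email, min_overlap=20):
    """Cluster hidden fragments so that overlapping fragments share a cluster.

    fragments_by_email maps an email id to its list of hidden fragments.
    Returns a list of clusters, each a list of (email_id, fragment) pairs.
    """
    items = [(eid, frag)
             for eid, frags in fragments_by_email.items()
             for frag in frags]
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if overlaps(items[i][1], items[j][1], min_overlap):
                parent[find(j)] = find(i)

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())


fragments = {
    "don": ["a) I need to meet with a faculty recruit",
            "d) I will bring the exams with me"],
    "warren": ["a) I need to meet with a faculty recruit",
               "c) Warren and Qiang, can you go directly to LSK 201?"],
}
for cluster in group_overlapping(fragments):
    print(cluster)
```

The pairwise comparison is quadratic in the number of fragments, which is fine for a sketch but would need indexing for a large folder.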
12
Methodology - skeleton
  • Step 1. Discovery of hidden emails
  • 1.1 Identify hidden fragments
  • 1.2 Find overlapping of hidden fragments
  • Step 2. Regeneration of hidden emails
  • 2.1 Build the precedence graph
  • 2.2 Generate bulletized emails

13
Precedence graph example
Three emails in the current folder:
> a > b > d > f
> a > b ... > e > f
> a > c ... > f
[Figure: the precedence graph, with nodes a, b, c, d, e, f]
14
Precedence graph complications
  • The ideal case
  • A chain of nodes → totally ordered hidden
    fragments.
  • Complicated cases
  • Incompatible nodes
  • e.g., (b, c) and (d, e)
  • A partial order is necessary.
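The example can be replayed in a few lines of Python. This sketch assumes that an edge u → v is added whenever one email quotes fragment u immediately before fragment v, and that two nodes are incompatible when neither can reach the other; both are readings of the slides, and the function names are hypothetical.

```python
from itertools import combinations


def build_precedence_graph(quoted_orders):
    """Add an edge u -> v for each pair of consecutively quoted fragments."""
    graph = {}
    for order in quoted_orders:
        for node in order:
            graph.setdefault(node, set())
        for u, v in zip(order, order[1:]):
            graph[u].add(v)
    return graph


def reachable(graph, start):
    """All nodes reachable from start along one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def incompatible_pairs(graph):
    """Pairs of nodes with no path in either direction."""
    reach = {node: reachable(graph, node) for node in graph}
    return [(u, v) for u, v in combinations(sorted(graph), 2)
            if v not in reach[u] and u not in reach[v]]


# Quoted orderings of the three emails in the running example.
orders = [["a", "b", "d", "f"], ["a", "b", "e", "f"], ["a", "c", "f"]]
graph = build_precedence_graph(orders)
print(incompatible_pairs(graph))
# [('b', 'c'), ('c', 'd'), ('c', 'e'), ('d', 'e')]
```

The output contains the pairs (b, c) and (d, e) named on the slide, plus (c, d) and (c, e), so no total order over the six fragments is justified by the quotations alone.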

15
Methodology - skeleton
  • Step 1. Discovery of hidden emails
  • 1.1 Identify hidden fragments
  • 1.2 Find overlapping of hidden fragments
  • Step 2. Regeneration of hidden emails
  • 2.1 Build the precedence graph
  • 2.2 Generate bulletized emails

16
Precedence graph → hidden emails
  • Objectives
  • Node coverage
  • Edge soundness
  • All edges are represented.
  • No spurious edges are implied.
  • Minimization
  • Minimize the number of regenerated hidden emails.

17
Generate hidden emails: challenges
  • Challenges
  • People read documents sequentially → a graphical
    representation isn't acceptable.
  • Incompatible nodes have no arbitrary ordering →
  • a partial-ordering representation is needed.

18
Bulletized email model
  • Text devices
  • Bullets → incompatible nodes.
  • Offsets → nested relations among bulletized
    fragments.

19
Bulletized email model example
  • One bulletized hidden email suffices.
  • a
  • c
  • b
  • > d
  • > e
  • f

[Figure: the precedence graph, with nodes a, b, c, d, e, f]
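To make the model concrete, the sketch below encodes a bulletized email as nested Python data and derives the precedence edges it implies. The encoding is an assumption made for illustration: a list is a sequence read top to bottom, a tuple is a group of sibling bullets (mutually incompatible), and each bullet is itself a sequence. The edge rule (last fragments of one item precede the first fragments of the next) is likewise an interpretation of the slides, not a definition taken from them.

```python
def heads(item):
    """Fragments that can come first within an item."""
    if isinstance(item, str):
        return {item}
    return set().union(*(heads(bullet[0]) for bullet in item))


def tails(item):
    """Fragments that can come last within an item."""
    if isinstance(item, str):
        return {item}
    return set().union(*(tails(bullet[-1]) for bullet in item))


def implied_edges(seq):
    """Precedence edges implied by a bulletized sequence."""
    edges = set()
    for item in seq:
        if not isinstance(item, str):
            for bullet in item:  # recurse into each sibling bullet
                edges |= implied_edges(bullet)
    for prev, nxt in zip(seq, seq[1:]):
        edges |= {(u, v) for u in tails(prev) for v in heads(nxt)}
    return edges


# a, then bullets {c} and {b with nested bullets d, e}, then f.
email = ["a", (["c"], ["b", (["d"], ["e"])]), "f"]

graph_edges = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
               ("c", "f"), ("d", "f"), ("e", "f")}

# Every graph edge is represented and no spurious edge is implied.
print(implied_edges(email) == graph_edges)

# A flat listing, by contrast, would imply the spurious edge (b, c).
print(("b", "c") in implied_edges(["a", "b", "c", "d", "e", "f"]))
```

Under this encoding the single bulletized email reproduces exactly the seven edges of the example graph, which is what node coverage and edge soundness ask for.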
20
Example: complicated case
e.g., → spurious edge (B, u)
  • A
  • > x
  • y
  • u
  • A
  • > x
  • > B
  • y
  • u

[Figure: precedence graph with nodes A, B, x, y, u]
21
A necessary and sufficient condition
  • Theorem 1
  • A weakly connected precedence graph G can be
    represented by a single bulletized email, with
    every edge captured and no spurious edges
    included, if and only if G is a strict and
    complete parent-child graph.

22
Heuristics for Incompleteness and non-strictness
  • Edge deletion
  • Remaining graph satisfies Theorem 1
  • Deleted edges can be represented separately.

23
Contributions
  • A first step to reconstruct hidden emails
  • We proposed the bulletized email model to
    regenerate hidden emails.
  • We found the necessary and sufficient condition
    for a precedence graph to be represented exactly
    by one single bulletized email.
  • We developed algorithms to transform a precedence
    graph into bulletized emails.

24
Future work
  • Nested hidden emails
  • Deal with cycles
  • NLP analysis to decide additional ordering among
    hidden fragments.
  • Document forensics

25
Experiments for scalability
  • Default setting (synthetic data)
  • A folder of M(1000) emails,
  • 1M hidden emails
  • 30M emails quote hidden emails
  • Each email quotes about 10 hidden fragments of
    that hidden email.

26
Experiments
27
Experiments
28
Thank you!
29
Experiments runtime
30
Identify hidden fragments
  • Identify quoted fragments.
  • Compare each quoted fragment F with the other
    fragments in the folder.
  • If there exists a sufficiently long overlap,
    the overlapped part is not hidden.
  • Otherwise, F is taken as a hidden fragment.

31
Some concepts
  • Parent-child subgraph
  • parent(C) = P, child(P) = C
  • Completeness
  • Each parent-child subgraph is a complete
    bipartite graph (biclique) ?
  • PC

[Figure: graph with nodes A, B, C, D, E, F and parent-child subgraphs PC1, PC2, PC3]
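The completeness condition can be checked mechanically. The sketch below assumes one particular reading of the slide: a parent-child subgraph is the smallest pair (P, C) such that C contains every child of a node in P and P contains every parent of a node in C. The function names are hypothetical, and strictness is not checked here.

```python
def parent_child_subgraphs(graph):
    """Group edges into parent-child subgraphs (P, C).

    graph maps every node (including sinks) to the set of its children.
    """
    parents = {}
    for u, children in graph.items():
        for v in children:
            parents.setdefault(v, set()).add(u)

    subgraphs, seen = [], set()
    for start in graph:
        if start in seen or not graph[start]:
            continue  # already grouped, or a node without children
        P, C, frontier = set(), set(), [start]
        while frontier:
            p = frontier.pop()
            if p in P:
                continue
            P.add(p)
            for c in graph[p]:
                C.add(c)
                frontier.extend(parents[c] - P)  # co-parents join P
        seen |= P
        subgraphs.append((P, C))
    return subgraphs


def is_complete(graph):
    """True if every parent-child subgraph is a complete bipartite graph."""
    return all(C <= graph[p]
               for P, C in parent_child_subgraphs(graph)
               for p in P)


# Precedence graph of the running example (a..f).
example = {"a": {"b", "c"}, "b": {"d", "e"}, "c": {"f"},
           "d": {"f"}, "e": {"f"}, "f": set()}
print(is_complete(example))

# x and y share child v, but only x is a parent of u: not a biclique.
incomplete = {"x": {"u", "v"}, "y": {"v"}, "u": set(), "v": set()}
print(is_complete(incomplete))
```

For the running example the three subgraphs are ({a}, {b, c}), ({b}, {d, e}) and ({c, d, e}, {f}), each of which is a biclique.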
32
Some concepts: strictness
  • Strictness
  • If two nodes x, y have a common child u, all the
    parent-child subgraphs preceding either x or y
    but not both must also precede u.
  • B1 x , B2 y u all fragments in B1
    and B2 have to precede u.

33
Algorithm graph2email
  • Input: G and a set of nodes S
  • Output: a bulletized email BE
  • Process
  • Traverse S's descendants T such that they can
    form a bulletized email independently
  • DFS BFS

34
Example of graph2email
A
A
  • A
  • C
  • B
  • > D
  • > E
  • F

B
  • B
  • D
  • E

C
  • > C
  • > B
  • D
  • E
  • F

B
C
  • > C
  • > B
  • D
  • E

D
E
F
35
Example of star-cut
[Figure: star-cut example on a graph with nodes A, B, C, D, E between s and t; labels: "cut (s,t)", "two missing edges"]
36
> A > B > X > Y > Z
A B
X Y