Title: Discovery and Regeneration of Hidden Emails
- Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou, Ed Zwart
- Dept. of Computer Science
- Univ. of British Columbia
2. What are hidden emails? A concrete example
Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
> b) Don, can you go directly
> to SOWK 124 ..
Sure.
> d) I will bring the exams with
> me
I can help you carry it.
> f) I will bring classlists with
> me.
Is there a seating plan?
Don

Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
I will go there as well.
> c) Warren and Qiang, can
> you go directly to LSK 201?
No problem.
> f) I will bring classlists with
> me.
Do they need to sign on the list?
- Warren

Subject: Re: Midterm Details
> a) I need to meet with a
> faculty recruit
> b) Don, can you go
> directly to SOWK 124 ...
Don, I'll go with you too.
> e) Students whose last
> names begin with
> f) I will bring classlists
> with me.
Do we have a seating plan as last term?
Cheers, Kevin

- Deleted message:
Subject: Midterm Details
a) I need to meet with a faculty recruit at lunch tomorrow.
b) Don, can you go directly to SOWK 124 ...
c) Warren and Qiang, can you go directly to LSK 201.
d) I will bring the exams with me to Sage,
e) Students whose last name begin with M-Q will
f) I will bring classlists with me.
Thanks.
-Ed
3. What are hidden emails? (cont.)
- A hidden email is an email that is quoted by at least one email in the folder but does not itself exist in that folder. Typical causes:
  - Deleted emails (intentionally or accidentally)
  - Forwarded messages
  - Previous discussions that took place before a user joined
4. Applications
- Email summarization
  - Threading hierarchy
  - Summarization based on the threading hierarchy
  - The existence of hidden emails affects the summarization task.
- Forensics and privacy
  - Reconstruct deleted messages for investigation.
  - Protected information may be leaked via quotations.
5. Problem statement
- Problem statement
  - Given a folder of emails, regenerate all hidden emails in this folder.
- Two sub-problems
  - How to discover hidden emails?
  - How to reconstruct the discovered hidden emails and present them to the user?
6. Methodology: skeleton
- Step 1. Discovery of hidden emails
  - 1.1 Identify hidden fragments
  - 1.2 Find overlaps among hidden fragments
- Step 2. Regeneration of hidden emails
  - 2.1 Build the precedence graph
  - 2.2 Generate bulletized emails
7. Step 1. Discovery of hidden emails
- Challenges
  - Quotations can be edited as free text (insertion, deletion).
  - Copied messages, forwarded messages
  - Reshuffling
  - Several emails may quote the same original hidden email.
9. Identify hidden fragments (cont.)
- Separate quoted fragments from new fragments.
- Compare each quoted fragment (F) with all other new fragments in the folder.
  - If there is no sufficiently long overlap, F is considered a hidden fragment.
  - Otherwise, the overlapping part is not hidden.
Figure: Warren's reply split into new fragments ("I will go there as well.", "No problem.", "Do they need to sign on the list? - Warren") and quoted fragments F (> a, > c, > f).
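The comparison step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: quoted lines are assumed to start with `>`, "overlap" is measured as the longest common run of words (via the standard `difflib.SequenceMatcher`), and the hypothetical threshold `min_overlap = 4` stands in for "sufficiently long". The helper names `split_email`, `longest_overlap` and `hidden_fragments` are also hypothetical. For simplicity the sketch discards a quoted fragment entirely when it overlaps a new fragment, whereas the slide marks only the overlapped part as not hidden.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def split_email(body: str) -> tuple[list[str], list[str]]:
    """Split an email body into (quoted fragments, new fragments).

    Assumption: quoted lines start with '>', and a fragment is a maximal
    run of consecutive non-empty lines of the same kind.
    """
    quoted: list[str] = []
    new: list[str] = []
    current: list[str] = []
    current_is_quote = None
    for line in body.splitlines():
        stripped = line.strip()
        is_quote = stripped.startswith(">")
        text = stripped.lstrip("> ").strip()
        if not text:
            continue  # skip blank lines and bare '>' lines
        if current_is_quote is not None and is_quote != current_is_quote:
            # The line kind changed, so close the current fragment.
            (quoted if current_is_quote else new).append(" ".join(current))
            current = []
        current.append(text)
        current_is_quote = is_quote
    if current:
        (quoted if current_is_quote else new).append(" ".join(current))
    return quoted, new


def longest_overlap(a: str, b: str) -> int:
    """Length, in words, of the longest contiguous word run shared by a and b."""
    wa, wb = a.lower().split(), b.lower().split()
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    return matcher.find_longest_match(0, len(wa), 0, len(wb)).size


def hidden_fragments(folder: list[str], min_overlap: int = 4) -> list[tuple[int, str]]:
    """Return (email index, quoted fragment) pairs that look hidden.

    A quoted fragment counts as hidden when no new fragment in the folder
    shares a run of at least `min_overlap` words with it.
    """
    parsed = [split_email(body) for body in folder]
    new_fragments = [frag for _, new in parsed for frag in new]
    hidden = []
    for index, (quoted, _) in enumerate(parsed):
        for frag in quoted:
            if all(longest_overlap(frag, n) < min_overlap for n in new_fragments):
                hidden.append((index, frag))
    return hidden


# Made-up folder built from the example fragments.
folder = [
    "> a) I need to meet with a faculty recruit\nI will go there as well.\n"
    "> c) Warren and Qiang, can you go directly to LSK 201?\nNo problem.",
    "> f) I will bring classlists with me.\nDo they need to sign on the list?",
    "I will bring classlists with me.\nThanks.",
]
for index, frag in hidden_fragments(folder):
    print(index, frag)
```

In this made-up folder, fragments a) and c) are reported as hidden, while f) is not, because the third email contains the same sentence as new text.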
11. Overlaps among hidden fragments
Subject: Re: Midterm Details
> a)
> b)
Sure.
> d)
I can help you carry it.
> f)
Is there a seating plan?
Don

Subject: Re: Midterm Details
> a)
I will go there as well.
> c)
No problem.
> f)
Do they need to sign on the list?
- Warren

Figure: the quoted-fragment sequences a, c, f (Warren) and a, b, d, f (Don), shown before and after aligning them on their shared fragments a and f.
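One simple way to illustrate step 1.2 is to merge hidden fragments that share a sufficiently long word run into a single node, so that each quoting email can be rewritten as a sequence of node labels (for instance Warren's reply as a, c, f and Don's as a, b, d, f). The sketch below does the merging with union-find. It is an assumption-laden simplification, not the authors' procedure: the function names and the `min_overlap` threshold are hypothetical, and unlike the approach on the slide it does not split fragments at the boundaries of a partial overlap.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def longest_overlap(a: str, b: str) -> int:
    """Length, in words, of the longest contiguous word run shared by a and b."""
    wa, wb = a.lower().split(), b.lower().split()
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    return matcher.find_longest_match(0, len(wa), 0, len(wb)).size


def group_fragments(fragments: list[str], min_overlap: int = 4) -> list[int]:
    """Assign a node id to each fragment; overlapping fragments share an id."""
    parent = list(range(len(fragments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            if longest_overlap(fragments[i], fragments[j]) >= min_overlap:
                parent[find(i)] = find(j)  # merge the two groups

    # Number the groups 0, 1, 2, ... in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(fragments))]


quoted = [
    "I need to meet with a faculty recruit",                     # partial quotation
    "I need to meet with a faculty recruit at lunch tomorrow.",  # longer quotation
    "I will bring classlists with me.",
    "Warren and Qiang, can you go directly",
]
print(group_fragments(quoted))  # [0, 0, 1, 2]
```

Union-find keeps the merging transitive: if fragment x overlaps y and y overlaps z, all three end up as one node even when x and z share little text directly.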
13. Precedence graph example
Three emails in the current folder quote the hidden fragments in these orders:
  > a > b > d > f
  > a > b ... > e > f
  > a > c ... > f
Figure: the precedence graph over nodes a, b, c, d, e, f.
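A minimal sketch of how such a precedence graph could be built, assuming each email has already been reduced to the ordered list of node labels it quotes, and assuming that every pair of consecutively quoted fragments contributes one edge. The function name is hypothetical, and the "..." gaps in the quotations are simply ignored.

```python
from __future__ import annotations


def precedence_graph(quote_orders: list[list[str]]) -> set[tuple[str, str]]:
    """Edge (u, v): some email quotes fragment u immediately before fragment v."""
    edges: set[tuple[str, str]] = set()
    for order in quote_orders:
        for u, v in zip(order, order[1:]):
            edges.add((u, v))
    return edges


# Quotation orders of the three emails in the example ('...' gaps ignored).
emails = [["a", "b", "d", "f"], ["a", "b", "e", "f"], ["a", "c", "f"]]
print(sorted(precedence_graph(emails)))
# [('a', 'b'), ('a', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'f'), ('d', 'f'), ('e', 'f')]
```

Using a set means that an ordering confirmed by several emails (here a before b, quoted by two of them) becomes a single edge.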
14. Precedence graph complications
- The ideal case
  - A chain of nodes, i.e., totally ordered hidden fragments.
- Complicated cases
  - Incompatible nodes, e.g., b and c, d and e
  - A partial order is necessary.
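Incompatible nodes can be found mechanically: two nodes are incompatible when neither is reachable from the other. The sketch below assumes the example graph has the edges a→b, a→c, b→d, b→e, c→f, d→f and e→f (derived from consecutive fragments in the three quotation orders of the previous example); the function names are hypothetical. Besides the pairs named on the slide, it also reports c as incompatible with d and with e.

```python
from __future__ import annotations

from itertools import combinations


def descendants(edges: set[tuple[str, str]], source: str) -> set[str]:
    """Nodes reachable from `source` (iterative depth-first search)."""
    adjacency: dict[str, list[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen: set[str] = set()
    stack = [source]
    while stack:
        for v in adjacency.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def incompatible_pairs(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Pairs of nodes with no path between them in either direction."""
    nodes = sorted({n for edge in edges for n in edge})
    reach = {n: descendants(edges, n) for n in nodes}
    return {
        (x, y)
        for x, y in combinations(nodes, 2)
        if y not in reach[x] and x not in reach[y]
    }


edges = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
         ("c", "f"), ("d", "f"), ("e", "f")}
print(sorted(incompatible_pairs(edges)))
# [('b', 'c'), ('c', 'd'), ('c', 'e'), ('d', 'e')]
```

An empty result would mean the graph is a chain, i.e. the ideal, totally ordered case.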
16. From the precedence graph to hidden emails
- Objectives
- Node coverage
- Edge soundness
- All edges are represented.
- No spurious edges are implied.
- Minimization
- Minimize the number of regenerated hidden emails.
17. Generating hidden emails: challenges
- Challenges
  - People read documents sequentially, so a graphical representation is not acceptable.
  - Incompatible nodes must not be given an arbitrary ordering, so a partial-order representation is needed.
18. Bulletized email model
- Text devices
  - Bullets represent incompatible nodes.
  - Offsets represent nested relations among bulletized fragments.
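To make the model concrete, here is one possible plain-text encoding. It is an assumption of this sketch, not the authors' format: a bulletized email is a list of items, where an item is either a fragment (a string) or a bullet block, given as a list of branches that are themselves bulletized sequences. Each branch becomes one bullet, and each nesting level adds an offset. The names `Item` and `render` are hypothetical.

```python
from __future__ import annotations

from typing import Union

# Hypothetical encoding: an item is a fragment (str) or a bullet block,
# i.e. a list of branches, where each branch is itself a list of items.
Item = Union[str, list]


def render(sequence: list[Item]) -> list[str]:
    """Lay out a bulletized email as plain-text lines (branches must be non-empty)."""
    lines: list[str] = []
    for item in sequence:
        if isinstance(item, str):
            lines.append(item)
            continue
        for branch in item:  # one bullet per incompatible branch
            sub = render(branch)
            lines.append("  * " + sub[0])
            # Later lines of the branch are offset to align under the bullet text.
            lines.extend("    " + line for line in sub[1:])
    return lines


email = ["a", [["b", [["d"], ["e"]]], ["c"]], "f"]
print("\n".join(render(email)))
```

For this illustrative email the output is `a`, then bullets `b` (with nested bullets `d` and `e` at a deeper offset) and `c`, then `f`.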
19. Bulletized email model: example
- One bulletized hidden email suffices.
Figure: a single bulletized email containing fragments a, b, c, d, e, f.
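The objectives of node coverage and edge soundness can be checked mechanically for a candidate bulletized email. The sketch below uses a hypothetical nested-list encoding (a fragment is a string; a bullet block is a list of branches, each a sequence) and derives the edges the email implies: consecutive items are ordered, while bullets in the same block are not. It compares direct edges only, so it assumes the precedence graph contains no redundant transitive edges. All function names are hypothetical. For an example graph over a to f, one bulletized email passes, while the flat sequence a, b, c, d, e, f fails because it implies spurious edges such as (b, c).

```python
from __future__ import annotations

from typing import Union

# Hypothetical encoding: an item is a fragment (str) or a bullet block,
# i.e. a list of branches, where each branch is itself a list of items.
Item = Union[str, list]


def boundary(item: Item, first: bool) -> set[str]:
    """Fragments that open (first=True) or close (first=False) an item."""
    if isinstance(item, str):
        return {item}
    nodes: set[str] = set()
    for branch in item:
        nodes |= boundary(branch[0] if first else branch[-1], first)
    return nodes


def implied_edges(sequence: list[Item]) -> set[tuple[str, str]]:
    """Direct precedence edges implied by a bulletized email."""
    edges: set[tuple[str, str]] = set()
    for prev, nxt in zip(sequence, sequence[1:]):
        # Whatever closes `prev` immediately precedes whatever opens `nxt`.
        edges |= {(u, v) for u in boundary(prev, False) for v in boundary(nxt, True)}
    for item in sequence:
        if not isinstance(item, str):
            for branch in item:  # bullets in one block are mutually unordered
                edges |= implied_edges(branch)
    return edges


def fragments(sequence: list[Item]) -> set[str]:
    """All fragments mentioned anywhere in a bulletized email."""
    found: set[str] = set()
    for item in sequence:
        if isinstance(item, str):
            found.add(item)
        else:
            for branch in item:
                found |= fragments(branch)
    return found


def meets_objectives(sequence: list[Item], graph: set[tuple[str, str]]) -> bool:
    """Node coverage plus edge soundness (assumes no transitive edges in graph)."""
    nodes = {n for edge in graph for n in edge}
    return fragments(sequence) == nodes and implied_edges(sequence) == graph


graph = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
         ("c", "f"), ("d", "f"), ("e", "f")}
bulletized = ["a", [["b", [["d"], ["e"]]], ["c"]], "f"]
print(meets_objectives(bulletized, graph))                      # True
print(meets_objectives(["a", "b", "c", "d", "e", "f"], graph))  # False
```

The third objective, minimizing the number of regenerated emails, is not checked here; this sketch only tests whether a single given email is faithful to the graph.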
20. Example: a complicated case
- e.g., a spurious edge (B, u) would be implied.
Figure: a precedence graph over nodes A, B, x, y, u.
21. A necessary and sufficient condition
- Theorem 1
  - A weakly connected precedence graph G can be represented by a single bulletized email, with every edge captured and no spurious edges included, iff G is a strict and complete parent-child graph.
22. Heuristics for incompleteness and non-strictness
- Edge deletion
  - The remaining graph satisfies Theorem 1.
  - Deleted edges can be represented separately.
23. Contributions
- A first step toward reconstructing hidden emails
- We proposed the bulletized email model to regenerate hidden emails.
- We found a necessary and sufficient condition for a precedence graph to be represented exactly by a single bulletized email.
- We developed algorithms to transform precedence graphs into bulletized emails.
24. Future work
- Nested hidden emails
- Dealing with cycles
- NLP analysis to determine additional ordering among hidden fragments
- Document forensics
25. Experiments for scalability
- Default setting (synthetic data)
  - A folder of M emails (M = 1000)
  - 1M hidden emails
  - 30M emails quote hidden emails
  - Each email quotes about 10 hidden fragments of that hidden email.
26. Experiments
27. Experiments
28. Thank you!
29. Experiments: runtime
30. Identify hidden fragments
- Identify quoted fragments.
- Compare F with F′.
  - If there exists a sufficiently long overlap, the overlapping part is not hidden.
  - Otherwise, F is taken as a hidden fragment.
31. Some concepts
- Parent-child subgraph
  - parent(C) = P, child(P) = C
- Completeness
  - Each parent-child (PC) subgraph is a complete bipartite graph (biclique).
Figure: a graph over nodes A to F with parent-child subgraphs PC1, PC2 and PC3.
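Under the definitions above, the parent-child subgraphs and the completeness condition can be computed directly. This sketch assumes a parent-child subgraph is a maximal pair (P, C) with parent(C) = P and child(P) = C; that makes the subgraphs the connected components of the graph viewed as bipartite, with each node appearing once as a parent and once as a child. The function names are hypothetical, and strictness is not checked.

```python
from __future__ import annotations


def parent_child_subgraphs(edges: set[tuple[str, str]]) -> list[tuple[set[str], set[str], int]]:
    """Return (P, C, edge count) for every parent-child subgraph.

    Each node gets a 'p' (as parent) and a 'c' (as child) copy. Edges that
    share a parent or a child end up in the same union-find group, so every
    group is closed under parent(C) = P and child(P) = C.
    """
    root: dict[tuple[str, str], tuple[str, str]] = {}

    def find(x: tuple[str, str]) -> tuple[str, str]:
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v in edges:
        root[find(("p", u))] = find(("c", v))  # link parent copy with child copy

    groups: dict[tuple[str, str], list] = {}
    for u, v in edges:
        group = groups.setdefault(find(("p", u)), [set(), set(), 0])
        group[0].add(u)
        group[1].add(v)
        group[2] += 1
    return [(p, c, n) for p, c, n in groups.values()]


def is_complete(edges: set[tuple[str, str]]) -> bool:
    """Completeness: every parent-child subgraph is a biclique, |E| = |P| * |C|."""
    return all(n == len(p) * len(c) for p, c, n in parent_child_subgraphs(edges))


example = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
           ("c", "f"), ("d", "f"), ("e", "f")}
print(is_complete(example))                              # True
print(is_complete({("A", "C"), ("A", "D"), ("B", "D")}))  # False: B -> C missing
```

Counting edges is enough for the biclique test because the edge set has no duplicates and every edge in a group runs from P to C, so the count can reach |P| * |C| only when every parent is linked to every child.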
32. Some concepts: strictness
- Strictness
  - If two nodes x and y have a common child u, then every parent-child subgraph that precedes x or y, but not both, also precedes u.
  - e.g., B1 → x, B2 → y, and x → u, y → u: all fragments in B1 and B2 have to precede u.
33. Algorithm graph2email
- Input: G and a set of nodes S
- Output: a bulletized email BE
- Process
  - Traverse the descendants T of S such that they can form a bulletized email independently (DFS, BFS).
34. Example of graph2email
Figure: a precedence graph over nodes A to F and the bulletized email generated from it.
35. Example of star-cut
Figure: a star-cut (s, t) on a graph over nodes A, B, C, D, E, with two missing edges.
36. > A > B > X > Y > Z
Figure: nodes A, B and X, Y.