Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns

About This Presentation
Title:

Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns

Description:

Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns, by Lisa Friedland and David Jensen. Presented by Nick Mattei.

Slides: 30
Provided by: Nick2197

Transcript and Presenter's Notes



1
Finding Tribes: Identifying Close-Knit
Individuals from Employment Patterns
  • Lisa Friedland and David Jensen
  • Presented by Nick Mattei

2
Introduction
  • Tribes: groups with similar traits in a large
    graph
  • Distinguish those that work together and move
    together intentionally

3
Relationship Knowledge Discovery
  • Exploit connections among individuals to identify
    patterns and make predictions
  • Discover underlying dependencies
  • Links must be inferred

4
Graph Mining
  • Discover Hidden Group Structures
  • Animal Herds, Webpages, Employees
  • Time Series Analysis
  • Co-integration (Economics)
  • Security and Intrusion Detection
  • Dynamic Networks

5
Motivation
  • National Association of Securities Dealers
  • Fraud
  • Collusion
  • 4.8 Million Records
  • 2.5 Million Reps at 560,000 Firms
  • 100 Years of Data

6
Complications
  • Jobs not necessarily in order (or singletons)
  • 20% of employees hold more than one job at a time
  • 10% begin multiple jobs (up to 16) on one day
  • Leave gaps between employment
  • Mergers and acquisitions

7
Model
8
Finding Anomalously Related Entities
  • Input
  • Bipartite graph $G = (R \cup A, E)$
  • Entities $R = \{r_1, r_2, \ldots, r_n\}$ (People)
  • Attributes $A = \{a_1, a_2, \ldots, a_m\}$ (Orgs.)
  • Entities should connect several attributes
  • Model co-occurrence rates of pairs of attributes
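A minimal sketch of this input, assuming the bipartite graph is stored as two dictionaries and that co-occurrence of a pair of attributes means "number of entities linked to both". The toy records and the helper names `build_bipartite` and `attribute_pair_counts` are illustrative assumptions, not data or code from the presentation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edge list E: (entity, attribute) = (rep, branch).
records = [
    ("r1", "a1"), ("r1", "a2"),
    ("r2", "a1"), ("r2", "a2"),
    ("r3", "a2"), ("r3", "a3"),
]

def build_bipartite(records):
    """Return (entity -> set of attributes, attribute -> set of entities)."""
    attrs_of = defaultdict(set)
    reps_at = defaultdict(set)
    for rep, branch in records:
        attrs_of[rep].add(branch)
        reps_at[branch].add(rep)
    return dict(attrs_of), dict(reps_at)

def attribute_pair_counts(attrs_of):
    """Count how many entities are linked to both attributes of each pair."""
    counts = defaultdict(int)
    for attrs in attrs_of.values():
        for a, b in combinations(sorted(attrs), 2):
            counts[(a, b)] += 1
    return dict(counts)

attrs_of, reps_at = build_bipartite(records)
pair_counts = attribute_pair_counts(attrs_of)
```

Dividing a pair count by the number of entities at one of the two attributes would give a co-occurrence rate; which denominator to use is a modeling choice.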

9
Algorithm
10
Simple Model Measures
  • JOBS (Number of shared Jobs in the sequence)
  • YEARS (Number of Years of overlap)
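The two simple measures can be sketched as follows, assuming each career is a list of `(branch, start_year, end_year)` tuples, that JOBS counts distinct shared branches, and that YEARS sums the time two reps spent at the same branch simultaneously. The tuple layout and function names are assumptions made for illustration.

```python
def jobs_score(career_a, career_b):
    """JOBS: number of distinct branches that appear in both careers."""
    branches_a = {branch for branch, _, _ in career_a}
    branches_b = {branch for branch, _, _ in career_b}
    return len(branches_a & branches_b)

def years_score(career_a, career_b):
    """YEARS: total years the two reps overlapped at the same branch."""
    total = 0
    for branch_a, start_a, end_a in career_a:
        for branch_b, start_b, end_b in career_b:
            if branch_a == branch_b:
                # Length of the intersection of the two intervals (0 if disjoint).
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total

# Hypothetical toy careers.
rep_a = [("A", 2000, 2003), ("B", 2003, 2006)]
rep_b = [("A", 2001, 2004), ("B", 2004, 2005), ("C", 2005, 2007)]

shared_jobs = jobs_score(rep_a, rep_b)
shared_years = years_score(rep_a, rep_b)
```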

11
Example Sequences
12
Probabilistic Model
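  • $X = P(\mathrm{BrA} \to \mathrm{BrB} \to \mathrm{BrC} \to \mathrm{BrD})$
  • $= p_A \cdot t_{AB} \cdot t_{BC} \cdot t_{CD}$
  • Estimate
  • $p_i = P(\text{start at branch } i)$
  • = (# reps ever at i) / (# reps in database)
  • $t_{ij} = P(\text{rep moves from } i \text{ to } j \mid \text{ever at } i)$
  • = (# reps who leave i to go to j) / (# reps ever at i)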
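As a rough illustration of these estimates, the sketch below computes the start probabilities and direct-transition probabilities from ordered branch sequences and multiplies them along a path. It assumes each career is already an ordered list of branch names and that each rep counts at most once per transition; the toy data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy careers: rep -> ordered list of branches.
careers = {
    "r1": ["A", "B", "C"],
    "r2": ["A", "B"],
    "r3": ["A", "C"],
    "r4": ["B", "C"],
}

def estimate(careers):
    """Estimate p[i] and t[(i, j)] by counting reps."""
    n_reps = len(careers)
    ever_at = Counter()
    moves = Counter()
    for seq in careers.values():
        ever_at.update(set(seq))              # rep counted once per branch
        moves.update(set(zip(seq, seq[1:])))  # direct moves i -> j
    p = {i: c / n_reps for i, c in ever_at.items()}
    t = {(i, j): c / ever_at[i] for (i, j), c in moves.items()}
    return p, t

def sequence_probability(seq, p, t):
    """Probability of a branch sequence: p of the first branch times each t."""
    prob = p.get(seq[0], 0.0)
    for i, j in zip(seq, seq[1:]):
        prob *= t.get((i, j), 0.0)
    return prob

p, t = estimate(careers)
x = sequence_probability(["A", "B", "C"], p, t)
```

A low value of `x` for a sequence that two reps nonetheless share is what would make the pair look anomalous under independent movement.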

13
Probabilistic Model
  • Null Hypothesis of Independent Movement
  • Movement Not Random
  • Split and Merge
  • Markov Chains

14
Probabilistic Model (Different Paths)
  • $t_{ij}$ becomes $v_{ij}$
  • $v_{ij} = P(\text{move to branch } j \text{ at any point after branch } i \mid \text{currently at } i)$
  • = (# reps who go to branch j at any point after
    working at i) / (# reps ever at i)
  • Now each $v_{ij} \ge t_{ij}$, and the probabilities out of
    a branch no longer sum to 1.
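A small sketch of this relaxed estimate, assuming careers are ordered lists of branch names and that "at any point after" means any later position in the list. The toy data and the name `estimate_v` are illustrative only.

```python
from collections import Counter

# Hypothetical toy careers: rep -> ordered list of branches.
careers = {
    "r1": ["A", "B", "C"],
    "r2": ["A", "B"],
    "r3": ["A", "C"],
    "r4": ["B", "C"],
}

def estimate_v(careers):
    """v[(i, j)]: share of reps ever at i who appear at j at any later point."""
    ever_at = Counter()
    later = Counter()
    for seq in careers.values():
        ever_at.update(set(seq))
        pairs = set()
        for k, i in enumerate(seq):
            for j in seq[k + 1:]:
                if j != i:
                    pairs.add((i, j))  # each rep counts once per (i, j)
        later.update(pairs)
    return {(i, j): c / ever_at[i] for (i, j), c in later.items()}

v = estimate_v(careers)
out_of_a = sum(prob for (i, _), prob in v.items() if i == "A")
```

In this toy data rep r1 contributes to both A-to-B and A-to-C, so the values leaving A add up to more than 1.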

15
Probabilistic Model (Different Paths)
  • $v_{ij}$ becomes $w_{ij}$
  • $w_{ij} = P(\text{move to branch } j \text{ at any point simultaneous to or after branch } i \mid \text{currently at } i)$
  • = (# reps who start at j at any point
    simultaneous to or after starting at i) / (# reps
    ever at i)
  • Now less precise with respect to direct transitions,
    but more general
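To handle jobs that begin on the same day, this variant can be sketched by comparing start dates instead of list positions. The sketch assumes each job is a `(branch, start_year)` pair; the data and the name `estimate_w` are hypothetical.

```python
from collections import Counter

# Hypothetical toy careers: rep -> list of (branch, start_year).
careers = {
    "r1": [("A", 2000), ("B", 2000)],  # two jobs started simultaneously
    "r2": [("A", 2001), ("B", 2003)],
    "r3": [("A", 2002)],
}

def estimate_w(careers):
    """w[(i, j)]: share of reps ever at i who start at j at the same time or later."""
    ever_at = Counter()
    counts = Counter()
    for jobs in careers.values():
        ever_at.update({branch for branch, _ in jobs})
        pairs = {
            (bi, bj)
            for bi, start_i in jobs
            for bj, start_j in jobs
            if bi != bj and start_j >= start_i
        }
        counts.update(pairs)
    return {(i, j): c / ever_at[i] for (i, j), c in counts.items()}

w = estimate_w(careers)
```

Because r1 started at A and B in the same year, that rep counts toward both directions, which an ordering-based estimate could not express.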

16
PROB-TIMEBINS
  • Bins of 1 year or more
  • At least 10 people worked at each branch in a bin period
  • $p_{iX}$ = (# reps ever at i during time X) / (# reps in
    DB)
  • $y_{iXjY}$ = (# reps ever at i during time X and at j
    during time Y, where Y > X) / (# reps ever at i
    during time X)
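A simplified sketch of the time-binned estimates. It assumes fixed-width bins indexed by `year // width` and jobs given as `(branch, start_year, end_year)` with inclusive years; it does not implement the minimum-headcount rule for sizing bins. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical toy careers: rep -> list of (branch, start_year, end_year).
careers = {
    "r1": [("A", 2000, 2003), ("B", 2006, 2008)],
    "r2": [("A", 2001, 2002), ("B", 2005, 2009)],
    "r3": [("A", 2004, 2004)],
}

def estimate_timebins(careers, width):
    """Return p[(i, X)] and y[(i, X, j, Y)] for fixed-width time bins."""
    n_reps = len(careers)
    at = Counter()  # (branch, bin) -> number of reps there during that bin
    co = Counter()  # (i, X, j, Y) with Y > X -> number of reps
    for jobs in careers.values():
        slots = {
            (branch, x)
            for branch, start, end in jobs
            for x in range(start // width, end // width + 1)
        }
        at.update(slots)
        co.update({(i, x, j, y) for i, x in slots for j, y in slots if y > x})
    p = {key: c / n_reps for key, c in at.items()}
    y_prob = {key: c / at[(key[0], key[1])] for key, c in co.items()}
    return p, y_prob

p, y_prob = estimate_timebins(careers, width=5)
```

With `width=5`, the years 2000 to 2004 fall in bin 400 and 2005 to 2009 in bin 401.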

17
PROB-NOTIME
  • Ignores order of job moves
  • Use original $p_i$
  • $z_{ij}$ = raw number of reps who are at both branches
    i and j during their career
  • Transition probability from i to j
  • = $z_{ij}$ / (# reps ever at i)
  • ≠ $z_{ij}$ / (# reps ever at j)
  • = transition probability from j to i
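The asymmetry between the two directions can be seen in a short sketch. It assumes each career is reduced to the set of branches a rep ever worked at; the data and names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy careers: rep -> branches visited (order ignored).
careers = {
    "r1": ["A", "B"],
    "r2": ["B", "A"],
    "r3": ["A"],
    "r4": ["A", "C"],
}

def estimate_notime(careers):
    """Return (ever_at, z) where z counts reps seen at both branches of a pair."""
    ever_at = Counter()
    z = Counter()
    for branches in careers.values():
        visited = set(branches)
        ever_at.update(visited)
        z.update(frozenset(pair) for pair in combinations(sorted(visited), 2))
    return ever_at, z

def transition(i, j, ever_at, z):
    """Order-free 'transition' probability from i to j: z_ij / (# reps ever at i)."""
    return z[frozenset((i, j))] / ever_at[i]

ever_at, z = estimate_notime(careers)
a_to_b = transition("A", "B", ever_at, z)
b_to_a = transition("B", "A", ever_at, z)
```

The count for the pair is the same in both directions, but the denominators differ, so a small branch whose reps all also worked at a large one gets a high probability in one direction only.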

18
Tribe Size
19
Pairs
20
Commonality of Job Sequence
21
Disclosure Scores
22
Homogeneity and Mobility
25
Discussion
  • JOBS, PROB, PROB-TIME, PROB-NOTIME create tribes
    with higher than average disclosure scores
  • PROB creates more cross zip code results
  • PROB-TIME has higher phi-squared than all others
  • PROB favors large firms

26
Discussion
  • JOBS and YEARS compute larger connected
    components
  • JOBS and PROB find same number of tribes but pick
    different groups as tribes
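One way to read "connected components" here is to treat every pair of reps flagged by a measure as an edge and group reps that are linked directly or indirectly. The sketch below does this with a small union-find; treating tribes as components of the flagged-pair graph is an assumption, and the input pairs are hypothetical.

```python
def connected_components(pairs):
    """Group items into components given an iterable of linked pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Sort for a deterministic order of components.
    return sorted(groups.values(), key=lambda g: sorted(g))

# Hypothetical pairs flagged by some scoring measure.
flagged_pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
tribes = connected_components(flagged_pairs)
```

A measure that flags many loosely related pairs will chain reps together and produce larger components than a stricter one, even when both flag a similar number of pairs.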

27
Conclusions
  • With no explicit knowledge we can discover
  • Job transitions
  • Geography
  • Career track

28
Conclusions
  • Needed
  • Ongoing process
  • Multiple affiliations
  • Arbitrary times
  • Time is a paradox in domain

29
Thanks!
  • Time for
  • Questions
  • Comments
  • Smart Remarks