First International Conference on

Description:

Structural Link Analysis from User Profiles and Friends Networks: ... LiveJournal Topology [1]: Tools and Security Model. LJMindMap.com 2004 mcfnord ...

1
Structural Link Analysis from User Profiles and
Friends Networks: A Feature Construction Approach
  • William H. Hsu, Joseph Lancaster, Martin S. R.
    Paradesi, Tim Weninger
  • Monday, 26 March 2007
  • Laboratory for Knowledge Discovery in Databases
  • Kansas State University
  • http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt

2
Link Analysis in Social Networks: The K-State
Corpus
3
Outline
  • Background, Related Work and Rationale
  • Technical Objective: Link Mining in Social
    Networks
  • Methodology: Graph Feature Extraction
  • Experimental Results: K-State LJMiner Corpus
  • Continuing Work: Statistical Relational Models

4
Problem Statement: Link Mining in Social Networks
  • Problem Definition
  • Given: records of users of weblog or social
    network service
  • Discover:
  • Features of entities: users, communities
  • Relationships: friendship, membership,
    moderatorship
  • Explanations and predictions for relationships
  • Goals
  • Boost precision and recall of link existence
    prediction
  • Find relevant features
  • Significance: Recommendations (Friendship,
    Membership)

5
Related Work: Link Mining
  • Getoor and Diehl (2005) - Graphical model
    representations of link structure
  • Ketkar et al. (2005) - Data mining techniques vs
    graph-based representation
  • Sarkar & Moore (2005) - Change in link structure
    across discrete time steps
  • Popescul & Ungar (2003) - ER model to predict
    links
  • Hill (2003), Bhattacharya & Getoor (2004) -
    Statistical Relational Learning to resolve
    identity uncertainty
  • Resig et al. (2004) - Predicting IM online times
    using friends graph degree
  • McCallum et al. (2005) - Inferring roles and
    topic categories based on link analysis

6
Rationale
  • Limitations of Current State of the Art
  • Do not take graph features into account
  • Limited ability to select, extract features
  • Novel Contribution: Link Mining System
  • Extracts, computes features of network model
  • Towards dependent types for relational link
    mining
  • Rationale
  • Desired functionality: infer new links from old
  • Evaluation: precision, recall for link existence
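
The evaluation criterion named on this slide, precision and recall for link existence, can be illustrated with a small sketch. It assumes links are directed (source, target) pairs; the function name and the sample user names are hypothetical and do not come from the presentation.

```python
def precision_recall(predicted_links, actual_links):
    """Return (precision, recall) for predicted directed links.

    Both arguments are iterables of (source, target) pairs.
    """
    predicted = set(predicted_links)
    actual = set(actual_links)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted links that really exist.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of existing links that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Illustrative data only.
predicted = {("alice", "bob"), ("alice", "carol"), ("bob", "dave")}
actual = {("alice", "bob"), ("bob", "dave"), ("carol", "dave"), ("dave", "alice")}
p, r = precision_recall(predicted, actual)
```

Here two of the three predicted links exist and two of the four existing links are recovered, so precision is 2/3 and recall is 1/2.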

7
Outline
  • Background, Related Work and Rationale
  • Technical Objective: Link Mining in Social
    Networks
  • Methodology: Graph Feature Extraction
  • Experimental Results: K-State LJMiner Corpus
  • Continuing Work: Statistical Relational Models

8
K-State Test Bed: LJMiner Corpus
9
LiveJournal Topology 1: Tools and Security Model
10
LiveJournal Topology 2: Definitions
11
Outline
  • Background, Related Work and Rationale
  • Technical Objective: Link Mining in Social
    Networks
  • Methodology: Graph Feature Extraction
  • Experimental Results: K-State LJMiner Corpus
  • Continuing Work: Statistical Relational Models

12
Graph Features 1: Node, Pair, Link-Dependent
Node-Dependent: Features specific to one node
(vertex) within candidate pair
  • Indegree (v): Target popularity
  • Indegree (u): Source popularity
  • Outdegree (u): Source fertility
  • Outdegree (v): Target fertility
Pair-Dependent: Features specific to one
candidate pair of nodes (vertices)
Link-Dependent: Features specific to one link
(edge) in directed graph
13
Graph Features 2: Node and Pair Features in
LJMiner
14
LJCrawler
  • System Design
  • Data acquisition: client, injector, parser
  • Ancillary issues
  • Multi-threading
  • Distribution
  • Storage
  • Analytical postprocessing: LJClipper, LJStats
  • Distinguishing features of LJCrawler
  • Results
  • 200 users/second maximum, 5 users/second allowed
  • Approximately 2 million pages crawled
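
The gap between the 200 users/second the crawler could reach and the 5 users/second it was allowed implies some form of request throttling. The sketch below shows one simple way to space requests; it is an assumption for illustration, not LJCrawler's actual mechanism, and the Throttle class and user names are hypothetical.

```python
import time


class Throttle:
    """Space successive calls at least 1/rate_per_second seconds apart."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self.clock = clock   # injectable for testing
        self.sleep = sleep   # injectable for testing
        self.next_allowed = None

    def wait(self):
        """Block until the next request is permitted."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Allow at most 5 requests per second, as on the slide.
throttle = Throttle(rate_per_second=5)
for user in ["user1", "user2", "user3"]:
    throttle.wait()
    # A real crawler would fetch and parse the user's profile page here.
```

The clock and sleep functions are passed in so the spacing logic can be checked without real delays.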

15
Outline
  • Background, Related Work and Rationale
  • Technical Objective: Link Mining in Social
    Networks
  • Methodology: Graph Feature Extraction
  • Experimental Results: K-State LJMiner Corpus
  • Continuing Work: Statistical Relational Models

16
Network Statistics: Graph Distance
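
Graph distance between two users can be read as the length of the shortest directed path between them in the friends graph. The sketch below computes it with breadth-first search; this reading of "graph distance", the function name, and the sample graph are assumptions, since the slide body is not part of the transcript.

```python
from collections import defaultdict, deque


def graph_distance(edges, source, target):
    """Shortest directed path length from source to target, or None if unreachable."""
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)

    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None


# Illustrative friends graph.
friends = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
d_forward = graph_distance(friends, "a", "d")
d_backward = graph_distance(friends, "d", "a")
```

Because the graph is directed, the distance from a to d (two hops via c) differs from the distance from d to a, which is undefined here because d lists no friends.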
17
Interpretation of Results
  • 941-node graph (Hsu et al., 2006): LJCrawler v1
    output
  • 1000-4000 node graphs: LJCrawler v2 output

18
Outline
  • Background, Related Work and Rationale
  • Technical Objective: Link Mining in Social
    Networks
  • Methodology: Graph Feature Extraction
  • Experimental Results: K-State LJMiner Corpus
  • Continuing Work: Statistical Relational Models

19
Results
  • Establishing an Interdisciplinary Research
    Initiative
  • K-State / KU / UNL collaboration
  • Resources: Linguistic Data Consortium
  • NIST evaluations
  • Involving End Users of Machine Translation
  • Document users
  • Machine learning, data mining, info extraction
    researchers
  • Novel Applications
  • Social networks and collaborative recommendation
  • Gisting and beyond

20
Continuing Work
  • Information Extraction and Intelligent IR
  • Learning models for IE ontologies
  • Latent semantic analysis
  • Machine Learning
  • Natural language learning
  • Time series learning and understanding
  • Relational and first-order models
  • Automated Reasoning
  • Probabilistic
  • Case-based and analogical
  • Data Mining and Warehousing
  • Grid Computing

22
Acknowledgements
  • K-State Lab for Knowledge Discovery in Databases
  • Vikas Bahirwani
  • Tejaswi Pydimarri
  • Andrew King
  • Social Networks, Graph Theory, Graph Algorithms
  • Kirsten Hildrum (IBM T. J. Watson Labs)
  • Todd Easton (K-State, Industrial and
    Manufacturing Systems Engineering)
  • Machine Learning
  • Dan Roth, Cinda Heeren, Jiawei Han (University of
    Illinois at Urbana-Champaign)
  • AnHai Doan (University of Wisconsin-Madison)

23
Questions and Discussion