
Title: PG Day Presentation


1
Towards A Community of Machine Learners Through
Learning Online Communities of Practice
PG Day Presentation
Zhili Wu
Supervisor: Dr. Chunhung Li
Co-supervisor: Prof. Jiming Liu
1/Oct/2004, 10/Jan/2005
2
  • Outline
  • 1. Introduction
  • Motivation & Objective
  • Background & Related Topics
  • Tentative Proposals
  • 2. Experimental Study
  • BBS Data Study
  • MATLAB Programming Contest Platform Study
  • 3. Future Work
  • 4. Q&A

3
  • Motivation
  • 1. Social (Interaction) Networks grow from
    cyberspace
  • remarkably fast
  • with large-scale participation and data
  • e.g. Email, Blog, Instant Messaging (IM), and
    the WWW
  • 2. Machine Learning algorithms
  • current trend: highly optimized and tuned
  • perform well in many classification and clustering
    scenarios
  • e.g. kernel machines, ICA, LDA

[Diagram labels: Human; NN, GA, Agent; web mining; any more?; AI]
4
  • Motivation
  • 1. Social (Interaction) Networks grow from
    cyberspace
  • are there more underlying dynamics?
  • how can they be improved so that people can query
    and communicate conveniently?
  • 2. Machine Learning algorithms
  • can they show collective power rather than
    (over-)fitness?
  • can they help (1), and in turn benefit from their
    contribution to (1)?

5
Objective
We witness that social activities enabled by network
(Internet) technology are powerful, massive,
learnable, and going virtual. How can artificial
learners, such as machine learners, progress by
drawing inspiration from the social setting of
online human learning?
6
More Specific Domain
1. Understand social interactions, with an emphasis
on online social interactions, through statistical
and machine learning on data collected from online
communities (of practice).
2. Add more social factors, learned from studying
community data, into machine learners, in the hope
that they can form a community of machine learners.
7
  • More background
  • 1. Machine Learners
  • instances of one or a set of machine learning
    (ML) algorithms
  • 2. Typical ML algorithms do the following:
  • Classification: predict the categories
  • Clustering: find groups/clusters
  • Regression: predict continuous outputs
  • Ranking: give an order, recommend the best
    match
  • Feature selection: find the most relevant descriptors
  • Other: what other human learning can they computerize?

8
  • More background
  • 1. Community of Practice (COP), E. Wenger 1991
  • Social learning that occurs when people who have
    a common interest in some subject or problem
    collaborate to share ideas, find solutions, and
    build innovations.
  • e.g. an apprenticeship, where an employee learns
    on the job
  • 2. COP today is more general and virtual, but
    controversial
  • COP going online (Kimble 2001)
  • (Old) COP disappears (Patricia 2004)
  • Online community vs. online COP
  • e.g. Orkut? Blogs? Wikipedia? Forums?

9
  • More background: how is COP studied?
  • A. Theories of COP derived from WWW studies
  • "Power-Law Distribution of the World Wide Web",
    B. A. Huberman and L. A. Adamic, Science 287, 2115
    (2000).
  • "Strong Regularities in World Wide Web Surfing",
    B. A. Huberman, P. Pirolli, J. Pitkow and R. M.
    Lukose, Science (1998).
  • "Evolutionary Dynamics of the World Wide Web",
    B. A. Huberman and L. A. Adamic, Nature 401, 131
    (1999).
  • "Modeling the Internet's Large-Scale Topology",
    Barabási, PNAS 99, 13382-13386 (2002).
  • B. Specific COP issues
  • "Finding Communities in Linear Time: A Physics
    Approach", F. Wu and B. A. Huberman, Eur. Phys. J.
    B 38, 331-338 (2004).
  • "Email as Spectroscopy: Automated Discovery of
    Community Structure within Organizations",
    J. Tyler, D. Wilkinson and B. A. Huberman, in
    Communities and Technologies (2003).
  • "How to Search a Social Network", L. A. Adamic and
    E. Adar.
  • "Identifying Communities of Practice through
    Ontology Network Analysis", IEEE Intelligent
    Systems (2003).

10
  • More Related Study
  • a. Ensemble Learning
  • A collection of learners whose predictions are
    combined by weighted averaging or voting
  • e.g. bagging, boosting, biting
  • b. Distributed Learning
  • Mainly focused on partitioning data
  • c. Biologically Inspired Learning
  • Neural networks, etc.: not socially inspired
    learning
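The combination step of ensemble learning (weighted voting for labels, weighted averaging for real-valued outputs) can be sketched as follows. This is a minimal illustration, not the method of this study; the helper names `weighted_vote` and `weighted_average` are hypothetical, and the per-learner weights are assumed to be given (for example, from each learner's validation accuracy).

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine class labels from several learners by weighted majority vote.

    predictions: one predicted label per learner.
    weights: one non-negative weight per learner (assumed given).
    """
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # Label with the largest total weight; ties go to the first label seen
    return max(scores, key=scores.get)


def weighted_average(outputs, weights):
    """Combine real-valued outputs (e.g. regression scores) by weighted averaging."""
    total = sum(weights)
    return sum(o * w for o, w in zip(outputs, weights)) / total
```

For instance, three learners predicting `["Job", "Graduate", "Job"]` with weights `[0.2, 0.5, 0.4]` yield "Job", because 0.6 outweighs 0.5 even though the single strongest learner voted "Graduate".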

11
  • Tentative Proposals
  • 1. Find a specific online COP
  • 2. Study the general properties of online COP
  • 3. Improve machine learning performance on
    online COP
  • Operate on multiple views of data
  • Enrich inter-learner communication
  • Role distribution of learners
  • 4. Verify the collective power of learner
    combination
  • 5.

12
  • Ongoing Studies
  • A. Chinese Bulletin Board Systems Study
  • 1. As the startup scope
  • 2. Study the general properties
  • 3. Improve machine learning performance on them
  • Operate on multiple views of data
  • Enrich inter-learner communication
  • Role distribution of learners
  • B. A case study of MATLAB Programming Contest
    Platform
  • 1. To verify the collective power of learner
    combination
  • 2. Observe the dynamics

13
  • Chinese Bulletin Board Systems Study
  • Focus on university BBSs at this stage
  • Relatively topic-focused
  • Large data throughput; compatible platforms
    across many BBSs
  • Accessible thanks to personal experience
  • Less likely to have been studied by others, with
    much room for improvement

14
BBS: an example thread/conversation, Job board
(http://bbs.zsu.edu.cn/bbstdoc?board=Job)
[Screenshot of two posts on the Yat-sen Channel BBS
(bbs.zsu.edu.cn): an original post by redrain (Fri
Dec 10 18:26:41 2004) linking to
http://www.dayoo.com/corp/job/, and the first reply,
by ssky (Fri Dec 10 18:30:51 2004). Annotated parts:
title; the user IP/domain (e.g. FROM bbs.nju.edu.cn,
FROM 192.168.48.35); first replier; title with "Re"
added; content; previous post citation; signature;
source.]
15
  • Data Collection & Processing
  • Each thread (conversation) has many posts
  • Content, Title
  • Author, Ref (the author being
    replied to/referenced)
  • Each conversation, as a document, is word-segmented
    → a document-of-word (frequency) matrix M
  • All user post-reply relations are accumulated over
    all conversations → an author-author matrix R
  • a) 611 conversations, 7/Dec/2004, 729 authors,
    JOB board
  • M: 611 × 7892, R: 729 × 729
  • b) 656 conversations, 9/Oct to 16/Nov, GRADUATE
    board
  • M: 656 × 5644, R: 536 × 536
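A minimal sketch of building M and R from the collected threads. It assumes each post has already been word-segmented and is stored as a dict with hypothetical keys `author`, `ref` (the author replied to, or `None`) and `words`; it also assumes R[i][j] counts replies written by author i to author j, since the slides do not state the orientation. `build_matrices` is a hypothetical helper.

```python
from collections import Counter


def build_matrices(threads):
    """Build the document-of-word matrix M and author-author matrix R.

    threads: list of conversations; each conversation is a list of posts,
    each post a dict with 'author', 'ref' and 'words' (hypothetical format).
    """
    # Index words and authors in first-seen order
    vocab, authors = {}, {}
    for thread in threads:
        for post in thread:
            for name in (post["author"], post["ref"]):
                if name is not None and name not in authors:
                    authors[name] = len(authors)
            for w in post["words"]:
                vocab.setdefault(w, len(vocab))

    # M: one row per conversation (document), one column per word; entries are counts
    M = []
    for thread in threads:
        counts = Counter(w for post in thread for w in post["words"])
        row = [0] * len(vocab)
        for w, c in counts.items():
            row[vocab[w]] = c
        M.append(row)

    # R[i][j]: replies by author i to author j, accumulated over all conversations
    R = [[0] * len(authors) for _ in authors]
    for thread in threads:
        for post in thread:
            if post["ref"] is not None:
                R[authors[post["author"]]][authors[post["ref"]]] += 1
    return M, R, vocab, authors
```

Real BBS text would first need Chinese word segmentation; that step is assumed to have produced the `words` lists.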


16
Graduate School Examination Board: high-frequency
words
Stop words: high-frequency but unimportant words
17
Like stop words, there should be stop authors
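One simple way to pick candidate stop words, and by analogy stop authors, is a document-frequency cut-off: flag any item that appears in too large a fraction of the conversations. This is an assumption for illustration only (the slides do not say how the stop words were chosen); `stop_items` and its `max_df` threshold are hypothetical.

```python
def stop_items(docs, max_df=0.5):
    """Return items (words or authors) found in more than max_df of the documents.

    docs: list of documents, each an iterable of items (words, or author names).
    max_df: hypothetical threshold on the fraction of documents containing an item.
    """
    n = len(docs)
    df = {}
    for doc in docs:
        for item in set(doc):  # count each item at most once per document
            df[item] = df.get(item, 0) + 1
    return sorted(item for item, c in df.items() if c / n > max_df)
```

Passing the list of authors present in each conversation instead of its words gives candidate stop authors, such as accounts that post in nearly every thread.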
18
A modified ranking approach: construct a matrix
from both the author-author in-degree relation
matrix and the matrices related to document
content.
1. author-author post-reply matrix
2. author-document association matrix P
3. document-author association matrix
4. document-document cosine similarity matrix
Recording the presence of authors in each
conversation aims to connect part 1 with part 4.
Time information might also be used to describe
conversation-conversation dependence/reference.
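The slides do not specify which ranking algorithm is run on the combined matrix. The sketch below assumes a PageRank-style power iteration on the block matrix [[R, P], [Pᵀ, S]], taking the document-author matrix to be the transpose of P and S to be the document-document cosine similarity matrix. The function names and the damping factor `d` are hypothetical.

```python
import math


def cosine_matrix(M):
    """Document-document cosine similarity from a document-of-word count matrix M."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in M]
    n = len(M)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(M[i], M[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def block_matrix(R, P, S):
    """Assemble [[R, P], [P^T, S]]: author nodes first, then document nodes."""
    PT = [list(col) for col in zip(*P)]
    top = [r + p for r, p in zip(R, P)]
    bottom = [pt + s for pt, s in zip(PT, S)]
    return top + bottom


def rank(A, d=0.85, iters=100):
    """PageRank-style power iteration; A[i][j] > 0 means node i passes weight to node j."""
    n = len(A)
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for i, row in enumerate(A):
            total = sum(row)
            for j in range(n):
                # Nodes with no outgoing weight spread their score uniformly
                share = row[j] / total if total > 0 else 1.0 / n
                new[j] += d * score[i] * share
        score = new
    return score
```

The first len(R) entries of the result rank authors, the rest rank conversations. Because documents feed score back to the authors present in them, an author can rank high through active conversations and not only through direct replies.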
19
Top 10 authors among 729, by in-degree and
out-degree
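The two top-10 lists can be computed directly from R. A sketch, assuming R[i][j] counts replies from author i to author j, so column sums give in-degree (replies received) and row sums give out-degree (replies written); `top_authors` is a hypothetical helper.

```python
def top_authors(R, names, k=10):
    """Return the top-k authors by in-degree and by out-degree of R.

    Assumes R[i][j] counts replies from author i to author j.
    Ties keep the original author order.
    """
    out_deg = [sum(row) for row in R]
    in_deg = [sum(col) for col in zip(*R)]
    order = range(len(names))
    by_in = sorted(order, key=lambda i: -in_deg[i])[:k]
    by_out = sorted(order, key=lambda i: -out_deg[i])[:k]
    return ([(names[i], in_deg[i]) for i in by_in],
            [(names[i], out_deg[i]) for i in by_out])
```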
20
Typical approach on the document-of-word matrix
Job + Graduate document-of-word matrix, 1267 × 10693
(10-fold CV):
  • Linear kernel: 91.0813
  • Normalized linear kernel: 91.1602
  • Polynomial kernel (degree 2): 52.0126
  • RBF kernel: 55.8011
After removing 1552 stop words:
  • Linear kernel: 90.6867
  • Normalized linear kernel: 91.6338
Based solely on the document-of-author frequency matrix:
  • Linear kernel: 78.5320
[Diagram labels: document-of-term frequency matrix;
document-author association matrix]
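The kernels compared above can be written out as below, together with a contiguous 10-fold split. This is a sketch: "normalized linear kernel" is taken to mean K(x, y) / sqrt(K(x, x) K(y, y)), and the polynomial offset `c` and RBF width `gamma` are hypothetical because the slides give no parameters. The classifier itself is not shown.

```python
import math


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalized_linear(x, y):
    """Linear kernel on unit-length vectors: K(x, y) / sqrt(K(x, x) K(y, y))."""
    denom = math.sqrt(linear(x, x) * linear(y, y))
    return linear(x, y) / denom if denom > 0 else 0.0


def poly2(x, y, c=1.0):
    """Degree-2 polynomial kernel; the offset c is a hypothetical choice."""
    return (linear(x, y) + c) ** 2


def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel; gamma is a hypothetical choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds over range(n).

    Shuffle the documents first in practice so that each fold mixes both boards.
    """
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Normalizing removes the effect of conversation length: a long thread and a short one with the same word proportions get kernel value 1.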
21
Padding with the document-author association matrix
(10-fold CV):
  • Linear kernel: 91.0813, padded: 92.7388
  • Normalized linear kernel: 91.1602
  • Polynomial kernel (degree 2): 52.0126
  • RBF kernel: 55.8011
After removing 1552 stop words:
  • Linear kernel: 90.6867
  • Normalized linear kernel: 91.6338, padded: 94.8698
[Diagram labels: document-author association matrix;
document-of-term frequency matrix]
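Padding can be read as concatenating the two views row by row, so that each conversation is described by its word counts followed by its author-association entries. A minimal sketch under that reading; `pad_features` and its `author_weight` knob are hypothetical.

```python
def pad_features(term_matrix, author_matrix, author_weight=1.0):
    """Append each document's author-association row to its term-frequency row.

    Both matrices must list documents in the same order. author_weight is a
    hypothetical knob for scaling the author view relative to the word view.
    """
    if len(term_matrix) != len(author_matrix):
        raise ValueError("both views must have one row per document")
    return [list(t) + [author_weight * a for a in auth]
            for t, auth in zip(term_matrix, author_matrix)]
```

Any of the kernels above can then be applied to the padded rows unchanged.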
22
  • Observations
  • A. Chinese Bulletin Board Systems Study
  • 1. As the startup scope: data are retrievable and
    manageable
  • 2. Study the general properties: Zipf's law is
    observed, but this is not enough; a better ranking
    scheme is possible
  • 3. Improve machine learning performance on them
  • Operate on multiple views of data: combining
    words with author information improves
    classification
  • Enrich inter-learner communication
  • Role distribution of learners
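Whether word (or author) frequencies follow Zipf's law, f(r) ∝ 1 / r^s, can be checked roughly with a least-squares fit on the log-log rank-frequency plot. This crude estimator is chosen for illustration and is not necessarily the analysis used in the study; `zipf_exponent` is a hypothetical name, and at least two distinct tokens are required.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ C / r**s by least squares on log f versus log r.

    Ordinary least squares on a log-log plot is simple but biased; use it only
    as a quick check. Requires at least two distinct tokens.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

An exponent near 1 matches the classic Zipf curve; the same function applied to the list of post authors tests whether posting activity is similarly skewed.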

23
MATLAB Programming Contest Case Study
Background:
  • A biannual event on an online platform
  • Entries are given real-time evaluations
  • Participants are allowed to build their own solvers
    or modify (tweak) others' submissions
  • Furniture contest: 1270 passing entries, one-week
    duration
These three pictures are from
http://www.mathworks.com/contest/furniture/analysis.html
24
Performance improved gradually, and submissions
formed about six clans
25
  • MATLAB Programming Contest Case Study
  • The six inferred clans have evolutionary meaning
  • To what extent does an entry claim a reference to
    a previous entry?
  • We build a connection matrix based on all
    reference information
  • Assume a directional reference implies a
    bidirectional relation valued 1; otherwise Inf
  • Calculate the shortest distances between each pair
    of submissions
  • Approximate a similarity matrix by applying a
    kernel to the distance matrix
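The connection-matrix, shortest-distance and kernel steps above can be sketched with Floyd-Warshall followed by a Gaussian kernel. The Gaussian form, `sigma`, and the function names are assumptions; the slides only say "kernel operation". Because shortest-path distances are not Euclidean, the result is an approximate similarity matrix and need not be positive semidefinite.

```python
import math


def reference_distances(n, references):
    """All-pairs shortest distances between n submissions.

    references: (i, j) pairs meaning submission i cites submission j.
    Each reference becomes an undirected edge of length 1; absent edges are inf.
    """
    D = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j in references:
        D[i][j] = D[j][i] = 1.0
    # Floyd-Warshall: allow paths through intermediate submission k
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def gaussian_similarity(D, sigma=1.0):
    """Similarity exp(-d^2 / (2 sigma^2)); unreachable pairs (inf) map to 0."""
    return [[math.exp(-(d * d) / (2 * sigma * sigma)) for d in row] for row in D]
```

Floyd-Warshall is O(n³), which is manageable for about 1270 entries; submissions in unconnected reference chains get similarity 0, which is what lets clans separate.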

26
27
The number of functions defined and used in each
entry
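A rough way to obtain these counts from an entry's MATLAB source is a regular expression over `function` declarations. This heuristic is an assumption: it does not skip comments or strings, and `defined_and_used` is a hypothetical helper. A function counts as used when its name appears anywhere other than its own declaration.

```python
import re

# Matches "function name", "function out = name" and "function [a, b] = name"
FUNC_DEF = re.compile(r"^\s*function\s+(?:(?:\[[^\]]*\]|\w+)\s*=\s*)?(\w+)", re.MULTILINE)


def defined_and_used(source):
    """Return (defined, used) function names for one MATLAB entry (heuristic)."""
    defined = FUNC_DEF.findall(source)
    body = FUNC_DEF.sub("", source)  # drop the declared names from declaration lines
    used = [name for name in defined
            if re.search(r"\b" + re.escape(name) + r"\b", body)]
    return defined, used
```

The main solver is usually called by the contest harness rather than from inside the entry, so it typically appears as defined but not used.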

28

Among the 150 functions that appeared, some quickly
faded out, some came later, and some survived
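The fade-out, late-arrival and survival patterns can be summarized per function by its first and last appearance. A sketch, assuming each submission has already been reduced to the set of function names it contains, in submission order; `function_lifetimes` is hypothetical.

```python
def function_lifetimes(entries):
    """Summarize when each function appears across submissions.

    entries: list of sets of function names, one set per submission, in time order.
    Returns {name: (first_index, last_index, number_of_submissions_using_it)}.
    """
    stats = {}
    for t, names in enumerate(entries):
        for name in names:
            first, _, count = stats.get(name, (t, t, 0))
            stats[name] = (first, t, count + 1)
    return stats
```

A function whose last index is small relative to the number of submissions has faded out; one with a large first index came later; one whose last index is near the end survived.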
29

With frequencies taken into account, some
functions are persistently popular, while others can
be seen to decay more quickly
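Taking frequencies into account, the popularity of a function over time can be tracked as the share of submissions in each consecutive window that use it. A flat, high series suggests persistent popularity; a falling one suggests decay. The window size and the helper name `window_frequency` are hypothetical.

```python
def window_frequency(entries, name, window):
    """Share of submissions in each consecutive window that use function `name`.

    entries: list of sets of function names, one set per submission, in time order.
    window: number of consecutive submissions per window (hypothetical choice).
    """
    shares = []
    for start in range(0, len(entries), window):
        chunk = entries[start:start + window]
        shares.append(sum(name in e for e in chunk) / len(chunk))
    return shares
```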
30
Observations
B. A case study of MATLAB Programming Contest Platform
  • 1. To verify the collective power of learner
    combination
  • a. Apparent collective efforts of participants
  • b. Entries are composed of multiple sub-functions
  • 2. Observe the dynamics
  • a. Evolution patterns of entries: six big shifts
  • b. Some functions play active roles; some quickly
    decay
31
  • Summary
  • 1. Introduction
  • Motivation & Objective
  • Background & Related Topics
  • Tentative Proposals
  • 2. Experimental Study
  • BBS Data Study
  • MATLAB Programming Contest Platform Study

32
  • Future work
  • To formulate and specify the objective better
  • Is it too vague or too large?
  • Is it feasible and useful?
  • Build models/frameworks/methodologies
  • Apply to workable scenarios

33
Q&A, and your comments and suggestions