Title: Analysing Social Networks Via the Internet
1. Analysing Social Networks Via the Internet
- Bernie Hogan
- PhD Candidate, Department of Sociology
- Research Coordinator, NetLab
2. As we may think
- "Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them"
- Vannevar Bush, 1945
3. 60 years later
- We have no shortage of associative trails. But they are not confined to information.
- When computer networks link people as well as machines, they become social networks (Wellman et al. 1996)
4. Why do networks matter?
- Google succeeded through a social network algorithm.
- MySpace and Facebook are the largest explicit social networks ever created.
- We can show how the rich get richer: preferential attachment (Barabasi and Albert 1999),
- And how everyone is only six degrees apart (Milgram 1967; Watts 2001).
5. The Oracle of Kevin Bacon: The Original Online Network
The Importance of Being Earnest
Where the Truth Lies
84 Charing Cross Road
A Few Good Men
Mission Impossible II
6. What are networks?
- Relationships between actors
  - Friendships
  - Partnerships
  - Hyperlinks
Plus
- Information about actors
  - People
  - Businesses
  - Webpages
7. Nodes
- Generally constrained to well-defined types.
  - People to people (not to orgs and teams).
- More than one type is included in affiliation networks.
  - Linking people as one set to events as another set.
8. Links can be
- Directed links: arcs (from me to you)
- Undirected links: edges (me and you)
- Valued (I sent 3 messages to you)
- Signed (I like him; I dislike her)
- Multiplex (I link to her blog, know her email and am on her MySpace page)
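A minimal sketch of how these link types might be stored as data, assuming one record per tie. The `Tie` class, its field names and the `multiplexity` helper are hypothetical illustrations, not part of any package mentioned in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    """One tie between two actors (hypothetical record layout)."""
    source: str
    target: str
    directed: bool = True      # True = arc, False = edge
    value: float = 1.0         # e.g. number of messages sent
    sign: int = 1              # +1 = like, -1 = dislike
    relation: str = "generic"  # one layer of a multiplex tie


ties = [
    Tie("me", "you", value=3, relation="message"),        # valued arc
    Tie("me", "him", sign=1, relation="like"),            # positive tie
    Tie("me", "her", sign=-1, relation="like"),           # negative tie
    Tie("me", "her", relation="blog link"),
    Tie("me", "her", relation="email"),
    Tie("me", "her", directed=False, relation="MySpace friend"),
]


def multiplexity(ties, a, b):
    """Count distinct relation types linking a and b, in either direction."""
    return len({t.relation for t in ties if {t.source, t.target} == {a, b}})
```

Counting the distinct relations between a pair is one simple way to quantify a multiplex tie; other definitions are possible.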
9. Some Network Types
Users of a web forum
Subset of political blogs
Friend pages on MySpace
10. Where to find networks online?
Social networking
Email
Social news
Web links
Blogs
Message boards
Instant messengers
Games
11. Networks as data
           To
           A  B  C  D
From   A   -  1  0  0
       B   1  -  0  1
       C   0  1  -  1
       D   0  0  0  -
(Diagonal left blank. The accompanying diagram draws nodes A, B, C and D.)
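The matrix on this slide can be turned into an edge list with a few lines of Python. This is a sketch under one assumption: the three values shown in each row are read as the off-diagonal entries, so the diagonal (self-ties) is set to 0. The function name `to_edge_list` is made up for illustration.

```python
nodes = ["A", "B", "C", "D"]

# Rows are senders ("From"), columns are receivers ("To").
# Assumption: the diagonal is 0, i.e. no actor links to itself.
matrix = [
    [0, 1, 0, 0],  # A -> B
    [1, 0, 0, 1],  # B -> A, B -> D
    [0, 1, 0, 1],  # C -> B, C -> D
    [0, 0, 0, 0],  # D sends no ties
]


def to_edge_list(nodes, matrix):
    """Return (sender, receiver) pairs for every non-zero cell."""
    return [
        (nodes[i], nodes[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


edges = to_edge_list(nodes, matrix)

# Row sums give out-degree; column sums give in-degree.
out_degree = {n: sum(row) for n, row in zip(nodes, matrix)}
in_degree = {n: sum(row[j] for row in matrix) for j, n in enumerate(nodes)}
```

The matrix form is convenient for arithmetic, while the edge list is usually more compact for sparse online networks.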
12. Networks as data II
13. Capturing this data online
- Scraping pages
  - Using scripting languages (Python, Perl)
  - Using scraping software
- APIs (Application Programming Interfaces)
  - Again using scripting languages
  - Out-of-the-box software
  - Online applications
- More on this tomorrow!!
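As a rough illustration of the scraping route, the sketch below pulls hyperlinks out of a page using only the Python standard library and turns them into directed ties from the page to each link target. The sample HTML, the example.com URLs and the names `LinkCollector` and `extract_edges` are all hypothetical; a real scraper would also need to fetch pages and respect the site's terms of use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_edges(page_url, html):
    """Return (page, target) arcs for every link found in the HTML."""
    parser = LinkCollector(page_url)
    parser.feed(html)
    return [(page_url, target) for target in parser.links]


# Made-up page content standing in for a downloaded profile page.
SAMPLE_HTML = """
<html><body>
  <a href="/users/bob">Bob</a>
  <a href="http://example.org/carol">Carol</a>
  <a name="anchor-only">no link here</a>
</body></html>
"""

PAGE = "http://example.com/users/alice"
edges = extract_edges(PAGE, SAMPLE_HTML)
```

Where a site offers an API, it is generally a more stable source than parsing HTML, since page layouts change without notice.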
14. Analysing Data
- Software Applications
  - UCInet: powerful, social-science oriented, quirky interface
  - Pajek: powerful, strange interface, comprehensive
  - Others (Egotistics, NetMiner, Visualyzer, NetWorkBench)
- Software Environments
  - JUNG (Java Universal Network/Graph Framework)
  - R (SNA package)
  - iGraph (Python)
15. Common metrics I: Centrality
- Who is the most connected?
- Simple question, complex answer
Degree: number of links
Betweenness: shortest paths
PageRank: links to high degree
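To make the three measures concrete, here is a small pure-Python sketch on a made-up five-node undirected network. Betweenness uses Brandes' accumulation scheme and PageRank uses plain power iteration with a damping factor of 0.85; both are textbook versions chosen for illustration, not the implementations found in UCInet, Pajek or JUNG. All function names and the example edges are assumptions.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree: number of links per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Betweenness: shortest paths passing through each node (unnormalised)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest node back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration; each tie is treated as two arcs."""
    n = len(adj)
    rank = {v: 1 / n for v in adj}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            targets = nbrs if nbrs else adj   # isolated nodes spread evenly
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
pr = pagerank(adj)
```

In this toy network node `a` leads on all three measures, but on real data the rankings often disagree, which is why the simple question has a complex answer.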
16. Common metrics II: Sub-groups
- Interested in group structure
- Again, many applicable measures
- Components
  - Number of disconnected sets
  - Strong: every node must be reachable from every other node by following arcs
- Community detection
  - See Mark Newman's work (such as the Girvan-Newman algorithm)
- Special Ks: K-shell, K-core, K-plex
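Two of the sub-group measures above are easy to sketch in plain Python: counting components (here treating every tie as undirected) and extracting a k-core by repeatedly pruning nodes with fewer than k remaining ties. The example network and the function names are invented for illustration, and the graph is assumed to have no self-loops. Strong components and community detection need more machinery and are not covered here.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj):
    """Return the disconnected sets of nodes, one set per component."""
    seen = set()
    found = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        stack = [start]
        while stack:                      # depth-first walk from start
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        found.append(comp)
    return found


def k_core(adj, k):
    """Nodes that survive repeated removal of nodes with fewer than k ties."""
    core = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for v in [v for v, nbrs in core.items() if len(nbrs) < k]:
            for w in list(core[v]):
                core[w].discard(v)        # drop v from its neighbours' lists
            del core[v]
            changed = True
    return set(core)


# A triangle (a, b, c) with a pendant node d, plus a separate pair (e, f).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]
adj = build_adjacency(edges)
comps = components(adj)
two_core = k_core(adj, 2)
```

The k-shell of a node is the largest k for which it still belongs to the k-core, so running `k_core` for increasing k gives the shell structure.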
17. World Wide Web K-shells
- http://xavier.informatics.indiana.edu/lanet-vi
18. Community Detection: Political Blogs
- Adamic & Glance. 2004. The Political Blogosphere and the 2004 U.S. Election: Divided They Blog.
19. Visualizing Data
- Applications
  - GUESS: great for tweaking based on attribute data. Technical, but powerful.
  - NetDraw: straightforward, integrates with UCInet
  - Pajek: fast, draws large networks, pretty
  - More coming out every week (see the work of Martin Wattenberg, Danyel Fisher and Fernanda Viegas)
- Environments / Packages
  - JUNG, Prefuse, Piccolo, R (gplot)
20. Visualization Best Practices
- General
  - Do NOT show a graph for graph's sake.
  - Huge networks often give cluttered pictures.
  - De-clutter by trimming to symmetric ties.
- Drawing Nodes
  - Size can often represent log(continuous variable).
  - Tint can represent a categorical or continuous variable.
  - Do not show ego in an egonet.
  - Only use labels on small graphs (n < 50).
- Layout
  - Spring-embedder layouts work nicely.
  - Post-layout touch-ups are possible using bin packing (in GUESS).
Most Important: Be Graph Literate. Otherwise you'll be impressed with the first thing you draw, regardless of its quality.
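Three of these rules of thumb translate directly into small helper functions: trimming to symmetric ties, log-scaling node size, and labelling only small graphs. The sketch below is one possible reading. The function names, the `base_size` parameter and the exact sizing formula are assumptions made for illustration.

```python
import math


def symmetric_ties(arcs):
    """Keep only reciprocated ties, returned as unordered pairs."""
    arc_set = set(arcs)
    return {
        tuple(sorted((u, v)))
        for u, v in arc_set
        if u != v and (v, u) in arc_set
    }


def node_size(value, base_size=5.0):
    """Scale a positive continuous variable logarithmically.

    A value of 1 maps to base_size; the value must be greater than 0.
    """
    return base_size * (1 + math.log(value))


def should_label(n_nodes, limit=50):
    """Only label small graphs (n < 50 by default)."""
    return n_nodes < limit


# Made-up arcs: a<->b and c<->d are reciprocated, the rest are one-way.
arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("b", "d")]
trimmed = symmetric_ties(arcs)
```

Log scaling keeps a handful of very large values from dwarfing every other node on the page.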
21. Visualization Demo: Email Subgroups in JUNG
22. Example - Digg.com
Popular Stories
Stories from Friends
Today's Top Stories
23. Digg: Using networks to Predict the News
- Data gathered in early March
- All Digg users with 7 or more top stories (subset of top 1000 Diggers) as of Feb 27
- Mapped symmetric ties
- Node size is log(stories - 6), brightness is degree.
- Calculated number of ties (for links to top diggers and links to other diggers)
  - In to node: Fans
  - Symmetric: Friends
  - Out from node: Watched
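The tie counts described above could be computed along the following lines. This is a sketch, not the script used for the study: it assumes arcs are (watcher, watched) pairs, and it treats fans, friends and watched as mutually exclusive categories (incoming only, reciprocated, outgoing only), which is one reading of the slide. User names and function names are invented.

```python
import math
from collections import Counter


def tie_counts(arcs):
    """Count fans, friends and watched users for every user.

    arcs: iterable of (watcher, watched) pairs.
    """
    arc_set = {(u, v) for u, v in arcs if u != v}
    counts = {}
    for u, v in arc_set:
        counts.setdefault(u, Counter())
        counts.setdefault(v, Counter())
        if (v, u) in arc_set:
            counts[u]["friends"] += 1    # symmetric tie, counted once per side
        else:
            counts[u]["watched"] += 1    # out from node
            counts[v]["fans"] += 1       # in to node
    return counts


def node_size(top_stories):
    """Node size as log(stories - 6); defined for 7 or more top stories."""
    return math.log(top_stories - 6)


# Made-up example: ann and bob watch each other, cat watches ann,
# and ann watches dan.
arcs = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("ann", "dan")]
counts = tie_counts(arcs)
```

Subtracting 6 before taking the log means a user with exactly 7 top stories, the minimum for inclusion, gets a size of 0, so a drawing would need some minimum radius on top of this.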
25. Regression Output - Predicting Popular Stories
Effect of fans in high places
Very strong models
26. Online networks in Context
Media Multiplexity: There is a positive relationship between the number of ways in which people connect and tie strength (Haythornthwaite 1999)
27. Networks in a pinch
- The number of ties is often the most significant measure.
- Just ask.
  - Specify boundary conditions (e.g. people you have emailed in the past month).
  - Categories help them remember and give you extra data points (e.g. friends / workmates / relatives).
  - With a roster, you can get people to select from a list.
28. Summary
- Network analysis: because sociology wasn't nerdy enough already.
- Involves a disparate suite of programs for capture, analysis and visualization.
- Compelling visual imagery - maps of relationships.
- Strong explanatory power in online spaces.
- A host of meaningful metrics to choose from.
- Sometimes, the number of ties is enough.
29. Many Thanks
- Bernie Hogan
- bernie.hogan_at_utoronto.ca
- PhD Candidate, Department of Sociology
- Research Coordinator, NetLab
- Graduate Fellow, Knowledge Media Design Institute
- University of Toronto
- P.S. Ask me about my scripts and tools