Title: Large-Scale Network Analysis with the Boost Graph Libraries
1. Large-Scale Network Analysis with the Boost Graph Libraries
- Douglas Gregor
- Open Systems Lab
- Indiana University
- dgregor_at_osl.iu.edu
2. What are the BGLs?
- A collection of libraries for computation on graphs/networks.
- Graph data structures
- Graph algorithms
- Graph input/output
- Common design
- Flexibility/customizability throughout
- Obsessed with performance
- Common interfaces throughout the collection
- All open source, freely available online
3. The BGL Family
- The Original (sequential) BGL
- BGL-Python
- The Parallel BGL
- Parallel BGL-Python
4. The Original BGL
- The largest and most mature BGL
- 7 years of research and development
- Many users, contributors outside of the OSL
- Steadily evolving
- Written in C++
- Generic
- Highly customizable
- Efficient (both storage and execution)
5. BGL Graph Data Structures
- Graphs
- adjacency_list: highly configurable with user-specified containers for vertices and edges
- adjacency_matrix
- compressed_sparse_row
- Adaptors
- subgraphs, filtered graphs, reverse graphs
- LEDA and Stanford GraphBase
- Or, use your own
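To make the compressed sparse row idea concrete, here is a minimal plain-Python sketch. It does not use the BGL or BGL-Python; the names build_csr and out_neighbors are hypothetical, and the layout shown (one offsets array plus one targets array) is the textbook form, assumed here for illustration rather than taken from the library.

```python
from typing import List, Sequence, Tuple


def build_csr(num_vertices: int,
              edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Pack a directed edge list into (row_start, targets) arrays."""
    # Count the out-edges of each source vertex, shifted by one slot.
    row_start = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_start[src + 1] += 1
    # Prefix-sum the counts: row_start[v] becomes the offset of v's first edge.
    for v in range(num_vertices):
        row_start[v + 1] += row_start[v]
    # Drop each target into the next free slot of its source's row.
    next_slot = row_start[:-1].copy()
    targets = [0] * len(edges)
    for src, dst in edges:
        targets[next_slot[src]] = dst
        next_slot[src] += 1
    return row_start, targets


def out_neighbors(row_start: List[int], targets: List[int], v: int) -> List[int]:
    """Out-neighbours of v are one contiguous slice of the targets array."""
    return targets[row_start[v]:row_start[v + 1]]


row_start, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(row_start, targets, out_neighbors(row_start, targets, 0))
```

The trade-off this form makes is compact storage and fast neighbour scans in exchange for a graph that is awkward to modify after construction.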
6. Original BGL Algorithms
- Searches (breadth-first, depth-first, A*)
- Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
- All-pairs shortest paths (Johnson, Floyd-Warshall)
- Minimum spanning tree (Kruskal, Prim)
- Components (connected, strongly connected, biconnected)
- Maximum cardinality matching
- Max-flow (Edmonds-Karp, push-relabel)
- Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
- Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
- Betweenness centrality
- PageRank
- Isomorphism
- Vertex coloring
- Transitive closure
- Dominator tree
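As one example of what an algorithm from this list computes, the following is a small stand-alone sketch of Dijkstra's single-source shortest paths using only the Python standard library. It is not the BGL implementation; the function name dijkstra, the adjacency-dictionary input, and the sample graph are assumptions made for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source; adj maps vertex -> [(neighbour, weight)].

    Edge weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry: u was already settled more cheaply
        for v, w in adj.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical sample graph.
roads = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 3)],
    "c": [("b", 2), ("d", 7)],
}
distances = dijkstra(roads, "a")
print(distances)
```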
7. Task: Biconnected Components
[Figure: Input Graph and Output Graph]
Articulation points: B G A
8. Define a Graph Type
- Determine vertex/edge properties
    struct Vertex { string name; };
    struct Edge { int bicomponent; };
- Determine the graph type
    typedef adjacency_list</*EdgeListS*/ vecS,
                           /*VertexListS*/ vecS,
                           /*DirectedS*/ undirectedS,
                           /*VertexProperty*/ Vertex,
                           /*EdgeProperty*/ Edge> Graph;
9. Read in a GraphViz DOT File
- Build an empty graph
    Graph g;
- Map vertex properties
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));
- Read in the GraphViz graph
    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);
10. Run Biconnected Components
- Keep track of the articulation points
    vector<Graph::vertex_descriptor> art_points;
- Compute biconnected components
    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));
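For readers who want to see what the articulation-point part of this call computes, here is a plain-Python sketch of the classic depth-first "low point" technique. It is not the BGL code and does not label edges with bicomponent numbers; the function name articulation_points and the sample graph are hypothetical, and the input is assumed to be a simple undirected graph given as an adjacency dictionary.

```python
def articulation_points(adj):
    """Return the set of articulation points of a simple undirected graph.

    adj maps each vertex to a list of its neighbours.
    """
    disc = {}      # discovery time of each visited vertex
    low = {}       # earliest discovery time reachable from the vertex's subtree
    found = set()
    clock = [0]

    def visit(u, parent):
        disc[u] = low[u] = clock[0]
        clock[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                visit(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a cut vertex if v's subtree cannot climb above u.
                if parent is not None and low[v] >= disc[u]:
                    found.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge to an ancestor
        # The DFS root is a cut vertex only if it has two or more tree children.
        if parent is None and children > 1:
            found.add(u)

    for start in adj:
        if start not in disc:
            visit(start, None)
    return found


# Hypothetical graph: two triangles sharing C, plus a pendant vertex F on E.
sample = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D", "F"],
    "F": ["E"],
}
cut_vertices = articulation_points(sample)
print(sorted(cut_vertices))
```

The recursion keeps the sketch short; on very deep graphs it would hit Python's recursion limit, so an explicit stack would be needed there.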
11. Output results
- Attach bicomponent number to the label property of edges
    dyn.property("label", get(&Edge::bicomponent, g));
- Write results to another GraphViz file
    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);
- Show articulation points
    cout << "Articulation points: ";
    for (int i = 0; i < art_points.size(); ++i)
      cout << g[art_points[i]].name << ' ';
12. Task: Biconnected Components
[Figure: Input Graph and Output Graph]
Articulation points: B G A
13. Original BGL Summary
- The original BGL is large, stable, efficient
- Lots of algorithms, graph types
- Peer-reviewed code with many users, nightly regression testing, etc.
- Performance comparable to FORTRAN.
- Who should use the BGL?
- Programmers comfortable with C++
- Users with graph sizes from tens of vertices to millions of vertices
14. BGL-Python
- Python is ideal for rapid prototyping
- It's a scripting language (no compiler)
- Dynamically typed means less typing for you
- Easy to use: you already know Python
- BGL-Python provides access to the BGL from within Python
- Similar interfaces to C++ BGL
- Easier to learn than C++
- Great for scripting, GUI applications
- help(bgl.dijkstra_shortest_paths)
15. Example: Biconnected Components
    import boost.graph as bgl  # Pull in the BGL bindings

    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map(int)
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    print "Articulation points: ",
    node_id = g.vertex_properties['node_id']
    for v in art_points:
        print node_id[v], ' ',
    print ""
16. Wrapping the BGL in Python
- BGL-Python is not a
- port
- reimplementation
- BGL-Python wraps the C++ BGL
- Python calls translate to C++ calls
- C++ can call back into Python
- Most of the speed of C++
- Most of the flexibility of Python
17. Performance: Shortest Paths
18. BGL-Python Summary
- BGL-Python is all about tradeoffs
- More gradual learning curve
- Faster time-to-solution
- Lower performance
- Our typical approach
- Prototype in Python to get your ideas down
- Port to C++ when performance matters
20. The Parallel BGL
- A version of the C++ BGL for computational clusters
- Distributed memory for huge graphs
- Parallel processing for improved performance
- An active research project
- Closely related to the original BGL
- Parallelizing BGL programs should be easy
21. Parallel BGL: Distributed Graphs
[Figure: a graph distributed across 3 processors.]
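The slide does not say how vertices are assigned to processors, so the following plain-Python sketch simply assumes a block distribution: contiguous ranges of vertex ids go to each process, and every edge is stored with the owner of its source vertex. The names block_owner, partition, and cut_edges are hypothetical and are not Parallel BGL APIs.

```python
def block_owner(vertex, num_vertices, num_procs):
    """Rank that owns a vertex under an (assumed) block distribution."""
    block = -(-num_vertices // num_procs)  # ceiling division
    return vertex // block


def partition(num_vertices, edges, num_procs):
    """Split vertices and edges into one local piece per process."""
    local = [{"vertices": [], "edges": []} for _ in range(num_procs)]
    for v in range(num_vertices):
        local[block_owner(v, num_vertices, num_procs)]["vertices"].append(v)
    for src, dst in edges:
        # Assumption: an edge lives on the process that owns its source.
        local[block_owner(src, num_vertices, num_procs)]["edges"].append((src, dst))
    return local


def cut_edges(num_vertices, edges, num_procs):
    """Edges whose endpoints live on different processes."""
    return [
        (s, d) for s, d in edges
        if block_owner(s, num_vertices, num_procs)
        != block_owner(d, num_vertices, num_procs)
    ]


# Hypothetical 7-vertex graph spread over 3 processes.
edges = [(0, 1), (1, 3), (3, 4), (5, 6), (6, 0)]
pieces = partition(7, edges, 3)
remote = cut_edges(7, edges, 3)
print(pieces)
print(remote)
```

The cut edges are the ones that would require communication between processes, which is why the choice of distribution matters for performance.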
22. Parallel Graph Algorithms
- Breadth-first search
- Eager Dijkstra's single-source shortest paths
- Crauser et al. single-source shortest paths
- Depth-first search
- Minimum spanning tree (Boruvka, Dehne & Götz)
- Connected components
- Strongly connected components
- Biconnected components
- PageRank
- Graph coloring
- Fruchterman-Reingold layout
- Max-flow (Dinic's)
23. Performance: Sparse graphs
24. Scalability (547k vertices/node)
Up to 70M vertices, 1B edges, small-world graph
25. Performance vs. CGMgraph
96k vertices, 10M edges, Erdos-Renyi
[Figure annotations: 17x, 30x]
26. Parallel BGL Summary
- The Parallel BGL is built for huge graphs
- Millions to hundreds of millions of nodes
- Distributed-memory parallel processing on clusters
- Future work will permit larger graphs
- Parallel programming has a learning curve
- Parallel graph algorithms much harder to write
- Distributed graph manipulation can be tricky
- Parallel BGL is an active research library
27. Distributed Graph Layout
28. Parallel BGL in Python
- Preliminary support for the Parallel BGL in Python
- Just import boost.graph.distributed
- Similar interface to sequential BGL-Python
- Several options for usage with MPI
- Straight MPI: mpirun -np 2 python script.py
- pyMPI allows interactive use of the interpreter
- Initially used to prototype our distributed Fruchterman-Reingold implementation.
29. Porting for Performance
30. Which BGL is Right for You?
- Is any BGL right for you?
- Depends on how large your networks are
- Up to 1/2 million vertices, any BGL will do
- C++ BGL can push to a couple million vertices
- For tens of millions or larger, Parallel BGL only
- Other considerations
- You can prototype in Python, port to C++
- Algorithm authors might prefer the original BGL
- Parallelism is very hard to manage
31. Conclusion
- The Boost Graph Library family is a collection of full-featured graph libraries
- All are flexible, customizable, efficient
- Easy to port from Python to C++
- Can port from sequential to parallel
- Always growing, improving
- Is one of the BGLs right for you?
- A typical build or buy decision
32. For More Information
- (Original) Boost Graph Library: http://www.boost.org/libs/graph/doc
- Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl
- Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python
- Contact us!
- Douglas Gregor <dgregor_at_osl.iu.edu>
- Andrew Lumsdaine <lums_at_osl.iu.edu>
33. Other BGL Variants
- QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp
- Ruby Graph Library: http://rubyforge.org/projects/rgl/
- Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/
- RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
- Disclaimer: These are all separate projects. We do not maintain them.
34. Comparative Performance