A graph, a collection of nodes connected by edges, is just data. Whether it's a social network (where nodes are people, and edges are friend relationships), or a decision tree (where nodes are branch criteria or values, and edges decisions), the nature of the graph is easily represented in a data object. It might be represented as a matrix (where rows and columns are nodes, and elements mark whether an edge between them is present) or as a data frame (where each row is an edge, with columns representing the pair of connected nodes).
The trick comes in how you represent a graph visually; there are many different options each with strengths and weaknesses when it comes to interpretation. A graph with many nodes and edges may become an unintelligible hairball without careful arrangement, and including directionality or other attributes of edges or nodes can reveal insights about the data that wouldn't be apparent otherwise. There are many R packages for creating and displaying graphs (igraph is a popular one, and this CRAN task view lists many others) but that's a problem in its own right: an important part of the data exploration process is trying and comparing different visualization options, and the myriad packages and interfaces makes that process difficult for graph data.
Now, there's the new ggraph package, recently published to CRAN by author Thomas Lin Pederson, which promises to make exploring graph data easier. Unlike other graphing packages, ggraph uses the grammar of graphics paradigm of the ggplot2 package, unifying the data structures and attributes associated with graphics. It also includes a wide range of visual representations of graphs — layouts — and makes it easy to switch between them. The basic "mesh" visualization of nodes and edges provides 11 different options for arranging the nodes:
Other types of visualizations are supported, too: hive plots, dendrograms, treemaps, and circle plots, to name just a few. Note that only static graphs are available, though: unlike igraph and some other packages, you can't rearrange the location of the nodes or otherwise manipulate the graphics with a mouse.
For the R programmer, most of the work is done by the ggraph function. It's analagous to the ggplot function, except that you don't provide data for the locations of the nodes; their position is selected by an algorithm. (Similarly, layout choices are automatically made for visualization types other than the mesh.) There are also various themes suited to graphs you can use to style your chart: goodbye gridlines and axes; hello labels, annotations and edge arrows.
The ggraph package is available on CRAN now, and works with R version 2.10 and later. For more on the ggraph package, see the announcement blog post linked below.
Data Imaginist: Announcing ggraph: A grammar of graphics for relational data
Comments
You can follow this conversation by subscribing to the comment feed for this post.