In many cases, the positioning of nodes in a graph visualization carries no useful information. We should therefore always check whether this applies to our data, so that we can:
- fix the problem (using external data or graphical variables, or by creating a hierarchy or applying storytelling or exploratory mechanisms),
- or use an alternative method to display network data.
What are network data
Network data are a set of nodes that represent some entities (analytical units) and their connections. Such data come either in the form of tables of nodes and edges or as a matrix. This form of data has already shown its analytical potential in various areas ranging from human societies through transportation to words in texts.
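To make the two data forms concrete, here is a minimal sketch that turns a table of edges into a matrix. The names and connections are made-up toy data, and the helper `to_adjacency_matrix` is a hypothetical name used only for illustration; it assumes an undirected, unweighted network.

```python
# Hypothetical toy data: a node table and an edge table.
nodes = ["Anna", "Ben", "Cara", "Dan"]
edges = [("Anna", "Ben"), ("Ben", "Cara"), ("Anna", "Cara")]


def to_adjacency_matrix(nodes, edges):
    """Convert an undirected edge list into a symmetric 0/1 matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix


matrix = to_adjacency_matrix(nodes, edges)
for name, row in zip(nodes, matrix):
    print(f"{name:>5} {row}")
```

Both forms hold the same information. "Dan" has no connections, so his row and column contain only zeros.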
What is a graph visualization
Graph visualization is the most common way to visually represent network data. The nodes are turned into circles (or point symbols) and their connections presented as lines (straight or curved) connecting them. Then, the position of each node is calculated on the basis of selected layout algorithms and forces. This should result in a clean visual representation allowing us to both identify singular nodes and spot and rationalize the distributions and trends in the dataset.
So where is the problem with graph visualization?
The algorithms mostly position nodes that share an edge closer to one another than nodes without a common edge. In itself, the idea makes a lot of sense: it attempts to simulate their natural allocation, so the user reads the message as "these two nodes are somehow interconnected". This approach is covered by the proximity rule from gestalt theory, but, as a geographer, I see the potential to link it to the first law of geography defined by Tobler (1970): "Everything is related to everything else, but near things are more related than distant things". So, in an ideal situation, nodes are automatically placed into positions that allow the user to simply group them visually into clusters on the basis of their proximity, and thereby deliver new knowledge about the network structure that would not be available without such a representation. Unfortunately, in most cases, this is not possible - many network datasets are so complex that their graph visualization looks like a "hairball", where the nodes can no longer be positioned according to the proximity rule.
When NOT to use graph visualizations
There are several occasions when graph visualization should probably not be used - i.e., when the nodes end up in positions we cannot take advantage of (for more information, see e.g. Lanum (2016)):
- There are better data forms for describing the problem.
In many cases, creating nodes and edges from everything does not make sense.
- No structural characteristics are shown.
No additional knowledge can be gained from nodes positioned into a graph layout – no interesting pattern can be identified.
- There are too many connections.
If everything is connected to everything else, the graph visualization will probably end up in the form of a hairball.
- There are many nodes without any connections.
This would result in a large subset of nodes floating “randomly” in space.
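Two of the warning signs above, too many connections and too many unconnected nodes, can be checked with simple numbers before drawing anything. The sketch below is one possible way to do that, assuming an undirected network without self-loops. The function name `graph_diagnostics` and the toy data are hypothetical, and which values count as "too dense" or "too many isolates" remains a judgment call for your dataset.

```python
def graph_diagnostics(nodes, edges):
    """Return (edge density, share of isolated nodes) for an undirected graph."""
    n = len(nodes)
    if n == 0:
        return 0.0, 0.0
    # Ignore self-loops and duplicate edges such as (a, b) and (b, a).
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    possible = n * (n - 1) / 2  # number of node pairs that could be linked
    density = len(unique_edges) / possible if possible else 0.0
    connected = set()
    for edge in unique_edges:
        connected.update(edge)
    isolated_share = sum(1 for node in nodes if node not in connected) / n
    return density, isolated_share


# Hypothetical toy data: a triangle plus two unconnected nodes.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
density, isolated_share = graph_diagnostics(nodes, edges)
print(f"density={density:.2f}, isolated share={isolated_share:.2f}")
```

A density close to 1 hints at a hairball, while a high isolated share hints at many nodes floating without any structure around them.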
Fixing the problem within a graph visualization...
The first option that should help to solve the problem of wrong or misleading node placement is to customize the graph visualization or adjust the source dataset. These are the approaches you could consider:
- Place nodes with external data.
If possible and beneficial, geocode your nodes and bind them to geographic coordinates, then visualize them with a base map. Similarly, you can use any other attribute that may assist you in placing the nodes in a position that would make more sense – e.g., place nodes on a scatter plot. Another option is to set initial coordinates of nodes with a pre-grouping method.
- Use hierarchy.
A good way to display complex data is to create higher-level nodes that put first-order nodes into groups, and then apply graph visualization to display the relations between these groups. This approach does not necessarily mean losing the internal structure of the groups; it can be combined with a different visualization method displaying intra-group values and relations.
- Use graphical variables to “carry the knowledge”.
This is, of course, a must if you want to both make the output visually pleasing and add information to the chart. Use size for a centrality value, color for any grouping you have in mind, hide the labels of nodes that do not reach a certain threshold…
- Draw your own graph instead of relying on predefined algorithms.
You know what you want to show, but the algorithms don’t place the nodes well? You can draw your own, as you are the one who knows the nature of the dataset best.
- Apply storytelling and exploratory techniques.
If it is still a mess, try adding labels, tooltips, interactions, animations, new views, and filters to guide the users, or let them play around and find the important knowledge themselves.
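As an illustration of the hierarchy approach from the list above, the following sketch collapses first-order nodes into groups and counts the connections between and within those groups. The grouping `group_of`, the edge list, and the function name `collapse_to_groups` are all hypothetical. In practice the groups could come from an attribute in your data or from a clustering step.

```python
from collections import Counter


def collapse_to_groups(edges, group_of):
    """Aggregate node-level edges into group-level edge counts."""
    between = Counter()  # edges linking two different groups
    within = Counter()   # edges that stay inside one group
    for source, target in edges:
        g_source, g_target = group_of[source], group_of[target]
        if g_source == g_target:
            within[g_source] += 1
        else:
            # Sort the pair so (X, Y) and (Y, X) count as the same link.
            between[tuple(sorted((g_source, g_target)))] += 1
    return between, within


# Hypothetical toy data: five nodes assigned to three groups.
group_of = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Z"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
between, within = collapse_to_groups(edges, group_of)
print(dict(between))
print(dict(within))
```

The `between` counts can be drawn as a much smaller graph of groups, with the count mapped to line width, while the `within` counts keep a trace of the internal structure for a separate view.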
… or go for alternatives
In many cases, graph visualization is neither the only nor the right way to display your data. These are some interesting alternatives you may consider:
- Any other statistical tools and methods, e.g., a histogram or bar chart.
This is probably the simplest option. If you do not want to annoy users with a massive graph they will probably get lost in, just calculate some statistics, sort or group your nodes, and find a simple way to convey the most important message.
- Chord diagram / Radial network visualization / Circle plot.
There are many names and extensions for this type of chart. The idea is to place nodes in a circle and let their edges create a visual pattern in the middle.
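The circular placement behind these charts is simple enough to compute by hand. The sketch below is a minimal version that spaces the nodes evenly on a circle. The function name `circle_layout` and the sample nodes are hypothetical, and real chord diagram tools add ordering, grouping, and curved edges on top of this idea.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    if not nodes:
        return {}
    step = 2 * math.pi / len(nodes)  # angle between neighbouring nodes
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(nodes)
    }


# Hypothetical toy data.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "d"), ("a", "b")]
positions = circle_layout(nodes)

# Each edge becomes a line segment (a chord) between two points on the circle.
for source, target in edges:
    (x1, y1), (x2, y2) = positions[source], positions[target]
    print(f"{source}-{target}: ({x1:.2f}, {y1:.2f}) -> ({x2:.2f}, {y2:.2f})")
```

Because node positions no longer depend on the connections, the order of nodes around the circle becomes the main design decision: placing nodes from the same group next to each other usually makes the pattern in the middle easier to read.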