Visualizing paths as flows in a Sankey diagram

I made an interactive visualization (in Swedish) of the flow of proposals (bills) and decisions in the Swedish Parliament. The visualization combines features of a straightforward Tree (path visualization) with those of a Sankey diagram (flow visualization).

Here's a screenshot, click on it to go to the full version:

I'm quite satisfied with how it turned out, and the remainder of this post is a brief description of what I did and what I'd do differently the next time.

The main idea is to show both the Tree and the Sankey in the same figure, at the same time. I will explain this precisely below, but first I will describe what type of data we are dealing with.

The data: proposal paths

In the Swedish Parliament a proposal is submitted by a member, a party or the government. After submission the proposal is considered by a committee which prepares a simple yes/no question for the parliament to vote on. The vote renders a final decision to reject or accept the proposal.

The data I have are all paths: Proposal source -> Committee -> Decision. There is one path for each proposal, and there are around 50,000 proposals during a 4-year period. Many proposals take the same path. For example, in 2010-2014 the Social Democrats submitted 498 proposals which were considered by the Foreign affairs committee and then rejected. The government submitted 125 proposals which went through the Traffic committee and were accepted, and so on.

The data I visualize are the number of proposals on each path.
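A minimal sketch of this counting step, assuming one record per proposal. The field names (`source`, `committee`, `decision`) and the sample records are assumptions for illustration, not the format of the real dataset.

```javascript
// Hypothetical input: one record per proposal (field names are assumptions).
const proposals = [
  { source: "Government", committee: "Traffic", decision: "Accepted" },
  { source: "Government", committee: "Traffic", decision: "Accepted" },
  { source: "Government", committee: "Traffic", decision: "Rejected" },
  { source: "Social Democrats", committee: "Foreign affairs", decision: "Rejected" },
];

// Count how many proposals follow each distinct path.
function countPaths(proposals) {
  const counts = new Map();
  for (const p of proposals) {
    const nodes = [p.source, p.committee, p.decision];
    const key = JSON.stringify(nodes); // unambiguous key for the whole path
    const entry = counts.get(key) || { nodes, value: 0 };
    entry.value += 1;
    counts.set(key, entry);
  }
  return [...counts.values()];
}

const weightedPaths = countPaths(proposals);
console.log(weightedPaths);
```

Each resulting entry is one weighted path: the list of nodes it visits and the number of proposals that took it.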

Data of this type are weighted paths on a graph, and can therefore represent sequences of events in general. Navigation paths on a website are an important example. Character and word sequences in linguistics, or call stack traces in computing, could also be fun to look at.

In my case with the proposals, the graph is very simple: three "layers" without internal connections, and no loops. However, the main idea presented here should work for other types of graphs, too.

The (false) dichotomy: Visualize paths or flows

For clarity in the following examples I use mock data with fewer nodes and paths than the original. The numbers are made up.

If we visualize all the paths we get a Tree (with multiple roots). Like this:

The "government" node is pre-selected, click on nodes to see their connections.

The Tree shows all the data, and therefore renders many nodes. The number of nodes in the last layer is at most the number of sources $$ \times $$ number of committees $$ \times $$ number of decisions, which is reached when every combination occurs. In the real data this amounts to roughly 400 nodes. The Tree may be a good alternative for smaller datasets, or with additional filtering.
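To make the node count concrete, here is a small sketch that counts Tree nodes per layer, treating every distinct path prefix as one node. The data and the function names (`treeNodesPerLayer`, `lastLayerUpperBound`) are made up for illustration, and all paths are assumed to have the same length.

```javascript
// Made-up weighted paths: [source, committee, decision] plus a proposal count.
const paths = [
  { nodes: ["Government", "Traffic", "Accepted"], value: 12 },
  { nodes: ["Government", "Traffic", "Rejected"], value: 3 },
  { nodes: ["Government", "Finance", "Accepted"], value: 8 },
  { nodes: ["Party A", "Traffic", "Rejected"], value: 20 },
  { nodes: ["Party A", "Finance", "Rejected"], value: 15 },
];

// In the Tree, every distinct path prefix is its own node.
function treeNodesPerLayer(paths) {
  const depth = paths[0].nodes.length;
  const layers = [];
  for (let d = 1; d <= depth; d++) {
    const prefixes = new Set(paths.map((p) => JSON.stringify(p.nodes.slice(0, d))));
    layers.push(prefixes.size);
  }
  return layers;
}

// Upper bound for the last layer: the product of distinct names per layer.
function lastLayerUpperBound(paths) {
  const depth = paths[0].nodes.length;
  let bound = 1;
  for (let d = 0; d < depth; d++) {
    bound *= new Set(paths.map((p) => p.nodes[d])).size;
  }
  return bound;
}

console.log(treeNodesPerLayer(paths));   // [2, 4, 5]
console.log(lastLayerUpperBound(paths)); // 8
```

In this mock data only 5 of the 8 possible combinations occur, so the last layer has 5 nodes.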

On the other hand, if we show the nodes only once and aggregate the paths between them into flows we get a Sankey diagram. Like this:

The "government" node is pre-selected, click on nodes to see their connections.

The Sankey is less cluttered and correctly shows the magnitude of the flow between any two nodes. But aggregating paths into flows destroys information about longer paths: you can see the magnitude of the flow between the Government node and the Traffic committee, but you cannot see how many of those proposals were Accepted or Rejected. Compare to the Tree above, where this path information is present, but illegible as the number of nodes explodes.
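The loss of information can be seen in a short sketch of the aggregation, using made-up data. The function name `aggregateFlows` and the link fields are assumptions for illustration.

```javascript
// Made-up weighted paths: [source, committee, decision] plus a proposal count.
const paths = [
  { nodes: ["Government", "Traffic", "Accepted"], value: 12 },
  { nodes: ["Government", "Traffic", "Rejected"], value: 3 },
  { nodes: ["Government", "Finance", "Accepted"], value: 8 },
  { nodes: ["Party A", "Traffic", "Rejected"], value: 20 },
  { nodes: ["Party A", "Finance", "Rejected"], value: 15 },
];

// Sum path values over each pair of adjacent nodes: one Sankey link per pair.
function aggregateFlows(paths) {
  const links = new Map();
  for (const { nodes, value } of paths) {
    for (let i = 0; i < nodes.length - 1; i++) {
      const key = JSON.stringify([i, nodes[i], nodes[i + 1]]);
      const link = links.get(key) ||
        { layer: i, source: nodes[i], target: nodes[i + 1], value: 0 };
      link.value += value;
      links.set(key, link);
    }
  }
  return [...links.values()];
}

const links = aggregateFlows(paths);
console.log(links);
```

Here the link Government -> Traffic carries 15 proposals and the link Traffic -> Rejected carries 23, but the links alone no longer tell us that only 3 of those 23 came from the Government.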

Resolution: Visualize both paths and flows

My attempt at this problem is to draw the full Tree, but make it look like the Sankey. Think of it as taking the Tree figure above, and carefully arranging the positions of all the paths so that it looks like the Sankey figure. Like this:

The "government" node is pre-selected, click on nodes to see their connections.

The path structure is revealed when the user selects one of the nodes. Selecting a node highlights all paths passing through that node. Therefore we clearly see the flow from the Government node to the Traffic committee, and then also how that flow splits into one part to Accepted and one part to Rejected. By arranging the Tree to look like a Sankey we lift some of the initial cognitive load for the user, while allowing them to browse a richer dataset once they grasp what they are looking at. In my original figure I use this "path Sankey" as an overview. When the user selects a node, all its paths are highlighted and a detail view is triggered to show a list of the relevant proposals.
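The selection logic can be sketched independently of any drawing code. This is an illustrative sketch with made-up data, not the code of the original figure; the function name `highlight` and its return shape are assumptions.

```javascript
// Made-up weighted paths: [source, committee, decision] plus a proposal count.
const paths = [
  { nodes: ["Government", "Traffic", "Accepted"], value: 12 },
  { nodes: ["Government", "Traffic", "Rejected"], value: 3 },
  { nodes: ["Government", "Finance", "Accepted"], value: 8 },
  { nodes: ["Party A", "Traffic", "Rejected"], value: 20 },
  { nodes: ["Party A", "Finance", "Rejected"], value: 15 },
];

// Select every path through the node `name` in the given layer, and report
// how much of each Sankey link those paths account for.
function highlight(paths, layer, name) {
  const selected = paths.filter((p) => p.nodes[layer] === name);
  const flows = new Map();
  for (const { nodes, value } of selected) {
    for (let i = 0; i < nodes.length - 1; i++) {
      const key = JSON.stringify([i, nodes[i], nodes[i + 1]]);
      const flow = flows.get(key) ||
        { layer: i, source: nodes[i], target: nodes[i + 1], highlighted: 0 };
      flow.highlighted += value;
      flows.set(key, flow);
    }
  }
  return { selected, flows: [...flows.values()] };
}

const result = highlight(paths, 0, "Government");
console.log(result.flows);
```

With the Government node selected, the link Traffic -> Rejected is highlighted for 3 proposals only, even though the full link is wider. That partial highlight is the path information a plain Sankey cannot show.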

There is a drawback for which I have not found a solution: arranging the paths to look like the Sankey disrupts the visual continuity of the paths (seen when highlighted). The path, say, Government -> Traffic committee -> Rejected is visually discontinuous at the committee node. This drawback is most obvious with small and moderate datasets, like in these examples. The visual disruption is diminished when there are more nodes and smaller flows, like in the original figure where I think it works all right.

This method works for paths with three steps. If additional layers are shown, the visual "memory" only reaches two steps back. Nevertheless, the two steps are an improvement over the original Sankey's single-step memory.

What should be improved

My implementation of this figure in D3.js works well, but it has a number of flaws which I think should be addressed:

1. Partition both nodes and flows

My implementation partitions the Sankey flows into the parts from the Tree representation, but the nodes are still monolithic.

To fully carry through the thought of "rearranging the Tree into a Sankey", the Sankey nodes should be made by stacking the corresponding small nodes in the Tree. This would allow for more precise highlighting during user interactions.
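One possible reading of this stacking, sketched with made-up data: each Tree node in a layer is a distinct path prefix, and the Sankey node is the stack of all prefixes ending in the same name. The function name `stackSubNodes` and the extents `y0`/`y1` (in proposal counts, not pixels) are assumptions.

```javascript
// Made-up weighted paths: [source, committee, decision] plus a proposal count.
const paths = [
  { nodes: ["Government", "Traffic", "Accepted"], value: 12 },
  { nodes: ["Government", "Traffic", "Rejected"], value: 3 },
  { nodes: ["Government", "Finance", "Accepted"], value: 8 },
  { nodes: ["Party A", "Traffic", "Rejected"], value: 20 },
  { nodes: ["Party A", "Finance", "Rejected"], value: 15 },
];

// Build the Tree nodes of one layer and stack them inside their Sankey node.
function stackSubNodes(paths, layer) {
  // Sum path values per Tree node, i.e. per distinct prefix up to this layer.
  const treeNodes = new Map();
  for (const { nodes, value } of paths) {
    const prefix = nodes.slice(0, layer + 1);
    const key = JSON.stringify(prefix);
    const node = treeNodes.get(key) || { prefix, name: nodes[layer], value: 0 };
    node.value += value;
    treeNodes.set(key, node);
  }
  // Stack Tree nodes that share a name: each gets the extent [y0, y1].
  const filled = new Map(); // Sankey node name -> height used so far
  const stacked = [];
  for (const node of treeNodes.values()) {
    const y0 = filled.get(node.name) || 0;
    const y1 = y0 + node.value;
    filled.set(node.name, y1);
    stacked.push({ ...node, y0, y1 });
  }
  return stacked;
}

const committeeSubNodes = stackSubNodes(paths, 1);
console.log(committeeSubNodes);
```

In this example the Traffic node is made of two sub-nodes: Government/Traffic from 0 to 15 and Party A/Traffic from 15 to 35. Each of them could then be highlighted separately.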

This may accentuate the visual discontinuity discussed above, but I believe it's worth trying.

2. Better rendering

My implementation literally renders the elements of the Tree, rearranged into a Sankey. This unfortunately results in

Lots of <path> elements, which slows down rendering,
Glitches (seams) between the sub-flows making up the whole (depending on SVG renderer, zoom level, etc.), and
Inconsistent appearance of highlighted flows relative to other flows, caused by the z-order of the flows.

I propose to draw the composite Sankey flows as a single <path> and append/remove the <path> elements representing the highlighted path. This reduces the number of SVG elements, removes the seams and makes z-order consistent.
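A sketch of the geometry this would need: one function that builds the `d` attribute for a whole flow, and one that builds it for a highlighted band inside that flow. The function names, the parameter layout and the choice of cubic Bezier curves with control points at the horizontal midpoint are all assumptions, not the code of the original figure.

```javascript
// Outline of a horizontal flow as an SVG path string.
// (x0, s0)-(x0, s1) is the edge at the source node, (x1, t0)-(x1, t1) the
// edge at the target node; all values are in pixels.
function ribbonPath(x0, x1, s0, s1, t0, t1) {
  const xm = (x0 + x1) / 2; // control points sit at the horizontal midpoint
  return (
    `M${x0},${s0}` +
    `C${xm},${s0} ${xm},${t0} ${x1},${t0}` + // top edge, source to target
    `L${x1},${t1}` +
    `C${xm},${t1} ${xm},${s1} ${x0},${s1}` + // bottom edge, back to source
    "Z"
  );
}

// A highlighted band inside a composite flow: same curve, narrower extent.
// `offset` and `width` are measured from the top edge of the flow.
function subRibbonPath(flow, offset, width) {
  return ribbonPath(
    flow.x0, flow.x1,
    flow.s0 + offset, flow.s0 + offset + width,
    flow.t0 + offset, flow.t0 + offset + width
  );
}

// Hypothetical flow, 20 px thick, with a 10 px highlight starting 5 px down.
const flow = { x0: 0, x1: 100, s0: 10, t0: 50, thickness: 20 };
const whole = ribbonPath(flow.x0, flow.x1, flow.s0, flow.s0 + flow.thickness,
                         flow.t0, flow.t0 + flow.thickness);
const band = subRibbonPath(flow, 5, 10);
console.log(whole);
console.log(band);
```

The composite flow would then be a single <path> using `whole`. On selection, a <path> using `band` is appended after it in the document, so it is always drawn on top, and it is removed again on deselection.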

3. More general graphs

My implementation is tailored to the particular data I had in mind. Thus the layout is based on layers of groups of nodes. This makes automatic layout simple and robust, but the code is not flexible enough for many other types of data.

Therefore, the overall positions of nodes and flows should be determined first, like for a Sankey. This layout step may be automated, done by hand, or both. The point is that partitioning these given elements into the sub-flows and sub-nodes necessary for highlighting the underlying path data is a separate concern. It is an algorithmic step which should not require additional input from any designer.
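A sketch of what such a partitioning step could look like, assuming the layout is already given as the top edge of every flow at its source and target side. The names `linkKey`, `partitionLinks`, `sy`, `ty` and `scale` are hypothetical, and the sub-flows are simply stacked in the order the paths are listed.

```javascript
// Key identifying a flow between two adjacent layers.
const linkKey = (layer, source, target) => JSON.stringify([layer, source, target]);

// Split every flow into one sub-flow per path, given the flow positions.
// `layout` maps a link key to { sy, ty }: the top edge of the flow at its
// source and target side, in pixels. `scale` is pixels per proposal.
function partitionLinks(paths, layout, scale) {
  const used = new Map(); // link key -> proposals already placed in that flow
  const subFlows = [];
  for (const { nodes, value } of paths) {
    for (let i = 0; i < nodes.length - 1; i++) {
      const key = linkKey(i, nodes[i], nodes[i + 1]);
      const pos = layout.get(key);
      if (!pos) throw new Error(`No layout position for link ${key}`);
      const before = used.get(key) || 0;
      used.set(key, before + value);
      const offset = before * scale;
      const width = value * scale;
      subFlows.push({
        path: nodes, layer: i,
        s0: pos.sy + offset, s1: pos.sy + offset + width,
        t0: pos.ty + offset, t1: pos.ty + offset + width,
      });
    }
  }
  return subFlows;
}

// Made-up paths and a made-up layout, e.g. the output of any Sankey layout.
const paths = [
  { nodes: ["Government", "Traffic", "Accepted"], value: 12 },
  { nodes: ["Government", "Traffic", "Rejected"], value: 3 },
];
const layout = new Map([
  [linkKey(0, "Government", "Traffic"), { sy: 0, ty: 10 }],
  [linkKey(1, "Traffic", "Accepted"), { sy: 10, ty: 0 }],
  [linkKey(1, "Traffic", "Rejected"), { sy: 34, ty: 50 }],
]);
const subFlows = partitionLinks(paths, layout, 2);
console.log(subFlows);
```

The function only reads positions and never decides them, so the layout can come from an automatic algorithm or from manual placement without changing this step.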

Finally

The code (MIT Licence) is on GitHub.

There's a minimal example on bl.ocks here, and a more advanced example with randomized data on bl.ocks here.

For more code, see the source of this page and of the original interactive visualization.

If you found this post interesting, you will also like a short piece by Elijah Meeks on Sankey diagrams for path visualization at Netflix.

← Back to jonaseinarsson.se | me@jonaseinarsson.se