Visualizing Connections Using Chord Diagrams in Python

It is often helpful to visualize the connections between categorical data points. Doing so can reveal two types that overlap heavily or types that are typically associated with each other.

There are a few different ways we can visualize this, but the chord diagram is one I have started using for data sets where the categorical variable has a limited number of options.

What is a Chord Diagram?

A chord diagram arranges all the possible options for a categorical value around a circle and shows the number of connections between each pair of options. This makes it easy to see at a glance which options are strongly linked and which are barely linked at all.

For example, if you have a dataset of posts with different tags or movies that are in multiple categories, using a chord diagram would be a helpful way to identify data points with either a high number of connections or very few connections compared to the average.
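
As a minimal sketch of where those connection counts come from, assume a handful of made-up posts with tags (the `posts` data and the variable names are hypothetical, not from a real dataset). Counting how often each pair of tags appears together might look like this:

```python
from collections import Counter
from itertools import combinations

# Hypothetical example data: each post has a set of tags.
posts = [
    {"python", "pandas"},
    {"python", "pandas", "visualization"},
    {"python", "visualization"},
    {"sql"},
]

# Count every unordered pair of tags that appears on the same post.
# Sorting each post's tags first means ("pandas", "python") and
# ("python", "pandas") are counted as the same pair.
pair_counts = Counter()
for tags in posts:
    for pair in combinations(sorted(tags), 2):
        pair_counts[pair] += 1

print(pair_counts)
```

Each pair and its count would become one chord in the diagram, while a tag like `sql` that never shares a post with another tag has no connections at all.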

Creating a Chord Diagram in Python

We can use the Python library HoloViews to create our diagram. HoloViews is a library that builds on an underlying visualization library, such as Matplotlib. It offers a variety of graphs, and you can switch which backend it extends (such as Matplotlib or Bokeh) so you can use the one that works best for your needs.

HoloViews is designed to work well with pandas, which we’ll use below. It also accepts a few other data types.

First, we need to install HoloViews. I’d suggest installing the recommended setup (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "holoviews[recommended]"

Once installed, we can create our Python script or Notebook and get started.

To start, we import the modules we need and let HoloViews know which backend we are extending. I normally use Matplotlib for this.

import pandas as pd
import holoviews as hv
from holoviews import opts, dim

hv.extension('matplotlib')
hv.output(fig='svg', size=500)

Next, let’s create a very basic example data set.

# Create a Pandas DataFrame from an example list of dicts.
connections_df = pd.DataFrame.from_records([
    {'source': 0, 'target': 1, 'value':5},
    {'source': 0, 'target': 2, 'value':15},
    {'source': 0, 'target': 3, 'value':8},
    {'source': 0, 'target': 4, 'value':2},
    {'source': 1, 'target': 4, 'value':45},
    {'source': 1, 'target': 3, 'value':12},
    {'source': 1, 'target': 2, 'value':1},
    {'source': 2, 'target': 3, 'value':19},
    {'source': 2, 'target': 4, 'value':13},
    {'source': 3, 'target': 4, 'value':27},
])

The Chord method accepts a dataframe with source, target, and value columns, where source and target are numerical representations of the “from” and “to” categorical options and value is how many connections that pair has.
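
If your data starts out as labels rather than numbers, a small mapping step gets it into this shape. The following is a sketch with made-up labels and counts (`labelled_counts`, `label_to_id`, and `records` are illustrative names, not part of the HoloViews API):

```python
# Hypothetical counts keyed by (label, label) pairs.
labelled_counts = {
    ("Stuff", "Things"): 5,
    ("Stuff", "Whatnots"): 15,
    ("Things", "Whatnots"): 1,
}

# Assign each distinct label an integer id, in sorted order so the ids are stable.
labels = sorted({label for pair in labelled_counts for label in pair})
label_to_id = {label: idx for idx, label in enumerate(labels)}

# Build records in the source/target/value layout.
records = [
    {"source": label_to_id[a], "target": label_to_id[b], "value": count}
    for (a, b), count in labelled_counts.items()
]

print(records)
```

A list like `records` can be handed to `pd.DataFrame.from_records`, which is how the hand-written `connections_df` above is built.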

We can pass this dataframe to the Chord method to get our first diagram:

hv.Chord(connections_df)

A circular graph with lines connecting some points through the middle.

We can add labels to the diagram by passing a “nodes” data set. The nodes will have two columns, index for the numerical representation of a value and name for the label.

You can name these columns anything as long as you update the second parameter in the .Dataset() function and the labels parameter in the opts function to match your column names.

nodes_ds = hv.Dataset(pd.DataFrame.from_records([
    {'index': 0, 'name': "Stuff"},
    {'index': 1, 'name': "Things"},
    {'index': 2, 'name': "Whatnots"},
    {'index': 3, 'name': "Odds & Ends"},
    {'index': 4, 'name': "Cups"},
]), 'index')
hv.Chord((connections_df, nodes_ds)).opts(opts.Chord(labels='name'))

The same circular graph as above but with each of the main points being labelled with names such as stuff and things.

The opts method allows us to pass a variety of options and settings to create our diagram, such as the labels column.

We’re able to start seeing the connections, but it’s challenging to evaluate with all the lines being the same color. We can use the opts method to pass some settings for adjusting edge and node colors.

hv.Chord((connections_df, nodes_ds)).opts(
    opts.Chord(
        cmap='Category20',
        edge_color=dim('source').astype(str),
        labels='name',
        node_color=dim('index').astype(str)
    )
)

The same graph as above but instead of the lines being black, each data point has its own color and lines match its color.

The diagram is looking much better now. We can start to see which nodes have the most connections between them.

Lastly, displaying the diagram inline is enough if we are only working in a Jupyter Notebook, but we will probably need to save the image to use it elsewhere. HoloViews has a save() function for this:

# Create our chord diagram in the same way but saving it to a variable.
chord_example_3 = hv.Chord((connections_df, nodes_ds)).opts(
    opts.Chord(
        cmap='Category20',
        edge_color=dim('source').astype(str),
        labels='name',
        node_color=dim('index').astype(str)
    )
)

# Use the .save() method to save the diagram to a file.
hv.save(chord_example_3, 'chord-example-3.svg')

Creating a Chord Diagram with Real Data

Now that we have created a basic diagram, let’s look at how this would work for an actual data set.

Your categorical data could be in a variety of formats. For this example, we are looking at a data set that has 2 “types” per entity, and we’ll visualize connections between these different types.

I found a dataset on Kaggle of all the Pokémon and their types. Pokémon is a video game with hundreds of animal-like creatures with different “types.” Pokémon can have 1 or 2 types, and there are 18 potential types.

To keep this simple, I’ll only look at Pokémon that have 2 types and use the pandas value_counts() method to quickly extract the main connection counts.

import pandas as pd
import holoviews as hv
from holoviews import opts, dim

hv.extension('matplotlib')
hv.output(fig='svg', size=500)

# Load our DataFrame.
pokemon_df = pd.read_csv('pokemon.csv')

# Only use base forms to make this analysis more straightforward for this example.
pokemon_regular_forms_only_df = pokemon_df[pokemon_df['Alternate Form Name'].isnull()]

# To make this analysis simple, only look at Pokémon that have two types.
two_types_df = pokemon_regular_forms_only_df[~pokemon_regular_forms_only_df['Secondary Type'].isnull()]

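# Create a dict of the type combinations and a frequency count.
# The [1:-1] slices drop the first and last character of each type value
# (assumed here to be wrapping quote characters in the CSV).
type_connections = (
    two_types_df
    .apply(lambda x: f'{x["Primary Type"][1:-1]},{x["Secondary Type"][1:-1]}', axis=1)
    .value_counts()
    .to_dict()
)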

"""
type_connections is in the format of:
{
    'Normal,Flying': 26,
    'Ghost,Dark': 12
}
"""

Now, we have a dict of type combinations and counts. If this were a larger and more complex dataset, we’d have to approach this differently. However, for this, I’ll loop over the type combinations and convert them into a dictionary of source types with their accompanying target types.

"""
Cycle over our type combinations, split each, and add it to our connections 
dict to end up with a format like:

connections = {
    'normal': {
        'targets': {
            'flying': 26,
            'water': 12
        }
    }
}
"""

from collections import defaultdict

connections = defaultdict(lambda: {'targets': defaultdict(int)})
for type_combo, value in type_connections.items():
    pk_types = type_combo.split(',')
    for pk_type in pk_types:
        for target in pk_types:
            if target != pk_type:
                connections[pk_type]['targets'][target] += value
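
To see what this loop produces, here is the same logic run on three hand-made combinations (the counts are invented for illustration). Note that each pair ends up recorded in both directions with the same total, which is why the conversion that follows has to skip duplicates:

```python
from collections import defaultdict

# Invented counts, including one pair that appears in both orders.
demo_type_connections = {'normal,flying': 26, 'flying,normal': 1, 'grass,poison': 14}

demo_connections = defaultdict(lambda: {'targets': defaultdict(int)})
for type_combo, value in demo_type_connections.items():
    pk_types = type_combo.split(',')
    for pk_type in pk_types:
        for target in pk_types:
            if target != pk_type:
                demo_connections[pk_type]['targets'][target] += value

# Both directions hold the combined count of 26 + 1.
print(demo_connections['normal']['targets']['flying'])
print(demo_connections['flying']['targets']['normal'])
```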

Now, we need to convert this to our chords and nodes format. There may also be inverse source->target pairs (such as normal/flying vs. flying/normal) that we want to collapse into a single chord record.

# Create a unique, sorted nodes list first so node ids stay the same between runs.
nodes = sorted(set(connections) | {target for d in connections.values() for target in d['targets']})
nodes_df = pd.DataFrame({'node': nodes})

# Create the chords dataframe
chord_data = []
node_to_id = {node: idx for idx, node in enumerate(nodes)}
seen_pairs = set()
for source, target_data in connections.items():
    source_id = node_to_id[source]
    for target, count in target_data['targets'].items():
        target_id = node_to_id[target]
        # Each pair appears twice in connections, once per direction,
        # with the same count. A frozenset of the pair is order-independent,
        # so we use it to skip the duplicate.
        pair = frozenset([source_id, target_id])
        if pair not in seen_pairs:
            seen_pairs.add(pair)
            chord_data.append([source_id, target_id, count])

chords_df = pd.DataFrame(chord_data, columns=['source', 'target', 'value'])
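
A more compact alternative, sketched here rather than taken from the code above, is to normalize each combination to a sorted pair before counting, so inverse pairs merge in a single pass. The input dict and the `demo_` names are invented for illustration:

```python
from collections import Counter

# Invented counts in the same 'typeA,typeB' format as type_connections.
demo_type_connections = {'normal,flying': 26, 'flying,normal': 1, 'bug,poison': 12}

# Sort the two types so 'normal,flying' and 'flying,normal' share one key.
pair_totals = Counter()
for type_combo, value in demo_type_connections.items():
    pair = tuple(sorted(type_combo.split(',')))
    pair_totals[pair] += value

# Assign integer ids and build [source, target, value] rows.
demo_nodes = sorted({pk_type for pair in pair_totals for pk_type in pair})
demo_node_to_id = {node: idx for idx, node in enumerate(demo_nodes)}
demo_chord_data = [
    [demo_node_to_id[a], demo_node_to_id[b], total]
    for (a, b), total in pair_totals.items()
]

print(demo_chord_data)
```

The trade-off is that this skips the intermediate per-type dictionary, which is still handy if you also want to look up all the targets for a single type.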

Now, we can pass our nodes dataframe to the Dataset method and then create our diagram.

# We use .reset_index() here to create the `index` column used in the HoloViews dataset.
nodes_ds = hv.Dataset(nodes_df.reset_index(), 'index')

hv.Chord((chords_df, nodes_ds)).opts(
    opts.Chord(
        cmap='Category20',
        edge_color=dim('source').astype(str),
        labels='node', # Make sure this matches the column name from nodes_df
        node_color=dim('index').astype(str)
    )
)

A circular graph with Pokemon types around the outside, such as fire and water, with lines connecting the types to each other.

Now, you may have noticed in the examples with labels that the labels along the left side were upside down. By default, HoloViews rotates each label to follow its position around the diagram, which leaves many of them upside down. There are several GitHub issues and Stack Overflow questions about this, but it has not been changed as of now. Luckily, HoloViews provides hooks, and we can attach a function to them that corrects this.

First, let’s create our function that will determine the rotation of the text.

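def rotate_label(plot, element):
    # plot.handles["labels"] holds the label text objects drawn by the backend.
    labels = plot.handles["labels"]
    for annotation in labels:
        angle = annotation.get_rotation()
        # Labels rotated between 90 and 270 degrees render upside down, so flip them.
        if 90 < angle < 270:
            annotation.set_rotation(180 + angle)
            annotation.set_horizontalalignment("right")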

Now, we can create our diagram as before but, this time, pass our new function to the hooks parameter.

hv.Chord((chords_df, nodes_ds)).opts(
    opts.Chord(
        cmap='Category20',
        edge_color=dim('source').astype(str),
        labels='node',
        node_color=dim('index').astype(str),
        hooks=[rotate_label]
    )
)

A circular graph with Pokemon types around the outside, such as fire and water, with lines connecting the types to each other. This time, the labels along the left are the correct side up.
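
The arithmetic inside the hook can be checked on its own. This sketch assumes, as `rotate_label` does, that angles are in degrees between 0 and 360; `upright_angle` is a hypothetical helper, not a HoloViews or Matplotlib function:

```python
def upright_angle(angle):
    """Return a rotation that keeps text readable left to right."""
    # Labels rotated between 90 and 270 degrees would render upside down,
    # so flip them by half a turn and wrap back into the 0-360 range.
    if 90 < angle < 270:
        return (angle + 180) % 360
    return angle

for angle in (45, 135, 180, 225, 300):
    print(angle, "->", upright_angle(angle))
```

In the hook itself, the same flip is paired with a change of horizontal alignment to "right".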

We now have our finished Chord diagram! We can quickly spot that there are a lot of Pokémon with the normal and flying types. There are also quite a few connections between bug and poison, between grass and poison, and between flying and bug.

Next Steps

Once you are comfortable with Chord diagrams, there are a few more things you can try, such as:

Use Bokeh as the backend instead to get an interactive Chord diagram
Use the select method on the Chord object to filter which data in the chords dataframe gets visualized
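
For the filtering idea, one simple option that avoids the select method entirely is to drop weak connections before building the diagram. This is a plain-Python sketch with invented rows and a hypothetical `min_value` threshold:

```python
# Invented [source, target, value] rows, in the same layout as chord_data.
demo_chord_data = [[0, 1, 26], [0, 2, 3], [1, 2, 12], [2, 3, 1]]

# Hypothetical threshold: keep only pairs with at least this many connections.
min_value = 5
strong_chords = [row for row in demo_chord_data if row[2] >= min_value]

print(strong_chords)
```

The filtered rows can then be turned into a dataframe and passed to hv.Chord in the same way as before.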

If you create any fun Chord diagrams, let me know!
