4. Creating Sunburst Charts

Sunburst charts are a hierarchical data visualization technique that displays relationships and proportions within nested categories using concentric circles. Each layer of the chart represents a level in the hierarchy, starting from the center and expanding outward. For instance, in a company structure, the inner circle could represent the organization as a whole, the next layer the departments, and subsequent layers the teams or individual roles within those departments. The size of each segment typically corresponds to a numerical value, such as revenue or population, making it easy to compare proportions within and across different categories.

Sunburst charts are particularly useful for visualizing hierarchical data with many layers and showcasing the composition of categories within a whole. They are effective in situations where relationships and proportions at multiple levels need to be understood simultaneously, such as organizational structures, file system breakdowns, or market share by product lines and subcategories. Their circular layout makes efficient use of space and can visually engage an audience, making them a great choice for storytelling and presentations.

However, sunburst charts should be avoided when the hierarchy is too deep or the data is too detailed, as this can lead to visual clutter and make the chart difficult to interpret. Additionally, they are not ideal for datasets requiring precise comparisons between categories, as the curved segments can distort perception compared to linear representations. In such cases, alternative visualizations like tree maps or bar charts may be more effective. The key is to use sunburst charts when the hierarchical structure and overall proportions are the primary focus, rather than specific quantitative analysis.

Getting ready

For this recipe we will create two data sets. The first one, df1, contains data representing personal goals grouped into categories. The second one, df2, is an extract from the Gapminder data set.

import pandas as pd
df1 = pd.DataFrame(dict(Activity=["My Goals", "Health", "Career", "Personal", "Finance", 
               "Exercise", "Diet", "Sleep", 
               "New Skills", "Networking", 
               "Family", "Friends", 
               "Investing", "Saving"
               ],
    Category=["", "My Goals", "My Goals", "My Goals", "My Goals", "Health", "Health", "Health",  "Career",  "Career", "Personal", "Personal", "Finance", "Finance"],
    Value=[0, 25, 25, 25, 25, 10, 10, 10, 10, 10, 10, 10, 10, 10]))
df1.head()
   Activity  Category  Value
0  My Goals                0
1    Health  My Goals     25
2    Career  My Goals     25
3  Personal  My Goals     25
4   Finance  My Goals     25
import plotly.express as px
df2 = px.data.gapminder().query("year == 2007")
df2.head()
        country continent  year  lifeExp       pop     gdpPercap iso_alpha  iso_num
11  Afghanistan      Asia  2007   43.828  31889923    974.580338       AFG        4
23      Albania    Europe  2007   76.423   3600523   5937.029526       ALB        8
35      Algeria    Africa  2007   72.301  33333216   6223.367465       DZA       12
47       Angola    Africa  2007   42.731  12420476   4797.231267       AGO       24
59    Argentina  Americas  2007   75.320  40301927  12779.379640       ARG       32
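
Note that df1 stores a tree as two columns: Activity holds each node and Category holds its parent, with an empty string marking the root. The following is a minimal sketch of a sanity check for that encoding. The helper check_hierarchy is hypothetical (it is not part of pandas or Plotly), and the "exactly one root" rule is an assumption that suits this data set rather than a general requirement. The lists simply repeat the df1 columns so that the snippet runs on its own.

```python
from collections import Counter


def check_hierarchy(names, parents, root_marker=""):
    """Return a list of problems found in a child/parent encoding of a tree.

    Hypothetical helper for illustration only.
    """
    names = list(names)
    parents = list(parents)
    problems = []

    if len(names) != len(parents):
        problems.append("names and parents differ in length")
        return problems

    # Labels double as identifiers when no separate ids are supplied,
    # so each one should appear only once.
    duplicates = [name for name, count in Counter(names).items() if count > 1]
    if duplicates:
        problems.append(f"duplicate names: {duplicates}")

    # Every non-root parent must itself be listed as a name.
    known = set(names)
    missing = sorted({p for p in parents if p != root_marker and p not in known})
    if missing:
        problems.append(f"parents not listed as names: {missing}")

    # Assumption for this data set: a single root row.
    roots = [n for n, p in zip(names, parents) if p == root_marker]
    if len(roots) != 1:
        problems.append(f"expected one root, found {roots}")

    return problems


# Same content as the Activity and Category columns of df1.
activity = ["My Goals", "Health", "Career", "Personal", "Finance",
            "Exercise", "Diet", "Sleep",
            "New Skills", "Networking",
            "Family", "Friends",
            "Investing", "Saving"]
category = ["", "My Goals", "My Goals", "My Goals", "My Goals",
            "Health", "Health", "Health",
            "Career", "Career",
            "Personal", "Personal",
            "Finance", "Finance"]

problems = check_hierarchy(activity, category)
print(problems or "hierarchy looks consistent")
```

Because the helper only iterates over its inputs, calling check_hierarchy(df1['Activity'], df1['Category']) should work in the same way.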

How to do it

  1. Import the plotly.express module as px

import plotly.express as px
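  2. Create a minimal sunburst chart by calling the function sunburst from Plotly Express, passing the data set df1 together with the following key arguments:

  • names: the column holding the label of each sector, in our case Activity

  • parents: the column holding the parent of each sector, in our case Category. An empty string marks the root of the hierarchy

In addition, the following optional arguments are specified:

  • height and width: the size of the figure in pixels

  • title: the title of the figure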

fig = px.sunburst(df1, names='Activity', parents='Category', 
                  height=600, width=900,
                  title='Personal Goals 2025')
fig.show()
  3. Customise the colors used in the sunburst chart with the argument color_discrete_sequence

fig = px.sunburst(df1, names='Activity', parents='Category', 
                  color_discrete_sequence=px.colors.sequential.deep,
                  height=600, width=900,
                  title='Personal Goals 2025')
fig.show()
  4. Customise the data shown in the hover tooltip by specifying the arguments hover_name and hover_data

fig = px.sunburst(df1, names='Activity', parents='Category', 
                  color_discrete_sequence=px.colors.sequential.deep,
                  hover_name='Category',
                  hover_data={'Activity':True, 'Category': False},
                  height=600, width=900,
                  title='Personal Goals 2025')
fig.show()
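
The names and parents arguments used above expect one row per sector together with a pointer to its parent. As an illustration, here is a minimal sketch of how such columns could be derived when the hierarchy starts out as a nested dictionary. Both the goals dictionary and the flatten_hierarchy helper are hypothetical and are not part of Plotly; the dictionary simply mirrors the content of df1.

```python
from collections import deque


def flatten_hierarchy(root, tree):
    """Walk a nested dict breadth-first and return (names, parents) lists.

    Hypothetical helper for illustration only. The root gets an empty
    string as its parent, matching the convention used in df1.
    """
    names, parents = [root], [""]
    queue = deque([(root, tree)])
    while queue:
        parent, children = queue.popleft()
        for child, grandchildren in children.items():
            names.append(child)
            parents.append(parent)
            queue.append((child, grandchildren))
    return names, parents


# Hypothetical nested representation of the personal goals in df1.
goals = {
    "Health": {"Exercise": {}, "Diet": {}, "Sleep": {}},
    "Career": {"New Skills": {}, "Networking": {}},
    "Personal": {"Family": {}, "Friends": {}},
    "Finance": {"Investing": {}, "Saving": {}},
}

names, parents = flatten_hierarchy("My Goals", goals)
for name, parent in zip(names, parents):
    print(f"{name!r} <- {parent!r}")
```

The two lists can then be placed in a DataFrame, for instance pd.DataFrame({'Activity': names, 'Category': parents}), and passed to px.sunburst exactly as in the steps above.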

For our second example, we are going to use the second data set, df2, which contains a subset of the Gapminder data. The idea is to visualise both population size and life expectancy around the world, with particular interest in the comparison between regions.

  1. Create a sunburst chart using the function sunburst and passing the arguments

    • path: this is the list of column names (or columns) of a rectangular dataframe that defines the hierarchy of sectors, from root to leaves (going from the center outwards). In this case, we pass the list ['continent', 'country'] since we want to be able to visualise the data by regions

    • values: this is the name of the column (alternatively, you can pass an array object) that is used to specify the size of the wedges which form the Sunburst chart. In our case we pass pop since we want to visualise the population size

fig = px.sunburst(df2, 
                  path=['continent', 'country'], 
                  values='pop',
                  height=600, width=900,
                  title='Life Expectancy across Regions in 2007'
                  )
fig.show()
  2. Use the argument color to add an extra dimension to the data through the color of the wedges. In this case, we associate the color of each wedge with the life expectancy of the corresponding country or region, so we set color='lifeExp'. We also use the argument color_continuous_scale to set the color palette

fig = px.sunburst(df2, 
                  path=['continent', 'country'], 
                  values='pop',
                  color='lifeExp', 
                  color_continuous_scale='RdBu',
                  height=600, width=900,
                  title='Life Expectancy across Regions in 2007'
                  )
fig.show()
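  3. Use the argument color_continuous_midpoint to set the midpoint of the color scale to the average life expectancy weighted by the population size of each country. We also use hover_name and hover_data to show the country name and its ISO code in the tooltip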

import numpy as np
avg_life_exp_pop = np.average(df2['lifeExp'], weights=df2['pop'])
fig = px.sunburst(df2, path=['continent', 'country'], values='pop',
                  color='lifeExp', 
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=avg_life_exp_pop,
                  hover_name='country',
                  hover_data=['iso_alpha'],
                  height=600, width=900,
                  title='Life Expectancy across Regions in 2007'
                  )
fig.show()
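
To make the weighting explicit, the sketch below reproduces by hand what np.average does when weights are given: it divides the sum of value times weight by the sum of the weights. To our understanding of the Plotly Express documentation, the same kind of average (children's color values weighted by values) is what colors a parent sector such as a continent when path and a numeric color are passed. The weighted_mean helper is hypothetical, and the rows are only the five countries shown in the df2.head() output above, so the numbers it produces are not the full 2007 figures.

```python
def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w); hypothetical helper for illustration."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# (country, continent, lifeExp, pop) taken from the df2.head() output.
rows = [
    ("Afghanistan", "Asia", 43.828, 31889923),
    ("Albania", "Europe", 76.423, 3600523),
    ("Algeria", "Africa", 72.301, 33333216),
    ("Angola", "Africa", 42.731, 12420476),
    ("Argentina", "Americas", 75.320, 40301927),
]

# Group the sample rows by continent.
by_continent = {}
for country, continent, life_exp, pop in rows:
    by_continent.setdefault(continent, []).append((life_exp, pop))

# Population-weighted life expectancy per continent (sample rows only).
continent_life_exp = {
    continent: weighted_mean([le for le, _ in pairs], [p for _, p in pairs])
    for continent, pairs in by_continent.items()
}

# Population-weighted life expectancy over all sample rows, the
# counterpart of avg_life_exp_pop for this five-row sample.
sample_midpoint = weighted_mean([r[2] for r in rows], [r[3] for r in rows])

for continent, value in continent_life_exp.items():
    print(f"{continent}: {value:.2f}")
print(f"sample midpoint: {sample_midpoint:.2f}")
```

In this sample the value for Africa lands closer to Algeria's life expectancy than to Angola's, because Algeria has the larger population and therefore the larger weight.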