4. Making Bubble Charts

Bubble plots are a variation of scatter plots in which a third dimension of the data is represented by the size of the markers, so the points become bubbles. A standard scatter plot uses the x-axis and the y-axis to represent two variables; a bubble plot adds another layer of information by using the size of each bubble to represent a third variable.

🚀 When to use them:

Bubble plots are particularly useful when you want to display and compare three different variables in a single chart. They are effective for visualizing relationships between variables and identifying patterns, trends, or clusters within your data. For example, bubble plots are commonly used in:

  • Social sciences: To analyze demographic data, such as income, education level, and population size.

  • Market analysis: To show the relationship between product price, sales volume, and market share.

  • Finance: To display financial metrics like profit, revenue, and market cap of different companies.

⚠️ Be aware:

Avoid bubble plots when overlapping data points cause clutter and make interpretation difficult. Also, if the variation in bubble size is minimal, the additional dimension may not add much value.
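
As a rough way to judge whether the size dimension will be visible at all, the sketch below compares the largest and smallest bubble diameters. It assumes that bubble area is proportional to the value (as with sizemode='area' in Plotly); the helper name diameter_ratio and the sample values are hypothetical and only serve as an illustration.

```python
import math


def diameter_ratio(values):
    """Ratio between the largest and smallest bubble diameter under area scaling."""
    positive = [v for v in values if v > 0]
    # With area scaling the bubble area is proportional to the value,
    # so the diameter is proportional to the square root of the value.
    return math.sqrt(max(positive) / min(positive))


# Hypothetical size columns, not taken from any real data set
narrow = [90, 100, 110]
wide = [1, 100, 10_000]

print(diameter_ratio(narrow))  # close to 1: bubbles look almost identical
print(diameter_ratio(wide))    # large: size differences are easy to see
```

A ratio close to 1 suggests that the bubbles will look nearly the same size, in which case a plain scatter plot may communicate the data just as well.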

Getting ready

Get the gapminder data set from plotly.express and keep only the rows for the year 2007. This is the data used throughout this recipe.

import plotly.express as px

df = px.data.gapminder()
# Copy the filtered rows so that columns can be added later without a pandas warning
df = df[df.year == 2007].copy()

How to do it

  1. Call the scatter function from plotly.express and, in addition to x and y, pass the input size to set the size of the bubbles according to a variable.

In this case we pass pop, so the size of each dot reflects the population of that country. This lets us investigate whether there is any relationship between population size and the two main variables in the scatter plot: GDP per capita and life expectancy.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()

  2. Add the input color to set the color of the bubbles according to a column in the data. In this case we pass the column continent, so we get bubbles in 5 different colors, one per continent.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', color='continent',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()

  3. Customize the maximum size of the markers by passing an integer to the optional input size_max. The default value is 20.

In this case we want bigger markers, so we choose size_max=50.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=50,
                 color='continent',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()

One big weakness of this chart is that the hover tooltip does not show the name of the country. We fix that in the next step.

  4. Add variables to the hover tooltip by passing the optional input hover_data. In this case, we add the column country.

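fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=50,
                 color='continent',
                 hover_data=['country'],
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()

  5. Customize the hover appearance further by calling the method update_traces on the figure and passing hovertemplate. This sets the order and format of the variables displayed. The columns listed in hover_data are available in the template as customdata, in the same order: here customdata[0] is pop and customdata[1] is country. This example also increases size_max to 60.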

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=60,
                 color='continent',
                 hover_data=['pop', 'country'],
                 title='GDP per Capita vs Life Expectancy 2007')
fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[1]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[0]}",
    ])
)
fig.show()

  6. Finally, customize the axis labels by calling the method update_layout on the figure and passing both xaxis_title and yaxis_title.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 color='continent', 
                 size='pop', size_max=60,
                 hover_data=['pop', 'country'], 
                 title='GDP per Capita vs Life Expectancy 2007')

fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[1]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[0]}",
    ])
)

fig.update_layout(xaxis_title="USD", 
                  yaxis_title="Years")
fig.show()
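
The hover template used in the steps above can also carry number formats. Plotly hover templates accept d3-format specifiers after a colon inside the placeholder, and the tag <extra></extra> hides the secondary box that otherwise shows the trace name. The sketch below only builds the template string; the helper name build_hovertemplate and the chosen formats are assumptions made for illustration, not part of the recipe.

```python
def build_hovertemplate(rows):
    """Join (label, placeholder) pairs into a Plotly hovertemplate string."""
    lines = [f"{label}: {placeholder}" for label, placeholder in rows]
    # <extra></extra> hides the secondary box that shows the trace name
    return "<br>".join(lines) + "<extra></extra>"


# Placeholders follow the hover_data order used above:
# customdata[0] is pop and customdata[1] is country.
template = build_hovertemplate([
    ("Country", "%{customdata[1]}"),
    ("GDP per capita", "%{x:$,.0f}"),       # dollars, thousands separator, no decimals
    ("Life Expectancy", "%{y:.1f} years"),  # one decimal place
    ("Population", "%{customdata[0]:,}"),   # thousands separator
])

print(template)
```

The resulting string can be passed as fig.update_traces(hovertemplate=template) in place of the joined list shown in the steps above.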

There is more...

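The same plot can be made with go.Scatter from plotly.graph_objects. In that case the marker colors and the scaling of the bubble sizes have to be set by hand.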

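import plotly.graph_objects as go

# Map each continent to a fixed marker color (the data set uses the label 'Americas')
df['color'] = df['continent'].map({'Asia': 'blue', 'Europe': 'red', 'Americas': 'purple', 'Africa': 'green', 'Oceania': 'orange'})

# Scale the marker areas relative to the desired maximum marker size (in pixels)
desired = 60
sizeref = 2. * max(df['pop']) / (desired ** 2)
sizeref
732601.72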
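
The sizeref calculation above can be wrapped in a small helper so that it is easy to reuse with a different size column or target size. This is a minimal sketch of the same formula; the function name bubble_sizeref and its parameter names are hypothetical, and the second sample value is made up.

```python
def bubble_sizeref(sizes, desired_max_px):
    """Return 2 * max(sizes) / desired_max_px ** 2, the formula used in this recipe."""
    if desired_max_px <= 0:
        raise ValueError("desired_max_px must be positive")
    return 2.0 * max(sizes) / (desired_max_px ** 2)


# The first value is the maximum implied by the output shown above
# (732601.72 * 60 ** 2 / 2); the second one is a made-up smaller value.
example_sizes = [1_318_683_096, 5_000_000]

print(round(bubble_sizeref(example_sizes, 60), 2))
```

A larger desired_max_px gives a smaller sizeref and therefore larger bubbles, which mirrors the effect of size_max in plotly.express.
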
fig = go.Figure()

fig.add_trace(go.Scatter(x=df['gdpPercap'], y=df['lifeExp'],
                         customdata=df[['country', 'pop']],
                         mode='markers',
                         marker=dict(color=df['color'],
                                     size=df['pop'],
                                     sizemode='area',
                                     sizeref=sizeref,
                                     sizemin=4),
                         ))


fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[0]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[1]}",
    ])
)

fig.show()

# Add one trace per continent so that each continent gets its own legend entry
fig = go.Figure()

for continent in df['continent'].unique():
    subdf = df[df['continent'] == continent]

    fig.add_trace(go.Scatter(x=subdf['gdpPercap'], y=subdf['lifeExp'],
                             customdata=subdf[['country', 'pop']],
                             mode='markers',
                             name=continent,
                             marker=dict(color=subdf['color'],
                                         size=subdf['pop'],
                                         sizemode='area',
                                         sizeref=sizeref,
                                         sizemin=4),
                             ))

fig.update_layout(title='GDP per Capita vs Life Expectancy 2007',
                  xaxis_title="USD", 
                  yaxis_title="Years"
                  )

fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[0]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[1]}",
    ])
)



fig.show()