4. Making Bubble Charts
Bubble plots are a variation of scatter plots in which a third dimension of data is represented by the size of the markers, so the points become bubbles. In a standard scatter plot, we use the x-axis and y-axis to represent two variables; bubble plots add another layer of information by using the size of each bubble to represent an additional variable.
🚀 When to use them:
Bubble plots are particularly useful when you want to display and compare three different variables in a single chart. They are effective for visualizing relationships between variables and identifying patterns, trends, or clusters within your data. For example, bubble plots are commonly used in:
Social sciences: To analyze demographic data, such as income, education level, and population size.
Market analysis: To show the relationship between product price, sales volume, and market share.
Finance: To display financial metrics like profit, revenue, and market cap of different companies.
⚠️ Be aware:
Avoid bubble plots when overlapping data points cause clutter and make interpretation difficult. Also, if the variation in bubble size is minimal, the additional dimension may not add much value.
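A quick way to judge whether a size variable varies enough to be worth encoding is to compare its largest and smallest values. The sketch below is only illustrative: `size_spread` is a hypothetical helper, and the numbers are made-up values, not taken from any data set.

```python
def size_spread(values):
    """Return max/min of the positive values (hypothetical helper)."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return max(positive) / min(positive)

# Made-up, illustrative values (not real data).
similar = [98, 100, 102, 105]
varied = [1_000, 50_000, 2_000_000]

print(size_spread(similar))  # close to 1: bubbles would look almost identical
print(size_spread(varied))   # large: bubble size carries real information
```

A ratio close to 1 suggests the bubbles would be nearly indistinguishable, in which case a plain scatter plot is probably enough.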
Getting ready
Get the gapminder data set from plotly.express and filter the rows corresponding to the year 2007. We will use this as our data for this recipe.
import plotly.express as px
df = px.data.gapminder()
df = df[df.year==2007]
How to do it
Call the scatter function from plotly.express, but in addition to x and y, include the input size to specify the size of the bubbles according to a variable.
In this case we pass pop, which means that the size of each bubble reflects the population of that country. This lets us investigate whether there is any relationship between population and the two main variables in the scatter plot, GDP per capita and life expectancy.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()
Add the input color to specify the color of the bubbles according to a column in the data. In this case, we pass the column continent, so we get bubbles in 5 different colors, one per continent.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', color='continent',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()
Customize the maximum size of the markers by passing an integer into the optional input size_max. The default value is 20.
In this case, we want to make the markers bigger, so we choose size_max=50.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=50,
                 color='continent',
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()
One big weakness of this chart is that the hover tooltip does not show the name of the country. We will fix that in the next step.
Add variables to appear in the hover tooltip by passing the optional input hover_data. In this case, we add the column country.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=50,
                 color='continent',
                 hover_data=['country'],
                 title='GDP per Capita vs Life Expectancy 2007')
fig.show()
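Customize the hover appearance further by invoking the method update_traces on the figure and passing hovertemplate. We specify the order and format of the variables to be displayed. Here pop is also added to hover_data so that it is available to the template: the entries of customdata follow the order of hover_data, so customdata[0] is pop and customdata[1] is country. Note that size_max is also increased to 60.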
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 size='pop', size_max=60,
                 color='continent',
                 hover_data=['pop', 'country'],
                 title='GDP per Capita vs Life Expectancy 2007')
fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[1]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[0]}",
    ])
)
fig.show()
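The raw values in the tooltip can be hard to read, for example a population shown without thousands separators. hovertemplate accepts d3-format specifiers after a colon, so a variant of the template could look like the sketch below. The specific formats are an assumption about what reads well, and `<extra></extra>` is appended to hide the secondary box that shows the trace name.

```python
# customdata order assumes hover_data=['pop', 'country'] as in this step
hovertemplate = "<br>".join([
    "Country: %{customdata[1]}",
    "GDP per capita: %{x:$,.0f}",       # currency symbol, separators, no decimals
    "Life Expectancy: %{y:.1f} years",  # one decimal place
    "Population: %{customdata[0]:,}",   # thousands separators
]) + "<extra></extra>"

print(hovertemplate)
```

The resulting string can be passed to the figure with fig.update_traces(hovertemplate=hovertemplate).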
Finally, customize the axis labels by invoking the method update_layout on the figure and passing both xaxis_title and yaxis_title.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 color='continent',
                 size='pop', size_max=60,
                 hover_data=['pop', 'country'],
                 title='GDP per Capita vs Life Expectancy 2007')
fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[1]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[0]}",
    ])
)
fig.update_layout(xaxis_title="USD",
                  yaxis_title="Years")
fig.show()
There is more...
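Make the same plot with go.Scatter. With graph objects, the color of each bubble and the scaling of the bubble sizes have to be set explicitly.
import plotly.graph_objects as go
# Map each continent to a color; the keys must match the names in the data (e.g. 'Americas')
df['color'] = df['continent'].map({'Asia': 'blue', 'Europe': 'red', 'Americas': 'purple', 'Africa': 'green', 'Oceania': 'orange'})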
desired = 60
sizeref = 2. * max(df['pop']) / (desired ** 2)
sizeref
732601.72
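The sizeref computation above can be wrapped in a small function for reuse. In this sketch, `bubble_sizeref` is a hypothetical helper name, and the list of populations is a placeholder except for its maximum, which is the value implied by the output shown above.

```python
def bubble_sizeref(sizes, desired_max_px):
    """Scaling factor for marker.sizeref with sizemode='area' (hypothetical helper)."""
    return 2.0 * max(sizes) / (desired_max_px ** 2)

# Placeholder values; only the maximum (1318683096) affects the result.
populations = [1_000_000, 50_000_000, 1_318_683_096]

sizeref = bubble_sizeref(populations, desired_max_px=60)
print(round(sizeref, 2))  # 732601.72
```

Because the desired size is squared, halving desired_max_px multiplies sizeref by four, which shrinks every bubble.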
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['gdpPercap'], y=df['lifeExp'],
                         customdata=df[['country', 'pop']],
                         mode='markers',
                         marker=dict(
                             color=df['color'],
                             size=df['pop'],
                             sizemode='area',
                             sizeref=sizeref,
                             sizemin=4),
                         ))
fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[0]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[1]}"])
)
fig.show()
To get one legend entry per continent, add one trace per continent instead of a single trace:
fig = go.Figure()
for continent in df['continent'].unique():
    # Rows belonging to this continent only
    subdf = df[df['continent'] == continent]
    custom = subdf[['country', 'pop']]
    fig.add_trace(go.Scatter(x=subdf['gdpPercap'], y=subdf['lifeExp'],
                             customdata=custom,
                             mode='markers',
                             name=continent,
                             marker=dict(color=subdf['color'],
                                         size=subdf['pop'],
                                         sizemode='area',
                                         sizeref=sizeref,
                                         sizemin=4),
                             ))
fig.update_layout(title='GDP per Capita vs Life Expectancy 2007',
                  xaxis_title="USD",
                  yaxis_title="Years"
                  )
fig.update_traces(
    hovertemplate="<br>".join([
        "Country: %{customdata[0]}",
        "GDP per capita: %{x}",
        "Life Expectancy: %{y}",
        "Population: %{customdata[1]}",
    ])
)
fig.show()
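Because the colors in this section come from a hand-written dictionary, a misspelled key silently leaves that group without a color. A small check such as the sketch below can catch this before plotting. `missing_keys` is a hypothetical helper, and the continent names are written out by hand on the assumption that they match the continent column of the gapminder data.

```python
def missing_keys(categories, mapping):
    """Return the categories that have no entry in mapping (hypothetical helper)."""
    return sorted(set(categories) - set(mapping))

# Assumed continent names as they appear in the gapminder data
continents = ['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']

colors = {'Asia': 'blue', 'Europe': 'red', 'Americas': 'purple',
          'Africa': 'green', 'Oceania': 'orange'}

print(missing_keys(continents, colors))  # [] means every continent has a color
```

With the real data, the first argument could be df['continent'].unique().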