1. Making animated scatter charts
Animated scatter plots are a dynamic and engaging data visualization technique that allows us to track changes in multi-dimensional data over time (or across a continuous variable). Unlike static scatter/bubble plots, which capture data at a single state, animated scatter plots show movement and transitions in the data, revealing trends, patterns, and outliers that might otherwise go unnoticed. Each frame of the animation represents a snapshot of the data at a particular time or state, and the sequential movement provides a narrative of how the data evolves.
This technique is particularly useful when time-series data or changes in one or more variables are of interest. For instance, animated scatter plots can effectively demonstrate economic shifts, such as changes in GDP and life expectancy across countries over decades. By animating the data, these plots make it easier to spot correlations, trends, and cycles while engaging the viewer in a way that static graphs may not. They are also a powerful tool for presentations and storytelling, as they can visually guide the audience through the insights and context behind the data.
Getting ready
For this recipe we will load the Gapminder data set from the plotly.express module.
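import plotly.express as px
# Load the Gapminder data set bundled with Plotly Express
df = px.data.gapminder()
# Drop Kuwait, whose GDP per capita is far above every other country's in this data set
df = df[df.country != 'Kuwait']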
Inspect the data by calling the method head on the data frame.
df.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
How to do it
Make a simple animated scatter using the function px.scatter in the same way as a static scatter plot (passing the data frame as well as the names of the two columns that will be plotted as x and y, respectively), but adding the following arguments:
animation_frame: this column or array-like is used to assign marks to animation frames. In our case, we pass the string year since we want the animation to run over time.
animation_group: this column or array-like is used to provide object constancy across animation frames. That is, rows with matching animation_group values will be treated as if they describe the same object in each frame. In our case, we pass the string country since each dot represents a country that we want to animate over time.
Notice that we are also passing the following arguments to set the aesthetics of the plot:
color_discrete_sequence
height and width
template
Then, use the method show to display the Figure object.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame="year",     # one frame per year
                 animation_group="country",  # follow each country across frames
                 color="continent",
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 height=500, width=800,
                 template='plotly_white',
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy'
                 )
fig.show()
By inspecting the resulting animation, we can quickly make the following observations:
the name of the country is not visible when hovering over each point
the range of both axes is fixed, and some points fall outside of it as the animation progresses over time
Let’s improve our animation by fixing these two issues.
Add the input argument hover_name and pass the string country. This will result in the name of the country appearing in bold in the hover tooltip.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame="year",
                 animation_group="country",
                 color="continent",
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 template='plotly_white',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy',
                 hover_name="country",  # show the country name in the tooltip
                 )
fig.show()
In order to find appropriate ranges for the axes, use the method describe on the data frame df and look at the minimum and maximum values of each relevant column. Then, use the arguments range_x and range_y to set the axis ranges for the animation.
df.describe()
| | year | lifeExp | pop | gdpPercap | iso_num |
|---|---|---|---|---|---|
| count | 1692.000000 | 1692.000000 | 1.692000e+03 | 1692.000000 | 1692.000000 |
| mean | 1979.500000 | 59.407433 | 2.980259e+07 | 6803.145639 | 425.964539 |
| std | 17.265365 | 12.923315 | 1.065068e+08 | 8139.536000 | 249.183165 |
| min | 1952.000000 | 23.599000 | 6.001100e+04 | 241.165876 | 4.000000 |
| 25% | 1965.750000 | 48.125000 | 2.829882e+06 | 1192.603485 | 208.000000 |
| 50% | 1979.500000 | 60.492000 | 7.150606e+06 | 3484.113173 | 410.000000 |
| 75% | 1993.250000 | 70.811250 | 1.977102e+07 | 9145.776073 | 638.000000 |
| max | 2007.000000 | 82.603000 | 1.318683e+09 | 49357.190170 | 894.000000 |
This shows that the values in the gdpPercap column go from 241 to 49,357. Thus, passing range_x = [0, 55000] would cover all the values.
Similarly, the values in the lifeExp column go from 23.6 to 82.6. Thus, passing range_y = [20, 90] would cover all the values.
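The same reasoning can be written as a small helper that rounds the minimum down and the maximum up to a convenient step. This is only a sketch: outward_range and its step argument are hypothetical names introduced here, and the input numbers are the min and max values read from the df.describe() output above.

```python
import math

def outward_range(values, step):
    """Return (low, high) rounded outwards to multiples of `step`."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high

# Minimum and maximum taken from the describe() table above.
gdp_range = outward_range([241.165876, 49357.190170], step=5000)
life_range = outward_range([23.599, 82.603], step=10)

print(gdp_range)   # (0, 50000)
print(life_range)  # (20, 90)
```

With a step of 5000 the helper stops at 50,000 for GDP per capita; the 55,000 used in this recipe simply leaves one more step of headroom on the right. The helper also accepts a data frame column directly, for example outward_range(df['lifeExp'], step=10).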
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame="year",
                 animation_group="country",
                 hover_name="country",
                 color="continent",
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 template='plotly_white',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy',
                 range_x=[0, 55000],
                 range_y=[20, 90]
                 )
fig.show()
Transform your animated scatter into an animated bubble chart by passing the additional arguments size and size_max. This allows you to display an extra variable (here, the population column pop) in your visualization.
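fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame="year",
                 animation_group="country",
                 hover_name="country",
                 color="continent",
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 size="pop",    # bubble size encodes population
                 size_max=50,   # size of the largest bubble
                 template='plotly_white',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy',
                 # Start below zero so large bubbles near the left edge are not clipped
                 range_x=[-5000, 55000],
                 # Keep the lower bound at 20 so the minimum lifeExp (23.6) stays in view
                 range_y=[20, 90]
                 )
fig.show()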
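To get a feel for how size and size_max interact, the sketch below assumes that the marker area (not the diameter) is proportional to the value and that the largest value is drawn with a diameter of about size_max pixels. To our understanding this is how Plotly Express scales bubbles, but treat it as an assumption; approx_diameter is a hypothetical helper, and the population figures are the min, median and max from the describe() table.

```python
import math

def approx_diameter(value, max_value, size_max):
    """Approximate marker diameter in pixels when marker area is proportional to value."""
    return size_max * math.sqrt(value / max_value)

max_pop = 1.318683e9     # max of the pop column
median_pop = 7.150606e6  # 50% row of the pop column
min_pop = 6.001100e4     # min of the pop column

for label, pop in [("max", max_pop), ("median", median_pop), ("min", min_pop)]:
    print(label, round(approx_diameter(pop, max_pop, size_max=50), 2))
```

Under this assumption a country with the median population is only a few pixels wide when size_max is 50, which is one reason to raise size_max when most bubbles look too small.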
There is more
Using a logarithmic scale for an axis in data visualization is a powerful technique for displaying data that spans several orders of magnitude. Unlike a linear scale, where each unit increase corresponds to an equal increment, a logarithmic scale represents data in terms of powers of a base (commonly 10).
This approach compresses large values and expands smaller ones, making it easier to visualize and compare data with extreme ranges. Logarithmic scales are particularly useful in fields like finance (e.g., stock price changes), science (e.g., earthquake magnitudes, pH levels), and engineering (e.g., signal strength). They are ideal for identifying proportional relationships, exponential growth, or power-law distributions that might be obscured on a linear scale.
However, it is crucial to clearly label the axis and ensure the audience understands the scale, as logarithmic representations can be less intuitive for those unfamiliar with the concept. Misuse or poor communication of the logarithmic scale can lead to confusion or misinterpretation.
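A short numerical illustration of this compression: on a base-10 logarithmic axis, values are positioned by their log10, so each tenfold increase takes up the same distance. The values in gdp_values are illustrative, one per decade, and are not taken from the data set.

```python
import math

gdp_values = [100, 1_000, 10_000, 100_000]  # illustrative values, one per decade

# Position of each value on a base-10 logarithmic axis.
positions = [math.log10(v) for v in gdp_values]
print(positions)

# Distance between neighbours: constant on the log axis ...
log_gaps = [b - a for a, b in zip(positions, positions[1:])]
print(log_gaps)

# ... but growing tenfold each time on a linear axis.
linear_gaps = [b - a for a, b in zip(gdp_values, gdp_values[1:])]
print(linear_gaps)
```

On a linear axis the step from 100 to 1,000 would be almost invisible next to the step from 10,000 to 100,000, whereas on the logarithmic axis both occupy the same width.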
Set the scale of the x-axis as logarithmic by passing the argument log_x = True.
fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 animation_frame="year",
                 animation_group="country",
                 hover_name="country",
                 color="continent",
                 size="pop",
                 size_max=75,
                 height=600, width=800,
                 template='plotly_white',
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy',
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 # On a log axis the range must be strictly positive
                 range_x=[100, 100000],
                 # Keep the lower bound at 20 so the minimum lifeExp (23.6) stays in view
                 range_y=[20, 90],
                 log_x=True,  # logarithmic x-axis
                 )
fig.show()