1. Making a scatter plot#

A scatter plot or scatter chart is a type of graph that displays data points based on two variables along the x and y axes. Each point represents a pair of values, providing insight into the relationship between those variables.

🚀 When to use them:

  • Scatter charts are particularly effective for visualizing patterns of correlation or distribution, such as whether an increase in one variable is associated with an increase (positive correlation) or a decrease (negative correlation) in the other. They are widely used in fields like statistics, data science, and economics to examine relationships between variables, identify outliers, or detect clusters.

  • Scatter charts are most useful when you need to identify, analyze, and interpret the relationships or correlations between two continuous variables. For instance, they are commonly used in scientific research to investigate variables such as height vs. weight or sales revenue vs. advertising spend. Scatter plots make it easy to spot trends or anomalies in the data.
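
To make the idea of positive and negative correlation concrete, here is a minimal sketch in plain Python. The pearson helper and the toy numbers are illustrative assumptions and are not part of the Gapminder data used later in this recipe.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: y rises with x in one case and falls in the other
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = pearson(x, y_up)      # close to +1: positive correlation
r_down = pearson(x, y_down)  # close to -1: negative correlation
print(r_up, r_down)
```

On a scatter plot, the first case would appear as points rising from left to right and the second as points falling from left to right.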

⚠️ Be aware:

  • Scatter charts may not be the best choice when dealing with large datasets that could result in overlapping points (also known as “overplotting”), making it difficult to distinguish individual data points. They also tend to be less effective for categorical or non-numeric data and might not clearly display relationships when the data has no significant correlation or is highly scattered without any discernible pattern.

Getting ready#

For this recipe we will load the Gapminder data set and filter the data for the year 2007.

import plotly.express as px

# Load the Gapminder sample data bundled with Plotly Express
df = px.data.gapminder()
# Keep only the rows for 2007
df = df[df.year == 2007]
df.head()
        country  continent  year  lifeExp       pop     gdpPercap  iso_alpha  iso_num
11  Afghanistan       Asia  2007   43.828  31889923    974.580338        AFG        4
23      Albania     Europe  2007   76.423   3600523   5937.029526        ALB        8
35      Algeria     Africa  2007   72.301  33333216   6223.367465        DZA       12
47       Angola     Africa  2007   42.731  12420476   4797.231267        AGO       24
59    Argentina   Americas  2007   75.320  40301927  12779.379640        ARG       32

How to do it#

Visualising two variables#

  1. Make a simple scatter plot using px.scatter, passing the data frame and the names of the two columns to be plotted as x and y respectively. Then call the show method to display it.

fig = px.scatter(df, x='gdpPercap', y='lifeExp')
fig.show()

  2. Add a title to your chart by passing a string to the title argument.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

  3. Customise the size of the figure with the height and width arguments. Both must be integers and give the size of the figure in pixels.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

Introducing a third variable#

  1. Use the color argument to color the dots according to a third, categorical variable.

In this case, we pass continent, which lets us see whether the relationship between GDP per capita and life expectancy differs by continent.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 color='continent', 
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

An alternative way to introduce a third variable is the symbol argument, which gives each value of the specified variable a different marker shape. Let’s take a look at the result when passing continent.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 symbol='continent',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

You can also use both arguments together, as follows:

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 symbol='continent', color='continent',
                 height=600, width=800,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

There is More#

Further Customisation#

  1. Customise the colors used in the scatter plot with the color_discrete_sequence argument.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 color='continent', 
                 height=600, width=800,
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()

  2. Use a pre-defined template. Plotly comes pre-loaded with several themes that you can start using right away.

fig = px.scatter(df, x='gdpPercap', y='lifeExp',
                 color='continent', 
                 height=600, width=800,
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 template="plotly_dark", 
                 title='Gap Minder Data: GDP per Capita vs Life Expectancy')
fig.show()