# Making Dot Plots

A [Dot Plot](https://en.wikipedia.org/wiki/Dot_plot_(statistics)), also known as a Cleveland Dot Plot, is a simple yet effective data visualization tool that displays quantitative data for various categories. It is most commonly used for comparing numerical values across two categories when you want to prioritize readability and avoid the clutter of other chart types.

They are typically used as an alternative to bar charts. Instead of bars, dot plots use points (dots) aligned to show data values. Each dot represents a data point, and its position along an axis corresponds to a numerical value.

Dot plots are often associated with William S. Cleveland, a statistician who popularised this form of data visualization in the context of exploratory data analysis. Cleveland's work emphasised the value of simple, clear, and interpretable visualizations.

🚀 When to use them:

Dot plots are particularly useful for medium-sized datasets where exact values matter. They are effective for **comparing multiple groups or categories** side by side, which makes them a good alternative to bar charts, especially when data labels are long. They also work well for **showing small differences between values**, which can be harder to detect in bar charts because of the added area of the bars. Cleveland dot plots are frequently used in fields such as economics, healthcare, and business analytics to present comparisons in a straightforward and space-efficient manner.

⚠️ Be aware:

However, dot plots also have limitations that may make them unsuitable in certain scenarios. They can become **cluttered and difficult to interpret** when dealing with **large datasets**, as overlapping dots may obscure differences between values. Additionally, dot plots might not be as intuitive as bar charts for **audiences unfamiliar with them**, since people are generally more accustomed to interpreting length rather than dot position. Furthermore, for datasets with extreme values or very small variations, dots may be placed too close together, making it challenging to distinguish between them.


## Getting ready

To create the data set for this recipe, we read a `csv` file, keep only the columns we need, and drop the rows where `Sex` is `Total`:

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('data/data_recipe05.csv')
data = data[['Reference area', 'Time period', 'Sex', 'Value', 'Country ISO3']]
data = data[data['Sex']!='Total']

In [3]:
data.head()

|   | Reference area | Time period | Sex | Value | Country ISO3 |
|---|---|---|---|---|---|
| 1 | Greece | 2023 | Female | 6.3402 | GRC |
| 2 | Italy | 2023 | Female | 9.8221 | ITA |
| 3 | Mexico | 2022 | Male | 13.398436 | MEX |
| 4 | Norway | 2023 | Female | 20.6612 | NOR |
| 5 | Sweden | 2023 | Male | 22.0652 | SWE |
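Since the chart compares two values per country, it can help to confirm that every country still has both a `Female` and a `Male` row after filtering. The following is a minimal sketch of such a check. The `toy` data frame and its values are made up for illustration and only mimic the column layout of `data/data_recipe05.csv`.

```python
import pandas as pd

# Illustrative rows only: made-up stand-ins for data/data_recipe05.csv
toy = pd.DataFrame({
    "Reference area": ["Greece", "Greece", "Greece", "Italy", "Italy", "Italy"],
    "Time period": [2023] * 6,
    "Sex": ["Female", "Male", "Total", "Female", "Male", "Total"],
    "Value": [6.3, 8.1, 7.2, 9.8, 12.4, 11.1],
    "Country ISO3": ["GRC", "GRC", "GRC", "ITA", "ITA", "ITA"],
})

# Same filter as in the recipe: drop the aggregated 'Total' rows
toy = toy[toy["Sex"] != "Total"]

# Number of distinct Sex values per country; 2 means a complete pair
pairs = toy.groupby("Country ISO3")["Sex"].nunique()

# Countries that would show only one dot in the chart
incomplete = pairs[pairs < 2].index.tolist()

print(pairs.to_dict())
print(incomplete)
```

Countries listed in `incomplete` would appear with a single dot, which is worth knowing before reading the chart.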


## How to do it

### Using `plotly Express`

1. Import the `plotly.express` module as `px`

In [4]:
import plotly.express as px

In [5]:
df = data

2. Start by making a simple scatter plot using the function `scatter`. We differentiate the two categories (Female and Male) using the inputs `color` and `symbol`, both set to the relevant column, in this case `Sex`.


In [6]:
fig = px.scatter(df, x="Country ISO3", y="Value",
                 color="Sex", symbol="Sex",
                )
fig.show()

Then add a title to your chart by passing a string as the input `title`.

In [7]:
fig = px.scatter(df, x="Country ISO3", y="Value",
                 color="Sex", symbol="Sex",
                 title="Women as a share of all 16-24 year-olds who can program"
                )
fig.show()

3. Customise the size of the figure by using the inputs `height` and `width`. Both have to be integers and correspond to the size of the figure in pixels.

In [8]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.show()

4. Customise the colors used in the scatter by using the input `color_discrete_sequence`

In [9]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 color_discrete_sequence = px.colors.qualitative.Set1,
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.show()

Alternatively, you can specify the colors by using `color_discrete_map`. This allows you to pass a dictionary to map each category into a particular color.

In [10]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 color_discrete_map = {'Female':'purple', 'Male':'Green'},
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.show()

Note: You don't have to specify every category in the mapping. For example, in the following snippet we only set the color for the category 'Female' and leave the rest to `plotly`:

In [11]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 color_discrete_map = {'Female':'purple'},
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.show()

5. Change the orientation of the x-axis text by calling the Figure method `update_xaxes` and passing a dictionary that maps the key `tickangle` to the value `-90`. This has the effect of rotating the text by -90 degrees.

In [12]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 color_discrete_map = {'Female':'purple', 'Male':'Green'},
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.update_xaxes({'tickangle':-90})
fig.show()



We can get the same effect by using the Figure method `update_layout` as follows:

In [13]:
fig = px.scatter(df, x="Country ISO3", y="Value", 
                 color="Sex",
                 symbol="Sex",
                 color_discrete_map = {'Female':'purple', 'Male':'Green'},
                 title="Women as a share of all 16-24 year-olds who can program",
                 width=900, height=500
                )
fig.update_layout(xaxis={'tickangle':-90})
fig.show()

### Using `plotly Go`

1. Import the Plotly Graph Objects library as `go`

In [14]:
import plotly.graph_objects as go

1. Define a blank figure by calling `go.Figure()`

In [15]:
fig = go.Figure()
fig.show()

2. Use the method `update_layout` to add a title and customise the figure size, using the input `title` for the former and the inputs `width` and `height` for the latter.

In [16]:
fig = go.Figure()

fig.update_layout(title="Female/Male as a share of all 16-24 year-olds who can program", 
                width=900, height=500)

fig.show()

3. Loop over the unique values in the column that defines the `x` values, in this case `Country ISO3`.
For each value, get the subset of the data matching it. Then extract the rows corresponding to each category (in this case 'Female' and 'Male'), and add a trace for each of these extracts by calling `go.Scatter`.

Notes:

- We specify the markers by passing a dictionary as the `marker` input. This gives you fine control over the appearance of the symbols.
- We call the method `update_layout` and pass `showlegend=False`. This prevents the legend from appearing next to the chart. Here every trace would get its own legend entry, resulting in an overcrowded figure, so we hide them.

In [17]:
fig = go.Figure()

for country in df["Country ISO3"].unique():
    subset = df[df["Country ISO3"] == country]
    female = subset[subset["Sex"] == "Female"]
    male = subset[subset["Sex"] == "Male"]
    
    fig.add_trace(go.Scatter(x=female["Country ISO3"], y=female["Value"],
                        marker = dict(color="purple",
                                      symbol="circle",
                                      size=7)
                        ))
    fig.add_trace(go.Scatter(x=male["Country ISO3"], y=male["Value"],
                        marker = dict(color="green",
                                      symbol="diamond",
                                      size=7,
                                      )
                        ))
    
fig.update_layout(title="Female/Male as a share of all 16-24 year-olds who can program",
                  width=900, height=500)

fig.update_layout(showlegend=False)

fig.show()


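4. Rotate the x-axis text by using the Figure method `update_layout` as we did before. The snippet below also adds, for each country, a gray trace through both of its values. With Plotly's default mode for such short traces, this draws a line connecting the two dots.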

In [18]:
fig = go.Figure()

for country in df["Country ISO3"].unique():
    subset = df[df["Country ISO3"] == country]
    female = subset[subset["Sex"] == "Female"]
    male = subset[subset["Sex"] == "Male"]
    
    fig.add_trace(go.Scatter(x=subset["Country ISO3"], y=subset["Value"],
                            marker = dict(color="gray"
                                          )
                            ))
    fig.add_trace(go.Scatter(x=female["Country ISO3"], y=female["Value"],
                        marker = dict(color="purple",
                                      symbol="circle",
                                      size=7)
                        ))
    fig.add_trace(go.Scatter(x=male["Country ISO3"], y=male["Value"],
                        marker = dict(color="green",
                                      symbol="diamond",
                                      size=7,
                                      )
                        ))

fig.update_layout(title="Female/Male as a share of all 16-24 year-olds who can program", 
                  xaxis={'tickangle':-90},
                  width=900, height=500,
                  showlegend=False)

fig.show()