{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Making Dot Plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A [Dot Plot](https://en.wikipedia.org/wiki/Dot_plot_(statistics)), also known as a Cleveland Dot Plot, is a simple yet effective data visualization tool that displays quantitative data for various categories. It is most commonly used for comparing numerical values across two categories when you want to prioritize readability and avoid the clutter of other chart types.\n", "\n", "Dot plots are typically used as an alternative to bar charts. Instead of bars, dot plots use points (dots) aligned to show data values. Each dot represents a data point, and its position along an axis corresponds to a numerical value.\n", "\n", "Dot plots are often associated with William S. Cleveland, a statistician who popularised this form of data visualization in the context of exploratory data analysis. Cleveland’s work emphasised the value of simple, clear, and interpretable visualizations.\n", "\n", "🚀 When to use them:\n", "\n", "Dot plots are particularly useful when dealing with medium-sized datasets and when exact values are important. They are effective for **comparing multiple groups or categories** side by side, making them a great alternative to bar charts, particularly when data labels are long. They also work well for **showing small differences between values**, which might be harder to detect in bar charts due to the added area of the bars. Cleveland dot plots are frequently used in fields such as economics, healthcare, and business analytics to present comparisons in a straightforward and space-efficient manner.\n", "\n", "⚠️ Be aware:\n", "\n", "Dot plots also have limitations that may make them unsuitable in certain scenarios. They can become **cluttered and difficult to interpret** when dealing with **large datasets**, as overlapping dots may obscure differences between values. 
Additionally, dot plots might not be as intuitive as bar charts for **audiences unfamiliar with them**, since people are generally more accustomed to interpreting bar length than dot position. Furthermore, for datasets with extreme values or very small variations, dots may be placed too close together, making it challenging to distinguish between them.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting ready" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create the dataset for this recipe, we need to read a `csv` file and filter some of its data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv('data/data_recipe05.csv')\n", "# Keep only the columns used in this recipe\n", "data = data[['Reference area', 'Time period', 'Sex', 'Value', 'Country ISO3']]\n", "# Drop the aggregate 'Total' rows, keeping the per-sex values\n", "data = data[data['Sex'] != 'Total']" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | Reference area | Time period | Sex | Value | Country ISO3 |\n",
"|---|---|---|---|---|---|\n",
"| 1 | Greece | 2023 | Female | 6.340200 | GRC |\n",
"| 2 | Italy | 2023 | Female | 9.822100 | ITA |\n",
"| 3 | Mexico | 2022 | Male | 13.398436 | MEX |\n",
"| 4 | Norway | 2023 | Female | 20.661200 | NOR |\n",
"| 5 | Sweden | 2023 | Male | 22.065200 | SWE |\n",
"