{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Making a scatter plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A [scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) or scatter chart is a type of graph that plots data points for two variables, one on the x axis and one on the y axis. **Each point represents a pair of values**, providing insight into the **relationship between** those **variables**.\n", "\n", "🚀 When to use them:\n", "\n", "- Scatter charts are particularly effective for **visualizing patterns of correlation or distribution**, such as whether an increase in one variable comes with a corresponding increase (positive correlation) or decrease (negative correlation) in the other. They are widely used in fields like statistics, data science, and economics to examine relationships between variables, identify outliers, or detect clusters.\n", "\n", "- Scatter charts are most useful when you need to identify, analyze, and interpret the relationship between two **continuous variables**. For instance, they are commonly used to compare variables such as height vs. weight or sales revenue vs. advertising spend. Scatter plots make it easy to spot trends or anomalies in the data.\n", "\n", "⚠️ Be aware:\n", "\n", "- Scatter charts may not be the best choice for large datasets, where overlapping points (known as **\"overplotting\"**) make individual data points hard to distinguish. They are also less effective for categorical or non-numeric data, and they reveal little when the variables are uncorrelated or the points are scattered with no discernible pattern."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting ready" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this recipe, we will load the `Gapminder` dataset that ships with Plotly Express and keep only the rows for the year 2007." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import plotly.express as px" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Load the built-in Gapminder sample data\n", "df = px.data.gapminder()\n", "# Keep only the 2007 observations\n", "df = df[df.year == 2007]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 11 | Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| 23 | Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 | ALB | 8 |
| 35 | Algeria | Africa | 2007 | 72.301 | 33333216 | 6223.367465 | DZA | 12 |
| 47 | Angola | Africa | 2007 | 42.731 | 12420476 | 4797.231267 | AGO | 24 |
| 59 | Argentina | Americas | 2007 | 75.320 | 40301927 | 12779.379640 | ARG | 32 |