{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Making 2D-Histograms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A 2-D histogram is an extension of the traditional histogram, designed to visualize the relationship between two continuous variables. Instead of grouping values into bins along a single axis, a 2-D histogram divides both the x-axis and y-axis into bins, creating a grid where each cell holds the number of data points that fall within the corresponding ranges of both variables. The count for each bin is indicated by color or shading in the grid, typically as a heat map or color gradient. This allows you to see how the two variables are distributed together, and where data points are concentrated or sparse.\n", "\n", "2-D histograms are particularly useful when you're analyzing the joint distribution of two continuous variables and want to explore any patterns or correlations between them. For example, in fields like meteorology or economics, 2-D histograms can be used to visualize how temperature and humidity co-vary, or how income and expenditure relate to one another. Unlike scatter plots, which display individual data points, 2-D histograms remain readable for large datasets, where overlapping points can obscure patterns. The binning process groups the data, making it easier to observe density, trends, or anomalies in the relationship between the variables. However, the choice of bin size matters: too few bins can hide meaningful patterns, while too many produce a sparse, noisy grid." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting ready\n", "\n", "\n", "In addition to `plotly`, `numpy` and `pandas`, make sure the `scipy` Python library is available in your Python environment.\n", "You can install it using the command:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "pip install scipy\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this recipe we will create two data sets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Import the Python modules `numpy` and `pandas`, and the [`multivariate_normal`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html) object from `scipy.stats`. This object lets us generate random samples from a bivariate normal distribution, which we will use to build the data sets for this recipe." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from scipy.stats import multivariate_normal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Create the data set that we are going to use in this recipe." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Bivariate normal with mean (1.0, 3.0) and the given 2x2 covariance matrix\n", "rv = multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]])\n", "n = 200  # number of points to draw\n", "# Draw n random samples; the result has shape (n, 2)\n", "sample = rv.rvs(n)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | X | Y |\n",
"|---|---|---|\n",
"| 0 | 2.317120 | 3.128239 |\n",
"| 1 | 0.995553 | 4.070351 |\n",
"| 2 | 1.593538 | 4.044972 |\n",
"| 3 | 0.470556 | 2.772609 |\n",
"| 4 | 0.415854 | 3.184090 |\n",
"