2. Making 2D-Histograms

A 2-D histogram is an extension of the traditional histogram, designed to visualize the relationship between two continuous variables. Instead of grouping values into bins along a single axis, a 2-D histogram divides both the x-axis and y-axis into bins, creating a grid where each cell represents the frequency of data points that fall within the corresponding ranges of both variables. The frequency or count for each bin is indicated by color or shading in the grid, often using a heat map or color gradient. This allows you to see how the two variables are distributed together, and where data points are concentrated or sparse.
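To make the binning idea concrete, here is a minimal sketch that counts points on a 2 x 2 grid using numpy's histogram2d function. The six points are hypothetical toy values, chosen only so that the counts are easy to verify by hand; they are not part of this recipe's data.

```python
import numpy as np

# Hypothetical toy data: six (x, y) points, small enough to check by hand.
x = np.array([0.1, 0.2, 0.8, 1.5, 1.6, 1.9])
y = np.array([0.3, 0.4, 1.7, 0.2, 1.2, 1.8])

# Two bins per axis over the square [0, 2] x [0, 2], so the bin edges are 0, 1, 2.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=2, range=[[0, 2], [0, 2]])

# counts[i, j] is the number of points in x-bin i and y-bin j.
print(counts)
# [[2. 1.]
#  [1. 2.]]
```

Each entry of counts is what a 2-D histogram would map to a color in the corresponding grid cell.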

2-D histograms are particularly useful when you’re analyzing the joint distribution of two continuous variables and want to explore any patterns or correlations between them. For example, in fields like meteorology or economics, 2-D histograms can be used to visualize how temperature and humidity co-vary, or how income and expenditure relate to one another. Unlike scatter plots, which display individual data points, 2-D histograms are beneficial when working with large datasets where overlapping points can obscure patterns. The binning process groups the data, making it easier to observe density, trends, or anomalies in the relationship between the variables. However, the choice of bin size is still important: too few bins can hide meaningful patterns, while too many bins spread the data over a sparse, noisy grid.
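The effect of the bin count can be checked numerically. The sketch below uses numpy only, with an arbitrarily chosen seed and illustrative distribution parameters (both are assumptions, not values from this recipe), and reports the fraction of empty cells for three grid sizes. With 200 points, a 50 x 50 grid has 2,500 cells, so at least 92% of them must be empty no matter how the data is distributed.

```python
import numpy as np

# Assumed setup: 200 points from two independent normals, fixed seed for repeatability.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=1.0, scale=1.0, size=n)
y = rng.normal(loc=3.0, scale=0.7, size=n)

empty_fraction = {}
for bins in (5, 25, 50):
    # Same number of bins on both axes; edges span the observed data range.
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    empty_fraction[bins] = float(np.mean(counts == 0))
    print(f"{bins:>2} x {bins:<2} bins: {empty_fraction[bins]:.0%} of cells are empty")
```

A grid that is mostly empty tends to show isolated cells rather than a shape, which is one practical sign that the bin count is too high for the sample size.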

Getting ready

In addition to plotly, numpy, and pandas, make sure the scipy Python library is available in your Python environment. You can install it using the command:

pip install scipy 

For this recipe we will create two data sets.

  1. Import numpy and pandas, along with the multivariate_normal object from scipy.stats. This object lets us draw random samples from a bi-variate normal distribution, which we will use to build the data sets for this recipe.

import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal
  2. Create the data set that we are going to use in this recipe.

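# Bi-variate normal with mean (1.0, 3.0) and covariance matrix [[1.0, 0.3], [0.3, 0.5]]
rv = multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]])
n = 200  # number of samples to draw
sample = rv.rvs(n)  # array of shape (n, 2)
data1 = pd.DataFrame(sample, columns=['X', 'Y'])
data1.head()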
          X         Y
0  0.592594  3.224629
1  0.958740  3.019011
2  0.149638  2.856354
3  1.297205  2.050106
4 -0.366634  2.788429
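Because the sample is random, the values you get will differ from the ones printed above on every run. If you need repeatable output, one option is to pass a seed through the random_state argument of rvs. The sketch below assumes an arbitrary seed of 42 (not a value used in this recipe) and also compares the sample mean and covariance with the parameters we specified.

```python
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal

mean = [1.0, 3.0]
cov = [[1.0, 0.3], [0.3, 0.5]]
rv = multivariate_normal(mean, cov)

seed = 42  # arbitrary choice, only here to make the draws repeatable
sample_a = rv.rvs(size=200, random_state=seed)
sample_b = rv.rvs(size=200, random_state=seed)

# The same seed gives exactly the same draws.
print(np.array_equal(sample_a, sample_b))  # True

data = pd.DataFrame(sample_a, columns=["X", "Y"])

# With 200 points these should be close to, but not equal to, mean and cov above.
print(data.mean().round(2).to_dict())
print(data.cov().round(2))
```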

How to do it

  1. Import the plotly.express module as px.

import plotly.express as px
  2. Make a simple 2-D histogram of the data in data1 using the function density_heatmap.

df = data1
fig = px.density_heatmap(df, x="X", y="Y")
fig.show()
  3. Add a title to your chart by passing a string to the title argument of density_heatmap.

  4. Customize the size of the figure with the height and width arguments. Both must be integers and give the size of the figure in pixels.

fig = px.density_heatmap(df, x="X", y="Y",
                         height=500, width=800,
                         title='Sample from a Bi-variate Normal Distribution')
fig.show()
  5. Just as with simple histograms, we can display 2-D histograms on a density scale by setting the histnorm argument to 'probability density'.

fig = px.density_heatmap(df, x="X", y="Y",
                         histnorm='probability density',
                         height=500, width=800,
                         title='Sample from a Bi-variate Normal Distribution')
fig.show()
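On a density scale, each cell shows its count divided by the total number of points and by the area of the cell, so the cell values multiplied by their areas add up to 1. The sketch below illustrates this normalization with numpy on assumed stand-in data. It is meant to explain the scale, not to reproduce the exact computation that plotly performs.

```python
import numpy as np

# Assumed stand-in data with a fixed seed.
rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, 200)
y = rng.normal(3.0, 0.7, 200)

counts, x_edges, y_edges = np.histogram2d(x, y, bins=10)

# Area of every cell: bin width along x times bin width along y.
cell_area = np.outer(np.diff(x_edges), np.diff(y_edges))

# Density = count / (total number of points * cell area).
density = counts / (counts.sum() * cell_area)

# numpy can also do this directly; the two results should agree.
density_np, _, _ = np.histogram2d(x, y, bins=10, density=True)
print(np.allclose(density, density_np))

# Integrating the density over the grid gives 1.
total = float((density * cell_area).sum())
print(round(total, 6))  # 1.0
```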
  6. Customize the number of bins on both axes with the nbinsx and nbinsy arguments.

fig = px.density_heatmap(df, x="X", y="Y",
                         nbinsx=25,
                         nbinsy=25,
                         histnorm='probability density',
                         height=500, width=800,
                         title='Sample from a Bi-variate Normal Distribution')
fig.show()
  7. Customize the colors of the heatmap cells with the color_continuous_scale argument, as follows:

fig = px.density_heatmap(df, x="X", y="Y",
                         color_continuous_scale="Viridis",
                         nbinsx=25,
                         nbinsy=25,
                         histnorm='probability density',
                         height=500, width=800,
                         title='Sample from a Bi-variate Normal Distribution')
fig.show()