2. Making 2-D Histograms
A 2-D histogram is an extension of the traditional histogram, designed to visualize the relationship between two continuous variables. Instead of grouping values into bins along a single axis, a 2-D histogram divides both the x-axis and y-axis into bins, creating a grid where each cell represents the frequency of data points that fall within the corresponding ranges of both variables. The frequency or count for each bin is indicated by color or shading in the grid, often using a heat map or color gradient. This allows you to see how the two variables are distributed together, and where data points are concentrated or sparse.
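As a minimal sketch of the counting step outside Plotly, NumPy's np.histogram2d splits both axes into bins and returns the grid of counts that a 2-D histogram colors in. The six points below are made up purely for illustration.

```python
import numpy as np

# Six hypothetical (x, y) points, chosen by hand for illustration
x = np.array([0.2, 0.4, 0.6, 1.5, 1.7, 1.9])
y = np.array([0.1, 0.3, 1.2, 0.4, 1.6, 1.8])

# Split both axes into 2 bins over [0, 2], so the edges are 0, 1 and 2
counts, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 2], [0, 2]])

# counts[i, j] is the number of points with x in bin i and y in bin j
print(counts)  # [[2. 1.], [1. 2.]] as a 2 x 2 array
```

Note that the first index of counts runs over the x bins. To draw it with y on the vertical axis, transpose it (counts.T).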
2-D histograms are particularly useful when you’re analyzing the joint distribution of two continuous variables and want to explore patterns or correlations between them. For example, in fields like meteorology or economics, 2-D histograms can be used to visualize how temperature and humidity co-vary, or how income and expenditure relate to one another. Scatter plots display every individual data point. 2-D histograms, by contrast, stay readable for large datasets, where overlapping points in a scatter plot can obscure patterns. Binning groups the data, which makes it easier to see density, trends or anomalies in the relationship between the variables. However, the choice of bin size still matters. Too few bins can hide meaningful structure, while too many bins spread the points thinly and produce a noisy, hard-to-read grid.
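One rough way to see the bin-size trade-off is to count how many grid cells stay empty as the number of bins per axis grows. The sketch below assumes 200 points drawn with NumPy (not SciPy) from the same bivariate normal distribution used later in this recipe, with a fixed seed for reproducibility. empty_fraction is a hypothetical helper written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
# Same mean and covariance as the recipe's data set, drawn with NumPy instead of SciPy
xy = rng.multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]], size=200)

def empty_fraction(bins):
    """Hypothetical helper: share of grid cells that receive no points."""
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    return float((counts == 0).mean())

# With 100 x 100 = 10,000 cells and only 200 points, at least 98% of cells must be empty
for bins in (3, 25, 100):
    print(bins, round(empty_fraction(bins), 3))
```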
Getting ready
In addition to plotly, numpy and pandas, make sure the scipy Python library is available in your Python environment. You can install it using the command:
pip install scipy
For this recipe, we will create two data sets.
Import the Python modules numpy and pandas, and the multivariate_normal object from scipy.stats. This object lets us generate random samples from a bivariate normal distribution, which we will use to create the data sets for this recipe.
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal
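Create the data set data1 by drawing n = 200 samples from a bivariate normal distribution with mean vector [1.0, 3.0] and covariance matrix [[1.0, 0.3], [0.3, 0.5]]: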
rv = multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]])
n = 200
sample = rv.rvs(n)
data1 = pd.DataFrame(sample, columns=['X', 'Y'])
data1.head()
|   | X | Y |
|---|---|---|
| 0 | 0.592594 | 3.224629 |
| 1 | 0.958740 | 3.019011 |
| 2 | 0.149638 | 2.856354 |
| 3 | 1.297205 | 2.050106 |
| 4 | -0.366634 | 2.788429 |
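rv.rvs draws a fresh random sample each time, so the values in your own data1.head() output will differ from the table above. A quick sanity check is to compare the sample mean and covariance with the parameters passed to multivariate_normal. The means are 1.0 and 3.0, the variances are 1.0 and 0.5, and the covariance is 0.3, which corresponds to a correlation of 0.3 / sqrt(1.0 × 0.5) ≈ 0.42. The sketch below adds random_state only to make the check reproducible; the recipe itself does not seed the sample.

```python
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal

mean = [1.0, 3.0]
cov = [[1.0, 0.3], [0.3, 0.5]]
rv = multivariate_normal(mean, cov)

# random_state is fixed here only for reproducibility (an addition to the recipe)
sample = rv.rvs(size=200, random_state=42)
data1 = pd.DataFrame(sample, columns=['X', 'Y'])

print(data1.mean())  # should be close to 1.0 and 3.0
print(data1.cov())   # should be close to [[1.0, 0.3], [0.3, 0.5]]

# Correlation implied by the covariance matrix
print(round(0.3 / np.sqrt(1.0 * 0.5), 3))  # 0.424
```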
How to do it
Import the plotly.express module as px:
import plotly.express as px
Make a simple 2-D histogram of the joint distribution of X and Y in data1 using the function density_heatmap:
df = data1
fig = px.density_heatmap(df, x="X", y="Y")
fig.show()
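When no z column is given, density_heatmap counts how many rows fall into each (X, Y) cell and maps that count to color. The sketch below approximates that grid with NumPy and pandas. The unit-width bin edges are a hypothetical choice made for illustration. Plotly picks its own bin edges automatically, so its grid will generally differ. A seeded NumPy sample stands in for data1.

```python
import numpy as np
import pandas as pd

# Stand-in for data1: same distribution as the recipe, seeded for reproducibility
rng = np.random.default_rng(1)
data1 = pd.DataFrame(
    rng.multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]], size=200),
    columns=['X', 'Y'],
)

# Hypothetical unit-width bin edges that cover every point
x_edges = np.arange(np.floor(data1['X'].min()), np.ceil(data1['X'].max()) + 1)
y_edges = np.arange(np.floor(data1['Y'].min()), np.ceil(data1['Y'].max()) + 1)

# counts[i, j] = number of rows with X in x bin i and Y in y bin j
counts, _, _ = np.histogram2d(data1['X'], data1['Y'], bins=[x_edges, y_edges])

# Transpose so Y bins run down the rows and X bins across the columns.
# Row labels are the lower Y edges; the lowest Y bin is the first row,
# i.e. the table is flipped vertically relative to the plot.
grid = pd.DataFrame(counts.T.astype(int), index=y_edges[:-1], columns=x_edges[:-1])
print(grid)
```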
Add a title to your chart by passing a string to the title argument of density_heatmap, and customize the size of the figure with the height and width arguments. Both values are given in pixels.
fig = px.density_heatmap(df, x="X", y="Y",
height = 500, width = 800,
title='Sample from a Bi-variate Normal Distribution')
fig.show()
Just as with simple histograms, we can display a 2-D histogram on a density scale by setting the histnorm argument to 'probability density':
fig = px.density_heatmap(df, x="X", y="Y",
histnorm='probability density',
height = 500, width = 800,
title='Sample from a Bi-variate Normal Distribution')
fig.show()
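As we understand Plotly's histnorm options, 'probability density' scales each cell so that value × cell area, summed over the grid, equals 1. In other words, value = count / (total count × cell area). NumPy's np.histogram2d(..., density=True) uses exactly this normalization. The sketch below checks the relationship numerically on its own 25 × 25 grid, which is not the grid Plotly chooses.

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.multivariate_normal([1.0, 3.0], [[1.0, 0.3], [0.3, 0.5]], size=200)

# Raw counts and NumPy's probability-density version of the same 25 x 25 grid
counts, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=25)
density, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=25, density=True)

# Area of each cell: x bin width times y bin height
cell_area = np.outer(np.diff(xedges), np.diff(yedges))

# density = count / (total number of points * cell area)
print(np.allclose(density, counts / (counts.sum() * cell_area)))  # True

# Density times area, summed over all cells, gives a total probability of 1
print(round(float((density * cell_area).sum()), 6))  # 1.0
```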
Customize the number of bins along each axis with the nbinsx and nbinsy arguments:
fig = px.density_heatmap(df, x="X", y="Y",
nbinsx=25,
nbinsy=25,
histnorm='probability density',
height = 500, width = 800,
title='Sample from a Bi-variate Normal Distribution')
fig.show()
Customize the color scale of the heatmap with the color_continuous_scale argument, as follows:
fig = px.density_heatmap(df, x="X", y="Y",
color_continuous_scale="Viridis",
nbinsx=25,
nbinsy=25,
histnorm='probability density',
height = 500, width = 800,
title='Sample from a Bi-variate Normal Distribution')
fig.show()