7. Making histograms better with marginals
Adding a marginal rug, box plot, or violin plot to a histogram shows detail that binning alone hides: the individual observations, the summary statistics, or a smoothed estimate of the density.
Marginal rugs draw one small tick mark per observation along the axis of a histogram, providing a quick visual reference for where the actual values fall and helping to identify patterns such as clustering or gaps in the data. By adding marginal rugs, you can see where data points are concentrated within each bin, making it easier to interpret the histogram’s overall shape.
Integrating box plots or violin plots into the marginal spaces of a histogram can provide additional context about the dataset. Box plots show summary statistics such as the median, quartiles, and potential outliers, offering a clear view of the central tendency and spread of the data. In contrast, violin plots convey the density of the data across different values, allowing for a detailed understanding of the distribution shape and the presence of multiple modes. By combining these visualizations with a histogram, you can effectively illustrate both the frequency distribution of the data and important statistical measures, enhancing interpretability and providing a richer analysis of the underlying data structure.
Together, the histogram and its marginal plot give a more complete view of the data than either would on its own.
Getting ready
In addition to plotly, numpy and pandas, make sure the scipy Python library is available in your Python environment.
You can install it using the command:
pip install scipy
Import the Python modules numpy and pandas. Import the norm object from scipy.stats. This object allows us to generate random samples from a normal distribution, which we use to create the data sets for this recipe.
import numpy as np
import pandas as pd
from scipy.stats import norm
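Create the data set to be used in this recipe: 400 draws from a standard normal distribution and 400 draws from a normal distribution with mean 3 and standard deviation 0.5, each tagged with a label.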
n = 400
sample1 = norm().rvs(n)                  # standard normal: mean 0, standard deviation 1
sample2 = norm(loc=3, scale=0.5).rvs(n)  # mean 3, standard deviation 0.5
samples = np.concatenate((sample1, sample2))
labels = ['Sample 1'] * n + ['Sample 2'] * n
data2 = pd.DataFrame({'Data': samples, 'Label': labels})
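The samples above change on every run because no seed is set. A minimal sketch of a reproducible variant follows, assuming you want repeatable figures; the seed value 42 and the summary table are illustrative additions, not part of the original recipe.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Assumption: the seed 42 is arbitrary; any fixed integer gives repeatable draws.
rng = np.random.default_rng(42)

n = 400
sample1 = norm().rvs(size=n, random_state=rng)                  # mean 0, sd 1
sample2 = norm(loc=3, scale=0.5).rvs(size=n, random_state=rng)  # mean 3, sd 0.5

data2 = pd.DataFrame({
    "Data": np.concatenate((sample1, sample2)),
    "Label": ["Sample 1"] * n + ["Sample 2"] * n,
})

# Per-group check: means should land near 0 and 3, standard deviations near 1 and 0.5.
summary = data2.groupby("Label")["Data"].agg(["count", "mean", "std"])
print(summary)
```

Checking the per-group mean and standard deviation before plotting is a quick way to confirm that the labels line up with the right samples.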
How to do it
1. Import the plotly.express module as px
import plotly.express as px
df = data2
2. Create a simple histogram to compare the two samples in our data
fig = px.histogram(df, x='Data',
                   color='Label',
                   barmode="overlay",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height=500, width=800,
                   title='Sample from a Normal Distribution')
fig.show()
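The call above sets histnorm='probability density', which scales the bars so that their total area is 1. A minimal sketch of the same normalization follows, using NumPy's np.histogram with density=True instead of Plotly. The seed and the single sample are assumptions made for illustration, and Plotly chooses its own bin edges, so the bar heights it draws will not match these values exactly.

```python
import numpy as np

# Assumption: seed and sample are illustrative, not the recipe's data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=3, scale=0.5, size=400)

# density=True divides each bin count by (number of observations * bin width).
density, edges = np.histogram(sample, bins=25, density=True)
widths = np.diff(edges)

# Bar height times bar width, summed over all bins, gives the total area.
total_area = float(np.sum(density * widths))
print(f"total area under the bars: {total_area:.6f}")
```

Because each sample is normalized separately, two overlaid histograms remain comparable even when the groups differ in size.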
3a. Add marginal rugs by setting the argument marginal to 'rug'
fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="rug",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height=500, width=800,
                   title='Sample from a Normal Distribution')
fig.show()
3b. Add marginal box plots by setting the argument marginal to 'box'
fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="box",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height=500, width=800,
                   title='Sample from a Normal Distribution')
fig.show()
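To help read the marginal box plot, the sketch below computes the statistics a box plot summarizes: the quartiles, the interquartile range, and the conventional 1.5 × IQR fences beyond which points are drawn as outliers. It uses np.percentile on an illustrative seeded sample (an assumption, not the recipe's data). Plotly computes quartiles with its own method, so the values it displays on hover may differ slightly.

```python
import numpy as np

# Assumption: seed and sample are illustrative, not the recipe's data.
rng = np.random.default_rng(1)
sample = rng.normal(loc=0, scale=1, size=400)

# The box spans the first to third quartile; the line inside marks the median.
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Conventional fences: points outside them are plotted individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(f"Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}  IQR={iqr:.3f}")
print(f"fences=({lower_fence:.3f}, {upper_fence:.3f})  outliers={len(outliers)}")
```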
3c. Add marginal violin plots by setting the argument marginal to 'violin'
fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="violin",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height=500, width=800,
                   title='Sample from a Normal Distribution')
fig.show()