Making histograms better with marginals

7. Making histograms better with marginals

Incorporating marginal rugs, box plots, or violin plots into histograms can enhance the depth and clarity of data visualizations significantly.

A marginal rug draws one small tick mark per observation along the axis of a histogram. It gives a quick visual reference for the actual values behind the bars and helps reveal patterns such as clustering or gaps that binning can hide. Because the ticks show where data points are concentrated, a rug makes the overall shape of the histogram easier to interpret.

Integrating box plots or violin plots into the marginal spaces of a histogram can provide additional context about the dataset. Box plots show summary statistics such as the median, quartiles, and potential outliers, offering a clear view of the central tendency and spread of the data. In contrast, violin plots convey the density of the data across different values, allowing for a detailed understanding of the distribution shape and the presence of multiple modes. By combining these visualizations with a histogram, you can effectively illustrate both the frequency distribution of the data and important statistical measures, enhancing interpretability and providing a richer analysis of the underlying data structure.

These enhancements allow you to create a more comprehensive view of the data, facilitating better insights and interpretations for analysts and decision-makers alike.
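As a rough illustration of the summary statistics a box plot encodes, the sketch below computes the quartiles, the interquartile range, and the conventional 1.5 × IQR fences with numpy. The seed, the sample, and the variable names are assumptions made for this example; Plotly may compute quartiles with a slightly different method, so the numbers need not match a rendered box plot exactly.

```python
import numpy as np

# Hypothetical sample; the seed is fixed only so the example is repeatable.
rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=400)

# Quartiles: the box spans q1..q3 and the line inside it marks the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences; points beyond them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"q1={q1:.2f}, median={median:.2f}, q3={q3:.2f}, IQR={iqr:.2f}")
print(f"fences=({lower_fence:.2f}, {upper_fence:.2f}), outliers={outliers.size}")
```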

Getting ready

In addition to plotly, numpy and pandas, make sure the scipy Python library is available in your Python environment. You can install it using the command:

pip install scipy 

Import the Python modules numpy and pandas, and import the norm object from scipy.stats. The norm object lets us draw random samples from a normal distribution, which we use to build the data set for this recipe.

import numpy as np
import pandas as pd
from scipy.stats import norm

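Create the data set to be used in this recipe: two samples of 400 points each, one from a standard normal distribution and one from a normal distribution with mean 3 and standard deviation 0.5.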

n = 400
sample1 = norm().rvs(n)
sample2 = norm(loc=3, scale=0.5).rvs(n)

samples = np.concatenate((sample1, sample2))
labels = ['Sample 1'] * n + ['Sample 2'] * n
data2 = pd.DataFrame({'Data': samples, 'Label': labels})
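The samples above change on every run because no seed is set. If repeatable figures are wanted, one option is to pass a seeded generator through the random_state argument of rvs. The following is a minimal sketch of that idea; the seed value 42 and the names rng, data2_seeded and summary are assumptions introduced for this example and are not part of the recipe.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Seeded generator (the seed value is an arbitrary choice for this sketch).
rng = np.random.default_rng(42)

n = 400
sample1 = norm().rvs(size=n, random_state=rng)
sample2 = norm(loc=3, scale=0.5).rvs(size=n, random_state=rng)

data2_seeded = pd.DataFrame({
    'Data': np.concatenate((sample1, sample2)),
    'Label': ['Sample 1'] * n + ['Sample 2'] * n,
})

# Quick sanity check: each label should have n rows, with a mean and
# standard deviation close to the parameters used to generate it.
summary = data2_seeded.groupby('Label')['Data'].agg(['count', 'mean', 'std'])
print(summary.round(2))
```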

How to do it

Import the plotly.express module as px and assign the data set to df:

import plotly.express as px

df = data2

Create a simple histogram to compare the two samples in our data:

fig = px.histogram(df, x='Data',
                   color='Label', 
                   barmode="overlay",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height = 500, width = 800,
                   title='Sample from a Normal Distribution')
fig.show()
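The histogram above uses histnorm='probability density', which scales the bars so that their total area is one rather than their total height. The sketch below illustrates the same normalization with numpy.histogram. It is only an analogy: the sample, the seed, and the bin edges are assumptions for this example, and Plotly treats nbins as a maximum, so its bins will generally differ from numpy's.

```python
import numpy as np

# Hypothetical sample resembling 'Sample 2' from the recipe.
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=0.5, size=400)

# density=True divides each count by (total count * bin width).
density, edges = np.histogram(x, bins=25, density=True)
widths = np.diff(edges)

# Summing height * width over all bins recovers a total area of one.
total_area = float(np.sum(density * widths))
print(f"total area under the bars: {total_area:.6f}")
```

Because bar heights are densities, a narrow sample such as the one with scale 0.5 can have bars taller than 1 without anything being wrong.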

3a. Add marginal rugs by setting the argument marginal to 'rug':

fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="rug",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height = 500, width = 800,
                   title='Sample from a Normal Distribution')
fig.show()

3b. Add marginal box plots by setting the argument marginal to 'box':

fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="box",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height = 500, width = 800,
                   title='Sample from a Normal Distribution')
fig.show()

3c. Add marginal violin plots by setting the argument marginal to 'violin':

fig = px.histogram(df, x='Data',
                   color='Label',
                   marginal="violin",
                   opacity=0.5,
                   color_discrete_sequence=px.colors.qualitative.Prism,
                   nbins=25,
                   histnorm='probability density',
                   height = 500, width = 800,
                   title='Sample from a Normal Distribution')
fig.show()
