8. Using marginal plots with facets

Faceting is the practice of drawing a grid of smaller plots, each showing the subset of the data that belongs to one level of a categorical variable (or one combination of levels, when two variables are used). This makes it easy to compare groups or conditions within the same dataset. In Plotly Express the facets share the same axis ranges by default, so patterns and differences between groups can be read directly across panels. Faceting is particularly useful in exploratory data analysis, as it can reveal how relationships between variables change across subsets of the data, such as different demographics, experimental conditions, or time periods.

Combining facets with marginal plots can enhance the analytical capabilities of visualizations even further. For instance, each facet can contain a scatter plot of two continuous variables, accompanied by marginal histograms or density plots that show the distribution of each variable within that specific group. This approach allows viewers to not only compare the overall relationship between the two variables across different categories but also to examine the individual distributions of each variable within those categories. This dual-level analysis can reveal insights that might be missed when examining only the overall data or relying on single visualizations, enabling a deeper understanding of how categorical factors influence the relationships between continuous variables.

However, there are situations where faceting combined with marginal plots may not be appropriate. If the dataset contains a very large number of categories or groups, the resulting grid of plots can become overcrowded and difficult to interpret, potentially obscuring meaningful insights. Additionally, if the sample sizes within each category are small, the distributions shown in the marginal plots may be unreliable or misleading, leading to erroneous conclusions. In such cases, simplifying the visualization by focusing on a few key categories or using alternative visualization methods may be more effective. Moreover, if the analysis involves complex interactions among multiple continuous variables, faceting alone might not capture the full picture, and multivariate visualizations could be more appropriate.

Getting ready

For this recipe we will use two data sets that ship with the plotly.express module: iris and tips.

How to do it

  1. Import the plotly.express module as px

import plotly.express as px
  2. Using the iris data set, we will create a scatter plot with violin-plot marginals. The aim of the plot is to illustrate the relationship between sepal length and sepal width for each of the three species

df = px.data.iris()
fig = px.scatter(df, x="sepal_length", y="sepal_width", color="species",
                 height=500, width=800,
                 marginal_x="violin", marginal_y="violin",
                 title="Iris Data: Sepal Width vs Length by Species")
fig.show()
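  3. Add facets by setting the argument facet_col to 'species'. Plotly Express combines marginals and facets only when they run in different directions, so with facet_col only the x marginal (drawn above each column) is shown; marginal_y is therefore left out here

# marginal_y is omitted: with facet_col set, only the x marginal is drawn
fig = px.scatter(df, x="sepal_length", y="sepal_width", color="species",
                 facet_col="species",
                 height=500, width=800,
                 marginal_x="violin",
                 title="Iris Data: Sepal Width vs Length by Species")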
fig.show()
  4. Next, using the tips data set, create a faceted scatter plot with box-plot marginals. Here the facets make it easy to compare the relationship between the total bill and the tip across the four days recorded in the data (Thursday to Sunday)

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", facet_col="day",
                 color="sex",
                 marginal_x="box")
fig.show()