8. Using marginal plots with facets
Facets in data visualization refer to the practice of creating a grid of smaller plots, each representing a subset of the data based on one or more categorical variables. This technique allows for easy comparison of multiple groups or conditions within the same dataset, facilitating a clearer understanding of variations and trends across categories. Each facet shares the same scale and axes, making it straightforward to identify patterns and differences between groups. Faceting is particularly useful in exploratory data analysis, as it can reveal how relationships between variables change across different subsets of data, such as different demographics, experimental conditions, or time periods.
Combining facets with marginal plots can enhance the analytical capabilities of visualizations even further. For instance, each facet can contain a scatter plot of two continuous variables, accompanied by marginal histograms or density plots that show the distribution of each variable within that specific group. This approach allows viewers to not only compare the overall relationship between the two variables across different categories but also to examine the individual distributions of each variable within those categories. This dual-level analysis can reveal insights that might be missed when examining only the overall data or relying on single visualizations, enabling a deeper understanding of how categorical factors influence the relationships between continuous variables.
However, there are situations where faceting combined with marginal plots may not be appropriate. If the dataset contains a very large number of categories or groups, the resulting grid of plots can become overcrowded and difficult to interpret, potentially obscuring meaningful insights. Additionally, if the sample sizes within each category are small, the distributions shown in the marginal plots may be unreliable or misleading, leading to erroneous conclusions. In such cases, simplifying the visualization by focusing on a few key categories or using alternative visualization methods may be more effective. Moreover, if the analysis involves complex interactions among multiple continuous variables, faceting alone might not capture the full picture, and multivariate visualizations could be more appropriate.
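The two caveats above, crowded grids and small groups, can be checked before any plotting. The following is a minimal sketch, assuming the data is held in a pandas DataFrame. The helper name select_facet_groups and the thresholds min_rows and max_facets are illustrative choices made here; they are not part of plotly or of this recipe.

```python
import pandas as pd


def select_facet_groups(df, column, min_rows=30, max_facets=6):
    """Keep only the rows whose category is large enough to facet on.

    Hypothetical helper: `min_rows` and `max_facets` are arbitrary defaults
    chosen for illustration and should be tuned to the data at hand.
    """
    # value_counts() sorts categories from most to least frequent.
    counts = df[column].value_counts()
    # Drop categories too small for a trustworthy marginal distribution,
    # then cap the number of facets so the grid stays readable.
    keep = counts[counts >= min_rows].head(max_facets).index
    return df[df[column].isin(keep)]


# Small made-up frame: group "c" has too few rows to keep.
example = pd.DataFrame(
    {"group": ["a"] * 40 + ["b"] * 35 + ["c"] * 3, "value": range(78)}
)
filtered = select_facet_groups(example, "group")
print(filtered["group"].value_counts())
```

The filtered frame can then be passed to a faceted plot in place of the full data set, so that every panel has enough points for its marginal distribution to be meaningful.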
Getting ready
For this recipe we will use two data sets, iris and tips, that ship with the plotly.express module and are available through px.data.
How to do it
Import the plotly.express module as px:
import plotly.express as px
Using the iris data set, create a scatter plot with violin-plot marginals. The aim is to illustrate the relationship between sepal length and sepal width for each of the three species:
# Load the iris data and draw a scatter plot with violin marginals on both axes
df = px.data.iris()
fig = px.scatter(
    df, x="sepal_length", y="sepal_width", color="species",
    height=500, width=800,
    marginal_x="violin", marginal_y="violin",
    title="Iris Data: Sepal Width vs Length by Species",
)
fig.show()
Add facets by setting the facet_col argument to 'species', which draws one panel per species:
fig = px.scatter(
    df, x="sepal_length", y="sepal_width", color="species",
    facet_col="species",
    height=500, width=800,
    marginal_x="violin", marginal_y="violin",
    title="Iris Data: Sepal Width vs Length by Species",
)
fig.show()
Next, using the tips data set, create a faceted scatter plot with box-plot marginals. Here the facets make it easy to compare the relationship between the total bill and the tip on each day recorded in the data (Thursday to Sunday):
df = px.data.tips()
fig = px.scatter(
    df, x="total_bill", y="tip", facet_col="day",
    color="sex",
    marginal_x="box",
)
fig.show()