12. Visualizing Correlation Matrices
Visualizing correlation matrices is a useful practice in data analysis, as it allows practitioners to quickly assess the relationships between multiple variables in a dataset. A correlation matrix summarizes the pairwise Pearson (or Spearman) correlation coefficients between each pair of variables. The coefficients range from -1 to 1: a value of 1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates no linear (Pearson) or monotonic (Spearman) correlation.
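As a minimal sketch of the difference between the two coefficients, the snippet below computes both with pandas on a small made-up data set. The `toy` DataFrame and its columns are hypothetical and are not part of the recipe; `y` is chosen to grow monotonically but non-linearly with `x`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y increases with x, but not along a straight line
x = np.arange(1, 11, dtype=float)
toy = pd.DataFrame({"x": x, "y": x ** 3})

# Pearson measures linear association; Spearman measures rank (monotonic) association
pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

print(pearson.loc["x", "y"])   # below 1, because the relationship is not linear
print(spearman.loc["x", "y"])  # 1 up to floating-point rounding, because the ranks agree exactly
```

Which coefficient is more appropriate depends on whether a linear or merely monotonic relationship is of interest.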
Visual representations, such as heatmaps, facilitate the interpretation of these relationships by providing an intuitive overview. In a heatmap, the variables are represented along the axes, and the correlation coefficients are represented as colors, making it easy to spot strong correlations (either positive or negative) and identify patterns in the data at a glance.
One of the main advantages of visualizing correlation matrices is the ability to identify multicollinearity, which can be problematic in regression analysis and predictive modeling. By highlighting highly correlated variables, analysts can make informed decisions about feature selection, potentially reducing dimensionality and improving model performance. Furthermore, correlation matrix visualizations are useful in exploratory data analysis, helping to reveal underlying structures in the data, such as clusters of related variables. However, it’s important to be cautious when interpreting correlation matrices, as correlation does not imply causation. A high correlation between two variables may be coincidental or driven by a third variable, so further investigation is often needed to understand the nature of these relationships. Overall, correlation matrix visualizations are powerful tools that enhance the understanding of complex datasets, aiding in the identification of trends and relationships among variables.
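The multicollinearity check described above can also be done numerically. The following is a sketch under stated assumptions: the helper `highly_correlated_pairs`, the 0.8 threshold, and the `demo` data are all hypothetical choices made for illustration, not part of the recipe.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(corr, threshold=0.8):
    """Return variable pairs whose absolute correlation is at or above threshold."""
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (masked cells) compare as False and are filtered out here
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=np.abs, ascending=False)

# Hypothetical data: b is almost a copy of a, c is unrelated noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.05 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

result = highly_correlated_pairs(demo.corr())
print(result)
```

The threshold is a judgment call; the pairs it flags are candidates for closer inspection, not automatic removals.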
Getting ready
In addition to plotly, numpy and pandas, make sure the yfinance and scipy Python libraries are available in your Python environment.
You can install them using the command:
pip install scipy yfinance
For this recipe we will create a data set of random values and compute its correlation matrix.
Import the Python modules numpy and pandas. They will be used to generate the random data and to compute the correlation matrix used in this recipe.
import numpy as np
import pandas as pd
Create a correlation matrix to be used in this recipe
data = np.random.rand(5, 5)
df = pd.DataFrame(data, columns=[f'Variable {i+1}' for i in range(5)])
correlation_matrix = df.corr()
correlation_matrix
 | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Variable 5 |
---|---|---|---|---|---|
Variable 1 | 1.000000 | 0.083551 | -0.622897 | -0.611068 | 0.428068 |
Variable 2 | 0.083551 | 1.000000 | 0.572976 | -0.040542 | -0.229055 |
Variable 3 | -0.622897 | 0.572976 | 1.000000 | 0.112700 | -0.044306 |
Variable 4 | -0.611068 | -0.040542 | 0.112700 | 1.000000 | -0.876124 |
Variable 5 | 0.428068 | -0.229055 | -0.044306 | -0.876124 | 1.000000 |
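Because `np.random.rand` is called without a seed, the values in the table above will differ on every run. If reproducible numbers are wanted, one option is a seeded generator, sketched below. The seed value 42 and the `seeded_*` names are arbitrary assumptions, and the resulting matrix will not match the table above.

```python
import numpy as np
import pandas as pd

# Assumption: any fixed seed works; 42 is arbitrary
rng = np.random.default_rng(42)

# Same shape as in the recipe: 5 observations of 5 variables
seeded_data = rng.random((5, 5))
seeded_df = pd.DataFrame(
    seeded_data, columns=[f"Variable {i + 1}" for i in range(5)]
)
seeded_corr = seeded_df.corr()

# Sanity checks that hold for any correlation matrix
print(np.allclose(seeded_corr, seeded_corr.T))     # symmetric
print(np.allclose(np.diag(seeded_corr), 1.0))      # ones on the diagonal
```

With only five observations per variable, large coefficients can appear by chance, so the values are useful for demonstrating the plot rather than for interpretation.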
How to do it
Import the plotly.express module as px
import plotly.express as px
Visualize the correlation matrix using the function imshow with the arguments text_auto set to True, color_continuous_midpoint set to 0.0, and range_color set to [-1, 1]
fig = px.imshow(correlation_matrix, text_auto=True, title="Correlation Matrix",
color_continuous_midpoint = 0.0,
range_color=[-1, 1],)
fig.show()
Set the aspect of the figure using the argument aspect:
'equal': ensures an aspect ratio of 1 for pixels (square pixels).
'auto': the axes are kept fixed and the aspect ratio of pixels is adjusted so that the data fit in the axes. In general, this will result in non-square pixels.
None: 'equal' is used for numpy arrays and 'auto' for xarrays (which typically have heterogeneous coordinates).
fig = px.imshow(correlation_matrix, aspect="auto",
color_continuous_midpoint = 0.0,
range_color=[-1, 1],
text_auto=True, title="Correlation Matrix")
fig.show()
Alternatively, set the dimensions of the figure manually using height and width
fig = px.imshow(correlation_matrix, text_auto=True,
height = 600, width = 600,
color_continuous_midpoint = 0.0,
range_color=[-1, 1],
title="Correlation Matrix"
)
fig.show()
Customise the color scale of the chart by using color_continuous_scale
fig = px.imshow(correlation_matrix,
color_continuous_scale='RdBu',
color_continuous_midpoint = 0.0,
range_color=[-1, 1],
text_auto=True,
height = 600, width = 600,
title="Correlation Matrix"
)
fig.show()