12. Visualizing Correlation Matrices

Visualizing correlation matrices is a useful practice in data analysis, as it allows practitioners to quickly assess the relationships between multiple variables in a dataset. A correlation matrix summarizes the Pearson (or Spearman) correlation coefficient for every pair of variables. Each coefficient ranges from -1 to 1: a value of 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear (Pearson) or monotonic (Spearman) association.
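
As a minimal sketch of how the two coefficients can differ, the example below uses the method argument of pandas' DataFrame.corr on hypothetical toy data. The names toy, x and y are illustrative assumptions and are not part of this recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y grows monotonically, but not linearly, with x
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=200)
toy = pd.DataFrame({"x": x, "y": np.exp(x)})

# Pearson measures linear association, Spearman measures monotonic association
pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

print(pearson.loc["x", "y"])   # below 1, because the relationship is curved
print(spearman.loc["x", "y"])  # essentially 1, because the ranks agree
```

Spearman is often the safer choice when the relationship is expected to be monotonic but not necessarily a straight line.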

Visual representations, such as heatmaps, facilitate the interpretation of these relationships by providing an intuitive overview. In a heatmap, the variables are represented along the axes, and the correlation coefficients are represented as colors, making it easy to spot strong correlations (either positive or negative) and identify patterns in the data at a glance.

One of the main advantages of visualizing correlation matrices is the ability to identify multicollinearity, which can be problematic in regression analysis and predictive modeling. By highlighting highly correlated variables, analysts can make informed decisions about feature selection, potentially reducing dimensionality and improving model performance. Furthermore, correlation matrix visualizations are useful in exploratory data analysis, helping to reveal underlying structures in the data, such as clusters of related variables. However, it’s important to be cautious when interpreting correlation matrices, as correlation does not imply causation. A high correlation between two variables may be coincidental or driven by a third variable, so further investigation is often needed to understand the nature of these relationships. Overall, correlation matrix visualizations are powerful tools that enhance the understanding of complex datasets, aiding in the identification of trends and relationships among variables.
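
The multicollinearity check described above can be sketched as follows. This is only one possible approach, assuming a hypothetical helper highly_correlated_pairs and an arbitrary threshold of 0.8; neither comes from this recipe, and the right threshold depends on the application.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(corr, threshold=0.8):
    """Return (var_a, var_b, r) for every pair with |r| >= threshold."""
    # Keep only the strict upper triangle so each pair is listed once
    # and the diagonal (always 1) is ignored
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    strong = pairs[pairs.abs() >= threshold]
    return [(a, b, float(r)) for (a, b), r in strong.items()]

# Hypothetical data: "b" is nearly a copy of "a", "c" is independent noise
rng = np.random.default_rng(1)
a = rng.normal(size=500)
example = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=500),
    "c": rng.normal(size=500),
})

flagged = highly_correlated_pairs(example.corr())
print(flagged)
```

A flagged pair is a candidate for closer inspection, not an automatic instruction to drop one of the variables.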

Getting ready

In addition to plotly, numpy and pandas, make sure the yfinance and scipy Python libraries are available in your Python environment. You can install them using the command:

pip install scipy yfinance

For this recipe we will create a small data set of random values and compute its correlation matrix.

  1. Import the Python modules numpy and pandas. We will use numpy to generate random samples and pandas to compute the correlation matrix.

import numpy as np
import pandas as pd
  2. Create a correlation matrix to be used in this recipe. The data are random, so the values you obtain will differ from the output shown here.

data = np.random.rand(5, 5)
df = pd.DataFrame(data, columns=[f'Variable {i+1}' for i in range(5)])
correlation_matrix = df.corr()
correlation_matrix
            Variable 1  Variable 2  Variable 3  Variable 4  Variable 5
Variable 1    1.000000    0.083551   -0.622897   -0.611068    0.428068
Variable 2    0.083551    1.000000    0.572976   -0.040542   -0.229055
Variable 3   -0.622897    0.572976    1.000000    0.112700   -0.044306
Variable 4   -0.611068   -0.040542    0.112700    1.000000   -0.876124
Variable 5    0.428068   -0.229055   -0.044306   -0.876124    1.000000
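
With only five rows of random data, sizeable correlations such as those above arise purely by chance. If reproducible and less noisy values are preferred, one option is a seeded generator with more observations. The sketch below is an assumption rather than part of the recipe: the seed 42, the 100 rows, and the names seeded_df and seeded_corr are all illustrative.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same matrix is produced on every run
rng = np.random.default_rng(42)

# 100 observations of 5 independent uniform variables
samples = rng.random((100, 5))
columns = [f"Variable {i+1}" for i in range(5)]
seeded_df = pd.DataFrame(samples, columns=columns)

# Pairwise Pearson correlations (the pandas default)
seeded_corr = seeded_df.corr()
print(seeded_corr.round(3))
```

Because the variables are generated independently, the off-diagonal values should shrink towards 0 as the number of rows grows.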

How to do it

  1. Import the plotly.express module as px

import plotly.express as px
  2. Visualize the correlation matrix using the function imshow with the arguments:

    • text_auto set to True

    • color_continuous_midpoint set to 0.0

    • range_color set to [-1, 1]

fig = px.imshow(correlation_matrix, text_auto=True, title="Correlation Matrix",
                color_continuous_midpoint=0.0,
                range_color=[-1, 1])
fig.show()
  3. Set the aspect of the figure using the argument aspect:

    • 'equal': Ensures an aspect ratio of 1 for pixels (square pixels).

    • 'auto': The axes are kept fixed and the aspect ratio of pixels is adjusted so that the data fit in the axes. In general, this will result in non-square pixels.

    • None: 'equal' is used for numpy arrays and 'auto' for xarrays (which typically have heterogeneous coordinates).

fig = px.imshow(correlation_matrix, aspect="auto",
                color_continuous_midpoint=0.0,
                range_color=[-1, 1],
                text_auto=True, title="Correlation Matrix")
fig.show()
  4. Alternatively, set the dimensions of the figure manually using height and width (both in pixels):

fig = px.imshow(correlation_matrix, text_auto=True,
                height=600, width=600,
                color_continuous_midpoint=0.0,
                range_color=[-1, 1],
                title="Correlation Matrix")
fig.show()
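  5. Customise the color scale of the chart by using color_continuous_scale. A diverging scale such as 'RdBu', centered at 0, makes positive and negative correlations easy to tell apart: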

fig = px.imshow(correlation_matrix,
                color_continuous_scale='RdBu',
                color_continuous_midpoint=0.0,
                range_color=[-1, 1],
                text_auto=True,
                height=600, width=600,
                title="Correlation Matrix")
fig.show()