5. Reshaping Data with Pandas#

When working with data, it is common to encounter situations where the format of your DataFrame is not ideal for analysis or visualization. Reshaping data can help you transform the structure of your data to better suit your needs. This recipe illustrates two powerful tools available in Pandas for reshaping data, namely

  • pivot: This method allows you to “pivot” a DataFrame, turning unique values from one column into new columns and filling them with corresponding values from another column. It is especially useful for converting long-form data (also known as tidy data) into a wide format.

  • melt: This method is the inverse of pivot. It is used to “melt” a wide-format DataFrame into a long-format, where multiple columns are unpivoted into a single column, making it easier to work with or visualize when you have a large number of variables.

Getting ready#

Before starting, we are going to create two DataFrame objects to work with.

  • df_vertical this data is in a vertical/long format where each row represents a record of a single observation or instance, and each column represents a single variable.

  • df_horizontal this data is in horizontal/wide format where multiple columns/rows represent levels of a categorical variable.

import pandas as pd
import numpy as np
df_vertical = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr'],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South', 'North', 'South'],
    'Product': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'Sales': [250, 200, 260, 210, 270, 220, 180, 210],
    'Costs': [130, 100, 130, 100, 120, 120, 100, 100]
})
df_horizontal = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'North': [250, 260, 270, 180],
    'South': [200, 210, 220, 210]
})

How to do it#

Use pivot to go from vertical to horizontal#

  1. Take a look at the df_vertical object

df_vertical
Month Region Product Sales Costs
0 Jan North A 250 130
1 Jan South A 200 100
2 Feb North A 260 130
3 Feb South A 210 100
4 Mar North A 270 120
5 Mar South A 220 120
6 Apr North B 180 100
7 Apr South B 210 100
df_vertical.shape
(8, 5)
  1. Convert the df_vertical into horizontal/wide format by using the Pandas method pivot. This method requires three inputs:

  • index

  • columns

  • values which will determine the structure of the new DataFrame

df_vertical.pivot(index='Month', columns='Region', values='Sales')
Region North South
Month
Apr 180 210
Feb 260 210
Jan 250 200
Mar 270 220

In this format each row represents a month, each column represents a region, and the sales values are filled in accordingly. Here, the Month column becomes the index, the Region column values become the column names, and the Sales values are filled into the corresponding cells.

Note that the information in the columns Costs and Product is lost.

  1. Pass a list into the input values to obtain a pivot with several values on it. In this case, we are going to pass Sales and Costs.

df_vertical.pivot(index='Month', columns='Region', values=['Sales', 'Costs'])
Sales Costs
Region North South North South
Month
Apr 180 210 100 100
Feb 260 210 130 100
Jan 250 200 130 100
Mar 270 220 120 120
  1. Pass a list into the input columns to include the information about the product type

df_vertical.pivot(index='Month', columns=['Product', 'Region'], values=['Sales', 'Costs'])
Sales Costs
Product A B A B
Region North South North South North South North South
Month
Apr NaN NaN 180.0 210.0 NaN NaN 100.0 100.0
Feb 260.0 210.0 NaN NaN 130.0 100.0 NaN NaN
Jan 250.0 200.0 NaN NaN 130.0 100.0 NaN NaN
Mar 270.0 220.0 NaN NaN 120.0 120.0 NaN NaN
  1. Similarly, you can pass a list into the input index to include the information about the product type

df_vertical.pivot(index=['Month','Product'], columns=['Region'], values=['Sales', 'Costs'])
Sales Costs
Region North South North South
Month Product
Apr B 180 210 100 100
Feb A 260 210 130 100
Jan A 250 200 130 100
Mar A 270 220 120 120

Use melt to go from horizontal to vertical#

  1. Take a look at the df_horizontal object

df_horizontal
Month North South
0 Jan 250 200
1 Feb 260 210
2 Mar 270 220
3 Apr 180 210
  1. Convert the df_horizontal into vertical/long format by using the Pandas function melt

pd.melt(df_horizontal)
variable value
0 Month Jan
1 Month Feb
2 Month Mar
3 Month Apr
4 North 250
5 North 260
6 North 270
7 North 180
8 South 200
9 South 210
10 South 220
11 South 210
  1. To get a better result we need to pass the inputs

  • id_vars

  • value_vars

  • var_name

  • value_name

which will determine the structure of the new DataFrame

melted_df = pd.melt(df_horizontal, id_vars=['Month'], value_vars=['North', 'South'], var_name='Region', value_name='Sales')
melted_df
Month Region Sales
0 Jan North 250
1 Feb North 260
2 Mar North 270
3 Apr North 180
4 Jan South 200
5 Feb South 210
6 Mar South 220
7 Apr South 210