Reshaping Data with Pandas

5. Reshaping Data with Pandas#

When working with data, it is common to encounter situations where the format of your DataFrame is not ideal for analysis or visualization. Reshaping data can help you transform the structure of your data to better suit your needs. This recipe illustrates two powerful tools available in Pandas for reshaping data, namely

pivot: This method allows you to “pivot” a DataFrame, turning unique values from one column into new columns and filling them with corresponding values from another column. It is especially useful for converting long-form data (also known as tidy data) into a wide format.
melt: This method is the inverse of pivot. It is used to “melt” a wide-format DataFrame into a long-format, where multiple columns are unpivoted into a single column, making it easier to work with or visualize when you have a large number of variables.

Getting ready#

Before starting, we are going to create two DataFrame objects to work with.

df_vertical this data is in a vertical/long format where each row represents a record of a single observation or instance, and each column represents a single variable.
df_horizontal this data is in horizontal/wide format where multiple columns/rows represent levels of a categorical variable.

import pandas as pd
import numpy as np

df_vertical = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr'],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South', 'North', 'South'],
    'Product': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'Sales': [250, 200, 260, 210, 270, 220, 180, 210],
    'Costs': [130, 100, 130, 100, 120, 120, 100, 100]
})

df_horizontal = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'North': [250, 260, 270, 180],
    'South': [200, 210, 220, 210]
})

How to do it#

Use `pivot` to go from vertical to horizontal#

Take a look at the df_vertical object

df_vertical

	Month	Region	Product	Sales	Costs
0	Jan	North	A	250	130
1	Jan	South	A	200	100
2	Feb	North	A	260	130
3	Feb	South	A	210	100
4	Mar	North	A	270	120
5	Mar	South	A	220	120
6	Apr	North	B	180	100
7	Apr	South	B	210	100

df_vertical.shape

(8, 5)

Convert the df_vertical into horizontal/wide format by using the Pandas method pivot. This method requires three inputs:

index
columns
values which will determine the structure of the new DataFrame

df_vertical.pivot(index='Month', columns='Region', values='Sales')

Region	North	South
Month
Apr	180	210
Feb	260	210
Jan	250	200
Mar	270	220

In this format each row represents a month, each column represents a region, and the sales values are filled in accordingly. Here, the Month column becomes the index, the Region column values become the column names, and the Sales values are filled into the corresponding cells.

Note that the information in the columns Costs and Product is lost.

Pass a list into the input values to obtain a pivot with several values on it. In this case, we are going to pass Sales and Costs.

df_vertical.pivot(index='Month', columns='Region', values=['Sales', 'Costs'])

	Sales		Costs
Region	North	South	North	South
Month
Apr	180	210	100	100
Feb	260	210	130	100
Jan	250	200	130	100
Mar	270	220	120	120

Pass a list into the input columns to include the information about the product type

df_vertical.pivot(index='Month', columns=['Product', 'Region'], values=['Sales', 'Costs'])

	Sales				Costs
Product	A		B		A		B
Region	North	South	North	South	North	South	North	South
Month
Apr	NaN	NaN	180.0	210.0	NaN	NaN	100.0	100.0
Feb	260.0	210.0	NaN	NaN	130.0	100.0	NaN	NaN
Jan	250.0	200.0	NaN	NaN	130.0	100.0	NaN	NaN
Mar	270.0	220.0	NaN	NaN	120.0	120.0	NaN	NaN

Similarly, you can pass a list into the input index to include the information about the product type

df_vertical.pivot(index=['Month','Product'], columns=['Region'], values=['Sales', 'Costs'])

		Sales		Costs
	Region	North	South	North	South
Month	Product
Apr	B	180	210	100	100
Feb	A	260	210	130	100
Jan	A	250	200	130	100
Mar	A	270	220	120	120

Use `melt` to go from horizontal to vertical#

Take a look at the df_horizontal object

df_horizontal

	Month	North	South
0	Jan	250	200
1	Feb	260	210
2	Mar	270	220
3	Apr	180	210

Convert the df_horizontal into vertical/long format by using the Pandas function melt

pd.melt(df_horizontal)

	variable	value
0	Month	Jan
1	Month	Feb
2	Month	Mar
3	Month	Apr
4	North	250
5	North	260
6	North	270
7	North	180
8	South	200
9	South	210
10	South	220
11	South	210

To get a better result we need to pass the inputs

id_vars
value_vars
var_name
value_name

which will determine the structure of the new DataFrame

melted_df = pd.melt(df_horizontal, id_vars=['Month'], value_vars=['North', 'South'], var_name='Region', value_name='Sales')
melted_df

	Month	Region	Sales
0	Jan	North	250
1	Feb	North	260
2	Mar	North	270
3	Apr	North	180
4	Jan	South	200
5	Feb	South	210
6	Mar	South	220
7	Apr	South	210

Reshaping Data with Pandas

Contents

5. Reshaping Data with Pandas#

Getting ready#

How to do it#

Use pivot to go from vertical to horizontal#

Use melt to go from horizontal to vertical#

Use `pivot` to go from vertical to horizontal#

Use `melt` to go from horizontal to vertical#