{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Selecting and Manipulating Data Frames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting ready\n",
"\n",
"1. Import both `pandas` and `numpy` libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Create a range of dates using `date_range`. We will use this as the index of our `DataFrame`.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"dates = pd.date_range(start='1/1/2020', periods=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Create a `numpy` of size (8, 4) with random numbers. We will use these numbers as the content of our `DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"data = np.random.randn(8, 4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. Create a `DataFrame` with index and contex given by `dates` and `data`, respectively. We will specify the names of the columns via the input `columns`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product_A | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-01-01 | \n",
" 0.881663 | \n",
" 0.987620 | \n",
" 0.515382 | \n",
" -0.244335 | \n",
"
\n",
" \n",
" 2020-01-02 | \n",
" -0.367481 | \n",
" -1.071615 | \n",
" 0.216773 | \n",
" 1.318869 | \n",
"
\n",
" \n",
" 2020-01-03 | \n",
" 1.174202 | \n",
" 1.252871 | \n",
" -0.545009 | \n",
" 0.077535 | \n",
"
\n",
" \n",
" 2020-01-04 | \n",
" -1.007827 | \n",
" -1.093999 | \n",
" 1.472493 | \n",
" -0.493387 | \n",
"
\n",
" \n",
" 2020-01-05 | \n",
" -0.972515 | \n",
" 0.119187 | \n",
" -0.563892 | \n",
" 0.389504 | \n",
"
\n",
" \n",
" 2020-01-06 | \n",
" 2.905071 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 2020-01-07 | \n",
" -0.070295 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 2020-01-08 | \n",
" 0.657587 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product_A Product_B Product_C Product_D\n",
"2020-01-01 0.881663 0.987620 0.515382 -0.244335\n",
"2020-01-02 -0.367481 -1.071615 0.216773 1.318869\n",
"2020-01-03 1.174202 1.252871 -0.545009 0.077535\n",
"2020-01-04 -1.007827 -1.093999 1.472493 -0.493387\n",
"2020-01-05 -0.972515 0.119187 -0.563892 0.389504\n",
"2020-01-06 2.905071 0.168398 -0.679490 -0.571970\n",
"2020-01-07 -0.070295 0.269492 0.107710 0.998263\n",
"2020-01-08 0.657587 1.262432 -0.284617 0.465781"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(data, index=dates, columns=['Product_A', 'Product_B', 'Product_C', 'Product_D'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to do it"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Select a column of the `DataFrame` by simply specifying its name as follows"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2020-01-01 0.881663\n",
"2020-01-02 -0.367481\n",
"2020-01-03 1.174202\n",
"2020-01-04 -1.007827\n",
"2020-01-05 -0.972515\n",
"2020-01-06 2.905071\n",
"2020-01-07 -0.070295\n",
"2020-01-08 0.657587\n",
"Freq: D, Name: Product_A, dtype: float64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Product_A']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2020-01-01 0.881663\n",
"2020-01-02 -0.367481\n",
"2020-01-03 1.174202\n",
"2020-01-04 -1.007827\n",
"2020-01-05 -0.972515\n",
"2020-01-06 2.905071\n",
"2020-01-07 -0.070295\n",
"2020-01-08 0.657587\n",
"Freq: D, Name: Product_A, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.Product_A"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Select a column by using the method `loc`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2020-01-01 0.881663\n",
"2020-01-02 -0.367481\n",
"2020-01-03 1.174202\n",
"2020-01-04 -1.007827\n",
"2020-01-05 -0.972515\n",
"2020-01-06 2.905071\n",
"2020-01-07 -0.070295\n",
"2020-01-08 0.657587\n",
"Freq: D, Name: Product_A, dtype: float64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[:,'Product_A']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Select several columns using the method `loc`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product_A | \n",
" Product_B | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-01-01 | \n",
" 0.881663 | \n",
" 0.987620 | \n",
"
\n",
" \n",
" 2020-01-02 | \n",
" -0.367481 | \n",
" -1.071615 | \n",
"
\n",
" \n",
" 2020-01-03 | \n",
" 1.174202 | \n",
" 1.252871 | \n",
"
\n",
" \n",
" 2020-01-04 | \n",
" -1.007827 | \n",
" -1.093999 | \n",
"
\n",
" \n",
" 2020-01-05 | \n",
" -0.972515 | \n",
" 0.119187 | \n",
"
\n",
" \n",
" 2020-01-06 | \n",
" 2.905071 | \n",
" 0.168398 | \n",
"
\n",
" \n",
" 2020-01-07 | \n",
" -0.070295 | \n",
" 0.269492 | \n",
"
\n",
" \n",
" 2020-01-08 | \n",
" 0.657587 | \n",
" 1.262432 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product_A Product_B\n",
"2020-01-01 0.881663 0.987620\n",
"2020-01-02 -0.367481 -1.071615\n",
"2020-01-03 1.174202 1.252871\n",
"2020-01-04 -1.007827 -1.093999\n",
"2020-01-05 -0.972515 0.119187\n",
"2020-01-06 2.905071 0.168398\n",
"2020-01-07 -0.070295 0.269492\n",
"2020-01-08 0.657587 1.262432"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[:,['Product_A', 'Product_B']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. Select a set of rows and columns using the method `loc`"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Product_A 2.905071\n",
"Product_B 0.168398\n",
"Name: 2020-01-06 00:00:00, dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[dates[5],['Product_A', 'Product_B']]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-01-06 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 2020-01-07 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 2020-01-08 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product_B Product_C Product_D\n",
"2020-01-06 0.168398 -0.679490 -0.571970\n",
"2020-01-07 0.269492 0.107710 0.998263\n",
"2020-01-08 1.262432 -0.284617 0.465781"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[dates[5] :,'Product_B': ]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. Select a column by using `.iloc`. In this case, we have to pass the position of the column"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2020-01-01 0.987620\n",
"2020-01-02 -1.071615\n",
"2020-01-03 1.252871\n",
"2020-01-04 -1.093999\n",
"2020-01-05 0.119187\n",
"2020-01-06 0.168398\n",
"2020-01-07 0.269492\n",
"2020-01-08 1.262432\n",
"Freq: D, Name: Product_B, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:,1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"6. Select several columns by using the method `iloc`"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-01-01 | \n",
" 0.987620 | \n",
" 0.515382 | \n",
" -0.244335 | \n",
"
\n",
" \n",
" 2020-01-02 | \n",
" -1.071615 | \n",
" 0.216773 | \n",
" 1.318869 | \n",
"
\n",
" \n",
" 2020-01-03 | \n",
" 1.252871 | \n",
" -0.545009 | \n",
" 0.077535 | \n",
"
\n",
" \n",
" 2020-01-04 | \n",
" -1.093999 | \n",
" 1.472493 | \n",
" -0.493387 | \n",
"
\n",
" \n",
" 2020-01-05 | \n",
" 0.119187 | \n",
" -0.563892 | \n",
" 0.389504 | \n",
"
\n",
" \n",
" 2020-01-06 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 2020-01-07 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 2020-01-08 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product_B Product_C Product_D\n",
"2020-01-01 0.987620 0.515382 -0.244335\n",
"2020-01-02 -1.071615 0.216773 1.318869\n",
"2020-01-03 1.252871 -0.545009 0.077535\n",
"2020-01-04 -1.093999 1.472493 -0.493387\n",
"2020-01-05 0.119187 -0.563892 0.389504\n",
"2020-01-06 0.168398 -0.679490 -0.571970\n",
"2020-01-07 0.269492 0.107710 0.998263\n",
"2020-01-08 1.262432 -0.284617 0.465781"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:,1:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"7. Select a row by using the method `iloc`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Product_A 0.881663\n",
"Product_B 0.987620\n",
"Product_C 0.515382\n",
"Product_D -0.244335\n",
"Name: 2020-01-01 00:00:00, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[0,:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"8. Reset the index of the `DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" Product_A | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2020-01-01 | \n",
" 0.881663 | \n",
" 0.987620 | \n",
" 0.515382 | \n",
" -0.244335 | \n",
"
\n",
" \n",
" 1 | \n",
" 2020-01-02 | \n",
" -0.367481 | \n",
" -1.071615 | \n",
" 0.216773 | \n",
" 1.318869 | \n",
"
\n",
" \n",
" 2 | \n",
" 2020-01-03 | \n",
" 1.174202 | \n",
" 1.252871 | \n",
" -0.545009 | \n",
" 0.077535 | \n",
"
\n",
" \n",
" 3 | \n",
" 2020-01-04 | \n",
" -1.007827 | \n",
" -1.093999 | \n",
" 1.472493 | \n",
" -0.493387 | \n",
"
\n",
" \n",
" 4 | \n",
" 2020-01-05 | \n",
" -0.972515 | \n",
" 0.119187 | \n",
" -0.563892 | \n",
" 0.389504 | \n",
"
\n",
" \n",
" 5 | \n",
" 2020-01-06 | \n",
" 2.905071 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 6 | \n",
" 2020-01-07 | \n",
" -0.070295 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 7 | \n",
" 2020-01-08 | \n",
" 0.657587 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index Product_A Product_B Product_C Product_D\n",
"0 2020-01-01 0.881663 0.987620 0.515382 -0.244335\n",
"1 2020-01-02 -0.367481 -1.071615 0.216773 1.318869\n",
"2 2020-01-03 1.174202 1.252871 -0.545009 0.077535\n",
"3 2020-01-04 -1.007827 -1.093999 1.472493 -0.493387\n",
"4 2020-01-05 -0.972515 0.119187 -0.563892 0.389504\n",
"5 2020-01-06 2.905071 0.168398 -0.679490 -0.571970\n",
"6 2020-01-07 -0.070295 0.269492 0.107710 0.998263\n",
"7 2020-01-08 0.657587 1.262432 -0.284617 0.465781"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.reset_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this operation does not modify the original `DataFrame`. \n",
"Verify this by inspecting it."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product_A | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-01-01 | \n",
" 0.881663 | \n",
" 0.987620 | \n",
" 0.515382 | \n",
" -0.244335 | \n",
"
\n",
" \n",
" 2020-01-02 | \n",
" -0.367481 | \n",
" -1.071615 | \n",
" 0.216773 | \n",
" 1.318869 | \n",
"
\n",
" \n",
" 2020-01-03 | \n",
" 1.174202 | \n",
" 1.252871 | \n",
" -0.545009 | \n",
" 0.077535 | \n",
"
\n",
" \n",
" 2020-01-04 | \n",
" -1.007827 | \n",
" -1.093999 | \n",
" 1.472493 | \n",
" -0.493387 | \n",
"
\n",
" \n",
" 2020-01-05 | \n",
" -0.972515 | \n",
" 0.119187 | \n",
" -0.563892 | \n",
" 0.389504 | \n",
"
\n",
" \n",
" 2020-01-06 | \n",
" 2.905071 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 2020-01-07 | \n",
" -0.070295 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 2020-01-08 | \n",
" 0.657587 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product_A Product_B Product_C Product_D\n",
"2020-01-01 0.881663 0.987620 0.515382 -0.244335\n",
"2020-01-02 -0.367481 -1.071615 0.216773 1.318869\n",
"2020-01-03 1.174202 1.252871 -0.545009 0.077535\n",
"2020-01-04 -1.007827 -1.093999 1.472493 -0.493387\n",
"2020-01-05 -0.972515 0.119187 -0.563892 0.389504\n",
"2020-01-06 2.905071 0.168398 -0.679490 -0.571970\n",
"2020-01-07 -0.070295 0.269492 0.107710 0.998263\n",
"2020-01-08 0.657587 1.262432 -0.284617 0.465781"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to modify the original `DataFrame`, use the input `inplace=True`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"df.reset_index(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" Product_A | \n",
" Product_B | \n",
" Product_C | \n",
" Product_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2020-01-01 | \n",
" 0.881663 | \n",
" 0.987620 | \n",
" 0.515382 | \n",
" -0.244335 | \n",
"
\n",
" \n",
" 1 | \n",
" 2020-01-02 | \n",
" -0.367481 | \n",
" -1.071615 | \n",
" 0.216773 | \n",
" 1.318869 | \n",
"
\n",
" \n",
" 2 | \n",
" 2020-01-03 | \n",
" 1.174202 | \n",
" 1.252871 | \n",
" -0.545009 | \n",
" 0.077535 | \n",
"
\n",
" \n",
" 3 | \n",
" 2020-01-04 | \n",
" -1.007827 | \n",
" -1.093999 | \n",
" 1.472493 | \n",
" -0.493387 | \n",
"
\n",
" \n",
" 4 | \n",
" 2020-01-05 | \n",
" -0.972515 | \n",
" 0.119187 | \n",
" -0.563892 | \n",
" 0.389504 | \n",
"
\n",
" \n",
" 5 | \n",
" 2020-01-06 | \n",
" 2.905071 | \n",
" 0.168398 | \n",
" -0.679490 | \n",
" -0.571970 | \n",
"
\n",
" \n",
" 6 | \n",
" 2020-01-07 | \n",
" -0.070295 | \n",
" 0.269492 | \n",
" 0.107710 | \n",
" 0.998263 | \n",
"
\n",
" \n",
" 7 | \n",
" 2020-01-08 | \n",
" 0.657587 | \n",
" 1.262432 | \n",
" -0.284617 | \n",
" 0.465781 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index Product_A Product_B Product_C Product_D\n",
"0 2020-01-01 0.881663 0.987620 0.515382 -0.244335\n",
"1 2020-01-02 -0.367481 -1.071615 0.216773 1.318869\n",
"2 2020-01-03 1.174202 1.252871 -0.545009 0.077535\n",
"3 2020-01-04 -1.007827 -1.093999 1.472493 -0.493387\n",
"4 2020-01-05 -0.972515 0.119187 -0.563892 0.389504\n",
"5 2020-01-06 2.905071 0.168398 -0.679490 -0.571970\n",
"6 2020-01-07 -0.070295 0.269492 0.107710 0.998263\n",
"7 2020-01-08 0.657587 1.262432 -0.284617 0.465781"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}