Creating Series and Data Frames

1. Creating Series and Data Frames

Pandas provides two main classes for handling data:

Series: a one-dimensional labeled array that can hold data of any type, such as integers, strings, or Python objects.
DataFrame: a two-dimensional labeled data structure that holds data like a two-dimensional array or a table with rows and columns.

In this recipe, you will learn how to create objects of these types.

How to do it

Import both pandas and numpy libraries.

import pandas as pd
import numpy as np

Create an array of a given shape and populate it with random numbers drawn from a uniform distribution over the interval \([0, 1)\). To do this, use the rand function from the numpy.random module.

random_numbers = np.random.rand(5)
random_numbers

array([0.47946782, 0.94900802, 0.8003903 , 0.66087312, 0.5021559 ])

type(random_numbers)

numpy.ndarray
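The values above change on every run. If you need reproducible output, one option is NumPy's Generator interface, sketched below. The seed value 42 and the variable names are arbitrary choices for illustration and are not part of the recipe.

```python
import numpy as np

# A seeded generator: the same seed always yields the same sequence.
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed
reproducible_numbers = rng.random(5)  # 5 floats drawn uniformly from [0, 1)

# A second generator built with the same seed reproduces the draw exactly.
same_numbers = np.random.default_rng(42).random(5)
print(np.array_equal(reproducible_numbers, same_numbers))
```

The rest of the recipe works the same way with either approach, since both return a numpy.ndarray.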

Series

Create a pandas Series object from a numpy.ndarray of random numbers.

series = pd.Series(random_numbers)
series

0    0.479468
1    0.949008
2    0.800390
3    0.660873
4    0.502156
dtype: float64

type(series)

pandas.core.series.Series
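When no index is given, pandas assigns a default integer index starting at 0. The short sketch below, which uses fixed illustrative values rather than the random numbers above, shows how to inspect that index and get the underlying array back.

```python
import numpy as np
import pandas as pd

# Fixed values so the example is deterministic (illustrative only).
plain_series = pd.Series(np.array([0.1, 0.2, 0.3]))

default_index = plain_series.index       # a RangeIndex covering 0, 1, 2
raw_values = plain_series.to_numpy()     # back to a numpy.ndarray

print(default_index)
print(raw_values)
```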

Create a pandas Series object from a numpy.ndarray, specifying the index labels explicitly.

series = pd.Series(random_numbers, index=["a", "b", "c", "d", "e"])
series

a    0.479468
b    0.949008
c    0.800390
d    0.660873
e    0.502156
dtype: float64
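Once a Series has labels, you can look values up either by label or by position. The following is a minimal sketch with fixed illustrative values; the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # fixed values for illustration
labeled = pd.Series(values, index=["a", "b", "c", "d", "e"])

by_label = labeled.loc["b"]        # look up by index label
by_position = labeled.iloc[1]      # look up by integer position
subset = labeled.loc[["a", "e"]]   # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```

Using .loc and .iloc makes it explicit whether a lookup is label-based or position-based.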

Create a pandas Series from a dictionary. In this case, the keys of the dictionary become the index labels, in insertion order, and a None value is stored as NaN.

d = {"b": 1, "a": 0, "c": 2, "d": None}
series = pd.Series(d)
series

b    1.0
a    0.0
c    2.0
d    NaN
dtype: float64
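You can also pass an index together with a dictionary. As a rough sketch of that behavior: only the listed labels are kept, in the listed order, and a label with no matching key gets NaN. The label "z" below is a deliberately missing key chosen for illustration.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Keep only the listed labels, in the listed order.
# "z" is not a key of d, so its value is NaN.
selected = pd.Series(d, index=["a", "b", "z"])
print(selected)
```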

DataFrames

From Dictionaries

Create a DataFrame from a dictionary containing pandas Series objects.

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
df

	one	two
a	1.0	1.0
b	2.0	2.0
c	3.0	3.0
d	NaN	4.0
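The NaN in row d appears because pandas aligns the Series on the union of their indexes. The sketch below rebuilds the same frame and checks the alignment; the helper variable names are illustrative.

```python
import pandas as pd

one = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
two = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])
aligned = pd.DataFrame({"one": one, "two": two})

row_labels = list(aligned.index)            # union of both indexes
missing_per_column = aligned.isna().sum()   # number of NaN values per column

print(row_labels)
print(missing_per_column)
```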

Create a DataFrame from a dictionary containing iterable elements such as lists, tuples, or numpy.ndarray objects. All of them must have the same length.

d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
df

	one	two
0	1.0	4.0
1	2.0	3.0
2	3.0	2.0
3	4.0	1.0

d = {"one": (1.0, 2.0, 3.0, 4.0), "two": (4.0, 3.0, 2.0, 1.0)}
df = pd.DataFrame(d)
df

	one	two
0	1.0	4.0
1	2.0	3.0
2	3.0	2.0
3	4.0	1.0

random_numbers1 = np.random.rand(5)
random_numbers2 = np.random.rand(5)
d = {"one": random_numbers1, "two": random_numbers2}
df = pd.DataFrame(d)
df

	one	two
0	0.902797	0.056748
1	0.203754	0.555891
2	0.884239	0.304098
3	0.991691	0.891247
4	0.223379	0.588790
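Unlike a dictionary of Series, a dictionary of plain lists, tuples, or arrays is not aligned, so the lengths must match. The sketch below shows what happens when they do not, and one possible workaround (wrapping each list in a Series). The dictionary name uneven is made up for this example.

```python
import pandas as pd

uneven = {"one": [1.0, 2.0, 3.0], "two": [4.0, 3.0]}  # lengths 3 and 2

try:
    pd.DataFrame(uneven)
    raised = False
except ValueError as err:
    # pandas refuses to build a frame from columns of different lengths.
    raised = True
    print(f"ValueError: {err}")

# Wrapping each list in a Series lets pandas align on the default
# integer index and fill the shorter column with NaN.
padded = pd.DataFrame({key: pd.Series(val) for key, val in uneven.items()})
print(padded)
```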

From numpy arrays

Create a DataFrame from a two-dimensional numpy.ndarray. When no labels are given, both the rows and the columns get a default integer index.

data = np.random.randn(3, 4)
df = pd.DataFrame(data)
df

	0	1	2	3
0	-0.739118	0.807602	-2.480828	0.392597
1	0.261429	-0.643314	-0.381101	0.898246
2	0.868136	-0.146715	0.550467	-1.919859
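The DataFrame constructor also accepts NumPy structured arrays, in which case the field names become the column names. The following is a small sketch; the field names, dtypes, and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# A structured array: each element is a record with three named fields.
records = np.array(
    [(1, 2.0, "Hello"), (2, 3.0, "World")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")],
)

structured_df = pd.DataFrame(records)  # field names become column names
print(structured_df)
```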

There is more

Some additional options you can use when creating a DataFrame.

Specify the names of the columns by passing the columns argument.

data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
df

	A	B	C	D
0	0.476239	0.371352	0.568178	0.007894
1	-0.277712	-0.190171	-1.526762	-0.973880
2	-0.630408	0.736540	-1.075738	0.126177
3	-1.749325	0.616581	1.408302	1.288096
4	0.782224	-1.192311	0.527716	-1.211001
5	-0.647764	-0.724451	-1.019370	-0.287565
6	0.639737	1.042487	0.730149	-0.479395
7	-0.731619	-0.717644	-0.697512	1.361714
8	2.158295	-2.013561	-0.891208	-1.019687
9	-1.035943	-0.607005	-0.487272	0.722944
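Named columns make it easy to pull out a single column, which comes back as a Series. Here is a minimal sketch using a small deterministic array instead of random numbers, so the values are easy to follow.

```python
import numpy as np
import pandas as pd

# Deterministic 3x4 array holding 0..11 (illustrative data).
small_data = np.arange(12).reshape(3, 4)
small_df = pd.DataFrame(small_data, columns=["A", "B", "C", "D"])

column_a = small_df["A"]            # a single column is a Series
two_columns = small_df[["B", "D"]]  # a list of names returns a DataFrame

print(column_a)
print(two_columns)
```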

Specify the row labels of the DataFrame by passing the index argument. The number of labels must match the number of rows.

data = np.random.randn(10, 4)
df = pd.DataFrame(
    data,
    columns=['A', 'B', 'C', 'D'],
    index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'i'],
)
df

	A	B	C	D
a	1.650168	0.893275	-0.623417	-1.449416
b	0.567526	-1.315214	-1.026773	-1.069463
c	-0.669923	1.250267	0.208683	-0.108237
d	0.416075	-0.105983	0.628503	-0.898255
e	-0.019420	-0.616960	-0.071081	1.074706
f	0.136327	0.041658	1.506550	0.809719
g	1.179197	0.150831	0.601135	1.176382
h	1.944653	-0.580342	0.336385	-1.179848
j	0.060012	-0.097694	0.851866	0.126996
i	-0.897467	-1.518786	2.172422	-1.962632
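With row labels in place, rows and individual cells can be selected by label through .loc. The sketch below uses a small deterministic frame; its name and contents are illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic 3x2 frame holding 0..5 (illustrative data).
frame = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    columns=["A", "B"],
    index=["a", "b", "c"],
)

row_b = frame.loc["b"]       # one row, returned as a Series
cell = frame.loc["c", "A"]   # one cell, selected by row and column label

print(row_b)
print(cell)
```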

Create a range of dates to use as the index of the DataFrame by calling the pandas function date_range. With only start and periods given, the dates are spaced one day apart.

dates = pd.date_range(start='1/1/2024', periods=10)
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=dates)
df

	A	B	C	D
2024-01-01	0.081631	0.605092	-0.577445	1.360987
2024-01-02	0.643335	-0.418880	2.053832	0.185818
2024-01-03	-0.868394	0.243802	-1.390107	-0.009417
2024-01-04	-1.539327	-0.288292	-1.631790	1.616059
2024-01-05	-0.064223	-1.641774	-0.567148	0.066072
2024-01-06	0.434569	0.020560	-0.606185	-2.128939
2024-01-07	1.228884	2.001144	0.066804	1.220431
2024-01-08	0.573809	0.445986	-0.918571	0.033251
2024-01-09	1.113194	0.246670	0.084873	0.661690
2024-01-10	-0.564881	0.206114	1.104971	0.011947
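A date index allows rows to be selected with date strings, and date_range can use a spacing other than one day through its freq argument. The following sketch assumes ISO-formatted dates (which avoid the day/month ambiguity of '1/1/2024') and uses deterministic data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=10)  # daily by default
ts = pd.DataFrame(
    np.arange(40).reshape(10, 4),  # deterministic data for illustration
    columns=["A", "B", "C", "D"],
    index=dates,
)

# Label-based slicing on dates includes both endpoints.
window = ts.loc["2024-01-03":"2024-01-05"]

# freq controls the spacing; "7D" means one date every seven days.
every_week = pd.date_range(start="2024-01-01", periods=3, freq="7D")

print(window)
print(every_week)
```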
