# Creating Series and Data Frames

Pandas provides two main classes for handling data:

- `Series`: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.
- `DataFrame`: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.

In this recipe, you will learn how to create objects of these types.

## How to do it

1. Import both `pandas` and `numpy` libraries.

In [2]:
import pandas as pd
import numpy as np

2. Create an array of a given shape and populate it with random numbers from a uniform distribution over the interval $[0, 1)$.
To do this, use the `rand` function from the `numpy.random` module. The argument `5` requests a one-dimensional array with five elements.

In [3]:
random_numbers = np.random.rand(5)
random_numbers

array([0.48531061, 0.73009714, 0.87925494, 0.79081622, 0.55796458])

In [4]:
type(random_numbers)

numpy.ndarray
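
Because the numbers are random, your output will differ from the values shown above. If you need reproducible values, one option is NumPy's `Generator` interface with a fixed seed. The following is a minimal sketch; the seed value `42` and the variable names `rng` and `repeat` are arbitrary choices made for illustration and are not part of the recipe.

```python
import numpy as np

# Assumption: the seed 42 is arbitrary; any fixed integer works.
rng = np.random.default_rng(42)

# Five floats drawn uniformly from [0, 1), comparable to np.random.rand(5)
random_numbers = rng.random(5)

# A new generator created with the same seed reproduces the same values
repeat = np.random.default_rng(42).random(5)

print(random_numbers)
print(np.array_equal(random_numbers, repeat))
```
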

### Series

3. Create a pandas `Series` object from a `numpy.ndarray` of random numbers. Since no index is given, pandas assigns the default integer labels `0` to `4`.

In [5]:
series = pd.Series(random_numbers)
series

0    0.485311
1    0.730097
2    0.879255
3    0.790816
4    0.557965
dtype: float64

In [6]:
type(series)

pandas.core.series.Series

4. Create a pandas `Series` object from a `numpy.ndarray`, specifying the labels with the `index` argument. The index must have the same length as the data.

In [7]:
series = pd.Series(random_numbers, index=["a", "b", "c", "d", "e"])
series

a    0.485311
b    0.730097
c    0.879255
d    0.790816
e    0.557965
dtype: float64
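
Once a `Series` has labels, its values can be read by label or by position. The sketch below uses fixed values instead of random numbers so that the results are predictable; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Fixed values (an assumption for this sketch) instead of random numbers
values = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
series = pd.Series(values, index=["a", "b", "c", "d", "e"])

by_label = series.loc["b"]       # look up by index label
by_position = series.iloc[1]     # look up by integer position
subset = series.loc[["a", "e"]]  # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```
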

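5. Create a pandas `Series` from a dictionary. In this case, the keys of the dictionary become the index labels, in the order in which they appear in the dictionary. The `None` value is stored as `NaN`, which is why the result has dtype `float64` even though the other values are integers.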

In [8]:
d = {"b": 1, "a": 0, "c": 2, "d":None}
series = pd.Series(d)
series


b    1.0
a    0.0
c    2.0
d    NaN
dtype: float64
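
A dictionary can also be combined with an explicit `index`. In that case only the listed labels are kept, in the order given, and labels that are not keys of the dictionary get `NaN`. A minimal sketch, where the label `"z"` is a made-up example of a missing key:

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# "a" and "b" are taken from the dictionary; "c" is dropped because it is
# not listed; "z" is not a key of d, so its value is NaN.
series = pd.Series(d, index=["a", "b", "z"])

print(series)
```
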

### DataFrames

#### From Dictionaries

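6. Create a `DataFrame` from a dictionary containing pandas `Series` objects. The dictionary keys become the column names, and the resulting index is the union of the indexes of the individual `Series`. A label missing from one `Series` (here `d` in column `one`) is filled with `NaN`.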

In [9]:
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |
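
To check where alignment introduced missing values, you can inspect the result with `isna`. This is a small sketch built on the same dictionary as the step above; the names `missing` and `n_missing` are illustrative.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

missing = df.isna()          # boolean DataFrame, True where a value is NaN
n_missing = df.isna().sum()  # number of missing values per column

print(list(df.index))
print(n_missing)
```
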


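7. Create a `DataFrame` from a dictionary whose values are sequences such as lists, tuples, or `numpy.ndarray` objects. All sequences must have the same length. Since no index is given, the rows receive default integer labels.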

In [10]:
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 1.0 | 4.0 |
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 2.0 |
| 3 | 4.0 | 1.0 |


In [11]:
d = {"one": (1.0, 2.0, 3.0, 4.0), "two": (4.0, 3.0, 2.0, 1.0)}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 1.0 | 4.0 |
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 2.0 |
| 3 | 4.0 | 1.0 |


In [12]:
random_numbers1 = np.random.rand(5)
random_numbers2 = np.random.rand(5)
d = {"one": random_numbers1, "two": random_numbers2}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 0.12193 | 0.953737 |
| 1 | 0.114782 | 0.813447 |
| 2 | 0.952269 | 0.566762 |
| 3 | 0.952865 | 0.041119 |
| 4 | 0.924162 | 0.162749 |
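
Unlike a dictionary of `Series`, a dictionary of plain lists, tuples, or arrays is not aligned on labels, so sequences of different lengths raise a `ValueError`. The sketch below shows the failure and one possible workaround, wrapping each sequence in a `Series` so that shorter columns are padded with `NaN`. The dictionary `bad` and the other names are hypothetical examples.

```python
import pandas as pd

# Hypothetical input with columns of unequal length
bad = {"one": [1.0, 2.0, 3.0], "two": [4.0, 3.0]}

try:
    pd.DataFrame(bad)
    error_raised = False
except ValueError:
    # pandas refuses to build a DataFrame from sequences of unequal length
    error_raised = True

# Workaround: wrap each list in a Series so that pandas aligns on the index
padded = pd.DataFrame({key: pd.Series(values) for key, values in bad.items()})

print(error_raised)
print(padded)
```
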


#### From numpy arrays

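8. Create a `DataFrame` from a two-dimensional `numpy.ndarray`. Here the array has shape `(3, 4)` and holds samples from the standard normal distribution generated with `np.random.randn`. Since neither an index nor column names are given, both rows and columns receive default integer labels.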

In [13]:
data = np.random.randn(3, 4)
df = pd.DataFrame(data)
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -0.668084 | 0.222884 | 2.513686 | 0.305373 |
| 1 | 1.017044 | 0.42786 | 1.522738 | 0.670215 |
| 2 | 0.401991 | -0.68087 | -0.603461 | -1.8917 |
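
The `DataFrame` constructor also accepts NumPy structured arrays, in which each field has a name and its own data type. In that case the field names become the column names. A minimal sketch, where the field names `A`, `B`, `C` and the values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical structured array: an int field, a float field, a text field
data = np.array(
    [(1, 2.0, "Hello"), (2, 3.0, "World")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")],
)

# The field names A, B, C are used as column names
df = pd.DataFrame(data)

print(df)
```
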


## There is more

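The following optional steps show how to label the columns and rows of a `DataFrame`.

1. Specify the names of the columns by passing the `columns` argument. The number of names must match the number of columns in the data.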

In [14]:
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.637888 | -1.246476 | -0.827807 | 0.793993 |
| 1 | 0.094042 | -0.334123 | -2.314964 | -0.040574 |
| 2 | -1.540451 | 0.141884 | -0.412291 | 1.598198 |
| 3 | 0.853527 | 1.48753 | 0.62823 | 0.689152 |
| 4 | -0.016924 | -0.426706 | -0.372193 | -2.188799 |
| 5 | 1.920892 | 0.264744 | -0.472632 | 1.318725 |
| 6 | 0.273845 | -3.55763 | 0.156851 | 1.547334 |
| 7 | 0.471513 | -0.300621 | -0.430357 | -0.052466 |
| 8 | -1.247736 | 0.057143 | 0.469189 | 0.082202 |
| 9 | 0.953041 | 0.361258 | 1.09325 | 1.18467 |
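
Named columns can then be selected by name. The sketch below uses a small array of consecutive integers instead of random numbers, an assumption made only so that the results are easy to verify.

```python
import numpy as np
import pandas as pd

# Integers 0..11 arranged in 3 rows and 4 columns (illustrative data)
data = np.arange(12).reshape(3, 4)
df = pd.DataFrame(data, columns=["A", "B", "C", "D"])

col_a = df["A"]       # a single name returns a Series
sub = df[["B", "D"]]  # a list of names returns a DataFrame

print(col_a.tolist())
print(sub)
```
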


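2. Specify the row labels of the `DataFrame` by passing the `index` argument. The labels are used in the order given, and their number must match the number of rows in the data.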

In [15]:
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=[
                  'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'i'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| a | 0.578273 | 1.270129 | 0.342404 | 1.004533 |
| b | -0.192073 | 1.681448 | -1.349938 | -0.901333 |
| c | -0.425436 | -2.497983 | -0.866897 | 1.658416 |
| d | -0.100432 | -0.755526 | 1.970736 | -1.316195 |
| e | -1.627001 | 0.236729 | 2.445102 | -0.065333 |
| f | -0.346216 | 1.131049 | 0.811479 | 1.271289 |
| g | -1.16423 | -2.096089 | -0.95183 | -0.795075 |
| h | -0.398551 | 1.121733 | -0.346834 | -0.327875 |
| j | 0.317409 | -0.049026 | -0.016953 | 1.044969 |
| i | -1.712959 | 0.860858 | 1.676125 | 2.020117 |
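
With a labeled index, rows and individual cells can be read with `.loc`. This sketch again uses consecutive integers rather than random data (an assumption for predictability), and a shorter index than the example above.

```python
import numpy as np
import pandas as pd

# Integers 0..11 arranged in 3 rows and 4 columns (illustrative data)
data = np.arange(12).reshape(3, 4)
df = pd.DataFrame(data, columns=["A", "B", "C", "D"], index=["a", "b", "c"])

row_b = df.loc["b"]      # one row, returned as a Series indexed by column name
cell = df.loc["c", "D"]  # a single value, addressed by row and column label

print(row_b.tolist())
print(cell)
```
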


3. Create a range of dates to use as the index of a `DataFrame` by calling the `pd.date_range` function. With `start` and `periods=10` and no explicit frequency, it returns ten consecutive calendar days.

In [16]:
dates = pd.date_range(start='1/1/2024', periods=10)
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=dates)
df

|   | A | B | C | D |
|---|---|---|---|---|
| 2024-01-01 | 0.78362 | -0.40298 | -0.097422 | 1.071258 |
| 2024-01-02 | -0.035859 | -0.42942 | 0.403533 | -0.091177 |
| 2024-01-03 | 2.096545 | 0.981989 | -0.096551 | -0.72605 |
| 2024-01-04 | -0.645941 | -0.470849 | -0.616163 | -0.915679 |
| 2024-01-05 | -0.577531 | -0.639427 | -1.022793 | -0.595243 |
| 2024-01-06 | -0.407882 | 0.040938 | -0.642683 | 0.387839 |
| 2024-01-07 | 0.397409 | 1.445801 | 0.528059 | -0.158812 |
| 2024-01-08 | -1.956465 | -0.271021 | -1.102996 | -0.758496 |
| 2024-01-09 | 0.376288 | -0.237009 | 0.010373 | 0.504651 |
| 2024-01-10 | 1.490337 | 2.252731 | 0.67962 | 0.316594 |
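
A date index makes it possible to select rows by date strings, and `date_range` accepts a `freq` argument for spacings other than one day. The following sketch uses consecutive integers as data and a seven-day frequency; both are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=10)
data = np.arange(40).reshape(10, 4)  # illustrative data instead of random numbers
df = pd.DataFrame(data, columns=["A", "B", "C", "D"], index=dates)

# Label-based slicing on dates includes both end points
window = df.loc["2024-01-03":"2024-01-05"]

# freq="7D" spaces the dates seven days apart
weekly = pd.date_range(start="2024-01-01", periods=3, freq="7D")

print(window)
print(weekly)
```
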
