Pandas Series


import pandas as pd

Pandas provides high-performance, easy-to-use data structures and data analysis tools for Python.

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option("display.max_rows", 8)
plt.rcParams['figure.figsize'] = (9, 6)


  • A Series contains a one-dimensional array of data, and an associated sequence of labels called the index.
  • The index can contain numeric, string, or date/time values.
  • When the index is a time value, the series is a time series.
  • The index must be the same length as the data.
  • If no index is supplied it is automatically generated as range(len(data)).
pd.Series([1,3,5,np.nan,6,8], dtype=np.float64)
pd.Series(index=pd.period_range('09/11/2017', '09/18/2017', freq="D"), dtype=np.int8)
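
The index rules listed above can be checked on a toy example. This is only a sketch; the values and the names auto, labelled and timed are made up for illustration.

```python
import pandas as pd

# No index supplied: pandas generates 0, 1, 2, ... automatically.
auto = pd.Series([10.0, 20.0, 30.0])

# Explicit string labels; the index must have the same length as the data.
labelled = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# A date/time index turns the Series into a time series.
dates = pd.date_range("2017-09-11", periods=3, freq="D")
timed = pd.Series([10.0, 20.0, 30.0], index=dates)

print(list(auto.index))
print(labelled["b"])
print(timed.index[0])
```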


  • Create a text with lorem and count word occurrences with a collections.Counter. Put the result in a dict.
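
One possible way to do the counting step is sketched below. It assumes a short hard-coded sample text instead of one generated with lorem, so the sentence itself is hypothetical.

```python
from collections import Counter

# Hypothetical sample text; the exercise generates its own with lorem.
text = "lorem ipsum dolor sit amet dolore magna dolore ipsum"

# Split on whitespace, count each word, then convert to a plain dict.
result = dict(Counter(text.lower().split()))

print(result["dolore"])
```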


  • From the results create a Pandas Series named latin_series with words in alphabetical order as index.
df = pd.Series(result)
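
A minimal sketch of the alphabetical ordering, assuming a small hand-written dict of counts in place of the real Counter output:

```python
import pandas as pd

# Hypothetical word counts standing in for the Counter result.
result = {"sit": 1, "dolore": 2, "amet": 1, "ipsum": 2}

# sort_index orders the labels (here, the words) alphabetically.
latin_series = pd.Series(result).sort_index()

print(list(latin_series.index))
```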


  • Plot the series using ‘bar’ kind.


  • Pandas provides explicit functions for indexing loc and iloc.
    • Use loc to display the number of occurrences of ‘dolore’.
    • Use iloc to display the number of occurrences of the last word in index.
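
The difference between the two accessors can be illustrated as follows. The Series content is invented for the example; loc looks up by label, iloc by integer position.

```python
import pandas as pd

# Hypothetical counts, already sorted alphabetically by word.
latin_series = pd.Series({"amet": 1, "dolore": 2, "ipsum": 2, "sit": 1})

# Label-based lookup: the count stored under the label "dolore".
by_label = latin_series.loc["dolore"]

# Position-based lookup: the count of the last word in the index.
by_position = latin_series.iloc[-1]

print(by_label, by_position)
```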


  • Sort words by number of occurrences.
  • Plot the Series.
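
Sorting by count rather than by label could look like this. It is a sketch on made-up counts, and the plotting step is left out.

```python
import pandas as pd

# Hypothetical word counts.
latin_series = pd.Series({"amet": 1, "dolore": 3, "ipsum": 2, "sit": 1})

# sort_values orders by the data; ascending=False puts the most frequent word first.
by_count = latin_series.sort_values(ascending=False)

print(by_count.index[0])
```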

Full globe temperature between 1901 and 2000.

We read the text file and load the results into a pandas DataFrame. In the cells below you need to clean the data and convert the DataFrame to a time series.

import os
here = os.getcwd()

filename = os.path.join(here,"data","")

# Raw string so that \s is passed to the regex engine unchanged.
df = pd.read_table(filename, sep=r"\s+",
                   names=["year", "month", "mean temp"])


  • Insert a third column named “day”, filled with the value 1, with .insert.
  • Convert the df index to datetime with the pd.to_datetime function.
  • Convert df to a Series containing only the “mean temp” column.
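
The three steps above can be sketched on a tiny hand-made frame. The three rows are invented, and the variable name ts is an assumption; the notebook itself keeps the name df.

```python
import pandas as pd

# Hypothetical rows mimicking the layout of the temperature file.
df = pd.DataFrame({"year": [1901, 1901, 1901],
                   "month": [1, 2, 3],
                   "mean temp": [-0.1, 0.2, -999.0]})

# Position 2 makes "day" the third column; the scalar 1 is broadcast to all rows.
df.insert(2, "day", 1)

# pd.to_datetime assembles dates from columns named year, month and day.
df.index = pd.to_datetime(df[["year", "month", "day"]])

# Keep only the temperature column, which gives a Series with a datetime index.
ts = df["mean temp"]

print(ts.index[0])
```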


  • Display the beginning of the file with .head.


  • Display the end of the file with .tail.

In the dataset, -999.00 was used to indicate that there was no value for that month.


  • Display values equal to -999 with .values.
  • Replace the missing value (-999.000) by np.nan.

Once they have been converted to np.nan, missing values can be removed (dropped).


  • Remove missing values with .dropna.
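
A possible cleaning sequence, shown on four invented monthly values (the names missing and cleaned are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with two missing-value markers.
index = pd.date_range("1901-01-01", periods=4, freq="MS")
ts = pd.Series([0.1, -999.0, 0.3, -999.0], index=index)

# Boolean mask on the underlying array selects the marker entries.
missing = ts[ts.values == -999]

# Turn the markers into NaN, then drop them.
cleaned = ts.replace(-999.0, np.nan).dropna()

print(len(missing), len(cleaned))
```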


  • Generate a basic visualization using .plot.


Converting the df index from timestamps to periods is more meaningful, since each value was measured and averaged over a month. Use the to_period method.
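
As an illustration of the conversion, the sketch below applies to_period to two invented monthly values; the name monthly is hypothetical.

```python
import pandas as pd

# Hypothetical series indexed by the first day of each month.
index = pd.date_range("1901-01-01", periods=2, freq="MS")
ts = pd.Series([0.1, 0.2], index=index)

# "M" asks for monthly periods, so each label now denotes a whole month.
monthly = ts.to_period("M")

print(monthly.index[0])
```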


Series can be resampled, either downsampled or upsampled.

  • Frequencies can be specified as strings: “us”, “ms”, “S”, “T”, “H”, “D”, “B”, “W”, “M”, “A”, “3min”, “2h20”, …
  • More aliases are listed in the pandas documentation on offset aliases.


  • With the resample method, convert the df Series to 10-year blocks:
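
One way to downsample to decades is sketched here on synthetic data with a DatetimeIndex. The "10YS" alias (ten-year bins anchored at year start) and the choice of mean as the aggregation are assumptions; other aliases or aggregations may suit the exercise better.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 years of monthly values, 1.0 in the first decade, 3.0 in the second.
index = pd.date_range("1901-01-01", periods=240, freq="MS")
ts = pd.Series(np.repeat([1.0, 3.0], 120), index=index)

# Group the months into 10-year bins and average each bin.
decades = ts.resample("10YS").mean()

print(decades.iloc[0], decades.iloc[1])
```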

Saving Work

HDF5 is a widely used and powerful file format for storing binary data. It can store both Series and DataFrames.

with pd.HDFStore("data/pandas_series.h5") as writer:
    # Pass key by keyword; recent pandas versions deprecate passing it positionally.
    df.to_hdf(writer, key="/temperatures/full_globe")

Reloading data

with pd.HDFStore("data/pandas_series.h5") as store:
    df = store["/temperatures/full_globe"]