pandas

Overview

pandas is a data analysis library for Python. It provides high-level data structures and tools for working with tabular data.

The pandas logo.

Installation

pandas can be installed with pip:

$ pip install pandas

or conda:

$ conda install pandas

The DataFrame

The core data structure in pandas is the DataFrame. A DataFrame is a container for holding tabular data (2D), and supports labelled rows and columns.

You can create a DataFrame by passing in a dict:

import pandas as pd

df = pd.DataFrame({
  'Name': [ 'John', 'Geoff', 'Brett' ],
  'Age': [ 45, 23, 30 ],
  'Height': [ 1.23, 4.56, 7.89 ],
})

You can then print the DataFrame, and pandas will render the data in tabular form:

print(df)
#     Name  Age  Height
# 0   John   45    1.23
# 1  Geoff   23    4.56
# 2  Brett   30    7.89
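The numbers 0, 1, 2 on the left of the printed output are the default row labels (the index). As a minimal sketch of what "labelled rows and columns" means, the example below passes explicit row labels; the labels 'a', 'b', 'c' are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Same data as above, but with explicit row labels (the index) instead of 0, 1, 2.
# The labels 'a', 'b', 'c' are arbitrary and chosen for illustration only.
df = pd.DataFrame(
    {
        'Name': ['John', 'Geoff', 'Brett'],
        'Age': [45, 23, 30],
        'Height': [1.23, 4.56, 7.89],
    },
    index=['a', 'b', 'c'],
)

print(df.shape)          # (3, 3): three rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'Height']
print(list(df.index))    # ['a', 'b', 'c']

# Look up a single cell by row label and column label.
geoff_age = df.loc['b', 'Age']
print(geoff_age)         # 23
```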

Selecting Columns

You can then select (extract) certain columns of data by passing in a list of the column names you want:

print(df[['Name', 'Height']])
#     Name  Height
# 0   John    1.23
# 1  Geoff    4.56
# 2  Brett    7.89

The expression above returns a new DataFrame containing only the selected columns.
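The return type depends on what you pass inside the brackets. The sketch below, which reuses the same example data, shows that a list of names gives a DataFrame while a single name gives a Series:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Geoff', 'Brett'],
    'Age': [45, 23, 30],
    'Height': [1.23, 4.56, 7.89],
})

# A list of column names (double brackets) returns a DataFrame.
subset = df[['Name', 'Height']]
print(type(subset).__name__)   # DataFrame

# A single column name (single brackets) returns a Series.
heights = df['Height']
print(type(heights).__name__)  # Series

# A one-element list still returns a DataFrame, with one column.
one_col = df[['Height']]
print(one_col.shape)           # (3, 1)
```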

Selecting Rows Based On A Column Value

To select all rows in a DataFrame where a particular column has a certain value, use the following code:

df.loc[df['column_name'] == some_value]

This returns a new dataframe with only the applicable rows included.

For example:

import pandas as pd

df = pd.DataFrame({
    'A': [ 1, 5, 6, 3, 4 ],
    'B': [ 'foo', 'bar', 'bar', 'foo', 'foo' ]
})
print(df)
#    A    B
# 0  1  foo
# 1  5  bar
# 2  6  bar
# 3  3  foo
# 4  4  foo

filtered_df = df.loc[df['B'] == 'foo']
print(filtered_df)
#    A    B
# 0  1  foo
# 3  3  foo
# 4  4  foo
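It can help to look at the intermediate step. The comparison inside the brackets evaluates to a boolean Series (often called a mask), and .loc keeps the rows where it is True. The sketch below uses the same data; the combined condition at the end is an illustrative extension, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 5, 6, 3, 4],
    'B': ['foo', 'bar', 'bar', 'foo', 'foo'],
})

# The comparison produces a boolean Series, one True/False per row.
mask = df['B'] == 'foo'
print(mask.tolist())  # [True, False, False, True, True]

# .loc keeps the rows where the mask is True; original row labels are preserved.
filtered_df = df.loc[mask]
print(filtered_df.index.tolist())  # [0, 3, 4]

# Masks can be combined with & (and) or | (or); wrap each comparison in parentheses.
foo_and_large = df.loc[(df['B'] == 'foo') & (df['A'] > 2)]
print(foo_and_large['A'].tolist())  # [3, 4]
```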

Parsing CSV Files

pandas has first-class support for CSV files. It can load a CSV file directly into a DataFrame, ready for analysis, without you having to write any line-by-line CSV parsing. It will also label the columns if the CSV file has a header row (which is recommended!).

To load a CSV file into a DataFrame:

import pandas as pd

df = pd.read_csv('file_path.csv')
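To try this without a file on disk, read_csv also accepts a file-like object. The sketch below uses io.StringIO with made-up CSV text (the contents are hypothetical and mirror the earlier example) to show the header row becoming the column labels:

```python
import io
import pandas as pd

# Hypothetical CSV content with a header row; io.StringIO stands in for a file on disk.
csv_text = """Name,Age,Height
John,45,1.23
Geoff,23,4.56
Brett,30,7.89
"""

df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # ['Name', 'Age', 'Height']
print(len(df))           # 3
print(df['Age'].sum())   # 98
```

Numeric columns such as Age are parsed as numbers rather than strings, so they can be summed or filtered straight away.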
