Lesson 22. Working With Data In Pandas

Pandas is built on top of NumPy and is one of the most widely used data analysis libraries. It makes data manipulation easier and more intuitive. If you are familiar with NumPy arrays, moving on to pandas will be straightforward. You can convert a NumPy array into a pandas DataFrame by passing the array to the pandas DataFrame constructor.
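As a minimal sketch of that conversion, the snippet below builds a DataFrame from a small NumPy array. The sample values and the column names "id" and "score" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three rows, two columns.
array = np.array([[1, 10], [2, 20], [3, 30]])

# Column names are optional; without them pandas labels the columns 0, 1, ...
frame = pd.DataFrame(array, columns=["id", "score"])

print(frame.shape)          # (3, 2)
print(list(frame.columns))  # ['id', 'score']
```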

The cool thing about pandas is that it can take data from multiple sources (like NumPy arrays, spreadsheets saved in CSV format, or SQL databases) and create a table-like grid of rows and columns, very similar to the format we see in relational databases. If you are familiar with the R language, you will see similarities there too.
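To illustrate the idea of multiple sources, here is a rough sketch that loads the same two rows once from CSV text and once from a SQL database. The table name "people", the sample rows, and the in-memory SQLite database are assumptions made for this example only.

```python
import io
import sqlite3

import pandas as pd

# Source 1: CSV text (io.StringIO stands in for a file on disk).
csv_text = "name,age\nAnn,34\nBob,27\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Source 2: a SQL database (an in-memory SQLite database created just for this example).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (name TEXT, age INTEGER)")
connection.executemany("INSERT INTO people VALUES (?, ?)", [("Ann", 34), ("Bob", 27)])
from_sql = pd.read_sql_query("SELECT name, age FROM people", connection)
connection.close()

# Both sources end up as a DataFrame with the same columns.
print(list(from_csv.columns))  # ['name', 'age']
print(list(from_sql.columns))  # ['name', 'age']
```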

Pandas also allows us to easily access a portion of the data using indexing and to perform operations on just that portion. Doing the same with built-in Python lists becomes cumbersome, especially when the data is spread across multiple lists.
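The comparison can be sketched with a tiny, invented passenger table: the pandas version selects and averages in one step, while the list version has to pair the lists up by hand. The names and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical passenger data, invented for this example.
passengers = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [34, 27, 45, 19],
    "fare": [50.0, 8.0, 80.0, 7.5],
})

# Select the fare column for passengers older than 30, then average it.
older_fares = passengers.loc[passengers["age"] > 30, "fare"]
print(older_fares.mean())  # 65.0

# The same task with plain Python lists needs manual pairing of the lists.
ages = [34, 27, 45, 19]
fares = [50.0, 8.0, 80.0, 7.5]
selected = [fare for age, fare in zip(ages, fares) if age > 30]
print(sum(selected) / len(selected))  # 65.0
```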

Loading tabular data into a pandas DataFrame also saves a lot of overhead, because you can see the most commonly used summary statistics (count, mean, standard deviation, minimum, and maximum) with a single call to describe().
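Here is a small sketch of describe() on a one-column DataFrame. The "score" column and its values are made up for this example.

```python
import pandas as pd

# Hypothetical numeric data.
scores = pd.DataFrame({"score": [10, 20, 30, 40]})

# describe() returns a new DataFrame with one row per statistic.
summary = scores.describe()
print(summary)

# Individual statistics can be looked up by row label.
print(summary.loc["mean", "score"])  # 25.0
```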

Examples

Example #1: Importing Data From CSV File

You can import any CSV file into a pandas DataFrame, but for the sake of this tutorial, I am going to load this open source Titanic dataset.

import os

import pandas as pd

# Reuse script_dir if it is already defined; otherwise use the current working directory.
if 'script_dir' not in globals():
    script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'PandasExample'
source_file_name = 'titanic.csv'
target_file_name = 'female_dataset.csv'

# os.path.join inserts the correct path separator for your operating system.
source_path = os.path.join(script_dir, data_directory, example_directory, source_file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_file_name)

# Import the data and show the top five rows.
dataset = pd.read_csv(source_path)
dataset.head(5)

Example #2: Exploring Your Data

Below is some basic exploratory data analysis.

dataset.describe()
# Let's select only the female passengers' data.
female_dataset = dataset[dataset.Sex == "female"]
female_dataset.head(5)

Example #3: Writing DataFrames To Disk

Let's save the new dataset to a pipe-delimited CSV file. The default delimiter is a comma, but you can use any single character. Run the code and check the example folder for the new file.

female_dataset.to_csv(target_path, sep='|')
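To check a file written this way, you can read it back with the same delimiter. The sketch below uses a small invented DataFrame and a temporary folder instead of the tutorial's data folder. It also passes index=False, an optional to_csv argument that the example above does not use, so the row labels are left out of the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for female_dataset.
frame = pd.DataFrame({"name": ["Ann", "Cara"], "age": [34, 45]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.csv")
    # index=False leaves the row labels out of the file.
    frame.to_csv(path, sep="|", index=False)
    with open(path) as handle:
        first_line = handle.readline().strip()
    # Pass the same delimiter when reading the file back.
    restored = pd.read_csv(path, sep="|")

print(first_line)      # name|age
print(restored.shape)  # (2, 2)
```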

Now you try it!

Don't copy and paste. Type the code yourself!
