
Lesson 22. Working With Data In Pandas

Pandas is built on top of NumPy and is one of the most widely used data analysis libraries in Python. Data manipulation is much easier and more intuitive with pandas than with plain Python lists. If you are familiar with NumPy arrays, moving on to pandas will feel natural. You can convert a NumPy array into a pandas DataFrame by passing the array object to pandas' DataFrame constructor.
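Here is a minimal sketch of that conversion. The array contents and the column names "id" and "value" are made up for illustration. It also shows to_numpy(), which converts a DataFrame back into a NumPy array.

```python
import numpy as np
import pandas as pd

# A 3x2 NumPy array: each inner list becomes one DataFrame row.
array = np.array([[1, 10], [2, 20], [3, 30]])

# Pass the array to the DataFrame constructor; the column names are illustrative.
df = pd.DataFrame(array, columns=["id", "value"])
print(df)

# Going the other way: to_numpy() returns the values as a NumPy array again.
back = df.to_numpy()
print(back.shape)
```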

The cool thing about pandas is that it can take data from multiple sources (such as NumPy arrays, Excel workbooks, CSV files, or SQL databases) and create a table-like grid of rows and columns, very similar to the tables we see in relational databases. If you are familiar with the R language, you will notice similarities to R's data frames too.
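As a rough illustration, the sketch below loads the same two rows from two different sources and ends up with the same grid of rows and columns. It makes two assumptions so that it runs anywhere: CSV text held in memory (io.StringIO) stands in for a file on disk, and an in-memory SQLite database stands in for the SQL Server database used elsewhere in this tutorial. pd.read_sql accepts a sqlite3 connection directly. Other databases are usually reached through an SQLAlchemy connection instead. The table name "staff" and its columns are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# CSV text held in memory stands in for a file on disk.
csv_text = "name,dept\nAlice,Sales\nBob,IT\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# An in-memory SQLite database stands in for a real SQL server.
# The "staff" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)", [("Alice", "Sales"), ("Bob", "IT")])
from_sql = pd.read_sql("SELECT name, dept FROM staff", conn)
conn.close()

# Both sources end up as the same rows-and-columns structure.
print(from_csv)
print(from_sql)
```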

Pandas also makes it easy to select a portion of the data using indexing and to perform operations on just that portion. Doing the same with built-in Python lists quickly becomes cumbersome, especially when related values are spread across multiple lists.
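To make the difference concrete, this sketch answers one question two ways: "what is the average salary of people over 30?" The names, ages, and salaries are invented. With plain lists you have to zip the parallel lists together and filter by hand. In pandas, one boolean filter passed to .loc selects the rows and the column at the same time.

```python
import pandas as pd

# Plain lists: related values live in separate, parallel lists (made-up data).
names = ["Ann", "Ben", "Cal", "Dee"]
ages = [28, 35, 42, 31]
salaries = [50000, 62000, 71000, 58000]

# Average salary of people over 30, using lists only.
over_30 = [salary for age, salary in zip(ages, salaries) if age > 30]
list_avg = sum(over_30) / len(over_30)

# The same question in pandas: one frame, one boolean filter on the rows,
# and the column name to operate on.
df = pd.DataFrame({"name": names, "age": ages, "salary": salaries})
pandas_avg = df.loc[df["age"] > 30, "salary"].mean()

print(list_avg, pandas_avg)
```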

Loading tabular data into a pandas DataFrame also removes a lot of overhead. A single call to describe() reports the most commonly used summary statistics for each numeric column: count, mean, standard deviation, minimum, quartiles, and maximum.
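A minimal sketch of describe() on a tiny, made-up column of order amounts. The result is itself a DataFrame indexed by statistic name, so you can pull out a single value with .loc. The standard deviation pandas reports is the sample standard deviation (ddof=1).

```python
import pandas as pd

# A small made-up column of order amounts.
orders = pd.DataFrame({"amount": [10.0, 20.0, 30.0, 40.0]})

# One call returns count, mean, std (sample, ddof=1), min, quartiles and max.
summary = orders.describe()
print(summary)

# The summary is a DataFrame, so individual statistics can be looked up by label.
print(summary.loc["mean", "amount"])
```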

Examples

Example #1: Importing Data From CSV File

Pandas can import Excel workbooks as well as CSV files into a DataFrame, but for the sake of this tutorial, I am going to load this open source Titanic dataset, saved as a CSV file.

import os

import pandas as pd

# Fall back to the current working directory when script_dir is not already defined.
if 'script_dir' not in globals():
    script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'PandasExample'
source_file_name = 'titanic.csv'
target_file_name = 'female_dataset.csv'

# os.path.join inserts the correct path separator for your operating system.
source_path = os.path.join(script_dir, data_directory, example_directory, source_file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_file_name)

# Import the CSV file and show the first five rows.
dataset = pd.read_csv(source_path)
dataset.head(5)
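If titanic.csv is not on your machine yet, you can still practice. read_csv accepts any file-like object, so this sketch feeds it a few invented rows from an in-memory string. Only the Sex column is confirmed by the example above. The other column names (Name, Age, Survived) are assumed to mirror the Titanic file, and every value is made up. Note how the blank Age field is read in as a missing value (NaN).

```python
import io

import pandas as pd

# Invented sample rows; column names other than Sex are assumed, not taken from the real file.
sample_csv = io.StringIO(
    "Name,Sex,Age,Survived\n"
    "Passenger A,female,29,1\n"
    "Passenger B,male,40,0\n"
    "Passenger C,female,,1\n"
)

sample = pd.read_csv(sample_csv)

print(sample.head(5))       # first five rows (only three exist here)
print(sample.shape)         # (rows, columns)
print(sample.isna().sum())  # missing values per column; the blank Age becomes NaN
```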

Example #2: Exploring Your Data

Below is some basic exploratory data analysis.

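# Summary statistics for the numeric columns.
# print() is used because a notebook cell only displays its last expression.
print(dataset.describe())

# Select only the female passengers' rows.
female_dataset = dataset[dataset.Sex == "female"]
female_dataset.head(5)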
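Two more exploration tools that pair well with the filter above are value_counts(), which counts rows per category, and groupby(), which computes a statistic per group. This sketch uses a tiny, made-up frame with a Sex column and an assumed 0/1 Survived column. These are not real Titanic figures. The mean of a 0/1 column gives the share of rows with a 1, so grouping by Sex yields a survival rate per group.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real Titanic figures.
passengers = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male"],
    "Survived": [1, 0, 1, 1, 0],
})

# How many rows fall into each category.
counts = passengers["Sex"].value_counts()

# Mean of a 0/1 column per group = share of rows with a 1 in that group.
survival_rate = passengers.groupby("Sex")["Survived"].mean()

# Boolean filter, the same pattern as dataset[dataset.Sex == "female"].
females = passengers[passengers.Sex == "female"]

print(counts)
print(survival_rate)
print(len(females))
```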

Example #3: Writing DataFrames To Disk

Let's save the new dataset to a pipe-delimited CSV file. The default delimiter is a comma, but the sep parameter accepts any single character. Run the code and check the example folder for the new file.

female_dataset.to_csv(target_path, sep='|')
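By default, to_csv also writes the DataFrame's index as an extra first column. The sketch below shows an optional variation that passes index=False to leave it out, then reads the pipe-delimited text back to confirm the round trip. To keep it self-contained, it calls to_csv without a path, which makes it return the CSV text as a string instead of writing a file. The two-row frame is made up. When you read the file back, pass the same sep you used when writing.

```python
import io

import pandas as pd

# A small made-up frame standing in for female_dataset.
df = pd.DataFrame({"Name": ["Passenger A", "Passenger B"], "Age": [29, 35]})

# With no path argument, to_csv returns the text instead of writing a file.
# index=False leaves out the index column that to_csv writes by default.
pipe_text = df.to_csv(sep="|", index=False)
print(pipe_text)

# Read it back; sep must match the delimiter used when writing.
round_trip = pd.read_csv(io.StringIO(pipe_text), sep="|")
print(round_trip)
```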

Now you try it!

Don't copy and paste. Type the code yourself!
