
Lesson 20. Basic File Operations

When it comes to running data for a living, sooner or later you’re gonna have to deal with a file. You’re going to have to download it, move it, manipulate it, load it, smack it up, flip it, rub it down, OH NO!

There are other parts of this tutorial where we will learn how to download files. For now, we’re going to make the assumption that the file is living on the local drive and we need to push it around various places.

The examples below will walk you through some basic tasks with files. You might have noticed that I do not teach you how to create a file. We will do that when we get to talking about Pandas.

I’m also not going to teach you how to create directories at this point. If you track out to data engineering, you will get more information on manipulating the file system.

In the BasicFileOpsExample directory, you will find three directories: In, Out, Archive. This is a pattern from basic data warehouse processing. We are going to simulate moving a file through a data warehouse ETL processing evolution.

Examples

Example #1: Move A File

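AKA cut and paste. Moving a file within the same drive is a fast operation because we are just changing the pointer to the ones and zeros so it points somewhere else. We are not actually pushing bits around the disk to make the move happen. If the target lives on a different drive, the file has to be copied and the original deleted, so big files take longer.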
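Here is a minimal sketch of that idea using shutil.move in a throwaway temporary directory instead of the tutorial's data folder. The file and folder names (sample.csv, In) are made up for illustration. On the same drive the move is typically a rename, so the file should keep the same inode number; that detail depends on your platform and file system, so treat the printed comparison as something to observe rather than a guarantee.

```python
import os
import shutil
import tempfile


def demo_move():
    """Move a small file into a subfolder and report what changed."""
    with tempfile.TemporaryDirectory() as work_dir:
        # Hypothetical names, used only for this demonstration.
        source_path = os.path.join(work_dir, 'sample.csv')
        target_dir = os.path.join(work_dir, 'In')
        os.mkdir(target_dir)

        with open(source_path, 'w') as f:
            f.write('x,y\n1,2\n')

        inode_before = os.stat(source_path).st_ino
        # When the target is an existing folder, the file lands inside it.
        new_path = shutil.move(source_path, target_dir)
        inode_after = os.stat(new_path).st_ino

        return {
            'source_still_exists': os.path.exists(source_path),
            'target_exists': os.path.exists(new_path),
            'same_inode': inode_before == inode_after,
            'new_name': os.path.basename(new_path),
        }


result = demo_move()
print(result)
```

The file disappears from its old location and shows up in the new one under the same name, which is exactly what cut and paste does.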

The module shutil is filled with all kinds of file handling goodies.

In the root of the example directory is a small file. While it is in the root folder, it is simulating being outside the boundaries of the data warehouse environment. Let’s bring it inside by moving it to the In folder.

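import os
import shutil as sh

# In a notebook, remember the starting directory so that re-running a cell
# does not lose track of where the data lives.
if 'script_dir' not in globals():
    script_dir = os.getcwd()

# Plain folder names: os.path.join adds the right separator for your OS.
data_directory = 'data'
example_directory = 'BasicFileOpsExample'
target_directory = 'In'
file_name = 'forestfires.csv'

source_path = os.path.join(script_dir, data_directory, example_directory, file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_directory, file_name)

# Cut and paste the file from the example root into the In folder.
sh.move(source_path, target_path)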
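If you run the example a second time, the file is no longer sitting in the root folder and the move fails with an error. The sketch below is one possible way to guard against that, not the only one. The helper name move_into and the sample file are hypothetical, and the demo runs in a temporary directory so it does not touch the tutorial data.

```python
import os
import shutil
import tempfile


def move_into(source_path, target_dir):
    """Move source_path into target_dir, refusing to overwrite an existing file."""
    if not os.path.isfile(source_path):
        raise FileNotFoundError(f'Nothing to move: {source_path}')
    if not os.path.isdir(target_dir):
        raise NotADirectoryError(f'Target folder is missing: {target_dir}')
    target_path = os.path.join(target_dir, os.path.basename(source_path))
    if os.path.exists(target_path):
        raise FileExistsError(f'Already there: {target_path}')
    return shutil.move(source_path, target_path)


# Demonstration with throwaway files (names are made up).
with tempfile.TemporaryDirectory() as work_dir:
    in_dir = os.path.join(work_dir, 'In')
    os.mkdir(in_dir)
    sample = os.path.join(work_dir, 'sample.csv')
    with open(sample, 'w') as f:
        f.write('x,y\n1,2\n')

    moved_to = move_into(sample, in_dir)
    moved_ok = os.path.isfile(moved_to)

    try:
        move_into(sample, in_dir)  # second run: the source is already gone
        second_run = 'moved'
    except FileNotFoundError:
        second_run = 'refused'

print(moved_ok, second_run)
```

Checking first gives you an error message that says what went wrong in plain words, which is handy when a scheduled job fails at 3 AM.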

Example #2: Archiving A File

We are now done processing the file and we need to archive it in case we need to drag it out and reload the system.

The process of archiving has multiple steps.

  1. Zip up the file.

  2. Move the file to the Archive folder.

  3. Blow away the original.

Once you run the example, check the Archive folder and the In folder. You should see a zip file in Archive and nothing in the In folder.

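import os
import zipfile as zf

if 'script_dir' not in globals():
    script_dir = os.getcwd()

data_directory = 'data'
example_directory = 'BasicFileOpsExample'
source_directory = 'In'
target_directory = 'Archive'
file_name = 'forestfires.csv'
archive_name = 'forestfires.zip'

# Build full paths so we never have to change the working directory.
source_path = os.path.join(script_dir, data_directory, example_directory, source_directory, file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_directory, archive_name)

# Steps 1 and 2: zip the file, writing the archive straight into the Archive folder.
# arcname stores just the file name inside the zip instead of the full path.
with zf.ZipFile(target_path, 'w', compression=zf.ZIP_DEFLATED) as archive:
    archive.write(source_path, arcname=file_name)

# Step 3: blow away the original now that the archive is safely closed.
os.remove(source_path)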
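The whole point of the archive is being able to drag the file back out later. Here is a minimal sketch of that restore step, assuming you want to check the zip for damage before unpacking it. The helper name restore_archive and the sample file are hypothetical, and the demo builds its own small archive in a temporary directory rather than using the tutorial data.

```python
import os
import tempfile
import zipfile


def restore_archive(archive_path, restore_dir):
    """Check a zip file for damage, then unpack it into restore_dir."""
    with zipfile.ZipFile(archive_path, 'r') as archive:
        # testzip returns None when every member passes its CRC check.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f'Corrupt member: {bad_member}')
        names = archive.namelist()
        archive.extractall(restore_dir)
    return names


# Demonstration with throwaway files (names are made up).
with tempfile.TemporaryDirectory() as work_dir:
    csv_path = os.path.join(work_dir, 'sample.csv')
    with open(csv_path, 'w') as f:
        f.write('x,y\n1,2\n')

    # Archive the file and delete the original, like the example above.
    archive_path = os.path.join(work_dir, 'sample.zip')
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(csv_path, arcname='sample.csv')
    os.remove(csv_path)

    # Bring it back and read it to confirm nothing was lost.
    restored = restore_archive(archive_path, work_dir)
    with open(csv_path) as f:
        restored_text = f.read()

print(restored, restored_text)
```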

Now you try it!

Don't copy and paste. Type the code yourself!


Last updated 3 years ago