Lesson 20. Basic File Operations

When it comes to running data for a living, sooner or later you’re gonna have to deal with a file. You’re going to have to download it, move it, manipulate it, load it, smack it up, flip it, rub it down, OH NO!

There are other parts of this tutorial where we will learn how to download files. For now, we’re going to make the assumption that the file is living on the local drive and we need to push it around various places.

The examples below will walk you through some basic tasks with files. You might have noticed that I do not teach you how to create a file. We will do that when we get to talking about Pandas.

I’m also not going to teach you how to create directories at this point. If you track out to data engineering, you will get more information on manipulating the file system.

In the BasicFileOpsExample directory, you will find three directories: In, Out, Archive. This is a pattern from basic data warehouse processing. We are going to simulate moving a file through a data warehouse ETL processing evolution.

Examples

Example #1: Move A File

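AKA cut and paste. When the source and destination are on the same file system, moving a file is fast because we are only changing the pointer to the ones and zeros, not pushing bits around the disk. If the destination is on a different drive or file system, the file has to be copied and the original deleted, so the move takes about as long as a copy.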

The module shutil is filled with all kinds of file handling goodies.
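To see a move in action without touching the lesson folders, here is a minimal sketch that works in two temporary directories. The file name sample.txt and the variable names are made up for illustration. As the shutil documentation describes it, shutil.move renames in place when both paths are on the same file system and otherwise copies the file and removes the original.

```python
import os
import shutil as sh
import tempfile

# Two throwaway folders stand in for "outside" and "In" (hypothetical names).
with tempfile.TemporaryDirectory() as outside, tempfile.TemporaryDirectory() as inbox:
    source_path = os.path.join(outside, 'sample.txt')
    target_path = os.path.join(inbox, 'sample.txt')

    with open(source_path, 'w') as f:
        f.write('hello')

    # shutil.move returns the path of the file in its new location.
    moved_path = sh.move(source_path, target_path)

    # After a move the source is gone and the content lives at the target.
    source_still_there = os.path.exists(source_path)
    with open(moved_path) as f:
        moved_content = f.read()

print(source_still_there)
print(moved_content)
```

Both temporary folders are deleted when the with block ends, so the sketch leaves nothing behind.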

In the root of the example directory is a small file. While it is in the root folder, it is simulating being outside the boundaries of the data warehouse environment. Let’s bring it inside by moving it to the In folder.

import shutil as sh
import os

# Reuse script_dir if an earlier cell or script already defined it.
if 'script_dir' not in globals():
    script_dir = os.getcwd()

# Plain folder names; os.path.join adds the right separator for the OS.
data_directory = 'data'
example_directory = 'BasicFileOpsExample'
target_directory = 'In'
file_name = 'forestfires.csv'

source_path = os.path.join(script_dir, data_directory, example_directory, file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_directory, file_name)

# Move the file from the example root into the In folder.
sh.move(source_path, target_path)
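If you run the move a second time, it fails with an error because the file is no longer in the root folder. One way to guard against that is sketched below. The helper name move_if_safe and the sample file are hypothetical, not part of the lesson files, and the demonstration runs in temporary folders so it can be repeated safely.

```python
import os
import shutil as sh
import tempfile

def move_if_safe(source_path, target_path):
    """Move a file only when the source exists and the target does not.

    Hypothetical helper for illustration; returns True if the move happened.
    """
    if not os.path.isfile(source_path):
        print('Nothing to move:', source_path)
        return False
    if os.path.exists(target_path):
        print('Target already exists, leaving it alone:', target_path)
        return False
    sh.move(source_path, target_path)
    return True

# Quick demonstration in throwaway folders.
with tempfile.TemporaryDirectory() as outside, tempfile.TemporaryDirectory() as inbox:
    source_path = os.path.join(outside, 'sample.csv')
    target_path = os.path.join(inbox, 'sample.csv')
    with open(source_path, 'w') as f:
        f.write('a,b\n1,2\n')

    first_attempt = move_if_safe(source_path, target_path)   # file gets moved
    second_attempt = move_if_safe(source_path, target_path)  # source is gone now
```

Checking the target as well is a design choice: whether an existing destination file gets overwritten by a move can depend on the operating system, so refusing up front keeps the behavior predictable.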

Example #2: Archiving A File

We are now done processing the file and we need to archive it in case we need to drag it out and reload the system.

The process of archiving is multi-step.

  1. Zip up the file.

  2. Move the zip file to the Archive folder.

  3. Blow away the original.

Once you run the example, check the Archive folder and the In folder. You should see a zip file in Archive and nothing in the In folder.

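import zipfile as zf
import os

# Reuse script_dir if an earlier cell or script already defined it.
if 'script_dir' not in globals():
    script_dir = os.getcwd()

# Plain folder names; os.path.join adds the right separator for the OS.
data_directory = 'data'
example_directory = 'BasicFileOpsExample'
source_directory = 'In'
target_directory = 'Archive'
file_name = 'forestfires.csv'
archive_name = 'forestfires.zip'

source_path = os.path.join(script_dir, data_directory, example_directory, source_directory, file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_directory, archive_name)

# Steps 1 and 2: create the zip directly in the Archive folder.
# The with block closes the archive even if the write fails.
with zf.ZipFile(target_path, "w", compression=zf.ZIP_DEFLATED) as archive:
    # arcname stores the file under its bare name, with no folder path inside the zip.
    archive.write(source_path, arcname=file_name)

# Step 3: blow away the original now that it is safely archived.
os.remove(source_path)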
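Since the whole point of the archive is being able to drag the file back out later, it is worth knowing how to check a zip and restore from it. The sketch below is a minimal example under made-up names (sample.csv, sample.zip) in temporary folders; it builds a small archive, lists its contents, runs the built-in integrity check, and extracts the file again.

```python
import os
import tempfile
import zipfile as zf

with tempfile.TemporaryDirectory() as work_dir, tempfile.TemporaryDirectory() as restore_dir:
    file_name = 'sample.csv'                            # hypothetical file
    source_path = os.path.join(work_dir, file_name)
    archive_path = os.path.join(work_dir, 'sample.zip')  # hypothetical archive

    with open(source_path, 'w') as f:
        f.write('a,b\n1,2\n')

    # Build a small archive to work with.
    with zf.ZipFile(archive_path, 'w') as archive:
        archive.write(source_path, arcname=file_name)

    with zf.ZipFile(archive_path, 'r') as archive:
        archived_names = archive.namelist()  # names of the files inside the zip
        first_bad_file = archive.testzip()   # None when every file passes its check
        archive.extractall(restore_dir)      # restore everything into restore_dir

    with open(os.path.join(restore_dir, file_name)) as f:
        restored_content = f.read()

print(archived_names)
print(first_bad_file)
```

In a real pipeline you might run a check like this before deleting the original, so a damaged archive never becomes the only copy.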

Now you try it!

Don't copy and paste. Type the code yourself!
