Lesson 32. Combining Multiple CSVs Into One File

Sure, you COULD write a bunch of loops and process files one at a time. But sometimes it is more efficient to create one file, especially if you have a bunch of downstream processes that also operate on that data.

In that scenario, it is better to write one loop that combines all the CSVs into one big file. That way, no more loops have to be written.



We are going to take some small things and make them into a bigger thing.

We are going to work with the files in the zip folder that we downloaded earlier. We are going to combine them and spit out the result into the CombineCSVExample folder.

We are also going to introduce a new way to loop over files and filter by file extension using the glob module. glob gives us the paths of all the files we want to combine as a list, which we can then pass to the enumerate function.
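As a quick sketch of the idea (using a throwaway temporary folder and made-up file names, not the Eurostat data), glob.glob matches only the files that fit the pattern, and enumerate pairs each path with its position in the list:

```python
import glob
import os
import tempfile

# Create a few throwaway files in a temporary folder (hypothetical data)
tmp_dir = tempfile.mkdtemp()
for name in ['a.csv', 'b.csv', 'notes.txt']:
    with open(os.path.join(tmp_dir, name), 'w') as f:
        f.write('col1,col2\n1,2\n')

# glob filters by extension: only the .csv files match the pattern
csv_files = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))

# enumerate pairs each path with its index, so we can treat
# the first file differently from the rest
for i, path in enumerate(csv_files):
    print(i, os.path.basename(path))
```

The index from enumerate is what lets us keep the header row from the first file only, as the combining loop below does.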

Once we are done, we do not need the original files so we can toss those.

This can also be a long-running process if the source files are large.

import shutil
import glob
import os

script_dir = os.getcwd()
data_directory = 'data'
source_directory = os.path.join('ZipFileExample', 'Eurostat')
target_directory = 'CombineCSVExample'
file_name = 'EurostatDataCombined.csv'

source_path = os.path.join(script_dir, data_directory, source_directory)
target_path = os.path.join(script_dir, data_directory, target_directory, file_name)

source_files = glob.glob(os.path.join(source_path, '*.csv'))

with open(target_path, 'wb') as outfile:
    for i, fname in enumerate(source_files):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but the first file
            shutil.copyfileobj(infile, outfile)  # Block copy rest of file from input to output without parsing

# Clean out the input directory now that the files are combined
for fname in source_files:
    os.remove(fname)
