Tutorials
Hands-On Python
Hands-On Python
  • Hands-On Python Tutorial For Real-World Business Analytics Problems
  • Preface
    • Section I. A Note From The Author
    • Section II. Tutorial Overview
    • Section III. What Is The Preflight Checklist?
    • Section IV. Supplimentery Material
  • Preflight Checklist
    • Section V. Select Your Difficulty Setting
    • Section VI. Download Anaconda
    • Section VII. Download PyCharm (Optional)
    • Section VIII. Download SQL Server Developer Edition
    • Section IX. Configure Database Environment
    • Section X. Download The Source Code
    • Section XI. Starting JupyterLab
    • Section XII. How To Get Help With This Tutorial
  • Language Basics
    • Lesson 1. Obligatory Hello World
    • Lesson 2. Code Comments
    • Lesson 3. Data Types
    • Lesson 4. Variables
    • Lesson 5. String Concatenation
    • Lesson 6. Arithmetic Operators
    • Lesson 7. Making Decisions
    • Lesson 8. Control Flow With if-elif-else
    • Lesson 9. Control Flow With while
    • Lesson 10. Data Structures Part I: List
    • Lesson 11. Data Structures Part II: Tuples
    • Lesson 12. Data Structures Part III: Dictionaries
    • Lesson 13. Looping With for
    • Lesson 14. Functions
    • Lesson 15. Importing Modules
    • Lesson 16. Python Programming Standards
  • Advanced Topics
    • Lesson 17. Functional Programing With map
    • Lesson 18. Generators
    • Lesson 19. Comprehensions
    • Lesson 20. Basic File Operations
    • Lesson 21. Working With Data In Numpy
    • Lesson 22. Working With Data In Pandas
    • Lesson 23. Working With JSON
    • Lesson 24. Making File Request Over HTTP And SFTP
    • Lesson 25. Interacting With Databases
    • Lesson 26. Saving Objects With Pickle
    • Lesson 27. Error Handling
    • Lesson 28. Bringing It All Together
  • Solutions To Real World Problems
    • Lesson 29. Download A Zip File Over HTTP
    • Lesson 30. Looping Over Files In A Directory
    • Lesson 31. Convert Comma Delmited Files To Pipe Delimited
    • Lesson 32. Combining Multiple CSVs Into One File
    • Lesson 33. Load Large CSVs Into Data Warehouse Staging Tables
    • Lesson 34. Efficiently Write Large Database Query Results To Disk
    • Lesson 35. Working With SFTP In The Real World
    • Lesson 36. Executing Python From SQL Server Agent
Powered by GitBook
On this page
  1. Solutions To Real World Problems

Lesson 32. Combining Multiple CSVs Into One File

Sure. You COULD write a bunch of loops and process files one at a time. Sometimes it is more efficient to create one file especially if you have a bunch of downstream processes that also operate on that data.

In that scenario, it is better to write one loop that combines all the CSVs into one big file. That way, no more loops have to be written.

Examples

Example #1: CONSTRUCTICONS! UNITE!!

We are going to take some small things and make them into a bigger thing.

We are going work with the files in the zip folder that we downloaded earlier. We are going to combine them and spit out the result into the CombineCSVExample folder.

We are also going to introduce a new way to loop over files and filter by file extension using the glob module. We will take the paths of all the files we want to combine and turn them into an iterable object that we can use the function enumerate on.

Once we are done, we do not need the original files so we can toss those.

This is also a long running process.

import shutil
import glob
import os

script_dir = os.getcwd()
data_directory = 'data\\'
source_directory = 'ZipFileExample\\Eurostat\\'
target_directory = 'CombineCSVExample\\'
file_name = 'EurostatDataCombined.csv'

source_path = os.path.join(script_dir, data_directory,source_directory)
target_path = os.path.join(script_dir, data_directory, target_directory, file_name)

source_files = glob.glob(source_path + '*.csv')

with open(target_path, 'wb') as outfile:
    for i, fname in enumerate(source_files):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  #Throw away header on all but first file
                                           
            shutil.copyfileobj(infile, outfile)  #Block copy rest of file from input to output without parsing

#clean input directory 
for i, fname in enumerate(source_files):
    os.remove(fname)
PreviousLesson 31. Convert Comma Delmited Files To Pipe DelimitedNextLesson 33. Load Large CSVs Into Data Warehouse Staging Tables

Last updated 3 years ago