
Lesson 29. Download A Zip File Over HTTP

The read_csv() method is a workhorse in the data analytics world. Technically, it can download and read zip files by itself. However, a limitation with data.world prevents read_csv() from working on zip files stored there. Since all of MSU's sample datasets reside on data.world, we need a workaround so you can work through the lessons in the solutions section.
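For context, here is a minimal sketch of what read_csv() can normally do with a zip archive. It assumes pandas is installed and uses a small, hypothetical sample.zip built on the fly (not one of the tutorial's datasets). As far as I know, the archive must contain exactly one data file for this to work.

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as work_dir:
    # Hypothetical sample data, written into a one-file zip archive.
    zip_path = os.path.join(work_dir, 'sample.zip')
    with zipfile.ZipFile(zip_path, 'w') as archive:
        archive.writestr('sample.csv', 'id,value\n1,10\n2,20\n')

    # compression='infer' is the default; it is spelled out here for clarity.
    # pandas picks zip decompression based on the .zip extension.
    df = pd.read_csv(zip_path, compression='infer')

print(df.shape)  # prints (2, 2)
```

When the archive lives on data.world this shortcut does not work, which is why the rest of the lesson downloads and unzips the file by hand.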

Examples

Example #1: Download. Unzip. Clean Up.

There are some things in here that I am going to handwave for now. They will be explained in more relevant lessons.

We have to pull in three modules to make the magic happen, two of which we have not seen yet.

  • urllib.request – This lets us open a connection to a file using a URL.

  • pyunpack – This lets us work with zip archive files.

The rest should be familiar from Lesson 20. The steps here are:

  1. Grab the file from the cloud and write it to disk

  2. Decompress the archive file

  3. Delete the archive file

This will take a while to run. When it is complete, check the ZipFileExample folder. You should find a new directory containing a CSV file.

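import os
import urllib.request

from pyunpack import Archive

# In a notebook there is no script path, so fall back to the working directory.
if 'script_dir' not in globals():
    script_dir = os.getcwd()

url = 'https://query.data.world/s/vb53nuuux6umwmccbwlajvlzttmz3q'
file_name = 'Eurostat.zip'
data_directory = 'data'
example_directory = 'ZipFileExample'

# Let os.path.join supply the separators so the paths work on any OS.
abs_directory_path = os.path.join(script_dir, data_directory, example_directory)
abs_file_path = os.path.join(abs_directory_path, file_name)

# Make sure the target folder exists before writing into it.
os.makedirs(abs_directory_path, exist_ok=True)

# Step 1: grab the file from the cloud and write it to disk.
with urllib.request.urlopen(url) as source_file:
    with open(abs_file_path, 'wb') as target_file:
        target_file.write(source_file.read())

# Step 2: decompress the archive file.
Archive(abs_file_path).extractall(abs_directory_path)

# Step 3: delete the archive file.
os.remove(abs_file_path)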
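
If you would rather not install pyunpack, the same three steps can be sketched with only the standard library. This is a minimal sketch, not the tutorial's method: it swaps pyunpack for the built-in zipfile module, so it only handles zip archives. The function name download_and_unzip and the default archive_name are hypothetical names chosen for illustration. The demo at the bottom builds a tiny zip locally and fetches it through a file:// URL so it runs without a network connection.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_unzip(url, target_dir, archive_name='download.zip'):
    """Download a zip archive, extract it into target_dir, then delete it."""
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, archive_name)

    # Step 1: stream the response to disk in chunks rather than all at once.
    with urllib.request.urlopen(url) as source_file:
        with open(archive_path, 'wb') as target_file:
            shutil.copyfileobj(source_file, target_file)

    # Step 2: decompress the archive and note what it contained.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        extracted_names = archive.namelist()

    # Step 3: delete the archive now that its contents are on disk.
    os.remove(archive_path)
    return extracted_names


# Offline demo: build a small zip locally and fetch it through a file:// URL.
with tempfile.TemporaryDirectory() as work_dir:
    sample_zip = os.path.join(work_dir, 'sample.zip')
    with zipfile.ZipFile(sample_zip, 'w') as archive:
        archive.writestr('sample.csv', 'id,value\n1,10\n')

    sample_url = Path(sample_zip).as_uri()
    names = download_and_unzip(sample_url, os.path.join(work_dir, 'out'))
    print(names)  # prints ['sample.csv']
```

To use it on the lesson's dataset, pass the data.world URL and the ZipFileExample folder path instead of the demo values. Copying in chunks with shutil.copyfileobj keeps memory use low when the archive is large.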

Last updated 3 years ago