Lesson 29. Download A Zip File Over HTTP

The pandas function read_csv() is a workhorse in the data analytics world. It can even read a zip-compressed CSV file directly from a URL. However, data.world has a limitation: read_csv() does not work on zip files stored there. Since all of MSU's sample datasets reside on data.world, we need a workaround so you can work through the lessons in the solutions section.
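For comparison, here is a minimal sketch of read_csv() handling a zip file on its own. With the default compression='infer', pandas recognizes the .zip extension on the path and reads the single CSV stored inside the archive. To keep the example offline, it builds a throwaway archive in a temporary folder instead of downloading one. The names sample.zip, sample.csv, and the columns are made up for illustration.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a throwaway archive holding a single CSV so the example runs offline.
# (sample.zip, sample.csv and the column names are made up for illustration.)
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'sample.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    archive.writestr('sample.csv', 'country,value\nDE,1\nFR,2\n')

# compression='infer' (the default) spots the .zip extension and
# reads the one CSV file stored inside the archive.
df = pd.read_csv(zip_path, compression='infer')
print(df)
```

Note that this only works when the archive contains exactly one data file.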

Examples

Example #1: Download. Unzip. Clean Up.

There are a few things in this example that I am going to hand-wave for now. They will be explained in the lessons where they are more relevant.

We need to import three modules to make this work, two of which we have not seen yet:

  • urllib.request – This lets us open a connection to a file using a URL.

  • pyunpack – This lets us extract zip archive files. It is a third-party package, so you may need to install it with pip first.
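To see what urllib.request does on its own, here is a minimal sketch that opens a URL and reads its contents. So that it runs without network access, it assumes a small local file reached through a file:// URL; an https:// URL such as the data.world link is opened the same way. The file name hello.txt is made up for illustration.

```python
import pathlib
import tempfile
import urllib.request

# Write a small local file so the example works without network access.
# (hello.txt is a made-up name used only for this demo.)
local_file = pathlib.Path(tempfile.mkdtemp()) / 'hello.txt'
local_file.write_bytes(b'hello from a URL\n')

# urlopen accepts any scheme urllib supports; file:// stands in for https:// here.
with urllib.request.urlopen(local_file.as_uri()) as response:
    data = response.read()  # the body comes back as bytes, not str

print(data)  # b'hello from a URL\n'
```

Because read() returns bytes, the lesson opens its target file in 'wb' (write binary) mode.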

The remaining module, os, should be familiar from Lesson 20. The steps are:

  1. Grab the file from the cloud and write it to disk

  2. Decompress the archive file

  3. Delete the archive file
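Here is one way those three steps could be sketched as a reusable function. It is not the lesson's code: download_and_extract is a hypothetical helper, and it swaps in the standard library's zipfile module for pyunpack so it runs without extra packages (zipfile only handles .zip archives). The demo builds a small zip locally and fetches it through a file:// URL, so the folder and file names are assumptions.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request
import zipfile


def download_and_extract(url, target_dir, file_name):
    """Hypothetical helper: download an archive, unzip it, then delete it."""
    # open() will not create folders, so make sure the target exists first
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, file_name)

    # Step 1: grab the file and write it to disk, copying in chunks
    with urllib.request.urlopen(url) as source, open(archive_path, 'wb') as target:
        shutil.copyfileobj(source, target)

    # Step 2: decompress the archive (zipfile stands in for pyunpack here)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)

    # Step 3: delete the archive file now that its contents are on disk
    os.remove(archive_path)
    return sorted(os.listdir(target_dir))


# Demo: build a small zip locally and fetch it through a file:// URL (no network needed).
# The folder and file names below are made up for illustration.
work = pathlib.Path(tempfile.mkdtemp())
source_zip = work / 'source.zip'
with zipfile.ZipFile(source_zip, 'w') as zf:
    zf.writestr('Eurostat/data.csv', 'a,b\n1,2\n')

contents = download_and_extract(source_zip.as_uri(), str(work / 'out'), 'Eurostat.zip')
print(contents)  # ['Eurostat']
```

One design choice worth noting: shutil.copyfileobj copies the response to disk in chunks, so the whole archive never has to sit in memory at once, which matters for large downloads.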

This will take a while to run. When it is complete, check the ZipFileExample folder inside the data folder. You should find a new directory containing a CSV file.

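import os
import urllib.request
from pyunpack import Archive

# Use script_dir if it is already defined; otherwise fall back to the current working directory
if 'script_dir' not in globals():
    script_dir = os.getcwd()

url = 'https://query.data.world/s/vb53nuuux6umwmccbwlajvlzttmz3q'
file_name = 'Eurostat.zip'
data_directory = 'data'
example_directory = 'ZipFileExample'

# Build the paths with os.path.join so the separators are correct on any operating system
abs_directory_path = os.path.join(script_dir, data_directory, example_directory)
abs_file_path = os.path.join(abs_directory_path, file_name)

# open() will not create missing folders, so make sure the target directory exists
os.makedirs(abs_directory_path, exist_ok=True)

# Step 1: download the archive and write its bytes to disk
with urllib.request.urlopen(url) as source_file:
    with open(abs_file_path, 'wb') as target_file:
        target_file.write(source_file.read())

# Step 2: extract the archive's contents into the example directory
Archive(abs_file_path).extractall(abs_directory_path)

# Step 3: delete the downloaded archive now that it has been extracted
os.remove(abs_file_path)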
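If you would rather confirm the result from Python than by browsing the folder, a small sketch like the one below walks a directory and lists every CSV file under it. find_csv_files is a hypothetical helper; in the lesson you would pass it abs_directory_path, while the demo uses a temporary folder with a made-up example.csv.

```python
import os
import tempfile


def find_csv_files(directory):
    """Hypothetical helper: list every .csv file below directory, relative to it."""
    matches = []
    # os.walk visits directory and every subfolder beneath it
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith('.csv'):
                matches.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(matches)


# Demo with a temporary folder that mimics an extracted archive.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, 'Eurostat'))
open(os.path.join(demo, 'Eurostat', 'example.csv'), 'w').close()
print(find_csv_files(demo))
```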
