Lesson 29. Download A Zip File Over HTTP
The read_csv() method is a workhorse in the data analytics world. Technically, it can download and read zip files by itself. However, a limitation with Data.World prevents read_csv() from working on zip files that are stored there. Since all of MSU's sample datasets reside on Data.World, we need a workaround so you can work through the lessons in the solutions section.
Example #1: Download. Unzip. Clean Up.
There are some things in here that I am going to handwave for now. They will be explained in more relevant lessons.
We have to pull in three modules to make the magic happen, two of which we have not seen yet:
- urllib.request – This lets us open a pipe to a file using a URL.
- pyunpack – This lets us work with zip archive files.
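To make the "pipe to a file" idea concrete, here is a minimal sketch rather than part of the lesson's script. The object returned by urllib.request.urlopen() behaves like a file opened in binary mode, so you can read bytes from it. The helper names peek_at_url and make_sample_zip are hypothetical, and a local file:// URL built from a throwaway archive stands in for a real web address so the sketch runs offline.

```python
import os
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def peek_at_url(url, num_bytes=4):
    """Return the first num_bytes bytes served at url (hypothetical helper)."""
    # urlopen() returns a file-like object that yields bytes.
    with urllib.request.urlopen(url) as response:
        return response.read(num_bytes)


def make_sample_zip(directory):
    """Build a tiny zip archive on disk so the sketch needs no network."""
    archive_path = os.path.join(directory, 'sample.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        archive.writestr('sample.csv', 'id,value\n1,10\n')
    return archive_path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumption: a file:// URL stands in for an http(s):// URL here.
    sample_url = Path(make_sample_zip(tmp_dir)).as_uri()
    magic = peek_at_url(sample_url)
    print(magic)  # b'PK\x03\x04', the signature that opens a zip entry
```

The same read-from-a-URL pattern is what allows the download to be written to disk like any other binary file.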
The rest should be familiar from lesson 20. So the steps here are:
1. Grab the file from the cloud and write it to disk
2. Decompress the archive file
3. Delete the archive file
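The three steps can also be sketched with the standard library alone. This is an illustration under stated assumptions, not the lesson's method: zipfile is swapped in for pyunpack, the function name download_unzip_cleanup is hypothetical, and the demo pulls from a locally built sample archive through a file:// URL instead of Data.World.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_unzip_cleanup(url, target_dir, file_name):
    """Download an archive, extract it into target_dir, then delete the archive."""
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, file_name)

    # 1. Grab the file and write it to disk in chunks.
    with urllib.request.urlopen(url) as source_file:
        with open(archive_path, 'wb') as target_file:
            shutil.copyfileobj(source_file, target_file)

    # 2. Decompress the archive (zipfile stands in for pyunpack here).
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        extracted = archive.namelist()

    # 3. Delete the archive now that its contents are on disk.
    os.remove(archive_path)
    return extracted


with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumption: a made-up sample archive replaces the real dataset.
    source_zip = os.path.join(tmp_dir, 'source.zip')
    with zipfile.ZipFile(source_zip, 'w') as archive:
        archive.writestr('sample.csv', 'country,value\nDE,1\n')

    target_dir = os.path.join(tmp_dir, 'ZipFileExample')
    names = download_unzip_cleanup(Path(source_zip).as_uri(), target_dir, 'sample.zip')
    remaining = sorted(os.listdir(target_dir))
    print(names, remaining)  # ['sample.csv'] ['sample.csv']
```

After the call, only the extracted CSV remains in the target folder, because the archive is removed in step 3.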
The script below will take a while to run. When it completes, check the ZipFileExample folder. You should find a new directory containing a CSV file.
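import os
import urllib.request

from pyunpack import Archive

# Fall back to the current working directory if script_dir is not already defined.
if 'script_dir' not in globals():
    script_dir = os.getcwd()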
url = 'https://query.data.world/s/vb53nuuux6umwmccbwlajvlzttmz3q'
file_name = 'Eurostat.zip'
data_directory = 'data\\'
example_directory = 'ZipFileExample\\'
abs_file_path = os.path.join(script_dir, data_directory, example_directory, file_name)
abs_directory_path = os.path.join(script_dir, data_directory, example_directory)
with urllib.request.urlopen(url) as source_file:
    with open(abs_file_path, 'wb') as target_file: