Tutorials
Hands-On Python
Hands-On Python
  • Hands-On Python Tutorial For Real-World Business Analytics Problems
  • Preface
    • Section I. A Note From The Author
    • Section II. Tutorial Overview
    • Section III. What Is The Preflight Checklist?
    • Section IV. Supplimentery Material
  • Preflight Checklist
    • Section V. Select Your Difficulty Setting
    • Section VI. Download Anaconda
    • Section VII. Download PyCharm (Optional)
    • Section VIII. Download SQL Server Developer Edition
    • Section IX. Configure Database Environment
    • Section X. Download The Source Code
    • Section XI. Starting JupyterLab
    • Section XII. How To Get Help With This Tutorial
  • Language Basics
    • Lesson 1. Obligatory Hello World
    • Lesson 2. Code Comments
    • Lesson 3. Data Types
    • Lesson 4. Variables
    • Lesson 5. String Concatenation
    • Lesson 6. Arithmetic Operators
    • Lesson 7. Making Decisions
    • Lesson 8. Control Flow With if-elif-else
    • Lesson 9. Control Flow With while
    • Lesson 10. Data Structures Part I: List
    • Lesson 11. Data Structures Part II: Tuples
    • Lesson 12. Data Structures Part III: Dictionaries
    • Lesson 13. Looping With for
    • Lesson 14. Functions
    • Lesson 15. Importing Modules
    • Lesson 16. Python Programming Standards
  • Advanced Topics
    • Lesson 17. Functional Programing With map
    • Lesson 18. Generators
    • Lesson 19. Comprehensions
    • Lesson 20. Basic File Operations
    • Lesson 21. Working With Data In Numpy
    • Lesson 22. Working With Data In Pandas
    • Lesson 23. Working With JSON
    • Lesson 24. Making File Request Over HTTP And SFTP
    • Lesson 25. Interacting With Databases
    • Lesson 26. Saving Objects With Pickle
    • Lesson 27. Error Handling
    • Lesson 28. Bringing It All Together
  • Solutions To Real World Problems
    • Lesson 29. Download A Zip File Over HTTP
    • Lesson 30. Looping Over Files In A Directory
    • Lesson 31. Convert Comma Delmited Files To Pipe Delimited
    • Lesson 32. Combining Multiple CSVs Into One File
    • Lesson 33. Load Large CSVs Into Data Warehouse Staging Tables
    • Lesson 34. Efficiently Write Large Database Query Results To Disk
    • Lesson 35. Working With SFTP In The Real World
    • Lesson 36. Executing Python From SQL Server Agent
Powered by GitBook
On this page
  • Examples
  • Now you try it!
  1. Advanced Topics

Lesson 24. Making File Request Over HTTP And SFTP

PreviousLesson 23. Working With JSONNextLesson 25. Interacting With Databases

Last updated 3 years ago

So, all the cool data is in the cloud, right? The ubiquitous cloud? Is a third-party bank’s on prim file server part of the cloud? Whatever. The point is the data is not local so we gotta go out and get it.

Up to this point, when we snag a file from the internet, we have been doing it with Pandas. That’s cool, but the problems is when you download a file with Pandas, it gets loaded into a data frame which is limited by your machine’s memory before it gets written to disk.

Today, I am going to show you how to skip the data frame and just write a file straight to disk. We will be using a module called urllib to yank files off the internet using the same protocols used to retrieve webpages.

I am also going to show you how to get files from an FTP server which can be kind of a goat rope.

Sometimes you need to get data from an FTP server. However, the data is probably sensitive so you cannot transfer the file in clear text. Here is where secured FTP (SFTP) comes in.

In order to run the SFTP example below you will need to do two things.

Pip install . This is the module that we are going to use to download our file stored on a secured server.

We also need to install some sort of graphical FTP too. I use . We are going to need this tool to explore the FTP server so we can find out what we need to write our download script.

Examples

Example #1: Where’s The Beef?!

Acquiring data from the internet is often not as straight forward as dropping a URL in a script. In some cases, in order to get the data that you want, you have to pass in URL variables. Below is a good example of this.

We are going to download some beef data from the USDA website. We are going to pull down an XML file, but the file is not sitting static on the server. The file is built from a reports generating application on the USDA server.

Normally users get their data by using the web application’s GUI. But we need to automate things. So, the strategy is to use the GUI once. When you do that, you take note of the query string that gets built in the URL. Then you make Python variables out of any URL parameters. Now you can bypass the GUI and automate the process of downloading the data.

import urllib
import os

report_date = '09/01/2020'

if not 'script_dir' in globals():
    script_dir = os.getcwd()
data_directory = 'data\\'
example_directory = 'HTTPAndFTPExample\\'
file_name = 'BeefReport.xml'

target_path = os.path.join(script_dir,data_directory,example_directory,file_name)

url = 'https://mpr.datamart.ams.usda.gov/ws/report/v1/beef/LM_XB459?'
url = url + 'filter={%22filters%22:[{%22fieldName%22:%22Report%20date%22,'
url = url + '%22operatorType%22:%22GREATER%22,%22values%22:[%22' + report_date + '%22]}]}'

with urllib.request.urlopen(url) as source_file:
    with open(target_path, 'wb') as target_file:
        target_file.write(source_file.read())

Example #2: Download File From SFTP Server

If you go to that site and look at the SFTP protocol, you will find the following settings.

host: test.rebex.net

username: demo

password: password

port: 22

Plug those settings into your FTP tool and log into the server. Navigate around and see if you can find an interesting file to download. For this example, we are going to keep is simple and just grab the readme.txt file from the root folder.

This is a simplified example for clarity. Below you will see the line where we set cnopts.hostkeys = None. This actually leaves you open to something called a "man in the middle" attack. In a real scenario, you would have to use something called a host key. We will tackle that in the solutions section.

import urllib
import os
import pysftp

if not 'script_dir' in globals():
    script_dir = os.getcwd()
data_directory = 'data\\'
example_directory = 'HTTPAndFTPExample\\'
file_name = 'readme.txt'

host = 'test.rebex.net'
username = 'demo'
password = 'password'

target_path = os.path.join(script_dir,data_directory,example_directory,file_name)

cnopts = pysftp.CnOpts()
cnopts.hostkeys = None  

with pysftp.Connection(host = host, username = username, password = password, cnopts=cnopts) as sftp:
        sftp.get(file_name, target_path)

Now you try it!

Don't copy and past. Type the code yourself!

We are going to use a test SFTP server called Rebex. The protocols for server access can be found at .

pysftp
WinSCP
https://test.rebex.net