Lesson 25. Interacting With Databases

Our focus is analytics not application development. As such, the discussion that you would expect from other generic tutorials is totally not going to happen right now.

Instead, we’re going to keep our eyes on the prize and focus on those task common to business analytics.

A large difference between app dev and analytics is the volume of data that you work with. Applications are frequently making changes to the database one record at a time. In analytics, you are frequently dealing with large fractions of the total organizations data assets. It requires a different mindset.

Below is a standard scenario that you most likely will be dealing with on a daily basis. You need to do an analysis so you use JupyterLab to connect into the database and get what you need.

Examples

Example #1: Git It Yo Self Service

If you know SQL, you do not have to wait for someone to get you your data. You can just grab it yourself from the database.

The process you see below is fairly common across all programming languages.

  1. Set up your query.

  2. Set up your connection.

  3. Execute your query.

  4. Go to town.

import os
import pyodbc as db
import time
import pandas as pd

#Prep query
sql = 'SELECT BusinessEntityID, AnnualSales, AnnualRevenue, SquareFeet, NumberEmployees' 
sql = sql + ' ' + 'FROM Sales.vStoreWithDemographics'

#Set up the connection.
connection_string = 'DSN=AdventureWorks;'
conn = db.connect(connection_string)

#Execute the query.
result_set = pd.read_sql(sql, conn, index_col = 'BusinessEntityID')

#Review the data.
#Set option used here to make describe output easier to read.
pd.set_option('display.float_format', lambda x: '%.2f' % x)
result_set.describe()

Now you try it!

Don't copy and past. Type the code yourself!

Last updated