Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_18/code/starter-code/starter-code-17.ipynb
1904 views
Kernel: Python 3
import pandas as pd from pandas.io import sql

Pandas can be used to connect to most relational databases. In this demonstration, we will create and connect to a SQLite database. SQLite creates portable SQL databases saved in a single file. These databases are stored in a very efficient manner and allow fast querying, making them ideal for small databases or databases that need to be moved across machines.

If you are looking to start using a database without the setup of mysql or postgres, SQLite is a good start.

import sqlite3
conn = sqlite3.connect('dat-test.db')

Let's return to the Rossmann sales data and load that into the database.

Data is moved to the database through the to_sql command, similar to the to_csv command.

to_sql takes as arugments: - name, the table name to create - con, a connection to a database - index, whether to input the index column - schema, if we want to write a custom schema for the new table - if_exists, what to do if the table already exists. We can overwrite it, add to it, or fail

data = pd.read_csv('../../assets/dataset/rossmann-stores.csv', low_memory=False) data.head()
data.to_sql('rossmann_sales', con=conn, if_exists='replace', index=False)
sql.read_sql('select * from rossmann_sales limit 10', con=conn)

CHECK: (SELECT) Write a query that returns the Store, Date and Customers

query = ''' select store, Date, Customers From rossmann_sales limit 10; ''' sql.read_sql(query, con=conn)

CHECK: (WHERE) Write a query that returns the Store, Date and Customers for when the stores were open and running a promotion

query = ''' select store, Date, Customers From rossmann_sales Where Open = 1 limit 10; ''' sql.read_sql(query, con=conn)

CHECK: (GROUP BY) Write a query that returns the total sales on promotion days.

Exercises

  1. Load the Walmart sales and store features data

  2. Create a table for each of those datasets

  3. Select the store, date and fuel price on days it was over 90 degrees

  4. Select the store, date and weekly sales and temperature

  5. What were average sales on holiday vs. non-holiday sales

  6. What were average sales on holiday vs. non-holiday sales when the temperature was below 32 degrees

data1 = pd.read_csv('../../assets/dataset/walmart-sales.csv', low_memory=False) data1.head()
data2 = pd.read_csv('../../assets/dataset/features.csv', low_memory=False) data2.head()
#data3 = data1.merge(data2, how='left', on='Store') #data3.head()
conn = sqlite3.connect('wm-test.db')
data1.to_sql('walmart_sales', con=conn, if_exists='replace', index=False)
data1.head()
data2.to_sql('features', con=conn, if_exists='replace', index=False)
data2.head()
query = ''' select Store, Date, Fuel_Price, Temperature, from features where Temperature > 90 limit 10; ''' sql.read_sql(query, con=conn).head()
--------------------------------------------------------------------------- OperationalError Traceback (most recent call last) /anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs) 1408 else: -> 1409 cur.execute(*args) 1410 return cur OperationalError: near "from": syntax error During handling of the above exception, another exception occurred: DatabaseError Traceback (most recent call last) <ipython-input-42-5fbb372061b3> in <module>() 11 ''' 12 ---> 13 sql.read_sql(query, con=conn).head() /anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize) 398 sql, index_col=index_col, params=params, 399 coerce_float=coerce_float, parse_dates=parse_dates, --> 400 chunksize=chunksize) 401 402 try: /anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize) 1442 1443 args = _convert_params(sql, params) -> 1444 cursor = self.execute(*args) 1445 columns = [col_desc[0] for col_desc in cursor.description] 1446 /anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs) 1419 ex = DatabaseError( 1420 "Execution failed on sql '%s': %s" % (args[0], exc)) -> 1421 raise_with_traceback(ex) 1422 1423 @staticmethod /anaconda3/lib/python3.6/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback) 383 if traceback == Ellipsis: 384 _, _, traceback = sys.exc_info() --> 385 raise exc.with_traceback(traceback) 386 else: 387 # this version of raise is a syntax error in Python 3 /anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs) 1407 cur.execute(*args, **kwargs) 1408 else: -> 1409 cur.execute(*args) 1410 return cur 1411 except Exception as exc: DatabaseError: Execution failed on sql ' select Store, Date, Fuel_Price, Temperature, from features where Temperature > 90 limit 10; ': near "from": syntax error
query = ''' select walmart_sales.Store, walmart_sales.Date, walmart_sales.Weekly_Sales, Temperature from walmart_sales join features on walmart_sales.Store = features.Store limit 10; ''' sql.read_sql(query, con=conn)
query = ''' select avg(walmart_sales.Weekly_Sales) from walmart_sales join features on walmart_sales.Store = features.Store where walmart_sales.IsHoliday = 1 limit 10; ''' sql.read_sql(query, con=conn)
query = ''' select avg(walmart_sales.Weekly_Sales) from walmart_sales join features on walmart_sales.Store = features.Store where walmart_sales.IsHoliday = 0 limit 10; ''' sql.read_sql(query, con=conn)
query = ''' select avg(walmart_sales.Weekly_Sales) from walmart_sales join features on walmart_sales.Store = features.Store where walmart_sales.IsHoliday = 1 and Temperature < 32 limit 10; ''' sql.read_sql(query, con=conn)
query = ''' select avg(walmart_sales.Weekly_Sales) from walmart_sales join features on walmart_sales.Store = features.Store where walmart_sales.IsHoliday = 0 and Temperature < 32 limit 10; ''' sql.read_sql(query, con=conn)