GitHub Repository: microsoft/vscode
Path: blob/main/extensions/copilot/test/scenarios/test-notebooks/Chipotle.solution.ipynb

Ex2 - Getting and Knowing your Data

Check out the Chipotle Exercises Video Tutorial to watch a data scientist work through the exercises.

This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd
import numpy as np

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"
chipo = pd.read_csv(url, sep="\t")
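The `sep="\t"` argument is what makes `read_csv` treat the file as tab-separated. As a minimal offline sketch of the same call, the snippet below parses a small in-memory sample instead of the URL. The three rows and the name `sample_tsv` are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical three-row sample in a tab-separated layout (illustrative only).
sample_tsv = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tChips\t$2.39 \n"
    "1\t2\tIzze\t$3.39 \n"
    "2\t1\tChicken Bowl\t$16.98 \n"
)

# io.StringIO lets read_csv consume the string as if it were a file.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.head(10))  # head(n) returns at most n rows, so all 3 rows appear here
```

Note that the price column is read as text, because values such as `$2.39 ` are not valid numbers.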

Step 4. See the first 10 entries

chipo.head(10)

Step 5. What is the number of observations in the dataset?

chipo.shape[0]
4622

Step 6. What is the number of columns in the dataset?

chipo.shape[1]
5

Step 7. Print the name of all the columns.

chipo.columns
Index(['order_id', 'quantity', 'item_name', 'choice_description', 'item_price'], dtype='object')

Step 8. How is the dataset indexed?

chipo.index
RangeIndex(start=0, stop=4622, step=1)
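Steps 5 through 8 all read structural metadata from the DataFrame. The sketch below shows the same attributes on a tiny, made-up frame (`toy` is a hypothetical name), which may make it easier to see how `shape`, `columns`, and `index` relate to each other.

```python
import pandas as pd

# Hypothetical toy frame: 3 rows, 2 columns.
toy = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 2, 1]})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(toy))             # len() gives the row count, same as shape[0]
print(list(toy.columns))    # column labels as a plain list
print(toy.index)            # default index is a RangeIndex starting at 0
```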

Step 9. Which was the most-ordered item?

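# Sum only the quantity column; item_price is still text at this point,
# so a blanket sum() over every column is not meaningful for it.
c = chipo.groupby("item_name")[["quantity"]].sum()
c = c.sort_values("quantity", ascending=False)
c.head(1)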

Step 10. For the most-ordered item, how many items were ordered?

# Same aggregate as in Step 9; the quantity value of the top row is the answer.
c = chipo.groupby("item_name")[["quantity"]].sum()
c = c.sort_values("quantity", ascending=False)
c.head(1)
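Steps 9 and 10 ask two questions of the same aggregate: which item has the largest total quantity, and what that total is. One possible way to read both answers directly, sketched on made-up data (the frame `toy` and its values are hypothetical), is to combine `idxmax` and `max` on the grouped sums.

```python
import pandas as pd

# Hypothetical order lines, not the real dataset.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Izze"],
    "quantity": [2, 1, 1, 2],
})

# Total quantity per item, as a Series indexed by item_name.
per_item = toy.groupby("item_name")["quantity"].sum()

top_item = per_item.idxmax()         # label of the largest total
top_quantity = int(per_item.max())   # the largest total itself

print(top_item, top_quantity)
```

The sort-and-`head(1)` approach used in the notebook gives the same row; `idxmax` simply skips the sort.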

Step 11. What was the most ordered item in the choice_description column?

# Sum only the quantity column so that text columns are not aggregated.
c = chipo.groupby("choice_description")[["quantity"]].sum()
c = c.sort_values("quantity", ascending=False)
c.head(1)
# Diet Coke 159

Step 12. How many items were ordered in total?

total_items_orders = chipo.quantity.sum()
total_items_orders
4972

Step 13. Turn the item price into a float

Step 13.a. Check the item price type

chipo.item_price.dtype
dtype('O')

Step 13.b. Create a lambda function and change the type of item price, then print the item price type

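# Drop the first and last characters of each price string
# (the leading "$" and the final character), then convert to float.
# Run this cell only once: after conversion the values are no longer strings.
dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)
chipo.item_price.dtype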

Step 13.c. Check the item price type

chipo.item_price.dtype
dtype('float64')
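The lambda in Step 13.b depends on every price having exactly one extra character at each end. As an alternative sketch that does not rely on string positions, the vectorized `.str` methods can strip whitespace and the dollar sign before casting. The sample values and the names `prices` and `as_float` below are made up for illustration.

```python
import pandas as pd

# Hypothetical price strings, with and without trailing whitespace.
prices = pd.Series(["$2.39 ", "$16.98", " $3.39 "])

# Strip surrounding whitespace, drop the leading "$", then cast to float.
as_float = prices.str.strip().str.lstrip("$").astype(float)

print(as_float.dtype)
print(as_float.tolist())
```

This version tolerates prices with no trailing space, which the slice `x[1:-1]` would silently truncate.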

Step 14. How much was the revenue for the period in the dataset?

revenue = (chipo["quantity"] * chipo["item_price"]).sum()
print("Revenue was: $" + str(np.round(revenue, 2)))
Revenue was: $39237.02

Step 15. How many orders were made in the period?

orders = chipo.order_id.value_counts().count()
orders
1834

Step 16. What is the average revenue amount per order?

average_revenue_per_order = revenue / orders
average_revenue_per_order
21.39423118865867
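Dividing total revenue by the number of orders is equivalent to summing revenue within each order and averaging those sums. The sketch below checks that on made-up numbers, following the notebook's convention of quantity times item_price for each line. The frame `toy` and the column `line_revenue` are hypothetical names.

```python
import pandas as pd

# Hypothetical order lines with prices already converted to float.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})

# Revenue per line, using the same quantity * item_price convention as Step 14.
toy["line_revenue"] = toy["quantity"] * toy["item_price"]

# Sum lines within each order, then average across orders.
per_order = toy.groupby("order_id")["line_revenue"].sum()
average = per_order.mean()

# The same figure via total revenue divided by the number of distinct orders.
shortcut = toy["line_revenue"].sum() / toy["order_id"].nunique()

print(average, shortcut)
```

The per-order Series is also handy when the distribution of order values matters, not just the mean.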

Step 17. How many different items are sold?

chipo.item_name.value_counts().count()
50
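Steps 15 and 17 count distinct values with `value_counts().count()`. pandas also provides `nunique()`, which returns the same number in a single call. A small sketch on made-up data (`items` is a hypothetical Series) shows that the two agree:

```python
import pandas as pd

# Hypothetical item names with one repeat.
items = pd.Series(["Chips", "Izze", "Chips", "Chicken Bowl"])

# One row per distinct value, then count the rows.
via_value_counts = int(items.value_counts().count())

# Direct count of distinct values.
via_nunique = items.nunique()

print(via_value_counts, via_nunique)
```

Both approaches ignore missing values by default.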