{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Execute the cell below to import the Pandas and NumPy modules using their familiar aliases." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "tags": [ "nograde" ] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement in the cell below that creates and displays a `DataFrame` named `stats` with the structure shown below.\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pointsassists
Player
Schoonveld52
Voskuil113
Muller185
\n", "\n", "**Player** is not a data row, but is the index's **name**." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "stats = pd.DataFrame(data={\"points\": [5, 11, 18], \"assists\": [2, 3, 5]}, index=['Schoonveld', 'Voskuil', 'Muller'])\n", "stats.index.name=\"Player\"\n", "stats" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Execute the cell below to load several data sets from the `seaborn` package." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "tags": [ "nograde" ] }, "outputs": [], "source": [ "from exam2_data import taxis, vehicles" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement in the cell below that lists the **names** and **types** of the columns in the `DataFrame` named `taxis`. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "taxis.info()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement in the cell below that displays the rows in `taxis` with **index values** 1000 through 1005." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "taxis.loc[1000:1005]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement in the cell below that displays the `pickup_zone`, `dropoff_zone`, and `fare` for the **10th** through **20th** rows in `taxis`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "taxis.iloc[10:20][['pickup_zone', 'dropoff_zone', 'fare']]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement that displays the possible values in the `payment` column in `taxis`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "taxis['payment'].unique()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "What **percentage** of trips were taken by one passenger? Write one or more statements in the cell below that display the answer to this question in the format below\n", "\n", "```\n", "Out of #,### trips, ##.##% were taken by a single passenger.\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "total_trips = len(taxis)\n", "number_single_passenger_trips = np.sum(taxis['passengers'] == 1)\n", "percent_single_passenger = number_single_passenger_trips / total_trips * 100\n", "print(f\"Out of {total_trips:,} trips, {percent_single_passenger:.2f}% were taken by a single passenger.\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a **single statement** to display the 2 most frequently occurring values in the `pickup_borough` column along with the number of trips originating in those boroughs." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "taxis['pickup_borough'].value_counts()[0:2]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Determine if more trips were made whose total was $20 or less, or whose total was $50 or more.\n", "\n", "\n", "Print either the string **$20 or less** or **$50 or more**." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "under_20 = np.sum(taxis['total'] <= 20)\n", "over_50 = np.sum(taxis['total'] >= 50)\n", "\n", "print (\"$20 or less\" if under_20 > over_50 else \"$50 or more\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write a statement that loads the contents of a file named **mpg.txt** into a `DataFrame` named `mpg`. The **|** character is used to separate column within the file. The columns should be named:\n", "\n", "* mpg\n", "* cylinders\n", "* displacement\n", "* horsepower\n", "* weight\n", "* acceleration\n", "* model_year\n", "* origin\n", "* name\n", "\n", "The `displacement` and `acceleration` columns should not be imported into `mpg`. Display the first 10 rows in `mpg` to verify the import worked correctly." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'name']\n", "keep = ['mpg', 'cylinders', 'horsepower', 'weight', 'model_year', 'origin', 'name']\n", "mpg = pd.read_csv('mpg.txt', names=columns, usecols=keep, delimiter=\"|\")\n", "mpg.head(n=10)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Write code to add a new column named `weight_tons` to the `DataFrame` named `vehicles`. The values in the `weight_tons` column should be the values in the `weight` column divided by 2000. \n", "\n", "Display the first 5 values in the `weight_tons` column after adding the column." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "vehicles['weight_tons'] = vehicles['weight'].map(lambda w: w/2000)\n", "vehicles['weight_tons'][:5]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "This question uses the `DataFrame` named `taxis`. Write a statement in the cell below that creates and displays a `DataFrame` named `mean_by_borough` containing the average values for the `fare` and `distance` traveled for each value of the `pickup_borough` column." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "grouped_by_pickup = taxis.groupby('pickup_borough')\n", "mean_by_borough = grouped_by_pickup[['fare', 'distance']].mean()\n", "mean_by_borough" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "tags": [ "question" ] }, "source": [ "Determine the **most common combination** of `pickup_borough` and `dropoff_borough` for which `pickup_borough` and `dropoff_borough` are **not the same**.\n", "\n", "For that combination only, determine the number of fares and the amount of revenue generated. Revenue is defined as the sum of the values in the `total` column. \n", "\n", "The output should be:\n", "```\n", "The most common trip was from Queens to Manhattan, with 224 trips generating $11,436.66 of revenue.\n", "```" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "different_dropoff = taxis[taxis['pickup_borough'] != taxis['dropoff_borough']]\n", "grouped_by_pickup_dropoff = different_dropoff.groupby(['pickup_borough', 'dropoff_borough'])['fare'].count()\n", "(pickup, dropoff) = grouped_by_pickup_dropoff.sort_values().index[-1]\n", "mask = (taxis['pickup_borough'] == pickup) & (taxis['dropoff_borough'] == dropoff)\n", "trips = taxis[mask]\n", "revenue = taxis[mask]['total'].sum()\n", "num_trips = trips.shape[0]\n", "print(f\"The most common trip was from {pickup} to {dropoff}, with {num_trips:,} trips generating ${revenue:,.2f} of revenue.\")" ] } ] }