{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false
            },
            "source": [
                "Execute the cell below to import the Pandas and NumPy modules using their familiar aliases."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 17,
            "metadata": {
                "collapsed": false,
                "tags": [
                    "nograde"
                ]
            },
            "outputs": [],
            "source": [
                "import numpy as np\n",
                "import pandas as pd"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement in the cell below that creates and displays a `DataFrame` named `stats` with the structure shown below.\n",
                "\n",
                "<table border=\"1\" class=\"dataframe table\">\n",
                "  <thead>\n",
                "    <tr style=\"text-align: right;\">\n",
                "      <th></th>\n",
                "      <th>points</th>\n",
                "      <th>assists</th>\n",
                "    </tr>\n",
                "    <tr>\n",
                "      <th>Player</th>\n",
                "      <th></th>\n",
                "      <th></th>\n",
                "    </tr>\n",
                "  </thead>\n",
                "  <tbody>\n",
                "    <tr>\n",
                "      <th>Schoonveld</th>\n",
                "      <td>5</td>\n",
                "      <td>2</td>\n",
                "    </tr>\n",
                "    <tr>\n",
                "      <th>Voskuil</th>\n",
                "      <td>11</td>\n",
                "      <td>3</td>\n",
                "    </tr>\n",
                "    <tr>\n",
                "      <th>Muller</th>\n",
                "      <td>18</td>\n",
                "      <td>5</td>\n",
                "    </tr>\n",
                "  </tbody>\n",
                "</table>\n",
                "\n",
                "**Player** is not a data row, but is the index's **name**."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 16,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "stats = pd.DataFrame(data={\"points\": [5, 11, 18], \"assists\": [2, 3, 5]}, index=['Schoonveld', 'Voskuil', 'Muller'])\n",
                "stats.index.name=\"Player\"\n",
                "stats"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false
            },
            "source": [
                "Execute the cell below to load several data sets from the `seaborn` package."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {
                "collapsed": false,
                "tags": [
                    "nograde"
                ]
            },
            "outputs": [],
            "source": [
                "from exam2_data import taxis, vehicles"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement in the cell below that lists the **names** and **types** of the columns in the `DataFrame` named `taxis`. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "taxis.info()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement in the cell below that displays the rows in `taxis` with **index values** 1000 through 1005."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "taxis.loc[1000:1005]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement in the cell below that displays the `pickup_zone`, `dropoff_zone`, and `fare` for the **10th** through **20th** rows in `taxis`."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "taxis.iloc[10:20][['pickup_zone', 'dropoff_zone', 'fare']]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement that displays the possible values in the `payment` column in `taxis`."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 15,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "taxis['payment'].unique()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "What **percentage** of trips were taken by one passenger?  Write one or more statements in the cell below that display the answer to this question in the format below\n",
                "\n",
                "```\n",
                "Out of #,### trips, ##.##% were taken by a single passenger.\n",
                "```"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "total_trips = len(taxis)\n",
                "number_single_passenger_trips = np.sum(taxis['passengers'] == 1)\n",
                "percent_single_passenger = number_single_passenger_trips / total_trips * 100\n",
                "print(f\"Out of {total_trips:,} trips, {percent_single_passenger:.2f}% were taken by a single passenger.\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a **single statement** to display the 2 most frequently occurring values in the `pickup_borough` column along with the number of trips originating in those boroughs."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "taxis['pickup_borough'].value_counts()[0:2]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Determine if more trips were made whose total was $20 or less, or whose total was $50 or more.\n",
                "\n",
                "\n",
                "Print either the string **$20 or less** or **$50 or more**."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 25,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "under_20 = np.sum(taxis['total'] <= 20)\n",
                "over_50 = np.sum(taxis['total'] >= 50)\n",
                "\n",
                "print (\"$20 or less\" if under_20 > over_50 else \"$50 or more\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write a statement that loads the contents of a file named **mpg.txt** into a `DataFrame` named `mpg`.  The **|** character is used to separate column within the file.  The columns should be named:\n",
                "\n",
                "* mpg\n",
                "* cylinders\n",
                "* displacement\n",
                "* horsepower\n",
                "* weight\n",
                "* acceleration\n",
                "* model_year\n",
                "* origin\n",
                "* name\n",
                "\n",
                "The `displacement` and `acceleration` columns should not be imported into `mpg`.    Display the first 10 rows in `mpg` to verify the import worked correctly."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'name']\n",
                "keep = ['mpg', 'cylinders', 'horsepower', 'weight', 'model_year', 'origin', 'name']\n",
                "mpg = pd.read_csv('mpg.txt', names=columns, usecols=keep, delimiter=\"|\")\n",
                "mpg.head(n=10)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Write code to add a new column named `weight_tons` to the `DataFrame` named `vehicles`.  The values in the `weight_tons` column should be the values in the `weight` column divided by 2000.  \n",
                "\n",
                "Display the first 5 values in the `weight_tons` column after adding the column."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "vehicles['weight_tons'] = vehicles['weight'].map(lambda w: w/2000)\n",
                "vehicles['weight_tons'][:5]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "This question uses the `DataFrame` named `taxis`.  Write a statement in the cell below that creates and displays a `DataFrame` named `mean_by_borough` containing the average values for the `fare` and `distance` traveled for each value of the `pickup_borough` column."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 23,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "grouped_by_pickup = taxis.groupby('pickup_borough')\n",
                "mean_by_borough = grouped_by_pickup[['fare', 'distance']].mean()\n",
                "mean_by_borough"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "collapsed": false,
                "tags": [
                    "question"
                ]
            },
            "source": [
                "Determine the **most common combination** of `pickup_borough` and `dropoff_borough` for which `pickup_borough` and `dropoff_borough` are **not the same**.\n",
                "\n",
                "For that combination only, determine the number of fares and the amount of revenue generated.  Revenue is defined as the sum of the values in the `total` column.  \n",
                "\n",
                "The output should be:\n",
                "```\n",
                "The most common trip was from Queens to Manhattan, with 224 trips generating $11,436.66 of revenue.\n",
                "```"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 18,
            "metadata": {
                "collapsed": false
            },
            "outputs": [],
            "source": [
                "different_dropoff = taxis[taxis['pickup_borough'] != taxis['dropoff_borough']]\n",
                "grouped_by_pickup_dropoff = different_dropoff.groupby(['pickup_borough', 'dropoff_borough'])['fare'].count()\n",
                "(pickup, dropoff) = grouped_by_pickup_dropoff.sort_values().index[-1]\n",
                "mask = (taxis['pickup_borough'] == pickup) & (taxis['dropoff_borough'] == dropoff)\n",
                "trips = taxis[mask]\n",
                "revenue = taxis[mask]['total'].sum()\n",
                "num_trips = trips.shape[0]\n",
                "print(f\"The most common trip was from {pickup} to {dropoff}, with {num_trips:,} trips generating ${revenue:,.2f} of revenue.\")"
            ]
        }
    ]
}