{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "#  Using groupby to summarize information\n",
    "\n",
    "Import some data from the <a href=\"https://github.com/mwaskom/seaborn-data\" target=\"_blank\">Seaborn sample data sets</a>, and do some exploration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>buick skylark 320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth satellite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc rebel sst</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford torino</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
       "0  18.0          8         307.0       130.0    3504          12.0   \n",
       "1  15.0          8         350.0       165.0    3693          11.5   \n",
       "2  18.0          8         318.0       150.0    3436          11.0   \n",
       "3  16.0          8         304.0       150.0    3433          12.0   \n",
       "4  17.0          8         302.0       140.0    3449          10.5   \n",
       "\n",
       "   model_year origin                       name  \n",
       "0          70    usa  chevrolet chevelle malibu  \n",
       "1          70    usa          buick skylark 320  \n",
       "2          70    usa         plymouth satellite  \n",
       "3          70    usa              amc rebel sst  \n",
       "4          70    usa                ford torino  "
      ]
     },
     "execution_count": 1,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import seaborn as sns\n",
    "mpg = sns.load_dataset('mpg')\n",
    "mpg.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Let's work with only cars made in the USA. \n",
    "* Look at what possible values are in the `origin` column.\n",
    "* Create a new `DataFrame` named `usa` with only those rows in `mpg` for which the value of the `origin` column is **usa**.\n",
    "* Verify there is only 1 value in the `origin` column in `usa."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['usa']\n",
      "    mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
      "0  18.0          8         307.0       130.0    3504          12.0   \n",
      "1  15.0          8         350.0       165.0    3693          11.5   \n",
      "2  18.0          8         318.0       150.0    3436          11.0   \n",
      "3  16.0          8         304.0       150.0    3433          12.0   \n",
      "4  17.0          8         302.0       140.0    3449          10.5   \n",
      "\n",
      "   model_year origin                       name  \n",
      "0          70    usa  chevrolet chevelle malibu  \n",
      "1          70    usa          buick skylark 320  \n",
      "2          70    usa         plymouth satellite  \n",
      "3          70    usa              amc rebel sst  \n",
      "4          70    usa                ford torino  \n"
     ]
    }
   ],
   "source": [
    "mpg['origin'].unique()\n",
    "usa = mpg[mpg.loc[:, 'origin'] == 'usa']\n",
    "print(usa['origin'].unique())\n",
    "print(usa.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now let's specify that we want to create some groupings based on the `model_year` attribute and see what is contained in the `groups` property of the `DataFrameGroupBy` generated by calling `groupby`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{70: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 24, 25, 26, 27, 28], 71: [30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 56], 72: [58, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 72, 73, 74, 75, 80, 83], 73: [85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 107, 109, 112, 113, 115, 116, 121, 124], 74: [125, 126, 127, 128, 130, 132, 133, 134, 135, 136, 137, 138, 139, 140, 146], 75: [152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 168, 169, 170, 174, 176], 76: [184, 185, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 199, 200, 201, 202, 206, 208, 212, 213, 214, 215], 77: [217, 219, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 234, 236, 237, 238], 78: [245, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 269, 271, 272], 79: [280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 295, 296, 298, 300, 301, 302, 305, 306, 307, 308], 80: [311, 313, 314, 315, 316, 323, 336], 81: [338, 339, 340, 341, 342, 344, 350, 351, 352, 363, 364, 365, 366], 82: [367, 368, 369, 370, 371, 372, 373, 374, 378, 379, 386, 387, 388, 389, 391, 392, 393, 395, 396, 397]}"
      ]
     },
     "execution_count": 3,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = usa.groupby('model_year')\n",
    "grouped.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Let's see what the values associated with the key **70** in `grouped.groups` are."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>buick skylark 320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth satellite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc rebel sst</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford torino</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>429.0</td>\n",
       "      <td>198.0</td>\n",
       "      <td>4341</td>\n",
       "      <td>10.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford galaxie 500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>454.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>4354</td>\n",
       "      <td>9.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevrolet impala</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>440.0</td>\n",
       "      <td>215.0</td>\n",
       "      <td>4312</td>\n",
       "      <td>8.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth fury iii</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>455.0</td>\n",
       "      <td>225.0</td>\n",
       "      <td>4425</td>\n",
       "      <td>10.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>pontiac catalina</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>390.0</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3850</td>\n",
       "      <td>8.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc ambassador dpl</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>383.0</td>\n",
       "      <td>170.0</td>\n",
       "      <td>3563</td>\n",
       "      <td>10.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>dodge challenger se</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>340.0</td>\n",
       "      <td>160.0</td>\n",
       "      <td>3609</td>\n",
       "      <td>8.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth 'cuda 340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>400.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3761</td>\n",
       "      <td>9.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevrolet monte carlo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>455.0</td>\n",
       "      <td>225.0</td>\n",
       "      <td>3086</td>\n",
       "      <td>10.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>buick estate wagon (sw)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>22.0</td>\n",
       "      <td>6</td>\n",
       "      <td>198.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>2833</td>\n",
       "      <td>15.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth duster</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>18.0</td>\n",
       "      <td>6</td>\n",
       "      <td>199.0</td>\n",
       "      <td>97.0</td>\n",
       "      <td>2774</td>\n",
       "      <td>15.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc hornet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>21.0</td>\n",
       "      <td>6</td>\n",
       "      <td>200.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>2587</td>\n",
       "      <td>16.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford maverick</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>21.0</td>\n",
       "      <td>6</td>\n",
       "      <td>199.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>2648</td>\n",
       "      <td>15.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc gremlin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8</td>\n",
       "      <td>360.0</td>\n",
       "      <td>215.0</td>\n",
       "      <td>4615</td>\n",
       "      <td>14.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford f250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>200.0</td>\n",
       "      <td>4376</td>\n",
       "      <td>15.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevy c20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
       "0   18.0          8         307.0       130.0    3504          12.0   \n",
       "1   15.0          8         350.0       165.0    3693          11.5   \n",
       "2   18.0          8         318.0       150.0    3436          11.0   \n",
       "3   16.0          8         304.0       150.0    3433          12.0   \n",
       "4   17.0          8         302.0       140.0    3449          10.5   \n",
       "5   15.0          8         429.0       198.0    4341          10.0   \n",
       "6   14.0          8         454.0       220.0    4354           9.0   \n",
       "7   14.0          8         440.0       215.0    4312           8.5   \n",
       "8   14.0          8         455.0       225.0    4425          10.0   \n",
       "9   15.0          8         390.0       190.0    3850           8.5   \n",
       "10  15.0          8         383.0       170.0    3563          10.0   \n",
       "11  14.0          8         340.0       160.0    3609           8.0   \n",
       "12  15.0          8         400.0       150.0    3761           9.5   \n",
       "13  14.0          8         455.0       225.0    3086          10.0   \n",
       "15  22.0          6         198.0        95.0    2833          15.5   \n",
       "16  18.0          6         199.0        97.0    2774          15.5   \n",
       "17  21.0          6         200.0        85.0    2587          16.0   \n",
       "24  21.0          6         199.0        90.0    2648          15.0   \n",
       "25  10.0          8         360.0       215.0    4615          14.0   \n",
       "26  10.0          8         307.0       200.0    4376          15.0   \n",
       "\n",
       "    model_year origin                       name  \n",
       "0           70    usa  chevrolet chevelle malibu  \n",
       "1           70    usa          buick skylark 320  \n",
       "2           70    usa         plymouth satellite  \n",
       "3           70    usa              amc rebel sst  \n",
       "4           70    usa                ford torino  \n",
       "5           70    usa           ford galaxie 500  \n",
       "6           70    usa           chevrolet impala  \n",
       "7           70    usa          plymouth fury iii  \n",
       "8           70    usa           pontiac catalina  \n",
       "9           70    usa         amc ambassador dpl  \n",
       "10          70    usa        dodge challenger se  \n",
       "11          70    usa         plymouth 'cuda 340  \n",
       "12          70    usa      chevrolet monte carlo  \n",
       "13          70    usa    buick estate wagon (sw)  \n",
       "15          70    usa            plymouth duster  \n",
       "16          70    usa                 amc hornet  \n",
       "17          70    usa              ford maverick  \n",
       "24          70    usa                amc gremlin  \n",
       "25          70    usa                  ford f250  \n",
       "26          70    usa                  chevy c20  "
      ]
     },
     "execution_count": 4,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usa.loc[grouped.groups[70]].head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Remember `loc` uses **index values**, not **list positions**.  Let's prove that a minute.  First we'll change the index values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>buick skylark 320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth satellite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc rebel sst</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104</th>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford torino</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
       "100  18.0          8         307.0       130.0    3504          12.0   \n",
       "101  15.0          8         350.0       165.0    3693          11.5   \n",
       "102  18.0          8         318.0       150.0    3436          11.0   \n",
       "103  16.0          8         304.0       150.0    3433          12.0   \n",
       "104  17.0          8         302.0       140.0    3449          10.5   \n",
       "\n",
       "     model_year origin                       name  \n",
       "100          70    usa  chevrolet chevelle malibu  \n",
       "101          70    usa          buick skylark 320  \n",
       "102          70    usa         plymouth satellite  \n",
       "103          70    usa              amc rebel sst  \n",
       "104          70    usa                ford torino  "
      ]
     },
     "execution_count": 5,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_index = pd.RangeIndex(100, len(usa)+100)\n",
    "usa_with_new_index = usa.set_index(new_index)\n",
    "usa_with_new_index.head()\n",
    "#print(len(usa))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now re-compute the groups based on the `model_year` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{70: [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121], 71: [122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141], 72: [142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159], 73: [160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188], 74: [189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203], 75: [204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223], 76: [224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245], 77: [246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263], 78: [264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285], 79: [286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308], 80: [309, 310, 311, 312, 313, 314, 315], 81: [316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328], 82: [329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348]}"
      ]
     },
     "execution_count": 6,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_index_grouped = usa_with_new_index.groupby('model_year')\n",
    "new_index_grouped.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "This hopefully proves that the values in the `groups` dictionary represent **index values**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now let's look at grouping by **multiple attributes**; in particular, we'll group by `model_year` followed by `cylinders`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{(70, 6): [15, 16, 17, 24], (70, 8): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 25, 26, 27, 28], (71, 4): [30, 32, 46, 49, 56], (71, 6): [33, 34, 35, 36, 37, 45, 47, 48], (71, 8): [38, 39, 40, 41, 42, 43, 44], (72, 4): [58, 60, 61, 80, 83], (72, 8): [62, 63, 64, 65, 66, 67, 68, 69, 70, 72, 73, 74, 75], (73, 4): [109, 112], (73, 6): [97, 98, 99, 100, 101, 107, 113], (73, 8): [85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 103, 104, 105, 106, 115, 116, 121, 124], (74, 4): [130, 132, 146], (74, 6): [125, 126, 127, 128, 133, 134, 135], (74, 8): [136, 137, 138, 139, 140], (75, 4): [168, 170], (75, 6): [152, 153, 154, 155, 160, 161, 162, 163, 164, 169, 174, 176], (75, 8): [156, 157, 158, 159, 165, 166], (76, 4): [184, 185, 195, 196, 206], (76, 6): [191, 192, 193, 194, 199, 200, 201, 202], (76, 8): [187, 188, 189, 190, 208, 212, 213, 214, 215], (77, 4): [217, 219, 234, 236, 237, 238], (77, 6): [225, 226, 227, 228], (77, 8): [221, 222, 223, 224, 229, 230, 231, 232], (78, 4): [245, 255, 266, 269, 271, 272], (78, 6): [252, 253, 254, 256, 257, 258, 259, 260, 261, 263], (78, 8): [249, 250, 251, 262, 264, 265], (79, 4): [282, 295, 296, 301, 302, 305, 308], (79, 6): [280, 281, 283, 284, 306, 307], (79, 8): [285, 286, 287, 288, 289, 290, 291, 292, 298, 300], (80, 4): [311, 313, 314, 315, 323, 336], (80, 6): [316], (81, 4): [338, 339, 340, 342, 344, 350, 351, 352], (81, 6): [341, 363, 365, 366], (81, 8): [364], (82, 4): [367, 368, 369, 370, 371, 372, 373, 374, 378, 379, 388, 391, 392, 393, 395, 396, 397], (82, 6): [386, 387, 389]}"
      ]
     },
     "execution_count": 7,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = usa.groupby(['model_year', 'cylinders'])\n",
    "grouped.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now we could get the members of the 6-cylinder vehicles manufactured in 1970 like this"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>22.0</td>\n",
       "      <td>6</td>\n",
       "      <td>198.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>2833</td>\n",
       "      <td>15.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>plymouth duster</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>18.0</td>\n",
       "      <td>6</td>\n",
       "      <td>199.0</td>\n",
       "      <td>97.0</td>\n",
       "      <td>2774</td>\n",
       "      <td>15.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc hornet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>21.0</td>\n",
       "      <td>6</td>\n",
       "      <td>200.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>2587</td>\n",
       "      <td>16.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>ford maverick</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>21.0</td>\n",
       "      <td>6</td>\n",
       "      <td>199.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>2648</td>\n",
       "      <td>15.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "      <td>amc gremlin</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
       "15  22.0          6         198.0        95.0    2833          15.5   \n",
       "16  18.0          6         199.0        97.0    2774          15.5   \n",
       "17  21.0          6         200.0        85.0    2587          16.0   \n",
       "24  21.0          6         199.0        90.0    2648          15.0   \n",
       "\n",
       "    model_year origin             name  \n",
       "15          70    usa  plymouth duster  \n",
       "16          70    usa       amc hornet  \n",
       "17          70    usa    ford maverick  \n",
       "24          70    usa      amc gremlin  "
      ]
     },
     "execution_count": 8,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usa.loc[grouped.groups[(70,6)]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "And we can get the average mpg for that specific set of vehicles by finding them using `loc`, and then using the `Series` method `mean` to compute the average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "mpg           20.50\n",
       "horsepower    91.75\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usa.loc[grouped.groups[(70,6)],[\"mpg\", \"horsepower\"]].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "We can get the average `mpg` for **every** `(model_year, cylinders)` combination using the `DataFrameGroupBy` object's `mean` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>model_year</th>\n",
       "      <th>cylinders</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">70</th>\n",
       "      <th>6</th>\n",
       "      <td>20.500000</td>\n",
       "      <td>199.000000</td>\n",
       "      <td>91.750000</td>\n",
       "      <td>2710.500000</td>\n",
       "      <td>15.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>14.111111</td>\n",
       "      <td>367.555556</td>\n",
       "      <td>183.666667</td>\n",
       "      <td>3940.055556</td>\n",
       "      <td>11.194444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">71</th>\n",
       "      <th>4</th>\n",
       "      <td>24.800000</td>\n",
       "      <td>118.200000</td>\n",
       "      <td>79.500000</td>\n",
       "      <td>2178.600000</td>\n",
       "      <td>17.600000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>18.000000</td>\n",
       "      <td>243.375000</td>\n",
       "      <td>98.875000</td>\n",
       "      <td>3171.875000</td>\n",
       "      <td>14.750000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>13.428571</td>\n",
       "      <td>371.714286</td>\n",
       "      <td>166.857143</td>\n",
       "      <td>4537.714286</td>\n",
       "      <td>12.214286</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            mpg  displacement  horsepower       weight  \\\n",
       "model_year cylinders                                                     \n",
       "70         6          20.500000    199.000000   91.750000  2710.500000   \n",
       "           8          14.111111    367.555556  183.666667  3940.055556   \n",
       "71         4          24.800000    118.200000   79.500000  2178.600000   \n",
       "           6          18.000000    243.375000   98.875000  3171.875000   \n",
       "           8          13.428571    371.714286  166.857143  4537.714286   \n",
       "\n",
       "                      acceleration  \n",
       "model_year cylinders                \n",
       "70         6             15.500000  \n",
       "           8             11.194444  \n",
       "71         4             17.600000  \n",
       "           6             14.750000  \n",
       "           8             12.214286  "
      ]
     },
     "execution_count": 10,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "means = grouped.mean()\n",
    "means.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "The index of the `DateFrame` that comes back from `DataFrameGroupBy.mean` is a **multi-level** index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([(70, 6),\n",
       "            (70, 8),\n",
       "            (71, 4),\n",
       "            (71, 6),\n",
       "            (71, 8),\n",
       "            (72, 4),\n",
       "            (72, 8),\n",
       "            (73, 4),\n",
       "            (73, 6),\n",
       "            (73, 8),\n",
       "            (74, 4),\n",
       "            (74, 6),\n",
       "            (74, 8),\n",
       "            (75, 4),\n",
       "            (75, 6),\n",
       "            (75, 8),\n",
       "            (76, 4),\n",
       "            (76, 6),\n",
       "            (76, 8),\n",
       "            (77, 4),\n",
       "            (77, 6),\n",
       "            (77, 8),\n",
       "            (78, 4),\n",
       "            (78, 6),\n",
       "            (78, 8),\n",
       "            (79, 4),\n",
       "            (79, 6),\n",
       "            (79, 8),\n",
       "            (80, 4),\n",
       "            (80, 6),\n",
       "            (81, 4),\n",
       "            (81, 6),\n",
       "            (81, 8),\n",
       "            (82, 4),\n",
       "            (82, 6)],\n",
       "           names=['model_year', 'cylinders'])"
      ]
     },
     "execution_count": 11,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "means.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Here's an example from our book that illustrates an important idea when passing a `Series` object as the field to group by.  Consider the following `DataFrame` named `df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>1.628402</td>\n",
       "      <td>0.630337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>two</td>\n",
       "      <td>-1.107394</td>\n",
       "      <td>0.022734</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>0.475426</td>\n",
       "      <td>-0.792190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-1.547331</td>\n",
       "      <td>-1.789632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>1.078556</td>\n",
       "      <td>-1.490519</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2     data1     data2\n",
       "0    a  one  1.628402  0.630337\n",
       "1    a  two -1.107394  0.022734\n",
       "2    b  one  0.475426 -0.792190\n",
       "3    b  two -1.547331 -1.789632\n",
       "4    a  one  1.078556 -1.490519"
      ]
     },
     "execution_count": 12,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({\n",
    "        'key1': ['a','a', 'b', 'b', 'a'],\n",
    "        'key2': ['one', 'two', 'one', 'two', 'one'],\n",
    "        'data1': np.random.randn(5),\n",
    "        'data2': np.random.randn(5)\n",
    "});\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "The book uses the following code to compute the average of the `data1` column, grouped by the values in the `key1` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a    0.533188\n",
       "b   -0.535952\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['data1'].groupby(df['key1']).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    1.628402\n",
      "1   -1.107394\n",
      "2    0.475426\n",
      "3   -1.547331\n",
      "4    1.078556\n",
      "Name: data1, dtype: float64\n",
      "0    a\n",
      "1    a\n",
      "2    b\n",
      "3    b\n",
      "4    a\n",
      "Name: key1, dtype: object\n"
     ]
    }
   ],
   "source": [
    "print((df['data1']))\n",
    "print(df['key1'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Another way to write this is as follows:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a    0.533188\n",
       "b   -0.535952\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(df['key1']).mean()['data1']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Look carefully at the first example again, particularly focusing on what's being **grouped**:\n",
    "```\n",
    "df['data1'].groupby(df['key1']).mean()\n",
    "```\n",
    "\n",
    "Does this raise any questions in your head?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Consider the code in the following 2 cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  key1 key2     data1     data2\n",
      "0    a  one  1.628402  0.630337\n",
      "1    a  two -1.107394  0.022734\n",
      "2    b  one  0.475426 -0.792190\n",
      "3    b  two -1.547331 -1.789632\n",
      "4    a  one  1.078556 -1.490519\n",
      "\n",
      "5    a\n",
      "0    b\n",
      "7    b\n",
      "8    a\n",
      "9    a\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "key1 = pd.Series(data=['a', 'b', 'b', 'a', 'a'], index=[5, 0, 7, 8, 9])\n",
    "print(df.head())\n",
    "print()\n",
    "print(key1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'b': [0]}"
      ]
     },
     "execution_count": 17,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_by_key1 = df['data1'].groupby(key1)\n",
    "grouped_by_key1.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "What gives?  The reason the original code works is because the **index values** are the same.\n",
    "```\n",
    "df['data1'].groupby(df['key1']).mean()\n",
    "```\n",
    "\n",
    "In the version above:\n",
    "```\n",
    "key1 = pd.Series(data=df['data1'].values, index=[5, 6, 7, 8, 9])\n",
    "grouped_by_key1 = df['data1'].groupby(key1)\n",
    "```\n",
    "there's no match in `key1` from the index values in `df['data1']`, so we don't get any groups."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now let's look at grouping by a **function**.  The function will be passed values of the index, so we'll transform our `mpg` dataset by using the `name` column values as the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>chevrolet chevelle malibu</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>buick skylark 320</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>plymouth satellite</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>amc rebel sst</th>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ford torino</th>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>70</td>\n",
       "      <td>usa</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            mpg  cylinders  displacement  horsepower  weight  \\\n",
       "name                                                                           \n",
       "chevrolet chevelle malibu  18.0          8         307.0       130.0    3504   \n",
       "buick skylark 320          15.0          8         350.0       165.0    3693   \n",
       "plymouth satellite         18.0          8         318.0       150.0    3436   \n",
       "amc rebel sst              16.0          8         304.0       150.0    3433   \n",
       "ford torino                17.0          8         302.0       140.0    3449   \n",
       "\n",
       "                           acceleration  model_year origin  \n",
       "name                                                        \n",
       "chevrolet chevelle malibu          12.0          70    usa  \n",
       "buick skylark 320                  11.5          70    usa  \n",
       "plymouth satellite                 11.0          70    usa  \n",
       "amc rebel sst                      12.0          70    usa  \n",
       "ford torino                        10.5          70    usa  "
      ]
     },
     "execution_count": 18,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usa_model_index = usa.set_index('name')\n",
    "usa_model_index.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now we can group by just the vehicle's **make** by creating a function that extracts the make from each index value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['amc', 'buick', 'cadillac', 'capri', 'chevroelt', 'chevrolet', 'chevy', 'chrysler', 'dodge', 'ford', 'hi', 'mercury', 'oldsmobile', 'plymouth', 'pontiac'])"
      ]
     },
     "execution_count": 19,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_make(name):\n",
    "    stripper = name.strip()\n",
    "    parts = stripper.split()\n",
    "    return parts[0]\n",
    "\n",
    "grouped_by_make = usa_model_index.groupby(lambda name: name.strip().split()[0])\n",
    "grouped_by_make.groups.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "We see some problems here: \n",
    "\n",
    "* **chevroelt** should be **chevrolet**\n",
    "* **chevy** is an abbreviation for **chevrolet**. \n",
    "\n",
    "A lambda function can only have one expression within it.  We'll write an actual function instead to allow us to treat **chevrolet**, **chevroelt**, and **chevy** the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
    "def get_make(name):\n",
    "    stripper = name.strip()\n",
    "    parts = stripper.split()\n",
    "    if parts[0] in [\"chevy\", \"chevrolet\", \"chevroelt\"]:\n",
    "        return \"chevrolet\"\n",
    "    else:\n",
    "        return parts[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Let's pass the `get_make` function as the argument to `groupby` and look at the resulting groups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['amc', 'buick', 'cadillac', 'capri', 'chevrolet', 'chrysler', 'dodge', 'ford', 'hi', 'mercury', 'oldsmobile', 'plymouth', 'pontiac'])"
      ]
     },
     "execution_count": 21,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_by_make = usa_model_index.groupby(get_make)\n",
    "grouped_by_make.groups.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now find the  average `mpg` for each make based on the keys returned by the `get_make` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name\n",
      "amc           18.246429\n",
      "buick         19.182353\n",
      "cadillac      19.750000\n",
      "capri         25.000000\n",
      "chevrolet     20.219149\n",
      "chrysler      17.266667\n",
      "dodge         22.060714\n",
      "ford          19.694118\n",
      "hi             9.000000\n",
      "mercury       19.118182\n",
      "oldsmobile    21.100000\n",
      "plymouth      21.703226\n",
      "pontiac       20.012500\n",
      "Name: mpg, dtype: float64\n",
      "name\n",
      "amc           18.246429\n",
      "buick         19.182353\n",
      "cadillac      19.750000\n",
      "capri         25.000000\n",
      "chevrolet     20.219149\n",
      "chrysler      17.266667\n",
      "dodge         22.060714\n",
      "ford          19.694118\n",
      "hi             9.000000\n",
      "mercury       19.118182\n",
      "oldsmobile    21.100000\n",
      "plymouth      21.703226\n",
      "pontiac       20.012500\n",
      "Name: mpg, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(grouped_by_make.mean()['mpg'])\n",
    "print(grouped_by_make['mpg'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Get the average miles per gallon for the 3 makes that appear the **most often** in our data set.  \n",
    "\n",
    "To get started, sort the result of calling the `size` aggregate function on `grouped_by_make` in descending order, storing the result in a variable named `most_frequently_occurring`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['ford', 'chevrolet', 'plymouth'], dtype='object', name='name')"
      ]
     },
     "execution_count": 31,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "size_by_make = grouped_by_make.size()\n",
    "most_frequently_occurring = size_by_make.sort_values(ascending=False)\n",
    "makes_of_most_frequently_occurring = most_frequently_occurring.index[0:3]\n",
    "makes_of_most_frequently_occurring"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now use the index values from `most_frequently_occurring` to select elements from `mean_mpg_by_make`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "ford         19.694118\n",
       "chevrolet    20.219149\n",
       "plymouth     21.703226\n",
       "Name: mpg, dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_mpg_by_make = grouped_by_make.mean()['mpg']\n",
    "mean_mpg_by_make.loc[makes_of_most_frequently_occurring]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "We can also create a new `DataFrame` from the `Series` objects `mean_mpg_by_make` and `size_by_make`, sort it using `sort_values`, and get the first 3 elements using `iloc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
    "summary = pd.DataFrame(\n",
    "  data = {\n",
    "        \"Avg. MPG\": mean_mpg_by_make,\n",
    "        \"Num. Vehicles\": size_by_make\n",
    "    }\n",
    ")\n",
    "summary.sort_values('Num. Vehicles', ascending=False).iloc[0:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Finally, look at summarizing by `Make` and `Year`.  Get started by creating a multi-level index on those 2 columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
    "usa_model_year_idx = usa.set_index(['name', 'model_year'])\n",
    "usa_model_year_idx.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now we can group by a combination of the car's **make** and year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "And find the average miles per gallon for all vehicles manufactured by **Ford** in 1971."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (system-wide)",
   "language": "python",
   "metadata": {
    "cocalc": {
     "description": "Python 3 programming language",
     "priority": 100,
     "url": "https://www.python.org/"
    }
   },
   "name": "python3",
   "resource_dir": "/ext/jupyter/kernels/python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}