{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "As with `NumPy`, there's a fairly standard `import` statement that gets used for Pandas. Our textbook author also mentions importing `DataFrame` and `Series` specifically since these are used so frequently." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "from pandas import Series, DataFrame" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Getting the FIPS data\n", "\n", "In the NumPy version of this exercise, we manually downloaded and parsed a file that contained our FIPS code database. Pandas provides some pre-built parsing capabilities, and is able to parse information from \n", "* CSV (Comma Separated Values)\n", "* tab-delimited files\n", "* JSON\n", "* HTML tables\n", "* Data stored in an SQL database\n", "\n", "The USDA has a page that contains FIPS codes. First let's try just plain `pd.read_html` to see what it does." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15\n" ] } ], "source": [ "url = 'https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696'\n", "tables = pd.read_html(url)\n", "print(len(tables))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Yikes, that's a lot of tables (for those of you who have some HTML experience, this is another reason why table-based page layout is not a very good idea!) \n", "\n", "Let's refine our search to look for one that contains the text **\"FIPS\"**." 
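(As an aside, each of the text-based formats above has a dedicated reader function that returns a `DataFrame`. A minimal sketch using small in-memory documents — the data here is made up for illustration; `pd.read_html` and `pd.read_sql` follow the same pattern, given a URL/HTML string or a query plus database connection:)

```python
import io

import pandas as pd

csv_text = "Name,FIPS\nAlabama,1\nAlaska,2\n"

# CSV and tab-delimited files both go through read_csv; sep controls the delimiter
df_csv = pd.read_csv(io.StringIO(csv_text))
df_tsv = pd.read_csv(io.StringIO(csv_text.replace(",", "\t")), sep="\t")

# JSON documents parse the same way
df_json = pd.read_json(io.StringIO('[{"Name": "Alabama", "FIPS": 1}]'))
```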
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "tables = pd.read_html(url, match=\"FIPS\")\n", "print(len(tables))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "That's a little better, but not great. Here's a version where you identify the table by an attribute it has in the HTML - this requires a bit of HTML knowledge, but it works best. It would be even better if our table had an **id** attribute, but it doesn't." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "tables = pd.read_html(url, attrs={'class':'data'})\n", "print(len(tables))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "OK, now that we've got it down to 1 table, we can pick it out and see what it looks like." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
0AlabamaAL1
1AlaskaAK2
2ArizonaAZ4
3ArkansasAR5
4CaliforniaCA6
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "0 Alabama AL 1\n", "1 Alaska AK 2\n", "2 Arizona AZ 4\n", "3 Arkansas AR 5\n", "4 California CA 6" ] }, "execution_count": 5, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips = tables[0]\n", "fips.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now look at the data at the end (when adjustments are needed, they usually occur at the beginning or the end of the table).\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
45VirginiaVA51
46WashingtonWA53
47West VirginiaWV54
48WisconsinWI55
49WyomingWY56
50American SamoaAS60
51GuamGU66
52Northern Mariana IslandsMP69
53Puerto RicoPR72
54Virgin IslandsVI78
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "45 Virginia VA 51\n", "46 Washington WA 53\n", "47 West Virginia WV 54\n", "48 Wisconsin WI 55\n", "49 Wyoming WY 56\n", "50 American Samoa AS 60\n", "51 Guam GU 66\n", "52 Northern Mariana Islands MP 69\n", "53 Puerto Rico PR 72\n", "54 Virgin Islands VI 78" ] }, "execution_count": 6, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips.tail(n=10)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's get rid of the last 5 rows, since they correspond to territories and not states. We can apply the same type of array slicing syntax that worked in NumPy." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
45VirginiaVA51
46WashingtonWA53
47West VirginiaWV54
48WisconsinWI55
49WyomingWY56
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "45 Virginia VA 51\n", "46 Washington WA 53\n", "47 West Virginia WV 54\n", "48 Wisconsin WI 55\n", "49 Wyoming WY 56" ] }, "execution_count": 7, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips = fips[0:-5]\n", "fips.tail()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Since the Census site uses strings with leading zeros to identify a state, let's transform the FIPS column. First, pull out the desired column of the `DataFrame`, assign it to a variable, and print out its type." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.series.Series'>\n", "0 1\n", "1 2\n", "2 4\n", "3 5\n", "4 6\n", "Name: FIPS, dtype: int64\n" ] } ], "source": [ "fips_codes = fips['FIPS']\n", "print(type(fips_codes))\n", "print(fips_codes.head())" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now use a **list comprehension** to generate a list containing the string versions of the FIPS codes." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['1',\n", " '2',\n", " '4',\n", " '5',\n", " '6',\n", " '8',\n", " '9',\n", " '10',\n", " '12',\n", " '13',\n", " '15',\n", " '16',\n", " '17',\n", " '18',\n", " '19',\n", " '20',\n", " '21',\n", " '22',\n", " '23',\n", " '24',\n", " '25',\n", " '26',\n", " '27',\n", " '28',\n", " '29',\n", " '30',\n", " '31',\n", " '32',\n", " '33',\n", " '34',\n", " '35',\n", " '36',\n", " '37',\n", " '38',\n", " '39',\n", " '40',\n", " '41',\n", " '42',\n", " '44',\n", " '45',\n", " '46',\n", " '47',\n", " '48',\n", " '49',\n", " '50',\n", " '51',\n", " '53',\n", " '54',\n", " '55',\n", " '56']" ] }, "execution_count": 9, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_codes_as_strings = [ str(code) for 
code in fips[\"FIPS\"] ]\n", "fips_codes_as_strings" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Pandas gives us a simpler way to convert data types, however, using the `astype` **method** on a `Series` object." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 4\n", "3 5\n", "4 6\n", "5 8\n", "6 9\n", "7 10\n", "8 12\n", "9 13\n", "10 15\n", "11 16\n", "12 17\n", "13 18\n", "14 19\n", "15 20\n", "16 21\n", "17 22\n", "18 23\n", "19 24\n", "20 25\n", "21 26\n", "22 27\n", "23 28\n", "24 29\n", "25 30\n", "26 31\n", "27 32\n", "28 33\n", "29 34\n", "30 35\n", "31 36\n", "32 37\n", "33 38\n", "34 39\n", "35 40\n", "36 41\n", "37 42\n", "38 44\n", "39 45\n", "40 46\n", "41 47\n", "42 48\n", "43 49\n", "44 50\n", "45 51\n", "46 53\n", "47 54\n", "48 55\n", "49 56\n", "Name: FIPS, dtype: string" ] }, "execution_count": 10, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_codes_as_strings = fips_codes.astype('string')\n", "fips_codes_as_strings" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "But, we want these to have a leading zero for values under 10. Let's write a function to do this for us." 
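(For comparison: once the codes are strings, the `.str` accessor can do this padding in one vectorized step with `zfill`. A minimal sketch on toy data; below we build the same behavior by hand to see how element-wise conversion works.)

```python
import pandas as pd

codes = pd.Series([1, 2, 4, 56], name="FIPS")

# zfill left-pads each string with zeros to the requested width
padded = codes.astype("string").str.zfill(2)
```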
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "def convert(value):\n", "    if value >= 10:\n", "        return str(value)\n", "    else:\n", "        return '0' + str(value)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Try it with a few values from `fips`: the first, one in the middle, and the last.\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "01\n", "13\n", "56\n" ] } ], "source": [ "print(convert(fips[\"FIPS\"][0]))\n", "print(convert(fips[\"FIPS\"][9]))\n", "print(convert(fips[\"FIPS\"][49]))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Note that the following call doesn't work. Why not? (Hint: plain `[ ]` indexing on a `Series` with an integer index looks up **labels**, and there is no label `-1`.)\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "fips_series = fips[\"FIPS\"]\n", "\n", "#fips_series[-1]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The `Series` and `DataFrame` objects provide a separate indexer named `iloc` for working positionally, rather than with **index values**." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 4\n", "3 5\n", "4 6\n", "5 8\n", "6 9\n", "7 10\n", "8 12\n", "9 13\n", "Name: FIPS, dtype: int64" ] }, "execution_count": 14, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips[\"FIPS\"].iloc[0:10]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now we can use the `map` method along with `convert` to get new values for each of the values in the \"FIPS\" column. `map` is a method of a `Series` that returns a **new** `Series` based on calling a function on each value of the original `Series`."
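A tiny standalone illustration of that behavior (toy values, not the real FIPS data):

```python
import pandas as pd

s = pd.Series([1, 9, 26])

# map calls the function once per element and collects the results in a new Series
doubled = s.map(lambda v: v * 2)

# the same zero-padding idea as convert, written inline; s itself is untouched
padded = s.map(lambda v: '0' + str(v) if v < 10 else str(v))
```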
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_428/1832877244.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " fips[\"FIPS\"] = fips[\"FIPS\"].map(convert)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
0AlabamaAL01
1AlaskaAK02
2ArizonaAZ04
3ArkansasAR05
4CaliforniaCA06
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "0 Alabama AL 01\n", "1 Alaska AK 02\n", "2 Arizona AZ 04\n", "3 Arkansas AR 05\n", "4 California CA 06" ] }, "execution_count": 15, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips[\"FIPS\"] = fips[\"FIPS\"].map(convert)\n", "fips.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "And now we can replace the index with the FIPS values, and get rid of the FIPS column. First remind ourselves of the structure of our original `DataFrame`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
0AlabamaAL01
1AlaskaAK02
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "0 Alabama AL 01\n", "1 Alaska AK 02" ] }, "execution_count": 16, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips.head(n=2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now create a new `DataFrame` named `fips_new` with the converted FIPS `Series` serving as the **index**." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "fips_new = DataFrame(data={\"Name\": fips[\"Name\"], \"Postal Code\": fips[\"Postal Code\"]}, index=fips[\"FIPS\"])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal Code
FIPS
01NaNNaN
02NaNNaN
04NaNNaN
05NaNNaN
06NaNNaN
\n", "
" ], "text/plain": [ " Name Postal Code\n", "FIPS \n", "01 NaN NaN\n", "02 NaN NaN\n", "04 NaN NaN\n", "05 NaN NaN\n", "06 NaN NaN" ] }, "execution_count": 18, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.head(n=5)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Ugh, that doesn't seem to have worked very well! When combining `Series` objects like this, Pandas always aligns values that have the same **index**. Since our new `DataFrame` uses a completely different index than the original `Series` objects, this approach doesn't work." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RangeIndex(start=0, stop=50, step=1)\n", "Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n", " '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n", " '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n", " '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n", " '55', '56'],\n", " dtype='object', name='FIPS')\n" ] } ], "source": [ "print(fips.index)\n", "print(fips_new.index)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Here's one approach we can try: using the `values` property of a `Series` object to get a more \"raw\" form of the data."
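The alignment behavior is easy to reproduce on a toy example: handing `DataFrame` a labeled `Series` along with a different index yields all-`NaN` values, while the raw `.values` array is placed positionally.

```python
import pandas as pd

s = pd.Series([10, 20], index=["a", "b"])

# The Series is reindexed against ["x", "y"]; no labels match, so every value is NaN
df_aligned = pd.DataFrame({"col": s}, index=["x", "y"])

# A bare ndarray carries no labels, so values are placed by position instead
df_raw = pd.DataFrame({"col": s.values}, index=["x", "y"])
```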
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado'\n", " 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii' 'Idaho' 'Illinois'\n", " 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland'\n", " 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi' 'Missouri' 'Montana'\n", " 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York'\n", " 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma' 'Oregon' 'Pennsylvania'\n", " 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah'\n", " 'Vermont' 'Virginia' 'Washington' 'West Virginia' 'Wisconsin' 'Wyoming']\n" ] } ], "source": [ "print(fips[\"Name\"].values)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "What might the type of `fips[\"Name\"].values` be?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "print(type(fips[\"Name\"].values))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's try to create our new data frame using the **values** from the Series, so that we're not trying to match indexes." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePost Code
FIPS
01AlabamaAL
02AlaskaAK
04ArizonaAZ
05ArkansasAR
06CaliforniaCA
\n", "
" ], "text/plain": [ " Name Post Code\n", "FIPS \n", "01 Alabama AL\n", "02 Alaska AK\n", "04 Arizona AZ\n", "05 Arkansas AR\n", "06 California CA" ] }, "execution_count": 22, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new = DataFrame(data={\"Name\": fips[\"Name\"].values, \"Post Code\": fips[\"Postal Code\"].values}, index=fips[\"FIPS\"])\n", "fips_new.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Use the `info` method to see some details about the `fips_new` `DataFrame`." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Index: 50 entries, 01 to 56\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Name 50 non-null object\n", " 1 Post Code 50 non-null object\n", "dtypes: object(2)\n", "memory usage: 1.2+ KB\n" ] } ], "source": [ "fips_new.info()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Take a look at the **index** associated with `fips_new`."
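(Aside: a labeled index effectively turns the frame into a keyed lookup table. On toy data — two made-up rows, not the full table — calling `to_dict` on a column makes that explicit.)

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alabama", "Michigan"]},
                  index=pd.Index(["01", "26"], name="FIPS"))

# A single column plus the index collapses to an ordinary dict keyed by FIPS code
lookup = df["Name"].to_dict()
```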
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n", " '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n", " '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n", " '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n", " '55', '56'],\n", " dtype='object', name='FIPS')\n" ] } ], "source": [ "print(fips_new.index)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now we can get the information about Michigan like this:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name Michigan\n", "Post Code MI\n", "Name: 26, dtype: object\n", "Name Alabama\n", "Post Code AL\n", "Name: 01, dtype: object\n" ] } ], "source": [ "# Get row by index key value\n", "print(fips_new.loc['26'])\n", "# Get row by position\n", "print(fips_new.iloc[0])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Other ways to create the new `DataFrame`" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "First, we can use the `set_index` method, which uses an **existing column** as the index for the `DataFrame`." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
0AlabamaAL01
1AlaskaAK02
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "0 Alabama AL 01\n", "1 Alaska AK 02" ] }, "execution_count": 26, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips.head(n=2)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
FIPS
01AlabamaAL01
02AlaskaAK02
04ArizonaAZ04
05ArkansasAR05
06CaliforniaCA06
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "FIPS \n", "01 Alabama AL 01\n", "02 Alaska AK 02\n", "04 Arizona AZ 04\n", "05 Arkansas AR 05\n", "06 California CA 06" ] }, "execution_count": 27, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new = fips.set_index(keys=\"FIPS\", inplace=False, drop=False)\n", "fips_new.head(n=5)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "It kind of looks like there's an index value **FIPS**. Let's see if there is." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "# We can't just use the [ ] notation to try to get an entry out by index, since for a DataFrame \n", "# this will look for a column with that name.\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "ename": "KeyError", "evalue": "'FIPS'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3620\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3621\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3622\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 'FIPS'", "\nThe above exception was the direct cause of the following exception:\n", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/tmp/ipykernel_428/3242763737.py\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfips_new\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'FIPS'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 965\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 966\u001b[0m \u001b[0mmaybe_callable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_if_callable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 967\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmaybe_callable\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 968\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_scalar_access\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1200\u001b[0m \u001b[0;31m# fall thru to straight lookup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1201\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1202\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1203\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1204\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_slice_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice_obj\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_get_label\u001b[0;34m(self, label, axis)\u001b[0m\n\u001b[1;32m 1151\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1152\u001b[0m \u001b[0;31m# GH#5667 this will fail if the label is not present in the axis.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1153\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1154\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1155\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_handle_lowerdim_multi_index_axis0\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtup\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mxs\u001b[0;34m(self, key, axis, level, drop_level)\u001b[0m\n\u001b[1;32m 3874\u001b[0m \u001b[0mnew_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3875\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 
3876\u001b[0;31m \u001b[0mloc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3877\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3878\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3621\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3622\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3623\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3624\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3625\u001b[0m \u001b[0;31m# If we have a listlike key, _check_indexing_error will raise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 'FIPS'" ] } ], "source": [ "fips_new.loc['FIPS']" ] }, { "cell_type": 
"markdown", "metadata": { "collapsed": false }, "source": [ "So the display of _FIPS_ in the table view is just Pandas telling us the **name** of the index, not an actual value." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n", " '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n", " '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n", " '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n", " '55', '56'],\n", " dtype='object', name='FIPS')" ] }, "execution_count": 31, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.index" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeFIPS
FIPS
01AlabamaAL01
02AlaskaAK02
\n", "
" ], "text/plain": [ " Name Postal Code FIPS\n", "FIPS \n", "01 Alabama AL 01\n", "02 Alaska AK 02" ] }, "execution_count": 32, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.head(n=2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now let's get rid of the **FIPS** column (which could have been done with we called `set_index`)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal Code
FIPS
01AlabamaAL
02AlaskaAK
\n", "
" ], "text/plain": [ " Name Postal Code\n", "FIPS \n", "01 Alabama AL\n", "02 Alaska AK" ] }, "execution_count": 33, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.drop(\"FIPS\", axis=1, inplace=True)\n", "fips_new.head(2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Adding data to the DataFrame object\n", "Now we'll set the region for each of the Midwest states. First, set the region to 0 for **all the states**." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeRegion
FIPS
55WisconsinWI0
56WyomingWY0
\n", "
" ], "text/plain": [ " Name Postal Code Region\n", "FIPS \n", "55 Wisconsin WI 0\n", "56 Wyoming WY 0" ] }, "execution_count": 34, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new['Region'] = 0\n", "fips_new.tail(2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now set the region to 1 for Michigan and Ohio, on the assumption that we **already know** the FIPS codes are **'26'** and **'39'**" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeRegion
FIPS
26MichiganMI1
39OhioOH1
\n", "
" ], "text/plain": [ " Name Postal Code Region\n", "FIPS \n", "26 Michigan MI 1\n", "39 Ohio OH 1" ] }, "execution_count": 71, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.loc[['26', '39'], 'Region'] = 1\n", "fips_new.loc[['26', '39']]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "What if we don't know what the codes are? We can first get the index values for the rows using **boolean indexing**." ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "FIPS\n", "01 False\n", "02 False\n", "04 False\n", "05 False\n", "06 False\n", "08 False\n", "09 False\n", "10 False\n", "12 False\n", "13 False\n", "15 False\n", "16 False\n", "17 True\n", "18 True\n", "19 False\n", "20 False\n", "21 False\n", "22 False\n", "23 False\n", "24 False\n", "25 False\n", "26 False\n", "27 False\n", "28 False\n", "29 False\n", "30 False\n", "31 False\n", "32 False\n", "33 False\n", "34 False\n", "35 False\n", "36 False\n", "37 False\n", "38 False\n", "39 False\n", "40 False\n", "41 False\n", "42 False\n", "44 False\n", "45 False\n", "46 False\n", "47 False\n", "48 False\n", "49 False\n", "50 False\n", "51 False\n", "53 False\n", "54 False\n", "55 False\n", "56 False\n", "Name: Postal Code, dtype: bool\n" ] } ], "source": [ "indiana = fips_new['Postal Code'] == 'IN'\n", "illinois = fips_new['Postal Code'] == 'IL'\n", "print(type(illinois.values))\n", "import numpy as np\n", "#either = np.logical_or(indiana,illinois)\n", "either = indiana | illinois\n", "print(either)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's see the rows in `fips_new` matched by this boolean array." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeRegion
FIPS
17IllinoisIL0
18IndianaIN0
\n", "
" ], "text/plain": [ " Name Postal Code Region\n", "FIPS \n", "17 Illinois IL 0\n", "18 Indiana IN 0" ] }, "execution_count": 76, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.loc[either]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "And now we can set the `Region` column." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "fips_new.loc[either, 'Region'] = 1" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Here's an alternative way.\n", "\n", "Get the **index** associated with those rows in `either`" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "in_ill_index = fips_new[either].index\n", "in_ill_index" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "And now we are basically able to do what we did for Michigan and Ohio when we knew those index values in advance." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeRegion
FIPS
17IllinoisIL1
18IndianaIN1
26MichiganMI1
39OhioOH1
\n", "
" ], "text/plain": [ " Name Postal Code Region\n", "FIPS \n", "17 Illinois IL 1\n", "18 Indiana IN 1\n", "26 Michigan MI 1\n", "39 Ohio OH 1" ] }, "execution_count": 78, "metadata": { }, "output_type": "execute_result" } ], "source": [ "#fips_new.loc[in_ill_index, 'Region'] = 1\n", "fips_new[fips_new['Region'] == 1]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Here's a way to do it all in one step. Not necessarily recommended, because it's kind of hard to figure out exactly what it's doing, but it works!" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "FIPS\n", "27 1\n", "55 1\n", "Name: Region, dtype: int64" ] }, "execution_count": 84, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.loc[(fips_new['Postal Code'] == 'WI') | (fips_new['Postal Code'] == 'MN'), 'Region'] = 1" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal CodeRegion
FIPS
17IllinoisIL1
18IndianaIN1
26MichiganMI1
27MinnesotaMN1
39OhioOH1
55WisconsinWI1
\n", "
" ], "text/plain": [ " Name Postal Code Region\n", "FIPS \n", "17 Illinois IL 1\n", "18 Indiana IN 1\n", "26 Michigan MI 1\n", "27 Minnesota MN 1\n", "39 Ohio OH 1\n", "55 Wisconsin WI 1" ] }, "execution_count": 82, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.loc[fips_new['Region'] == 1]" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePostal Code
FIPS
26MichiganMI
39OhioOH
\n", "
" ], "text/plain": [ " Name Postal Code\n", "FIPS \n", "26 Michigan MI\n", "39 Ohio OH" ] }, "execution_count": 83, "metadata": { }, "output_type": "execute_result" } ], "source": [ "fips_new.loc[['26', '39'], ['Name', 'Postal Code']]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Getting the population data\n", "\n", "The cell below contains the function written previously to get us the census population data from api.census.gov\n", "\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "import requests\n", "import json\n", "import numpy as np\n", "\n", "def download_pop_data(year, states):\n", " states_list = \",\".join(states)\n", "\n", " base_url = 'http://api.census.gov/data'\n", " application = \"pep/charagegroups\"\n", " query = f\"get=POP&for=state:{states_list}\"\n", "\n", "\n", " url = f\"{base_url}/{year}/{application}?{query}\"\n", "\n", " response = requests.get(url)\n", "\n", " if not (200 <= response.status_code <= 299):\n", " return (False, None)\n", "\n", " state_data = json.loads(response.content)\n", " state_data = np.array(state_data[1:])\n", "\n", " population = state_data[:,0].astype('int')\n", " state = state_data[:, 1].astype('int')\n", "\n", "\n", " data = np.column_stack((state, population))\n", " years = np.full(data.shape[0],year, dtype='int')\n", "\n", " year_data = np.column_stack((years, data))\n", " return (True, year_data)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "download_pop_data(2015, fips_new.index)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's modify this function to return a Pandas `DataFrame` instead of a NumPy `ndarray`, using the FIPS code as the index." 
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "def download_pop_data_pd(year, states):\n", " states_list = \",\".join(states)\n", "\n", " base_url = 'http://api.census.gov/data'\n", " application = \"pep/charagegroups\"\n", " query = f\"get=POP&for=state:{states_list}\"\n", "\n", "\n", " url = f\"{base_url}/{year}/{application}?{query}\"\n", "\n", " response = requests.get(url)\n", "\n", " if not (200 <= response.status_code <= 299):\n", " return (False, None)\n", "\n", " state_data = json.loads(response.content)\n", " state_data = np.array(state_data[1:])\n", "\n", " data = DataFrame(data{'Population': })\n", "\n", "\n", " return (True, data)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "result, df = download_pop_data_pd(2015, fips_new.index)\n", "\n", "if result:\n", " print(df.head())\n", " print(df.dtypes)\n", "else:\n", " print(\"Failed to download population data for 2015 using the Pandas method\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Here's one way for us to generate all of the years of data. Pandas recommends against doing it this way, though, because of the costs of memory reallocation on each call." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Here's a more efficient way: create individual data frames for each year, then call `concat` to combine them." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now let's get the total population in the Midwest for 2015. 
" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (system-wide)", "language": "python", "metadata": { "cocalc": { "description": "Python 3 programming language", "priority": 100, "url": "https://www.python.org/" } }, "name": "python3", "resource_dir": "/ext/jupyter/kernels/python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }