{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As with `NumPy`, there's a fairly standard `import` statement that gets used for Pandas. Our textbook author also mentions importing `DataFrame` and `Series` specifically since these are used so frequently."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from pandas import Series, DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Getting the FIPS data\n",
"\n",
"In the NumPy version of this exercise, we manually downloaded and parsed a file that contained our FIPS code database. Pandas provides some pre-built parsing capabilities, and is able to parse information from \n",
"* CSV (Comma Separated Values)\n",
"* tab-delimited files\n",
"* JSON\n",
"* HTML tables\n",
"* Data stored in an SQL database\n",
"\n",
"The USDA has a page that contains FIPS codes. First let's try just plain `pd.read_html` to see what it does."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"15\n"
]
}
],
"source": [
"url = 'https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696'\n",
"tables = pd.read_html(url)\n",
"print(len(tables))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Yikes, that's a lot of tables (for those of you who have some HTML experience, this is another reason why table-based page layout is not a very good idea!) \n",
"\n",
"Let's refine our search to look for one that contains the text **\"FIPS\"**."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
}
],
"source": [
"tables = pd.read_html(url, match=\"FIPS\")\n",
"print(len(tables))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"That's a little better, but not great. Here's a version where you can identify the table by an attribute it has in the HTML - this requires a bit of HTML knowledge, but it works out the best. It would be even better if our table had an **id** attribute, but it doesn't."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n"
]
}
],
"source": [
"tables = pd.read_html(url, attrs={'class':'data'})\n",
"print(len(tables))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"OK, now that we've got it down to 1 table, we can pick it out, and see what it looks like."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Alabama | \n",
" AL | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" Alaska | \n",
" AK | \n",
" 2 | \n",
"
\n",
" \n",
" 2 | \n",
" Arizona | \n",
" AZ | \n",
" 4 | \n",
"
\n",
" \n",
" 3 | \n",
" Arkansas | \n",
" AR | \n",
" 5 | \n",
"
\n",
" \n",
" 4 | \n",
" California | \n",
" CA | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"0 Alabama AL 1\n",
"1 Alaska AK 2\n",
"2 Arizona AZ 4\n",
"3 Arkansas AR 5\n",
"4 California CA 6"
]
},
"execution_count": 5,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips = tables[0]\n",
"fips.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And look at the data at the end (usually if there's going to need to be adjustments made, they will occur at the beginning of the table or at the end)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 45 | \n",
" Virginia | \n",
" VA | \n",
" 51 | \n",
"
\n",
" \n",
" 46 | \n",
" Washington | \n",
" WA | \n",
" 53 | \n",
"
\n",
" \n",
" 47 | \n",
" West Virginia | \n",
" WV | \n",
" 54 | \n",
"
\n",
" \n",
" 48 | \n",
" Wisconsin | \n",
" WI | \n",
" 55 | \n",
"
\n",
" \n",
" 49 | \n",
" Wyoming | \n",
" WY | \n",
" 56 | \n",
"
\n",
" \n",
" 50 | \n",
" American Samoa | \n",
" AS | \n",
" 60 | \n",
"
\n",
" \n",
" 51 | \n",
" Guam | \n",
" GU | \n",
" 66 | \n",
"
\n",
" \n",
" 52 | \n",
" Northern Mariana Islands | \n",
" MP | \n",
" 69 | \n",
"
\n",
" \n",
" 53 | \n",
" Puerto Rico | \n",
" PR | \n",
" 72 | \n",
"
\n",
" \n",
" 54 | \n",
" Virgin Islands | \n",
" VI | \n",
" 78 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"45 Virginia VA 51\n",
"46 Washington WA 53\n",
"47 West Virginia WV 54\n",
"48 Wisconsin WI 55\n",
"49 Wyoming WY 56\n",
"50 American Samoa AS 60\n",
"51 Guam GU 66\n",
"52 Northern Mariana Islands MP 69\n",
"53 Puerto Rico PR 72\n",
"54 Virgin Islands VI 78"
]
},
"execution_count": 6,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips.tail(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Let's get rid of the last 5 rows, since they correspond to territories and not states. We can apply the same type of array slicing syntax that worked in NumPy."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 45 | \n",
" Virginia | \n",
" VA | \n",
" 51 | \n",
"
\n",
" \n",
" 46 | \n",
" Washington | \n",
" WA | \n",
" 53 | \n",
"
\n",
" \n",
" 47 | \n",
" West Virginia | \n",
" WV | \n",
" 54 | \n",
"
\n",
" \n",
" 48 | \n",
" Wisconsin | \n",
" WI | \n",
" 55 | \n",
"
\n",
" \n",
" 49 | \n",
" Wyoming | \n",
" WY | \n",
" 56 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"45 Virginia VA 51\n",
"46 Washington WA 53\n",
"47 West Virginia WV 54\n",
"48 Wisconsin WI 55\n",
"49 Wyoming WY 56"
]
},
"execution_count": 7,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips = fips[0:-5]\n",
"fips.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now let's since the Census site is going to use strings with leading zeros to help us identify a state, let's transform the FIPS column. First, pull out the desired column of the `DataFrame` and assign it to a variable and print out its type."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"0 1\n",
"1 2\n",
"2 4\n",
"3 5\n",
"4 6\n",
"Name: FIPS, dtype: int64\n"
]
}
],
"source": [
"fips_codes = fips['FIPS']\n",
"print(type(fips_codes))\n",
"print(fips_codes.head())"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now use a **list comprehension** to generate a list containing the string versions of the FIPS codes"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['1',\n",
" '2',\n",
" '4',\n",
" '5',\n",
" '6',\n",
" '8',\n",
" '9',\n",
" '10',\n",
" '12',\n",
" '13',\n",
" '15',\n",
" '16',\n",
" '17',\n",
" '18',\n",
" '19',\n",
" '20',\n",
" '21',\n",
" '22',\n",
" '23',\n",
" '24',\n",
" '25',\n",
" '26',\n",
" '27',\n",
" '28',\n",
" '29',\n",
" '30',\n",
" '31',\n",
" '32',\n",
" '33',\n",
" '34',\n",
" '35',\n",
" '36',\n",
" '37',\n",
" '38',\n",
" '39',\n",
" '40',\n",
" '41',\n",
" '42',\n",
" '44',\n",
" '45',\n",
" '46',\n",
" '47',\n",
" '48',\n",
" '49',\n",
" '50',\n",
" '51',\n",
" '53',\n",
" '54',\n",
" '55',\n",
" '56']"
]
},
"execution_count": 9,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_codes_as_strings = [ str(code) for code in fips[\"FIPS\"] ]\n",
"fips_codes_as_strings"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Pandas gives us a simpler way to convert data types, however, using the `astype` **method** on a `Series` object."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 4\n",
"3 5\n",
"4 6\n",
"5 8\n",
"6 9\n",
"7 10\n",
"8 12\n",
"9 13\n",
"10 15\n",
"11 16\n",
"12 17\n",
"13 18\n",
"14 19\n",
"15 20\n",
"16 21\n",
"17 22\n",
"18 23\n",
"19 24\n",
"20 25\n",
"21 26\n",
"22 27\n",
"23 28\n",
"24 29\n",
"25 30\n",
"26 31\n",
"27 32\n",
"28 33\n",
"29 34\n",
"30 35\n",
"31 36\n",
"32 37\n",
"33 38\n",
"34 39\n",
"35 40\n",
"36 41\n",
"37 42\n",
"38 44\n",
"39 45\n",
"40 46\n",
"41 47\n",
"42 48\n",
"43 49\n",
"44 50\n",
"45 51\n",
"46 53\n",
"47 54\n",
"48 55\n",
"49 56\n",
"Name: FIPS, dtype: string"
]
},
"execution_count": 10,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_codes_as_strings = fips_codes.astype('string')\n",
"fips_codes_as_strings"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"But, we want these to have a leading zero for values under 10. Let's write a function to do this for us."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"def convert(value):\n",
" if value >= 10:\n",
" return str(value)\n",
" else:\n",
" return '0' + str(value)"
]
},
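{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"(An aside, purely for comparison: Python strings and Pandas string columns already provide zero-padding via the `zfill` method. The one-liner below just illustrates that; we keep using `convert` in what follows.)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# For comparison only: convert to strings, then left-pad each value to 2 characters\n",
"fips_codes.astype('string').str.zfill(2).head()"
]
},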
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Try it with the first and last values in `fips`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"01\n",
"13\n",
"56\n"
]
}
],
"source": [
"print(convert(fips[\"FIPS\"][0]))\n",
"print(convert(fips[\"FIPS\"][9]))\n",
"print(convert(fips[\"FIPS\"][49]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Note that the following call doesn't work. Why not?\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"fips_series = fips[\"FIPS\"]\n",
"\n",
"#fips_series[-1]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The `Series` and `DataFrame` objects provide a separate method named `iloc` to work positionally, rather than with **index values**."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 4\n",
"3 5\n",
"4 6\n",
"5 8\n",
"6 9\n",
"7 10\n",
"8 12\n",
"9 13\n",
"Name: FIPS, dtype: int64"
]
},
"execution_count": 14,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips[\"FIPS\"].iloc[0:10]"
]
},
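{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"That also answers the earlier question: with an integer index, `fips_series[-1]` treats -1 as an index **label**, and no row has that label, so a `KeyError` is raised. Positional access to the last row still works through `iloc` (a quick illustration):"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# Positional access to the last element; plain fips_series[-1] raises a KeyError\n",
"fips_series.iloc[-1]"
]
},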
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now we can use the `map` method along with `convert` to get new values for each of the values in the \"FIPS\" column. `map` is a method of a `Series` that returns a **new** `Series` based on calling a function on each value of the original `Series`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_428/1832877244.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" fips[\"FIPS\"] = fips[\"FIPS\"].map(convert)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Alabama | \n",
" AL | \n",
" 01 | \n",
"
\n",
" \n",
" 1 | \n",
" Alaska | \n",
" AK | \n",
" 02 | \n",
"
\n",
" \n",
" 2 | \n",
" Arizona | \n",
" AZ | \n",
" 04 | \n",
"
\n",
" \n",
" 3 | \n",
" Arkansas | \n",
" AR | \n",
" 05 | \n",
"
\n",
" \n",
" 4 | \n",
" California | \n",
" CA | \n",
" 06 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"0 Alabama AL 01\n",
"1 Alaska AK 02\n",
"2 Arizona AZ 04\n",
"3 Arkansas AR 05\n",
"4 California CA 06"
]
},
"execution_count": 15,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips[\"FIPS\"] = fips[\"FIPS\"].map(convert)\n",
"fips.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And now we can replace the index with the FIPS values, and get rid of the FIPS column. First remind ourselves of the structure of our original `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Alabama | \n",
" AL | \n",
" 01 | \n",
"
\n",
" \n",
" 1 | \n",
" Alaska | \n",
" AK | \n",
" 02 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"0 Alabama AL 01\n",
"1 Alaska AK 02"
]
},
"execution_count": 16,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips.head(n=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now create a new `DataFrame` named `fips_new` with the `converted` `Series` serving as the **index**."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"fips_new = DataFrame(data={\"Name\": fips[\"Name\"], \"Postal Code\": fips[\"Postal Code\"]}, index=fips[\"FIPS\"])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
"
\n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 01 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 02 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 04 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 05 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 06 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code\n",
"FIPS \n",
"01 NaN NaN\n",
"02 NaN NaN\n",
"04 NaN NaN\n",
"05 NaN NaN\n",
"06 NaN NaN"
]
},
"execution_count": 18,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.head(n=5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Ughh, that doesn't seem to have worked very well! When combining `Series` objects together like this, Pandas always wants to combine things together that have the same **index**. Since our new `DataFame` uses a completely different index than the original `Series` objects, this approach doesn't work."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RangeIndex(start=0, stop=50, step=1)\n",
"Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n",
" '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n",
" '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n",
" '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n",
" '55', '56'],\n",
" dtype='object', name='FIPS')\n"
]
}
],
"source": [
"print(fips.index)\n",
"print(fips_new.index)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's one approach we can try: using the `values` property of a `Series` object to get a more \"raw\" form of the data."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado'\n",
" 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii' 'Idaho' 'Illinois'\n",
" 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland'\n",
" 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi' 'Missouri' 'Montana'\n",
" 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York'\n",
" 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma' 'Oregon' 'Pennsylvania'\n",
" 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah'\n",
" 'Vermont' 'Virginia' 'Washington' 'West Virginia' 'Wisconsin' 'Wyoming']\n"
]
}
],
"source": [
"print(fips[\"Name\"].values)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"What might the type of `fips[\"Name\"].values` be?"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"print(type(fips[\"Name\"].values))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Let's try to create our new data frame using the **values** from the Series, so that we're not trying to match indexes."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Post Code | \n",
"
\n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 01 | \n",
" Alabama | \n",
" AL | \n",
"
\n",
" \n",
" 02 | \n",
" Alaska | \n",
" AK | \n",
"
\n",
" \n",
" 04 | \n",
" Arizona | \n",
" AZ | \n",
"
\n",
" \n",
" 05 | \n",
" Arkansas | \n",
" AR | \n",
"
\n",
" \n",
" 06 | \n",
" California | \n",
" CA | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Post Code\n",
"FIPS \n",
"01 Alabama AL\n",
"02 Alaska AK\n",
"04 Arizona AZ\n",
"05 Arkansas AR\n",
"06 California CA"
]
},
"execution_count": 22,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new = DataFrame(data={\"Name\": fips[\"Name\"].values, \"Post Code\": fips[\"Postal Code\"].values}, index=fips[\"FIPS\"])\n",
"fips_new.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Use the `info` method to see some details about the `fips_new` `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Index: 50 entries, 01 to 56\n",
"Data columns (total 2 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Name 50 non-null object\n",
" 1 Post Code 50 non-null object\n",
"dtypes: object(2)\n",
"memory usage: 1.2+ KB\n"
]
}
],
"source": [
"fips_new.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Take a look at the **index** associated with `fips_new`."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n",
" '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n",
" '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n",
" '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n",
" '55', '56'],\n",
" dtype='object', name='FIPS')\n"
]
}
],
"source": [
"print(fips_new.index)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now we can get out information about Michigan like this"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name Michigan\n",
"Post Code MI\n",
"Name: 26, dtype: object\n",
"Name Alabama\n",
"Post Code AL\n",
"Name: 01, dtype: object\n"
]
}
],
"source": [
"# Get row by index key value\n",
"print(fips_new.loc['26'])\n",
"# Get row by position\n",
"print(fips_new.iloc[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Other ways to create the new `DataFrame`"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"First, we can use the `set_index` method, which uses an **existing column** as the index for the `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Alabama | \n",
" AL | \n",
" 01 | \n",
"
\n",
" \n",
" 1 | \n",
" Alaska | \n",
" AK | \n",
" 02 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"0 Alabama AL 01\n",
"1 Alaska AK 02"
]
},
"execution_count": 26,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips.head(n=2)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
"
\n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 01 | \n",
" Alabama | \n",
" AL | \n",
" 01 | \n",
"
\n",
" \n",
" 02 | \n",
" Alaska | \n",
" AK | \n",
" 02 | \n",
"
\n",
" \n",
" 04 | \n",
" Arizona | \n",
" AZ | \n",
" 04 | \n",
"
\n",
" \n",
" 05 | \n",
" Arkansas | \n",
" AR | \n",
" 05 | \n",
"
\n",
" \n",
" 06 | \n",
" California | \n",
" CA | \n",
" 06 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Postal Code FIPS\n",
"FIPS \n",
"01 Alabama AL 01\n",
"02 Alaska AK 02\n",
"04 Arizona AZ 04\n",
"05 Arkansas AR 05\n",
"06 California CA 06"
]
},
"execution_count": 27,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new = fips.set_index(keys=\"FIPS\", inplace=False, drop=False)\n",
"fips_new.head(n=5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"It kind of looks like there's an index value **FIPS**. Let's see if there is."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# We can't just use the [ ] notation to try to get an entry out by index, since for a DataFrame \n",
"# this will look for a column with that name.\n"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "KeyError",
"evalue": "'FIPS'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3620\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3621\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3622\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mKeyError\u001b[0m: 'FIPS'",
"\nThe above exception was the direct cause of the following exception:\n",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/ipykernel_428/3242763737.py\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfips_new\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'FIPS'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 965\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 966\u001b[0m \u001b[0mmaybe_callable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_if_callable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 967\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmaybe_callable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 968\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_scalar_access\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1200\u001b[0m \u001b[0;31m# fall thru to straight lookup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1201\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1202\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1203\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1204\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_slice_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice_obj\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_get_label\u001b[0;34m(self, label, axis)\u001b[0m\n\u001b[1;32m 1151\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1152\u001b[0m \u001b[0;31m# GH#5667 this will fail if the label is not present in the axis.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1153\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1154\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1155\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_handle_lowerdim_multi_index_axis0\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtup\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mxs\u001b[0;34m(self, key, axis, level, drop_level)\u001b[0m\n\u001b[1;32m 3874\u001b[0m \u001b[0mnew_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3875\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3876\u001b[0;31m \u001b[0mloc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3877\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3878\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3621\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3622\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3623\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3624\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3625\u001b[0m \u001b[0;31m# If we have a listlike key, _check_indexing_error will raise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyError\u001b[0m: 'FIPS'"
]
}
],
"source": [
"fips_new.loc['FIPS']"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"So the display of _FIPS_ in the view of the table view is just Pandas telling us the **name** of the index, not an actual value."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['01', '02', '04', '05', '06', '08', '09', '10', '12', '13', '15', '16',\n",
" '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28',\n",
" '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40',\n",
" '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54',\n",
" '55', '56'],\n",
" dtype='object', name='FIPS')"
]
},
"execution_count": 31,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.index"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" FIPS | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 01 | \n",
" Alabama | \n",
" AL | \n",
" 01 | \n",
" \n",
" \n",
" 02 | \n",
" Alaska | \n",
" AK | \n",
" 02 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code FIPS\n",
"FIPS \n",
"01 Alabama AL 01\n",
"02 Alaska AK 02"
]
},
"execution_count": 32,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.head(n=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now let's get rid of the **FIPS** column (which could have been done with we called `set_index`)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 01 | \n",
" Alabama | \n",
" AL | \n",
" \n",
" \n",
" 02 | \n",
" Alaska | \n",
" AK | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code\n",
"FIPS \n",
"01 Alabama AL\n",
"02 Alaska AK"
]
},
"execution_count": 33,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.drop(\"FIPS\", axis=1, inplace=True)\n",
"fips_new.head(2)"
]
},
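{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"For comparison, here's a sketch showing that the index and the column removal could have been handled in one step, since `set_index` drops the column by default (`drop=True`); we don't use this result below."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# Sketch only: set_index removes the FIPS column by default as it becomes the index\n",
"fips.set_index(\"FIPS\").head(2)"
]
},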
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Adding data to the DataFrame object\n",
"Now we'll set the region for each of the Midwest states. First, set the region to 0 for **all the states**."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" Region | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 55 | \n",
" Wisconsin | \n",
" WI | \n",
" 0 | \n",
" \n",
" \n",
" 56 | \n",
" Wyoming | \n",
" WY | \n",
" 0 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code Region\n",
"FIPS \n",
"55 Wisconsin WI 0\n",
"56 Wyoming WY 0"
]
},
"execution_count": 34,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new['Region'] = 0\n",
"fips_new.tail(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now set the region to 1 for Michigan and Ohio, on the assumption that we **already know** the FIPS codes are **'26'** and **'39'**"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" Region | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 26 | \n",
" Michigan | \n",
" MI | \n",
" 1 | \n",
" \n",
" \n",
" 39 | \n",
" Ohio | \n",
" OH | \n",
" 1 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code Region\n",
"FIPS \n",
"26 Michigan MI 1\n",
"39 Ohio OH 1"
]
},
"execution_count": 71,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.loc[['26', '39'], 'Region'] = 1\n",
"fips_new.loc[['26', '39']]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"What if we don't know what the codes are? We can first get the index values for the rows using **boolean indexing**."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"FIPS\n",
"01 False\n",
"02 False\n",
"04 False\n",
"05 False\n",
"06 False\n",
"08 False\n",
"09 False\n",
"10 False\n",
"12 False\n",
"13 False\n",
"15 False\n",
"16 False\n",
"17 True\n",
"18 True\n",
"19 False\n",
"20 False\n",
"21 False\n",
"22 False\n",
"23 False\n",
"24 False\n",
"25 False\n",
"26 False\n",
"27 False\n",
"28 False\n",
"29 False\n",
"30 False\n",
"31 False\n",
"32 False\n",
"33 False\n",
"34 False\n",
"35 False\n",
"36 False\n",
"37 False\n",
"38 False\n",
"39 False\n",
"40 False\n",
"41 False\n",
"42 False\n",
"44 False\n",
"45 False\n",
"46 False\n",
"47 False\n",
"48 False\n",
"49 False\n",
"50 False\n",
"51 False\n",
"53 False\n",
"54 False\n",
"55 False\n",
"56 False\n",
"Name: Postal Code, dtype: bool\n"
]
}
],
"source": [
"indiana = fips_new['Postal Code'] == 'IN'\n",
"illinois = fips_new['Postal Code'] == 'IL'\n",
"print(type(illinois.values))\n",
"import numpy as np\n",
"#either = np.logical_or(indiana,illinois)\n",
"either = indiana | illinois\n",
"print(either)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Let's see the rows in `fips_new` matched by this boolean array."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" Region | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 17 | \n",
" Illinois | \n",
" IL | \n",
" 0 | \n",
" \n",
" \n",
" 18 | \n",
" Indiana | \n",
" IN | \n",
" 0 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code Region\n",
"FIPS \n",
"17 Illinois IL 0\n",
"18 Indiana IN 0"
]
},
"execution_count": 76,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.loc[either]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And now we can set the `Region` column."
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"fips_new.loc[either, 'Region'] = 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's an alternative way.\n",
"\n",
"Get the **index** associated with those rows in `either`"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"in_ill_index = fips_new[either].index\n",
"in_ill_index"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And now we are basically able to do what we did for Michigan and Ohio when we knew those index values in advance."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" Region | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 17 | \n",
" Illinois | \n",
" IL | \n",
" 1 | \n",
" \n",
" \n",
" 18 | \n",
" Indiana | \n",
" IN | \n",
" 1 | \n",
" \n",
" \n",
" 26 | \n",
" Michigan | \n",
" MI | \n",
" 1 | \n",
" \n",
" \n",
" 39 | \n",
" Ohio | \n",
" OH | \n",
" 1 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code Region\n",
"FIPS \n",
"17 Illinois IL 1\n",
"18 Indiana IN 1\n",
"26 Michigan MI 1\n",
"39 Ohio OH 1"
]
},
"execution_count": 78,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"#fips_new.loc[in_ill_index, 'Region'] = 1\n",
"fips_new[fips_new['Region'] == 1]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's a way to do it all in one step. Not necessarily recommended, because it's kind of hard to figure out exactly what it's doing, but it works!"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"FIPS\n",
"27 1\n",
"55 1\n",
"Name: Region, dtype: int64"
]
},
"execution_count": 84,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.loc[(fips_new['Postal Code'] == 'WI') | (fips_new['Postal Code'] == 'MN'), 'Region'] = 1"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" Region | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 17 | \n",
" Illinois | \n",
" IL | \n",
" 1 | \n",
" \n",
" \n",
" 18 | \n",
" Indiana | \n",
" IN | \n",
" 1 | \n",
" \n",
" \n",
" 26 | \n",
" Michigan | \n",
" MI | \n",
" 1 | \n",
" \n",
" \n",
" 27 | \n",
" Minnesota | \n",
" MN | \n",
" 1 | \n",
" \n",
" \n",
" 39 | \n",
" Ohio | \n",
" OH | \n",
" 1 | \n",
" \n",
" \n",
" 55 | \n",
" Wisconsin | \n",
" WI | \n",
" 1 | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code Region\n",
"FIPS \n",
"17 Illinois IL 1\n",
"18 Indiana IN 1\n",
"26 Michigan MI 1\n",
"27 Minnesota MN 1\n",
"39 Ohio OH 1\n",
"55 Wisconsin WI 1"
]
},
"execution_count": 82,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.loc[fips_new['Region'] == 1]"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" \n",
" | \n",
" Name | \n",
" Postal Code | \n",
" \n",
" \n",
" FIPS | \n",
" | \n",
" | \n",
" \n",
" \n",
" \n",
" \n",
" 26 | \n",
" Michigan | \n",
" MI | \n",
" \n",
" \n",
" 39 | \n",
" Ohio | \n",
" OH | \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
" Name Postal Code\n",
"FIPS \n",
"26 Michigan MI\n",
"39 Ohio OH"
]
},
"execution_count": 83,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"fips_new.loc[['26', '39'], ['Name', 'Postal Code']]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Getting the population data\n",
"\n",
"The cell below contains the function written previously to get us the census population data from api.census.gov\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"import requests\n",
"import json\n",
"import numpy as np\n",
"\n",
"def download_pop_data(year, states):\n",
" states_list = \",\".join(states)\n",
"\n",
" base_url = 'http://api.census.gov/data'\n",
" application = \"pep/charagegroups\"\n",
" query = f\"get=POP&for=state:{states_list}\"\n",
"\n",
"\n",
" url = f\"{base_url}/{year}/{application}?{query}\"\n",
"\n",
" response = requests.get(url)\n",
"\n",
" if not (200 <= response.status_code <= 299):\n",
" return (False, None)\n",
"\n",
" state_data = json.loads(response.content)\n",
" state_data = np.array(state_data[1:])\n",
"\n",
" population = state_data[:,0].astype('int')\n",
" state = state_data[:, 1].astype('int')\n",
"\n",
"\n",
" data = np.column_stack((state, population))\n",
" years = np.full(data.shape[0],year, dtype='int')\n",
"\n",
" year_data = np.column_stack((years, data))\n",
" return (True, year_data)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"download_pop_data(2015, fips_new.index)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Let's modify this function to return a Pandas `DataFrame` instead of a NumPy `ndarray`, using the FIPS code as the index."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"def download_pop_data_pd(year, states):\n",
" states_list = \",\".join(states)\n",
"\n",
" base_url = 'http://api.census.gov/data'\n",
" application = \"pep/charagegroups\"\n",
" query = f\"get=POP&for=state:{states_list}\"\n",
"\n",
"\n",
" url = f\"{base_url}/{year}/{application}?{query}\"\n",
"\n",
" response = requests.get(url)\n",
"\n",
" if not (200 <= response.status_code <= 299):\n",
" return (False, None)\n",
"\n",
" state_data = json.loads(response.content)\n",
" state_data = np.array(state_data[1:])\n",
"\n",
" data = DataFrame(data{'Population': })\n",
"\n",
"\n",
" return (True, data)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"result, df = download_pop_data_pd(2015, fips_new.index)\n",
"\n",
"if result:\n",
" print(df.head())\n",
" print(df.dtypes)\n",
"else:\n",
" print(\"Failed to download population data for 2015 using the Pandas method\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's one way for us to generate all of the years of data. Pandas recommends against doing it this way, though, because of the costs of memory reallocation on each call."
]
},
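{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"A minimal sketch of that approach (the variable name `pop_all_years` and the range of years are just illustrative choices, and this assumes the `download_pop_data_pd` function above returns a `DataFrame` with `Year` and `Population` columns):"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# Sketch only: grow the DataFrame one year at a time.\n",
"# Each pd.concat call copies everything accumulated so far, which is why\n",
"# Pandas recommends against building a DataFrame this way.\n",
"pop_all_years = DataFrame()\n",
"for year in range(2013, 2016):\n",
"    ok, year_df = download_pop_data_pd(year, fips_new.index)\n",
"    if ok:\n",
"        pop_all_years = pd.concat([pop_all_years, year_df])\n",
"pop_all_years.head()"
]
},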
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's a more efficient way: create individual data frames for each year, then call `concat` to combine them."
]
},
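{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Here's a sketch of that pattern, under the same assumptions as before (illustrative years, and the `Year`/`Population` columns produced by `download_pop_data_pd`):"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# Sketch only: collect one DataFrame per year, then combine them in a single concat\n",
"frames = []\n",
"for year in range(2013, 2016):\n",
"    ok, year_df = download_pop_data_pd(year, fips_new.index)\n",
"    if ok:\n",
"        frames.append(year_df)\n",
"pop_all_years = pd.concat(frames)\n",
"pop_all_years.head()"
]
},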
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now let's get the total population in the Midwest for 2015. "
]
},
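{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"One possible sketch, assuming the `pop_all_years` frame built above (indexed by the same FIPS codes as `fips_new`) and the `Region` column we set to 1 for the Midwest states:"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"# Sketch only: assumes pop_all_years is indexed by the same FIPS codes as fips_new\n",
"midwest_fips = fips_new[fips_new['Region'] == 1].index\n",
"midwest_2015 = pop_all_years[(pop_all_years['Year'] == 2015) & pop_all_years.index.isin(midwest_fips)]\n",
"midwest_2015['Population'].sum()"
]
},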
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (system-wide)",
"language": "python",
"metadata": {
"cocalc": {
"description": "Python 3 programming language",
"priority": 100,
"url": "https://www.python.org/"
}
},
"name": "python3",
"resource_dir": "/ext/jupyter/kernels/python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
} |