Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
ycchen00
GitHub Repository: ycchen00/Introduction-to-Data-Science-in-Python
Path: blob/main/resources/week-2/DataFrameIndexingAndLoading_ed.ipynb
3223 views
Kernel: Python 3

In this course, we'll be largely using smaller or moderate-sized datasets. A common workflow is to read the dataset in, usually from some external file, then begin to clean and manipulate the dataset for analysis. In this lecture I'm going to demonstrate how you can load data from a comma separated file into a DataFrame.

!ls
DataFrameDataStructure_ed.ipynb DataFrameIndexingAndLoading_ed.ipynb DataFrameManipulation_ed.ipynb ExampleManipulatingDataFrames.ipynb IndexingDataFrame_ed.ipynb MissingValues_ed.ipynb PandasIntroduction.ipynb QueryingDataFrame_ed.ipynb QueryingSeries_ed.ipynb SeriesDataStructure_ed.ipynb datasets
# Lets just jump right in and talk about comma separated values (csv) files. You've undoubtedly used these - # any spreadsheet software like excel or google sheets can save output in CSV format. It's pretty loose as a # format, and incredibly lightweight. And totally ubiquitous. # Now, I'm going to make a quick aside because it's convenient here. The Jupyter notebooks use ipython as the # kernel underneath, which provides convenient ways to integrate lower level shell commands, which are # programs run in the underlying operating system. If you're not familiar with the shell don't worry too much # about this, but if you are, this is super handy for integration of your data science workflows. I want to # use one shell command here called "cat", for "concatenate", which just outputs the contents of a file. In # ipython if we prepend the line with an exclamation mark it will execute the remainder of the line as a shell # command. So lets look at the content of a CSV file !cat datasets/Admission_Predict.csv
Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR ,CGPA,Research,Chance of Admit 1,337,118,4,4.5,4.5,9.65,1,0.92 2,324,107,4,4,4.5,8.87,1,0.76 3,316,104,3,3,3.5,8,1,0.72 4,322,110,3,3.5,2.5,8.67,1,0.8 5,314,103,2,2,3,8.21,0,0.65 6,330,115,5,4.5,3,9.34,1,0.9 7,321,109,3,3,4,8.2,1,0.75 8,308,101,2,3,4,7.9,0,0.68 9,302,102,1,2,1.5,8,0,0.5 10,323,108,3,3.5,3,8.6,0,0.45 11,325,106,3,3.5,4,8.4,1,0.52 12,327,111,4,4,4.5,9,1,0.84 13,328,112,4,4,4.5,9.1,1,0.78 14,307,109,3,4,3,8,1,0.62 15,311,104,3,3.5,2,8.2,1,0.61 16,314,105,3,3.5,2.5,8.3,0,0.54 17,317,107,3,4,3,8.7,0,0.66 18,319,106,3,4,3,8,1,0.65 19,318,110,3,4,3,8.8,0,0.63 20,303,102,3,3.5,3,8.5,0,0.62 21,312,107,3,3,2,7.9,1,0.64 22,325,114,4,3,2,8.4,0,0.7 23,328,116,5,5,5,9.5,1,0.94 24,334,119,5,5,4.5,9.7,1,0.95 25,336,119,5,4,3.5,9.8,1,0.97 26,340,120,5,4.5,4.5,9.6,1,0.94 27,322,109,5,4.5,3.5,8.8,0,0.76 28,298,98,2,1.5,2.5,7.5,1,0.44 29,295,93,1,2,2,7.2,0,0.46 30,310,99,2,1.5,2,7.3,0,0.54 31,300,97,2,3,3,8.1,1,0.65 32,327,103,3,4,4,8.3,1,0.74 33,338,118,4,3,4.5,9.4,1,0.91 34,340,114,5,4,4,9.6,1,0.9 35,331,112,5,4,5,9.8,1,0.94 36,320,110,5,5,5,9.2,1,0.88 37,299,106,2,4,4,8.4,0,0.64 38,300,105,1,1,2,7.8,0,0.58 39,304,105,1,3,1.5,7.5,0,0.52 40,307,108,2,4,3.5,7.7,0,0.48 41,308,110,3,3.5,3,8,1,0.46 42,316,105,2,2.5,2.5,8.2,1,0.49 43,313,107,2,2.5,2,8.5,1,0.53 44,332,117,4,4.5,4,9.1,0,0.87 45,326,113,5,4.5,4,9.4,1,0.91 46,322,110,5,5,4,9.1,1,0.88 47,329,114,5,4,5,9.3,1,0.86 48,339,119,5,4.5,4,9.7,0,0.89 49,321,110,3,3.5,5,8.85,1,0.82 50,327,111,4,3,4,8.4,1,0.78 51,313,98,3,2.5,4.5,8.3,1,0.76 52,312,100,2,1.5,3.5,7.9,1,0.56 53,334,116,4,4,3,8,1,0.78 54,324,112,4,4,2.5,8.1,1,0.72 55,322,110,3,3,3.5,8,0,0.7 56,320,103,3,3,3,7.7,0,0.64 57,316,102,3,2,3,7.4,0,0.64 58,298,99,2,4,2,7.6,0,0.46 59,300,99,1,3,2,6.8,1,0.36 60,311,104,2,2,2,8.3,0,0.42 61,309,100,2,3,3,8.1,0,0.48 62,307,101,3,4,3,8.2,0,0.47 63,304,105,2,3,3,8.2,1,0.54 64,315,107,2,4,3,8.5,1,0.56 65,325,111,3,3,3.5,8.7,0,0.52 66,325,112,4,3.5,3.5,8.92,0,0.55 67,327,114,3,3,3,9.02,0,0.61 68,316,107,2,3.5,3.5,8.64,1,0.57 69,318,109,3,3.5,4,9.22,1,0.68 70,328,115,4,4.5,4,9.16,1,0.78 71,332,118,5,5,5,9.64,1,0.94 72,336,112,5,5,5,9.76,1,0.96 73,321,111,5,5,5,9.45,1,0.93 74,314,108,4,4.5,4,9.04,1,0.84 75,314,106,3,3,5,8.9,0,0.74 76,329,114,2,2,4,8.56,1,0.72 77,327,112,3,3,3,8.72,1,0.74 78,301,99,2,3,2,8.22,0,0.64 79,296,95,2,3,2,7.54,1,0.44 80,294,93,1,1.5,2,7.36,0,0.46 81,312,105,3,2,3,8.02,1,0.5 82,340,120,4,5,5,9.5,1,0.96 83,320,110,5,5,4.5,9.22,1,0.92 84,322,115,5,4,4.5,9.36,1,0.92 85,340,115,5,4.5,4.5,9.45,1,0.94 86,319,103,4,4.5,3.5,8.66,0,0.76 87,315,106,3,4.5,3.5,8.42,0,0.72 88,317,107,2,3.5,3,8.28,0,0.66 89,314,108,3,4.5,3.5,8.14,0,0.64 90,316,109,4,4.5,3.5,8.76,1,0.74 91,318,106,2,4,4,7.92,1,0.64 92,299,97,3,5,3.5,7.66,0,0.38 93,298,98,2,4,3,8.03,0,0.34 94,301,97,2,3,3,7.88,1,0.44 95,303,99,3,2,2.5,7.66,0,0.36 96,304,100,4,1.5,2.5,7.84,0,0.42 97,306,100,2,3,3,8,0,0.48 98,331,120,3,4,4,8.96,1,0.86 99,332,119,4,5,4.5,9.24,1,0.9 100,323,113,3,4,4,8.88,1,0.79 101,322,107,3,3.5,3.5,8.46,1,0.71 102,312,105,2,2.5,3,8.12,0,0.64 103,314,106,2,4,3.5,8.25,0,0.62 104,317,104,2,4.5,4,8.47,0,0.57 105,326,112,3,3.5,3,9.05,1,0.74 106,316,110,3,4,4.5,8.78,1,0.69 107,329,111,4,4.5,4.5,9.18,1,0.87 108,338,117,4,3.5,4.5,9.46,1,0.91 109,331,116,5,5,5,9.38,1,0.93 110,304,103,5,5,4,8.64,0,0.68 111,305,108,5,3,3,8.48,0,0.61 112,321,109,4,4,4,8.68,1,0.69 113,301,107,3,3.5,3.5,8.34,1,0.62 114,320,110,2,4,3.5,8.56,0,0.72 115,311,105,3,3.5,3,8.45,1,0.59 116,310,106,4,4.5,4.5,9.04,1,0.66 117,299,102,3,4,3.5,8.62,0,0.56 118,290,104,4,2,2.5,7.46,0,0.45 119,296,99,2,3,3.5,7.28,0,0.47 120,327,104,5,3,3.5,8.84,1,0.71 121,335,117,5,5,5,9.56,1,0.94 122,334,119,5,4.5,4.5,9.48,1,0.94 123,310,106,4,1.5,2.5,8.36,0,0.57 124,308,108,3,3.5,3.5,8.22,0,0.61 125,301,106,4,2.5,3,8.47,0,0.57 126,300,100,3,2,3,8.66,1,0.64 127,323,113,3,4,3,9.32,1,0.85 128,319,112,3,2.5,2,8.71,1,0.78 129,326,112,3,3.5,3,9.1,1,0.84 130,333,118,5,5,5,9.35,1,0.92 131,339,114,5,4,4.5,9.76,1,0.96 132,303,105,5,5,4.5,8.65,0,0.77 133,309,105,5,3.5,3.5,8.56,0,0.71 134,323,112,5,4,4.5,8.78,0,0.79 135,333,113,5,4,4,9.28,1,0.89 136,314,109,4,3.5,4,8.77,1,0.82 137,312,103,3,5,4,8.45,0,0.76 138,316,100,2,1.5,3,8.16,1,0.71 139,326,116,2,4.5,3,9.08,1,0.8 140,318,109,1,3.5,3.5,9.12,0,0.78 141,329,110,2,4,3,9.15,1,0.84 142,332,118,2,4.5,3.5,9.36,1,0.9 143,331,115,5,4,3.5,9.44,1,0.92 144,340,120,4,4.5,4,9.92,1,0.97 145,325,112,2,3,3.5,8.96,1,0.8 146,320,113,2,2,2.5,8.64,1,0.81 147,315,105,3,2,2.5,8.48,0,0.75 148,326,114,3,3,3,9.11,1,0.83 149,339,116,4,4,3.5,9.8,1,0.96 150,311,106,2,3.5,3,8.26,1,0.79 151,334,114,4,4,4,9.43,1,0.93 152,332,116,5,5,5,9.28,1,0.94 153,321,112,5,5,5,9.06,1,0.86 154,324,105,3,3,4,8.75,0,0.79 155,326,108,3,3,3.5,8.89,0,0.8 156,312,109,3,3,3,8.69,0,0.77 157,315,105,3,2,2.5,8.34,0,0.7 158,309,104,2,2,2.5,8.26,0,0.65 159,306,106,2,2,2.5,8.14,0,0.61 160,297,100,1,1.5,2,7.9,0,0.52 161,315,103,1,1.5,2,7.86,0,0.57 162,298,99,1,1.5,3,7.46,0,0.53 163,318,109,3,3,3,8.5,0,0.67 164,317,105,3,3.5,3,8.56,0,0.68 165,329,111,4,4.5,4,9.01,1,0.81 166,322,110,5,4.5,4,8.97,0,0.78 167,302,102,3,3.5,5,8.33,0,0.65 168,313,102,3,2,3,8.27,0,0.64 169,293,97,2,2,4,7.8,1,0.64 170,311,99,2,2.5,3,7.98,0,0.65 171,312,101,2,2.5,3.5,8.04,1,0.68 172,334,117,5,4,4.5,9.07,1,0.89 173,322,110,4,4,5,9.13,1,0.86 174,323,113,4,4,4.5,9.23,1,0.89 175,321,111,4,4,4,8.97,1,0.87 176,320,111,4,4.5,3.5,8.87,1,0.85 177,329,119,4,4.5,4.5,9.16,1,0.9 178,319,110,3,3.5,3.5,9.04,0,0.82 179,309,108,3,2.5,3,8.12,0,0.72 180,307,102,3,3,3,8.27,0,0.73 181,300,104,3,3.5,3,8.16,0,0.71 182,305,107,2,2.5,2.5,8.42,0,0.71 183,299,100,2,3,3.5,7.88,0,0.68 184,314,110,3,4,4,8.8,0,0.75 185,316,106,2,2.5,4,8.32,0,0.72 186,327,113,4,4.5,4.5,9.11,1,0.89 187,317,107,3,3.5,3,8.68,1,0.84 188,335,118,5,4.5,3.5,9.44,1,0.93 189,331,115,5,4.5,3.5,9.36,1,0.93 190,324,112,5,5,5,9.08,1,0.88 191,324,111,5,4.5,4,9.16,1,0.9 192,323,110,5,4,5,8.98,1,0.87 193,322,114,5,4.5,4,8.94,1,0.86 194,336,118,5,4.5,5,9.53,1,0.94 195,316,109,3,3.5,3,8.76,0,0.77 196,307,107,2,3,3.5,8.52,1,0.78 197,306,105,2,3,2.5,8.26,0,0.73 198,310,106,2,3.5,2.5,8.33,0,0.73 199,311,104,3,4.5,4.5,8.43,0,0.7 200,313,107,3,4,4.5,8.69,0,0.72 201,317,103,3,2.5,3,8.54,1,0.73 202,315,110,2,3.5,3,8.46,1,0.72 203,340,120,5,4.5,4.5,9.91,1,0.97 204,334,120,5,4,5,9.87,1,0.97 205,298,105,3,3.5,4,8.54,0,0.69 206,295,99,2,2.5,3,7.65,0,0.57 207,315,99,2,3.5,3,7.89,0,0.63 208,310,102,3,3.5,4,8.02,1,0.66 209,305,106,2,3,3,8.16,0,0.64 210,301,104,3,3.5,4,8.12,1,0.68 211,325,108,4,4.5,4,9.06,1,0.79 212,328,110,4,5,4,9.14,1,0.82 213,338,120,4,5,5,9.66,1,0.95 214,333,119,5,5,4.5,9.78,1,0.96 215,331,117,4,4.5,5,9.42,1,0.94 216,330,116,5,5,4.5,9.36,1,0.93 217,322,112,4,4.5,4.5,9.26,1,0.91 218,321,109,4,4,4,9.13,1,0.85 219,324,110,4,3,3.5,8.97,1,0.84 220,312,104,3,3.5,3.5,8.42,0,0.74 221,313,103,3,4,4,8.75,0,0.76 222,316,110,3,3.5,4,8.56,0,0.75 223,324,113,4,4.5,4,8.79,0,0.76 224,308,109,2,3,4,8.45,0,0.71 225,305,105,2,3,2,8.23,0,0.67 226,296,99,2,2.5,2.5,8.03,0,0.61 227,306,110,2,3.5,4,8.45,0,0.63 228,312,110,2,3.5,3,8.53,0,0.64 229,318,112,3,4,3.5,8.67,0,0.71 230,324,111,4,3,3,9.01,1,0.82 231,313,104,3,4,4.5,8.65,0,0.73 232,319,106,3,3.5,2.5,8.33,1,0.74 233,312,107,2,2.5,3.5,8.27,0,0.69 234,304,100,2,2.5,3.5,8.07,0,0.64 235,330,113,5,5,4,9.31,1,0.91 236,326,111,5,4.5,4,9.23,1,0.88 237,325,112,4,4,4.5,9.17,1,0.85 238,329,114,5,4.5,5,9.19,1,0.86 239,310,104,3,2,3.5,8.37,0,0.7 240,299,100,1,1.5,2,7.89,0,0.59 241,296,101,1,2.5,3,7.68,0,0.6 242,317,103,2,2.5,2,8.15,0,0.65 243,324,115,3,3.5,3,8.76,1,0.7 244,325,114,3,3.5,3,9.04,1,0.76 245,314,107,2,2.5,4,8.56,0,0.63 246,328,110,4,4,2.5,9.02,1,0.81 247,316,105,3,3,3.5,8.73,0,0.72 248,311,104,2,2.5,3.5,8.48,0,0.71 249,324,110,3,3.5,4,8.87,1,0.8 250,321,111,3,3.5,4,8.83,1,0.77 251,320,104,3,3,2.5,8.57,1,0.74 252,316,99,2,2.5,3,9,0,0.7 253,318,100,2,2.5,3.5,8.54,1,0.71 254,335,115,4,4.5,4.5,9.68,1,0.93 255,321,114,4,4,5,9.12,0,0.85 256,307,110,4,4,4.5,8.37,0,0.79 257,309,99,3,4,4,8.56,0,0.76 258,324,100,3,4,5,8.64,1,0.78 259,326,102,4,5,5,8.76,1,0.77 260,331,119,4,5,4.5,9.34,1,0.9 261,327,108,5,5,3.5,9.13,1,0.87 262,312,104,3,3.5,4,8.09,0,0.71 263,308,103,2,2.5,4,8.36,1,0.7 264,324,111,3,2.5,1.5,8.79,1,0.7 265,325,110,2,3,2.5,8.76,1,0.75 266,313,102,3,2.5,2.5,8.68,0,0.71 267,312,105,2,2,2.5,8.45,0,0.72 268,314,107,3,3,3.5,8.17,1,0.73 269,327,113,4,4.5,5,9.14,0,0.83 270,308,108,4,4.5,5,8.34,0,0.77 271,306,105,2,2.5,3,8.22,1,0.72 272,299,96,2,1.5,2,7.86,0,0.54 273,294,95,1,1.5,1.5,7.64,0,0.49 274,312,99,1,1,1.5,8.01,1,0.52 275,315,100,1,2,2.5,7.95,0,0.58 276,322,110,3,3.5,3,8.96,1,0.78 277,329,113,5,5,4.5,9.45,1,0.89 278,320,101,2,2.5,3,8.62,0,0.7 279,308,103,2,3,3.5,8.49,0,0.66 280,304,102,2,3,4,8.73,0,0.67 281,311,102,3,4.5,4,8.64,1,0.68 282,317,110,3,4,4.5,9.11,1,0.8 283,312,106,3,4,3.5,8.79,1,0.81 284,321,111,3,2.5,3,8.9,1,0.8 285,340,112,4,5,4.5,9.66,1,0.94 286,331,116,5,4,4,9.26,1,0.93 287,336,118,5,4.5,4,9.19,1,0.92 288,324,114,5,5,4.5,9.08,1,0.89 289,314,104,4,5,5,9.02,0,0.82 290,313,109,3,4,3.5,9,0,0.79 291,307,105,2,2.5,3,7.65,0,0.58 292,300,102,2,1.5,2,7.87,0,0.56 293,302,99,2,1,2,7.97,0,0.56 294,312,98,1,3.5,3,8.18,1,0.64 295,316,101,2,2.5,2,8.32,1,0.61 296,317,100,2,3,2.5,8.57,0,0.68 297,310,107,3,3.5,3.5,8.67,0,0.76 298,320,120,3,4,4.5,9.11,0,0.86 299,330,114,3,4.5,4.5,9.24,1,0.9 300,305,112,3,3,3.5,8.65,0,0.71 301,309,106,2,2.5,2.5,8,0,0.62 302,319,108,2,2.5,3,8.76,0,0.66 303,322,105,2,3,3,8.45,1,0.65 304,323,107,3,3.5,3.5,8.55,1,0.73 305,313,106,2,2.5,2,8.43,0,0.62 306,321,109,3,3.5,3.5,8.8,1,0.74 307,323,110,3,4,3.5,9.1,1,0.79 308,325,112,4,4,4,9,1,0.8 309,312,108,3,3.5,3,8.53,0,0.69 310,308,110,4,3.5,3,8.6,0,0.7 311,320,104,3,3,3.5,8.74,1,0.76 312,328,108,4,4.5,4,9.18,1,0.84 313,311,107,4,4.5,4.5,9,1,0.78 314,301,100,3,3.5,3,8.04,0,0.67 315,305,105,2,3,4,8.13,0,0.66 316,308,104,2,2.5,3,8.07,0,0.65 317,298,101,2,1.5,2,7.86,0,0.54 318,300,99,1,1,2.5,8.01,0,0.58 319,324,111,3,2.5,2,8.8,1,0.79 320,327,113,4,3.5,3,8.69,1,0.8 321,317,106,3,4,3.5,8.5,1,0.75 322,323,104,3,4,4,8.44,1,0.73 323,314,107,2,2.5,4,8.27,0,0.72 324,305,102,2,2,2.5,8.18,0,0.62 325,315,104,3,3,2.5,8.33,0,0.67 326,326,116,3,3.5,4,9.14,1,0.81 327,299,100,3,2,2,8.02,0,0.63 328,295,101,2,2.5,2,7.86,0,0.69 329,324,112,4,4,3.5,8.77,1,0.8 330,297,96,2,2.5,1.5,7.89,0,0.43 331,327,113,3,3.5,3,8.66,1,0.8 332,311,105,2,3,2,8.12,1,0.73 333,308,106,3,3.5,2.5,8.21,1,0.75 334,319,108,3,3,3.5,8.54,1,0.71 335,312,107,4,4.5,4,8.65,1,0.73 336,325,111,4,4,4.5,9.11,1,0.83 337,319,110,3,3,2.5,8.79,0,0.72 338,332,118,5,5,5,9.47,1,0.94 339,323,108,5,4,4,8.74,1,0.81 340,324,107,5,3.5,4,8.66,1,0.81 341,312,107,3,3,3,8.46,1,0.75 342,326,110,3,3.5,3.5,8.76,1,0.79 343,308,106,3,3,3,8.24,0,0.58 344,305,103,2,2.5,3.5,8.13,0,0.59 345,295,96,2,1.5,2,7.34,0,0.47 346,316,98,1,1.5,2,7.43,0,0.49 347,304,97,2,1.5,2,7.64,0,0.47 348,299,94,1,1,1,7.34,0,0.42 349,302,99,1,2,2,7.25,0,0.57 350,313,101,3,2.5,3,8.04,0,0.62 351,318,107,3,3,3.5,8.27,1,0.74 352,325,110,4,3.5,4,8.67,1,0.73 353,303,100,2,3,3.5,8.06,1,0.64 354,300,102,3,3.5,2.5,8.17,0,0.63 355,297,98,2,2.5,3,7.67,0,0.59 356,317,106,2,2,3.5,8.12,0,0.73 357,327,109,3,3.5,4,8.77,1,0.79 358,301,104,2,3.5,3.5,7.89,1,0.68 359,314,105,2,2.5,2,7.64,0,0.7 360,321,107,2,2,1.5,8.44,0,0.81 361,322,110,3,4,5,8.64,1,0.85 362,334,116,4,4,3.5,9.54,1,0.93 363,338,115,5,4.5,5,9.23,1,0.91 364,306,103,2,2.5,3,8.36,0,0.69 365,313,102,3,3.5,4,8.9,1,0.77 366,330,114,4,4.5,3,9.17,1,0.86 367,320,104,3,3.5,4.5,8.34,1,0.74 368,311,98,1,1,2.5,7.46,0,0.57 369,298,92,1,2,2,7.88,0,0.51 370,301,98,1,2,3,8.03,1,0.67 371,310,103,2,2.5,2.5,8.24,0,0.72 372,324,110,3,3.5,3,9.22,1,0.89 373,336,119,4,4.5,4,9.62,1,0.95 374,321,109,3,3,3,8.54,1,0.79 375,315,105,2,2,2.5,7.65,0,0.39 376,304,101,2,2,2.5,7.66,0,0.38 377,297,96,2,2.5,2,7.43,0,0.34 378,290,100,1,1.5,2,7.56,0,0.47 379,303,98,1,2,2.5,7.65,0,0.56 380,311,99,1,2.5,3,8.43,1,0.71 381,322,104,3,3.5,4,8.84,1,0.78 382,319,105,3,3,3.5,8.67,1,0.73 383,324,110,4,4.5,4,9.15,1,0.82 384,300,100,3,3,3.5,8.26,0,0.62 385,340,113,4,5,5,9.74,1,0.96 386,335,117,5,5,5,9.82,1,0.96 387,302,101,2,2.5,3.5,7.96,0,0.46 388,307,105,2,2,3.5,8.1,0,0.53 389,296,97,2,1.5,2,7.8,0,0.49 390,320,108,3,3.5,4,8.44,1,0.76 391,314,102,2,2,2.5,8.24,0,0.64 392,318,106,3,2,3,8.65,0,0.71 393,326,112,4,4,3.5,9.12,1,0.84 394,317,104,2,3,3,8.76,0,0.77 395,329,111,4,4.5,4,9.23,1,0.89 396,324,110,3,3.5,3.5,9.04,1,0.82 397,325,107,3,3,3.5,9.11,1,0.84 398,330,116,4,5,4.5,9.45,1,0.91 399,312,103,3,3.5,4,8.78,0,0.67 400,333,117,4,5,4,9.66,1,0.95
# We see from the output that there is a list of columns, and the column identifiers are listed as strings on # the first line of the file. Then we have rows of data, all columns separated by commas. Now, there are lots # of oddities with the CSV file format, and there is no one agreed upon specification. So you should be # prepared to do a bit of work when you pull down CSV files to explore. But this lecture isn't focused on CSV # files, and is more about pandas DataFrames. So lets jump into that. # Let's bring in pandas to work with import pandas as pd # Pandas mades it easy to turn a CSV into a dataframe, we just call read_csv() df = pd.read_csv('datasets/Admission_Predict.csv') # And let's look at the first few rows df.head()
# We notice that by default index starts with 0 while the students' serial number starts from 1. If you jump # back to the CSV output you'll deduce that pandas has create a new index. Instead, we can set the serial no. # as the index if we want to by using the index_col. df = pd.read_csv('datasets/Admission_Predict.csv', index_col=0) df.head()
# Notice that we have two columns "SOP" and "LOR" and probably not everyone knows what they mean So let's # change our column names to make it more clear. In Pandas, we can use the rename() function It takes a # parameter called columns, and we need to pass into a dictionary which the keys are the old column name and # the value is the corresponding new column name new_df=df.rename(columns={'GRE Score':'GRE Score', 'TOEFL Score':'TOEFL Score', 'University Rating':'University Rating', 'SOP': 'Statement of Purpose','LOR': 'Letter of Recommendation', 'CGPA':'CGPA', 'Research':'Research', 'Chance of Admit':'Chance of Admit'}) new_df.head()
# From the output, we can see that only "SOP" is changed but not "LOR" Why is that? Let's investigate this a # bit. First we need to make sure we got all the column names correct We can use the columns attribute of # dataframe to get a list. new_df.columns
Index(['GRE Score', 'TOEFL Score', 'University Rating', 'Statement of Purpose', 'LOR ', 'CGPA', 'Research', 'Chance of Admit '], dtype='object')
# If we look at the output closely, we can see that there is actually a space right after "LOR" and a space # right after "Chance of Admit. Sneaky, huh? So this is why our rename dictionary does not work for LOR, # because the key we used was just three characters, instead of "LOR " # There are a couple of ways we could address this. One way would be to change a column by including the space # in the name new_df=new_df.rename(columns={'LOR ': 'Letter of Recommendation'}) new_df.head()
# So that works well, but it's a bit fragile. What if that was a tab instead of a space? Or two spaces? # Another way is to create some function that does the cleaning and then tell renamed to apply that function # across all of the data. Python comes with a handy string function to strip white space called "strip()". # When we pass this in to rename we pass the function as the mapper parameter, and then indicate whether the # axis should be columns or index (row labels) new_df=new_df.rename(mapper=str.strip, axis='columns') # Let's take a look at results new_df.head()
# Now we've got it - both SOP and LOR have been renamed and Chance of Admit has been trimmed up. Remember # though that the rename function isn't modifying the dataframe. In this case, df is the same as it always # was, there's just a copy in new_df with the changed names. df.columns
Index(['GRE Score', 'TOEFL Score', 'University Rating', 'SOP', 'LOR ', 'CGPA', 'Research', 'Chance of Admit '], dtype='object')
# We can also use the df.columns attribute by assigning to it a list of column names which will directly # rename the columns. This will directly modify the original dataframe and is very efficient especially when # you have a lot of columns and you only want to change a few. This technique is also not affected by subtle # errors in the column names, a problem that we just encountered. With a list, you can use the list index to # change a certain value or use list comprehension to change all of the values # As an example, lets change all of the column names to lower case. First we need to get our list cols = list(df.columns) # Then a little list comprehenshion cols = [x.lower().strip() for x in cols] # Then we just overwrite what is already in the .columns attribute df.columns=cols # And take a look at our results df.head()

In this lecture, you've learned how to import a CSV file into a pandas DataFrame object, and how to do some basic data cleaning to the column names. The CSV file import mechanisms in pandas have lots of different options, and you really need to learn these in order to be proficient at data manipulation. Once you have set up the format and shape of a DataFrame, you have a solid start to further actions such as conducting data analysis and modeling.

Now, there are other data sources you can load directly into dataframes as well, including HTML web pages, databases, and other file formats. But the CSV is by far the most common data format you'll run into, and an important one to know how to manipulate in pandas.