GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_02/code/exploratory-data-analysis-master/README.md

Exploratory Data Analysis in Pandas

Unit 2: Required


Materials We Provide

| Topic | Description | Link |
| --- | --- | --- |
| Lesson | Pandas for Exploratory Data Analysis (ipynb slides) | Here |
| Solution | Completed template from lesson | Here |
| Practice | Prompts to practice EDA in Pandas | Here |
| | Data for EDA Practice | Here |
| | Sample Solutions for EDA Practice | Here |
| Datasets | Country/continent/servings of alcohol | Here |
| | UFO sighting records | Here |
| | Movie & Title Info from IMDB | Here |
| | User Info from IMDB | Here |
| | Movie & Title Info from IMDB | Here |

This lesson purposefully uses a large number of datasets so that students can practice opening different types of data files. Emphasize opening each file manually to identify its separator and to check whether it has a header row. Having many datasets also lets us explore a variety of themes throughout the lesson that might not be present in one dataset alone.

Note: The datasets come in three file types. ".csv" files are separated by commas, ".tsv" files by tabs, and ".tbl" files by the "|" character.
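
As a minimal sketch of what the separator and header check leads to, the snippet below reads one sample of each file type with `pd.read_csv`. The file contents and column names here are hypothetical stand-ins held in memory, not the lesson's actual data files; with real files you would pass a path instead of the `io.StringIO` object.

```python
import io

import pandas as pd

# Hypothetical file contents for illustration; the lesson's real files differ.
csv_text = "country,beer_servings,continent\nAlbania,89,EU\nAlgeria,25,AF\n"
tsv_text = "order_id\titem_name\n1\tChips\n2\tSalsa\n"
tbl_text = "1|24|M|technician\n2|53|F|other\n"  # this sample has no header row

# ".csv": comma is the default separator, first line is used as the header.
csv_df = pd.read_csv(io.StringIO(csv_text))

# ".tsv": same reader, but the separator must be given explicitly.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# ".tbl": pipe-separated; with no header row, supply column names yourself.
# These names are assumptions made up for this sketch.
tbl_df = pd.read_csv(
    io.StringIO(tbl_text),
    sep="|",
    header=None,
    names=["user_id", "age", "gender", "occupation"],
)

print(csv_df.head())
print(tsv_df.head())
print(tbl_df.head())
```

If the header is misidentified, the first data row ends up as column names (or the column names end up as a data row), which is why looking at the raw file first is worth the time.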


Learning Objectives

  • Explain the definition and purpose of Pandas in a data science context

  • Manipulate Pandas DataFrames and Series

  • Filter and sort Pandas data

  • Manipulate DataFrame columns

  • Define how to handle null and missing values


Student Requirements

Before this lesson, students should already be able to:

  • Recall and define basic syntax for Python code


Lesson Outline

Instructor Note: Start with the lesson Jupyter slide deck. Next, walk the students through the lab. Periodically stop and let the students try the challenges. The challenges are typically just 1-3 lines of code that are very similar to what was just discussed.

TOTAL: 170 mins

  • What is Pandas (20 mins)

  • Reading Files, Selecting Columns, and Summarizing (15 mins)

    • EXERCISE ONE (15 mins)

  • Filtering and Sorting (15 mins)

    • EXERCISE TWO (15 mins)

  • Renaming, Adding, and Removing Columns (15 mins)

  • Handling Missing Values (15 mins)

    • EXERCISE THREE (15 mins)

  • Split-Apply-Combine (15 mins)

    • EXERCISE FOUR (15 mins)

  • Selecting Multiple Columns and Filtering Rows (10 mins)

  • Joining (Merging) DataFrames (5 mins)

  • OPTIONAL: Other Commonly Used Features

  • OPTIONAL: Other Less Used Features of Pandas

  • Summary
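
For orientation, the core topics in the outline above can be sketched in a few lines each. This is an illustrative preview on a small made-up DataFrame, not the lesson's notebook code; all column names and values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["EU", "EU", "AS", "AS"],
    "beer": [100.0, 50.0, np.nan, 20.0],
})

# Filtering and sorting: keep rows above a threshold, largest first.
heavy = drinks[drinks["beer"] > 30].sort_values("beer", ascending=False)

# Renaming, adding, and removing columns.
drinks = drinks.rename(columns={"beer": "beer_servings"})
drinks["has_data"] = drinks["beer_servings"].notnull()
trimmed = drinks.drop(columns=["has_data"])

# Handling missing values: count them, then fill with a chosen default.
n_missing = drinks["beer_servings"].isnull().sum()
filled = drinks["beer_servings"].fillna(0)

# Split-apply-combine: mean servings per continent (NaN is skipped).
mean_by_continent = drinks.groupby("continent")["beer_servings"].mean()

# Joining (merging) DataFrames on a shared key column.
names = pd.DataFrame({
    "continent": ["EU", "AS"],
    "continent_name": ["Europe", "Asia"],
})
merged = pd.merge(drinks, names, on="continent")

print(heavy)
print(mean_by_continent)
print(merged)
```

Filling missing values with 0 is only one option; whether to fill, drop, or leave nulls in place depends on what a missing value means in the dataset at hand.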


Additional Resources

For more information on this topic, check out the following resources: