GitHub Repository: YStrano/DataScience_GA
Path: blob/master/final_project/03-exploratory-analysis/README.md

Final Project, Part 3: Exploratory Data Analysis

PROMPT

Exploratory data analysis is a crucial and informative step in the data process. It helps confirm or refute your initial hypotheses and helps you visualize the relationships in your data. Your exploratory analysis also informs the data transformations you'll need to prepare your data for machine learning models.

In this assignment, you will explore and visualize your data in order to tell its story effectively. You'll create an iPython notebook that explores your data mathematically and visualizes it with a Python visualization package.

Goal: Confirm your data and create an exploratory analysis notebook with statistical analysis and visualization.


DELIVERABLES

Exploratory Analysis Writeup

  • Requirements:

    • Review the data set and project with an EIR during office hours.

    • Practice importing (potentially unformatted) data into clean matrices or data frames and, if necessary, exporting it into a form that makes sense (text files or a database, for example).

    • Explore the mathematical properties of the data and visualize it with a Python visualization tool (matplotlib and seaborn).

    • Provide insight about the data set and any impact on a hypothesis.

  • Detailed Breakdown:

    • A well-organized iPython notebook with code and output.

    • At least one visual for each independent variable and, if possible, its relationship to your dependent variable.

      • It's just as important to show what's not correlated as it is to show any actual correlations found.

      • Visuals should be well labeled and intuitive based on the data types.

        • For example, if your X variable is temperature and Y is "did it rain," a reasonable visual would be two histograms of temperature, one where it rained, and one where it didn't.

      • Tables are a perfectly valid visualization tool! Interweave them into your work.

  • Bonus:

    • Surface and share your analysis online. Jupyter makes this very simple and the setup should not take long.

    • Try experimenting with other visualization tools: Python with pandas-highcharts, R with Shiny, or, for a real challenge, D3 on its own. Interactive data analysis opens the door for others to easily interpret your work and explore the data themselves!

  • Submission:

    • TBD by instructor.
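
The import-and-export requirement above can be sketched roughly as follows. This is a minimal illustration, assuming pandas is available; the raw CSV text, the column names, and the `load_clean` helper are all hypothetical and stand in for whatever your own data set needs.

```python
import io

import pandas as pd

# Hypothetical raw data: untidy header, stray spaces, and one missing value.
RAW_CSV = """ Temperature (F) ,Rained
72, yes
65,no
,yes
80, no
"""


def load_clean(source):
    """Read CSV data into a tidy DataFrame (illustrative helper, not a course API)."""
    df = pd.read_csv(source, skipinitialspace=True)
    # Normalize column names to snake_case, e.g. "Temperature (F)" -> "temperature_f".
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    # Turn the yes/no text column into booleans.
    df["rained"] = df["rained"].str.strip().map({"yes": True, "no": False})
    # Drop rows with no temperature reading and renumber the rows.
    return df.dropna(subset=["temperature_f"]).reset_index(drop=True)


clean = load_clean(io.StringIO(RAW_CSV))

# Export to plain text; passing a file path to to_csv would write a file instead.
csv_text = clean.to_csv(index=False)
```

With a real file, `load_clean("path/to/your_data.csv")` would work the same way, since `pd.read_csv` accepts a path as well as a buffer.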

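The temperature and "did it rain" example from the Detailed Breakdown could look something like the sketch below: one labeled histogram per outcome, plus a summary table. The data here is synthetic and made up purely for illustration, and `plot_by_outcome` is a hypothetical helper name, not something from the class materials.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data for illustration only: 200 rainy days, 300 dry days.
rng = np.random.default_rng(0)
weather = pd.DataFrame({
    "temperature": np.concatenate([rng.normal(60, 8, 200), rng.normal(75, 8, 300)]),
    "rained": [True] * 200 + [False] * 300,
})


def plot_by_outcome(df, feature, outcome, bins=20):
    """Draw one histogram of `feature` for each distinct value of `outcome`."""
    groups = df.groupby(outcome)[feature]
    fig, axes = plt.subplots(
        1, groups.ngroups, sharex=True, sharey=True, figsize=(10, 4)
    )
    axes = np.atleast_1d(axes)  # keeps the loop valid even with a single group
    for ax, (label, values) in zip(axes, groups):
        ax.hist(values, bins=bins)
        ax.set_title(f"{outcome} = {label}")
        ax.set_xlabel(feature)
    axes[0].set_ylabel("count")
    return fig


fig = plot_by_outcome(weather, "temperature", "rained")

# A table is a valid visual too: per-group summary statistics.
summary = weather.groupby("rained")["temperature"].describe()
```

Sharing the x and y axes across the two panels makes the distributions directly comparable, which is the point of splitting by outcome.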

TIMELINE

Deadline | Deliverable | Description
-------- | ----------- | -----------
Lesson 8 | Part 1 - Lightning Presentation | Present 3 Problem Statements
Lesson 14 | Part 2 - Experiment Writeup | Research Design Problem Statement & Outline
Lesson 16 | Part 3 - Exploratory Analysis | Dataset Approval and Exploratory Analysis
Lesson 18 | Part 4 - Notebook Draft | iPython Notebook & Model Draft
Lesson 20 | Part 5 - Presentation | Present Your Final Report

EVALUATION

Your project will be assessed using the following standards:

  1. Parse the Data

Rubric: Click here for the complete rubric.

Requirements for these standards will be assessed using the scale below:

Score | Expectations
----- | ------------
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_
**3** | _Exceeds expectations, you wonderful creature, you!_

While your total score may serve as a helpful gauge of whether you've met project goals, specific standards scores are more important since they can show you where to focus your efforts in the future!


RESOURCES

Suggestions for Getting Started

  • Keep the project simple! The "cool" part of the analysis will come; just looking at simple relationships between variables can be incredibly insightful.

  • Consider building some helper functions that help you quickly visualize and interpret data.

    • Exploratory data analysis should be formulaic; the code should not be holding you back. There are plenty of "starter code" examples from class materials.

  • DRY: Don't Repeat Yourself! If you find yourself copying and pasting code a lot, turn it into a function and use the function instead!
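
As one possible shape for the helper functions suggested above, the sketch below defines two small reusable functions, assuming pandas: one tabulates basic facts about every column, the other ranks numeric columns by their correlation with a target, so that weak or absent correlations show up alongside strong ones. The function names and the toy data are hypothetical.

```python
import pandas as pd


def summarize(df):
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


def correlations_with(df, target):
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute strength but keep the sign in the reported values.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy data for illustration only.
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "noise": [1, -2, 2, -2, 1],
    "label": ["a", "b", "a", None, "b"],
    "y": [2, 4, 6, 8, 10],
})

overview = summarize(demo)
ranked = correlations_with(demo, "y")
```

Calling the same two functions on every new data frame replaces a lot of copied notebook cells, and the ranked output makes it easy to report the uncorrelated variables as well.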

Specific Tips

  • This deliverable should be similar to the work you did for Unit Project 2 earlier in the course.