Final Project, Part 3: Exploratory Data Analysis
PROMPT
Exploratory data analysis is a crucial and informative step in the data process. It helps confirm or deny your initial hypotheses and helps visualize the relationships among your data. Your exploratory analysis also informs the kinds of data transformations that you'll need to optimize for machine learning models.
In this assignment, you will explore and visualize your initial analysis in order to effectively tell your data's story. You'll create an iPython notebook that explores your data mathematically and visually, using a Python visualization package.
Goal: Confirm your data and create an exploratory analysis notebook with statistical analysis and visualization.
DELIVERABLES
Exploratory Analysis Writeup
Requirements:
Review the data set and project with an EIR during office hours.
Practice importing (potentially unformatted) data into clean matrices or data frames and, if necessary, exporting it into a form that makes sense (text files or a database, for example).
Explore the mathematical properties of the data and visualize them with a Python visualization tool (matplotlib and seaborn).
Provide insight about the data set and any impact on a hypothesis.
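As a rough illustration of the import-and-export requirement above, here is a minimal sketch using pandas. The inline CSV text, the column names (`date`, `temperature`, `rained`), and the helper name `load_clean` are all hypothetical stand-ins for your own data; the cleaning steps you need will depend on your data set.

```python
import io

import pandas as pd

# Hypothetical raw data with stray whitespace, inconsistent casing,
# and a missing value. In practice, pass a file path to load_clean instead.
RAW_CSV = """date, temperature ,rained
2016-01-01, 41.0, yes
2016-01-02, n/a, no
2016-01-03, 55.5, NO
2016-01-04, 61.2, Yes
"""


def load_clean(source):
    """Read a messy CSV into a tidy data frame (illustrative helper)."""
    df = pd.read_csv(source, skipinitialspace=True, na_values=["n/a"])
    # Remove stray whitespace from the column names.
    df.columns = df.columns.str.strip()
    # Parse dates and normalize the yes/no column to booleans.
    df["date"] = pd.to_datetime(df["date"])
    df["rained"] = (
        df["rained"].str.strip().str.lower().map({"yes": True, "no": False})
    )
    # Drop rows with no temperature reading.
    return df.dropna(subset=["temperature"]).reset_index(drop=True)


df = load_clean(io.StringIO(RAW_CSV))

# Export the cleaned data as CSV text; passing a file path to to_csv
# would write a file instead.
csv_text = df.to_csv(index=False)
```

Whether to drop or fill missing values is a judgment call for your project; dropping is used here only to keep the sketch short.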
Detailed Breakdown:
A well-organized iPython notebook with code and output
At least one visual for each independent variable and, if possible, its relationship to your dependent variable.
It's just as important to show what's not correlated as it is to show any actual correlations found.
Visuals should be well labeled and intuitive based on the data types.
For example, if your X variable is temperature and Y is "did it rain," a reasonable visual would be two histograms of temperature, one where it rained, and one where it didn't.
Tables are a perfectly valid visualization tool! Interweave them into your work.
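The temperature and rain example above could be sketched as follows with matplotlib and pandas. The data here is synthetic and purely illustrative, and the helper name `plot_hist_by_group` and the column names are assumptions, not part of the assignment.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, illustrative data only: 200 rainy days and 300 dry days.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": np.concatenate(
        [rng.normal(55, 8, 200), rng.normal(70, 8, 300)]
    ),
    "rained": [True] * 200 + [False] * 300,
})


def plot_hist_by_group(data, value_col, group_col, bins=20):
    """Draw one labeled histogram of value_col per level of group_col."""
    groups = sorted(data[group_col].unique())
    # Shared bin edges keep the panels directly comparable.
    edges = np.histogram_bin_edges(data[value_col], bins=bins)
    fig, axes = plt.subplots(
        1, len(groups), sharex=True, sharey=True,
        figsize=(5 * len(groups), 4),
    )
    axes = np.atleast_1d(axes)
    for ax, group in zip(axes, groups):
        ax.hist(data.loc[data[group_col] == group, value_col], bins=edges)
        ax.set_title(f"{group_col} = {group}")
        ax.set_xlabel(value_col)
    axes[0].set_ylabel("count")
    fig.tight_layout()
    return fig, axes


fig, axes = plot_hist_by_group(df, "temperature", "rained")

# A summary table of the same split, to sit alongside the plot.
summary = df.groupby("rained")["temperature"].describe()
```

In a notebook, the figure renders when the cell runs, and displaying `summary` in its own cell gives the accompanying table.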
Bonus:
Surface and share your analysis online. Jupyter makes this very simple and the setup should not take long.
Try experimenting with other visualization tools: Python with pandas-highcharts, R with Shiny, or, for a real challenge, D3 on its own. Interactive data analysis opens the door for others to easily interpret your work and explore the data themselves!
Submission:
TBD by instructor.
TIMELINE
Deadline | Deliverable | Description |
---|---|---|
Lesson 8 | Part 1 - Lightning Presentation | Present 3 Problem Statements |
Lesson 14 | Part 2 - Experiment Writeup | Research Design Problem Statement & Outline |
Lesson 16 | Part 3 - Exploratory Analysis | Dataset Approval and Exploratory Analysis |
Lesson 18 | Part 4 - Notebook Draft | iPython Notebook & Model Draft |
Lesson 20 | Part 5 - Presentation | Present Your Final Report |
EVALUATION
Your project will be assessed using the following standards:
Parse the Data
Rubric: Click here for the complete rubric.
Requirements for these standards will be assessed using the scale below:
While your total score may serve as a helpful gauge of whether you've met project goals, specific standards scores are more important since they can show you where to focus your efforts in the future!
RESOURCES
Suggestions for Getting Started
Keep the project simple! The "cool" part of the analysis will come; just looking at simple relationships between variables can be incredibly insightful.
Consider building some helper functions that help you quickly visualize and interpret data.
Exploratory data analysis should be formulaic; the code should not be holding you back. There are plenty of "starter code" examples from class materials.
DRY: Don't Repeat Yourself! If you find yourself copying and pasting code a lot, turn it into a function and use the function instead!
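One possible helper in this spirit is sketched below: a small function that tabulates how strongly each numeric variable correlates with the dependent variable, so weak relationships show up alongside strong ones. The function name `correlations_with_target` and the synthetic columns are hypothetical, and Pearson correlation (the pandas default) only captures linear relationships.

```python
import numpy as np
import pandas as pd


def correlations_with_target(data, target):
    """Return each numeric column's correlation with target,
    ordered from strongest to weakest by absolute value."""
    numeric = data.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Synthetic, illustrative data: one informative variable, one pure noise.
rng = np.random.default_rng(0)
strong_signal = rng.normal(size=500)
pure_noise = rng.normal(size=500)
example = pd.DataFrame({
    "strong_signal": strong_signal,
    "pure_noise": pure_noise,
    "y": 2 * strong_signal + rng.normal(scale=0.5, size=500),
})

result = correlations_with_target(example, "y")
```

Writing the table logic once and calling it for each candidate dependent variable avoids pasting the same `corr()` and sorting code into several notebook cells.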
Specific Tips
This deliverable should be similar to the work you did for Unit Project 2 earlier in the course.