Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/april_18/lessons/lesson-05/README.md
1904 views
---
title: Flex lab duration: "1:45" creator: name: Amy Roberts city: NYC
---

Flex lesson

DS | Lesson 5

Instructor Note: Feel free to run this workshop earlier, as needed

LEARNING OBJECTIVES

After this lesson, you will be able to:

  • Identify the data science toolkit

  • Navigate Git and the Command Line

  • Describe Probability vs Odds

STUDENT PRE-WORK

Before this lesson, you should already be able to:

  • Explain the difference between variance and bias

  • Use descriptive stats to understand your data

INSTRUCTOR PREP

Before this lesson, instructors will need to:

  • Determine which sections of this flex lesson are applicable for their class

  • Copy and modify the lesson slide deck as needed

  • Add to the "Additional Resources" section for this lesson

  • Optional: Prep for the Github Lesson

    • Draw the skeleton (boxes without content) of the landscape.jpg on the white board.

    • Fill it in as you talk through the different features.

    • Write commands (e.g., git push and git pull) on the lines connecting the correct tool to CL/Terminal as you discuss each one.

A notebook of [data visualization exercises](./code/Data Visualization Lab.ipynb ) is available.

LESSON GUIDE

TIMINGTYPETOPIC
5 minOpeningLesson Objectives
5 minIntroductionCommon Data Science Tools
10 minCommand Line DemoCommand Line Demo
10 minText Editor DemoText Editor Demo
10 miniPython NotebookiPython Notebooks
10 minGit ToolsIntro to Git
50 minIndependent PracticeGIT and CL
5 minIntroductionOdds and Probability
20 minGuided PracticeHand calcs
15 minWrap-upReview Guided Practice

Opening (5 min)

  • Review Current Lesson Objectives

Intro: Tools of the Trade (10 mins)

Today we are going to review some of the tools we use in data science and see how they fit into the wider programming environment.

Command line- This is your portal to your computer and the outside world.

Instructor note: Walk through the bottom half of this diagram: landscape.jpg.

Local machine

On your local machine you have a variety of tools at your disposal, including:

  1. Text editor

  2. Programs/Packages/Tools

  3. Your files

All of these can be accessed through terminal and many can also be accessed through a GUI, or Graphical User Interface.

Let's take a look at the file directory system through finder and then through terminal. Pull up the class folder on Finder. Then navigate to the same folder via terminal.

Command Line Demo

Demo a few commands-

  1. cd

  2. pwd

  3. $home

  4. mkdir

  5. open

As we mentioned, we can access many tools with terminal. Let's walk through a few that are important for data science.

So far we've been using iPython notebook in place of a text editor. However, there are lots of other options available, including: Emacs, Vim, Sublime.

Check: What is a text editor? Can you name any other examples?

Text Editor - Sublime

Let's do a demo of Sublime with python.

Instructor Note: Pull up the text editor and demo writing a simple python program. You can use say-hi.py .

iPython Notebook

Where does IPython Notebook fit in?

From the iPython Notebook docs- "The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results.

iPython notebooks combine two components:

  • A web application: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output.

  • Notebook documents: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects."

As you just saw, terminal allows us to run programs such as python or javascript (node). It also allows us to reach out to the outside world and add packages we may need.

When we do this with python we often use a tool called pip. Let's pip install a package. We could do this with a GUI, but the best way is to use the CL.

Here we will checkout a popular Python Library, Beautiful Soup:

pip install beautifulsoup4

Outside World

As we saw with pip, the CL can connect us to the outside world. In data this is particularly important.

Let's say we have HIPAA protected data (note: HIPAA is a policy that protects health data for people. It requires extra security so you can't leave data around on your local computer.) Often times it will be the data we'll leave on an external computer that we need to communicate with. We can do this through the CL.

Instructor Note: Demo SSH and/or Tunnel to a AWS or similar instance

Ok! so we've see a few of the ways that we can use terminal to enable our data analysis and help make it more powerful.

Now lets talk about a tool that make version control much easier.

Intro to Git

Git is a way of tracking changes we've made to our programs and go back in time to fix errors. It is also a powerful tool for collaborating with colleagues allowing you to work on different aspects of the project simultaneously and merge all the changes together seamlessly. Let's see how it works. There are lots of ways to use git one common tool is Github (it's a way of sharing your code with others on your team or with the world).

Let's see it in action. As I'm sure you've guessed, a common way to do this is through CL! You can also use the github GUI.

Instructor Note: Demo making a small change to say-hi.py (e.g. change howdy to hello). Save it.

git add git commit git push

When a colleague wants to implement our change, they can use a command called git pull.

Instructor Note: Draw this new connection on the board.

Ok! That was a lot of material. Let's review.

Check:

  1. What are the big advantages of using CL?

  • speed - connection with outside world - flexibility

  1. What's a GUI?

  • Graphical user interface (e.g. finder vs terminal )

  1. Will I destroy my computer if I use terminal?

  • NOPE! ...Although you should be careful with the sudo command and cautious when deleting files with rm

Independent Practice- GIT and Command Line (50 min)

Instructor Note - Review any of the following exercises from the course pre-work

  • "Codecademy" Python module

  • GA Command Line

Alternative Resources:

Introduction: Odds and Probability

How proficient are you with statistical odds? Let's review some quick basics about odds and probability so you can test your familiarity with these concepts.

Guided Practice- Odds and Probability (20 min)

Instructor Note: Walk through starter code with students.

Review (5-15 mins)

  • What did we cover today?

  • What are some common data science tools?

  • Why are these tools useful?

  • What questions do you have?


BEFORE NEXT CLASS

PROJECTTBA

ADDITIONAL RESOURCES

  • If any