GitHub Repository: UBC-DSCI/dsci-100-assets
Path: blob/master/2019-fall/slides/01_intro_jupyter_r.ipynb
Kernel: R

DSCI 100 - Introduction to Data Science

Lecture 1 - Getting started with Jupyter & R

2019-09-05

Teaching team introductions

Instructors:

  • Trevor Campbell

  • Tiffany Timbers

Teaching Assistants:

  • Daniel Alimohd

  • Jordan Bourak

  • Alex Chow

  • Grandon Seto

  • Petal Vitis

High-level goals of this course:

  1. Learn how to use reproducible tools (Jupyter + R) to do data analysis

  2. Learn how to solve 4 common problems in Data Science, and how to recognize when you have the means to do so

But wait, what is Data Science exactly???

In this course we define data science as:

the practice of obtaining value (i.e., insight) from data through reproducible and auditable processes.

Value (i.e., insight) is gained through asking and answering statistical questions.

Mapping statistical questions to data analyses

6 types of questions we can ask:

  1. Descriptive

  2. Exploratory

  3. Inferential

  4. Predictive

  5. Causal

  6. Mechanistic

See examples of each here

Problems we will focus on in DSCI 100:

  1. Predicting a class/category for a new observation/measurement (e.g., cancerous or benign tumour)

  2. Predicting a value for a new observation/measurement (e.g., 10 km race time for 20-year-old females with a BMI of 25)

  3. Finding previously unknown/unlabelled subgroups in your data (e.g., products commonly bought together on Amazon)

  4. Estimating an average or a proportion from a representative sample (group of people or units) and using that estimate to generalize to the broader population (e.g., the proportion of undergraduate students who own an iPhone)
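To make the fourth problem concrete, here is a minimal sketch in base R. The population size, the 60% ownership rate, and the sample size are all made-up numbers chosen for illustration; they are not real survey results.

```r
# Hypothetical population of 10,000 undergraduates (made-up numbers):
# TRUE means the student owns an iPhone, FALSE means they do not.
set.seed(2019)  # make the random sample reproducible
population <- c(rep(TRUE, 6000), rep(FALSE, 4000))
true_proportion <- mean(population)  # 0.6 by construction

# In practice we cannot ask everyone, so we draw a random sample of 500
# students (without replacement) and compute the proportion in the sample.
student_sample <- sample(population, size = 500)
estimated_proportion <- mean(student_sample)

# The sample proportion is our estimate of the population proportion.
print(true_proportion)
print(estimated_proportion)
```

The estimate will usually be close to, but not exactly equal to, the true proportion; how close we can expect it to be is the kind of question this problem is about.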

Course syllabus:

Read on your own time: https://github.com/UBC-DSCI/dsci-100/blob/master/README.md

TL;DR

Well, please do read the syllabus later... but for now...

Flipped classroom

  • read text/watch videos before class

  • We will kick off the lectures with a little intro (like today)

  • do lecture worksheets and activities in class (Thursdays), due Saturdays at 6pm

  • work on tutorial questions in class (Tuesdays), due Wednesdays at 10pm

  • you will need a laptop, Chromebook, or similar device in every class! Don't have one? Borrow one from the library (see here).

Everything will be posted as links/buttons in Canvas

Collaborate

  • talk to each other (in class, on Piazza) as you work through the worksheets and tutorials

  • group project from the middle to the end of the course

  • follow the DSCI 100 course code of conduct (TL;DR be respectful, inclusive and nice!)

First week learning goals:

  • use a Jupyter notebook to execute provided R code

  • edit code and markdown cells in a Jupyter notebook

  • create new code and markdown cells in a Jupyter notebook

  • load the tidyverse library into R

  • create new variables and objects in R using the assignment symbol

  • use the help and documentation tools in R

  • match the names of the following functions from the tidyverse library to their documentation descriptions: read_csv, select, mutate, filter, ggplot, aes
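As a rough preview of how these goals fit together, the sketch below loads the tidyverse, creates objects with the assignment symbol, and calls each of the listed functions once. It assumes the tidyverse package is already installed. The built-in mtcars data set, the temporary CSV file, and all object names (csv_path, cars, cars_small, four_cyl, mpg_plot) are choices made for this illustration, not part of the course worksheets; geom_point is an extra ggplot2 function used only to draw the points.

```r
# Load the tidyverse library (includes readr, dplyr and ggplot2)
library(tidyverse)

# Write a built-in data set to a temporary CSV file so the example
# is self-contained, then read it back in with read_csv.
csv_path <- tempfile(fileext = ".csv")
write_csv(mtcars, csv_path)
cars <- read_csv(csv_path)

# select: keep only the columns we need
cars_small <- select(cars, mpg, cyl, wt)

# mutate: add a new column (wt is recorded in units of 1000 lbs)
cars_small <- mutate(cars_small, wt_kg = wt * 1000 * 0.4536)

# filter: keep only the rows for four-cylinder cars
four_cyl <- filter(cars_small, cyl == 4)

# ggplot + aes: map columns to the x and y axes, then draw points
mpg_plot <- ggplot(four_cyl, aes(x = wt_kg, y = mpg)) +
  geom_point()

# Help and documentation tools (only useful in an interactive session)
if (interactive()) {
  ?read_csv
  help("mutate")
}
```

In a Jupyter notebook, typing mpg_plot on its own line at the end of a cell displays the plot.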

We've got a lot to do! Let's get started!

Jupyter notebook demo time!

Now it's your turn!

  • Use your neighbours, the TAs & Instructors and the textbook reading to help you get unstuck when needed!

  • I will interrupt in about 20 minutes for a class activity.

Class activity:

Practice using LaTeX and code formatting in Piazza!

Your task: Create a Piazza post in the class_activity folder to say hello and introduce yourself to everyone. In that post include the following code formatted as code, and the following LaTeX formatted as Math:

Code to include:

whyamihere <- "to learn data science!"

LaTeX to include:

data = (statistics + computer \: science)^2

What did we learn today?

  • The basics of using R

  • How to use Jupyter notebooks

  • How to ask for help on Piazza

  • That you can use Jupyter with R