DSCI 100 - Introduction to Data Science
Lecture 1 - Getting started with Jupyter & R
2019-09-05
Teaching team introductions
Instructors:
Trevor Campbell
Tiffany Timbers
Teaching Assistants:
Daniel Alimohd
Jordan Bourak
Alex Chow
Grandon Seto
Petal Vitis
High-level goals of this course:
Learn how to use reproducible tools (Jupyter + R) to do data analysis
Learn how to solve 4 common problems in Data Science, and how to recognize when you have the means to do so
But wait, what is Data Science exactly???
In this course we define data science as:
obtaining value (i.e., insight) from data through reproducible and auditable processes.
Value (i.e., insight) is gained through asking and answering statistical questions.
Mapping statistical questions to data analyses
6 Types of questions we can ask:
Descriptive
Exploratory
Inferential
Predictive
Causal
Mechanistic
See examples of each here
Problems we will focus on in DSCI 100:
Predicting a class/category for a new observation/measurement (e.g., cancerous or benign tumour)
Predicting a value for a new observation/measurement (e.g., 10 km race time for 20-year-old females with a BMI of 25)
Finding previously unknown/unlabelled subgroups in your data (e.g., products commonly bought together on Amazon)
Estimating an average or a proportion from a representative sample (group of people or units) and using that estimate to generalize to the broader population (e.g., the proportion of undergraduate students that own an iPhone)
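As a rough illustration of the fourth problem, here is a minimal base R sketch that estimates a proportion from a random sample. The population of 10,000 students, the 60% ownership rate, and the sample size of 100 are all invented for this example; they are not course data.

```r
# Hypothetical population: 10,000 students, TRUE means "owns an iPhone".
# The 60% ownership rate is an assumption made up for this illustration.
set.seed(100)
population <- c(rep(TRUE, 6000), rep(FALSE, 4000))

# Draw a random sample of 100 students (without replacement)
students_sampled <- sample(population, size = 100)

# The proportion in the sample is our estimate of the population proportion
estimate <- mean(students_sampled)
estimate
```

In practice we never see the whole population, so the estimate from the sample is all we have; changing the seed gives a slightly different estimate each time.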
Course syllabus:
Read on your own time: https://github.com/UBC-DSCI/dsci-100/blob/master/README.md
TL;DR
Well, please do read the syllabus later... but for now...
Flipped classroom
read text/watch videos before class
We will kick off the lectures with a little intro (like today)
do lecture worksheets and activities in class (Thursdays), due Saturdays at 6pm
work on tutorial questions in class (Tuesdays), due Wednesdays at 10pm
you will need a laptop/chromebook/etc in every class! Don't have one? Borrow one from the library (see here).
Everything will be posted as links/buttons in Canvas
Collaborate
talk to each other (in class, on Piazza) as you work through the worksheets and tutorials
group project at middle-end of course
follow the DSCI 100 course code of conduct (TL;DR be respectful, inclusive and nice!)
First week learning goals:
use a Jupyter notebook to execute provided R code
edit code and markdown cells in a Jupyter notebook
create new code and markdown cells in a Jupyter notebook
load the tidyverse library into R
create new variables and objects in R using the assignment symbol
use the help and documentation tools in R
match the names of the following functions from the tidyverse library to their documentation descriptions: read_csv, select, mutate, filter, ggplot, aes
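To give a feel for several of these goals at once, here is a small sketch that could be run in a code cell. It assumes the tidyverse package is already installed, and the races data (including its column names) is hypothetical, made up only for this example. read_csv is not shown because it needs a data file.

```r
# Load the tidyverse library (assumes it is already installed)
library(tidyverse)

# Create a new object with the assignment symbol `<-`
# (hypothetical data, invented for this illustration)
races <- tibble(
  age = c(20, 25, 31, 42),
  time_min = c(52.1, 48.3, 55.0, 60.2)
)

# filter keeps rows, mutate adds a new column, select keeps columns
young_runners <- races %>%
  filter(age < 40) %>%
  mutate(time_hr = time_min / 60) %>%
  select(age, time_hr)

# ggplot creates a plot; aes maps columns to the x and y axes
race_plot <- ggplot(races, aes(x = age, y = time_min)) +
  geom_point()
```

Typing a question mark before a function name in a code cell, for example ?filter, opens its documentation.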
We've got a lot to do! Let's get started!
Jupyter notebook demo time!
Now it's your turn!
Everyone, navigate to Canvas and open the assignment worksheet_01.
Use your neighbours, the TAs & Instructors and the textbook reading to help you get unstuck when needed!
I will interrupt in about 20 minutes for a class activity.
Class activity:
Practice using LaTeX and code formatting in Piazza!
Your task: Create a Piazza post in the class_activity folder to say hello and introduce yourself to everyone. In that post, include the following code formatted as code, and the following LaTeX formatted as math:
Code to include:
LaTeX to include:
What did we learn today?
How to use the basics in R
How to use Jupyter notebooks
How to ask for help on Piazza
That you can use Jupyter with R