GitHub Repository: UBC-DSCI/dsci-100-assets
Path: blob/master/2019-spring/slides/07_classification_continued.ipynb
Kernel: R

DSCI 100 - Introduction to Data Science

Lecture 7 - Classification continued

2019-02-14

Continuing with the classification problem

Can we use data we have seen in the past to predict something about the future?

Unanswered questions from last week

  1. Is our model any good?

  2. How do we choose k?

1. Is our model any good?

Is one accuracy measurement good enough?
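One way to see why a single accuracy measurement may not be enough: the number you get depends on which observations happen to land in the test set. Below is a minimal, self-contained sketch (in Python, for illustration only; the notebook itself runs R) using made-up one-dimensional data and a tiny k-nearest-neighbours predictor. All names and the data are hypothetical.

```python
import random

# Made-up toy data: one numeric feature per point, two overlapping classes.
random.seed(1)
data = [(random.gauss(0, 2), "A") for _ in range(30)] + \
       [(random.gauss(3, 2), "B") for _ in range(30)]

def knn_predict(train, x, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

def split_accuracy(seed):
    """Accuracy estimated from ONE random 75/25 train/test split."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.75 * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    correct = sum(knn_predict(train, x) == lab for x, lab in test)
    return correct / len(test)

# The same classifier, evaluated on five different random splits,
# can give noticeably different accuracy estimates:
accuracies = [split_accuracy(s) for s in range(5)]
print(accuracies)
```

Because each split gives a somewhat different number, trusting any single one of them is risky; this motivates averaging over several splits.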

Cross-validation as an alternative approach
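Cross-validation addresses the single-split problem by letting every fold of the data serve as the validation set once, then averaging the accuracies. A minimal sketch of 5-fold cross-validation (again in Python for illustration; function names and the toy data are invented):

```python
import random

# Made-up toy data: one numeric feature per point, two overlapping classes.
random.seed(2)
data = [(random.gauss(0, 2), "A") for _ in range(30)] + \
       [(random.gauss(3, 2), "B") for _ in range(30)]
random.shuffle(data)

def knn_predict(train, x, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, n_folds=5, k=3):
    """Each of the n_folds folds is held out once as a validation set;
    the classifier is trained on the rest, and accuracies are averaged."""
    fold_size = len(data) // n_folds
    scores = []
    for i in range(n_folds):
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        correct = sum(knn_predict(train, x, k) == lab for x, lab in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)

print(cross_val_accuracy(data))
```

The averaged estimate is more stable than any single split because every observation is used for validation exactly once.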

2. How do we choose k?
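A standard way to choose k is to compute the cross-validated accuracy for several candidate values and keep the one that does best. A self-contained sketch (Python, hypothetical data and names, for illustration only):

```python
import random

# Made-up toy data: one numeric feature per point, two overlapping classes.
random.seed(3)
data = [(random.gauss(0, 2), "A") for _ in range(30)] + \
       [(random.gauss(3, 2), "B") for _ in range(30)]
random.shuffle(data)

def knn_predict(train, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, k, n_folds=5):
    """Average validation accuracy over n_folds folds for a given k."""
    fold_size = len(data) // n_folds
    scores = []
    for i in range(n_folds):
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        correct = sum(knn_predict(train, x, k) == lab for x, lab in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)

# Try several candidate values of k and keep the best cross-validated one.
results = {k: cross_val_accuracy(data, k) for k in [1, 3, 5, 7, 9]}
best_k = max(results, key=results.get)
print(results, best_k)
```

Note that the chosen k is itself a product of the validation data, which is exactly why a final, untouched test set is still needed.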

The big picture


  • in reality, there are many iterations of the cross-validation stage, where you fiddle with your classifier (e.g., trying different values of k) to find the best one

  • set aside some data until the very end, once you are done fiddling, so you don't "cheat" by tuning your classifier to the same data you evaluate it on
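The workflow in these bullets amounts to a three-way split of the data. A minimal sketch of the split itself (Python, with made-up sizes, for illustration only; the 60/20/20 proportions are an assumption, not a rule from the course):

```python
import random

# Hypothetical dataset of 100 labelled observations (represented by ids here).
random.seed(4)
data = list(range(100))
random.shuffle(data)

# 60% training, 20% validation (for the fiddling/tuning stage),
# 20% test (locked away until the very end).
n = len(data)
train = data[:int(0.6 * n)]
validation = data[int(0.6 * n):int(0.8 * n)]
test = data[int(0.8 * n):]

# Tune freely using train + validation; touch `test` exactly once,
# at the very end, to report final performance.
print(len(train), len(validation), len(test))
```

Shuffling before splitting matters: if the data are ordered (e.g., all of one class first), a non-random split would make the subsets unrepresentative.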

Why are we doing all this???

Our question:

  • Can we use past information to predict the class labels of new observations we don't have labels for?

  • We can always make such predictions, but we should only trust them if we have evidence that we can predict well.

Class activity 1

  • In your group, discuss and explain cross-validation in your own words. Post your group's answer as a response to this post in Piazza.

Class activity 2

  • In your group, discuss and explain what test, validation, and training data sets are in your own words. Post your group's answer as a response to this post in Piazza.