DSCI 100 - Introduction to Data Science
Lecture 7 - Classification continued
2019-02-14
Continuing with the classification problem
Can we use data we have seen in the past to predict something about the future?
Unanswered questions from last week
Is our model any good?
How do we choose k?
1. Is our model any good?
Is one accuracy measurement good enough?
Cross-validation as an alternative approach
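To make the idea concrete, here is a minimal sketch of k-fold cross-validation for a k-nearest-neighbours classifier, written in plain Python with only the standard library. It is an illustration, not the course's own code: the helper names (`knn_predict`, `cross_val_accuracies`), the choice of 5 folds, and the synthetic two-class data are all assumptions made for this example. The point is that each fold gives a different accuracy, so a single measurement can mislead, while the average over folds is a steadier estimate.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracies(X, y, k, n_folds=5, seed=0):
    """Return one validation accuracy per fold (hypothetical helper)."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold: disjoint, near-equal folds.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Score only on the held-out fold, which the classifier has not seen.
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return accuracies

# Synthetic two-class data, made up purely for illustration.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
    [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(50)]
y = ["a"] * 50 + ["b"] * 50

fold_accuracies = cross_val_accuracies(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print("per-fold accuracies:", fold_accuracies)
print("mean accuracy:", mean_accuracy)
```

Comparing the per-fold values with their mean shows how much one accuracy measurement can vary depending on which observations happen to be held out.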
2. How do we choose k?
The big picture
In reality, the cross-validation stage involves many iterations, in which you tune your classifier to try to find the best one.
Save some data for the very end, once you are done tuning, so you don't "cheat" by evaluating on data that influenced your choices.
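The workflow above can be sketched end to end as follows. This is a hedged illustration in plain Python, not the course's implementation: the helper names (`knn_predict`, `accuracy`, `cv_accuracy`), the 25% test fraction, the candidate values of k, and the synthetic data are all assumptions chosen for the example. The test set is set aside first, k is chosen by cross-validation on the training data only, and the test set is used exactly once at the end.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label is correct."""
    correct = sum(
        knn_predict(train_X, train_y, p, k) == label
        for p, label in zip(eval_X, eval_y)
    )
    return correct / len(eval_X)

def cv_accuracy(X, y, k, n_folds=5):
    """Mean validation accuracy over n_folds folds (data assumed already shuffled)."""
    scores = []
    for f in range(n_folds):
        val_idx = set(range(f, len(X), n_folds))
        tr_X = [x for i, x in enumerate(X) if i not in val_idx]
        tr_y = [l for i, l in enumerate(y) if i not in val_idx]
        va_X = [x for i, x in enumerate(X) if i in val_idx]
        va_y = [l for i, l in enumerate(y) if i in val_idx]
        scores.append(accuracy(tr_X, tr_y, va_X, va_y, k))
    return sum(scores) / n_folds

# Synthetic two-class data, made up purely for illustration.
rng = random.Random(2)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(60)] + \
       [((rng.gauss(2, 1), rng.gauss(2, 1)), "b") for _ in range(60)]
rng.shuffle(data)

# 1. Set the test set aside before doing anything else.
n_test = len(data) // 4
test, train = data[:n_test], data[n_test:]
train_X, train_y = [x for x, _ in train], [l for _, l in train]
test_X, test_y = [x for x, _ in test], [l for _, l in test]

# 2. Tune: compare candidate values of k using the training data only.
candidate_ks = [1, 3, 5, 7, 9, 11]
cv_scores = {k: cv_accuracy(train_X, train_y, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)

# 3. Once tuning is finished, evaluate a single time on the untouched test set.
test_accuracy = accuracy(train_X, train_y, test_X, test_y, best_k)
print("cross-validation accuracy per k:", cv_scores)
print("chosen k:", best_k, "test accuracy:", test_accuracy)
```

Because the test observations never influence the choice of k, the final test accuracy is an honest estimate of how the chosen classifier would do on new data.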
Why are we doing all this?
Our question:
Can we use past information to predict the class labels of new observations we don't have labels for?
We can always make such predictions, but we should only rely on them if we have evidence that we can make them well.
Class activity 1
In your group, discuss and explain cross-validation in your own words. Post your group's answer as a response to this post in Piazza.
Class activity 2
In your group, discuss and explain what test, validation, and training data sets are in your own words. Post your group's answer as a response to this post in Piazza.