Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Jupyter notebook 28_Machine_Learning_III/ML_3_Inclass_Homework.ipynb
Discussion
Clump together at a set of tables with a TA. Discuss your thoughts about the pre-class reading material.
Game time!
Now that we have this capability and we've seen some of the dangers, we're going to spend this week on a game. In this game, we have two goals: 1) we want to build the best predictor that we can, but 2) at all times we want to have an accurate idea of how well the predictor works.
For this game, we've managed to get our hands on some data about two diseases (D1 and D2). Each of these datasets has features in columns and examples in rows. Each feature represents a clinical measurement, while each row represents a person. We want to be able to predict whether or not a person has a disease (the last column).
We'll supply you with four datasets for each disease throughout the week. For the first day, we've given you two of them. We also provide example code to read the data. From there, the path that you take is up to you. We do not know the best predictor, or even what the maximum achievable accuracy is for these data! This is a chance to experiment and find out what best captures disease status.
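As a rough sketch of the loading step, the snippet below reads a comma-separated table and separates features from labels, assuming (as described above) that the last column holds disease status. The helper name `load_dataset` and the tiny in-memory CSV are hypothetical stand-ins; with the real data you would pass an open file such as "D1_S1.csv" instead.

```python
import io

import numpy as np


def load_dataset(source, delimiter=","):
    """Hypothetical helper: every column but the last is a feature, the last is the label."""
    data = np.loadtxt(source, delimiter=delimiter, ndmin=2)
    features = data[:, :-1]  # all rows, every column except the last
    labels = data[:, -1]     # all rows, last column only (disease status)
    return features, labels


# Made-up values standing in for a real file; each row is one person.
demo_csv = io.StringIO("0.1,1.2,0\n0.4,0.9,1\n0.3,1.1,0\n")
demo_features, demo_labels = load_dataset(demo_csv)
print(demo_features.shape, demo_labels)
```

Slicing with `[:, :-1]` avoids hard-coding the number of feature columns, which may help if the datasets for the two diseases differ in width.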
The machine learning algorithm that we've already introduced, SVM, has many settings that you can change. You've already experimented with the C parameter, and you could change other options as well. You may want to try different "kernel" settings, other "C" values, or even a different underlying algorithm!
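One possible way to explore these settings is sketched below, assuming scikit-learn's `SVC` and 5-fold cross-validation. The data come from `make_classification`, a synthetic stand-in rather than the course datasets, and the particular kernels and C values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the D1/D2 datasets).
X, y = make_classification(n_samples=120, n_features=20, random_state=0)

results = {}
for kernel in ("linear", "rbf"):
    for C in (0.1, 1.0, 10.0):
        # Mean accuracy over 5 cross-validation folds for this setting.
        scores = cross_val_score(SVC(kernel=kernel, C=C), X, y, cv=5)
        results[(kernel, C)] = scores.mean()

for (kernel, C), accuracy in sorted(results.items()):
    print(f"kernel={kernel:6s} C={C:5.1f} mean CV accuracy={accuracy:.3f}")
```

Keep in mind that picking the best setting from such a table and then reporting its cross-validated score tends to be optimistic, so a separate held-out set is still useful for the second goal of the game.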
If you feel like trying entirely different algorithms, a few potential ones are demonstrated in scikit-learn's documentation: http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
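A minimal sketch of such a comparison follows, assuming a simple train/test split and a handful of classifiers that also appear in that scikit-learn example. The synthetic data and the chosen hyperparameters are illustrative assumptions, not recommendations for the disease datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the D1/D2 datasets).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out a quarter of the examples; they are never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

candidates = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive Bayes": GaussianNB(),
}

held_out_accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # score() reports the fraction of held-out examples classified correctly.
    held_out_accuracy[name] = model.score(X_test, y_test)

for name, accuracy in held_out_accuracy.items():
    print(f"{name:20s} held-out accuracy={accuracy:.3f}")
```

Because every model is scored on examples it never saw during fitting, the numbers give a fairer comparison than training accuracy would.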
In the interest of recording your research steps, whatever you change should be noted in the IPython notebook. We provide an example first move below. In every case, please label the move number, the goal (what you hope to implement), the rationale (why you've chosen to implement that, or to make that move as a result of the prior move), and an expected accuracy, which you fill out after you build and run your code.
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-1-2dd336d7980e> in <module>()
19
20 # use numpy to load our training set
---> 21 d1_train = np.loadtxt(open("D1_S1.csv", "rb"), delimiter=",")
22 # features are all rows for columns before 200
23 d1_train_features = d1_train[:,:200]
IOError: [Errno 2] No such file or directory: 'D1_S1.csv'