GitHub Repository: quantum-kittens/platypus
Path: blob/main/notebooks/summer-school/2021/lec4.2.ipynb
Kernel: Python 3

Advanced Classical Machine Learning

This is the second session of the classical machine learning (ML) introduction presented by Amira. She presents a brief history of ML, then introduces simple ML methods, i.e., linear-regression-based models. Amira then introduces neural networks, starting with the simple perceptron and then illustrating Feed-Forward Neural Networks (FFNNs). In the final section, she introduces Support Vector Machine (SVM) concepts for classification tasks. She closes with a very high-level overview of Quantum Machine Learning, using a quadrant diagram that arranges classical and quantum machine learning with the data-processing device on one axis and the data-generating system on the other.

  • Download the lecturer's notes here

FAQ

How are linear models related to the standard statistical method of least squares for regression analysis? They are the same: fitting a linear model by minimizing squared error is exactly least-squares regression.
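The equivalence above can be sketched in a few lines of NumPy: fitting a linear model by least squares recovers the parameters of the data-generating line. The dataset here is hypothetical, made up purely for illustration.

```python
import numpy as np

# Hypothetical 1-D dataset: y ≈ 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.05, size=x.shape)

# Design matrix with a column of ones so the model has an intercept
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||X @ theta - y||^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1.0, 2.0] — the intercept and slope we generated
```

The recovered `theta` is the same answer a standard statistics package would report for linear regression on this data.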
Are the activation functions in each layer the same or different? - "It can be different" - Boniface Yogendran - "Usually the last activation function is different; the rest are all kept the same." - Edwin
Just confirming... the more data there is, the higher accuracy the model can reach? How can we combat a lack of available data, especially when creating a CNN? One way is to replicate the existing data and augment it: vertical flip, horizontal flip, crop, blur, etc.
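The flip-based augmentations mentioned in the answer can be sketched with plain NumPy; the 4×4 "image" below is a stand-in for real training data (crop and blur work the same way, just with different transforms).

```python
import numpy as np

# A hypothetical 4x4 grayscale "image" standing in for a real training sample
img = np.arange(16, dtype=float).reshape(4, 4)

# Each flip produces a new, label-preserving training sample
augmented = [
    img,              # original
    np.flipud(img),   # vertical flip (up/down)
    np.fliplr(img),   # horizontal flip (left/right)
]
print(len(augmented))  # one image became three training samples
```

Applying the same flip twice returns the original image, so these transforms never destroy information — they only multiply the effective dataset size.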
Do linear models in general get more/less accurate when taking more features into account? Taking more features into account tends to cause overfitting; taking fewer gives a more biased output.
On the regression model, does theta include the magnitude of the intercept? Yes, it's θ₀.
During the lecture it was stated that one can easily add or leave out a bias; how does this work? Could you please explain? - A1: For example, in a linear model you can change the value of the intercept to move the boundary within the plane.
  • A2: The main function of a bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives). You can achieve that with a single bias node with connections to N nodes, or with N bias nodes each with a single connection; the result should be the same.
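A2's point — the bias is just a trainable constant added to each node's weighted sum — can be shown with a minimal single-neuron sketch (the `node` function and its inputs are made up for illustration):

```python
import numpy as np

def node(x, w, b):
    """One neuron: weighted sum of inputs plus a trainable bias constant."""
    return np.dot(w, x) + b

x = np.array([1.0, 2.0])     # inputs to the node
w = np.array([0.5, -0.25])   # trained weights

# Changing only b shifts the node's output, i.e. moves the decision boundary
print(node(x, w, b=0.0))  # 0.0
print(node(x, w, b=1.0))  # 1.0
```

Whether `b` comes from one shared bias node or a per-node bias connection, the computed value `w·x + b` is identical, which is exactly why the two wirings in A2 are equivalent.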

Is it correct to think of neural networks (NNs) as models that can be used for non-linear fitting in general? Would it be overkill to use NNs for datasets with linear dependencies? NNs require lots of data and are computationally expensive. If the data is linear, we can get away with less computationally expensive ML methods.
Is there a rule of thumb for the selection of the weights, and thus the number of neurons? In practice the number of neurons in each layer is often a power of two (2^x). The number of layers is your call. Weights are selected for you automatically during backpropagation (optimization); you just initialize the weight vector with random numbers. That's it.
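The recipe in the answer — pick layer widths (often powers of two), then initialize the weight matrices with small random numbers and let training adjust them — can be sketched as follows; the specific sizes are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer widths chosen as powers of two, per the heuristic above
layer_sizes = [4, 16, 8, 1]  # input -> hidden -> hidden -> output

# One weight matrix per pair of adjacent layers, filled with small
# random numbers; backpropagation later tunes these values
weights = [rng.normal(0.0, 0.1, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for W in weights:
    print(W.shape)  # (16, 4), (8, 16), (1, 8)
```

Each matrix maps one layer's activations to the next, so its shape is (neurons out, neurons in); only the random starting values here are "selected" by hand.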
Does noise affect all these models? Yes, it does. It wouldn't if your model generalized perfectly, but that hasn't been achieved yet; in one way or another, models are still affected by some kind of noise.
Can we use a circular or elliptical function instead of a feature map for data that is not linearly separable? Yes, but feature mapping is easier.
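A minimal sketch of why a feature map handles circularly arranged data: adding a radial feature x² + y² lifts the points into 3-D, where a flat plane separates them. The points and the map below are hypothetical examples, not from the lecture.

```python
import numpy as np

# Points inside vs. outside the unit circle: not linearly separable in 2-D
inner = np.array([[0.1, 0.2], [-0.3, 0.1]])
outer = np.array([[1.5, 0.0], [0.0, -2.0]])

def feature_map(points):
    """Map (x, y) -> (x, y, x^2 + y^2): append a radial third feature."""
    r2 = (points ** 2).sum(axis=1, keepdims=True)
    return np.hstack([points, r2])

# In the lifted space, the plane z = 1 separates the two classes
print(feature_map(inner)[:, 2])  # all values < 1
print(feature_map(outer)[:, 2])  # all values > 1
```

The linear boundary z = 1 in 3-D corresponds to the circle x² + y² = 1 back in 2-D, which is exactly the "circular function" the question asks about.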

Live Q&A

What are the different ways to map the data in higher dimensions? Answer was provided at timestamp 2m 29s in the Lecture 4.2 Live Q&A session
Could you please explain again what a kernel is? Answer was provided at timestamp 4m 48s in the Lecture 4.2 Live Q&A session
How do we decide the bias? And does it remain the same throughout training? Answer was provided at timestamp 9m 30s in the Lecture 4.2 Live Q&A session
Someone in the previous video wanted a bit more clarification on why the dual formulation is useful. Answer was provided at timestamp 10m 40s in the Lecture 4.2 Live Q&A session
What if, after applying the feature map, data is still not linearly separable? Answer was provided at timestamp 13m 4s in the Lecture 4.2 Live Q&A session
Why do we not have a different activation function for each parameter set? Answer was provided at timestamp 14m 46s in the Lecture 4.2 Live Q&A session
How can we choose a good activation function for a FFNN? Answer was provided at timestamp 17m 21s in the Lecture 4.2 Live Q&A session
How do we incorporate the optimization of the distance between the linear model and the nearest points in each class into the SVM? Answer was provided at timestamp 20m 5s in the Lecture 4.2 Live Q&A session

Suggested Reading

  • Deep Learning by Ian Goodfellow et al.

  • Neural Network Youtube Series by 3blue1brown