Path: blob/master/deep_learning/softmax_tensorflow.ipynb
TensorFlow
TensorFlow provides multiple APIs. The lowest level API, TensorFlow Core, provides you with complete programming control. TensorFlow Core is recommended for machine learning researchers and others who require fine levels of control over their models.
Hello World
We can think of TensorFlow Core programs as consisting of two discrete sections:
- Building the computational graph.
- Running the computational graph.
We can think of TensorFlow as a system for defining our computation: from the operations we define, it constructs a computation graph (where each operation becomes a node in the graph). The computation graph we've defined will not run unless we give it some context and explicitly tell it to do so. In this case, we create a Session, which encapsulates the environment in which the graph's objects are evaluated (i.e. executes the operations defined in the graph).
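A minimal sketch of these two phases, assuming the TensorFlow 1.x API (where tf.Session is available):

```python
import tensorflow as tf

# Phase 1: build the graph -- a single constant op holding a string tensor
hello = tf.constant('Hello, TensorFlow!')

# Phase 2: nothing runs until we create a Session and explicitly evaluate the node
with tf.Session() as sess:
    print(sess.run(hello))  # b'Hello, TensorFlow!'
```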
Consider another example that simply adds and multiplies two constant numbers.
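Something along the following lines (the constant values 3.0 and 4.0 are just illustrative):

```python
import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(4.0)

add = tf.add(a, b)       # equivalently a + b
mul = tf.multiply(a, b)  # equivalently a * b

with tf.Session() as sess:
    print(sess.run([add, mul]))  # [7.0, 12.0]
```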
The example above is not especially interesting because it always produces a constant result. A graph can be parameterized to accept external inputs, known as placeholders. Think of them as the input data we would give to the machine learning algorithm at some point.

We can do the same operation as above by first defining a placeholder (note that we must specify the data type). Then we feed in values using feed_dict when we run it.
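A sketch of the same computation using placeholders (again assuming the 1.x API; the fed values are illustrative):

```python
import tensorflow as tf

# Placeholders: we must specify the data type up front
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)

add = a + b
mul = a * b

with tf.Session() as sess:
    # Supply the actual values at run time via feed_dict
    print(sess.run([add, mul], feed_dict={a: 3.0, b: 4.0}))  # [7.0, 12.0]
```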
Some matrix operations are the same as in numpy. For example, the functionality of numpy.mean and tensorflow.reduce_mean is the same; the axis argument controls which axis the mean is computed across. When axis is 1, it computes the mean across (3, 4), (5, 6) and (6, 7), i.e. the operation runs along the columns, so it computes the mean of each row. When axis is 0, the mean is computed across (3, 5, 6) and (4, 6, 7), and so on. The same applies to argmax, which returns the index of the maximum value along an axis.
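To make this concrete, here is a small comparison using the 3 x 2 matrix implied above (the exact print formatting differs slightly between numpy and TensorFlow):

```python
import numpy as np
import tensorflow as tf

X = np.array([[3., 4.],
              [5., 6.],
              [6., 7.]])

print(np.mean(X, axis=0))  # [4.667 5.667] -> mean of (3,5,6) and (4,6,7)
print(np.mean(X, axis=1))  # [3.5 5.5 6.5] -> mean of each row

with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(X, axis=0)))  # matches np.mean(X, axis=0)
    print(sess.run(tf.reduce_mean(X, axis=1)))  # matches np.mean(X, axis=1)
    print(sess.run(tf.argmax(X, axis=1)))       # [1 1 1], index of each row's max
```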
Linear Regression
We'll start off by writing a simple linear regression model. To do so, we first need to understand the difference between tf.Variable and tf.placeholder.
From Stackoverflow: the difference is that with tf.Variable you have to provide an initial value when you declare it. With tf.placeholder you don't have to provide an initial value; you can specify it at run time with the feed_dict argument inside Session.run. In short, we will use tf.Variable for trainable variables such as the weights (W) and biases (B) for our model. On the other hand, tf.placeholder is used to feed actual training examples.
Also note that constants are automatically initialized when we call tf.constant, and their value can never change. By contrast, variables are not initialized when we call tf.Variable. To initialize all the variables in a TensorFlow program, we must explicitly call a special operation called tf.global_variables_initializer(). Things will become clearer with the example below.
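As a sketch, here is a minimal linear regression along the lines of the classic TensorFlow getting-started example; the initial values, learning rate, and toy data are all illustrative:

```python
import numpy as np
import tensorflow as tf

# Trainable parameters: tf.Variable requires an initial value
W = tf.Variable(0.3, dtype=tf.float32)
b = tf.Variable(-0.3, dtype=tf.float32)

# Training data goes into placeholders, fed at run time
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = W * x + b
loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of squared errors

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)

# Variables are only initialized once this op is run
init = tf.global_variables_initializer()

x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

with tf.Session() as sess:
    sess.run(init)
    for _ in range(1000):
        sess.run(train, feed_dict={x: x_train, y: y_train})
    print(sess.run([W, b]))  # should approach W = -1, b = 1
```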
MNIST Using Softmax
MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:
Each image is 28 pixels by 28 pixels, which is essentially a 28 x 28 array of numbers. To use it in the context of a machine learning problem, we can flatten this array into a vector of 28 x 28 = 784 numbers; this will be the number of features for each image. It doesn't matter how we flatten the array, as long as we're consistent between images. Note that flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer vision methods do exploit this structure. But the simple method we will be using here, a softmax regression (defined below), won't.
The dataset also includes labels for each image, telling us which digit each image depicts. For example, the labels for the above images are 5, 0, 4, and 1. Here we're going to train a softmax model to look at images and predict what digits they are. The possible label values in the MNIST dataset are the numbers 0 through 9, so this will be a 10-class classification problem.
In the following code chunk, we define the overall computational graph/structure for the softmax classifier using the cross entropy cost function as the objective. Recall that the formula for this function can be denoted as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Where $y$ is our predicted probability distribution, and $y'$ is the true distribution.
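A sketch of what that graph might look like (the variable names such as y_true are illustrative; 784 and 10 come from the flattened image size and the number of classes):

```python
import tensorflow as tf

# Each flattened image has 784 features; None lets us feed any batch size
x = tf.placeholder(tf.float32, [None, 784])
y_true = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

# Model parameters, initialized to zeros
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Predicted class probabilities
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross entropy, averaged over the batch
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y), axis=1))
```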
Now that we've defined the structure of our model, we'll do the following (see the sketch after this list):

- Define an optimization algorithm to train it. In this case, we ask TensorFlow to minimize our defined cross_entropy cost using the gradient descent algorithm with a learning rate of 0.5. There are also other off-the-shelf optimizers we could use that are faster for more complex models.
- Add an operation to initialize the variables we created.
- Define a helper "function" to evaluate the prediction accuracy.
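Continuing the sketch above, these three steps might look like:

```python
# 1. Gradient descent with a learning rate of 0.5
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# 2. Op that initializes W and b when run inside a Session
init = tf.global_variables_initializer()

# 3. Accuracy: fraction of images whose predicted class matches the true label
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```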
Now it's time to run it. During each step of the loop, we get a "batch" of one hundred random data points (defined by batch_size) from our training set. We run train_step, feeding in the batch data to replace the placeholders.
Using small batches of random data is called stochastic training -- in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.
Notice that we did not have to worry about computing the gradients to update the model. The nice thing about TensorFlow is that, once we've defined the structure of our model, it can automatically differentiate mathematical expressions. This means we no longer need to compute the gradients ourselves! In this example, our softmax classifier obtained a pretty nice result of around 90%. But we can certainly do better with more advanced techniques such as convolutional deep learning.