Theano is a Python library that is widely used for deep learning. It lets you take advantage of the GPU for faster floating-point computation, which matters because gradient descent can take quite a while. In this book, I will show you how to write Theano code, but if you want to know more about how to get a machine with GPU capabilities and how to tweak your Theano code and commands to use the GPU, you should visit my course on Udemy at: https://udemy.com/data-science-deep-learning-in-theano-tensorflow. If you would like to view this code in a Python file on your computer, please go to https://github.com/lazyprogrammer/machine_learning_examples/tree/master/ann_class2.
Learning Numpy when you already know Python is easy, but moving from Numpy to Theano is a different beast. There are a lot of new concepts that just do not look like regular Python. So let’s first talk about Theano variables. Theano has different types of variable objects based on the number of dimensions of the object. For example, a 0-dimensional object is a scalar, a 1-dimensional object is a vector, a 2-dimensional object is a matrix, and a 3+ dimensional object is a tensor. They are all within the theano.tensor module. So in your import section:
import theano.tensor as T
You can create a scalar variable like this:
c = T.scalar('c')
The string that is passed in is the variable’s name, which may be useful for debugging. A vector could be created like this:
v = T.vector('v')
And a matrix like this:
A = T.matrix('A')
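And, since Theano supports arbitrarily many dimensions, a 3-D tensor like this:
A3 = T.tensor3('A3')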
What is strange about regular Python vs. Theano is that none of the variables we just created have values! Theano variables are more like nodes in a graph. We only “pass in” values to the graph when we want to perform computations like feedforward or backpropagation, which we haven’t defined yet. TensorFlow works in the same way.
Despite that, we can still define operations on the variables. For example, if you wanted to do matrix multiplication, it is similar to Numpy:
u = A.dot(v)
You can think of this as creating a new node in the graph called u, which is connected to A and v by a matrix multiply. To actually do the multiply with real values, we need to create a Theano function.
import theano
matrix_times_vector = theano.function(inputs=[A, v], outputs=u)
import numpy as np
A_val = np.array([[1,2], [3,4]], dtype='float64') # use floats: T.matrix defaults to a float type, so an int array would be rejected
v_val = np.array([5,6], dtype='float64')
u_val = matrix_times_vector(A_val, v_val)
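If you print u_val, you should see the result of the matrix-vector multiplication:
print(u_val) # prints [17. 39.], since 1*5 + 2*6 = 17 and 3*5 + 4*6 = 39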
Using this, try to think about how you would implement the “feedforward” action of a neural network. One of the biggest advantages of Theano is that it links all these variables up into a graph and can use that structure to calculate gradients for you using the chain rule, which we discussed in the previous chapter.
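As a hint, here is a minimal sketch of what the feedforward expression for a network with one hidden layer might look like. None of these names (X, W1, b1, W2, b2) were defined above; they are illustrative placeholders, and tanh and softmax are just one possible choice of activations:
X = T.matrix('X')   # input data, one sample per row
W1 = T.matrix('W1') # input-to-hidden weights
b1 = T.vector('b1') # hidden layer bias
W2 = T.matrix('W2') # hidden-to-output weights
b2 = T.vector('b2') # output bias
Z = T.tanh(X.dot(W1) + b1)                    # hidden layer activations
p_y_given_x = T.nnet.softmax(Z.dot(W2) + b2)  # output class probabilities
feedforward = theano.function(inputs=[X, W1, b1, W2, b2], outputs=p_y_given_x)
Later, once we cover shared variables, the weights would become shared variables rather than inputs, so that Theano can update them during training.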
In Theano, regular variables are not "updateable"; to make an updateable variable, we create what is called a shared variable. So let's do that now:
x = theano.shared(20.0, 'x')
Let's also create a simple cost function that we can solve by hand, and that we know has a global minimum:
cost = x*x + x
And let's tell Theano how we want to update x by giving it an update expression. T.grad computes the gradient of the cost with respect to x (here 2x + 1) using the graph structure:
x_update = x - 0.3*T.grad(cost, x)
Now let's create a Theano train function. We're going to add a new argument called updates. It takes in a list of tuples, where each tuple contains two things: the shared variable to update, and the update expression to use.
train = theano.function(inputs=[], outputs=cost, updates=[(x, x_update)])
Notice that x is not an input; it's the thing we update. In later examples, the inputs will be the data and labels, so the inputs parameter will take in the data and labels, and the updates parameter will take in the model parameters and their update expressions.
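As a preview of that pattern, here is a hedged sketch using logistic regression. All the names (X, Y, w, lr, xent, train_logistic) are made up for illustration, and it reuses the theano, T, and np imports from earlier:
X = T.matrix('X')                           # data: one sample per row
Y = T.vector('Y')                           # binary labels (0 or 1)
w = theano.shared(np.random.randn(2), 'w')  # model parameter (2 features, arbitrary)
lr = 0.1                                    # learning rate
p = T.nnet.sigmoid(X.dot(w))                # predicted probability of class 1
xent = -(Y*T.log(p) + (1 - Y)*T.log(1 - p)).sum()  # cross-entropy cost
train_logistic = theano.function(
    inputs=[X, Y],                          # data and labels go in through inputs
    outputs=xent,
    updates=[(w, w - lr*T.grad(xent, w))]   # parameters go in through updates
)
You would then call train_logistic(X_data, Y_data) in a loop, where X_data is an N x 2 float array and Y_data is a length-N array of 0s and 1s.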
Back to our simple example: now we simply write a loop that calls the train function again and again:
for i in range(25):
    cost_val = train()
    print(cost_val)
And print the optimal value of x, which should be very close to the true minimum at x = -0.5:
print(x.get_value())
Now let’s take all these basic concepts and build a neural network in Theano.