First, I’m going to define my inputs, outputs, and weights (the weights will be shared variables):
thX = T.matrix('thX')
thT = T.matrix('thT')
W1 = theano.shared(np.random.randn(D, M), 'W1')
W2 = theano.shared(np.random.randn(M, K), 'W2')
Notice I’ve added a “th” prefix to the Theano variables because I’m going to call my actual data, which are Numpy arrays, X and T (and a plain “T” would also clash with the theano.tensor alias). M is the number of units in the hidden layer.
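(For reference, the snippets in this section assume the usual imports and a few constants; the specific values below are my own placeholders, not prescribed by the example.)
import numpy as np
import theano
import theano.tensor as T

D = 784       # number of input features (placeholder, e.g. 28x28 MNIST pixels)
M = 300       # number of hidden units (placeholder)
K = 10        # number of output classes (placeholder)
lr = 0.0001   # learning rate used in the update expressions below (placeholder)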
Next, I define the feedforward action.
Z = T.tanh(thX.dot(W1))
Y = T.nnet.softmax(Z.dot(W2))
T.tanh is a non-linear function similar to the sigmoid, but it ranges between -1 and +1.
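As a quick numeric sanity check (my own illustration in plain Numpy, not part of the original code), tanh is just a shifted and rescaled sigmoid:
x = np.linspace(-3, 3, 7)
sigmoid_2x = 1 / (1 + np.exp(-2 * x))     # sigmoid evaluated at 2x
print(np.tanh(x))                         # values lie strictly between -1 and +1
print(2 * sigmoid_2x - 1)                 # identical to np.tanh(x)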
Next I define my cost function and my prediction function (this is used to calculate the classification error later).
cost = -(thT * T.log(Y)).sum()
prediction = T.argmax(Y, axis=1)
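To see what this cost computes, here is a tiny Numpy illustration (my own, with made-up numbers) using two samples and three classes; the indicator matrix picks out -log(probability assigned to the correct class):
T_ind = np.array([[1, 0, 0],
                  [0, 0, 1]])             # one-hot targets for two samples
Y_out = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])       # softmax outputs
print(-(T_ind * np.log(Y_out)).sum())     # -(log 0.7 + log 0.6) ≈ 0.87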
And I define my update expressions. (notice how Theano has a function to calculate gradients!)
update_W1 = W1 - lr*T.grad(cost, W1)
update_W2 = W2 - lr*T.grad(cost, W2)
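If you haven’t seen T.grad before, here is a tiny standalone example (my own illustration) of Theano’s symbolic differentiation:
a = T.scalar('a')
grad_fn = theano.function(inputs=[a], outputs=T.grad(a**2, a))
print(grad_fn(3.0))   # prints 6.0, since the derivative of a**2 is 2a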
I create a train function similar to the simple example above:
train = theano.function(
inputs=[thX, thT],
updates=[(W1, update_W1),(W2, update_W2)],
)
And I create a prediction function to tell me the cost and prediction of my test set so I can later calculate the error rate and classification rate.
get_prediction = theano.function(
inputs=[thX, thT],
outputs=[cost, prediction],
)
And similar to the last section, I do a for-loop where I just call train() again and again until convergence. (Note that the derivative at a minimum will be 0, so at that point the weights won’t change anymore.) This code uses a method called “mini-batch gradient descent”, which iterates over small batches of the training set one at a time, instead of computing the gradient on the entire training set at once. This is a “stochastic” method, meaning that we hope that over a large number of samples drawn from the same distribution, we will converge to a value that is optimal for all of them.
for i in xrange(max_iter):
    for j in xrange(n_batches):
        Xbatch = Xtrain[j*batch_sz:(j*batch_sz + batch_sz),]
        Tbatch = Ttrain_ind[j*batch_sz:(j*batch_sz + batch_sz),]
        train(Xbatch, Tbatch)
        if j % print_period == 0:
            cost_val, prediction_val = get_prediction(Xtest, Ttest_ind)
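Inside that if-block you would typically also print the error rate; a minimal sketch, assuming Ytest holds the integer class labels of the test set (my own naming, not from the original), could be:
            err = np.mean(prediction_val != Ytest)   # fraction of test samples misclassified
            print("Cost: %f, error rate: %f" % (cost_val, err))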
Exercise
Complete the code above by adding the following:
A function to convert the labels into an indicator matrix (if you haven’t done so yet)
(Note that the examples above refer to the variables Ttrain_ind and Ttest_ind – that’s what these are)
Add bias terms at the hidden and output layers and add the update expressions for them as well.
Split your data into training and test sets to conform to the code above.
Try it on a dataset like MNIST.