Since my first attempts at the Titanic problem using Python, I have now begun learning about TensorFlow, a machine learning framework that should make building neural networks easier.
I decided to stick with the same data and rewrite my neural network classes using the TensorFlow framework. TensorFlow has many of the required functions built in, but I have gone for a halfway house here and not used the Keras API, which effectively gives you the neural network model for free. I’ll have a look at that later; for now, I still want to learn from the ground up.
Within TensorFlow you initialise your model by building a computation graph, which is then run when you train the model (and when you make predictions with it).
There are built-in optimiser functions that use the graph you have built to train the variables you have indicated. These optimisers can calculate the derivatives of the graph automatically, so you don’t have to worry about the backward propagation step. This greatly simplifies the code that you have to write.
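To make the deferred-execution model concrete, here is a minimal, self-contained sketch (not part of my Titanic code): the first few lines only describe the computation, and nothing actually runs until the session executes it.

import tensorflow as tf

# Building the graph: nothing is computed here, only described
w = tf.get_variable("w", [], initializer=tf.zeros_initializer(), dtype=tf.float64)
loss = tf.square(w - 3.0)  # minimised when w == 3
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

# Running the graph: the session performs the actual computation, including
# the gradient calculations that minimize() added to the graph
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for _ in range(100):
        session.run(train_op)
    print(session.run(w))  # close to 3.0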
The layer class is greatly simplified from my baseline Python implementation.
import tensorflow as tf
from enum import Enum

class ActivationType(Enum):
    RELU = 1
    SIGMOID = 2

class TFNeuralNetworkLayer(object):
    """A single layer of the network, holding its weights and activation type"""

    def __init__(self, previous_layer_size, this_layer_size, layer_index):
        # Create the TensorFlow variables
        self.W = tf.get_variable("W" + str(layer_index),
                                 [this_layer_size, previous_layer_size],
                                 initializer = tf.contrib.layers.xavier_initializer(dtype=tf.float64),
                                 dtype=tf.float64)
        self.b = tf.get_variable("b" + str(layer_index),
                                 [this_layer_size, 1],
                                 initializer = tf.zeros_initializer(),
                                 dtype=tf.float64)
        self.activation_type = ActivationType.RELU

    def forward_propagation_tensor(self, prev_A):
        self.Z = tf.add(tf.matmul(self.W, prev_A), self.b)
        if (self.activation_type == ActivationType.RELU):
            self.A = tf.nn.relu(self.Z)
        else:
            self.A = tf.nn.sigmoid(self.Z)
        return self.Z, self.A
Within the init function the W and b tensors are initialised as variables that can be trained in the model. Note that no computation is taking place here; the code is simply instructing TensorFlow to add the variables to the graph, along with their built-in initialisers. Those initialisers will only be run when a tensor using the variables is run.
The only other function defines the forward propagation tensor. Again, the computation graph is being built; it will take the W and b variables and calculate Z and A. The tensors for Z and A are returned here, but the operations they represent will only be performed when they are run.
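For clarity on the shapes involved: if a layer has n units, the previous layer has n_prev units and there are m training examples, then W has shape (n, n_prev), prev_A has shape (n_prev, m) and b has shape (n, 1), so Z = W·prev_A + b (with b broadcast across the m columns) and A both have shape (n, m).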
There is no need for me to implement the “calculate_backward_gradients” or “update_parameters” functions, as TensorFlow takes care of all of that.
Within the initialisation of this class I first create the layers I want in my model; the layer dimensions have been passed in as a parameter, as before.
# Remember that layer_dimensions[] is one bigger than layers[]/_number_layers
self._layers = []
for l in range(self._number_layers):
    layer = TFNeuralNetworkLayer(self._layer_dimensions[l], self._layer_dimensions[l+1], l+1)
    # (0,1), (1,2), (2,3)...(number_layers-1,number_layers)
    self._layers.append(layer)
self._layers[self._number_layers - 1].activation_type = ActivationType.SIGMOID
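The loop above relies on self._layer_dimensions and self._number_layers; I haven’t shown their set-up, but it amounts to something like this (inferred from how they are used, so treat it as a sketch):

# Inferred set-up (not shown above)
self._layer_dimensions = layer_dimensions        # e.g. [X.shape[0], 7, 1]
self._number_layers = len(layer_dimensions) - 1  # number of weight layers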
I can then build up the computation graph that will be run when the model is trained (and when predictions are made). X and Y are placeholders; their actual values will be provided to the model when the graph is run.
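I haven’t shown the placeholder definitions themselves; they look something like this (a sketch, with one column per example so that the number of examples can vary between training and prediction):

# Assumed placeholder definitions - features in rows, one column per example
self.X_placeholder = tf.placeholder(tf.float64, shape=(X.shape[0], None), name="X")
self.Y_placeholder = tf.placeholder(tf.float64, shape=(Y.shape[0], None), name="Y")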
# Create the forward propagation tensor graph
A = self.X_normed_tensor
for layer in self._layers:
    prev_A = A
    Z, A = layer.forward_propagation_tensor(prev_A)
self.ZL_tensor = Z
self.AL_tensor = A
The final tensors use some of the built-in functionality: one calculates the cost associated with a set of parameters, and the other is an optimiser that uses gradient descent to minimise that cost.
ZL_tensor_T = tf.transpose(self.ZL_tensor)
Y_placeholder_T = tf.transpose(self.Y_placeholder)
self.cost_tensor = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits = ZL_tensor_T, labels = Y_placeholder_T))
self.optimiser_tensor = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(self.cost_tensor)
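Note that the cost is computed from the logits ZL rather than the activations AL: sigmoid_cross_entropy_with_logits applies the sigmoid itself and evaluates the cross-entropy −[y·log(σ(z)) + (1 − y)·log(1 − σ(z))] in a numerically stable way.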
To run the TensorFlow graph a session object is used, which needs to be available for the lifetime of the neural network object. To ensure that it is closed when the neural network is finished with, I made use of the __enter__ and __exit__ functions within the class. This allows the neural network class to be used with the “with” keyword, ensuring that all of its resources are released when the object is no longer needed.
def __enter__(self):
    self.session = tf.Session()
    return self

def __exit__(self, type, value, traceback):
    self.session.close()
I wanted to normalise the input to the neural network before training it. To predict correctly using the model, the same mean and variance need to be applied to the prediction features as were used to normalise the training data.
As the same tensor is used for both training and prediction, it is necessary to pass the mean and variance in as placeholders. The actual values are calculated at the start of training and then stored for future use in training and prediction. So, I needed to create tensors for calculating the mean and variance, create the placeholders, and then create a tensor for normalising the input features. All of this is done in the neural network init method.
# Tensors to calculate the per-feature mean and variance of the training data
X_tensor = tf.constant(X)
epsilon_tensor = tf.constant(epsilon, tf.float64)
mean_tensor, variance_tensor = tf.nn.moments(X_tensor, 1, keep_dims=True)
self.mean_tensor = tf.cast(mean_tensor, tf.float64)
self.variance_tensor = tf.cast(variance_tensor, tf.float64)

# Placeholders so the same values can be fed in for training and prediction
self.mean_placeholder = tf.placeholder(tf.float64, shape=(X.shape[0], 1), name = "mean")
self.variance_placeholder = tf.placeholder(tf.float64, shape=(X.shape[0], 1), name = "variance")

# Normalise the input features
self.X_normed_tensor = tf.div(tf.subtract(self.X_placeholder, self.mean_placeholder),
                              tf.sqrt(tf.add(self.variance_placeholder, epsilon_tensor)))
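In equation form this is X_normed = (X − mean) / √(variance + ε), where the mean and variance are taken per feature across the training examples, and ε is a small constant to avoid division by zero for constant features.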
All of the tensors I have built can now be used to train the model and predict unknown input.
def train_model(self, num_iterations):
    init = tf.global_variables_initializer()
    self.session.run(init)
    self.mean = self.session.run(self.mean_tensor)
    self.variance = self.session.run(self.variance_tensor)
    for x in range(num_iterations):
        self.update_model()

def update_model(self):
    self.session.run(self.optimiser_tensor, feed_dict={self.X_placeholder: self.X,
                                                       self.Y_placeholder: self.Y,
                                                       self.mean_placeholder: self.mean,
                                                       self.variance_placeholder: self.variance})
    cost = self.session.run(self.cost_tensor, feed_dict={self.X_placeholder: self.X,
                                                         self.Y_placeholder: self.Y,
                                                         self.mean_placeholder: self.mean,
                                                         self.variance_placeholder: self.variance})
    print("Cost: " + str(cost))

def predict(self, X_test):
    with self.session.as_default():
        predictions = self.AL_tensor.eval({self.X_placeholder: X_test,
                                           self.mean_placeholder: self.mean,
                                           self.variance_placeholder: self.variance})
    return predictions
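As an aside, update_model evaluates the graph twice per iteration. The optimiser step and the cost can be fetched in a single run call instead, something like this (the printed cost then comes from the same pass as the update, rather than from the freshly updated parameters):

# Fetch the training step and the cost in one pass over the graph
_, cost = self.session.run([self.optimiser_tensor, self.cost_tensor],
                           feed_dict={self.X_placeholder: self.X,
                                      self.Y_placeholder: self.Y,
                                      self.mean_placeholder: self.mean,
                                      self.variance_placeholder: self.variance})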
Training and prediction can be run as follows.
learning_rate = 0.1
layer_dimensions = [X.shape[0], 7, 1]

with TFNeuralNetwork(X, Y, layer_dimensions, learning_rate) as model:
    model.train_model(10000)
    predictions = model.predict(unknown_X)
    print(predictions)
Here are some of the results I got with the new TensorFlow implementation. First, with no hidden layers.
Training set accuracy: 81.92%
Dev set accuracy: 76.40%
Adding a hidden layer of 7 units gave:
Training set accuracy: 91.86%
Dev set accuracy: 76.40%
Increasing the number of hidden units or adding extra hidden layers increased the training set accuracy even further, but at the expense of the dev set accuracy.
Training set accuracy: 96.07%
Dev set accuracy: 70.79%
It was possible to get the training set accuracy even higher, but the big flaw with the model is now clear: it is over-fitting the training data. To try to resolve this I decided to try out regularisation.
Again, I used some built-in tensors for calculating the regularisation expression.
# L2 Regularisation - increase cost if individual parameters (W) get too large
m = self.Y.shape[1]
regularisation_tensor = tf.constant(0, dtype=tf.float64)
for layer in self._layers:
    regularisation_tensor = tf.add(regularisation_tensor, tf.nn.l2_loss(layer.W))
self.cost_tensor = tf.add(self.cost_tensor,
                          tf.div(tf.multiply(regularisation_tensor, l2_lambda), m))
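Since tf.nn.l2_loss(W) is defined as sum(W²)/2, the cost being minimised becomes cost + (l2_lambda/m)·Σ ½‖W‖², summed over the layers, so large individual weights are penalised, with l2_lambda controlling how strongly.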
After a lot of experimentation, I found that this didn’t really make a great deal of difference to my results, the best being:
Training set accuracy: 88.63%
Dev set accuracy: 82.58%
If I increased the regularisation effect any further, the dev set accuracy started to drop, and that, in the end, is the figure I want to be as high as possible.
I think I’ve got as far as I’m going to with this data. 82.58% means correct predictions for 147 of the 178 passengers in my dev set, and with so little data I don’t think I’m going to improve much on that.
Next, I want to start looking at some of the higher-level functionality of TensorFlow and to start looking at some more complex data.