I recently started playing with neural nets via TensorFlow. It’s a big departure from my reinforcement learning background, and the learning path usually goes the other way: most people learn neural nets first, then reinforcement learning. That said, many concepts overlap. One of the differences is that in reinforcement learning, “neurons”, known as nodes, are not tied directly to a set of other nodes; the concept of a layer simply doesn’t exist there. One of the similarities is the concept of training and optimizing a weight vector.
To begin my exploration of this new domain, I decided to read Fundamentals of Deep Learning, specifically its introduction to TensorFlow and handwritten digit recognition. To “reinforce” some neurons in my head, I programmed the TensorFlow model by hand, using the book as a reference. I started with the book’s 3-layer neural net as a sanity check.
3-Layer Neural Net
My first neural net for recognizing handwritten digits is made up of 3 layers. The first layer has a neuron for each pixel, 784 of them. Layer 2 is made up of 256 neurons. Finally, the 3rd layer has 10 neurons, one for each possible digit.
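For the curious, here is roughly how that 784 → 256 → 10 architecture translates into TensorFlow 1.x code. This is a minimal sketch of my own rather than the book’s listing; the layer sizes come from the description above, but the variable names, initialization, and learning rate are assumptions.

```python
import tensorflow as tf

# Sketch of the 784 -> 256 -> 10 network (TensorFlow 1.x style).
# Layer sizes follow the text; names, initializers, and the learning
# rate are my own choices, not the book's.
x = tf.placeholder(tf.float32, [None, 784])   # one input neuron per pixel
y = tf.placeholder(tf.float32, [None, 10])    # one-hot label, one slot per digit

# Layer 2: 256 hidden neurons
W1 = tf.Variable(tf.random_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# Layer 3 (output): 10 neurons, one per digit
W2 = tf.Variable(tf.random_normal([256, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h1, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```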
The whole process took 1 hour on my 16-core IBM server for the 550K training run, achieving a success rate of 97.96%, which is pretty high. The chart below shows the error rate going down as the neural net trains.
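The training loop itself looked roughly like the sketch below, building on the snippet above. Again, this is my own approximation rather than the book’s exact code: the batch size, evaluation interval, and MNIST loader are assumptions; only the overall iteration count comes from the run described above.

```python
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Accuracy on the held-out test set
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(550000):  # iteration count taken from the run above
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y: batch_y})
        if step % 10000 == 0:
            acc = sess.run(accuracy, feed_dict={x: mnist.test.images,
                                                y: mnist.test.labels})
            print("step %d, test accuracy %.4f" % (step, acc))
```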
Since a roughly 2% error rate is still a bit much for many applications, let us see if we can improve on it by adding another layer.
4-Layer Neural Net
In an attempt to reduce the error, I added another layer between layer 2 and the output. Our neural network is now composed of a first layer of 784 neurons, a second layer of 256 neurons, a third layer of 128 neurons, and finally the output layer with 10 neurons.
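In code, the change amounts to inserting one more hidden layer between the 256-neuron layer and the output. Extending the earlier sketch (and again assuming my own names and initialization):

```python
# Extra hidden layer of 128 neurons between layer 2 and the output,
# replacing the output-layer definition in the earlier sketch.
W2 = tf.Variable(tf.random_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)

W3 = tf.Variable(tf.random_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h2, W3) + b3
```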
With this setup, I achieved practically the same success rate: 97.99%, barely any change for the extra layer.
The author did warn that a better solution would be introduced in the next chapter…