Advanced Optimization Methods: Artificial Neural Networks Part 1

If you’ve been following along, we’ve finally gotten to the fun stuff. The first “advanced” topic we will explore is Artificial Neural Networks (ANNs). An ANN is a simplified model of a network of “neurons” like the ones that make up our brains. Each neuron takes in information and transmits it to another neuron, which in turn weighs and modifies the input and sends it on to the next. So how does this help us optimize? Once trained, neural networks let us predict and optimize problems at greater speed and with greater accuracy than many other methods, but more on training later. First let’s look at an individual neuron, then see how it interacts in a network.

[Figure: diagram of a multipolar neuron, from Wikipedia]

Above is a diagram of a neuron (from Wikipedia), the kind actually in your brain. Our artificial neurons look very similar to the one above. Not all the structures labeled in the picture are important here. The two we are concerned with are the dendrites and the axon; take a quick note that both of these structures have branches. A brief tangent from when I used to work as a neuroscientist at the Motor Neuron Center: neurons work by transmitting signals through their axons and receiving signals through their dendrites. Neurons can receive tons of connections from other neurons through their dendrites, but they can have only one outgoing connection through their axon. This is a very simplified explanation, but an important one nevertheless. Our artificial neurons will work in much the same way.

[Figure: diagram of an artificial neuron model, from Wikipedia]

Above is a diagram of an artificial neuron (from Wikipedia), the kind we will use in our method. Unlike the real neuron diagram, all the labels are important here. Data flows from the axons of other neurons (inputs), through the dendrites (weights), into the cell body (transfer function and activation function), and then out to the next neuron through the axon (activation). So we have an idea of how data moves through a neuron, but what does any of this actually mean?

So imagine we have the neuron pictured above. It has n axons attached to it. Each axon transmits a number (x_n) to this neuron. The neuron then weighs each of these numbers by multiplying it by its respective weight (w_nj). These weighted numbers are summed at the transfer function step, and the sum is passed to our activation function. This is where the power of neural networks is derived: different activation functions can be used to achieve more accurate models than their linearly bound counterparts. After the weighted sum is passed through our chosen activation function, the resulting value is sent on to the neurons in the next layer and the process repeats. This leads us to the network part of neural networks.
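The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation; the choice of a sigmoid activation is an assumption here (the diagram leaves the activation function generic), and the input and weight values are made up for the example.

```python
import math

def neuron_output(inputs, weights):
    """One artificial neuron: weighted sum of inputs
    passed through an activation function."""
    # Transfer function: multiply each input x_n by its weight w_nj and sum
    z = sum(x * w for x, w in zip(inputs, weights))
    # Activation function: a sigmoid (one common choice) squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Example neuron with three incoming axons (values are arbitrary)
print(neuron_output([0.5, -1.0, 2.0], [0.1, 0.4, 0.2]))
```

The single number this returns is what travels down the neuron’s axon to every neuron in the next layer.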

[Figure: a simple colored neural-network diagram, from Wikipedia]

Above is an example of a very simple neural network. The network consists of three parts, or layers: an input layer, a hidden layer, and an output layer. Each layer can be made up of as few or as many neurons as the user designs. Note also that the hidden layer is not restricted to a single column of neurons: there can be 3, 4, or 100 columns in the hidden portion depending on the design and the neurons required. This structure, however, must be fixed before any optimization takes place. Another thing to note is that all neurons point forward; this is called a feed-forward design. This limitation becomes important when we look at the optimization portion, but that comes later. Each neuron outputs to every neuron in the next layer and receives inputs from every neuron in the previous layer.
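Chaining the single-neuron step layer by layer gives a full feed-forward pass. The sketch below assumes a sigmoid activation and fully connected layers as described above; the layer sizes and weight values are hypothetical, chosen only to show the shape of the computation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weight_rows):
    """One layer: each row of weights belongs to one neuron,
    and every neuron sees every input (fully connected)."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

def feed_forward(inputs, layers):
    """Pass the inputs through each layer in order (feed-forward):
    a layer's outputs become the next layer's inputs."""
    activations = inputs
    for weight_rows in layers:
        activations = layer_forward(activations, weight_rows)
    return activations

# Hypothetical network: 3 inputs -> 4 hidden neurons -> 2 outputs
hidden = [[0.2, -0.5, 0.1] for _ in range(4)]  # 4 neurons, 3 weights each
output = [[0.3, 0.3, -0.2, 0.8],               # 2 neurons, 4 weights each
          [0.1, -0.4, 0.5, 0.2]]
print(feed_forward([1.0, 0.5, -1.0], [hidden, output]))
```

Training an ANN amounts to adjusting those weight values; how that is done is the subject of the next part.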

That’s a general overview of how feed-forward artificial neural networks work. In the next part we will derive some equations and make a general outline of how to create an ANN.

-Marcello