In recent years Deep Learning has become the most successful approach to pattern recognition for perceptual tasks. When you speak to Siri, Cortana, or Google Voice, your speech is being interpreted by a Deep Neural Network. And in the ImageNet Large Scale Visual Recognition Challenge, Deep Neural Networks have matched or beaten human-level accuracy on image classification.
Deep Learning is fast becoming mainstream: from film recommendations to Facebook tags, from autonomous vehicles to defeating the European Go champion, it is finding applications everywhere. Deep Learning is not new, however; in one form or another it has been around since the 1970s.
The basic idea is to train a very deep neural network, that is, one with many layers. Multiple studies have shown that a neural network, if appropriately configured, can approximate any continuous function to arbitrary accuracy (loosely comparable to the way a universal Turing machine can carry out any computation). However, this doesn't mean that we know how to configure it. This is where Deep Learning comes in: by having many layers, a Deep Neural Network solves a problem in lots of little steps rather than in one or two big ones. While this may not seem such a revolutionary idea, it means that with a sufficiently large set of training data it is possible to train (that is, configure) these networks to solve tasks that have previously eluded us.
While all this is true, training Deep Neural Networks is incredibly computationally expensive: not only are the networks themselves very large, but huge data sets are required to train them well. Until recently we simply didn’t have the computational power, or access to the data, required for Deep Learning to showcase what it can do. This changed with the use of NVIDIA graphics cards for parallel programming, and Deep Neural Networks are now trained almost exclusively on GPUs. Deploying the resulting trained networks, by contrast, can be a relatively light load.
The most common form of Deep Learning uses what is called a convolutional neural network. This is a special kind of neural network in which each artificial neurone is connected to a small window over the input or the previous layer. For example, in a visual task, each neurone in the first convolution layer will only see a small part of the image, maybe only a few pixels. This convolution layer consists of multiple maps, each searching for a different feature, with each neurone in a map searching for that feature in a slightly different location.
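To make the idea of a small window concrete, here is a minimal sketch in plain Python. The function name `convolve2d`, the toy image, and the two hand-written filters are all illustrative assumptions; in a real network the filter values are learned during training, and a GPU library would do the arithmetic.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return one feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Each output "neurone" sees only the k_rows x k_cols window at (r, c).
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A toy 4x5 greyscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Two hand-written filters (assumed values, not learned ones).
vertical_edge = [[-1, 1], [-1, 1]]
horizontal_edge = [[-1, -1], [1, 1]]

# One map per filter: the same feature is searched for at every location.
vertical_map = convolve2d(image, vertical_edge)
horizontal_map = convolve2d(image, horizontal_edge)

print(vertical_map)    # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(horizontal_map)  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The vertical-edge map responds only where the image changes from dark to bright, while the horizontal-edge map stays at zero because this image contains no horizontal edges.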
This first layer will come (after some training) to identify useful low-level features in the image, such as lines, edges, and gradients in different orientations. This convolution layer is then sub-sampled in what is called a pooling layer, before the whole process starts again with another convolution layer, this time finding combinations of the features of the previous layer (lines, corners, curves, etc.).
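The sub-sampling step can be sketched in a few lines. This example assumes max pooling over non-overlapping 2x2 blocks, which is one common choice rather than the only one; the function name `max_pool` and the feature-map values are made up for illustration.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


# A toy 4x4 feature map, as might come out of a convolution layer.
feature_map = [
    [0, 2, 0, 0],
    [0, 2, 0, 1],
    [3, 0, 0, 0],
    [0, 0, 0, 4],
]

pooled = max_pool(feature_map)
print(pooled)  # [[2, 1], [3, 4]]
```

The pooled map is a quarter of the size but still records where the strongest responses were, which is what the next convolution layer builds on.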
As with most neural networks, the parameters (or weights) of the system start out random, and the network performs poorly. During training, however, you tell the network what the correct classification of each image is, and over many, many examples the weights are slowly modified so that the network gives the correct classification.
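As a rough illustration of that training loop, the sketch below trains a single artificial neurone rather than a deep network. The data, the learning rate, and the helper names (`predict`, `mean_loss`, `accuracy`) are all assumptions made for the example; a real Deep Neural Network applies the same kind of update across millions of weights using backpropagation.

```python
import math
import random


def predict(weights, bias, pixels):
    """Return the neurone's confidence (0 to 1) that the image is 'bright'."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, bias, examples):
    """Average cross-entropy between predictions and the correct labels."""
    total = 0.0
    for pixels, label in examples:
        p = predict(weights, bias, pixels)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def accuracy(weights, bias, examples):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    correct = sum(
        (predict(weights, bias, pixels) >= 0.5) == (label == 1)
        for pixels, label in examples
    )
    return correct / len(examples)


random.seed(0)

# Toy labelled data: 4-pixel "images", label 1 if bright, 0 if dark.
examples = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.8, 1.0, 0.9, 0.9], 1),
    ([0.1, 0.0, 0.2, 0.1], 0),
    ([0.2, 0.1, 0.0, 0.3], 0),
]

# The weights start out random, so the first predictions are poor.
weights = [random.uniform(-1, 1) for _ in range(4)]
bias = 0.0
learning_rate = 0.5  # assumed value, chosen for this toy problem

loss_before = mean_loss(weights, bias, examples)

for epoch in range(200):
    for pixels, label in examples:
        # Compare the prediction with the correct classification...
        error = predict(weights, bias, pixels) - label
        # ...and nudge each weight slightly in the direction that reduces it.
        for i in range(len(weights)):
            weights[i] -= learning_rate * error * pixels[i]
        bias -= learning_rate * error

loss_after = mean_loss(weights, bias, examples)
final_accuracy = accuracy(weights, bias, examples)
print(loss_before, loss_after, final_accuracy)
```

Each pass makes only a small correction, which is why training needs so many examples and so much computation.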
Scan 3XS offers a range of reliable, high performance server solutions, fully customisable to your needs.