What are Channels and Kernels?
To understand what channels and kernels are, we must first understand what we mean by features.
Features
If I had to give a layman's definition, I would say a feature is something as simple as a dog's nose or facial features which can be used to identify a certain class of images. But these are features that we as humans can visually recognise and comprehend. There are many features characteristic of a particular class that cannot be comprehended by humans, but which emerge as a result of model training. By features, we mean spatial relationships between pixels that are specific to a certain class.
Kernels
A kernel is something that is used to identify/locate a feature. Kernels do so with the help of the convolution operation and are usually matrices of size 3x3. For example, a 3x3 kernel used to identify lines might look like the one below.
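As an illustration (this specific matrix is my own example, not from the original post), here is a 3x3 kernel that responds strongly to vertical lines:

```python
import numpy as np

# a vertical-line detector: each row sums to zero, so flat image
# regions produce 0 while a bright vertical stripe aligned with the
# middle column produces a strong positive response
vertical_line_kernel = np.array([[-1, 2, -1],
                                 [-1, 2, -1],
                                 [-1, 2, -1]])
```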
Channels/Feature Maps/Feature Bucket
Kernels or filters are used to identify features in the input to a certain layer. The outputs we get are called channels. They are the aggregation of certain features found (or not found) in the image. These channels in turn become the input to the next layer.
The key difference between a channel and a kernel is that if A * B = C (where * denotes convolution), then A and C are channels and B (the multiplier) is the kernel.
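A minimal NumPy sketch of that relationship (the names A, B, and C mirror the sentence above; strictly speaking, deep learning frameworks implement cross-correlation and call it convolution):

```python
import numpy as np

A = np.random.rand(6, 6)            # input channel, 6x6
B = np.random.rand(3, 3) - 0.5      # some 3x3 kernel

# valid convolution: output shrinks to (6 - 3 + 1) = 4 per dimension
C = np.zeros((4, 4))                # output channel
for i in range(4):
    for j in range(4):
        C[i, j] = (A[i:i+3, j:j+3] * B).sum()
```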
Why should we nearly always use 3x3 kernels?
Before the invention of backpropagation, kernel values were set manually to detect features (poor interns!!). A 2x2 kernel would be so small that it could detect where a feature begins, but it would need another kernel to detect where the feature ends. Yes, a 5x5 kernel would be better at detecting spatial relationships between pixel values, but the total number of parameters increases drastically compared to a 3x3 kernel (25/9 ≈ 2.8). It's better to use more 3x3 filters instead of 5x5 or larger kernels to reach the same receptive field, as the sketch below shows.
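A quick sanity check of that parameter argument (my own arithmetic, assuming single-channel kernels and no bias): n stacked 3x3 convolutions with stride 1 cover a (2n+1)x(2n+1) receptive field.

```python
# compare a single kxk kernel against stacked 3x3 kernels
# covering the same receptive field (stride 1, no padding)
for k in [5, 7, 9]:
    n = (k - 1) // 2   # number of stacked 3x3 convolutions needed
    print(f"{k}x{k}: {k*k} params vs {n} stacked 3x3 = {n*9} params")
```

This prints 25 vs 18, 49 vs 27, and 81 vs 36, so the savings grow with kernel size.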
The most important reason is that 3x3 convolutions are hardware-accelerated on NVIDIA GPUs.
[Image: a 3x3 kernel that can be used to detect a line at an angle]
[Image: 3x3 kernels that can be used to detect a triangle]
How are kernels initialized?
The most common method to initialise kernels is to sample the parameters of a kernel from a normal distribution.
Another popular method is Xavier initialisation, which sets a layer's weights to values chosen from a random uniform distribution bound between ±sqrt(6 / (n_in + n_out)), where n_in and n_out are the number of incoming and outgoing connections of the layer.
Although in the research paper this method was only tested on the CIFAR-10 dataset, it showed better results there. I'm not sure it works well in every setting.
Good kernel initialisation prevents gradients from vanishing or exploding.
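A minimal NumPy sketch of both schemes (the shapes and the standard deviation are illustrative choices on my part, not prescribed values):

```python
import numpy as np

def normal_init(shape, std=0.05):
    """Sample kernel weights from a zero-mean normal distribution."""
    return np.random.normal(0.0, std, shape)

def xavier_uniform(shape, fan_in, fan_out):
    """Sample kernel weights uniformly from [-limit, +limit]."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, shape)

# 32 kernels of size 3x3 over 16 input channels
w = xavier_uniform((32, 16, 3, 3), fan_in=16 * 3 * 3, fan_out=32 * 3 * 3)
```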
How many times do we need to perform 3x3 convolution operations to reach close to 1x1 from 199x199? (Type each layer output like 199x199 > 197x197 ...)
We need to perform 3x3 convolutions exactly 99 times to go from 199x199 to 1x1, since each 3x3 convolution (stride 1, no padding) shrinks the output by 2 in each dimension.
An input of 3x3 —> 1 convolution to reach 1x1
An input of 5x5 —> 2 convolutions to reach 1x1
An input of 7x7 —> 3 convolutions to reach 1x1
... and so on
Therefore we need to find the index at which 199 falls in the arithmetic progression 3, 5, 7, ...
Since the n-th term is 3 + (n - 1) * 2 = 2n + 1, setting 2n + 1 = 199 gives n = 99. It is the 99th term, which is also the answer.
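The question also asks to type out each layer output; a small script (my own, just to generate the chain) prints it:

```python
sizes = list(range(199, 0, -2))               # 199, 197, ..., 3, 1
print(" > ".join(f"{s}x{s}" for s in sizes))  # 199x199 > 197x197 > ... > 1x1
print(len(sizes) - 1)                         # 99 convolutions
```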
What happens during the Training of a DNN?
A lot of calculations!!!
The whole point of training a DNN is for the model parameters, which start from a certain initialisation, to update their values through gradient descent. Therefore, a lot of gradients flow backwards, updating the parameters until the loss function converges to a minimum or we run out of input images to train on.
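A minimal PyTorch sketch of that loop (the toy model and the random data are mine, purely illustrative):

```python
import torch
import torch.nn as nn

# a toy network: one 3x3 convolution layer followed by a classifier
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # 28x28 -> 26x26
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 26 * 26, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 1, 28, 28)        # fake batch of images
y = torch.randint(0, 10, (16,))       # fake labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                   # gradients flow backwards
    optimizer.step()                  # parameters update via gradient descent
```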