Counting No. of Parameters in Deep Learning Models by Hand

Photo by Andrik Langfield on Unsplash

5 simple examples to count parameters in FFNN, RNN and CNN models

Counting the trainable parameters of a deep learning model is often considered too trivial to discuss, because your code can already do it for you. But I’d like to keep my notes here for us to refer to once in a while. Here are the models that we’ll run through:

  1. Feed-Forward Neural Network (FFNN)
  2. Recurrent Neural Network (RNN)
  3. Convolutional Neural Network (CNN)

In parallel, I will build each model with the Keras API for easy prototyping and clean code, so let’s quickly import the relevant objects here:

from keras.layers import Input, Dense, SimpleRNN, LSTM, GRU, Conv2D
from keras.layers import Bidirectional
from keras.models import Model

After building the model, call model.count_params() to verify the parameter count. It returns the total number of parameters in the model; in every model below, all of them are trainable.

1. FFNNs

  • i, input size
  • h, size of hidden layer
  • o, output size

For one hidden layer,

num_params
= connections between layers + biases in every layer
= (i×h + h×o) + (h+o)

Example 1.1: Input size 3, hidden layer size 5, output size 2

Fig. 1.1: FFNN with input size 3, hidden layer size 5, output size 2. The graphics reflect the no. of units.
  • i = 3
  • h = 5
  • o = 2

num_params
= connections between layers + biases in every layer
= (3×5 + 5×2) + (5+2)
= 32

input = Input((None, 3))
dense = Dense(5)(input)
output = Dense(2)(dense)
model = Model(input, output)

Example 1.2: Input size 50, hidden layers size [100,1,100], output size 50

Fig. 1.2: FFNN with 3 hidden layers. The graphics do not reflect the no. of units.
  • i = 50
  • h = 100, 1, 100
  • o = 50

num_params
= connections between layers + biases in every layer
= (50×100 + 100×1 + 1×100 + 100×50) + (100+1+100+50)
= 10,451

input = Input((None, 50))
dense = Dense(100)(input)
dense = Dense(1)(dense)
dense = Dense(100)(dense)
output = Dense(50)(dense)
model = Model(input, output)
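
The same arithmetic extends to any number of hidden layers. Here is a minimal sketch in plain Python, with no Keras required; the helper name ffnn_params is my own and not part of any library. It takes the layer sizes in order (input first, output last) and reproduces Examples 1.1 and 1.2.

```python
def ffnn_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer: input first, output last.
    (Hypothetical helper for illustration, not a Keras function.)
    """
    # Weights: one per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # Biases: one per unit in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights + biases


print(ffnn_params([3, 5, 2]))              # Example 1.1 -> 32
print(ffnn_params([50, 100, 1, 100, 50]))  # Example 1.2 -> 10451
```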

2. RNNs

  • g, no. of gates (RNN has 1 gate, GRU has 3, LSTM has 4)
  • h, size of hidden units
  • i, dimension/size of input

Each gate is effectively an FFNN with input size (h+i) and output size h, so each gate has h(h+i) + h parameters.

num_params = g × [h(h+i) + h]
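
The formula above can be sketched as a small plain-Python function. The names GATES and rnn_params are hypothetical and exist only for illustration; the gate counts are the ones listed above.

```python
# Gate counts as listed above (a simple RNN is treated as having 1 gate).
GATES = {"rnn": 1, "gru": 3, "lstm": 4}


def rnn_params(cell, h, i):
    """Parameters of one recurrent layer: g * [h(h+i) + h].

    Hypothetical helper for illustration, not a Keras function.
    """
    g = GATES[cell]
    weights_per_gate = h * (h + i)  # FFNN with input size (h+i), output size h
    biases_per_gate = h
    return g * (weights_per_gate + biases_per_gate)


print(rnn_params("lstm", h=2, i=3))  # 4 * (2*5 + 2) = 48
```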

Example 2.1: LSTM with 2 hidden units and input dimension 3.

Fig. 2.1: An LSTM cell. Taken from here.
  • g = 4 (LSTM has 4 gates)
  • h = 2
  • i = 3

num_params
= g × [h(h+i) + h]
= 4 × [2(2+3) + 2]
= 48

input = Input((None, 3))
lstm = LSTM(2)(input)
model = Model(input, lstm)
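
To see where the 48 comes from, it can help to split the count by weight array. As far as I know, a Keras LSTM layer with default settings stores three arrays: kernel (input-to-hidden), recurrent_kernel (hidden-to-hidden) and bias, with the 4 gates stacked along the last axis. The sketch below only does the shape arithmetic in plain Python under that assumption; it does not call Keras.

```python
from math import prod

h, i, g = 2, 3, 4  # hidden units, input dimension, gates (LSTM)

# Assumed array shapes for a default Keras LSTM layer (use_bias=True).
shapes = {
    "kernel": (i, g * h),            # input-to-hidden weights
    "recurrent_kernel": (h, g * h),  # hidden-to-hidden weights
    "bias": (g * h,),                # one bias per hidden unit per gate
}

for name, shape in shapes.items():
    print(name, shape, prod(shape))

total = sum(prod(shape) for shape in shapes.values())
print(total)  # 24 + 16 + 8 = 48
```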

Example 2.2: Stacked Bidirectional GRU with 5 hidden units and input size 8 (whose outputs are concatenated) + LSTM with 50 hidden units

Fig. 2.2: A stacked RNN consisting of BiGRU and LSTM layers. The graphics do not reflect the no. of units.

Bidirectional GRU with 5 hidden units and input size 8

  • g = 3 (GRU has 3 gates)
  • h = 5
  • i = 8

num_params_layer1
= 2 × g × [h(h+i) + h] (first term is 2 because of bidirectionality)
= 2 × 3 × [5(5+8) + 5]
= 420

LSTM with 50 hidden units

  • g = 4 (LSTM has 4 gates)
  • h = 50
  • i = 5+5 (outputs from bidirectional GRU concatenated; output size of GRU is 5, same as no. of hidden units)

num_params_layer2
= g × [h(h+i) + h]
= 4 × [50(50+10) + 50]
= 12,200

total_params = 420 + 12,200 = 12,620

input = Input((None, 8))
layer1 = Bidirectional(GRU(5, return_sequences=True))(input)
layer2 = LSTM(50)(layer1)
model = Model(input, layer2)

merge_mode is concatenation by default.
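
The stacked count can be double-checked in plain Python. The helper gated_rnn_params and its bias_sets argument are hypothetical names of my own. One caveat, which is my understanding and not something from the calculation above: recent Keras versions default GRU to reset_after=True, which keeps separate input and recurrent biases (2h biases per gate instead of h). In that case count_params() would report 450 for the BiGRU layer instead of 420, unless the layer is built with GRU(5, reset_after=False, ...).

```python
def gated_rnn_params(g, h, i, bias_sets=1):
    """g * [h(h+i) + bias_sets*h]; bias_sets=1 matches the formula above.

    Hypothetical helper. bias_sets=2 models the assumed reset_after=True
    GRU variant, which keeps separate input and recurrent biases.
    """
    return g * (h * (h + i) + bias_sets * h)


# Layer 1: bidirectional GRU, so two independent GRUs.
bigru = 2 * gated_rnn_params(g=3, h=5, i=8)
# Layer 2: LSTM fed with the concatenated forward and backward outputs.
lstm = gated_rnn_params(g=4, h=50, i=5 + 5)
total = bigru + lstm
print(bigru, lstm, total)  # 420 12200 12620

# Assumed variant with doubled GRU biases (reset_after=True).
bigru_reset_after = 2 * gated_rnn_params(g=3, h=5, i=8, bias_sets=2)
print(bigru_reset_after, bigru_reset_after + lstm)  # 450 12650
```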

3. CNNs

For one layer,

  • i, no. of input maps (or channels)
  • f, filter size (just the length)
  • o, no. of output maps (or channels; this is also the number of filters used)

One f×f filter is applied to every input map for each output map, so there are i × o filters in total.

num_params
= weights + biases
= [i × (f×f) × o] + o

Example 3.1: Greyscale image with 2×2 filter, output 3 channels

Fig. 3.1: Convolution of a greyscale image with 2×2 filter to output 3 channels
  • i = 1 (greyscale has only 1 channel)
  • f = 2
  • o = 3

num_params
= [i × (f×f) × o] + o
= [1 × (2×2) × 3] + 3
= 15

input = Input((None, None, 1))
conv2d = Conv2D(kernel_size=2, filters=3)(input)
model = Model(input, conv2d)

Example 3.2: RGB image with 2×2 filter, output of 1 channel

There is 1 filter for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 1 feature map.

Fig. 3.2: Convolution of an RGB image with 2×2 filter to output 1 channel
  • i = 3 (RGB image has 3 channels)
  • f = 2
  • o = 1

num_params
= [i × (f×f) × o] + o
= [3 × (2×2) × 1] + 1
= 13

input = Input((None, None, 3))
conv2d = Conv2D(kernel_size=2, filters=1)(input)
model = Model(input, conv2d)

Example 3.3: Image with 2 channels, with 2×2 filter, and output of 3 channels

There are 3 filters (purple, yellow, cyan) for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 3 feature maps.

Fig. 3.3: Convolution of a 2-channel image with 2×2 filter to output 3 channels
  • i = 2
  • f = 2
  • o = 3

num_params
= [i × (f×f) × o] + o
= [2 × (2×2) × 3] + 3
= 27

input = Input((None, None, 2))
conv2d = Conv2D(kernel_size=2, filters=3)(input)
model = Model(input, conv2d)
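
All three CNN examples follow from the same formula, so they can be checked together in plain Python. The helper conv2d_params is a hypothetical name of my own. The kernel shape in the comment, (f, f, i, o), is how I understand Keras lays out Conv2D weights with the default channels-last format; treat it as an assumption.

```python
def conv2d_params(i, f, o):
    """Parameters of one 2D conv layer: [i * (f*f) * o] + o.

    Hypothetical helper for illustration, not a Keras function.
    """
    # Assumed kernel layout: (f, f, i, o), i.e. i * o filters of size f x f.
    weights = f * f * i * o
    # One bias per output map.
    biases = o
    return weights + biases


print(conv2d_params(i=1, f=2, o=3))  # Example 3.1 -> 15
print(conv2d_params(i=3, f=2, o=1))  # Example 3.2 -> 13
print(conv2d_params(i=2, f=2, o=3))  # Example 3.3 -> 27
```

Note that the image height and width never appear in the formula: the filters are shared across all spatial positions, which is why the Keras inputs above can leave both as None.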

That’s all for now! Do leave comments below if you have any feedback!

Follow me on Twitter or LinkedIn for digested articles and demos on AI and Deep Learning. You may also reach out to me via raimi.bkarim@gmail.com.


Counting No. of Parameters in Deep Learning Models by Hand was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
