#### 7 simple examples to count parameters in FFNN, RNN and CNN models

Counting the number of *trainable* parameters of deep learning models is often considered too trivial to bother with, because your code can already do this for you. But I’d like to keep my notes here for us to refer to once in a while. Here are the models that we’ll run through:

- Feed-Forward Neural Network (FFNN)
- Recurrent Neural Network (RNN)
- Convolutional Neural Network (CNN)

In parallel, I will build each model with the Keras API for easy prototyping and clean code, so let’s quickly import the relevant objects here:

from keras.layers import Input, Dense, SimpleRNN, LSTM, GRU, Conv2D

from keras.layers import Bidirectional

from keras.models import Model

After building the model, call model.count_params() to check the total number of parameters. Every parameter in the models below is trainable, so this total is also the trainable count.

#### 1. FFNNs

Let *i* be the input size, *h* the size of the hidden layer, and *o* the output size.

For one hidden layer,

*num_params* = connections between layers + biases in every layer

= **(*i*×*h* + *h*×*o*) + (*h*+*o*)**

**Example 1.1: Input size 3, hidden layer size 5, output size 2**

*i* = 3, *h* = 5, *o* = 2

*num_params* = connections between layers + biases in every layer

= **(3×5 + 5×2) + (5+2)** = **32**

input = Input((None, 3))

dense = Dense(5)(input)

output = Dense(2)(dense)

model = Model(input, output)

**Example 1.2: Input size 50, hidden layers size [100,1,100], output size 50**

*i* = 50, *h* = 100, 1, 100, *o* = 50

*num_params* = connections between layers + biases in every layer

= **(50×100 + 100×1 + 1×100 + 100×50) + (100+1+100+50)** = **10,451**

input = Input((None, 50))

dense = Dense(100)(input)

dense = Dense(1)(dense)

dense = Dense(100)(dense)

output = Dense(50)(dense)

model = Model(input, output)
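The FFNN rule (one weight per connection between adjacent layers, plus one bias per non-input unit) extends to any number of hidden layers. Here is a minimal sketch in plain Python, with no Keras needed; `ffnn_params` is a hypothetical helper name of mine, not a library function.

```python
def ffnn_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists every layer width in order: input, hidden..., output.
    (ffnn_params is a hypothetical helper, not part of Keras.)
    """
    # Each pair of adjacent layers contributes n_in * n_out weights.
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # Every layer except the input has one bias per unit.
    biases = sum(layer_sizes[1:])
    return weights + biases


print(ffnn_params([3, 5, 2]))              # Example 1.1 -> 32
print(ffnn_params([50, 100, 1, 100, 50]))  # Example 1.2 -> 10451
```

Both results should agree with what model.count_params() reports for the corresponding Keras models.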

#### 2. RNNs

Let *g* be the no. of gates (RNN has 1 gate, GRU has 3, LSTM has 4), *h* the size of hidden units, and *i* the dimension/size of input.

Each gate is effectively an FFNN with input size (h+i) and output size h, so each gate has **h(h+i) + h** parameters.

*num_params* = *g* × [*h*(*h*+*i*) + *h*]

**Example 2.1: LSTM with 2 hidden units and input dimension 3.**

*g* = 4 (LSTM has 4 gates), *h* = 2, *i* = 3

*num_params* = *g* × [*h*(*h*+*i*) + *h*]

= **4 × [2(2+3) + 2]** = **48**

input = Input((None, 3))

lstm = LSTM(2)(input)

model = Model(input, lstm)
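The per-gate rule is easy to check by hand in code. This is a rough sketch only: `rnn_params` is a hypothetical helper name, and it assumes exactly one bias vector per gate, as in the formula above.

```python
def rnn_params(g, h, i):
    """Parameters of one recurrent layer with g gates.

    Each gate acts like a dense layer mapping (h + i) inputs to h outputs:
    h * (h + i) weights plus h biases. (Hypothetical helper, not Keras API.)
    """
    return g * (h * (h + i) + h)


print(rnn_params(g=4, h=2, i=3))  # LSTM from Example 2.1 -> 48
print(rnn_params(g=1, h=2, i=3))  # simple RNN with the same sizes -> 12
```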

**Example 2.2: Stacked Bidirectional GRU with 5 hidden units and input size 8 (whose outputs are concatenated) + LSTM with 50 hidden units**

Bidirectional GRU with 5 hidden units and input size 8

*g* = 3 (GRU has 3 gates), *h* = 5, *i* = 8

*num_params_layer1* = 2 × *g* × [*h*(*h*+*i*) + *h*] (the leading 2 is because of bidirectionality)

= **2 × 3 × [5(5+8) + 5]** = **420**

LSTM with 50 hidden units

*g* = 4 (LSTM has 4 gates), *h* = 50, *i* = 5+5 (outputs from bidirectional GRU concatenated; output size of GRU is 5, same as no. of hidden units)

*num_params_layer2* = *g* × [*h*(*h*+*i*) + *h*]

= **4 × [50(50+10) + 50]** = **12,200**

*total_params* = 420 + 12,200 = 12,620

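input = Input((None, 8))

layer1 = Bidirectional(GRU(5, return_sequences=True, reset_after=False))(input)

layer2 = LSTM(50)(layer1)

model = Model(input, layer2)

merge_mode is concatenation by default. Passing reset_after=False keeps a single bias vector per gate, which matches the formula used here. Recent Keras versions default to reset_after=True, which adds a second bias vector per gate and raises the count for this GRU layer from 420 to 450.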
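The stacked calculation can be reproduced with a small helper. This is a sketch under stated assumptions: `recurrent_layer_params` and its `biases_per_gate` argument are hypothetical names of mine. Setting `biases_per_gate=2` is my way of modelling a GRU built with reset_after=True, which, as far as I know, keeps separate input and recurrent biases.

```python
def recurrent_layer_params(g, h, i, bidirectional=False, biases_per_gate=1):
    """Parameters of one recurrent layer (hypothetical helper, not Keras API).

    g: number of gates, h: hidden units, i: input size.
    biases_per_gate=2 is an assumption used to model GRU(reset_after=True).
    """
    per_direction = g * (h * (h + i) + biases_per_gate * h)
    # A bidirectional wrapper holds two independent copies of the layer.
    return 2 * per_direction if bidirectional else per_direction


# Layer 1: bidirectional GRU, 5 hidden units, input size 8.
layer1 = recurrent_layer_params(g=3, h=5, i=8, bidirectional=True)
# Layer 2: LSTM with 50 hidden units fed by the concatenated GRU outputs (5 + 5).
layer2 = recurrent_layer_params(g=4, h=50, i=5 + 5)
print(layer1, layer2, layer1 + layer2)  # 420 12200 12620

# Same GRU layer if each gate carries two bias vectors.
layer1_reset_after = recurrent_layer_params(
    g=3, h=5, i=8, bidirectional=True, biases_per_gate=2)
print(layer1_reset_after)  # 450
```

If model.count_params() gives 12,650 instead of 12,620, the second variant is the likely explanation.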

#### 3. CNNs

For one layer,

Let *i* be the no. of input maps (or channels), *f* the filter size (just the length), and *o* the no. of output maps (or channels; this is also defined by how many filters are used).

One filter is applied to every input map.

*num_params* = weights + biases

= **[*i* × (*f*×*f*) × *o*] + *o***

**Example 3.1: Greyscale image with 2×2 filter, output 3 channels**

*i* = 1 (greyscale has only 1 channel), *f* = 2, *o* = 3

*num_params* = [*i* × (*f*×*f*) × *o*] + *o*

= **[1 × (2×2) × 3] + 3** = **15**

input = Input((None, None, 1))

conv2d = Conv2D(kernel_size=2, filters=3)(input)

model = Model(input, conv2d)

**Example 3.2: RGB image with 2×2 filter, output of 1 channel**

There is 1 filter for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 1 feature map.

*i* = 3 (RGB image has 3 channels), *f* = 2, *o* = 1

*num_params* = [*i* × (*f*×*f*) × *o*] + *o*

= **[3 × (2×2) × 1] + 1** = **13**

input = Input((None, None, 3))

conv2d = Conv2D(kernel_size=2, filters=1)(input)

model = Model(input, conv2d)

**Example 3.3: Image with 2 channels, with 2×2 filter, and output of 3 channels**

There are 3 filters (purple, yellow, cyan) for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 3 feature maps.

*i* = 2, *f* = 2, *o* = 3

*num_params* = [*i* × (*f*×*f*) × *o*] + *o*

= **[2 × (2×2) × 3] + 3** = **27**

input = Input((None, None, 2))

conv2d = Conv2D(kernel_size=2, filters=3)(input)

model = Model(input, conv2d)
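All three CNN examples follow from the same expression, so a tiny helper can confirm them in one go. This is only a sketch: `conv2d_params` is a hypothetical name, and it assumes square *f*×*f* filters and one bias per output channel, as in the formula above.

```python
def conv2d_params(i, f, o):
    """Parameters of a 2D convolution layer (hypothetical helper).

    i: input channels, f: side length of a square filter, o: output channels.
    """
    weights = i * (f * f) * o  # one f x f kernel per (input, output) channel pair
    biases = o                 # one bias per output channel
    return weights + biases


print(conv2d_params(i=1, f=2, o=3))  # Example 3.1 -> 15
print(conv2d_params(i=3, f=2, o=1))  # Example 3.2 -> 13
print(conv2d_params(i=2, f=2, o=3))  # Example 3.3 -> 27
```

Note that the count does not depend on the image height or width, which is why the Keras inputs above can leave those dimensions as None.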

That’s all for now! Do leave comments below if you have any feedback!


*Follow me on **Twitter** or **LinkedIn** for digested articles and demos on AI and Deep Learning. You may also reach out to me via raimi.bkarim@gmail.com.*

Counting No. of Parameters in Deep Learning Models by Hand was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.