Derivatives of Activation Functions

Here is my attempt at deriving the derivatives of the common neural network activation functions, namely the sigmoid, tanh, ReLU and leaky ReLU functions.

1. Sigmoid function

The sigmoid function is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Differentiating both sides,

$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$$

Rearranging the terms,

$$\sigma'(x) = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

since $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}} = 1 - \sigma(x)$.
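As a quick sanity check, here is a minimal NumPy sketch of the sigmoid and its derivative (the function names are my own):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```

For example, sigmoid_derivative(0.0) returns 0.25, which agrees with the formula since sigma(0) = 0.5.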


2. tanh function

The tanh function is defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Differentiating both sides,

$$\tanh'(x) = \frac{d}{dx}\left(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\right)$$

Using quotient rule,

$$\tanh'(x) = \frac{(e^{x} + e^{-x})(e^{x} + e^{-x}) - (e^{x} - e^{-x})(e^{x} - e^{-x})}{(e^{x} + e^{-x})^2} = 1 - \left(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\right)^2 = 1 - \tanh^2(x)$$
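The same check can be written for tanh, using NumPy's built-in np.tanh (a small sketch, with my own function name):

```python
import numpy as np

def tanh_derivative(x):
    # tanh'(x) = 1 - tanh^2(x)
    return 1.0 - np.tanh(x) ** 2
```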


3. ReLU function

The ReLU function is defined as

$$f(x) = \max(0, x)$$

Its derivative is

$$f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \\ \text{undefined} & x = 0 \end{cases}$$

Note: In software, we can use f'(x) = 1 for x = 0 (Prof. Andrew Ng in the Deep Learning Coursera course).
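A minimal NumPy sketch that follows this convention for x = 0 (function names are my own):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_derivative(x):
    # 1 for x > 0, 0 for x < 0; x = 0 is assigned 1, following the note above
    return np.where(x >= 0, 1.0, 0.0)
```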


4. Leaky ReLU function

The leaky ReLU function is defined as

$$f(x) = \max(0.01x, x)$$

Its derivative is

$$f'(x) = \begin{cases} 1 & x > 0 \\ 0.01 & x < 0 \\ \text{undefined} & x = 0 \end{cases}$$

Note: In software, we can use f'(x) = 1 or 0.01 for x = 0 (Prof. Andrew Ng in the Deep Learning Coursera course).
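And a matching sketch for the leaky ReLU, with the slope 0.01 exposed as a parameter (the function names and the alpha parameter are my own):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = max(alpha * x, x) for a small positive alpha such as 0.01
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # 1 for x > 0, alpha for x < 0; x = 0 is assigned 1 here (either choice works, per the note above)
    return np.where(x >= 0, 1.0, alpha)
```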