# -*- coding: utf-8 -*-
"""automatic_differentiation.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1ibfKtpxC_hIhZlPbefCoqpAS7jTdyiFw
## Automatic Differentiation
[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) calculates derivatives of functions, which is useful for algorithms such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
It is particularly useful when we implement neural networks and want to differentiate the output with respect to an input, where the two are connected by a **chain of functions**:
$L(x)=f(g(h(x)))$
The derivative is:
$\frac{dL}{dx} = \frac{df}{dg}\frac{dg}{dh}\frac{dh}{dx}$
The above rule is called the [chain rule](https://en.wikipedia.org/wiki/Chain_rule).
So the intermediate [gradients](https://en.wikipedia.org/wiki/Gradient) need to be calculated and multiplied together to obtain the final derivative.
Let's see how TensorFlow does it!
"""
# Loading necessary libraries
import tensorflow as tf
import numpy as np
"""### Introduction
Some general points are worth addressing here:
* To compute gradients, TensorFlow uses [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape), which records operations so they can later be used for gradient computation.
Let's look at three similar examples:
"""
x = tf.constant([2.0])
with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    f = x ** 2
# Print gradient output
print('The gradient df/dx where f=(x^2):\n', grad.gradient(f, x))
x = tf.constant([2.0])
x = tf.Variable(x)
with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    f = x ** 2
# Print gradient output
print('The gradient df/dx where f=(x^2):\n', grad.gradient(f, x))
x = tf.constant([2.0])
with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    grad.watch(x)
    f = x ** 2
# Print gradient output
print('The gradient df/dx where f=(x^2):\n', grad.gradient(f, x))
"""What's the difference between above examples?
1. Using tf.Variable on top of the tensor to transform it into a [tf.Variable](https://www.tensorflow.org/guide/variable).
2. Using [.watch()](https://www.tensorflow.org/api_docs/python/tf/GradientTape#watch) operation.
The tf.Variable turn tensor to a variable tensor which is the recommended approach by TensorFlow. The .watch() method ensures the variable is being tracked by the tf.GradientTape().
**You can see if we use neither, we get NONE as the gradient which means gradients were not being tracked!**
NOTE: In general it's always safe to work with variable as well as using .watch() to ensure tracking gradients.
We used default arguments as:
1. **persistent=False**: It says, any variable that is hold with tf.GradientTape(), after one calling of gradient will be released.
2. **watch_accessed_variables=True**: By default watching variables. So if we have a variable, we do not need to use .watch() with this default setting.
Let's have an example with **persistent=True**:
"""
x = tf.constant([2.0])
x = tf.Variable(x)
# For practice, turn persistent to False to see what happens.
with tf.GradientTape(persistent=True, watch_accessed_variables=True) as grad:
    f = x ** 2
    h = x ** 3
# Print gradient output
print('The gradient df/dx where f=(x^2):\n', grad.gradient(f, x))
print('The gradient dh/dx where h=(x^3):\n', grad.gradient(h, x))
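"""To tie this back to the chain rule at the top, here is a minimal sketch; the concrete functions h(x) = 3x, g(h) = h^2 and f(g) = g + 1 are illustrative choices, not part of the original notebook. TensorFlow multiplies df/dg, dg/dh and dh/dx for us, so for L(x) = 9x^2 + 1 we expect dL/dx = 18x, i.e. 36.0 at x = 2.
"""
x = tf.constant([2.0])
x = tf.Variable(x)
with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    h = 3.0 * x   # h(x) = 3x
    g = h ** 2    # g(h) = h^2
    L = g + 1.0   # f(g) = g + 1, so L(x) = 9x^2 + 1
# Chain rule: dL/dx = (df/dg) * (dg/dh) * (dh/dx) = 1 * 2h * 3 = 18x -> 36.0 at x = 2
print('The gradient dL/dx where L=f(g(h(x))):\n', grad.gradient(L, x))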