We suggest starting with `ti.Tape()`, and then migrating to more advanced differentiable programming using the `kernel.grad()` syntax if necessary.
For example, suppose you have the following kernel:
```python
import taichi as ti

ti.init()

x = ti.var(ti.f32, ())
y = ti.var(ti.f32, ())

@ti.kernel
def compute_y():
    y[None] = ti.sin(x[None])
```
Now suppose you want the derivative of y with respect to x, i.e., dy/dx. You could implement the derivative kernel yourself:
```python
x = ti.var(ti.f32, ())
y = ti.var(ti.f32, ())
dy_dx = ti.var(ti.f32, ())

@ti.kernel
def compute_dy_dx():
    dy_dx[None] = ti.cos(x[None])
```
But wait, what if we change the original `compute_y`? We would then have to recalculate the derivative by hand and rewrite `compute_dy_dx`, which is error-prone and not convenient at all.
If this situation occurs, don’t worry! Taichi provides a handy autodiff system that can help you obtain the derivative of a kernel without any pain!
Let's take the `compute_y` kernel from the example above for explanation. What's the most convenient way to obtain a kernel that computes dy/dx from x?
- Use the `needs_grad=True` option when declaring fields involved in the derivative chain.
- Use `with ti.Tape(y):` to wrap the invocation of the kernel(s) whose derivatives you want to compute.
- After the tape exits, `x.grad[None]` is the value of dy/dx at the current x.
```python
x = ti.var(ti.f32, (), needs_grad=True)
y = ti.var(ti.f32, (), needs_grad=True)

@ti.kernel
def compute_y():
    y[None] = ti.sin(x[None])

with ti.Tape(y):
    compute_y()

print('dy/dx =', x.grad[None])
print('at x =', x[None])
```
This is equivalent to:
```python
x = ti.var(ti.f32, ())
y = ti.var(ti.f32, ())
dy_dx = ti.var(ti.f32, ())

@ti.kernel
def compute_dy_dx():
    dy_dx[None] = ti.cos(x[None])

compute_dy_dx()

print('dy/dx =', dy_dx[None])
print('at x =', x[None])
```
For a physical simulation, it is sometimes easy to compute the energy but hard to compute the force on each particle. But recall that we can differentiate the (negative) potential energy to get forces: F_i = -dU/dx_i. So once you've written a kernel that computes the potential energy, you can use Taichi's autodiff system to obtain its derivative, and thereby the force on each particle.
Take `examples/ad_gravity.py` as an example:
```python
import taichi as ti

ti.init()

N = 8
dt = 1e-5

x = ti.Vector.var(2, ti.f32, N, needs_grad=True)  # position of particles
v = ti.Vector.var(2, ti.f32, N)  # velocity of particles
U = ti.var(ti.f32, (), needs_grad=True)  # potential energy

@ti.kernel
def compute_U():
    for i, j in ti.ndrange(N, N):
        r = x[i] - x[j]
        # r.norm(1e-3) is equivalent to ti.sqrt(r.norm()**2 + 1e-3)
        # This is to prevent a 1/0 error, which can cause a wrong derivative
        U[None] += -1 / r.norm(1e-3)  # U += -1 / |r|

@ti.kernel
def advance():
    for i in x:
        v[i] += dt * -x.grad[i]  # dv/dt = -dU/dx
    for i in x:
        x[i] += dt * v[i]  # dx/dt = v

def substep():
    with ti.Tape(U):
        # Every kernel invocation within this indented scope
        # is accounted into the partial derivatives of U
        # with respect to input variables such as x.
        compute_U()  # This also computes dU/dx and saves it in x.grad
    advance()

@ti.kernel
def init():
    for i in x:
        x[i] = [ti.random(), ti.random()]

init()
gui = ti.GUI('Autodiff gravity')
while gui.running:
    for i in range(50):
        substep()
    print('U = ', U[None])
    gui.circles(x.to_numpy(), radius=3)
    gui.show()
```
The argument `U` to `ti.Tape(U)` must be a 0D field.
For using autodiff with multiple output variables, please see the `kernel.grad()` usage below.
`ti.Tape(U)` will automatically set `U[None]` to 0 on start up.
TODO: Documentation WIP.
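In the meantime, here is a minimal sketch of the `kernel.grad()` style, assuming the `compute_y` kernel and the `needs_grad=True` field declarations from the `ti.Tape` example above; the exact conventions may differ from the finished documentation:

```python
x[None] = 1.0
compute_y()          # forward pass
y.grad[None] = 1.0   # seed the gradient of the output: dy/dy = 1
compute_y.grad()     # backward pass; accumulates dy/dx into x.grad
print('dy/dx =', x.grad[None])  # should be close to cos(1.0) ≈ 0.5403
```

Note that, unlike `ti.Tape`, gradients accumulate across backward calls, so you typically need to clear `x.grad` (and re-seed `y.grad`) yourself between iterations.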
Kernel Simplicity Rule
Unlike tools such as TensorFlow where immutable output buffers are generated, the imperative programming paradigm adopted in Taichi allows programmers to freely modify global fields.
To make automatic differentiation well-defined under this setting, we make the following assumptions about Taichi programs used for differentiable programming:
Global Data Access Rules (illustrated by the sketch after this list):
- If a global field element is written more than once, then starting from the second write, the write must come in the form of an atomic add ("accumulation", using `ti.atomic_add` or simply `+=`).
- No read access happens to a global field element until its accumulation is done.
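As a hedged illustration of these rules (the fields `a` and `total` here are hypothetical, not from the examples above):

```python
a = ti.var(ti.f32, 4, needs_grad=True)
total = ti.var(ti.f32, (), needs_grad=True)

@ti.kernel
def ok():
    for i in a:
        total[None] += a[i]  # every write after the first is an atomic add: allowed

@ti.kernel
def violates_rule_1():
    for i in a:
        total[None] = a[i]   # plain overwrites of the same element: not allowed
```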
Kernel Simplicity Rule: Kernel body consists of multiple simply nested for-loops. I.e., each for-loop can either contain exactly one (nested) for-loop (and no other statements), or a group of statements without loops.
```python
@ti.kernel
def differentiable_task():
    for i in x:
        x[i] = y[i]

    for i in range(10):
        for j in range(20):
            for k in range(300):
                ...  # do whatever you want, as long as there are no loops

    # Not allowed. The outer for loop contains two for loops
    for i in range(10):
        for j in range(20):
            ...
        for j in range(20):
            ...
```
Taichi programs that violate this rule will result in an error.
Static for-loops (e.g. `for i in ti.static(range(4))`) are unrolled by the Python frontend preprocessor and therefore do not count as a level of loop.
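If a kernel outgrows this rule, one common workaround is to split the work across multiple kernels, each of which is simply nested. The following is only a sketch, using hypothetical fields and assuming the two inner loops are independent of each other:

```python
val_a = ti.var(ti.f32, (10, 20), needs_grad=True)
val_b = ti.var(ti.f32, (10, 20), needs_grad=True)

@ti.kernel
def task_part_1():
    for i in range(10):
        for j in range(20):
            val_a[i, j] += 1.0  # first inner loop, now its own kernel

@ti.kernel
def task_part_2():
    for i in range(10):
        for j in range(20):
            val_b[i, j] += val_a[i, j]  # second inner loop, also its own kernel

def task():
    # Does the work of the disallowed two-inner-loop kernel above,
    # as long as the two loops do not depend on interleaved execution.
    task_part_1()
    task_part_2()
```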
The DiffTaichi repo contains 10 differentiable physical simulators built with Taichi differentiable programming, including a few examples with neural network controllers optimized using differentiable simulators and brute-force gradient descent.