Taichi provides metaprogramming infrastructure. Metaprogramming can:
- Unify the development of dimensionality-dependent code, such as 2D/3D physical simulations
- Improve run-time performance by moving run-time costs to compile time
- Simplify the development of the Taichi standard library
Template metaprogramming

Taichi kernels are lazily instantiated, and a lot of computation can happen at compile time. Every kernel in Taichi is a template kernel, even if it has no template arguments.
@ti.kernel
def copy(x: ti.template(), y: ti.template()):
    for i in x:
        y[i] = x[i]
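A separate version of a template kernel is instantiated for each distinct set of template arguments. A minimal usage sketch of the copy kernel above (the ti.var declarations are illustrative assumptions about your setup):

import taichi as ti
ti.init()

a = ti.var(ti.f32, shape=8)
b = ti.var(ti.f32, shape=8)
p = ti.var(ti.i32, shape=16)
q = ti.var(ti.i32, shape=16)

copy(a, b)  # instantiates copy for these 1-D f32 tensors
copy(p, q)  # instantiates a second, separate version for the i32 tensors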
Dimensionality-independent programming using grouped indices
@ti.kernel
def copy(x: ti.template(), y: ti.template()):
    for I in ti.grouped(y):
        x[I] = y[I]

@ti.kernel
def array_op(x: ti.template(), y: ti.template()):
    # If tensor x is 2D:
    for I in ti.grouped(x):  # I is a vector of size x.dim() and data type i32
        y[I + ti.Vector([0, 1])] = I[0] + I[1]
    # ...which is equivalent to:
    for i, j in x:
        y[i, j + 1] = i + j
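As a usage sketch (the tensor declarations are assumptions), the same grouped copy kernel covers both 2-D and 3-D tensors without any change to the kernel body:

u2 = ti.var(ti.f32, shape=(16, 16))
v2 = ti.var(ti.f32, shape=(16, 16))
u3 = ti.var(ti.f32, shape=(8, 8, 8))
v3 = ti.var(ti.f32, shape=(8, 8, 8))

copy(u2, v2)  # here I is a 2-D vector index
copy(u3, v3)  # the same kernel body works in 3-D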
Tensor size reflection
Sometimes it is useful to get the dimensionality (tensor.dim()) and shape (tensor.shape()) of tensors. These functions can be used in both Taichi kernels and Python scripts.
@ti.func
def print_tensor_size(x: ti.template()):
    print(x.dim())
    for i in ti.static(range(x.dim())):
        print(x.shape()[i])
For sparse tensors, the full domain shape will be returned.
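A hedged sketch of using these reflection functions from the Python side (the declaration is an illustrative assumption):

x = ti.var(ti.f32, shape=(4, 8, 16))

print(x.dim())    # 3
print(x.shape())  # (4, 8, 16)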
Compile-time evaluations

Using compile-time evaluation allows certain computations to happen when kernels are instantiated. This saves the overhead of those computations at runtime.
Use ti.static for compile-time branching (for those who come from C++17, this is if constexpr):
enable_projection = True

@ti.kernel
def static():
    if ti.static(enable_projection):  # No runtime overhead
        x = 1
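Because the condition is evaluated when the kernel is instantiated, the untaken branch is not compiled at all. A minimal sketch (the flag, tensor declaration, and kernel body are illustrative assumptions, not a fixed API):

use_fast_path = False  # plain Python flag, read at compile time
y = ti.var(ti.f32, shape=16)

@ti.kernel
def step():
    for i in y:
        if ti.static(use_fast_path):
            y[i] = y[i] * 2      # not compiled when the flag is False
        else:
            y[i] = y[i] + y[i]   # the only branch that ends up in the kernel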
Use ti.static for forced loop unrolling:
@ti.kernel
def g2p(f: ti.i32):
    for p in range(0, n_particles):
        base = ti.cast(x[f, p] * inv_dx - 0.5, ti.i32)
        fx = x[f, p] * inv_dx - ti.cast(base, real)
        w = [0.5 * ti.sqr(1.5 - fx), 0.75 - ti.sqr(fx - 1.0),
             0.5 * ti.sqr(fx - 0.5)]
        new_v = ti.Vector([0.0, 0.0])
        new_C = ti.Matrix([[0.0, 0.0], [0.0, 0.0]])

        # Unrolled 9 iterations for higher performance
        for i in ti.static(range(3)):
            for j in ti.static(range(3)):
                dpos = ti.cast(ti.Vector([i, j]), real) - fx
                g_v = grid_v_out[base(0) + i, base(1) + j]
                weight = w[i](0) * w[j](1)
                new_v += weight * g_v
                new_C += 4 * weight * ti.outer_product(g_v, dpos) * inv_dx

        v[f + 1, p] = new_v
        x[f + 1, p] = x[f, p] + dt * v[f + 1, p]
        C[f + 1, p] = new_C
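The essential unrolling pattern in the kernel above, isolated into a minimal sketch (the tensors and stencil weights are illustrative assumptions): since k is a compile-time constant in each unrolled copy of the body, it can even index a plain Python list.

n = 128
u = ti.var(ti.f32, shape=n)
out = ti.var(ti.f32, shape=n)
stencil = [1.0, -2.0, 1.0]  # plain Python list, only indexable with constants

@ti.kernel
def laplacian_1d():
    for i in range(1, n - 1):
        acc = 0.0
        for k in ti.static(range(3)):
            # unrolled: k and stencil[k] are compile-time constants
            acc += stencil[k] * u[i + k - 1]
        out[i] = acc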
When to use for loops with ti.static

There are several reasons why ti.static for-loops should be used:
- Loop unrolling for performance.
- Loop over vector/matrix elements. Indices into Taichi matrices must be compile-time constants, while indices into Taichi tensors can be run-time variables. For example, if x is a 1-D tensor of 3D vectors, accessed as x[tensor_index][matrix_index], then the first index can be a variable, yet the second must be a constant.
For example, code for resetting this tensor of vectors should be:

@ti.kernel
def reset():
    for i in x:
        for j in ti.static(range(3)):
            # The inner loop must be unrolled since j is a vector index
            # instead of a global tensor index.
            x[i][j] = 0
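The same pattern extends to tensors of matrices, where both matrix indices must be compile-time constants. A hedged sketch (the declaration is an illustrative assumption):

A = ti.Matrix(2, 2, dt=ti.f32, shape=16)

@ti.kernel
def reset_matrices():
    for i in A:
        for r in ti.static(range(2)):
            for c in ti.static(range(2)):
                # both matrix indices are compile-time constants
                A[i][r, c] = 0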