Taichi provides metaprogramming infrastructure. Metaprogramming can

  • Unify the development of dimensionality-dependent code, such as 2D/3D physical simulations
  • Improve run-time performance by moving run-time costs to compile time
  • Simplify the development of the Taichi standard library

Taichi kernels are lazily instantiated and a lot of computation can happen at compile-time. Every kernel in Taichi is a template kernel, even if it has no template arguments.
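As a rough mental model of lazy instantiation, the plain-Python sketch below specializes a function the first time it sees a given template argument and reuses the result afterwards. It is an analogy only, not Taichi's actual implementation; the names `lazy_template`, `scale`, and `instantiations` are hypothetical and exist purely for this illustration.

```python
import functools

def lazy_template(build):
    """Hypothetical decorator: build one specialized function per template argument, on first use."""
    cache = {}

    @functools.wraps(build)
    def launch(template_arg, *runtime_args):
        if template_arg not in cache:
            # "Compile" step: runs only once per distinct template argument.
            cache[template_arg] = build(template_arg)
        return cache[template_arg](*runtime_args)

    launch.instantiations = cache
    return launch

@lazy_template
def scale(factor):
    # `factor` is fixed at instantiation time; only `value` varies at run time.
    def specialized(value):
        return factor * value
    return specialized

print(len(scale.instantiations))  # 0: nothing is instantiated before the first call
print(scale(2, 10))               # 20
print(scale(2, 5))                # 10 (reuses the instance built for factor 2)
print(scale(3, 5))                # 15 (a new instance is built for factor 3)
print(len(scale.instantiations))  # 2
```

In this model, work done inside `build` happens once per instantiation, while work done inside `specialized` is paid on every call.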

Template metaprogramming

@ti.kernel
def copy(x: ti.template(), y: ti.template()):
    for i in x:
        y[i] = x[i]

Dimensionality-independent programming using grouped indices

@ti.kernel
def copy(x: ti.template(), y: ti.template()):
    for I in ti.grouped(y):
        x[I] = y[I]

@ti.kernel
def array_op(x: ti.template(), y: ti.template()):
    # If tensor x is 2D
    for I in ti.grouped(x): # I is a vector of size x.dim() and data type i32
        y[I + ti.Vector([0, 1])] = I[0] + I[1]
    # is equivalent to
    for i, j in x:
        y[i, j + 1] = i + j
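To see what a grouped index stands for, here is a plain-Python analogy (not Taichi code). It assumes a dense tensor modeled as a dict keyed by index tuples; `grouped` and `add` are hypothetical helpers that imitate `ti.grouped` and vector addition on indices.

```python
import itertools

def grouped(shape):
    """Hypothetical stand-in for ti.grouped: yield every index of a dense tensor as one tuple."""
    return itertools.product(*(range(n) for n in shape))

def add(index, offset):
    # Element-wise addition, mirroring I + ti.Vector([...]) in the kernel above.
    return tuple(a + b for a, b in zip(index, offset))

shape = (2, 3)

# Grouped form: a single loop, independent of the number of dimensions.
y_grouped = {}
for I in grouped(shape):
    y_grouped[add(I, (0, 1))] = I[0] + I[1]

# Explicit 2D form: one loop variable per dimension.
y_explicit = {}
for i in range(shape[0]):
    for j in range(shape[1]):
        y_explicit[(i, j + 1)] = i + j

print(y_grouped == y_explicit)         # True
print(len(list(grouped((2, 3, 4)))))   # 24: the same helper covers a 3D shape
```

The grouped loop never mentions the number of dimensions, which is what lets one kernel body serve both 2D and 3D code.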

Tensor size reflection

It is sometimes useful to query the dimensionality (tensor.dim()) and shape (tensor.shape()) of a tensor. Both functions can be used in Taichi kernels as well as in Python scripts.

@ti.kernel
def print_tensor_size(x: ti.template()):
  for i in ti.static(range(x.dim())):
    print(x.shape()[i])  # size along dimension i

For sparse tensors, the full domain shape will be returned.

Compile-time evaluations

Using compile-time evaluation will allow certain computations to happen when kernels are being instantiated. This saves the overhead of those computations at runtime.

  • Use ti.static for compile-time branching (for those coming from C++17, this is the equivalent of if constexpr)
enable_projection = True

@ti.kernel
def static():
  if ti.static(enable_projection): # No runtime overhead
    x[0] = 1
  • Use ti.static for forced loop unrolling
@ti.kernel
def g2p(f: ti.i32):
  for p in range(0, n_particles):
    base = ti.cast(x[f, p] * inv_dx - 0.5, ti.i32)
    fx = x[f, p] * inv_dx - ti.cast(base, real)
    w = [0.5 * ti.sqr(1.5 - fx), 0.75 - ti.sqr(fx - 1.0),
         0.5 * ti.sqr(fx - 0.5)]
    new_v = ti.Vector([0.0, 0.0])
    new_C = ti.Matrix([[0.0, 0.0], [0.0, 0.0]])

    # Unrolled 9 iterations for higher performance
    for i in ti.static(range(3)):
      for j in ti.static(range(3)):
        dpos = ti.cast(ti.Vector([i, j]), real) - fx
        g_v = grid_v_out[base(0) + i, base(1) + j]
        weight = w[i](0) * w[j](1)
        new_v += weight * g_v
        new_C += 4 * weight * ti.outer_product(g_v, dpos) * inv_dx

    v[f + 1, p] = new_v
    x[f + 1, p] = x[f, p] + dt * v[f + 1, p]
    C[f + 1, p] = new_C
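The effect of both uses of ti.static can be pictured with a plain-Python sketch that generates specialized source code. This is an analogy under stated assumptions, not how Taichi compiles kernels: `instantiate` is a hypothetical function, and the summing "kernel" it emits is made up for the illustration.

```python
def instantiate(n, enable_projection):
    """Hypothetical 'instantiation' step: emit and compile a specialized function."""
    lines = ["def kernel(x):", "    total = 0.0"]
    for i in range(n):
        # The loop runs here, so the output has one statement per iteration,
        # each with a constant index.
        lines.append(f"    total += x[{i}]")
    if enable_projection:
        # The branch is decided here, so the output contains no conditional.
        lines.append("    total = max(total, 0.0)")
    lines.append("    return total")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)
    return namespace["kernel"], source

kernel, source = instantiate(3, enable_projection=True)
print(source)
print(kernel([1.0, -5.0, 2.0]))  # 0.0: the sum is -2.0, then clamped to 0.0
```

The printed source contains three straight-line additions and no loop or branch, which is the shape of code that an unrolled, statically branched kernel reduces to.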

When to use for loops with ti.static

There are several reasons why ti.static for loops should be used.

  • Loop unrolling for performance.
  • Looping over vector/matrix elements. Indices into Taichi matrices must be compile-time constants, whereas indices into Taichi tensors can be run-time variables. For example, if x is a 1D tensor of 3D vectors, an element is accessed as x[tensor_index][matrix_index]: the first index can be a variable, but the second must be a constant.

For example, the code for resetting this tensor of vectors should be:

@ti.kernel
def reset():
  for i in x:
    for j in ti.static(range(3)):
      # The inner loop must be unrolled since j is a vector index instead
      # of a global tensor index.
      x[i][j] = 0