Exhibit Hall | Forum 4
Purpose: The efficiency of GPU-based Monte Carlo particle transport (GPU-MCPT) simulations for dose calculations is limited by thread divergence (TD). We discuss the main TD sources in GPU-MCPT and propose mitigation strategies.
Methods: The four dominant TD sources in a GPU-MCPT are: (A) if-else statements, (B) simulation of discrete interactions, (C) propagation of secondaries, and (D) fluctuations in the number of steps taken per particle. A bare-bones proton GPU-MCPT was built and then modified to implement each TD source. Their impacts, individually and combined, were studied as a function of branching fraction (BF), arithmetic complexity and density, particle energy, interaction complexity, and GPU memory type. The sources were implemented as follows: (A) branching statements were inserted within the transport loop; (B) modified nuclear collisions were used; (C) threads were randomly selected to propagate secondary protons; (D) in addition to the fluctuations resulting from scattering and straggling, wide primary energy distributions were adopted. Our general TD mitigation philosophy was to split the GPU-MCPT kernel into dedicated sub-kernels, each processing only threads that follow identical branching paths.
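The kernel-splitting idea can be illustrated with a toy cost model. The sketch below is not the authors' implementation: it runs on the CPU, assumes a warp of 32 threads that pays the summed cost of every branch present among its threads, and uses hypothetical branch names and costs (`ionisation_step`, `nuclear_collision`, 1 and 20 arbitrary units).

```python
import math
import random

WARP_SIZE = 32  # assumed number of threads executing in lockstep


def divergent_cost(branch_ids, branch_costs, warp_size=WARP_SIZE):
    """Cost of one monolithic kernel: a warp serialises every branch
    taken by at least one of its threads."""
    total = 0
    for start in range(0, len(branch_ids), warp_size):
        warp = branch_ids[start:start + warp_size]
        total += sum(branch_costs[b] for b in set(warp))
    return total


def split_kernel_cost(branch_ids, branch_costs, warp_size=WARP_SIZE):
    """Cost after dispatching threads to one sub-kernel per branch:
    each sub-kernel runs only warps whose threads share a path."""
    total = 0
    for branch, cost in branch_costs.items():
        n_threads = branch_ids.count(branch)
        total += math.ceil(n_threads / warp_size) * cost
    return total


def simulate(n_threads=4096, branching_fraction=0.1, seed=0):
    """Assign the expensive branch to a random fraction of threads
    (hypothetical costs) and return (monolithic, split) costs."""
    rng = random.Random(seed)
    branch_costs = {"ionisation_step": 1, "nuclear_collision": 20}
    branch_ids = [
        "nuclear_collision" if rng.random() < branching_fraction
        else "ionisation_step"
        for _ in range(n_threads)
    ]
    return (divergent_cost(branch_ids, branch_costs),
            split_kernel_cost(branch_ids, branch_costs))
```

In this toy model, and assuming branches are assigned independently at random, the fraction of warps that pay for the expensive branch is 1 - (1 - BF)^32, which already exceeds 0.96 at BF = 0.1. The model ignores the cost of regrouping threads and of launching extra kernels, which a real implementation would have to weigh against the gain.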
Results: When a single TD source is present, the GPU calculation time rises and saturates rapidly with increasing BF. When multiple sources are present, the dependence of the calculation time on the BF values is more complex. For sources B and C, our mitigation strategies can yield a significant speed-up. For source D, sorting particles by energy prior to stepping can reduce the computation time in cases with wide particle energy spreads.
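The effect of energy sorting for source D can be sketched with a similar toy model. The code below is an illustration under stated assumptions, not the method used in this work: it assumes that a warp of 32 threads stays busy until its longest-lived proton stops, and it uses a hypothetical constant energy loss of 1 MeV per step in place of real stopping-power data.

```python
import math
import random

WARP_SIZE = 32  # assumed number of threads executing in lockstep


def steps_to_stop(energy_mev, loss_per_step_mev=1.0):
    """Hypothetical step count: constant energy loss per step."""
    return math.ceil(energy_mev / loss_per_step_mev)


def warp_step_count(energies, warp_size=WARP_SIZE):
    """Total lockstep iterations: each warp runs until its
    highest-energy proton has stopped."""
    total = 0
    for start in range(0, len(energies), warp_size):
        warp = energies[start:start + warp_size]
        total += max(steps_to_stop(e) for e in warp)
    return total


def compare_orderings(n_protons=4096, e_min=10.0, e_max=200.0, seed=0):
    """Return (unsorted, sorted) iteration counts for a wide,
    uniformly distributed primary energy spectrum (assumed)."""
    rng = random.Random(seed)
    energies = [rng.uniform(e_min, e_max) for _ in range(n_protons)]
    unsorted_cost = warp_step_count(energies)
    # Descending order groups protons with similar step counts
    # into the same warp.
    sorted_cost = warp_step_count(sorted(energies, reverse=True))
    return unsorted_cost, sorted_cost
```

For example, 32 protons of 10 MeV interleaved with 32 protons of 100 MeV occupy both warps for 100 iterations each in this model, whereas the sorted arrangement needs 100 + 10 iterations. With a narrow energy spread the two orderings cost nearly the same, so the sorting overhead (not modelled here) may not pay off.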
Conclusion: We have identified TD sources generally present in GPU-MCPT simulations and devised methods to systematically study and mitigate them. This work paves the way for more efficient GPU-MCPT deployments.