Making memory bound kernels go brr on AMD’s MI300X

Takeaways

  • Analyzed a vector add kernel using roofline analysis on an MI300X
  • Gathered perf counter data that further validated the roofline assumptions, and defined bandwidth utilization as a good figure of merit
  • Derived the ridge point mathematically, and showed that the vector add kernel in its current form will be memory bound regardless of N
  • Delved into the actual internals of the MI300X’s memory subsystem
  • Messed around with the memory subsystem to get a questionably tiny amount of speedup
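
For a rough feel of the roofline argument, here is a minimal Python sketch. The peak numbers are assumptions, not measurements from the post (roughly 163.4 TFLOP/s FP32 vector and 5.3 TB/s HBM, as commonly quoted for the MI300X), and the helper names are made up for illustration. It assumes an FP32 vector add that does one FLOP per element and moves three 4-byte values per element (two loads, one store).

```python
# Assumed peak figures for illustration only; swap in datasheet or measured values.
PEAK_FLOPS = 163.4e12  # FLOP/s, assumed FP32 vector peak
PEAK_BW = 5.3e12       # bytes/s, assumed peak HBM bandwidth


def ridge_point(peak_flops, peak_bw):
    """Arithmetic intensity (FLOP/byte) where the roofline flattens out."""
    return peak_flops / peak_bw


def vector_add_intensity(n, bytes_per_elem=4):
    """Arithmetic intensity of c[i] = a[i] + b[i] over n elements."""
    flops = n                             # one add per element
    bytes_moved = 3 * n * bytes_per_elem  # load a, load b, store c
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bw):
    """Roofline ceiling: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * peak_bw)


def bandwidth_utilization(n, seconds, peak_bw, bytes_per_elem=4):
    """Fraction of peak bandwidth achieved by a vector add that took `seconds`."""
    achieved_bw = 3 * n * bytes_per_elem / seconds
    return achieved_bw / peak_bw


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, PEAK_BW)
    for n in (1 << 10, 1 << 20, 1 << 30):
        ai = vector_add_intensity(n)
        roof = attainable_flops(ai, PEAK_FLOPS, PEAK_BW)
        kind = "memory" if ai < ridge else "compute"
        print(f"N={n}: AI={ai:.4f} FLOP/B, ridge={ridge:.1f} FLOP/B, "
              f"ceiling={roof / 1e12:.2f} TFLOP/s, {kind} bound")
```

N cancels out of the intensity, so it stays at 1/12 FLOP/byte for any size. Under these assumed peaks that sits far to the left of the ridge point, which is why bandwidth utilization, not FLOP/s, is the number worth tracking for this kernel.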

Full version on Substack → Making Memory Bound Kernels go brr on MI300X