Blog posts

2025

Kernels, and a cheeky IEEE-754 proof with somewhat practical debugging value

less than 1 minute read

Published:

Takeaways

  • Designed a kernel that ran into a bug; understanding and debugging it teaches a lot about the limitations of floating-point representations
  • Delved deeper into the mathematics of IEEE-754 formats such as FP32 and FP64, along with the related reduced-precision formats bfloat16 and FP8
  • Explained why an unsigned 32-bit integer can be more precise than FP32 in workloads such as hashing
  • Concluded with a mathematical proof characterizing the cases in which a 32-bit integer is more precise than floating-point schemes like FP32
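
The precision claim above can be illustrated with a minimal sketch. This is not the kernel or the proof from the post; the helper name `to_fp32` and the sample keys are hypothetical, chosen only for illustration. The sketch relies on one fact: FP32 has a 24-bit significand, so above 2^24 it can no longer represent every integer, while a 32-bit unsigned integer represents every value in its range exactly.

```python
import struct


def to_fp32(x: int) -> float:
    """Round an integer to the nearest FP32 value by packing it as a
    4-byte IEEE-754 float and unpacking it again (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# Up to 2**24, every integer survives the round trip through FP32.
print(to_fp32(2**24))      # 16777216.0

# One past 2**24, the value no longer fits in 24 significand bits
# and is rounded back to 2**24.
print(to_fp32(2**24 + 1))  # 16777216.0

# Near the top of the uint32 range, adjacent FP32 values are 256 apart,
# so two distinct 32-bit keys can become the same float.
key_a = 0xDEADBEEF  # illustrative key, not taken from the post
key_b = 0xDEADBEF0  # illustrative key, not taken from the post
print(key_a != key_b)                    # True
print(to_fp32(key_a) == to_fp32(key_b))  # True
```

In a hashing workload, where every bit of a 32-bit key matters, a collision like the last one silently merges distinct keys; this is the kind of case in which the integer type is the more precise choice.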

2024

Speeding Towards Silicon: Building a RISC-V Convolution Accelerator

less than 1 minute read

Published:

Takeaways

  • 150× speedup for RISC-V convolutions in simulation, and 65× speedup post-tapeout.
  • Designed a full System-on-Chip from scratch on Intel 16 nm technology.
  • Overcame major integration challenges on a TileLink-based NoC architecture.
  • Verified the accelerator on real silicon, developing C software to run 2D convolutions.