Speeding Towards Silicon: Building a RISC-V Convolution Accelerator


Published: May 20, 2024

Takeaways

Up to 150× speedup for RISC-V convolutions in simulation, and around 65× on real silicon after tapeout.
Designed a full System-on-Chip from scratch on Intel 16 nm technology.
Overcame major integration challenges on a TileLink-based NoC architecture.
Verified the accelerator on real silicon, developing C software to run 2D convolutions.

Full version on Substack → Speeding Towards Silicon

Summary

In Berkeley’s tapeout course, our four-person team built a convolution accelerator from scratch, going from idea to silicon in just 15 weeks. We designed it to speed up 3×3 convolution operations using a custom multiply-accumulate pipeline integrated into the RISC-V Chipyard ecosystem. Working with evolving tools and minimal documentation forced us to rely on “tribal knowledge” and rapid iteration. After battling integration challenges and timing issues, our accelerator, BearlyML 24, achieved up to 150× speedups in simulation and around 65× on real silicon. The chip was later demoed to Apple. Seeing our design run successfully on hardware was the perfect ending to a fast, chaotic, and deeply rewarding project.
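For readers who want a concrete picture of the operation being accelerated, here is a minimal C sketch of a 3×3 convolution written as a plain multiply-accumulate loop, the kind of software baseline a hardware speedup is usually measured against. It is not our driver code or the accelerator's interface: the function name `conv3x3_ref`, the 8-bit inputs with 32-bit accumulators, the row-major layout, and the "valid" (no padding) output size are all assumptions made for illustration. Like most ML code, it does not flip the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software reference, not the accelerator's real API.
 * Computes a "valid" 3x3 convolution (no padding, stride 1) over a
 * row-major h x w image. The output is (h-2) x (w-2), row-major.
 * Assumed data types: int8 inputs and weights, int32 accumulators. */
void conv3x3_ref(const int8_t *in, size_t h, size_t w,
                 const int8_t *k /* 9 weights, row-major */,
                 int32_t *out)
{
    if (h < 3 || w < 3) {
        return; /* image too small for a 3x3 window */
    }
    size_t oh = h - 2;
    size_t ow = w - 2;

    for (size_t y = 0; y < oh; y++) {
        for (size_t x = 0; x < ow; x++) {
            int32_t acc = 0;
            /* Nine multiply-accumulates per output pixel: this inner
             * loop nest is the work a MAC pipeline would take over. */
            for (size_t ky = 0; ky < 3; ky++) {
                for (size_t kx = 0; kx < 3; kx++) {
                    acc += (int32_t)in[(y + ky) * w + (x + kx)]
                         * (int32_t)k[ky * 3 + kx];
                }
            }
            out[y * ow + x] = acc;
        }
    }
}
```

Each output pixel costs nine multiplies and nine adds on a scalar core, so the loop is dominated by arithmetic that maps naturally onto dedicated multiply-accumulate hardware.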



Ansh Chaurasia

