Blog posts

2026

GQA, and its associated inference tokenomics

Takeaways

  • Written proof that attention’s arithmetic intensity leaves it memory bound in the decode stage, along with the critical operation that makes it memory bound
  • Written proof of why attention can be compute bound, along with the critical operation that makes it so
  • How GQA can help push decode toward the compute-bound regime
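
The decode-stage intensity argument can be sketched back-of-the-envelope (a simplified model that counts only KV-cache traffic; the head counts and fp16 byte width below are illustrative assumptions, not figures from the post):

```python
def decode_attention_intensity(n_q_heads, n_kv_heads, head_dim, seq_len,
                               bytes_per_elem=2):
    """Approximate FLOP/byte of one decode step of attention."""
    # FLOPs: QK^T and attention-weighted V, each ~2 multiply-add
    # FLOPs per (q_head, head_dim, seq_len) element -> 4x total.
    flops = 4 * n_q_heads * head_dim * seq_len
    # Dominant memory traffic: streaming the K and V caches from HBM.
    kv_bytes = 2 * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return flops / kv_bytes

# MHA (n_q == n_kv): intensity stays O(1) FLOP/byte -> memory bound.
# GQA shrinks the KV cache, multiplying intensity by the group size.
print(decode_attention_intensity(32, 32, 128, 4096))  # MHA
print(decode_attention_intensity(32, 8, 128, 4096))   # GQA, group size 4
```

Note that intensity is independent of `seq_len` and `head_dim` here; only the ratio of query heads to KV heads moves the needle, which is the lever GQA pulls.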

Making memory bound kernels go brr on AMD’s MI300X

Takeaways

  • Analyzed a vector-add kernel on an MI300X using roofline analysis
  • Gathered perf-counter data that further validated the roofline assumptions, and defined bandwidth utilization as a good figure of merit
  • Mathematically found the ridge point, and demonstrated that the vector-add kernel in its current form will be memory bound regardless of N
  • Delved into the actual internals of the MI300X’s memory subsystem
  • Messed around with the memory subsystem to squeeze out a questionably tiny amount of speedup
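
The ridge-point argument can be sketched numerically (the peak figures below are published MI300X numbers, roughly 163.4 TFLOP/s FP32 vector and 5.3 TB/s HBM3 bandwidth, assumed here rather than taken from the post):

```python
# Assumed MI300X peaks (published figures, not measured in the post).
PEAK_FLOPS = 163.4e12  # FP32 vector, FLOP/s
PEAK_BW = 5.3e12       # HBM3 bandwidth, byte/s

def ridge_point():
    # Arithmetic intensity (FLOP/byte) at which a kernel crosses
    # from the memory-bound to the compute-bound side of the roofline.
    return PEAK_FLOPS / PEAK_BW

def vector_add_intensity(bytes_per_elem=4):
    # c[i] = a[i] + b[i]: 1 FLOP against 2 loads + 1 store per element,
    # independent of N -- the intensity never changes with problem size.
    return 1.0 / (3 * bytes_per_elem)

print(ridge_point())           # ~30.8 FLOP/byte
print(vector_add_intensity())  # ~0.083 FLOP/byte -- deep in memory-bound land
```

Because vector add’s intensity is a constant far below the ridge point, no choice of N can make it compute bound, matching the takeaway above.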

Cartpole: A somewhat deep introduction to RL and value-based learning

Takeaways

  • Trained an RL policy model from scratch, with the weights open-sourced and released on Hugging Face
  • Beyond-surface-level discussion of the training dynamics, with specific analysis of loss curves and rewards in W&B runs
  • Discussion of training failures, and how a target network can prevent training instability
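
The target-network fix can be sketched as follows (a minimal hypothetical wrapper, not the post’s actual training code):

```python
import copy

class TargetNetwork:
    """Sketch of the target-network trick from value-based RL:
    the bootstrap target comes from a frozen copy of the online
    network, re-synced only every `sync_every` steps."""

    def __init__(self, online_net, sync_every=1000):
        self.online = online_net
        self.target = copy.deepcopy(online_net)  # frozen copy
        self.sync_every = sync_every
        self.steps = 0

    def td_target(self, reward, next_state, gamma=0.99):
        # Using self.target (not self.online) breaks the feedback loop
        # where the network chases its own moving predictions --
        # a common source of instability in Q-learning with function
        # approximation.
        return reward + gamma * max(self.target(next_state))

    def step(self):
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = copy.deepcopy(self.online)
```

The key design choice is that `td_target` never sees the freshest weights; the slight staleness is the price paid for a stable regression target.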

2025

Kernels, and a cheeky IEEE-754 proof with somewhat practical debugging value

Takeaways

  • Designed a kernel that runs into a bug; understanding and debugging the issue teaches you a lot about the limitations of floating-point representations
  • Delved deeper into the mathematics of IEEE-754 and common implementations of the standard, such as bfloat16, FP32, FP64, and FP8
  • Explained why an unsigned 32-bit integer may be more precise than FP32 in workloads such as hashing
  • Concluded with a mathematical proof characterizing the cases in which a 32-bit integer will be more precise than floating-point schemes like FP32
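
The uint32-vs-FP32 claim can be demonstrated in a few lines (a standalone sketch, not the post’s proof):

```python
import struct

def fp32_round(n):
    # Round-trip a value through IEEE-754 binary32.
    return struct.unpack('f', struct.pack('f', float(n)))[0]

# binary32 has a 24-bit significand (23 stored + 1 implicit), so every
# integer up to 2**24 survives the round trip exactly...
print(fp32_round(2**24) == 2**24)      # True
# ...but just past it, consecutive integers start to collide:
print(fp32_round(2**24 + 1) == 2**24)  # True -- 16777217 rounds to 16777216
# An unsigned 32-bit integer represents every value below 2**32 exactly,
# which is why it can beat FP32 when hash values must survive bit-for-bit.
```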

2024

Speeding Towards Silicon: Building a RISC-V Convolution Accelerator

Takeaways

  • 150× speedup for RISC-V convolutions in simulation, and 65× speedup post-tapeout.
  • Designed a full System-on-Chip from scratch on Intel 16 nm technology.
  • Overcame major integration challenges on a TileLink-based NoC architecture.
  • Verified the accelerator on real silicon, developing C software to run 2D convolutions.
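
The accelerated operation itself can be sketched as a plain-Python golden model (a reference for checking outputs, not the accelerator’s RTL or the post’s C software; like most ML “convolutions” it is technically a cross-correlation):

```python
def conv2d(image, kernel):
    """Naive valid-mode 2D convolution (no kernel flip) -- the kind of
    golden model typically used to check accelerator output against
    a trusted software result."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # valid-mode output shape
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for ki in range(kh):
                for kj in range(kw):
                    acc += image[i + ki][j + kj] * kernel[ki][kj]
            out[i][j] = acc
    return out
```

A model like this is slow on purpose: its value in accelerator bring-up is being obviously correct, so any mismatch points at the hardware, not the reference.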