Project 5 - Fun with diffusion models

1.1 Implementing the UNet

For this part, I implemented and trained a denoising UNet. Results are shown in the following sections, as requested.
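As a rough sketch of what this kind of architecture looks like (a simplified stand-in for illustration; names like SimpleUNet and ConvBlock are mine, and the real model follows the project's block diagram):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convs (padding 1) with BatchNorm + GELU; spatial size unchanged."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.GELU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.GELU(),
        )

    def forward(self, x):
        return self.net(x)

class SimpleUNet(nn.Module):
    """One-level UNet: downsample once, upsample once, one skip connection."""
    def __init__(self, in_ch=1, hidden=128):
        super().__init__()
        self.enc = ConvBlock(in_ch, hidden)
        self.down = nn.Sequential(nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.GELU())
        self.mid = ConvBlock(hidden, hidden)
        self.up = nn.Sequential(nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.GELU())
        self.dec = ConvBlock(2 * hidden, hidden)           # skip concat doubles channels
        self.out = nn.Conv2d(hidden, in_ch, 3, padding=1)  # final 3x3 projection

    def forward(self, x):
        e = self.enc(x)              # (B, D, 28, 28) for MNIST
        m = self.mid(self.down(e))   # (B, D, 14, 14)
        u = self.up(m)               # (B, D, 28, 28)
        return self.out(self.dec(torch.cat([u, e], dim=1)))
```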

1.2 Using the UNet to Train a Denoiser

Below is a visualization of how increasing sigma affects the images.
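Concretely, the corruption is just additive Gaussian noise applied to images in [0, 1], a sketch:

```python
import torch

def add_noise(x, sigma):
    """z = x + sigma * eps, with eps ~ N(0, I). Higher sigma buries more signal."""
    return x + sigma * torch.randn_like(x)

# e.g. one clean digit x at each of the visualized noise levels:
# noisy = [add_noise(x, s) for s in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0)]
```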

1.2.1 Training the Denoiser

The denoising UNet was trained for 5 epochs with a hidden dimension D = 128 and noise level sigma = 0.5.
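The training loop, sketched below with the SimpleUNet stand-in from 1.1, resamples noise every batch so the model keeps seeing fresh corruptions; the batch size (256) and learning rate (1e-4) here are my assumptions, not values stated above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SimpleUNet(in_ch=1, hidden=128).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
sigma = 0.5

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=256, shuffle=True)

for epoch in range(5):
    for x, _ in loader:                       # labels unused by the plain denoiser
        x = x.to(device)
        z = x + sigma * torch.randn_like(x)   # noisy input
        loss = loss_fn(model(z), x)           # L2 between output and clean image
        opt.zero_grad()
        loss.backward()
        opt.step()
```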

Results after epoch 1

Results after epoch 5

Final Training Loss vs Step curve

1.2.2 Out-of-Distribution Testing

Results for sigma values 0, 0.2, 0.4, 0.5, 0.6, 0.8, and 1.0 are shown below, followed by a sketch of the evaluation loop.

Sigma = 0

Sigma = 0.2

Sigma = 0.4

Sigma = 0.5

Sigma = 0.6

Sigma = 0.8

Sigma = 1.0
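The evaluation behind these images is just the training-time noising applied at sigmas the denoiser never saw; a sketch reusing model and device from the training sketch (test_loader is an assumed MNIST test-split DataLoader):

```python
import torch

test_sigmas = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]
model.eval()
with torch.no_grad():
    x, _ = next(iter(test_loader))       # assumed MNIST test-split DataLoader
    x = x.to(device)
    for s in test_sigmas:
        z = x + s * torch.randn_like(x)  # same noising as training, new sigma
        denoised = model(z)              # model was trained only at sigma = 0.5
```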

Part 2 - Training a diffusion model

2.1 - Adding Time Conditioning to the UNet
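The timestep is injected into the network through a small fully connected block. A minimal sketch of one way to do this, assuming the timestep is normalized to [0, 1] and broadcast into intermediate feature maps:

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Small MLP that embeds the normalized timestep t into a feature vector."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, t):
        return self.net(t)

# Inside the UNet forward pass, the embedding modulates intermediate feature
# maps by broadcasting over the spatial dimensions, e.g.:
#   h = h + self.t_embed(t).view(-1, C, 1, 1)
# where t has shape (B, 1) holding t_step / T, and C matches h's channels.
```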

2.2 + 2.3 - Training and Sampling the Time-Conditioned UNet

Below are training loss curves from a 5-epoch run and a 20-epoch run, along with examples of denoising attempts on batch data seen during training and examples of sampling the model from scratch.
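For reference, sampling from scratch follows the standard DDPM ancestral loop. A sketch, assuming T = 300 steps and a linear beta schedule (swap in whatever schedule the model was actually trained with), with the model taking a normalized timestep as in 2.1:

```python
import torch

T = 300                                 # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(model, shape=(16, 1, 28, 28), device="cpu"):
    """Ancestral DDPM sampling: start from pure noise, step t = T-1 .. 0."""
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_norm = torch.full((shape[0], 1), t / T, device=device)
        eps = model(x, t_norm)                                 # predicted noise
        mean = (x - betas[t] / (1 - alphas_cumprod[t]).sqrt() * eps) / alphas[t].sqrt()
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * z
    return x
```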

5 Epoch Training Results

Visualization of data denoised during the 5th epoch

Training run loss curve

20 Epoch Training Results

Visualization of data denoised during the 5th epoch

Visualization of data sampled from scratch during the 5th epoch

Visualization of data denoised on the 20th epoch

Visualization of data sampled from scratch during the 20th epoch

Training run loss curve

Adding Class Conditioning

2.4, 2.5, 2.6 - Integrating Class Conditioning into the UNet

The denoising UNet was trained for 5 epochs and for 20 epochs with a hidden dimension D = 128. Notably, for performance, training samples a separate random timestep for each image in a batch rather than a single timestep shared across the whole batch.
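In code, the per-image timesteps amount to a small change in the training step: draw a vector of timesteps and index the noise schedule with it. A sketch reusing T and alphas_cumprod from the sampling sketch above, and model, x, loss_fn, device from Part 1's loop:

```python
import torch

t = torch.randint(0, T, (x.shape[0],), device=device)          # one t per image, not per batch
eps = torch.randn_like(x)
ab = alphas_cumprod.to(device)[t].view(-1, 1, 1, 1)            # per-image alpha-bar
x_t = ab.sqrt() * x + (1 - ab).sqrt() * eps                    # forward (noising) process
loss = loss_fn(model(x_t, (t.float() / T).view(-1, 1)), eps)   # objective: predict the noise
```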

Previously, I had faced accuracy limitations: my mean squared error loss would taper off around 0.4. One notable suggestion from office hours was to replace the last Conv layer with a Conv(3, 1, 1), i.e., a convolution layer with a kernel size of 3 and both padding and stride of 1.
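For reference, that layer looks like this in PyTorch (channel counts assumed: D = 128 hidden channels back down to MNIST's single channel):

```python
import torch.nn as nn

# Kernel size 3, stride 1, padding 1: preserves the 28x28 spatial resolution
# while projecting D hidden channels back to the 1-channel image.
final_conv = nn.Conv2d(128, 1, kernel_size=3, stride=1, padding=1)
```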

With that fix in place, in 2.4 I implemented class conditioning for my already time-conditioned diffusion model. Below are training and sampling results from a 5-epoch run and a 20-epoch run. Each run lists the training loss curve (as required), alongside visualizations of the training data fed to the class-conditioned model and its output. At epochs 5 and 20, I also sample the model completely from scratch (passing in a random tensor) and show the generated digits.
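The from-scratch sampler is the same ancestral loop as in 2.2/2.3, but with a class vector passed alongside the timestep. A sketch assuming one-hot conditioning over the 10 digits, reusing T, betas, alphas, and alphas_cumprod from above; if the model was trained with occasional conditioning dropout, classifier-free guidance (eps = eps_uncond + gamma * (eps_cond - eps_uncond)) can be slotted in where eps is computed:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_digits(model, device="cpu"):
    """Sample one image per digit class 0-9, starting from pure noise."""
    c = F.one_hot(torch.arange(10, device=device), num_classes=10).float()  # (10, 10)
    x = torch.randn(10, 1, 28, 28, device=device)
    for t in reversed(range(T)):
        t_norm = torch.full((10, 1), t / T, device=device)
        eps = model(x, t_norm, c)             # class-conditioned noise prediction
        mean = (x - betas[t] / (1 - alphas_cumprod[t]).sqrt() * eps) / alphas[t].sqrt()
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * z
    return x
```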

5 Epoch Training Results

Visualization of data denoised during the 5th epoch

Visualization of data generated after the end of the 5th epoch

Training run loss curve

20 Epoch Training Results

Visualization of data denoised during the 5th epoch

Visualization of data sampled from scratch during the 5th epoch

Visualization of data denoised on the 20th epoch

Visualization of data sampled from scratch during the 20th epoch

Training run loss curve