Project 5 - Fun with diffusion models

Note: this page only contains Part A. To see Part B, go here.

Part 0

With a seed value of 189, here is the list of prompts we were tasked to try:

  1. an oil painting of a snowy mountain village
  2. a man wearing a hat
  3. a rocket ship
num_inference = 20 Results

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference = 40 Results

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

For each of the three prompts listed above, the model output and the upsampled output are shown in the table, alongside the num_inference values. One thing that pops out immediately about this model is the clear diversity in its training data distribution. The man is Black, and he is rendered in detail, down to his formal suit and the shirt underneath. The model is also notably good at creating human faces without the common failure modes, such as disfigurement or a lack of detail in the eyes (though on closer inspection, the eyes are a bit off). It also produces a clean, gray, professional-looking background and ears that look humanlike.

The other two images embody a broad variety of styles, which again reflects the diversity of the data DeepFloyd had access to.

We can compare these with the images generated with num_inference = 40 to see more fine-grained detail, which makes sense given the additional denoising steps. For the man wearing a hat, we see a more complex background, such as a glimmer of a shadow, and better eye placement. In the other two, we see less blurred lines (see the tree) and more of the stylistic theme from the prompt, such as the oil painting look. Notice that for the third prompt, the lack of a stylistic cue lets the diffusion model choose whatever style it pleases.

Part 1.1 - Implementing the Forward Process

Forward Pass of Diffusion Model, noise at various timesteps

t = 0

t = 250

t = 500

t = 750

t = 999

The key thing to notice here is the organization of timesteps. t = 0 contains the least noise, while t = T - 1 = 999 contains the most and is the last step in the forward process (and the first step in the backward, or denoising, process). The progression of noise is dictated by the alpha_cumprods variable, which stores, for each timestep, the cumulative product of the per-step alpha values up to that timestep.
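
To make the role of the cumulative product concrete, here is a minimal sketch of the forward process in plain Python. The linear beta schedule, the helper names `make_alpha_cumprod` and `forward`, and the flat list standing in for an image are all assumptions for illustration, not the schedule or code used in this project.

```python
import math
import random

def make_alpha_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of alpha_t = 1 - beta_t (assumed linear beta schedule)."""
    alpha_cumprod, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_cumprod.append(running)
    return alpha_cumprod

def forward(x0, t, alpha_cumprod, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    abar_t = alpha_cumprod[t]
    return [
        math.sqrt(abar_t) * p + math.sqrt(1.0 - abar_t) * rng.gauss(0.0, 1.0)
        for p in x0
    ]

alpha_cumprod = make_alpha_cumprod()
rng = random.Random(189)
x0 = [0.5] * 8  # stand-in for a flattened clean image

for t in (0, 250, 500, 750, 999):
    x_t = forward(x0, t, alpha_cumprod, rng)
    # The share of the clean image that survives shrinks as t grows.
    print(t, round(math.sqrt(alpha_cumprod[t]), 4), round(x_t[0], 4))
```

Because the cumulative product only ever shrinks, the weight on the clean image falls and the weight on the noise rises as t increases, which matches the progression in the images above.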

Part 1.2 - Classical Denoising

Gaussian Denoising

t = 0

t = 250

t = 500

t = 750

t = 999

Gaussian denoising isn't very effective. Look at t = 250 for an example: even when the amount of noise is comparatively small, the low-pass filter (the "classical denoising" here) isn't able to cleanly erase the noise. The problem mounts as the images get noisier. By the time we get to t = 999, the minimal impact of classical denoising is no surprise.
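
As a rough illustration of what the low-pass filter is doing, here is a 1-D sketch in plain Python. It stands in for the 2-D Gaussian blur applied to the images, and the kernel size, sigma, noise level, and helper names are assumptions chosen for the example.

```python
import math
import random

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 1-D Gaussian kernel (size should be odd)."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, kernel):
    """Low-pass filter a 1-D signal, clamping indices at the borders."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
clean = [0.0] * 32 + [1.0] * 32  # a single sharp edge
noisy = [p + rng.gauss(0.0, 0.3) for p in clean]
kernel = gaussian_kernel()
denoised = blur_1d(noisy, kernel)

print("noisy MSE:   ", mse(noisy, clean))
print("denoised MSE:", mse(denoised, clean))
```

The blur averages neighboring values, so it reduces the noise somewhat but cannot tell noise apart from real detail such as the edge.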

1.3 - One Step Denoising

Original Image

Noisy Images

t = 250

t = 500

t = 750

t = 999

One Step Estimate

t = 250

t = 500

t = 750

t = 999

The one-step method works significantly better than classical denoising, but it needs to take baby steps. If we feed it too much noise, as at t = 999, a single step cannot erase all of the injected noise.
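
One way to see why a single step struggles at high noise is to invert the forward equation for the clean image. The sketch below assumes the standard relation x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps. The names `estimate_x0` and `x0_error` are hypothetical, and a known noise vector stands in for the model's noise prediction.

```python
import math

def estimate_x0(x_t, eps_hat, abar_t):
    """Solve x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0."""
    scale = math.sqrt(abar_t)
    noise_scale = math.sqrt(1.0 - abar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_hat)]

def x0_error(abar_t, eps_error=0.1):
    """Error in the x0 estimate caused by a fixed error in the predicted noise."""
    return math.sqrt(1.0 - abar_t) / math.sqrt(abar_t) * eps_error

# Round trip: with the exact noise, the clean image is recovered.
x0 = [0.2, 0.5, 0.9]
eps = [0.3, -1.1, 0.7]
abar_t = 0.25
x_t = [math.sqrt(abar_t) * p + math.sqrt(1.0 - abar_t) * e for p, e in zip(x0, eps)]
recovered = estimate_x0(x_t, eps, abar_t)
print(recovered)

# The same small error in the noise estimate hurts far more when abar_t is small,
# which is the situation at large t.
for abar in (0.9, 0.5, 0.1, 0.01):
    print(abar, round(x0_error(abar), 4))
```

Under this relation, any mistake in the predicted noise is scaled by sqrt(1 - abar_t) / sqrt(abar_t), so the estimate becomes much less reliable as t approaches 999.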

1.4 - Iterative Denoising

Previous, one step denoising

t = 500

Previous, classical denoising

t = 500

Iterative Denoising

t = 690

t = 540

t = 390

t = 240

t = 90

Final

Another, separate final

We split the denoising steps into "strided" timesteps for efficiency, with a step size of -30. Every 5th loop iteration, we visualize the image, which is why the timesteps shown above are multiples of 30 spaced 150 apart. For this run, we set i_start = 10. Since the starting point is already a noised version of a test image rather than pure noise, there is little need to make the model work through the earliest, noisiest timesteps. Compared to classical denoising, iterative denoising performs significantly better and is able to "hallucinate" new detail. Compared to the one-step estimate, the improvement from iterative denoising is less overt; however, the additional control it offers could make it more accessible on low compute budgets.
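
The strided schedule can be sketched in a few lines of Python. The starting value of 990 is an assumption that happens to reproduce the timesteps shown above. `denoise_step` is a hypothetical helper that follows the standard DDPM posterior mean and leaves out the added variance term, so it is not necessarily the exact update used in this project.

```python
import math

# Assumed schedule: 990, 960, ..., 30, 0 (step size of -30).
strided_timesteps = list(range(990, -1, -30))
i_start = 10

# Every 5th loop iteration is visualized, starting from i_start.
visualized = [strided_timesteps[i] for i in range(i_start, len(strided_timesteps), 5)]
print(visualized)  # [690, 540, 390, 240, 90]

def denoise_step(x_t, x0_hat, abar_t, abar_prev):
    """Move from timestep t to the less noisy t' by blending x_t with the
    current clean-image estimate x0_hat (variance term omitted)."""
    alpha = abar_t / abar_prev  # effective alpha across the stride
    beta = 1.0 - alpha
    w_x0 = math.sqrt(abar_prev) * beta / (1.0 - abar_t)
    w_xt = math.sqrt(alpha) * (1.0 - abar_prev) / (1.0 - abar_t)
    return [w_x0 * c + w_xt * x for c, x in zip(x0_hat, x_t)]

# When the target step is fully clean (abar_prev = 1), the step returns x0_hat.
print(denoise_step([2.0], [0.5], abar_t=0.3, abar_prev=1.0))
```

A larger stride means fewer model calls per image, which is where the efficiency comes from.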

1.5 - Sampling Images

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

The quality of these images isn't the best. They go all over the place and aren't nearly as well formed as the results of iterative denoising with classifier-free guidance.

1.6 - Classifier Free Guidance

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

Here is some output with classifier-free guidance, where adjusting the scale value skews the model toward or away from the desired concept. In our case, we set the CFG scale to 7 and generated 5 images with the conditional prompt "a high quality photo".
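
For reference, the usual way the scale enters is as an extrapolation between the unconditional and conditional noise estimates. The sketch below uses plain lists in place of the model's noise predictions, and `cfg_noise` is a hypothetical helper name.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps = eps_u + scale * (eps_c - eps_u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]  # stand-in for the estimate with an empty prompt
eps_cond = [1.0, 1.0]    # stand-in for the estimate with the conditional prompt

print(cfg_noise(eps_uncond, eps_cond, 0.0))  # [0.0, 1.0], purely unconditional
print(cfg_noise(eps_uncond, eps_cond, 1.0))  # [1.0, 1.0], purely conditional
print(cfg_noise(eps_uncond, eps_cond, 7.0))  # [7.0, 1.0], pushed past the conditional estimate
```

With a scale above 1, the combined estimate overshoots the conditional one wherever the two disagree, which is what skews the samples toward the prompt.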

1.7 - Image to Image Translation

Sample test image

Sample 1 - i_start = 1

Sample 2 - i_start = 3

Sample 3 - i_start = 5

Sample 4 - i_start = 7

Sample 5 - i_start = 10

Sample 6 - i_start = 20

Actual sample

Custom image 1 - Mega Charizard X

Actual Sample

Custom image 2 - Appa from Avatar

Actual Sample

1.7.1 - Editing Hand Drawn and Web Images

Custom image 2 - Web Drawn Test Image

Actual Sample

Colab Interface Drawn Test Image

Actual Sample

Illustrator Drawn Test Image

Actual Sample

1.7.2 - Inpainting

Avocado

Mask

Data To replace

Original Sample

Replaced Sample

Professor Efros Headshot

Mask

Data To replace

Original Sample

Replaced Sample
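
A minimal sketch of the masking step behind inpainting, in plain Python. It assumes a mask of 1 where new content should be generated and 0 where the original must be kept. The helper names are hypothetical, and flat lists stand in for images.

```python
import math
import random

def forward(x0, abar_t, rng):
    """Noise a clean image to the level given by abar_t."""
    return [
        math.sqrt(abar_t) * p + math.sqrt(1.0 - abar_t) * rng.gauss(0.0, 1.0)
        for p in x0
    ]

def keep_known_pixels(x_t, x_orig, mask, abar_t, rng):
    """After each denoising step, overwrite the pixels outside the mask with a
    freshly noised copy of the original, and keep the model's sample inside it."""
    noised_orig = forward(x_orig, abar_t, rng)
    return [m * x + (1 - m) * n for x, n, m in zip(x_t, noised_orig, mask)]

rng = random.Random(0)
x_t = [9.0, 9.0, 9.0]     # current sample from the denoising loop
x_orig = [0.1, 0.2, 0.3]  # original image
mask = [1, 0, 1]          # only the middle pixel is kept from the original
print(keep_known_pixels(x_t, x_orig, mask, abar_t=1.0, rng=rng))
```

Repeating this after every step forces everything outside the mask to follow the original image, while the masked region is free to change.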

1.7.3 - Text Conditioned Image to Image Translation

Sample Provided Test Image

Actual

Illustrator Image

Actual

Hand Drawn Image

Actual

1.8 - Visual Anagrams

Sample Requested

an oil painting of people around a campfire

an oil painting of an old man

Custom Anagram 1

an oil painting of a snowy mountain village

an oil painting of the amalfi "cost"

Custom Anagram 2

an oil painting of a snowy mountain village

an oil painting of the amalfi "cost"

Custom Anagram 3

a pencil

a rockete ship
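
A rough sketch of how the two prompts can be combined at each denoising step for a flip-based anagram. It assumes the common recipe of averaging the upright noise estimate with the flipped-back estimate of the flipped image. `toy_predict_noise` is a toy stand-in rather than a real model, and all names here are hypothetical.

```python
def flip_vertical(image):
    """Turn an image (a list of rows) upside down."""
    return image[::-1]

def anagram_noise(x_t, predict_noise, prompt_upright, prompt_flipped):
    """Average the noise estimate for the upright image and first prompt with
    the flipped-back estimate for the flipped image and second prompt."""
    eps_up = predict_noise(x_t, prompt_upright)
    eps_down = flip_vertical(predict_noise(flip_vertical(x_t), prompt_flipped))
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(eps_up, eps_down)
    ]

def toy_predict_noise(x_t, prompt):
    """Toy stand-in for the diffusion model: scales the input per prompt."""
    scales = {"A": 1.0, "B": 3.0}
    return [[scales[prompt] * v for v in row] for row in x_t]

x_t = [[1.0, 2.0], [3.0, 4.0]]
print(anagram_noise(x_t, toy_predict_noise, "A", "B"))  # [[2.0, 4.0], [6.0, 8.0]]
```

Flipping the second estimate back before averaging keeps both estimates in the same orientation, so one image can satisfy one prompt upright and the other upside down.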

1.9 - Hybrid Images

waterfall close up, skull far away

a photograph of a dog, a lithograph of a waterfall

To see the dog, check this low frequency image

a rocket, a lithograph of a waterfall

To see the rocket, check this low frequency image
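
A minimal sketch of how the two noise estimates can be mixed by frequency, in plain Python. It assumes the far-away subject supplies the low frequencies and the close-up subject supplies the high frequencies. A 1-D box blur stands in for a 2-D Gaussian blur, and the helper names are hypothetical.

```python
def low_pass(signal, radius=2):
    """Box blur with clamped borders (a crude stand-in for a Gaussian blur)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def hybrid_noise(eps_far, eps_near, radius=2):
    """Low frequencies from eps_far (visible from far away) plus
    high frequencies from eps_near (visible up close)."""
    low = low_pass(eps_far, radius)
    high = [e - l for e, l in zip(eps_near, low_pass(eps_near, radius))]
    return [l + h for l, h in zip(low, high)]

eps_far = [0.0, 1.0, -2.0, 3.0, 0.5, -1.5]   # stand-in estimate for the far-away prompt
eps_near = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]    # a flat estimate has no high frequencies
print(hybrid_noise(eps_far, eps_near))
```

Blurring the final image removes the high-frequency part, which is why the low frequency images above reveal the second subject.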

This is the end of part A, thanks for checking this out!