the same images, decoded via VAE.
white-haired Reimu not actually a mistake!
my fast decode is bad at saturated reds. learnable, but underrepresented in the training set. bias weights might help: each latent channel has a different center, and a bias lets us learn an offset.
fast approximate decode of latents produces nice pixel art
we could probably eliminate the dithering artifacts by learning a convolution instead of just using a Linear layer, but honestly I quite like it
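roughly the shape of it, as a sketch (not my exact weights): a learned per-pixel Linear map from SD's 4 latent channels to RGB, with bias so each channel's offset has somewhere to go:

import torch
from torch import nn

class FastLatentDecoder(nn.Module):
    # per-pixel linear map from SD's 4 latent channels to RGB
    def __init__(self):
        super().__init__()
        # bias=True: each latent channel has a different center,
        # so the offset is learned rather than baked in
        self.proj = nn.Linear(4, 3, bias=True)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: [batch, 4, h, w] -> rgb: [batch, 3, h, w]
        rgb = self.proj(latents.permute(0, 2, 3, 1))
        return rgb.permute(0, 3, 1, 2).clamp(-1.0, 1.0)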
PyTorch 1.12.1 on Mac:
- diffusers Unet working once again
- k-diffusion DPM-Solver++ (2M) sampler working probably for the first time
this older version of PyTorch has fewer training bugs, so it might give us a path to running diffusers fine-tuning on Mac.
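roughly what the k-diffusion side looks like, as a sketch: the schedule and sampler calls are real k-diffusion API, but the Denoiser here is a stand-in for whatever wrapper adapts the diffusers Unet to k-diffusion's sigma parameterisation:

import torch
from k_diffusion.sampling import get_sigmas_karras, sample_dpmpp_2m

class Denoiser(torch.nn.Module):
    # stand-in so the snippet runs; a real wrapper (e.g. a DiscreteEpsDDPMDenoiser
    # subclass) would call the Unet and return the denoised latents
    def forward(self, x, sigma, **kwargs):
        return x

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
denoiser = Denoiser().to(device)

# Karras schedule over SD v1's usual sigma range
sigmas = get_sigmas_karras(n=15, sigma_min=0.0292, sigma_max=14.6146, device=device)
x = torch.randn(1, 4, 64, 64, device=device) * sigmas[0]
latents = sample_dpmpp_2m(denoiser, x, sigmas)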
implemented structured diffusion, the #stablediffusion supplementary paper
supposedly it improves compositionality, but honestly I'm not seeing the improvements shown in the paper.
made it more parallel, fixed a bug in sequence alignment
https://t.co/Yoho5PHJyC
https://t.co/vy57Newd0s
left = usual
right = masked
this is a longer prompt. 62 tokens utilised (incl. BOS, EOS). 15 padding tokens masked away.
I only mask cond. I tried masking uncond (which would mask 75 padding tokens), but it made it way less watercolour. SD has trained loads on the uncond embedding.
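roughly what the mask does inside cross-attention, as a sketch with assumed names (not the exact diffusers patch): padding keys get their scores pushed to -inf before softmax, and only for the cond half of the batch:

import torch

def mask_padding_keys(scores: torch.Tensor, keep: torch.Tensor) -> torch.Tensor:
    # scores: [batch*heads, query_len, 77] attention scores against the text tokens
    # keep:   [batch*heads, 77] bool, True for BOS..EOS, False for padding
    neg_inf = torch.finfo(scores.dtype).min
    return scores.masked_fill(~keep[:, None, :], neg_inf)

# cond prompt here: 62 real tokens kept, 15 padding tokens masked away
cond_keep = torch.zeros(77, dtype=torch.bool)
cond_keep[:62] = True
# uncond left fully unmasked: SD has trained loads on the padded uncond embedding
uncond_keep = torch.ones(77, dtype=torch.bool)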
a float16 Unet is all you need.
so long as you *sample* at float32.
50% smaller model (3.44->1.77GB)
23% faster generation (9.53->7.73s; 15 steps)
left is float16 Unet, right is float32 Unet
both sampled in float32; similar images
thanks to marunine for the idea!
#stablediffusion
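a sketch of the boundary (the wrapper name is mine): the Unet's weights and compute are float16, but the solver's latents and arithmetic stay float32, casting on the way in and out:

import torch

class HalfUnetDenoiser(torch.nn.Module):
    # wraps a float16 diffusers Unet so the sampler itself can stay in float32
    def __init__(self, unet_fp16):
        super().__init__()
        self.unet = unet_fp16  # weights in float16: ~half the file size

    def forward(self, latents, timestep, cond):
        # cast down only at the Unet boundary
        eps = self.unet(latents.half(), timestep, encoder_hidden_states=cond.half()).sample
        # cast back up so the solver's arithmetic and accumulation stay float32
        return eps.float()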
spent the day benchmarking #stablediffusion on CoreML; got some nice watercolours along the way
@f_j_j_ for the VAE, I found that amortizing the weight change over both q and k made the eyes less wonky:
left = just Unet fused
right = VAE like so:
q_proj.weights *= √scale
k_proj.weights *= √scale
Whereas the parent tweet (wonky eyes) modified only Q:
q_proj.weights *= scale
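in code the fusion looks roughly like this (layer names are placeholders rather than diffusers' real attribute names; if the projections have a bias it needs the same factor for the fusion to stay exact):

import math
import torch

@torch.no_grad()
def fuse_attention_scale(q_proj: torch.nn.Linear, k_proj: torch.nn.Linear, head_dim: int):
    # fold the 1/sqrt(head_dim) attention scale into the projections,
    # splitting it as sqrt(scale) on Q and sqrt(scale) on K
    scale = 1.0 / math.sqrt(head_dim)
    for proj in (q_proj, k_proj):
        proj.weight.mul_(math.sqrt(scale))
        if proj.bias is not None:
            proj.bias.mul_(math.sqrt(scale))

presumably the win is numerical: in exact arithmetic the two fusions give the same attention scores, but splitting the factor keeps Q and K at similar magnitudes, which matters in reduced precision.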