the same images, decoded via VAE.
white-haired Reimu not actually a mistake!
my fast decode is bad at saturated reds. learnable, but underrepresented in the training set. bias weights might help: each latent channel has a different center, and a bias lets us learn an offset.
fast approximate decode of latents produces nice pixel art
we could probably eliminate the dithering artifacts by learning a convolution instead of just using a Linear layer, but honestly I quite like it
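roughly the shape of it, as a sketch (not my exact weights): a learned per-pixel Linear map from SD's 4 latent channels to RGB, with bias so each channel's offset has somewhere to go:

import torch
from torch import nn

class FastLatentDecoder(nn.Module):
    # per-pixel linear map from SD's 4 latent channels to RGB
    def __init__(self):
        super().__init__()
        # bias=True: each latent channel has a different center,
        # so the offset is learned rather than baked in
        self.proj = nn.Linear(4, 3, bias=True)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: [batch, 4, h, w] -> rgb: [batch, 3, h, w]
        rgb = self.proj(latents.permute(0, 2, 3, 1))
        return rgb.permute(0, 3, 1, 2).clamp(-1.0, 1.0)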
PyTorch 1.12.1 on Mac:
- diffusers Unet working once again
- k-diffusion DPM-Solver++ (2M) sampler working probably for the first time
this older version of PyTorch has fewer training bugs, so it might give us a path to running diffusers fine-tuning on Mac.
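roughly what the k-diffusion side looks like, as a sketch: the schedule and sampler calls are real k-diffusion API, but the Denoiser here is a stand-in for whatever wrapper adapts the diffusers Unet to k-diffusion's sigma parameterisation:

import torch
from k_diffusion.sampling import get_sigmas_karras, sample_dpmpp_2m

class Denoiser(torch.nn.Module):
    # stand-in so the snippet runs; a real wrapper (e.g. a DiscreteEpsDDPMDenoiser
    # subclass) would call the Unet and return the denoised latents
    def forward(self, x, sigma, **kwargs):
        return x

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
denoiser = Denoiser().to(device)

# Karras schedule over SD v1's usual sigma range
sigmas = get_sigmas_karras(n=15, sigma_min=0.0292, sigma_max=14.6146, device=device)
x = torch.randn(1, 4, 64, 64, device=device) * sigmas[0]
latents = sample_dpmpp_2m(denoiser, x, sigmas)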
implemented structured diffusion, the #stablediffusion supplementary paper
supposedly it improves compositionality, but honestly I'm not seeing the improvements shown in the paper.
made it more parallel, fixed a bug in sequence alignment
https://t.co/Yoho5PHJyC
https://t.co/vy57Newd0s
left = usual
right = masked
this is a longer prompt. 62 tokens utilised (incl. BOS, EOS). 15 padding tokens masked away.
I only mask cond. I tried masking uncond (which would mask 75 padding tokens), but it made it way less watercolour. SD has trained loads on the uncond embedding.
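roughly what the mask does inside cross-attention, as a sketch with assumed names (not the exact diffusers patch): padding keys get their scores pushed to -inf before softmax, and only for the cond half of the batch:

import torch

def mask_padding_keys(scores: torch.Tensor, keep: torch.Tensor) -> torch.Tensor:
    # scores: [batch*heads, query_len, 77] attention scores against the text tokens
    # keep:   [batch*heads, 77] bool, True for BOS..EOS, False for padding
    neg_inf = torch.finfo(scores.dtype).min
    return scores.masked_fill(~keep[:, None, :], neg_inf)

# cond prompt here: 62 real tokens kept, 15 padding tokens masked away
cond_keep = torch.zeros(77, dtype=torch.bool)
cond_keep[:62] = True
# uncond left fully unmasked: SD has trained loads on the padded uncond embedding
uncond_keep = torch.ones(77, dtype=torch.bool)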
a float16 Unet is all you need.
so long as you *sample* at float32.
50% smaller model (3.44->1.77GB)
23% faster generation (9.53->7.73s; 15 steps)
left is float16 Unet, right is float32 Unet
both sampled in float32; similar images
thanks to marunine for the idea!
#stablediffusion
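a sketch of the boundary (the wrapper name is mine): the Unet's weights and compute are float16, but the solver's latents and arithmetic stay float32, casting on the way in and out:

import torch

class HalfUnetDenoiser(torch.nn.Module):
    # wraps a float16 diffusers Unet so the sampler itself can stay in float32
    def __init__(self, unet_fp16):
        super().__init__()
        self.unet = unet_fp16  # weights in float16: ~half the file size

    def forward(self, latents, timestep, cond):
        # cast down only at the Unet boundary
        eps = self.unet(latents.half(), timestep, encoder_hidden_states=cond.half()).sample
        # cast back up so the solver's arithmetic and accumulation stay float32
        return eps.float()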
spent the day benchmarking #stablediffusion on CoreML; got some nice watercolours along the way
@f_j_j_ for the VAE, I found that amortizing the weight change over both q and k made the eyes less wonky:
left = just Unet fused
right = VAE like so:
q_proj.weights *= √scale
k_proj.weights *= √scale
Whereas the parent tweet (wonky eyes) modified only Q:
q_proj.weights *= scale
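in code the fusion looks roughly like this (layer names are placeholders rather than diffusers' real attribute names; if the projections have a bias it needs the same factor for the fusion to stay exact):

import math
import torch

@torch.no_grad()
def fuse_attention_scale(q_proj: torch.nn.Linear, k_proj: torch.nn.Linear, head_dim: int):
    # fold the 1/sqrt(head_dim) attention scale into the projections,
    # splitting it as sqrt(scale) on Q and sqrt(scale) on K
    scale = 1.0 / math.sqrt(head_dim)
    for proj in (q_proj, k_proj):
        proj.weight.mul_(math.sqrt(scale))
        if proj.bias is not None:
            proj.bias.mul_(math.sqrt(scale))

presumably the win is numerical: in exact arithmetic the two fusions give the same attention scores, but splitting the factor keeps Q and K at similar magnitudes, which matters in reduced precision.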