
Birchlabs's illustration roundup


ML Engineer at Anlatan (@novelaiofficial). co-author of HDiT (Hourglass Diffusion Transformers). works on diffusion models and LLMs. Studying Japanese.
birchlabs.co.uk

Following: 191 Followers: 4,507

the same images, decoded via VAE.
white-haired Reimu not actually a mistake!
my fast decode is bad at saturated reds. learnable, but underrepresented in the training set. bias weights might help: each latent channel has a different center, and a bias lets us learn an offset.
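Not from the tweet itself, but a minimal sketch of what such a fast approximate decoder could look like: a single Linear layer (with bias) mapping SD's 4 latent channels to RGB. The class name, shapes, and training details are illustrative assumptions.

```python
import torch
from torch import nn

# Hypothetical sketch of a fast approximate latent decoder: one Linear layer
# from SD's 4 latent channels to RGB, with a bias so each latent channel's
# differing centre can be offset by a learned constant.
class FastLatentDecoder(nn.Module):
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.proj = nn.Linear(latent_channels, 3, bias=True)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: [batch, 4, h, w] -> channels-last so the Linear acts per pixel
        x = latents.permute(0, 2, 3, 1)
        rgb = self.proj(x)
        # back to [batch, 3, h, w]
        return rgb.permute(0, 3, 1, 2)

latents = torch.randn(1, 4, 64, 64)        # latents for a 512x512 image
rgb = FastLatentDecoder()(latents)         # [1, 3, 64, 64] approximate RGB
```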


fast approximate decode of latents produces nice pixel art
we could probably eliminate the dithering artifacts by learning a convolution instead of just using a Linear layer, but honestly I quite like it
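A hedged sketch of the convolution variant mentioned above: swapping the per-pixel Linear for a small Conv2d lets each output pixel see its neighbours, which could smooth the dithering. The 3x3 kernel size is my assumption, not from the tweet.

```python
import torch
from torch import nn

# Sketch: a small convolution in place of the per-pixel Linear, so neighbouring
# latent pixels can smooth out dithering artifacts. Kernel size is illustrative.
conv_decoder = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=3, padding=1, bias=True)

latents = torch.randn(1, 4, 64, 64)   # SD latents for a 512x512 image
rgb = conv_decoder(latents)           # [1, 3, 64, 64] approximate RGB
```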


"holo, holding fruit, apple"


PyTorch 1.12.1 on Mac:
- diffusers Unet working once again
- k-diffusion DPM-Solver++ (2M) sampler working probably for the first time
this older version of PyTorch has fewer training bugs, so it might give us a path to running diffusers fine-tuning on Mac.
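For reference, a rough sketch of how k-diffusion's DPM-Solver++ (2M) sampler gets invoked on the MPS backend. The `denoiser` wrapper around the diffusers Unet is assumed to exist elsewhere, and the sigma range is only roughly SD's; none of this is lifted from the tweet.

```python
import torch
from k_diffusion import sampling

device = torch.device('mps')  # PyTorch 1.12.1's Metal backend on Mac

# `denoiser` is assumed to be a k-diffusion-style wrapper around the diffusers
# Unet (takes noised latents and sigmas, returns a denoised prediction);
# its construction is elided here.
def run_sampler(denoiser, steps: int = 15):
    # roughly SD's sigma range; exact values are an assumption
    sigmas = sampling.get_sigmas_karras(n=steps, sigma_min=0.03, sigma_max=14.6, device=device)
    x = torch.randn(1, 4, 64, 64, device=device) * sigmas[0]
    # DPM-Solver++ (2M): the sampler that now runs on this PyTorch version
    return sampling.sample_dpmpp_2m(denoiser, x, sigmas)
```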


implemented structured diffusion, per the paper's supplementary material
supposedly it improves compositionality, but honestly I'm not seeing the improvements shown in the paper.
made it more parallel, fixed a bug in sequence alignment
https://t.co/Yoho5PHJyC
https://t.co/vy57Newd0s
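My own rough sketch of what the sequence-alignment step in structured diffusion could look like: each noun phrase is encoded in isolation, then its token embeddings are spliced back into a copy of the full prompt's embedding at the aligned positions, and all conditionings are stacked so they can go through cross-attention in one batch (the "more parallel" part). Function names, signatures, and the batching scheme are illustrative assumptions, not the tweet's code.

```python
import torch
from typing import Callable, List, Tuple

def build_structured_conds(
    full_embed: torch.Tensor,                 # [77, dim] CLIP embedding of the whole prompt
    noun_phrases: List[Tuple[int, int]],      # (start, end) token spans of each noun phrase
    encode_span: Callable[[Tuple[int, int]], torch.Tensor],  # encodes one phrase -> [n, dim]
) -> torch.Tensor:
    conds = []
    for start, end in noun_phrases:
        span_embed = encode_span((start, end))        # phrase encoded in isolation
        cond = full_embed.clone()
        cond[start:end] = span_embed[: end - start]   # align phrase tokens into the full sequence
        conds.append(cond)
    # stack so all conditionings can be fed through cross-attention in one batch
    return torch.stack([full_embed, *conds], dim=0)
```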


left = usual
right = masked
this is a longer prompt. 62 tokens utilised (incl. BOS, EOS); 15 padding tokens masked away.
I only mask the cond. I tried masking the uncond (which would mask 75 padding tokens), but it made the image way less watercolour. SD has trained loads on the uncond embedding.
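A minimal sketch of what masking the padding tokens looks like in cross-attention: keys for padding positions get their logits pushed to the most negative representable value before softmax. Names and shapes are illustrative, not from any particular SD codebase.

```python
import torch

# `attn_scores` are the query-key logits [batch, heads, q_len, 77]; `token_mask`
# is True for real tokens (BOS, prompt, EOS) and False for padding.
def mask_padding(attn_scores: torch.Tensor, token_mask: torch.Tensor) -> torch.Tensor:
    # broadcast the [batch, 77] mask over heads and query positions
    neg_inf = torch.finfo(attn_scores.dtype).min
    return attn_scores.masked_fill(~token_mask[:, None, None, :], neg_inf)

# e.g. 62 real tokens (incl. BOS/EOS) and 15 padding tokens:
token_mask = torch.arange(77)[None, :] < 62
scores = torch.randn(1, 8, 4096, 77)
masked = mask_padding(scores, token_mask)
```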


a float16 Unet is all you need.
so long as you *sample* at float32.
50% smaller model (3.44->1.77GB)
23% faster generation (9.53->7.73s; 15 steps)
left is float16 Unet, right is float32 Unet
both sampled in float32; similar images
thanks to marunine for the idea!
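A sketch of the idea, assuming a diffusers-style Unet: halve the Unet's weights once, keep the sampler's latents and update arithmetic in float32, and cast at the Unet boundary. Function names are illustrative.

```python
import torch

# `unet` stands in for a diffusers UNet2DConditionModel; this is illustrative.
def to_fp16_unet(unet):
    return unet.half()  # 50% smaller weights

@torch.no_grad()
def predict_noise(unet_fp16, latents_f32: torch.Tensor, t, cond_f32: torch.Tensor) -> torch.Tensor:
    noise_pred = unet_fp16(
        latents_f32.half(),                  # cast inputs down for the fp16 forward pass
        t,
        encoder_hidden_states=cond_f32.half(),
    ).sample
    return noise_pred.float()                # hand fp32 back to the sampler's update step
```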


other backgrounds I got that weren't as prompt-coherent


spent the day benchmarking on CoreML; got some nice watercolours along the way


for the VAE, I found that amortizing the weight change over both q and k made the eyes less wonky:
left = just Unet fused
right = VAE like so:
q_proj.weights *= √scale
k_proj.weights *= √scale
Whereas the parent tweet (wonky eyes) modified only Q:
q_proj.weights *= scale
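A sketch of the amortized version, assuming a VAE self-attention block whose q/k projections are nn.Linear layers (attribute names are illustrative). If the projections carry biases, those presumably need the same factor, since the tweet only mentions weights.

```python
import math
import torch

# Fold the 1/sqrt(head_dim) attention scale into both projections as sqrt(scale),
# rather than loading the whole factor onto q alone.
@torch.no_grad()
def fuse_scale_into_qk(attn, head_dim: int):
    scale = 1.0 / math.sqrt(head_dim)
    for proj in (attn.q_proj, attn.k_proj):
        proj.weight.mul_(math.sqrt(scale))
        if proj.bias is not None:
            # the bias is part of the projection's output, so it gets the same factor
            proj.bias.mul_(math.sqrt(scale))
    # q @ k.T now carries the full scale with no explicit multiply in attention
```

Splitting the factor as √scale · √scale presumably keeps q and k activations in a closer numeric range than scaling q alone, which may be why the eyes came out less wonky.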
