
A roundup of Birchlabs's posts


ML Engineer at Anlatan (@novelaiofficial). co-author of HDiT (Hourglass Diffusion Transformers). works on diffusion models and LLMs. studying Japanese.
birchlabs.co.uk

Following: 191 Followers: 4507

it changed this one a bit more drastically, but hey we went down to one person


generating out-of-distribution (bigger) image sizes by fiddling with self-attn softmax.
(incorrectly) implemented advice from Kharr at … it worked anyway.
left=usual
right=smaller softmax denominator; topk(preferred_key_tokens) attn scores per query
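a minimal numpy sketch of the right-hand variant (function name, the `denom_scale` knob, and all parameter names are illustrative assumptions, not the actual implementation): keep only each query's top-k key scores, then shrink the softmax denominator by a constant factor.

```python
import numpy as np

def topk_softmax_attn(scores, k, denom_scale=1.0):
    """Sketch: per-query top-k attention with a shrunken softmax denominator.

    scores: (queries, keys) pre-softmax attention logits.
    Keep each query's k largest key scores, mask the rest to -inf,
    then softmax; denom_scale < 1 inflates the resulting weights.
    """
    # threshold = each query's k-th largest score
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # numerically-stable softmax with a scaled denominator
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    denom = e.sum(axis=-1, keepdims=True) * denom_scale
    return e / denom

rng = np.random.default_rng(0)
probs = topk_softmax_attn(rng.standard_normal((4, 16)), k=8, denom_scale=0.9)
```

with `denom_scale=0.9` each query's weights sum to 1/0.9 rather than 1, which is one way to read "smaller softmax denominator".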


can we use this to make images *larger* than those in the training distribution? fix "double body parts"?
trying to evaluate. need to implement mem-efficient attn version first (or use Mac); fp32 quantile() uses lots of VRAM.
median (50%ile) fixes body shape. encouraging.


dynamic-thresholding latents in pixel space.
at sigmas≥1.1: we decode to pixel space, do Imagen-style thresholding, encode to latents.
trained a tiny latent decoder + RGB encoder on VAE outputs (you could call this distillation).
left = CFG 30, usual
right = dynthreshed
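for reference, the Imagen-style thresholding step applied in pixel space looks roughly like this (numpy sketch; the surrounding tiny latent->RGB decode and RGB->latent encode are not shown, and the percentile is an assumed default):

```python
import numpy as np

def dynamic_threshold(x, percentile=99.5):
    """Imagen-style dynamic thresholding, per sample.

    x: pixel-space prediction. Clamp to the given percentile of |x|,
    then rescale back into [-1, 1]; only shrinks when the range
    actually exceeds [-1, 1].
    """
    s = np.percentile(np.abs(x), percentile)
    s = max(s, 1.0)
    return np.clip(x, -s, s) / s
```

at high CFG scales the prediction overshoots [-1, 1]; this squashes the outliers instead of hard-clipping everything, which is what rescues the saturated CFG-30 sample.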


if I abort CFG too early (here's a more aggressive cutoff, at sigma=1.4), then medium/fine details are solved without CFG and look more like the "most likely unconditional prediction". lost eyelashes and blush.
left = full CFG
right = CFG until sigma=1.4
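a sketch of the per-step guidance with a sigma cutoff (names are mine; I'm assuming "without CFG" means falling back to the conditional prediction alone, i.e. CFG scale 1):

```python
import numpy as np

def guided_denoise(uncond, cond, sigma, cfg_scale=7.5, cfg_cutoff_sigma=1.4):
    """Apply CFG only at high sigmas (coarse/structural steps).

    Below the cutoff, return the conditional prediction alone,
    so medium/fine details are solved without guidance.
    """
    if sigma >= cfg_cutoff_sigma:
        return uncond + cfg_scale * (cond - uncond)
    return cond
```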


brings bokeh backgrounds into focus


latent channels' means drift from 0 with each sampling step (especially when CFG is applied).
I re-centered denoising outputs on each latent channel's mean. it didn't help with CFG clipping, but I think it brings out high-frequency/low-sigma details (grass blades, leaves).
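the re-centering itself is a one-liner (numpy sketch; function name is mine):

```python
import numpy as np

def recenter_channels(latents):
    """Subtract each latent channel's spatial mean.

    latents: (channels, height, width). Counters the per-channel
    drift away from 0 that accumulates over sampling steps.
    """
    return latents - latents.mean(axis=(-2, -1), keepdims=True)
```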


here's how my latent thresholding technique preserves dynamic range at high CFG scales
we mimic the dynamic range of known-good CFG7.5.
center latent channels on means, clamp out 99.9%ile outliers, multiply by ratio "CFG7.5's max / my 99.9%ile", un-center.
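the steps above can be sketched per latent channel like so (numpy; function and parameter names are mine — `ref_max` stands in for "CFG7.5's max"):

```python
import numpy as np

def threshold_latents(latents, ref_max, percentile=99.9):
    """Mimic a known-good dynamic range at high CFG scales.

    latents: (channels, height, width).
    1. center each channel on its mean
    2. clamp outliers beyond the given percentile of |centered|
    3. rescale so that percentile maps to ref_max
    4. un-center
    """
    means = latents.mean(axis=(-2, -1), keepdims=True)
    centered = latents - means
    q = np.percentile(np.abs(centered), percentile,
                      axis=(-2, -1), keepdims=True)
    clamped = np.clip(centered, -q, q)
    return clamped * (ref_max / q) + means
```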


had an art night
I brought VSCode, an ssh tunnel and a wireless mouse


multi-cond guidance, without cubic easing.
since it's a linear interpolation schedule: we have more frames at the midpoints between conditions, where the mixing is least coherent.
still: it's a visual interpolation, so the motion's easy to track.
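a numpy sketch of that linear interpolation schedule (names are mine): frames are spaced uniformly along the piecewise-linear path through the condition embeddings, so the midpoint mixes get as much screen time as the endpoints — cubic easing would instead dwell near the conditions themselves.

```python
import numpy as np

def lerp_cond_schedule(conds, n_frames):
    """Uniform frames along a piecewise-linear path through conditions.

    conds: (n_conds, dim) condition embeddings.
    Returns (n_frames, dim) interpolated embeddings.
    """
    conds = np.asarray(conds, dtype=float)
    t = np.linspace(0, len(conds) - 1, n_frames)
    i = np.clip(t.astype(int), 0, len(conds) - 2)   # segment index
    frac = (t - i)[:, None]                          # position within segment
    return conds[i] * (1 - frac) + conds[i + 1] * frac
```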
