@AiEleuther it changed this one a bit more drastically, but hey we went down to one person
generating out-of-distribution (bigger) image sizes by fiddling with self-attn softmax.
(incorrectly) implemented advice from Kharr at @AiEleuther… it worked anyway.
left=usual
right=smaller softmax denominator; topk(preferred_key_tokens) attn scores per query
#stablediffusion
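a minimal sketch of the "topk attn scores per query" idea, not the exact code used for the images above. it assumes NumPy and a scores array shaped (..., queries, keys); the name `topk_softmax` and the choice of `k` are hypothetical. each query keeps only its k largest scores, so the softmax denominator sums over fewer keys.

```python
import numpy as np

def topk_softmax(scores: np.ndarray, k: int) -> np.ndarray:
    """Softmax over the last axis using only each query's k largest scores."""
    # k-th largest score per query row (ties at the threshold are all kept)
    kth = np.partition(scores, -k, axis=-1)[..., -k, None]
    # scores below the threshold get -inf, so they contribute 0 to the denominator
    masked = np.where(scores >= kth, scores, -np.inf)
    # subtract the row max for numerical stability
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)
```

usage: `attn = topk_softmax(q @ k_t * scale, k=32) @ v`, where `q`, `k_t`, `scale` and `v` are whatever your attention layer already has.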
can we use this to make images *larger* than those in the training distribution? fix "double body parts"?
trying to evaluate. need to implement mem-efficient attn version first (or use Mac); fp32 quantile() uses lots of VRAM.
median (50%ile) fixes body shape. encouraging.
dynamic-thresholding latents in pixel space.
at sigmas≥1.1: we decode to pixel space, do Imagen-style thresholding, encode to latents.
trained a tiny latent decoder + RGB encoder on VAE outputs (you could call this distillation).
left = CFG 30, usual
right = dynthreshed
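roughly what the pixel-space step could look like. this is a sketch under assumptions: `decode` and `encode` are hypothetical callables standing in for the tiny latent decoder and RGB encoder, the 99.5 percentile is an assumed value, and the threshold is computed over the whole image rather than per channel.

```python
import numpy as np

def dynamic_threshold(pixels, percentile=99.5):
    """Imagen-style dynamic thresholding of one image nominally in [-1, 1]."""
    # s is the chosen percentile of absolute pixel values, never below 1
    s = max(float(np.percentile(np.abs(pixels), percentile)), 1.0)
    # clip to [-s, s], then rescale back into [-1, 1]
    return np.clip(pixels, -s, s) / s

def threshold_in_pixel_space(latents, sigma, decode, encode, sigma_min=1.1):
    """Apply dynamic thresholding via pixel space, only at sigma >= sigma_min."""
    if sigma < sigma_min:
        return latents  # low-noise steps are left untouched
    return encode(dynamic_threshold(decode(latents)))
```

images already inside [-1, 1] pass through unchanged, because `s` is floored at 1.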
@Ethan_smith_20 @Ameen_Roayan if I abort CFG too early (here's a more aggressive cutoff at 1.4), then medium/fine details are solved without CFG, look more like "most likely unconditional prediction". lost eyelashes and blush.
left = full CFG
right = CFG until sigma=1.4
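a sketch of the cutoff. assumption: once sigma drops below the cutoff we fall back to the conditional prediction alone (equivalent to CFG scale 1); the posts don't say which prediction is used after the abort. `guided_denoise` and its parameter names are hypothetical.

```python
def guided_denoise(uncond, cond, sigma, cfg_scale=7.5, cfg_until_sigma=1.4):
    """Classifier-free guidance that switches off below a sigma cutoff."""
    if sigma < cfg_until_sigma:
        # assumed fallback: conditional prediction only, no extrapolation
        return cond
    # usual CFG: extrapolate from the unconditional toward the conditional
    return uncond + cfg_scale * (cond - uncond)
```

works on plain floats or on arrays/tensors, since it only uses arithmetic.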
latent channels' means drift from 0 with each sampling step (especially when CFG is applied).
I re-centered denoising outputs on each latent channel's mean. it didn't help with CFG clipping, but I think it brings out high-frequency/low-sigma details (grass blades, leaves).
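the re-centering itself is small. a sketch assuming NumPy and a denoised latent shaped (..., channels, height, width); `recenter_channels` is a hypothetical name.

```python
import numpy as np

def recenter_channels(denoised: np.ndarray) -> np.ndarray:
    """Subtract each latent channel's spatial mean so every channel is centered on 0."""
    channel_means = denoised.mean(axis=(-2, -1), keepdims=True)
    return denoised - channel_means
```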
here's how my latent thresholding technique preserves dynamic range at high CFG scales
we mimic the dynamic range of known-good CFG7.5.
center latent channels on means, clamp out 99.9%ile outliers, multiply by ratio "CFG7.5's max / my 99.9%ile", un-center.
#stablediffusion
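the four steps above (center, clamp, rescale, un-center) as a sketch. assumptions: NumPy, latents shaped (channels, height, width), one percentile computed over all channels of the centered latents, and `ref_max` being the max absolute centered value measured from a known-good CFG 7.5 run. `match_dynamic_range` is a hypothetical name.

```python
import numpy as np

def match_dynamic_range(latents, ref_max, percentile=99.9):
    """Clamp outliers and rescale centered latents to a reference dynamic range."""
    # 1. center each channel on its mean
    means = latents.mean(axis=(-2, -1), keepdims=True)
    centered = latents - means
    # 2. clamp out values beyond the chosen percentile of absolute magnitude
    q = max(float(np.percentile(np.abs(centered), percentile)), 1e-8)
    clamped = np.clip(centered, -q, q)
    # 3. rescale so the clamped extreme lands on the reference max
    scaled = clamped * (ref_max / q)
    # 4. un-center
    return scaled + means
```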
had an art night
I brought VSCode, an ssh tunnel and a wireless mouse
#stablediffusion
multi-cond guidance, without cubic easing.
since it's a linear interpolation schedule: we have more frames at the midpoints between conditions, where the mixing is least coherent.
still: it's a visual interpolation, so the motion's easy to track.
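to see why a linear schedule puts more frames in the incoherent middle, here's a small comparison. assumption: "cubic easing" means the standard cubic ease-in-out curve; the frame count and the [0.25, 0.75] "middle band" are arbitrary choices for illustration.

```python
def linear(t: float) -> float:
    """Mix weight grows at a constant rate."""
    return t

def cubic_ease_in_out(t: float) -> float:
    """Standard cubic ease-in-out: slow near 0 and 1, fast through the middle."""
    if t < 0.5:
        return 4.0 * t ** 3
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0

def frames_in_middle(schedule, n_frames=21, lo=0.25, hi=0.75):
    """Count frames whose mix weight between two conditions falls in [lo, hi]."""
    weights = [schedule(i / (n_frames - 1)) for i in range(n_frames)]
    return sum(lo <= w <= hi for w in weights)

linear_count = frames_in_middle(linear)
cubic_count = frames_in_middle(cubic_ease_in_out)
print(linear_count, cubic_count)
```

the eased schedule spends fewer frames on heavily mixed conditions and more near each pure condition.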