@f_j_j_ left = just Unet fused
right = Unet + VAE fused
her eyes went a bit wonky, her hairband and the browns on her collar got paler
@f_j_j_ same perceptually. hash is different.
left = original Unet
right = Unet with scaling fused into qk projection
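a minimal sketch of why fusing the attention scale into the q/k projections should be "same perceptually, hash is different". it assumes the scale is split as sqrt(scale) across both projection weights (the tweet doesn't say how the split was done), and uses tiny made-up matrices in plain Python rather than the real Unet weights:

```python
import math
import random

def matmul(a, b):
    # Plain-Python matrix product: a is (n x m), b is (m x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

random.seed(0)
tokens, dim, head_dim = 3, 4, 2  # hypothetical toy sizes

x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
w_q = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(dim)]
w_k = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(dim)]
scale = head_dim ** -0.5  # the usual 1/sqrt(d) attention scale

# Original: project, take q @ k^T, then multiply every logit by scale.
q = matmul(x, w_q)
k = matmul(x, w_k)
logits_ref = [[s * scale for s in row] for row in matmul(q, transpose(k))]

# Fused (assumed split): bake sqrt(scale) into both weight matrices once,
# so no separate scaling op is needed at inference time.
root = math.sqrt(scale)
w_q_fused = [[w * root for w in row] for row in w_q]
w_k_fused = [[w * root for w in row] for row in w_k]
logits_fused = matmul(matmul(x, w_q_fused), transpose(matmul(x, w_k_fused)))

# Mathematically identical; any difference is floating-point rounding,
# which is enough to change a hash without changing the picture.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(logits_ref, logits_fused)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

the rounding order changes because the multiply happens on the weights instead of on the logits, so bit-exact output isn't expected.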
@RiversHaveWings actually you can get it down to 3 steps, 4 model calls if you skip 0
6.1080, 1.5968, 0.4765, 0.1072
it won't be fully denoised but could you tell?
@RiversHaveWings yes, of course these are cherry-picked; the yield is terrible
it's 4 steps but admittedly 5 model calls (an extra 2S call to warm up the linear multistep).
still a win for few-sigma sampling. helps us get the most out of that first, biggest sigma.
sampled in just 4 steps of DPM-Solver++ (2M)
hand-picked sigmas:
6.1080, 1.5968, 0.4765, 0.1072, 0.0000
#stablediffusion
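a quick way to look at the hand-picked schedule above: as far as I understand the k-diffusion implementation, DPM-Solver++ steps in t = -log(sigma), so the gaps in t are the step sizes the solver sees. the sketch below just computes those gaps for the non-zero sigmas (the final step to sigma = 0 has no finite t and is skipped here); the variable names are mine:

```python
import math

# Hand-picked sigmas from the tweet, without the trailing 0.0
# (t = -log(sigma) is undefined at sigma = 0).
sigmas = [6.1080, 1.5968, 0.4765, 0.1072]

# DPM-Solver++ works in "log-sigma time" t = -log(sigma).
ts = [-math.log(s) for s in sigmas]

# Step size h between consecutive sigmas, as seen by the solver.
hs = [t_next - t for t, t_next in zip(ts, ts[1:])]

for s, s_next, h in zip(sigmas, sigmas[1:], hs):
    print(f"{s:.4f} -> {s_next:.4f}: h = {h:.4f}")
```

the gaps come out roughly even (all between about 1.2 and 1.5), i.e. the schedule is close to uniform in log-sigma even though the sigmas themselves drop by very different amounts.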
implemented @RiversHaveWings's suggestion to add a 2S warmup to 2M sampler; it works great!
https://t.co/YN5p2tu1jv
what if I told you you could denoise latents in float16 but do DPM-Solver++ sampling in float32
left=fp16 Saber
right=mixed-precision Saber
supposedly high-precision sampling helps to converge more stably towards the true image (Karras used fp64)
https://t.co/C0Y0nD9EXS
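a toy sketch of the mixed-precision idea: the model call is rounded to float16, but the sampler's own arithmetic stays in float32. everything here is an assumption for illustration: the "denoiser" is a made-up scalar stand-in for the Unet, the sampler is plain Euler rather than DPM-Solver++, and float16/float32 are emulated by round-tripping through Python's `struct` module:

```python
import struct

def f16(x):
    # Round a Python float to the nearest IEEE half-precision value.
    return struct.unpack("e", struct.pack("e", x))[0]

def f32(x):
    # Round a Python float to the nearest IEEE single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]

def toy_denoiser(x, sigma):
    # Hypothetical stand-in for the Unet: the ideal denoiser for
    # unit-variance Gaussian data.
    return x / (1.0 + sigma * sigma)

def denoiser_fp16(x, sigma):
    # Model inputs and output are rounded to float16.
    return f16(toy_denoiser(f16(x), f16(sigma)))

def sample_euler(sigmas, x, model, cast):
    # `cast` sets the precision of the sampler's own arithmetic.
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = model(x, sigma)
        d = cast((x - denoised) / sigma)
        x = cast(x + cast(d * (sigma_next - sigma)))
    return x

sigmas = [6.1080, 1.5968, 0.4765, 0.1072, 0.0]
x0 = sigmas[0]  # a "noise sample" of 1.0 scaled by the largest sigma

reference = sample_euler(sigmas, x0, toy_denoiser, lambda v: v)  # all float64
all_fp16 = sample_euler(sigmas, x0, denoiser_fp16, f16)          # fp16 everywhere
mixed = sample_euler(sigmas, x0, denoiser_fp16, f32)             # fp16 model, fp32 sampler

print("fp16 error: ", abs(all_fp16 - reference))
print("mixed error:", abs(mixed - reference))
```

in the mixed run the only float16 rounding is at the model boundary, so the sampler's updates don't pick up extra rounding error on every step. whether that is visible in a real image is a separate question.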
optimized the #stablediffusion Unet for Neural Engine. this time the whole thing, not just self-attention.
I think CoreML is still running it on GPU instead though (GPU-only benchmark gets same speeds).
anybody know how to peer into the black box?
https://t.co/JH1bdG8Htj
it occasionally does these saturated, striking background colours. the colour choice is surprising.
don't worry; the hands are out-of-shot
waifu-diffusion has nice watercolour conditioning
don't look at the hands don't look at the