
A roundup of Birchlabs' illustrations


ML Engineer at Anlatan (@novelaiofficial). co-author of HDiT (Hourglass Diffusion Transformers). works on diffusion models and LLMs. Studying Japanese.
birchlabs.co.uk

Following: 191 Followers: 4507

got UNet compiled to CoreML, targeting all compute units (including the Neural Engine).
replaced self-attention with the MultiHeadAttention that Apple optimized for the Neural Engine.
not faster yet (cross-attention needs the same treatment).
https://t.co/9tdKdGIK7X
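Apple's Neural Engine attention optimization is largely a data-layout change: (batch, seq, channels) tensors become (batch, channels, 1, seq), so projections can run as 1×1 convs. A minimal numpy sketch of just the layout equivalence (function names and shapes here are illustrative, not Apple's API):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_bsc(q, k, v):
    # standard layout: (batch, seq, channels)
    scale = q.shape[-1] ** -0.5
    w = softmax(q @ k.transpose(0, 2, 1) * scale, axis=-1)  # (B, S, S)
    return w @ v

def attention_bc1s(q, k, v):
    # ANE-friendly layout: (batch, channels, 1, seq); the matmuls become
    # contractions that a 1x1-conv formulation can express
    scale = q.shape[1] ** -0.5
    w = softmax(np.einsum('bciq,bcik->bqk', q, k) * scale, axis=-1)
    return np.einsum('bqk,bcik->bciq', w, v)

# same math, different memory layout:
rng = np.random.default_rng(0)
B, S, C = 2, 5, 8
q, k, v = (rng.standard_normal((B, S, C)) for _ in range(3))
to4d = lambda x: x.transpose(0, 2, 1)[:, :, None, :]  # (B,S,C) -> (B,C,1,S)
out_bsc = attention_bsc(q, k, v)
out_bc1s = attention_bc1s(to4d(q), to4d(k), to4d(v))
```

The payoff on-device is that the (B, C, 1, S) layout matches what the Neural Engine's conv hardware wants; the arithmetic is unchanged.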


k-diffusion added Brownian tree noise sampling, increasing the stability of convergence.
10, 15, 20, 25, 30, 35, 40, 100 step counts:
left = default noise
right = Brownian noise
default strategy has it jumping all over the place, but Brownian sampling is stable.
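The idea can be sketched with a lazy Brownian bridge: fix one Brownian motion by seed, and any step schedule then reads increments of that same motion, so coarse and fine schedules see consistent noise. A simplified numpy sketch (k-diffusion's actual sampler wraps torchsde's BrownianTree; unlike it, this toy version depends on query order):

```python
import numpy as np

class BrownianBridgeNoise:
    """Toy sketch: one Brownian motion W fixed by seed; noise for any
    interval is the increment of that same W, so refining the step
    schedule never contradicts noise already sampled."""

    def __init__(self, shape, seed=0):
        self.shape = shape
        self.rng = np.random.default_rng(seed)
        self.w = {0.0: np.zeros(shape)}  # known values of W(t); W(0) = 0

    def _value(self, t):
        if t in self.w:
            return self.w[t]
        ts = sorted(self.w)
        s = max(u for u in ts if u < t)          # nearest known time below t
        later = [u for u in ts if u > t]
        if later:                                 # Brownian bridge between W(s), W(u)
            u = later[0]
            mean = self.w[s] + (t - s) / (u - s) * (self.w[u] - self.w[s])
            var = (t - s) * (u - t) / (u - s)
        else:                                     # extend past the last known time
            mean, var = self.w[s], t - s
        self.w[t] = mean + np.sqrt(var) * self.rng.standard_normal(self.shape)
        return self.w[t]

    def __call__(self, t0, t1):
        # unit-variance noise for the interval [t0, t1]
        return (self._value(t1) - self._value(t0)) / np.sqrt(t1 - t0)
```

The stabilizing property: sampling W(1.0) first and then refining at W(0.5) leaves W(1.0) unchanged, so a 10-step and a 40-step run draw from the same underlying noise path.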


skipping the clamping-by-%ile and just denormalizing CFG20's latents to CFG7.5's abs().max() is very similar to reducing cond_scale, but not quite.

think it works out something like:
(uncond + (cond - uncond) * 20)/(20/7.5)
versus
uncond + (cond - uncond) * (20/(20/7.5))
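Concretely, with toy numbers (treating the denormalization factor as exactly 20/7.5 for illustration, rather than the measured abs().max() ratio):

```python
import numpy as np

rng = np.random.default_rng(0)
uncond = rng.standard_normal(4)
cond = uncond + rng.standard_normal(4)

# denormalize CFG20 down by the ratio of scales (this divides uncond too):
rescaled = (uncond + (cond - uncond) * 20) / (20 / 7.5)
# versus actually reducing cond_scale, which is just CFG at 7.5:
reduced = uncond + (cond - uncond) * (20 / (20 / 7.5))

# the two differ by a constant fraction of uncond:
# reduced - rescaled == uncond * (1 - 7.5/20)
```

So "very similar but not quite": the rescaled version attenuates the uncond term by 7.5/20 instead of leaving it at full strength.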


99.99999999999%ile dynthresh
I think this shows that "clamping out n%ile outliers" is only important when you have excessive outliers. the rest of the battle is "what range of values do you span"; hence denormalizing the latents to CFG7.5's abs().max() gives us a safer range.


a couple more examples before I post the code


each step: compute CFG7.5. for each channel: flatten, center on mean, grab abs().max().
compute CFG20. for each channel: flatten, center on mean, compute the 99.x%ile of abs(), pick the larger of that %ile or CFG7.5's max. clamp the channel by that, normalize, multiply by CFG7.5's max.
code coming
code coming
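The steps above work out to something like the following numpy sketch (my reconstruction from the description, not the forthcoming code; the percentile value and the re-added channel mean are assumptions):

```python
import numpy as np

def dynthresh(uncond, cond, cfg_high=20.0, cfg_ref=7.5, pct=99.95):
    """Sketch: clamp high-CFG latents per-channel, using a known-good
    CFG's dynamic range as the ceiling. Latents assumed [channels, H, W]."""
    ref = uncond + (cond - uncond) * cfg_ref    # known-good CFG
    high = uncond + (cond - uncond) * cfg_high  # high CFG to be tamed
    out = np.empty_like(high)
    for c in range(high.shape[0]):
        ref_c = ref[c] - ref[c].mean()          # center on mean
        ref_max = np.abs(ref_c).max()           # known-good dynamic range
        hi_mean = high[c].mean()
        hi_c = high[c] - hi_mean
        # ceiling: larger of the %ile of abs() or the reference max
        ceiling = max(np.percentile(np.abs(hi_c), pct), ref_max)
        clamped = np.clip(hi_c, -ceiling, ceiling)
        # normalize, then denormalize to the reference range
        out[c] = clamped / ceiling * ref_max + hi_mean
    return out
```

Each channel ends up spanning at most CFG7.5's range, with only extreme outliers actually clipped.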


made a new algorithm for dynamic thresholding in
enables us to set CFG scale high (e.g. 20) without clipping; keeps dynamic range / subtlety in shadows and highlights.
we refer to a known-good CFG's (7.5) dynamic range, which helps us pick a ceiling.
detail to follow


got the official DPMSolver++ sampler working on Mac.
today, Cheng Lu added a trick to improve performance under 15 steps; k-diffusion probably doesn't have this yet.
dynthresh only works in pixel space; it remains an unsolved problem on latents.
https://t.co/22rDBzDWiW


with DPM-Solver++(2M) sampler, we get coherent images in 5 sampler steps!
and these aren't Heun steps (where n steps = 2n-1 model calls), this is just 5 model calls! less than 3.5 secs on Mac!
Katherine released this implementation yesterday in k-diffusion — great work as usual!
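For intuition, a toy numpy transcription of the 2M multistep update (the real implementation is `sample_dpmpp_2m` in k-diffusion, in PyTorch). With an ideal denoiser for a point-mass data distribution it recovers the data exactly, one model call per step:

```python
import numpy as np

def sample_dpmpp_2m(model, x, sigmas):
    """Sketch of DPM-Solver++(2M): a second-order multistep update that
    reuses the previous denoiser output instead of making extra calls."""
    old_denoised, h_last = None, None
    for i in range(len(sigmas) - 1):
        denoised = model(x, sigmas[i])            # one model call per step
        if sigmas[i + 1] == 0:                    # final step: jump to prediction
            return denoised
        h = np.log(sigmas[i] / sigmas[i + 1])     # step size in log-sigma space
        if old_denoised is None:
            d = denoised                          # no history yet: 1st-order step
        else:
            r = h_last / h                        # 2nd-order multistep correction
            d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
        x = (sigmas[i + 1] / sigmas[i]) * x - np.expm1(-h) * d
        old_denoised, h_last = denoised, h
    return x

# toy check: an ideal denoiser for data concentrated at a single point
x0 = np.array([1.5, -2.0, 0.25])
model = lambda x, sigma: x0
sigmas = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.0])
rng = np.random.default_rng(0)
out = sample_dpmpp_2m(model, rng.standard_normal(3) * sigmas[0], sigmas)
```

This is why n steps means just n model calls here, unlike Heun's 2n−1.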


classifier-free guidance:
ask the model to denoise Gaussian noise.
no condition: the model predicts a salad.
shrine maiden condition: the model predicts graffiti of faces.
CFG is "what makes shrine maiden different from salad", multiplied by your guidance scale.
repeat this every sampler step.
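In code, the whole trick is one line (a sketch with toy arrays standing in for the model's two predictions):

```python
import numpy as np

def cfg(uncond, cond, scale):
    # "what makes shrine maiden different from salad", times the guidance scale
    return uncond + (cond - uncond) * scale

rng = np.random.default_rng(0)
uncond = rng.standard_normal(4)  # prediction with no condition ("salad")
cond = rng.standard_normal(4)    # prediction with the prompt ("shrine maiden")
guided = cfg(uncond, cond, 7.5)
```

Scale 1 recovers the plain conditional prediction; higher scales extrapolate past it along the cond−uncond direction, every sampler step.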
