Jaerin Lee
@_ironjr_ · Yet another AI engineer @ Computer Vision Lab, Seoul National University. Research: Gaussian splatting, diffusion models, grokking
📣📣📣 We are excited to announce our new paper, “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”! 🤩 Reinterpreting ML optimization processes as control systems with gradients acting as signals, we accelerate the #grokking phenomenon up to ×50, making a step…
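(For readers who want the gist in code: a minimal PyTorch sketch of the EMA variant. The filter keeps a running exponential moving average of each parameter's gradient, the slow component, and adds it back scaled by a factor before the optimizer step. Function and hyperparameter names here, `alpha` and `lamb`, are illustrative rather than the official API; see github.com/ironjr/grokfast for the real implementation.)

```python
# Minimal sketch of Grokfast-style EMA gradient filtering in PyTorch.
# Variable names and defaults are illustrative, not the official API;
# see github.com/ironjr/grokfast for the real implementation.
import torch

@torch.no_grad()
def ema_gradient_filter(model, ema_grads, alpha=0.98, lamb=2.0):
    """Amplify the slow (low-frequency) component of the gradients.

    ema_grads: dict of parameter name -> running EMA tensor, updated
    in place. Call after loss.backward(), before optimizer.step().
    """
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name not in ema_grads:
            ema_grads[name] = torch.zeros_like(p.grad)
        # Low-pass filter: exponential moving average of past gradients.
        ema_grads[name].mul_(alpha).add_(p.grad, alpha=1 - alpha)
        # Add the boosted slow component back onto the raw gradient.
        p.grad.add_(ema_grads[name], alpha=lamb)
```

Dropped between `loss.backward()` and `optimizer.step()`, this is a one-line addition to any training loop; since the EMA is an IIR low-pass, it needs only one extra tensor per parameter.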
some thoughts on the grokfast paper, which has some fun theory + nice results but i suspect is maybe fluffing up a simpler thing that would also work
like a somewhat similar thing you could test is just using bigger batches/gradient accumulation. would’ve liked to see that experiment; their “slow-moving only” ablation doesn’t really address whether the issue is just memorization capacity per batch
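(For concreteness, a sketch of the baseline being suggested here, assuming a standard PyTorch training loop; `compute_loss` and the other loop pieces are hypothetical placeholders, not from the paper:)

```python
# Sketch of the suggested baseline: plain gradient accumulation,
# averaging gradients over `accum_steps` micro-batches before each
# update, which approximates an accum_steps-times-larger batch.
# `compute_loss` and the loop pieces are hypothetical placeholders.

def train_with_accumulation(model, optimizer, loader, compute_loss,
                            accum_steps=8):
    optimizer.zero_grad()
    for i, batch in enumerate(loader):
        loss = compute_loss(model, batch)
        (loss / accum_steps).backward()  # scale so gradients average
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```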
🚨 #StreamMultiDiffusion now supports #StableDiffusion3. Public demo's uploaded. 🤩 Try now at @huggingface 🤗 Space 👉 huggingface.co/spaces/ironjr/… Amazed at #SD3 but bored of single text-to-image generation? Try out our demo by drawing with brushes 🖌️ that paints multiple meanings…
Fast multi-prompt arbitrary-sized generation @gradio demo with #StableDiffusion3... Almost done. Generating 2560x1024 images from five regional prompts under 30 sec.
Thanks for trying Grokfast and sharing the progress! I will try this setup, too.
I trained a 46M LLM for 260 epochs on wikitext using [@_ironjr_](x.com/_ironjr_) et al.'s grokfast algorithm. (1/9)
This is truly groundbreaking.
Oh, sorry for misreading your question. tl;dr: I am very open to other types of gradient filters that can have smooth transitions like MA/EMA or sharp cutoffs like traditional FIR filters. However, in my limited experience, I couldn't find a better sharp-transition…
Sorry, I might not have written my question very clearly. What I meant to ask is how much you're boosting the different frequencies. In the paper you mention using a low-pass filter, and what I'm wondering is whether that filter has a sharp cutoff at some…
I’m stunned they thought to try this. Modeling gradients as signals? Wtf? This seems like a bigger deal than their application. Was there prior art?
Grokked weights are much closer to random init in weight space than where models are ending up today. Makes some sense intuitively but pretty cool to see.
In our discussion section, we show that our Grokfast algorithm leads to alternative generalization states in the parameter space with smaller variance and much shorter distance from the initial weights than those reached by the baseline. 🧵 [8/9]
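(For anyone wanting to reproduce this kind of measurement: the distance in question is just the L2 norm between the current parameters and a snapshot of the initialization. A minimal sketch follows; the helper names are mine, not the paper's.)

```python
# Sketch: the L2 distance between current weights and a snapshot of
# the initialization, i.e. how far training moved in parameter space.
# Helper names are mine, not from the paper.
import torch

def snapshot(model):
    """Detached copy of all parameters, e.g. taken at init."""
    return {n: p.detach().clone() for n, p in model.named_parameters()}

def distance_from(model, ref):
    """L2 distance from the snapshot `ref` in flattened weight space."""
    sq = sum((p.detach() - ref[n]).pow(2).sum()
             for n, p in model.named_parameters())
    return sq.sqrt().item()
```

Take `init = snapshot(model)` before training, then compare `distance_from(model, init)` between the baseline and Grokfast runs.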
🔥🔥 StreamMultiDiffusion now supports Stable Diffusion 3. 👉 github.com/ironjr/StreamM… Enabling super fast multiple region-based text-to-image generation by merging FlashDiffusion and StreamMultiDiffusion framework. @huggingface Space demo coming very soon.…
Grokking now reliable
I just tried it. Seems like it can do inpainting with #SD3 from multiple masked text prompts in 5 sec (1024x1024). The code is uploaded 👉 github.com/ironjr/StreamM…
If you care about generalization to unseen data, throw an exponential moving average in your gradient descent. You’ll converge 3× slower—but grok 50× faster.
Explore the phenomenon of grokking and how the Grokfast algorithm accelerates generalization in neural networks by 50x. Learn the technical details and see how it can speed up your training. Read the full article here: wandb.ai/byyoung3/mlnew…
I just updated the readme of the main repository of #Grokfast with hyperparameter setup guidelines. github.com/ironjr/grokfast
How do you draw the boundary between the fast-varying, high-frequency component and the slow-varying, low-frequency component? Does your LPF apply an arbitrary cutoff? Could you make it a sliding scale instead, where you boost the lowest frequencies the most and the highest the least?
Thanks for the acknowledgement! Demonstrated with the windowed/exponential moving average, the simplest forms of LPF, the Grokfast paper is just a proof of concept of augmenting optimizers with signal filters to modulate the generalization process. We have shown that MA/EMA…
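(The MA variant mentioned here, sketched in the same style as the EMA one above: a uniform window over the last `window` gradients is an FIR low-pass with a sharper cutoff than the EMA's smooth roll-off. Names and defaults are illustrative, not the repo's API.)

```python
# Sketch of the windowed moving-average (MA) variant: keep the last
# `window` gradients per parameter and add their mean back, scaled by
# lamb. A uniform window is an FIR low-pass with a sharper cutoff than
# the EMA's smooth roll-off. Names and defaults are illustrative.
from collections import deque
import torch

@torch.no_grad()
def ma_gradient_filter(model, grad_queues, window=100, lamb=5.0):
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        q = grad_queues.setdefault(name, deque(maxlen=window))
        q.append(p.grad.clone())
        # Mean over the window = the slow-varying gradient component.
        slow = torch.stack(list(q)).mean(dim=0)
        p.grad.add_(slow, alpha=lamb)
```

The obvious cost is memory: `window` full copies of every gradient tensor live in the queues, which is presumably why the EMA form is the more practical default.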
This is a really good paper! My main question is: there's been a lot of work using FFTs to try to get access to frequency domain information, be it here, or in attention in general, etc. I've not seen anybody try a fast wavelet transform, though. It seems to me that the FWT…
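(Purely speculative, but to make the suggestion concrete: a wavelet-domain low-pass over a buffered 1-D gradient history could look like the sketch below. The PyWavelets calls, `pywt.wavedec` and `pywt.waverec`, are real; the filtering scheme itself is not from the paper.)

```python
# Purely hypothetical sketch of a wavelet-domain gradient low-pass,
# in the spirit of the FWT suggestion above. The PyWavelets calls are
# real; the filtering scheme is speculative and not from the paper.
import numpy as np
import pywt

def wavelet_lowpass(history, wavelet="db4", level=3):
    """Low-pass a 1-D gradient history by zeroing detail coefficients.

    Keeping only the approximation coefficients retains the
    slow-varying component, i.e. a wavelet-domain low-pass filter.
    """
    coeffs = pywt.wavedec(history, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    smooth = pywt.waverec(coeffs, wavelet)
    return smooth[: len(history)]  # waverec may pad one extra sample

# e.g., boost the slow part of the newest gradient sample:
# g_filtered = history[-1] + lamb * wavelet_lowpass(history)[-1]
```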