
Yuqian Fu

@Chert_Fu

PhD Student @ DRL|ML

Similar Users

chitaner (@chitanerk)
不萬能風 (@doveZYM)
chenfly (@chenfly6)
BMo (@Br_MonMx_Roy)
bfdlin (@bfdlin)
CryptoNinja (@liuyanxue2)
Cansel.eth (@canselozann)
貓頭鷹 (@owlkery0702)
bwyz 🐐 (@MKGodknows)
xiang꧁IP꧂ (@xiang088)
SulczG ♥️ Memecoin (@sulczgabor)
Fedvity (@Moneyvism)
chdonger ♦️ Tabi 🟧 🛸(🌸, 🌿)🛡️🩸💧꧁IP꧂ (@StevenChen2022)
Carmen Sandiego Teoh (@carmentjwern)
Will-0xfa (@wilsen_0xfa6606)

Pinned

Honored to be selected as a Top Reviewer of #NeurIPS2024 🎉


Yuqian Fu Reposted

I don’t know why I didn’t work on this at early OpenAI, despite going around everywhere giving talks about the magic of autoregressive language models around that time. I went deep into RL like everyone else back then. Biggest, most confusing research career mistake ever


Yuqian Fu Reposted

"LLMs can't reason, look at how their accuracy drops if you change the numbers in the problem!!!1!" The accuracy drop (%):


Yuqian Fu Reposted

Ah, I missed this new discrete diffusion model in town. Is GPT hitting the wall?


Yuqian Fu Reposted

Steerability is the next frontier of generative models! Having knobs that control the behavior of AI systems will greatly improve their safety & usability. I’m very excited to present ✨Conditional Language Policy (CLP)✨, a multi-objective RL framework for steering language…
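
(The thread above is truncated; the following is a minimal sketch of what a multi-objective steering setup like this typically looks like, assuming scalarized rewards and a weight-conditioned policy. Every function name and reward definition below is an illustrative placeholder, not CLP's actual implementation.)

```python
import numpy as np

# Hypothetical sketch: the RL reward is a preference-weighted combination of
# per-objective rewards, and the weights are exposed to the policy so one
# model covers the whole trade-off curve.

def reward_helpfulness(response: str) -> float:
    # Placeholder objective: longer answers score higher, capped at 1.0.
    return min(len(response.split()) / 50.0, 1.0)

def reward_brevity(response: str) -> float:
    # Placeholder objective: shorter answers score higher.
    return max(1.0 - len(response.split()) / 50.0, 0.0)

def scalarized_reward(response: str, w: np.ndarray) -> float:
    """Combine per-objective rewards with preference weights w (sum to 1)."""
    objectives = np.array([reward_helpfulness(response),
                           reward_brevity(response)])
    return float(w @ objectives)

def condition_prompt(prompt: str, w: np.ndarray) -> str:
    """Expose the weights to the policy, making the trade-off steerable."""
    return f"[helpfulness={w[0]:.2f}, brevity={w[1]:.2f}] {prompt}"

# During training, w would be sampled per episode (e.g., from a Dirichlet);
# at inference time the same policy is steered by choosing w directly.
w = np.random.dirichlet(alpha=[1.0, 1.0])
print(condition_prompt("Explain RLHF.", w))
print(scalarized_reward("RLHF fine-tunes a model with a learned reward.", w))
```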


Yuqian Fu Reposted

There is a nuanced but important difference between chain-of-thought before and after o1. Before the o1 paradigm (i.e., chain-of-thought prompting), there was a mismatch between what chain of thought was and what we wanted it to be. We wanted chain of thought to reflect the…


Yuqian Fu Reposted

RLHF provably can't teach models any new knowledge. If you need to teach new skills, you need to look at pre-training and SFT. Why? 👇
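
(The explanatory thread is not shown above; as a hedged reconstruction of the standard argument: KL-regularized RLHF has a closed-form optimum that can only reweight outputs the reference model already assigns nonzero probability.)

```latex
% KL-regularized RLHF solves
%   max_pi  E_{y ~ pi(.|x)}[ r(x,y) ] - beta * KL( pi(.|x) || pi_ref(.|x) ),
% whose optimum has the closed form:
\[
  \pi^{*}(y \mid x)
  = \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)
    \exp\!\left(\frac{r(x, y)}{\beta}\right),
  \qquad
  Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)
    \exp\!\left(\frac{r(x, y)}{\beta}\right).
\]
% Wherever pi_ref(y|x) = 0, we get pi*(y|x) = 0 as well: RLHF can only
% reweight behaviors the base model already supports, not create new ones.
```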


Yuqian Fu Reposted

Same here. In the original "Let's Verify Step by Step" paper, the process reward model represents the immediate reward in RL, but in many recent papers the term "process reward model" actually refers to the value function.
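
(A sketch of the distinction, with notation of my own rather than from either line of work: write s_t for the prefix of reasoning steps and a_t for the candidate next step.)

```latex
\[
  \underbrace{r(s_t, a_t)}_{\text{PRM in ``Let's Verify Step by Step''}}
  = \mathbf{1}[\text{step } a_t \text{ is correct}],
  \qquad
  \underbrace{V^{\pi}(s_t)}_{\text{value-style ``PRM''}}
  = \mathbb{E}_{\pi}\!\left[\mathbf{1}[\text{final answer correct}]
    \,\middle|\, s_t\right].
\]
% The first scores the step itself (an immediate reward); the second
% predicts eventual success from the current prefix, i.e., a value function.
```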


Yuqian Fu Reposted

📢 Announcing EDLM, our brand-new Energy-based Language Model embedded in a diffusion framework! Key results: 1. We (for the first time?) almost match AR perplexity. 2. Significantly improved generation quality. 3. Considerable sampling speedup without quality drop. 🧵1/n


Yuqian Fu Reposted

I'm very excited to announce that I'm joining @AnthropicAI this week, after 10 amazing years at @GoogleDeepMind ! Thank you also to all the amazing people I got to meet and work with, and I'm really looking forward to meeting all my new colleagues 💖 ! furidamu.org/blog/2024/10/2…


Yuqian Fu Reposted

Autoregressive language models, despite their impressive capabilities, sometimes struggle with complex reasoning and long-term planning tasks. Can we go beyond autoregression on these challenges?🤔


Yuqian Fu Reposted

Video generation models can serve as world models and embodied planning tools, but they must be grounded in the physical world. Check out: VideoAgent for self-improving video generation using feedback from VLMs and action executions. Paper: arxiv.org/abs/2410.10076 Code:…


Yuqian Fu Reposted

I just tried out playing Counter-Strike in a neural network on my MacBook. In my first run, it diverged into mush pretty quickly. The recording is sped up 5x.


Yuqian Fu Reposted

🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x ✅ RL sample efficiency by 6x ✅ 3-4x ⬆️ accuracy gains vs prior works ❌ human supervision What's the secret sauce 🤔?: See 🧵 ⬇️ arxiv.org/pdf/2410.08146


Yuqian Fu Reposted

We just released Pixtral 12B paper on Arxiv: arxiv.org/abs/2410.07073


Yuqian Fu Reposted

The cool thing: this doesn't only apply to papers. It works whenever you consume information: watching a YouTube video, listening to a podcast. Make it active! Reflect on why you consume it, extract the important bits, then repeat them actively to memorize them. Hope it helps 🤗


Yuqian Fu Reposted

🚨This week’s top AI/ML research papers: - MovieGen - Were RNNs All We Needed? - Contextual Document Embeddings - RLEF - ENTP - VinePPO - When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 - LLMs Know More Than…


Yuqian Fu Reposted

DeepSeek's multi-head latent attention is wild. Cool to see aggressive innovation like this
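
(A minimal single-head sketch of the core idea as commonly described: cache a small shared latent instead of full keys/values and reconstruct K/V from it. Shapes and names are illustrative, not DeepSeek's actual implementation.)

```python
import numpy as np

d_model, d_latent, d_head, seq = 64, 8, 16, 10
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)  # shared down-projection
W_uk = rng.normal(size=(d_latent, d_head)) / np.sqrt(d_latent)    # key up-projection
W_uv = rng.normal(size=(d_latent, d_head)) / np.sqrt(d_latent)    # value up-projection
W_q = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)       # query projection

h = rng.normal(size=(seq, d_model))   # hidden states for seq tokens

# The KV cache stores only the latent c: seq x d_latent floats per layer
# instead of seq x 2*d_head; K and V are reconstructed on the fly.
c = h @ W_down
K, V = c @ W_uk, c @ W_uv

q = h[-1] @ W_q                        # query for the newest token
scores = K @ q / np.sqrt(d_head)       # attention logits over the cache
attn = np.exp(scores - scores.max())   # numerically stable softmax
attn /= attn.sum()
out = attn @ V
print(out.shape, "cached floats per token:", d_latent)
```

(At inference, the up-projections can in principle be folded into the query and output projections, so the latent never needs to be expanded token by token.)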


Yuqian Fu Reposted

The official answer on why macOS Sequoia no longer allows "Option + number/letter" combinations to be configured as shortcuts: unsurprisingly, security reasons… Fine, I give up, that is so very Apple. forums.developer.apple.com/forums/thread/…

