
Yuqian Fu

@Chert_Fu

PhD Student @ DRL|ML

Similar Users

chitaner (@chitanerk)
不萬能風 (@doveZYM)
chenfly (@chenfly6)
BMo (@Br_MonMx_Roy)
bfdlin (@bfdlin)
CryptoNinja (@liuyanxue2)
Cansel.eth (@canselozann)
貓頭鷹 (@owlkery0702)
bwyz 🐐 (@MKGodknows)
xiang꧁IP꧂ (@xiang088)
SulczG ♥️ Memecoin (@sulczgabor)
Fedvity (@Moneyvism)
chdonger ♦️ Tabi 🟧 🛸(🌸, 🌿)🛡️🩸💧꧁IP꧂ (@StevenChen2022)
Carmen Sandiego Teoh (@carmentjwern)
Will-0xfa (@wilsen_0xfa6606)

Pinned

Honored to be selected as a Top Reviewer of #NeurIPS2024 🎉


Yuqian Fu Reposted

I don’t know why I didn’t work on this at early OpenAI, despite going around everywhere giving talks about the magic of autoregressive language models around that time. I went deep into RL like everyone else back then. Biggest, most confusing research career mistake ever


Yuqian Fu Reposted

"LLMs can't reason, look at how their accuracy drops if you change the numbers in the problem!!!1!" The accuracy drop (%):


Yuqian Fu Reposted

Ah, I missed this new discrete diffusion model in town. Is GPT hitting the wall?


Yuqian Fu Reposted

Steerability is the next frontier of generative models! Having knobs that control the behavior of AI systems will greatly improve their safety & usability. I’m very excited to present ✨Conditional Language Policy (CLP)✨, a multi-objective RL framework for steering language…
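
(The thread above is truncated; the following is a minimal sketch of what a multi-objective steering setup like this typically looks like, assuming scalarized rewards and a weight-conditioned policy. Every function name and reward definition below is an illustrative placeholder, not CLP's actual implementation.)

```python
import numpy as np

# Hypothetical sketch: the RL reward is a preference-weighted combination of
# per-objective rewards, and the weights are exposed to the policy so one
# model covers the whole trade-off curve.

def reward_helpfulness(response: str) -> float:
    # Placeholder objective: longer answers score higher, capped at 1.0.
    return min(len(response.split()) / 50.0, 1.0)

def reward_brevity(response: str) -> float:
    # Placeholder objective: shorter answers score higher.
    return max(1.0 - len(response.split()) / 50.0, 0.0)

def scalarized_reward(response: str, w: np.ndarray) -> float:
    """Combine per-objective rewards with preference weights w (sum to 1)."""
    objectives = np.array([reward_helpfulness(response),
                           reward_brevity(response)])
    return float(w @ objectives)

def condition_prompt(prompt: str, w: np.ndarray) -> str:
    """Expose the weights to the policy, making the trade-off steerable."""
    return f"[helpfulness={w[0]:.2f}, brevity={w[1]:.2f}] {prompt}"

# During training, w would be sampled per episode (e.g., from a Dirichlet);
# at inference time the same policy is steered by choosing w directly.
w = np.random.dirichlet(alpha=[1.0, 1.0])
print(condition_prompt("Explain RLHF.", w))
print(scalarized_reward("RLHF fine-tunes a model with a learned reward.", w))
```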


Yuqian Fu Reposted

There is a nuanced but important difference between chain-of-thought before and after o1. Before the o1 paradigm (i.e., chain-of-thought prompting), there was a mismatch between what chain of thought was and what we wanted it to be. We wanted chain of thought to reflect the…


Yuqian Fu Reposted

RLHF provably can't teach models any new knowledge. If you need to teach new skills, you need to look at pre-training and SFT. Why? 👇
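
(The explanatory thread is not shown above; as a hedged reconstruction of the standard argument: KL-regularized RLHF has a closed-form optimum that can only reweight outputs the reference model already assigns nonzero probability.)

```latex
% KL-regularized RLHF solves
%   max_pi  E_{y ~ pi(.|x)}[ r(x,y) ] - beta * KL( pi(.|x) || pi_ref(.|x) ),
% whose optimum has the closed form:
\[
  \pi^{*}(y \mid x)
  = \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)
    \exp\!\left(\frac{r(x, y)}{\beta}\right),
  \qquad
  Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)
    \exp\!\left(\frac{r(x, y)}{\beta}\right).
\]
% Wherever pi_ref(y|x) = 0, we get pi*(y|x) = 0 as well: RLHF can only
% reweight behaviors the base model already supports, not create new ones.
```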


Yuqian Fu Reposted

Same here. In the original "Let's Verify Step by Step" paper, the process reward model represents the immediate reward in RL, but in many recent papers the term "process reward model" actually refers to the value function.
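
(A sketch of the distinction, with notation of my own rather than from either line of work: write s_t for the prefix of reasoning steps and a_t for the candidate next step.)

```latex
\[
  \underbrace{r(s_t, a_t)}_{\text{PRM in ``Let's Verify Step by Step''}}
  = \mathbf{1}[\text{step } a_t \text{ is correct}],
  \qquad
  \underbrace{V^{\pi}(s_t)}_{\text{value-style ``PRM''}}
  = \mathbb{E}_{\pi}\!\left[\mathbf{1}[\text{final answer correct}]
    \,\middle|\, s_t\right].
\]
% The first scores the step itself (an immediate reward); the second
% predicts eventual success from the current prefix, i.e., a value function.
```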


Yuqian Fu Reposted

📢 Announcing EDLM, our brand-new Energy-based Language Model embedded in a diffusion framework! Key results: 1. We (for the first time?) almost match AR perplexity. 2. Significantly improved generation quality. 3. Considerable sampling speedup without quality drop. 🧵1/n


Yuqian Fu Reposted

I'm very excited to announce that I'm joining @AnthropicAI this week, after 10 amazing years at @GoogleDeepMind ! Thank you also to all the amazing people I got to meet and work with, and I'm really looking forward to meeting all my new colleagues 💖 ! furidamu.org/blog/2024/10/2…


Yuqian Fu Reposted

Autoregressive language models, despite their impressive capabilities, sometimes struggle with complex reasoning and long-term planning tasks. Can we go beyond autoregression on these challenges?🤔


Yuqian Fu Reposted

Video generation models can serve as world models and embodied planning tools, but they must be grounded in the physical world. Check out: VideoAgent for self-improving video generation using feedback from VLMs and action executions. Paper: arxiv.org/abs/2410.10076 Code:…


Yuqian Fu Reposted

I just tried out playing Counter-Strike in a neural network on my MacBook. In my first run, it diverged into mush pretty quickly. The recording is sped up 5x.


Yuqian Fu Reposted

🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x ✅ RL sample efficiency by 6x ✅ 3-4x ⬆️ accuracy gains vs prior works ❌ human supervision What's the secret sauce 🤔?: See 🧵 ⬇️ arxiv.org/pdf/2410.08146


Yuqian Fu Reposted

We just released Pixtral 12B paper on Arxiv: arxiv.org/abs/2410.07073


Yuqian Fu Reposted

The cool thing: this doesn't only apply to papers. It works whenever you consume information: watching a YouTube video, listening to a podcast. Make it active! Reflect on why you consume it, extract the important bits, then repeat them actively to memorize them. Hope it helps 🤗


Yuqian Fu Reposted

🚨This week’s top AI/ML research papers: - MovieGen - Were RNNs All We Needed? - Contextual Document Embeddings - RLEF - ENTP - VinePPO - When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 - LLMs Know More Than…


Yuqian Fu Reposted

DeepSeek's multi-head latent attention is wild. Cool to see aggressive innovation like this
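
(A minimal single-head sketch of the core idea as commonly described: cache a small shared latent instead of full keys/values and reconstruct K/V from it. Shapes and names are illustrative, not DeepSeek's actual implementation.)

```python
import numpy as np

d_model, d_latent, d_head, seq = 64, 8, 16, 10
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)  # shared down-projection
W_uk = rng.normal(size=(d_latent, d_head)) / np.sqrt(d_latent)    # key up-projection
W_uv = rng.normal(size=(d_latent, d_head)) / np.sqrt(d_latent)    # value up-projection
W_q = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)       # query projection

h = rng.normal(size=(seq, d_model))   # hidden states for seq tokens

# The KV cache stores only the latent c: seq x d_latent floats per layer
# instead of seq x 2*d_head; K and V are reconstructed on the fly.
c = h @ W_down
K, V = c @ W_uk, c @ W_uv

q = h[-1] @ W_q                        # query for the newest token
scores = K @ q / np.sqrt(d_head)       # attention logits over the cache
attn = np.exp(scores - scores.max())   # numerically stable softmax
attn /= attn.sum()
out = attn @ V
print(out.shape, "cached floats per token:", d_latent)
```

(At inference, the up-projections can in principle be folded into the query and output projections, so the latent never needs to be expanded token by token.)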


Yuqian Fu Reposted

The official answer on why macOS Sequoia no longer allows "Option + number/letter" combinations to be configured as shortcuts: unsurprisingly, security reasons… Fine, I give up, that is so very Apple. forums.developer.apple.com/forums/thread/…

