Huizhuo Yuan

@HuizhuoY

Graduate student @UCLA AGI lab, Researcher on LLMs, Diffusion Models, Reinforcement Learning, Games and AI for Science. Opinions are my own.

Joined May 2023
Huizhuo Yuan Reposted

🤯 ModelCloud has tested and validated the new MARS optimizer from UCLA on two models, Llama 3.1 8B and Qwen 2.5 32B, observing significant time savings in BF16 fine-tuning versus Paged-AdamW-8bit. @QuanquanGu👇 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1

Huizhuo Yuan Reposted

I re-implemented your experiments in PyTorch with the same settings (github.com/AGI-Arena/MARS…) and got higher performance for all of the optimizers. When I set optimizer_1d=False for MARS, it shows a clear edge of about 1% over the other optimizers.

Tweet Image 1

Huizhuo Yuan Reposted

Finally making variance reduction practical for modern AI training. Basically, training AI models faster by reducing the noise in their learning. Original Problem 🎯: Training LLMs suffers from high gradient variance. Current adaptive optimizers like AdamW, while widely used,…

Tweet Image 1
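
Not from the thread, but to make the mechanism concrete: below is a minimal sketch of how a control-variate style correction can be folded into a momentum update. It illustrates the general variance-reduction idea, not the actual MARS algorithm; the function name, hyperparameter values, and the plain momentum step at the end are all assumptions for illustration.

```python
import torch

def variance_reduced_momentum_step(param, grad, prev_grad_same_batch, state,
                                   lr=1e-3, beta=0.9, gamma=0.025):
    """Illustrative update for one tensor (hypothetical helper, not MARS).

    `grad` is the gradient at the current parameters and `prev_grad_same_batch`
    is the gradient at the previous parameters, computed on the SAME minibatch,
    so their difference is a correlated correction term (a control variate).
    """
    # corrected gradient: g_t plus a scaled same-batch gradient difference
    c_t = grad + gamma * (grad - prev_grad_same_batch)
    # exponential moving average of the corrected gradient (momentum buffer)
    m = state.setdefault("m", torch.zeros_like(param))
    m.mul_(beta).add_(c_t, alpha=1 - beta)
    # plain momentum-style parameter update
    param.data.add_(m, alpha=-lr)
```

The paper itself reconciles this kind of variance-reduced correction with preconditioned (Adam-style and beyond) updates; see arxiv.org/pdf/2411.10438 for the exact algorithm.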

Huizhuo Yuan Reposted

What’s New in MARS? Variance reduction techniques have been extensively developed over the past decade to accelerate stochastic optimization in both convex and nonconvex settings. However, their application to training deep neural networks and LLMs has met with limited success,…

Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1


Huizhuo Yuan Reposted

What is Variance Reduction? Variance reduction in Monte Carlo estimation leverages correlations between random variables to improve estimation accuracy. If X is the target random variable and Y is a similar one with a known expectation E[Y], we can define an estimator as…

Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1
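
Completing the thought with the textbook construction (my own worked example, not taken from the thread): with Y correlated with X and E[Y] known, the estimator X - c(Y - E[Y]) is still unbiased, and choosing c = Cov(X, Y)/Var(Y) shrinks its variance to Var(X)(1 - ρ²). A small runnable demo, with target and control chosen purely for illustration:

```python
# Classic control-variate demo: estimate E[exp(U)] for U ~ Uniform(0, 1),
# using Y = U (known mean 1/2) as the control variate.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(size=n)
x = np.exp(u)                  # target samples X; true mean is e - 1 ≈ 1.7183
y = u                          # control variate Y with known E[Y] = 0.5

c = np.cov(x, y)[0, 1] / y.var()   # variance-minimizing coefficient c*
x_cv = x - c * (y - 0.5)           # unbiased estimator with reduced variance

print(f"plain MC        : mean={x.mean():.4f}, per-sample var={x.var():.4f}")
print(f"control variate : mean={x_cv.mean():.4f}, per-sample var={x_cv.var():.6f}")
```

Because U and exp(U) are very strongly correlated, the per-sample variance drops by a factor of roughly 60; the thread's point is that a cheap, correlated stand-in for the stochastic gradient can play the same role during LLM training.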


Huizhuo Yuan Reposted

1/ 🔍 Wonder about the answer?
cryo-EM ✖️ Foundation Model 🟰 ❓
cryo-EM ✖️ Flow Matching 🟰 ❓
cryo-EM ✖️ Diffusion Transformer 🟰 ❓
Excited to introduce our new work, cryoFM, the first cryo-EM foundation model for protein densities with flow matching, which generalizes to…


We've applied Self-Play Preference Optimization (SPPO) to the latest Gemma-2-9B-instruct model, achieving a 53.27% LC win rate on AlpacaEval 2.0 leaderboard. Check out the models and code here: 🤗models: huggingface.co/collections/UC… ✨code: github.com/uclaml/SPPO

Tweet Image 1

Self-Play Preference Optimization for Language Model Alignment
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences.

Tweet Image 1


Huizhuo Yuan Reposted

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: github.com/uclaml/SPPO 🤗models: huggingface.co/collections/UC…

Tweet Image 1

Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior…

Tweet Image 1


Huizhuo Yuan Reposted

Google presents To Believe or Not to Believe Your LLM arxiv.org/abs/2406.02543

Tweet Image 1

Huizhuo Yuan Reposted

Thrilled to announce that I am joining @CS_UCLA as an Assistant Professor this Fall! 🐻 Many thanks to my incredible advisors, mentors, family and friends for the encouragement and support. ❤️Looking forward to this exciting new chapter and all the opportunities ahead! 🤖🤖🤖


Huizhuo Yuan Reposted

Why is CLIP more robust to distribution shift than supervised learning? This #ICLR2024 paper provides the first rigorous proof! TL;DR: details specified in the captions allow learning more generalizable features from images. Check it out: Tue, PS#1, P#113 arxiv.org/pdf/2310.04971

Happy to share that I have two papers (arxiv.org/pdf/2310.04971… , arxiv.org/pdf/2403.11391… ) accepted at #ICLR2024! ⬇️🧵 1/

Tweet Image 1
Tweet Image 2


Huizhuo Yuan Reposted

Thanks everyone for the cheers, applause, and constructive criticism. I wrote a few paragraphs responding to the recent KAN hype. In short, I think it is too early to say KANs will replace MLPs, but there are indeed many interesting directions to explore. github.com/KindXiaoming/p…


Huizhuo Yuan Reposted

⭐Self-Play Preference Optimization for Language Model Alignment⭐ arxiv.org/abs/2405.00675 Bradley-Terry models in RLHF fall short in capturing the intransitivity and irrationality in human preferences. How can we identify the Nash equilibrium policy with general preferences?🧵

Tweet Image 1
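
For readers who want the math behind that question, here is the standard background (my paraphrase, not quoted from the thread or the paper): a Bradley-Terry model scores each response with a scalar reward, which forces preferences to be transitive, while a general preference model can encode cycles, and the natural learning target becomes the Nash equilibrium of a symmetric two-player game.

```latex
% Background sketch in standard notation; the game formulation is the usual framing of
% SPPO-style methods, paraphrased rather than quoted from arxiv.org/abs/2405.00675.
\[
  \text{Bradley-Terry:}\quad
  \mathbb{P}(y_1 \succ y_2 \mid x)
  = \frac{e^{r(x, y_1)}}{e^{r(x, y_1)} + e^{r(x, y_2)}}
  = \sigma\!\big(r(x, y_1) - r(x, y_2)\big),
\]
% so preferences inherit the total order of the reward r and can never be cyclic
% (A \succ B \succ C \succ A is impossible), unlike real human judgments.
\[
  \text{General preferences:}\quad
  \pi^\star \in \arg\max_{\pi}\ \min_{\pi'}\
  \mathbb{E}_{x}\,
  \mathbb{E}_{y \sim \pi(\cdot \mid x),\; y' \sim \pi'(\cdot \mid x)}
  \big[\, \mathbb{P}(y \succ y' \mid x) \,\big],
\]
% i.e. the Nash equilibrium of a symmetric constant-sum game, which the self-play
% procedure in the thread aims to approximate.
```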

Thanks @_akhaliq for advertising our SPPO work!

Self-Play Preference Optimization for Language Model Alignment
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences.

Tweet Image 1


Huizhuo Yuan Reposted

Self-Play Preference Optimization for Language Model Alignment
SPPO serves as the RLHF counterpart of SPIN and outperforms iterative DPO, Snorkel AI, Self-Rewarding LM, GPT-4-0613, etc.
arxiv.org/abs/2405.00675

Tweet Image 1

Huizhuo Yuan Reposted

🚨Excited to introduce ConfDiff for protein conformation generation! Background: Protein folding can be likened to text-to-image generation, while protein conformation generation is akin to text-to-video generation. For protein folding, notable methods include AlphaFold,…

1/ Proteins exhibit a dynamic nature. We stand by the belief that steering with physical knowledge is vital in real-world dynamic structure prediction, and we're delighted to introduce our force-guided diffusion model for generating protein conformations. arxiv.org/abs/2403.14088

Tweet Image 2


Huizhuo Yuan Reposted

🫡Excited to see this strong and versatile diffusion-based protein language model🧬 which shows superiority in numerous predictive and generative tasks. Let's eagerly anticipate its transformative potential! Congrats to my friends! 👏

