Huizhuo Yuan

@HuizhuoY

Graduate student @UCLA AGI lab, Researcher on LLMs, Diffusion Models, Reinforcement Learning, Games and AI for Science. Opinions are my own.

Joined May 2023
Huizhuo Yuan Reposted

🤯 ModelCloud has tested and validated the new MARS optimizer from UCLA on two models, Llama 3.1 8B and Qwen 2.5 32B, observing significant time savings in BF16 fine-tuning versus Paged-AdamW-8bit. @QuanquanGu👇 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1

Huizhuo Yuan Reposted

I re-implemented your experiments in PyTorch with the same settings (github.com/AGI-Arena/MARS…) and got higher performance for all of the optimizers. When I set optimizer_1d=False for MARS, it shows a clear edge of about 1% over the other optimizers.

Tweet Image 1

Huizhuo Yuan Reposted

Finally making variance reduction practical for modern AI training. Basically, training AI models faster by reducing the noise in their learning. Original Problem 🎯: Training LLMs suffers from high gradient variance. Current adaptive optimizers like AdamW, while widely used,…

Tweet Image 1
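
Not from the thread, but to make the mechanism concrete: below is a minimal sketch of how a control-variate style correction can be folded into a momentum update. It illustrates the general variance-reduction idea, not the actual MARS algorithm; the function name, hyperparameter values, and the plain momentum step at the end are all assumptions for illustration.

```python
import torch

def variance_reduced_momentum_step(param, grad, prev_grad_same_batch, state,
                                   lr=1e-3, beta=0.9, gamma=0.025):
    """Illustrative update for one tensor (hypothetical helper, not MARS).

    `grad` is the gradient at the current parameters and `prev_grad_same_batch`
    is the gradient at the previous parameters, computed on the SAME minibatch,
    so their difference is a correlated correction term (a control variate).
    """
    # corrected gradient: g_t plus a scaled same-batch gradient difference
    c_t = grad + gamma * (grad - prev_grad_same_batch)
    # exponential moving average of the corrected gradient (momentum buffer)
    m = state.setdefault("m", torch.zeros_like(param))
    m.mul_(beta).add_(c_t, alpha=1 - beta)
    # plain momentum-style parameter update
    param.data.add_(m, alpha=-lr)
```

The paper itself reconciles this kind of variance-reduced correction with preconditioned (Adam-style and beyond) updates; see arxiv.org/pdf/2411.10438 for the exact algorithm.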

Huizhuo Yuan Reposted

What’s New in MARS? Variance reduction techniques have been extensively developed over the past decade to accelerate stochastic optimization in both convex and nonconvex settings. However, their application to training deep neural networks and LLMs has met with limited success,…

Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1


Huizhuo Yuan Reposted

What is Variance Reduction? Variance reduction in Monte Carlo estimation leverages correlations between random variables to improve estimation accuracy. If X is the target random variable and Y is a similar one with a known expectation E[Y], we can define an estimator as…

Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

Tweet Image 1
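
Completing the thought with the textbook construction (my own worked example, not taken from the thread): with Y correlated with X and E[Y] known, the estimator X - c(Y - E[Y]) is still unbiased, and choosing c = Cov(X, Y)/Var(Y) shrinks its variance to Var(X)(1 - ρ²). A small runnable demo, with target and control chosen purely for illustration:

```python
# Classic control-variate demo: estimate E[exp(U)] for U ~ Uniform(0, 1),
# using Y = U (known mean 1/2) as the control variate.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(size=n)
x = np.exp(u)                  # target samples X; true mean is e - 1 ≈ 1.7183
y = u                          # control variate Y with known E[Y] = 0.5

c = np.cov(x, y)[0, 1] / y.var()   # variance-minimizing coefficient c*
x_cv = x - c * (y - 0.5)           # unbiased estimator with reduced variance

print(f"plain MC        : mean={x.mean():.4f}, per-sample var={x.var():.4f}")
print(f"control variate : mean={x_cv.mean():.4f}, per-sample var={x_cv.var():.6f}")
```

Because U and exp(U) are very strongly correlated, the per-sample variance drops by a factor of roughly 60; the thread's point is that a cheap, correlated stand-in for the stochastic gradient can play the same role during LLM training.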


Huizhuo Yuan Reposted

1/ 🔍 Wonder about the answer?
cryo-EM ✖️ Foundation Model 🟰 ❓
cryo-EM ✖️ Flow Matching 🟰 ❓
cryo-EM ✖️ Diffusion Transformer 🟰 ❓
Excited to introduce our new work, cryoFM, the first cryo-EM foundation model for protein densities with flow matching, which generalizes to…


We've applied Self-Play Preference Optimization (SPPO) to the latest Gemma-2-9B-instruct model, achieving a 53.27% LC win rate on AlpacaEval 2.0 leaderboard. Check out the models and code here: 🤗models: huggingface.co/collections/UC… ✨code: github.com/uclaml/SPPO

Tweet Image 1

Self-Play Preference Optimization for Language Model Alignment
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences.

Tweet Image 1


Huizhuo Yuan Reposted

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: github.com/uclaml/SPPO 🤗models: huggingface.co/collections/UC…

Tweet Image 1

Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior…

Tweet Image 1


Huizhuo Yuan Reposted

Google presents To Believe or Not to Believe Your LLM arxiv.org/abs/2406.02543

Tweet Image 1

Huizhuo Yuan Reposted

Thrilled to announce that I am joining @CS_UCLA as an Assistant Professor this Fall! 🐻 Many thanks to my incredible advisors, mentors, family and friends for the encouragement and support. ❤️Looking forward to this exciting new chapter and all the opportunities ahead! 🤖🤖🤖


Huizhuo Yuan Reposted

Why is CLIP more robust to distribution shift than supervised learning? This #ICLR2024 paper provides the first rigorous proof! TL;DR: details specified in the captions allow learning more generalizable features from images. Check it out: Tue, PS#1, P#113 arxiv.org/pdf/2310.04971

Happy to share that I have two papers (arxiv.org/pdf/2310.04971… , arxiv.org/pdf/2403.11391… ) accepted at #ICLR2024! ⬇️🧵 1/

Tweet Image 1
Tweet Image 2


Huizhuo Yuan Reposted

Thanks everyone for the cheers, applause, and constructive criticism. I wrote a few paragraphs responding to the recent KAN hype. In short, I think it is too early to say KANs will replace MLPs, but there are indeed many interesting directions to explore. github.com/KindXiaoming/p…


Huizhuo Yuan Reposted

⭐Self-Play Preference Optimization for Language Model Alignment⭐ arxiv.org/abs/2405.00675 Bradley-Terry models in RLHF fall short in capturing the intransitivity and irrationality in human preferences. How can we identify the Nash equilibrium policy with general preferences?🧵

Tweet Image 1
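
For readers who want the math behind that question, here is the standard background (my paraphrase, not quoted from the thread or the paper): a Bradley-Terry model scores each response with a scalar reward, which forces preferences to be transitive, while a general preference model can encode cycles, and the natural learning target becomes the Nash equilibrium of a symmetric two-player game.

```latex
% Background sketch in standard notation; the game formulation is the usual framing of
% SPPO-style methods, paraphrased rather than quoted from arxiv.org/abs/2405.00675.
\[
  \text{Bradley-Terry:}\quad
  \mathbb{P}(y_1 \succ y_2 \mid x)
  = \frac{e^{r(x, y_1)}}{e^{r(x, y_1)} + e^{r(x, y_2)}}
  = \sigma\!\big(r(x, y_1) - r(x, y_2)\big),
\]
% so preferences inherit the total order of the reward r and can never be cyclic
% (A \succ B \succ C \succ A is impossible), unlike real human judgments.
\[
  \text{General preferences:}\quad
  \pi^\star \in \arg\max_{\pi}\ \min_{\pi'}\
  \mathbb{E}_{x}\,
  \mathbb{E}_{y \sim \pi(\cdot \mid x),\; y' \sim \pi'(\cdot \mid x)}
  \big[\, \mathbb{P}(y \succ y' \mid x) \,\big],
\]
% i.e. the Nash equilibrium of a symmetric constant-sum game, which the self-play
% procedure in the thread aims to approximate.
```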

Thanks @_akhaliq for advertising our SPPO work!

Self-Play Preference Optimization for Language Model Alignment
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences.

Tweet Image 1


Huizhuo Yuan Reposted

Self-Play Preference Optimization for Language Model Alignment
SPPO serves as the RLHF counterpart of SPIN and outperforms iterative DPO, Snorkel AI, Self-Rewarding LM, GPT-4-0613, etc.
arxiv.org/abs/2405.00675

Tweet Image 1

Huizhuo Yuan Reposted

🚨Excited to introduce ConfDiff for protein conformation generation! Background: Protein folding can be likened to text-to-image generation, while protein conformation generation is akin to text-to-video generation. For protein folding, notable methods include AlphaFold,…

1/ Proteins exhibit a dynamic nature. We stand by the belief that steering with physical knowledge is vital in real-world dynamic structure prediction, and we're delighted to introduce our force-guided diffusion model for generating protein conformations. arxiv.org/abs/2403.14088

Tweet Image 2


Huizhuo Yuan Reposted

🫡Excited to see this strong and versatile diffusion-based protein language model🧬 which shows superiority in numerous predictive and generative tasks. Let's eagerly anticipate its transformative potential! Congrats to my friends! 👏

