Hao Zhang

@haozhangml

Asst. Prof. @HDSIUCSD and @ucsd_cse running @haoailab. Cofounder and runs @lmsysorg. 20% with @SnowflakeDB

Similar Users

Zhuohan Li (@zhuohan123)
Tri Dao (@tri_dao)
SkyPilot (@skypilot_org)
Lianmin Zheng (@lm_zheng)
lmarena.ai (formerly lmsys.org) (@lmarena_ai)
Piotr Nawrot (@p_nawrot)
Woosuk Kwon (@woosuk_k)
Xinyun Chen (@xinyun_chen_)
Joey Gonzalez (@profjoeyg)
Zico Kolter (@zicokolter)
Eric Xing (@ericxing)
Ying Sheng (@ying11231)
Dacheng Li (@DachengLi177)
Tianqi Chen (@tqchenml)
Lei Li (@lileics)

Pinned

Check out our latest blog post discussing a better metric -- goodput (throughput subject to latency constraints) -- for LLM serving, and our new technique, prefill-decode disaggregation, which optimizes goodput and achieves lower cost per query and higher service quality at the same time!

Still optimizing throughput for LLM serving? Think again: goodput might be a better choice! Splitting prefill from decode onto different GPUs yields:
- up to 4.48x goodput
- up to 10.2x stricter latency criteria
Blog: hao-ai-lab.github.io/blogs/distserv… Paper: arxiv.org/abs/2401.09670
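The goodput idea above can be made concrete with a small sketch: count only the requests whose latency targets are met, then divide by wall-clock time. This is a minimal illustration, not DistServe's actual code; the field names and SLO values are assumptions for the example.

```python
# Hypothetical sketch of the "goodput" metric: throughput counted only
# over requests that meet their latency SLOs. Field names and SLO
# thresholds are illustrative, not taken from the DistServe paper.
from dataclasses import dataclass

@dataclass
class RequestStats:
    ttft: float  # time to first token (s), dominated by the prefill phase
    tpot: float  # time per output token (s), dominated by the decode phase

def goodput(stats, duration_s, ttft_slo=0.2, tpot_slo=0.05):
    """Requests per second that satisfy BOTH latency SLOs."""
    ok = sum(1 for r in stats if r.ttft <= ttft_slo and r.tpot <= tpot_slo)
    return ok / duration_s

# Example: 3 of 4 requests meet both SLOs over a 2-second window.
reqs = [RequestStats(0.15, 0.04), RequestStats(0.18, 0.03),
        RequestStats(0.30, 0.04), RequestStats(0.10, 0.02)]
print(goodput(reqs, duration_s=2.0))  # 1.5 req/s
```

Under this metric, a system that finishes many requests late scores worse than one that finishes fewer requests on time, which is the motivation for separating prefill and decode onto different GPUs.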



Hao Zhang Reposted

📈 more to come

Yeah, Gemini-exp-1114 is pretty good :)



Glad to see both projects we have been working on since last year get recognition. More to come ✊🤟

Thank you @sequoia for supporting the open source community! We believe openness is the way for the future infrastructure of AI, and humbled by our amazing users, contributors, and supporters! ❤️



I will be at this event in person tonight -- happy to chat about anything about AI/LLMs! 🙂

The last in-person vLLM meetup of the year is happening in two weeks, on November 13, at @SnowflakeDB HQ! Join the vLLM developers and engineers from Snowflake AI Research to chat about the latest LLM inference optimizations and your 2025 vLLM wishlist! lu.ma/h0qvrajz



Hao Zhang Reposted

It's official! We are now an incubation project @LFAIDataFdn We firmly believe in open governance and are committed to ensuring that vLLM remains a shared community project. ❤️

🚀 Introducing vLLM, LF AI & Data’s newest incubation project! vLLM is a high-throughput, memory-efficient engine for LLM inference & serving, solving the challenge of slow model serving. Read the full announcement ➡️ hubs.la/Q02V_5270 #opensource #oss



Speculative decoding is a really interesting problem; the community has published many papers in a short amount of time, but very few of them discuss how it can be made genuinely useful in a real serving system (they mostly assume batch size = 1). This is a nice step.…

This is the first time we formally introduce speculative decoding in vLLM. Actually, this feature has been there for a long time, since @cdnamz built the general framework months ago. It has been a long road and a big community effort to make it really work.
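The accept/verify loop at the heart of speculative decoding can be sketched in a few lines. This is a toy greedy-case illustration, independent of vLLM's actual implementation: a cheap draft model proposes k tokens, the target model verifies them in one pass, and the longest matching prefix plus one corrected (or bonus) token is kept.

```python
# Toy sketch of greedy speculative decoding's accept/verify step.
# Token IDs here are arbitrary integers; a real system would compare
# draft-model proposals against target-model outputs from one batched pass.
def speculative_step(draft_tokens, target_tokens):
    """draft_tokens: k tokens proposed by the draft model.
    target_tokens: k+1 tokens the target model would emit greedily.
    Returns the tokens actually accepted this step."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)   # draft guess matches the target: accept it
        else:
            accepted.append(t)   # mismatch: take the target's token and stop
            return accepted
    # All k drafts accepted: the verify pass also yields one bonus token.
    accepted.append(target_tokens[len(draft_tokens)])
    return accepted

print(speculative_step([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
print(speculative_step([5, 7, 9], [5, 7, 9, 4]))  # [5, 7, 9, 4]
```

The serving-system difficulty the thread alludes to is that with batch size > 1, the extra verification work competes with other requests for GPU time, so acceptance rate alone no longer determines whether speculation helps.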



exciting new development!!

🔥New benchmark: Preference Proxy Evaluations (PPE) Can reward models guide RLHF? Can LLM judge replace real human evals? PPE addresses these questions! Highlights: - Real-world human preference from Chatbot Arena💬 - 16,000+ prompts and 32,000+ diverse model responses🗿 -…



Congrats @_parasj @ajayj_. This is huge for the community, and my students are all excited about what the model can enable!!!

Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0. magnet:?xt=urn:btih:441da1af7a16bcaa4f556964f8028d7113d21cbb&dn=weights&tr=udp://tracker.opentrackr.org:1337/announce



wow, congrats @angcharlesli @haojian_jin and @HDSIUCSD !!

Thrilled to share that our collaborative work with @haojian_jin got the Distinguished Paper Award at CCS'24. Congrats to all the co-authors! @eceumd paper: arxiv.org/pdf/2408.07728



whoo, UCSD is doing great!!

And big changes in the ranking of various AI labs in the Zeta Alpha top-100 most cited papers. Microsoft made Google dance, also in the most cited AI papers of 2023. Big moves up in the ranking for AI Research at @CarnegieMellon @MIT @hkust and @UCSanDiego



Hao Zhang Reposted

Join us on Wednesday, October 16th at 4pm PT to explore how we’re optimizing LLM serving under stricter latency requirements by maximizing goodput! 🙌 Link: hubs.la/Q02T9pcR0

Join us for our next #PyTorch Expert Exchange Webinar on Wednesday, October 16th at 4 PM PT ➡️ DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference, with @haozhangml, Asst. Prof. at @HDSIUCSD & @ucsd_cse. Tune in at: hubs.la/Q02T9pcR0



I'll talk about DistServe (hao-ai-lab.github.io/blogs/distserv…) at the PyTorch webinar this Wed. Looking forward 😃. Thanks to @PyTorch and @AIatMeta for the support!

Join us for our next #PyTorch Expert Exchange Webinar on Wednesday, October 16th at 4 PM PT ➡️ DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference, with @haozhangml, Asst. Prof. at @HDSIUCSD & @ucsd_cse. Tune in at: hubs.la/Q02T9pcR0



Hao Zhang Reposted

My @Google colleague and longtime @UCBerkeley faculty member David Patterson has a great essay out in this month's Communications of the ACM (@TheOfficialACM): 🎉 "Life Lessons from the First Half-Century of My Career," sharing 16 life lessons and nine magic words. I saw an…


Hao Zhang Reposted

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”


Hao Zhang Reposted

📢📢 We are releasing TxT360: a globally deduplicated dataset for LLM pretraining 🌐 99 Common Crawls 📘 14 Curated Sources 👨‍🍳 recipe to easily adjust data weighting and train the most performant models Dataset: huggingface.co/datasets/LLM36… Blog: huggingface.co/spaces/LLM360/…


Hao Zhang Reposted

✨ Check out our revamped repo! Analysis360: Open Implementations of LLM Analyses 🔗 github.com/LLM360/Analysi… Featuring tutorials on: 💾 Data memorization 🧠 LLM unlearning ⚖️ AI safety, toxicity, & bias 🔍 Mechanistic interpretability 📊 Evaluation metrics


Hao Zhang Reposted

Former @SCSatCMU faculty member Geoffrey Hinton has been awarded the 2024 Nobel Prize in Physics! 👏 Hinton, now at @UofT, was recognized alongside John J. Hopfield of @Princeton for their work in machine learning with artificial neural networks. ➡️ cmu.is/Hinton-Nobel-P…


Hao Zhang Reposted

Geoff and John are a truly inspired choice for the Nobel Prize in Physics. Not only because they have done groundbreaking work for machine learning research, but also since this choice reflects an understanding that machine learning methods are changing how science is done (1/2)


Congrats @geoffreyhinton, who really laid the foundations for deep learning that enabled so many breakthroughs. I quite like @bschoelkopf's interpretation x.com/bschoelkopf/st… that AI/ML is gradually yet fundamentally changing *how science is done*, and this is truly…

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”



Hao Zhang Reposted

Join our online meetup on Oct. 16 for efficient LLM deployment and serving, co-hosted by SGLang, FlashInfer, and MLC LLM! 🥳 You are all welcome to join by filling out the Google form forms.gle/B3YeedLxmrrhL1… It will cover topics such as low CPU overhead scheduling, DeepSeek MLA…

