Zefan Cai @ EMNLP 2024

@Zefan_Cai

Now Ph.D. student @UWMadison. Previously @PKU1898.

Joined May 2023

Zefan Cai @ EMNLP 2024 Reposted

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…


Zefan Cai @ EMNLP 2024 Reposted

Excited to present my paper on role-playing LLM agents at #EMNLP2024! 🎉 Paper Title: “Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks” Come say hi and let's chat about some exciting research! 🤖🧠✨ TL;DR: How can we make LLM agents…

Zefan Cai @ EMNLP 2024 Reposted

We are excited to share our latest work on benchmarking current LLM-based machine translation and traditional NMT on culture-specific concepts. Chat with us at 4-5:30pm on the poster session #EMNLP2024! Joint work w/ @Binnie8545 @SeleenaJiang @Diyi_Yang

🚀 Excited to present our paper "Benchmarking Machine Translation with Cultural Awareness" at #EMNLP2024! We build CAMT, a novel parallel corpus enriched with culture-specific item annotations, and evaluate how well NMT and LLM-MT systems handle cultural entities.


Arriving in #Miami for #EMNLP2024! Excited to see friends! Would like to chat about inference acceleration. Feel free to reach out!


Zefan Cai @ EMNLP 2024 Reposted

About to arrive in #Miami 🌴 after a 30-hour flight for #EMNLP2024! Excited to see new and old friends :) I’d love to chat about data synthesis and deep reasoning for LLMs (or anything else) —feel free to reach out!


Zefan Cai @ EMNLP 2024 Reposted

🌐 Are LLM agents prepared to navigate the rich diversity of cultural and social norms? 🏠 CASA tests them on real-world tasks like online shopping and social discussion forums, revealing that current agents show less than 10% awareness and over 40% norm violations. 🧠 We’re…

Thanks for sharing! Our new work HeadKV intelligently compresses LLM memory by identifying and prioritizing crucial attention heads. Specifically, this is the first work that targets global memory allocation across 32 heads in 32 layers inside the Llama-3 model.

Not all brain cells are equal - same goes for LLM attention heads! 💡 Why store everything when you can just remember the important stuff? Smart KV cache compression that knows which attention heads matter most. Hence, HeadKV intelligently compresses LLM memory by identifying…


Zefan Cai @ EMNLP 2024 Reposted

@OpenAI @junshernchan @ChowdhuryNeil Thank you for your work on MLE-bench. I wanted to bring to your attention our highly relevant work in 2023: "ML-BENCH: Evaluating Large Language Models and Agents for Machine Learning Tasks" (arxiv.org/abs/2311.09835). We'd appreciate…

We’re releasing a new benchmark, MLE-bench, to measure how well AI agents perform at machine learning engineering. The benchmark consists of 75 machine learning engineering-related competitions sourced from Kaggle. openai.com/index/mle-benc…



Zefan Cai @ EMNLP 2024 Reposted

Progress in this field is truly rapid!

🔥 MovieChat recently received its 100th citation. Thank you all for your support! A year after its release, we’ve updated MovieChat in CVPR 2024, the first large multimodal model designed for long video understanding. Thanks to its training-free design, we’ve upgraded the…


Zefan Cai @ EMNLP 2024 Reposted

🌍 I’ve always had a dream of making AI accessible to everyone, regardless of location or language. However, current open MLLMs often respond in English, even to non-English queries! 🚀 Introducing Pangea: A Fully Open Multilingual Multimodal LLM supporting 39 languages! 🌐✨…

Looking forward to seeing how Lex reacts to Anthropic

I'm doing a podcast with Dario Amodei, CEO of Anthropic (creator of Claude) soon, all about AI. Let me know if you have questions/topic suggestions. Also, I'll stop by SF for a bit. Let me know if you have suggestions of who I should talk to.



Thanks for sharing!

6/ **Large Language Models are not Fair Evaluators** Want to use LLMs as evaluators? There are many things to be aware of, one of them is positional bias! This paper not only shows that but also develops simple yet effective calibration mechanisms to align LLM judgments more…


Zefan Cai @ EMNLP 2024 Reposted

(Perhaps a bit late) Excited to announce our survey on ICL has been accepted to #EMNLP2024 main conf and been cited 1,000+ times! Thanks to all collaborators and contributors to this field! We've updated the survey arxiv.org/abs/2301.00234. Excited to keep pushing boundaries!


Our previous work ML-Bench also evaluates how well agents perform at ML development! Super excited that this high-quality dataset is released to help develop code agents! ARXIV: arxiv.org/abs/2311.09835 Code: github.com/gersteinlab/ML…

We’re releasing a new benchmark, MLE-bench, to measure how well AI agents perform at machine learning engineering. The benchmark consists of 75 machine learning engineering-related competitions sourced from Kaggle. openai.com/index/mle-benc…



Zefan Cai @ EMNLP 2024 Reposted

How can we guide LLMs to continually expand their own capabilities with limited annotation? SynPO: a self-boosting paradigm training LLM to auto-learn generative rewards and synthesize preference data. After 4 iterations, Llama3&Mistral achieve over 22.1% win rate improvements

Zefan Cai @ EMNLP 2024 Reposted

🤔How much potential do LLMs have for self-acceleration through layer sparsity? 🚀 🚨 Excited to share our latest work: SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration. Arxiv: arxiv.org/abs/2410.06916 🧵1/n

Zefan Cai @ EMNLP 2024 Reposted

Need to address my earlier tweet: the #ACL2025 deadline has now been updated to February 15. Be sure to check out the updated CFP for all the details at 2025.aclweb.org/calls/main_con…. Thank you for your understanding as we navigate these changes! 📝✨


Do you still think VQ cannot do text reconstruction? DnD-Transformer can definitely change your mind! We empirically show that Auto-Regressive Transformers can generate images with rich text and graphical elements.

✨A Spark of Vision-Language Intelligence! We introduce DnD-Transformer, a new auto-regressive image gen model beats GPT/Llama w/o extra cost. AR gen beats diffusion in joint VL modeling in a self-supervised way! Github: github.com/chenllliang/Dn… Paper: huggingface.co/papers/2410.01…


