Chien-Sheng (Jason) Wu @jasonwu0731 Twitter Profile

Chien-Sheng (Jason) Wu

@jasonwu0731

Director at @SFResearch leading the Interactive AI team. Working on #NLProc, particularly #TrustAI, #ConvAI, #AgentAI and #HCI_NLP. Opinions are my own.

286 Posts · 1K Followers · 402 Following

Chien-Sheng (Jason) Wu

@jasonwu0731

8 Nov

I'll be at EMNLP next week in Miami, presenting our recent work on Summary of a Haystack as well as prompt leakage and defense. Also, our team has released multiple AI agent-related works that I'll be more than happy to discuss. Looking forward to meeting you!

Chien-Sheng (Jason) Wu Reposted

Kung-Hsiang Steeve Huang

@steeve__huang

5 Nov

Here are the highlights of our work:
1. Our data generation strategy is grounded in real-world data schemas, simulating realistic scenarios with great diversity and quality checks, such as deduplication and content verification.
2. We uploaded our generated data to a Salesforce…

Chien-Sheng (Jason) Wu

@jasonwu0731

5 Nov

Excited to announce CRMArena! Our framework aligns with the Salesforce schema, and tasks are tailored for multiple professionals. You can test it directly on login.salesforce.com or via APIs. This will be a live leaderboard with more CRM tasks coming soon! Stay tuned! 🔥

Kung-Hsiang Steeve Huang

@steeve__huang

5 Nov

🚀 Exploring the Wild West of AI in Business🤠 🔥 Introducing CRMArena - a work-oriented benchmark for LLM agents to prove their mettle in real-world business scenarios! CRMArena features nine distinct tasks within a complex business environment filled with rich and realistic…

Chien-Sheng (Jason) Wu

@jasonwu0731

5 Nov

Thanks @youdotcom for valuing our evaluation framework! @RichardSocher

you.com

@youdotcom

5 Nov

This one deserves a spot on the fridge: 🏆 Most accurate search, most reliable, and most balanced. We've been trying to tell you, but now you can see for yourself.

Chien-Sheng (Jason) Wu

@jasonwu0731

4 Nov

Check out our work CASA! 🚨 LLM-based agents can forget Trust & Safety standards they should already know. Always keep an eye out: T&S needs constant vigilance!

Haoyi Qiu ✈️ NeurIPS24

@HaoyiQiu

4 Nov

🌐 Are LLM agents prepared to navigate the rich diversity of cultural and social norms? 🏠 CASA tests them on real-world tasks like online shopping and social discussion forums, revealing that current agents show less than 10% awareness and over 40% norm violations. 🧠 We’re…

Chien-Sheng (Jason) Wu Reposted

Philippe Laban

@PhilippeLaban

4 Nov

How good is #SearchGPT? How does it compare to other answer engines like You.com, Perplexity, or Bing Chat? The AnswerEngineEval benchmark we developed with @PranavVenkit helps us evaluate scientifically.

Chien-Sheng (Jason) Wu

@jasonwu0731

1 Nov

Generative Answer Engines are booming—but how well do they really perform? Through user studies, we uncover 16 current limitations in 4 dimensions: Answer, Citation, Sources, and UI. We propose 16 design recommendations tied to 8 key metrics.

Pranav Venkit

@PranavVenkit

1 Nov

🥳New Paper Alert🥳 Excited to share my work from @salesforce —where we audited answer engines (aka generative search) like Perplexity that use RAG for cited responses. Spoiler: they’ve got a lot of room to grow in getting it right! Paper: arxiv.org/pdf/2410.22349 Check it out!

Chien-Sheng (Jason) Wu Reposted

Salesforce AI Research

@SFResearch

4 Oct

Meet Generative Canvas for Lightning⚡️, an innovative AI-powered research canvas tailor-made for real-world sales productivity. This new tool helps sellers reimagine business applications for the AI era. Check it out: 💥Blog: bit.ly/4gSqjUj 💥Product Website:…

Chien-Sheng (Jason) Wu Reposted

Maziyar PANAHI

@MaziyarPanahi

24 Oct

Microsoft just dropped OmniParser model on ⁦@huggingface⁩, so casually! 😂 “OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.” 🔥 huggingface.co/microsoft/Omni…

microsoft/OmniParser · Hugging Face

Source: https://t.co/h9nzhyUUQB

Chien-Sheng (Jason) Wu Reposted

killian

@hellokillian

24 Oct

Want to use Claude to control your computer?
pip install open-interpreter
interpreter --os
Works on Windows and Mac. Have fun :)

Chien-Sheng (Jason) Wu Reposted

Marktechpost AI Research News ⚡

@Marktechpost

25 Oct

Salesforce AI Research Introduces a Novel Evaluation Framework for Retrieval-Augmented Generation (RAG) Systems based on Sub-Question Coverage Salesforce AI researchers introduce a new framework for evaluating RAG systems based on a metric called “sub-question coverage.” Instead…

Chien-Sheng (Jason) Wu

@jasonwu0731

24 Oct

Want to improve your AI response quality and user preference? Let your RAG systems focus more on "Core Questions", a little on "Background Questions", and less on "Follow-up Questions"! Check out our work for more details!

Salesforce AI Research

@SFResearch

24 Oct

❓Beyond "right" or "wrong": Introducing a novel RAG evaluation framework based on sub-question coverage. How do we measure if RAG systems are giving complete answers to complex questions? Enter: "Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with…
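A minimal sketch of what a sub-question coverage score could look like, assuming a complex question has already been decomposed into sub-questions labeled core, background, or follow-up, and that a separate judge has marked which ones a response addresses. The names `SubQuestion` and `coverage_by_type` and the example labels are hypothetical and not taken from the paper, whose actual decomposition and judging procedure may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SubQuestion:
    text: str
    kind: str      # "core", "background", or "follow-up" (assumed label set)
    covered: bool  # whether the response addresses it (judged elsewhere)


def coverage_by_type(sub_questions):
    """Return the fraction of covered sub-questions for each kind."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sq in sub_questions:
        totals[sq.kind] += 1
        if sq.covered:
            hits[sq.kind] += 1
    # Only kinds that actually occur are reported, so no division by zero.
    return {kind: hits[kind] / totals[kind] for kind in totals}


# Illustrative, made-up labels for a single response.
example = [
    SubQuestion("What is X?", "core", True),
    SubQuestion("How does X work?", "core", False),
    SubQuestion("Where did X originate?", "background", True),
    SubQuestion("What comes after X?", "follow-up", False),
]
scores = coverage_by_type(example)
```

Reporting coverage per type rather than as a single number makes it possible to see whether a response spends its effort on core sub-questions or on peripheral ones.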

Chien-Sheng (Jason) Wu Reposted

Pranav Venkit

@PranavVenkit

21 Oct

Heyoo! I'll be at @AIESConf this week in San Jose! I'll be presenting my work on understanding cultural harms in image generation models along with @Sanjana08395511 and @SourojitGhosh3 (on Tuesday). If you're here or around, come say hi. 👋 @RealAAAI (Also check out our work 😊)

Pranav Venkit

@PranavVenkit

26 Jul

I'm excited to announce that our paper, "Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach," has been accepted to @AIESConf ! 🎉 🥳 #AI #Ethics @aylin_cim @SourojitGhosh3 @Sanjana08395511 @ShomirWilson

Chien-Sheng (Jason) Wu Reposted

Yixin Liu

@YixinLiu17

17 Oct

LLMs are often used to evaluate the instruction-following capabilities of other LLMs – but which LLM should we choose, and how should we use it? 🤔 We're excited to share "ReIFE: Re-evaluating Instruction-Following Evaluation"! Preprint: arxiv.org/abs/2410.07069 📊 Our study is…

Chien-Sheng (Jason) Wu Reposted

Salesforce AI Research

@SFResearch

16 Oct

🔖 BOOKMARK ME! 🔖 The Top-100 most cited AI papers in 2023 list is out, and #Salesforce AI Research comes in hot with two in the top ten! 🔥 Check out the list: bit.ly/3UfnqUa #5 Top Paper: "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image…

Chien-Sheng (Jason) Wu

@jasonwu0731

9 Oct

Meet ReGenesis, a new Coarse-to-fine framework to boost your LLM's reasoning! Fascinating insight: different LLMs self-develop "preferred" reasoning paths, improving generalization after finetuning — just like humans!

Becky Xiangyu Peng

@beckypeng6

9 Oct

🚨🆕🚨Introducing ReGenesis: Reasoning Generalists via Self-Improvement! Our method self-synthesizes reasoning paths, moving from abstract to concrete. 🔥While others see a 4.6% drop in OOD performance, ReGenesis delivers a 6.1% boost! 🚀 🔗arxiv.org/abs/2410.02108

Chien-Sheng (Jason) Wu

@jasonwu0731

2 Oct

GPT4o1 shows that better reasoning alone doesn’t boost writing quality. So, what’s the real solution? Expert edits! Let’s align AI writing with human expertise — especially on creative tasks.💡✍️

Tuhin Chakrabarty

@TuhinChakr

2 Oct

New paper on human-AI interaction. We hire 18 writers to edit quirks in AI writing & see if #AI can mimic this process to improve its own writing Verdict: Writer-edited > AI-edited > AI-generated In other words:🚨Edits enhance alignment in writing🚨 🔗arxiv.org/pdf/2409.14509

Chien-Sheng (Jason) Wu Reposted

Salesforce AI Research

@SFResearch

2 Oct

🏆 🏆 🏆 Our groundbreaking research on prompt leakage in multi-turn LLM interactions is amongst the top-50% industry-track papers accepted to #EMNLP2024! We propose a novel threat model, uncover social engineering vulnerabilities, measure fine-grained leakage, and apply…

Chien-Sheng (Jason) Wu Reposted

Philippe Laban

@PhilippeLaban

23 Sep

🎉 Summary of a Haystack accepted to #EMNLP2024! New results since submission:
- o1-preview best in RAG setup (+10), but lags Gemini on long-context
- 3.5-Sonnet lags 3-Opus due to worse citation
- o1-mini/Mistral-large2 decent in RAG, but not in long-context

Chien-Sheng (Jason) Wu

@jasonwu0731

24 Aug

Great contribution from @hsu_byron on boosting model training efficiency! 🔥

Byron Hsu

@hsu_byron

23 Aug

(1/n) Training LLMs can be hindered by out-of-memory, scaling batch size, and seq length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training. github.com/linkedin/Liger…