
Yiheng Xu

@yihengxu_

digital ai agent research @hkuniversity | ex @msftresearch | layoutlm / lemur / aguvis | from automation to autonomy

Joined May 2020
Similar Users

Tianbao Xie (@TianbaoX)
Lingpeng Kong (@ikekong)
jinyang (patrick) li (@jinyang34647007)
Lei Li (@_TobiasLee)
Qingxiu Dong (@qx_dong)
Ming Zhong (@MingZhong_)
Jiacheng Ye (@JiachengYe15)
Gordon Lee🍀 (@redoragd)
Siru Ouyang (@Siru_Ouyang)
Chen Wu (@ChenHenryWu)
Bei Chen (@beichen1019)
Zhoujun (Jorge) Cheng (@ChengZhoujun)
Bailin Wang (@bailin_28)
Yifei Li (@YifeiLiPKU)
Jiahui Gao (@jiahuigao3)

Pinned

Very happy to share that Lemur has been accepted to #ICLR2024 as a spotlight! 🥳 Great thanks to all my amazing coauthors!

1/ 🧵 🎉 Introducing Lemur-70B & Lemur-70B-Chat: 🚀Open & SOTA Foundation Models for Language Agents! The closest open model to GPT-3.5 on 🤖15 agent tasks🤖!
📄Paper: arxiv.org/abs/2310.06830
🤗Model @huggingface: huggingface.co/OpenLemur
More details 👇

Tweet Image 1
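For readers who want to try the checkpoints, here is a minimal loading sketch with 🤗 Transformers; the repo id is an assumption, so check huggingface.co/OpenLemur for the exact name.

```python
# Minimal sketch: loading an OpenLemur chat checkpoint with Hugging Face Transformers.
# "OpenLemur/lemur-70b-chat-v1" is an assumed repo id -- verify it on the Hub first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLemur/lemur-70b-chat-v1"  # assumed id; a 70B model needs multiple GPUs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```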


Yiheng Xu Reposted

🚀 Excited to introduce a new member of the OS-Copilot family: OS-Atlas - a foundational action model for GUI agents
Paper: huggingface.co/papers/2410.23…
Website: osatlas.github.io
A thread on why this matters for the future of OS automation 🧵
TL;DR: OS-Atlas offers: 1.…

Tweet Image 1

Yiheng Xu Reposted

Splashdown confirmed! Congratulations to the entire SpaceX team on an exciting fifth flight test of Starship!


Yiheng Xu Reposted

Mechazilla has caught the Super Heavy booster!


Yiheng Xu Reposted

We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image! Existing agents struggle here!
We present SWE-agent Multimodal to remedy some of these issues. Led w/ @_carlosejimenez 🧵

Tweet Image 1

Yiheng Xu Reposted

Just took a look at the ICLR '25 submissions.
1. 👀 LLMs are still crushing it as the top-1 topic this year, with diffusion models in second place.
2. 🤔 Evaluation & benchmarks might be what we should focus more on in the future, because making something like O1 or…

Tweet Image 1
Tweet Image 2

Yiheng Xu Reposted

🚀 Still relying on human-crafted rules to improve pretraining data? Time to try Programming Every Example (ProX)! Our latest effort uses LMs to refine data with unprecedented accuracy and brings up to 20x faster training in general and math domains! 👇 Curious about the details?


Yiheng Xu Reposted

After months of effort, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you:
⭐ Base and Instruct models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Having been trained on data in 27 additional…

Tweet Image 1

Yiheng Xu Reposted

Who is ready to rock some AI? 🧑‍🎤🤘 #ICLR2024


Yiheng Xu Reposted

Downstream scores can be noisy. If you wonder about Llama 3's compression perf in this figure, we have tested the BPC:
Llama 3 8B: 0.427, best at its size, comparable to Yi-34B
Llama 3 70B: 0.359, way ahead of all the models here
Details at github.com/hkust-nlp/llm-…

Compression Represents Intelligence Linearly
LLMs' intelligence – reflected by average benchmark scores – almost linearly correlates with their ability to compress external text corpora.
repo: github.com/hkust-nlp/llm-…
abs: arxiv.org/abs/2404.09937

Tweet Image 1
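BPC here is bits per character: the model's total negative log2-likelihood of a text divided by its character count. A rough sketch of measuring it with any causal LM follows; the model id and the single-pass scoring are simplifications of the repo's proper sliding-window evaluation over large corpora.

```python
# Rough bits-per-character (BPC) sketch for a causal LM:
# BPC = total negative log2-likelihood of the text / number of characters.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

def bits_per_character(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_predicted = enc["input_ids"].shape[1] - 1    # labels are shifted internally
    total_nats = out.loss.item() * n_predicted     # loss is mean cross-entropy in nats
    return (total_nats / math.log(2)) / len(text)  # convert to bits, normalize by characters

print(bits_per_character("The quick brown fox jumps over the lazy dog."))
```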


Yiheng Xu Reposted

🔥 Do you want an open and versatile code assistant? Today, we are delighted to introduce CodeQwen1.5-7B and CodeQwen1.5-7B-Chat, are specialized codeLLMs built upon the Qwen1.5 language model! 🔋 CodeQwen1.5 has been pretrained with 3T tokens of code-related data and exhibits…

Tweet Image 1

Yiheng Xu Reposted

🚀 Multimodal agents are on the rise in 2024! But even building an app/domain-specific agent env is hard 😰. Our real-computer OSWorld env lets you define agent tasks for arbitrary apps on diff. OS w/o crafting new envs. 🧐 Benchmarked #VLMs on 369 OSWorld tasks: #GPT4V >> #Claude3

Tweet Image 1

🤔 Can we assess agents across various apps & OS w/o crafting new envs? OSWorld 🖥️: a unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks…
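The interaction pattern such a benchmark implies looks roughly like the gym-style loop below. The class and method names are invented for illustration, not OSWorld's actual API; see the OSWorld repository for the real interface.

```python
# Illustrative agent-environment loop for a real-computer benchmark.
# Names are hypothetical, NOT OSWorld's actual API -- see the OSWorld repo for that.
from dataclasses import dataclass

@dataclass
class Observation:
    screenshot: bytes   # raw screen capture the VLM agent sees
    a11y_tree: str      # accessibility-tree dump of the current UI

class ToyDesktopEnv:
    """Stand-in for a real computer environment."""

    def reset(self, task_config: dict) -> Observation:
        # A real env would restore a VM snapshot and set up the task here.
        return Observation(screenshot=b"", a11y_tree="<desktop/>")

    def step(self, action: str) -> tuple[Observation, bool]:
        # An action could be a pyautogui-style command string, e.g. "click(120, 340)".
        done = action == "DONE"
        return Observation(screenshot=b"", a11y_tree="<desktop/>"), done

    def evaluate(self) -> float:
        # Execution-based check of the final machine state (files, settings, ...).
        return 0.0

env = ToyDesktopEnv()
obs = env.reset({"instruction": "Rename report.txt to report_final.txt"})
for _ in range(15):            # step budget
    action = "DONE"            # a VLM agent would choose this from obs
    obs, done = env.step(action)
    if done:
        break
print("task score:", env.evaluate())
```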




Yiheng Xu Reposted

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…


Yiheng Xu Reposted

Visualization-of-Thought (VoT): Mind's Eye of LLMs

Visualization-of-Thought Elicits Spatial Reasoning in LLMs Inspired by a human cognitive capacity to imagine unseen worlds, this new work proposes Visualization-of-Thought (VoT) prompting to elicit spatial reasoning in LLMs. VoT enables LLMs to "visualize" their reasoning…

Tweet Image 1
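A toy prompt in the spirit of VoT, not the paper's exact template: the model is asked to redraw the intermediate state as ASCII art after every reasoning step.

```python
# Toy prompt in the spirit of VoT prompting (illustrative, not the paper's template):
# the model "visualizes" each intermediate state as ASCII art before the next move.
vot_prompt = """You are navigating a 3x3 grid from S to G. Walls are marked #.

# . G
. # .
S . .

Rules: move up/down/left/right, never onto a wall.
After EACH move, redraw the grid with your current position marked X,
then state the next move. Finish with the full move sequence."""

# Send vot_prompt to any chat LLM (e.g. via an OpenAI-compatible client) and
# compare against a plain chain-of-thought prompt without the redraw instruction.
print(vot_prompt)
```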


Yiheng Xu Reposted

SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code github.com/princeton-nlp/…

Tweet Image 1

Yiheng Xu Reposted

Our arXiv preprint is out now! 🔗: arxiv.org/abs/2403.15452 If you know other awesome papers on tool use in LLMs, please let us know and feel free to open a PR! 👩‍💻: github.com/zorazrw/awesom…

Tools can empower LMs to solve many tasks. But what are tools anyway? github.com/zorazrw/awesom…
Our survey studies tools for LLM agents w/
– A formal def. of tools
– Methods/scenarios to use & make tools
– Issues in testbeds and eval metrics
– Empirical analysis of cost-gain trade-off

Tweet Image 1
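In the survey's sense, a tool is just an external program the LM can invoke. A minimal use-a-tool loop might look like the sketch below; the "CALL name(args)" convention is made up purely for illustration.

```python
# Minimal tool-use sketch: the LM emits a structured call, the program executes it,
# and the result becomes the next observation. The "CALL name(args)" convention is
# invented here for illustration only.
import re

def calculator(expression: str) -> str:
    """A trivial tool: evaluate an arithmetic expression (demo only -- never eval untrusted input)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def run_turn(lm_output: str) -> str:
    """If the LM asked for a tool, run it and return the observation; else pass the text through."""
    match = re.match(r'CALL (\w+)\((.*)\)', lm_output.strip())
    if match and match.group(1) in TOOLS:
        return TOOLS[match.group(1)](match.group(2).strip('"'))
    return lm_output

# Pretend the LM produced this action string:
print(run_turn('CALL calculator("17 * 23")'))   # -> 391
```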


Yiheng Xu Reposted

Wanna train a SOTA reward model? 🌟New Blog Alert: "Reward Modeling for RLHF" (with @weixiong_1 & @RuiYang70669025) is live this weekend! 🌐✨ We delve into the insights behind achieving groundbreaking performance on the RewardBench (by @natolambert). efficient-unicorn-451.notion.site/Reward-Modelin…
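The core of most RLHF reward-model training is a Bradley-Terry pairwise loss over (chosen, rejected) response pairs; here is a minimal sketch of that generic recipe, not necessarily the blog's exact setup.

```python
# Minimal Bradley-Terry pairwise loss used to train RLHF reward models
# (the generic recipe; the blog's exact setup may differ).
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen / r_rejected: scalar reward scores for the preferred and dispreferred
    responses to the same prompt, shape (batch,)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy check: when chosen responses already score higher, the loss is small.
print(pairwise_reward_loss(torch.tensor([2.0, 1.5]), torch.tensor([0.5, 1.0])))
```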


Yiheng Xu Reposted

🎉🎉 We are excited to release a full package for AI Agent R&D:
1) For data & training, 🎙️AgentOhana🎙️: Design Unified Data and Training Pipeline for Effective Agent Learning.
2) For models, 🔥xLAM-v0.1-R🔥: a strong large action model for AI agents while maintaining abilities on…

Tweet Image 1

Yiheng Xu Reposted

Since we released SWE-bench, we've been asked for a smaller & slightly easier subset of the benchmark, to make it easier to develop and test new ideas in language modeling for code. Today we're releasing SWE-bench Lite. By @_carlosejimenez @jyangballin @JiayiiGeng!

SWE-bench Lite is a smaller & slightly easier *subset* of SWE-bench, with 23 dev / 300 test examples (full SWE-bench is 225 dev / 2,294 test). We hope this makes SWE-bench evals easier. Special thanks to @JiayiiGeng for making this happen. Download here: swebench.com/lite

Tweet Image 1
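If the subset is published on the Hugging Face Hub, pulling it and checking the split sizes is a few lines with 🤗 Datasets; the dataset id below is an assumption, so confirm it at swebench.com/lite.

```python
# Sketch: load SWE-bench Lite with Hugging Face Datasets and check the split sizes.
# The Hub id is assumed -- confirm the exact one at swebench.com/lite.
from datasets import load_dataset

lite = load_dataset("princeton-nlp/SWE-bench_Lite")          # assumed dataset id
print({split: len(rows) for split, rows in lite.items()})    # expect ~23 dev / 300 test
```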


Yiheng Xu Reposted

Happy to share that REPLUG🔌 has been accepted to #NAACL2024! It introduces a retrieval-augmented LM framework that combines a frozen LM with a frozen/tunable retriever, improving GPT-3 in language modeling & downstream tasks by prepending retrieved docs to LM inputs. arxiv.org/abs/2301.12652

Tweet Image 1
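The ensembling idea behind REPLUG can be sketched in a few lines: prepend each retrieved doc to the input separately, run the frozen LM, and mix the next-token distributions weighted by normalized retrieval scores. Below is a toy reconstruction with a small stand-in model, not the paper's code.

```python
# Toy reconstruction of REPLUG-style ensembling (not the paper's code): prepend each
# retrieved doc to the query separately, then mix the frozen LM's next-token
# distributions, weighted by softmax-normalized retrieval scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small stand-in for the frozen LM; REPLUG was demonstrated on GPT-3
tok = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id).eval()

def replug_next_token_probs(query: str, docs: list[str], scores: torch.Tensor) -> torch.Tensor:
    weights = torch.softmax(scores, dim=0)            # retrieval likelihood per doc
    mixed = torch.zeros(lm.config.vocab_size)
    for doc, w in zip(docs, weights):
        ids = tok(doc + "\n\n" + query, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = lm(ids).logits[0, -1]            # next-token logits given this doc
        mixed += w * torch.softmax(logits, dim=-1)    # weighted ensemble of distributions
    return mixed

probs = replug_next_token_probs(
    "The capital of France is",
    ["Paris is the capital and largest city of France.", "France is a country in Europe."],
    torch.tensor([0.9, 0.4]),
)
print(tok.decode(probs.argmax().item()))
```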
