Jixuan Chen @chenjx210734 Twitter Profile

Jixuan Chen

@chenjx210734

Senior Undergrad @NanjingUnivers1 | RA @HKUNLP | Digital Agents | Code generation

Joined August 2023

8 Posts · 38 Followers · 163 Following

Jixuan Chen Reposted

Tao Yu

@taoyds

22 Oct

🍅Excited to see @AnthropicAI using 🚀our OSWorld🚀(NeurIPS'24) to benchmark computer use! 🍋OSWorld will soon support parallel cloud running, much faster! 🍓More multimodal agent open-source big projects coming soon from @XLangNLP in Nov- stay tuned! 👇os-world.github.io

Anthropic

@AnthropicAI

22 Oct

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

Jixuan Chen Reposted

Tianbao Xie

@TianbaoX

26 Sep

OSWorld has been accepted by NeurIPS 2024 D&B track! 🎺✌️ Again, grateful thanks to all of our collaborators for their invaluable contributions to the project: @_zdy023, @chenjx210734, @xiaochuanlee, @SihengZhao, @RuishengC49326, @nikushii_, @ChengZhoujun, @dongchan, @fangyu_lei,…

Tianbao Xie

@TianbaoX

12 Apr

🤔Can we assess agents across various apps & OS w.o. crafting new envs? OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks…

Jixuan Chen Reposted

XLANG NLP Lab

@XLangNLP

18 Jul

🤔Can multimodal agents automate data science & engineering workflows? Check out Spider2-V, a multimodal agent benchmark with: ✅494 CLI/GUI tasks covering 20 enterprise-level apps, e.g. dbt, Snowflake, BigQuery ✅Built in real computer OSWorld env 👇: spider2-v.github.io

Jixuan Chen Reposted

Alex Reibman 🖇️

@AlexReibman

29 Apr

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Ever since OpenInterpreter, we've all been wondering just how effective agents can be if you give them a computer. Now we have a proper benchmark. Let's take a look (🧵):

Jixuan Chen

@chenjx210734

12 Apr

🥳Introducing our recent work OSWorld Env on benchmarking multimodal agents👏

Jixuan Chen Reposted

Aran Komatsuzaki

@arankomatsuzaki

12 Apr

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…

Jixuan Chen Reposted

AK

@_akhaliq

12 Apr

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing