
Jixuan Chen

@chenjx210734

Senior Undergrad @NanjingUnivers1 | RA @HKUNLP | Digital Agents | Code generation

Joined August 2023
Jixuan Chen Reposted

🍅Excited to see @AnthropicAI using 🚀our OSWorld🚀(NeurIPS'24) to benchmark computer use! 🍋OSWorld will soon support parallel cloud running, much faster! 🍓More multimodal agent open-source big projects coming soon from @XLangNLP in Nov- stay tuned! 👇os-world.github.io


Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.



Jixuan Chen Reposted

OSWorld has been accepted by NeurIPS 2024 D&B track! 🎺✌️ Again, grateful thanks to all of our collaborators for their invaluable contributions to the project: @_zdy023, @chenjx210734, @xiaochuanlee, @SihengZhao, @RuishengC49326, @nikushii_, @ChengZhoujun, @dongchan, @fangyu_lei,…

🤔Can we assess agents across various apps & OS w.o. crafting new envs? OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks…



Jixuan Chen Reposted

🤔Can multimodal agents automate data science & engineering workflows? Check out Spider2-V, a multimodal agent benchmark with: ✅494 CLI/GUI tasks covering 20 enterprise-level apps, e.g. dbt, Snowflake, BigQuery ✅Built in real computer OSWorld env 👇: spider2-v.github.io


Jixuan Chen Reposted

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Ever since OpenInterpreter, we've all been wondering just how effective agents can be if you give them a computer. Now we have a proper benchmark. Let's take a look (🧵):


🥳Introducing our recent work OSWorld Env on benchmarking multimodal agents👏




Jixuan Chen Reposted

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…


Jixuan Chen Reposted

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing


Jixuan Chen Reposted

Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning proj: text-to-reward.github.io abs: arxiv.org/abs/2309.11489

