Jixuan Chen
@chenjx210734
Senior Undergrad @NanjingUnivers1 | RA @HKUNLP | Digital Agents | Code generation
🍅Excited to see @AnthropicAI using 🚀our OSWorld🚀(NeurIPS'24) to benchmark computer use! 🍋OSWorld will soon support parallel cloud running, much faster! 🍓More multimodal agent open-source big projects coming soon from @XLangNLP in Nov- stay tuned! 👇os-world.github.io
Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
OSWorld has been accepted by NeurIPS 2024 D&B track! 🎺✌️ Again, grateful thanks to all of our collaborators for their invaluable contributions to the project: @_zdy023, @chenjx210734, @xiaochuanlee, @SihengZhao, @RuishengC49326, @nikushii_, @ChengZhoujun, @dongchan, @fangyu_lei,…
🤔Can we assess agents across various apps & OS w.o. crafting new envs? OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks…
🤔Can multimodal agents automate data science & engineering workflows? Check out Spider2-V, a multimodal agent benchmark with: ✅494 CLI/GUI tasks covering 20 enterprise-level apps, e.g. dbt, Snowflake, BigQuery ✅Built in real computer OSWorld env 👇: spider2-v.github.io
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Ever since OpenInterpreter, we've all been wondering just how effective agents can be if you give them a computer. Now we have a proper benchmark. Let's take a look (🧵):
🥳Introducing our recent work OSWorld Env on benchmarking multimodal agents👏
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning proj: text-to-reward.github.io abs: arxiv.org/abs/2309.11489