Jon Ander Campos
@jaa_campos
Member of Technical Staff @cohere. PhD in Natural Language Processing @IxaGroup. Also interned at @Apple, @AIatMeta, @CNRS and @nyuniversity.
Incredibly honoured to see our work recognised as an outstanding paper. @magikarp_tokens dove deep into the dark depths of tokenisation on this one and fished up some very interesting insights. Be sure to catch him at #EMNLP2024 if you're around! 🎣 Thank you @emnlpmeeting ❤️
The state of AI in 2024 -- also featuring some of our recent work on synthetic critiques with @Daniella_yz, @FraserGreenlee, Phil Blunsom, @jaa_campos and @mgalle at @Cohere
🪩The @stateofaireport 2024 has landed! 🪩 Our seventh installment is our biggest and most comprehensive yet, covering everything you *need* to know about research, industry, safety and politics. As ever, here's my director’s cut (+ video tutorial!) 🧵
Concerned about data contamination? We asked the community for known contamination in different datasets and models, and summarized these findings in this report. arxiv.org/pdf/2407.21530
Thank you to all the contributors! As part of the CONDA Workshop, we have created a report with all the contributions. It is already available on arXiv: arxiv.org/abs/2407.21530
New work led by @Daniella_yz during her internship at Cohere 🚀 In the paper we show that synthetic critiques are not only helpful but also more efficient than vanilla preference pairs when training reward models.
Beyond their use in assisting human evaluation (e.g. CriticGPT), can critiques directly enhance preference learning? During my @Cohere internship, we explored using synthetic critiques from large language models to improve reward models. 📑Preprint: arxiv.org/abs/2405.20850
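For intuition, here is a minimal sketch of the high-level recipe described above, not the paper's actual code: a critic LLM writes a synthetic critique of each response, the critique is appended to the reward model's input, and training proceeds with the usual Bradley-Terry pairwise loss. The critique_llm callable and prompt wording are hypothetical placeholders.

import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    # Standard pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def critique_augmented_pair(prompt, chosen, rejected, critique_llm):
    # critique_llm: any text-in/text-out callable (hypothetical placeholder).
    def augment(response):
        critique = critique_llm(
            f"Critique the following response.\nPrompt: {prompt}\nResponse: {response}"
        )
        return f"{prompt}\n{response}\nCritique: {critique}"
    return augment(chosen), augment(rejected)

# The reward model rm then scores the augmented texts as usual:
#   loss = bradley_terry_loss(rm(aug_chosen), rm(aug_rejected))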
Understanding and Mitigating Language Confusion 😵💫 User: ¿De qué trata nuestro artículo? ("What is our paper about?") LLM: We analyze one of LLMs’ most jarring errors: their failure to generate text in the user’s desired language. 📑 arxiv.org/abs/2406.20052 💻 github.com/for-ai/languag…
Our paper about reliably finding under-trained or 'glitch' tokens is out! We find up to thousands of these tokens in some #LLMs, and give examples for most popular models. arxiv.org/abs/2405.05417 More in 🧵
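As a rough illustration (a simplified indicator, not the paper's full method), under-trained tokens often show up with unusually small input-embedding norms, since their embeddings barely moved during training; gpt2 below is just an example model.

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # example; any causal LM on the Hub works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Input-embedding matrix, shape (vocab_size, hidden_dim).
emb = model.get_input_embeddings().weight.detach()
norms = emb.norm(dim=-1)

# List the 20 tokens with the smallest embedding norms as candidates.
for idx in norms.argsort()[:20]:
    print(repr(tok.convert_ids_to_tokens(int(idx))), float(norms[idx]))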
Can you imagine having all the evidence of data contamination gathered in one place? 📢As part of the CONDA workshop, we present the Data Contamination Evidence Collection, a shared task on reporting contamination. Available as a @huggingface space: hf.co/spaces/CONDA-W…
In our new paper, we introduce Latxa, a family of LLMs for Basque ranging from 7B to 70B parameters that outperform open models and GPT-3.5. Models and datasets @huggingface hf.co/collections/Hi… Code: github.com/hitz-zentroa/l… Blog: hitz.eus/en/node/343 Paper: arxiv.org/abs/2403.20266
Command R+ is now at position 6 on the arena leaderboard! 🚀 It's wonderful to see such positive reception! 🤩 If you enjoyed the model, you can explore the RAG and Tool Use capabilities at coral.cohere.com or download the weights from 🤗
Super happy and proud to share that ⌘R+ is out! 🚀 Working for this launch with such an amazing team has been an incredible journey. Try it out at coral.cohere.com or download the weights at 🤗 and play with it on your machine!
⌘R+ Welcoming Command R+, our latest model focused on scalability, RAG, and Tool Use. Like last time, we're releasing the weights for research use; we hope they're useful to everyone! txt.cohere.com/command-r-plus…
Uncontaminated test sets and methods for detecting contamination are invaluable these days! If you're working on related topics please consider submitting to the CONDA 🐍 workshop at ACL conda-workshop.github.io
Another pro-tip for doing really well on evals: just train on the test set. Literally just do it, you have the examples right there. I.e., here's [redacted] on HumanEval.
Excited to share that our paper "Do multilingual language models think better in English?" has been accepted at the NAACL 2024 main conference! 🎉🎉🎉 Thanks to all coauthors! @gazkune @Aitor57 @oierldl @artetxem @IxaGroup @Hitz_zentroa
Do multilingual language models think better in English? 🤔 Yes, they do! We show that using an LLM to translate its input into English and performing the task over the translated input works better than using the original non-English input! 😯 arxiv.org/abs/2308.01223
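The self-translate recipe is simple enough to sketch; below is a minimal illustration, where llm stands in for any completion endpoint (a hypothetical helper, not the paper's code).

def self_translate(llm, task_prompt: str) -> str:
    # Step 1: the model translates its own non-English input into English.
    english = llm(f"Translate the following text to English:\n\n{task_prompt}")
    # Step 2: the task is performed over the translated input instead of the original.
    return llm(english)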
🚀 Very excited to share that Command-R is out! 🚀 Command-R is multilingual, capable of handling long contexts, and powered by RAG and Tool Use! You can try it out at coral.cohere.com or simply download the weights and run it yourself 🤩! huggingface.co/CohereForAI/c4…
⌘-R Introducing Command-R, a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use; we hope they're useful to the community! txt.cohere.com/command-r/
Today, we’re launching Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages, more than double the number covered by previous models. cohere.com/research/aya
👋 Check out our new paper and benchmark: REVEAL, a dataset with step-by-step correctness labels for chain-of-thought reasoning in open-domain QA 🧵🧵🧵 arxiv.org/abs/2402.00559 huggingface.co/datasets/googl…
This seems like a great workshop! I hope and expect that analyzing the potential for data contamination will become a standard part of any rigorous eval, just like model cards, impact statements, etc. are part of high-quality papers. Excited that @jaa_campos is organizing this.
Data contamination in large-scale models: an issue acknowledged by many that hasn't been widely discussed yet. 🚨 We are organizing CONDA, the first Workshop on Data Contamination, which will be co-located with ACL 2024 (Aug 16) 🚨 Please consider submitting: conda-workshop.github.io
📢 Excited to announce that our Workshop on Data Contamination (CONDA) will be co-located with ACL24 in Bangkok, Thailand on Aug. 16. We are looking forward to seeing you there! Check out the CFP and more information here: conda-workshop.github.io
We are proud to present Latxa, a family of open models that includes the largest and best language model for Basque. Based on @Meta's Llama models, it comprises models from 7 to 70 billion parameters, released under the open Llama-2 license. 1/n