Daniel Vila Suero

@dvilasuero

Building @argilla_io (acquired by @huggingface)

Pinned

🔥 @argilla_io is joining @huggingface 🤗 Time to double down on community, data, and open source AI! So proud of the team, so excited to join a larger mission and amazing company. Special thanks to @osanseviero for being such a great partner during the acquisition!…


Daniel Vila Suero Reposted

If you're working on code models you should check out this notebook with distilabel, argilla, and Qwen Coder 2.5. You could use it for use cases like this:
- Code generation dataset in a specific domain or language
- Code classification dataset
- Code retrieval dataset for a…

Daniel Vila Suero Reposted

Synthetic data can be quicker, cleaner and more modern, even without using a SmolLM! Congrats Gradio on the 5.0 release and quick bug fixes! Generate Text Classification or SFT data for free and directly use it in Argilla or the Hugging Face Hub. Demo: buff.ly/3Y1S99z

Daniel Vila Suero Reposted

Journalists, which dataset could we build/improve collaboratively? ✅ Fact-checked news? ✍️ Multilingual writing styles? 👥 User needs classification? @argilla_io's no-code tool makes it simple. Join us! huggingface.co/spaces/Journal… @dvilasuero

Daniel Vila Suero Reposted

‼️ Dataset update ‼️ we just pushed timecoded speech to text to FineVideo. Details in the dataset card 🤓

Open video datasets are badly missing and slowing down the development of open-source video AI. This is why we're excited to introduce 🎥 FineVideo! 43k+ videos/3.4k hours annotated with rich descriptions, narrative details, scene splits, and QA pairs. huggingface.co/spaces/Hugging…



Daniel Vila Suero Reposted

Inspired by the latest events in Valencia, I'd like to show you how I used the "Disaster Response Messages" dataset to upload a CSV file into Argilla to quickly start annotating and identify pleas for help. No code needed. loom.com/share/952c157c…


Daniel Vila Suero Reposted

🌟 Argilla 2.4 is Out – No Code, No Problem! You can now import any of the 230K+ datasets from the Hub without writing a single line of code. Start curating your data in just a few clicks! And because a video speaks louder than words, here’s a quick example:


Daniel Vila Suero Reposted

📢 Build datasets for AI on the @huggingface Hub, 10x easier! How it works:
1. Pick a dataset: upload your own or choose from 240K open datasets
2. Paste the dataset ID and set up your labeling interface
3. Share with your team or the whole community!
huggingface.co/blog/argilla-u…


Daniel Vila Suero Reposted

Introducing SmolLM2: the new, best, and open 1B-parameter language model. We trained smol models on up to 11T tokens of meticulously curated datasets. Fully open-source Apache 2.0 and we will release all the datasets and training scripts!

Daniel Vila Suero Reposted

For our new OIDA Image Collection, we used #AI to write captions describing the images. But we need your help! Contribute to an open dataset that refines AI models to write better captions. Learn more about our collaboration with @HuggingFace: industrydocuments.ucsf.edu/wpost/oida-col…


Daniel Vila Suero Reposted

🤩 *NEW* feature: preview videos directly in Hugging Face datasets. Easiest dataset vibe checks! 🛠️Recipe to build/consume your video repo: github.com/huggingface/vi…


Daniel Vila Suero Reposted

Should we integrate synthetic data generation workflows into the @argilla_io UI? You describe the dataset in natural language, see some samples, tweak the data gen prompt, build the dataset, label a few samples, add those as few shots, add more human reviews from your team...

🔥Big update to the Synthetic data generator @huggingface Space: Build text classification datasets with natural language 👩 Human-in-the-loop: iterate on prompts and samples and review in @argilla_io ⚙ Tons of configs 🦙 Powered by Llama-3.1. Run on Spaces or locally


Daniel Vila Suero Reposted

🚀 Big update to the Synthetic Data Generator: generate text classification datasets by describing them in natural language! Stop using big and costly LLMs. Start fine-tuning smaller and more efficient models with custom data – generated without a single line of code!


Daniel Vila Suero Reposted

Time to step up your AI builder game. Two live workshops not to miss, tomorrow and Thursday!
- Tomorrow, join our Co-founder and CEO @ClementDelangue for a tour of the HF enterprise hub platform: streamyard.com/watch/JS2jHsUP…
- Thursday, join @jeffboudier @alvarobartt & @brandon_royal

Daniel Vila Suero Reposted

First time creating an interface and dataset with @argilla_io, it's soooooooo great, thanks @dvilasuero and the team 🖥️


Daniel Vila Suero Reposted

⚡️ LLMs do a good job at NER, but don't you want to learn how to do more with less? Go from 🐢 -> 🐇 If you want a small model, you need to fine-tune it.
Bootstrap with a teacher model
Correct mistakes
Fine-tune a student model
Go brrr: buff.ly/4ebcvlo

Daniel Vila Suero Reposted

How do you release an impactful dataset on the @huggingface Hub? We're enhancing how we track dataset downloads on the Hub, so I wanted to share some common themes I've noticed for datasets with high downloads. 🧵

