Daniel Vila Suero
@dvilasuero
Building @argilla_io (acquired by @huggingface)
🔥 @argilla_io is joining @huggingface 🤗 Time to double down on community, data, and open source AI! So proud of the team, so excited to join a larger mission and an amazing company. Special thanks to @osanseviero for being such a great partner during the acquisition!…
If you're working on code models you should check out this notebook with distilabel, argilla, and Qwen Coder 2.5. You could use it for use cases like this: - Code generation dataset in a specific domain or language - Code classification dataset - Code retrieval dataset for a…
Synthetic data can be quicker, cleaner and more modern, even without using a SmolLM! Congrats Gradio on the 5.0 release and quick bug fixes! Generate Text Classification or SFT data for free and directly use it in Argilla or the Hugging Face Hub. Demo: buff.ly/3Y1S99z
Journalists, which dataset could we build/improve collaboratively? ✅ Fact-checked news? ✍️ Multilingual writing styles? 👥 User needs classification? @argilla_io's no-code tool makes it simple. Join us! huggingface.co/spaces/Journal… @dvilasuero
‼️ Dataset update ‼️ We just pushed timecoded speech-to-text to FineVideo. Details in the dataset card 🤓
Open video datasets are badly missing, and that is slowing down the development of open-source video AI. This is why we're excited to introduce 🎥 FineVideo! 43k+ videos/3.4k hours annotated with rich descriptions, narrative details, scene splits, and QA pairs. huggingface.co/spaces/Hugging…
Inspired by the latest events in Valencia, I'd like to show you how I used the "Disaster Response Messages" dataset to upload a CSV file into Argilla to quickly start annotating and identify pleas for help. No code needed. loom.com/share/952c157c…
🌟 Argilla 2.4 is Out – No Code, No Problem! You can now import any of the 230K+ datasets from the Hub without writing a single line of code. Start curating your data in just a few clicks! And because a video speaks louder than words, here’s a quick example:
Follow @BruleNaudet on @huggingface He's a dataset hero 🦸 huggingface.co/louisbrulenaud…
📢 Build datasets for AI on the @huggingface Hub—10x easier! How it works: 1. Pick a dataset—upload your own or choose from 240K open datasets 2. Paste the dataset ID and set up your labeling interface 3. Share with your team or the whole community! huggingface.co/blog/argilla-u…
Introducing SmolLM2: the new, best, and open 1B-parameter language model. We trained smol models on up to 11T tokens of meticulously curated datasets. Fully open-source Apache 2.0 and we will release all the datasets and training scripts!
For our new OIDA Image Collection, we used #AI to write captions describing the images. But we need your help! Contribute to an open dataset that refines AI models to write better captions. Learn more about our collaboration with @HuggingFace: industrydocuments.ucsf.edu/wpost/oida-col…
🤩 *NEW* feature: preview videos directly in Hugging Face datasets. Easiest dataset vibe checks! 🛠️Recipe to build/consume your video repo: github.com/huggingface/vi…
Should we integrate synthetic data generation workflows into the @argilla_io UI? You describe the dataset in natural language, see some samples, tweak the data gen prompt, build the dataset, label a few samples, add those as few shots, add more human reviews from your team...
🔥Big update to the Synthetic data generator @huggingface Space: Build text classification datasets with natural language 👩 Human-in-the-loop: iterate on prompts and samples and review in @argilla_io ⚙ Tons of configs 🦙 Powered by Llama-3.1. Run on Spaces or locally
🚀 Big update to the Synthetic Data Generator: generate text classification datasets by describing them in natural language! Stop using big and costly LLMs. Start fine-tuning smaller and more efficient models with custom data – generated without a single line of code!
Time to step up your AI builder game. Two live workshops not to miss, tomorrow and Thursday! - Tomorrow, join our Co-founder and CEO @ClementDelangue for a tour of the HF enterprise hub platform: streamyard.com/watch/JS2jHsUP… - Thursday, join @jeffboudier @alvarobartt & @brandon_royal…
First time creating an interface and dataset with @argilla_io, it's soooooooo great, thanks @dvilasuero and the team 🖥️
⚡️ LLMs do a good job at NER, but don't you want to learn how to do more with less? Go from 🐢 -> 🐇 If you want a small model, you need to fine-tune it. - Bootstrap with a teacher model - Correct mistakes - Fine-tune a student model Go brrr: buff.ly/4ebcvlo
How do you release an impactful dataset on the @huggingface Hub? We're enhancing how we track dataset downloads on the Hub, so I wanted to share some common themes I've noticed for datasets with high downloads. 🧵