
Rogerio Chaves

@_rchaves_

🍷 LLM sommellier 🧙 Open Sourcerer 📊 DSPy Visualization 🚀 Building LangWatch - https://t.co/ez7kW1C6z9 🇳🇱🇧🇷


leetcode never made any sense. If you want to hire good devs, chill pair programming is the best technical interview there is: a trial of working with the person, in their comfortable editor, AI and all, debating and deciding things together, because, well, that's what it will be like IRL


one of the biggest unexpected shifts with AI B2B SaaS vs. traditional SaaS: companies want to run on-prem. The things we are automating are just too sensitive, not only for enterprises but for SMBs too. Choose portability over scalability, you'll thank me later


Google figured out something big, and it ain’t stopping

Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind's just released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅 Overall with the latest GPT-4o-1120 in Arena! Ranking gains since Gemini-Exp-1114:
- Overall: #3 -> #1
- Overall (StyleCtrl): #5 -> #2
- Hard…



watch LLMs getting surprised by their own naming conventions

not themselves...



Seems like the other network is getting more and more traction. Follow me there too, let's bring more LLM and DSPy talk there, the place could use some bsky.app/profile/rchave…

I'm late to the other interaction as well. Same @!



anyone tried the Windsurf editor? Is its tab completion better than Cursor's?


Forget about the wall, this is the next phase of AI we are entering: getting accuracy on the things that matter impressively close to 100%

We just hit a milestone in document processing: a 91-page scanned PDF w/ nested tables, handwriting, and complex layout. 10,400 data points extracted with 100% accuracy.



Ouch, I hadn't thought of it from that perspective, but she is totally right: AI should help us be better and more productive, not fake stupid stuff. Do not make it easier for people to keep up the bullshit, normalize being real instead

Apple Intelligence commercials make it clear that Apple’s latest software is for idiots — and using some of the new tools feels pretty stupid, too. @BridgetCarey points out the problems with early Apple Intelligence features, why you should feel good about turning it off.



that's why you should evaluate on your own data. Don't go picking models based on somebody else's benchmark (but if you do, choose Sonnet 3.5)

the fun thing about designing unconventional benchmarks is that you can instantly see which models were desperately overfit to LMSYS to please managers vs. which were focused on raw intelligence (hint: the new 3.5 Sonnet)

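A minimal sketch of "evaluate on your own data" in Python: score every candidate model on the same small labeled set taken from your own use case, then rank them. The models below are hypothetical toy functions standing in for real LLM calls, and case-insensitive exact-match accuracy is an assumed metric; swap in whatever scoring fits your task.

```python
from typing import Callable

# A "model" is any function from a prompt string to an answer string.
# Hypothetical stand-in for a real LLM client call.
Model = Callable[[str], str]


def accuracy(model: Model, dataset: list[tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) pairs the model answers correctly.

    Uses case-insensitive exact match (an assumed metric).
    """
    if not dataset:
        return 0.0
    hits = sum(
        1
        for prompt, expected in dataset
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(dataset)


def rank_models(
    models: dict[str, Model], dataset: list[tuple[str, str]]
) -> list[tuple[str, float]]:
    """Score every model on the same dataset and sort best-first."""
    scores = [(name, accuracy(model, dataset)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy "own data": replace with real examples from your product.
    my_data = [("2+2", "4"), ("3+3", "6")]
    lookup = {"2+2": "4", "3+3": "6"}
    toy_models = {
        "always_four": lambda prompt: "4",
        "lookup": lambda prompt: lookup.get(prompt, ""),
    }
    print(rank_models(toy_models, my_data))
    # → [('lookup', 1.0), ('always_four', 0.5)]
```

The point is that the ranking comes from your data, not from a public leaderboard; the same harness works for any number of models as long as they share one dataset.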


forget about QA, the best way to find bugs is giving live demos


LLM-as-a-judge is super hard, y'all, especially when it's smarter than you. It classified "Melanzane alla parmigiana" as not vegetarian, so I went to debug it: what is this hallucinating machine talking about? But it was true, TIL


I've heard more and more people at dev meetups say they are using Gemini 1.5. I guess Google finally offering easy-to-use API keys, without all the Google Cloud shenanigans, really made a difference


tomorrow at AI Tinkerers I'll present a little hack I glued together using Whisper, ffmpeg, and LLMs to automate my video editing. If you are in Amsterdam, come check it out: amsterdam.aitinkerers.org/p/ai-tinkerers…


just tested 4 different AI meeting note-taking apps, and @getshadowai was by far the best. Best user experience, by not trying to do too much "magic": transcription runs locally on my device, and the summary is on par with more expensive competitors. It's simple and awesome


I know it seems like the field moves very fast, but there is still a long way to go until good practices mature in the AI industry, so take your time, focus on doing the right thing, and you will already be well ahead

I've been AI consulting for ~2 years.
Client: "The AI isn't working in XYZ scenario"
Me: "Can we look at a trace together?"
~70%: No traces, no logging
~20%: Log traces, but never look at them
~10%: Actively looking at data
Unbelievable alpha in looking at data.
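A minimal sketch of getting out of the "no traces, no logging" bucket, in Python: a decorator that records each call's inputs, output (or error), and latency, then dumps the records as JSON lines so you can actually look at them. `fake_llm` is a hypothetical stand-in for a real model call, and the in-memory `TRACES` list is an assumption; a real setup would write to a file or a tracing backend.

```python
import functools
import json
import time
from typing import Any, Callable

# In-memory trace store (assumption); in practice, persist these somewhere you will read them.
TRACES: list[dict[str, Any]] = []


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record inputs, output or error, and latency of every call to `fn`."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: dict[str, Any] = {"name": fn.__name__, "args": args, "kwargs": kwargs}
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            # Keep failed calls too: they are often the most interesting traces.
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call leaves a trace.
            record["latency_s"] = time.perf_counter() - start
            TRACES.append(record)

    return wrapper


@traced
def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return f"echo: {prompt}"


if __name__ == "__main__":
    fake_llm("hello")
    # Actually look at the data: one JSON line per trace.
    for trace in TRACES:
        print(json.dumps(trace, default=str))
```

Logging is only half of it; the tweet's point is the last step, reading the traces when something goes wrong.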



Rogerio Chaves Reposted

Was fascinating to see how MIPRO prompt optimization fared for this pipeline, across six LMs. As much as a 41% increase in quality and a 68% decrease in leakage, straight out of the box. Not bad.

We use DSPy to optimize the prompts for drafting the private prompt and synthesizing the personalized output. After prompt optimization, Llama-3.1-8B performs quite well at using the untrusted 4o-mini as a tool. Retains quality for 85% of the queries and privacy 93% of the time!



Rogerio Chaves Reposted

This meme was a small moment of enlightenment for me a long long while back


is youtube just an ffmpeg wrapper

