Prompt (@engineerrprompt)

Creator of localGPT | Building something cool! Generative AI, Tech, Arts, Life!

This is so true. Agents have a long way to go to get consistent performance. My workflow usually involves hard-coded/hand-crafted states along with agents.

The biggest blocker for AI agents? It is performance quality, BY FAR. For all the talk of cost, latency, safety... the fact is, most people are still just struggling to get agents to work 🤷‍♀️ Full survey here: langchain.com/stateofaiagents
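To make "hand-crafted states" concrete, here is a minimal sketch of the pattern I mean: the control flow is fixed code, and the model is only consulted inside a state. `call_llm` is a hypothetical placeholder for whatever client you use, not a specific API.

```python
# A fixed, hand-coded state machine; the LLM only acts inside a state.
# call_llm is a hypothetical placeholder, not a real client API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")

def run_pipeline(question: str) -> str:
    state, retries = "RETRIEVE", 0
    context, answer = "", ""
    while state != "DONE":
        if state == "RETRIEVE":
            context = call_llm(f"Draft search queries for: {question}")
            state = "ANSWER"
        elif state == "ANSWER":
            answer = call_llm(f"Context:\n{context}\n\nAnswer this: {question}")
            state = "VERIFY"
        elif state == "VERIFY":
            verdict = call_llm(f"Is this answer grounded in the context? yes/no\n{answer}")
            # Hard-coded transition: retry at most once, then stop. No open-ended loops.
            if verdict.strip().lower().startswith("no") and retries < 1:
                retries, state = retries + 1, "ANSWER"
            else:
                state = "DONE"
    return answer
```

The point is that the agent can never wander: every transition is something you wrote and can test.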



It's great to see the shift towards visual RAG with the new embedding model from @voyageAi. The OG here was the introduction of ColPali. If you are looking for a complete local solution for visual RAG, check out localgpt-vision, which builds on top of colpali/qwenpali: github.com/PromtEngineer/…

📢 Announcing voyage-multimodal-3, our first multimodal embedding model! It vectorizes interleaved text & images, capturing key visual features from screenshots of PDFs, slides, tables, figures, etc. +19.63% accuracy gain on 3 multimodal retrieval tasks (20 datasets)! 🧵🧵
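For the curious, here is a rough sketch of what calling it could look like with Voyage's Python client. The method name and signature are my assumption, so verify against the Voyage docs before use.

```python
# Sketch: one embedding per interleaved text+image input.
# Method name/signature are assumptions; check Voyage's docs.
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

inputs = [
    ["Q3 revenue slide:", Image.open("slide_12.png")],      # text + screenshot
    ["Architecture diagram:", Image.open("figure_03.png")],
]

result = vo.multimodal_embed(inputs, model="voyage-multimodal-3", input_type="document")
print(len(result.embeddings), len(result.embeddings[0]))  # n_inputs, dim
```

Each input mixes strings and images, so a single vector captures both the screenshot and its surrounding text.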



Qwen2.5-32B-Instruct from @Alibaba_Qwen is the best-in-class open-weight model and is trending on @huggingface. How good is it at building webapps? I put it to the test in my new video: youtu.be/KYvVl0UT1Sk


Updates to localgpt-vision are coming soon. It's built on top of byaldi and the fantastic work of the VLM community on vision-based, end-to-end multimodal RAG.

🐭byaldi v0.0.7 is out! It's been a while, so time for a re-introduction: 🐭byaldi is a library that makes it ridiculously easy to use models with complex mechanisms, so you don't have to understand what a "late-interaction multi-modal VLM-based retriever"* is before you can…
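In practice, byaldi reduces visual RAG to a few lines. A sketch of the usage as I understand it (argument names and the checkpoint may differ across versions):

```python
from byaldi import RAGMultiModalModel

# Load a ColPali-style late-interaction retriever (checkpoint name may vary).
model = RAGMultiModalModel.from_pretrained("vidore/colpali")

# Index a folder of PDFs: pages are embedded as images, so no OCR/parsing step.
model.index(
    input_path="docs/",
    index_name="my_docs",
    store_collection_with_index=False,
    overwrite=True,
)

# Natural-language query over page images; returns the best-matching pages.
for r in model.search("Where is the revenue breakdown table?", k=3):
    print(r.doc_id, r.page_num, r.score)
```

The retrieved pages can then be handed to a VLM to actually answer the question, which is the localgpt-vision pipeline in a nutshell.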



Qwen models are among the best open-weight models and are often missing from comparisons when other frontier labs release new models (that should tell you a lot). They just released the Qwen2.5-Coder 32B model. Check it out here: qwenlm.github.io/blog/qwen2.5-c…
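If you want to kick the tires locally, here is a minimal transformers sketch. The 32B weights need serious VRAM; the smaller Coder variants follow the same pattern.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Write a FastAPI endpoint that serves a todo list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```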


This is so true. The key to a good RAG system is understanding your own data. It's something folks shy away from. GenAI is not magic. Spend time understanding your data and the use case, and then implement a custom RAG system if you need one.


Learn how to use the multimodal Llama 3.2 from @AIatMeta for building a vision-based RAG system, thanks to the recent integration with @ollama. Here is a video to get you started: youtu.be/45LJT-bt500
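The gist of the Ollama side, as a hedged sketch (the exact model tag is an assumption, check the Ollama model library):

```python
import ollama

# Pass the retrieved page image straight to the vision model.
response = ollama.chat(
    model="llama3.2-vision",  # assumed tag; check the Ollama model library
    messages=[{
        "role": "user",
        "content": "What does the table on this page say?",
        "images": ["retrieved_page.png"],
    }],
)
print(response["message"]["content"])
```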


Primaries are important!


One of the biggest mistakes I have seen in RAG systems: when chunking your documents for RAG, make sure you are chunking them on tokens, not characters. You are welcome 😊
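Why tokens? Models and embedders budget in tokens, and character counts drift wildly across languages and code. Concretely, token-based chunking with tiktoken looks something like this (the chunk_size/overlap values are just illustrative defaults):

```python
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping token windows, then decode back to strings."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start : start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```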


What is happening! Never a dull moment. @Microsoft released OmniParser: "OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent."

Microsoft just dropped OmniParser model on ⁦@huggingface⁩, so casually! 😂 “OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.” 🔥 huggingface.co/microsoft/Omni…



Sometimes, I do want to pull my hair out when I don't "feel the AGI" while coding!!!

coding with AI unfortunately doesn't make you "feel the AGI"



This is an interesting idea and definitely a much needed tool. Not only for ownership but also for figuring out what is real vs. synthetic!

Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → goo.gle/40apGQh
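Transformers has since shipped an integration for applying SynthID text watermarking at generation time. My best understanding of the API below, treat the exact class and arguments as an assumption and verify against the docs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# The keys seed the watermark; keep them private and stable so you can detect later.
wm_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57],
    ngram_len=5,
)

inputs = tokenizer(["Write a short product description:"], return_tensors="pt")
out = model.generate(**inputs, watermarking_config=wm_config, do_sample=True, max_new_tokens=60)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```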



You can read through the blog posts to learn what @AnthropicAI released, or you can just watch my video with a quick summary of what was released today. This will make me happy 😊 youtu.be/lnWrF-xcwq0

We believe these developments will open up new possibilities for how you work with Claude, and we look forward to seeing what you'll create. Read the updates in full: anthropic.com/news/3-5-model…



Hard to say what is real anymore! You can run this SOTA video generation model IF you have access to at least 4 H100 GPUs, but hopefully we will have smaller models soon.

Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0. magnet:?xt=urn:btih:441da1af7a16bcaa4f556964f8028d7113d21cbb&dn=weights&tr=udp://tracker.opentrackr.org:1337/announce



Wow, now Claude can use your computer!

We believe these developments will open up new possibilities for how you work with Claude, and we look forward to seeing what you'll create. Read the updates in full: anthropic.com/news/3-5-model…



I approve this message 😀

Learn to code



Friday releases are a new thing now :) A multimodal model from @AIatMeta: text/speech input and text/speech output! There goes my weekend ;)

Today we released Meta Spirit LM — our first open source multimodal language model that freely mixes text and speech. Many existing AI voice experiences today use ASR techniques to process speech before synthesizing with an LLM to generate text — but these approaches…



Curious who is actively using the @OpenAI GPT-4o advanced voice mode on a daily basis? What is your use case? I probably use it once or twice a week at most, and even then just for looking up some info, not conversations.


The age of SLMs is upon us! Great to see these smaller models, which can be great for task-specific finetunes.

🎉 We @MistralAI just dropped two new edge models - Ministral 3B and Ministral 8B!


