M Saiful Bari (MARUF)

@sbmaruf

@NTU, Singapore, Intern'20,21,22 Amazon Web Inc. (@awscloud), T0, BLOOMZ, UXLA, xCodeEval, I train LLM at SDAIA! - Scaling Maximalist, Core maintainer of ALLaM

Similar users

Shafiq Joty (@JotyShafiq)
Xiao Liu (Shaw) (@ShawLiu12)
Yizhong Wang (@yizhongwyz)
Qian Liu (@sivil_taram)
Li Dong (@donglixp)
Katherine Lee (@katherine1ee)
Steven Hoi (@stevenhoi)
Hailin Chen (@HailinChen3)
CLS (@ChengleiSi)
Mahesh Sathiamoorthy (@madiator)
Adam Roberts (@ada_rob)
Makarand Tapaswi (@MakarandTapaswi)
Liangming Pan (@PanLiangming)
Tuhin Chakrabarty (@TuhinChakr)
Kevin Meng (@mengk20)

Pinned

May 2024 was a roller coaster! In the first two weeks, our team worked 24/7 to deliver ALLaM, the latest best-in-class Arabic LLM. After 8 months of hard work, it was released at IBM Think's main keynote and is now available on IBM WatsonX! #AI #LLM #IBM #WatsonX Work done with…


50k 👀

Deepseek has over 50k Hopper GPUs to be clear. People need to stop acting like they only have that 10k A100 cluster. They are omega cracked on ML research and infra management but they aren't doing it with that many fewer GPUs



M Saiful Bari (MARUF) Reposted

Chinese AI researchers are incredibly cracked, and they're driving a lot of open-source progress:
> best coding model (Deepseek)
> best open-source multimodal language model (Qwen2-VL)
> two of the best open-source anything models (Yi-lightning and also Qwen again)


M Saiful Bari (MARUF) Reposted

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 @uwnlp @allen_ai With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts. Try out our demo! We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,…


llama3.1 checkpoints have been nothing but trouble. Raise your hand if you've had a good experience with llama3.1.


[Hot take] LMSys Arena performance correlations on unrelated topics:
- Performance is polynomially correlated with the # of cocky pretrainers the company has.
- Performance is inversely proportional to "politics per flops".
- Performance is polynomially proportional to "how humble is…


Mike Tyson doesn't have to win today. He's in the ring at 58; he's already the winner.


M Saiful Bari (MARUF) Reposted

It would be great if someone figured out how to make super hard evals an economically viable business. Every lab would pay 💰

3/10 We evaluated six leading models, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro. Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks.



😅

As a serious angel investor in AI startups, I judge based on their CTO (Chief Tweet Officer). CTO has to be technical.



my shitcoins are safe 😆 @haidarkk1


M Saiful Bari (MARUF) Reposted

RAG is the standard choice for retrieving relevant info in #NLProc & #ComputerVision. But what if there's no explicit query? 🤔 Our #ACL2024nlp paper presents a new content selection mechanism! 📄💡 With co-lead @FaisalTareque & my mentor @JotyShafiq Key details (1/n) 👇


M Saiful Bari (MARUF) Reposted

Evaluation techniques, reproducibility and reliability have always been close to me. Great to see new works pointing out the challenges with LLM evaluations. This work provides a comprehensive insight into the challenges, limitations and offers suggestions for LLM evals. Needed!

🚨 New Paper Alert 🚨 A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations (1/) VCs are putting thousands of $$$ on the line based on certain benchmarks, even though those benchmarks can easily be hacked or misinterpreted. (2/) A…



it's a great sign that he has started tweeting...!!!!! 💙

