M Saiful Bari (MARUF) @sbmaruf Twitter Profile

M Saiful Bari (MARUF)

@sbmaruf

@NTU, Singapore, Intern'20,21,22 Amazon Web Inc. (@awscloud), T0, BLOOMZ, UXLA, xCodeEval, I train LLM at SDAIA! - Scaling Maximalist, Core maintainer of ALLaM

3K Posts 630 Followers 298 Following

Pinned

M Saiful Bari (MARUF)

@sbmaruf

2 Jun

May 2024 was a roller coaster! In the first two weeks, our team worked 24/7 to deliver ALLaM, the latest best-in-class Arabic LLM. After 8 months of hard work, it was released at IBM Think's main keynote and is now available on IBM WatsonX! #AI #LLM #IBM #WatsonX Work done with…

M Saiful Bari (MARUF)

@sbmaruf

21 Nov

50k 👀

Dylan Patel

@dylan522p

20 Nov

Deepseek has over 50k Hopper GPUs to be clear. People need to stop acting like they only have that 10k A100 cluster. They are omega cracked on ML research and infra management but they aren't doing it with that many fewer GPUs

M Saiful Bari (MARUF) Reposted

jack morris

@jxmnop

20 Nov

chinese ai researchers are incredibly cracked, and driving a lot of open-source progress > best coding model (Deepseek) > best open-source Multimodal language model (Qwen2-VL) > two of best open-source anything models (Yi-lightning and also Qwen again)

M Saiful Bari (MARUF) Reposted

Akari Asai

@AkariAsai

19 Nov

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 @uwnlp @allen_ai With open models & 45M-paper datastores, it outperforms proprietary systems & match human experts. Try out our demo! We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,…

M Saiful Bari (MARUF)

@sbmaruf

19 Nov

llama3.1 checkpoints are nothing but troubles. raise your hands if you have good experience with llama3.1.

M Saiful Bari (MARUF)

@sbmaruf

19 Nov

[Hottake] LMSys Arena Performance Correlation on unrelated topics: - Performance is polynomially correlated # of cocky pretrainer that the company has. - Performance is inversely proportional to “Politics per flops”. - Performance is polynomially proportional to “how humble is…

M Saiful Bari (MARUF)

@sbmaruf

15 Nov

mike tyson doesn’t have to win today, he is in the ring at 58yo, he is the winner.

M Saiful Bari (MARUF) Reposted

Boris Power

@BorisMPower

10 Nov

It would be great if someone figured out how to make super hard evals economically a viable business. Every lab would pay 💰

Epoch AI

@EpochAIResearch

8 Nov

3/10 We evaluated six leading models, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro. Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks.

M Saiful Bari (MARUF)

@sbmaruf

8 Nov

😅

Shane Gu

@shaneguML

8 Nov

As a serious angel investor in AI startups, I judge based on their CTO (Chief Tweet Officer). CTO has to be technical.

M Saiful Bari (MARUF)

@sbmaruf

6 Nov

my shit coins are safe 😆 @haidarkk1

M Saiful Bari (MARUF) Reposted

Mir Tafseer Nayeem

@mtnayeem

5 Nov

RAG is the standard choice for retrieving relevant info in the #NLProc & #ComputerVision. But what if there’s no explicit query? 🤔 Our #ACL2024nlp paper presents a new content selection mechanism! 📄💡 With co-lead @FaisalTareque & my mentor @JotyShafiq Key details (1/n) 👇

M Saiful Bari (MARUF) Reposted

Riashat Islam

@riashatislam

5 Nov

Evaluation techniques, reproducibility and reliability have always been close to me. Great to see new works pointing out the challenges with LLM evaluations. This work provides a comprehensive insight into the challenges, limitations and offers suggestions for LLM evals. Needed!

M Saiful Bari (MARUF)

@sbmaruf

4 Nov

🚨 New Paper Alert 🚨 A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations (1/) VCs are putting thousands of $$$ depending of certain benchmarks while they can be easily hacked or misinterpreted. (2/) A…