Richard Diehl Martinez @richarddm1 Twitter Profile

Richard Diehl Martinez

@richarddm1

CS PhD at University of Cambridge. Previously Applied Scientist @Amazon, MS/BS @Stanford.

Joined December 2009

45Posts 139Followers 996Following

Similar User

@christinedekock

@ZhijiangG

@dominsta_nlp

@anitaveroe

@RamintaIeva

@HelenaXie_

@CurriedAmanda

@pietro_lesci

@Ramiyaly

@MoyYuan

@guyaglionby

@Eric_chamoun

@Tugelezi

@brianleixia

Richard Diehl Martinez

@richarddm1

21 Nov

The golden pacifier has made it back to Cambridge! Big thank you to the #babylm and #connl teams for the award and a great workshop! Checkout the paper: arxiv.org/pdf/2410.22906

Small language models are worse than large models because they have less parameters ... duh! Well not so fast. In a recent #EMNLP2024 paper, we find that small models have WAY less stable learning trajectories which leads them to underperform.📉 arxiv.org/abs/2410.11451

Richard Diehl Martinez Reposted

Pietro Lesci

@pietro_lesci

11 Nov

🙋‍♂️My Lab mates are @emnlpmeeting this week: drop by their posters! If you want to know more about our recent work [1] on small language models, catch @richarddm1 who will answer all your questions! 🧑‍💻#EMNLP2024 #NLProc [1]: arxiv.org/abs/2410.11451

CambridgeNLP

@cambridgenlp

11 Nov

Our Lab is presenting at @emnlpmeeting this week. Come chat with us! #EMNLP2024 #NLProc

Richard Diehl Martinez

@richarddm1

12 Mar

What's new in NLP this week? Kingma (better known as the Adam Optimizer guy) published a paper showing that Diffusion Models as well as Flow Models are basically combinations of VAE models (Variational Auto-Encoders) behind the scenes. Turns out Kingma also wrote the original…

Richard Diehl Martinez

@richarddm1

5 Mar

It’s well known that training and serving AI models requires a lot of energy — what’s less obvious is that they require lots of water for cooling. Karen Hao (whose writing style I like) has a piece out about a server complex in Arizona that hosts OpenAI models. Also, Anthropic’s…

Richard Diehl Martinez

@richarddm1

27 Feb

If Gemini wasn’t interesting enough for you, Google this week published a paper where they show how to train a language model as a universal regressor. Regression is the bread-and-butter of machine learning, and they propose that you can use a language model to perform arbitrary…

Richard Diehl Martinez

@richarddm1

20 Feb

Last week, Google released it’s new consumer “Gemini” product (what used to be BARD). This week Google released some analysis about its performance - the main headline: it can do context lengths of around 10 million tokens. nlpinsightfilter.substack.com/p/nlp-insight-…

Richard Diehl Martinez

@richarddm1

13 Feb

A paper this week analysis the scaling properties of your predictions as you increase the number of LLM agents — the title of the paper makes the conclusion pretty obvious “more agents is all you need”. nlpinsightfilter.substack.com/p/nlp-insight-…

Richard Diehl Martinez

@richarddm1

9 Feb

As you might now, state space models like Mamba are the new cool kid on the block. Although they seem to be all the rage, a recent paper shows that these types of models are worse than transformers at tasks that require copying text. open.substack.com/pub/nlpinsight…

Richard Diehl Martinez

@richarddm1

31 Jan

Microsoft has a new way of doing PEFT that they call SliceGPT. The idea is to compute the PCA of certain weight matrices, and slice off the ‘un-important’ dimensions (roughly 25% of params can be cut off). I talk about it in my substack this week: nlpinsightfilter.substack.com/p/nlp-insight-…

Richard Diehl Martinez

@richarddm1

26 Jan

Ever wonder whether people posting their papers on X really helps them become more popular? Answer: yeah, that’s sort of obvious. But how much more popular? Turns out 2-3 times more popular — kind of crazy. Checkout the details on my bi-weekly substack: nlpinsightfilter.substack.com/p/nlp-insight-…

Richard Diehl Martinez

@richarddm1

23 Jan

Check out my substack for this week's hot NLP research takes NLP Insight Filter [Jan 23 2024] open.substack.com/pub/nlpinsight…

Richard Diehl Martinez

@richarddm1

17 Jan

To my new X/Twitter friends - I've been publishing a bi-weekly substack newsletter on NLP research papers. Check out my most recent post (and consider subscribing): NLP Insight Filter [Jan 17 2024] open.substack.com/pub/nlpinsight…

Richard Diehl Martinez Reposted

ELLIS Amsterdam

@Ellis_Amsterdam

13 Dec

🧐 Curious about diverse, human-centered perspectives on cross-lingual models? Join the HumanCLAIM workshop ✨ 📆 11 Jan ‘24 📍Amsterdam We'll dive into boosting diversity in language technology! 🔗clap-lab.github.io/workshop

Richard Diehl Martinez Reposted

Leshem Choshen 🤖🤗

@LChoshen

8 Dec

Best Negative results Curriculum learning methods are hard to get right But their advantages stack Tooons of exps. A thesis could not be summed into a tweet: arxiv.org/abs/2311.08886 @richarddm1 Zebulon Goriely @hope_mcgovern @c_davis90 Paula Buttery @lisabeinborn

Richard Diehl Martinez Reposted

Tengyu Ma

@tengyuma

25 May 2023

Releasing the code of Sophia 😀, a new optimizer (⬇️）. code: github.com/Liuhong99/Soph… twitter.com/tengyuma/statu…

Tengyu Ma

@tengyuma

24 May 2023

Adam, a 9-yr old optimizer, is the go-to for training LLMs (eg, GPT-3, OPT, LLAMA). Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold). arxiv.org/abs/2305.14342 🧵⬇️