
Richard Diehl Martinez

@richarddm1

CS PhD at University of Cambridge. Previously Applied Scientist @Amazon, MS/BS @Stanford.

Joined December 2009
Similar Users

Christine de Kock (@christinedekock)
Zhijiang Guo (@ZhijiangG)
Dominik Stammbach (@dominsta_nlp)
Anita Verő (@anitaveroe)
Ieva Raminta (@RamintaIeva, @ievaraminta.bsky.social)
Helena Xie (@HelenaXie_)
Amanda Cercas Curry (@CurriedAmanda)
Pietro Lesci (@pietro_lesci)
Rami Aly (@Ramiyaly)
Moy Yuan (@MoyYuan)
Guy Aglionby (@guyaglionby)
Eric (@Eric_chamoun)
JuliusPrince (@Tugelezi)
Lei Xia (@brianleixia)

The golden pacifier has made it back to Cambridge! Big thank you to the #babylm and #conll teams for the award and a great workshop! Check out the paper: arxiv.org/pdf/2410.22906


Small language models are worse than large models because they have fewer parameters ... duh! Well, not so fast. In a recent #EMNLP2024 paper, we find that small models have WAY less stable learning trajectories, which leads them to underperform.📉 arxiv.org/abs/2410.11451
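One crude way to probe "trajectory stability" yourself (an illustrative sketch, not the paper's actual metric) is to measure how noisy the step-to-step change in validation loss is across checkpoints:

```python
import numpy as np

def trajectory_instability(val_losses: list[float]) -> float:
    """Standard deviation of step-to-step changes in validation loss.

    A noisier (less stable) training trajectory gives a larger value;
    comparing this number across model sizes is one rough way to
    sanity-check the claim above. Not the metric used in the paper.
    """
    losses = np.asarray(val_losses, dtype=float)
    deltas = np.diff(losses)  # change between consecutive checkpoints
    return float(np.std(deltas))
```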


Richard Diehl Martinez Reposted

🙋‍♂️My lab mates are at @emnlpmeeting this week: drop by their posters! If you want to know more about our recent work [1] on small language models, catch @richarddm1 who will answer all your questions! 🧑‍💻#EMNLP2024 #NLProc [1]: arxiv.org/abs/2410.11451

Our Lab is presenting at @emnlpmeeting this week. Come chat with us! #EMNLP2024 #NLProc



What's new in NLP this week? Kingma (better known as the Adam optimizer guy) published a paper showing that diffusion models, as well as flow models, are essentially VAEs (Variational Auto-Encoders) behind the scenes. Turns out Kingma also wrote the original…
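For context, the connection runs through the ELBO. Below is a minimal sketch of the single-latent VAE bound (not the paper's full multi-step derivation); the diffusion reading treats the model as a hierarchical VAE whose encoder is a fixed Gaussian noising process.

```latex
% Standard VAE evidence lower bound (ELBO). Diffusion models can be read as
% hierarchical VAEs: q becomes a fixed noising process over many latent levels.
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```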


It’s well known that training and serving AI models require a lot of energy; what’s less obvious is that they also require lots of water for cooling. Karen Hao (whose writing style I like) has a piece out about a server complex in Arizona that hosts OpenAI models. Also, Anthropic’s…


If Gemini wasn’t interesting enough for you, Google this week published a paper where they show how to train a language model as a universal regressor. Regression is the bread-and-butter of machine learning, and they propose that you can use a language model to perform arbitrary…
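The "regression as text" trick boils down to serializing (x, y) pairs as strings and letting the model complete the next y. A hedged sketch of that idea (the prompt format is my own and `call_llm` is a hypothetical text-completion function, not the paper's setup):

```python
def format_example(features: dict, target: float | None = None) -> str:
    """Serialize one regression example as plain text for an LLM."""
    feats = ", ".join(f"{k}={v}" for k, v in features.items())
    return f"Input: {feats}\nOutput: {'' if target is None else target}"

def predict(call_llm, train_set: list[tuple[dict, float]], query: dict) -> float:
    """Few-shot 'regression': show (x, y) pairs as text, ask the LLM for the next y."""
    prompt = "\n\n".join(format_example(x, y) for x, y in train_set)
    prompt += "\n\n" + format_example(query)   # leave the output blank
    completion = call_llm(prompt)              # hypothetical LLM call
    return float(completion.strip().split()[0])  # parse the number it emits
```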


Last week, Google released its new consumer “Gemini” product (what used to be Bard). This week Google released some analysis of its performance. The main headline: it can handle context lengths of around 10 million tokens. nlpinsightfilter.substack.com/p/nlp-insight-…


A paper this week analyzes how prediction quality scales as you increase the number of LLM agents; the title makes the conclusion pretty obvious: “More Agents Is All You Need”. nlpinsightfilter.substack.com/p/nlp-insight-…
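The underlying recipe is sampling-and-voting: ask the same model many times and take the majority answer. A minimal sketch (the `agent` callable is a placeholder, not the paper's code):

```python
from collections import Counter

def ensemble_answer(agent, question: str, n_agents: int = 10) -> str:
    """Query the same LLM n_agents times and majority-vote the answers.

    `agent` is any callable mapping a prompt string to an answer string;
    with temperature > 0 each call can return a different sample.
    """
    answers = [agent(question) for _ in range(n_agents)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common
```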


As you might know, state space models like Mamba are the new cool kid on the block. Although they seem to be all the rage, a recent paper shows that these types of models are worse than transformers at tasks that require copying text. open.substack.com/pub/nlpinsight…
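The copying tasks in question look roughly like this kind of synthetic benchmark (a sketch of the general idea, not the paper's exact protocol):

```python
import random
import string

def make_copy_example(length: int = 50, seed: int | None = None) -> tuple[str, str]:
    """Build a prompt asking the model to reproduce a random string verbatim.

    Returns (prompt, expected_output); exact-match accuracy on examples like
    this is what copying benchmarks measure.
    """
    rng = random.Random(seed)
    text = "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(length))
    prompt = f"Repeat the following text exactly:\n{text}\nRepetition:\n"
    return prompt, text
```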


Microsoft has a new way of compressing LLMs that they call SliceGPT. The idea is to compute a PCA of the activations flowing between blocks and slice off the unimportant dimensions of the corresponding weight matrices (roughly 25% of params can be cut off). I talk about it in my substack this week: nlpinsightfilter.substack.com/p/nlp-insight-…
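In spirit, the slicing step looks something like the toy sketch below, which cuts low-variance directions from a single weight matrix. The real method uses computational-invariance transforms and activation statistics across the whole transformer, which this ignores.

```python
import numpy as np

def slice_weight(W: np.ndarray, keep_frac: float = 0.75) -> np.ndarray:
    """Project a weight matrix onto a smaller set of principal directions.

    W has shape (d_out, d_in); we keep only keep_frac of the d_in input
    directions, so downstream activations live in a smaller subspace.
    """
    k = int(W.shape[1] * keep_frac)
    # PCA via SVD of the weight matrix itself (a simplification; as noted
    # above, the real method picks directions from the activations).
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    Q = Vt[:k].T        # (d_in, k) basis of retained directions
    return W @ Q        # sliced weights, shape (d_out, k)
```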


Ever wonder whether posting your papers on X really helps them become more popular? Answer: yeah, that's sort of obvious. But how much more popular? Turns out 2-3 times more popular, which is kind of crazy. Check out the details on my bi-weekly substack: nlpinsightfilter.substack.com/p/nlp-insight-…


Check out my substack for this week's hot NLP research takes NLP Insight Filter [Jan 23 2024] open.substack.com/pub/nlpinsight…


To my new X/Twitter friends - I've been publishing a bi-weekly substack newsletter on NLP research papers. Check out my most recent post (and consider subscribing): NLP Insight Filter [Jan 17 2024] open.substack.com/pub/nlpinsight…


Richard Diehl Martinez Reposted

🧐 Curious about diverse, human-centered perspectives on cross-lingual models? Join the HumanCLAIM workshop ✨ 📆 11 Jan ‘24 📍Amsterdam We'll dive into boosting diversity in language technology! 🔗clap-lab.github.io/workshop


Richard Diehl Martinez Reposted

Best Negative Results! Curriculum learning methods are hard to get right, but their advantages stack. Tons of experiments; a whole thesis could not be summed up in a tweet: arxiv.org/abs/2311.08886 @richarddm1 Zebulon Goriely @hope_mcgovern @c_davis90 Paula Buttery @lisabeinborn


Richard Diehl Martinez Reposted

Releasing the code of Sophia 😀, a new optimizer (⬇️). code: github.com/Liuhong99/Soph… twitter.com/tengyuma/statu…

Adam, a 9-year-old optimizer, is the go-to for training LLMs (e.g., GPT-3, OPT, LLaMA). Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold). arxiv.org/abs/2305.14342 🧵⬇️


