
Shawn Presser

@theshawwn

Looking for AI work. DMs open. ML discord: https://t.co/2J63isabrY projects: https://t.co/6XsuoK4lu0

Joined January 2009
Similar Users

Jürgen Schmidhuber @SchmidhuberAI
Chip Huyen @chipro
Tim Dettmers @Tim_Dettmers
Yannic Kilcher 🇸🇨 @ykilcher
Georgi Gerganov @ggerganov
Leo Gao @nabla_theta
Jonathan Frankle @jefrankle
Sam Bowman @sleepinyourhat
Eric Jang @ericjang11
EleutherAI @AiEleuther
Abhi Venigalla @ml_hardware
andy jones @andy_l_jones
anton @abacaj
ML Collective @ml_collective
William Falcon ⚡️ @_willfalcon

Pinned

Last night, someone asked me what I've been up to since 2010. My reply turned into a short autobiography. I considered deleting it, but people encouraged me to post it instead: gist.github.com/shawwn/3110ab6… If you're unhappy with your life, it's important to believe you can fix it.


Shawn Presser Reposted

Seven years ago, the paper "Attention Is All You Need" introduced the Transformer architecture. The world of deep learning has never been the same since. Transformers are used for every modality nowadays. Despite their nearly universal adoption, especially for large language…
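
The tweet itself carries no code, but since its subject is the Transformer architecture, here is a minimal sketch of the paper's core operation, scaled dot-product attention. PyTorch is my choice of framework, and the function name and tensor shapes are illustrative assumptions, not from the tweet or the paper's reference code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per the 2017 paper."""
    # q, k, v: (batch, heads, seq_len, d_head)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(1, 8, 16, 64)    # toy shapes for illustration
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                          # torch.Size([1, 8, 16, 64])
```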


Shawn Presser Reposted

The Transformer architecture has changed surprisingly little from the original paper in 2017 (over 7 years ago!). The diff:
- The nonlinearity in the MLP has undergone some refinement. Almost every model uses some form of gated nonlinearity. A SiLU or GELU nonlinearity is…
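
To make the "gated nonlinearity" point concrete, here is a minimal sketch of a SwiGLU-style gated MLP of the kind most current models use, contrasted in the docstring with the original two-layer ReLU MLP. PyTorch and all names and dimensions here are my assumptions for illustration, not from the tweet or any particular model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """SwiGLU-style block. The 2017 original was w2(relu(w1(x)));
    modern variants gate the hidden activation instead:
    w_down(silu(w_gate(x)) * w_up(x))."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)               # (batch, seq, d_model), toy sizes
print(GatedMLP(512, 1376)(x).shape)       # torch.Size([2, 16, 512])
```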


Shawn Presser Reposted

Report of a conversation with Terence Tao when he was seven. (via @cremieuxrecueil)


Surprisingly accurate


bro this guy is brutal



Shawn Presser Reposted

Eigenvalues & Eigenvectors clearly explained:


Happy to see Docker falling out of fashion. I’ve always carefully avoided it. Back at peak mania, coworkers would think you were a fossil at best for skipping it. But I’ve seen many waves come and go.

stop using docker pls



Shawn Presser Reposted

one of my favorite examples, even if a bit niche, was voice conversion models - something that was basically ~solved 2 years ago, took a full 6-12 months to get from there to end users actually using it, and then producing videos this good with a 10-min finetune: youtube.com/watch?v=tJjhOb…

