Vithu Thangarasa

@vithursant19

Machine Learning Research at @CerebrasSystems, previously at @Tesla and @UberAILabs, and former grad student at @uoguelph_mlrg and @VectorInst. Thamilan ௐ.

Joined August 2018

Pinned

🚀 2100+ tokens/s with Llama3.1-70B Instruct on @CerebrasSystems Wafer Scale Engine—Zero Loss in Model Quality! Proud to hit this milestone in LLM inference speed, setting a new standard in AI hardware performance. Grateful for the team effort! Let's keep pushing limits with ML…

🚨 Cerebras Inference is now 3x faster: Llama3.1-70B just broke 2,100 tokens/s - 16x faster than the fastest GPU solution - 8x faster than GPUs running Llama *3B* - It's like the perf of a new hardware generation in a single software release Available now at…



Vithu Thangarasa Reposted

2:4 Sparsity + @AIatMeta Llama-3.1: At @neuralmagic, we've developed a recipe to produce very competitive sparse LLMs, and we are starting by open-sourcing the first one: Sparse-Llama-3.1-8B-2of4. We also show how to leverage it for blazingly fast inference in @vllm_project
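For context, 2:4 (semi-structured) sparsity keeps at most two nonzero weights in every contiguous group of four, a pattern that modern GPU sparse tensor cores can accelerate. A minimal NumPy sketch of the pruning pattern itself (illustrative magnitude pruning only, not Neural Magic's actual recipe, which also involves retraining to recover accuracy):

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the two smallest-magnitude weights in every group of four.

    Assumes the number of elements is divisible by four, as in typical
    transformer weight matrices.
    """
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| entries in each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.7],
              [0.2,  0.3, -0.4,  0.1]])
print(prune_2_of_4(w))
# Each row keeps its two largest-magnitude weights:
# [[ 0.9  0.   0.  -0.7]
#  [ 0.   0.3 -0.4  0. ]]
```

The resulting matrix stores only two values plus metadata per group of four, which is where the "blazingly fast inference" memory and compute savings come from.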


Vithu Thangarasa Reposted

Here is what instant 405B looks like: Cerebras vs. fastest GPU cloud:


Vithu Thangarasa Reposted

Llama 3.1 405B is now running on Cerebras! – 969 tokens/s, frontier AI now runs at instant speed – 12x faster than GPT-4o, 18x Claude, 12x fastest GPU cloud – 128K context length, 16-bit weights – Industry’s fastest time-to-first token @ 240ms
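As a back-of-envelope check on what "instant speed" means here: end-to-end response latency is roughly time-to-first-token plus output length divided by throughput. A quick sketch using the figures quoted above (969 tokens/s, 240 ms TTFT):

```python
def response_latency_s(n_tokens: int, tokens_per_s: float, ttft_s: float) -> float:
    """Approximate end-to-end latency: time-to-first-token plus decode time."""
    return ttft_s + n_tokens / tokens_per_s

# Figures from the announcement: 969 output tokens/s, 240 ms TTFT.
latency = response_latency_s(1000, 969.0, 0.240)
print(f"{latency:.2f} s")  # ~1.27 s for a 1000-token answer
```

At these rates a full 1000-token answer from a 405B model completes in well under two seconds, which is what makes it feel instant in an interactive chat.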


Vithu Thangarasa Reposted

Cerebras is capable of offering Llama 3.1 405B at 969 output tokens/s and they have announced they will soon be offering a public inference endpoint 🏁 We have independently benchmarked a private endpoint shared by @CerebrasSystems and have measured 969 output tokens/s, >10X…


Vithu Thangarasa Reposted

There are 4500+ NeurIPS papers... 🤯 The NeurIPS Navigator lets you search, summarize and instantly chat with the 4500+ papers accepted into NeurIPS 2024, powered by Llama3.1-70b on Cerebras. 👉 neurips.cerebras.ai


Vithu Thangarasa Reposted

Last week, I spoke at @CerebrasSystems's Llamapalooza in front of 400+ people But the day before they dropped a huge announcement Llama3.1-70b at 2148 tokens / second 🤯 The morning of, I decided to drop everything and make three open-source demos from scratch 👇


Vithu Thangarasa Reposted

Congrats @andrewdfeldman and @CerebrasSystems for a huge leap forward and setting a new speed record for serving Llama 3.1-70B. 2100 tokens/sec is blazingly fast for a 70B model. This is great for agentic AI!




Vithu Thangarasa Reposted

Cerebras has launched a major upgrade and is now achieving >2,000 output token/s on Llama 3.1 70B, >3x their prior speeds This is a dramatic new world record for language model inference. @CerebrasSystems' language model inference offering runs on their custom "wafer scale" AI…


Vithu Thangarasa Reposted

Embrace the speed. Let's go fast and far, together! 🧡 @CerebrasSystems






Vithu Thangarasa Reposted

This is how I imagine the future with AR/VR


Vithu Thangarasa Reposted

I’ve walked through poor neighbourhoods in India, Africa and LatAm many times. Yet, I recently walked through one of the most depressing ones in terms of poverty, drug abuse, and sheer hopelessness: San Francisco. Giant tech AI companies promise to make the world a better…


Vithu Thangarasa Reposted

1/5 - Our paper "Self-Data Distillation for Pruned LLMs" has been accepted at the NeurIPS 2024 Workshop on Machine Learning and Compression (neuralcompression.github.io/workshop24), organized by @nyuniversity, @AIatMeta, @UCIrvine. Paper: arxiv.org/abs/2410.09982


Vithu Thangarasa Reposted

Announcing Llamapalooza NYC on Oct 25! 🦙 Join Cerebras for a one-of-a-kind event around fine-tuning and using llama models in production! Headliners include talks from Hugging Face, Cerebras, Crew AI. We'll also have food and drinks 🍹🍟 RSVP here: lu.ma/d3e81idy


Vithu Thangarasa Reposted

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”


Vithu Thangarasa Reposted

Cerebras continues to deliver output speed improvements, breaking the 2,000 tokens/s barrier on Llama 3.1 8B and 550 tokens/s on 70B Since launching less than a month ago, @CerebrasSystems has continued to improve output speed inference performance on their custom chips. We…


Vithu Thangarasa Reposted

🚨 Major perf update: Llama3.1-70B now runs at 560 tokens/s 24% faster in 3 weeks Available now on Cerebras Inference API and chat inference.cerebras.ai


Vithu Thangarasa Reposted

🚀 Introducing the Dream Machine API. Developers can now build and scale creative products with the world's most popular and intuitive video generation model without building complex tools in their apps. Start today lumalabs.ai/dream-machine/… #LumaDreamMachine


With @CerebrasSystems' Llama3.1-8B now at 1927 tokens/s (up from 1800) and Llama3.1-70B reaching 481 tokens/s (up from 450), it's clear that not all Llama3.1 models are created equal—we remain the most accurate and fastest inference provider in the world! 🥇 🚀 To ensure the…

