
Josh Meyer

@_josh_meyer_

https://t.co/PxzriWj2jt

Similar Users

Samuele Cornell (@SamueleCornell)
arXiv Sound (@ArxivSound)
INTERSPEECH 2025 (@ISCAInterspeech)
SpeechBrain (@SpeechBrain1)
BUT Speech (@ButSpeech)
Shinji Watanabe (@shinjiw_at_cmu)
Wei-Ning Hsu (@mhnt1580)
Neil Zeghidour (@neilzegh)
Eduardo Fonseca (@edfonseca_)
Mirco Ravanelli (@mirco_ravanelli)
AlphaCephei (@alphacep)
WAVLab | @CarnegieMellon (@WavLab)
Desh Raj (@rdesh26)
erogol (@erogol)
Hervé (@hbredin)

Pinned

My dissertation as a podcast :)


Josh Meyer Reposted

After spending a few hours on F5, I found the motivation to finalize this short post. I've been saying this for quite some time already, though. alphacephei.com/nsh/2024/10/18…


Josh Meyer Reposted

Awesome new project: Whisper Turbo MLX by Josef Albers. A clean, single-file (<250 lines), blazing-fast implementation of Whisper Turbo in MLX:

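For a quick way to try Whisper Turbo on Apple Silicon, here is a minimal sketch using the separate mlx-whisper package rather than the single-file project above; the checkpoint repo id is an assumption:

    import mlx_whisper

    # Transcribe a local file with a Whisper Turbo checkpoint converted to MLX.
    # "mlx-community/whisper-large-v3-turbo" is an assumed repo id.
    result = mlx_whisper.transcribe(
        "sample.wav",
        path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    )
    print(result["text"])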

190ms TTFB 👀

Today we’re introducing our latest Text-To-Speech model, Play 3.0 mini. It’s faster, more accurate, handles multiple languages, supports streaming from LLMs, and it’s more cost-efficient than ever before. Try it out here: play.ht/playground/?ut…



Josh Meyer Reposted

Inspired by @AIatMeta's Chameleon and Llama Herd papers, llama3-s (Ichigo) is an early-fusion audio-and-text multimodal model. We're conducting this research entirely in the open, with an open-source codebase, open data, and open weights. 2/10

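For illustration, early fusion here means the codec's discrete audio tokens and the text tokens share one input sequence (and one embedding table) in the LLM. A conceptual sketch, with vocabulary sizes as assumptions:

    import torch

    TEXT_VOCAB = 32000   # assumed text vocabulary size
    AUDIO_VOCAB = 1024   # assumed codec codebook size

    def fuse(audio_codes: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # Offset audio codes past the text vocabulary so both token types
        # index one shared embedding table.
        audio_ids = audio_codes + TEXT_VOCAB
        # Early fusion: one flat sequence of audio and text tokens for the LLM.
        return torch.cat([audio_ids, text_ids], dim=-1)

    print(fuse(torch.tensor([3, 900, 12]), torch.tensor([5, 17, 42])))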

Josh Meyer Reposted

3 steps to run @huggingface "Parler TTS" AI voice on your local machine. New tutorial video out now 😊! My step-by-step technical tutorial is now available on my "Thorsten-Voice" YouTube channel. youtu.be/1X2LxAGn9tU
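For reference, local inference with Parler-TTS follows the project's README closely; a minimal sketch, assuming the parler-tts package is installed (both strings are examples):

    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    repo = "parler-tts/parler-tts-mini-v1"
    model = ParlerTTSForConditionalGeneration.from_pretrained(repo)
    tokenizer = AutoTokenizer.from_pretrained(repo)

    # The voice is steered by a free-text description of the speaker.
    description = "A calm female speaker with very clear audio."
    text = "Hello from a locally running text-to-speech model."

    input_ids = tokenizer(description, return_tensors="pt").input_ids
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids
    audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
    sf.write("out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)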


Josh Meyer Reposted

We just released the Pixtral 12B paper on arXiv: arxiv.org/abs/2410.07073


Josh Meyer Reposted

🍏 Apple ML research in Paris has multiple open internship positions!🍎 We are looking for Ph.D. students interested in generative modeling, optimization, large-scale learning or uncertainty quantification, with applications to challenging scientific problems. Details below 👇


Josh Meyer Reposted

I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend! 🧩🔎🟢 meetup.com/fr-FR/paris-nl…


impressive

🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to-date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in…



👀

Looking forward to tomorrow … 👀



Josh Meyer Reposted

Under-appreciated that Moshi (by @kyutai_labs) is a big simplification over more traditional speech-to-speech pipelines. It's really just two models:
- A speech encoder/decoder (like EnCodec)
- An LLM (trained to input and output speech tokens)
Traditionally building something…
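That two-model pipeline reduces to a short loop; a conceptual sketch, where codec and llm are placeholders for the encoder/decoder and the speech-token LLM, not kyutai's actual API:

    def speech_to_speech(waveform, codec, llm):
        in_tokens = codec.encode(waveform)    # speech -> discrete audio tokens
        out_tokens = llm.generate(in_tokens)  # LLM consumes and emits audio tokens
        return codec.decode(out_tokens)       # audio tokens -> speech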


Josh Meyer Reposted

"MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages," Marco Gaido, Sara Papi, Luisa Bentivogli, Alessio Brutti, Mauro Cettolo, Roberto Gretter, Marco Matassoni, Mohamed Nabih, Matteo Negri, ift.tt/visfyaK


Today I let my team know that I'll be leaving Rabbit. I'm immensely grateful to have worked with such a driven team. We made strides in pushing the boundaries of AI in everyday life, and we consistently shipped at high velocity with excellent partners. I want to thank my team…


Josh Meyer Reposted

My key takeaways from the first 17 pages of the Moshi technical report, which details the models and architecture (a thread):


Josh Meyer Reposted

Behold: NeMo ASR now easily runs 2000-6000x faster than realtime (RTFx) on @nvidia GPUs. We developed a series of optimizations to make RNN-T, TDT, and CTC models go brrrrrrr!🔥 In addition to topping the HF Open ASR Leaderboard, they are now fast and cheap. All in pure PyTorch!

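Trying one of these models locally takes a few lines with NeMo; a minimal sketch, assuming the toolkit is installed (the checkpoint name is illustrative):

    import nemo.collections.asr as nemo_asr

    # Load a pretrained TDT model from the hub and transcribe a local file.
    asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")
    transcripts = asr_model.transcribe(["sample.wav"])
    print(transcripts[0])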

Josh Meyer Reposted

New paper: efficient multimodal machine translation training (EMMeTT). It's a milestone on our road to providing everybody with multimodal foundation model training infra. Result: a single multimodal model handles both speech and text translation without loss of NMT performance.


Josh Meyer Reposted

I'm excited to share that Pindo Voice AI is now in beta! After sending 120M+ texts, we found that SMS & USSD are hard to access for many in Africa. @pindoio helps African businesses engage customers in their native languages. Join waitlist - pindo.ai/waitlist


Josh Meyer Reposted

We're releasing updated versions of Command R (35B) and Command R+ (104B). Command R (now with GQA) in particular should perform significantly better multilingually. 🤗 model weights: - ⌘ R 08-2024: huggingface.co/CohereForAI/c4… - ⌘ R+ 08-2024: huggingface.co/CohereForAI/c4…

We’re releasing improved versions of the Command R series, our enterprise-grade AI models optimized for business use cases. You can access them on our API, @awscloud Sagemaker, and additional platforms soon. cohere.com/blog/command-s…
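GQA (grouped-query attention) shares each key/value head across a group of query heads, shrinking the KV cache with little quality loss. A minimal PyTorch sketch, assuming the query head count divides evenly by the KV head count:

    import math
    import torch

    def grouped_query_attention(q, k, v):
        # q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq a multiple of Hkv.
        B, Hq, T, D = q.shape
        group = Hq // k.shape[1]
        # Each KV head serves a group of query heads: repeat KV along heads.
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(D)
        return torch.softmax(scores, dim=-1) @ v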



Josh Meyer Reposted

Hi all, this is the third call for papers for the SynData4GenAI workshop. Good news! While submissions were originally due on June 18th, we've extended the deadline to June 24th. Please submit your papers at syndata4genai.org. We look forward to your submissions!

This is the second call for papers for the SynData4GenAI workshop. Please mark your calendars for the submission due date (June 18, 2024, after the Interspeech acceptance notification)! I'm also pasting the CFP.


