Matthias Minderer @MJLM3 Twitter Profile

Research Scientist at @GoogleResearch.

74 Posts · 486 Followers · 89 Following

Similar User

@janundnik

@barret_zoph

@m__dehghani

@XiaohuaZhai

@noahdgoodman

@DaniYogatama

@PreetumNakkiran

@vansteenkiste_s

@OlivierBachem

@liuzhuang1234

@KL_Div

@SadhikaMalladi

@adam_golinski

@brainshawn

@xiuyu_l

Matthias Minderer Reposted

Daniel Keysers

@keysers

2 Oct

Vision LM inference without accelerators: Gemma.cpp (open source inference for CPU) now supports PaliGemma! If you're at #ECCV2024, @AndreasPSteiner will demo it tomorrow (Thu Oct 3) at 10:30 at the Google booth. Github: github.com/google/gemma.c…

Matthias Minderer Reposted

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن

@ibomohsin

6 Feb

How is next-token prediction capable of such intelligent behavior? I’m very excited to share our work, where we study the fractal structure of language. TLDR: thinking of next-token prediction in language as “word statistics” is a big oversimplification! arxiv.org/abs/2402.01825

Matthias Minderer Reposted

Niels Rogge

@NielsRogge

13 Oct 2023

Excited to share that @Google's OWLv2 model is now available in 🤗 Transformers! This model is one of the strongest zero-shot object detection models out there, improving upon OWL-ViT v1, which was released last year 🔥 How? By self-training on web-scale data of over 1B examples ⬇️

Matthias Minderer Reposted

Thomas Kipf

@tkipf

2 Oct 2023

I'll give a talk on object-centric models for video and 3D at the @ICCVConference Workshop on Large-scale Video Object Segmentation! Today @ 3:30pm (Room S02) Website: youtube-vos.org/challenge/2023/ I'll cover DORSal (see below) & recent work from our team on structured video models.

Thomas Kipf

@tkipf

26 Jun 2023

Excited to announce DORSal: a 3D structured diffusion model for generation and object-level editing of 3D scenes. DORSal is “geometry-free” and learns 3D scene structure purely from data – no expensive volume rendering! 🖥️ sjoerdvansteenkiste.com/dorsal/ 📜 arxiv.org/abs/2306.08068 1/6

Matthias Minderer

@MJLM3

26 Sep 2023

We just open-sourced OWL-ViT v2, our improved open-vocabulary object detector that uses self-training to reach >40% zero-shot LVIS APr. Check out the paper, code, and pretrained checkpoints: arxiv.org/abs/2306.09683 github.com/google-researc…. With @agritsenko and @neilhoulsby

Matthias Minderer

@MJLM3

14 Jul 2023

Check out NaViT, a Vision Transformer that processes images at their native resolution. Apart from improving efficiency and performance of image-level tasks, pretraining at native resolution also produces better backbones for localization tasks like object detection.

AK

@_akhaliq

13 Jul 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution paper page: huggingface.co/papers/2307.06… The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been…

Matthias Minderer Reposted

Aran Komatsuzaki

@arankomatsuzaki

19 Jun 2023

Scaling Open-Vocabulary Object Detection: proposes OWLv2, which achieves SotA open-vocabulary detection already at 10M examples, with further large improvements from scaling to over 1B examples. arxiv.org/abs/2306.09683

Matthias Minderer Reposted

AK

@_akhaliq

19 Jun 2023

Scaling Open-Vocabulary Object Detection paper page: huggingface.co/papers/2306.09… Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can…

Matthias Minderer Reposted

Mostafa Dehghani

@m__dehghani

13 Feb 2023

1/ There is huge headroom for improving the capabilities of our vision models, and given the lessons we've learned from LLMs, scaling is a promising bet. We are introducing ViT-22B, the largest vision backbone reported to date: arxiv.org/abs/2302.05442

Matthias Minderer Reposted

AK

@_akhaliq

16 Dec 2022

FlexiViT: One Model for All Patch Sizes abs: arxiv.org/abs/2212.08013 github: github.com/google-researc…

Matthias Minderer Reposted

Alara Dirik

@alaradirik

24 Nov 2022

Transformers now supports image-guided object detection with OWL-ViT - find similar objects within an image using a query image of your target object. 🔥 Check it out, share it around and let me know what you think! Colab: colab.research.google.com/github/hugging… Demo: huggingface.co/spaces/adirik/…

Matthias Minderer Reposted

Google AI

@GoogleAI

25 Oct 2022

Stop by the Google booth at #ECCV2022 at 3:30 pm today to see a demo presented by Austin Stone, @MJLM3 and @agritsenko about OWL-ViT, a simple and scalable approach for open-vocabulary object detection and image-conditioned detection. Try it yourself at bit.ly/owl-vit-demo.

Matthias Minderer Reposted

Niels Rogge

@NielsRogge

5 Aug 2022

OWL-ViT by @GoogleAI is now available @huggingface Transformers. The model is a minimal extension of CLIP for zero-shot object detection given text queries. 🤯 🥳 It has impressive generalization capabilities and is a great first step for open-vocabulary object detection! (1/2)

Matthias Minderer Reposted

Asiedu Brempong

@asiedubrempong

27 Jun 2022

Announcing decoder denoising pretraining for semantic segmentation: arxiv.org/abs/2205.11423 Take a U-Net, pretrain the encoder on classification, pretrain the decoder on denoising, and fine-tune on semantic segmentation. This achieves a new SoTA on label-efficient segmentation.

Matthias Minderer Reposted

AK

@_akhaliq

13 May 2022

Simple Open-Vocabulary Object Detection with Vision Transformers abs: arxiv.org/abs/2205.06230

Matthias Minderer

@MJLM3

7 Dec 2021

If you are interested in neural network calibration, please join us now at #NeurIPS2021 poster 27728: eventhosts.gather.town/app/DOlsyaA92T…

Matthias Minderer Reposted

Mostafa Dehghani

@m__dehghani

1 Nov 2021

A few weeks ago, we open-sourced SCENIC, a JAX library/codebase that we like a lot, and we wanted to share our joy with the community. GitHub: github.com/google-researc… Paper: arxiv.org/abs/2110.11403