@MJLM3 Profile picture

Matthias Minderer

@MJLM3

Research Scientist at @GoogleResearch.

Similar User
Jannik Kossen photo

@janundnik

Barret Zoph photo

@barret_zoph

Mostafa Dehghani photo

@m__dehghani

Xiaohua Zhai photo

@XiaohuaZhai

noahdgoodman photo

@noahdgoodman

Dani Yogatama photo

@DaniYogatama

Preetum Nakkiran photo

@PreetumNakkiran

Sjoerd van Steenkiste photo

@vansteenkiste_s

Olivier Bachem photo

@OlivierBachem

Zhuang Liu photo

@liuzhuang1234

Ke Li 🍁 photo

@KL_Div

Sadhika Malladi photo

@SadhikaMalladi

Adam Golinski  photo

@adam_golinski

Xiao Wang photo

@brainshawn

Xiuyu Li photo

@xiuyu_l

Matthias Minderer Reposted

Vision LM inference without accelerators: Gemma.cpp (open source inference for CPU) now supports PaliGemma! If you're at #ECCV2024, @AndreasPSteiner will demo it tomorrow (Thu Oct 3) at 10:30 at the Google booth. Github: github.com/google/gemma.c…


Matthias Minderer Reposted

How is next-token prediction capable of such intelligent behavior? I’m very excited to share our work, where we study the fractal structure of language. TLDR: thinking of next-token prediction in language as “word statistics” is a big oversimplification! arxiv.org/abs/2402.01825

Tweet Image 1

Matthias Minderer Reposted

Excited to share that @Google's OWLv2 model is now available in 🤗 Transformers! This model is one of the strongest zero-shot object detection models out there, improving upon OWL-ViT v1 which was released last year🔥 How? By self-training on web-scale data of over 1B examples⬇️

Tweet Image 1

Matthias Minderer Reposted

I'll give a talk on object-centric models for video and 3D at the @ICCVConference Workshop on Large-scale Video Object Segmentation! Today @ 3:30pm (Room S02) Website: youtube-vos.org/challenge/2023/ I'll cover DORSal (see below) & recent work from our team on structured video models.

Excited to announce DORSal: a 3D structured diffusion model for generation and object-level editing of 3D scenes. DORSal is “geometry-free” and learns 3D scene structure purely from data – no expensive volume rendering! 🖥️ sjoerdvansteenkiste.com/dorsal/ 📜 arxiv.org/abs/2306.08068 1/6



We just open-sourced OWL-ViT v2, our improved open-vocabulary object detector that uses self-training to reach >40% zero-shot LVIS APr. Check out the paper, code, and pretrained checkpoints: arxiv.org/abs/2306.09683 github.com/google-researc…. With @agritsenko and @neilhoulsby

Tweet Image 1

Check out NaViT, a Vision Transformer that processes images at their native resolution. Apart from improving efficiency and performance of image-level tasks, pretraining at native resolution also produces better backbones for localization tasks like object detection.

Tweet Image 1

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution paper page: huggingface.co/papers/2307.06… The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been…

Tweet Image 1


Matthias Minderer Reposted

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution paper page: huggingface.co/papers/2307.06… The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been…

Tweet Image 1

Matthias Minderer Reposted

Scaling Open-Vocabulary Object Detection Proposes OWLv2, which achieves SotA open-vocabulary detection already at 10M examples and further large improvement by scaling to over 1B examples. arxiv.org/abs/2306.09683

Tweet Image 1

Matthias Minderer Reposted

Scaling Open-Vocabulary Object Detection paper page: huggingface.co/papers/2306.09… Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can…

Tweet Image 1

Matthias Minderer Reposted

1/ There is a huge headroom for improving capabilities of our vision models and given the lessons we've learned from LLMs, scaling is a promising bet. We are introducing ViT-22B, the largest vision backbone reported to date: arxiv.org/abs/2302.05442

Tweet Image 1

Matthias Minderer Reposted

FlexiViT: One Model for All Patch Sizes abs: arxiv.org/abs/2212.08013 github: github.com/google-researc…

Tweet Image 1

Matthias Minderer Reposted

Transformers now supports image-guided object detection with OWL-ViT - find similar objects within an image using a query image of your target object. 🔥 Check it out, share it around and let me know what you think! Colab: colab.research.google.com/github/hugging… Demo: huggingface.co/spaces/adirik/…

Tweet Image 1

Matthias Minderer Reposted

Stop by the Google booth at #ECCV2022 at 3:30 pm today to see a demo presented by Austin Stone, @MJLM3 and @agritsenko about OWL-ViT, a simple and scalable approach for open-vocabulary object detection and image-conditioned detection. Try it yourself at bit.ly/owl-vit-demo.


Matthias Minderer Reposted

OWL-ViT by @GoogleAI is now available @huggingface Transformers. The model is a minimal extension of CLIP for zero-shot object detection given text queries. 🤯 🥳 It has impressive generalization capabilities and is a great first step for open-vocabulary object detection! (1/2)


Matthias Minderer Reposted

Announcing decoder denoising pretraining for semantic segmentation: arxiv.org/abs/2205.11423 Take a U-Net, pretrain the encoder on classification, pretrain the decoder on denoising, and fine-tune on semantic segmentation. This achieves a new SoTA on label efficient segmentation.


Matthias Minderer Reposted

Simple Open-Vocabulary Object Detection with Vision Transformers abs: arxiv.org/abs/2205.06230

Tweet Image 1

If you are interested in neural network calibration, please join us now at #NeurIPS2021 poster 27728: eventhosts.gather.town/app/DOlsyaA92T…


Matthias Minderer Reposted

A few weeks ago, we open-sourced SCENIC, a JAX library/codebase that we like it a lot and wanted to share our joy with the community. GitHub: github.com/google-researc… Paper: arxiv.org/abs/2110.11403

Tweet Image 1

Matthias Minderer Reposted

SCENIC: A JAX Library for Computer Vision Research and Beyond abs: arxiv.org/abs/2110.11403 github: github.com/google-researc…

Tweet Image 1

Loading...

Something went wrong.


Something went wrong.