
Tobias Weyand

@0xtob

Researcher / Software Engineer @GoogleDeepMind working on video understanding.

Joined June 2008
Similar Users

3D vision fanatic. Professor @cornell_tech & Researcher @GoogleDeepmind. He or they. https://t.co/m7Rs5xUFfG

@Jimantha

Associate Professor at CTU in Prague - Visual Recognition Group. Computer Vision Researcher

@giotolias

Computer Vision & Machine Learning researcher @naverlabseurope
she/her - more active there 🦋

@dlarlus

Research Scientist at ByteDance
Prev: PhD in CS from RWTH Aachen, MSc. from TUM

@aliathar94

Building something new. ex-Google Gemini, Meta, Stanford, IIT-B

@pararths

Assistant Professor of Computer Science at the University of British Columbia

@kwangmoo_yi

Computer Vision researcher, academic

@SattlerTorsten

I am a Professor of Computer Science at EPFL in Switzerland. My main research interests are in Computer Vision, Machine Learning, and Biomedical imaging.

@FuaPv

senior researcher @MSFTResearch | AI for biomedicine |
instructor @MITDeepLearning | biophysics @Harvard | alumna @MIT

@avapamini

Postdoc @ Real Virtual Humans, University of Tübingen, Germany @uni_tue.
Previously @RWTHVisionLab.

@Istvan_Sarandi

Senior research scientist @GoogleDeepMind @GoogleAI. PhD from @Oxford_VGG, before that @Cambridge_Uni. 🇮🇳 🇬🇧 🇺🇸. she/her

@NagraniArsha

CS, Math @UofT |
Research ML, Vision @UofT, Vector |
RT @kubernetesio 1.26-9

@rishit_dagli

Assistant Professor at @cs_cornell and @cornell_tech

@ElorHadar

Assistant Prof. in Machine Learning at University of Glasgow. I research computer vision/graphics, deep generative models, ML for sciences & healthcare

@pmh47_ml

Senior Research Scientist @ Microsoft. Previously @ETH, @Inria, @ENS_ULM.

@mihaidusmanu

Excited to share Long-Video Masked Autoencoder (LVMAE), which our team just published at @NeurIPSConf! We boost the context length of video models using an adaptive decoder and a dual-masking strategy, and achieve SotA on several video benchmarks. Paper: arxiv.org/abs/2411.13683

Training video understanding models on longer contexts is computationally intensive. To address this, we present a novel approach that reduces the computational load while also improving the quality of the learned representations. More at: goo.gle/4fW5aIc
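The tweet mentions a dual-masking strategy for reducing compute, but doesn't spell out the scheme here. As a toy sketch of the general dual-masking idea in masked autoencoders (the function name and ratios below are made up for illustration, not the paper's actual configuration): the encoder sees only a small random subset of tokens, and the decoder reconstructs only a random subset of the remaining masked tokens rather than all of them.

```python
import numpy as np

def dual_mask(num_tokens, encoder_keep_ratio=0.25, decoder_target_ratio=0.5, seed=0):
    """Toy dual-masking for a masked autoencoder over a long token sequence.

    Returns the indices of tokens the encoder processes and the (disjoint)
    subset of masked tokens the decoder is asked to reconstruct.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_tokens)
    n_visible = int(num_tokens * encoder_keep_ratio)
    visible = perm[:n_visible]            # small subset fed to the encoder
    masked = perm[n_visible:]             # tokens hidden from the encoder
    n_targets = int(len(masked) * decoder_target_ratio)
    targets = rng.permutation(masked)[:n_targets]  # decoder reconstructs only these
    return visible, targets

visible, targets = dual_mask(num_tokens=1024)
print(len(visible), len(targets))  # 256 visible tokens, 384 reconstruction targets
```

Because both the encoder input and the decoder's reconstruction targets are subsampled, the cost of both stages shrinks, which is what makes longer video contexts affordable.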



Thank you @JeffDean, very much appreciate the boost! This is really a team effort with my amazing colleagues @NagraniArsha, Mingda Zhang, @raminia, Rachel Hornung, @nitesh_ai, @under_fitting, Austin Meyers, @zhouxy2017, @BoqingGo, @CordeliaSchmid, @sirotenko_m, @ZhuZhu66595

A nice new benchmark for long video understanding by Tobias Weyand @0xtob and others. This is likely to be one of the new frontiers of capabilities for large-scale multimodal models, and it's great to have a new benchmark to assess others in this area.



Excited that our work on Long video understanding is being featured by @GoogleAI !

Can #AI truly understand long videos? Tobias Weyand & the Google Research team are testing the limits w/ Neptune, an open-source benchmark for long video understanding. Dive into the details & see how AI tackles temporal reasoning, cause & effect, & more → goo.gle/4esTTNM



The other day I let my kids talk to Gemini live. Today my 3 year old asked my 6 year old: "Can you tell me a joke?" - 6 year old: "Sorry, I'm just a language model."


Excited to share what our team has been working on! With expanding context lengths, frontier models are able to process longer and longer videos. But how well do they really understand them? Today we release Neptune, a challenging benchmark for long video understanding.

Datasets for evaluation of long video understanding are rare. So with this in mind, today we describe Neptune, an open-source evaluation dataset that includes multiple-choice and open-ended questions for videos of variable lengths up to 15 minutes. More → goo.gle/3B41nZV



New long video understanding benchmark from my colleagues @GoogleDeepMind pushing LLMs to their limits!

Can current LLMs solve video reasoning questions like: over 1 hour, when does the camera holder go down stairs? Watch the teaser: can you distinguish going up vs. down stairs? (P.S. the stairs are not visible when you go down.) youtu.be/Ddgvr4OReL4 Hour-Long PerceptionTest VQA @eccvconf



Tobias Weyand Reposted

Congratulations to the authors of "VideoPoet: A Large Language Model for Zero-Shot Video Generation" for winning one of this year's @icmlconf Best Paper Awards! #ICML2024 Paper: openreview.net/forum?id=LRkJw… Blog post: goo.gle/4atanoj


Tobias Weyand Reposted

Computer Vision conferences' acceptance criteria these days: #CVPR2024 #ECCV2024 #AI #ComputerVision


Tobias Weyand Reposted

Introducing VideoPrism, a single model for general-purpose video understanding that can handle a wide range of tasks, including classification, localization, retrieval, captioning and question answering. Learn how it works at goo.gle/49ltEXW


New work from my colleagues: NeRF without the need for SfM to obtain camera poses!

Presenting MELON, a technique that can determine object-centric camera poses entirely from scratch while reconstructing the object in 3D. MELON can easily be integrated into existing NeRF methods and requires as few as 4–6 images of an object. Learn more →…



My 5yo daughter is already coming up with image generation prompts to test generalization beyond the training data: "Unicorn kitty in space", "Princess astronaut". Or maybe she's just asking me to print coloring pages for her, idk.


Tobias Weyand Reposted

Introducing SANPO, a multi-attribute video dataset for outdoor human egocentric scene understanding composed of both real-world and synthetic data, including depth maps and video panoptic masks with a wide variety of semantic class labels. Read more → goo.gle/3ZISInU


Tobias Weyand Reposted

I just spent 3 days with dear friends, all of whom have kids ages 8mo to 4y. Something I need to get off my chest about being a parent of young kids and the culture we live in:


Tobias Weyand Reposted

New work from our team. We studied how various video foundation models perform on different benchmarks and with different adaptation methods.

VideoGLUE: Video General Understanding Evaluation of Foundation Models paper page: huggingface.co/papers/2307.03… We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action…



Very clear and concise tutorial on transformers

My Transformer tutorial slides are now available at lucasb.eyer.be/transformer I'll append recordings to this thread as I get them. If you want to use some of the slides for your lecture, you may, as long as you credit me. If you'd like me to give the lecture: maybe; e-mail me.



Tobias Weyand Reposted

Our team is looking for student researchers to work on foundation video models. You'll work with @BoqingGo and @0xtob. DM if you're interested!


Tobias Weyand Reposted

The Universal Image Embedding Challenge (kaggle.com/competitions/g…) of our @eccvconf Instance-Level Recognition workshop (ilr-workshop.github.io/ECCVW2022/) is online now! The workshop is co-organized by @0xtob and @giotolias among others.


Tobias Weyand Reposted

🖼️The Met Dataset: a large-scale dataset for instance-level recognition in the artwork domain. Consists of 400k images from more than 224k classes. It can be used for research in few-shot learning, self-supervised and supervised contrastive learning. paperswithcode.com/dataset/met

