Ansh Shah @baymax3009 Twitter Profile

Ansh Shah

@baymax3009

Research Associate at RRC, IIIT Hyderabad. Previously undergrad at BITS Pilani | Interested in Robot Learning

131 Posts · 52 Followers · 746 Following

Ansh Shah Reposted

Ansh Shah

@baymax3009

15 Nov

Check out @binghao_huang's new touch sensor from CoRL24!

Want to use tactile sensing but not familiar with hardware? No worries! Just follow the steps, and you’ll have a high-resolution tactile sensor ready in 30 mins! It’s as simple as making a sandwich! 🥪 🎥 YouTube Tutorial: youtube.com/watch?v=8eTpFY… 🛠️ Open Source & Hardware…

Ansh Shah Reposted

Ansh Shah

@baymax3009

7 Nov

Excited to finally share Generative Value Learning (GVL), my @GoogleDeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+…

Ansh Shah Reposted

Ansh Shah

@baymax3009

7 Nov

Our seminal paper (yes, I do believe this is transformative to the field) "Spatial Cognition from Egocentric Video: Out of Sight not Out of Mind" is accepted @3DVconf #3DV2025 Camera ready soon Congrats 2 great coauthors @plizzari38126 @goelshbhm Toby @JacobChalkie @akanazawa

Dima Damen

@dimadamen

9 Apr

🆕on ArXiv Out of Sight, Not Out of Mind Spatial Cognition from Egocentric Video. dimadamen.github.io/OSNOM/ arxiv.org/abs/2404.05072 3D tracking active objects using observations captured through egocentric camera. Objects are tracked while in hand, from cupboards and into drawers

Ansh Shah Reposted

Ansh Shah

@baymax3009

31 Oct

Wrote a blog post on using image and video diffusion models to "draw actions".
Summary:
- LLMs can model arbitrary sequences, diffusion models can generate arbitrary patterns
- Images can serve as a common format across modalities like vision, audio, actions
Link below

Ansh Shah Reposted

Ansh Shah

@baymax3009

31 Oct

Very happy to start sharing our work at Pi 🤖❤️
- a 3B pre-trained generalist model trained on 8+ robot platforms
- a post-training recipe that allows robots to do dexterous, long-horizon tasks
physicalintelligence.company/blog/pi0
What's exciting isn't laundry, but the recipe: a short 🧵

Ansh Shah Reposted

Ansh Shah

@baymax3009

30 Oct

Not every foundation model needs to be gigantic. We trained a 1.5M-parameter neural network to control the body of a humanoid robot. It takes a lot of subconscious processing for us humans to walk, maintain balance, and maneuver our arms and legs into desired positions. We…

Ansh Shah Reposted

Ansh Shah

@baymax3009

29 Oct

Diffusion-based approach beats autoregressive models at solving puzzles and planning 🤖 Original Problem: Autoregressive LLMs struggle with complex reasoning and long-term planning tasks despite their impressive capabilities. They have inherent difficulties maintaining global…

Ansh Shah Reposted

Ansh Shah

@baymax3009

29 Oct

Last Sunday, we competed in the Vision Assistance Race at the @cybathlon 2024—the "cyber Olympics" designed to push the boundaries of assistive technology. In this race, our system guided a blind participant through everyday tasks such as walking along a sidewalk, sorting colors,…

Ansh Shah Reposted

Ansh Shah

@baymax3009

29 Oct

How do we represent 3D world knowledge for spatial intelligence in next-generation robots? We recently wrote an extensive survey paper on this emerging topic, covering recent state-of-the-art! 🦾 🚀 Check it out below. Feedback/Suggestions welcome! 📖arXiv:…

Ansh Shah Reposted

Ansh Shah

@baymax3009

25 Oct

📢 Excited to share our new paper with @fabreetseo: "Beyond Position: How Rotary Embeddings Shape Representations and Memory in Autoregressive Transformers"! arxiv.org/abs/2410.18067 Keep reading to find out how RoPE affects Transformer models beyond just positional encoding 🧵

Ansh Shah Reposted

Ansh Shah

@baymax3009

24 Oct

Pretraining can transform RL, but it might need rethinking how to pretrain with RL on unlabeled data to bootstrap downstream exploration. In our new work, we show how to accomplish this with unsupervised skills and exploration.

Qiyang Li

@qiyang_li

24 Oct

Latest work on leveraging prior trajectory data with *no* reward label to accelerate online RL exploration! Our method leverages our prior work (ExPLORe) and skill pretraining to achieve better sample efficiency on a range of sparse-reward tasks than all prior approaches!

Ansh Shah Reposted

Ansh Shah

@baymax3009

17 Oct

Sequence models have skyrocketed in popularity for their ability to analyze data & predict what to do next. MIT’s "Diffusion Forcing" method combines the strengths of next-token prediction (like w/ChatGPT) & video diffusion (like w/Sora), training neural networks to handle…

Ansh Shah Reposted

Ansh Shah

@baymax3009

14 Oct

state space models are super neat & interesting, but i have never seen any evidence that they’re *smarter* than transformers - only more efficient any architectural innovation that doesn’t advance the pareto frontier of intelligence-per-parameter is an offramp on the road to AGI

Ansh Shah Reposted

Ansh Shah

@baymax3009

14 Oct

Sirui's new work presents a nice system design with a user-friendly interface, for data collection without a robot. Collecting robot data without robots but with humans only is the right way to go.

Sirui Chen

@eric_srchen

14 Oct

How can we collect high-quality robot data without teleoperation? AR can help! Introducing ARCap, a fully open-sourced AR solution for collecting cross-embodiment robot data (gripper and dex hand) directly using human hands. 🌐:stanford-tml.github.io/ARCap/ 📜:arxiv.org/abs/2410.08464

Ansh Shah Reposted

Ansh Shah

@baymax3009

13 Oct

Mechazilla has caught the Super Heavy booster!

Ansh Shah Reposted

Ansh Shah

@baymax3009

8 Oct

#NobelPrize2024

Ansh Shah Reposted

Ansh Shah

@baymax3009

6 Oct

A perfect real-world example of equivariance haha

Olexandr Isayev 🇺🇦🇺🇸

@olexandr

5 Oct

Cats are invariant under SO(3) transformations! 😼

Ansh Shah Reposted

Ansh Shah

@baymax3009

3 Oct

The 3D vision community really hates the bitter lesson. Dust3r is what you get when you take the lesson seriously.

Ansh Shah Reposted

Ansh Shah

@baymax3009

1 Oct

The paper contains many ablation studies on various ways to use the LLM backbone 👇🏻 🦩 Flamingo-like cross-attention (NVLM-X) 🌋 Llava-like concatenation of image and text embeddings to a decoder-only model (NVLM-D) ✨ a hybrid architecture (NVLM-H)