
Linjie (Lindsey) Li

@LINJIEFUN

researching @Microsoft, @UW, contributing to https://t.co/a3zper7NJG

Similar Users

Zhe Gan (@zhegan4)
Jialu Li (@JialuLi96)
Yining Hong (@yining_hong)
Tianlong Chen (@TianlongChen4)
Gedas Bertasius (@gberta227)
Freda Shi (@fredahshi)
Xin Eric Wang (@xwang_lk)
Li Dong (@donglixp)
Tao Yu (@taoyds)
Wanrong Zhu (@ZhuWanrong)
Jianwei Yang (@jw2yang4ai)
Fangyu Liu (@hardy_qr)
Chunyuan Li (@ChunyuanLi)
Shoubin Yu (@shoubin621)
Rui Zhang @ EMNLP 2024 (@ruizhang_nlp)

Pinned

Sorry to leave out one important detail on this job posting. The research area is multimodal understanding and generation.

We are hiring full-time/part-time research interns all year round. If you are interested, please send your resume to linjli@microsoft.com



Linjie (Lindsey) Li Reposted

🚀🚀Excited to introduce GenXD: Generating Any 3D and 4D Scenes! A joint framework for general 3D and 4D generation, supporting both object-level and scene-level generation. Project Page: gen-x-d.github.io Arxiv: arxiv.org/abs/2411.02319


Linjie (Lindsey) Li Reposted

🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain! 🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.…


Linjie (Lindsey) Li Reposted

🌟NEW Benchmark Release Alert🌟 We introduce 📚MMIE, a knowledge-intensive benchmark to evaluate interleaved multimodal comprehension and generation in LVLMs, comprising 20K+ examples across 12 fields and 102 subfields. 🔗 Explore MMIE here: mmie-benchmark.github.io


Linjie (Lindsey) Li Reposted

[6/N] Led by: @richardxp888, @lillianwei423, @StephenQS0710, and nice collab w/ @LINJIEFUN, @dingmyu and others. - Paper: arxiv.org/pdf/2410.10139 - Project page: mmie-bench.github.io


Linjie (Lindsey) Li Reposted

I am attending #COLM2024 in Philly! Will present our paper “List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs” on Monday morning ⏰ Come and chat if you are interested in multimodal LLMs, synthetic data and training recipes!


Linjie (Lindsey) Li Reposted

This example makes me doubt whether large models like GPT-4o truly have intelligence. Q: Which iron ball will land first, A or B? GPT-4o: Both will land at the same time. Me: ??? This example is from our proposed benchmark MM-Vet v2. Paper: huggingface.co/papers/2408.00… Code & data:…


Linjie (Lindsey) Li Reposted

💻We live in the digital era, where screens (PC/Phone) are integral to our lives. 🧐Curious about how AI assistants can help with computer tasks? 🚀Check out the latest progress in the repo: github.com/showlab/Awesom… ✨A collection of up-to-date GUI-related papers and resources.


Recordings of all talks are now available on YouTube and Bilibili. Links are updated on our website. Enjoy!

Interested in vision foundation models like GPT-4o and Sora? Come and join us at our CVPR2024 tutorial on “Recent Advances in Vision Foundation Models” tomorrow (6/17, 9:00 AM-5:00 PM) in Summit 437-439, Seattle Convention Center. Website: vlp-tutorial.github.io #cvpr2024



Afternoon session is starting! Join us in person or online via Zoom. For more information, visit vlp-tutorial.github.io



Try joining us online via Zoom if you cannot get into the room. For more information, visit vlp-tutorial.github.io



Happening now at Summit 437-439. For more information, visit vlp-tutorial.github.io



Linjie (Lindsey) Li Reposted

Thanks for sharing! ❓Can AI assistants recreate these animation effects in *PowerPoint*? 🆕We present VideoGUI -- A Benchmark for GUI Automation from Instructional Videos 👉Check it out at showlab.github.io/videogui/

VideoGUI: A Benchmark for GUI Automation from Instructional Videos. Presents a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks like Photoshop and video editing. abs: arxiv.org/abs/2406.10227 proj: showlab.github.io/videogui/



Come join us in Summit 437-439 tomorrow (6/17 9:00AM-5:00PM)! We are excited to host our 5th tutorial “Recent Advances in Vision Foundation Models”!




Linjie (Lindsey) Li Reposted

🌟Thrilled to introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding, towards evaluating the "world modeling" capabilities in Multimodal LLMs. 🔥 🔍 Key Features of MMWorld: - Multi-discipline: 7 disciplines, Art🎨 & Sports🥎,…


MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos. Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities,…



Linjie (Lindsey) Li Reposted

The deadline for paper submission is approaching: **15 Mar 2024**. Join us if you are interested in the emerging prompting-based paradigm. @_amirbar @liuziwei7 @YGandelsman @SharonYixuanLi @hyojinbahng @LINJIEFUN @amirgloberson @zhang_yuanhan @BoLi68567011 @JingkangY

#CVPR2024 Please consider submitting your work to our workshop on Prompting in Vision (Track on Emerging Topics). More details at prompting-in-vision.github.io/index_cvpr24.h…


