Event

Talk on The surprising efficacy of “ungrounded” models for image and video understanding, generation, and humanoid locomotion

Title 

The surprising efficacy of “ungrounded” models for image and video understanding, generation, and humanoid locomotion 

 

Abstract 

Recently released open-source text LLMs have provided significant leverage towards multimodal perception, via lightweight fusion with learned visual representations, or even, somewhat paradoxically, as a unimodal source of knowledge in another domain. I’ll cover our recent work exploring this premise, including recent advances toward a modern form of visual routines (a.k.a. visual programming), methods for recursive explainable visual question answering, an approach to multimodal gesture animation, and image and video generation with LLM-constrained diffusion models. As time permits, I’ll also discuss advances in large-scale next-token prediction models for vision and humanoid locomotion.

 

Bio

Prof. Darrell is on the faculty of the CS and EE Divisions of the EECS Department at UC Berkeley. He founded and co-leads the Berkeley Artificial Intelligence Research (BAIR) lab, the Berkeley DeepDrive (BDD) Industrial Consortia, and the recently launched BAIR Commons program in partnership with Facebook, Google, Microsoft, Amazon, and other partners. He also was Faculty Director of the PATH research center at UC Berkeley, and led the Vision group at the UC-affiliated International Computer Science Institute in Berkeley from 2008 to 2014. Prior to that, Prof. Darrell was on the faculty of the MIT EECS department from 1999 to 2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996 to 1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988.

Darrell’s group develops algorithms for large-scale perceptual learning, including object and activity recognition and detection, for a variety of applications including autonomous vehicles, media search, and multimodal interaction with robots and mobile devices. His areas of interest include computer vision, machine learning, natural language processing, and perception-based human-computer interfaces.

Prof. Darrell also co-founded and serves as President of Prompt AI. Darrell is an advisor to several other ventures, including SafelyYou, Grabango, Nexar, and SuperAnnotate. Previously, Darrell advised Pinterest, Tyzx (acquired by Intel), IQ Engines (acquired by Yahoo), Koozoo, BotSquare/Flutter (acquired by Google), MetaMind (acquired by Salesforce), Trendage, Center Stage, KiwiBot, WaveOne, and DeepScale. Darrell has also served as an expert witness for patent litigation relating to computer vision.

Visit Prof. Darrell’s website for more information.