Parsa Mahmoudieh

I currently work at the intersection of Reinforcement Learning, NLP, and Computer Vision research at Google DeepMind in Mountain View. My team's latest work on LearnLM is substantially preferred by expert raters across a diverse set of learning scenarios, with average preference strengths of 31% over GPT-4o, 11% over Claude 3.5, and 13% over the Gemini 1.5 Pro model that LearnLM was based on.

We integrated our latest work into the SFT, RM, and RL stages of Gemini 2.0. Our previous iteration of LearnLM was presented at Google I/O 2024 here! My main role has been co-leading the RM & RL stages for LearnLM.

Previously, I received my CS PhD at UC Berkeley in BAIR, advised by Trevor Darrell and mentored by Deepak Pathak and Evan Shelhamer. I have also worked with Pulkit Agrawal, Alyosha Efros, and Jitendra Malik.

Before grad school, I double-majored in EECS and MechE at UC Berkeley and did undergraduate research in Ron Fearing's Robotics lab. I've also had the pleasure of interning at GM and Ford.

Email / Google Scholar / LinkedIn / PhD Thesis

Research

LearnLM: Improving Gemini for Learning
One of the core contributors, 2024
[Blog] [Tech Report] [arXiv]

Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
One of the core contributors, Google I/O 2024
[Blog] [Tech Report] [arXiv]

Zero-Shot Reward Specification via Grounded Natural Language
Parsa Mahmoudieh, Deepak Pathak, Trevor Darrell
International Conference on Machine Learning (ICML), 2022 (Spotlight talk)

Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills
Parsa Mahmoudieh, Trevor Darrell, Deepak Pathak
International Conference on Learning Representations (ICLR) Workshop, 2020

Zero-Shot Visual Imitation
Deepak Pathak*, Parsa Mahmoudieh*, Guanghao Luo*, Pulkit Agrawal*, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell (* equal contribution)
International Conference on Learning Representations (ICLR), 2018 (Oral Presentation)

Loss is its own Reward: Self-Supervision for Reinforcement Learning
Evan Shelhamer, Parsa Mahmoudieh, Max Argus, Trevor Darrell
International Conference on Learning Representations (ICLR) Workshop, 2017

Modeling and Control of an Ornithopter for Diving
Cameron J. Rose, Parsa Mahmoudieh, Ronald S. Fearing
International Conference on Intelligent Robots and Systems (IROS), 2016

Coordinated Launching of an Ornithopter with a Hexapedal Robot
Cameron J. Rose, Parsa Mahmoudieh, Ronald S. Fearing
International Conference on Robotics and Automation (ICRA), 2015

Template from Jon Barron