Hey, I'm Bowen! I am a research scientist on OpenAI's Multi-Agent Team. I am interested in environments that allow for unbounded learning, multi-agent reinforcement learning and social dilemmas, and generalization to unseen environments (e.g. simulation to reality). While at OpenAI I've worked on emergence from multi-agent autocurricula, state estimation from vision, attention-based network architectures for reinforcement learning, and most recently multi-agent social dilemma games. I have an M.Eng. in Electrical Engineering and Computer Science with a focus in AI from MIT, and also a B.S. in EECS and Physics from MIT. During my master's I worked on neural network architecture search, a subfield at the intersection of meta-modeling and hyperparameter optimization.

Find me on scholar, github, or linkedin!


Excited to release my work on emergent reciprocity and team formation in reinforcement learning agents published at NeurIPS this year! See paper here.

Very excited to release results on agents learning progressively more complex strategy and tool use from multi-agent hide-and-seek; see blog/paper and some cool press coverage in Financial Times, Vox, MIT Tech Review, Tech Crunch, IEEE, New Scientist, Venture Beat, and Sync.

I had an awesome year working with the robotics team at OpenAI. I’m transitioning to the Multi-Agent team working closely with Igor Mordatch and others!

Happy to help the robotics team publish exciting results on in-hand manipulation on a real robot! See blog/paper.

I finished my M.Eng. at MIT! My master's thesis, Towards Practical Neural Network Meta-Modeling, was awarded the Second Place Charles and Jennifer Johnson Computer Science MEng Thesis Award from MIT's EECS department.

I’ve accepted a full time role on OpenAI’s robotics team as a research scientist!

I’ve joined OpenAI’s robotics team as a research scientist intern!

I submitted my thesis to complete my M.Eng. degree at MIT! Find it here.

I gave a presentation for the Boston Machine Learning meetup on practical CNN meta-modeling. Slides here.

We’ve finally released the MetaQNN Code! Find it here.

I gave a presentation at Google Research - Cambridge on practical CNN meta-modeling. Slides here.

I gave a presentation for the MIT Vision Group on CNN meta-modeling. Slides here.


Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

Bowen Baker

Neural Information Processing Systems, 2020

Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments. However, the real world is not zero-sum nor does it have fixed teams; humans face numerous social dilemmas and must learn when to cooperate and when to compete. To successfully deploy agents into the human world, it may be important that they be able to understand and help in our conflicts. Unfortunately, selfish MARL agents typically fail when faced with social dilemmas. In this work, we show evidence of emergent direct reciprocity, indirect reciprocity and reputation, and team formation when training agents with randomized uncertain social preferences (RUSP), a novel environment augmentation that expands the distribution of environments agents play in. RUSP is generic and scalable; it can be applied to any multi-agent environment without changing the original underlying game dynamics or objectives. In particular, we show that with RUSP these behaviors can emerge and lead to higher social welfare equilibria in both classic abstract social dilemmas like Iterated Prisoner's Dilemma as well as in more complex intertemporal environments.
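
The core idea of the augmentation can be sketched in a few lines. Below is a minimal, illustrative reconstruction (not the paper's actual implementation): each agent's training reward becomes a randomized convex combination of all agents' environment rewards, and each agent observes only a noisy estimate of its own sampled preference weights. The helper name, the self-interest bias term, and the noise model are assumptions for illustration.

```python
import random

def rusp_rewards(env_rewards, coupling_strength=0.5, noise_std=0.1, rng=random):
    """Illustrative RUSP-style reward relabeling (hypothetical helper).

    Each agent's training reward is a randomized convex combination of its
    own environment reward and the other agents' rewards; agents only see
    a noisy estimate of the sampled social-preference weights. A smaller
    coupling_strength biases agents toward pure self-interest.
    """
    n = len(env_rewards)
    prefs = []
    for i in range(n):
        # Random weights over all agents, biased toward agent i itself,
        # normalized into a convex combination.
        w = [rng.random() for _ in range(n)]
        w[i] += 1.0 / coupling_strength
        s = sum(w)
        prefs.append([x / s for x in w])
    # Shaped reward: weighted sum of everyone's environment reward.
    shaped = [sum(p * r for p, r in zip(prefs[i], env_rewards)) for i in range(n)]
    # Uncertainty: each agent observes only a noisy view of its preference row.
    noisy_obs = [[p + rng.gauss(0.0, noise_std) for p in prefs[i]] for i in range(n)]
    return shaped, noisy_obs
```

Because the relabeling only mixes existing rewards and appends observations, it leaves the underlying game dynamics untouched, consistent with the claim above.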

Emergent Tool Use from Multi-Agent Autocurricula

Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch

International Conference on Learning Representations, 2020

Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.

Learning Dexterous In-Hand Manipulation

OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, et al.

The International Journal of Robotics Research 39 (1), 3-20

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five.
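
The domain randomization step mentioned above can be sketched compactly. The snippet below is a hypothetical illustration of the idea, not the system's actual code: each physics parameter of the simulator is rescaled by a random factor before every training episode, so the policy must work across many plausible simulators rather than one. The parameter names and ranges are assumptions.

```python
import random

def randomize_physics(base_params, rng=random):
    """Sample a randomized variant of simulator physics parameters.

    A minimal sketch of domain randomization: each scalar parameter is
    scaled by a random multiplicative factor so that a policy trained
    across many sampled simulators is more likely to transfer to the
    real robot. Names and ranges here are illustrative.
    """
    ranges = {
        "friction": (0.7, 1.3),       # multiplicative range per parameter
        "object_mass": (0.8, 1.2),
        "actuator_gain": (0.9, 1.1),
    }
    randomized = {}
    for name, value in base_params.items():
        lo, hi = ranges.get(name, (0.9, 1.1))  # default range for unlisted params
        randomized[name] = value * rng.uniform(lo, hi)
    return randomized

# One freshly randomized simulator instance per training episode.
sim = randomize_physics({"friction": 1.0, "object_mass": 0.05, "actuator_gain": 2.0})
```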

Towards Practical Neural Network Meta-Modeling

Bowen Baker

MIT Master of Engineering Thesis

This thesis largely expounds the work presented in [Designing Neural Network Architectures Using Reinforcement Learning] and in [Practical Neural Network Performance Prediction for Early Stopping]. We present all the material described in these papers, as well as some updated results. Notably, after re-analyzing the MetaQNN models, we found that MetaQNN was actually able to achieve 4.7% error on CIFAR-10, a new record for models with only standard convolution and pooling layers. We also present some brief work on visualizing varying architectures and an improved algorithm for speeding up Hyperband.

Accelerating Neural Architecture Search using Performance Prediction

Bowen Baker*, Otkrist Gupta*, Ramesh Raskar, and Nikhil Naik

NIPS Meta Learning Workshop 2017

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains speedups of up to 6x in both hyperparameter optimization and meta-modeling. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit-based search methods. Through extensive experimentation, we empirically show our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations.
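
To make the mechanism concrete, here is a deliberately tiny frequentist stand-in for the paper's regression models: fit a simple linear predictor from one learning-curve feature (the last observed validation accuracy) to final accuracy, then stop any configuration whose predicted final accuracy does not beat the current best. The real method uses much richer architecture, hyperparameter, and time-series features; the feature choice and helper names below are illustrative assumptions.

```python
def fit_final_acc_predictor(partial_last_vals, final_vals):
    """Fit y = a*x + b predicting final validation accuracy from the last
    observed accuracy of a partially trained model (closed-form simple
    linear regression; a minimal sketch of the paper's approach)."""
    n = len(partial_last_vals)
    mx = sum(partial_last_vals) / n
    my = sum(final_vals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(partial_last_vals, final_vals))
    var = sum((x - mx) ** 2 for x in partial_last_vals)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def should_early_stop(predict, partial_last_val, best_final_so_far, margin=0.0):
    """Stop training a configuration if its predicted final accuracy does
    not beat the best completed model by at least `margin`."""
    return predict(partial_last_val) < best_final_so_far + margin
```

The speedup comes from terminating unpromising configurations early: compute is spent finishing only the runs whose partial curves predict a competitive final score.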

Designing Neural Network Architectures Using Reinforcement Learning

Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar

International Conference on Learning Representations, 2017

At present, designing convolutional neural network (CNN) architectures requires both human expertise and labor. New architectures are handcrafted by careful experimentation or modified from a handful of existing networks. We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task. The learning agent is trained to sequentially choose CNN layers using Q-learning with an ϵ-greedy exploration strategy and experience replay. The agent explores a large but finite space of possible architectures and iteratively discovers designs with improved performance on the learning task. On image classification benchmarks, the agent-designed networks (consisting of only standard convolution, pooling, and fully-connected layers) beat existing networks designed with the same layer types and are competitive against the state-of-the-art methods that use more complex layer types. We also outperform existing meta-modeling approaches for network design on image classification tasks.
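
The sequential layer-selection loop described above can be sketched as a toy Q-learning search. This is a hypothetical miniature, not the MetaQNN implementation: the layer vocabulary, fixed depth, annealing schedule, and the stubbed `evaluate` (which stands in for actually training a CNN and measuring validation accuracy) are all illustrative assumptions.

```python
import random

# Illustrative state/action space: state = current depth, action = layer type.
LAYERS = ["conv3x3", "conv5x5", "maxpool", "fc", "softmax"]
MAX_DEPTH = 4

def evaluate(arch):
    # Stub for "train the sampled CNN and return validation accuracy".
    return 0.5 + 0.1 * arch.count("conv3x3")

def sample_arch(Q, eps, rng=random):
    """epsilon-greedy rollout: pick a layer at each depth."""
    arch = []
    for depth in range(MAX_DEPTH):
        if rng.random() < eps:
            action = rng.choice(LAYERS)                       # explore
        else:
            action = max(LAYERS, key=lambda a: Q[(depth, a)]) # exploit
        arch.append(action)
    return arch

def metaqnn_search(iters=200, eps=1.0, alpha=0.1, rng=random):
    Q = {(d, a): 0.5 for d in range(MAX_DEPTH) for a in LAYERS}
    best = None
    for _ in range(iters):
        arch = sample_arch(Q, eps, rng)
        reward = evaluate(arch)
        # Every layer choice in the trajectory shares the terminal reward.
        for depth, action in enumerate(arch):
            Q[(depth, action)] += alpha * (reward - Q[(depth, action)])
        if best is None or reward > best[1]:
            best = (arch, reward)
        eps = max(0.1, eps * 0.98)  # anneal exploration toward greedy
    return best
```

With this toy reward, the agent quickly concentrates on conv3x3-heavy architectures; the real system additionally uses experience replay and trains each sampled network to get its reward.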

Determining the resolution limits of electron-beam lithography: direct measurement of the point-spread function

Vitor R. Manfrinato, Jianguo Wen, Lihua Zhang, Yujia Yang, Richard G. Hobbs, Bowen Baker, Dong Su, Dmitri Zakharov, Nestor J. Zaluzec, Dean J. Miller, Eric A. Stach, and Karl K. Berggren

Nano letters 14, no. 8 (2014): 4406-4412.

One challenge existing since the invention of electron-beam lithography (EBL) is understanding the exposure mechanisms that limit the resolution of EBL. To overcome this challenge, we need to understand the spatial distribution of energy density deposited in the resist, that is, the point-spread function (PSF). During EBL exposure, the processes of electron scattering, phonon, photon, plasmon, and electron emission in the resist are combined, which complicates the analysis of the EBL PSF. Here, we show the measurement of delocalized energy transfer in EBL exposure by using chromatic aberration-corrected energy-filtered transmission electron microscopy (EFTEM) at the sub-10 nm scale. We have defined the role of spot size, electron scattering, secondary electrons, and volume plasmons in the lithographic PSF by performing EFTEM, momentum-resolved electron energy loss spectroscopy (EELS), sub-10 nm EBL, and Monte Carlo simulations. We expect that these results will enable alternative ways to improve the resolution limit of EBL. Furthermore, our approach to study the resolution limits of EBL may be applied to other lithographic techniques where electrons also play a key role in resist exposure, such as ion-beam-, X-ray-, and extreme-ultraviolet lithography.



I am a co-founder at Perch. We are an early-stage weight room analytics startup and went through the MIT delta v accelerator last summer (2016). I work on machine vision, rep tracking algorithms, and most other aspects of the product back-end.


I was a Data Science Intern at Quora in the summer of 2015. I worked on identifying and fixing categorically misused topics, improving automated topic labeling, and exploring topic geometries. I also helped in creating metric dashboards, responding to company data inquiries, and fixing bugs in data logging.


I was a Data Science Intern at AgilOne during my 2014 summer break. I created a framework for validating customer data before running the machine learning models. On top of this, I built a deployment framework that would automatically select features to use and initialize models for new customers. I also did some minor work on the product front end.


Kinect 2-Chain

The Kinect 2-Chain was a project I worked on for HackMIT 2015. The goal of the project was to aid the visually impaired in navigation. We used a Kinect 2 to map the space in front of the user and send stereo audio signals with varying pitch to indicate the direction and distance of obstacles. We also used a deep learning API so that the user could request a spoken description of the scene in front of them. We took 2nd place overall and also won the Microsoft prize; some news coverage can be found here.
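
The distance-and-direction-to-audio mapping could look something like the sketch below. This is a reconstruction of the idea rather than the project's actual code: the base pitch, the pitch range, the sensor range, and the constant-power panning law are all assumptions.

```python
import math

def obstacle_to_audio(distance_m, angle_deg, max_range_m=4.5):
    """Map an obstacle's distance and bearing to a stereo audio cue.

    Closer obstacles produce a higher pitch; left/right channel gains
    encode direction via constant-power panning. All constants here
    are illustrative, not the project's actual values.
    """
    # Pitch rises from 220 Hz (at/beyond max range) toward 880 Hz (touching).
    closeness = max(0.0, 1.0 - min(distance_m, max_range_m) / max_range_m)
    pitch_hz = 220.0 + 660.0 * closeness
    # Pan: -90 deg = hard left, +90 deg = hard right.
    pan = max(-1.0, min(1.0, angle_deg / 90.0))
    left_gain = math.cos((pan + 1.0) * math.pi / 4.0)
    right_gain = math.sin((pan + 1.0) * math.pi / 4.0)
    return pitch_hz, left_gain, right_gain
```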

MIT Robotics Team

I co-founded the MIT Robotics Team in late 2013. I led the software team for 2 years, during which we placed 2nd in the 2014 NASA RASC-AL ROBO-OPS Competition and competed in the 2015 NASA Sample Return Centennial Challenge.