My current research interests include reinforcement learning and its application to motor control problems. Part of this research aims to reduce the variance of policy gradient estimates by reasoning about an agent’s sensor data. Improving these gradient estimates allows us to build more efficient learning algorithms, which we use to quickly learn effective controllers for a simulated dart-throwing problem and a simulated quadruped locomotion problem.
I was an active member of Black Graduate Engineering and Science Students (BGESS) when I was in graduate school.
Refereed Publications
- Gregory Lawrence and Stuart Russell. Improving Gradient Estimation by Incorporating Sensor Data. In Proceedings of the Twenty-Fourth International Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, 2008.
Abstract
The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the presence of observable input noise applied to the control signal. The first method extends the idea of a reinforcement baseline by fitting a local model to the response function whose gradient is being estimated; we show how to find the response surface model that minimizes the variance of the gradient estimate, and how to estimate the model from data. The second method improves this further by discounting components of the gradient vector that have high variance. These methods are applied to the problem of motor control learning, where actuator noise has a significant influence on behavior. In particular, we apply the techniques to learn locally optimal controllers for a dart-throwing task using a simulated three-link arm; we demonstrate that the proposed methods significantly improve the response function gradient estimate and, consequently, the learning curve, over existing methods.
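To give a concrete picture of the first technique described above, the sketch below fits a local linear response-surface model to noisy trial returns and compares its slope against a plain likelihood-ratio gradient estimate. It is a minimal illustration under made-up assumptions (the response function, noise scales, and parameter values are invented), not the estimator derived in the paper.

```python
import numpy as np

# Minimal sketch of variance reduction via a fitted response-surface model.
# The response function, noise scales, and parameter values below are
# illustrative assumptions, not the paper's experimental setup.

rng = np.random.default_rng(0)

def response(u):
    """Noisy scalar return observed after executing control parameters u."""
    return -np.sum((u - 1.0) ** 2) + 0.5 * rng.normal()  # quadratic response + noise

theta = np.zeros(3)   # current policy parameters
sigma = 0.1           # std of the observable exploration noise on the controls
n_trials = 200

eps = sigma * rng.normal(size=(n_trials, len(theta)))    # observed input noise
returns = np.array([response(theta + e) for e in eps])   # one trial per row

# Plain likelihood-ratio (weight-perturbation) gradient estimate.
g_plain = (returns[:, None] * eps).mean(axis=0) / sigma**2

# Fit a local linear model  R ~ c + eps @ w  to the observed responses; its slope
# gives a much lower-variance estimate of the same gradient, because the fit
# explains away the constant offset and much of the trial-to-trial variation.
X = np.hstack([np.ones((n_trials, 1)), eps])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
g_surface = coef[1:]

print("plain LR estimate:   ", g_plain)
print("response-surface fit:", g_surface)
print("true gradient:       ", -2.0 * (theta - 1.0))
```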
- Gregory Lawrence, Noah Cowan, and Stuart Russell. Efficient Gradient Estimation for Motor Control Learning. In Proceedings of the Nineteenth International Conference on Uncertainty in Artificial Intelligence, Acapulco, Mexico, 2003.
Abstract
An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
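The variance-reduction argument in this abstract is, at heart, a control-variate calculation. The lines below spell it out in generic notation (R for the observed return, S for a zero-mean sensor-derived correction, and pi_theta for the policy); the symbols are illustrative and are not taken from the paper itself.

```latex
% Generic control-variate calculation (illustrative notation, not the paper's).
% R: observed return;  S: zero-mean correction computed from sensor data.
\[
\operatorname{Var}(R - cS)
  = \operatorname{Var}(R) - 2c\,\operatorname{Cov}(R,S) + c^{2}\operatorname{Var}(S),
\qquad
c^{*} = \frac{\operatorname{Cov}(R,S)}{\operatorname{Var}(S)},
\qquad
\operatorname{Var}(R - c^{*}S) = \bigl(1 - \rho_{R,S}^{2}\bigr)\operatorname{Var}(R).
\]

% The corrected estimator stays unbiased as long as the correction is
% uncorrelated with the agent's choice of action:
\[
\mathbb{E}\!\left[(R - c^{*}S)\,\nabla_{\theta}\log\pi_{\theta}(a)\right]
  = \mathbb{E}\!\left[R\,\nabla_{\theta}\log\pi_{\theta}(a)\right]
\quad\text{whenever}\quad
\mathbb{E}\!\left[S\,\nabla_{\theta}\log\pi_{\theta}(a)\right] = 0.
\]
```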
Other Publications
- Gregory Lawrence, Aurora Skarra-Gallagher, and Zhichen Xu. Leveraging Social Connections Improves the Performance of a Recommendation System for Yahoo! Web Applications. In Yahoo! Tech Pulse, Santa Clara, CA, 2010.
- Mark A. Paskin and Gregory Lawrence. Junction Tree Algorithms for Solving Sparse Linear Systems. Technical Report UCB/CSD-03-1271, University of California, Berkeley, 2003.
Abstract
In this technical report we demonstrate how message passing on a junction tree can be used to efficiently solve a linear system Ax = b when A is an n × n sparse matrix. The method’s running time and space requirements scale with n and with w, the treewidth of A’s sparsity graph.
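As a toy illustration of the idea (not code from the report), the sketch below handles the simplest case: a symmetric tridiagonal system, whose sparsity graph is a chain and whose junction tree is a path. The forward pass sends one elimination “message” per edge and the backward pass reads off the solution; the function name and test values are assumptions.

```python
import numpy as np

# Sketch: junction tree message passing on the simplest case, a chain-structured
# (tridiagonal) system A x = b.  Each forward message eliminates one variable,
# exactly as Gaussian elimination restricted to the chain; the backward pass
# recovers the solution.  Values below are illustrative only.

def solve_tridiagonal(diag, off, b):
    """Solve A x = b for symmetric tridiagonal A with main diagonal `diag`
    and first off-diagonal `off` (length n-1)."""
    n = len(diag)
    d = diag.astype(float).copy()
    rhs = b.astype(float).copy()
    # Forward pass: message from node i to node i+1 eliminates x_i.
    for i in range(n - 1):
        m = off[i] / d[i]
        d[i + 1] -= m * off[i]
        rhs[i + 1] -= m * rhs[i]
    # Backward pass: recover x from the root back down the chain.
    x = np.zeros(n)
    x[-1] = rhs[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - off[i] * x[i + 1]) / d[i]
    return x

diag = np.array([4.0, 4.0, 4.0, 4.0])
off = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 2.0, 3.0, 4.0])

x = solve_tridiagonal(diag, off, b)
A = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
print(x, np.allclose(A @ x, b))
```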
Dissertation
- Gregory Lawrence. Efficient Motor Control Learning. Ph.D. dissertation, University of California, Berkeley, 2009.
Abstract
There are many challenges to learning optimal motor control. These challenges include noisy environments and sensors, nonlinear dynamics, continuous variables, high-dimensional problem domains, and redundancy. Reinforcement learning can be used, in principle, to find optimal controllers; however, the traditional learning algorithms are often too slow because obtaining training data is expensive. Although policy gradient methods have shown some promising results, they are limited by the rate at which they can estimate the gradient of the objective function with respect to a given policy’s parameters. These algorithms typically estimate the gradient from a number of policy trials. In the noisy setting, however, many policy trials may be necessary to achieve a desired level of performance. This dissertation presents techniques that may be used to minimize the total number of trials required.
The main difficulty arises because each policy trial returns a noisy estimate of the performance measure. As a result, we have noisy gradient estimates. One source of noise is caused by the use of randomized policies (often used for exploration purposes). We use response surface models to predict the effect that this noise has on the observed performance. This allows us to reduce the variance of the gradient estimates, and we derive expressions for the minimal-variance model for a variety of problem settings. Other sources of noise come from the environment and from the agent’s actuators. Sensor data, which partially measures the effect of this noise, can be used to explain away the noise-induced perturbations in the expected performance. We show how to incorporate the sensor information into the gradient estimation task, further reducing the variance of the gradient estimates. In addition, we show that useful sensor encodings have the following properties: the sensor data is uncorrelated with the agent’s choice of action and the sensor data is correlated with the perturbations in performance. Finally, we demonstrate the effectiveness of our approach by learning controllers for a simulated dart thrower and quadruped locomotion task.
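The two properties claimed above for useful sensor encodings can be checked numerically on a toy model. The sketch below (synthetic quantities throughout; the noise model and names are assumptions, not the dissertation’s simulator) corrects likelihood-ratio gradient samples with a sensor channel used as a control variate: a channel that measures the environmental disturbance lowers the variance without biasing the estimate, while one that measures the agent’s own exploration noise corrupts it.

```python
import numpy as np

# Toy check of the two properties of useful sensor encodings (synthetic data;
# an assumption-laden illustration, not the dissertation's experiments).

rng = np.random.default_rng(1)
n = 100_000

eps = rng.normal(size=n)     # exploration noise added to the action (unit variance)
wind = rng.normal(size=n)    # environment/actuator disturbance outside the agent's control

# Observed per-trial return depends on both the action perturbation and the disturbance.
returns = 2.0 * eps - 5.0 * wind + 0.3 * rng.normal(size=n)

def gradient_samples(returns, eps, sensor=None):
    """Likelihood-ratio-style gradient samples, optionally corrected by a sensor
    channel used as a control variate with the variance-optimal coefficient."""
    r = returns
    if sensor is not None:
        cov = np.cov(returns, sensor)
        c = cov[0, 1] / cov[1, 1]
        r = returns - c * (sensor - sensor.mean())
    return r * eps   # true gradient of E[return] w.r.t. the action mean is 2.0 here

good_sensor = wind + 0.1 * rng.normal(size=n)  # correlates with the disturbance, not the action
bad_sensor = eps + 0.1 * rng.normal(size=n)    # correlates with the agent's own exploration

for name, s in [("no sensor", None), ("good sensor", good_sensor), ("bad sensor", bad_sensor)]:
    g = gradient_samples(returns, eps, s)
    print(f"{name:12s}  mean = {g.mean():+.2f}   std = {g.std():.2f}")
```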
Workshops
- Gregory Lawrence. Improving Gradient Estimation by Incorporating Sensor Data. NIPS Workshop on Robotics Challenges for Machine Learning, Whistler, B.C., Canada, 2007.
Panels
- Panel Member, “Defining and Sustaining Quality Mentoring.” Richard Tapia Conference, 2003.