Policy Learning with an Efficient Black-Box Optimization Algorithm

Hwangbo, Jemin; Gehring, Christian; Sommer, Hannes; Siegwart, Roland; Buchli, Jonas

doi:10.1142/s0219843615500292


Journal article

Hwangbo, Jemin Autonomous Systems Lab, ETH Zürich, Switzerland
Gehring, Christian Autonomous Systems Lab, ETH Zürich, Switzerland
Sommer, Hannes Autonomous Systems Lab, ETH Zürich, Switzerland
Siegwart, Roland Autonomous Systems Lab, ETH Zürich, Switzerland
Buchli, Jonas Agile and Dexterous Robotics Lab, ETH Zürich, Switzerland

2015-9-17

Published in:

International Journal of Humanoid Robotics. - World Scientific Pub Co Pte Ltd. - 2015, vol. 12, no. 03, p. 1550029

Robotic learning on real hardware requires an efficient algorithm that minimizes the number of trials needed to learn an optimal policy. Prolonged use of hardware causes wear and tear on the system and demands more attention from an operator. To this end, we present a novel black-box optimization algorithm, Reward Optimization with Compact Kernels and fast natural gradient regression (ROCK⋆). Our algorithm updates its knowledge immediately after each trial and is able to extrapolate in a controlled manner. These features make fast and safe learning on real hardware possible. The performance of our method is evaluated on standard benchmark functions that are commonly used to test optimization algorithms. We also present three robotic optimization examples using ROCK⋆: the first is on a simulated robot arm, the second on a real articulated legged system, and the third on a simulated quadruped robot with 12 actuated joints. ROCK⋆ outperforms the current state-of-the-art algorithms in all tasks, sometimes even by an order of magnitude.
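
The abstract frames policy learning as black-box optimization: the optimizer sees only the scalar outcome of each trial, and the cost of learning is measured in the number of trials. The sketch below illustrates that setting only; it is NOT ROCK⋆, whose compact-kernel regression and natural-gradient update are not described in this record. As a stand-in it uses a (1+1) evolution strategy with the one-fifth success rule, which likewise updates its search state after every single trial. The Rosenbrock function is assumed here as one example of a standard benchmark (the abstract does not say which benchmarks were used), and the function names `rosenbrock` and `one_plus_one_es` are hypothetical.

```python
import math
import random


def rosenbrock(x):
    """Rosenbrock benchmark (assumed example); global minimum 0 at x = (1, ..., 1)."""
    total = 0.0
    for i in range(len(x) - 1):
        d = x[i + 1] - x[i] * x[i]
        e = 1.0 - x[i]
        total += 100.0 * d * d + e * e
    return total


def one_plus_one_es(cost, x0, sigma=0.5, max_trials=2000, seed=0):
    """Stand-in black-box optimizer (NOT ROCK*): a (1+1)-ES with the 1/5th rule.

    Each call to `cost` counts as one trial, and the incumbent solution and
    step size are updated immediately after every trial.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = cost(x)  # the initial evaluation is the first trial
    trials = 1
    while trials < max_trials:
        # Sample one candidate around the current solution.
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = cost(candidate)
        trials += 1
        if fc <= fx:
            # Success: accept the candidate and enlarge the step size.
            x, fx = candidate, fc
            sigma *= math.exp(1.0 / 3.0)
        else:
            # Failure: shrink the step size; the factors balance at a 1/5 success rate.
            sigma *= math.exp(-1.0 / 12.0)
    return x, fx, trials


if __name__ == "__main__":
    best_x, best_cost, used = one_plus_one_es(rosenbrock, [-1.0, 1.0])
    print(best_x, best_cost, used)
```

Comparing optimizers on a fixed trial budget, as above, mirrors the efficiency criterion stated in the abstract. To maximize a reward instead of minimizing a cost, one would pass the negated reward of a trial as `cost`.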

Language

English

Open access status

closed

Identifiers

DOI 10.1142/s0219843615500292
ISSN 0219-8436

Persistent URL

https://sonar.ch/global/documents/272401