SHIV: Reducing Supervisor Burden using Support Vectors to Efficiently Learn Robot Control Policies from Demonstrations

Michael Laskey, Sam Stazak, Wesley Yu-Shu Hsieh, Florian T. Pokorny, Anca D. Dragan, Ken Goldberg
In IEEE ICRA, 2016

Abstract

Online learning from demonstration algorithms, such as DAgger, can learn policies for problems where the system dynamics and the cost function are unknown. However, during learning, they impose a burden on supervisors to respond to queries each time the robot encounters new states while executing its current best policy. Algorithms such as MMD-IL reduce supervisor burden by filtering queries with insufficient discrepancy in distribution and maintaining multiple policies. We introduce the SHIV algorithm (Svm-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary high dimensional state distributions. To facilitate scaling and outlier rejection, filtering is based on distance to an approximate level set boundary defined by a One Class support vector machine. We report on experiments in three contexts: 1) a driving simulator with a 27,936 dimensional visual feature space, 2) a push-grasping in clutter simulation with a 22 dimensional state space, and 3) physical surgical needle insertion with a 16 dimensional state space. Results suggest that SHIV can efficiently learn policies with equivalent performance while requiring up to 70% fewer queries.
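The core filtering idea described in the abstract, querying the supervisor only when the current state falls outside a level set of the previously seen state distribution, can be sketched with a standard One-Class SVM. This is an illustrative sketch using scikit-learn, not the paper's implementation; the feature dimensions, kernel parameters, threshold, and function names here are all assumptions for demonstration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical 2-D features standing in for the paper's
# high-dimensional state spaces (e.g. the 27,936-D visual features).
seen_states = rng.normal(0.0, 1.0, size=(200, 2))

# Fit a One-Class SVM on states visited so far; its decision boundary
# approximates a level set of the state distribution.
ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(seen_states)

def should_query_supervisor(state, threshold=0.0):
    """Query only when the state lies outside the learned level set,
    i.e. its decision score falls below the (assumed) threshold."""
    score = ocsvm.decision_function(state.reshape(1, -1))[0]
    return bool(score < threshold)

familiar = np.array([0.1, -0.2])  # near the training distribution
novel = np.array([4.0, 4.0])      # far outside it

print(should_query_supervisor(familiar))  # no query needed
print(should_query_supervisor(novel))     # novel state: ask supervisor
```

States with scores above the threshold are considered sufficiently familiar for the current policy to act without supervision, which is how the query reduction in the experiments is achieved.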


Bibtex

@inproceedings{laskey2016a,
  title     = {SHIV: Reducing Supervisor Burden using Support Vectors to Efficiently Learn Robot Control Policies from Demonstrations},
  author    = {Laskey, Michael and Stazak, Sam and Hsieh, Wesley Yu-Shu and Pokorny, Florian T. and Dragan, Anca D. and Goldberg, Ken},
  booktitle = {IEEE ICRA},
  url       = {http://goldberg.berkeley.edu/pubs/icra16-submitted-SHIV.pdf},
  year      = {2016},
}