Oxford Robotics Institute
Using AI to Advance Robot Learning
Published 2 May 2023
Efficient Skill Acquisition for Complex Manipulation Tasks in Obstructed Environments
Project Background
The ability of robots to learn new skills with limited supervision is essential for maximising their up-time and productivity. For example, small-batch manufacturing - where only a limited number of parts are produced - is an ideal use case that would greatly benefit from efficient skill acquisition. In such a setting, a robot must learn to manipulate new objects efficiently in potentially obstructed environments. However, existing methods such as motion planning (MP) and reinforcement learning (RL) struggle to satisfy the requirements of this capability.
MP [1-3] can generate collision-free paths that guide a robot safely through obstructed environments, provided a detailed state of the environment and the desired outcome are specified. However, MP is not designed for cases where complex manipulation is required or interaction with the environment is necessary. On the other hand, RL has shown promising results in controlling a robot for complex manipulation tasks such as grasping and insertion [5-7], although previous studies have largely focussed on simulated environments [8]. Combining MP and RL has been investigated in the past [4, 9] and demonstrates the potential of leveraging the strengths of both methods to solve manipulation tasks in obstructed environments, but the resulting models have required re-training for each new target object.
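For illustration, sampling-based planners such as those cited in [1-3] grow a tree or graph of collision-free configurations towards a goal. The following minimal 2-D sketch shows the idea; the point-robot world, circular obstacle and step size are purely illustrative assumptions, not the planner used in this project.

```python
# Minimal illustrative RRT sketch (not the ORI planner). The collision check is a
# coarse endpoint-only test for brevity.
import random, math

def rrt(start, goal, collision_free, iters=2000, step=0.05):
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # Sample a random point, biased towards the goal 10% of the time.
        sample = goal if random.random() < 0.1 else (random.random(), random.random())
        # Extend the nearest tree node a small step towards the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not collision_free(near, new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < step:            # goal reached: backtrack the path
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

# Example: unit square with a circular obstacle of radius 0.15 at (0.5, 0.5)
free = lambda a, b: math.dist(b, (0.5, 0.5)) > 0.15
print(rrt((0.1, 0.1), (0.9, 0.9), free))
```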
Project Approach
The ORI team proposed a system that leverages an object-centric generative model (OCGM) [10] alongside MP and RL techniques. The OCGM is pre-trained on a range of diverse synthetic scenes; given a single human demonstration identifying the target object, it can then robustly re-identify that object in new scenes by matching against its object-centric representation. MP is used to generate a collision-free path to the target object while avoiding obstacles, a skill transition network bridges the gap between the terminal states of MP and feasible start states of the RL policy, and a sample-efficient learned RL policy is then executed to complete the complex manipulation task.
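A minimal sketch of this modular pipeline is shown below. The class and function names, and the NumPy placeholders standing in for the real perception, planning and control components, are illustrative assumptions rather than the ORI codebase.

```python
# Hypothetical sketch of the OCGM -> MP -> skill transition -> RL pipeline.
import numpy as np

class OCGM:
    """Object-centric model stub: re-identifies the demonstrated target object."""
    def __init__(self, demo_image: np.ndarray, demo_mask: np.ndarray):
        # Store an object-centric representation extracted from the single demonstration.
        self.target_embedding = self._encode(demo_image * demo_mask[..., None])

    def _encode(self, patch: np.ndarray) -> np.ndarray:
        return patch.mean(axis=(0, 1))           # placeholder for the learned encoder

    def locate_target(self, scene_image: np.ndarray, object_masks: list) -> np.ndarray:
        # Match each segmented object against the stored representation and return
        # the goal position (here simply the 2-D centroid of the best match).
        scores = [-np.linalg.norm(self._encode(scene_image * m[..., None]) - self.target_embedding)
                  for m in object_masks]
        best = object_masks[int(np.argmax(scores))]
        ys, xs = np.nonzero(best)
        return np.array([xs.mean(), ys.mean()])

def plan_collision_free_path(start, goal, obstacles):
    # Placeholder for a sampling-based motion planner (e.g. an RRT variant).
    return np.linspace(start, goal, num=10)

def transition_to_rl_start(mp_terminal_state):
    # Skill transition network: maps the planner's terminal state to a feasible
    # start state for the RL policy (placeholder: identity).
    return mp_terminal_state

def rl_policy(state):
    # Learned insertion policy (placeholder: small corrective action).
    return -0.1 * state

def run_episode(demo_image, demo_mask, scene_image, object_masks, robot_xy, obstacles):
    ocgm = OCGM(demo_image, demo_mask)
    goal = ocgm.locate_target(scene_image, object_masks)        # one-shot goal specification
    path = plan_collision_free_path(robot_xy, goal, obstacles)  # obstacle-aware approach
    state = transition_to_rl_start(path[-1])                    # bridge MP -> RL
    for _ in range(50):                                         # closed-loop RL execution
        state = state + rl_policy(state)
    return state
```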
The project involved three distinct elements: firstly, the proposal of a system for efficient skill acquisition in obstructed environments that leverages an OCGM for object-agnostic, one-shot goal specification; secondly, showing that the OCGM-based one-shot goal specification method achieves accuracy comparable to several goal identification baselines; and finally, demonstrating that the system performs significantly better in real-world environments than competing baselines, including a state-of-the-art RL algorithm.
The ORI team chose to evaluate its framework on four assembly tasks commonly found in industry: connections for VGA, RJ45, E-model and USB-A. Each socket is attached to a mount of varying size and colour to demonstrate the versatility and efficiency of the one-shot goal specification using an OCGM.
Project Results
Real-world industrial assembly tasks were carried out in obstructed environments, with the OCGM used to specify a goal for MP, followed by the skill transition network and the learned RL policy.
As previously highlighted, the performance of the ORI approach was benchmarked against a state-of-the-art RL algorithm and four MP-based baselines, including an ablation of the ORI method without the skill transition network. The evaluated approaches were as follows:
• Soft Actor Critic (SAC) [8] - a state-of-the-art RL algorithm, trained with 25 demonstrations using the FERM framework [6]
• MP + Demonstration Replay - replaces execution of the learned RL policy in the ORI method with replay of a single expert demonstration, inspired by previous work [11]
• MP + BC - replaces the learned RL policy in the ORI method with Behaviour Cloning (BC) [12, 13], trained from 25 demonstrations (a minimal BC sketch follows this list)
• MP + Heuristic - uses a manually designed heuristic policy [9] instead of the learned RL policy in the ORI method to solve the task
• MP + RL (w/o) - the ORI method without the skill transition network, included for comparison
• MP + RL (ORI) - the full ORI method, including the skill transition network
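For context, the MP + BC baseline relies on behaviour cloning, i.e. supervised regression of expert actions from demonstration states. The sketch below is purely illustrative; the network size, state/action dimensions and random placeholder data are assumptions, not the ORI setup.

```python
# Illustrative behaviour-cloning sketch with placeholder data.
import torch
import torch.nn as nn

# Placeholder demonstration data: 25 demos x 100 timesteps, 8-D states and 4-D actions.
states = torch.randn(25 * 100, 8)
actions = torch.randn(25 * 100, 4)

# Small MLP policy regressed onto the expert actions.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    loss = nn.functional.mse_loss(policy(states), actions)  # imitate the expert
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```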
The table below shows the success rate of each approach, calculated over 30 trials with a 95% confidence interval.
Success Rates

| Method | VGA | RJ45 | E-model | USB-A |
|---|---|---|---|---|
| SAC | 0.0% | 0.0% | 0.0% | 0.0% |
| MP + Demonstration Replay | 3.3% | 0.0% | 3.3% | 0.0% |
| MP + BC | 16.7% | 16.7% | 23.3% | 26.7% |
| MP + Heuristic | 10.0% | 16.7% | 36.7% | 43.3% |
| MP + RL (w/o) | 73.3% | 46.7% | 80.0% | 70.0% |
| MP + RL (ORI) | 86.7% | 83.3% | 93.3% | 96.7% |
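As an aside, a 95% confidence interval for a success rate estimated from 30 binary trials can be computed as sketched below. The Wilson score interval is used here purely for illustration; the white paper does not state which interval estimator was applied.

```python
# Wilson score interval for a binomial success rate (illustrative, not the paper's method).
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return centre - half, centre + half

# Example: 26/30 successes (~86.7%, the VGA result for MP + RL (ORI))
print(wilson_interval(26, 30))  # roughly (0.70, 0.95)
```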
Conclusions
The experimental results show that the ORI method for one-shot goal identification provides accuracy competitive with other baseline approaches, and that the modular framework outperforms competitive baselines, including a state-of-the-art RL algorithm, by a significant margin for complex manipulation tasks in obstructed environments. In addition, the method successfully solves real-world industrial insertion tasks in obstructed environments from a small number of demonstrations. In future work the team plans to investigate more advanced settings, such as randomising the socket orientation.
Further information on these robotic experiments and the results can be found by accessing the full white paper here.
References
[1] N. M. Amato and Y. Wu, “A randomized roadmap method for path and manipulation planning,” in IEEE International Conference on Robotics and Automation, 1996
[2] S. M. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” Computer Science Department, Iowa State University, Tech. Rep. TR 98-11, 1998
[3] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011
[4] M. A. Lee, C. Florensa, J. Tremblay, N. Ratliff, A. Garg, F. Ramos, and D. Fox, “Guided uncertainty-aware policy optimization: Combining learning and model-based strategies for sample-efficient policy learning,” IEEE International Conference on Robotics and Automation, 2020
[5] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke et al., “Scalable deep reinforcement learning for vision-based robotic manipulation,” in Conference on Robot Learning, 2018, pp. 651–673
[6] A. Zhan, P. Zhao, L. Pinto, P. Abbeel, and M. Laskin, “A framework for efficient robotic manipulation,” arXiv preprint arXiv:2012.07975, 2020
[7] J. Luo, O. Sushkov, R. Pevceviciute, W. Lian, C. Su, M. Vecerik, N. Ye, S. Schaal, and J. Scholz, “Robust multi-modal policies for industrial assembly via reinforcement learning and demonstrations: A large-scale study,” arXiv preprint arXiv:2103.11512, 2021
[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning, 2018
[9] J. Yamada, Y. Lee, G. Salhotra, K. Pertsch, M. Pflueger, G. S. Sukhatme, J. J. Lim, and P. Englert, “Motion planner augmented reinforcement learning for obstructed environments,” in Conference on Robot Learning, 2020
[10] Y. Wu, O. P. Jones, M. Engelcke, and I. Posner, “Apex: Unsupervised, object-centric scene segmentation and tracking for robot manipulation,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 3375–3382
[11] E. Johns, “Coarse-to-fine imitation learning: Robot manipulation from a single demonstration,” in IEEE International Conference on Robotics and Automation (ICRA), 2021
[12] T. Zhang, Z. McCarthy, O. Jow, D. Lee, X. Chen, K. Goldberg, and P. Abbeel, “Deep imitation learning for complex manipulation tasks from virtual reality teleoperation,” in IEEE International Conference on Robotics and Automation, 2018, pp. 5628–5635
[13] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016
The Scan Partnership
The Scan AI team supports ORI projects by providing a hardware cluster of six NVIDIA DGX A100 servers combined with a multi-GPU NVIDIA RTX 6000 server and an AI-optimised NVMe all-flash storage array. This cluster is overlaid with Run:AI software to virtualise the GPU pool across the DGX nodes, maximising utilisation and providing a mechanism for scheduling and allocating ORI workflows across the combined GPU resource. This infrastructure is delivered to the ORI team over the Scan Cloud platform and is hosted in a secure UK datacentre.
‘The Scan clusters have been incredibly useful for my research, which required a significant amount of computational resources. Additionally, the Scan team has consistently offered prompt and helpful support whenever I had any issues or questions. Overall, it has been a fantastic experience using Scan.’
Shaohong Zhong, DPhil student at the Oxford Robotics Institute