Title: UR robot manipulator collision avoidance via reinforcement learning
Authors: Ding, Yuxin
Keywords: Engineering::Electrical and electronic engineering::Control and instrumentation::Robotics
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Ding, Y. (2021). UR robot manipulator collision avoidance via reinforcement learning. Master's thesis, Nanyang Technological University, Singapore.
Project: ISM-DISS-02244
Abstract: As intelligent technology develops, robots are being asked to complete increasingly complex tasks, which demands greater stability in complex environments and places new requirements on their self-adaptive ability. Deep reinforcement learning has recently become a research hot spot. This thesis studies motion planning for multi-dimensional robot manipulators. First, traditional planning methods based on random sampling are explained, and simulations are carried out in ROS to compare the performance of four different algorithms. Sampling-based methods search quickly; however, the solution is not optimal, and the path found may differ from one run to the next. Then the basic ideas and methods of reinforcement learning are introduced. When reinforcement learning is combined with deep learning, a target network and an experience replay buffer are introduced to reduce the correlation between sampled data. Based on the policy gradient, the actor network of DDPG can choose actions over continuous intervals. The PPO algorithm minimizes the impact of policy changes on the agent's learning results by controlling the magnitude of the update from the previous policy to the new policy. Finally, a simulation platform is built on MuJoCo, and algorithms based on DDPG and PPO are applied to a UR5 robot manipulator. Both DDPG and PPO achieve relatively optimal decision making for each joint of the manipulator and complete motion planning to the target point with obstacle avoidance, while the motion path of each joint is smooth. Experimental results show that DDPG learns more efficiently in continuous space, whereas the hyperparameters of PPO are easier to determine and the algorithm is more intuitive to implement.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
File: DingYuxin_dissertation_final version.pdf
Description: Restricted Access
Size: 5.07 MB
Format: Adobe PDF

Updated on Dec 7, 2021

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.