Please use this identifier to cite or link to this item:
Title: UR robot manipulator collision avoidance via reinforcement learning
Authors: Ding, Yuxin
Keywords: Engineering::Electrical and electronic engineering::Control and instrumentation::Robotics
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Ding, Y. (2021). UR robot manipulator collision avoidance via reinforcement learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152895
Project: ISM-DISS-02244
Abstract: With the development of intelligent technology, robots are increasingly expected to complete complex tasks, which demands greater robustness in complex environments and places new requirements on robots' self-adaptive ability. Deep reinforcement learning has recently become a research hot spot. This thesis mainly studies the motion planning of multi-joint robot manipulators. First, traditional planning methods based on random sampling are explained, and simulations are carried out in ROS to compare the performance of four such algorithms. The advantage of sampling-based methods is their high search speed; however, the solution is not optimal, and the path found may differ between runs. Next, the basic ideas and methods of reinforcement learning are introduced. Combining reinforcement learning with deep learning, a target network and an experience replay buffer are introduced to reduce the correlation between sampled data. Based on the policy gradient, the actor network of DDPG can choose actions over continuous intervals. The PPO algorithm minimizes the impact of policy changes on the agent's learning by limiting the magnitude of each update from the previous policy to the new one. Finally, a simulation platform is built on MuJoCo, and algorithms based on DDPG and PPO are applied to a UR5 robot manipulator. Both DDPG and PPO achieve near-optimal decision making for each joint of the manipulator and complete motion planning to the target point with obstacle avoidance, and the motion path of each joint is smooth. Experimental results show that DDPG learns more efficiently in continuous space, while the hyperparameters of the PPO algorithm are easier to determine and the algorithm is more straightforward to implement.
URI: https://hdl.handle.net/10356/152895
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
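The experience replay buffer mentioned in the abstract — storing transitions and sampling them uniformly at random to break the correlation between consecutive samples — can be sketched as follows. This is a minimal illustration, not the thesis's implementation; the class name and transition tuple layout are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done)
    transitions. Uniform random sampling decorrelates the minibatches
    used to train the DDPG critic, as opposed to learning from
    consecutive (highly correlated) environment steps."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest transition
        # once the buffer is full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement within the batch
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a DDPG loop one would `push` each environment step and periodically `sample` a minibatch to update the critic against the slowly-updated target network.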
Appears in Collections: EEE Theses
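The abstract's description of PPO — limiting how far each update moves the new policy from the old one — corresponds to the standard clipped surrogate objective. The sketch below, using NumPy with an assumed clip range `eps=0.2`, illustrates the mechanism only; it is not code from the thesis.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s), one entry per sample
    advantage -- estimated advantage per sample
    eps       -- clip range bounding how far the policy may move
    """
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive
    # to push the policy far from the previous one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum makes the objective a pessimistic bound,
    # so overly large policy changes are never rewarded.
    return np.mean(np.minimum(unclipped, clipped))
```

For a sample with ratio 1.5 and positive advantage, the clipped term caps the contribution at 1.2 times the advantage, which is what keeps PPO updates stable without the second-order machinery of trust-region methods.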
Updated on Oct 15, 2021
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.