Title: Depth map generation : depth estimation from images
Authors: Zhao, Yukai
Keywords: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Zhao, Y. (2021). Depth map generation : depth estimation from images. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Depth information is an important part of the 3D structure of a scene. Accurate depth information helps us understand the scene and is useful in applications such as semantic segmentation, simultaneous localization and mapping (SLAM), autonomous driving, and 3D modeling. Traditional methods mostly estimate depth from binocular or multi-view images. The most common approach is stereo matching, which uses triangulation to estimate scene depth from the images, but it is easily affected by the diversity of scenes and is computationally expensive. Monocular images place lower requirements on equipment and environment, so depth estimation from monocular images is closer to practical conditions and suits a wider range of application scenarios. With the breakthrough of deep learning, convolutional neural networks (CNNs) have demonstrated outstanding performance on depth estimation. State-of-the-art methods indicate that monocular depth estimation requires less memory and less computation time while retaining relatively good accuracy, and it has therefore attracted growing attention. Our objective is to build a monocular depth estimation model based on deep learning. In this dissertation, three published models, BTSNet, BANet, and ViP-DeepLab, are investigated and implemented. Our BTSNet implementation achieves almost the same results as the original paper; its local plane guidance (LPG) layer retains local detail and restores object boundaries. Because detailed information is lacking on the depth to space (D2S) layer that BANet uses to obtain the full-resolution feature map, we propose a new model that replaces the D2S layer with the outputs of the LPG layer. The model is trained on the KITTI dataset and achieves better performance than state-of-the-art methods.
Inspired by the LPG layer in BTSNet, a new upsampling layer, called the LPG upsampling layer, is proposed. It aims to preserve detailed information when enlarging the resolution. Introducing the LPG upsampling layer into ViP-DeepLab yields a 5% improvement in root mean square error (RMSE).
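
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how RMSE and a relative improvement between two models could be computed. It is not taken from the dissertation. The function names, the toy depth values, and the convention of skipping pixels whose ground truth is 0 (assumed here because KITTI ground truth is sparse) are all illustrative assumptions.

```python
import math


def rmse(pred, gt):
    """Root mean square error over pixels with valid ground truth.

    Assumption: a ground-truth value of 0 marks a pixel without a
    depth measurement, so such pixels are skipped.
    """
    squared_errors = [(p - g) ** 2 for p, g in zip(pred, gt) if g > 0]
    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical depth values in metres, flattened to 1D for brevity.
gt = [10.0, 20.0, 0.0, 40.0]            # third pixel has no ground truth
baseline_pred = [12.0, 18.0, 5.0, 44.0]
improved_pred = [11.0, 19.0, 5.0, 42.0]

baseline_rmse = rmse(baseline_pred, gt)
improved_rmse = rmse(improved_pred, gt)
print(f"baseline RMSE: {baseline_rmse:.3f} m")
print(f"improved RMSE: {improved_rmse:.3f} m")
print(f"relative improvement: {relative_improvement(baseline_rmse, improved_rmse):.1%}")
```

Under this reading, a 5% improvement means the new model's RMSE is 0.95 times the baseline RMSE; the abstract does not state whether the figure is relative or absolute, so that interpretation is an assumption as well.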
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
File: Dissertation_Zhao Yukai_G2002488K.pdf
Description: Restricted Access
Size: 4.37 MB
Format: Adobe PDF

Updated on Jan 21, 2022

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.