Title: Loop unroll optimization for GPU implementation
Authors: Wu, Jianghua.
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Issue Date: 2012
Abstract: This report presents the implementation and optimization of two image-resizing algorithms, bilinear and bicubic interpolation. The optimization aims to reduce execution time and relies primarily on Nvidia’s Compute Unified Device Architecture (CUDA). Both algorithms were first implemented in C++, and CUDA code was added afterwards. The main challenges of the project were learning CUDA programming and understanding the underlying mathematics before turning it into algorithms. Integrating CUDA, by replacing the loops used in the computations with threads running in parallel, produced a significant speed-up in execution time. There is still room for code refactoring, a better CUDA implementation, and the use of a more powerful Graphics Processing Unit (GPU), all of which should improve the design and further optimize the developed application. In conclusion, the project has shown that, under certain conditions, leveraging the power of the GPU through CUDA is a viable way to optimize graphics processing algorithms such as the ones mentioned above.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
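
The abstract's central idea, replacing per-pixel loops with parallel threads, can be illustrated with a minimal C++ sketch of bilinear resizing. This is not the report's code, which is under restricted access. The function names `bilinearSample` and `resizeBilinear`, the single-channel row-major `float` image layout, and the pixel-centre coordinate mapping are all assumptions made for illustration. The per-pixel work is isolated in one function so that it is clear what a single thread would compute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes ONE output pixel of a bilinear resize.
// Assumes a single-channel, row-major float image (an illustrative choice).
float bilinearSample(const std::vector<float>& src, int srcW, int srcH,
                     int dstW, int dstH, int x, int y) {
    // Map the centre of output pixel (x, y) back into source coordinates.
    float sx = (x + 0.5f) * srcW / dstW - 0.5f;
    float sy = (y + 0.5f) * srcH / dstH - 0.5f;

    // Clamp so that samples near the border stay inside the image.
    sx = std::min(std::max(sx, 0.0f), static_cast<float>(srcW - 1));
    sy = std::min(std::max(sy, 0.0f), static_cast<float>(srcH - 1));

    // The four neighbouring source pixels and the fractional offsets.
    int x0 = static_cast<int>(sx);
    int y0 = static_cast<int>(sy);
    int x1 = std::min(x0 + 1, srcW - 1);
    int y1 = std::min(y0 + 1, srcH - 1);
    float fx = sx - x0;
    float fy = sy - y0;

    // Interpolate horizontally on both rows, then vertically between them.
    float top    = src[y0 * srcW + x0] * (1.0f - fx) + src[y0 * srcW + x1] * fx;
    float bottom = src[y1 * srcW + x0] * (1.0f - fx) + src[y1 * srcW + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}

// Sequential version: two nested loops visit every output pixel in turn.
std::vector<float> resizeBilinear(const std::vector<float>& src,
                                  int srcW, int srcH, int dstW, int dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);

    std::vector<float> dst(static_cast<std::size_t>(dstW) * dstH);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            dst[y * dstW + x] = bilinearSample(src, srcW, srcH, dstW, dstH, x, y);
        }
    }
    return dst;
}
```

Each output pixel depends only on the source image, never on another output pixel, so the iterations of the two loops are independent. In a CUDA port along the lines the abstract describes, the loops would presumably disappear: each thread would derive its own `x` and `y` from its block and thread indices, check that they fall inside the output image, and run the body of `bilinearSample` once.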

Files in This Item:
Description: Restricted Access
Size: 1.35 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.