Title: Product image matching based on natural language processing
Authors: Wu, Tianxing
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Wu, T. (2022). Product image matching based on natural language processing. Master's thesis, Nanyang Technological University, Singapore.
Project: ISM-DISS-02502
Abstract: With a rapidly growing number of retailers selling similar, competing products on online platforms, product matching has become an important topic in e-commerce. The task can be formulated as a classic machine learning problem in a retrieval, clustering, or binary classification setting. With the rapid development of the computer vision community in recent years, much work has been done on related topics such as image retrieval, image clustering, and image classification. However, image-based solutions can face severe problems in an e-commerce environment, since images posted on online platforms usually lack key information about attributes that cannot be inferred from appearance. In addition, some fine-grained features of fashion products are extremely difficult to extract from images. These attributes, on the other hand, are usually included in product titles. As a result, developing an algorithm based on Natural Language Processing (NLP) that uses text information to solve product matching problems has become a practical direction. Recently, large pre-trained language models such as BERT have demonstrated powerful capabilities on a variety of NLP tasks, but since their training objectives are not directly related to e-commerce, using them for this task without adaptation may not lead to promising results. In view of these problems, this project aims to find an appropriate way of adapting BERT-like models to the e-commerce domain to solve product matching problems. Specifically, three fine-tuning schemes for the chosen pre-trained model are explored, and a two-stage text-based product matching pipeline is proposed. Furthermore, a novel loss function is proposed to assist the fine-tuning process.
Extensive experiments on a public dataset verify the effectiveness of the proposed pipeline and show that, for this specific task, the new loss function learns better text representations than the other conventional methods examined.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
Description: Restricted Access
Size: 7.03 MB
Format: Adobe PDF

Updated on May 15, 2022



Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.