Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/138634
Title: Context based patent classification and search : part A
Authors: Yoong, Jia Hui
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2020
Publisher: Nanyang Technological University
Project: A3049-191
Abstract: This research project aims to develop a Transformer-based multi-label classifier that uses Natural Language Processing (NLP) to predict the categories a patent falls under. The language used in patents is far more complex than everyday writing or speech, so state-of-the-art Transformer-based models such as BERT need to be modified before they can be applied effectively to the classification framework. The project therefore covers different methods of developing the classifier model and evaluates which segments of a patent work best for training it. This report summarises the fine-tuning of the XLNet and ALBERT pre-trained models on different components of a custom dataset obtained from patent text, and gives a comparative analysis of the pre-processing methods and models tested. The first approach tested the model’s accuracy when fine-tuned on different segments of a patent. From the results obtained, it can be concluded that the description segment holds the most promise when upscaling the model and dataset. The second approach used both the abstract and claims segments of a patent. While there were no significant improvements, the model could handle a larger variety of inputs for a more reliable classification output. The last approach attempted to fine-tune the model by merging the last hidden states of the model output from the abstract and claims segments of a patent. This method proved ineffective and did not yield any significant results. In summary, the best results in the empirical study were 95.3% accuracy for one-in-top-three prediction at the main group level, and 58.1% at the sub-group level, of patent IPC classifications.
URI: https://hdl.handle.net/10356/138634
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)
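
Note on the "one in top three" accuracy quoted in the abstract: this record does not define the metric. The following is a minimal sketch of one common reading, assuming a prediction counts as correct when at least one of the three highest-scoring labels is among the patent's true labels. The function name, the sample scores, and the label indices are hypothetical and are not taken from the report.

```python
def one_in_top_k_accuracy(scores, true_labels, k=3):
    """Fraction of samples whose k highest-scoring labels include a true label.

    scores: list of per-sample score lists (one score per label index).
    true_labels: list of sets of true label indices, one set per sample.
    Assumed definition; the report may compute the metric differently.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for sample_scores, truth in zip(scores, true_labels):
        # Rank label indices by descending score and keep the first k.
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda i: sample_scores[i],
            reverse=True,
        )
        if truth & set(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Hypothetical scores for 3 patents over 5 labels (illustrative values only).
example_scores = [
    [0.1, 0.7, 0.05, 0.6, 0.2],  # top 3 labels: 1, 3, 4
    [0.9, 0.1, 0.3, 0.2, 0.05],  # top 3 labels: 0, 2, 3
    [0.2, 0.1, 0.8, 0.05, 0.6],  # top 3 labels: 2, 4, 0
]
example_truth = [{3}, {1}, {0, 4}]
example_accuracy = one_in_top_k_accuracy(example_scores, example_truth)
```

Under this reading, a multi-label patent is scored as a hit as soon as any one of its true categories appears in the top three, which is more lenient than requiring every true label to be recovered.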

Files in This Item:
File: FYP Report_YOONG_JIA_HUI_Final.pdf (Restricted Access)
Description: EEE FYP Report
Size: 2.45 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.