Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/157772
Title: Exploring language model for better semantic matching of text paragraphs
Authors: Ng, Kwang Sheng
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Electrical and electronic engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Ng, K. S. (2022). Exploring language model for better semantic matching of text paragraphs. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157772
Project: A3049-211
Abstract: Natural Language Processing (NLP) has come a long way, and modern NLP research has driven its widespread adoption in everyday applications involving short texts. The same cannot yet be said for long documents: some current NLP models work well on short texts but degrade as text length grows, with processing time increasing sharply and results deteriorating. The state-of-the-art (SOTA) BERT model significantly advanced existing work with its approach, and newer methods such as Sentence-BERT (SBERT) and Simple Contrastive Learning of Sentence Embeddings (SimCSE), both built on BERT, have achieved comparable results. This report examines how effective these two newer models are. In this project, both models are evaluated on a publicly available patent dataset, PatentMatch, which consists of patent claims; when the PatentMatch team evaluated it with the SOTA BERT model, they achieved only 54% accuracy. Using pretrained SBERT and SimCSE models, the PatentMatch balanced test set was evaluated both with and without fine-tuning to observe how the average cosine similarity score changes and how the models perform. The experiment was repeated several times with different parameter settings. The results vary with the pretrained model used: the models achieved accuracy comparable to the BERT baseline, but in considerably less time. The F1 scores for both models look promising, with some fine-tuned pretrained models scoring around 66% alongside fairly high precision and recall. Both models have the potential to perform even better, but larger and more capable pretrained models would be needed for them to shine.
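The matching criterion the abstract describes, embedding each patent claim and comparing pairs by cosine similarity against a threshold, can be sketched in plain Python. The vectors below are toy stand-ins, not actual SBERT or SimCSE outputs, and the threshold value is illustrative rather than taken from the report:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_match(claim_emb, prior_art_emb, threshold=0.5):
    """Label a claim / prior-art pair as matching when similarity meets a threshold."""
    return cosine_similarity(claim_emb, prior_art_emb) >= threshold

# Toy vectors standing in for sentence embeddings of two patent claims.
emb_a = [0.2, 0.8, 0.1]
emb_b = [0.25, 0.75, 0.15]
print(is_match(emb_a, emb_b))  # similar directions -> True
```

In practice the embeddings would come from a pretrained SBERT or SimCSE encoder, and the threshold would be tuned on a validation split of the PatentMatch data.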
URI: https://hdl.handle.net/10356/157772
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: NTU FYP Final Report.pdf (Restricted Access, 1.48 MB, Adobe PDF)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.