Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/163161
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tan, Samantha Swee Yun | en_US
dc.date.accessioned | 2022-11-28T23:38:16Z | -
dc.date.available | 2022-11-28T23:38:16Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Tan, S. S. Y. (2022). Named entity recognition for information extraction. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/163161 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/163161 | -
dc.description.abstract | Named Entity Recognition (NER) for Information Extraction (IE) has grown in importance because it can streamline processes such as administrative tasks by providing a real-time overview of feedback. This is achieved by mining the data to extract useful information from each piece of feedback. Users and organisations can thereby obtain a quick overview of how others perceive a particular product or service, enabling them to take further action to improve their businesses. Additionally, as Singapore is a multicultural country with unique food, street and location names that may not always be in English, it is important to investigate NER on Singapore-based datasets. However, as the quality of NER is known to be affected by factors such as noise and data diversity, we propose the use of an NEM dictionary instead to increase the performance of the IE process. Hence, the aim of this project is to study and evaluate different NER models for building an NEM dictionary, such as a Singapore Food Location NEM Dictionary. In this project, three NER models, FLERT XLM-R, CL-KL and XLNet, were evaluated on a benchmark dataset. The top-performing models were then applied to two Singapore-based datasets to evaluate their effectiveness in extracting Singapore location names and addresses. Empirical results showed that LUKE with CL-KL, without external context retrieval, was the best-performing model and met our project objective. For future work, we recommend building a labelled Singapore dataset with a BIO tagging scheme to improve NER performance on Singapore-based datasets. We also propose generating a more domain-specific NEM dictionary, such as a Food NEM Dictionary, and evaluating the use of the NEM dictionary in real applications such as the NTU Food Hunter System. | en_US
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.relation | SCSE21-0906 | en_US
dc.subject | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence | en_US
dc.title | Named entity recognition for information extraction | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Hui Siu Cheung | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Bachelor of Engineering (Computer Science) | en_US
dc.contributor.supervisoremail | ASSCHUI@ntu.edu.sg | en_US
item.fulltext | With Fulltext | -
item.grantfulltext | restricted | -
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File: FINAL FYP REPORT - Samantha Tan Swee Yun.pdf (Restricted Access)
Size: 2.13 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.