Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/136585
Title: Named entity recognition and linking with knowledge base
Authors: Phan, Cong Minh
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2019
Publisher: Nanyang Technological University
Source: Phan, C. M. (2019). Named entity recognition and linking with knowledge base. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Named entities such as people, organizations, and locations appear in various kinds of textual contexts and under different surface forms. Successful extraction of these entities enables machines to understand and organize information in a systematic manner. This thesis addresses both named entity recognition (NER) and entity linking (EL). The former aims at recognizing mentions of specific classes such as persons, organizations, and locations, while the latter maps these mentions to their associated entities in a knowledge base. Unlike humans, who can quickly identify these named entities using commonsense knowledge and inference, machines lack that ability. The main challenges arise when the mentions and local contexts are ambiguous. Moreover, variation in entity names introduces additional difficulty in resolving the mentions' identities. As such, the recognition and disambiguation of these entity mentions depend greatly on machine understanding of the input contexts, the knowledge base entities, and the relations between them. In this thesis, we introduce several novel approaches to tackle these challenges in both NER and EL. First, we propose a collective NER framework for the recognition task. Apart from local contexts, our approach utilizes relevant contexts in related documents to perform NER in a collective manner. The proposed model demonstrates superior performance on user comments, in which the context of each individual comment is often limited. Second, we tackle the EL problem by first addressing the ambiguity of mentions. We study a local context-based approach that disambiguates each mention individually based on its local context. We propose an attention-based neural network architecture to estimate the semantic similarity between a mention's local context and its entity candidates. Our model utilizes Wikipedia hyperlinks as the training data and obtains competitive performance on different benchmark datasets. Third, we investigate a collective EL approach, which utilizes the semantic relatedness between entities to collectively resolve the mentions' ambiguity. We first analyze the semantic coherence between entities in a document. In contrast to the assumptions made in previous works, our analysis reveals that not all entities in a document are highly related to each other. This insight leads us to relax the coherence constraint and develop a significantly faster and more effective collective linking algorithm. Finally, we study a special setting of EL in which the disambiguation is based on the matching between the mentions and entity names. This setting is commonly seen in particular applications such as biomedical concept, product name, and job title normalization. In this setting, we focus on learning semantic representations for entity names such that representations of synonymous names are close to each other. We then evaluate the learned representations in the biomedical concept linking task. Although the problems of NER and EL have been established and investigated for the last decade, this thesis contributes several key ideas that could further improve performance and shed light on a few potential directions for future work.
URI: https://hdl.handle.net/10356/136585
DOI: 10.32657/10356/136585
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Minh_thesis.pdf
Size: 4.8 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.