Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/83091
Title: | HDSKG: Harvesting domain specific knowledge graph from content of webpages |
Authors: | Zhao, Xuejiao; Xing, Zhenchang; Kabir, Muhammad Ashad; Sawada, Naoya; Li, Jing; Lin, Shang-Wei |
Keywords: | Knowledge graph; Structural information extraction |
Issue Date: | 2017 |
Source: | Zhao, X., Xing, Z., Kabir, M. A., Sawada, N., Li, J., & Lin, S.-W. (2017). HDSKG: Harvesting domain specific knowledge graph from content of webpages. 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 56-67. |
Conference: | 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) |
Abstract: | Knowledge graphs are useful in many domains, including search result ranking, recommendation, and exploratory search. A knowledge graph integrates structural information about concepts across multiple information sources and links these concepts together. Extracting domain specific relation triples (subject, verb phrase, object) is one of the important techniques for constructing a domain specific knowledge graph. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk candidate relation triples, then extract advanced features of these candidates to estimate their domain relevance with a machine learning algorithm. To evaluate our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of openIE (0.11 and 0.6 respectively). HDSKG performs particularly well on complex sentences. Furthermore, with the self-training technique used in the classifier, HDSKG can be applied to other domains easily with less training data. |
URI: | https://hdl.handle.net/10356/83091 http://hdl.handle.net/10220/42426 |
ISBN: | 978-1-5090-5501-2 |
DOI: | 10.1109/SANER.2017.7884609 |
Schools: | School of Computer Science and Engineering |
Research Centres: | Rolls-Royce@NTU Corporate Lab; NTU-UBC Research Centre of Excellence in Active Living for the Elderly |
Rights: | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/SANER.2017.7884609. |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Conference Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Finalmain.pdf | | 294.47 kB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.