Title: Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search
Authors: Chen, Chunyang
Keywords: DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Issue Date: 2018
Source: Chen, C. (2017). Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: With software penetrating all kinds of traditional and emerging industries, there is great demand for software development. Because the number of developers is limited, one important way to meet this demand is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging this large body of data can help developers reuse the experience recorded there and improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow is difficult to search for two reasons. First, the site contains a very large number of questions and answers, and there may be a lingual gap (the same meaning can be written in different languages) between the query and the content in Stack Overflow. In addition, decay in information quality, such as misspellings, inconsistency, and abuse of domain-specific abbreviations, further degrades search performance. Second, some higher-order knowledge in Stack Overflow is only implicit and must be distilled from the raw data before it can be searched. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I also develop an edit recommendation tool that helps ensure the quality of Stack Overflow posts so that they can be more easily found by a query. But such explicit information search still requires developers to read, understand, and summarize the results, which is time-consuming. I therefore propose shifting from document (information) search to entity (knowledge) search by mining the implicit knowledge in Stack Overflow tags, so that developers receive direct answers instead of having to read lengthy documents. I first build a basic software-specific knowledge graph, comprising thousands of software-engineering terms and their associations, using association rule mining and community detection. Then, I enrich the knowledge graph with more fine-grained relationships, i.e., analogies among different third-party libraries. Finally, I combine semantic and lexical information to infer morphological forms of software terms so that the knowledge graph is more robust for knowledge search.
DOI: 10.32657/10356/75873
Schools: School of Computer Science and Engineering 
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File        Description  Size    Format
thesis.pdf  thesis       6.1 MB  Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.