Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/62806
Title: Intelligent forum search: knowledge discovery through co-occurrence analysis in the forum document set
Authors: Zhang, Danyang
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2015
Abstract: In many search scenarios, users must work with large document collections from unfamiliar domains. Without prior familiarity with the domain, searching and browsing these documents is frustrating. Users need a way to quickly grasp the key terms and key topics specific to that body of text. This project addresses the problem of searching for documents one does not yet know how to describe, that is, discovering new knowledge in an unfamiliar domain. We developed an intelligent search engine that uses co-occurrence analysis to help users extract key terms and key phrases from an entirely new domain of text. Specifically, we extended the existing Lucene search engine core and implemented the RAKE phrase extraction algorithm, document clustering analysis, and co-occurrence analysis for both terms and phrases. We applied the intelligent search engine to a local forum, where it demonstrated the richness and effectiveness of co-occurrence analysis for suggesting query terms and query phrases.
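The abstract names RAKE phrase extraction and co-occurrence-based query suggestion without showing how either works. The following is a minimal, illustrative Python sketch under stated assumptions, not the report's Lucene-based implementation (its full text is restricted). The stopword list, the regex tokenization, and the helper names `rake_phrases` and `cooccurrence_suggestions` are hypothetical. RAKE is shown in its common form: candidate phrases are split at stopwords and punctuation, each word is scored as degree/frequency, and a phrase scores the sum of its word scores. Co-occurrence is counted here at the document level, which is one of several possible choices (a sliding window is another).

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; real RAKE setups use a full list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are",
             "with", "by", "from", "at", "or", "as", "be", "this", "that", "it"}


def rake_phrases(text):
    """Return (phrase, score) pairs, highest first, using RAKE-style scoring."""
    phrases = []
    # Punctuation ends a candidate phrase; so does any stopword.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def cooccurrence_suggestions(docs, query_term, top_k=3):
    """Suggest terms that appear in the most documents together with query_term."""
    counts = defaultdict(Counter)
    for doc in docs:
        terms = {w for w in re.findall(r"\w+", doc.lower()) if w not in STOPWORDS}
        for a, b in combinations(sorted(terms), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return [term for term, _ in counts[query_term].most_common(top_k)]
```

For example, `rake_phrases("keyword extraction algorithm and keyword search")` ranks "keyword extraction algorithm" (score 8.5) above "keyword search" (4.5), reflecting RAKE's preference for longer, content-rich phrases. Given forum posts such as "hdb resale flat price" and "resale flat loan", `cooccurrence_suggestions(posts, "flat", top_k=1)` would suggest "resale", the term that shares the most posts with the query.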
URI: http://hdl.handle.net/10356/62806
Schools: School of Computer Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: final_report_final_v1.2.pdf
Description: Main Article
Size: 1.89 MB
Format: Adobe PDF
Access: Restricted

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.