Title: Extract, integrate and search healthcare knowledge from the web
Authors: Tan, Kang Zhuang.
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2013
Abstract: In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions and discover questions that have already been resolved. As the use of different CQA portals grows, there is a need for a common place that gathers information from these websites, so that users can retrieve answers without the hassle of going through many sites, especially in health care, where time can be a crucial factor. In addition, a better search technique for retrieving the most helpful health care information is important. Therefore, in this project, we look at how to extract and distill high-quality CQA knowledge from the Web to build a health care database, integrate the different types of data, and search the data to answer user queries. To extract information, web crawlers written in Java are implemented to retrieve a total of six hundred thousand QA pairs from various websites. The pairs are stored as XML files, and questions without answers are removed before the remaining data is indexed with Lucene, a Java information retrieval library, so that it can be searched.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access
Size: 2.03 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.