Please use this identifier to cite or link to this item: http://hdl.handle.net/10356/52527
|Title:||Extract, integrate and search healthcare knowledge from the web|
|Authors:||Tan, Kang Zhuang.|
|Keywords:||DRNTU::Engineering::Computer science and engineering|
|Issue Date:||2013|
|Abstract:||In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions and discover questions that have already been resolved. With the increased usage of different CQA portals, there is a need for a common place that gathers information from these websites, so that users can retrieve answers without the hassle of going through many sites, especially in the field of health care, where time can be a crucial factor. In addition, a better search technique for retrieving the most helpful health care information is important. Therefore, in this project, we look at how to extract and distill high-quality CQA knowledge from the Web to build a health care database, integrate the different types of data, and search the data to answer user queries. To extract information, web crawlers written in Java are implemented to retrieve a total of six hundred thousand QA pairs from various websites. The pairs are stored as XML files, and questions without answers are removed before the remaining useful data is indexed with Lucene, a Java information retrieval library, so that it can be searched.|
|URI:||http://hdl.handle.net/10356/52527|
|Rights:||Nanyang Technological University|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Student Reports (FYP/IA/PA/PI)|
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.