Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/1523
Title: Design and development of web crawler for indexing an intranet.
Authors: Lee, Chee Onn.
Keywords: DRNTU::Library and information science::Libraries::Technologies
Issue Date: 2005
Abstract: This paper discusses the database and program design of a Web crawler. Crawler performance was found to improve when hyperlinks were classified and stored as three types: internal HTML URLs, internal non-HTML URLs, and external URLs. In addition, storing each word's location (its line number and its word sequence number within the Web page) allows phrases or strings of words to be searched and identified effectively. Better handling of comments, scripts and tags was found to increase the quality of the collected data. Enhancing the tag identification modules to handle tags with attributes made the crawler more effective at retrieving data during collection.
URI: http://hdl.handle.net/10356/1523
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: WKWSCI Theses
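
The techniques named in the abstract (three-way link classification, skipping comments and scripts, and recording each word's line and sequence number for phrase search) can be illustrated with a minimal sketch. The full text is restricted, so this is not the author's implementation: the names `classify_link`, `PageIndexer`, `find_phrase` and `HTML_EXTENSIONS`, the hostname comparison used to decide "internal" versus "external", and the extension list are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumption: which path extensions count as HTML pages (not taken from the thesis).
HTML_EXTENSIONS = {"", ".html", ".htm"}


def classify_link(base_url, href):
    """Return (absolute_url, category) for one hyperlink found on base_url."""
    absolute = urljoin(base_url, href)
    target = urlparse(absolute)
    # Assumption: "internal" means the same host as the page being crawled.
    if target.netloc.lower() != urlparse(base_url).netloc.lower():
        return absolute, "external"
    extension = posixpath.splitext(target.path)[1].lower()
    if extension in HTML_EXTENSIONS:
        return absolute, "internal_html"
    return absolute, "internal_non_html"


class PageIndexer(HTMLParser):
    """Collect classified links and word positions from one page.

    Comments are ignored (HTMLParser routes them to handle_comment, which is
    a no-op here), and text inside <script> and <style> is skipped.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # (absolute_url, category)
        self.words = []   # (word, line_number, sequence_number)
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            # Tags with attributes: pull the href out of the attribute list.
            href = dict(attrs).get("href")
            if href:
                self.links.append(classify_link(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        start_line, _ = self.getpos()  # line on which this text run starts
        for offset, text_line in enumerate(data.split("\n")):
            for word in text_line.split():
                sequence = len(self.words) + 1
                self.words.append((word.lower(), start_line + offset, sequence))


def find_phrase(words, phrase):
    """Return (line_number, sequence_number) where the phrase's words occur consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return []
    tokens = [word for word, _, _ in words]
    hits = []
    for i in range(len(tokens) - len(terms) + 1):
        if tokens[i:i + len(terms)] == terms:
            hits.append((words[i][1], words[i][2]))
    return hits
```

To try it, create `PageIndexer(base_url)`, call `feed(html_text)` and then `close()`, and read `links` and `words`. Because words are stored in sequence order, a phrase match reduces to finding consecutive sequence numbers, which is what `find_phrase` does. A real crawler would store these tuples in database tables rather than in lists.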

Files in This Item:
File: LeeCheeOnn05.pdf
Description: Restricted Access
Size: 5.83 MB
Format: Adobe PDF

Page view(s): 360 (updated on Oct 27, 2021)
Download(s): 11 (updated on Oct 27, 2021)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.