Title: A study on content similarity between web pages
Authors: He, Shanshan.
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2008
Abstract: Searching for information on the World Wide Web can sometimes be tedious, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, focusing on the web crawler. The project helps readers understand the correlation between content similarity level and inter-webpage hyperlinks. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, a task that was new to the author. The report describes in full detail how the two programs were designed using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than non-directly connected web pages. The results gathered from the tests were tabulated and presented graphically for clearer illustration.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access
Size: 1.34 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.