Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/2373
Title: Structured web indexing
Authors: Li, Xu.
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2000
Abstract: The rapid growth of Web information and applications has made the Web not only an important source of information but also a hub for e-commerce activities. However, current unstructured web documents, written as HTML files, offer limited support for advanced web applications. To overcome this shortcoming, future web documents will likely be formatted in XML, and existing HTML documents will gradually be converted to XML. With XML, the structure of web documents, in the form of DTDs, can be provided as input to a search engine, allowing the engine to exploit this structural knowledge in its query processing. In this report, we propose a query model that supports expressive queries on XML documents that share some common DTDs. Because XML documents can embed well-structured links to one another, the query model also supports queries involving inter-document links. Since our proposed query model covers both intra- and inter-document structures, conventional indexing techniques are no longer adequate. We have therefore designed a new indexing scheme built upon both the content and the structure of XML documents. Based on this indexing scheme, we have developed a new search engine that supports queries on the content and structure of web documents.
URI: http://hdl.handle.net/10356/2373
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: LiXu00.pdf (Restricted Access)
Description: Main report
Size: 26.72 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.