Title: Document reference and citation analysis
Authors: Yap, Lina.
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2012
Abstract: With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain a time-consuming task. The aim of this project is to ease the search for relevant papers: a researcher should only need to retrieve one article of interest, and the system should recommend related articles based on it. With this idea in mind, the main goals were to first retrieve two hops of articles from the selected document, then find clusters of similar papers defined by a set of features, and finally rank and recommend papers. For experimental purposes, the data set used in this project was in the field of Biomedical and Life Sciences, downloaded from the PMC Open Access Subset. The features selected to cluster the data set were the bibliographic coupling degree, the degree of similarity between titles' topic vectors, and the degree of similarity between abstracts' topic vectors. Bibliographic coupling degree is defined as the number of matching outgoing citations between two articles. Topic vectors were obtained through a topic modelling library, and the cosine of the angle between two topic vectors was computed to determine their degree of similarity. This project also employed a NoSQL database, MongoDB, for persisting articles. One limitation of this project was evaluating the relevance of the recommendations. While the PMC Open Access Subset provided a data set large enough to gather sufficient articles from two hops, it would require personnel who are highly knowledgeable in this field to evaluate whether the recommended articles were relevant. Moreover, the prototype was limited to only three features and could be further enhanced by using better criteria to select better results. For example, the weight of each citation in the feature vector could be the number of times that citation is referenced in an article, instead of representing the feature vector in binary form.
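The two similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes each article's outgoing citations are available as a list of identifiers and that topic vectors are plain lists of floats; the function names and sample data are hypothetical and are not taken from the project itself.

```python
import math
from collections import Counter


def bibliographic_coupling(refs_a, refs_b):
    """Number of distinct outgoing citations shared by two articles."""
    return len(set(refs_a) & set(refs_b))


def citation_vector(refs, vocabulary, binary=True):
    """Feature vector over a fixed, ordered list of cited identifiers.

    binary=True marks presence (1/0); binary=False uses the number of
    times each citation is referenced, as the abstract suggests.
    """
    counts = Counter(refs)
    if binary:
        return [1 if counts[c] else 0 for c in vocabulary]
    return [counts[c] for c in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical sample data: citation identifiers per article.
article_a = ["p1", "p2", "p2", "p3"]
article_b = ["p2", "p3", "p4"]
vocabulary = ["p1", "p2", "p3", "p4"]

coupling = bibliographic_coupling(article_a, article_b)
binary_a = citation_vector(article_a, vocabulary)
count_a = citation_vector(article_a, vocabulary, binary=False)
```

Here the two hypothetical articles share the citations p2 and p3, so their coupling degree is 2. The binary vector for article_a records only which citations are present, while the count-based vector also records that p2 is referenced twice.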
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access
Size: 1.74 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.