Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/74009
Title: Information extraction from bibliography data
Authors: Toh, Joel Zhu Er
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation
Issue Date: 2018
Abstract: DBLP is a computer science bibliography hosted by the University of Trier, Germany. It contains bibliographic information on major computer science journals and proceedings. As of December 2017, it listed 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Because of the sheer volume of data, it is tedious for users to extract valuable insights from it. To bridge this gap, this report pursues four main objectives. First, it parses the large DBLP XML file and other datasets into a relational database to support efficient querying. Second, it explores techniques for extracting each author’s career length, ethnicity, area of specialization and gender from the DBLP data, and explores the data further to discover new knowledge. Third, it models the data for link prediction, predicting whom an author might collaborate with in the future; this includes improving existing link prediction methods with the concept of homophily. Fourth, it introduces a web application developed for analyzing and visualizing the DBLP data, helping users gain insight and make sense of it. Finally, the report discusses the link prediction results and interprets the newly discovered insights.
URI: http://hdl.handle.net/10356/74009
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: TOH_ZHU_ER_JOEL_FYP_Report.pdf (Restricted Access)
Description: Final Year Project Report on Information extraction from bibliography data
Size: 3.55 MB
Format: Adobe PDF

Page view(s): 93 (checked on Oct 21, 2020)
Download(s): 18 (checked on Oct 21, 2020)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.