Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/70552
Title: Information extraction and analysis of DBLP data
Authors: Neo, Lynette Shi Yun
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2017
Abstract: There is much information that can be extracted and analysed from the large collection of DBLP bibliography data. However, it can be difficult to extract useful information from a data set of more than 3 million entries. This report describes the work done over the course of this final year project, which has two main objectives. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For this second objective, three data analytics tasks were carried out, all focused on the collaboration between authors in the DBLP community. First, the authors' collaboration network was analysed to show trends in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was performed on the publication titles, and the resulting topics were used to suggest collaborators for authors based on the topics in which they had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work.
URI: http://hdl.handle.net/10356/70552
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
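The abstract's first objective (XML to CSV) and the collaboration-network analysis can be illustrated with a short sketch. The report's own parser is not shown in this record, so the following is only a minimal illustration under stated assumptions: the record types (`article`, `inproceedings`), the CSV column layout, and the helper names `parse_dblp` and `write_csvs` are hypothetical. It also assumes input without DBLP's DTD character entities; the real dblp.xml declares entities in an external DTD, which the standard-library parser does not resolve, so an entity-aware parser (for example, lxml's `iterparse` with DTD loading enabled) would likely be needed there.

```python
import csv
import io
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical choice of record types; DBLP defines several others.
RECORD_TAGS = {"article", "inproceedings"}


def parse_dblp(source):
    """Stream-parse DBLP-style XML, yielding (key, type, title, year, authors)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            title_elem = elem.find("title")
            # itertext() also keeps text nested in markup inside a title.
            title = "".join(title_elem.itertext()) if title_elem is not None else ""
            authors = [a.text for a in elem.findall("author")]
            yield (elem.get("key"), elem.tag, title,
                   elem.findtext("year", default=""), authors)
            elem.clear()  # release the finished record to limit memory use


def write_csvs(records, pub_out, author_out):
    """Write publication and authorship CSVs; return co-author pair counts."""
    pub_writer = csv.writer(pub_out)
    author_writer = csv.writer(author_out)
    pub_writer.writerow(["key", "type", "title", "year"])
    author_writer.writerow(["key", "author"])
    coauthor_counts = Counter()
    for key, kind, title, year, authors in records:
        pub_writer.writerow([key, kind, title, year])
        for name in authors:
            author_writer.writerow([key, name])
        # Each unordered author pair on a paper is one collaboration edge.
        for pair in itertools.combinations(sorted(set(authors)), 2):
            coauthor_counts[pair] += 1
    return coauthor_counts


# Tiny hypothetical sample in DBLP's record style (no DTD entities).
SAMPLE = b"""<?xml version="1.0"?>
<dblp>
  <article key="journals/x/AB17"><author>Alice</author><author>Bob</author>
    <title>Graph Mining</title><year>2017</year></article>
  <inproceedings key="conf/y/ABC16"><author>Bob</author><author>Alice</author>
    <author>Carol</author><title>Topic Models</title><year>2016</year></inproceedings>
</dblp>"""

pubs, auths = io.StringIO(), io.StringIO()
counts = write_csvs(parse_dblp(io.BytesIO(SAMPLE)), pubs, auths)
print(counts[("Alice", "Bob")])  # 2
```

Streaming with `iterparse` and clearing each finished record is one common way to handle a file with millions of entries without holding the whole tree in memory. The resulting CSV files could then be bulk-loaded into a relational database, and the pair counts serve as weighted edges of a collaboration network.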

Files in This Item:
Final_Report.pdf (Restricted Access), 2.87 MB, Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.