Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/137909
Title: | Information extraction from bibliography data |
Authors: | Ng, Jian Cheng |
Keywords: | Engineering::Computer science and engineering::Data; Engineering::Computer science and engineering::Software |
Issue Date: | 2020 |
Publisher: | Nanyang Technological University |
Project: | SCE19-0333 |
Abstract: | The Digital Bibliography and Library Project (DBLP) is an online service that provides a rich amount of information on Computer Science publications. This project aims to build a sentiment analysis model that classifies the polarity of an author’s comment on a citation, using the publications in the DBLP dataset. The work proceeded in four steps. First, the DBLP XML file was parsed with a StAX parser to extract relevant features, which were then loaded into a MySQL database. Second, data analytics was conducted on the DBLP data to uncover insights, including the distribution of publications, authors’ experience, collaborator analysis and prediction, and topic modelling. Third, the sentiment analysis model was built. Sentiment text was first collected from the publications in the DBLP dataset, and its polarity was determined from direct mentions of another paper or from a list of common positive and negative unigrams and bigrams. Models were then built using several approaches: a lexicon-based approach using TextBlob and VADER Sentiment, a deep learning approach using LSTM, and a machine learning approach using Decision Tree, Logistic Regression and Naïve Bayes. The parameters were tuned for best accuracy, and the models were compared using precision and recall. Lastly, a GUI was built to support querying publications by name, author, field of study or year of publication. Publicly available PDF files are downloaded, and the sentences containing citations are classified by polarity using the sentiment analysis model. |
URI: | https://hdl.handle.net/10356/137909 |
Schools: | School of Computer Science and Engineering |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Student Reports (FYP/IA/PA/PI) |
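The abstract mentions labelling citation sentences from a list of positive and negative unigrams and bigrams, and comparing models by precision and recall. The following is a minimal sketch of those two ideas only, not the report's implementation: the cue lists, the function names `label_polarity` and `precision_recall`, and the "neutral" fallback for ties are all assumptions made for illustration.

```python
import re

# Hypothetical cue lists; the report's actual unigram/bigram lists
# are not given in the abstract.
POSITIVE_CUES = {"effective", "robust", "significantly improves"}
NEGATIVE_CUES = {"fails", "limited", "suffers from"}


def ngrams(sentence):
    """Return the set of lower-cased unigrams and bigrams in a sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    return set(tokens) | bigrams


def label_polarity(sentence):
    """Label a sentence by counting matched positive and negative cues.

    Ties (including no matches) fall back to "neutral"; this tie rule
    is an assumption of the sketch.
    """
    grams = ngrams(sentence)
    pos = len(grams & POSITIVE_CUES)
    neg = len(grams & NEGATIVE_CUES)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def precision_recall(gold, predicted, target):
    """Compute precision and recall for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a setup like this, labels produced by `label_polarity` could serve as weak training labels, and `precision_recall` could be applied per class to each candidate model's predictions on a held-out set.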
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
CZ4079 FYP Amended Final Report .pdf Restricted Access | 2.24 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.