Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/147544
Title: Fake news detection using social media data
Authors: Widjaja, Elbert
Keywords: Engineering::Computer science and engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Widjaja, E. (2021). Fake news detection using social media data. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147544
Project: SCSE20-0449
Abstract:
With the large shift toward online social media, a growing amount of "fake news", i.e., articles that deliberately contain false information, is being spread across these networks [1]. Fake news can be produced for many purposes, such as financial or political gain, and can harm society. To mitigate this impact, it is crucial to develop methods for detecting fake news on social media. This project aims to identify the best state-of-the-art machine learning model for detecting fake news on social media. By researching and analyzing several data sources, experimenting with previously used models, and exploring new Transformer-based models, the project determines which models classify news into their respective classes most accurately. In this report, the author reviews multiple data sources and applies several exploratory data analyses to filter out biased datasets. The author defined three key metrics for inspecting each dataset: amount of data, credibility, and bias. Applying these techniques and metrics, the author identified the best data sources, namely those that are unbiased and suitable for training. The report also explores the preprocessing steps applied to the news articles. The author found that the level of text preprocessing needed depends on the data domain and the amount of data.
By implementing multiple versions of data preprocessing, the author gained an understanding of the dataset domain and selected the most suitable preprocessing method. This experiment also revealed a pattern showing which pairings of machine learning algorithm and preprocessing technique yield the highest accuracy. The experiment covers several machine learning algorithms, including Naïve Bayes, a word-embedding LSTM, and the newer Transformer models. To evaluate model performance, the author splits the data into three sets (train, validation, and test) to mitigate overfitting and reduce bias. Accuracy is the main metric, supported by the F1-score, precision, recall, and the Matthews correlation coefficient (MCC). These metrics further support the author's choice of the best model without concern about overfitting. The experimental results show that the newest models, Transformers, perform best among all models tested. They consistently achieve the highest scores, surpassing the previously developed models by 5 to 15%. The Transformer models consistently reached the highest accuracy, around 87-88%, without overfitting and while using standard base parameters. The results indicate that Transformer models (particularly ELECTRA and BERT) are the best state-of-the-art machine learning models for fake news classification. The experiments also suggest that further research with larger parameter configurations, combined with generative upscaling and sentiment analysis, could yield even higher performance.
URI: https://hdl.handle.net/10356/147544
Schools: School of Computer Science and Engineering
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
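The abstract names the evaluation protocol (a train/validation/test split) and the metrics (accuracy, precision, recall, F1-score, MCC) but does not show how they are computed. The following is a minimal Python sketch under stated assumptions, not the author's code. The 80/10/10 split ratios, the fixed seed, and the convention that label 1 means "fake" are illustrative assumptions. The helper names `split_three_way` and `binary_metrics` are hypothetical.

```python
import math
import random


def split_three_way(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items and split them into train, validation, and test lists.

    The 80/10/10 default ratios and the seed are illustrative assumptions,
    not values taken from the report.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("require train_frac > 0, val_frac >= 0, train_frac + val_frac < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, and MCC for a binary classifier.

    `positive` is the class treated as positive (assumed here: 1 = fake news).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Count the four confusion-matrix cells.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    train, val, test = split_three_way(range(100))
    print(len(train), len(val), len(test))  # → 80 10 10
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
    # → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'mcc': 0.5}
```

In this scheme the validation set guides model and preprocessing choices, and the test set is scored once at the end. MCC is a useful companion to accuracy because it uses all four confusion-matrix cells and stays informative when the real/fake classes are imbalanced. In practice, scikit-learn offers equivalents (`train_test_split`, `accuracy_score`, `precision_recall_fscore_support`, `matthews_corrcoef`). The standard-library version above only makes each formula explicit.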
Files in This Item:
File | Description | Size | Format
---|---|---|---
FYP - Elbert Widjaja - Fake News Detection using Social Media Data.pdf (Restricted Access) | | 1.59 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.