Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/156483
|Title:||Event detection from social media on COVID-19|
|Authors:||Ho, Yin Wee|
|Keywords:||Engineering::Computer science and engineering::Computing methodologies::Document and text processing|
|Issue Date:||2022|
|Publisher:||Nanyang Technological University|
|Source:||Ho, Y. W. (2022). Event detection from social media on COVID-19. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156483|
|Abstract:||Event detection has been one of the most important research topics in social media analysis this decade due to the widespread availability of rich data generated by social media platforms. These platforms have become a major source of information describing real-world and trending events. However, detecting events is challenging because of the dynamic nature and high volume of data produced in social media streams. Most previous works were applicable only to detecting either breaking news or localised events, overlooking other significant events. Furthermore, these works focused on processing Twitter data, and the same techniques cannot be directly adopted for Facebook data. In this project, we implemented an event detection system based on word embeddings, adapted for detecting events in our Facebook dataset. The system comprises four components: 1) Stream Splitter, 2) Word Embedder and Document Clustering (within individual time windows), 3) Document Clustering (across all time windows) and 4) Event Summarisation. In 1), we first performed some natural language processing on our data before splitting it into separate time windows. In 2), we embedded our documents with three different models, Skip-gram, TF-IDF and GloVe, and clustered the documents within their individual time windows using a modified version of the Jarvis-Patrick clustering algorithm. Document similarity was determined by computing the cosine similarity score of each pair of documents and placing them in the same event cluster if their score was above a certain threshold. In 3), we applied the same techniques used in the previous component, but this time clustered the event clusters across the entire time frame. Finally, the last component extracted a representative post, as well as the top 5 most frequently occurring words, to describe each event cluster. After tuning the hyperparameters to obtain the best possible set of results for each model, we found that TF-IDF produced the highest-quality events but was only able to detect a moderate number of events. In contrast, Skip-gram and GloVe were able to produce more events of slightly lower quality, but more work is needed to filter out events that are not as significant. Finally, we also tracked the development of some sample topics over time and the public’s reactions to them. These insights can help to qualify the public’s perception of certain topics, which can aid in shaping the authorities’ approach when introducing them to the public.|
|URI:||https://hdl.handle.net/10356/156483|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Student Reports (FYP/IA/PA/PI)|
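
The abstract describes grouping documents into event clusters when their pairwise cosine similarity exceeds a threshold, then summarising each cluster by its most frequent words. The following is a minimal sketch of that idea only, not the project's implementation: it uses plain threshold linking rather than the modified Jarvis-Patrick algorithm, covers TF-IDF but not Skip-gram or GloVe, and all function names, the smoothed IDF formula, the sample posts and the threshold value 0.3 are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight positive.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_threshold(vectors, threshold):
    """Link every pair scoring above the threshold and return the linked groups."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def top_words(docs, members, k=5):
    """Return the k most frequent words across the documents in one cluster."""
    counts = Counter(term for i in members for term in docs[i])
    return [word for word, _ in counts.most_common(k)]


# Hypothetical, already tokenised posts from a single time window.
docs = [
    ["vaccine", "booster", "rollout", "clinic"],
    ["booster", "vaccine", "clinic", "queue"],
    ["travel", "border", "quarantine", "lane"],
    ["border", "travel", "lane", "reopen"],
]
clusters = cluster_by_threshold(tfidf_vectors(docs), threshold=0.3)
summaries = [top_words(docs, members) for members in clusters]
```

In this sketch a higher threshold yields fewer, tighter clusters and a lower one merges more posts together. The same functions could in principle be reapplied to cluster-level vectors to mimic the across-window step, though how the project represents a cluster as a vector is not stated in the abstract.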
Updated on May 27, 2022
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.