Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/166482
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Eng, Jing Keat | en_US |
dc.date.accessioned | 2023-05-02T04:50:07Z | - |
dc.date.available | 2023-05-02T04:50:07Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Eng, J. K. (2023). Interpretable vector language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166482 | en_US |
dc.identifier.uri | https://hdl.handle.net/10356/166482 | - |
dc.description.abstract | Natural Language Processing (NLP) is a branch of computer science that focuses on the development of algorithms for understanding, interpreting, and generating human language texts. A crucial technique in NLP is word embedding, where models such as Word2Vec and GloVe assign vectors to words in a vocabulary such that the Euclidean space structure (norms and angles of word vectors) aligns with the semantic structure of the training corpus. Despite their effectiveness, the individual entries of word embedding models are difficult to interpret, because any simultaneous rotation of all pre-trained word vectors preserves norms and angles while mixing up individual entries. In this study, we proposed a novel approach for generating word embeddings with interpretable entries. To achieve this, we introduced a metric to quantify the interpretability of a word embedding model. Additionally, we connected the interpretability of a word embedding model to a specific loss function defined on the Lie group SO(d). We then compared three loss functions, namely, the Varimax loss function inspired by factor analysis, the l1-norm, and a combination of the two. Our results showed that the Varimax loss function yielded word embeddings with the highest interpretability among the three methods: by maximizing the sum of the variances of the squared entries, it enabled successful interpretation of some columns in the resulting word embedding matrices. This study offers insights into generating interpretable word embeddings while preserving semantic structure. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Nanyang Technological University | en_US |
dc.subject | Science::Mathematics::Algebra | en_US |
dc.subject | Science::Mathematics::Applied mathematics::Data visualization | en_US |
dc.title | Interpretable vector language models | en_US |
dc.type | Final Year Project (FYP) | en_US |
dc.contributor.supervisor | Fedor Duzhin | en_US |
dc.contributor.school | School of Physical and Mathematical Sciences | en_US |
dc.description.degree | Bachelor of Science in Mathematical Sciences | en_US |
dc.contributor.supervisoremail | FDuzhin@ntu.edu.sg | en_US |
item.grantfulltext | restricted | - |
item.fulltext | With Fulltext | - |
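The Varimax criterion mentioned in the abstract can be sketched in a few lines. This is an illustrative reading of the criterion (the function name and NumPy implementation are assumptions, not taken from the project itself): it scores an embedding matrix by the sum over columns of the variance of its squared entries, and since rotating the matrix by any R in SO(d) preserves row norms and angles, maximizing this score over rotations can pick out more interpretable axes without changing the semantic structure.

```python
import numpy as np

def varimax_criterion(W: np.ndarray) -> float:
    """Sum over columns of the variance of squared entries.

    Applying a rotation R in SO(d) as W @ R preserves the norms and
    angles of the rows (the semantic structure), so this score can be
    maximized over rotations to seek an interpretable basis.
    """
    squared = W ** 2
    return float(np.sum(np.var(squared, axis=0)))

# A sparse, axis-aligned matrix scores higher than a rotated copy
# of it with identical row norms and pairwise angles:
sparse = np.eye(2)                       # each word loads on one axis
mixed = np.full((2, 2), 1 / np.sqrt(2))  # the same rows rotated by 45 degrees
```

Here `varimax_criterion(sparse)` exceeds `varimax_criterion(mixed)`, even though the two matrices have identical norms and angles, matching the abstract's point that rotations hide or expose interpretable entries.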
Appears in Collections: | SPMS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Final_Year_Project.pdf (Restricted Access) | | 765.13 kB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.