Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184512
Title: Facet-aware multi-head mixture-of-experts model for sequential recommendation
Authors: Liu, Mingrui; Zhang, Sixiao; Long, Cheng
Keywords: Computer and Information Science
Issue Date: 2025
Source: Liu, M., Zhang, S. & Long, C. (2025). Facet-aware multi-head mixture-of-experts model for sequential recommendation. 18th ACM International Conference on Web Search and Data Mining (WSDM '25), 127-135. https://dx.doi.org/10.1145/3701551.3703552
Project: MOE-T2EP20221-0013; MOE-T2EP20220-0011; RG20/24
Conference: 18th ACM International Conference on Web Search and Data Mining (WSDM '25)
Abstract: Sequential recommendation (SR) systems excel at capturing users' dynamic preferences by leveraging their interaction histories. Most existing SR systems assign a single embedding vector to each item to represent its features, and various types of models are adopted to combine these item embeddings into a sequence representation vector to capture the user intent. However, we argue that this representation alone is insufficient to capture an item's multi-faceted nature (e.g., movie genres, starring actors). Besides, users often exhibit complex and varied preferences within these facets (e.g., liking both action and musical films in the facet of genre), which are challenging to fully represent. To address the issues above, we propose a novel structure called Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation (FAME). We leverage sub-embeddings from each head in the last multi-head attention layer to predict the next item separately. A gating mechanism integrates recommendations from each head and dynamically determines their importance. Furthermore, we introduce a Mixture-of-Experts (MoE) network in each attention head to disentangle various user preferences within each facet. Each expert within the MoE focuses on a specific preference. A learnable router network is adopted to compute the importance weight for each expert and aggregate them. We conduct extensive experiments on four public sequential recommendation datasets and the results demonstrate the effectiveness of our method over existing baseline models.
URI: https://hdl.handle.net/10356/184512
ISBN: 9798400713293
DOI: 10.1145/3701551.3703552
Schools: College of Computing and Data Science 
Rights: © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1145/3701551.3703552.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Conference Papers
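
The architecture summarized in the abstract (per-head prediction, a gate over heads, and a routed mixture of experts inside each head) can be illustrated with a small sketch. This is not the authors' implementation: the function `fame_like_scores`, the use of plain linear maps as experts, softmax routing and gating, dot-product scoring, and all shapes and random weights are assumptions chosen only to make the data flow concrete.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fame_like_scores(head_subs, experts, routers, gate, item_subs):
    """Hypothetical sketch of facet-aware scoring.

    head_subs[h]    : sub-embedding produced by attention head h (length d)
    experts[h][e]   : d x d matrix, expert e of head h (assumed linear)
    routers[h]      : E x d matrix giving one router logit per expert
    gate            : H x (H*d) matrix giving one gate logit per head
    item_subs[i][h] : sub-embedding of item i for head h (length d)
    """
    head_reprs = []
    for h, sub in enumerate(head_subs):
        # Router: importance weight of each expert for this head.
        router_w = softmax(matvec(routers[h], sub))
        # Each expert transforms the head's sub-embedding.
        outs = [matvec(expert, sub) for expert in experts[h]]
        # Aggregate expert outputs with the router weights.
        mixed = [sum(w * out[k] for w, out in zip(router_w, outs))
                 for k in range(len(sub))]
        head_reprs.append(mixed)

    # Gate: importance weight of each head, computed from all sub-embeddings.
    flat = [x for sub in head_subs for x in sub]
    head_w = softmax(matvec(gate, flat))

    # Each head scores every item separately; the gate combines the scores.
    scores = []
    for item in item_subs:
        per_head = [dot(head_reprs[h], item[h]) for h in range(len(head_subs))]
        scores.append(sum(w * s for w, s in zip(head_w, per_head)))
    return scores, head_w


# Toy usage with random weights (all sizes are arbitrary assumptions).
rng = random.Random(0)
H, E, d, n_items = 2, 3, 4, 5
head_subs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(H)]
experts = [[rand_matrix(rng, d, d) for _ in range(E)] for _ in range(H)]
routers = [rand_matrix(rng, E, d) for _ in range(H)]
gate = rand_matrix(rng, H, H * d)
item_subs = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(H)]
             for _ in range(n_items)]

scores, head_w = fame_like_scores(head_subs, experts, routers, gate, item_subs)
best = max(range(n_items), key=scores.__getitem__)  # index of top-scored item
```

In this sketch each head plays the role of one facet, the router weights express which preference within that facet is active, and the gate weights express how much each facet contributes to the final ranking. The full text linked from this record describes the model as actually proposed.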

Files in This Item:
File: 3701551.3703552.pdf
Size: 1.52 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.