TY - GEN
T1 - COHESION
T2 - 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025
AU - Xu, Jinfeng
AU - Chen, Zheyu
AU - Wang, Wei
AU - Hu, Xiping
AU - Kim, Sang Wook
AU - Ngai, Edith C.H.
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/7/13
Y1 - 2025/7/13
N2 - Recent works in multimodal recommendations, which leverage diverse modal information to address data sparsity and enhance recommendation accuracy, have garnered considerable interest. Two key processes in multimodal recommendations are modality fusion and representation learning. Previous approaches in modality fusion often employ simplistic attentive or pre-defined strategies at early or late stages, failing to effectively handle irrelevant information among modalities. In representation learning, prior research has constructed heterogeneous and homogeneous graph structures encapsulating user-item, user-user, and item-item relationships to better capture user interests and item profiles. Modality fusion and representation learning were considered as two independent processes in previous work. This paper reveals that these two processes are complementary and can support each other. Specifically, powerful representation learning enhances modality fusion, while effective fusion improves representation quality. Stemming from these two processes, we introduce a COmposite grapH convolutional nEtwork with dual-stage fuSION for the multimodal recommendation, named COHESION. Specifically, it introduces a dual-stage fusion strategy to reduce the impact of irrelevant information, refining all modalities using behavior modality in the early stage and fusing their representations at the late stage. It also proposes a composite graph convolutional network that utilizes user-item, user-user, and item-item graphs to extract heterogeneous and homogeneous latent relationships within users and items. Besides, it introduces a novel adaptive optimization to ensure balanced and reasonable representations across modalities. Extensive experiments on three public datasets demonstrate the significant superiority of COHESION over various competitive baselines.
AB - Recent works in multimodal recommendations, which leverage diverse modal information to address data sparsity and enhance recommendation accuracy, have garnered considerable interest. Two key processes in multimodal recommendations are modality fusion and representation learning. Previous approaches in modality fusion often employ simplistic attentive or pre-defined strategies at early or late stages, failing to effectively handle irrelevant information among modalities. In representation learning, prior research has constructed heterogeneous and homogeneous graph structures encapsulating user-item, user-user, and item-item relationships to better capture user interests and item profiles. Modality fusion and representation learning were considered as two independent processes in previous work. This paper reveals that these two processes are complementary and can support each other. Specifically, powerful representation learning enhances modality fusion, while effective fusion improves representation quality. Stemming from these two processes, we introduce a COmposite grapH convolutional nEtwork with dual-stage fuSION for the multimodal recommendation, named COHESION. Specifically, it introduces a dual-stage fusion strategy to reduce the impact of irrelevant information, refining all modalities using behavior modality in the early stage and fusing their representations at the late stage. It also proposes a composite graph convolutional network that utilizes user-item, user-user, and item-item graphs to extract heterogeneous and homogeneous latent relationships within users and items. Besides, it introduces a novel adaptive optimization to ensure balanced and reasonable representations across modalities. Extensive experiments on three public datasets demonstrate the significant superiority of COHESION over various competitive baselines.
KW - Dual-Stage Fusion
KW - Multimodal
KW - Recommender System
UR - http://www.scopus.com/pages/publications/105011829275
U2 - 10.1145/3726302.3729927
DO - 10.1145/3726302.3729927
M3 - Conference contribution
AN - SCOPUS:105011829275
T3 - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1830
EP - 1839
BT - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 13 July 2025 through 18 July 2025
ER -