COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Wei Wang, Xiping Hu, Sang Wook Kim, Edith C.H. Ngai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Multimodal recommendation, which leverages diverse modal information to address data sparsity and improve recommendation accuracy, has attracted considerable interest. Its two key processes are modality fusion and representation learning. Previous approaches to modality fusion often employ simplistic attentive or pre-defined strategies at the early or late stage, and fail to handle irrelevant information among modalities effectively. In representation learning, prior research has constructed heterogeneous and homogeneous graph structures encapsulating user-item, user-user, and item-item relationships to better capture user interests and item profiles. Previous work has treated modality fusion and representation learning as two independent processes. This paper reveals that the two processes are complementary and can support each other: powerful representation learning enhances modality fusion, while effective fusion improves representation quality. Building on this observation, we introduce a COmposite grapH convolutional nEtwork with dual-stage fuSION for multimodal recommendation, named COHESION. It introduces a dual-stage fusion strategy to reduce the impact of irrelevant information, refining all modalities using the behavior modality at the early stage and fusing their representations at the late stage. It also proposes a composite graph convolutional network that utilizes user-item, user-user, and item-item graphs to extract heterogeneous and homogeneous latent relationships among users and items. In addition, it introduces a novel adaptive optimization to ensure balanced and reasonable representations across modalities. Extensive experiments on three public datasets demonstrate that COHESION significantly outperforms various competitive baselines.

Original language: English
Title of host publication: SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1830-1839
Number of pages: 10
ISBN (Electronic): 9798400715921
Publication status: Published - 13 Jul 2025
Event: 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025 - Padua, Italy
Duration: 13 Jul 2025 - 18 Jul 2025

Publication series

Name: SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025
Country/Territory: Italy
City: Padua
Period: 13/07/25 - 18/07/25

Keywords

  • Dual-Stage Fusion
  • Multimodal
  • Recommender System
