COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation

Jinfeng Xu; Zheyu Chen; Wei Wang; Xiping Hu; Sang Wook Kim; Edith C.H. Ngai

doi:10.1145/3726302.3729927

Jinfeng Xu, Zheyu Chen, Wei Wang, Xiping Hu, Sang Wook Kim, Edith C.H. Ngai^*

^*Corresponding author for this work

School of Medical Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal recommendation, which leverages diverse modal information to address data sparsity and improve recommendation accuracy, has attracted considerable interest. Its two key processes are modality fusion and representation learning. Previous approaches to modality fusion often employ simplistic attentive or pre-defined strategies at the early or late stage and fail to handle irrelevant information among modalities effectively. In representation learning, prior research has constructed heterogeneous and homogeneous graph structures encapsulating user-item, user-user, and item-item relationships to better capture user interests and item profiles. Previous work treated modality fusion and representation learning as two independent processes. This paper shows that the two processes are complementary and can support each other: powerful representation learning enhances modality fusion, while effective fusion improves representation quality. Building on this observation, we introduce a COmposite grapH convolutional nEtwork with dual-stage fuSION for multimodal recommendation, named COHESION. It introduces a dual-stage fusion strategy to reduce the impact of irrelevant information, refining all modalities using the behavior modality at the early stage and fusing their representations at the late stage. It also proposes a composite graph convolutional network that uses user-item, user-user, and item-item graphs to extract heterogeneous and homogeneous latent relationships among users and items. In addition, it introduces a novel adaptive optimization to ensure balanced and reasonable representations across modalities. Extensive experiments on three public datasets demonstrate the significant superiority of COHESION over various competitive baselines.
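
The abstract gives no equations, so the following is only a minimal sketch of the general dual-stage idea, not the authors' implementation. Every function name and formula here is an assumption made for illustration: `early_refine` gates each modality's item features with a sigmoid of the behavior embedding, `propagate` performs a single symmetric-normalized hop over the user-item graph, and `late_fuse` takes a plain mean across modalities. The user-user and item-item graphs and the adaptive optimization mentioned in the abstract are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def early_refine(modal_feats, behavior):
    """Early stage (assumed form): gate each item's modality vector
    element-wise by the sigmoid of its behavior embedding."""
    return [
        [f * sigmoid(b) for f, b in zip(feat_vec, beh_vec)]
        for feat_vec, beh_vec in zip(modal_feats, behavior)
    ]


def propagate(interactions, item_vecs):
    """One symmetric-normalized hop on the user-item graph (assumed form):
    each user aggregates the vectors of the items they interacted with."""
    n_users, n_items = len(interactions), len(item_vecs)
    dim = len(item_vecs[0])
    user_deg = [max(sum(row), 1) for row in interactions]
    item_deg = [
        max(sum(interactions[u][i] for u in range(n_users)), 1)
        for i in range(n_items)
    ]
    users = [[0.0] * dim for _ in range(n_users)]
    for u in range(n_users):
        for i in range(n_items):
            if interactions[u][i]:
                w = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
                for d in range(dim):
                    users[u][d] += w * item_vecs[i][d]
    return users


def late_fuse(per_modality):
    """Late stage (assumed form): average the per-modality representations."""
    n_mod = len(per_modality)
    n_rows, dim = len(per_modality[0]), len(per_modality[0][0])
    return [
        [sum(m[r][d] for m in per_modality) / n_mod for d in range(dim)]
        for r in range(n_rows)
    ]


def dual_stage_users(interactions, behavior, modalities):
    """Refine every modality early, propagate each one, then fuse late."""
    refined = [early_refine(feats, behavior) for feats in modalities]
    return late_fuse([propagate(interactions, r) for r in refined])


# Toy data (hypothetical): 2 users, 2 items, 2-dimensional features.
interactions = [[1, 0], [1, 1]]
behavior = [[0.0, 0.0], [0.0, 0.0]]
visual = [[2.0, 0.0], [0.0, 2.0]]
textual = [[0.0, 2.0], [2.0, 0.0]]
fused_users = dual_stage_users(interactions, behavior, [visual, textual])
```

In this sketch the early stage acts before graph propagation and the late stage after it, which is one simple way to make fusion and representation learning depend on each other; the paper's actual operators may differ.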

Original language	English
Title of host publication	SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher	Association for Computing Machinery, Inc
Pages	1830-1839
Number of pages	10
ISBN (Electronic)	9798400715921
DOI	http://doi.org/10.1145/3726302.3729927
Publication status	Published - 13 Jul 2025
Event	48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025 - Padua, Italy. Duration: 13 Jul 2025 → 18 Jul 2025

Publication series

Name	SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference	48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025
Country/Territory	Italy
City	Padua
Period	13/07/25 → 18/07/25

Cite this

Xu, J., Chen, Z., Wang, W., Hu, X., Kim, S. W., & Ngai, E. C. H. (2025). COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation. In SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1830-1839). (SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery, Inc. http://doi.org/10.1145/3726302.3729927

Xu, Jinfeng ; Chen, Zheyu ; Wang, Wei et al. / COHESION : Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation. SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc, 2025. pp. 1830-1839 (SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval).

@inproceedings{f43e800a835744bb92267a6d6396a194,

title = "COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation",

keywords = "Dual-Stage Fusion, Multimodal, Recommender System",

author = "Jinfeng Xu and Zheyu Chen and Wei Wang and Xiping Hu and Kim, \{Sang Wook\} and Ngai, \{Edith C.H.\}",

note = "Publisher Copyright: {\textcopyright} 2025 Copyright held by the owner/author(s).; 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025 ; Conference date: 13-07-2025 Through 18-07-2025",

year = "2025",

month = jul,

day = "13",

doi = "10.1145/3726302.3729927",

language = "English",

series = "SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval",

publisher = "Association for Computing Machinery, Inc",

pages = "1830--1839",

booktitle = "SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval",

}

Xu, J, Chen, Z, Wang, W, Hu, X, Kim, SW & Ngai, ECH 2025, COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation. in SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, pp. 1830-1839, 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025, Padua, Italy, 13/07/25. http://doi.org/10.1145/3726302.3729927

COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation. / Xu, Jinfeng; Chen, Zheyu; Wang, Wei et al.
SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc, 2025. pp. 1830-1839 (SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval).

TY - GEN

T1 - COHESION

T2 - 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025

AU - Xu, Jinfeng

AU - Chen, Zheyu

AU - Wang, Wei

AU - Hu, Xiping

AU - Kim, Sang Wook

AU - Ngai, Edith C.H.

PY - 2025/7/13

Y1 - 2025/7/13

KW - Dual-Stage Fusion

KW - Multimodal

KW - Recommender System

UR - http://www.scopus.com/pages/publications/105011829275

U2 - 10.1145/3726302.3729927

DO - 10.1145/3726302.3729927

M3 - Conference contribution

AN - SCOPUS:105011829275

T3 - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 1830

EP - 1839

BT - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery, Inc

Y2 - 13 July 2025 through 18 July 2025

ER -

Xu J, Chen Z, Wang W, Hu X, Kim SW, Ngai ECH. COHESION: Composite Graph Convolutional Network with Dual-Stage Fusion for Multimodal Recommendation. In SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc. 2025. p. 1830-1839. (SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/3726302.3729927
