TY - JOUR
T1 - FLPC
T2 - Fusing language and point cloud for 3D object classification
AU - Gan, Xiaozheng
AU - Song, Chengtian
AU - Li, Jili
AU - Pan, Lizhi
AU - Xu, Keyu
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/1/15
Y1 - 2026/1/15
N2 - This study improves the accuracy of point cloud classification by introducing a novel architecture that fuses language with point clouds, drawing inspiration from recent advances in multimodal fusion. Conventional neural networks rely heavily on images as intermediaries between language and point clouds, an approach that lacks robustness and reduces accuracy. To address this, we propose FLPC, a fusion method for point cloud classification that uses an attention mechanism to integrate semantic information from textual descriptions with geometric features extracted from point cloud data. Our approach uses a pre-trained model to extract both geometric and semantic features from the input data. These features are then integrated by a classifier module designed to exploit both feature types to improve classification performance. Within the classifier module, we propose three fusion attention architectures (CFA, SFA, and PFA). Combining point cloud features with language features in this way significantly improves overall performance. Extensive experiments show that CFA and SFA both achieve competitive performance. Notably, PFA not only clearly outperforms the previous multimodal classification baseline but also surpasses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, FLPC improves on PointMLP by approximately 1.5 % on the ModelNet40 benchmark and by 8.7 % on the ScanObjectNN benchmark. These results demonstrate the effectiveness of FLPC in leveraging multimodal information for 3D classification tasks.
KW - Attention mechanism
KW - Classification
KW - Multimodal fusion
KW - Point cloud
UR - http://www.scopus.com/pages/publications/105010852452
U2 - 10.1016/j.eswa.2025.128430
DO - 10.1016/j.eswa.2025.128430
M3 - Article
AN - SCOPUS:105010852452
SN - 0957-4174
VL - 296
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 128430
ER -