FLPC: Fusing language and point cloud for 3D object classification

Xiaozheng Gan; Chengtian Song; Jili Li; Lizhi Pan; Keyu Xu

doi:10.1016/j.eswa.2025.128430

Corresponding author: Chengtian Song

School of Mechatronical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

This study improves the accuracy of point cloud classification by introducing a novel architecture that fuses language with point clouds, drawing inspiration from recent advances in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To address this, we propose FLPC, a fusion method for point cloud classification that uses an attention mechanism to integrate semantic information from textual descriptions with geometric features extracted from point cloud data. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to exploit both types of features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This design, which combines point cloud features with language features, yields a significant improvement in overall performance. Extensive experiments show that both CFA and SFA achieve competitive performance. Notably, PFA not only markedly outperforms the previous multimodal classification baseline but also surpasses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method improves on PointMLP by approximately 1.5%. Likewise, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7%. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.
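The abstract does not specify how CFA, SFA, or PFA are built, so the following is only a minimal sketch of the general idea it describes: using attention to mix text features into point cloud features before classification. It uses generic scaled dot-product cross-attention with a residual connection, which is an assumption for illustration and not the authors' architecture. All names (`cross_attention_fuse`, `max_pool`, the feature sizes) are hypothetical, and the random vectors stand in for features that a pre-trained model would supply.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_attention_fuse(point_feats, text_feats):
    """Fuse per-point features (queries) with text features (keys and values).

    point_feats: N vectors of length d (hypothetical geometric features)
    text_feats:  C vectors of length d (hypothetical text embeddings)
    Returns N fused vectors: each point feature plus its attention-weighted
    mix of the text features (a residual connection).
    """
    d = len(point_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in point_feats:
        weights = softmax([dot(q, k) * scale for k in text_feats])
        attended = [sum(w * v[j] for w, v in zip(weights, text_feats))
                    for j in range(d)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused

def max_pool(feats):
    """Global max pooling over points, giving one d-dimensional descriptor."""
    return [max(col) for col in zip(*feats)]

# Placeholder inputs: random features instead of real encoder outputs.
random.seed(0)
d, n_points, n_texts = 8, 16, 4
point_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_points)]
text_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_texts)]

fused = cross_attention_fuse(point_feats, text_feats)
global_feat = max_pool(fused)
print(len(fused), len(fused[0]), len(global_feat))  # prints: 16 8 8
```

In a full pipeline the pooled descriptor would feed a classification head; that step is omitted here because the abstract gives no detail about it.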

Original language	English
Article number	128430
Journal	Expert Systems with Applications
Volume	296
DOI	https://doi.org/10.1016/j.eswa.2025.128430
Publication status	Published - 15 Jan 2026

Keywords

Attention mechanism
Classification
Multimodal fusion
Point cloud

Cite this

@article{5866c58f2d534a06909fcd322024e710,

title = "FLPC: Fusing language and point cloud for 3D object classification",

abstract = "This study improves the accuracy of point cloud classification by introducing a novel architecture that fuses language with point clouds, drawing inspiration from recent advances in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To address this, we propose FLPC, a fusion method for point cloud classification that uses an attention mechanism to integrate semantic information from textual descriptions with geometric features extracted from point cloud data. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to exploit both types of features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This design, which combines point cloud features with language features, yields a significant improvement in overall performance. Extensive experiments show that both CFA and SFA achieve competitive performance. Notably, PFA not only markedly outperforms the previous multimodal classification baseline but also surpasses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method improves on PointMLP by approximately 1.5\%. Likewise, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7\%. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.",

keywords = "Attention mechanism, Classification, Multimodal fusion, Point cloud",

author = "Xiaozheng Gan and Chengtian Song and Jili Li and Lizhi Pan and Keyu Xu",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2026",

month = jan,

day = "15",

doi = "10.1016/j.eswa.2025.128430",

language = "English",

volume = "296",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - FLPC: Fusing language and point cloud for 3D object classification

AU - Gan, Xiaozheng

AU - Song, Chengtian

AU - Li, Jili

AU - Pan, Lizhi

AU - Xu, Keyu

PY - 2026/1/15

Y1 - 2026/1/15

N2 - This study improves the accuracy of point cloud classification by introducing a novel architecture that fuses language with point clouds, drawing inspiration from recent advances in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To address this, we propose FLPC, a fusion method for point cloud classification that uses an attention mechanism to integrate semantic information from textual descriptions with geometric features extracted from point cloud data. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to exploit both types of features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This design, which combines point cloud features with language features, yields a significant improvement in overall performance. Extensive experiments show that both CFA and SFA achieve competitive performance. Notably, PFA not only markedly outperforms the previous multimodal classification baseline but also surpasses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method improves on PointMLP by approximately 1.5%. Likewise, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7%. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.

KW - Attention mechanism

KW - Classification

KW - Multimodal fusion

KW - Point cloud

UR - http://www.scopus.com/pages/publications/105010852452

U2 - 10.1016/j.eswa.2025.128430

DO - 10.1016/j.eswa.2025.128430

M3 - Article

AN - SCOPUS:105010852452

SN - 0957-4174

VL - 296

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 128430

ER -
