A Multi-Task Learning Framework for Reading Comprehension of Scientific Tabular Data

Xu Yang, Meihui Zhang*, Ju Fan, Zeyu Luo, Yuxin Yang

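*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

1 Citation (Scopus)

Abstract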

Tabular data in scientific papers provides valuable structured information for knowledge discovery and validation. Although language models such as BERT and ChatGPT have significantly advanced research on general-domain tables, challenges remain for scientific tables. Specifically, such models have a limited understanding of scientific entities and lack numerical representation and computation capabilities. Previous studies have targeted scientific tables, but they are limited to individual modules or tasks and lack a comprehensive framework. To address these issues, we introduce NRTR, a reading comprehension framework for scientific tables that adopts a multi-task learning approach with a shared encoder to reason across various tasks, including question answering, cloze tests, and fact verification. NRTR has the following characteristics: (1) it uses entity linking and named entity recognition to extract key information from papers, which enhances the model's understanding of scientific entities; (2) it injects numerical representation capabilities into language models and promotes the model's understanding of the relative magnitude of numbers, so that it can better reason about maximum and difference values. Notably, existing scientific corpora either lack tabular context or do not integrate computational reasoning, which hinders the evaluation of reasoning models on scientific tables. To this end, we release SciTab, a multi-task dataset that combines high-quality scientific tables with contextual information to provide a benchmark for future research. Our experimental results show that NRTR outperforms existing models on SciTab.
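The two design points in the abstract, one encoder shared by several task heads and number features that preserve relative magnitude, can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the hashed bag-of-tokens encoder, the signed log-magnitude number feature, the constant `DIM`, the helpers `number_features` and `shared_encoder`, the class `MultiTaskModel`, and the task and label names are all hypothetical. The abstract does not specify NRTR's actual encoder, number representation, or heads, and this is not the authors' implementation.

```python
import math
import random
import re
import zlib

DIM = 16  # hypothetical size of the hashed token part of the representation
NUM_RE = re.compile(r"^-?\d+(?:\.\d+)?$")


def number_features(token):
    """Hypothetical numeric features: [is_number, sign, signed log-magnitude]."""
    if not NUM_RE.match(token):
        return [0.0, 0.0, 0.0]
    value = float(token)
    sign = -1.0 if value < 0 else 1.0
    # sign * log1p(|x|) is strictly increasing in x, so comparing features
    # compares the underlying numbers, while large values stay in a small range.
    return [1.0, sign, sign * math.log1p(abs(value))]


def shared_encoder(tokens):
    """Toy encoder used by every task: hashed bag of tokens + mean numeric features."""
    vec = [0.0] * DIM
    num = [0.0, 0.0, 0.0]
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted str hash
        vec[zlib.crc32(tok.lower().encode("utf-8")) % DIM] += 1.0
        num = [a + b for a, b in zip(num, number_features(tok))]
    n = max(len(tokens), 1)
    return [v / n for v in vec] + [x / n for x in num]


class MultiTaskModel:
    """Hard parameter sharing: one shared encoder, one linear head per task."""

    def __init__(self, task_labels, seed=0):
        rng = random.Random(seed)
        width = DIM + 3  # hashed tokens + 3 numeric features
        # Randomly initialised heads; a real model would learn these weights.
        self.heads = {
            task: {label: [rng.uniform(-1.0, 1.0) for _ in range(width)] for label in labels}
            for task, labels in task_labels.items()
        }

    def predict(self, task, tokens):
        h = shared_encoder(tokens)  # the same representation feeds every head
        scores = {
            label: sum(w * x for w, x in zip(weights, h))
            for label, weights in self.heads[task].items()
        }
        return max(scores, key=scores.get)


model = MultiTaskModel({
    "fact_verification": ["supported", "refuted"],  # hypothetical labels
    "cloze": ["BERT", "NRTR"],                        # hypothetical candidates
})
claim = "NRTR reaches 81.2 F1 , higher than 79.5".split()
print(model.predict("fact_verification", claim))
print(model.predict("cloze", claim))
```

The heads here are untrained, so the printed labels are arbitrary. In a real multi-task setup the shared encoder would be a pretrained language model whose weights receive gradients from every task's loss, which is what lets the tasks benefit from one another; this toy only shows the wiring and the magnitude-preserving number feature.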

Original language: English
Host publication title: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 3710-3724
Number of pages: 15
ISBN (Electronic): 9798350317152
Publication status: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 → 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 → 17/05/24
