CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

Jiahao Zhao; Jingwei Zhu; Minghuan Tan; Min Yang; Renhao Li; Di Yang; Chenhao Zhang; Guancheng Ye; Chengming Li; Xiping Hu; Derek F. Wong


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese examination systems. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. We collect 22k questions from 39 psychology-related subjects across four Chinese examination systems. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to proprietary models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

Original language	English
Title of host publication	Main Conference
Editors	Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher	Association for Computational Linguistics (ACL)
Pages	11248-11260
Number of pages	13
ISBN (Electronic)	9798891761964
Publication status	Published - 2025
Externally published	Yes
Event	31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates. Duration: 19 Jan 2025 → 24 Jan 2025

Publication series

Name	Proceedings - International Conference on Computational Linguistics, COLING
Volume	Part F206484-1
ISSN (Print)	2951-2093

Conference

Conference	31st International Conference on Computational Linguistics, COLING 2025
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	19/01/25 → 24/01/25


Cite this

Zhao, J., Zhu, J., Tan, M., Yang, M., Li, R., Yang, D., Zhang, C., Ye, G., Li, C., Hu, X., & Wong, D. F. (2025). CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations. In O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. Di Eugenio, & S. Schockaert (Eds.), Main Conference (pp. 11248-11260). (Proceedings - International Conference on Computational Linguistics, COLING; Vol. Part F206484-1). Association for Computational Linguistics (ACL).

Zhao, Jiahao; Zhu, Jingwei; Tan, Minghuan et al. / CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations. Main Conference. ed. / Owen Rambow; Leo Wanner; Marianna Apidianaki; Hend Al-Khalifa; Barbara Di Eugenio; Steven Schockaert. Association for Computational Linguistics (ACL), 2025. pp. 11248-11260 (Proceedings - International Conference on Computational Linguistics, COLING).

@inproceedings{406c33eaa6544a89bb12def951d7ce66,

title = "CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations",

abstract = "In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese examination systems. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. We collect 22k questions from 39 psychology-related subjects across four Chinese examination systems. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to proprietary models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.",

author = "Jiahao Zhao and Jingwei Zhu and Minghuan Tan and Min Yang and Renhao Li and Di Yang and Chenhao Zhang and Guancheng Ye and Chengming Li and Xiping Hu and Wong, {Derek F.}",

note = "Publisher Copyright: {\textcopyright} 2025 Association for Computational Linguistics.; 31st International Conference on Computational Linguistics, COLING 2025 ; Conference date: 19-01-2025 Through 24-01-2025",

year = "2025",

language = "English",

series = "Proceedings - International Conference on Computational Linguistics, COLING",

publisher = "Association for Computational Linguistics (ACL)",

pages = "11248--11260",

editor = "Owen Rambow and Leo Wanner and Marianna Apidianaki and Hend Al-Khalifa and {Di Eugenio}, Barbara and Steven Schockaert",

booktitle = "Main Conference",

address = "United States",

}

Zhao, J, Zhu, J, Tan, M, Yang, M, Li, R, Yang, D, Zhang, C, Ye, G, Li, C, Hu, X & Wong, DF 2025, CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations. in O Rambow, L Wanner, M Apidianaki, H Al-Khalifa, B Di Eugenio & S Schockaert (eds), Main Conference. Proceedings - International Conference on Computational Linguistics, COLING, vol. Part F206484-1, Association for Computational Linguistics (ACL), pp. 11248-11260, 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, United Arab Emirates, 19/01/25.

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations. / Zhao, Jiahao; Zhu, Jingwei; Tan, Minghuan et al.
Main Conference. ed. / Owen Rambow; Leo Wanner; Marianna Apidianaki; Hend Al-Khalifa; Barbara Di Eugenio; Steven Schockaert. Association for Computational Linguistics (ACL), 2025. pp. 11248-11260 (Proceedings - International Conference on Computational Linguistics, COLING; Vol. Part F206484-1).


TY - GEN

T1 - CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

T2 - 31st International Conference on Computational Linguistics, COLING 2025

AU - Zhao, Jiahao

AU - Zhu, Jingwei

AU - Tan, Minghuan

AU - Yang, Min

AU - Li, Renhao

AU - Yang, Di

AU - Zhang, Chenhao

AU - Ye, Guancheng

AU - Li, Chengming

AU - Hu, Xiping

AU - Wong, Derek F.

PY - 2025

Y1 - 2025

N2 - In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese examination systems. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. We collect 22k questions from 39 psychology-related subjects across four Chinese examination systems. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to proprietary models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

AB - In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese examination systems. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. We collect 22k questions from 39 psychology-related subjects across four Chinese examination systems. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to proprietary models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

UR - http://www.scopus.com/pages/publications/85218505520

M3 - Conference contribution

AN - SCOPUS:85218505520

T3 - Proceedings - International Conference on Computational Linguistics, COLING

SP - 11248

EP - 11260

BT - Main Conference

A2 - Rambow, Owen

A2 - Wanner, Leo

A2 - Apidianaki, Marianna

A2 - Al-Khalifa, Hend

A2 - Di Eugenio, Barbara

A2 - Schockaert, Steven

PB - Association for Computational Linguistics (ACL)

Y2 - 19 January 2025 through 24 January 2025

ER -

Zhao J, Zhu J, Tan M, Yang M, Li R, Yang D et al. CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations. In Rambow O, Wanner L, Apidianaki M, Al-Khalifa H, Di Eugenio B, Schockaert S, editors, Main Conference. Association for Computational Linguistics (ACL). 2025. p. 11248-11260. (Proceedings - International Conference on Computational Linguistics, COLING).