TY - JOUR
T1 - Periodic watermarking for copyright protection of large language models in cloud computing security
AU - Ye, Pei Gen
AU - Li, Zhengdao
AU - Yang, Zuopeng
AU - Chen, Pengyu
AU - Zhang, Zhenxin
AU - Li, Ning
AU - Zheng, Jun
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/8
Y1 - 2025/8
N2 - Large Language Models (LLMs) have become integral in advancing content understanding and generation, leading to the proliferation of Embedding as a Service (EaaS) within cloud computing platforms. EaaS leverages LLMs to offer scalable, on-demand linguistic embeddings, enhancing various cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, where the timing of such infringements often remains elusive. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding distinct watermarks at different sub-periods, marking the first attempt to identify the timing of model extraction attacks. TimeMarker employs an adaptive watermark strength method based on information entropy and frequency domain transformations to refine the detection accuracy of model extraction attacks within cloud infrastructures. The granularity of time frame identification for theft improves as more sub-periods are used. Furthermore, our approach investigates single sub-period theft and extends to multi-sub-period theft scenarios where attackers steal data across many sub-periods to train their models in cloud settings. Validated across five widely used datasets, TimeMarker is capable of detecting model extraction over various sub-periods and assessing its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies different periods of extraction attacks, enhancing EaaS security within cloud computing and extending traditional watermarking to offer copyright protection for LLMs.
AB - Large Language Models (LLMs) have become integral in advancing content understanding and generation, leading to the proliferation of Embedding as a Service (EaaS) within cloud computing platforms. EaaS leverages LLMs to offer scalable, on-demand linguistic embeddings, enhancing various cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, where the timing of such infringements often remains elusive. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding distinct watermarks at different sub-periods, marking the first attempt to identify the timing of model extraction attacks. TimeMarker employs an adaptive watermark strength method based on information entropy and frequency domain transformations to refine the detection accuracy of model extraction attacks within cloud infrastructures. The granularity of time frame identification for theft improves as more sub-periods are used. Furthermore, our approach investigates single sub-period theft and extends to multi-sub-period theft scenarios where attackers steal data across many sub-periods to train their models in cloud settings. Validated across five widely used datasets, TimeMarker is capable of detecting model extraction over various sub-periods and assessing its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies different periods of extraction attacks, enhancing EaaS security within cloud computing and extending traditional watermarking to offer copyright protection for LLMs.
KW - Cloud computing security
KW - Copyright protection
KW - Large language model
KW - Watermark technology
UR - http://www.scopus.com/pages/publications/85218150923
U2 - 10.1016/j.csi.2025.103983
DO - 10.1016/j.csi.2025.103983
M3 - Article
AN - SCOPUS:85218150923
SN - 0920-5489
VL - 94
JO - Computer Standards and Interfaces
JF - Computer Standards and Interfaces
M1 - 103983
ER -