Periodic watermarking for copyright protection of large language models in cloud computing security

Pei Gen Ye; Zhengdao Li; Zuopeng Yang; Pengyu Chen; Zhenxin Zhang; Ning Li; Jun Zheng

doi:10.1016/j.csi.2025.103983

Pei Gen Ye, Zhengdao Li, Zuopeng Yang, Pengyu Chen, Zhenxin Zhang, Ning Li, Jun Zheng*

* Corresponding author for this work

School of Cyberspace Security

Research output: Contribution to journal › Article › Peer-reviewed

2 citations (Scopus)

Abstract

Large Language Models (LLMs) have become integral in advancing content understanding and generation, leading to the proliferation of Embedding as a Service (EaaS) within cloud computing platforms. EaaS leverages LLMs to offer scalable, on-demand linguistic embeddings, enhancing various cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, where the timing of such infringements often remains elusive. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding distinct watermarks at different sub-periods, marking the first attempt to identify the timing of model extraction attacks. TimeMarker employs an adaptive watermark strength method based on information entropy and frequency domain transformations to refine the detection accuracy of model extraction attacks within cloud infrastructures. The granularity of time frame identification for theft improves as more sub-periods are used. Furthermore, our approach investigates single sub-period theft and extends to multi-sub-period theft scenarios where attackers steal data across many sub-periods to train their models in cloud settings. Validated across five widely used datasets, TimeMarker is capable of detecting model extraction over various sub-periods and assessing its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies different periods of extraction attacks, enhancing EaaS security within cloud computing and extending traditional watermarking to offer copyright protection for LLMs.

Original language	English
Article number	103983
Journal	Computer Standards and Interfaces
Volume	94
DOI	http://doi.org/10.1016/j.csi.2025.103983
Publication status	Published - Aug 2025


Cite this

@article{a0fba4e2b9924e36a886b957973d2c91,

title = "Periodic watermarking for copyright protection of large language models in cloud computing security",

abstract = "Large Language Models (LLMs) have become integral in advancing content understanding and generation, leading to the proliferation of Embedding as a Service (EaaS) within cloud computing platforms. EaaS leverages LLMs to offer scalable, on-demand linguistic embeddings, enhancing various cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, where the timing of such infringements often remains elusive. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding distinct watermarks at different sub-periods, marking the first attempt to identify the timing of model extraction attacks. TimeMarker employs an adaptive watermark strength method based on information entropy and frequency domain transformations to refine the detection accuracy of model extraction attacks within cloud infrastructures. The granularity of time frame identification for theft improves as more sub-periods are used. Furthermore, our approach investigates single sub-period theft and extends to multi-sub-period theft scenarios where attackers steal data across many sub-periods to train their models in cloud settings. Validated across five widely used datasets, TimeMarker is capable of detecting model extraction over various sub-periods and assessing its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies different periods of extraction attacks, enhancing EaaS security within cloud computing and extending traditional watermarking to offer copyright protection for LLMs.",

keywords = "Cloud computing security, Copyright protection, Large language model, Watermark technology",

author = "Ye, {Pei Gen} and Zhengdao Li and Zuopeng Yang and Pengyu Chen and Zhenxin Zhang and Ning Li and Jun Zheng",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",

year = "2025",

month = aug,

doi = "10.1016/j.csi.2025.103983",

language = "English",

volume = "94",

journal = "Computer Standards and Interfaces",

issn = "0920-5489",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Periodic watermarking for copyright protection of large language models in cloud computing security

AU - Ye, Pei Gen

AU - Li, Zhengdao

AU - Yang, Zuopeng

AU - Chen, Pengyu

AU - Zhang, Zhenxin

AU - Li, Ning

AU - Zheng, Jun

PY - 2025/8

Y1 - 2025/8

N2 - Large Language Models (LLMs) have become integral in advancing content understanding and generation, leading to the proliferation of Embedding as a Service (EaaS) within cloud computing platforms. EaaS leverages LLMs to offer scalable, on-demand linguistic embeddings, enhancing various cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, where the timing of such infringements often remains elusive. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding distinct watermarks at different sub-periods, marking the first attempt to identify the timing of model extraction attacks. TimeMarker employs an adaptive watermark strength method based on information entropy and frequency domain transformations to refine the detection accuracy of model extraction attacks within cloud infrastructures. The granularity of time frame identification for theft improves as more sub-periods are used. Furthermore, our approach investigates single sub-period theft and extends to multi-sub-period theft scenarios where attackers steal data across many sub-periods to train their models in cloud settings. Validated across five widely used datasets, TimeMarker is capable of detecting model extraction over various sub-periods and assessing its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies different periods of extraction attacks, enhancing EaaS security within cloud computing and extending traditional watermarking to offer copyright protection for LLMs.

KW - Cloud computing security

KW - Copyright protection

KW - Large language model

KW - Watermark technology

UR - http://www.scopus.com/pages/publications/85218150923

U2 - 10.1016/j.csi.2025.103983

DO - 10.1016/j.csi.2025.103983

M3 - Article

AN - SCOPUS:85218150923

SN - 0920-5489

VL - 94

JO - Computer Standards and Interfaces

JF - Computer Standards and Interfaces

M1 - 103983

ER -
