Anomaly Detection for Advanced Persistent Threats With Graph Node Embedding

Zhe Heng Peng; Chang Zhen Hu; Chun Shan

doi:10.6688/JISE.202505_41(3).0012


Corresponding author: Chun Shan

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, Advanced Persistent Threat (APT) attacks have become an increasing menace to national cybersecurity. Because of their complex tactics and persistent nature, APT attacks are difficult to detect effectively with traditional anomaly detection methods. Provenance graphs are now widely adopted for APT attack analysis because they offer richer semantic expressiveness and capture provenance and causal relationships. However, many current anomaly detection methods built on provenance graphs and network attack knowledge bases are inherently complex to design. Moreover, these methods mainly extract features from the provenance graph as a whole and overlook the rich semantic detail within its structure, which limits their effectiveness at spotting anomalous nodes. This research introduces an anomaly detection method for provenance graphs based on heterogeneous graph node embedding and cluster analysis. Drawing on the W3C PROV Data Model (PROV-DM), we construct a dedicated heterogeneous graph structure and design a new meta-path strategy to better capture its semantics. We then obtain node embeddings with a heterogeneous graph learning algorithm. K-means clustering groups the benign nodes into multiple clusters, and these benign node clusters are then used to accurately distinguish benign nodes from anomalous ones. Experiments on the Unicorn SC-2 dataset and the DARPA TC dataset show that our approach detects anomalies better than two existing anomaly detection systems.
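The final step summarized above, clustering benign node embeddings with K-means and then using those clusters to separate benign from anomalous nodes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the node embeddings are assumed to be already computed by the heterogeneous graph learning stage (toy 2-D vectors stand in for them here), and the decision rule, flagging a node that falls outside the radius of every benign cluster, is an assumption because the abstract does not specify one. The helper names `kmeans`, `fit_benign_model`, `is_anomalous` and the `slack` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: all names and the radius-based decision rule are hypothetical
# assumptions, not the paper's method. Node embeddings are assumed given.
Vector = Tuple[float, ...]


def _nearest(x: Sequence[float], centroids: List[Vector]) -> Tuple[int, float]:
    """Return (index, distance) of the centroid closest to x."""
    dists = [math.dist(x, c) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[Vector]:
    """Lloyd's K-means with deterministic farthest-first initialization."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= number of points")
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids: List[Vector] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: _nearest(p, centroids)[1]))
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[_nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if its cluster is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def fit_benign_model(benign: List[Vector], k: int) -> Tuple[List[Vector], List[float]]:
    """Cluster benign embeddings and record each cluster's radius,
    i.e. the largest distance from a member to its centroid."""
    centroids = kmeans(benign, k)
    radii = [0.0] * k
    for p in benign:
        i, d = _nearest(p, centroids)
        radii[i] = max(radii[i], d)
    return centroids, radii


def is_anomalous(x: Vector, centroids: List[Vector], radii: List[float],
                 slack: float = 1.0) -> bool:
    """Assumed rule: flag x if it lies outside every benign cluster's
    radius (scaled by slack)."""
    return all(math.dist(x, c) > slack * r for c, r in zip(centroids, radii))


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two groups of benign nodes.
    benign = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
              (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
    centroids, radii = fit_benign_model(benign, k=2)
    print(is_anomalous((0.1, 0.1), centroids, radii))  # False
    print(is_anomalous((5.0, 5.0), centroids, radii))  # True
```

Setting `slack` above 1.0 widens every benign cluster, trading fewer false positives for a higher chance of missing anomalies that sit close to benign behavior; both `slack` and the number of clusters `k` would need tuning on real provenance data.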

Original language	English
Pages (from-to)	713-728
Number of pages	16
Journal	Journal of Information Science and Engineering
Volume	41
Issue number	3
DOIs	http://doi.org/10.6688/JISE.202505_41(3).0012
Publication status	Published - May 2025
Externally published	Yes

Keywords

anomaly detection
cluster analysis
graph node embedding
heterogeneous graph neural network
provenance graph


Cite this

@article{9fc654a2fbd74c56bac0d360f1734f45,

title = "Anomaly Detection for Advanced Persistent Threats With Graph Node Embedding",

abstract = "In recent years, Advanced Persistent Threat (APT) attacks have increasingly become a menace to national cybersecurity. Due to their complex tactics and persistent nature, traditional anomaly detection methods make it difficult to detect APT attacks effectively. The provenance graph is now widely adopted for APT attack analysis because it possesses greater semantic expression, provenance, and causation abilities. However, many current anomaly detection methods, grounded in provenance graphs and network attack knowledge bases, face inherent complexities in design. Moreover, these methods mainly harness features from the entire provenance graph and overlook the rich semantic intricacies within its architecture, which diminishes their efficacy in spotting anomalous nodes. This research introduces an innovative anomaly detection method for provenance graphs, utilizing heterogeneous graph node embedding and clustering analysis. Drawing from the W3CPROV{\textquoteright}s PROV-DM model, we craft a distinct heterogeneous graph structure. We design a new meta-path strategy for better semantic understanding. By employing a heterogeneous graph learning algorithm, we obtain node embeddings. We use K-means clustering to classify benign nodes to get multiple clusters, and then use the benign node clusters to accurately differentiate between benign and anomalous nodes. Experimental validations on the Unicorn SC-2 dataset and the DARPA TC dataset confirm that our approach has better anomaly detection capacity compared to two current anomaly detection systems.",

keywords = "anomaly detection, cluster analysis, graph node embedding, heterogeneous graph neural network, provenance graph",

author = "Peng, Zhe Heng and Hu, Chang Zhen and Shan, Chun",

year = "2025",

month = may,

doi = "10.6688/JISE.202505\_41(3).0012",

language = "English",

volume = "41",

pages = "713--728",

journal = "Journal of Information Science and Engineering",

issn = "1016-2364",

publisher = "Institute of Information Science",

number = "3",

}

TY  - JOUR
T1  - Anomaly Detection for Advanced Persistent Threats With Graph Node Embedding
AU  - Peng, Zhe Heng
AU  - Hu, Chang Zhen
AU  - Shan, Chun
PY  - 2025/5
Y1  - 2025/5
AB  - In recent years, Advanced Persistent Threat (APT) attacks have increasingly become a menace to national cybersecurity. Due to their complex tactics and persistent nature, traditional anomaly detection methods make it difficult to detect APT attacks effectively. The provenance graph is now widely adopted for APT attack analysis because it possesses greater semantic expression, provenance, and causation abilities. However, many current anomaly detection methods, grounded in provenance graphs and network attack knowledge bases, face inherent complexities in design. Moreover, these methods mainly harness features from the entire provenance graph and overlook the rich semantic intricacies within its architecture, which diminishes their efficacy in spotting anomalous nodes. This research introduces an innovative anomaly detection method for provenance graphs, utilizing heterogeneous graph node embedding and clustering analysis. Drawing from the W3CPROV’s PROV-DM model, we craft a distinct heterogeneous graph structure. We design a new meta-path strategy for better semantic understanding. By employing a heterogeneous graph learning algorithm, we obtain node embeddings. We use K-means clustering to classify benign nodes to get multiple clusters, and then use the benign node clusters to accurately differentiate between benign and anomalous nodes. Experimental validations on the Unicorn SC-2 dataset and the DARPA TC dataset confirm that our approach has better anomaly detection capacity compared to two current anomaly detection systems.
KW  - anomaly detection
KW  - cluster analysis
KW  - graph node embedding
KW  - heterogeneous graph neural network
KW  - provenance graph
UR  - http://www.scopus.com/pages/publications/105010355804
U2  - 10.6688/JISE.202505_41(3).0012
DO  - 10.6688/JISE.202505_41(3).0012
M3  - Article
AN  - SCOPUS:105010355804
SN  - 1016-2364
VL  - 41
SP  - 713
EP  - 728
JO  - Journal of Information Science and Engineering
JF  - Journal of Information Science and Engineering
IS  - 3
ER  - 
