A novel host-based intrusion detection approach leveraging audit logs

Jiaqing Jiang; Hongyang Chu; Donghai Tian

doi:10.1016/j.future.2025.107995

Jiaqing Jiang, Hongyang Chu, Donghai Tian*

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks (e.g., APT, LoTL) due to their stealthy nature and reliance on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, combined with a graph convolutional network (GCN) for detection. Implemented using PyTorch and ETW, MalSnif addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.
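
The first two stages named in the abstract, building a provenance graph from audit logs and pruning peripheral nodes, can be illustrated with a minimal sketch. This is not MalSnif's implementation: the event tuples, the function names `build_provenance_graph` and `prune_peripheral`, and the pruning rule (drop objects touched by only one event unless the operation is a spawn or a network connect) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

# Toy audit records: (subject process, operation, object).
# Hypothetical data, not taken from the paper's ETW traces.
EVENTS = [
    ("winword.exe", "spawn", "powershell.exe"),
    ("winword.exe", "read", "C:/Docs/report.docx"),
    ("powershell.exe", "write", "C:/Temp/payload.bin"),
    ("powershell.exe", "spawn", "rundll32.exe"),
    ("rundll32.exe", "read", "C:/Temp/payload.bin"),
    ("rundll32.exe", "read", "C:/Windows/fonts.dat"),
    ("rundll32.exe", "connect", "10.0.0.5:443"),
]


def build_provenance_graph(events):
    """Return {subject: {(operation, object), ...}} for the given events."""
    graph = defaultdict(set)
    for subject, operation, obj in events:
        graph[subject].add((operation, obj))
    return graph


def prune_peripheral(graph, keep_ops=("spawn", "connect")):
    """Drop edges to objects touched by a single event, unless the operation
    is in keep_ops. An assumed heuristic, not the paper's pruning rule."""
    touches = defaultdict(int)
    for edges in graph.values():
        for _, target in edges:
            touches[target] += 1
    pruned = {}
    for source, edges in graph.items():
        kept = {(op, t) for op, t in edges if op in keep_ops or touches[t] > 1}
        if kept:
            pruned[source] = kept
    return pruned


graph = build_provenance_graph(EVENTS)
pruned = prune_peripheral(graph)
for source in sorted(pruned):
    print(source, sorted(pruned[source]))
```

In this toy trace the one-off reads of `report.docx` and `fonts.dat` are dropped, while the spawn chain, the shared `payload.bin` file, and the outbound connection survive. A real system would need a far more careful criterion for what counts as peripheral.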

Original language	English
Article number	107995
Journal	Future Generation Computer Systems
Volume	174
DOIs	http://doi.org/10.1016/j.future.2025.107995
Publication status	Published - Jan 2026

Keywords

Audit log analysis
Graph neural network
Provenance graph
Semantic-structural fusion

Cite this

@article{01fed06d10d94dcc82cab9585ab4c72f,

title = "A novel host-based intrusion detection approach leveraging audit logs",

abstract = "Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks (e.g., APT, LoTL) due to their stealthy nature and reliance on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, combined with a graph convolutional network (GCN) for detection. Implemented using PyTorch and ETW, MalSnif addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.",

keywords = "Audit log analysis, Graph neural network, Provenance graph, Semantic-structural fusion",

author = "Jiaqing Jiang and Hongyang Chu and Donghai Tian",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",

year = "2026",

month = jan,

doi = "10.1016/j.future.2025.107995",

language = "English",

volume = "174",

journal = "Future Generation Computer Systems",

issn = "0167-739X",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - A novel host-based intrusion detection approach leveraging audit logs

AU - Jiang, Jiaqing

AU - Chu, Hongyang

AU - Tian, Donghai

PY - 2026/1

Y1 - 2026/1

AB - Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks (e.g., APT, LoTL) due to their stealthy nature and reliance on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, combined with a graph convolutional network (GCN) for detection. Implemented using PyTorch and ETW, MalSnif addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.

KW - Audit log analysis

KW - Graph neural network

KW - Provenance graph

KW - Semantic-structural fusion

UR - http://www.scopus.com/pages/publications/105011250539

U2 - 10.1016/j.future.2025.107995

DO - 10.1016/j.future.2025.107995

M3 - Article

AN - SCOPUS:105011250539

SN - 0167-739X

VL - 174

JO - Future Generation Computer Systems

JF - Future Generation Computer Systems

M1 - 107995

ER -
