TY - JOUR
T1 - A novel host-based intrusion detection approach leveraging audit logs
AU - Jiang, Jiaqing
AU - Chu, Hongyang
AU - Tian, Donghai
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2026/1
Y1 - 2026/1
N2 - Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks (e.g., APT, LoTL) due to their stealthy nature and reliance on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, combined with a graph convolutional network (GCN) for detection. Implemented using PyTorch and ETW, MalSnif addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.
AB - Host-based intrusion detection systems (HIDS) struggle to detect advanced cyber attacks (e.g., APT, LoTL) due to their stealthy nature and reliance on either structural or semantic features alone. We hypothesize that integrating semantic audit log analysis with structural provenance graph learning improves detection accuracy and adaptability. To validate this, we propose MalSnif, a novel framework that (1) parses audit logs to construct provenance graphs enriched with process/event relationships, (2) simplifies graphs by pruning peripheral nodes while retaining critical attack trajectories, and (3) employs NLP techniques (word2vec, GRU, BiLSTM) to extract semantic features, combined with a graph convolutional network (GCN) for detection. Implemented using PyTorch and ETW, MalSnif addresses data imbalance via strategic downsampling during training. Evaluations show that our approach can effectively detect different kinds of cyber attacks and outperforms recent methods. In addition, our methods for simplifying process event sequences and provenance graphs also yield effective and explainable results.
KW - Audit log analysis
KW - Graph neural network
KW - Provenance graph
KW - Semantic-structural fusion
UR - http://www.scopus.com/pages/publications/105011250539
U2 - 10.1016/j.future.2025.107995
DO - 10.1016/j.future.2025.107995
M3 - Article
AN - SCOPUS:105011250539
SN - 0167-739X
VL - 174
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
M1 - 107995
ER -