Abstract
Visual tracking from the unmanned aerial vehicle (UAV) perspective is at the core of many low-altitude remote sensing applications. Most aerial trackers follow the "tracking-by-detection" paradigm or its temporal-context-embedded variants, where only the visual appearance cue is exploited for representation learning and for estimating the spatial likelihood of the target. However, the variation of the target appearance across consecutive frames is inherently unpredictable, which degrades the robustness of the temporal context-aware representation. To address this concern, we advocate incorporating the visual motion cue, which exhibits predictable temporal continuity, to build a complete temporal context-aware representation, and we introduce a dual-stream tracker composed of explicit heterogeneous visual tracking experts. Our technical contributions are three-fold: (1) a high-order temporal context-aware representation that integrates motion and appearance cues over a temporal context queue; (2) bidirectional cross-domain refinement that enhances the feature representation through cross-attention-based mutual guidance; and (3) consistent decision-making that enables anti-drifting localization via dynamic gating and failure-aware recovery. Extensive experiments on four UAV benchmarks (UAV123, UAV123@10fps, UAV20L, and DTB70) demonstrate that our method outperforms existing aerial trackers in terms of success rate and precision, particularly in occlusion and fast-motion scenarios. Such tracking stability highlights its potential for real-world UAV applications.
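To make the bidirectional cross-domain refinement idea more concrete, the following is a minimal sketch of cross-attention-based mutual guidance between an appearance stream and a motion stream. It is not the paper's implementation; the module names, feature shapes, and residual fusion scheme are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of bidirectional cross-domain refinement:
# the appearance and motion feature streams refine each other via cross-attention.
# Dimensions, head counts, and fusion choices below are assumptions.
import torch
import torch.nn as nn

class BidirectionalCrossRefinement(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # appearance queries attend to motion features, and vice versa
        self.app_from_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_from_app = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_app = nn.LayerNorm(dim)
        self.norm_motion = nn.LayerNorm(dim)

    def forward(self, app_feat, motion_feat):
        # app_feat, motion_feat: (batch, tokens, dim) token sequences
        app_refined, _ = self.app_from_motion(app_feat, motion_feat, motion_feat)
        motion_refined, _ = self.motion_from_app(motion_feat, app_feat, app_feat)
        # residual connections preserve the original per-domain cues
        return (self.norm_app(app_feat + app_refined),
                self.norm_motion(motion_feat + motion_refined))

# Usage: mutually refine per-frame appearance tokens with motion tokens
# accumulated from a temporal context queue (both hypothetical here).
app = torch.randn(1, 64, 256)
motion = torch.randn(1, 64, 256)
refined_app, refined_motion = BidirectionalCrossRefinement()(app, motion)
```

The symmetric design reflects the "mutual guidance" described in the abstract: each domain serves as the query against the other, so neither cue dominates the refined representation.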
Original language | English
---|---
Article number | 2237
Journal | Remote Sensing
Volume | 17
Issue | 13
DOI |
Publication status | Published - Jul 2025
Externally published | Yes