Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images

Yingping Liang; Ying Fu; Yutao Hu; Wenqi Shao; Jiaming Liu; Debing Zhang

doi:10.1109/TPAMI.2025.3576851

Yingping Liang, Ying Fu^*, Yutao Hu, Wenqi Shao, Jiaming Liu, Debing Zhang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Optical flow estimation is a crucial subfield of computer vision, serving as a foundation for video tasks. However, the real-world robustness is limited by animated synthetic datasets for training. This introduces domain gaps when applied to real-world applications and limits the benefits of scaling up datasets. To address these challenges, we propose Flow-Anything, a large-scale data generation framework designed to learn optical flow estimation from any single-view images in the real world. We employ two effective steps to make data scaling-up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic datasets for training from large-scale single-view images, namely FA-Flow Dataset. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.
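The first step in the abstract, lifting an image to 3D with a depth map and rendering optical flow under a virtual camera, rests on standard pinhole reprojection. The sketch below illustrates that geometry only. It is not the authors' implementation: the function name `rigid_flow_from_depth`, the single focal length, and the intrinsics parameters are assumptions made for illustration, and the Object-Independent Volume Rendering and Depth-Aware Inpainting modules are not modeled.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rigid_flow_from_depth(
    depth: Matrix,
    focal: float,
    cx: float,
    cy: float,
    rotation: Matrix,
    translation: Tuple[float, float, float],
) -> List[List[Tuple[float, float]]]:
    """Per-pixel flow (du, dv) induced by moving a pinhole camera.

    Hypothetical helper: depth[v][u] is the (positive) depth at pixel (u, v);
    rotation (3x3) and translation map points from the source camera frame
    to the virtual camera frame.
    """
    flow = []
    for v, row in enumerate(depth):
        flow_row = []
        for u, z in enumerate(row):
            # Back-project the pixel to a 3D point in the source camera frame.
            p = ((u - cx) * z / focal, (v - cy) * z / focal, z)
            # Move the point into the virtual camera frame: p' = R p + t.
            xq, yq, zq = (
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            # Project into the virtual view; flow is the pixel displacement.
            u2 = focal * xq / zq + cx
            v2 = focal * yq / zq + cy
            flow_row.append((u2 - u, v2 - v))
        flow.append(flow_row)
    return flow


# Usage: a sideways camera shift over a scene with a far and a near pixel.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth_map = [[2.0, 1.0]]  # one row: far pixel, then near pixel
flow = rigid_flow_from_depth(depth_map, 100.0, 1.0, 0.0, identity, (-0.1, 0.0, 0.0))
```

With a pure sideways translation the horizontal flow is `focal * tx / depth`, so the nearer pixel moves twice as far as the farther one. This parallax is what makes depth-derived flow usable as a training signal.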

Original language	English
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI	http://doi.org/10.1109/TPAMI.2025.3576851
Publication status	Accepted/In press - 2025
Externally published	Yes

Access to Document

10.1109/TPAMI.2025.3576851

Other files and links

Link to publication in Scopus

Cite this

@article{9adb8afb5d8447469b67be25f408d470,
title = "Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images",

abstract = "Optical flow estimation is a crucial subfield of computer vision, serving as a foundation for video tasks. However, the real-world robustness is limited by animated synthetic datasets for training. This introduces domain gaps when applied to real-world applications and limits the benefits of scaling up datasets. To address these challenges, we propose Flow-Anything, a large-scale data generation framework designed to learn optical flow estimation from any single-view images in the real world. We employ two effective steps to make data scaling-up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic datasets for training from large-scale single-view images, namely FA-Flow Dataset. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.",

keywords = "Optical flow estimation, novel-view synthesis, stereo matching, unsupervised learning, warping",
author = "Yingping Liang and Ying Fu and Yutao Hu and Wenqi Shao and Jiaming Liu and Debing Zhang",
note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",
year = "2025",
doi = "10.1109/TPAMI.2025.3576851",
language = "English",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
}

TY - JOUR
T1 - Flow-Anything
T2 - Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
AU - Liang, Yingping
AU - Fu, Ying
AU - Hu, Yutao
AU - Shao, Wenqi
AU - Liu, Jiaming
AU - Zhang, Debing
PY - 2025
Y1 - 2025

N2 - Optical flow estimation is a crucial subfield of computer vision, serving as a foundation for video tasks. However, the real-world robustness is limited by animated synthetic datasets for training. This introduces domain gaps when applied to real-world applications and limits the benefits of scaling up datasets. To address these challenges, we propose Flow-Anything, a large-scale data generation framework designed to learn optical flow estimation from any single-view images in the real world. We employ two effective steps to make data scaling-up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic datasets for training from large-scale single-view images, namely FA-Flow Dataset. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.

AB - Optical flow estimation is a crucial subfield of computer vision, serving as a foundation for video tasks. However, the real-world robustness is limited by animated synthetic datasets for training. This introduces domain gaps when applied to real-world applications and limits the benefits of scaling up datasets. To address these challenges, we propose Flow-Anything, a large-scale data generation framework designed to learn optical flow estimation from any single-view images in the real world. We employ two effective steps to make data scaling-up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic datasets for training from large-scale single-view images, namely FA-Flow Dataset. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.

KW - Optical flow estimation
KW - novel-view synthesis
KW - stereo matching
KW - unsupervised learning
KW - warping
UR - http://www.scopus.com/pages/publications/105008531582
U2 - 10.1109/TPAMI.2025.3576851
DO - 10.1109/TPAMI.2025.3576851
M3 - Article
AN - SCOPUS:105008531582
SN - 0162-8828
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -
