Active Intelligence in Video Avatars via Closed-loop World Modeling
*Work done during internship at Meituan †Corresponding authors ‡Project leader
Current video avatars excel at visual quality but lack agency. We bridge this gap with L-IVA, a benchmark for long-horizon planning, and ORCA, a framework that enables active intelligence. ORCA emulates an internal world model via a closed-loop Observe-Think-Act-Reflect (OTAR) cycle and a hierarchical dual-system architecture. This design allows avatars to autonomously reason, verify outcomes, and correct errors in real time. Extensive experiments show that ORCA significantly outperforms baselines, advancing video avatars from passive animation to active, goal-oriented behavior.
Figure 1. The ORCA Framework. ORCA enables active intelligence via a closed-loop OTAR (Observe-Think-Act-Reflect) cycle. It features a dual-system architecture: System 2 performs high-level strategic planning and state tracking, while System 1 grounds abstract plans into precise, model-specific action captions for the video generation model.
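To make the OTAR cycle and the dual-system split concrete, here is a minimal Python sketch of the control loop. All class, function, and field names (System2Planner, System1Grounder, generate_clip, etc.) are illustrative assumptions, not ORCA's released code.

```python
# Minimal sketch of a closed-loop Observe-Think-Act-Reflect (OTAR) cycle with a
# dual-system split, in the spirit of Figure 1. Names are assumptions, not ORCA's API.

from dataclasses import dataclass, field


@dataclass
class EpisodeState:
    """Belief state tracked by System 2 across generation steps."""
    goal: str
    step: int = 0
    log: list = field(default_factory=list)
    done: bool = False


class System2Planner:
    """Slow, deliberative module: strategic planning, state tracking, reflection."""

    def think(self, state: EpisodeState, observation: str) -> str:
        # Propose the next abstract sub-goal given the goal and latest observation.
        return f"step {state.step + 1} toward '{state.goal}'"

    def reflect(self, state: EpisodeState, observation: str, subgoal: str) -> None:
        # Verify the outcome of the generated clip and update the belief state;
        # a failed check would trigger replanning in the next Think phase.
        succeeded = "failure" not in observation
        state.log.append((subgoal, observation, succeeded))
        state.step += 1
        state.done = succeeded and state.step >= 3  # toy termination rule


class System1Grounder:
    """Fast module: grounds abstract sub-goals into model-specific action captions."""

    def act(self, subgoal: str, observation: str) -> str:
        return f"The avatar {subgoal}, continuing smoothly from the last frame."


def generate_clip(caption: str) -> str:
    """Stand-in for the video generation model; returns a textual 'clip'."""
    return f"clip rendered for: {caption}"


def observe(clip: str) -> str:
    """Stand-in for the observation module (e.g. a VLM describing the clip)."""
    return f"observed {clip}"


def otar_loop(goal: str, max_steps: int = 10) -> EpisodeState:
    state = EpisodeState(goal=goal)
    planner, grounder = System2Planner(), System1Grounder()
    observation = "initial scene"
    while not state.done and state.step < max_steps:
        subgoal = planner.think(state, observation)    # Think   (System 2)
        caption = grounder.act(subgoal, observation)   # Act     (System 1 grounding)
        clip = generate_clip(caption)                  # Act     (video generation)
        observation = observe(clip)                    # Observe
        planner.reflect(state, observation, subgoal)   # Reflect (System 2)
    return state


if __name__ == "__main__":
    final_state = otar_loop("pour a cup of tea")
    for subgoal, obs, ok in final_state.log:
        print(ok, "|", subgoal)
```

The key design choice reflected in the sketch is the separation of concerns: System 2 owns the persistent state and the verification step, while System 1 only translates each sub-goal into a caption the video model can consume.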
Unlike traditional benchmarks that evaluate single-clip aesthetics, L-IVA is the first benchmark designed to assess goal-directed planning in stochastic generative environments.
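For illustration, a scoring routine along these lines could aggregate per-episode outcomes into the scene-level metrics reported below; the episode fields and the simple averaging used here are assumptions, not the official L-IVA evaluation protocol.

```python
# Illustrative scoring sketch for a long-horizon avatar benchmark such as L-IVA
# (hypothetical field names; the actual L-IVA protocol may differ).

from collections import defaultdict
from statistics import mean


def score_benchmark(episodes):
    """episodes: list of dicts like
    {"scene": "Kitchen", "goal_achieved": True, "plausibility": 4}.
    Returns per-scene Task Success Rate (%) and mean Physical Plausibility (1-5)."""
    by_scene = defaultdict(list)
    for ep in episodes:
        by_scene[ep["scene"]].append(ep)

    results = {}
    for scene, eps in by_scene.items():
        results[scene] = {
            "task_success_rate": 100.0 * mean(e["goal_achieved"] for e in eps),
            "physical_plausibility": mean(e["plausibility"] for e in eps),
        }
    return results


if __name__ == "__main__":
    demo = [
        {"scene": "Kitchen", "goal_achieved": True, "plausibility": 4},
        {"scene": "Kitchen", "goal_achieved": False, "plausibility": 3},
        {"scene": "Garden", "goal_achieved": True, "plausibility": 4},
    ]
    print(score_benchmark(demo))
```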
Table 1. Comparison with state-of-the-art baselines on L-IVA. ORCA achieves the best overall performance on both task success rate and physical plausibility.
| Method | Success Rate Kitchen (%) ↑ | Success Rate Garden (%) ↑ | Success Rate Avg. (%) ↑ | Plausibility Kitchen (1-5) ↑ | Plausibility Garden (1-5) ↑ | Plausibility Avg. (1-5) ↑ |
|---|---|---|---|---|---|---|
| Reactive Agent | 56.7 | 55.0 | 50.9 | 3.47 | 3.08 | 3.11 |
| Open-Loop Planner | 72.3 | 46.2 | 62.3 | 3.57 | 2.92 | 3.17 |
| VAGEN | 70.8 | 60.0 | 61.2 | 3.56 | 2.54 | 3.22 |
| ORCA (Ours) | 73.8 | 81.5 | 71.0 | 3.53 | 3.77 | 3.72 |
@article{he2025orca,
title={Active Intelligence in Video Avatars via Closed-loop World Modeling},
author={He, Xuanhua and Yang, Tianyu and Cao, Ke and Wu, Ruiqi and Meng, Cheng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Chen, Qifeng},
journal={arXiv preprint arXiv:2508.xxxxx},
year={2025}
}