Active Intelligence in Video Avatars via Closed-loop World Modeling

¹HKUST   ²Meituan   ³USTC

*Work done during internship at Meituan   †Corresponding authors   ‡Project leader

Abstract

Current video avatars excel at visual fidelity but lack agency. We bridge this gap with L-IVA, a benchmark for long-horizon planning, and ORCA, a framework that enables active intelligence. ORCA emulates an internal world model through a closed-loop Observe-Think-Act-Reflect (OTAR) cycle and a hierarchical dual-system architecture. This design allows avatars to autonomously reason, verify outcomes, and correct errors in real time. Extensive experiments demonstrate that ORCA significantly outperforms baselines, advancing video avatars from passive animation to active, goal-oriented behavior.

ORCA Framework

Figure 1. The ORCA Framework. ORCA enables active intelligence via a closed-loop OTAR (Observe-Think-Act-Reflect) cycle. It features a dual-system architecture: System 2 performs high-level strategic planning and state tracking, while System 1 grounds abstract plans into precise, model-specific action captions for the video generation model.
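To make the control flow concrete, below is a minimal Python sketch of the OTAR cycle under the dual-system split. All interfaces here (planner, grounder, generator and their method names) are hypothetical stand-ins for illustration, not the paper's actual APIs.

# Minimal sketch of a closed-loop OTAR cycle (hypothetical interfaces).
def otar_loop(goal, planner, grounder, generator, max_steps=8):
    """Observe-Think-Act-Reflect until the goal is met or the budget runs out.

    planner   -- System 2: strategic planning and state tracking (assumed interface)
    grounder  -- System 1: turns abstract subgoals into model-specific action captions
    generator -- the underlying video generation model (assumed interface)
    """
    observation = generator.current_frame()          # Observe the initial scene
    state = planner.init_state(goal, observation)
    for _ in range(max_steps):
        subgoal = planner.next_subgoal(goal, state)  # Think: pick the next subgoal
        if subgoal is None:                          # Planner judges the goal complete
            return True
        caption = grounder.ground(subgoal, observation)
        observation = generator.generate(caption)    # Act: synthesize the next clip
        ok, state = planner.reflect(subgoal, observation, state)  # Reflect: verify outcome
        if not ok:
            state = planner.record_failure(subgoal, state)  # enable self-correction
    return False

The design point this sketch highlights is that System 2 never addresses the video model directly: it reasons only over subgoals and verified outcomes, while System 1 owns the translation into model-specific captions.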

The L-IVA Benchmark

Unlike traditional benchmarks that evaluate single-clip aesthetics, L-IVA is the first benchmark designed to assess goal-directed planning in stochastic generative environments.

At a glance: 100 interactive tasks · 5 diverse scenarios · 3-8 steps per task · goal-oriented evaluation.
Example tasks (a hypothetical task record in this format is sketched after the table):

Scenario     Task                      Goal description
Garden       Mix soil and fertilizer   mix soil and fertilizer in the wheelbarrow
Kitchen      Fry the egg               fry the egg
Livestream   Product Demo              demonstrate the application of a facial serum
Office       Leave Office              finish work for the day, pack up personal items
Workshop     Sharpen Saw               stop the current work, take the saw from the wall, and sharpen it
Garden       Clean up the trash        put the trash in a bag
Kitchen      Wash Plate                collaborate to wash, dry, and stack a dinner plate
Livestream   Install GPU               collaborate to install the graphics card into the PC case
Office       Fix Printer               collaborate to remove jammed paper from the printer
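For concreteness, a task record in this style could be represented as below. The TaskSpec dataclass and its field names are illustrative assumptions, not the benchmark's published schema.

# Hypothetical L-IVA-style task record (illustrative schema only).
from dataclasses import dataclass

@dataclass
class TaskSpec:
    scenario: str        # one of the 5 scenarios, e.g. "Kitchen"
    title: str           # short task name, as in the table above
    goal: str            # natural-language goal the avatar must achieve
    min_steps: int = 3   # L-IVA tasks span 3-8 interaction steps
    max_steps: int = 8

example = TaskSpec(
    scenario="Garden",
    title="Mix soil and fertilizer",
    goal="mix soil and fertilizer in the wheelbarrow",
)
print(example)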

Qualitative Results

Comparing ORCA against state-of-the-art baselines. Each case shows four methods side by side: Open-Loop, Reactive, VAGEN, and ORCA (Ours). Task prompts, grouped by scenario:

Kitchen
  Case 01: Make a cup of tea by scooping leaves into the pot
  Case 02: Make coffee
  Case 03: Prepare simple guacamole

Livestream
  Case 01: Demonstrate the multi-light features of the makeup mirror
  Case 02: Demonstrate how to make a simple vegetable salad
  Case 03: Put on a diamond necklace and display it

Garden
  Case 01: Mix soil and fertilizer in the wheelbarrow
  Case 02: Check the beehives
  Case 03: Start a fire in the fire pit

Office
  Case 01: Pause reading the book and join an online meeting
  Case 02: Prepare for an upcoming video meeting and join the call
  Case 03: Replace the toner cartridge in the copier

Workshop
  Case 01: Sharpen the saw
  Case 02: Prepare a plant leaf sample and observe it
  Case 03: Add motor oil to the engine

Quantitative Results

ORCA achieves state-of-the-art performance on all averaged metrics

Headline results: 71.0% task success rate · 3.72 physical plausibility (1-5) · 28.7% human preference.
Method              Task Success Rate (%) ↑        Physical Plausibility (1-5) ↑
                    Kitchen  Garden  Average       Kitchen  Garden  Average
Reactive Agent        56.7    55.0    50.9           3.47    3.08    3.11
Open-Loop Planner     72.3    46.2    62.3           3.57    2.92    3.17
VAGEN                 70.8    60.0    61.2           3.56    2.54    3.22
ORCA (Ours)           73.8    81.5    71.0           3.53    3.77    3.72

(The Average columns appear to cover all five L-IVA scenarios; only Kitchen and Garden are broken out here.)
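As a worked illustration of how per-episode outcomes reduce to the table's two metrics (an assumption about the aggregation, not the paper's stated pipeline):

# Hypothetical per-episode records: (scenario, task_succeeded, plausibility 1-5).
from statistics import mean

episodes = [
    ("Kitchen", True, 4.0),
    ("Kitchen", False, 3.0),
    ("Garden", True, 4.5),
]

def success_rate(eps):
    # Fraction of episodes judged successful, as a percentage.
    return 100.0 * mean(1.0 if ok else 0.0 for _, ok, _ in eps)

def mean_plausibility(eps):
    # Mean of the per-episode 1-5 plausibility scores.
    return mean(score for _, _, score in eps)

for scen in ("Kitchen", "Garden"):
    subset = [e for e in episodes if e[0] == scen]
    print(scen, f"{success_rate(subset):.1f}%", f"{mean_plausibility(subset):.2f}")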

Citation

@article{he2025orca,
    title={Active Intelligence in Video Avatars via Closed-loop World Modeling},
    author={He, Xuanhua and Yang, Tianyu and Cao, Ke and Wu, Ruiqi and Meng, Cheng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Chen, Qifeng},
    journal={arXiv preprint arXiv:2508.xxxxx},
    year={2025}
}