Ai2 Launches MolmoAct — 3D Action Reasoning Model
The Allen Institute for AI (Ai2) has unveiled MolmoAct, the first fully open-source Action Reasoning Model (ARM) capable of spatial reasoning and actionable planning in three-dimensional environments.
Key Features
Think in 3D: MolmoAct is the first model engineered to perceive and reason spatially, grounding instructions into depth-aware perception tokens distinct from conventional textual representations; this enables it to understand geometric structure and object distances.
Structured Planning Pipeline: It operates through a three-stage process:
Perception: Converts observations and instructions into spatial perception tokens.
Planning: Generates a sequence of image-space waypoints—visual trajectory outlines.
Action Execution: Translates these plans into precise low-level motor commands for robotic hardware.
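The three-stage loop above can be sketched in code. This is a hypothetical illustration of the perceive–plan–act structure the article describes, not Ai2's actual API: every class, function, and token format here is an assumption made for clarity.

```python
# Illustrative sketch of MolmoAct's three-stage pipeline.
# All names and data shapes are hypothetical, not the real Ai2 interface.
from dataclasses import dataclass

@dataclass
class Plan:
    # Image-space (x, y) waypoints outlining the planned trajectory.
    waypoints: list

def perceive(image, instruction):
    """Stage 1 (stubbed): encode the observation and instruction
    into depth-aware spatial perception tokens."""
    return {"tokens": [image, instruction]}

def plan(perception):
    """Stage 2 (stubbed): generate image-space waypoints.
    Here just a dummy straight-line path of five points."""
    return Plan(waypoints=[(0.1 * i, 0.1 * i) for i in range(5)])

def act(p):
    """Stage 3 (stubbed): translate each waypoint into a
    low-level motor command for the robot."""
    return [("move_to", x, y) for x, y in p.waypoints]

commands = act(plan(perceive("frame.png", "pick up the mug")))
print(len(commands))  # one motor command per planned waypoint
```

The point of the decomposition is that the intermediate plan (the waypoints) is inspectable and editable before any motor command runs.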
Rapid, Efficient Training with Leading Performance: Training employed 26.3 million samples on 256 NVIDIA H100 GPUs in just one day, with fine-tuning on 64 GPUs in under 2 hours.
Explainable and User-Controlled Behavior: Before executing actions, MolmoAct overlays visual trajectory previews onto input images.
Users can adjust planned paths using natural language or touchscreen sketches, facilitating intuitive control and safer execution in unpredictable environments.
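The override mechanism can be sketched as a simple pre-execution hook: the planned waypoints are previewed, and a user correction (e.g. from a touchscreen sketch) replaces them before execution. This is an assumed interface, not MolmoAct's actual control surface.

```python
# Hypothetical sketch of user-in-the-loop trajectory editing.
def preview_and_edit(waypoints, user_edit=None):
    """Show the planned path to the user; if they supply a corrected
    path (e.g. via a touchscreen sketch), execute that instead."""
    return list(user_edit) if user_edit is not None else list(waypoints)

planned = [(0.2, 0.2), (0.5, 0.5), (0.8, 0.8)]
# The user nudges the middle waypoint away from an obstacle.
corrected = preview_and_edit(planned, user_edit=[(0.2, 0.2), (0.5, 0.3), (0.8, 0.8)])
```

Because the edit happens on the visual plan rather than on raw motor commands, the correction stays interpretable to both the user and the model.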
Benchmark Performance
MolmoAct, particularly the 7B-D variant, achieves state-of-the-art results across diverse benchmarks:
SimplerEnv (visual matching, zero-shot): ~70.5–71.6% accuracy, outperforming NVIDIA’s GR00T N1 and other leading baselines.
LIBERO (long-horizon generalization): ~86.6% average success; an additional +6.3% gain over ThinkAct on extended tasks.
Real-world fine-tuning (Franka robotic arms): Gains of +10% task progression (single-arm) and +22.7% (bimanual) compared to Pi-0-FAST.
Out-of-distribution generalization: MolmoAct outperforms baselines by 23.3%.
Human evaluation (Elo ratings): MolmoAct is strongly favored in open-ended, instruction-following scenarios.
Availability & Open Access
MolmoAct ships with model weights, training and fine-tuning code, evaluation benchmarks, and datasets, including the 10,000+ “robot episodes” in the MolmoAct Dataset, all accessible via Ai2’s Hugging Face repository.
Ai2 offers MolmoAct with no usage fees or licensing restrictions, emphasizing open science.
Strategic Impact and Vision
Ai2 positions MolmoAct as a foundational technology for future AI systems: robots that can reason, perceive, and act coherently across domains, from household assistance to dynamic industrial and disaster-relief settings.
News Gist
The Allen Institute for AI (Ai2) has released MolmoAct, a fully open-source Action Reasoning Model that plans and adapts robot actions in 3D.
Available free, it achieves benchmark-leading performance, transparent explainability, and efficient fine-tuning—reshaping embodied AI research and robotics.