AI Tools & Products News

Ai2 Launches MolmoAct — 3D Action Reasoning Model

The Allen Institute for AI (Ai2) has unveiled MolmoAct, the first fully open-source Action Reasoning Model (ARM) capable of spatial reasoning and actionable planning in three-dimensional environments.

Key Features

Think in 3D: MolmoAct is the first model engineered to perceive and reason spatially by grounding instructions in depth-aware perception tokens, distinct from conventional textual representations, enabling it to understand geometric structure and object distances.

Structured Planning Pipeline: It operates through a three-stage process (a code sketch follows the list):

Perception: Converts observations and instructions into spatial perception tokens.

Planning: Generates a sequence of image-space waypoints—visual trajectory outlines.

Action Execution: Translates these plans into precise low-level motor commands for robotic hardware.
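To make the three-stage flow concrete, here is a minimal Python sketch of how such a perception–planning–execution pipeline could be wired together. All names (`perceive`, `plan_waypoints`, `execute`, `robot.move_toward`) are illustrative assumptions, not MolmoAct's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical data types; MolmoAct's real interfaces are not reproduced here.
Waypoint = Tuple[float, float]  # (x, y) in image-space coordinates

@dataclass
class SpatialPerception:
    """Stage 1 output: depth-aware perception tokens grounding the instruction."""
    tokens: List[int]

@dataclass
class MotionPlan:
    """Stage 2 output: a visual trajectory expressed as image-space waypoints."""
    waypoints: List[Waypoint]

def perceive(image, instruction: str) -> SpatialPerception:
    """Stage 1 (Perception): encode the observation and instruction into spatial tokens."""
    raise NotImplementedError  # placeholder for the model's perception encoder

def plan_waypoints(perception: SpatialPerception) -> MotionPlan:
    """Stage 2 (Planning): decode perception tokens into an image-space trajectory."""
    raise NotImplementedError  # placeholder for the model's planning head

def execute(plan: MotionPlan, robot) -> None:
    """Stage 3 (Action Execution): translate waypoints into low-level motor commands."""
    for x, y in plan.waypoints:
        robot.move_toward(x, y)  # hypothetical robot-controller call

def run_episode(image, instruction: str, robot) -> None:
    """End-to-end pass through the three stages described above."""
    perception = perceive(image, instruction)
    plan = plan_waypoints(perception)
    execute(plan, robot)
```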

Rapid, Efficient Training with Leading Performance: Training employed 26.3 million samples on 256 NVIDIA H100 GPUs in just one day, with fine-tuning on 64 GPUs in under 2 hours.

Explainable and User-Controlled Behavior: Before executing actions, MolmoAct overlays visual trajectory previews onto input images.

Users can adjust planned paths using natural language or touchscreen sketches, facilitating intuitive control and safer execution in unpredictable environments.
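As an illustration of what a trajectory preview and a user correction could look like, the sketch below draws planned waypoints onto a copy of the input image and lets a caller substitute an adjusted path before execution. The drawing helper and the edit callback are assumptions for illustration, not part of MolmoAct's released tooling.

```python
from typing import Callable, List, Optional, Tuple
from PIL import Image, ImageDraw  # pip install pillow

Waypoint = Tuple[float, float]  # (x, y) pixel coordinates

def overlay_trajectory(image: Image.Image, waypoints: List[Waypoint]) -> Image.Image:
    """Draw the planned path on a copy of the input image so the user can preview it."""
    preview = image.copy()
    draw = ImageDraw.Draw(preview)
    if len(waypoints) >= 2:
        draw.line(waypoints, fill=(255, 0, 0), width=3)  # trajectory polyline
    for x, y in waypoints:
        draw.ellipse((x - 4, y - 4, x + 4, y + 4), fill=(255, 255, 0))  # waypoint markers
    return preview

def confirm_or_edit(
    waypoints: List[Waypoint],
    edit_fn: Optional[Callable[[List[Waypoint]], List[Waypoint]]] = None,
) -> List[Waypoint]:
    """Give the user a chance to adjust the path (e.g., from a touchscreen sketch)."""
    return edit_fn(waypoints) if edit_fn else waypoints

# Usage: preview the plan, optionally apply a user correction, then hand off to execution.
# plan = [(120.0, 340.0), (180.0, 300.0), (260.0, 280.0)]
# preview = overlay_trajectory(Image.open("scene.png"), plan)
# final_plan = confirm_or_edit(plan, edit_fn=lambda wps: wps[:-1] + [(250.0, 290.0)])
```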

Benchmark Performance

MolmoAct, particularly the 7B-D variant, achieves state-of-the-art results across diverse benchmarks:

SimplerEnv (visual matching, zero-shot): ~70.5–71.6% accuracy, outperforming NVIDIA’s GR00T N1 and other leading baselines.

LIBERO (long-horizon generalization): ~86.6% average success; an additional +6.3% gain over ThinkAct on extended tasks.

Real-world fine-tuning (Franka robotic arms): Gains of +10% task progression (single-arm) and +22.7% (bimanual) compared to Pi-0-FAST.

Out-of-distribution generalization: MolmoAct outperforms baselines by 23.3%.

Human evaluation (Elo ratings): MolmoAct is strongly favored in open-ended, instruction-following scenarios.

Availability & Open Access

MolmoAct is available with model weights, training and fine-tuning code, evaluation benchmarks, and datasets, including the 10,000+ "robot episodes" in the MolmoAct Dataset, all accessible via Ai2's Hugging Face repository.
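Since the weights are distributed through Hugging Face, loading a checkpoint would likely follow the standard transformers pattern shown below. The repository ID and the trust_remote_code requirement are assumptions; consult Ai2's Hugging Face page for the exact model names and usage instructions.

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Repository ID is illustrative; check Ai2's Hugging Face organization for the real one.
REPO_ID = "allenai/MolmoAct-7B-D"

# Custom multimodal checkpoints often ship their own modeling code,
# hence trust_remote_code=True (an assumption for this model).
processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, trust_remote_code=True, device_map="auto"  # device_map needs `accelerate`
)

# From here, inputs would be built with the processor (image + instruction) and passed
# to model.generate(); the exact prompt and output format are model-specific.
```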

Ai2 offers MolmoAct with no usage fees or licensing restrictions, emphasizing open science.

Strategic Impact and Vision

Ai2 positions MolmoAct as a foundational technology for future AI systems: robots that can perceive, reason, and act coherently across domains, from household assistance to dynamic industrial or disaster-relief settings.

News Gist

The Allen Institute for AI (Ai2) has released MolmoAct, a fully open-source Action Reasoning Model that plans and adapts robot actions in 3D.

Available free, it achieves benchmark-leading performance, transparent explainability, and efficient fine-tuning—reshaping embodied AI research and robotics.
