benchmarking

1 post with this tag

CAP-X: The Robot Manipulation Benchmark for Coding Agents

CAP-X is a benchmark specific to robot manipulation. It tests whether coding agents understand physics, sensors, and sequential planning, not just Python syntax. Built for real robots such as the Panda and UR5e, it bridges the gap between code generation and embodied action.

Administrator 4/2/2026