• Tagline: Training diffusion models for dual-arm manipulation
  • Category: Current Research
  • GitHub: spacefly

Context

Most VLA models are designed for single-arm manipulation. Bimanual tasks need dataset conversion and action representation that preserve dual-gripper coordination.

Role

Research Engineer owning dataset conversion, action heatmap generation, ControlNet+GenIMA training, and Hugging Face dataset compatibility fixes.

Work completed

  • Converted PerAct2 bimanual dataset (700+ demonstrations) to LLaVa prompt format
  • Generated dual-gripper action heatmaps:
    • left gripper: blue/cyan
    • right gripper: orange/red
    • explicit open/closed states
  • Trained ControlNet with GenIMA on Stable Diffusion 1.5 using image and action conditioning
  • Fixed iterable dataset compatibility issues for large trajectory training
  • Improved validation with configurable guidance scales and trajectory-level logging

Impact and learning

  • Successfully trained with dual-gripper action conditioning at scale
  • Learned to bridge robotics action-vector datasets with modern prompt-driven VLA pipelines