Figure AI (USA) - Press Release: The introduction of Helix, an innovative Vision-Language-Action (VLA) model, promises to revolutionize humanoid robotics by unifying perception, language comprehension, and learned control. This advancement aims to overcome persistent challenges in the field, making robots more versatile and adaptable to complex environments such as homes.
Key Milestones of Helix
Helix sets several firsts in robotics:
- Full upper-body control: It is the first VLA model capable of performing high-frequency continuous control of a humanoid robot’s torso, head, wrists, and fingers.
- Multi-robot collaboration: For the first time, a single VLA model drives two robots working together on a shared task, manipulating objects neither has encountered before.
- Ability to grasp any object: Robots equipped with Helix can pick up virtually any small household item simply by following natural language commands.
- A single neural network: Unlike previous approaches, Helix uses a single set of weights to learn multiple behaviors without requiring task-specific fine-tuning.
- Commercial readiness: The model runs entirely on embedded low-power GPUs, making it ready for commercial deployment.
Challenges and Innovation in Home Robotics
Homes present one of the greatest challenges for robotics due to the vast variety of objects with unpredictable shapes, sizes, and textures. Traditionally, teaching a robot to perform new tasks requires significant human effort, either through specialized manual programming or thousands of demonstrations.
Helix proposes a new approach: directly translating the semantic knowledge of Vision-Language Models (VLMs) into robotic actions. This allows robots to acquire new skills instantly through natural language commands, eliminating the need for extensive manual programming.
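As a rough illustration of this idea (a minimal sketch, not Figure's actual API; every class, dimension, and parameter below is a hypothetical stand-in), a VLA-style policy can be thought of as a single callable that maps a text instruction plus the robot's observations to a continuous action vector:

```python
# Hypothetical sketch of the "language in, actions out" interface of a
# VLA-style policy. None of these names come from a public Helix API.
from dataclasses import dataclass
import numpy as np


@dataclass
class Observation:
    rgb: np.ndarray             # onboard camera frame, e.g. shape (H, W, 3)
    proprioception: np.ndarray  # joint positions and velocities


class VisionLanguageActionPolicy:
    """Toy stand-in for a VLA model: instruction + observation -> action."""

    def act(self, instruction: str, obs: Observation) -> np.ndarray:
        # A real model would fuse the instruction with the images and
        # proprioception, then regress a continuous action vector covering
        # torso, head, wrists, and fingers. Here we return a zero action
        # of an illustrative 35-dimensional shape.
        return np.zeros(35)


policy = VisionLanguageActionPolicy()
obs = Observation(rgb=np.zeros((224, 224, 3)), proprioception=np.zeros(35))
action = policy.act("Pick up the bag of cookies on the counter", obs)
```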
Helix Architecture and Functionality
The Helix model combines two complementary systems to balance speed and generalization:
- System 2 (S2): An internet-pretrained VLM operating at 7-9 Hz, enabling scene understanding and language comprehension.
- System 1 (S1): A reactive visuomotor policy that translates semantic representations into precise continuous actions at 200 Hz.
This separation allows S2 to “think slow” and set high-level goals while S1 “thinks fast” to execute actions in real time, giving the system control that is both efficient and adaptable.
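One way to picture this decoupling (purely a sketch under assumed mechanics, not Helix's actual implementation) is a slow loop that keeps refreshing a shared latent goal while a fast loop consumes whatever latent is currently available:

```python
# Sketch of a two-rate control loop: a slow "System 2" refreshes a shared
# semantic latent at roughly 8 Hz while a fast "System 1" reads the latest
# latent and emits actions at 200 Hz. All components are hypothetical
# placeholders, not Helix's real modules.
import threading
import time
import numpy as np

latent_goal = np.zeros(512)     # latest semantic latent produced by S2
latent_lock = threading.Lock()


def system2_loop(instruction: str) -> None:
    """Slow loop: scene and language understanding (~7-9 Hz)."""
    global latent_goal
    while True:
        new_latent = np.random.randn(512)   # placeholder for VLM inference
        with latent_lock:
            latent_goal = new_latent
        time.sleep(1 / 8)


def system1_loop() -> None:
    """Fast loop: reactive visuomotor policy (200 Hz)."""
    while True:
        with latent_lock:
            goal = latent_goal.copy()
        action = np.tanh(goal[:35])         # placeholder for policy inference
        # here `action` would be streamed to the robot's joint controllers
        time.sleep(1 / 200)


threading.Thread(target=system2_loop, args=("Tidy up the counter",), daemon=True).start()
system1_loop()
```

Because S1 always reads the most recent latent instead of waiting for S2 to finish, a slow S2 update never stalls the 200 Hz action stream; the two models can run side by side on the robot's onboard hardware.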
Efficient Training and Deployment
Helix was trained on only about 500 hours of supervised data, far less than the datasets used for earlier VLA models, yet it generalizes well across objects and tasks.
The model operates through parallel inference on embedded low-power GPUs, enabling robots to perform complex tasks without requiring additional specialized hardware.
Impact and Future of Humanoid Robotics
Helix’s performance stands out in its ability to manipulate thousands of previously unseen objects, execute collaborative tasks between robots, and adapt to dynamic environments without specific training.
This innovation represents a significant leap forward in humanoid robotics, paving the way for more autonomous and effective robots in household settings. The potential of Helix is only beginning to be explored, and its ongoing development promises to revolutionize how robots interact with the real world.
The Helix team is looking for talented individuals to push the boundaries of embedded AI and make these advancements accessible to millions of robots in the future.