In just three months since our initial deployment of Helix in a logistics environment, the system’s capabilities and performance have leapt forward. Helix can now handle a wider variety of packaging and is approaching human-level dexterity and speed, bringing us closer to fully autonomous package sorting. This rapid progress underscores the scalability of Helix’s learning-based approach to robotics, translating quickly into real-world application.
New package types – Helix now manipulates deformable poly bags and flat envelopes as reliably as rigid boxes, adjusting its grasp and strategy for each form factor and handling objects dynamically.
Faster throughput – Despite handling much more complex and varied package types, Helix's execution time has improved to 4.05 seconds per package (down from ~5.0 s), roughly 20% faster handling with no loss of accuracy.
Higher barcode scanning success – Shipping labels are now correctly oriented for scanning ~95% of the time (up from ~70%), through better vision and control.
Adaptive behaviors – The robot exhibits subtle behaviors learned from demonstration, such as gently patting down plastic mailers to flatten wrinkles and improve barcode reads.
Small-package logistics, like the example shown here, is an ideal environment for AI learning: the packages and scene change at every time step, providing exactly the kind of variation that neural networks thrive on.
These improvements were achieved through both data scaling and model architectural improvements:
Temporal memory – A new vision memory module gives Helix stateful perception. The policy now also incorporates a history of past states, enabling temporally extended behaviors and improved robustness to interruptions.
Force feedback – Force sensing is integrated into the state input, giving the policy a proxy for touch that yields more precise grips and package manipulation.
Here we analyze the sources of these gains, examining how increasing the demonstration training data (from 10 to 60 hours) affects performance, and how each of the above architectural enhancements contributes to Helix’s speed and accuracy in package handling.
Expanded Package Variety and Adaptive Behaviors
Helix’s logistics policy has broadened to handle a much wider diversity of packages. In addition to standard rigid boxes, the system now manages polyethylene bags (poly bags), padded envelopes, and other deformable or thin parcels that pose unique challenges. These items can fold, crumple, or flex, making them harder to grasp and their labels harder to locate. Helix addresses this by adjusting its grasp strategy on the fly – for example, flicking a soft bag to flip it dynamically, or using pinch grips for flat mailers. Despite the greater variety in shape and texture, Helix increased its throughput, processing items in about 4.05 seconds each on average without bottlenecks.
The goal of this logistics task is to rotate the package so the barcode faces downward for scanning. One notable behavior is Helix’s tendency to pat down plastic packaging before attempting to scan it. If a shipping label lies on a curved or wrinkled surface (common with loosely filled poly bags or bubbled envelopes), the policy reacts by briefly pressing and smoothing the surface flat. This subtle “flattening” action, learned from demonstrations, ensures the barcode is fully visible to the scanner. Such adaptive behavior highlights the advantage of end-to-end learning: the robot picks up, directly from the demonstration data, strategies for overcoming real-world imperfections in packaging that were never explicitly hard-coded.
Crucially, these new capabilities did not compromise efficiency. Throughput has increased alongside versatility. Helix’s average handling time per package dropped from roughly 5.0 seconds on a simplified set of packages to 4.31 seconds, even as the task got harder with new package types. This speed-up brings performance closer to human operator speeds. Likewise, barcode orientation success climbed to ~95%. Together, these improvements indicate a more dexterous and reliable system, one that can approach human-level speed and accuracy across a broad spectrum of real-world parcels.
Architectural Enhancements to Helix’s Visuo-Motor Policy
Many of the above gains were enabled by targeted improvements to Helix’s System 1 visuo-motor policy. Over the past two months, we introduced new modules for memory and sensing that make the control policy more context-aware and robust. These enhancements allow Helix to better perceive the state of the world over time and feel what it’s doing, complementing the vision and control foundation established in the initial deployment. Here we detail each improvement and how it contributes to Helix’s logistics performance.

Vision Memory
Helix’s policy now maintains a short-term visual memory of its environment, rather than acting only on instantaneous camera frames. Concretely, the model is equipped with a module that composes features from a sequence of recent video frames, giving it a temporally extended view of the scene. This implicit visual memory enables stateful behaviors: the robot can remember which sides of a package it has already inspected or which areas of a conveyor are clear. For example, if an initial camera view doesn’t fully reveal a label, Helix can recall partial glimpses from previous moments and decide to rotate the package to the remembered angle where the label was visible. The memory module thus helps eliminate redundant motions (the robot won’t “forget” and re-check the same side twice) and improves success rates by ensuring all necessary views of the item are considered. In essence, vision memory gives Helix a sense of temporal context, allowing it to behave more strategically over a multi-step manipulation. This was key to boosting the barcode orientation success to 95% – the policy can now reliably perform multi-step maneuvers (like multiple small rotations or viewpoint adjustments) to find a barcode, guided by visual recall, rather than relying on a single lucky glimpse.
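A minimal sketch of such a rolling visual memory, assuming per-frame features have already been extracted by a vision backbone (the buffer size, feature dimension, and mean pooling are illustrative placeholders, not Helix's actual design):

```python
import numpy as np
from collections import deque

class VisionMemory:
    """Rolling buffer of per-frame visual features, pooled into a temporal
    context vector fed to the policy alongside the current frame.
    Mean pooling stands in for a learned temporal module."""

    def __init__(self, horizon=8, feat_dim=256):
        self.buffer = deque(maxlen=horizon)  # keeps only the last `horizon` frames
        self.feat_dim = feat_dim

    def observe(self, frame_features):
        self.buffer.append(np.asarray(frame_features, dtype=np.float32))

    def context(self):
        if not self.buffer:
            return np.zeros(self.feat_dim, dtype=np.float32)
        # A learned temporal-attention module would replace this mean.
        return np.stack(self.buffer).mean(axis=0)

# Stream 10 frames; only the last 8 contribute to the temporal context.
mem = VisionMemory(horizon=8, feat_dim=4)
for t in range(10):
    mem.observe(np.full(4, float(t)))
policy_input = np.concatenate([np.full(4, 9.0), mem.context()])
```

Because the buffer is bounded, the policy input stays fixed-size while still summarizing recent history, which is what lets the policy remember an already-inspected side without re-checking it.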
State History
We also augmented Helix’s proprioceptive input with a history of recent states, which has enabled faster, more reactive control. Originally, the policy operated in fixed-duration action chunks: it would observe the current state and output a short trajectory of motions, then observe anew, and so on. By incorporating a window of past robot states (hand, torso and head positions) into the policy’s input, the system maintains continuity between these action chunks. Importantly, the state history preserves context, so that even with more frequent re-planning, the policy doesn’t lose track of what it was doing or destabilize the manipulation. The net result is a quicker response to surprises or disturbances: if a package shifts or an attempted grasp doesn’t land perfectly, Helix corrects mid-motion with minimal delay. This enhancement contributed significantly to the reduced handling time per package.
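A toy illustration of re-planning in short action chunks while conditioning on a window of past states; the dimensions, history length, and the placeholder "network" are invented for the sketch:

```python
import numpy as np
from collections import deque

class ChunkedPolicy:
    """Re-plans a short action chunk at every step, conditioning on a
    zero-padded window of recent proprioceptive states so that context
    carries over between chunks."""

    def __init__(self, history_len=4, state_dim=3, chunk_len=5):
        self.history = deque(maxlen=history_len)
        self.state_dim = state_dim
        self.chunk_len = chunk_len

    def policy_input(self, current_state):
        self.history.append(np.asarray(current_state, dtype=np.float32))
        hist = list(self.history)
        pad = [np.zeros(self.state_dim, np.float32)] * (self.history.maxlen - len(hist))
        return np.concatenate(pad + hist)  # fixed-size input for the network

    def act(self, current_state):
        self.policy_input(current_state)
        # Placeholder "network": interpolate toward the mean of recent states,
        # emitted as a short trajectory (the action chunk).
        target = np.stack(list(self.history)).mean(axis=0)
        cur = np.asarray(current_state, dtype=np.float32)
        return [cur + (target - cur) * (i + 1) / self.chunk_len
                for i in range(self.chunk_len)]
```

The key property is that each new chunk is computed from a window spanning previous chunks, so frequent re-planning does not erase what the policy was in the middle of doing.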
Force Feedback
To grant Helix a basic sense of touch, we integrated force feedback into the policy’s input observations. The forces that Figure 02 applies on its environment and the objects it manipulates are now part of the state fed into the neural network. This information allows the policy to detect contact events and adjust accordingly. For instance, as Helix reaches for a parcel, it can feel when it first touches the object or when a package is pressed against a surface. It learns to use these cues to modulate motion: for example, pausing a downward motion when contact with the conveyor belt is detected. By closing the loop with touch, Helix achieves more precise handling and ultimately higher success rate and consistency of motions, making the system more robust to variability in object weight, stiffness, and placement.
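The contact-gated motion described above can be sketched as a simple threshold rule; the step size, threshold, and starting height are made-up numbers, and in Helix the equivalent gating is learned end-to-end rather than hand-coded:

```python
def descend_until_contact(force_readings, start_height=0.5, step=0.01, threshold=5.0):
    """Lower the gripper one step per control tick until the sensed
    normal force crosses the contact threshold, then stop."""
    height = start_height
    for force in force_readings:
        if force >= threshold:  # contact with the belt or package detected
            break
        height -= step
    return height
```

With force in the observation, the policy can react to the moment of contact instead of relying on a pre-planned stopping height, which is what makes it robust to variation in package size and stiffness.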
Results and Discussion
To quantify the impact of these improvements, we conducted controlled evaluations of Helix’s logistics performance under varying training data regimes and model configurations. We measured two key metrics: package handling speed (average seconds per package, lower is better) and barcode scanning success rate (percentage of packages correctly oriented for scanning, higher is better). The following results break down how additional training data and the new architectural features each contribute to Helix’s overall performance gains.
Scaling Up Training Data
First, we examine how scaling the amount of human demonstration data affects Helix’s proficiency. We compare models trained on approximately 10, 20, 40 and 60 hours of demonstration trajectories (with identical network architecture and hyperparameters). As shown in Figure 1 below, increasing the training data yields clear improvements in both throughput and accuracy.

Going from 10 to 60 hours of training demonstrations, Helix’s average processing time per package dropped from ~6.84 s to 4.31 s, a ~58% increase in throughput, and the barcode success rate climbed from 88.2% to 94.4%. These steady returns suggest the model is still in a low-data regime: performance continues to improve as we scale the data.
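As a sanity check on the quoted numbers, throughput (packages per unit time) is the reciprocal of seconds per package, so the relative gain follows directly from the two cycle times:

```python
def throughput_gain(sec_per_pkg_before, sec_per_pkg_after):
    """Relative throughput increase when cycle time drops:
    (1/t_after) / (1/t_before) - 1 == t_before / t_after - 1."""
    return sec_per_pkg_before / sec_per_pkg_after - 1.0

gain = throughput_gain(6.84, 4.31)  # ~0.587, matching the quoted ~58% increase
```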
Contributions of Memory and Feedback Modules
We next evaluate how each of the recent architectural enhancements – vision memory, state history, and force feedback – contributes to performance. We perform an ablation study, comparing variants of the Helix model with these modules disabled or enabled. All models in this comparison are trained on the same 60h dataset, so any differences in metrics reflect the presence or absence of the new capabilities. Figure 2 summarizes the results of this ablation, listing the handling speed and success rate.

In Figure 2 we show how each module removes a specific bottleneck. The monocular baseline, lacking depth and temporal context, misplaces grasps and often idles in long pauses because it cannot tell how long it has spent in a state. Adding stereo resolves the depth issue – picks are cleaner and throughput rises – but the long pauses persist. One way to resolve the pauses would be to lengthen the action chunks, but that would come at the cost of slower reaction times. Instead, introducing vision memory lets the policy recall whether a bag has already been flipped or a label was previously visible, eliminating redundant re-orientations and cutting another half-second from the cycle. When state history and force feedback are added, the robot gains a sense of elapsed time and touch: it no longer stalls, it modulates grip force on rigid boxes more precisely, and it regulates the force it applies to its surroundings to avoid losing balance, boosting first-pass barcode success to 94%. Finally, scaling the network by increasing the transformer decoder’s parameter count by 50% exploits these richer inputs, pushing average handling time down to 4.05 s while holding accuracy above 92%.
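One convenient way to organize such an ablation is as a set of named configuration variants; the flag names below are illustrative, not Helix's actual codebase:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyConfig:
    """Hypothetical ablation switches for the policy variants."""
    stereo: bool = False
    vision_memory: bool = False
    state_history: bool = False
    force_feedback: bool = False
    decoder_scale: float = 1.0  # 1.5 => +50% transformer-decoder parameters

# Variants in the spirit of Figure 2, all trained on the same 60 h dataset.
ABLATION = {
    "monocular baseline": PolicyConfig(),
    "+ stereo": PolicyConfig(stereo=True),
    "+ vision memory": PolicyConfig(stereo=True, vision_memory=True),
    "+ state history, force": PolicyConfig(stereo=True, vision_memory=True,
                                           state_history=True, force_feedback=True),
    "+ scaled decoder": PolicyConfig(stereo=True, vision_memory=True,
                                     state_history=True, force_feedback=True,
                                     decoder_scale=1.5),
}
```

Keeping every variant on the same dataset and toggling one capability at a time is what lets the metric deltas be attributed to individual modules.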
Visual Conditioning: Human-to-Robot Handover
While Helix’s primary goal in the logistics scenario is autonomous sorting, the same end-to-end model can be adapted to new interactions with minimal effort. An example is human handover behavior achieved through visual conditioning. By providing just a few extra demonstration episodes where a person awaits a package handoff (collected at random times, as part of the main data collection), we conditioned the policy to interpret a human’s outstretched hand as a cue to hand over the item. No new skills were explicitly programmed; the network simply learned that in the presence of a human reaching out, the appropriate action is to hand over the package rather than place it on a conveyor. This behavior uses the same neural policy and weights as all other actions – the difference comes purely from Helix’s observation of the human and the context it has learned from those additional examples.
Conclusion
We have presented how scaling up a high-quality demonstration dataset, combined with architectural refinements like vision memory, state history, and force feedback, has significantly improved Helix’s performance in real-world logistics. The result is a general visuo-motor policy that handles diverse packages with close to human-level speed and high reliability on a moving conveyor – a notable step up from the initial capabilities just two months ago. These improvements not only solve immediate challenges in package handling, but also yield general benefits to Helix’s control system that carry over to other use cases. By enabling stateful perception and force sensing, we’ve made the policy more robust and adaptable without sacrificing efficiency. Crucially, the policy benefited from both data scaling and architectural improvements; neither alone is enough to push performance this far.
Helix is steadily scaling in dexterity and robustness, closing the gap between learned robotic manipulation and the demands of real-world tasks. Ongoing work will continue to broaden its skill set and ensure stability at even greater speeds and workloads. If you would like to help us push the frontiers of humanoid robotics, please consider applying here.