Zoo API: Robotics
The xinfer::zoo::robotics module provides a suite of hyper-optimized, low-latency pipelines for common robotics tasks.
In robotics, performance is not optional. The "perception-to-action" loop—the time it takes for a robot to see the world, understand it, and react—must often be measured in milliseconds. The zoo classes in this module are designed from the ground up for these hard real-time constraints, providing the core building blocks for intelligent robotic systems.
These pipelines are built to run efficiently on power-constrained embedded hardware, such as the NVIDIA Jetson platform.
AssemblyPolicy
Executes a trained Reinforcement Learning (RL) policy for complex, vision-based manipulation tasks.
Header: #include <xinfer/zoo/robotics/assembly_policy.h>
Use Case: AI-Driven Robotic Assembly
This class is designed for tasks like peg-in-hole insertion, wire routing, or component assembly, where a robot must learn from both visual and physical feedback. The AssemblyPolicy acts as the robot's "brain," taking in sensor data and outputting low-level motor commands.
#include <xinfer/zoo/robotics/assembly_policy.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
// This function would be in the robot's main control loop, running at high frequency (e.g., 100Hz)
void execute_robot_step(xinfer::zoo::robotics::AssemblyPolicy& policy) {
    // 1. Get current state from the robot's sensors.
    cv::Mat camera_image;            // Image from the robot's wrist-mounted camera
    std::vector<float> joint_states; // Current angles of the robot's joints
    // ... update camera_image and joint_states from the robot's hardware interface ...

    // 2. Execute the policy to get the next action.
    // This is a single, ultra-low-latency call to a fused C++/TensorRT pipeline.
    std::vector<float> next_action = policy.predict(camera_image, joint_states);

    // 3. Send the action to the robot's motor controllers.
    // ... send motor commands based on next_action ...
}
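// --- Optional: driving the step above at a fixed rate ---------------------
// A minimal sketch (not part of the xinfer API) of a 100 Hz loop around
// execute_robot_step(), matching the rate mentioned in the comment above.
// The run_control_loop name and the task_is_active flag are illustrative
// assumptions; a real application would use its own scheduler or RTOS
// primitives.
#include <atomic>
#include <chrono>
#include <thread>

void run_control_loop(xinfer::zoo::robotics::AssemblyPolicy& policy,
                      const std::atomic<bool>& task_is_active) {
    using clock = std::chrono::steady_clock;
    constexpr auto period = std::chrono::milliseconds(10); // 10 ms -> 100 Hz

    auto next_tick = clock::now();
    while (task_is_active) {
        execute_robot_step(policy);               // perception-to-action step
        next_tick += period;
        std::this_thread::sleep_until(next_tick); // keep the cycle time constant
    }
}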
int main() {
    // 1. Configure and initialize the assembly policy.
    // This loads two pre-built engines: one for vision, one for the policy MLP.
    xinfer::zoo::robotics::AssemblyPolicyConfig config;
    config.vision_encoder_engine_path = "assets/robot_vision_encoder.engine";
    config.policy_engine_path = "assets/peg_insertion_policy.engine";

    xinfer::zoo::robotics::AssemblyPolicy policy(config);
    std::cout << "Assembly policy loaded and ready.\n";

    // 2. Run the robot's control loop.
    // while (task_is_active) {
    //     execute_robot_step(policy);
    // }

    return 0;
}

Config Struct: AssemblyPolicyConfig
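The following is a minimal sketch of the struct, limited to the two fields exercised in the example above and with the xinfer::zoo::robotics namespace omitted; the actual header may expose additional options (input resolution, device selection, etc.) that are not shown here.

#include <string>

struct AssemblyPolicyConfig {
    std::string vision_encoder_engine_path; // pre-built TensorRT engine for the vision encoder
    std::string policy_engine_path;         // pre-built TensorRT engine for the policy MLP
};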
Input: cv::Mat from a camera and a std::vector<float> of the robot's physical state.
Output: std::vector<float> representing the next motor commands (e.g., joint velocities or end-effector deltas).
"F1 Car" Technology: This class orchestrates multiple, hyper-optimized TensorRT engines (a vision encoder and a policy MLP) within a hard real-time C++ application, ensuring a consistent, low-latency perception-to-action loop.
GraspPlanner
Performs 6D pose estimation on an object to determine a stable grasp for a robotic hand.
Header: #include <xinfer/zoo/robotics/grasp_planner.h>
(Note: This is a conceptual name for a 6D Pose Estimator specialized for robotics)
Use Case: Bin Picking
A robot needs to pick a specific object out of a bin of jumbled, randomly oriented parts. It must accurately determine the object's 3D position and orientation to grasp it successfully.
#include <xinfer/zoo/robotics/grasp_planner.h> // Conceptual header
#include <opencv2/opencv.hpp>
#include <iostream>
int main() {
    // 1. Initialize the grasp planner.
    // The engine would be a specialized model trained on 3D data.
    xinfer::zoo::robotics::GraspPlannerConfig config;
    config.engine_path = "assets/6d_pose_estimator.engine";

    xinfer::zoo::robotics::GraspPlanner planner(config);

    // 2. Get an RGB-D image from a 3D camera.
    cv::Mat color_image; // ... from sensor
    cv::Mat depth_image; // ... from sensor

    // 3. Predict the 6D pose of the target object.
    auto grasp_poses = planner.predict(color_image, depth_image);

    // 4. Execute the grasp with the robot arm.
    if (!grasp_poses.empty()) {
        std::cout << "Found a graspable object. Executing pick.\n";
        // ... robot control logic to move to the chosen grasp pose ...
    } else {
        std::cout << "No graspable object found.\n";
    }

    return 0;
}

Config Struct: GraspPlannerConfig
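As a minimal sketch (namespace omitted), assuming only the field used in the example above; any confidence thresholds or filtering options would be additional, unverified fields.

#include <string>

struct GraspPlannerConfig {
    std::string engine_path; // pre-built TensorRT engine for the 6D pose estimator
};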
Input: cv::Mat for color and depth data.
Output: A list of possible 6D grasp poses (position + quaternion rotation); see the sketch at the end of this section for one plausible representation.
"F1 Car" Technology: This pipeline would use custom CUDA kernels for point cloud processing (e.g., from the depth image) and a fused TensorRT engine for the final pose regression.

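The grasp poses returned by predict are described above as position plus quaternion rotation. The sketch below shows one plausible representation and a helper for choosing a single grasp to execute; the GraspPose field names and the confidence score are illustrative assumptions, not the library's actual types.

#include <vector>

// Hypothetical 6D grasp pose: translation plus unit-quaternion orientation.
struct GraspPose {
    float x, y, z;        // position of the grasp point
    float qx, qy, qz, qw; // orientation as a unit quaternion
    float confidence;     // assumed grasp-quality score in [0, 1]
};

// Pick the highest-confidence grasp before commanding the arm.
const GraspPose* select_best_grasp(const std::vector<GraspPose>& poses) {
    const GraspPose* best = nullptr;
    for (const auto& pose : poses) {
        if (!best || pose.confidence > best->confidence) best = &pose;
    }
    return best; // nullptr when no grasp was found
}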