Zoo API: Reinforcement Learning

The xinfer::zoo::rl module provides the core, high-performance building block for deploying agents trained with Reinforcement Learning (RL).

In RL, the inference step—running the policy network to decide on an action—is often in the "hot loop" of a real-time system. The latency of this single operation can determine the success or failure of the entire application, whether it's a robot, a trading bot, or a game AI.

The xInfer RL zoo is designed around a single, powerful, and generic class: the Policy.


Core Component: rl::Policy

Header: #include <xinfer/zoo/rl/policy.h>

The Policy class is a hyper-optimized, low-latency engine for executing a trained RL agent's decision-making network. It is a generic wrapper that can run any policy that takes a state tensor as input and produces an action tensor as output.

The real "F1 car" magic comes from the TensorRT engine it loads. You train your agent in a Python framework (such as Stable Baselines3 or CleanRL), export the final actor/policy network to ONNX, and then use xinfer-cli to build a hyper-optimized, INT8- or FP16-quantized engine. The Policy class then runs this engine with minimal overhead.

Core API

#include <xinfer/zoo/rl/policy.h>
 
// Configuration
struct PolicyConfig {
    std::string engine_path;
};
 
class Policy {
public:
    explicit Policy(const PolicyConfig& config);
 
    // Get an action for a single state
    core::Tensor predict(const core::Tensor& state);
    
    // Get actions for a batch of states (highly efficient)
    core::Tensor predict_batch(const core::Tensor& state_batch);
};
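
Below is a minimal usage sketch, assuming an engine already built with xinfer-cli. The engine path is an example, and the core::Tensor constructor shown (shape plus host data pointer) is an assumption for illustration; consult the core::Tensor documentation for the actual way to create and upload a state tensor.

#include <xinfer/zoo/rl/policy.h>
#include <vector>

int main() {
    // Point the policy at a pre-built TensorRT engine (example path).
    xinfer::zoo::rl::PolicyConfig config;
    config.engine_path = "cartpole_policy.engine";
    xinfer::zoo::rl::Policy policy(config);

    // Pack the environment observation into a state tensor.
    // NOTE: this {shape, host-data} constructor is assumed for
    // illustration; use the real core::Tensor API to upload the state.
    std::vector<float> state = {0.01f, -0.02f, 0.03f, 0.0f};
    xinfer::core::Tensor state_tensor({1, 4}, state.data());

    // Run the policy network; the returned tensor holds the action.
    xinfer::core::Tensor action = policy.predict(state_tensor);
    return 0;
}

For many agents, stack the states along the batch dimension and call predict_batch instead, which evaluates every state in a single GPU pass (the gaming example below demonstrates this pattern).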

Domain-Specific Applications

While rl::Policy is generic, it serves as the core engine for many specialized, real-world applications. The following zoo classes show how the Policy engine can be applied to domain-specific problems.

Industrial Robotics: robotics::AssemblyPolicy

Header: #include <xinfer/zoo/robotics/assembly_policy.h>

Use Case: Controls a robot arm to perform complex, vision-based assembly tasks. The policy takes a camera image and the robot's joint angles as input and outputs motor commands.

// This function would be in the robot's 100Hz control loop
void execute_robot_step(xinfer::zoo::robotics::AssemblyPolicy& policy) {
    // 1. Get current state from sensors
    cv::Mat camera_image = get_camera_frame();
    std::vector<float> joint_states = get_joint_angles();
 
    // 2. Execute the policy to get the next action in milliseconds
    std::vector<float> next_action = policy.predict(camera_image, joint_states);
 
    // 3. Send action to motor controllers
    send_motor_commands(next_action);
}

Autonomous Drones: drones::NavigationPolicy

Header: #include <xinfer/zoo/drones/navigation_policy.h>

Use Case: Enables agile, GPS-denied flight in cluttered environments. The policy takes a depth image and the drone's current state (velocity, orientation) as input and outputs flight control commands (roll, pitch, yaw, thrust).

// This function runs inside the drone's flight controller
void navigate_step(xinfer::zoo::drones::NavigationPolicy& policy) {
    // 1. Get state from sensors
    cv::Mat depth_image = get_depth_camera_frame();
    std::vector<float> drone_state = get_imu_data();
 
    // 2. Execute the policy to get flight commands
    xinfer::zoo::drones::NavigationAction action = policy.predict(depth_image, drone_state);
 
    // 3. Send commands to the motors
    set_motor_outputs(action.roll, action.pitch, action.yaw, action.thrust);
}

High-Frequency Trading: hft::OrderExecutionPolicy

Header: #include <xinfer/zoo/hft/order_execution_policy.h>

Use Case: Manages the execution of a large financial order to minimize market impact. The policy takes the current state of the limit order book as input and decides whether to place a small buy/sell order in the next microsecond.

// This function is in the hot path of a trading application's event loop
void on_market_data_update(xinfer::zoo::hft::OrderExecutionPolicy& policy) {
    // 1. Get the current market state as a GPU tensor
    xinfer::core::Tensor market_state_tensor = get_order_book_tensor();
 
    // 2. Execute the policy with microsecond latency
    xinfer::zoo::hft::OrderExecutionAction action = policy.predict(market_state_tensor);
 
    // 3. Execute the trade
    if (action.action == xinfer::zoo::hft::OrderActionType::PLACE_BUY) {
        execute_buy_order(action.volume, action.price_level);
    }
}

Game Development: gaming::NPCBehaviorPolicy

Header: #include <xinfer/zoo/gaming/npc_behavior_policy.h>

Use Case: Creates intelligent, non-scripted AI for hundreds of game characters. The policy takes a batch of states for all NPCs in a level and outputs a batch of actions.

// This function runs once per game frame
void update_all_npc_ai(xinfer::zoo::gaming::NPCBehaviorPolicy& policy, World& world) {
    // 1. Gather the states of all active NPCs into a single batched tensor
    xinfer::core::Tensor npc_state_batch = world.get_all_npc_states();
 
    // 2. Execute the policy for all NPCs in a single, efficient GPU call
    xinfer::core::Tensor npc_action_batch = policy.predict_batch(npc_state_batch);
 
    // 3. Apply the actions to each NPC in the world
    world.set_all_npc_actions(npc_action_batch);
}