Core Toolkit API Reference
Welcome to the API reference for the xInfer Core Toolkit.
While the Model Zoo API provides high-level, pre-packaged solutions for common tasks, the Core Toolkit is for developers who need maximum control and flexibility. These are the powerful, low-level building blocks that the zoo itself is built upon.
You should use the Core Toolkit when you are:
- Building a hyper-optimized pipeline for a custom model architecture not found in the zoo.
- Integrating xInfer into a complex, existing C++ application with custom data structures.
- Implementing advanced asynchronous workflows with multiple CUDA streams.
- Creating your own high-level, zoo-like abstractions for a specific domain.
The Core Modules
The toolkit is divided into four logical modules, each responsible for a specific part of the high-performance inference pipeline.
1. xinfer::core - The Inference Runtime
This is the heart of the xInfer runtime. These classes are responsible for loading and executing your pre-built, optimized TensorRT engines.
- `Tensor`: A lightweight, safe C++ wrapper for managing GPU memory. It's the primary data structure for all I/O in xInfer.
- `InferenceEngine`: The workhorse class that loads a `.engine` file and provides simple, powerful methods for running synchronous and asynchronous inference.
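The sketch below shows roughly how these two classes combine for a basic synchronous call. The header paths and method names (`copy_from_host`, `infer`) are illustrative assumptions rather than confirmed xInfer signatures; see the full core reference below for the actual API.

```cpp
#include <xinfer/core/engine.h>   // header paths are assumptions, not confirmed
#include <xinfer/core/tensor.h>
#include <vector>

int main() {
    // Load a pre-built TensorRT engine from disk (produced offline by EngineBuilder).
    xinfer::core::InferenceEngine engine("yolov8n.engine");

    // Wrap the input in a GPU-resident Tensor. The shape and upload call are illustrative.
    xinfer::core::Tensor input({1, 3, 640, 640});
    std::vector<float> host_input(1 * 3 * 640 * 640, 0.0f);
    input.copy_from_host(host_input.data());

    // Run synchronous inference; outputs remain on the GPU for post-processing.
    std::vector<xinfer::core::Tensor> outputs = engine.infer({input});
    return 0;
}
```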
➡️ Full API Reference for core
2. xinfer::builders - The Optimization Toolkit
This module provides the "factory" tools for performing the crucial, offline "Build Step." You use these classes to convert a standard model format like ONNX into a hyper-optimized TensorRT engine.
- `EngineBuilder`: A fluent API that automates the entire TensorRT build process, including enabling optimizations like FP16 and INT8.
- `ONNXExporter`: A convenience utility to bridge the gap from a trained xTorch model to the ONNX format.
- `INT8Calibrator`: The interface for providing calibration data for INT8 quantization.
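As a rough illustration of the offline build step, the sketch below chains a few fluent calls on `EngineBuilder`. The specific method names (`from_onnx`, `with_fp16`, `with_max_batch_size`, `save`) are assumed for illustration; the authoritative interface is in the builders reference below.

```cpp
#include <xinfer/builders/engine_builder.h>   // header path is an assumption

int main() {
    // Offline "Build Step": convert an ONNX model into an optimized TensorRT engine.
    // Method names below are illustrative, not the confirmed xInfer API.
    xinfer::builders::EngineBuilder builder;
    builder.from_onnx("resnet50.onnx")
           .with_fp16(true)                  // enable half-precision kernels
           .with_max_batch_size(8)
           .save("resnet50_fp16.engine");    // serialize the optimized engine to disk
    return 0;
}
```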
➡️ Full API Reference for builders
3. xinfer::preproc - GPU-Accelerated Pre-processing
This module contains a library of unique, high-performance CUDA kernels designed to eliminate CPU bottlenecks during data preparation.
- `ImageProcessor`: A powerful class that can perform an entire image pre-processing pipeline (Resize -> Pad -> Normalize -> HWC to CHW) in a single, fused CUDA kernel.
- `AudioProcessor`: A fused pipeline for converting raw audio waveforms into mel spectrograms, using cuFFT for maximum performance.
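Here is a sketch of how a fused pre-processing call might be wired up for an object-detection input. The `Config` fields, the `process()` signature, and the use of an OpenCV `cv::Mat` as the host-side image are illustrative assumptions; check the preproc reference below for the real interface.

```cpp
#include <xinfer/preproc/image_processor.h>   // header paths are assumptions
#include <xinfer/core/tensor.h>
#include <opencv2/opencv.hpp>

int main() {
    // Configure a fused Resize -> Pad -> Normalize -> HWC-to-CHW pipeline.
    // The configuration struct and its field names are illustrative only.
    xinfer::preproc::ImageProcessor::Config config;
    config.target_width  = 640;
    config.target_height = 640;
    config.mean = {0.485f, 0.456f, 0.406f};
    config.std  = {0.229f, 0.224f, 0.225f};

    xinfer::preproc::ImageProcessor preproc(config);

    cv::Mat frame = cv::imread("frame.jpg");       // raw HWC, uint8 image on the CPU
    xinfer::core::Tensor input({1, 3, 640, 640});  // GPU-resident destination tensor

    // One call launches a single fused CUDA kernel instead of several CPU-bound steps.
    preproc.process(frame, input);
    return 0;
}
```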
➡️ Full API Reference for preproc
4. xinfer::postproc - GPU-Accelerated Post-processing
This module provides custom CUDA kernels to accelerate the most common post-processing tasks, avoiding slow GPU-to-CPU data transfers of large, raw model outputs.
- `detection::nms`: A hyper-performant, GPU-based implementation of Non-Maximum Suppression for object detection.
- `yolo_decoder::decode`: A fused kernel for parsing the complex output of YOLO-family models.
- `segmentation::argmax`: A GPU-based kernel for converting raw segmentation logits into a final class mask.
- `ctc::decode`: A GPU-based kernel for decoding the output of speech recognition and OCR models.
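To show how these kernels might be chained after inference, the sketch below keeps the raw model output on the GPU, decodes it, and runs NMS before anything is copied back to the host. The function signatures and the `BoundingBox` type are illustrative assumptions; consult the postproc reference below for the actual interfaces.

```cpp
#include <xinfer/postproc/yolo_decoder.h>   // header paths are assumptions
#include <xinfer/postproc/detection.h>
#include <xinfer/core/tensor.h>
#include <vector>

// Decode raw YOLO output and run GPU NMS without copying raw logits to the CPU.
// The signatures and the BoundingBox type are illustrative, not the confirmed API.
std::vector<xinfer::postproc::BoundingBox>
postprocess(const xinfer::core::Tensor& raw_output) {
    // Fused kernel: turn the raw prediction tensor into candidate boxes on the GPU.
    auto candidates = xinfer::postproc::yolo_decoder::decode(
        raw_output, /*confidence_threshold=*/0.25f);

    // GPU-based Non-Maximum Suppression; only the surviving boxes reach the host.
    return xinfer::postproc::detection::nms(candidates, /*iou_threshold=*/0.45f);
}
```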
➡️ Full API Reference for postproc
Next Steps
To see how these core components are used in practice, check out the Building Custom Pipelines guide.
