API Reference: Core Runtime Engine
The xinfer::core module provides the fundamental classes for loading and executing pre-built, optimized TensorRT engines. These are the low-level, high-performance building blocks that the high-level zoo API is built upon.
You would use this API directly if you need fine-grained control over the inference process, such as when building a custom pipeline that isn't covered by the zoo, or when you need to manage CUDA streams for complex asynchronous workflows.
Key Classes
- Tensor: A lightweight, safe wrapper for managing GPU memory.
- InferenceEngine: The main class for loading a .engine file and running inference.
Tensor
The core::Tensor is the primary data structure in xInfer. It is a C++ RAII-compliant object that safely manages the lifetime of a GPU memory buffer, abstracting away raw cudaMalloc and cudaFree calls.
Header: #include <xinfer/core/tensor.h>
Core Principles
- Ownership: A Tensor object owns its GPU memory. When it goes out of scope, its destructor is automatically called, freeing the GPU memory.
- Movable, Not Copyable: To prevent accidental, expensive GPU-to-GPU copies, Tensor objects cannot be copied; they can only be moved. This ensures clear and efficient ownership transfer.
- Lightweight: The class is a very thin wrapper, adding no performance overhead to the underlying GPU operations.
Example: Creating and Moving a Tensor
#include <xinfer/core/tensor.h>
#include <vector>
void process_tensor(xinfer::core::Tensor t) {
// This function now owns the tensor memory.
// When `t` goes out of scope here, the memory is freed.
}
int main() {
// 1. Create a tensor for a batch of 16 RGB images of size 224x224.
xinfer::core::Tensor my_tensor({16, 3, 224, 224}, xinfer::core::DataType::kFLOAT);
// 2. The GPU memory is now allocated.
// You can get the raw pointer to pass to a CUDA kernel.
void* gpu_ptr = my_tensor.data();
// 3. To pass it to a function, you must move it.
process_tensor(std::move(my_tensor));
// 4. After the move, `my_tensor` is in a null state.
// Accessing its data would now be an error.
// assert(my_tensor.data() == nullptr);
return 0;
}
API Overview
- Tensor(const std::vector<int64_t>& shape, DataType dtype): Constructor that allocates a new GPU buffer of a specific shape and data type.
- void* data() const: Returns the raw void* pointer to the GPU memory buffer.
- const std::vector<int64_t>& shape() const: Returns a reference to the vector describing the tensor's shape.
- void copy_from_host(const void* cpu_data): Performs a cudaMemcpyHostToDevice to upload data from the CPU to the GPU.
- void copy_to_host(void* cpu_data) const: Performs a cudaMemcpyDeviceToHost to download data from the GPU to the CPU. Both copy methods are illustrated in the sketch below.
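Example: Copying Data Between Host and Device
A minimal sketch of a host-to-device round trip, relying only on the constructor, shape(), and the two copy methods listed above; the element count is derived directly from the shape.
#include <xinfer/core/tensor.h>
#include <cstdint>
#include <vector>
int main() {
    // Allocate a small float tensor on the GPU.
    xinfer::core::Tensor t({1, 3}, xinfer::core::DataType::kFLOAT);
    // Derive the element count from the shape.
    size_t count = 1;
    for (int64_t dim : t.shape()) count *= static_cast<size_t>(dim);
    // Upload data from the CPU to the GPU buffer...
    std::vector<float> host_in = {1.0f, 2.0f, 3.0f};
    t.copy_from_host(host_in.data());
    // ...and download it back to the CPU.
    std::vector<float> host_out(count);
    t.copy_to_host(host_out.data());
    return 0;
}
Because the copy methods are thin cudaMemcpy wrappers, the caller is responsible for sizing the host buffers to match the tensor's shape and data type.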
InferenceEngine
The core::InferenceEngine is the workhorse of the xInfer runtime. It loads a serialized .engine file created by the builders module and handles the execution of the model.
Header: #include <xinfer/core/engine.h>
Example: Manual Inference Pipeline
This example shows how you would use the InferenceEngine directly to build a custom pipeline, without the zoo.
#include <xinfer/core/engine.h>
#include <xinfer/core/tensor.h>
#include <xinfer/preproc/image_processor.h> // Using a preproc helper
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>
int main() {
try {
std::string engine_path = "resnet18.engine";
// 1. Load the pre-built engine. This is a fast operation.
xinfer::core::InferenceEngine engine(engine_path);
// 2. Manually create the pre-processor.
xinfer::preproc::ImageProcessor preprocessor(224, 224, {0.485, 0.456, 0.406}, {0.229, 0.224, 0.225});
// 3. Load input data and prepare GPU tensors.
cv::Mat image = cv::imread("cat_image.jpg");
auto input_shape = engine.get_input_shape(0);
xinfer::core::Tensor input_tensor(input_shape, xinfer::core::DataType::kFLOAT);
// 4. Run the pre-processing step.
preprocessor.process(image, input_tensor);
// 5. Run synchronous inference.
// The engine takes a vector of input tensors and returns a vector of output tensors.
// Tensor is move-only, so build the input vector with std::move rather than a braced list.
std::vector<xinfer::core::Tensor> inputs;
inputs.push_back(std::move(input_tensor));
std::vector<xinfer::core::Tensor> output_tensors = engine.infer(inputs);
// 6. Process the output.
// For ResNet-18 trained on ImageNet, there is a single output tensor of class logits (shape [batch, 1000]).
std::cout << "Inference successful. Output tensor has " << output_tensors[0].num_elements() << " elements.\n";
// You would now copy the output_tensors to the host and find the argmax.
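// A minimal sketch of that step (assumes a float output and the num_elements() helper used above):
std::vector<float> logits(output_tensors[0].num_elements());
output_tensors[0].copy_to_host(logits.data());
int best_class = 0;
for (size_t i = 1; i < logits.size(); ++i) {
    if (logits[i] > logits[best_class]) best_class = static_cast<int>(i);
}
std::cout << "Predicted class index: " << best_class << "\n";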
} catch (const std::exception& e) {
std::cerr << "Error in custom pipeline: " << e.what() << std::endl;
}
return 0;
}
API Overview
- InferenceEngine(const std::string& engine_path): Constructor that loads and deserializes a TensorRT engine from a file path.
- std::vector<Tensor> infer(const std::vector<Tensor>& inputs): Performs a synchronous inference call. It takes a vector of input Tensor objects and returns a vector of newly allocated output Tensor objects. This is the simplest way to run inference.
- void infer_async(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs, cudaStream_t stream): Performs an asynchronous inference call on a specific CUDA stream. This is an advanced method for building complex pipelines where you want to overlap data transfers with computation. Note that the output tensors must be pre-allocated by the user (see the sketch after this list).
- int get_num_inputs() const: Returns the number of input tensors the model expects.
- std::vector<int64_t> get_input_shape(int index = 0) const: Returns the shape of the specified input tensor. This is useful for allocating your input Tensor objects correctly.
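Example: Asynchronous Inference on a CUDA Stream
The sketch below outlines how infer_async can be combined with a user-managed CUDA stream. The output shape {1, 1000} is an assumption for a ResNet-18 classifier and the pre-processing step is elided; adapt both to your engine.
#include <xinfer/core/engine.h>
#include <xinfer/core/tensor.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>
int main() {
    xinfer::core::InferenceEngine engine("resnet18.engine");
    // Create a dedicated CUDA stream for this pipeline.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Build the input from the engine's reported shape (Tensor is move-only, so construct it in place).
    std::vector<xinfer::core::Tensor> inputs;
    inputs.emplace_back(engine.get_input_shape(0), xinfer::core::DataType::kFLOAT);
    // infer_async requires pre-allocated outputs. The shape {1, 1000} is assumed here
    // for a ResNet-18 classifier; use the shape that matches your engine.
    std::vector<xinfer::core::Tensor> outputs;
    outputs.emplace_back(std::vector<int64_t>{1, 1000}, xinfer::core::DataType::kFLOAT);
    // ... fill inputs[0] on `stream`, e.g. with an asynchronous pre-processing step ...
    // Enqueue inference on the stream, then wait for completion before reading the outputs.
    engine.infer_async(inputs, outputs, stream);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
Because infer_async only enqueues work, the output tensors are not valid until the stream has been synchronized (or an event recorded after the call has completed).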
