API Reference: Engine Builders
The xinfer::builders module is the heart of the xInfer optimization pipeline. It provides a high-level, fluent C++ API that wraps the immense complexity of the NVIDIA TensorRT build process.
This is the toolkit you use to perform the crucial "Build Step" of the xInfer workflow, converting a standard model format like ONNX into a hyper-optimized, hardware-specific TensorRT engine.
While the xinfer-cli tool provides a command-line interface to this module, the C++ API gives you the full power and flexibility to integrate engine building directly into your applications or build automation scripts.
Key Classes
- EngineBuilder: The main class for configuring and running the optimization process.
- ONNXExporter: A utility for converting trained xTorch models into the ONNX format.
- INT8Calibrator: An interface for providing data for INT8 quantization.
EngineBuilder
This is the primary class you will interact with. It uses a "fluent" design pattern, allowing you to chain configuration calls together in a clean, readable way.
Header: #include <xinfer/builders/engine_builder.h>
Core Workflow
The EngineBuilder follows a simple, three-step process:
- Specify the input model (.from_onnx()).
- Configure the desired optimizations (.with_fp16(), .with_int8(), etc.).
- Execute the build process (.build_and_save()).
Example: Building an FP16 Engine
This is the most common use case. It takes an ONNX file and creates a TensorRT engine optimized for FP16 precision, which typically provides a 2x speedup on modern NVIDIA GPUs with Tensor Cores.
#include <xinfer/builders/engine_builder.h>
#include <iostream>
#include <string>
int main() {
try {
std::string onnx_path = "path/to/your/model.onnx";
std::string engine_path = "my_model_fp16.engine";
// 1. Create the builder
xinfer::builders::EngineBuilder builder;
// 2. Configure the build process by chaining calls
builder.from_onnx(onnx_path)
.with_fp16()
.with_max_batch_size(16); // Specify the max batch size you'll use
// 3. Execute the build and save the engine
std::cout << "Building FP16 engine... this may take a few minutes.\n";
builder.build_and_save(engine_path);
std::cout << "Engine built successfully: " << engine_path << std::endl;
} catch (const std::exception& e) {
std::cerr << "Error building engine: " << e.what() << std::endl;
return 1;
}
return 0;
}
API Overview
- EngineBuilder& from_onnx(const std::string& onnx_path): Specifies the path to the input .onnx file.
- EngineBuilder& with_fp16(): Enables FP16 precision mode. TensorRT will convert the model's layers to use half-precision floats where possible. This is highly recommended for all GPUs from the Turing architecture (sm_75) onwards.
- EngineBuilder& with_int8(std::shared_ptr<INT8Calibrator> calibrator): Enables INT8 precision mode for maximum performance (~4x+ speedup). This requires providing a calibrator object. See the INT8 Quantization Guide for details, and the sketch after this list.
- EngineBuilder& with_max_batch_size(int batch_size): Specifies the maximum batch size that the optimized engine will support. This allows TensorRT to tune its kernels for a specific batch size, which can improve performance.
- void build_and_save(const std::string& output_engine_path): Triggers the final build process and saves the resulting compiled engine to the specified path.
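Example: Building an INT8 Engine (Sketch)
The INT8 path is the only option above without a full example in this section, so a minimal sketch follows. MyCalibrator is a hypothetical user-defined class implementing the INT8Calibrator interface; the actual interface and how to supply representative calibration data are covered in the INT8 Quantization Guide.
#include <xinfer/builders/engine_builder.h>
#include <memory>
#include <string>
#include <iostream>
int main() {
    try {
        // Hypothetical concrete implementation of the INT8Calibrator interface.
        // See the INT8 Quantization Guide for the real interface and data loading.
        auto calibrator = std::make_shared<MyCalibrator>("path/to/calibration_data/");
        xinfer::builders::EngineBuilder builder;
        builder.from_onnx("path/to/your/model.onnx")
               .with_int8(calibrator)      // INT8 requires a calibrator object
               .with_max_batch_size(8);
        builder.build_and_save("my_model_int8.engine");
        std::cout << "INT8 engine built successfully.\n";
    } catch (const std::exception& e) {
        std::cerr << "Error building engine: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}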
ONNXExporter
This is a convenience utility for developers using the xTorch training library. It provides a seamless bridge between a trained xTorch model and the ONNX format needed by the EngineBuilder.
Header: #include <xinfer/builders/onnx_exporter.h>
Example: Exporting an xTorch Model
#include <xinfer/builders/onnx_exporter.h>
#include <xtorch/models/resnet.h> // Assuming you have an xTorch model
#include <xtorch/util/serialization.h>
#include <iostream>
#include <string>
int main() {
// 1. Instantiate and load your trained xTorch model
auto model = std::make_shared<xt::models::ResNet18>();
// xt::load(model, "my_trained_resnet.weights");
model->eval();
// 2. Define the input specification for your model
xinfer::builders::InputSpec input_spec;
input_spec.name = "input";
input_spec.shape = {1, 3, 224, 224}; // Batch, Channels, Height, Width
// 3. Call the export function
std::string onnx_path = "resnet18.onnx";
bool success = xinfer::builders::export_to_onnx(*model, {input_spec}, onnx_path);
if (success) {
std::cout << "xTorch model exported successfully to " << onnx_path << std::endl;
}
return 0;
}
build_engine_from_url (Convenience Function)
This high-level function combines downloading and building into a single step, perfect for use in the xinfer-cli or for quickly grabbing a model from an online repository.
Header: #include <xinfer/builders/engine_builder.h>
Example: Downloading and Building from the Web
#include <xinfer/builders/engine_builder.h>
#include <iostream>
int main() {
// 1. Define the configuration for the download-and-build process
xinfer::builders::BuildFromUrlConfig config;
config.onnx_url = "https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx";
config.output_engine_path = "resnet50_web.engine";
config.use_fp16 = true;
config.max_batch_size = 1;
// 2. Call the function
std::cout << "Downloading and building ResNet-50...\n";
bool success = xinfer::builders::build_engine_from_url(config);
if (success) {
std::cout << "Engine created successfully at " << config.output_engine_path << std::endl;
}
return 0;
}
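Because a full TensorRT build can take several minutes, a common pattern is to reuse a previously built engine when one already exists on disk. The sketch below adds that caching check around the same configuration; the check itself uses the C++17 standard library and is not part of the xInfer API.
#include <xinfer/builders/engine_builder.h>
#include <filesystem>
#include <iostream>
int main() {
    xinfer::builders::BuildFromUrlConfig config;
    config.onnx_url = "https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx";
    config.output_engine_path = "resnet50_web.engine";
    config.use_fp16 = true;
    config.max_batch_size = 1;
    // Skip the expensive download-and-build step if a cached engine is already present.
    if (std::filesystem::exists(config.output_engine_path)) {
        std::cout << "Using cached engine: " << config.output_engine_path << std::endl;
        return 0;
    }
    if (!xinfer::builders::build_engine_from_url(config)) {
        std::cerr << "Engine build failed.\n";
        return 1;
    }
    std::cout << "Engine created successfully at " << config.output_engine_path << std::endl;
    return 0;
}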