Zoo API: Computer Vision
The xinfer::zoo::vision module provides a comprehensive suite of high-level, hyper-optimized pipelines for the most common computer vision tasks.
Each class in this module is an end-to-end pipeline: it handles pre-processing, TensorRT inference, and GPU-accelerated post-processing, and returns the final, human-readable result from a single .predict() call.
ImageClassifier
Performs image classification, identifying the primary subject of an image.
Header: #include <xinfer/zoo/vision/classifier.h>
#include <xinfer/zoo/vision/classifier.h>
#include <opencv2/opencv.hpp>
#include <cstdio>   // printf
#include <iostream>

int main() {
    // 1. Configure the classifier
    xinfer::zoo::vision::ClassifierConfig config;
    config.engine_path = "assets/resnet50.engine";
    config.labels_path = "assets/imagenet_labels.txt";

    // 2. Initialize
    xinfer::zoo::vision::ImageClassifier classifier(config);

    // 3. Predict
    cv::Mat image = cv::imread("assets/dog.jpg");
    auto results = classifier.predict(image, 3); // Get top 3 results

    // 4. Print results
    std::cout << "Top 3 Predictions:\n";
    for (const auto& result : results) {
        printf(" - Label: %s, Confidence: %.4f\n", result.label.c_str(), result.confidence);
    }
}
Config Struct: ClassifierConfig
Output Struct: ClassificationResult
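The predict(image, 3) call above returns the three highest-scoring labels. As a rough illustration of what such a top-k step involves, the sketch below applies a softmax to raw scores and partially sorts them on the CPU. The names ScoredLabel and top_k are hypothetical and not part of xinfer; the library performs its post-processing on the GPU and its actual implementation may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for illustration. It mirrors the label/confidence
// fields used in the example above but is not the xinfer definition.
struct ScoredLabel {
    std::string label;
    float confidence;
};

// Convert raw logits to probabilities (softmax) and return the k best entries,
// highest confidence first. Returns an empty vector if the inputs are empty
// or their sizes differ.
std::vector<ScoredLabel> top_k(const std::vector<float>& logits,
                               const std::vector<std::string>& labels,
                               std::size_t k) {
    std::vector<ScoredLabel> scored;
    if (logits.empty() || logits.size() != labels.size()) return scored;

    // Subtract the maximum logit before exponentiating for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const float e = std::exp(logits[i] - max_logit);
        scored.push_back({labels[i], e});
        sum += e;
    }
    for (auto& s : scored) s.confidence /= sum;

    // Only the first k entries need to be ordered.
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const ScoredLabel& a, const ScoredLabel& b) {
                          return a.confidence > b.confidence;
                      });
    scored.resize(k);
    return scored;
}
```

For example, top_k({1.0f, 3.0f, 2.0f}, {"cat", "dog", "fox"}, 2) yields "dog" followed by "fox".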
ObjectDetector
Detects and localizes multiple objects within an image.
Header: #include <xinfer/zoo/vision/detector.h>
#include <xinfer/zoo/vision/detector.h>
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    // 1. Configure the detector
    xinfer::zoo::vision::DetectorConfig config;
    config.engine_path = "assets/yolov8n.engine";
    config.labels_path = "assets/coco.names";
    config.confidence_threshold = 0.5f;

    // 2. Initialize
    xinfer::zoo::vision::ObjectDetector detector(config);

    // 3. Predict
    cv::Mat image = cv::imread("assets/street.jpg");
    auto detections = detector.predict(image);

    // 4. Draw results
    for (const auto& box : detections) {
        cv::rectangle(image, { (int)box.x1, (int)box.y1 }, { (int)box.x2, (int)box.y2 }, {0, 255, 0}, 2);
    }
    cv::imwrite("detections_output.jpg", image);
    std::cout << "Saved annotated image to detections_output.jpg\n";
}
Config Struct: DetectorConfig
Output Struct: BoundingBox
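Detection post-processing typically combines a confidence cut-off, like the confidence_threshold set above, with non-maximum suppression (NMS) to remove duplicate boxes. The following is a minimal CPU sketch of that idea, not the library's GPU implementation. The Box struct reuses the x1/y1/x2/y2 field names from the example; its confidence field and the functions iou and filter_and_nms are assumptions made for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical box type: x1/y1/x2/y2 follow the example above; the
// confidence field is an assumption made for this sketch.
struct Box {
    float x1, y1, x2, y2;
    float confidence;
};

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop boxes below confidence_threshold, then greedily keep the best box and
// suppress any later box that overlaps a kept one by more than iou_threshold.
std::vector<Box> filter_and_nms(std::vector<Box> boxes,
                                float confidence_threshold,
                                float iou_threshold) {
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [&](const Box& b) {
                                   return b.confidence < confidence_threshold;
                               }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.confidence > b.confidence; });

    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > iou_threshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

Greedy NMS is quadratic in the number of surviving boxes, which is usually acceptable after the confidence filter has removed most candidates.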
Segmenter
Performs semantic segmentation, assigning a class label to every pixel in an image.
Header: #include <xinfer/zoo/vision/segmenter.h>
#include <xinfer/zoo/vision/segmenter.h>
#include <opencv2/opencv.hpp>

int main() {
    // 1. Configure the segmenter
    xinfer::zoo::vision::SegmenterConfig config;
    config.engine_path = "assets/segformer.engine";

    // 2. Initialize
    xinfer::zoo::vision::Segmenter segmenter(config);

    // 3. Predict
    cv::Mat image = cv::imread("assets/cityscape.jpg");
    cv::Mat class_mask = segmenter.predict(image); // Returns a CV_8UC1 mask

    // 4. Save the result
    // The mask holds raw class indices; apply a colormap before viewing it.
    cv::imwrite("segmentation_output.png", class_mask);
}
Config Struct: SegmenterConfig
Output: cv::Mat (single-channel, 8-bit integer mask)
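Because the mask stores one class index per pixel, it looks almost black when saved directly. One way to visualize it is a palette lookup, sketched below on a plain byte buffer so that the example has no dependencies. The names Rgb and colorize_mask and the palette values are hypothetical; they are not part of xinfer or OpenCV.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Rgb = std::array<std::uint8_t, 3>;

// Map every class ID in a single-channel 8-bit mask to an RGB colour.
// IDs without a palette entry are painted black.
std::vector<Rgb> colorize_mask(const std::vector<std::uint8_t>& class_mask,
                               const std::vector<Rgb>& palette) {
    std::vector<Rgb> colored;
    colored.reserve(class_mask.size());
    for (std::uint8_t class_id : class_mask) {
        colored.push_back(class_id < palette.size() ? palette[class_id]
                                                    : Rgb{0, 0, 0});
    }
    return colored;
}
```

The same lookup could be applied to the pixels of the CV_8UC1 mask returned by predict(), writing into a three-channel image of the same size.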
InstanceSegmenter
Performs instance segmentation, detecting individual object instances and providing a per-pixel mask for each one.
Header: #include <xinfer/zoo/vision/instance_segmenter.h>
#include <xinfer/zoo/vision/instance_segmenter.h>
#include <opencv2/opencv.hpp>

int main() {
    // 1. Configure the instance segmenter
    xinfer::zoo::vision::InstanceSegmenterConfig config;
    config.engine_path = "assets/mask_rcnn.engine";
    config.labels_path = "assets/coco.names";

    // 2. Initialize
    xinfer::zoo::vision::InstanceSegmenter segmenter(config);

    // 3. Predict
    cv::Mat image = cv::imread("assets/people.jpg");
    auto results = segmenter.predict(image);

    // 4. Draw results
    for (const auto& instance : results) {
        // (Draw the instance.mask and instance.bounding_box on the image)
    }
    cv::imwrite("instance_segmentation_output.jpg", image);
}
Config Struct: InstanceSegmenterConfig
Output Struct: InstanceSegmentationResult (contains box, mask, label, etc.)
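The drawing step in the example is left as a placeholder. A common way to show a per-instance mask is to alpha-blend a colour into the pixels it covers. The sketch below does this on plain buffers and assumes the mask is binary (non-zero means "belongs to the instance") and has the same size as the image; the actual mask layout in InstanceSegmentationResult is not specified here, and Rgb and overlay_mask are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Rgb = std::array<std::uint8_t, 3>;

// Blend `color` into every pixel where the binary mask is non-zero.
// `image` and `mask` are row-major buffers with the same number of pixels.
// alpha = 0 leaves the image unchanged; alpha = 1 paints the solid colour.
void overlay_mask(std::vector<Rgb>& image,
                  const std::vector<std::uint8_t>& mask,
                  const Rgb& color,
                  float alpha) {
    if (image.size() != mask.size()) return;
    for (std::size_t i = 0; i < image.size(); ++i) {
        if (mask[i] == 0) continue;
        for (std::size_t c = 0; c < 3; ++c) {
            const float blended = (1.0f - alpha) * image[i][c] + alpha * color[c];
            image[i][c] = static_cast<std::uint8_t>(blended + 0.5f);
        }
    }
}
```

Using a different colour per instance keeps overlapping or adjacent objects distinguishable in the output image.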
PoseEstimator
Estimates the 2D keypoints of a human pose.
Header: #include <xinfer/zoo/vision/pose_estimator.h>
#include <xinfer/zoo/vision/pose_estimator.h>
#include <opencv2/opencv.hpp>

int main() {
    xinfer::zoo::vision::PoseEstimatorConfig config;
    config.engine_path = "assets/rtmpose.engine";
    xinfer::zoo::vision::PoseEstimator estimator(config);

    cv::Mat image = cv::imread("assets/person_running.jpg");
    auto poses = estimator.predict(image);

    // Draw the keypoints for the first detected person
    if (!poses.empty()) {
        // Each Pose is a vector of keypoints, so iterate over the first pose
        for (const auto& keypoint : poses.front()) {
            if (keypoint.confidence > 0.5f) {
                cv::circle(image, { (int)keypoint.x, (int)keypoint.y }, 3, {0, 0, 255}, -1);
            }
        }
    }
    cv::imwrite("pose_output.jpg", image);
}
Config Struct: PoseEstimatorConfig
Output: std::vector<Pose> where Pose is a std::vector<Keypoint>
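Since the output is a vector of poses, each of which is itself a vector of keypoints, handling several people requires two nested loops. The sketch below shows that traversal with a confidence filter. The Keypoint layout is inferred from the fields used in the example (x, y, confidence) and may not match the real struct exactly; confident_keypoints and count_confident are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Assumed layout, inferred from the fields used in the example above.
struct Keypoint {
    float x;
    float y;
    float confidence;
};
using Pose = std::vector<Keypoint>;

// Return the keypoints of one pose whose confidence exceeds min_confidence.
Pose confident_keypoints(const Pose& pose, float min_confidence) {
    Pose kept;
    for (const Keypoint& kp : pose) {
        if (kp.confidence > min_confidence) kept.push_back(kp);
    }
    return kept;
}

// Count confident keypoints across every detected person.
std::size_t count_confident(const std::vector<Pose>& poses, float min_confidence) {
    std::size_t total = 0;
    for (const Pose& pose : poses) {
        total += confident_keypoints(pose, min_confidence).size();
    }
    return total;
}
```

Filtering before drawing avoids placing markers at low-confidence locations, for instance on occluded joints.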
And More...
This module provides many more specialized pipelines, each with a simple, consistent API.
DepthEstimator: Predicts a dense depth map from a single RGB image.
FaceDetector: A lightweight and fast detector specifically for faces.
FaceRecognizer: Generates a 512-d feature embedding for a face, used for identification.
HandTracker: Detects and tracks hands and their keypoints in real-time.
OCR: A full, two-stage pipeline for detecting and recognizing text.
ImageDeblur: Sharpens blurry images using a generative model.
LowLightEnhancer: Brightens and denoises dark or nighttime images.
SmokeFlameDetector: A specialized detector for industrial safety and wildfire monitoring.
Each of these would have its own section with a code example, just like the ones above.
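As an illustration of how a face embedding such as the 512-d vector from FaceRecognizer is typically used for identification, the sketch below compares two embeddings with cosine similarity. This is a generic technique rather than a documented xinfer API: cosine_similarity and same_identity are hypothetical helpers, and the 0.6 threshold is only a placeholder that would need tuning for a given model.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two embeddings of equal length.
// Returns 0 when the sizes differ, the vectors are empty, or either has zero norm.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size() || a.empty()) return 0.0f;
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0f || norm_b == 0.0f) return 0.0f;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Decide whether two embeddings likely belong to the same identity.
// The default threshold is a placeholder, not a recommended value.
bool same_identity(const std::vector<float>& a, const std::vector<float>& b,
                   float threshold = 0.6f) {
    return cosine_similarity(a, b) >= threshold;
}
```

Cosine similarity ignores vector magnitude, so embeddings do not need to be normalized beforehand.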
