Zoo API: Generative AI
The xinfer::zoo::generative module provides high-level pipelines for a wide range of creative and generative AI tasks.
These classes are built on top of xInfer's hyper-optimized engines for state-of-the-art generative models like Stable Diffusion, GANs, and more. The zoo API abstracts away the complexity of these models, from multi-stage pipelines to iterative sampling loops, providing simple, powerful tools to bring your creative ideas to life in C++.
DiffusionPipeline
Generates high-quality images from text prompts using a Stable Diffusion-style model.
Header: #include <xinfer/zoo/generative/diffusion_pipeline.h>
#include <xinfer/zoo/generative/diffusion_pipeline.h>
#include <xinfer/utils/image_utils.h> // For saving the final tensor
#include <iostream>
int main() {
// 1. Configure the pipeline.
// The engine would be a pre-built U-Net from a Stable Diffusion model.
xinfer::zoo::generative::DiffusionPipelineConfig config;
config.unet_engine_path = "assets/stable_diffusion_unet.engine";
config.num_timesteps = 50; // Number of denoising steps
// 2. Initialize.
xinfer::zoo::generative::DiffusionPipeline pipeline(config);
// 3. Generate an image.
// The complex, 50-step iterative loop is handled internally in high-performance C++.
std::cout << "Generating image with diffusion model...\n";
// A full implementation would also take a text prompt that is processed by a CLIP text encoder.
xinfer::core::Tensor image_tensor = pipeline.generate(1);
// 4. Save the result.
xinfer::utils::save_tensor_as_image(image_tensor, "diffusion_output.png");
std::cout << "Image saved to diffusion_output.png\n";
}Config Struct: DiffusionPipelineConfig
Input: batch_size (and optionally, text prompt embeddings).
Output: xinfer::core::Tensor containing the generated image.
"F1 Car" Technology: The entire iterative denoising loop is a compiled C++ for loop, and each step uses a custom, fused CUDA kernel from postproc::diffusion_sampler, providing a massive speedup over a Python-based loop.
DCGAN
Generates images from a random latent vector using a Generative Adversarial Network.
Header: #include <xinfer/zoo/generative/dcgan.h>
#include <xinfer/zoo/generative/dcgan.h>
#include <xinfer/utils/image_utils.h>
int main() {
xinfer::zoo::generative::DCGAN_Generator generator("assets/dcgan_generator.engine");
std::cout << "Generating image with DCGAN...\n";
xinfer::core::Tensor image_tensor = generator.generate(1);
xinfer::utils::save_tensor_as_image(image_tensor, "dcgan_output.png");
std::cout << "Image saved to dcgan_output.png\n";
}Input: batch_size.
Output: xinfer::core::Tensor containing the generated image.
StyleTransfer
Applies the artistic style of one image to the content of another.
Header: #include <xinfer/zoo/generative/style_transfer.h>
#include <xinfer/zoo/generative/style_transfer.h>
#include <opencv2/opencv.hpp>
int main() {
// The engine is pre-built from a model trained on a specific style (e.g., "Starry Night").
xinfer::zoo::generative::StyleTransferConfig config;
config.engine_path = "assets/starry_night_style.engine";
xinfer::zoo::generative::StyleTransfer stylizer(config);
cv::Mat content_image = cv::imread("assets/my_photo.jpg");
cv::Mat styled_image = stylizer.predict(content_image);
cv::imwrite("styled_output.jpg", styled_image);
std::cout << "Saved styled image to styled_output.jpg\n";
}Config Struct: StyleTransferConfig
Input: cv::Mat content image.
Output: cv::Mat stylized image.
SuperResolution
Upscales a low-resolution image to a high-resolution version, adding realistic detail.
Header: #include <xinfer/zoo/generative/super_resolution.h>
#include <xinfer/zoo/generative/super_resolution.h>
#include <opencv2/opencv.hpp>
int main() {
xinfer::zoo::generative::SuperResolutionConfig config;
config.engine_path = "assets/esrgan_x4.engine";
config.upscale_factor = 4;
xinfer::zoo::generative::SuperResolution upscaler(config);
cv::Mat low_res_image = cv::imread("assets/low_res.png");
cv::Mat high_res_image = upscaler.predict(low_res_image);
cv::imwrite("super_resolution_output.png", high_res_image);
}Config Struct: SuperResolutionConfig
Input: cv::Mat low-resolution image.
Output: cv::Mat high-resolution image.
And More...
This module provides many more specialized generative pipelines.
Inpainter: Fills in a masked region of an image with plausible content.Outpainter: Extends the boundaries of an image with generated content.Colorizer: Adds realistic color to a black-and-white image.TextToSpeech: Converts a string of text into a spoken audio waveform.ImageToVideo: Generates a short video clip from a starting image.VideoFrameInterpolation: Creates smooth, slow-motion video by generating intermediate frames.
Each of these would have its own section with a code example, just like the ones above.
