Engine initialization parameter structure. Defines model weight loading paths, compute backend topology, and resource pre-allocation strategies.
The `LiteRtLm_Config` structure is the "seed" of the entire inference pipeline. Once populated in memory, it is passed directly to `LiteRtLm_CreateEngine`, and the engine instance keeps that configuration until it is destroyed.
| Member | Type | Description & Responsibilities |
|---|---|---|
| `model_path` | `const char*` | Absolute path to the model file. Must point to a valid `.bin` or `.tflite` weight file. The DLL opens this file in read-only mode. |
| `backend` | `const char*` | Inference backend chain: a comma-separated list of preferred devices, tried in order of priority. Examples: `"gpu"`, `"cpu"`, `"gpu,cpu"`. |
| `max_num_tokens` | `int` | Total KV cache context capacity. Determines the size of the pre-allocated GPU KV cache buffer. Setting it to the model's maximum supported length (e.g., 2048, 4096) is recommended. |
| `num_threads` | `int` | CPU parallelism. Only effective when `backend` includes CPU. Determines the thread pool size for XNNPACK. |
| `bOptimizeShader` | `int` | [UE5 Specific] Shader optimization switch. When set to 1, the DLL performs instruction pre-compilation during WebGPU initialization to reduce stuttering during inference. |
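
Because `max_num_tokens` drives the size of the pre-allocated KV cache, it helps to estimate the memory cost before picking a value. The sketch below uses the usual decoder-only transformer layout (one K and one V tensor per layer). The function name `estimate_kv_cache_bytes` and all of its parameters except `max_num_tokens` are assumptions made for illustration; the DLL's actual allocation strategy is not documented here and may add padding or use a different layout.

```c
#include <stdint.h>

/* Rough KV cache footprint for a decoder-only transformer.
 * Assumption: each layer stores one K and one V tensor of shape
 * [max_num_tokens, num_kv_heads, head_dim]. Hypothetical helper,
 * not part of the DLL's API. */
static uint64_t estimate_kv_cache_bytes(uint32_t num_layers,
                                        uint32_t num_kv_heads,
                                        uint32_t head_dim,
                                        uint32_t max_num_tokens,
                                        uint32_t bytes_per_element)
{
    /* Bytes needed to hold K (or V) for a single token across all layers. */
    const uint64_t per_token = (uint64_t)num_layers * num_kv_heads
                             * head_dim * bytes_per_element;

    /* Factor 2 accounts for storing both keys and values. */
    return 2u * per_token * max_num_tokens;
}
```

With illustrative dimensions of 18 layers, 1 KV head, a head dimension of 256, and 2-byte elements, a 2048-token context needs roughly 36 MiB by this estimate, and doubling `max_num_tokens` doubles the figure.
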

    // 1. Initialize configuration object
    LiteRtLm_Config config = {0};
    config.model_path = "D:/Models/Gemma-2b-it.tflite";
    config.backend = "gpu";           // Prefer WebGPU acceleration
    config.max_num_tokens = 2048;     // Reserve 2k context
    config.num_threads = 8;           // Use 8 threads if falling back to CPU
    config.bOptimizeShader = 1;       // Enable shader pre-optimization

    // 2. Pass to engine creation function
    void* engine = LiteRtLm_CreateEngine(config);
    if (engine == NULL) {
        // Handle initialization failure...
    }
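
Since a `NULL` return does not say which field was wrong, a caller may want to check the structure against the constraints in the table before creating the engine. The following is a minimal sketch under stated assumptions: `ExampleConfig` is a stand-in that mirrors the documented members (the real `LiteRtLm_Config` layout may differ), and `backend_chain_contains`, `has_suffix`, and `validate_config` are hypothetical helpers, not functions exported by the DLL. It also assumes the backend chain contains no whitespace and that file extensions are lowercase.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the documented members; the real struct may differ. */
typedef struct {
    const char* model_path;
    const char* backend;
    int max_num_tokens;
    int num_threads;
    int bOptimizeShader;
} ExampleConfig;

/* Returns 1 if `name` is a whole entry of the comma-separated `chain`. */
static int backend_chain_contains(const char* chain, const char* name)
{
    const size_t name_len = strlen(name);
    const char* p = chain;
    while (p != NULL && *p != '\0') {
        const char* comma = strchr(p, ',');
        const size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == name_len && strncmp(p, name, len) == 0) {
            return 1;
        }
        p = comma ? comma + 1 : NULL;  /* advance to the next entry */
    }
    return 0;
}

/* Returns 1 if `s` ends with `suffix` (case-sensitive). */
static int has_suffix(const char* s, const char* suffix)
{
    const size_t n = strlen(s);
    const size_t m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Returns NULL if the config looks sane, otherwise a short error message. */
static const char* validate_config(const ExampleConfig* c)
{
    if (c == NULL) return "config is NULL";
    if (c->model_path == NULL || c->model_path[0] == '\0')
        return "model_path is empty";
    if (!has_suffix(c->model_path, ".bin") &&
        !has_suffix(c->model_path, ".tflite"))
        return "model_path must end in .bin or .tflite";
    if (c->backend == NULL || c->backend[0] == '\0')
        return "backend is empty";
    if (c->max_num_tokens <= 0)
        return "max_num_tokens must be positive";
    /* num_threads only matters when the chain can fall back to CPU. */
    if (backend_chain_contains(c->backend, "cpu") && c->num_threads <= 0)
        return "num_threads must be positive when backend includes cpu";
    return NULL;
}
```

Calling `validate_config` first and logging its message gives a more specific diagnostic than the bare `NULL` check, though it cannot detect problems such as a missing or corrupt model file.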