Triggers asynchronous, incremental inference: the request is submitted to the computation queue and the call returns immediately. Results are streamed back through a callback function.

`LiteRtLm_RunInference` performs no computation itself; it only starts the process. The computation is driven, and callbacks are dispatched, only by subsequent calls to `LiteRtLm_WaitUntilDone`.
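The following self-contained toy model illustrates the submit-then-pump contract. It is not LiteRT-LM code. `ToyEngine`, `Submit`, `Pump`, and the `TokenCallback` signature are hypothetical stand-ins for `LiteRtLm_RunInference`, `LiteRtLm_WaitUntilDone`, and `LiteRtLmCallback`. The return convention (0 while work remains) is an assumption that mirrors the loop in the usage example. The model shows that `Submit` only enqueues and that every callback fires inside `Pump`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback shape (not the real LiteRtLmCallback signature).
using TokenCallback = void (*)(void* user_ptr, const std::string& token, bool done);

struct PendingRequest {
    std::vector<std::string> tokens;  // canned "model output", one token per step
    std::size_t next = 0;             // index of the next token to emit
    TokenCallback callback;
    void* user_ptr;
};

// Toy engine: Submit plays the role of RunInference, Pump the role of WaitUntilDone.
class ToyEngine {
public:
    // Only enqueues the request; no callback runs here.
    void Submit(std::vector<std::string> tokens, TokenCallback cb, void* user_ptr) {
        queue_.push_back(PendingRequest{std::move(tokens), 0, cb, user_ptr});
    }

    // Does up to max_steps units of work, invoking callbacks as tokens are "produced".
    // Returns 0 while work remains and 1 once the queue is empty (assumed convention).
    int Pump(int max_steps) {
        for (int step = 0; step < max_steps && !queue_.empty(); ++step) {
            PendingRequest& req = queue_.front();
            if (req.tokens.empty()) {  // nothing to stream: just report completion
                req.callback(req.user_ptr, "", true);
                queue_.pop_front();
                continue;
            }
            bool last = req.next + 1 == req.tokens.size();
            req.callback(req.user_ptr, req.tokens[req.next], last);
            ++req.next;
            if (last) queue_.pop_front();
        }
        return queue_.empty() ? 1 : 0;
    }

private:
    std::deque<PendingRequest> queue_;
};

// Example user context that accumulates streamed text.
struct Collector {
    std::string text;
    int calls = 0;
    bool finished = false;
};

void CollectToken(void* user_ptr, const std::string& token, bool done) {
    auto* c = static_cast<Collector*>(user_ptr);
    c->text += token;
    ++c->calls;
    c->finished = done;
}

// Submits one request, then pumps one step at a time. Returns the number of pumps.
int RunDemo(Collector& out) {
    ToyEngine engine;
    engine.Submit({"Hel", "lo", "!"}, CollectToken, &out);
    assert(out.calls == 0);  // Submit alone produced nothing.

    int pumps = 0;
    int status = 0;
    while (status == 0) {
        status = engine.Pump(1);
        ++pumps;
    }
    return pumps;
}
```

With `Pump(1)`, each call emits one token, so the three-token request takes three pumps and `out.text` ends up as `"Hello!"`.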
```c
void LiteRtLm_RunInference(
    void* conv_ptr,
    LiteRtLm_SamplingParams params,
    LiteRtLmCallback callback,
    void* user_ptr
);
```

| Parameter | Description |
|---|---|
| `conv_ptr` | The session handle to run inference on. |
| `params` | Sampling and constraint parameters. See `LiteRtLm_SamplingParams`. |
| `callback` | Function pointer to the streaming result callback. |
| `user_ptr` | User context pointer, passed through to `callback`. |
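Because `user_ptr` is passed through to the callback untouched, the usual C++ idiom is to pass the owning object (as the usage example does with `this`) and recover it in a static "trampoline" function. The sketch below shows that idiom in isolation. The `ToyCallback` signature and `ToyDeliver`, which stands in for the library invoking the callback, are hypothetical; check the actual `LiteRtLmCallback` typedef before adapting this.

```cpp
#include <cassert>
#include <string>

// Hypothetical callback shape; the real LiteRtLmCallback may differ.
using ToyCallback = void (*)(const char* chunk, int is_final, void* user_ptr);

// Stand-in for the library side: it never inspects user_ptr, only hands it back.
void ToyDeliver(ToyCallback cb, void* user_ptr, const char* chunk, int is_final) {
    cb(chunk, is_final, user_ptr);
}

class ChatSession {
public:
    // Static trampoline: a plain function pointer can point here. It casts
    // user_ptr back to the object registered at submit time and forwards the call.
    static void OnChunk(const char* chunk, int is_final, void* user_ptr) {
        assert(user_ptr != nullptr);
        static_cast<ChatSession*>(user_ptr)->HandleChunk(chunk, is_final != 0);
    }

    const std::string& Text() const { return text_; }
    bool Finished() const { return finished_; }

private:
    void HandleChunk(const char* chunk, bool is_final) {
        if (chunk != nullptr) text_ += chunk;
        finished_ = is_final;
    }

    std::string text_;
    bool finished_ = false;
};
```

Since callbacks are dispatched during `LiteRtLm_WaitUntilDone`, the object behind `user_ptr` must remain alive until that driving loop has finished.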
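```cpp
// Recommended: run this loop on a background thread.
// FMyInferenceRunner is an illustrative class that owns Conv, Engine, MyParams,
// and bInterruptFlag. MyCallback must be a static or free function matching
// LiteRtLmCallback. Declare bInterruptFlag as std::atomic<bool> if another thread sets it.
void FMyInferenceRunner::BackgroundInferenceThread() {
    // 1. Submit the task. Returns immediately; `this` reaches MyCallback as user_ptr.
    LiteRtLm_RunInference(Conv, MyParams, MyCallback, this);

    // 2. Drive computation until inference ends. A status of 0 is treated as
    //    "still running"; any other value exits the loop.
    int Status = 0;
    while (Status == 0) {
        // Advance computation and dispatch pending callbacks. See
        // LiteRtLm_WaitUntilDone for the meaning of the second argument.
        Status = LiteRtLm_WaitUntilDone(Engine, 1);
        if (bInterruptFlag) {
            break;  // Stop pumping at the caller's request.
        }
    }
}
```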
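The example reads `bInterruptFlag` on the worker thread. If another thread writes it, a plain `bool` is a data race in C++; `std::atomic<bool>` avoids that. The sketch below wraps the driving loop in a `std::thread` with an atomic stop flag. `FakeEngine::WaitUntilDone` is a hypothetical stand-in for `LiteRtLm_WaitUntilDone` (0 means "still running", as in the example), so the threading can be exercised without the library. It does not model what LiteRT-LM does with a request whose loop was interrupted.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for LiteRtLm_WaitUntilDone: returns 0 ("still running")
// for the first steps_until_done - 1 calls, then 1.
class FakeEngine {
public:
    explicit FakeEngine(int steps_until_done) : remaining_(steps_until_done) {}

    int WaitUntilDone(int /*unused*/) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulate work
        return --remaining_ > 0 ? 0 : 1;
    }

private:
    int remaining_;  // touched only by the worker thread
};

// Runs the driving loop on its own thread; other threads may call Stop().
class InferenceWorker {
public:
    explicit InferenceWorker(FakeEngine& engine) : engine_(engine) {}
    ~InferenceWorker() { Stop(); }

    void Start() {
        assert(!thread_.joinable());
        interrupt_.store(false);
        thread_ = std::thread([this] { Loop(); });
    }

    // Waits for the loop to end on its own.
    void Wait() {
        if (thread_.joinable()) thread_.join();
    }

    // Requests an early exit, then waits for the loop to notice it.
    void Stop() {
        interrupt_.store(true);
        Wait();
    }

    int Iterations() const { return iterations_.load(); }
    bool CompletedNormally() const { return completed_.load(); }

private:
    void Loop() {
        int status = 0;
        while (status == 0) {
            status = engine_.WaitUntilDone(1);
            iterations_.fetch_add(1);
            if (interrupt_.load()) break;  // same role as bInterruptFlag
        }
        completed_.store(status != 0);
    }

    FakeEngine& engine_;
    std::atomic<bool> interrupt_{false};
    std::atomic<int> iterations_{0};
    std::atomic<bool> completed_{false};
    std::thread thread_;
};
```

Because the destructor calls `Stop()`, which sets the flag and joins, the worker thread never outlives the objects its loop touches.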