LiteRtLm_RunInference

Triggers asynchronous incremental inference. Submits the request to the computation queue and returns immediately. Results are pushed via a callback function.

01. Async Task Submission Flow

sequenceDiagram
    participant App as Host App (Worker Thread)
    participant DLL as Wrapper DLL
    participant GPU as WebGPU / CPU Device
    App->>DLL: Call RunInference
    DLL->>GPU: Submit Token Generation Task to Queue
    DLL-->>App: Immediate Return (Non-blocking)
    Note over App: Callback not yet triggered
    App->>DLL: Call WaitUntilDone (Drive Task)
    GPU->>DLL: Produce Incremental Token
    DLL->>App: Trigger LiteRtLmCallback

Non-blocking design: LiteRtLm_RunInference performs no computation itself; it only enqueues the task. The computation is driven, and callbacks are dispatched, only by subsequent calls to LiteRtLm_WaitUntilDone.
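
The submit-then-drive split can be illustrated with a toy model. This is a minimal sketch, not the wrapper's actual implementation: ToyEngine, Submit, Pump, and RunToyInference are hypothetical names. The only convention taken from this page is that a status of 0 means "still running"; the value 1 for "done" is an assumption.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the wrapper (hypothetical, NOT the real DLL).
// It only models the split between submitting work and driving it.
class ToyEngine {
public:
    using TokenCallback = std::function<void(const std::string&)>;

    // Analogue of RunInference: queue the work and return immediately.
    // No callback fires here.
    void Submit(std::vector<std::string> tokens, TokenCallback cb) {
        assert(cb && "a callback is required to receive tokens");
        for (auto& t : tokens) pending_.push_back(std::move(t));
        callback_ = std::move(cb);
    }

    // Analogue of WaitUntilDone: deliver at most one token per call.
    // Returns 0 while work remains, 1 once the queue is drained
    // (the value 1 is an assumption made for this sketch).
    int Pump() {
        if (!pending_.empty()) {
            std::string token = std::move(pending_.front());
            pending_.pop_front();
            callback_(token);
        }
        return pending_.empty() ? 1 : 0;
    }

private:
    std::deque<std::string> pending_;
    TokenCallback callback_;
};

// Submits three tokens, then drives the engine until it reports completion.
inline std::string RunToyInference() {
    ToyEngine engine;
    std::string output;
    engine.Submit({"Hello", ", ", "world"},
                  [&output](const std::string& t) { output += t; });
    // Nothing has been delivered yet; output is still empty here.
    while (engine.Pump() == 0) {
    }
    return output;
}
```

In this model, skipping the Pump loop leaves output empty, which mirrors the rule above: without calls to the driving function, no callback is ever dispatched.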

02. Detailed Parameter Definition

void LiteRtLm_RunInference(
    void* conv_ptr, 
    LiteRtLm_SamplingParams params,
    LiteRtLmCallback callback, 
    void* user_ptr
);
Parameter    Description
conv_ptr     The session handle to execute inference on.
params       Sampling and constraint parameters. See LiteRtLm_SamplingParams.
callback     Streaming result callback function pointer.
user_ptr     User context pointer, passed through to the callback.
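
One common way to use a pass-through pointer such as user_ptr is a static "trampoline" that recovers the owning object. The sketch below rests on an assumption: the real LiteRtLmCallback signature is not shown in this section, so HypotheticalCallback, ChatSession, and FakeEmit are illustrative names only.

```cpp
#include <cassert>
#include <string>

// ASSUMPTION: the real LiteRtLmCallback signature is not given on this page.
// This hypothetical typedef only illustrates the user_ptr pass-through.
using HypotheticalCallback = void (*)(const char* token, void* user_ptr);

class ChatSession {
public:
    // Static trampoline: a C-style function pointer cannot bind to a
    // member function, so the object is recovered from user_ptr.
    static void OnToken(const char* token, void* user_ptr) {
        assert(user_ptr != nullptr);
        static_cast<ChatSession*>(user_ptr)->Append(token);
    }

    const std::string& Text() const { return text_; }

private:
    void Append(const char* token) { text_ += token; }
    std::string text_;
};

// Stand-in for the library side: it hands user_ptr back untouched.
inline void FakeEmit(HypotheticalCallback cb, void* user_ptr,
                     const char* token) {
    cb(token, user_ptr);
}
```

With this pattern the host would pass the static function as callback and the object's address as user_ptr. The object must outlive the inference run, since the library only stores the raw pointer.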

03. Standard Invocation Sequence

// Recommended to run this loop in a background thread
void BackgroundInferenceThread() {
    // 1. Submit Task
    LiteRtLm_RunInference(Conv, MyParams, MyCallback, this);
    
    // 2. Enter driving loop until inference ends
    int Status = 0;
    while (Status == 0) {
        // Pump results: wait up to the given timeout, or return as soon as a token is produced
        Status = LiteRtLm_WaitUntilDone(Engine, 1);
        // Stop driving early if the host requested cancellation
        if (bInterruptFlag) break;
    }
}
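
When the interrupt flag is written by another thread (for example a UI thread), a plain bool is a data race in C++. A minimal sketch of the same driving loop with std::atomic<bool> follows. DriveUntilDone and DriveResult are hypothetical helpers, and pump stands in for a call such as LiteRtLm_WaitUntilDone, assumed here only to return 0 while inference is still running.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Why the loop stopped. Hypothetical helper type, not part of the wrapper API.
enum class DriveResult { Finished, Interrupted };

// Drives `pump` until it returns non-zero or `interrupt` becomes true.
// `pump` stands in for a call such as LiteRtLm_WaitUntilDone(...).
// ASSUMPTION: 0 means "still running"; any other value ends the loop
// (this page does not define the individual status codes).
inline DriveResult DriveUntilDone(const std::function<int()>& pump,
                                  const std::atomic<bool>& interrupt) {
    assert(pump && "pump must be callable");
    while (!interrupt.load(std::memory_order_acquire)) {
        if (pump() != 0) return DriveResult::Finished;
    }
    return DriveResult::Interrupted;
}
```

Returning the reason lets the caller tell a finished run from a cancelled one, which the bare break in the loop above does not distinguish.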