Dual-Thread Async Pumping, Result Marshalling, and Thread-Safety Procedures
```mermaid
sequenceDiagram
participant GT as Game Thread (Actor/UI)
participant PL as Plugin Layer (Wrapper)
participant K as Worker Thread (Async Pump)
GT->>PL: 1. SendChatRequest()
PL->>K: 2. Marshal Lambda Closures
activate K
K->>K: 3. Execute Inference Loop (WaitUntilDone)
loop Token Production
K->>PL: 4. Internal_OnChunk
PL-->>GT: 5. AsyncTask(GameThread) -> UI Update
end
K-->>PL: 6. Internal_OnDone
deactivate K
PL-->>GT: 7. AsyncTask(GameThread) -> Final Stats
```
By running inference on a dedicated worker thread instead of the Game Thread, the plugin keeps LLM inference, a compute-heavy operation, from blocking the Game Thread. Long generations therefore do not stall frame updates.
Every callback raised on the worker thread is automatically marshalled back to the Game Thread with `AsyncTask(ENamedThreads::GameThread, ...)`. By the time business logic (Layer 3) receives a signal, it is already executing on the Game Thread. It can therefore safely touch UObjects and UMG widgets.
`SendChatRequest` is non-blocking. It only commits the request to the pipeline and returns within milliseconds. The Game Thread stays free to render at its normal frame rate (for example, 60+ FPS) while the AI response is being generated.
Example: how Layer 3 business logic drives UI rendering through the pipeline:
```cpp
// 1. Define the chunk handler in Layer 3 (no threading code needed here)
FLiteRtLmChunkCallback MyOnChunk = ...Lambda([this](const FString& Chunk) {
    // Safe to touch UMG here: the Plugin Layer has already marshalled
    // this call onto the Game Thread.
    UpdateUMG(Chunk);
});
// 2. Commit the request to the pipeline (returns immediately)
Subsystem->SendChatRequest(..., MyOnChunk, ...);
```