Pipeline: Task Distribution

Sequence / Async Task Distribution Sequence

            sequenceDiagram
                participant GT as Game Thread (Actor/UI)
                participant PL as Plugin Layer (Wrapper)
                participant K as Worker Thread (Async Pump)
                
                GT->>PL: 1. SendChatRequest()
                PL->>K: 2. Marshal Lambda Closures
                activate K
                K->>K: 3. Execute Inference Loop (WaitUntilDone)
                loop Token Production
                    K->>PL: 4. Internal_OnChunk
                    PL-->>GT: 5. AsyncTask(GameThread) -> UI Update
                end
                K-->>PL: 6. Internal_OnDone
                deactivate K
                PL-->>GT: 7. AsyncTask(GameThread) -> Final Stats

01 / Concept: Async Pumping

By isolating the worker thread that runs inference from the Game Thread, the plugin keeps LLM inference, a compute-heavy operation, from stalling the game's frame rate.

Auto-Marshalling

All callbacks from background threads are automatically re-dispatched to the Game Thread via AsyncTask targeting ENamedThreads::GameThread. By the time business logic (Layer 3) receives a signal, execution is already back on the Game Thread.
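The marshalling idea can be sketched in standard C++, outside the engine. This is only an illustration under stated assumptions: FGameThreadQueue, FMarshalResult, and RunMarshalDemo are hypothetical names, and the queue stands in for what AsyncTask does in the engine. It is not the plugin's implementation. The worker never calls the Layer 3 callback directly; it posts a closure, and the owning thread runs it when it drains the queue.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine's game-thread task queue (not plugin code).
class FGameThreadQueue {
public:
    // Callable from any thread: queue work to run later on the owning thread.
    void Post(std::function<void()> Task) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Tasks.push(std::move(Task));
    }

    // Owning thread only: run everything queued so far, return how many tasks ran.
    int Drain() {
        assert(std::this_thread::get_id() == Owner);
        std::queue<std::function<void()>> Local;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            std::swap(Local, Tasks);
        }
        int Ran = 0;
        for (; !Local.empty(); Local.pop(), ++Ran) {
            Local.front()();
        }
        return Ran;
    }

private:
    const std::thread::id Owner = std::this_thread::get_id();
    std::mutex Mutex;
    std::queue<std::function<void()>> Tasks;
};

struct FMarshalResult {
    std::string Text;              // chunks in the order the callback received them
    bool bAllOnGameThread = true;  // false if any callback ran off the calling thread
};

// Produces chunks on a worker thread and delivers each one on the calling thread.
FMarshalResult RunMarshalDemo(const std::vector<std::string>& Chunks) {
    FGameThreadQueue Queue;
    FMarshalResult Result;
    const std::thread::id GameThreadId = std::this_thread::get_id();
    std::atomic<bool> bProducerDone{false};

    // "Layer 3" logic: contains no locking or thread handling of its own.
    auto OnChunk = [&](const std::string& Chunk) {
        if (std::this_thread::get_id() != GameThreadId) {
            Result.bAllOnGameThread = false;
        }
        Result.Text += Chunk;
    };

    std::thread Worker([&] {
        for (const std::string& Chunk : Chunks) {
            // Wrapper role: copy the chunk and post the call instead of making it here.
            Queue.Post([&OnChunk, Chunk] { OnChunk(Chunk); });
        }
        bProducerDone = true;
    });

    // Simulated frames: drain once per iteration until the producer finishes.
    while (!bProducerDone.load()) {
        Queue.Drain();
        std::this_thread::yield();
    }
    Worker.join();
    Queue.Drain();  // pick up anything posted after the last frame
    return Result;
}
```

Calling RunMarshalDemo({"Hel", "lo"}) leaves Text equal to "Hello" with bAllOnGameThread still true. The callback mutates Result without a lock, which is safe only because every invocation happens on the thread that drains the queue.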

Non-blocking Commitment

SendChatRequest is a non-blocking call that returns within milliseconds. It only commits the task to the pipeline and does not wait for inference, so the Game Thread can keep rendering at 60+ FPS while the AI response is produced.
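A minimal sketch of the commit-and-return behaviour, again in standard C++ rather than engine code. CommitChatRequest, FCommitResult, and RunCommitDemo are hypothetical names, and std::async with a future is swapped in here for the plugin's callback mechanism purely for illustration. A gate keeps the simulated inference blocked so the demo can show, without relying on timing, that the caller regains control before the work finishes.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <utility>

struct FCommitResult {
    bool bReturnedBeforeDone = false;  // true if the commit call returned while work was pending
    int FramesWhileWaiting = 0;        // frames ticked while the reply was still pending
    std::string Reply;
};

// Hypothetical stand-in for SendChatRequest: hands the work to another thread and returns at once.
std::future<std::string> CommitChatRequest(std::string Prompt, std::shared_future<void> Gate) {
    return std::async(std::launch::async, [Prompt = std::move(Prompt), Gate] {
        Gate.wait();  // stands in for a long-running inference loop
        return "echo: " + Prompt;
    });
}

FCommitResult RunCommitDemo(int FramesToTick) {
    assert(FramesToTick >= 0);
    std::promise<void> InferenceGate;
    FCommitResult Result;

    std::future<std::string> Pending =
        CommitChatRequest("hi", InferenceGate.get_future().share());

    // Control is back on the caller while the worker is still blocked on the gate.
    const auto Poll = std::chrono::seconds(0);
    Result.bReturnedBeforeDone = Pending.wait_for(Poll) != std::future_status::ready;

    // Simulated game loop: frames keep ticking while the request is in flight.
    for (int Frame = 0; Frame < FramesToTick; ++Frame) {
        if (Pending.wait_for(Poll) != std::future_status::ready) {
            ++Result.FramesWhileWaiting;
        }
    }

    InferenceGate.set_value();     // let the simulated inference finish
    Result.Reply = Pending.get();  // blocking here is fine: this is the end of the demo
    return Result;
}
```

In a real frame loop the final blocking get() would not appear on the Game Thread; the result would arrive through a callback marshalled back to it, as described under Auto-Marshalling.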

02 / Source Code Demo (C++)

How the pipeline drives UI updates from business logic:

// 1. Logic defined in Layer 3 (Thread-agnostic)
FLiteRtLmChunkCallback MyOnChunk = ...Lambda([this](const FString& Chunk) {
    // Safe to touch UMG here: the Plugin Layer has already marshalled this call to the Game Thread
    UpdateUMG(Chunk);
});

// 2. Commit to Pipeline
Subsystem->SendChatRequest(..., MyOnChunk, ...);