Architecture Overview

litert_lm_wrapper.dll is a deep convergence of over 70 industrial-grade integration items. This page fully lists every engineering component involved in compilation and linking. The breadth of this data represents the depth of our work.

I. Core Inference Framework

Component	License	Status	Responsibilities
LiteRT (Google)	Apache-2.0	PATCHED	Core inference kernel; refactored include paths and fixed MSVC symbol conflicts.
TensorFlow Core	Apache-2.0	PATCHED	Base operator library and multi-thread scheduling optimization.
rules_ml_toolchain	Apache-2.0	INTEGRATED	Manages hermeticity of GPU (CUDA/CUDNN) compilation environment.
bazel-toolchains	Apache-2.0	INTEGRATED	Cross-platform toolchain configuration set.

II. Logic Constraints & Integration Matrix

llguidance	MIT	5x PATCHED	Microsoft constraint framework. Deeply tuned sub-modules including RegexVec, Grammar, and Parser.
`toktrie`	MIT	PATCHED	Efficient Token retrieval Trie tree, supporting complex grammar validation.
antlr4rust	BSD-3-Clause	INTEGRATED	Rust grammar parser core, supporting structured output parsing.
`antlr_fc_tool_call_parser`	Custom	INTEGRATED	Integrated Function Call tool call parser A.
`antlr_python_tool_call_parser`	Custom	INTEGRATED	Integrated Python-style tool call parser B.
`json_parser`	Custom	INTEGRATED	Dedicated JSON structure validity checker.
`python_parser`	Custom	INTEGRATED	Python code block semantic parsing component.
`fc_parser`	Custom	INTEGRATED	Generic tool call protocol conversion parser.

III. Tokenization & Template Rendering

sentencepiece	Apache-2.0	REFACTORED	Tokenizer source refactored; severed full Abseil dependency for Unreal compatibility.
Minja (C++)	Apache-2.0	PATCHED	Fixed parser crash when handling extremely large JSON payloads.
minijinja (Rust)	Apache-2.0	INTEGRATED	High-performance Rust template engine.
tokenizers-cpp	Apache-2.0	INTEGRATED	HuggingFace protocol C++ bridge.
tokenizers (Rust)	Apache-2.0	INTEGRATED	HuggingFace tokenization core Rust runtime.
nanobind_json	MIT	PATCHED	JSON binding layer adaptation patch.

IV. Build Rules & Toolchain Support

`rules_rust`	Apache-2.0	PATCHED	Manages Rust toolchain; handles symbol visibility on Windows.
`rules_python`	Apache-2.0	INTEGRATED	Hermetic Python build environment.
`rules_apple`	Apache-2.0	INTEGRATED	iOS platform build support.
`rules_swift`	Apache-2.0	INTEGRATED	Swift library integration.
`rules_kotlin`	Apache-2.0	INTEGRATED	Android Kotlin support.
`rules_shell`	Apache-2.0	INTEGRATED	Cross-platform script execution rules.
`rules_platform`	Apache-2.0	INTEGRATED	Platform abstraction descriptions.
`platforms`	Apache-2.0	INTEGRATED	Base platform configurations.
`bazel_features`	Apache-2.0	INTEGRATED	Version feature detection library.
`apple_support`	Apache-2.0	INTEGRATED	Apple compilation helper set.
`rules_jvm_external`	Apache-2.0	INTEGRATED	Java dependency management.
cxxbridge_cmd	MIT/Apache	INTEGRATED	C++/Rust bridge code generator.

V. Hardware Vendor Acceleration Layer

`Qualcomm QAIRT`	Proprietary	INTEGRATED	Snapdragon NPU driver-level integration.
`MediaTek NeuroPilot`	Proprietary	INTEGRATED	Dimensity APU adaptation layer integration.
`Google Tensor SDK`	Proprietary	INTEGRATED	Pixel TPU hardware acceleration.
`WebGPU (Dawn/WGPU)`	BSD-3-Clause	INTEGRATED	Cross-platform GPU inference abstraction.
`XNNPACK`	Apache-2.0	INTEGRATED	High-performance CPU inference operator acceleration.
`CUDA / CUDNN / NCCL`	NVIDIA	INTEGRATED	Desktop GPU deep computing acceleration.

API Symbol Index

STRUCT

LiteRtLm_Config

STRUCT

LiteRtLm_SamplingParams

LiteRtLm_CreateEngine

FUNCTION

LiteRtLm_DestroyEngine

AppendAssistantMessage

CORE

LiteRtLm_RunInference

FUNCTION

LiteRtLm_StopMessage

FUNCTION

LiteRtLm_WaitUntilDone

Full Integration Blueprint

WinyunqDebug/main.cpp Guide

VRAM Topology

graph TD subgraph GPU[GPU VRAM - Hot Zone] W[Model Weights] AC[Active KV Cache] end subgraph RAM[System RAM - Cold Zone] IC[Inactive Agent Caches] end AC --- RAM