Architecture Overview
litert_lm_wrapper.dll integrates more than 70 third-party and custom components. This page lists every component involved in compilation and linking, together with its license, integration status, and role.
I. Core Inference Framework
| Component | License | Status | Responsibilities |
|---|---|---|---|
| LiteRT (Google) | Apache-2.0 | PATCHED | Core inference kernel; refactored include paths and fixed MSVC symbol conflicts. |
| TensorFlow Core | Apache-2.0 | PATCHED | Base operator library and multi-thread scheduling optimization. |
| rules_ml_toolchain | Apache-2.0 | INTEGRATED | Keeps the GPU (CUDA/cuDNN) compilation environment hermetic. |
| bazel-toolchains | Apache-2.0 | INTEGRATED | Cross-platform toolchain configuration set. |
II. Logic Constraints & Integration Matrix
| Component | License | Status | Responsibilities |
|---|---|---|---|
| llguidance | MIT | 5x PATCHED | Microsoft constraint framework. Tuned sub-modules include RegexVec, Grammar, and Parser. |
| toktrie | MIT | PATCHED | Token trie for efficient token lookup; supports complex grammar validation. |
| antlr4rust | BSD-3-Clause | INTEGRATED | Rust grammar parser core, supporting structured output parsing. |
| antlr_fc_tool_call_parser | Custom | INTEGRATED | Function Call tool call parser (parser A). |
| antlr_python_tool_call_parser | Custom | INTEGRATED | Python-style tool call parser (parser B). |
| json_parser | Custom | INTEGRATED | Dedicated JSON structure validity checker. |
| python_parser | Custom | INTEGRATED | Python code block semantic parsing component. |
| fc_parser | Custom | INTEGRATED | Generic tool call protocol conversion parser. |
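
To make the role of a token trie concrete, the following is a minimal sketch of a byte-level trie that returns every token whose bytes are a prefix of a given string. It is an illustration only: `TokenTrie`, `Insert`, and `PrefixMatches` are hypothetical names, and nothing here reflects toktrie's actual data layout or API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical byte-level token trie; not toktrie's actual implementation.
class TokenTrie {
 public:
  // Registers a token's byte string under the given token id.
  void Insert(std::string_view bytes, std::uint32_t token_id) {
    Node* node = &root_;
    for (unsigned char b : bytes) {
      std::unique_ptr<Node>& child = node->children[b];
      if (!child) child = std::make_unique<Node>();
      node = child.get();
    }
    node->token_id = static_cast<std::int64_t>(token_id);
  }

  // Returns the ids of all tokens whose bytes are a prefix of `text`,
  // shortest token first. The walk stops at the first byte with no child.
  std::vector<std::uint32_t> PrefixMatches(std::string_view text) const {
    std::vector<std::uint32_t> ids;
    const Node* node = &root_;
    for (unsigned char b : text) {
      auto it = node->children.find(b);
      if (it == node->children.end()) break;
      node = it->second.get();
      if (node->token_id >= 0) {
        ids.push_back(static_cast<std::uint32_t>(node->token_id));
      }
    }
    return ids;
  }

 private:
  struct Node {
    std::map<unsigned char, std::unique_ptr<Node>> children;
    std::int64_t token_id = -1;  // -1 means no token ends at this node.
  };
  Node root_;
};
```

A constrained decoder could use a walk like this to find which vocabulary entries are consistent with the bytes a grammar still allows, visiting each shared prefix once instead of comparing every token separately.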
III. Tokenization & Template Rendering
| Component | License | Status | Responsibilities |
|---|---|---|---|
| sentencepiece | Apache-2.0 | REFACTORED | Tokenizer source refactored; severed full Abseil dependency for Unreal compatibility. |
| Minja (C++) | Apache-2.0 | PATCHED | Fixed parser crash when handling extremely large JSON payloads. |
| minijinja (Rust) | Apache-2.0 | INTEGRATED | High-performance Rust template engine. |
| tokenizers-cpp | Apache-2.0 | INTEGRATED | HuggingFace protocol C++ bridge. |
| tokenizers (Rust) | Apache-2.0 | INTEGRATED | HuggingFace tokenization core Rust runtime. |
| nanobind_json | MIT | PATCHED | JSON binding layer adaptation patch. |
IV. Build Rules & Toolchain Support
| Component | License | Status | Responsibilities |
|---|---|---|---|
| rules_rust | Apache-2.0 | PATCHED | Manages Rust toolchain; handles symbol visibility on Windows. |
| rules_python | Apache-2.0 | INTEGRATED | Hermetic Python build environment. |
| rules_apple | Apache-2.0 | INTEGRATED | iOS platform build support. |
| rules_swift | Apache-2.0 | INTEGRATED | Swift library integration. |
| rules_kotlin | Apache-2.0 | INTEGRATED | Android Kotlin support. |
| rules_shell | Apache-2.0 | INTEGRATED | Cross-platform script execution rules. |
| rules_platform | Apache-2.0 | INTEGRATED | Platform abstraction descriptions. |
| platforms | Apache-2.0 | INTEGRATED | Base platform configurations. |
| bazel_features | Apache-2.0 | INTEGRATED | Version feature detection library. |
| apple_support | Apache-2.0 | INTEGRATED | Apple compilation helper set. |
| rules_jvm_external | Apache-2.0 | INTEGRATED | Java dependency management. |
| cxxbridge_cmd | MIT OR Apache-2.0 | INTEGRATED | C++/Rust bridge code generator. |
V. Hardware Vendor Acceleration Layer
| Component | License | Status | Responsibilities |
|---|---|---|---|
| Qualcomm QAIRT | Proprietary | INTEGRATED | Snapdragon NPU driver-level integration. |
| MediaTek NeuroPilot | Proprietary | INTEGRATED | Dimensity APU adaptation layer integration. |
| Google Tensor SDK | Proprietary | INTEGRATED | Pixel TPU hardware acceleration. |
| WebGPU (Dawn/WGPU) | BSD-3-Clause | INTEGRATED | Cross-platform GPU inference abstraction. |
| XNNPACK | Apache-2.0 | INTEGRATED | High-performance CPU inference operator acceleration. |
| CUDA / cuDNN / NCCL | NVIDIA | INTEGRATED | Desktop GPU compute acceleration. |
API Symbol Index
| Kind | Symbol |
|---|---|
| STRUCT | LiteRtLm_Config |
| STRUCT | LiteRtLm_SamplingParams |
| STRUCT | LiteRtLm_Result |
| TYPEDEF | LiteRtLmCallback |
| FUNCTION | GetAvailableBackends |
| FUNCTION | LiteRtLm_CreateEngine |
| FUNCTION | LiteRtLm_DestroyEngine |
| FUNCTION | CreateConversation |
| FUNCTION | CreateWithConfig |
| FUNCTION | DestroyConversation |
| FUNCTION | AppendUserMessage |
| FUNCTION | AppendMessageJson |
| FUNCTION | AppendAssistantMessage |
| CORE | LiteRtLm_RunInference |
| FUNCTION | LiteRtLm_StopMessage |
| FUNCTION | LiteRtLm_WaitUntilDone |
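
One practical use of this index is a load-time check that a given build of the DLL exposes every listed function. The sketch below assumes the functions are exported under exactly the names shown in the index, which this page does not confirm. `IndexedFunctions`, `SymbolResolver`, and `MissingSymbols` are hypothetical helper names, and no function signatures are assumed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Function names copied from the API Symbol Index. Whether each one is
// exported under exactly this spelling is an assumption.
inline const std::vector<std::string>& IndexedFunctions() {
  static const std::vector<std::string> names = {
      "GetAvailableBackends",   "LiteRtLm_CreateEngine",
      "LiteRtLm_DestroyEngine", "CreateConversation",
      "CreateWithConfig",       "DestroyConversation",
      "AppendUserMessage",      "AppendMessageJson",
      "AppendAssistantMessage", "LiteRtLm_RunInference",
      "LiteRtLm_StopMessage",   "LiteRtLm_WaitUntilDone",
  };
  return names;
}

// Maps a symbol name to its address, or nullptr when it cannot be resolved.
using SymbolResolver = std::function<void*(const std::string&)>;

// Returns every indexed function name that the resolver could not find.
std::vector<std::string> MissingSymbols(const SymbolResolver& resolve) {
  std::vector<std::string> missing;
  for (const std::string& name : IndexedFunctions()) {
    if (resolve(name) == nullptr) missing.push_back(name);
  }
  return missing;
}
```

On Windows the resolver would typically wrap `GetProcAddress` on a module handle obtained from `LoadLibraryA("litert_lm_wrapper.dll")`. Keeping the resolver as a parameter lets the check run against a stub in tests, without the DLL present.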
VRAM Topology
graph TD
subgraph GPU[GPU VRAM - Hot Zone]
W[Model Weights]
AC[Active KV Cache]
end
subgraph RAM[System RAM - Cold Zone]
IC[Inactive Agent Caches]
end
AC --- RAM
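
The diagram can be read as a two-zone policy: model weights and the active KV cache stay in GPU VRAM, while caches of inactive agents wait in system RAM. The sketch below models only the cache bookkeeping, and it treats the link between the active cache and RAM as a swap path, which is an interpretation of the diagram. `KvCache` and `CacheZones` are hypothetical names. Plain host memory stands in for both zones, and no GPU API is called.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one agent's KV cache contents.
struct KvCache {
  std::vector<float> data;
};

// At most one active cache lives in the hot zone; every inactive agent's
// cache is parked in the cold zone until that agent is activated again.
class CacheZones {
 public:
  // Makes `agent` active, moving the previously active cache to the cold zone.
  void Activate(const std::string& agent) {
    if (active_agent_.has_value() && *active_agent_ == agent) return;
    if (active_agent_.has_value()) {
      cold_[*active_agent_] = std::move(hot_);  // hot -> cold
    }
    auto it = cold_.find(agent);
    if (it != cold_.end()) {
      hot_ = std::move(it->second);  // cold -> hot
      cold_.erase(it);
    } else {
      hot_ = KvCache{};  // First activation starts with an empty cache.
    }
    active_agent_ = agent;
  }

  // Returns the cache currently held in the hot zone.
  KvCache& Active() {
    assert(active_agent_.has_value());
    return hot_;
  }

  // Reports whether the agent's cache is currently parked in the cold zone.
  bool IsCold(const std::string& agent) const {
    return cold_.count(agent) > 0;
  }

 private:
  std::optional<std::string> active_agent_;
  KvCache hot_;
  std::unordered_map<std::string, KvCache> cold_;
};
```

In a real build the two moves would be device-to-host and host-to-device copies. The bookkeeping shown here only decides which cache occupies the hot zone at any moment.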