
LiteRtLm_SamplingParams

Sampling and constraint parameter structure. Controls the creativity of text generation (Temperature/Top-P) and hard constraints on output format (JSON/Regex).

01. Role Context in Generation Loop

graph TD
    Params[LiteRtLm_SamplingParams] -->|Input| Run[LiteRtLm_RunInference]
    subgraph Inference_Loop [Kernel Generation Loop]
        Logits[Raw Logits] --> Sampling[Sampling Logic: Temp/Top-P]
        Sampling --> Constraint[Constraint Correction: LLGuidance]
        Constraint -->|Next Token| Result[Append to Result]
    end
    Params -.->|Controls| Sampling
    Params -.->|Provides Grammar/Regex| Constraint

LiteRtLm_SamplingParams determines the "personality" of the AI and, through the llguidance integration layer, also forces the output to conform to a specific schema (such as JSON).

02. Member Variable Details

Member Type Description & Constraints
temperature float Sampling Temperature. Range [0.0, 2.0].
0.0 selects greedy decoding (always the highest-probability token) for deterministic results; 1.0 samples from the model's unmodified distribution.
top_p float Nucleus Sampling. Range [0.0, 1.0].
Samples only from the smallest set of highest-probability candidates whose cumulative probability reaches top_p. A typical value is 0.9.
max_tokens int Generation Limit.
Generation stops after this many tokens, whether or not an end-of-sequence (EOS) token has been generated.
constraint_type int Hard Constraint Type.
0: No constraint | 1: Regex | 2: JSON Schema | 3: Lark Grammar.
constraint_string const char* Constraint Description String.
Holds the Regex, JSON Schema, or Lark Grammar text selected by constraint_type.
Note: When constraint_type > 0, the time to first token may increase slightly because the llguidance grammar state machine must be initialized first.
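The ranges and constraint codes listed above can be checked before calling into the runtime. The following is a minimal sketch under stated assumptions: SamplingParamsSketch is a local stand-in that mirrors the documented members (the real definition comes from the library header and may differ), the CONSTRAINT_* names and sampling_params_check are hypothetical, and the requirement that max_tokens be positive is an assumption not stated in the table.

```c
#include <assert.h>
#include <stddef.h>

// Local stand-in mirroring the documented members. The real
// LiteRtLm_SamplingParams layout may differ from this sketch.
typedef struct {
    float temperature;
    float top_p;
    int max_tokens;
    int constraint_type;
    const char *constraint_string;
} SamplingParamsSketch;

// Hypothetical names for the documented constraint_type values.
enum {
    CONSTRAINT_NONE = 0,
    CONSTRAINT_REGEX = 1,
    CONSTRAINT_JSON_SCHEMA = 2,
    CONSTRAINT_LARK = 3
};

// Returns NULL when the parameters look valid, otherwise a short
// description of the first problem found.
const char *sampling_params_check(const SamplingParamsSketch *p)
{
    assert(p != NULL);
    // Negated range tests also reject NaN values.
    if (!(p->temperature >= 0.0f && p->temperature <= 2.0f))
        return "temperature outside [0.0, 2.0]";
    if (!(p->top_p >= 0.0f && p->top_p <= 1.0f))
        return "top_p outside [0.0, 1.0]";
    // Assumption: a non-positive generation limit is treated as an error.
    if (p->max_tokens <= 0)
        return "max_tokens must be positive";
    if (p->constraint_type < CONSTRAINT_NONE ||
        p->constraint_type > CONSTRAINT_LARK)
        return "unknown constraint_type";
    if (p->constraint_type != CONSTRAINT_NONE &&
        (p->constraint_string == NULL || p->constraint_string[0] == '\0'))
        return "constraint_string required when constraint_type > 0";
    return NULL;
}
```

A caller could run this check once when the parameters are built and log the returned message, so that a missing schema string is caught before inference starts rather than inside the generation loop.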

03. C Language Usage Example (JSON Constraint)

LiteRtLm_SamplingParams params = {0};
params.temperature = 0.0f; // 0 temperature is recommended for structured output
params.max_tokens = 512;

// Enable hard JSON Schema constraint
params.constraint_type = 2; 
params.constraint_string = "{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}";

// Apply to inference
LiteRtLm_RunInference(conversation, params, MyCallback, NULL);