Sampling and constraint parameter structure. Controls the creativity of text generation (Temperature/Top-P) and hard constraints on output format (JSON/Regex).
SamplingParams not only shapes how varied or deterministic the model's output is; through the llguidance integration layer, it can also force the output to conform to a specific schema (such as a JSON Schema).
| Member | Type | Description & Constraints |
|---|---|---|
| temperature | float | Sampling Temperature. Range [0.0, 2.0]. 0.0 selects greedy decoding for deterministic results; 1.0 samples from the model's unscaled distribution. |
| top_p | float | Nucleus Sampling. Range [0.0, 1.0]. Samples only from the smallest set of most probable candidates whose cumulative probability reaches P. Usually set to 0.9. |
| max_tokens | int | Generation Limit. Generation is forcibly stopped after this many tokens, regardless of whether an end-of-sequence (EOS) token was generated. |
| constraint_type | int | Hard Constraint Type. 0: No constraint; 1: Regex; 2: JSON Schema; 3: Lark Grammar. |
| constraint_string | const char* | Constraint Description String. Carries the Regex, JSON Schema, or Lark grammar text that matches constraint_type. |

Note: When constraint_type > 0, first-token latency may increase slightly because the llguidance grammar state machine has to be initialized.
LiteRtLm_SamplingParams params = {0};
params.temperature = 0.0f; // 0 temperature is recommended for structured output
params.max_tokens = 512;
// Enable hard JSON Schema constraint
params.constraint_type = 2;
params.constraint_string = "{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}";
// Apply to inference
LiteRtLm_RunInference(conversation, params, MyCallback, NULL);
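Before handing parameters to the runtime, it may be useful to check them against the ranges listed in the table. The following is a sketch under stated assumptions: `SketchSamplingParams` is a hypothetical mirror of the documented fields (the real struct comes from the library header), `sketch_validate_params` is not a library function, and the rule that max_tokens must be positive is an assumption rather than something the table states.

```c
#include <stddef.h>

/* Hypothetical mirror of the documented fields, for illustration only. */
typedef struct {
    float temperature;
    float top_p;
    int max_tokens;
    int constraint_type;
    const char *constraint_string;
} SketchSamplingParams;

/* Returns 0 when every field is within its documented range, otherwise a
 * nonzero code identifying the first failing field (-1 for a NULL pointer). */
int sketch_validate_params(const SketchSamplingParams *p)
{
    if (p == NULL) return -1;
    /* Written as negated range checks so NaN is rejected too. */
    if (!(p->temperature >= 0.0f && p->temperature <= 2.0f)) return 1;
    if (!(p->top_p >= 0.0f && p->top_p <= 1.0f)) return 2;
    /* Assumption: a non-positive generation limit is treated as invalid. */
    if (p->max_tokens <= 0) return 3;
    /* 0: none, 1: Regex, 2: JSON Schema, 3: Lark Grammar. */
    if (p->constraint_type < 0 || p->constraint_type > 3) return 4;
    /* A hard constraint needs a non-empty description string. */
    if (p->constraint_type != 0 &&
        (p->constraint_string == NULL || p->constraint_string[0] == '\0')) return 5;
    return 0;
}
```

A return value of 0 means the parameters match the documented ranges; any other value points at the first field to fix before starting inference.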