AiConfig
Configuration class for the local LLM (llama.cpp via llama-cpp-java).
All parameters are preconfigured for near-deterministic behaviour, low randomness, and minimal hallucination. This is essential for entity- and name-matching use cases, where predictable and reproducible outputs are required.
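The documented defaults can be mirrored in a plain Scala case class. The supertypes listed on this page (Product, Equals, Serializable) suggest that AiConfig is itself a case class, but its real constructor signature is not shown.

This is therefore a hypothetical sketch. The name AiConfigSketch, the field types (Double, Int, Vector[String]) and the helper defaultThreads are assumptions. Only the field names and default values come from the parameter list below.

```scala
// Hypothetical mirror of AiConfig: field names and defaults follow this page,
// but the types, parameter order and class name are assumptions.
final case class AiConfigSketch(
  agentSimilarityThresholdForHitToExplain: Double = 0.8,
  inferenceFrequencyPenalty: Double = 0.2,
  inferenceMaxTokens: Int = 256,
  inferenceMinP: Double = 0.1,
  inferencePresencePenalty: Double = 0.2,
  inferenceRepeatPenalty: Double = 1.5,
  inferenceStopList: Vector[String] = Vector.empty,
  inferenceTemperature: Double = 0.6,
  inferenceTopK: Int = 5,
  inferenceTopP: Double = 1.0,  // 1.0 disables nucleus sampling
  modelBatchSize: Int = 32,
  modelContextSize: Int = 1024,
  modelGpuLayers: Int = 0,      // 0 runs entirely on the CPU
  modelThreads: Int = AiConfigSketch.defaultThreads
)

object AiConfigSketch {
  // Available processors minus two, but never fewer than one thread.
  def defaultThreads: Int =
    math.max(1, Runtime.getRuntime().availableProcessors() - 2)
}
```

Because a case class provides copy, a more conservative profile can be derived without restating every field. For example, AiConfigSketch().copy(inferenceTemperature = 0.2) lowers the temperature and keeps all other defaults.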
Value parameters
- agentSimilarityThresholdForHitToExplain: Similarity threshold used by the match-explanation logic. Determines whether two entities are considered similar enough to warrant a detailed explanation or justification. Default: 0.8
- inferenceFrequencyPenalty: Penalizes each token in proportion to how often it has already appeared in the output, reducing repetitive patterns. Similar to OpenAI's frequency penalty. Default: 0.2
- inferenceMaxTokens: Maximum number of tokens the model may generate for a single inference request. Protects against runaway generation. Default: 256
- inferenceMinP: Minimum relative probability for token sampling. Tokens whose probability is below this fraction of the most likely token's probability are discarded. This prevents sampling from extremely unlikely tokens and improves determinism. Default: 0.1
- inferencePresencePenalty: Applies a fixed penalty to every token that has already appeared in the output, regardless of how often. This encourages new tokens and increases topic diversity. Similar to OpenAI's presence penalty. Default: 0.2
- inferenceRepeatPenalty: Penalty factor applied to recently generated tokens to reduce repetitive output. A value of 1.0 disables the penalty; values above 1.0 discourage repetition, and higher values discourage it more strongly. Default: 1.5
- inferenceStopList: Array of stop strings. If the model generates any of these strings, inference stops immediately. Default: empty array
- inferenceTemperature: Sampling temperature controlling randomness. Lower values produce more deterministic output; higher values allow more varied, creative output. Recommended range: 0.2 to 0.8. Default: 0.6
- inferenceTopK: Limits sampling to the K highest-probability tokens. Lower values reduce randomness and can help avoid hallucinations. Default: 5
- inferenceTopP: Nucleus sampling threshold. The model samples only from the smallest set of highest-probability tokens whose cumulative probability reaches top_p. Combines well with top_k. Default: 1.0 (disabled)
- modelBatchSize: Number of tokens processed per batch during inference. Higher values may improve throughput but also increase memory consumption. Default: 32
- modelContextSize: Size of the model's context window in tokens. This is the maximum number of tokens (prompt plus generated output) the model can take into account during inference. Default: 1024
- modelGpuLayers: Number of model layers to offload to the GPU. Requires llama.cpp compiled with GPU support and speeds up inference when a GPU is available. Use 0 to run entirely on the CPU. Default: 0
- modelThreads: Number of CPU threads used for inference. Defaults to the number of processors available to the JVM minus two, which keeps the system responsive. Default: Runtime.getRuntime.availableProcessors - 2 (minimum 1)
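The sketch below shows how the penalty and sampling parameters interact during one decoding step over a small logit table. It is a minimal illustration, not llama.cpp's implementation. Several parts are assumptions:

- The names Cand, SamplingSketch and step are hypothetical.
- The defaults are hard-coded from the list above.
- The order penalties, top-k, top-p, min-p, temperature is an assumption; an engine may apply these steps in another order or with different details.

The penalty arithmetic follows the convention used by llama.cpp and by OpenAI's penalty formula:

- The repeat penalty divides positive logits by the factor and multiplies negative ones.
- A logit then loses count × frequency penalty, plus the presence penalty once if the token occurred at all.

```scala
import scala.util.Random

// One candidate token with its (possibly adjusted) logit.
final case class Cand(token: Int, logit: Double)

object SamplingSketch {
  // Repeat, frequency and presence penalties based on the recent output window.
  def applyPenalties(cands: Vector[Cand], recent: Seq[Int],
                     repeat: Double, freq: Double, presence: Double): Vector[Cand] = {
    val counts = recent.groupBy(identity).view.mapValues(_.size).toMap
    cands.map { c =>
      counts.get(c.token) match {
        case None => c // token not seen recently: untouched
        case Some(n) =>
          // Repeat penalty: shrink positive logits, push negative ones further down.
          val rep = if (c.logit <= 0) c.logit * repeat else c.logit / repeat
          // Frequency penalty scales with the count; presence penalty is flat.
          c.copy(logit = rep - n * freq - presence)
      }
    }
  }

  // Softmax over the candidates, returned sorted by descending probability.
  def softmax(cands: Vector[Cand]): Vector[(Cand, Double)] = {
    require(cands.nonEmpty, "no candidates left")
    val sorted = cands.sortBy(-_.logit)
    val maxLogit = sorted.head.logit
    val exps = sorted.map(c => math.exp(c.logit - maxLogit)) // numerically stable
    val sum = exps.sum
    sorted.zip(exps.map(_ / sum))
  }

  // Keep the k highest logits; k <= 0 disables the filter.
  def topK(cands: Vector[Cand], k: Int): Vector[Cand] =
    if (k <= 0) cands else cands.sortBy(-_.logit).take(k)

  // Keep the smallest high-probability prefix whose cumulative mass reaches p.
  def topP(cands: Vector[Cand], p: Double): Vector[Cand] =
    if (p >= 1.0) cands // 1.0 disables nucleus sampling
    else {
      val probs = softmax(cands)
      val cum = probs.map(_._2).scanLeft(0.0)(_ + _).tail
      val keep = cum.indexWhere(_ >= p) + 1
      probs.take(if (keep == 0) probs.size else keep).map(_._1)
    }

  // Drop tokens whose probability is below m times that of the most likely token.
  def minP(cands: Vector[Cand], m: Double): Vector[Cand] =
    if (m <= 0.0) cands
    else {
      val probs = softmax(cands)
      val threshold = m * probs.head._2
      probs.filter(_._2 >= threshold).map(_._1)
    }

  // Divide logits by the temperature; lower values sharpen the distribution.
  def temperature(cands: Vector[Cand], t: Double): Vector[Cand] = {
    require(t > 0.0, "temperature must be positive in this sketch")
    cands.map(c => c.copy(logit = c.logit / t))
  }

  // Draw one token from the softmax distribution of the remaining candidates.
  def sample(cands: Vector[Cand], rng: Random): Int = {
    val probs = softmax(cands)
    val r = rng.nextDouble()
    val cum = probs.map(_._2).scanLeft(0.0)(_ + _).tail
    val idx = cum.indexWhere(r < _)
    probs(if (idx < 0) probs.size - 1 else idx)._1.token
  }

  // One decoding step using the documented defaults.
  def step(logits: Vector[Double], recent: Seq[Int], rng: Random): Int = {
    val cands = logits.zipWithIndex.map { case (l, i) => Cand(i, l) }
    val penalised = applyPenalties(cands, recent, repeat = 1.5, freq = 0.2, presence = 0.2)
    val filtered = minP(topP(topK(penalised, 5), 1.0), 0.1)
    sample(temperature(filtered, 0.6), rng)
  }
}
```

With logits (2.0, 1.0, 0.5, -1.0, -3.0, -5.0) and no recent output, top-k = 5 keeps five tokens and top-p = 1.0 keeps them all. Min-p = 0.1 then keeps only tokens 0, 1 and 2, because the others fall below one tenth of the top token's probability. This shows why min-p and top-k, rather than temperature alone, do most of the work of suppressing unlikely tokens under these defaults.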
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any