AiConfig

esc.configuration.AiConfig
case class AiConfig(modelContextSize: Int = ..., modelBatchSize: Int = ..., modelGpuLayers: Int = ..., modelThreads: Int = ..., inferenceTemperature: Float = ..., inferenceTopK: Int = ..., inferenceTopP: Float = ..., inferenceMinP: Float = ..., inferenceRepeatPenalty: Float = ..., inferenceMaxTokens: Int = ..., inferencePresencePenalty: Float = ..., inferenceFrequencyPenalty: Float = ..., inferenceStopList: Array[String] = ..., agentSimilarityThresholdForHitToExplain: Double = ...)

Configuration class for the local LLM (llama.cpp via llama-cpp-java).

The defaults favour low randomness and reproducible output, and are intended to limit hallucinations. This matters for entity and name-matching use cases, which require predictable results.
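A minimal sketch of how such a configuration might be built and varied. The stand-in case class below only mirrors the signature and defaults documented on this page so that the snippet compiles on its own; with the library on the classpath, `import esc.configuration.AiConfig` would replace it. The override values (0.2, 128, 32) are illustrative, not recommendations.

```scala
// Stand-in mirroring the documented signature and defaults (an assumption
// made so this sketch is self-contained; not the library's source).
final case class AiConfig(
  modelContextSize: Int = 1024,
  modelBatchSize: Int = 32,
  modelGpuLayers: Int = 0,
  modelThreads: Int = math.max(1, Runtime.getRuntime.availableProcessors - 2),
  inferenceTemperature: Float = 0.6f,
  inferenceTopK: Int = 5,
  inferenceTopP: Float = 1.0f,
  inferenceMinP: Float = 0.1f,
  inferenceRepeatPenalty: Float = 1.5f,
  inferenceMaxTokens: Int = 256,
  inferencePresencePenalty: Float = 0.2f,
  inferenceFrequencyPenalty: Float = 0.2f,
  inferenceStopList: Array[String] = Array.empty[String],
  agentSimilarityThresholdForHitToExplain: Double = 0.8
)

object AiConfigExample {
  // All documented defaults.
  val defaults: AiConfig = AiConfig()

  // Override only what differs, using named arguments.
  val stricter: AiConfig =
    AiConfig(inferenceTemperature = 0.2f, inferenceMaxTokens = 128)

  // `copy` derives a variant from an existing configuration.
  val onGpu: AiConfig = stricter.copy(modelGpuLayers = 32)
}
```

Because `inferenceStopList` is an `Array`, two otherwise identical configurations compare equal only if they share the same array instance.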

Value parameters

agentSimilarityThresholdForHitToExplain

Similarity threshold used by match-explanation logic. Determines whether two entities are considered sufficiently similar for detailed explanation or justification. Default: 0.8

inferenceFrequencyPenalty

Penalizes tokens proportionally to how often they appear in the output, reducing repetitive patterns. Similar to OpenAI's frequency penalty. Default: 0.2

inferenceMaxTokens

Maximum number of tokens the model may generate for a single inference request. Protects against runaway generation. Default: 256

inferenceMinP

Minimum probability threshold for token sampling. Prevents sampling from extremely low-probability tokens, improving determinism. Default: 0.1
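To illustrate the effect of a minimum-probability cutoff, here is a small sketch. It assumes the value is handed to llama.cpp's min-p sampler, which scales the threshold by the probability of the most likely token; `MinPSketch` and its token map are hypothetical and not part of this API.

```scala
object MinPSketch {
  /** Keeps tokens whose probability is at least `minP` times the top probability. */
  def filter(probs: Map[String, Double], minP: Double): Map[String, Double] =
    if (probs.isEmpty) probs
    else {
      val cutoff = minP * probs.values.max // threshold relative to the best token
      probs.filter { case (_, p) => p >= cutoff }
    }
}
```

With probabilities 0.7, 0.25 and 0.05 and `minP = 0.1`, the cutoff is about 0.07, so only the first two tokens remain candidates.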

inferencePresencePenalty

Penalizes tokens that already appear in the output to increase topic diversity. Similar to OpenAI's presence penalty. Default: 0.2
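The difference between the frequency and presence penalties can be sketched as follows, assuming the OpenAI-style formulation these parameters are compared to: the frequency penalty grows with the number of occurrences, while the presence penalty is applied once as soon as a token has appeared. `PenaltySketch` is a hypothetical helper, not library code.

```scala
object PenaltySketch {
  /** Adjusts one token's logit given how often it was already generated. */
  def adjust(
      logit: Double,
      timesGenerated: Int,
      frequencyPenalty: Double,
      presencePenalty: Double
  ): Double = {
    // Presence is a one-off penalty; frequency scales with the count.
    val presence = if (timesGenerated > 0) presencePenalty else 0.0
    logit - timesGenerated * frequencyPenalty - presence
  }
}
```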

inferenceRepeatPenalty

Penalty factor applied to recently used tokens to reduce repetitive outputs. Values slightly above 1.0 discourage repetition. Default: 1.5
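A rough sketch of how a repeat penalty acts on logits, assuming the llama.cpp convention of dividing positive logits and multiplying negative ones so that a penalized token always becomes less likely. `RepeatPenaltySketch` and its inputs are hypothetical.

```scala
object RepeatPenaltySketch {
  /** Lowers the logits of recently used tokens; other tokens are unchanged. */
  def apply(
      logits: Map[String, Double],
      recent: Set[String],
      penalty: Double
  ): Map[String, Double] =
    logits.map { case (token, logit) =>
      if (!recent.contains(token)) token -> logit
      else if (logit > 0) token -> logit / penalty // shrink positive logits
      else token -> logit * penalty                // push negative logits lower
    }
}
```

A penalty of exactly 1.0 leaves every logit unchanged under this scheme.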

inferenceStopList

Array of strings marking stop conditions. If the model generates any of these strings, inference stops immediately. Default: empty array
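The stop condition can be pictured as cutting the output at the earliest stop string. The sketch below is an illustration only: whether the library excludes the stop string from the returned text is an assumption, and `StopListSketch` is a hypothetical helper.

```scala
object StopListSketch {
  /** Returns the text up to (not including) the earliest stop string, if any. */
  def truncateAtStop(text: String, stops: Seq[String]): String = {
    // Ignore empty stop strings, which would match at index 0.
    val hits = stops.filter(_.nonEmpty).map(s => text.indexOf(s)).filter(_ >= 0)
    if (hits.isEmpty) text else text.substring(0, hits.min)
  }
}
```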

inferenceTemperature

Sampling temperature controlling randomness. Lower values produce more deterministic outputs; higher values allow more creativity. Recommended: 0.2 – 0.8. Default: 0.6
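Temperature is conventionally applied by dividing the logits by it before the softmax. The sketch below shows that standard technique; `TemperatureSketch` is a hypothetical helper, not the library's implementation.

```scala
object TemperatureSketch {
  /** Softmax over logits scaled by 1 / temperature. */
  def softmax(logits: Seq[Double], temperature: Double): Seq[Double] = {
    require(logits.nonEmpty, "logits must not be empty")
    require(temperature > 0, "temperature must be positive")
    val scaled = logits.map(_ / temperature)
    val maxLogit = scaled.max // subtract the max for numerical stability
    val exps = scaled.map(x => math.exp(x - maxLogit))
    val total = exps.sum
    exps.map(_ / total)
  }
}
```

For the same logits, a lower temperature concentrates more probability on the top token, which is why lower values give more repeatable output.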

inferenceTopK

Limits sampling to the top K highest-probability tokens. Lower values reduce randomness and help avoid hallucinations. Default: 5

inferenceTopP

Nucleus sampling threshold. The model samples only from the smallest set of most probable tokens whose cumulative probability reaches top_p. Combines well with top_k. Default: 1.0 (disabled)
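The top-k and top-p filters can be sketched together as below. This shows the general technique, not the library's code; `TopKTopPSketch` and the token/probability pairs are hypothetical.

```scala
object TopKTopPSketch {
  /** Keeps the k most probable tokens. */
  def topK(probs: Seq[(String, Double)], k: Int): Seq[(String, Double)] =
    probs.sortBy { case (_, p) => -p }.take(k)

  /** Keeps the shortest high-probability prefix whose cumulative mass reaches topP. */
  def topP(probs: Seq[(String, Double)], topP: Double): Seq[(String, Double)] = {
    val sorted = probs.sortBy { case (_, p) => -p }
    // massBefore(i) is the probability mass of all tokens ranked above token i.
    val massBefore = sorted.scanLeft(0.0)((acc, t) => acc + t._2)
    sorted.zip(massBefore).takeWhile { case (_, before) => before < topP }.map(_._1)
  }
}
```

With `topP = 1.0` every token is kept, which matches the "disabled" default.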

modelBatchSize

Number of tokens processed per batch during inference. Higher values may improve throughput but also increase memory consumption. Default: 32

modelContextSize

Size of the model’s context window in tokens. Determines how many tokens the model can keep in memory during inference. Default: 1024
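Since prompt and generated tokens normally share one context window, a quick budget check can help when choosing these values. The sketch assumes that sharing applies here; `ContextBudgetSketch` is a hypothetical helper, and token counts would come from the tokenizer in use.

```scala
object ContextBudgetSketch {
  /** True if the prompt plus the generation limit fits into the context window. */
  def fits(promptTokens: Int, maxTokens: Int, contextSize: Int): Boolean =
    promptTokens + maxTokens <= contextSize

  /** Largest prompt that still leaves room for `maxTokens` generated tokens. */
  def maxPromptTokens(maxTokens: Int, contextSize: Int): Int =
    math.max(0, contextSize - maxTokens)
}
```

With the documented defaults (context 1024, max tokens 256), this leaves 768 tokens for the prompt.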

modelGpuLayers

Number of model layers to offload onto the GPU. Requires llama.cpp compiled with GPU support. Improves inference speed if GPU is available. Use 0 to run entirely on CPU. Default: 0

modelThreads

Number of CPU threads used for inference. Two cores are left free to keep the system responsive. Default: Runtime.getRuntime.availableProcessors - 2 (minimum 1)
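The documented default can be reproduced with a one-line calculation. `ThreadsSketch` is a hypothetical helper that restates the rule above; it is not taken from the library.

```scala
object ThreadsSketch {
  /** Available cores minus two, but never fewer than one thread. */
  def defaultThreads(availableCores: Int): Int =
    math.max(1, availableCores - 2)

  // Value for the machine this code runs on.
  val forThisMachine: Int =
    defaultThreads(Runtime.getRuntime.availableProcessors)
}
```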

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Members list

Value members

Inherited methods

def productElementNames: Iterator[String]

An iterator over the names of all the elements of this product.

Attributes

Inherited from:
Product

def productIterator: Iterator[Any]

An iterator over all the elements of this product.

Attributes

Returns

in the default implementation, an Iterator[Any]

Inherited from:
Product
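
The two inherited methods can be combined to list a configuration's fields with their values, for example when logging the effective settings. The sketch uses a small hypothetical case class, `SamplingSettings`, in place of AiConfig so that it compiles on its own; any case class, AiConfig included, should work the same way.

```scala
// Hypothetical stand-in; any case class is a Product.
final case class SamplingSettings(temperature: Float = 0.6f, topK: Int = 5)

object ProductSketch {
  /** Pairs each field name with its value, in declaration order. */
  def describe(p: Product): Seq[(String, Any)] =
    p.productElementNames.zip(p.productIterator).toSeq
}
```

Both iterators follow the order of the constructor parameters, so zipping them pairs each name with its own value.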