esc.configuration

Members list

Type members

Classlikes

case class AiConfig(modelContextSize: Int = ..., modelBatchSize: Int = ..., modelGpuLayers: Int = ..., modelThreads: Int = ..., inferenceTemperature: Float = ..., inferenceTopK: Int = ..., inferenceTopP: Float = ..., inferenceMinP: Float = ..., inferenceRepeatPenalty: Float = ..., inferenceMaxTokens: Int = ..., inferencePresencePenalty: Float = ..., inferenceFrequencyPenalty: Float = ..., inferenceStopList: Array[String] = ..., agentSimilarityThresholdForHitToExplain: Double = ...)

Configuration class for the local LLM (llama.cpp via llama-cpp-java).

All parameters are preconfigured for deterministic behaviour, low randomness, and minimal hallucinations. This is essential for entity and name-matching use cases, where predictable and reproducible outputs are required.

Value parameters

agentSimilarityThresholdForHitToExplain: Similarity threshold used by match-explanation logic. Determines whether two entities are considered sufficiently similar for detailed explanation or justification. Default: 0.8
inferenceFrequencyPenalty: Penalizes tokens proportionally to how often they appear in the output, reducing repetitive patterns. Similar to OpenAI's frequency penalty. Default: 0.2
inferenceMaxTokens: Maximum number of tokens the model may generate for a single inference request. Protects against runaway generation. Default: 256
inferenceMinP: Minimum probability threshold for token sampling. Prevents sampling from extremely low-probability tokens, improving determinism. Default: 0.1
inferencePresencePenalty: Penalizes tokens that already appear in the output to increase topic diversity. Similar to OpenAI's presence penalty. Default: 0.2
inferenceRepeatPenalty: Penalty factor applied to recently used tokens to reduce repetitive outputs. Values slightly above 1.0 discourage repetition. Default: 1.5
inferenceStopList: Array of strings marking stop conditions. If the model generates any of these strings, inference stops immediately. Default: empty array
inferenceTemperature: Sampling temperature controlling randomness. Lower values produce more deterministic outputs; higher values allow more creativity. Recommended range: 0.2 to 0.8. Default: 0.6
inferenceTopK: Limits sampling to the top K highest-probability tokens. Lower values reduce randomness and help avoid hallucinations. Default: 5
inferenceTopP: Nucleus sampling threshold. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Combines well with top_k. Default: 1.0 (disabled)
modelBatchSize: Number of tokens processed per batch during inference. Higher values may improve throughput but also increase memory consumption. Default: 32
modelContextSize: Size of the model’s context window in tokens. Determines how many tokens the model can keep in memory during inference. Default: 1024
modelGpuLayers: Number of model layers to offload onto the GPU. Requires llama.cpp compiled with GPU support. Improves inference speed if GPU is available. Use 0 to run entirely on CPU. Default: 0
modelThreads: Number of CPU threads used for inference. The default leaves two of the available CPU cores free so that the system stays responsive. Default: Runtime.availableProcessors - 2 (minimum 1)
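
The parameters above can be overridden selectively with named arguments. The following is a minimal sketch, not the library class: `AiConfigSketch` and `AiConfigExample` are hypothetical stand-ins that mirror a subset of the documented parameters and defaults so the example is self-contained. The way the minimum of one thread is enforced is an assumption based on the description of modelThreads.

```scala
// Hypothetical stand-in mirroring a subset of the documented AiConfig
// parameters and defaults. It is NOT the library class.
final case class AiConfigSketch(
  modelContextSize: Int = 1024,
  modelGpuLayers: Int = 0,
  // Assumed reading of "Runtime.availableProcessors - 2 (minimum 1)".
  modelThreads: Int = math.max(1, Runtime.getRuntime.availableProcessors() - 2),
  inferenceTemperature: Float = 0.6f,
  inferenceTopK: Int = 5,
  inferenceMaxTokens: Int = 256
)

object AiConfigExample {
  // Override only the values that differ from the defaults.
  val stricter: AiConfigSketch =
    AiConfigSketch(inferenceTemperature = 0.2f, inferenceMaxTokens = 128)

  // Derive a GPU variant from an existing configuration without mutating it.
  val onGpu: AiConfigSketch = stricter.copy(modelGpuLayers = 32)
}
```

Assuming the real AiConfig has the signature shown above, the same named-argument and `copy` syntax should apply to it directly.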

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Sugar object (AiConfigFactory) for creating AiConfig when using Java. In Scala, "new AiConfig()" does exactly the same.

Attributes

Supertypes: class Object

trait Matchable

class Any
Self type: AiConfigFactory.type

case class SimilarityConfig(normOrgLegalformWeight: Double = ..., normOrgCountryWeight: Double = ..., nameElementSimilarityForHit: Double = ..., matchSelectionMode: Int = ..., checkDateForSearchHit: Boolean = ..., dateComparisonMethod: Int = ..., maxDateYearDifferenceForHit: Int = ..., checkCountryForSearchHit: Boolean = ..., similarityValueForSearchHit: Double = ..., numberOfHitsForSearchHit: Int = ..., maxNumberOfCandidatesFromSearch: Int = ..., searchEntityGroupMode: Int = ..., allowOneLetterAbbreviation: Boolean = ..., oneLetterAbbreviationWeight: Double = ..., checkCountyForAdressSearch: Boolean = ..., numberOfHitsForAddressSearchHit: Int = ..., fuzzyScoreForAddressSearch: Double = ...)

Configuration class for normalization and similarity matching. Important: use the same configuration for indexing and for searching/comparing; otherwise there may be unwanted side effects.

Value parameters

dateComparisonMethod: Specifies which date parts are compared. Currently only 0 = year is supported. Default is 0.
allowOneLetterAbbreviation: Defines whether one-letter abbreviations are taken into account. With true, for example, B counts as a hit for Benjamin. Default is false.
checkCountryForSearchHit: Defines whether the country should be considered or not. Default is true.
checkCountyForAdressSearch: Defines whether the country should be considered or not in address search. Countries overrule stop or hit words. Default is true.
checkDateForSearchHit: Defines whether the date should be taken into account or not. Default is true.
fuzzyScoreForAddressSearch: Fuzziness score used to identify individual elements of an address as hits. Value between 0.1 and 1. Default is 0.8.
matchSelectionMode: Determines how a match is selected: 0 = based on similarity, 1 = based on nofHits (number of hits). Default is 0.
maxDateYearDifferenceForHit: Tolerance for the year comparison, in number of years (+/-). Default is 2.
maxNumberOfCandidatesFromSearch: Defines the maximum number of candidates to be considered by the IR search, from which hits are then determined. Default is 10000.
nameElementSimilarityForHit: Minimum similarity to mark as hit. Default is 0.9.
normOrgCountryWeight: Weight (reduction) of a country match (recommended: < 1, default is 0.5).
normOrgLegalformWeight: Weight (reduction) of a legal form match (recommended: < 1, default is 0.25).
numberOfHitsForAddressSearchHit: Minimum number of elements that must be found for the address to be considered a hit. Default is 2.
numberOfHitsForSearchHit: Minimum nofHits (number of hits) for the comparison to be classified as a hit. Default is 2.
oneLetterAbbreviationWeight: If abbreviations are taken into account, this value defines the weight (reduction) of such a hit. Default is 0.5.
searchEntityGroupMode: Defines the field by which the hits are grouped. Choose according to which value is unique: 0 = externalId, 1 = Id. Default is 0.
similarityValueForSearchHit: Minimum similarity for the comparison to be classified as a hit. Default is 0.9.
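
To make the interaction of matchSelectionMode, the two hit thresholds, and the year tolerance concrete, here is a small sketch of one possible reading of the parameter descriptions. `SimilarityConfigSketch`, `yearsCompatible`, and `isHit` are hypothetical names that are not part of the library, and treating the thresholds as inclusive is an assumption. The library's actual matching logic may differ.

```scala
// Hypothetical stand-in with a subset of the documented SimilarityConfig
// parameters and defaults. It is NOT the library class.
final case class SimilarityConfigSketch(
  matchSelectionMode: Int = 0,
  checkDateForSearchHit: Boolean = true,
  maxDateYearDifferenceForHit: Int = 2,
  similarityValueForSearchHit: Double = 0.9,
  numberOfHitsForSearchHit: Int = 2
)

object SimilarityConfigExample {
  // Assumed semantics: years match when they differ by at most the tolerance,
  // and the check is skipped entirely when checkDateForSearchHit is false.
  def yearsCompatible(cfg: SimilarityConfigSketch, queryYear: Int, candidateYear: Int): Boolean =
    !cfg.checkDateForSearchHit ||
      math.abs(queryYear - candidateYear) <= cfg.maxDateYearDifferenceForHit

  // Assumed semantics: mode 1 decides by number of hits, mode 0 by similarity,
  // with both thresholds treated as inclusive lower bounds.
  def isHit(cfg: SimilarityConfigSketch, similarity: Double, nofHits: Int): Boolean =
    cfg.matchSelectionMode match {
      case 1 => nofHits >= cfg.numberOfHitsForSearchHit
      case _ => similarity >= cfg.similarityValueForSearchHit
    }
}
```

Whatever the exact semantics, creating one configuration value and passing the same instance to both indexing and searching is a simple way to follow the recommendation that both steps use identical settings.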

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Sugar object (SimilarityConfigFactory) for creating SimilarityConfig when using Java. In Scala, "new SimilarityConfig()" does exactly the same.

Attributes

Supertypes: class Object

trait Matchable

class Any
Self type: SimilarityConfigFactory.type
