high_level module
- corrdim.high_level.measure_text(text, model, tokenizer=None, truncation_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, forward_chunk_size=None, **model_kwargs)
- Parameters:
text (str)
model (str | LanguageModelWrapper)
tokenizer (object | None)
truncation_tokens (int | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
correlation_integral_range (Tuple[float, float] | None)
epsilon_range (Tuple[float, float] | None)
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
forward_chunk_size (int | None)
- Return type:
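A minimal usage sketch for measure_text(), using only keyword names from the signature above. The model string "gpt2", the truncation_tokens value of 2048, and the helper name measure_one are illustrative assumptions; which model strings are accepted is not stated here. The import is deferred so the helper can be defined without loading the library.

```python
# Keyword arguments taken from the documented signature; the values are
# illustrative (num_epsilon and block_size repeat the documented defaults).
MEASURE_KWARGS = dict(
    truncation_tokens=2048,  # assumed cap on tokens, chosen for illustration
    num_epsilon=1024,
    block_size=512,
    show_progress=True,
)


def measure_one(text, model="gpt2", **overrides):
    """Call measure_text() on a single string (hypothetical helper).

    "gpt2" is an assumed model identifier; pass whatever string or
    LanguageModelWrapper your installation accepts.
    """
    # Deferred import: corrdim is only needed when the helper is called.
    from corrdim.high_level import measure_text

    kwargs = {**MEASURE_KWARGS, **overrides}
    return measure_text(text, model, **kwargs)
```

Calling measure_one("some long text ...") would forward the text and model positionally and the remaining settings by keyword, so any documented parameter can be overridden per call.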
- corrdim.high_level.measure_text_progressive(text, model, tokenizer=None, truncation_tokens=None, skip_prefix_tokens=100, measure_every_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, forward_chunk_size=None, **model_kwargs)
Compute progressive curves once, then fit correlation dimension at sampled prefixes.
For each index i in range(skip_prefix_tokens, sequence_length, step), uses row corrints_progressive[i] with the shared epsilons grid. Results are in by_prefix (i → DimensionResult). If measure_every_tokens is None, step is chosen from sequence_length: < 100 → 1, < 1000 → 10, otherwise 100. Other arguments follow measure_text() / progressive_curve_from_text().
- Parameters:
text (str)
model (str | LanguageModelWrapper)
tokenizer (object | None)
truncation_tokens (int | None)
skip_prefix_tokens (int)
measure_every_tokens (int | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
correlation_integral_range (Tuple[float, float] | None)
epsilon_range (Tuple[float, float] | None)
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
forward_chunk_size (int | None)
- Return type:
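The prefix sampling rule described for measure_text_progressive() can be sketched in plain Python. This is a reconstruction from the description above, not the library's code; the helper names default_step and sampled_prefixes are hypothetical.

```python
def default_step(sequence_length: int) -> int:
    """Step used when measure_every_tokens is None, per the rule above."""
    if sequence_length < 100:
        return 1
    if sequence_length < 1000:
        return 10
    return 100


def sampled_prefixes(sequence_length, skip_prefix_tokens=100,
                     measure_every_tokens=None):
    """Indices i at which a dimension would be fitted (hypothetical helper)."""
    if measure_every_tokens is None:
        step = default_step(sequence_length)
    else:
        step = measure_every_tokens
    # Mirrors range(skip_prefix_tokens, sequence_length, step) from the docs.
    return list(range(skip_prefix_tokens, sequence_length, step))
```

Under this reading, a sequence shorter than skip_prefix_tokens yields no sampled prefixes at all, since the range is empty; with the default skip_prefix_tokens=100, the step of 1 for sequences under 100 tokens only matters when skip_prefix_tokens is lowered.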
- corrdim.high_level.measure_texts(texts, model, tokenizer=None, truncation_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)
- Parameters:
texts (list[str])
model (str | LanguageModelWrapper)
tokenizer (object | None)
truncation_tokens (int | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
correlation_integral_range (Tuple[float, float] | None)
epsilon_range (Tuple[float, float] | None)
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
batch_size (int | None)
forward_chunk_size (int | None)
- Return type:
list[DimensionResult]
- corrdim.high_level.measure_texts_progressive(texts, model, tokenizer=None, truncation_tokens=None, skip_prefix_tokens=100, measure_every_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)
Like measure_text_progressive() for several strings; batches log-probability extraction when supported.
- Parameters:
texts (list[str])
model (str | LanguageModelWrapper)
tokenizer (object | None)
truncation_tokens (int | None)
skip_prefix_tokens (int)
measure_every_tokens (int | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
correlation_integral_range (Tuple[float, float] | None)
epsilon_range (Tuple[float, float] | None)
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
batch_size (int | None)
forward_chunk_size (int | None)
- Return type: