high_level module

corrdim.high_level.measure_text(text, model, tokenizer=None, truncation_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • text (str)

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • truncation_tokens (int | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • correlation_integral_range (Tuple[float, float] | None)

  • epsilon_range (Tuple[float, float] | None)

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • forward_chunk_size (int | None)

Return type:

DimensionResult

corrdim.high_level.measure_text_progressive(text, model, tokenizer=None, truncation_tokens=None, skip_prefix_tokens=100, measure_every_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, forward_chunk_size=None, **model_kwargs)

Compute progressive curves once, then fit correlation dimension at sampled prefixes.

For each index i in range(skip_prefix_tokens, sequence_length, step), uses row corrints_progressive[i] with the shared epsilons grid. Results are in by_prefix (i → DimensionResult). If measure_every_tokens is None, step is chosen from sequence_length: < 100 → 1, < 1000 → 10, otherwise 100. Other arguments follow measure_text() / progressive_curve_from_text().
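The prefix sampling rule described above can be sketched in plain Python. This is an illustration only, not the library's code: the helper names default_step and sampled_prefixes are hypothetical, and the thresholds assume the reading "< 100 → 1, < 1000 → 10, otherwise 100" of the default step rule.

```python
from typing import List, Optional


def default_step(sequence_length: int) -> int:
    """Hypothetical helper: step used when measure_every_tokens is None.

    Assumes the thresholds < 100 -> 1, < 1000 -> 10, otherwise 100.
    """
    if sequence_length < 100:
        return 1
    if sequence_length < 1000:
        return 10
    return 100


def sampled_prefixes(
    sequence_length: int,
    skip_prefix_tokens: int = 100,
    measure_every_tokens: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper: indices i at which a dimension would be fitted."""
    if measure_every_tokens is None:
        step = default_step(sequence_length)
    else:
        step = measure_every_tokens
    # Same iteration as range(skip_prefix_tokens, sequence_length, step).
    return list(range(skip_prefix_tokens, sequence_length, step))


if __name__ == "__main__":
    print(sampled_prefixes(130))
    print(sampled_prefixes(350, measure_every_tokens=100))
```

Under these assumptions, a sequence no longer than skip_prefix_tokens produces an empty range, so with the default skip_prefix_tokens=100 a short text would yield no sampled prefixes unless skip_prefix_tokens is lowered.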

Parameters:
  • text (str)

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • truncation_tokens (int | None)

  • skip_prefix_tokens (int)

  • measure_every_tokens (int | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • correlation_integral_range (Tuple[float, float] | None)

  • epsilon_range (Tuple[float, float] | None)

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • forward_chunk_size (int | None)

Return type:

ProgressiveDimensionResult

corrdim.high_level.measure_texts(texts, model, tokenizer=None, truncation_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • texts (list[str])

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • truncation_tokens (int | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • correlation_integral_range (Tuple[float, float] | None)

  • epsilon_range (Tuple[float, float] | None)

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • batch_size (int | None)

  • forward_chunk_size (int | None)

Return type:

list[DimensionResult]

corrdim.high_level.measure_texts_progressive(texts, model, tokenizer=None, truncation_tokens=None, skip_prefix_tokens=100, measure_every_tokens=None, context_length=None, dim_reduction=8192, stride=1, correlation_integral_range=None, epsilon_range=None, num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float16, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)

Like measure_text_progressive(), but for several strings; log-probability extraction is batched when supported.

Parameters:
  • texts (list[str])

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • truncation_tokens (int | None)

  • skip_prefix_tokens (int)

  • measure_every_tokens (int | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • correlation_integral_range (Tuple[float, float] | None)

  • epsilon_range (Tuple[float, float] | None)

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • batch_size (int | None)

  • forward_chunk_size (int | None)

Return type:

list[ProgressiveDimensionResult]