low_level module

corrdim.low_level.clear_model_cache()
Return type:

None

corrdim.low_level.curve_from_text(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • text (str)

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • forward_chunk_size (int | None)

Return type:

CurveResult
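
A minimal usage sketch based only on the signature above. The model ID "gpt2" is a placeholder assumption, the helper name compute_curve is hypothetical, and the attributes of the returned CurveResult are not documented on this page, so the sketch simply returns the object unchanged.

```python
def compute_curve(text, model_name="gpt2"):
    """Hypothetical helper: compute a curve for a single text.

    Assumes `corrdim` is installed and that `model_name` is a model ID the
    library can load; "gpt2" is only a placeholder.
    """
    # Imported lazily so this snippet loads even without corrdim installed.
    from corrdim.low_level import curve_from_text

    return curve_from_text(
        text,
        model_name,
        stride=1,                     # documented default
        epsilon_range=(1e-20, 1e20),  # documented default
        num_epsilon=1024,             # documented default
        show_progress=True,
    )
```

Calling compute_curve("some long text ...") would load the model on first use, so expect the first call to be noticeably slower than later ones.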

corrdim.low_level.curve_from_texts(texts, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • texts (list[str])

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • batch_size (int | None)

  • forward_chunk_size (int | None)

Return type:

list[CurveResult]

corrdim.low_level.curve_from_vectors(vectors, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)
Parameters:
  • vectors (torch.Tensor)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • backend (str | None)

Return type:

CurveResult
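
This page does not say what the curve contains. The package name suggests a correlation integral, so the sketch below illustrates that idea under three explicit assumptions: the epsilon grid is log-spaced between the bounds of epsilon_range, distances are Euclidean, and the curve value at each epsilon is the fraction of point pairs within that distance. The function names are hypothetical, and this is not the library's implementation (it ignores block_size and backend entirely).

```python
import math
from itertools import combinations


def log_spaced(eps_min, eps_max, num):
    """Return `num` values spaced evenly in log10 between the two bounds."""
    lo, hi = math.log10(eps_min), math.log10(eps_max)
    if num == 1:
        return [eps_min]
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


def correlation_integral(points, epsilons):
    """For each epsilon, return the fraction of point pairs within that distance."""
    # All pairwise Euclidean distances (assumed metric, see lead-in).
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return [sum(d <= eps for d in dists) / len(dists) for eps in epsilons]


# Four corners of a unit square: four sides of length 1, two diagonals of sqrt(2).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
epsilons = log_spaced(0.1, 10.0, 3)
curve = correlation_integral(points, epsilons)
```

With a range as wide as the default (1e-20, 1e+20), a linear grid would place almost every sample near the upper bound, which is why a logarithmic grid is the assumption made here.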

corrdim.low_level.curve_from_vectors_batch(vectors_batch, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)
Parameters:
  • vectors_batch (torch.Tensor)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • backend (str | None)

Return type:

list[CurveResult]

corrdim.low_level.progressive_curve_from_text(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • text (str)

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • forward_chunk_size (int | None)

Return type:

ProgressiveCurveResult

corrdim.low_level.progressive_curve_from_texts(texts, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)
Parameters:
  • texts (list[str])

  • model (str | LanguageModelWrapper)

  • tokenizer (object | None)

  • context_length (int | None)

  • dim_reduction (int | None)

  • stride (int)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • precision (torch.dtype)

  • backend (str | None)

  • batch_size (int | None)

  • forward_chunk_size (int | None)

Return type:

list[ProgressiveCurveResult]

corrdim.low_level.progressive_curve_from_vectors(vectors, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)
Parameters:
  • vectors (torch.Tensor)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • backend (str | None)

Return type:

ProgressiveCurveResult

corrdim.low_level.progressive_curve_from_vectors_batch(vectors_batch, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)
Parameters:
  • vectors_batch (torch.Tensor)

  • epsilon_range (Tuple[float, float])

  • num_epsilon (int)

  • block_size (int)

  • show_progress (bool)

  • backend (str | None)

Return type:

list[ProgressiveCurveResult]

corrdim.low_level.text_to_vectors(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, show_progress=False, precision=torch.float32, forward_chunk_size=None, **model_kwargs)

Extract log-probability vectors from text using model.

This is the public entry point for vector extraction; the returned tensor has shape (sampled_seq_len, reduced_vocab_size) and can be passed directly to curve_from_vectors() or progressive_curve_from_vectors().

Parameters:
  • text (str) – Input text.

  • model (str | LanguageModelWrapper) – HuggingFace model name/ID (str) or a pre-built LanguageModelWrapper instance.

  • tokenizer (object | None) – Tokenizer instance (only used when model is a string).

  • context_length (int | None) – Maximum context length for the model.

  • dim_reduction (int | None) – Vocabulary grouping size for dimensionality reduction.

  • stride (int) – Keep every stride-th token vector.

  • show_progress (bool) – Show a progress bar during inference.

  • precision (torch.dtype) – Output tensor dtype.

  • forward_chunk_size (int | None) – Number of tokens per forward-pass chunk. Reduce this value (e.g. to 128) on systems with limited VRAM. Only takes effect when model is a string; for a wrapper instance, set the attribute on the wrapper directly.

  • **model_kwargs – Extra keyword arguments forwarded to the model loader when model is a string.

Return type:

torch.Tensor
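
Because the returned tensor can be passed directly to curve_from_vectors() or progressive_curve_from_vectors(), extraction and curve computation can be separated so that the expensive model pass runs only once. The sketch below assumes corrdim is installed; the helper name vectors_then_curves is hypothetical, "gpt2" is a placeholder model ID, and stride=2 is an arbitrary illustrative value.

```python
def vectors_then_curves(text, model_name="gpt2", forward_chunk_size=128):
    """Hypothetical helper: extract vectors once, then build both curve types."""
    # Lazy import keeps this snippet importable without corrdim installed.
    from corrdim.low_level import (
        curve_from_vectors,
        progressive_curve_from_vectors,
        text_to_vectors,
    )

    vectors = text_to_vectors(
        text,
        model_name,
        stride=2,  # keep every 2nd token vector
        forward_chunk_size=forward_chunk_size,  # smaller chunks for limited VRAM
    )
    # Documented shape: (sampled_seq_len, reduced_vocab_size).
    curve = curve_from_vectors(vectors, num_epsilon=1024)
    progressive = progressive_curve_from_vectors(vectors, num_epsilon=1024)
    return vectors, curve, progressive
```

The default forward_chunk_size=128 here follows the value suggested in the parameter description for memory-constrained systems; pass None to fall back to the library's own default.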