low_level module
- corrdim.low_level.curve_from_text(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, forward_chunk_size=None, **model_kwargs)[source]
- Parameters:
text (str)
model (str | LanguageModelWrapper)
tokenizer (object | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
forward_chunk_size (int | None)
- Return type:
CurveResult
- corrdim.low_level.curve_from_texts(texts, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)[source]
- Parameters:
texts (list[str])
model (str | LanguageModelWrapper)
tokenizer (object | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
batch_size (int | None)
forward_chunk_size (int | None)
- Return type:
list[CurveResult]
- corrdim.low_level.curve_from_vectors(vectors, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)[source]
- Parameters:
vectors (torch.Tensor)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
backend (str | None)
- Return type:
CurveResult
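The epsilon_range and num_epsilon parameters suggest that the curve is evaluated on a grid of distance thresholds. As a rough, library-independent illustration, the sketch below computes a textbook correlation integral (the fraction of point pairs closer than each threshold) on a log-spaced grid. The log spacing, the Euclidean metric, and the helper names `log_spaced` and `correlation_integral` are assumptions made for this example; they are not taken from corrdim and may differ from what the library actually does.

```python
import math
from itertools import combinations


def log_spaced(lo, hi, num):
    """Return `num` (>= 2) thresholds spaced evenly in log10 between lo and hi."""
    start, stop = math.log10(lo), math.log10(hi)
    step = (stop - start) / (num - 1)
    return [10 ** (start + i * step) for i in range(num)]


def correlation_integral(vectors, epsilons):
    """For each epsilon, return the fraction of distinct pairs closer than epsilon."""
    # Euclidean distance is an assumption; the library's metric is not documented here.
    dists = [math.dist(a, b) for a, b in combinations(vectors, 2)]
    return [sum(d < eps for d in dists) / len(dists) for eps in epsilons]


# Four corners of the unit square as a toy point set.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
eps_grid = log_spaced(1e-2, 1e2, 5)
curve = correlation_integral(points, eps_grid)
```

The resulting curve rises from 0 (no pair is closer than the smallest threshold) to 1 (every pair is closer than the largest one), which is the general shape a correlation-integral curve takes over a wide epsilon range.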
- corrdim.low_level.curve_from_vectors_batch(vectors_batch, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)[source]
- Parameters:
vectors_batch (torch.Tensor)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
backend (str | None)
- Return type:
list[CurveResult]
- corrdim.low_level.progressive_curve_from_text(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, forward_chunk_size=None, **model_kwargs)[source]
- Parameters:
text (str)
model (str | LanguageModelWrapper)
tokenizer (object | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
forward_chunk_size (int | None)
- Return type:
ProgressiveCurveResult
- corrdim.low_level.progressive_curve_from_texts(texts, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, precision=torch.float32, backend=None, batch_size=None, forward_chunk_size=None, **model_kwargs)[source]
- Parameters:
texts (list[str])
model (str | LanguageModelWrapper)
tokenizer (object | None)
context_length (int | None)
dim_reduction (int | None)
stride (int)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
precision (torch.dtype)
backend (str | None)
batch_size (int | None)
forward_chunk_size (int | None)
- Return type:
list[ProgressiveCurveResult]
- corrdim.low_level.progressive_curve_from_vectors(vectors, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)[source]
- Parameters:
vectors (torch.Tensor)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
backend (str | None)
- Return type:
ProgressiveCurveResult
- corrdim.low_level.progressive_curve_from_vectors_batch(vectors_batch, epsilon_range=(1e-20, 1e+20), num_epsilon=1024, block_size=512, show_progress=False, backend=None)[source]
- Parameters:
vectors_batch (torch.Tensor)
epsilon_range (Tuple[float, float])
num_epsilon (int)
block_size (int)
show_progress (bool)
backend (str | None)
- Return type:
list[ProgressiveCurveResult]
- corrdim.low_level.text_to_vectors(text, model, tokenizer=None, context_length=None, dim_reduction=8192, stride=1, show_progress=False, precision=torch.float32, forward_chunk_size=None, **model_kwargs)[source]
Extract log-probability vectors from text using model.
This is the public entry point for vector extraction; the returned tensor has shape
(sampled_seq_len, reduced_vocab_size) and can be passed directly to curve_from_vectors() or progressive_curve_from_vectors().
- Parameters:
text (str) – Input text.
model (str | LanguageModelWrapper) – HuggingFace model name/ID (str) or a pre-built LanguageModelWrapper instance.
tokenizer (object | None) – Tokenizer instance (only used when model is a string).
context_length (int | None) – Maximum context length for the model.
dim_reduction (int | None) – Vocabulary grouping size for dimensionality reduction.
stride (int) – Keep every stride-th token vector.
show_progress (bool) – Show a progress bar during inference.
precision (torch.dtype) – Output tensor dtype.
forward_chunk_size (int | None) – Number of tokens per forward-pass chunk. Reduce this value (e.g. 128) on systems with limited VRAM. Only effective when model is a string; for wrapper instances set the attribute directly.
**model_kwargs – Extra keyword arguments forwarded to the model loader when model is a string.
- Return type:
torch.Tensor
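The two-step flow described above (extract vectors, then pass them to a curve function) might be wrapped as follows. This is a minimal sketch that assumes corrdim is installed and uses only the keyword arguments listed in the signatures on this page; the wrapper name `vectors_then_curve` and its default of 128 tokens per forward chunk are illustrative choices, not part of the library.

```python
def vectors_then_curve(text, model_name, stride=1, forward_chunk_size=128):
    """Extract log-probability vectors from `text`, then compute a curve from them."""
    # Deferred import: corrdim is only required when the function is called.
    from corrdim.low_level import curve_from_vectors, text_to_vectors

    # model_name is a HuggingFace model name/ID string, so forward_chunk_size
    # takes effect (per the parameter notes, it is ignored for wrapper instances).
    vectors = text_to_vectors(
        text,
        model_name,
        stride=stride,
        forward_chunk_size=forward_chunk_size,
    )
    # The documented shape is (sampled_seq_len, reduced_vocab_size).
    return curve_from_vectors(vectors, show_progress=False)
```

Splitting extraction from curve computation this way lets the same tensor be reused, for example by also passing it to progressive_curve_from_vectors() without running the model a second time.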