Quick start
Text to correlation dimension
This is the highest-level API:
import torch
import corrdim
result = corrdim.measure_text(
"Your text here...",
model="Qwen/Qwen2.5-1.5B",
precision=torch.float16,
)
print("corrdim:", result.corrdim)
print("fit_r2:", result.fit_r2)
print("linear_region_bounds:", result.linear_region_bounds)
Get the curve first
If you want the full correlation-integral curve before fitting:
import torch
import corrdim
curve = corrdim.curve_from_text(
"Your text here...",
model="Qwen/Qwen2.5-1.5B",
precision=torch.float16,
)
result = corrdim.estimate_dimension_from_curve(curve)
print(curve.sequence_length)
print(curve.epsilons.shape)
print(curve.corrints.shape)
print(result.corrdim)
Start from vectors
If you already have a sequence of log-probability vectors:
import torch
import corrdim
vectors = torch.randn(200, 512)
curve = corrdim.curve_from_vectors(vectors, num_epsilon=256)
result = corrdim.estimate_dimension_from_curve(curve)
Progressive analysis
To study how the curve evolves over prefixes of the sequence:
import torch
import corrdim
progressive = corrdim.progressive_curve_from_text(
"Your text here...",
model="Qwen/Qwen2.5-1.5B",
precision=torch.float16,
)
print(progressive.sequence_length)
print(progressive.epsilons.shape)
print(progressive.corrints_progressive.shape)
From progressive curve to many fitted dimensions
If you want scalar correlation dimensions at several prefix lengths (not only the integral curves), use measure_text_progressive. It builds the progressive curve once, then runs estimate_dimension_from_curve at sampled indices below sequence_length. If measure_every_tokens is omitted (None), the stride is chosen from length: under 100 → 1, under 1000 → 10, otherwise 100.
The return value is a ProgressiveDimensionResult: by_prefix maps prefix index to DimensionResult (same fields as measure_text); corrdims maps index to the fitted scalar only.
import torch
import corrdim
prog_dims = corrdim.measure_text_progressive(
"Your long text here...",
model="Qwen/Qwen2.5-1.5B",
precision=torch.float16,
skip_prefix_tokens=100,
)
for i, d in sorted(prog_dims.corrdims.items()):
print(i, d)
Which function should I use?
use
measure_text/measure_textsfor the final dimension directlyuse
measure_text_progressivewhen you want fitted dimensions at many prefixes in one passuse
curve_from_text/curve_from_vectorswhen you want the raw curveuse
progressive_curve_from_text/progressive_curve_from_vectorsfor prefix-wise analysisuse
correlation_integralwhen you already have tensors and want raw backend outputs