Concepts
What CorrDim measures
Given a text and an autoregressive language model, CorrDim measures the text’s global structural complexity as perceived by that model.
At a high level:
repetitive or degenerate text tends to have a lower correlation dimension
ordinary fluent text tends to have a higher dimension
richer long-range structure can produce an even higher dimension
CorrDim is therefore best treated as a sequence-level geometric signal, not as a replacement for perplexity.
How the pipeline works
CorrDim typically follows four steps:
Convert text into a sequence of next-token log-probability vectors.
Optionally reduce the vocabulary dimension.
Compute a correlation-integral curve over a range of epsilon thresholds.
Fit the slope in log-log space to estimate the correlation dimension.
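Step 1 can be pictured with a small, dependency-free sketch. It assumes the model has already produced raw logits for each position. The names `log_softmax` and `toy_logits` are illustrative and not part of CorrDim, which obtains these vectors from an actual language model.

```python
import math

def log_softmax(logits):
    """Turn one position's logits into a log-probability vector."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Hypothetical stand-in for model output: 2 positions over a 3-token vocabulary.
toy_logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 0.1]]

# One log-probability vector per position; this sequence of vectors is
# the kind of input the later stages operate on.
log_prob_vectors = [log_softmax(row) for row in toy_logits]
```

Each resulting vector exponentiates to a probability distribution over the vocabulary, so a text of T tokens becomes T points in a space whose dimension is the vocabulary size.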
In Python, these stages map roughly to:
curve_from_text(...) or curve_from_vectors(...) for curve construction
estimate_dimension_from_curve(...) for slope fitting
measure_text(...) when you want both steps wrapped into one call
measure_text_progressive(...) when you want fitted dimensions at subsampled prefix lengths after a single progressive curve pass
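To make the curve-construction and slope-fitting stages concrete, here is a minimal pure-Python sketch that runs on synthetic points rather than model outputs. It is not CorrDim's implementation. The function names, the Euclidean distance, and the epsilon grid are all assumptions chosen for illustration.

```python
import math
import random

def correlation_integral(points, epsilons):
    """For each epsilon, the fraction of distinct point pairs closer than epsilon."""
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    return [sum(d < eps for d in dists) / len(dists) for eps in epsilons]

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def correlation_dimension(points, epsilons):
    """Slope of log C(eps) versus log eps, skipping empty thresholds."""
    curve = correlation_integral(points, epsilons)
    pairs = [(math.log(e), math.log(c)) for e, c in zip(epsilons, curve) if c > 0]
    xs, ys = zip(*pairs)
    return fit_slope(xs, ys)

rng = random.Random(0)
epsilons = [0.05 * 1.3 ** k for k in range(8)]  # geometric grid of thresholds

# Points on a unit circle lie on a one-dimensional curve.
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(300)]
circle = [(math.cos(t), math.sin(t)) for t in angles]

# Points filling the unit square occupy a two-dimensional region.
square = [(rng.random(), rng.random()) for _ in range(300)]

dim_circle = correlation_dimension(circle, epsilons)
dim_square = correlation_dimension(square, epsilons)
```

With enough points, the fitted slope should land near 1 for the circle and somewhat below 2 for the square, since boundary effects pull the estimate down. This mirrors the idea that more constrained sequences yield a lower dimension.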
Backend model
CorrDim exposes multiple backends for correlation-integral computation:
triton: Triton kernels
pytorch: pure PyTorch implementation
pytorch_fast: PyTorch variant optimized for distance computation
auto: resolve automatically, preferring triton when available and otherwise pytorch
You can select the backend with an environment variable:
export CORRDIM_CORRINT_BACKEND=pytorch
Or in Python:
import corrdim
resolved = corrdim.set_corrint_backend("auto")
print("Using backend:", resolved)
print(corrdim.available_corrint_backends())
If you do not set anything, CorrDim defaults to triton.
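The `auto` rule described above can be sketched as a small standalone function. `resolve_backend` and `KNOWN_BACKENDS` are hypothetical names written for this illustration and are not CorrDim's API. The sketch also assumes that "available" simply means the `triton` package can be imported, which may be looser than the library's own check.

```python
import importlib.util

KNOWN_BACKENDS = ("triton", "pytorch", "pytorch_fast", "auto")

def resolve_backend(requested, triton_available=None):
    """Hypothetical helper mirroring the documented 'auto' preference order."""
    if requested not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested != "auto":
        return requested  # explicit choices pass through unchanged
    if triton_available is None:
        # Assumption: treat Triton as available if the package is importable.
        triton_available = importlib.util.find_spec("triton") is not None
    return "triton" if triton_available else "pytorch"
```

Passing `triton_available` explicitly makes the rule easy to exercise on machines without a GPU toolchain.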
API layers
The library is intentionally split into layers:
high-level API: measure_text, measure_texts, measure_text_progressive
curve API: curve_from_text, curve_from_texts, curve_from_vectors
progressive API: progressive_curve_from_text, progressive_curve_from_vectors; fitted dimensions along prefixes use measure_text_progressive → ProgressiveDimensionResult
raw backend API: correlation_counts, correlation_integral, progressive_correlation_integral
Use the highest layer that still gives you the outputs you need.