Frequency Detector¶

Embedding-space frequency shortcut detector detects whether class signals are overly concentrated in a small set of embedding dimensions. It does not localize input-domain frequency artifacts (e.g., image Fourier bands). Instead, it identifies representational shortcut signatures from class-separable dimensions.

How It Works¶

Train a probe (logistic regression by default) on embeddings vs. task labels
Predict labels using the probe (on training data or a holdout split)
Compute per-class TPR/FPR from the probe predictions
Flag shortcut classes where TPR >= threshold AND FPR <= threshold
Extract top-k dimensions from the probe coefficient magnitudes
Assess risk based on the fraction of flagged classes

graph TD
    A[Embeddings + Labels] --> B[Train Logistic Probe]
    B --> C[Predict Labels]
    C --> D[Compute Per-Class TPR/FPR]
    D --> E{TPR >= threshold AND FPR <= threshold?}
    E -->|Yes| F[Flag as Shortcut Class]
    E -->|No| G[Not a Shortcut Class]
    F --> H[Extract Top-k Dimensions]
    H --> I[Assess Risk Level]
    G --> I

Basic Usage¶

from shortcut_detect import FrequencyDetector

detector = FrequencyDetector(
    top_percent=0.05,
    tpr_threshold=0.5,
    fpr_threshold=0.15,
    probe_evaluation="holdout",
    probe_holdout_frac=0.2,
)

detector.fit(embeddings, labels)
report = detector.get_report()

print(f"Shortcut detected: {report['shortcut_detected']}")
print(f"Risk level: {report['risk_level']}")
print(f"Shortcut classes: {report['report']['shortcut_classes']}")

Parameters¶

Parameter	Type	Default	Description
`top_percent`	float	0.05	Fraction of top embedding dimensions to examine
`tpr_threshold`	float	0.5	Per-class true-positive-rate threshold for shortcut flagging
`fpr_threshold`	float	0.15	Per-class false-positive-rate threshold for shortcut flagging
`probe_estimator`	BaseEstimator	None	Optional sklearn classifier (default: LogisticRegression)
`probe_evaluation`	str	"train"	Evaluation mode: "train" or "holdout"
`probe_holdout_frac`	float	0.2	Holdout fraction when `probe_evaluation="holdout"`
`random_state`	int	42	Random seed for reproducibility

Outputs¶

Report Structure¶

Field	Type	Description
`shortcut_detected`	bool	Whether any class was flagged
`risk_level`	str	"low", "moderate", or "high"
`metrics.probe_accuracy`	float	Overall probe accuracy
`metrics.n_shortcut_classes`	int	Number of flagged classes
`metrics.n_classes`	int	Total number of classes
`report.shortcut_classes`	list	Class labels flagged as shortcuts
`report.class_rates`	dict	Per-class TPR, FPR, and support
`report.top_dims_by_class`	dict	Top-k important dimensions per class
`report.confusion_matrix`	list	Confusion matrix from probe predictions

Interpretation¶

Risk Level	Condition
Low	No classes flagged as shortcuts
Moderate	Less than half of classes flagged
High	Half or more of classes flagged

Example with Synthetic Data¶

from shortcut_detect import FrequencyDetector
from shortcut_detect.datasets import generate_linear_shortcut

# Generate data with known shortcut
X, y = generate_linear_shortcut(
    n_samples=400,
    embedding_dim=24,
    shortcut_dims=4,
)

# Detect shortcut
detector = FrequencyDetector(
    top_percent=0.1,
    probe_evaluation="holdout",
    probe_holdout_frac=0.2,
)
detector.fit(X, y)

report = detector.get_report()
print(f"Shortcut detected: {report['shortcut_detected']}")
print(f"Risk level: {report['risk_level']}")
print(f"Shortcut classes: {report['report']['shortcut_classes']}")
print(f"Probe accuracy: {report['metrics']['probe_accuracy']:.2%}")

Unified API Integration¶

from shortcut_detect import ShortcutDetector

detector = ShortcutDetector(
    methods=["frequency"],
    seed=42,
    freq_top_percent=0.05,
    freq_tpr_threshold=0.5,
    freq_fpr_threshold=0.15,
    freq_probe_evaluation="holdout",
    freq_probe_holdout_frac=0.2,
)
detector.fit(embeddings, labels)
print(detector.summary())

When to Use¶

Use Frequency Detector when:

You have only embeddings (no model access required)
You want to check if class signal concentrates in few dimensions
You need a fast, lightweight shortcut check
You want to identify which embedding dimensions carry class-separable shortcuts

Don't use Frequency Detector when:

You need to detect input-domain frequency artifacts (e.g., Fourier bands in images)
Your embeddings have very few dimensions (< 10)
You have very few samples (< 10)

Theory¶

The detector is based on the observation that shortcut-reliant models tend to encode class information in a small subset of embedding dimensions. A logistic regression probe is trained to predict class labels from embeddings, and the resulting coefficient magnitudes indicate which dimensions carry the most class-separable information.

Per-class TPR and FPR:

$$\text{TPR}_c = \frac{TP_c}{TP_c + FN_c}, \quad \text{FPR}_c = \frac{FP_c}{FP_c + TN_c}$$

A class $c$ is flagged as a shortcut class when $\text{TPR}c \ge \tau}}$ and $\text{FPRc \le \tau$, indicating the probe can reliably identify and isolate this class from few dimensions.}

Top-k dimensions are extracted from the probe's coefficient matrix:

$$\text{top-k}_c = \text{argsort}(|w_c|)[-k:]$$

where $w_c$ is the coefficient vector for class $c$ and $k = \lceil \text{top_percent} \times d \rceil$.