Frequency Detector
The Frequency Detector is an embedding-space shortcut detector: it checks whether class signal is overly concentrated in a small set of embedding dimensions. It does not localize input-domain frequency artifacts (e.g., image Fourier bands); instead, it identifies representational shortcut signatures from class-separable dimensions.
How It Works
- Train a probe (logistic regression by default) on embeddings vs. task labels
- Predict labels using the probe (on training data or a holdout split)
- Compute per-class TPR/FPR from the probe predictions
- Flag shortcut classes where TPR >= tpr_threshold AND FPR <= fpr_threshold
- Extract top-k dimensions from the probe coefficient magnitudes
- Assess risk based on the fraction of flagged classes
graph TD
A[Embeddings + Labels] --> B[Train Logistic Probe]
B --> C[Predict Labels]
C --> D[Compute Per-Class TPR/FPR]
D --> E{TPR >= tpr_threshold AND FPR <= fpr_threshold?}
E -->|Yes| F[Flag as Shortcut Class]
E -->|No| G[Not a Shortcut Class]
F --> H[Extract Top-k Dimensions]
H --> I[Assess Risk Level]
G --> I
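The steps above can be sketched with scikit-learn. This is an illustrative re-implementation under stated assumptions, not the library's actual code: the function name `frequency_shortcut_sketch`, the stratified split, and the synthetic data are all hypothetical choices made for the example.

```python
import math

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def frequency_shortcut_sketch(X, y, top_percent=0.05, tpr_threshold=0.5,
                              fpr_threshold=0.15, holdout_frac=0.2, seed=42):
    """Hypothetical walk-through of the six steps; not the library's code."""
    # Steps 1-2: fit a logistic probe, then predict on a stratified holdout split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=holdout_frac, random_state=seed, stratify=y
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_pred = probe.predict(X_te)

    # Step 3: per-class TPR/FPR from the confusion matrix (rows = true labels).
    classes = probe.classes_.tolist()
    cm = confusion_matrix(y_te, y_pred, labels=classes)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    tpr = tp / np.maximum(tp + fn, 1)
    fpr = fp / np.maximum(fp + tn, 1)

    # Step 4: flag classes with high TPR and low FPR.
    flagged = [
        c for c, t, f in zip(classes, tpr, fpr)
        if t >= tpr_threshold and f <= fpr_threshold
    ]

    # Step 5: top-k dimensions by absolute coefficient. Assumes >= 3 classes,
    # where coef_ has one row per class (a binary probe has a single row).
    k = max(1, math.ceil(top_percent * X.shape[1]))
    top_dims = {
        c: np.argsort(np.abs(w))[-k:].tolist()
        for c, w in zip(classes, probe.coef_)
    }

    # Step 6: risk level from the fraction of flagged classes.
    frac = len(flagged) / len(classes)
    risk = "low" if not flagged else ("high" if frac >= 0.5 else "moderate")
    return {
        "shortcut_classes": flagged,
        "top_dims_by_class": top_dims,
        "risk_level": risk,
    }


# Synthetic check: class c shifts dimension c, so 3 of 20 dimensions carry the signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 20))
X[np.arange(300), y] += 8.0
result = frequency_shortcut_sketch(X, y, top_percent=0.1)
print(result)
```

On this deliberately easy dataset every class should be flagged, and each class's top dimensions should include the dimension that was shifted for it.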
Basic Usage
from shortcut_detect import FrequencyDetector

# `embeddings` and `labels` are assumed to be defined already:
# one embedding vector and one task label per sample.
detector = FrequencyDetector(
    top_percent=0.05,
    tpr_threshold=0.5,
    fpr_threshold=0.15,
    probe_evaluation="holdout",
    probe_holdout_frac=0.2,
)
detector.fit(embeddings, labels)

report = detector.get_report()
print(f"Shortcut detected: {report['shortcut_detected']}")
print(f"Risk level: {report['risk_level']}")
print(f"Shortcut classes: {report['report']['shortcut_classes']}")
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `top_percent` | float | 0.05 | Fraction of top embedding dimensions to examine |
| `tpr_threshold` | float | 0.5 | Per-class true-positive-rate threshold for shortcut flagging |
| `fpr_threshold` | float | 0.15 | Per-class false-positive-rate threshold for shortcut flagging |
| `probe_estimator` | BaseEstimator | None | Optional scikit-learn classifier (default: `LogisticRegression`) |
| `probe_evaluation` | str | "train" | Evaluation mode: "train" or "holdout" |
| `probe_holdout_frac` | float | 0.2 | Holdout fraction when `probe_evaluation="holdout"` |
| `random_state` | int | 42 | Random seed for reproducibility |
Outputs
Report Structure
| Field | Type | Description |
|---|---|---|
| `shortcut_detected` | bool | Whether any class was flagged |
| `risk_level` | str | "low", "moderate", or "high" |
| `metrics.probe_accuracy` | float | Overall probe accuracy |
| `metrics.n_shortcut_classes` | int | Number of flagged classes |
| `metrics.n_classes` | int | Total number of classes |
| `report.shortcut_classes` | list | Class labels flagged as shortcuts |
| `report.class_rates` | dict | Per-class TPR, FPR, and support |
| `report.top_dims_by_class` | dict | Top-k important dimensions per class |
| `report.confusion_matrix` | list | Confusion matrix from probe predictions |
Interpretation
| Risk Level | Condition |
|---|---|
| Low | No classes flagged as shortcuts |
| Moderate | At least one class flagged, but fewer than half of all classes |
| High | Half or more of all classes flagged |
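The table above can be expressed as a small function. This is a sketch of the stated rule only; `risk_level` is a hypothetical helper name, and the library's internal logic may differ in edge cases.

```python
def risk_level(n_shortcut_classes: int, n_classes: int) -> str:
    """Map the number of flagged classes to a risk level, per the table."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    if n_shortcut_classes == 0:
        return "low"
    # Integer comparison avoids floating-point issues at exactly one half.
    if 2 * n_shortcut_classes < n_classes:
        return "moderate"
    return "high"


for flagged in range(5):
    print(flagged, "of 4 flagged:", risk_level(flagged, 4))
```

Under this rule, a binary task with a single flagged class already lands at "high", since one class is half of the total.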
Example with Synthetic Data
from shortcut_detect import FrequencyDetector
from shortcut_detect.datasets import generate_linear_shortcut

# Generate data with a known shortcut
X, y = generate_linear_shortcut(
    n_samples=400,
    embedding_dim=24,
    shortcut_dims=4,
)

# Detect the shortcut
detector = FrequencyDetector(
    top_percent=0.1,
    probe_evaluation="holdout",
    probe_holdout_frac=0.2,
)
detector.fit(X, y)

report = detector.get_report()
print(f"Shortcut detected: {report['shortcut_detected']}")
print(f"Risk level: {report['risk_level']}")
print(f"Shortcut classes: {report['report']['shortcut_classes']}")
print(f"Probe accuracy: {report['metrics']['probe_accuracy']:.2%}")
Unified API Integration
from shortcut_detect import ShortcutDetector

detector = ShortcutDetector(
    methods=["frequency"],
    seed=42,
    freq_top_percent=0.05,
    freq_tpr_threshold=0.5,
    freq_fpr_threshold=0.15,
    freq_probe_evaluation="holdout",
    freq_probe_holdout_frac=0.2,
)
detector.fit(embeddings, labels)
print(detector.summary())
When to Use
Use Frequency Detector when:
- You have only embeddings (no model access required)
- You want to check if class signal concentrates in few dimensions
- You need a fast, lightweight shortcut check
- You want to identify which embedding dimensions carry class-separable shortcuts
Don't use Frequency Detector when:
- You need to detect input-domain frequency artifacts (e.g., Fourier bands in images)
- Your embeddings have very few dimensions (< 10)
- You have very few samples (< 10)
Theory
The detector is based on the observation that shortcut-reliant models tend to encode class information in a small subset of embedding dimensions. A logistic regression probe is trained to predict class labels from embeddings, and the resulting coefficient magnitudes indicate which dimensions carry the most class-separable information.
Per-class TPR and FPR:
$$\text{TPR}_c = \frac{TP_c}{TP_c + FN_c}, \quad \text{FPR}_c = \frac{FP_c}{FP_c + TN_c}$$
A class $c$ is flagged as a shortcut class when $\text{TPR}_c \ge \tau_{\text{TPR}}$ and $\text{FPR}_c \le \tau_{\text{FPR}}$, indicating the probe can reliably identify and isolate this class from few dimensions.
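As a worked example with made-up counts (not taken from any real run), suppose the probe's predictions for class $c$ give $TP_c = 45$, $FN_c = 5$, $FP_c = 6$, and $TN_c = 144$:

$$\text{TPR}_c = \frac{45}{45 + 5} = 0.90, \quad \text{FPR}_c = \frac{6}{6 + 144} = 0.04$$

With the default `tpr_threshold` of 0.5 and `fpr_threshold` of 0.15, both conditions hold ($0.90 \ge 0.5$ and $0.04 \le 0.15$), so this class would be flagged.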
Top-k dimensions are extracted from the probe's coefficient matrix:
$$\text{top-k}_c = \text{argsort}(|w_c|)[-k:]$$
where $w_c$ is the coefficient vector for class $c$, $d$ is the embedding dimensionality, and $k = \lceil \text{top\_percent} \times d \rceil$.
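A small numeric illustration of the top-k formula, using NumPy. The coefficient values are invented for the example, and `top_dims` is a hypothetical helper rather than part of the library.

```python
import math

import numpy as np


def top_dims(w_c, top_percent):
    """Indices of the k largest-magnitude coefficients, in ascending order of magnitude."""
    if not 0 < top_percent <= 1:
        raise ValueError("top_percent must be in (0, 1]")
    d = len(w_c)
    k = math.ceil(top_percent * d)
    return np.argsort(np.abs(w_c))[-k:]


# Hypothetical coefficient vector for one class, with d = 8 dimensions.
w_c = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.0, 0.7, -0.2])
print(top_dims(w_c, top_percent=0.25))  # k = ceil(0.25 * 8) = 2, prints [3 1]
```

The indices come back sorted by increasing magnitude, so the last entry (dimension 1 here, with coefficient -2.0) is the single most influential dimension; the sign is ignored.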
See Also
- Probe-based Detection - Complementary classifier approach
- API Reference - Full API documentation
- Overview - Compare all methods