ShortKit-ML: Detect and mitigate shortcuts and biases in machine learning embedding spaces. It provides 20+ detection and mitigation methods behind a unified API, multi-attribute support for testing several sensitive attributes simultaneously, and a Model Comparison mode for benchmarking multiple embedding models.
What is Shortcut Detection?
Machine learning models often learn shortcuts: spurious correlations between input features and labels that hold in the training data but don't generalize. For example, a medical imaging model might learn to predict disease from hospital equipment watermarks rather than from actual pathology.
This library helps you detect such shortcuts in your model's embedding space before they cause real-world harm.
Key Features
- HBAC Clustering: Hierarchical Bias-Aware Clustering detects if embeddings cluster by protected attributes.
- Probe-based Detection: Tests if sensitive group information can be recovered from embeddings using classifiers.
- Statistical Testing: Identifies embedding dimensions with significant group differences using hypothesis tests.
- Geometric Analysis: Finds bias directions and prototype subspaces in embedding geometry.
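To make the probe-based and geometric ideas concrete, here is a minimal NumPy-only sketch. It is not the library's implementation: the helper names (`make_toy_data`, `linear_probe_accuracy`, `bias_direction`) are hypothetical, the probe is a simple least-squares linear classifier swapped in for whatever classifier the library uses, and the bias direction is assumed to be the normalized difference of group means.

```python
import numpy as np


def make_toy_data(n_samples=400, dim=16, shift=3.0, seed=0):
    """Toy embeddings in which dimension 0 is shifted by group membership."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, 2, size=n_samples)
    emb = rng.normal(size=(n_samples, dim))
    emb[:, 0] += shift * groups  # the planted group signal
    return emb, groups


def linear_probe_accuracy(emb, groups, train_frac=0.7, seed=0):
    """Fit a least-squares linear probe and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(emb))
    n_train = int(train_frac * len(emb))
    train, test = order[:n_train], order[n_train:]

    # Append a bias column and regress onto targets in {-1, +1}.
    X = np.hstack([emb, np.ones((len(emb), 1))])
    y = 2.0 * groups - 1.0
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    pred = (X[test] @ w > 0).astype(int)
    return float(np.mean(pred == groups[test]))


def bias_direction(emb, groups):
    """Unit vector pointing from the group-0 mean to the group-1 mean."""
    diff = emb[groups == 1].mean(axis=0) - emb[groups == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


emb, groups = make_toy_data()
acc = linear_probe_accuracy(emb, groups)
direction = bias_direction(emb, groups)
print(f"held-out probe accuracy: {acc:.3f}")
print(f"dimension most aligned with the bias direction: {np.argmax(np.abs(direction))}")
```

Held-out accuracy well above chance (0.5 for two balanced groups) suggests that group information is linearly recoverable from the embeddings. The bias direction indicates which dimensions carry that information.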
Quick Example
from shortcut_detect import ShortcutDetector
import numpy as np
# Load your embeddings and labels
embeddings = np.load("embeddings.npy") # (n_samples, embedding_dim)
labels = np.load("labels.npy") # (n_samples,)
# Detect shortcuts using all methods
detector = ShortcutDetector(methods=['hbac', 'probe', 'statistical', 'geometric'])
detector.fit(embeddings, labels)
# Generate comprehensive report
detector.generate_report("report.html", format="html")
print(detector.summary())
Output:
======================================================================
UNIFIED SHORTCUT DETECTION SUMMARY
======================================================================
HIGH RISK: Multiple methods detected shortcuts
- HBAC detected shortcuts (high confidence)
- Probe accuracy 94.2% (high risk)
- 3 group comparisons show significant differences
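To try the Quick Example without real data, one option is to generate synthetic embeddings with a planted shortcut. The sketch below is an assumption-laden helper and not part of the library: the function names are hypothetical, and it assumes `labels` is a 1-D array of binary integers. Only the file names `embeddings.npy` and `labels.npy` come from the example above.

```python
import os

import numpy as np


def make_synthetic_embeddings(n_samples=2000, dim=512, n_shortcut_dims=5,
                              shift=1.5, seed=0):
    """Return (embeddings, labels) where a few dimensions leak the label."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_samples)
    embeddings = rng.normal(size=(n_samples, dim))
    # Plant the shortcut: shift the first few dimensions for label 1 only.
    embeddings[:, :n_shortcut_dims] += shift * labels[:, None]
    return embeddings.astype(np.float32), labels


def save_for_quick_example(out_dir="."):
    """Write the arrays under the file names the Quick Example loads."""
    embeddings, labels = make_synthetic_embeddings()
    np.save(os.path.join(out_dir, "embeddings.npy"), embeddings)
    np.save(os.path.join(out_dir, "labels.npy"), labels)


# Small in-memory call to show the shapes; nothing is written to disk here.
embeddings, labels = make_synthetic_embeddings(n_samples=100, dim=8)
print(embeddings.shape, labels.shape)
```

Calling `save_for_quick_example()` writes both files to the current directory, after which the Quick Example can load them with `np.load`.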
Interactive Dashboard
Launch the Gradio web interface for interactive analysis:

Features:
- Sample data: CheXpert medical imaging (2000 samples, 512-dim)
- Custom CSV upload
- PDF/HTML report export
- Interactive visualizations
Installation
Install from source:
git clone https://github.com/criticaldata/ShortKit-ML.git
cd ShortKit-ML
pip install -e ".[all]"
Detection Methods Overview
| Method | What it detects | Speed | GPU Required |
|---|---|---|---|
| HBAC | Clustering by protected attributes | Fast | No |
| Probe | Information leakage via classifiers | Medium | Optional |
| Statistical | Dimension-wise group differences | Fast | No |
| Geometric | Bias directions & subspaces | Fast | No |
| GradCAM | Attention overlap with shortcuts | Slow | Yes |
| SpRAy | Heatmap clustering (Clever Hans) | Medium | Optional |
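The "Statistical" row can be illustrated with a dimension-wise permutation test. This is a sketch under stated assumptions, not the library's method: the source does not say which hypothesis tests or multiple-comparison correction it uses, so a two-sided permutation test on the group mean difference with a Bonferroni correction is swapped in here, and both function names are hypothetical.

```python
import numpy as np


def permutation_test_per_dimension(emb, groups, n_permutations=500, seed=0):
    """Two-sided permutation p-value for the group mean difference per dimension."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups).astype(bool)

    def mean_diff(mask):
        return np.abs(emb[mask].mean(axis=0) - emb[~mask].mean(axis=0))

    observed = mean_diff(groups)
    exceed = np.zeros(emb.shape[1])
    for _ in range(n_permutations):
        # Shuffling group membership simulates the "no group effect" null.
        exceed += mean_diff(rng.permutation(groups)) >= observed
    # Add-one correction keeps p-values strictly positive.
    return (exceed + 1) / (n_permutations + 1)


def significant_dimensions(p_values, alpha=0.05):
    """Indices that survive a Bonferroni correction over all dimensions."""
    return np.flatnonzero(p_values < alpha / len(p_values))


rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=300)
emb = rng.normal(size=(300, 10))
emb[:, 3] += 1.5 * groups  # only dimension 3 truly differs between groups

p_values = permutation_test_per_dimension(emb, groups)
print("dimensions flagged:", significant_dimensions(p_values))
```

The number of permutations bounds the smallest attainable p-value at 1 / (n_permutations + 1), so it should be large enough for the corrected threshold to be reachable when testing many dimensions.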
Who Uses This?
This library is designed for:
- ML Researchers studying fairness and bias
- Healthcare AI Teams validating medical imaging models
- ML Engineers auditing production models
- Data Scientists exploring embedding quality
License
MIT License. See LICENSE for details.