ShortKit-ML: detect and mitigate shortcuts and biases in machine learning embedding spaces.

It provides 20+ detection and mitigation methods behind a unified API, multi-attribute support for testing several sensitive attributes at once, and a Model Comparison mode for benchmarking multiple embedding models.

What is Shortcut Detection?

Machine learning models often learn shortcuts: spurious correlations between input features and labels that do not generalize. For example, a medical imaging model might learn to predict disease from hospital equipment watermarks rather than from actual pathology.

This library helps you detect such shortcuts in your model's embedding space before they cause real-world harm.

Key Features

HBAC Clustering

Hierarchical Bias-Aware Clustering detects if embeddings cluster by protected attributes.
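
The underlying idea can be illustrated with a much simpler stand-in: cluster the embeddings without looking at the protected attribute, then check how closely the clusters line up with it. The sketch below is not the library's HBAC implementation. It uses a single 2-means split and a purity score on toy data, and the names `two_means` and `cluster_purity` are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, n_iter=20, seed=0):
    """Split points into two clusters with plain k-means (k=2)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        assignment = [
            min((0, 1), key=lambda k: sq_dist(p, centers[k])) for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_purity(assignment, attribute):
    """Fraction of samples that share their cluster's majority attribute value."""
    majority_total = 0
    for k in set(assignment):
        counts = Counter(a for a, c in zip(attribute, assignment) if c == k)
        majority_total += counts.most_common(1)[0][1]
    return majority_total / len(assignment)


# Toy data (an assumption for illustration): the attribute shifts dimension 0.
rng = random.Random(1)
attribute = [i % 2 for i in range(200)]
embeddings = [[6.0 * a + rng.gauss(0, 1), rng.gauss(0, 1)] for a in attribute]

purity = cluster_purity(two_means(embeddings), attribute)
print(f"cluster purity w.r.t. attribute: {purity:.2f}")
```

With two equally sized groups, a purity near 0.5 means the clusters ignore the attribute, while a value near 1.0 means the embeddings cluster by it. Repeating the split inside each resulting cluster would give a hierarchy.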

Probe-based Detection

Tests if sensitive group information can be recovered from embeddings using classifiers.
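
As a rough illustration of probing, the sketch below trains a very simple classifier to predict the group from the embedding and reports held-out accuracy. It is not the library's probe: a nearest-centroid classifier stands in for whatever classifiers the library uses, the data is synthetic, and `probe_accuracy` is a hypothetical name.

```python
import random


def centroid(rows):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(embeddings, groups, test_fraction=0.3, seed=0):
    """Held-out accuracy of a nearest-centroid probe that predicts `groups`."""
    idx = list(range(len(embeddings)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Fit: one centroid per group, using training rows only.
    by_group = {}
    for i in train_idx:
        by_group.setdefault(groups[i], []).append(embeddings[i])
    centroids = {g: centroid(rows) for g, rows in by_group.items()}

    # Evaluate: predict the group whose centroid is closest.
    correct = 0
    for i in test_idx:
        pred = min(centroids, key=lambda g: sq_dist(embeddings[i], centroids[g]))
        correct += pred == groups[i]
    return correct / len(test_idx)


# Toy data (an assumption for illustration): dimension 0 leaks the group,
# the remaining dimensions are pure noise.
rng = random.Random(42)
groups = [i % 2 for i in range(200)]
embeddings = [
    [3.0 * g + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(7)]
    for g in groups
]

# Control: the same probe on shuffled labels should stay near chance.
shuffled = groups[:]
rng.shuffle(shuffled)

leaky_acc = probe_accuracy(embeddings, groups)
control_acc = probe_accuracy(embeddings, shuffled)
print(f"probe accuracy: {leaky_acc:.2f}  shuffled-label control: {control_acc:.2f}")
```

Accuracy well above the shuffled-label control suggests that the embeddings encode the sensitive attribute, even when that attribute was never a training target.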

Statistical Testing

Identifies embedding dimensions with significant group differences using hypothesis tests.
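
A minimal sketch of dimension-wise testing follows, assuming two groups labelled 0 and 1. A permutation test on the difference in means with a Bonferroni correction is swapped in here to keep the example within the standard library; the library's own hypothesis tests may differ. The function names are hypothetical and the data is synthetic.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, n_permutations=500, seed=0):
    """Two-sided permutation p-value for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[: len(a)]) - fmean(pooled[len(a):]))
        hits += diff >= observed
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def significant_dimensions(embeddings, groups, alpha=0.05):
    """Indices of dimensions whose group means differ after Bonferroni correction."""
    n_dims = len(embeddings[0])
    flagged = []
    for d in range(n_dims):
        a = [row[d] for row, g in zip(embeddings, groups) if g == 0]
        b = [row[d] for row, g in zip(embeddings, groups) if g == 1]
        if permutation_p_value(a, b, seed=d) < alpha / n_dims:
            flagged.append(d)
    return flagged


# Toy data (an assumption for illustration): only dimension 2 differs by group.
rng = random.Random(7)
groups = [i % 2 for i in range(200)]
embeddings = [
    [rng.gauss(0, 1) + (2.0 * g if d == 2 else 0.0) for d in range(6)]
    for g in groups
]

flagged = significant_dimensions(embeddings, groups)
print("dimensions with significant group differences:", flagged)
```

The Bonferroni threshold (`alpha / n_dims`) is a conservative choice. For high-dimensional embeddings, a false discovery rate procedure is a common alternative.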

Geometric Analysis

Finds bias directions and prototype subspaces in embedding geometry.
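
One simple way to picture a bias direction is as the normalized difference between two group means. The sketch below assumes groups labelled 0 and 1 and uses synthetic data. It covers only a single direction, not prototype subspaces, and it is an illustration rather than the library's method; `bias_direction` and `project` are hypothetical names.

```python
import math
import random
from statistics import fmean


def mean_vector(rows):
    """Per-dimension mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*rows)]


def bias_direction(embeddings, groups):
    """Unit vector pointing from the group-0 mean to the group-1 mean."""
    mean0 = mean_vector([e for e, g in zip(embeddings, groups) if g == 0])
    mean1 = mean_vector([e for e, g in zip(embeddings, groups) if g == 1])
    diff = [b - a for a, b in zip(mean0, mean1)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project(embeddings, direction):
    """Scalar projection of every embedding onto the direction."""
    return [sum(x * w for x, w in zip(e, direction)) for e in embeddings]


# Toy data (an assumption for illustration): the groups differ along dimension 1.
rng = random.Random(3)
groups = [i % 2 for i in range(200)]
embeddings = [
    [rng.gauss(0, 1) + (3.0 * g if d == 1 else 0.0) for d in range(4)]
    for g in groups
]

direction = bias_direction(embeddings, groups)
scores = project(embeddings, direction)
gap = fmean([s for s, g in zip(scores, groups) if g == 1]) - fmean(
    [s for s, g in zip(scores, groups) if g == 0]
)
print("bias direction:", [round(x, 2) for x in direction])
print(f"gap between group means along it: {gap:.2f}")
```

A direction dominated by a few coordinates, together with a large gap between the projected group means, indicates that group membership is encoded along a single axis of the embedding space.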

Quick Example

from shortcut_detect import ShortcutDetector
import numpy as np

# Load your embeddings and labels
embeddings = np.load("embeddings.npy")  # (n_samples, embedding_dim)
labels = np.load("labels.npy")          # (n_samples,)

# Detect shortcuts using all methods
detector = ShortcutDetector(methods=['hbac', 'probe', 'statistical', 'geometric'])
detector.fit(embeddings, labels)

# Generate comprehensive report
detector.generate_report("report.html", format="html")
print(detector.summary())

Output:

======================================================================
UNIFIED SHORTCUT DETECTION SUMMARY
======================================================================
HIGH RISK: Multiple methods detected shortcuts
  - HBAC detected shortcuts (high confidence)
  - Probe accuracy 94.2% (high risk)
  - 3 group comparisons show significant differences

Interactive Dashboard

Launch the Gradio web interface for interactive analysis:

python app.py

Features:

Sample data: CheXpert medical imaging (2000 samples, 512-dim)
Custom CSV upload
PDF/HTML report export
Interactive visualizations

Installation

pip install shortcut-detect

Or install from source:

git clone https://github.com/criticaldata/ShortKit-ML.git
cd ShortKit-ML
pip install -e ".[all]"

Full installation guide

Detection Methods Overview

Method	What it detects	Speed	GPU Required
HBAC	Clustering by protected attributes	Fast	No
Probe	Information leakage via classifiers	Medium	Optional
Statistical	Dimension-wise group differences	Fast	No
Geometric	Bias directions & subspaces	Fast	No
GradCAM	Attention overlap with shortcuts	Slow	Yes
SpRAy	Heatmap clustering (Clever Hans)	Medium	Optional

Who Uses This?

This library is designed for:

ML Researchers studying fairness and bias
Healthcare AI Teams validating medical imaging models
ML Engineers auditing production models
Data Scientists exploring embedding quality

License

MIT License. See LICENSE for details.