

ShortKit-ML: detect and mitigate shortcuts and biases in machine learning embedding spaces. It offers 20+ detection and mitigation methods behind a unified API, multi-attribute support for testing several sensitive attributes simultaneously, and a Model Comparison mode for benchmarking multiple embedding models.

Available on PyPI. Requires Python 3.10+ and is built on PyTorch. An accompanying dataset is hosted on Hugging Face.


What is Shortcut Detection?

Machine learning models often learn shortcuts: spurious correlations between input features and labels that do not generalize beyond the training data. For example, a medical imaging model might learn to predict disease from hospital equipment watermarks rather than from the actual pathology.

This library helps you detect such shortcuts in your model's embedding space before they cause real-world harm.

Key Features

  • 🧬 HBAC Clustering: Hierarchical Bias-Aware Clustering detects whether embeddings cluster by protected attributes.
  • 🔬 Probe-based Detection: tests whether sensitive group information can be recovered from embeddings by training classifiers on them.
  • 📊 Statistical Testing: identifies embedding dimensions with significant group differences using hypothesis tests.
  • 🧭 Geometric Analysis: finds bias directions and prototype subspaces in the embedding geometry.
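The probe idea can be illustrated with scikit-learn: train a simple classifier to predict the sensitive attribute from the embeddings, then compare its cross-validated accuracy to the majority-class baseline. Accuracy well above the baseline means the attribute is recoverable. This is a minimal, generic sketch, not ShortKit-ML's implementation; the helper `probe_accuracy`, the logistic-regression probe, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def probe_accuracy(embeddings: np.ndarray, groups: np.ndarray, folds: int = 5) -> tuple[float, float]:
    """Hypothetical helper: return (cross-validated probe accuracy, majority-class baseline).

    A linear probe tries to recover the sensitive attribute `groups`
    from `embeddings`.
    """
    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, embeddings, groups, cv=folds, scoring="accuracy")
    # Baseline: accuracy of always predicting the most frequent group.
    _, counts = np.unique(groups, return_counts=True)
    baseline = counts.max() / counts.sum()
    return float(scores.mean()), float(baseline)


# Synthetic demo: group membership shifts dimension 0, so it leaks into the embedding.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=1000)
embeddings = rng.normal(size=(1000, 64))
embeddings[:, 0] += 3.0 * groups

acc, base = probe_accuracy(embeddings, groups)
print(f"probe accuracy {acc:.3f} vs. baseline {base:.3f}")
```

A linear probe is a deliberately weak choice: if even it recovers the attribute, the information is easy to exploit. Nonlinear probes can reveal subtler leakage at the cost of more overfitting risk.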

Quick Example

from shortcut_detect import ShortcutDetector
import numpy as np

# Load your embeddings and labels
embeddings = np.load("embeddings.npy")  # (n_samples, embedding_dim)
labels = np.load("labels.npy")          # (n_samples,)

# Run the four core detection methods
detector = ShortcutDetector(methods=['hbac', 'probe', 'statistical', 'geometric'])
detector.fit(embeddings, labels)

# Generate comprehensive report
detector.generate_report("report.html", format="html")
print(detector.summary())

Example output (values depend on your data):

======================================================================
UNIFIED SHORTCUT DETECTION SUMMARY
======================================================================
HIGH RISK: Multiple methods detected shortcuts
  - HBAC detected shortcuts (high confidence)
  - Probe accuracy 94.2% (high risk)
  - 3 group comparisons show significant differences
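To try the Quick Example without your own data, you can generate synthetic inputs. The sketch below writes `embeddings.npy` and `labels.npy` in the shapes the example's comments describe, with a planted shortcut: a single embedding dimension tracks the label. It assumes `labels.npy` holds one integer label per sample; the sizes (mirroring the 2000 x 512 dashboard sample), the shortcut dimension, and its strength are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, embedding_dim = 2000, 512

# Binary labels, one per sample: shape (n_samples,).
labels = rng.integers(0, 2, size=n_samples)

# Gaussian-noise embeddings, shape (n_samples, embedding_dim), with a planted
# shortcut: dimension 7 is shifted by the label, so the label can be read
# off a single coordinate.
embeddings = rng.normal(size=(n_samples, embedding_dim)).astype(np.float32)
embeddings[:, 7] += 4.0 * labels

np.save("embeddings.npy", embeddings)
np.save("labels.npy", labels)
```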

Interactive Dashboard

Launch the Gradio web interface for interactive analysis:

python app.py


Features:

  • Sample data: CheXpert medical imaging (2000 samples, 512-dim)
  • Custom CSV upload
  • PDF/HTML report export
  • Interactive visualizations

Installation

pip install shortcut-detect

Or install from source:

git clone https://github.com/criticaldata/ShortKit-ML.git
cd ShortKit-ML
pip install -e ".[all]"


Detection Methods Overview

| Method | What it detects | Speed | GPU required |
| --- | --- | --- | --- |
| HBAC | Clustering by protected attributes | Fast | No |
| Probe | Information leakage via classifiers | Medium | Optional |
| Statistical | Dimension-wise group differences | Fast | No |
| Geometric | Bias directions & subspaces | Fast | No |
| GradCAM | Attention overlap with shortcuts | Slow | Yes |
| SpRAy | Heatmap clustering (Clever Hans) | Medium | Optional |
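The Statistical and Geometric rows can be illustrated with NumPy and SciPy alone. The sketch below runs a Welch t-test on every embedding dimension with Benjamini-Hochberg false-discovery-rate control, and computes a simple bias direction as the normalized difference of the two group means. These are common textbook choices swapped in for illustration; ShortKit-ML's own tests, corrections, and subspace methods may differ, and the helpers `significant_dimensions` and `bias_direction` are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def significant_dimensions(embeddings, groups, alpha=0.05):
    """Hypothetical helper: indices of dimensions whose means differ between groups 0 and 1.

    Welch t-test per dimension, then Benjamini-Hochberg FDR control at `alpha`.
    """
    a = embeddings[groups == 0]
    b = embeddings[groups == 1]
    _, p = ttest_ind(a, b, axis=0, equal_var=False)
    m = p.size
    order = np.argsort(p)
    # BH step-up: find the largest rank k with p_(k) <= (k / m) * alpha
    # and reject the k smallest p-values.
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(p[order] <= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passing[-1] + 1])


def bias_direction(embeddings, groups):
    """Hypothetical helper: unit vector from the group-0 mean to the group-1 mean."""
    diff = embeddings[groups == 1].mean(axis=0) - embeddings[groups == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


# Synthetic demo: dimensions 3 and 11 are shifted for group 1.
rng = np.random.default_rng(1)
groups = rng.integers(0, 2, size=800)
embeddings = rng.normal(size=(800, 32))
embeddings[:, [3, 11]] += 1.5 * groups[:, None]

dims = significant_dimensions(embeddings, groups)
direction = bias_direction(embeddings, groups)
print("significant dimensions:", dims)
print("dimensions with largest |component|:", np.argsort(-np.abs(direction))[:2])
```

Projecting embeddings onto `direction` (for example `embeddings @ direction`) gives each sample a scalar score along the bias axis, which is the starting point for removing that direction in mitigation.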

Who Is This For?

This library is designed for:

  • ML Researchers studying fairness and bias
  • Healthcare AI Teams validating medical imaging models
  • ML Engineers auditing production models
  • Data Scientists exploring embedding quality

License

MIT License. See LICENSE for details.