Shortcut Feature Masking (M01)¶
Shortcut Feature Masking is a data mitigation approach (Teso & Kersting 2019) that produces augmented training data by masking or inpainting detected shortcut regions. Using this data for retraining discourages models from relying on shortcuts.
Reference¶
Teso, S., & Kersting, K. (2019). Explanatory Interactive Machine Learning. AIES 2019. The counterexample strategy randomizes or masks wrongly identified components so the learner does not depend on them.
What It Does¶
- Images: Given images and shortcut regions (from GradCAM heatmaps or user-provided masks), produces new images with those regions zeroed, randomized, or inpainted.
- Embeddings: Given embeddings and flagged dimension indices (e.g. from HBAC or statistical detection), produces new embeddings with those dimensions zeroed or randomized.
Strategies¶
| Strategy | Images | Embeddings |
|---|---|---|
| zero | Set shortcut pixels to 0 | Set flagged dimensions to 0 |
| randomize | Replace shortcut pixels with random values | Shuffle values across samples for flagged dims |
| inpaint | Fill shortcut regions with mean (simple inpainting) | N/A (treated as zero) |
Requirements¶
- Image mode: NumPy, PIL. No extra dependencies.
- Embedding mode: NumPy only.
Basic Usage¶
Image mode¶
from shortcut_detect import ShortcutMasker
import numpy as np
masker = ShortcutMasker(strategy="randomize", heatmap_threshold=0.5, augment_fraction=1.0)
# images: (N, H, W) or (N, H, W, C), values in [0, 1]
# shortcut_masks: (N, H, W) binary, or pass heatmaps instead
augmented = masker.mask_images(images, shortcut_masks=masks)
# Or from GradCAM heatmaps:
augmented = masker.mask_images(images, heatmaps=heatmaps)
Embedding mode¶
masker = ShortcutMasker(strategy="zero")
# embeddings: (N, D), flagged_dim_indices: list of dimension indices
augmented = masker.mask_embeddings(embeddings, flagged_dim_indices=[0, 3, 7])
Dashboard¶
In the Advanced Analysis tab, use the Shortcut Feature Masking (M01) accordion:
- Image mode: Upload images and either mask images or a heatmap file (.npz/.npy). Choose strategy (zero / randomize / inpaint) and augment fraction. Download a zip of augmented images.
- Embedding mode: Upload an embeddings CSV and enter dimension indices to mask (e.g.
0,3,7). Download the augmented CSV.
Workflow¶
- Run detection (e.g. GradCAM, HBAC, Statistical) to identify shortcut regions or dimensions.
- Run Shortcut Masking with those regions/dimensions to produce augmented data.
- Use the augmented data (optionally mixed with original) for retraining.
Parameters¶
- strategy:
"zero"|"randomize"|"inpaint"(images);"zero"|"randomize"(embeddings). - heatmap_threshold: Binarization threshold for heatmaps (default 0.5).
- augment_fraction: Fraction of samples to augment, 0–1 (default 1.0).
- random_state: Seed for reproducible randomization.