Causal Effect Regularization

Causal Effect Regularization (Kumar et al., 2023) estimates the causal effect of each attribute on the task label. Attributes with near-zero estimated effect are flagged as spurious shortcuts.

What It Detects

Per-attribute causal effect on the task label.
Attributes whose estimated causal effect is low; these are treated as spurious, and the classifier should not rely on them.
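To make the idea of a per-attribute effect concrete, here is a minimal sketch on synthetic data. It uses a simple regression-adjusted estimator, which is an assumption made for illustration and not necessarily what the library's Direct estimator does. The function name `estimate_effects` and the attribute names `shape` and `color` are hypothetical, and the sketch handles binary attributes only.

```python
import numpy as np


def estimate_effects(embeddings, labels, attributes):
    """Hypothetical helper: regression-adjusted effect of each binary attribute.

    Fits labels ~ intercept + embeddings + all attributes by least squares,
    then averages the change in prediction when one attribute is set to 1
    versus 0 while everything else is held fixed.
    """
    n, d = embeddings.shape
    names = list(attributes)
    attr_matrix = np.column_stack([attributes[k].astype(float) for k in names])
    design = np.hstack([np.ones((n, 1)), embeddings, attr_matrix])
    coef, *_ = np.linalg.lstsq(design, labels.astype(float), rcond=None)

    effects = {}
    for j, name in enumerate(names):
        col = 1 + d + j  # column of this attribute in the design matrix
        on, off = design.copy(), design.copy()
        on[:, col] = 1.0
        off[:, col] = 0.0
        effects[name] = float(np.mean(on @ coef - off @ coef))
    return effects


# Synthetic data (illustrative only): "shape" drives the label, while
# "color" merely agrees with "shape" 80% of the time.
rng = np.random.default_rng(0)
n, d = 2000, 4
emb = rng.normal(size=(n, d))
shape = rng.integers(0, 2, size=n)
color = np.where(rng.random(n) < 0.2, 1 - shape, shape)
labels = np.where(rng.random(n) < 0.05, 1 - shape, shape)  # 5% label noise

effects = estimate_effects(emb, labels, {"shape": shape, "color": color})

# Raw association between color and the label, without any adjustment.
naive_gap = float(labels[color == 1].mean() - labels[color == 0].mean())

print(effects)
print(naive_gap)
```

In this setup `color` is strongly associated with the label in the raw data, yet its adjusted effect is close to zero, which is the pattern that marks an attribute as a spurious shortcut.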

Required Inputs

embeddings: np.ndarray (n, d) - sample representations (n samples, d dimensions)
labels: np.ndarray (n,) - task label per sample
attributes: dict[str, np.ndarray] - attribute name -> (n,) values per sample (binary or categorical)
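The shapes above can be checked before fitting. The following is a minimal sketch; `check_inputs` is a hypothetical helper written for this example, not part of the library, and the arrays are random placeholders.

```python
import numpy as np


def check_inputs(embeddings, labels, attributes):
    """Hypothetical helper: verify the shapes listed under Required Inputs."""
    if embeddings.ndim != 2:
        raise ValueError("embeddings must have shape (n, d)")
    n = embeddings.shape[0]
    if labels.shape != (n,):
        raise ValueError(f"labels must have shape ({n},), got {labels.shape}")
    for name, values in attributes.items():
        if values.shape != (n,):
            raise ValueError(
                f"attribute {name!r} must have shape ({n},), got {values.shape}"
            )
    return n


# Placeholder data with the expected shapes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))          # (n, d)
labels = rng.integers(0, 2, size=100)     # (n,)
attributes = {
    "race": rng.integers(0, 3, size=100),   # categorical
    "color": rng.integers(0, 2, size=100),  # binary
}

n_samples = check_inputs(emb, labels, attributes)
```

Validating up front tends to surface a mismatched attribute array with a clearer message than a failure inside the estimator would.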

Optional:

counterfactual_pairs: interventional data, reserved for Phase 2. The current Direct estimator does not use it.

Unified API Example

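from shortcut_detect import ShortcutDetector

# emb, labels, race_labels, and color_labels are assumed to be NumPy arrays
# prepared beforehand, with the shapes listed under Required Inputs.
loader = {
    "embeddings": emb,
    "labels": labels,
    "attributes": {
        "race": race_labels,
        "color": color_labels,
    },
}

# Run only the causal-effect method.
detector = ShortcutDetector(
    methods=["causal_effect"],
    causal_effect_spurious_threshold=0.1,
)
detector.fit_from_loaders({"causal_effect": loader})

# Results are keyed by method name.
result = detector.get_results()["causal_effect"]
print(result["metrics"])
print(result["report"]["per_attribute"])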

Interpretation

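An attribute a is flagged as spurious (a shortcut) when the absolute value of its estimated effect satisfies |TE_a| < threshold, where the threshold is set by causal_effect_spurious_threshold.
A higher estimated causal effect indicates that the attribute may be task-relevant.
Risk levels:
    high: two or more spurious attributes
    moderate: one spurious attribute
    low: no spurious attributes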
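The flagging rule and risk levels above can be sketched in a few lines. The function names `flag_spurious` and `risk_level` and the effect values are hypothetical and chosen only for illustration; the library's own report may be structured differently.

```python
def flag_spurious(effects, threshold=0.1):
    """Return attribute names whose absolute estimated effect is below threshold."""
    return sorted(name for name, te in effects.items() if abs(te) < threshold)


def risk_level(spurious):
    """Map the number of spurious attributes to a risk level."""
    if len(spurious) >= 2:
        return "high"
    if len(spurious) == 1:
        return "moderate"
    return "low"


# Illustrative effect estimates, not real results.
effects = {"race": 0.02, "color": -0.05, "shape": 0.40}

spurious = flag_spurious(effects)
level = risk_level(spurious)
print(spurious, level)  # ['color', 'race'] high
```

Note that the comparison uses the absolute value, so a small negative effect such as -0.05 is flagged in the same way as a small positive one.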

Reference

Kumar, Abhinav, Amit Deshpande, and Amit Sharma. "Causal Effect Regularization: Automated Detection and Removal of Spurious Attributes." arXiv:2306.11072 (2023).