CAV (Concept Activation Vectors)¶
Concept Activation Vectors (Kim et al., 2018) test whether model representations encode human-interpretable shortcut concepts.
This implementation focuses on precomputed activations and optional directional derivatives (for TCAV scoring), which fits the library's loader-driven XAI workflow.
What It Detects¶
- Whether a concept is linearly separable from a random baseline in activation space.
- Whether model sensitivity aligns with that concept direction (TCAV score).
Required Inputs¶
concept_sets:dict[str, np.ndarray]where each array is(n, d)random_set:np.ndarray(n, d)or list of arrays
Optional:
target_activations:np.ndarray(m, d)target_directional_derivatives:np.ndarray(m, d)
If directional derivatives are omitted, TCAV risk is reported as unknown.
Unified API Example¶
from shortcut_detect import ShortcutDetector
bundle = {
"concept_sets": {
"hospital_token": concept_arr, # (n,d)
"device_artifact": artifact_arr,
},
"random_set": random_arr, # (n,d)
"target_activations": target_acts, # optional
"target_directional_derivatives": target_dd, # optional
}
detector = ShortcutDetector(
methods=["cav"],
cav_quality_threshold=0.7,
cav_shortcut_threshold=0.6,
)
detector.fit_from_loaders({"cav": bundle})
result = detector.get_results()["cav"]
print(result["metrics"])
print(result["report"]["per_concept"])
Interpretation¶
- High concept quality (AUC) + high TCAV indicates likely shortcut reliance.
- A concept is flagged when both thresholds are exceeded.
- Risk levels:
high: very strong flagged TCAV signalmoderate: flagged concepts, weaker maximum signallow: tested with derivatives, no flagged conceptsunknown: derivatives not provided
Reference¶
Kim, Been, et al. "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)." ICML 2018.