Geometric Analyzer API
The geometric module analyzes embedding geometry to detect shortcuts, using bias directions between groups and the overlap of per-group principal subspaces.
Class Reference
GeometricShortcutAnalyzer
GeometricShortcutAnalyzer(
n_components: int = 5,
min_group_size: int = 20,
effect_threshold: float = 0.8,
subspace_cosine_threshold: float = 0.85,
)
Bases: DetectorBase
Analyze shortcut risk using bias directions and prototype subspaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_components` | `int` | Number of principal directions per group. | `5` |
| `min_group_size` | `int` | Minimum samples per group required for analysis. | `20` |
| `effect_threshold` | `float` | Threshold on standardized projection gap for high risk. | `0.8` |
| `subspace_cosine_threshold` | `float` | Threshold on mean subspace cosine for overlap risk. | `0.85` |
Source code in shortcut_detect/geometric/geometric/src/detector.py
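The "mean subspace cosine" that `subspace_cosine_threshold` is compared against can be illustrated with plain NumPy. The following is a minimal sketch, assuming the measure is the mean cosine of the principal angles between two groups' top principal directions; the helper names `top_directions` and `mean_subspace_cosine` are hypothetical and are not part of the library, whose internal computation may differ.

```python
import numpy as np


def top_directions(x: np.ndarray, n_components: int) -> np.ndarray:
    """Return an orthonormal basis, shape (dim, n_components), of the top principal directions."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def mean_subspace_cosine(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean cosine of the principal angles between two orthonormal bases."""
    # The singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())


# Synthetic stand-ins for two groups of embeddings (hypothetical data).
rng = np.random.default_rng(0)
group_a = rng.normal(size=(200, 16))
group_b = rng.normal(size=(200, 16))

basis_a = top_directions(group_a, n_components=5)
basis_b = top_directions(group_b, n_components=5)

overlap = mean_subspace_cosine(basis_a, basis_b)
same = mean_subspace_cosine(basis_a, basis_a)  # identical subspaces give a cosine of 1

# Comparison against the documented default threshold of 0.85.
overlap_risk = overlap >= 0.85
```

A value close to 1 means the two groups vary along nearly the same directions; a value close to 0 means their dominant directions are nearly orthogonal.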
Functions
fit
Fit analyzer on embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embeddings` | `ndarray` | (n_samples, embedding_dim) array. | required |
| `group_labels` | `Sequence` | Sequence of group identifiers aligned with embeddings. | required |
Source code in shortcut_detect/geometric/geometric/src/detector.py
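To make the expected input shapes concrete, here is a small sketch that builds synthetic inputs in the layout described above and checks them against the documented `min_group_size` default. The data, group names, and the pre-check itself are illustrative assumptions; the library may perform its own validation differently.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(42)
n_per_group, embedding_dim = 50, 32

# Two synthetic groups; group "b" is shifted along the first axis.
emb_a = rng.normal(size=(n_per_group, embedding_dim))
emb_b = rng.normal(size=(n_per_group, embedding_dim))
emb_b[:, 0] += 2.0

# Rows are samples, columns are embedding dimensions: (n_samples, embedding_dim).
embeddings = np.vstack([emb_a, emb_b])
# One group identifier per row, in the same order as the rows of `embeddings`.
group_labels = ["a"] * n_per_group + ["b"] * n_per_group

# Hypothetical pre-check: list groups smaller than the documented default.
min_group_size = 20
counts = Counter(group_labels)
too_small = [group for group, n in counts.items() if n < min_group_size]

if embeddings.shape[0] != len(group_labels):
    raise ValueError("group_labels must have one entry per embedding row")
```

Arrays prepared this way can then be passed as `analyzer.fit(embeddings, group_labels)`.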
GeometricShortcutAnalyzer
Constructor
GeometricShortcutAnalyzer(
n_components: int = 5,
normalize: bool = True,
random_state: int = None
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_components` | int | 5 | PCA components per group |
| `normalize` | bool | True | Normalize embeddings |
| `random_state` | int | None | Random seed |
Methods
fit()
Analyze geometric structure of embeddings.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `embeddings` | ndarray | Shape (n_samples, n_features) |
| `group_labels` | ndarray | Shape (n_samples,) |
Returns: self
transform()
Project embeddings onto the bias direction.
debias()
Remove the bias direction from embeddings.
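Both operations have a simple linear-algebra reading. The sketch below is one plausible interpretation, assuming projection is a dot product with a unit vector and debiasing subtracts that component from each embedding; the functions `project` and `remove_direction` are hypothetical stand-ins, and the library's `transform()` and `debias()` may additionally center or normalize the data.

```python
import numpy as np


def project(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each row onto a direction; returns shape (n_samples,)."""
    unit = direction / np.linalg.norm(direction)
    return embeddings @ unit


def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract each row's component along the direction; shape is unchanged."""
    unit = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ unit, unit)


# Hypothetical data and direction, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=(10, 4))
d = np.array([1.0, 1.0, 0.0, 0.0])

scores = project(x, d)            # one value per sample
x_clean = remove_direction(x, d)  # same shape as x
residual = project(x_clean, d)    # numerically zero after removal
```

After removal, the projections onto the same direction are zero up to floating-point error, which is the property the debiasing check in the usage examples relies on.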
Attributes (after fit)
| Attribute | Type | Description |
|---|---|---|
| `bias_direction_` | ndarray | Unit vector between group centroids |
| `bias_effect_size_` | float | Cohen's d along bias direction |
| `subspace_overlap_` | float | Principal angle overlap (0-1) |
| `group_centroids_` | dict | Centroid per group |
| `group_pca_` | dict | PCA model per group |
| `projections_` | dict | Projections per group |
| `summary_` | str | Human-readable summary |
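As an illustration of what `bias_direction_` and `bias_effect_size_` describe, the sketch below computes a unit vector between two group centroids and Cohen's d of the projections onto it. It assumes exactly two groups and a pooled standard deviation with `ddof=1`; the helper names are hypothetical and the library's exact formula is not confirmed by this page.

```python
import numpy as np


def bias_direction(x_a: np.ndarray, x_b: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the centroid of group a to the centroid of group b."""
    diff = x_b.mean(axis=0) - x_a.mean(axis=0)
    return diff / np.linalg.norm(diff)


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d of two 1-D samples using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return float((b.mean() - a.mean()) / np.sqrt(pooled_var))


# Synthetic groups; group b is shifted along the first axis (hypothetical data).
rng = np.random.default_rng(7)
x_a = rng.normal(size=(100, 8))
x_b = rng.normal(size=(100, 8))
x_b[:, 0] += 3.0

direction = bias_direction(x_a, x_b)
effect = cohens_d(x_a @ direction, x_b @ direction)

# Comparison against the documented effect_threshold default of 0.8.
high_risk = effect >= 0.8
```

Because the direction points from one centroid to the other, the effect size computed this way is always positive; larger values mean the groups are more cleanly separated along that single direction.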
Usage Examples
Basic Usage
from shortcut_detect import GeometricShortcutAnalyzer
analyzer = GeometricShortcutAnalyzer(n_components=5)
analyzer.fit(embeddings, group_labels)
print(analyzer.summary_)
print(f"Effect size: {analyzer.bias_effect_size_:.2f}")
print(f"Subspace overlap: {analyzer.subspace_overlap_:.2f}")
Debiasing
analyzer.fit(embeddings, group_labels)
# Remove bias direction
embeddings_debiased = analyzer.debias(embeddings)
# Verify debiasing
analyzer_after = GeometricShortcutAnalyzer()
analyzer_after.fit(embeddings_debiased, group_labels)
print(f"Effect size after: {analyzer_after.bias_effect_size_:.2f}")
Visualization
import matplotlib.pyplot as plt
import numpy as np
# Project onto bias direction
projections = analyzer.transform(embeddings)
# Ensure labels are an array so boolean masks work
group_labels = np.asarray(group_labels)
fig, ax = plt.subplots(figsize=(10, 4))
for group in np.unique(group_labels):
    mask = group_labels == group
    ax.hist(projections[mask], bins=50, alpha=0.5, label=f'Group {group}')
ax.legend()
ax.set_xlabel('Bias Direction Projection')
plt.savefig('bias_projections.png')