Geometric Analyzer API

The geometric module analyzes embedding geometry to detect shortcuts, using bias directions between groups and per-group prototype subspaces.

Class Reference

GeometricShortcutAnalyzer

GeometricShortcutAnalyzer(
    n_components: int = 5,
    min_group_size: int = 20,
    effect_threshold: float = 0.8,
    subspace_cosine_threshold: float = 0.85,
)

Bases: DetectorBase

Analyze shortcut risk using bias directions and prototype subspaces.

Parameters:

Name	Type	Description	Default
`n_components`	`int`	Number of principal directions per group.	`5`
`min_group_size`	`int`	Minimum samples per group required for analysis.	`20`
`effect_threshold`	`float`	Threshold on standardized projection gap for high risk.	`0.8`
`subspace_cosine_threshold`	`float`	Threshold on mean subspace cosine for overlap risk.	`0.85`
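
To make `effect_threshold` concrete, the sketch below computes one plausible "standardized projection gap" between two groups: each sample is projected onto the unit vector joining the group centroids, and the gap between the projected means is divided by the pooled standard deviation (Cohen's d). The `_compute_bias_pairs` implementation is not shown on this page, so the helper name `projection_gap` and the pooled-variance formula are assumptions rather than the library's code.

```python
import numpy as np


def projection_gap(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Hypothetical helper: Cohen's d along the centroid-difference direction."""
    diff = group_b.mean(axis=0) - group_a.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        return 0.0  # identical centroids: there is no direction to project onto
    direction = diff / norm

    # 1-D projections of every sample onto the (assumed) bias direction.
    proj_a = group_a @ direction
    proj_b = group_b @ direction

    # Pooled variance from the unbiased (ddof=1) per-group variances.
    n_a, n_b = len(proj_a), len(proj_b)
    pooled_var = (
        (n_a - 1) * proj_a.var(ddof=1) + (n_b - 1) * proj_b.var(ddof=1)
    ) / (n_a + n_b - 2)
    if pooled_var == 0.0:
        return float("inf")  # separated groups with no spread at all
    return float((proj_b.mean() - proj_a.mean()) / np.sqrt(pooled_var))


rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(200, 8))
group_b = rng.normal(0.0, 1.0, size=(200, 8))
group_b[:, 0] += 2.0  # shift one coordinate to create a bias direction

gap = projection_gap(group_a, group_b)
high_risk = gap > 0.8  # 0.8 is the documented default for effect_threshold
```

Under this reading, a gap above `effect_threshold` means the two groups are separated by more than 0.8 pooled standard deviations along the line joining their centroids.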

Source code in shortcut_detect/geometric/geometric/src/detector.py

def __init__(
    self,
    n_components: int = 5,
    min_group_size: int = 20,
    effect_threshold: float = 0.8,
    subspace_cosine_threshold: float = 0.85,
):
    super().__init__(method="geometric")

    self.n_components = n_components
    self.min_group_size = min_group_size
    self.effect_threshold = effect_threshold
    self.subspace_cosine_threshold = subspace_cosine_threshold

    self.group_stats_: dict[str, dict[str, np.ndarray]] = {}
    self.bias_pairs_: list[BiasPairStats] = []
    self.subspace_pairs_: list[SubspacePairStats] = []
    self.summary_: dict[str, str] = {}

Functions

fit

fit(
    embeddings: ndarray, group_labels: Sequence
) -> GeometricShortcutAnalyzer

Fit analyzer on embeddings.

Parameters:

Name	Type	Description	Default
`embeddings`	`ndarray`	(n_samples, embedding_dim) array.	required
`group_labels`	`Sequence`	Sequence of group identifiers aligned with embeddings.	required

Source code in shortcut_detect/geometric/geometric/src/detector.py

def fit(self, embeddings: np.ndarray, group_labels: Sequence) -> GeometricShortcutAnalyzer:
    """
    Fit analyzer on embeddings.

    Args:
        embeddings: (n_samples, embedding_dim) array.
        group_labels: Sequence of group identifiers aligned with embeddings.
    """
    X = np.asarray(embeddings)
    if X.ndim != 2:
        raise ValueError("embeddings must be 2D (n_samples, embedding_dim)")

    labels = np.asarray(group_labels)
    if labels.shape[0] != X.shape[0]:
        raise ValueError("embeddings and group_labels must have the same length")

    unique_groups = np.unique(labels)
    if unique_groups.shape[0] < 2:
        raise ValueError("At least two groups are required for geometric analysis")

    self.group_stats_ = self._compute_group_stats(X, labels, unique_groups)
    self.bias_pairs_ = self._compute_bias_pairs(X, labels)
    self.subspace_pairs_ = self._compute_subspace_pairs()
    self.summary_ = self._assess_risk()
    self._finalize_results()
    self._is_fitted = True
    return self
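
The three checks at the top of `fit` can be exercised in isolation. The sketch below copies them into a standalone function so that each failure mode is easy to see; `check_fit_inputs` and `error_for` are hypothetical names introduced for illustration and are not part of `shortcut_detect`.

```python
import numpy as np


def check_fit_inputs(embeddings, group_labels):
    """Hypothetical standalone copy of the input checks shown in fit()."""
    X = np.asarray(embeddings)
    if X.ndim != 2:
        raise ValueError("embeddings must be 2D (n_samples, embedding_dim)")

    labels = np.asarray(group_labels)
    if labels.shape[0] != X.shape[0]:
        raise ValueError("embeddings and group_labels must have the same length")

    if np.unique(labels).shape[0] < 2:
        raise ValueError("At least two groups are required for geometric analysis")
    return X, labels


def error_for(embeddings, group_labels):
    """Return the ValueError message for bad inputs, or None if they pass."""
    try:
        check_fit_inputs(embeddings, group_labels)
    except ValueError as exc:
        return str(exc)
    return None


good = error_for(np.zeros((4, 3)), ["a", "a", "b", "b"])  # passes all checks
flat = error_for(np.zeros(4), ["a", "a", "b", "b"])       # 1D embeddings
short = error_for(np.zeros((4, 3)), ["a", "b"])           # length mismatch
single = error_for(np.zeros((4, 3)), ["a"] * 4)           # only one group
```

All three checks run before any group statistics are computed, so malformed inputs fail fast with a `ValueError`.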

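GeometricShortcutAnalyzer

Constructor

GeometricShortcutAnalyzer(
    n_components: int = 5,
    min_group_size: int = 20,
    effect_threshold: float = 0.8,
    subspace_cosine_threshold: float = 0.85,
)

Parameters

Parameter	Type	Default	Description
`n_components`	int	5	Principal directions (PCA components) per group
`min_group_size`	int	20	Minimum samples per group required for analysis
`effect_threshold`	float	0.8	Threshold on the standardized projection gap for high risk
`subspace_cosine_threshold`	float	0.85	Threshold on the mean subspace cosine for overlap risk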

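Methods

fit()

def fit(
    embeddings: np.ndarray,
    group_labels: Sequence
) -> GeometricShortcutAnalyzer

Analyze the geometric structure of the embeddings.

Parameters:

Parameter	Type	Description
`embeddings`	ndarray	Shape (n_samples, embedding_dim)
`group_labels`	Sequence	Group identifiers, length n_samples

Returns: self

Raises: ValueError if `embeddings` is not 2D, if `embeddings` and `group_labels` differ in length, or if fewer than two groups are present.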

transform()

def transform(embeddings: np.ndarray) -> np.ndarray

Project embeddings onto the bias direction.

debias()

def debias(embeddings: np.ndarray) -> np.ndarray

Remove the bias direction from embeddings.
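
The source for `transform()` and `debias()` is not included on this page. A common way to implement both, assuming a single bias direction, is to project onto the unit vector and then subtract the projected component. The sketch below uses the hypothetical helpers `project` and `remove_direction`; it illustrates the linear-algebra idea and is not necessarily what the library does.

```python
import numpy as np


def project(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each row onto the normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return embeddings @ unit


def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract each row's component along the normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ unit, unit)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # assumed bias direction

X_clean = remove_direction(X, direction)
residual = project(X_clean, direction)  # numerically zero after removal
```

Removing one direction this way leaves the embedding dimensionality unchanged; only the component along that direction is zeroed out.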

Attributes (after fit)

Attribute	Type	Description
`bias_direction_`	ndarray	Unit vector between group centroids
`bias_effect_size_`	float	Cohen's d along bias direction
`subspace_overlap_`	float	Principal angle overlap (0-1)
`group_centroids_`	dict	Centroid per group
`group_pca_`	dict	PCA model per group
`projections_`	dict	Projections per group
`summary_`	dict[str, str]	Human-readable risk summary
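
`subspace_overlap_` is described as a principal-angle overlap between 0 and 1. One standard way to obtain such a number, assuming each group's subspace is spanned by its top PCA directions, is to take the singular values of the product of the two orthonormal bases (these are the cosines of the principal angles) and average them. The helper names below are hypothetical, and the averaging step is an assumption based on the "mean subspace cosine" wording of `subspace_cosine_threshold`.

```python
import numpy as np


def top_directions(X: np.ndarray, n_components: int) -> np.ndarray:
    """Orthonormal basis (embedding_dim, n_components) of the top PCA directions."""
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def mean_subspace_cosine(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean cosine of the principal angles between two orthonormal bases."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against values marginally outside [0, 1] from rounding.
    return float(np.clip(cosines, 0.0, 1.0).mean())


rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0] + [0.1] * 7)
X_a = rng.normal(size=(200, 10)) * scales
basis_a = top_directions(X_a, n_components=3)

same = mean_subspace_cosine(basis_a, basis_a)  # identical subspaces

# Two coordinate subspaces that share no direction.
axes_first = np.eye(10)[:, :3]
axes_next = np.eye(10)[:, 3:6]
disjoint = mean_subspace_cosine(axes_first, axes_next)
```

A value near 1 means the two groups vary along nearly the same directions, while a value near 0 means their dominant directions are close to orthogonal.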

Usage Examples

Basic Usage

from shortcut_detect import GeometricShortcutAnalyzer

analyzer = GeometricShortcutAnalyzer(n_components=5)
analyzer.fit(embeddings, group_labels)

print(analyzer.summary_)
print(f"Effect size: {analyzer.bias_effect_size_:.2f}")
print(f"Subspace overlap: {analyzer.subspace_overlap_:.2f}")
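
The example above assumes `embeddings` and `group_labels` already exist. For a quick experiment, synthetic inputs with a known group shift can be generated with NumPy. `make_synthetic_groups` is a hypothetical helper written for this page, not part of `shortcut_detect`; its default of 50 samples per group stays above the default `min_group_size` of 20.

```python
import numpy as np


def make_synthetic_groups(n_per_group=50, embedding_dim=16, shift=2.0, seed=0):
    """Hypothetical helper: two Gaussian groups offset along one coordinate."""
    rng = np.random.default_rng(seed)
    group_0 = rng.normal(size=(n_per_group, embedding_dim))
    group_1 = rng.normal(size=(n_per_group, embedding_dim))
    group_1[:, 0] += shift  # group 1 is offset along the first coordinate

    embeddings = np.vstack([group_0, group_1])
    group_labels = np.repeat([0, 1], n_per_group)  # aligned with the rows above
    return embeddings, group_labels


embeddings, group_labels = make_synthetic_groups()
```

The returned arrays have the shapes that `fit` expects, (n_samples, embedding_dim) and (n_samples,), so they can be passed directly to `analyzer.fit(embeddings, group_labels)`.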

Debiasing

analyzer.fit(embeddings, group_labels)

# Remove bias direction
embeddings_debiased = analyzer.debias(embeddings)

# Verify debiasing
analyzer_after = GeometricShortcutAnalyzer()
analyzer_after.fit(embeddings_debiased, group_labels)
print(f"Effect size after: {analyzer_after.bias_effect_size_:.2f}")

Visualization

import matplotlib.pyplot as plt
import numpy as np

# Project onto the bias direction
projections = analyzer.transform(embeddings)
labels = np.asarray(group_labels)  # boolean masking below needs an array, not a list

fig, ax = plt.subplots(figsize=(10, 4))
for group in np.unique(labels):
    mask = labels == group
    ax.hist(projections[mask], bins=50, alpha=0.5, label=f'Group {group}')
ax.legend()
ax.set_xlabel('Bias Direction Projection')
plt.savefig('bias_projections.png')