
Geometric Analyzer API

The geometric module analyzes embedding geometry to detect shortcuts.

Class Reference

GeometricShortcutAnalyzer

GeometricShortcutAnalyzer(
    n_components: int = 5,
    min_group_size: int = 20,
    effect_threshold: float = 0.8,
    subspace_cosine_threshold: float = 0.85,
)

Bases: DetectorBase

Analyze shortcut risk using bias directions and prototype subspaces.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_components | int | Number of principal directions per group. | 5 |
| min_group_size | int | Minimum samples per group required for analysis. | 20 |
| effect_threshold | float | Threshold on standardized projection gap for high risk. | 0.8 |
| subspace_cosine_threshold | float | Threshold on mean subspace cosine for overlap risk. | 0.85 |

Source code in shortcut_detect/geometric/geometric/src/detector.py
def __init__(
    self,
    n_components: int = 5,
    min_group_size: int = 20,
    effect_threshold: float = 0.8,
    subspace_cosine_threshold: float = 0.85,
):
    super().__init__(method="geometric")

    self.n_components = n_components
    self.min_group_size = min_group_size
    self.effect_threshold = effect_threshold
    self.subspace_cosine_threshold = subspace_cosine_threshold

    self.group_stats_: dict[str, dict[str, np.ndarray]] = {}
    self.bias_pairs_: list[BiasPairStats] = []
    self.subspace_pairs_: list[SubspacePairStats] = []
    self.summary_: dict[str, str] = {}
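
The "mean subspace cosine" that subspace_cosine_threshold is compared against can be illustrated with plain NumPy. This is a minimal sketch, not the detector's own code: the helper names principal_directions and mean_subspace_cosine are hypothetical, and it assumes that values at or above the threshold count as overlap risk.

```python
import numpy as np


def principal_directions(X: np.ndarray, n_components: int) -> np.ndarray:
    """Return the top principal directions of X as rows (n_components, dim)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are orthonormal directions ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]


def mean_subspace_cosine(A: np.ndarray, B: np.ndarray) -> float:
    """Mean cosine of the principal angles between the row spaces of A and B."""
    # For orthonormal rows, the singular values of A @ B.T are those cosines.
    cosines = np.linalg.svd(A @ B.T, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())


rng = np.random.default_rng(0)
# Two hypothetical groups of 200 samples in 16 dimensions.
group_a = rng.normal(size=(200, 16))
group_b = rng.normal(size=(200, 16))

dirs_a = principal_directions(group_a, n_components=5)
dirs_b = principal_directions(group_b, n_components=5)

overlap = mean_subspace_cosine(dirs_a, dirs_b)
same = mean_subspace_cosine(dirs_a, dirs_a)  # identical subspaces

# Assumed comparison direction: at or above the threshold is flagged.
flagged = overlap >= 0.85
```

A subspace compared with itself gives a mean cosine of 1, while unrelated subspaces in a high-dimensional space give lower values.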

Functions

fit

fit(
    embeddings: ndarray, group_labels: Sequence
) -> GeometricShortcutAnalyzer

Fit analyzer on embeddings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| embeddings | ndarray | (n_samples, embedding_dim) array. | required |
| group_labels | Sequence | Sequence of group identifiers aligned with embeddings. | required |

Source code in shortcut_detect/geometric/geometric/src/detector.py
def fit(self, embeddings: np.ndarray, group_labels: Sequence) -> GeometricShortcutAnalyzer:
    """
    Fit analyzer on embeddings.

    Args:
        embeddings: (n_samples, embedding_dim) array.
        group_labels: Sequence of group identifiers aligned with embeddings.
    """
    X = np.asarray(embeddings)
    if X.ndim != 2:
        raise ValueError("embeddings must be 2D (n_samples, embedding_dim)")

    labels = np.asarray(group_labels)
    if labels.shape[0] != X.shape[0]:
        raise ValueError("embeddings and group_labels must have the same length")

    unique_groups = np.unique(labels)
    if unique_groups.shape[0] < 2:
        raise ValueError("At least two groups are required for geometric analysis")

    self.group_stats_ = self._compute_group_stats(X, labels, unique_groups)
    self.bias_pairs_ = self._compute_bias_pairs(X, labels)
    self.subspace_pairs_ = self._compute_subspace_pairs()
    self.summary_ = self._assess_risk()
    self._finalize_results()
    self._is_fitted = True
    return self

GeometricShortcutAnalyzer

Constructor

GeometricShortcutAnalyzer(
    n_components: int = 5,
    normalize: bool = True,
    random_state: int | None = None
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_components | int | 5 | PCA components per group |
| normalize | bool | True | Normalize embeddings |
| random_state | int or None | None | Random seed |

Methods

fit()

def fit(
    embeddings: np.ndarray,
    group_labels: np.ndarray
) -> GeometricShortcutAnalyzer

Analyze geometric structure of embeddings.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| embeddings | ndarray | Shape (n_samples, n_features) |
| group_labels | ndarray | Shape (n_samples,) |

Returns: self

transform()

def transform(embeddings: np.ndarray) -> np.ndarray

Project embeddings onto bias direction.

debias()

def debias(embeddings: np.ndarray) -> np.ndarray

Remove bias direction from embeddings.
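
The two operations above can be written out in plain NumPy. The following is a sketch under stated assumptions rather than the library's implementation: the helpers bias_direction, project, and remove_direction are hypothetical, and the bias direction is assumed to be the unit vector between two group centroids.

```python
import numpy as np


def bias_direction(X: np.ndarray, labels, group_a, group_b) -> np.ndarray:
    """Unit vector pointing from the centroid of group_b to that of group_a."""
    labels = np.asarray(labels)
    diff = X[labels == group_a].mean(axis=0) - X[labels == group_b].mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("group centroids coincide; direction is undefined")
    return diff / norm


def project(X: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each row of X onto a unit direction."""
    return X @ direction


def remove_direction(X: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract each row's component along a unit direction."""
    return X - np.outer(X @ direction, direction)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
labels = np.array([0] * 50 + [1] * 50)
X[labels == 1, 0] += 3.0  # plant a group-dependent offset

d = bias_direction(X, labels, 1, 0)
X_clean = remove_direction(X, d)
residual = project(X_clean, d)  # numerically zero for every row
```

Removing a single direction only eliminates the linear component along it; group information spread across other directions is left in place.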

Attributes (after fit)

| Attribute | Type | Description |
| --- | --- | --- |
| bias_direction_ | ndarray | Unit vector between group centroids |
| bias_effect_size_ | float | Cohen's d along bias direction |
| subspace_overlap_ | float | Principal angle overlap (0-1) |
| group_centroids_ | dict | Centroid per group |
| group_pca_ | dict | PCA model per group |
| projections_ | dict | Projections per group |
| summary_ | dict[str, str] | Human-readable summary |
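
To make "Cohen's d along bias direction" concrete, here is a small NumPy sketch. It assumes the pooled-standard-deviation form of Cohen's d and a two-group setting; the helper cohens_d is hypothetical and the analyzer may compute the value differently.

```python
import numpy as np


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d between two 1D samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)
    ) / (n_a + n_b - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


rng = np.random.default_rng(1)
group_a = rng.normal(size=(300, 8))
group_b = rng.normal(size=(300, 8))
group_b[:, 0] += 2.0  # shift one coordinate to separate the groups

# Unit vector between the two group centroids.
direction = group_b.mean(axis=0) - group_a.mean(axis=0)
direction /= np.linalg.norm(direction)

# Effect size of the 1D projections onto that direction.
effect = cohens_d(group_b @ direction, group_a @ direction)
```

Under these assumptions, an effect above the default effect_threshold of 0.8 would correspond to a large, easily separable gap between the groups along that direction.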

Usage Examples

Basic Usage

from shortcut_detect import GeometricShortcutAnalyzer

analyzer = GeometricShortcutAnalyzer(n_components=5)
analyzer.fit(embeddings, group_labels)

print(analyzer.summary_)
print(f"Effect size: {analyzer.bias_effect_size_:.2f}")
print(f"Subspace overlap: {analyzer.subspace_overlap_:.2f}")
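
The snippet above assumes that embeddings and group_labels already exist. For experimentation, a synthetic pair with a planted group offset can be built as follows. The sizes, group names, and offset are illustrative choices only; the shapes are picked to satisfy the checks in fit (2D embeddings, matching lengths, at least two groups) and the default min_group_size of 20.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_group, embedding_dim = 250, 32

# (n_samples, embedding_dim) array of Gaussian noise.
embeddings = rng.normal(size=(2 * n_per_group, embedding_dim))

# One label per row: the first 250 rows are "a", the rest "b".
group_labels = np.repeat(["a", "b"], n_per_group)

# Plant a shortcut: shift group "b" along the first coordinate.
embeddings[group_labels == "b", 0] += 1.5
```

These arrays can then be passed to fit in place of real model embeddings.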

Debiasing

analyzer.fit(embeddings, group_labels)

# Remove bias direction
embeddings_debiased = analyzer.debias(embeddings)

# Verify debiasing
analyzer_after = GeometricShortcutAnalyzer()
analyzer_after.fit(embeddings_debiased, group_labels)
print(f"Effect size after: {analyzer_after.bias_effect_size_:.2f}")

Visualization

import matplotlib.pyplot as plt
import numpy as np

# Project onto bias direction
projections = analyzer.transform(embeddings)

# Ensure labels are an array so the comparison below yields a boolean mask
labels = np.asarray(group_labels)

fig, ax = plt.subplots(figsize=(10, 4))
for group in np.unique(labels):
    mask = labels == group
    ax.hist(projections[mask], bins=50, alpha=0.5, label=f'Group {group}')
ax.legend()
ax.set_xlabel('Bias Direction Projection')
plt.savefig('bias_projections.png')

See Also