Bug: feature_names incorrectly labels features with wrong patterns and channels #2

@jakublipinski

Description

The feature_names property in xrocket/block.py incorrectly pairs patterns with channels and thresholds. This causes features to be mislabeled with the wrong channel information, making it impossible to correctly interpret which input channels contribute to each feature.

Impact

  • Features are labeled with incorrect channel information
  • Features computed from one channel are labeled as using a different channel
  • This breaks any downstream analysis that relies on knowing which channels each feature uses
  • In my case, features that were labeled as using an all-zero channel actually had variance, which revealed the bug

Root Cause

The bug is in xrocket/block.py in the feature_names property (around lines 142-159):

Current (incorrect) implementation:

for pattern, channels, threshold in zip(
    self.conv.patterns * self.num_combinations * self.num_thresholds,
    self.mix.combinations * self.num_thresholds,
    self.thresholds.thresholds,
):
    ...  # loop body omitted in this excerpt

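Problem: zip() pairs the three sequences element by element. Repeating a list with * concatenates whole copies of it (a, b, a, b, ...), so the pattern and channel entries cycle on every step instead of each being held fixed while the inner ones vary. The features are actually generated in nested order (pattern → channels → threshold), so the names end up attached to the wrong features.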
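
The difference between the two orderings can be seen on a toy example. This is only a sketch: the lists `patterns` and `channels` below are made-up stand-ins, not XRocket's actual attributes or sizes, and the thresholds are left out to keep it short.

```python
# Toy illustration only: these lists are hypothetical stand-ins,
# not the real contents of self.conv.patterns or self.mix.combinations.
patterns = ["A", "B"]
channels = ["ch0", "ch1", "ch2"]

# Nested order: each pattern is held fixed while the channels cycle.
nested = [(p, c) for p in patterns for c in channels]

# zip over repeated lists: `patterns * 3` is A, B, A, B, A, B,
# so the pattern changes on every step instead of every third step.
zipped = list(zip(patterns * len(channels), channels * len(patterns)))

# Positions where the two orderings disagree.
mismatched = [i for i, (n, z) in enumerate(zip(nested, zipped)) if n != z]

for i, (n, z) in enumerate(zip(nested, zipped)):
    print(i, n, z, "" if n == z else "<- mismatch")
print(mismatched)  # [1, 4]
```

In this toy case both orderings contain the same set of pairs, but at positions 1 and 4 the zip-based list names a different pattern than the nested one, which is the kind of mislabeling described above.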

Minimal Reproducible Example

import torch
import numpy as np
from xrocket import XRocket

# Create data with 2 channels: one random, one all zeros
np.random.seed(42)
data = []
for _ in range(5):
    sample = np.zeros((2, 100))
    sample[0, :] = np.random.randn(100)  # Channel 0: random
    sample[1, :] = 0.0                   # Channel 1: zeros
    data.append(torch.FloatTensor(sample))

# Initialize and fit XRocket
rocket = XRocket(in_channels=2, max_kernel_span=100, combination_order=1, 
                 feature_cap=100, kernel_length=3, max_dilations=2)
rocket.fit(data[0].unsqueeze(0))

# Generate embeddings
embeddings = np.array([rocket(x.unsqueeze(0)).numpy().squeeze() for x in data])

# Check features labeled as using "only channel 1" (the zero channel)
for i in range(min(20, embeddings.shape[1])):
    feature_name = rocket.feature_names[i]
    channels_str = feature_name[2]  # String like "[1.0, 0.0]" or "[0.0, 1.0]"
    
    if "[0.0, 1.0]" in channels_str:  # Labeled as using only channel 1
        values = embeddings[:, i]
        variance = np.var(values)
        print(f"Feature {i}: channels={channels_str}, variance={variance:.6f}")
        
        if variance > 1e-9:
            print("  ❌ Has variance despite zero-channel label - feature_names is WRONG!")

Expected: Features labeled as using only the zero channel should have variance = 0.0

Actual: These features have non-zero variance, proving they don't actually use the channel they're labeled with.

Proposed Fix

Replace the zip-based approach with proper nested loops in the feature_names property:

@property
def feature_names(self) -> list[tuple]:
    """(pattern, dilation, channels, threshold) tuples to identify features."""
    assert self.is_fitted, "module needs to be fitted for thresholds to be named"
    feature_names = []
    for pattern in self.conv.patterns:
        for channels in self.mix.combinations:
            for threshold in self.thresholds.thresholds:
                feature_names.append((
                    str(pattern),
                    self.dilation,
                    str(channels),
                    f"{threshold:.4f}",
                ))
    return feature_names

Verification

After applying the fix:

  1. Run the minimal example above
  2. Features labeled as using only the zero channel should now have variance = 0.0
  3. The feature_names should correctly match the actual feature generation order
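
The variance check in step 2 could be wrapped in a small helper so it can be rerun before and after the fix. This is a rough sketch, not part of XRocket: the function name, the `zero_label` default, and the tolerance are my own assumptions. It only relies on the channel string sitting at index 2 of each feature name tuple, as in the example above, and works on plain lists (e.g. `embeddings.tolist()`).

```python
from statistics import pvariance


def mislabeled_zero_channel_features(embeddings, feature_names,
                                     zero_label="[0.0, 1.0]", tol=1e-9):
    """Return indices of features labeled with the zero channel that still vary.

    Hypothetical helper: `embeddings` is a list of rows (one per sample),
    `feature_names` is a list of (pattern, dilation, channels, threshold) tuples.
    """
    suspects = []
    for i, name in enumerate(feature_names):
        if zero_label not in name[2]:
            continue  # not labeled as using only the zero channel
        column = [row[i] for row in embeddings]
        # Population variance, matching np.var's default behavior.
        if pvariance(column) > tol:
            suspects.append(i)
    return suspects


# Made-up data: feature 2 is labeled with the zero channel but varies.
names = [
    ("p", 1, "[1.0, 0.0]", "0.1000"),
    ("p", 1, "[0.0, 1.0]", "0.1000"),
    ("p", 1, "[0.0, 1.0]", "0.2000"),
]
rows = [[0.1, 0.0, 0.3], [0.5, 0.0, 0.7], [0.9, 0.0, 0.2]]
print(mislabeled_zero_channel_features(rows, names))  # [2]
```

With correct labels the returned list should be empty for the data in the minimal example.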

Environment

  • XRocket version: Commit 1511e81
  • Python version: 3.11.9
  • PyTorch version: 2.2.2
