Reference

Sevhena Walker edited this page Apr 5, 2025 · 1 revision

Smells Index

1. Cache Repeated Calls

The Hidden Cost of Redundancy
When a pure function (one that always returns the same output for identical inputs) is called multiple times with the same arguments within a limited scope, each invocation performs the same computation anew. This is computationally wasteful, like recalculating a Fibonacci number from scratch every time it's needed.

Why It Matters:

  • Energy Impact: Each redundant call consumes CPU cycles unnecessarily, translating directly to increased power consumption. In data processing pipelines, this can compound across millions of iterations.
  • Readability: Repeated calls obscure the developer's intent to reuse a value, making control flow harder to follow.

Technical Nuances:

  • Only applies to pure functions – those without side effects or reliance on external state.
  • The scope of caching is local (within a function body), unlike @lru_cache which persists across calls.
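The distinction can be sketched as follows (the `expensive` function and the `call_count` counter are hypothetical, purely for illustration):

```python
from functools import lru_cache

call_count = 0  # tracks how often the hypothetical pure function runs

def expensive(n):
    """Hypothetical pure function used for illustration."""
    global call_count
    call_count += 1
    return n * n

# Local caching: reuse the value within one scope
def local_reuse(n):
    result = expensive(n)  # computed once per call to local_reuse
    return result + result

# Cross-call caching: @lru_cache remembers results between invocations
@lru_cache(maxsize=None)
def expensive_cached(n):
    return n * n

local_reuse(3)       # expensive() runs here...
local_reuse(3)       # ...and runs again: local caching does not persist
expensive_cached(3)  # computed once
expensive_cached(3)  # served from the cache, no recomputation
```

Local caching is the lighter-weight fix: it adds no memory that outlives the function, while `@lru_cache` trades persistent memory for cross-call savings.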

Example Evolution:

# Before: Energy-wasting repetition
report = generate_report(user)  
summary = f"Data: {generate_report(user)}"  # Repeats identical work

# After: Explicit value reuse
cached_report = generate_report(user)  # Single computation
report = cached_report  
summary = f"Data: {cached_report}"

2. Long Lambda Functions

The Anonymous Code Smell
Lambda functions are meant for short, one-off operations. When they grow beyond a simple expression, they defeat their purpose by becoming:

  • Untestable: Cannot be unit tested in isolation
  • Undebuggable: Appear as <lambda> in stack traces
  • Unmaintainable: Encourage logic duplication

The Human Factor:
Developers often start with a "small" lambda that accretes complexity over time. What begins as lambda x: x+1 morphs into a 5-line expression handling edge cases, better served as a named function.

Refactoring Philosophy:

# Before: Lambda doing too much
users.sort(key=lambda u: (u.last_name, u.first_name, u.age // 10))

# After: Self-documenting named function
def user_sort_key(user):
    """Sort by last name, then first name, then age group."""
    return (user.last_name, user.first_name, user.age // 10)

users.sort(key=user_sort_key)

Key Indicators:

  • Contains conditionals (if/else)
  • Multiple operations (more than one comma-separated expression in a tuple)
  • Exceeds 50 characters
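The first indicator can be illustrated with a conditional lambda (the discount rule here is hypothetical):

```python
# Before: conditional logic crammed into a lambda
price_with_discount = lambda p: p * 0.9 if p > 100 else p

# After: a named function that can be unit tested and shows up in tracebacks
def apply_discount(price):
    """Apply a 10% discount to prices above 100."""
    return price * 0.9 if price > 100 else price
```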

3. Use Generators in Predicates

The Eager Evaluation Trap
any() and all() can short-circuit (stop evaluating after the first False/True result), but passing them a list comprehension forces the entire list to be built before any checking begins. This is like reading an entire book before answering whether it contains the letter 'A'.

Energy Impact List comprehensions materialize all data in memory simultaneously, requiring upfront RAM allocation and triggering more frequent garbage collection. Generators stream data through the CPU cache linearly, reducing memory subsystem power draw.

Deep Dive:

# Before: Eager list creation
if any([x > threshold for x in sensor_readings]):  # Processes ALL readings first
    ...

# After: Lazy generator
if any(x > threshold for x in sensor_readings):  # Stops at first violation
    ...

Why It Matters:

  • Memory: A list of 1M items consumes ~8MB RAM even if the first item matches
  • Performance: For early matches, generators avoid processing 99.9% of elements
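The memory difference is directly observable with `sys.getsizeof` (exact byte counts vary by platform and Python version, so the comments are indicative only):

```python
import sys

readings = range(1_000_000)

as_list = [x > 10 for x in readings]   # materializes 1,000,000 booleans
as_gen = (x > 10 for x in readings)    # fixed-size generator object

list_bytes = sys.getsizeof(as_list)    # megabytes: one pointer per element
gen_bytes = sys.getsizeof(as_gen)      # a couple hundred bytes, regardless of input size
```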

When to Apply:

  • Inside any()/all() calls
  • With large or unbounded iterables
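With an unbounded iterable, a generator is the only workable form; a sketch using `itertools.count`:

```python
from itertools import count

# any() short-circuits as soon as one square exceeds 1000 (at n = 32),
# so even an infinite stream of numbers is safe to scan.
found = any(n * n > 1000 for n in count(1))

# The list-comprehension form would never terminate:
# any([n * n > 1000 for n in count(1)])
```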

4. Long Element Chains (Nested Subscripts)

Detects repeated access into nested dictionaries, where the shallowest depth used by any access determines how far the structure can be flattened. If code accesses dict["a"]["b"]["c"] (3 levels) and dict["a"]["b"]["c"]["d"] (4 levels), the dictionary is flattened to 3 levels so that every existing access still works.

Energy Rationale:
Each nested subscript requires:

  1. A hash computation for each [] operation
  2. Multiple pointer dereferences
  3. Sequential memory fetches

Flattening reduces this to a single hash lookup for the final key, cutting energy use per access.

Technical Implementation:

# Before: Variable depth access
config["user"]["prefs"]["theme"]       # 3 levels  
config["user"]["prefs"]["theme"]["dark"]  # 4 levels

# Phase 1: Identify the shallowest access depth (3 levels)
# Phase 2: Flatten to that depth
flattened = {
    "user_prefs_theme": config["user"]["prefs"]["theme"]
}

# After: Unified access
flattened["user_prefs_theme"]          # All accesses now 1 operation  
flattened["user_prefs_theme"]["dark"]  # From 4 → 2 levels

Example Transformation:

# Original
data = {
    "a": {
        "b": {
            "c": 1, 
            "d": {
                "e": 2  # Will remain nested
            }
        }
    }
}

# Flattened (to depth 3)
flattened = {
    "a_b_c": 1,
    "a_b_d": {"e": 2}  # Not flattened further
}

# Access becomes:
flattened["a_b_c"]       # Instead of data["a"]["b"]["c"]  
flattened["a_b_d"]["e"]  # Mixed-depth access still works
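The flattening step described above can be sketched as a small recursive helper (a hypothetical function, assuming string keys and "_" as the separator):

```python
def flatten_to_depth(d, depth, sep="_"):
    """Collapse up to `depth` levels of nesting into single joined keys.

    Values deeper than `depth` stay nested, so mixed-depth access
    (like flattened["a_b_d"]["e"]) keeps working.
    """
    if depth <= 1 or not isinstance(d, dict):
        return d
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Flatten the subtree one level shallower, then join the keys
            for sub_key, sub_value in flatten_to_depth(value, depth - 1, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

data = {"a": {"b": {"c": 1, "d": {"e": 2}}}}
flat = flatten_to_depth(data, 3)
# flat == {"a_b_c": 1, "a_b_d": {"e": 2}}
```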

5. Long Message Chains (Method Chaining)

The Train Wreck Pattern
Method chains (obj.a().b().c()) create temporal coupling – each link depends on the previous method's output shape. Changes to any intermediate method can break the entire chain.

Energy Impact Method chains create temporary intermediate objects that live just long enough to serve the next call. These "object thrashing" patterns keep the garbage collector running hot, wasting energy on short-lived allocations.

Design Implications:

# Before: Tight coupling
result = (dataset
          .clean()    # Must return cleaned dataset
          .filter()   # Must expose filter()
          .sort()     # Must support sorting
         )

# After: Explicit steps
cleaned = dataset.clean()          # Can validate here
filtered = cleaned.filter(...)     # Debuggable
result = filtered.sort(...)

When Chaining Works:

  • With fluent interfaces designed for it (e.g., Pandas, SQLAlchemy)
  • When methods return self for builder patterns
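The builder-pattern case can be sketched with a hypothetical `QueryBuilder`, where every method returns self by design:

```python
class QueryBuilder:
    """Fluent interface designed for chaining: every method returns self."""

    def __init__(self):
        self.parts = []

    def select(self, columns):
        self.parts.append(f"SELECT {columns}")
        return self  # returning self is what makes chaining safe here

    def where(self, condition):
        self.parts.append(f"WHERE {condition}")
        return self

    def build(self):
        return " ".join(self.parts)

query = QueryBuilder().select("name").where("age > 30").build()
# query == "SELECT name WHERE age > 30"
```

Unlike the dataset chain above, each link returns the same object, so no intermediate temporaries are created and no method's output shape can break the chain.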

6. String Concatenation in Loops

Energy Impact Because Python strings are immutable, each += generally copies the entire existing string into newly allocated memory. Building a large string this way turns an O(n) job into O(n²), with the extra memory-bus traffic consuming correspondingly more energy.

Under the Hood:

# Before: Hidden inefficiency
s = ""
for _ in range(10000):
    s += "x"  # Copies entire string each time

# After: Linear time
parts = []
for _ in range(10000):
    parts.append("x")  # O(1) appends
s = "".join(parts)     # Single allocation
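When pieces are produced incrementally and a file-like interface is convenient, `io.StringIO` gives the same linear behavior as the join() pattern:

```python
import io

buffer = io.StringIO()
for _ in range(10000):
    buffer.write("x")    # amortized O(1), no repeated full-string copies
s = buffer.getvalue()    # one final string
```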

7. Member-Ignoring Method

The Orphaned self
Instance methods that don't use self are architectural dead weight – they pretend to belong to a class but operate independently. This misleads maintainers about dependencies.

Energy Impact Unnecessary instance methods still incur the overhead of Python's method binding protocol, including creation of temporary bound method objects. Static methods avoid this dispatch machinery.

Refactoring Guidance:

# Before: Misleading instance method
class TextUtils:
    def capitalize(self, text):  # Accepts self but never uses it
        return text.capitalize()

# After: Honest static method
class TextUtils:
    @staticmethod
    def capitalize(text):
        return text.capitalize()

When to Keep:

  • Methods overriding superclass templates
  • Planned future use of instance state
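The binding overhead is observable directly: accessing an instance method builds a fresh bound-method object each time, while a staticmethod hands back the same plain function (hypothetical `Greeter` class for illustration):

```python
class Greeter:
    def bound_hello(self):      # goes through the binding protocol
        return "hello"

    @staticmethod
    def static_hello():         # skips the binding step entirely
        return "hello"

g = Greeter()

same_bound = g.bound_hello is g.bound_hello            # False: fresh wrapper per access
same_static = g.static_hello is Greeter.static_hello   # True: one plain function
```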

8. Long Parameter Lists

The Signature Smell
Functions with many parameters often try to do too much. They become hard to call correctly and resist modification.

Energy Impact Each additional parameter adds stack-frame setup and memory writes to every call. For small functions with 6+ parameters, this calling overhead can rival the energy spent on the function's actual work.

Design Alternatives:

# Before: Parameter soup
def render_chart(data, title, x_label, y_label, width, height, color, style):
    ...

# After: Responsibility grouping
class ChartOptions:
    def __init__(self, title, labels, dimensions, style):
        self.title = title
        self.labels = labels
        self.dimensions = dimensions
        self.style = style

def render_chart(data, options: ChartOptions):
    ...

Rule of Thumb:
If you can't remember the parameter order without looking, the list is too long.
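The same grouping can be written more compactly with `dataclasses` (a sketch; the field names follow the example above, and the default style value is a hypothetical choice):

```python
from dataclasses import dataclass

@dataclass
class ChartOptions:
    title: str
    labels: tuple        # e.g. (x_label, y_label)
    dimensions: tuple    # e.g. (width, height)
    style: str = "line"  # a sensible default trims call sites further

def render_chart(data, options: ChartOptions):
    ...  # drawing logic elided

opts = ChartOptions("Sales", ("Month", "Revenue"), (800, 600))
render_chart([], opts)
```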
