Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
420 changes: 420 additions & 0 deletions CODE_SMELLS.md

Large diffs are not rendered by default.

73 changes: 73 additions & 0 deletions docs/issues/01-duplicate-month-mappings.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# Code Smell: Duplicate Month Mapping Dictionaries

**Labels:** `code-quality`, `refactoring`, `medium-priority`
**Type:** Code Duplication

## Description

The Spanish-to-English month mapping dictionary is duplicated across multiple parser files, violating the DRY (Don't Repeat Yourself) principle. This creates maintenance burden and risk of inconsistency.

## Locations

The same or similar month mapping appears in:
- `core/parsers/hey_banco.py` (lines 143-156)
- `core/parsers/rappi.py` (lines 22-35)
- `core/parsers/banorte.py` (lines 36-49)
- `core/parsers/nubank.py` (lines 23-36)

## Current Code

Each parser has variations of:
```python
SPANISH_TO_ENGLISH_MONTH = {
"ene": "Jan",
"feb": "Feb",
"mar": "Mar",
# ... etc
}
```

## Impact

- **Maintenance Burden:** Changes must be replicated in 4 locations
- **Risk of Inconsistency:** If one mapping is updated and others aren't
- **Code Bloat:** Unnecessary duplication across files
- **Testing Overhead:** Same logic needs to be tested multiple times

## Proposed Solution

1. Create a shared date utilities module:
```python
# core/parsers/date_utils.py
SPANISH_TO_ENGLISH_MONTH = {
"ene": "Jan",
"enero": "January",
"feb": "Feb",
"febrero": "February",
# ... complete mapping
}

def normalize_spanish_date(date_str: str) -> str:
"""Normalize Spanish date string to English format."""
# Common date normalization logic
pass
```

2. Update all parsers to import from the shared module:
```python
from core.parsers.date_utils import SPANISH_TO_ENGLISH_MONTH
```

3. Consider extracting common date parsing logic into reusable functions

## Acceptance Criteria

- [ ] Create `core/parsers/date_utils.py` with comprehensive month mappings
- [ ] Update all 4 parser files to import the shared constant
- [ ] Remove duplicate definitions from parser files
- [ ] Verify all parsers still work correctly with existing test data
- [ ] Add unit tests for the shared date utilities

## Related Issues

- #2 (Duplicate Date Normalization Logic)
119 changes: 119 additions & 0 deletions docs/issues/02-duplicate-date-normalization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
# Code Smell: Duplicate Date Normalization Logic

**Labels:** `code-quality`, `refactoring`, `medium-priority`
**Type:** Code Duplication

## Description

The date normalization logic (Spanish months to English, AM/PM conversion) is duplicated within the `HeyBancoParser` class. The same ~27 lines of code appear in two different methods.

## Locations

- `core/parsers/hey_banco.py` - `_parse_outgoing_transfer()` (lines 143-169)
- `core/parsers/hey_banco.py` - `_parse_credit_card_payment()` (lines 226-253)

## Current Code

Both methods contain identical normalization logic:
```python
# Normalize Spanish months and AM/PM to English equivalents
month_map = {
"enero": "January",
"febrero": "February",
# ... etc
}

normalized_date_str = date_str
for spanish, english in month_map.items():
normalized_date_str = normalized_date_str.replace(spanish, english)
normalized_date_str = normalized_date_str.replace(
spanish.capitalize(), english
)

# Normalize "a. m." and "p. m." to "AM" / "PM"
normalized_date_str = normalized_date_str.replace("a. m.", "AM").replace(
"p. m.", "PM"
)

try:
datetime_obj = date_parser(normalized_date_str, dayfirst=True)
except (ValueError, TypeError, OverflowError) as e:
logger.error(...)
```

## Impact

- **Code Duplication:** ~27 lines duplicated exactly
- **Bug Risk:** Bug fixes need to be applied in multiple places
- **Readability:** Makes methods longer and harder to understand
- **Maintenance:** Changes to date parsing logic require updates in 2 places

## Proposed Solution

Extract the normalization logic into a private helper method:

```python
def _normalize_spanish_date(self, date_str: str) -> str:
"""
Normalize Spanish date string to English format.

Converts:
- Spanish month names to English
- "a. m." / "p. m." to "AM" / "PM"

Args:
date_str: Raw date string in Spanish format

Returns:
Normalized date string ready for parsing
"""
month_map = {
"enero": "January",
"febrero": "February",
"marzo": "March",
"abril": "April",
"mayo": "May",
"junio": "June",
"julio": "July",
"agosto": "August",
"septiembre": "September",
"octubre": "October",
"noviembre": "November",
"diciembre": "December",
}

normalized = date_str
for spanish, english in month_map.items():
normalized = normalized.replace(spanish, english)
normalized = normalized.replace(spanish.capitalize(), english)

normalized = normalized.replace("a. m.", "AM").replace("p. m.", "PM")
return normalized

def _parse_date(self, date_str: str) -> datetime | None:
"""Parse normalized date string with error handling."""
try:
normalized = self._normalize_spanish_date(date_str)
return date_parser(normalized, dayfirst=True)
except (ValueError, TypeError, OverflowError) as e:
logger.error("Failed to parse date: %s -> %s", date_str, e)
return None
```

Then update both methods to use:
```python
datetime_obj = self._parse_date(date_str.strip())
```

## Acceptance Criteria

- [ ] Create `_normalize_spanish_date()` helper method
- [ ] Create `_parse_date()` helper method with error handling
- [ ] Update `_parse_outgoing_transfer()` to use helper methods
- [ ] Update `_parse_credit_card_payment()` to use helper methods
- [ ] Verify both methods produce identical results with test data
- [ ] Reduce method lengths by ~20 lines each

## Related Issues

- #1 (Duplicate Month Mappings) - Should be resolved together
90 changes: 90 additions & 0 deletions docs/issues/03-magic-number-max-results.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Code Smell: Magic Number (maxResults=500)

**Labels:** `code-quality`, `refactoring`, `low-priority`
**Type:** Magic Number

## Description

The Gmail API call in `fetch_emails.py` uses a hardcoded value `maxResults=500` without explanation or named constant.

## Location

- `core/fetch_emails.py` (line 23)

## Current Code

```python
response = (
service.users()
.messages()
.list(userId="me", q=query, maxResults=500, pageToken=page_token)
.execute()
)
```

## Impact

- **Unclear Intent:** Why 500? Is it the API limit? Performance tuning?
- **Maintainability:** Harder to adjust if Gmail API limits change
- **Not Configurable:** Cannot be changed without code modification
- **Documentation:** No comment explaining the choice

## Proposed Solution

Define as a named constant at the module level with documentation:

```python
# Gmail API allows up to 500 messages per page
# Using maximum to minimize API calls
MAX_MESSAGES_PER_PAGE = 500

def list_messages(service, query: str = " ", page_token: str | None = None):
"""Yield all message metadata matching the query, handling pagination automatically."""
while True:
response = (
service.users()
.messages()
.list(userId="me", q=query, maxResults=MAX_MESSAGES_PER_PAGE, pageToken=page_token)
.execute()
)
# ... rest of code
```

Or make it configurable:

```python
DEFAULT_PAGE_SIZE = 500

def list_messages(
service,
query: str = " ",
page_token: str | None = None,
page_size: int = DEFAULT_PAGE_SIZE
):
"""
Yield all message metadata matching the query, handling pagination automatically.

Args:
service: Authenticated Gmail API service
query: Gmail search query
page_token: Pagination token for continuing previous search
page_size: Number of messages per page (default: 500, max: 500)
"""
if page_size > 500:
logger.warning("Gmail API maximum is 500 messages per page, using 500")
page_size = 500

# ... rest of code with maxResults=page_size
```

## Acceptance Criteria

- [ ] Define `MAX_MESSAGES_PER_PAGE` constant at module level
- [ ] Add comment explaining the value (Gmail API limit)
- [ ] Replace magic number with named constant
- [ ] Consider making it a parameter for testing flexibility
- [ ] Update any related documentation

## Additional Notes

According to [Gmail API documentation](https://developers.google.com/gmail/api/reference/rest/v1/users.messages/list), the maximum value for `maxResults` is 500, so this is already using the optimal value. The issue is purely about code clarity and maintainability.
Loading