diff --git a/CODE_SMELLS.md b/CODE_SMELLS.md new file mode 100644 index 0000000..e68e8bc --- /dev/null +++ b/CODE_SMELLS.md @@ -0,0 +1,420 @@ +# Code Smells Identified in Expense Tracker + +This document lists code quality issues (code smells) identified in the codebase. Each item includes a description, location, severity, and suggested remediation. + +--- + +## 1. Duplicate Code: Month Mapping Dictionaries + +**Severity:** Medium +**Category:** Code Duplication + +### Description +The Spanish-to-English month mapping dictionary is duplicated across multiple parser files, violating the DRY (Don't Repeat Yourself) principle. + +### Locations +- `core/parsers/hey_banco.py` (lines 143-156) +- `core/parsers/rappi.py` (lines 22-35) +- `core/parsers/banorte.py` (lines 36-49) +- `core/parsers/nubank.py` (lines 23-36) + +### Impact +- Maintenance burden: Changes must be replicated in 4 locations +- Risk of inconsistency if one mapping is updated and others aren't +- Unnecessary code bloat + +### Suggested Fix +Create a shared constants module for date utilities: +```python +# core/parsers/date_utils.py +SPANISH_TO_ENGLISH_MONTH = { + "ene": "Jan", + "enero": "January", + # ... etc +} +``` +Then import this constant in all parsers that need it. + +--- + +## 2. Duplicate Code: Date Normalization Logic + +**Severity:** Medium +**Category:** Code Duplication + +### Description +The date normalization logic (Spanish months to English, AM/PM conversion) is duplicated in two methods within `HeyBancoParser`. + +### Locations +- `core/parsers/hey_banco.py` - `_parse_outgoing_transfer()` (lines 143-169) +- `core/parsers/hey_banco.py` - `_parse_credit_card_payment()` (lines 226-253) + +### Impact +- Same code appears twice with identical logic +- Bug fixes need to be applied in multiple places +- Makes the methods longer and harder to read + +### Suggested Fix +Extract the normalization logic into a private helper method: +```python +def _normalize_spanish_date(self, date_str: str) -> str: + # Month mapping and normalization logic + pass +``` + +--- + +## 3. Magic Number: maxResults=500 + +**Severity:** Low +**Category:** Magic Numbers + +### Description +Hardcoded value `maxResults=500` in the Gmail API call without explanation. + +### Location +- `core/fetch_emails.py` (line 23) + +### Impact +- Unclear why 500 was chosen +- Makes it harder to adjust if API limits change +- Not configurable without code modification + +### Suggested Fix +Define as a named constant at module level: +```python +MAX_MESSAGES_PER_PAGE = 500 # Gmail API maximum +``` + +--- + +## 4. Long Method: _parse_credit_card_purchase + +**Severity:** Medium +**Category:** Long Method + +### Description +The `_parse_credit_card_purchase` method in `HeyBancoParser` is 67 lines long and handles multiple responsibilities: validation, amount parsing, description extraction, date parsing, and card type determination. + +### Location +- `core/parsers/hey_banco.py` (lines 277-339) + +### Impact +- Difficult to test individual parts +- Harder to understand at a glance +- Multiple reasons to change (violates Single Responsibility Principle) + +### Suggested Fix +Break down into smaller methods: +- `_extract_amount(text)` +- `_extract_merchant(text)` +- `_extract_transaction_date(text)` +- `_determine_card_type(text)` + +--- + +## 5. Feature Envy: Return Type in ParserHelper + +**Severity:** Low +**Category:** Design Smell + +### Description +The `get_parser_for_email` method returns a union of specific parser types instead of the base `BaseBankParser` type, creating tight coupling. + +### Location +- `core/parsers/parser_helper.py` (lines 33-51) + +### Impact +- Every time a new parser is added, the return type annotation must be updated +- Violates Open/Closed Principle +- Creates unnecessary coupling to concrete implementations + +### Suggested Fix +Change return type to base class: +```python +def get_parser_for_email(from_header: str) -> BaseBankParser | None: +``` + +--- + +## 6. Missing Explicit Return Statement + +**Severity:** Low +**Category:** Code Clarity + +### Description +The `get_parser_for_email` method doesn't have an explicit `return None` at the end. + +### Location +- `core/parsers/parser_helper.py` (line 51) + +### Impact +- Implicit behavior is less clear +- May confuse developers unfamiliar with Python's implicit None return + +### Suggested Fix +Add explicit return statement: +```python +return PARSERS.get(bank) +return None # No matching parser found +``` + +--- + +## 7. Dead Code: PAYPAL Bank Definition + +**Severity:** Low +**Category:** Dead Code + +### Description +`SupportedBanks.PAYPAL` is defined in the enum and has email addresses configured, but the PayPalParser is commented out. + +### Locations +- `constants/banks.py` (line 22) - PAYPAL enum value defined +- `constants/banks.py` (line 38) - PAYPAL emails configured +- `core/parsers/parser_helper.py` (line 24) - PayPalParser commented out + +### Impact +- Confusion about whether PayPal is actually supported +- Unused constants taking up space +- May cause issues if someone tries to use PayPal + +### Suggested Fix +Either: +1. Remove PAYPAL from constants if not ready, or +2. Implement PayPalParser if support is intended + +--- + +## 8. Variable Shadowing Bug: Duplicate Assignment + +**Severity:** High +**Category:** Bug + +### Description +Line 78 in `hey_banco.py` has a duplicate assignment: `amount = amount = float(...)`, which is clearly a mistake. + +### Location +- `core/parsers/hey_banco.py` (line 78) + +### Code +```python +amount = amount = float(amount_match.group(1).replace(",", "")) +``` + +### Impact +- This is a clear typo/bug +- While functionally it works, it indicates careless coding +- May confuse code reviewers and maintainers + +### Suggested Fix +```python +amount = float(amount_match.group(1).replace(",", "")) +``` + +--- + +## 9. Double Session Commit + +**Severity:** Medium +**Category:** Performance / Design Issue + +### Description +The `save_transaction` method commits the database session twice: once for the bank (line 33) and once for the transaction (line 50). + +### Location +- `core/services/transaction_service.py` (lines 33, 50) + +### Impact +- Inefficient: Two database commits when one would suffice +- Breaks transaction atomicity partially +- The bank commit happens before the transaction is validated + +### Suggested Fix +Commit only once at the end, or rely on the context manager's automatic commit: +```python +bank = Bank(name=transaction.bank_name) +session.add(bank) +session.flush() # Get ID without committing +# ... rest of code +# Let context manager handle the single commit +``` + +--- + +## 10. Logging Bug: Duplicate category_id + +**Severity:** Low +**Category:** Bug + +### Description +Line 60 logs `category_id` twice instead of logging both `category_id` and `subcategory_id`. + +### Location +- `core/services/transaction_service.py` (line 60) + +### Code +```python +logger.info( + "Transaction added [ID: %s] | %s | %s | %s | Category: %s | Subcategory: %s", + tx.transaction_id, + tx.amount, + tx.date, + tx.description, + tx.category_id, + tx.category_id, # Should be tx.subcategory_id +) +``` + +### Impact +- Incorrect logging output +- Subcategory information is never logged +- Makes debugging harder + +### Suggested Fix +```python +tx.category_id, +tx.subcategory_id, +``` + +--- + +## 11. Data Clump: email_message Dictionary + +**Severity:** Medium +**Category:** Data Clump + +### Description +The `email_message` dictionary is passed around with keys accessed inconsistently. Different parsers access different combinations of keys (`body_html`, `body_plain`, `subject`, `date`). + +### Locations +- Multiple parser files access `email_message` with varying patterns +- No clear data contract or validation + +### Impact +- Unclear what keys are guaranteed to exist +- Prone to KeyError if dict structure changes +- Difficult to refactor + +### Suggested Fix +Create an `EmailMessage` dataclass or Pydantic model: +```python +@dataclass +class EmailMessage: + id: str + subject: str + from_address: str + date: str + body_plain: str = "" + body_html: str = "" +``` + +--- + +## 12. Class Variable as Instance Variable + +**Severity:** Low +**Category:** Design Issue + +### Description +`Database.engine` is defined as a class variable (line 29) but is used as an instance variable, which could lead to unexpected sharing between instances. + +### Location +- `database/database.py` (line 29) + +### Code +```python +class Database: + engine = None # Class variable + + def __init__(self, db_url="sqlite:///expenses.db"): + self.engine = create_engine(...) # Assigned as instance variable +``` + +### Impact +- Confusing: Appears to be class variable but used as instance +- Could cause bugs if multiple Database instances are created with different URLs +- Not following Python best practices + +### Suggested Fix +Remove the class variable declaration: +```python +class Database: + def __init__(self, db_url="sqlite:///expenses.db"): + self.engine = create_engine(...) +``` + +--- + +## 13. Inconsistent Error Handling + +**Severity:** Medium +**Category:** Error Handling + +### Description +Error handling is inconsistent across the codebase. Some methods return `None` on error, some log and continue, and error handling patterns vary. + +### Examples +- `_decode_payload()` returns placeholder string on error +- Parser methods return `None` on parsing failure +- `save_transaction()` returns `None` or int +- Some exceptions are caught broadly, others specifically + +### Impact +- Unclear error handling contract +- Difficult to debug issues +- May hide important errors +- Inconsistent user experience + +### Suggested Fix +Define a consistent error handling strategy: +1. Use custom exceptions for different error types +2. Document what each method returns on failure +3. Consider using Result types or Optional types consistently + +--- + +## 14. Inconsistent Body Selection Logic + +**Severity:** Medium +**Category:** Code Inconsistency / Bug + +### Description +Different parsers have inconsistent logic for choosing between `body_html` and `body_plain`. + +### Locations +- `hey_banco.py` (lines 46-49): Prefers HTML, falls back to plain +- `rappi.py` (lines 52-55): Fetches plain twice (bug) +- `nubank.py` (lines 63-64): Only uses plain +- `banorte.py` (lines 80-83): Prefers HTML, falls back to plain + +### Impact +- Inconsistent behavior across parsers +- The Rappi parser has a clear bug (fetches body_plain twice) +- Makes testing harder due to varying assumptions + +### Suggested Fix +Implement a consistent body selection strategy, possibly in the base parser class. + +--- + +## Summary + +**Total Issues:** 14 +**High Severity:** 1 +**Medium Severity:** 8 +**Low Severity:** 5 + +### Priority Recommendations +1. **Fix Bug #8** (Variable shadowing) - Quick win, clear bug +2. **Fix Bug #10** (Logging bug) - Quick win, clear bug +3. **Fix Bug #14** (Rappi body selection) - Quick win, clear bug +4. **Refactor #1** (Duplicate month mappings) - High impact on maintainability +5. **Refactor #2** (Duplicate date normalization) - Improves HeyBancoParser +6. **Fix #9** (Double session commit) - Performance and design improvement +7. Address remaining issues based on development priorities + +--- + +*Generated: 2026-01-13* +*Repository: javrr-ui/expense-tracker* diff --git a/docs/issues/01-duplicate-month-mappings.md b/docs/issues/01-duplicate-month-mappings.md new file mode 100644 index 0000000..54fad4d --- /dev/null +++ b/docs/issues/01-duplicate-month-mappings.md @@ -0,0 +1,73 @@ +# Code Smell: Duplicate Month Mapping Dictionaries + +**Labels:** `code-quality`, `refactoring`, `medium-priority` +**Type:** Code Duplication + +## Description + +The Spanish-to-English month mapping dictionary is duplicated across multiple parser files, violating the DRY (Don't Repeat Yourself) principle. This creates maintenance burden and risk of inconsistency. + +## Locations + +The same or similar month mapping appears in: +- `core/parsers/hey_banco.py` (lines 143-156) +- `core/parsers/rappi.py` (lines 22-35) +- `core/parsers/banorte.py` (lines 36-49) +- `core/parsers/nubank.py` (lines 23-36) + +## Current Code + +Each parser has variations of: +```python +SPANISH_TO_ENGLISH_MONTH = { + "ene": "Jan", + "feb": "Feb", + "mar": "Mar", + # ... etc +} +``` + +## Impact + +- **Maintenance Burden:** Changes must be replicated in 4 locations +- **Risk of Inconsistency:** If one mapping is updated and others aren't +- **Code Bloat:** Unnecessary duplication across files +- **Testing Overhead:** Same logic needs to be tested multiple times + +## Proposed Solution + +1. Create a shared date utilities module: + ```python + # core/parsers/date_utils.py + SPANISH_TO_ENGLISH_MONTH = { + "ene": "Jan", + "enero": "January", + "feb": "Feb", + "febrero": "February", + # ... complete mapping + } + + def normalize_spanish_date(date_str: str) -> str: + """Normalize Spanish date string to English format.""" + # Common date normalization logic + pass + ``` + +2. Update all parsers to import from the shared module: + ```python + from core.parsers.date_utils import SPANISH_TO_ENGLISH_MONTH + ``` + +3. Consider extracting common date parsing logic into reusable functions + +## Acceptance Criteria + +- [ ] Create `core/parsers/date_utils.py` with comprehensive month mappings +- [ ] Update all 4 parser files to import the shared constant +- [ ] Remove duplicate definitions from parser files +- [ ] Verify all parsers still work correctly with existing test data +- [ ] Add unit tests for the shared date utilities + +## Related Issues + +- #2 (Duplicate Date Normalization Logic) diff --git a/docs/issues/02-duplicate-date-normalization.md b/docs/issues/02-duplicate-date-normalization.md new file mode 100644 index 0000000..0b72ad1 --- /dev/null +++ b/docs/issues/02-duplicate-date-normalization.md @@ -0,0 +1,119 @@ +# Code Smell: Duplicate Date Normalization Logic + +**Labels:** `code-quality`, `refactoring`, `medium-priority` +**Type:** Code Duplication + +## Description + +The date normalization logic (Spanish months to English, AM/PM conversion) is duplicated within the `HeyBancoParser` class. The same ~27 lines of code appear in two different methods. + +## Locations + +- `core/parsers/hey_banco.py` - `_parse_outgoing_transfer()` (lines 143-169) +- `core/parsers/hey_banco.py` - `_parse_credit_card_payment()` (lines 226-253) + +## Current Code + +Both methods contain identical normalization logic: +```python +# Normalize Spanish months and AM/PM to English equivalents +month_map = { + "enero": "January", + "febrero": "February", + # ... etc +} + +normalized_date_str = date_str +for spanish, english in month_map.items(): + normalized_date_str = normalized_date_str.replace(spanish, english) + normalized_date_str = normalized_date_str.replace( + spanish.capitalize(), english + ) + +# Normalize "a. m." and "p. m." to "AM" / "PM" +normalized_date_str = normalized_date_str.replace("a. m.", "AM").replace( + "p. m.", "PM" +) + +try: + datetime_obj = date_parser(normalized_date_str, dayfirst=True) +except (ValueError, TypeError, OverflowError) as e: + logger.error(...) +``` + +## Impact + +- **Code Duplication:** ~27 lines duplicated exactly +- **Bug Risk:** Bug fixes need to be applied in multiple places +- **Readability:** Makes methods longer and harder to understand +- **Maintenance:** Changes to date parsing logic require updates in 2 places + +## Proposed Solution + +Extract the normalization logic into a private helper method: + +```python +def _normalize_spanish_date(self, date_str: str) -> str: + """ + Normalize Spanish date string to English format. + + Converts: + - Spanish month names to English + - "a. m." / "p. m." to "AM" / "PM" + + Args: + date_str: Raw date string in Spanish format + + Returns: + Normalized date string ready for parsing + """ + month_map = { + "enero": "January", + "febrero": "February", + "marzo": "March", + "abril": "April", + "mayo": "May", + "junio": "June", + "julio": "July", + "agosto": "August", + "septiembre": "September", + "octubre": "October", + "noviembre": "November", + "diciembre": "December", + } + + normalized = date_str + for spanish, english in month_map.items(): + normalized = normalized.replace(spanish, english) + normalized = normalized.replace(spanish.capitalize(), english) + + normalized = normalized.replace("a. m.", "AM").replace("p. m.", "PM") + return normalized + +def _parse_date(self, date_str: str) -> datetime | None: + """Parse normalized date string with error handling.""" + try: + normalized = self._normalize_spanish_date(date_str) + return date_parser(normalized, dayfirst=True) + except (ValueError, TypeError, OverflowError) as e: + logger.error("Failed to parse date: %s -> %s", date_str, e) + return None +``` + +Then update both methods to use: +```python +datetime_obj = self._parse_date(date_str.strip()) +``` + +## Acceptance Criteria + +- [ ] Create `_normalize_spanish_date()` helper method +- [ ] Create `_parse_date()` helper method with error handling +- [ ] Update `_parse_outgoing_transfer()` to use helper methods +- [ ] Update `_parse_credit_card_payment()` to use helper methods +- [ ] Verify both methods produce identical results with test data +- [ ] Reduce method lengths by ~20 lines each + +## Related Issues + +- #1 (Duplicate Month Mappings) - Should be resolved together diff --git a/docs/issues/03-magic-number-max-results.md b/docs/issues/03-magic-number-max-results.md new file mode 100644 index 0000000..281205b --- /dev/null +++ b/docs/issues/03-magic-number-max-results.md @@ -0,0 +1,90 @@ +# Code Smell: Magic Number (maxResults=500) + +**Labels:** `code-quality`, `refactoring`, `low-priority` +**Type:** Magic Number + +## Description + +The Gmail API call in `fetch_emails.py` uses a hardcoded value `maxResults=500` without explanation or named constant. + +## Location + +- `core/fetch_emails.py` (line 23) + +## Current Code + +```python +response = ( + service.users() + .messages() + .list(userId="me", q=query, maxResults=500, pageToken=page_token) + .execute() +) +``` + +## Impact + +- **Unclear Intent:** Why 500? Is it the API limit? Performance tuning? +- **Maintainability:** Harder to adjust if Gmail API limits change +- **Not Configurable:** Cannot be changed without code modification +- **Documentation:** No comment explaining the choice + +## Proposed Solution + +Define as a named constant at the module level with documentation: + +```python +# Gmail API allows up to 500 messages per page +# Using maximum to minimize API calls +MAX_MESSAGES_PER_PAGE = 500 + +def list_messages(service, query: str = " ", page_token: str | None = None): + """Yield all message metadata matching the query, handling pagination automatically.""" + while True: + response = ( + service.users() + .messages() + .list(userId="me", q=query, maxResults=MAX_MESSAGES_PER_PAGE, pageToken=page_token) + .execute() + ) + # ... rest of code +``` + +Or make it configurable: + +```python +DEFAULT_PAGE_SIZE = 500 + +def list_messages( + service, + query: str = " ", + page_token: str | None = None, + page_size: int = DEFAULT_PAGE_SIZE +): + """ + Yield all message metadata matching the query, handling pagination automatically. + + Args: + service: Authenticated Gmail API service + query: Gmail search query + page_token: Pagination token for continuing previous search + page_size: Number of messages per page (default: 500, max: 500) + """ + if page_size > 500: + logger.warning("Gmail API maximum is 500 messages per page, using 500") + page_size = 500 + + # ... rest of code with maxResults=page_size +``` + +## Acceptance Criteria + +- [ ] Define `MAX_MESSAGES_PER_PAGE` constant at module level +- [ ] Add comment explaining the value (Gmail API limit) +- [ ] Replace magic number with named constant +- [ ] Consider making it a parameter for testing flexibility +- [ ] Update any related documentation + +## Additional Notes + +According to [Gmail API documentation](https://developers.google.com/gmail/api/reference/rest/v1/users.messages/list), the maximum value for `maxResults` is 500, so this is already using the optimal value. The issue is purely about code clarity and maintainability. diff --git a/docs/issues/04-long-method-parse-credit-card.md b/docs/issues/04-long-method-parse-credit-card.md new file mode 100644 index 0000000..12c134b --- /dev/null +++ b/docs/issues/04-long-method-parse-credit-card.md @@ -0,0 +1,140 @@ +# Code Smell: Long Method (_parse_credit_card_purchase) + +**Labels:** `code-quality`, `refactoring`, `medium-priority` +**Type:** Long Method + +## Description + +The `_parse_credit_card_purchase` method in `HeyBancoParser` is 67 lines long and handles multiple responsibilities, making it difficult to test, understand, and maintain. + +## Location + +- `core/parsers/hey_banco.py` (lines 277-339) + +## Current Issues + +The method does too much: +1. Validates transaction (calls another method) +2. Extracts amount with regex and parsing +3. Extracts merchant/description +4. Extracts and parses date +5. Determines card type (credit vs debit) +6. Constructs Transaction object + +## Impact + +- **Testing Difficulty:** Cannot easily test individual extraction logic +- **Readability:** Hard to understand the full flow at a glance +- **Maintainability:** Changes to one aspect affect the entire method +- **Reusability:** Cannot reuse extraction logic elsewhere +- **Violations:** Breaks Single Responsibility Principle + +## Proposed Solution + +Break down into focused, testable methods: + +```python +def _parse_credit_card_purchase(self, text: str, email_id: str) -> Transaction | None: + """Parse a debit or credit card purchase alert.""" + if self.is_transaction_not_valid_or_email_not_supported(text): + return None + + amount = self._extract_purchase_amount(text) + if amount is None: + return None + + merchant = self._extract_merchant_name(text) + transaction_date = self._extract_transaction_date(text) + card_type = self._determine_card_type(text) + + description = f"{merchant} {card_type}" if card_type else merchant + + return TransactionCreate( + bank_name=self.bank_name, + email_id=email_id, + date=transaction_date, + amount=amount, + description=description, + merchant=merchant, + reference=None, + status="", + type="expense", + ) + +def _extract_purchase_amount(self, text: str) -> float | None: + """Extract purchase amount from card purchase email.""" + amount_match = re.search( + r"Cantidad:[\s\S]*?]*>\s*\$?([\d,]+\.\d{2})\s*", text + ) + if not amount_match: + return None + + try: + return float(amount_match.group(1).replace(",", "")) + except ValueError: + logger.error("Failed to parse amount: %s", amount_match.group(1)) + return None + +def _extract_merchant_name(self, text: str) -> str: + """Extract merchant/store name from card purchase email.""" + description_match = re.search( + r"Comercio:[\s\S]*?]*>\s*([^<]+?)\s*", text + ) + return description_match.group(1).strip() if description_match else "" + +def _extract_transaction_date(self, text: str) -> datetime | None: + """Extract transaction date from card purchase email.""" + date_match = re.search( + r"Fecha y hora de la transacción:[\s\S]*?]*>\s*" + r"(\d{1,2}/\d{1,2}/\d{4}\s*-\s*\d{2}:\d{2}\s*hrs)" + r"\s*", + text, + ) + if not date_match: + return None + + date_str = date_match.group(1) + cleaned = date_str.replace("hrs", "").strip() + try: + return date_parser(cleaned, dayfirst=True) + except (ValueError, TypeError, OverflowError) as e: + logger.error("Failed to parse date: %s -> %s", date_str, e) + return None + +def _determine_card_type(self, text: str) -> str: + """Determine if purchase was with credit or debit card.""" + card_type_match = re.search( + r"con tu (Credito|Debito|Crédito|Débito|Crédito|Débito)", + text, + ) + if not card_type_match: + return "" + + card = unidecode(card_type_match.group(1).strip().lower()) + + if card == "debito": + return "debit_card_purchase" + elif card == "credito": + return "credit_card_purchase" + + return "" +``` + +## Acceptance Criteria + +- [ ] Extract `_extract_purchase_amount()` method +- [ ] Extract `_extract_merchant_name()` method +- [ ] Extract `_extract_transaction_date()` method +- [ ] Extract `_determine_card_type()` method +- [ ] Update `_parse_credit_card_purchase()` to use new methods +- [ ] Reduce main method to ~20 lines (orchestration only) +- [ ] Add unit tests for each extraction method +- [ ] Verify all existing functionality still works + +## Benefits + +- Each method has a single, clear responsibility +- Easier to test individual components +- Better error handling at each step +- More reusable code +- Clearer method names document the process diff --git a/docs/issues/05-feature-envy-parser-return-type.md b/docs/issues/05-feature-envy-parser-return-type.md new file mode 100644 index 0000000..44e039f --- /dev/null +++ b/docs/issues/05-feature-envy-parser-return-type.md @@ -0,0 +1,79 @@ +# Code Smell: Feature Envy in Parser Return Type + +**Labels:** `code-quality`, `refactoring`, `low-priority`, `design` +**Type:** Design Smell + +## Description + +The `get_parser_for_email` method in `ParserHelper` returns a union of specific parser types instead of the base `BaseBankParser` type, creating tight coupling and violating the Open/Closed Principle. + +## Location + +- `core/parsers/parser_helper.py` (lines 33-51) + +## Current Code + +```python +@staticmethod +def get_parser_for_email( + from_header: str, +) -> HeyBancoParser | NubankParser | RappiParser | BanorteParser | None: + """Determine the appropriate parser instance...""" + from_lower = from_header.lower() + for bank, emails in bank_emails.items(): + if any(email.lower() in from_lower for email in emails): + return PARSERS.get(bank) +``` + +## Impact + +- **Tight Coupling:** Return type explicitly lists all concrete parser types +- **Maintenance Burden:** Every new parser requires updating this return type annotation +- **Open/Closed Violation:** Class must be modified when extended +- **Unnecessary Detail:** Callers don't need to know about specific implementations + +## Proposed Solution + +Change return type to base class: + +```python +@staticmethod +def get_parser_for_email(from_header: str) -> BaseBankParser | None: + """ + Determine the appropriate parser instance based on the email's From header. + + The matching is case-insensitive and checks if any of the known sender + email addresses for a bank appear in the From header. + + Args: + from_header: The raw From: header value from the email + + Returns: + An instantiated parser for the matching bank, or None if no match is found + """ + from_lower = from_header.lower() + for bank, emails in bank_emails.items(): + if any(email.lower() in from_lower for email in emails): + return PARSERS.get(bank) + + return None # Explicit return (fixes #6) +``` + +## Benefits + +- **Loose Coupling:** Callers work with base interface +- **Open/Closed Compliance:** Can add new parsers without changing this method +- **Type Safety:** Still maintains type checking through base class +- **Simplicity:** Cleaner, simpler type annotation + +## Acceptance Criteria + +- [ ] Change return type to `BaseBankParser | None` +- [ ] Add explicit `return None` statement +- [ ] Verify type checking still works +- [ ] Update any code that depends on specific parser types (unlikely) +- [ ] Run all tests to ensure no regressions + +## Related Issues + +- #6 (Missing explicit return statement) diff --git a/docs/issues/06-missing-explicit-return.md b/docs/issues/06-missing-explicit-return.md new file mode 100644 index 0000000..8a0e9c8 --- /dev/null +++ b/docs/issues/06-missing-explicit-return.md @@ -0,0 +1,76 @@ +# Code Smell: Missing Explicit Return Statement + +**Labels:** `code-quality`, `refactoring`, `low-priority` +**Type:** Code Clarity + +## Description + +The `get_parser_for_email` method in `ParserHelper` doesn't have an explicit `return None` statement at the end, relying on Python's implicit None return. + +## Location + +- `core/parsers/parser_helper.py` (line 51, end of method) + +## Current Code + +```python +@staticmethod +def get_parser_for_email( + from_header: str, +) -> HeyBancoParser | NubankParser | RappiParser | BanorteParser | None: + """Determine the appropriate parser instance...""" + from_lower = from_header.lower() + for bank, emails in bank_emails.items(): + if any(email.lower() in from_lower for email in emails): + return PARSERS.get(bank) + # Missing explicit return None +``` + +## Impact + +- **Code Clarity:** Implicit behavior is less obvious +- **Maintainability:** May confuse developers unfamiliar with Python's implicit None return +- **Best Practice:** PEP 8 recommends explicit returns when a function can return None +- **Intent:** Doesn't clearly communicate "no parser found" case + +## Proposed Solution + +Add explicit return statement at the end: + +```python +@staticmethod +def get_parser_for_email(from_header: str) -> BaseBankParser | None: + """ + Determine the appropriate parser instance based on the email's From header. + + Args: + from_header: The raw From: header value from the email + + Returns: + An instantiated parser for the matching bank, or None if no match is found + """ + from_lower = from_header.lower() + for bank, emails in bank_emails.items(): + if any(email.lower() in from_lower for email in emails): + return PARSERS.get(bank) + + return None # ← Explicit: No matching parser found +``` + +## Acceptance Criteria + +- [ ] Add explicit `return None` statement +- [ ] Verify all tests still pass +- [ ] Consider adding a log message for debugging: + ```python + logger.debug("No parser found for email from: %s", from_header) + return None + ``` + +## Priority + +**Low** - This is a minor code clarity issue. While it's good practice to be explicit, the current code works correctly. + +## Related Issues + +- #5 (Feature Envy in Parser Return Type) - Should be fixed together diff --git a/docs/issues/07-dead-code-paypal.md b/docs/issues/07-dead-code-paypal.md new file mode 100644 index 0000000..606210f --- /dev/null +++ b/docs/issues/07-dead-code-paypal.md @@ -0,0 +1,126 @@ +# Code Smell: Dead Code (PAYPAL Bank Definition) + +**Labels:** `code-quality`, `cleanup`, `low-priority` +**Type:** Dead Code + +## Description + +`SupportedBanks.PAYPAL` is defined in the enum and has email addresses configured in `bank_emails`, but the `PayPalParser` is commented out in the parsers dictionary. This creates confusion about whether PayPal is actually supported. + +## Locations + +- `constants/banks.py` (line 22) - PAYPAL enum value defined +- `constants/banks.py` (line 38) - PAYPAL email addresses configured +- `core/parsers/parser_helper.py` (line 24) - PayPalParser commented out + +## Current Code + +```python +# In constants/banks.py +class SupportedBanks(StrEnum): + HEY_BANCO = "hey_banco" + NUBANK = "nubank" + RAPPI = "rappi" + PAYPAL = "paypal" # ← Defined but not used + BANORTE = "banorte" + +bank_emails = { + # ... other banks + SupportedBanks.PAYPAL: ["service@paypal.com.mx"], # ← Configured but not used + # ... +} + +# In core/parsers/parser_helper.py +PARSERS = { + SupportedBanks.HEY_BANCO: HeyBancoParser(), + SupportedBanks.NUBANK: NubankParser(), + SupportedBanks.RAPPI: RappiParser(), + # SupportedBanks.PAYPAL: PayPalParser(), # ← Commented out + SupportedBanks.BANORTE: BanorteParser(), +} +``` + +## Impact + +- **Confusion:** Unclear whether PayPal is supported or not +- **Code Bloat:** Unused constants taking up space +- **Potential Bugs:** If someone enables PAYPAL thinking it's ready +- **Documentation:** README doesn't mention PayPal as unsupported + +## Options for Resolution + +### Option 1: Remove if not ready +If PayPal support is not implemented yet, remove it entirely: + +```python +# Remove from constants/banks.py +class SupportedBanks(StrEnum): + HEY_BANCO = "hey_banco" + NUBANK = "nubank" + RAPPI = "rappi" + # PAYPAL can be added when ready + BANORTE = "banorte" + +# Remove from bank_emails +bank_emails = { + # ... (no PAYPAL entry) +} +``` + +### Option 2: Implement if intended +If PayPal support is intended, implement the parser: + +```python +# Create core/parsers/paypal.py +class PayPalParser(BaseBankParser): + bank_name = SupportedBanks.PAYPAL + + def parse(self, email_message, email_id: str) -> Transaction | None: + # Implementation + pass + +# Uncomment in parser_helper.py +PARSERS = { + # ... + SupportedBanks.PAYPAL: PayPalParser(), +} +``` + +### Option 3: Mark as planned +If PayPal support is planned but not ready, add a comment: + +```python +# In constants/banks.py +class SupportedBanks(StrEnum): + HEY_BANCO = "hey_banco" + NUBANK = "nubank" + RAPPI = "rappi" + PAYPAL = "paypal" # TODO: Implement PayPalParser (planned) + BANORTE = "banorte" + +# In parser_helper.py +PARSERS = { + SupportedBanks.HEY_BANCO: HeyBancoParser(), + SupportedBanks.NUBANK: NubankParser(), + SupportedBanks.RAPPI: RappiParser(), + # TODO: Implement PayPalParser + # SupportedBanks.PAYPAL: PayPalParser(), + SupportedBanks.BANORTE: BanorteParser(), +} +``` + +## Recommended Approach + +**Option 1** (Remove) is recommended unless PayPal implementation is actively being worked on. It reduces confusion and keeps the codebase clean. + +## Acceptance Criteria + +- [ ] Decide on approach (remove, implement, or document as planned) +- [ ] Update `constants/banks.py` accordingly +- [ ] Update `parser_helper.py` accordingly +- [ ] Update README.md if needed to reflect PayPal status +- [ ] Remove or add PayPal from "próximos bancos" section in README + +## Additional Notes + +The README mentions PayPal in the "Soporte futuro para más bancos" section as a planned bank, so Option 3 (mark as planned) might be most appropriate. diff --git a/docs/issues/08-variable-shadowing-bug.md b/docs/issues/08-variable-shadowing-bug.md new file mode 100644 index 0000000..e83b8b3 --- /dev/null +++ b/docs/issues/08-variable-shadowing-bug.md @@ -0,0 +1,58 @@ +# Bug: Variable Shadowing (Duplicate Assignment) + +**Labels:** `bug`, `high-priority`, `easy-fix` +**Type:** Code Error + +## Description + +Line 78 in `hey_banco.py` contains a duplicate assignment: `amount = amount = float(...)`. This is clearly a typo that should be fixed. + +## Location + +- `core/parsers/hey_banco.py` (line 78) + +## Current Code + +```python +amount_match = re.search( + r"Cantidad\s*\s*]*>([0-9]+(?:\.[0-9]+)?)", text +) + +if amount_match: + amount = amount = float(amount_match.group(1).replace(",", "")) + # ^^^^^^^^ Duplicate assignment +``` + +## Impact + +- **Code Quality:** Indicates careless coding or copy-paste error +- **Functional Impact:** While it still works, it's misleading +- **Code Review:** May confuse reviewers and future maintainers +- **Professional:** Makes the codebase look less polished + +## Proposed Solution + +Remove the duplicate assignment: + +```python +if amount_match: + amount = float(amount_match.group(1).replace(",", "")) +``` + +## Acceptance Criteria + +- [ ] Remove duplicate assignment on line 78 +- [ ] Verify the parser still works correctly +- [ ] Run any existing tests for HeyBancoParser +- [ ] Check for similar issues in other files (there's another similar case on line 117) + +## Additional Findings + +While investigating, found a similar duplicate assignment: +- Line 117: `amount = amount = float(amount_match.group(1).replace(",", ""))` + +Both should be fixed in the same PR. + +## Priority + +**High** - This is an obvious bug that should be fixed quickly. While it doesn't break functionality, it indicates code quality issues and is very easy to fix. diff --git a/docs/issues/09-double-session-commit.md b/docs/issues/09-double-session-commit.md new file mode 100644 index 0000000..e997aab --- /dev/null +++ b/docs/issues/09-double-session-commit.md @@ -0,0 +1,156 @@ +# Code Smell: Double Session Commit + +**Labels:** `code-quality`, `performance`, `medium-priority`, `database` +**Type:** Design Issue + +## Description + +The `save_transaction` method commits the database session twice: once for the bank entity (line 33) and once for the transaction (line 50). This is inefficient and breaks transaction atomicity. + +## Location + +- `core/services/transaction_service.py` (lines 33, 50) + +## Current Code + +```python +def save_transaction(self, transaction: TransactionCreate) -> Optional[int]: + """Save a transaction to the database.""" + with self.db.session() as session: + try: + stmt = select(Bank).where(Bank.name == transaction.bank_name) + bank = session.exec(stmt).first() + + if bank is None: + bank = Bank(name=transaction.bank_name) + session.add(bank) + session.commit() # ← First commit + session.refresh(bank) + + tx = Transaction( + bank_id=bank.id, + # ... other fields + ) + + session.add(tx) + session.commit() # ← Second commit + session.refresh(tx) + # ... +``` + +## Impact + +- **Performance:** Two database commits when one would suffice +- **Atomicity:** Bank and transaction aren't saved atomically +- **Failure Scenario:** If transaction save fails, bank is already committed +- **Database Load:** Extra roundtrips to database + +## Proposed Solution + +Use `session.flush()` to get the bank ID without committing: + +```python +def save_transaction(self, transaction: TransactionCreate) -> Optional[int]: + """Save a transaction to the database.""" + with self.db.session() as session: + try: + stmt = select(Bank).where(Bank.name == transaction.bank_name) + bank = session.exec(stmt).first() + + if bank is None: + bank = Bank(name=transaction.bank_name) + session.add(bank) + session.flush() # ← Flush to get ID, don't commit yet + # No need to refresh, flush makes ID available + + tx = Transaction( + bank_id=bank.id, + email_id=transaction.email_id, + date=transaction.date, + amount=transaction.amount, + category_id=None, + subcategory_id=None, + description=transaction.description, + type=transaction.type, + merchant=transaction.merchant, + reference=transaction.reference, + ) + + session.add(tx) + # Let the context manager handle commit + # This commits both bank and transaction atomically + + except SQLAlchemyError as e: + # Rollback will undo both bank and transaction + session.rollback() + logger.error("Database error during save: %s", e, exc_info=True) + return None +``` + +## Better Alternative + +The context manager in `database.py` already handles commits. We can simplify further: + +```python +def save_transaction(self, transaction: TransactionCreate) -> Optional[int]: + """Save a transaction to the database.""" + with self.db.session() as session: + # Get or create bank + stmt = select(Bank).where(Bank.name == transaction.bank_name) + bank = session.exec(stmt).first() + + if bank is None: + bank = Bank(name=transaction.bank_name) + session.add(bank) + session.flush() # Get ID without committing + + # Create transaction + tx = Transaction( + bank_id=bank.id, + email_id=transaction.email_id, + date=transaction.date, + amount=transaction.amount, + category_id=None, + subcategory_id=None, + description=transaction.description, + type=transaction.type, + merchant=transaction.merchant, + reference=transaction.reference, + ) + + session.add(tx) + session.flush() # Get transaction ID + + logger.info( + "Transaction added [ID: %s] | %s | %s | %s | Category: %s | Subcategory: %s", + tx.transaction_id, + tx.amount, + tx.date, + tx.description, + tx.category_id, + tx.subcategory_id, # Fixed in #10 + ) + + return tx.transaction_id + # Context manager commits here (single commit for both) +``` + +## Acceptance Criteria + +- [ ] Replace `session.commit()` calls with `session.flush()` +- [ ] Remove explicit commits; rely on context manager +- [ ] Verify both bank and transaction are still saved +- [ ] Test rollback behavior on error +- [ ] Confirm performance improvement (optional) +- [ ] Update any related documentation + +## Benefits + +- **Atomicity:** Bank and transaction saved in single transaction +- **Performance:** One commit instead of two +- **Consistency:** Better error handling and rollback behavior +- **Cleaner Code:** Removes manual commit management + +## Related Issues + +- #10 (Logging bug with category_id) - Same method diff --git a/docs/issues/10-logging-bug-duplicate-category.md b/docs/issues/10-logging-bug-duplicate-category.md new file mode 100644 index 0000000..474d6f5 --- /dev/null +++ b/docs/issues/10-logging-bug-duplicate-category.md @@ -0,0 +1,62 @@ +# Bug: Logging Duplicate category_id + +**Labels:** `bug`, `low-priority`, `easy-fix` +**Type:** Code Error + +## Description + +Line 60 in `transaction_service.py` logs `category_id` twice instead of logging both `category_id` and `subcategory_id`. The subcategory information is never logged. + +## Location + +- `core/services/transaction_service.py` (line 60) + +## Current Code + +```python +logger.info( + "Transaction added [ID: %s] | %s | %s | %s | Category: %s | Subcategory: %s", + tx.transaction_id, + tx.amount, + tx.date, + tx.description, + tx.category_id, + tx.category_id, # ← Should be tx.subcategory_id +) +``` + +## Impact + +- **Incorrect Logging:** Subcategory information never appears in logs +- **Debugging:** Makes it harder to debug category/subcategory issues +- **Misleading:** Log message claims to show subcategory but shows category twice + +## Proposed Solution + +Fix the typo by changing the last parameter: + +```python +logger.info( + "Transaction added [ID: %s] | %s | %s | %s | Category: %s | Subcategory: %s", + tx.transaction_id, + tx.amount, + tx.date, + tx.description, + tx.category_id, + tx.subcategory_id, # ← Fixed +) +``` + +## Acceptance Criteria + +- [ ] Change second `tx.category_id` to `tx.subcategory_id` +- [ ] Test that logging works correctly +- [ ] Verify log output shows correct values + +## Priority + +**Low** - This is a minor bug that only affects logging output. However, it's a very easy fix that should be included in any PR touching this file. + +## Related Issues + +- #9 (Double session commit) - Same method, can be fixed together diff --git a/docs/issues/11-data-clump-email-message.md b/docs/issues/11-data-clump-email-message.md new file mode 100644 index 0000000..4a9a66a --- /dev/null +++ b/docs/issues/11-data-clump-email-message.md @@ -0,0 +1,132 @@ +# Code Smell: Data Clump (email_message Dictionary) + +**Labels:** `code-quality`, `refactoring`, `medium-priority`, `design` +**Type:** Data Clump + +## Description + +The `email_message` dictionary is passed around throughout the codebase with inconsistent key access patterns. Different parsers access different combinations of keys (`body_html`, `body_plain`, `subject`, `date`), and there's no clear data contract or validation. + +## Locations + +- `core/fetch_emails.py` - Creates the dict (lines 48-56) +- All parser files - Access various keys inconsistently +- No formal data structure definition + +## Current Pattern + +```python +# Created in fetch_emails.py +payload = { + "id": msg_id, + "subject": email_msg.get("Subject", ""), + "from": email_msg.get("From", ""), + "to": email_msg.get("To", ""), + "date": email_msg.get("Date", ""), + "body_plain": "", + "body_html": "", +} + +# Used inconsistently in parsers +subject = email_message.get("subject", "") # Different parsers, different keys +body = email_message.get("body_html", "") +# Sometimes body_plain, sometimes body_html, sometimes both +``` + +## Impact + +- **No Contract:** Unclear what keys are guaranteed to exist +- **KeyError Risk:** Prone to errors if dict structure changes +- **Inconsistency:** Different parsers have different access patterns +- **Testing Difficulty:** Must mock dictionaries with exact keys +- **Type Safety:** No type checking on dictionary keys +- **Documentation:** Structure not formally documented + +## Proposed Solution + +Create an `EmailMessage` dataclass or Pydantic model: + +```python +# core/models/email_message.py +from dataclasses import dataclass +from datetime import datetime + +@dataclass +class EmailMessage: + """Structured representation of a parsed email message.""" + + id: str + subject: str + from_address: str + date: str + to: str = "" + body_plain: str = "" + body_html: str = "" + + @property + def body(self) -> str: + """Get email body, preferring HTML if available.""" + return self.body_html or self.body_plain + + @property + def body_text(self) -> str: + """Get email body, preferring plain text if available.""" + return self.body_plain or self.body_html +``` + +Or using Pydantic for validation: + +```python +from pydantic import BaseModel, Field + +class EmailMessage(BaseModel): + """Structured representation of a parsed email message.""" + + id: str = Field(..., description="Gmail message ID") + subject: str = Field(default="", description="Email subject line") + from_address: str = Field(..., alias="from", description="Sender email address") + date: str = Field(default="", description="Email date header") + to: str = Field(default="", description="Recipient email address") + body_plain: str = Field(default="", description="Plain text email body") + body_html: str = Field(default="", description="HTML email body") + + class Config: + populate_by_name = True + + @property + def body(self) -> str: + """Get email body, preferring HTML if available.""" + return self.body_html or self.body_plain +``` + +## Implementation Steps + +1. Create `EmailMessage` model +2. Update `parse_email()` to return `EmailMessage` instead of dict +3. Update all parsers to work with `EmailMessage` objects +4. Add helper properties for common access patterns +5. Update type hints throughout codebase + +## Acceptance Criteria + +- [ ] Create `EmailMessage` dataclass or Pydantic model +- [ ] Update `parse_email()` return type +- [ ] Update all parser signatures to accept `EmailMessage` +- [ ] Update parser implementations to use model attributes +- [ ] Add helper properties for body selection +- [ ] Update all type hints +- [ ] Add tests for the new model +- [ ] Verify all parsers still work correctly + +## Benefits + +- **Type Safety:** Compile-time checking of attribute access +- **Clear Contract:** Documented structure with types +- **Better IDE Support:** Autocomplete for attributes +- **Validation:** Can add validation rules with Pydantic +- **Consistency:** Single source of truth for structure +- **Refactoring:** Easier to change structure in one place + +## Related Issues + +- #14 (Inconsistent body selection logic) - Would be addressed by helper properties diff --git a/docs/issues/12-class-variable-as-instance.md b/docs/issues/12-class-variable-as-instance.md new file mode 100644 index 0000000..248e2dc --- /dev/null +++ b/docs/issues/12-class-variable-as-instance.md @@ -0,0 +1,115 @@ +# Code Smell: Class Variable Used as Instance Variable + +**Labels:** `code-quality`, `refactoring`, `low-priority`, `design` +**Type:** Design Issue + +## Description + +The `Database` class defines `engine` as a class variable (line 29) but then assigns it as an instance variable in `__init__`. This is confusing and could lead to unexpected behavior if multiple `Database` instances are created. + +## Location + +- `database/database.py` (lines 29, 41) + +## Current Code + +```python +class Database: + """SQLite database handler for storing and managing financial transactions.""" + + engine = None # ← Class variable declaration + + def __init__(self, db_url="sqlite:///expenses.db"): + """Initialize database connection and ensure schema exists.""" + try: + self.engine = create_engine( # ← Assigned as instance variable + url=db_url, + echo=False, + ) + # ... +``` + +## Impact + +- **Confusing:** Appears to be a class variable but is actually used as instance variable +- **Misleading:** Suggests engine might be shared across instances +- **Best Practice Violation:** Not following Python conventions +- **Potential Bugs:** Could cause issues if: + - Multiple Database instances created with different URLs + - Code tries to access Database.engine directly + - Inheritance is used + +## Example of Potential Issue + +```python +# This could be confusing +db1 = Database("sqlite:///db1.db") +db2 = Database("sqlite:///db2.db") + +# What is Database.engine? It could be: +# - None (the class variable) +# - db2.engine (if last assignment to class var) +# - Undefined behavior +``` + +## Proposed Solution + +Remove the class variable declaration and use only instance variables: + +```python +class Database: + """SQLite database handler for storing and managing financial transactions. + + Creates and maintains the required tables (sources, categories, subcategories, + transactions) with proper foreign key constraints. + + Supports context manager protocol for safe connection handling. + """ + + def __init__(self, db_url="sqlite:///expenses.db"): + """Initialize database connection and ensure schema exists. + + Args: + db_url: URL to the database. Defaults to "sqlite:///expenses.db". + + Raises: + sqlite3.Error: If connection or schema creation fails. + """ + try: + self.engine = create_engine( + url=db_url, + echo=False, + ) + + from models.bank import Bank + from models.category import Category + from models.subcategory import Subcategory + from models.transaction import Transaction + + SQLModel.metadata.create_all(self.engine) + logger.info("Database initialized: %s", db_url) + except Exception as e: + logger.error("Database initialization failed: %s", e) + raise +``` + +## Alternative Solution + +If the intent was to share a single engine across instances (not recommended for SQLite), properly implement it as a singleton pattern or class method. + +## Acceptance Criteria + +- [ ] Remove `engine = None` class variable declaration +- [ ] Verify engine is properly initialized as instance variable +- [ ] Test with single Database instance +- [ ] Test with multiple Database instances (if applicable) +- [ ] Ensure no code accesses `Database.engine` directly +- [ ] Run all tests to ensure no regressions + +## Priority + +**Low** - This doesn't cause functional issues in the current codebase since only one Database instance is typically created. However, it's a code smell that should be addressed for clarity and maintainability. + +## Additional Context + +The current usage in `main.py` creates only one Database instance, so this doesn't cause actual problems. However, fixing it improves code quality and prevents future issues. diff --git a/docs/issues/13-inconsistent-error-handling.md b/docs/issues/13-inconsistent-error-handling.md new file mode 100644 index 0000000..b8f907b --- /dev/null +++ b/docs/issues/13-inconsistent-error-handling.md @@ -0,0 +1,196 @@ +# Code Smell: Inconsistent Error Handling + +**Labels:** `code-quality`, `refactoring`, `medium-priority`, `error-handling` +**Type:** Error Handling Inconsistency + +## Description + +Error handling is inconsistent throughout the codebase. Different methods use different patterns for handling and reporting errors, making it difficult to understand the error handling contract and debug issues. + +## Examples of Inconsistency + +### 1. Return Values on Error + +```python +# Some methods return None +def _parse_credit_card_payment(self, text, email_id) -> Transaction | None: + # ... + if amount_match: + try: + amount = float(amount_match.group(1).replace(",", "")) + except ValueError: + logger.error("Failed to parse amount: %s", amount_match.group(1)) + return None # ← Returns None on error + +# Some methods return placeholder values +def _decode_payload(part): + try: + # ... decoding logic + except (TypeError, ValueError, AttributeError) as e: + logger.error("Error decoding email payload: %s", e) + return "[Error al decodificar el cuerpo del email]" # ← Returns placeholder + +# Some return Optional[int] +def save_transaction(self, transaction: TransactionCreate) -> Optional[int]: + try: + # ... + return tx.transaction_id + except SQLAlchemyError as e: + session.rollback() + logger.error(...) + return None # ← Returns None on error +``` + +### 2. Exception Catching Patterns + +```python +# Some catch broad exceptions +except Exception as e: + logger.error("An error occurred: %s", e, exc_info=True) + raise + +# Some catch specific exceptions +except (ValueError, TypeError, OverflowError) as e: + logger.error("Failed to parse date: %s -> %s", date_str, e) + +# Some catch multiple specific exceptions separately +except SQLAlchemyError as e: + # ... +except ValueError as e: + # ... +``` + +### 3. Logging Patterns + +```python +# Some include exc_info +logger.error("An error occurred: %s", e, exc_info=True) + +# Some don't +logger.error("Failed to parse amount: %s", amount_match.group(1)) + +# Some log and continue, others log and return/raise +``` + +## Impact + +- **Unclear Contract:** Difficult to know what to expect on error +- **Debugging Difficulty:** Inconsistent error information +- **Error Hiding:** Some errors might be silently swallowed +- **User Experience:** No consistent error handling for users +- **Testing:** Hard to test error conditions consistently + +## Proposed Solution + +Define a consistent error handling strategy: + +### 1. Create Custom Exception Types + +```python +# core/exceptions.py +class ExpenseTrackerError(Exception): + """Base exception for expense tracker.""" + pass + +class ParsingError(ExpenseTrackerError): + """Error parsing email content.""" + pass + +class DatabaseError(ExpenseTrackerError): + """Error interacting with database.""" + pass + +class AuthenticationError(ExpenseTrackerError): + """Error authenticating with Gmail.""" + pass +``` + +### 2. Define Error Handling Patterns + +```python +# For parser methods: Return None on failure, log warning +def _parse_credit_card_payment(self, text, email_id) -> Transaction | None: + try: + amount = self._extract_amount(text) + if amount is None: + logger.warning("Could not extract amount from email %s", email_id) + return None + # ... continue parsing + except ParsingError as e: + logger.warning("Failed to parse credit card payment: %s", e) + return None + +# For service methods: Return Result type or raise exception +from typing import Union, TypeVar, Generic + +T = TypeVar('T') + +class Result(Generic[T]): + """Result type for operations that can fail.""" + + def __init__(self, value: T | None = None, error: Exception | None = None): + self.value = value + self.error = error + + @property + def is_success(self) -> bool: + return self.error is None + + @property + def is_failure(self) -> bool: + return self.error is not None + +def save_transaction(self, transaction: TransactionCreate) -> Result[int]: + try: + # ... save logic + return Result(value=tx.transaction_id) + except SQLAlchemyError as e: + logger.error("Database error: %s", e, exc_info=True) + return Result(error=DatabaseError(f"Failed to save transaction: {e}")) +``` + +### 3. Document Error Handling in Docstrings + +```python +def parse(self, email_message, email_id: str) -> Transaction | None: + """Parse an email message to extract a transaction. + + Args: + email_message: The parsed email message dict + email_id: The unique identifier of the email + + Returns: + Transaction object if parsing succeeds, None otherwise. + + Note: + This method logs warnings for parsing failures and returns None + rather than raising exceptions. This allows the main loop to + continue processing other emails even if one fails. + """ +``` + +## Recommended Strategy + +1. **Parser Methods:** Return `None` on failure, log at WARNING level +2. **Service Methods:** Use Result type or raise custom exceptions +3. **Main Loop:** Catch and log exceptions, continue processing +4. **Critical Errors:** Let exceptions propagate (auth, database connection) +5. **Logging:** Always use `exc_info=True` for ERROR level + +## Acceptance Criteria + +- [ ] Define custom exception types +- [ ] Document error handling strategy in CONTRIBUTING.md +- [ ] Update docstrings to document error behavior +- [ ] Consider implementing Result type for service layer +- [ ] Add consistent logging patterns +- [ ] Update tests to verify error handling + +## Priority + +**Medium** - While the current code works, inconsistent error handling makes maintenance and debugging harder. This should be addressed as part of a refactoring effort. + +## Related Issues + +- #4 (Long method) - Refactoring provides opportunity to improve error handling +- #11 (Data clump) - New models can include validation and better error handling diff --git a/docs/issues/14-inconsistent-body-selection.md b/docs/issues/14-inconsistent-body-selection.md new file mode 100644 index 0000000..03de9d0 --- /dev/null +++ b/docs/issues/14-inconsistent-body-selection.md @@ -0,0 +1,194 @@ +# Bug: Inconsistent Body Selection Logic + +**Labels:** `bug`, `code-quality`, `medium-priority` +**Type:** Code Inconsistency / Bug + +## Description + +Different parsers have inconsistent logic for choosing between `body_html` and `body_plain` from email messages. Most notably, the Rappi parser has a clear bug where it fetches `body_plain` twice instead of having a fallback to HTML. + +## Locations and Patterns + +### 1. HeyBancoParser (Correct Pattern) +```python +# core/parsers/hey_banco.py (lines 46-49) +body = email_message.get("body_html", "") + +if not body: + body = email_message.get("body_plain", "") +``` +✅ Prefers HTML, falls back to plain + +### 2. RappiParser (BUG - Fetches plain twice) +```python +# core/parsers/rappi.py (lines 52-55) +body = email_message.get("body_plain", "") + +if not body: + body = email_message.get("body_plain", "") # ← BUG: Should be body_html +``` +❌ Bug: Same key accessed twice, no actual fallback + +### 3. NubankParser (Only uses plain) +```python +# core/parsers/nubank.py (lines 63-64) +body = email_message.get("body_plain", "") +date = email_message.get("date", "") +``` +⚠️ Never tries HTML, even if plain is empty + +### 4. BanorteParser (Correct Pattern) +```python +# core/parsers/banorte.py (lines 80-83) +body = email_message.get("body_html", "") + +if not body: + body = email_message.get("body_plain", "") +``` +✅ Prefers HTML, falls back to plain + +## Impact + +- **Rappi Parser Bug:** Will fail to parse emails that only have HTML body +- **NuBank Limitation:** Cannot parse emails that only have HTML body +- **Inconsistency:** Different parsers have different expectations +- **Testing Difficulty:** Tests need to account for different behaviors +- **Maintenance:** No clear standard for body selection + +## Proposed Solution + +### Option 1: Fix Individual Bugs +Minimal change to fix the immediate bugs: + +```python +# In rappi.py - Fix the duplicate key access +body = email_message.get("body_plain", "") + +if not body: + body = email_message.get("body_html", "") # ← Fixed + +# In nubank.py - Add fallback to HTML +body = email_message.get("body_plain", "") + +if not body: + body = email_message.get("body_html", "") # ← Added fallback +``` + +### Option 2: Implement Consistent Strategy in Base Parser + +```python +# core/parsers/base_parser.py +class BaseBankParser(ABC): + """Abstract base class for bank-specific email parsers.""" + + bank_name = "generic" + + # Add preference for body type + prefer_html_body = True # Can be overridden by subclasses + + def get_email_body(self, email_message: dict) -> str: + """ + Get email body with consistent fallback logic. + + Args: + email_message: Parsed email message dict + + Returns: + Email body (HTML or plain text based on preference) + """ + if self.prefer_html_body: + body = email_message.get("body_html", "") + if not body: + body = email_message.get("body_plain", "") + else: + body = email_message.get("body_plain", "") + if not body: + body = email_message.get("body_html", "") + + return body + + @abstractmethod + def parse(self, email_message, email_id: str) -> Transaction | None: + """Parse an email message to extract a transaction.""" +``` + +Then update parsers: + +```python +# In each parser's parse method +body = self.get_email_body(email_message) + +# Or override preference if needed +class NubankParser(BaseBankParser): + prefer_html_body = False # NuBank emails work better with plain text +``` + +### Option 3: Use EmailMessage Model Helper + +If implementing #11 (EmailMessage model), add helper property: + +```python +@dataclass +class EmailMessage: + # ... fields ... + + def get_body(self, prefer_html: bool = True) -> str: + """Get email body with specified preference.""" + if prefer_html: + return self.body_html or self.body_plain + return self.body_plain or self.body_html +``` + +## Recommended Approach + +**Option 2** (Base parser method) provides: +- Immediate bug fixes +- Consistent behavior across parsers +- Flexibility for parser-specific preferences +- Easier testing and maintenance + +## Acceptance Criteria + +### Immediate Fixes (High Priority) +- [ ] Fix Rappi parser bug (line 55) +- [ ] Add HTML fallback to NuBank parser +- [ ] Test both parsers with HTML-only and plain-only emails + +### Long-term Improvement (Medium Priority) +- [ ] Implement `get_email_body()` method in base parser +- [ ] Update all parsers to use base parser method +- [ ] Add tests for body selection logic +- [ ] Document body selection strategy + +## Testing + +Add tests to verify: +```python +def test_rappi_parser_with_html_only_email(): + """Rappi parser should handle HTML-only emails.""" + email = { + "body_html": "Test content", + "body_plain": "", + # ... + } + # Should not fail + +def test_nubank_parser_with_html_only_email(): + """NuBank parser should handle HTML-only emails.""" + email = { + "body_html": "Test content", + "body_plain": "", + # ... + } + # Should not fail +``` + +## Priority + +**High** for Rappi bug fix (clear logic error) +**Medium** for overall consistency improvement + +## Related Issues + +- #11 (Data Clump) - EmailMessage model could help standardize this +- #13 (Inconsistent Error Handling) - Part of broader consistency problem diff --git a/docs/issues/README.md b/docs/issues/README.md new file mode 100644 index 0000000..d7b5398 --- /dev/null +++ b/docs/issues/README.md @@ -0,0 +1,199 @@ +# Code Smell Issues Documentation + +This directory contains detailed documentation for 14 code smells identified in the expense-tracker codebase. + +## Quick Start - Creating GitHub Issues + +Each file in this directory can be used to create a GitHub issue. Here's how: + +### Automated Approach (Recommended) + +Use the GitHub CLI to create issues from these files: + +```bash +# For each issue file, create a GitHub issue +cd docs/issues + +# Example for issue #1 +gh issue create --title "Code Smell: Duplicate Month Mapping Dictionaries" \ + --body-file 01-duplicate-month-mappings.md \ + --label "code-quality,refactoring,medium-priority" +``` + +### Manual Approach + +1. Go to https://github.com/javrr-ui/expense-tracker/issues/new +2. Open an issue file (e.g., `01-duplicate-month-mappings.md`) +3. Copy the title (first # heading) +4. Copy the body content +5. Add labels as specified in the file +6. Create the issue + +## Issue Priority Summary + +### High Priority (Fix Immediately) +- **#8** - Variable Shadowing Bug - Clear typo/bug in amount assignment + +### Medium Priority (Address Soon) +- **#1** - Duplicate Month Mappings - 4 files have duplicate code +- **#2** - Duplicate Date Normalization - Same logic in 2 methods +- **#4** - Long Method - 67-line method with multiple responsibilities +- **#9** - Double Session Commit - Performance and atomicity issue +- **#11** - Data Clump - email_message dict used inconsistently +- **#13** - Inconsistent Error Handling - No unified error strategy +- **#14** - Inconsistent Body Selection (includes bug) - Rappi parser has bug + +### Low Priority (Nice to Have) +- **#3** - Magic Number - Hardcoded 500 without constant +- **#5** - Feature Envy - Return type too specific +- **#6** - Missing Explicit Return - Implicit None return +- **#7** - Dead Code - PAYPAL defined but not implemented +- **#10** - Logging Bug - category_id logged twice +- **#12** - Class Variable as Instance - Confusing variable declaration + +## Quick Wins (Easy Fixes) + +These issues are straightforward to fix and would make good first contributions: + +1. **#8** - Variable Shadowing (2 lines to fix) +2. **#10** - Logging Bug (1 line to fix) +3. **#3** - Magic Number (add constant) +4. **#6** - Missing Return (add explicit return) + +## Related Issues + +Some issues should be addressed together: + +- **#1 + #2** - Both deal with date/month handling duplication +- **#5 + #6** - Both in the same method (`get_parser_for_email`) +- **#9 + #10** - Both in the same method (`save_transaction`) +- **#11 + #14** - Both deal with email_message structure + +## Statistics + +- **Total Issues:** 14 +- **Bugs:** 4 (High priority: 1, Low priority: 3) +- **Code Duplication:** 2 +- **Design Issues:** 5 +- **Code Quality:** 3 + +## Files in This Directory + +| File | Title | Priority | Type | +|------|-------|----------|------| +| 01-duplicate-month-mappings.md | Duplicate Month Mapping Dictionaries | Medium | Duplication | +| 02-duplicate-date-normalization.md | Duplicate Date Normalization Logic | Medium | Duplication | +| 03-magic-number-max-results.md | Magic Number (maxResults=500) | Low | Magic Number | +| 04-long-method-parse-credit-card.md | Long Method (_parse_credit_card_purchase) | Medium | Long Method | +| 05-feature-envy-parser-return-type.md | Feature Envy in Parser Return Type | Low | Design | +| 06-missing-explicit-return.md | Missing Explicit Return Statement | Low | Clarity | +| 07-dead-code-paypal.md | Dead Code (PAYPAL Bank Definition) | Low | Dead Code | +| 08-variable-shadowing-bug.md | Variable Shadowing (Duplicate Assignment) | High | Bug | +| 09-double-session-commit.md | Double Session Commit | Medium | Performance | +| 10-logging-bug-duplicate-category.md | Logging Duplicate category_id | Low | Bug | +| 11-data-clump-email-message.md | Data Clump (email_message Dictionary) | Medium | Design | +| 12-class-variable-as-instance.md | Class Variable Used as Instance Variable | Low | Design | +| 13-inconsistent-error-handling.md | Inconsistent Error Handling | Medium | Error Handling | +| 14-inconsistent-body-selection.md | Inconsistent Body Selection Logic | Medium | Bug | + +## Creating All Issues at Once + +If you want to create all issues at once using the GitHub CLI: + +```bash +#!/bin/bash +# create-all-issues.sh + +cd docs/issues + +# High Priority +gh issue create --title "Bug: Variable Shadowing (Duplicate Assignment)" \ + --body-file 08-variable-shadowing-bug.md \ + --label "bug,high-priority,easy-fix" + +# Medium Priority +gh issue create --title "Code Smell: Duplicate Month Mapping Dictionaries" \ + --body-file 01-duplicate-month-mappings.md \ + --label "code-quality,refactoring,medium-priority" + +gh issue create --title "Code Smell: Duplicate Date Normalization Logic" \ + --body-file 02-duplicate-date-normalization.md \ + --label "code-quality,refactoring,medium-priority" + +gh issue create --title "Code Smell: Long Method (_parse_credit_card_purchase)" \ + --body-file 04-long-method-parse-credit-card.md \ + --label "code-quality,refactoring,medium-priority" + +gh issue create --title "Code Smell: Double Session Commit" \ + --body-file 09-double-session-commit.md \ + --label "code-quality,performance,medium-priority,database" + +gh issue create --title "Code Smell: Data Clump (email_message Dictionary)" \ + --body-file 11-data-clump-email-message.md \ + --label "code-quality,refactoring,medium-priority,design" + +gh issue create --title "Code Smell: Inconsistent Error Handling" \ + --body-file 13-inconsistent-error-handling.md \ + --label "code-quality,refactoring,medium-priority,error-handling" + +gh issue create --title "Bug: Inconsistent Body Selection Logic" \ + --body-file 14-inconsistent-body-selection.md \ + --label "bug,code-quality,low-priority" + +# Low Priority +gh issue create --title "Code Smell: Magic Number (maxResults=500)" \ + --body-file 03-magic-number-max-results.md \ + --label "code-quality,refactoring,low-priority" + +gh issue create --title "Code Smell: Feature Envy in Parser Return Type" \ + --body-file 05-feature-envy-parser-return-type.md \ + --label "code-quality,refactoring,low-priority,design" + +gh issue create --title "Code Smell: Missing Explicit Return Statement" \ + --body-file 06-missing-explicit-return.md \ + --label "code-quality,refactoring,low-priority" + +gh issue create --title "Code Smell: Dead Code (PAYPAL Bank Definition)" \ + --body-file 07-dead-code-paypal.md \ + --label "code-quality,cleanup,low-priority" + +gh issue create --title "Bug: Logging Duplicate category_id" \ + --body-file 10-logging-bug-duplicate-category.md \ + --label "bug,low-priority,easy-fix" + +gh issue create --title "Code Smell: Class Variable Used as Instance Variable" \ + --body-file 12-class-variable-as-instance.md \ + --label "code-quality,refactoring,low-priority,design" + +echo "All issues created successfully!" +``` + +Make the script executable and run it: +```bash +chmod +x create-all-issues.sh +./create-all-issues.sh +``` + +## Contributing + +When fixing these issues: + +1. **Reference the issue** in your PR description +2. **Add tests** if the fix changes behavior +3. **Run existing tests** to ensure no regressions +4. **Update documentation** if needed +5. **Consider related issues** - some should be fixed together + +## Need Help? + +If you need clarification on any of these issues or want to discuss the best approach: + +1. Comment on the GitHub issue +2. Check the "Related Issues" section in each file +3. Review the "Acceptance Criteria" for specific requirements + +--- + +*Generated: 2026-01-13* +*Repository: javrr-ui/expense-tracker* +*For questions, please open a discussion or comment on specific issues* diff --git a/docs/issues/create-all-issues.sh b/docs/issues/create-all-issues.sh new file mode 100755 index 0000000..275ed26 --- /dev/null +++ b/docs/issues/create-all-issues.sh @@ -0,0 +1,138 @@ +#!/bin/bash +# Script to create all code smell issues in GitHub +# Requires GitHub CLI (gh) to be installed and authenticated + +set -e # Exit on error + +echo "==============================================" +echo "Creating Code Smell Issues for Expense Tracker" +echo "==============================================" +echo "" + +# Check if gh CLI is installed +if ! command -v gh &> /dev/null; then + echo "Error: GitHub CLI (gh) is not installed." + echo "Please install it from: https://cli.github.com/" + exit 1 +fi + +# Check if authenticated +if ! gh auth status &> /dev/null; then + echo "Error: Not authenticated with GitHub CLI." + echo "Please run: gh auth login" + exit 1 +fi + +cd "$(dirname "$0")" + +echo "Creating issues in priority order..." +echo "" + +# High Priority Issues +echo "📌 Creating HIGH PRIORITY issues..." + +gh issue create \ + --title "Bug: Variable Shadowing (Duplicate Assignment)" \ + --body-file 08-variable-shadowing-bug.md \ + --label "bug,high-priority,easy-fix" +echo "✓ Created: Variable Shadowing Bug (#8)" + +# Medium Priority Issues +echo "" +echo "📋 Creating MEDIUM PRIORITY issues..." + +gh issue create \ + --title "Code Smell: Duplicate Month Mapping Dictionaries" \ + --body-file 01-duplicate-month-mappings.md \ + --label "code-quality,refactoring,medium-priority" +echo "✓ Created: Duplicate Month Mappings (#1)" + +gh issue create \ + --title "Code Smell: Duplicate Date Normalization Logic" \ + --body-file 02-duplicate-date-normalization.md \ + --label "code-quality,refactoring,medium-priority" +echo "✓ Created: Duplicate Date Normalization (#2)" + +gh issue create \ + --title "Code Smell: Long Method (_parse_credit_card_purchase)" \ + --body-file 04-long-method-parse-credit-card.md \ + --label "code-quality,refactoring,medium-priority" +echo "✓ Created: Long Method (#4)" + +gh issue create \ + --title "Code Smell: Double Session Commit" \ + --body-file 09-double-session-commit.md \ + --label "code-quality,performance,medium-priority,database" +echo "✓ Created: Double Session Commit (#9)" + +gh issue create \ + --title "Code Smell: Data Clump (email_message Dictionary)" \ + --body-file 11-data-clump-email-message.md \ + --label "code-quality,refactoring,medium-priority,design" +echo "✓ Created: Data Clump (#11)" + +gh issue create \ + --title "Code Smell: Inconsistent Error Handling" \ + --body-file 13-inconsistent-error-handling.md \ + --label "code-quality,refactoring,medium-priority,error-handling" +echo "✓ Created: Inconsistent Error Handling (#13)" + +gh issue create \ + --title "Bug: Inconsistent Body Selection Logic" \ + --body-file 14-inconsistent-body-selection.md \ + --label "bug,code-quality,medium-priority" +echo "✓ Created: Inconsistent Body Selection (#14)" + +# Low Priority Issues +echo "" +echo "📝 Creating LOW PRIORITY issues..." + +gh issue create \ + --title "Code Smell: Magic Number (maxResults=500)" \ + --body-file 03-magic-number-max-results.md \ + --label "code-quality,refactoring,low-priority" +echo "✓ Created: Magic Number (#3)" + +gh issue create \ + --title "Code Smell: Feature Envy in Parser Return Type" \ + --body-file 05-feature-envy-parser-return-type.md \ + --label "code-quality,refactoring,low-priority,design" +echo "✓ Created: Feature Envy (#5)" + +gh issue create \ + --title "Code Smell: Missing Explicit Return Statement" \ + --body-file 06-missing-explicit-return.md \ + --label "code-quality,refactoring,low-priority" +echo "✓ Created: Missing Explicit Return (#6)" + +gh issue create \ + --title "Code Smell: Dead Code (PAYPAL Bank Definition)" \ + --body-file 07-dead-code-paypal.md \ + --label "code-quality,cleanup,low-priority" +echo "✓ Created: Dead Code - PAYPAL (#7)" + +gh issue create \ + --title "Bug: Logging Duplicate category_id" \ + --body-file 10-logging-bug-duplicate-category.md \ + --label "bug,low-priority,easy-fix" +echo "✓ Created: Logging Bug (#10)" + +gh issue create \ + --title "Code Smell: Class Variable Used as Instance Variable" \ + --body-file 12-class-variable-as-instance.md \ + --label "code-quality,refactoring,low-priority,design" +echo "✓ Created: Class Variable as Instance (#12)" + +echo "" +echo "==============================================" +echo "✅ All 14 issues created successfully!" +echo "==============================================" +echo "" +echo "Summary:" +echo " • 1 High Priority bug" +echo " • 7 Medium Priority issues" +echo " • 6 Low Priority issues" +echo "" +echo "View all issues at:" +echo " https://github.com/javrr-ui/expense-tracker/issues" +echo ""