diff --git a/README.md b/README.md index cd1bfec..59a8fe2 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,29 @@ # paperweight -This project automatically retrieves, filters, and summarizes recent academic papers from arXiv based on user-specified categories, then sends notifications to the user. +A scalable system for retrieving, filtering, and summarizing academic papers from arXiv based on user preferences, with customizable notifications. ## Features - **ArXiv Integration**: Fetches recent papers from arXiv using their API, ensuring up-to-date access to the latest research. - **Customizable Filtering**: Filters papers based on user-defined preferences, including keywords, categories, and exclusion criteria. -- **Intelligent Summarization** (BETA): Generates concise summaries or extracts abstracts, providing quick insights into paper content. Note: This feature is currently in beta and may have some limitations. +- **Intelligent Summarization** (BETA): Generates concise summaries or extracts abstracts, providing quick insights into paper content. - **Flexible Notification System**: Notifies users via email, with potential for expansion to other notification methods. - **Configurable Settings**: Allows users to fine-tune the application's behavior through a YAML configuration file. 
+## System Architecture + +``` +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ SCRAPER │────▶│ PROCESSOR │────▶│ ANALYZER │────▶│ NOTIFIER │ +└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘ + │ │ │ │ + ▼ ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ arXiv API & │ │ Scoring & │ │ Abstract │ │ Email & │ +│ PDF Processing│ │ Filtering │ │ Extraction │ │ Templating │ +└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘ +``` + ## Table of Contents - [Getting Started](#getting-started) - [Installation](#installation) @@ -17,6 +31,7 @@ This project automatically retrieves, filters, and summarizes recent academic pa - [Usage](#usage) - [Configuration](#configuration) - [FAQ and Troubleshooting](#faq-and-troubleshooting) +- [Technical Details](#technical-details) - [Roadmap](#roadmap) - [Glossary](#glossary) - [License](#license) @@ -29,11 +44,13 @@ This project automatically retrieves, filters, and summarizes recent academic pa - Python 3.10 or higher - Required Python packages: - - pypdf - - python-dotenv - - PyYAML - - requests - - simplerllm + - pypdf - For PDF document processing + - python-dotenv - For environment variable management + - PyYAML - For configuration parsing + - requests - For API communication + - simplerllm - For LLM integration + - tenacity - For resilient API interactions + - tiktoken - For token counting ## Installation @@ -98,14 +115,41 @@ For a comprehensive list of frequently asked questions, including setup instruct If you can't find an answer to your question or solution to your problem in the FAQ, please [open an issue](https://github.com/seanbrar/paperweight/issues) on GitHub. +## Technical Details + +### Processing Pipeline + +paperweight processes papers through four main stages: + +1. **Scraping** (`scraper.py`): Fetches recent papers from arXiv's API based on user-defined categories and processes the PDF/LaTeX content. + +2. 
**Processing** (`processor.py`): Calculates relevance scores based on keyword matching, with weights for title, abstract, and content matches, plus handling of exclusion keywords. + +3. **Analysis** (`analyzer.py`): Either extracts the abstract or generates a summary using an LLM (OpenAI or Gemini), with configurable options. + +4. **Notification** (`notifier.py`): Formats the filtered papers and sends them via email, with options for sorting by relevance, date, or title. + +### Resilience Features + +- **Retry Logic**: Uses the `tenacity` library to implement exponential backoff for API calls +- **Error Handling**: Comprehensive error catching and logging throughout the codebase +- **State Persistence**: Maintains processing state between runs using the `last_processed_date.txt` file + +### Performance Considerations + +- **Token Counting**: Uses `tiktoken` to accurately count tokens for LLM context management +- **Configurable Limits**: Allows setting maximum papers per category to control processing time +- **Incremental Processing**: Only fetches papers published since the last run + ## Roadmap Key upcoming features: - Implement machine learning-based paper recommendations - Add support for additional academic paper sources - Expand notification methods +- Enhance batch processing capabilities -For a full list of proposed features and known issues, see the [open issues](https://github.com/seanbrar/paperweight/issues) page or the detailed [roadmap](docs/ROADMAP.md). +For a full list of proposed features and planned enhancements, see the detailed [roadmap](docs/ROADMAP.md). ## Glossary @@ -114,6 +158,8 @@ For a full list of proposed features and known issues, see the [open issues](htt - **YAML**: A human-readable data serialization format used for configuration files. - **SMTP**: Simple Mail Transfer Protocol; used for sending emails. - **LLM**: Large Language Model; an AI model used for text generation and analysis. 
+- **Embedding**: A numerical representation of text that captures semantic meaning. +- **Token**: A unit of text processed by language models, roughly corresponding to 4 characters. ## License diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index cbc0025..8fc9760 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -1,56 +1,87 @@ # paperweight roadmap -This document outlines the planned features and improvements for the paperweight project. Please note that this roadmap is subject to change based on user feedback and project priorities. +This document outlines planned features and improvements for the paperweight project. The roadmap is organized into focused development areas to create a scalable, efficient academic paper processing system. -## Short-term Goals +## Core System Enhancements -### General Improvements -- [ ] Implement general code cleanup and optimization -- [ ] Increase overall speed through asynchronous operations -- [ ] Create a web-hosted demo of the program +### Performance & Efficiency +- [ ] Implement asynchronous processing for paper fetching and analysis +- [ ] Add configurable batch processing with adjustable batch sizes +- [ ] Create memory usage tracking and optimization for large document sets +- [ ] Implement benchmarking tools to measure and optimize performance + +### Context Management +- [ ] Develop intelligent document chunking for papers exceeding token limits +- [ ] Implement hierarchical summarization for extremely long papers +- [ ] Create a context window awareness system that optimizes token usage +- [ ] Add semantic sectioning to prioritize important paper components + +### Caching Infrastructure +- [ ] Implement persistent caching for paper embeddings and metadata +- [ ] Create smart cache invalidation strategies based on paper updates +- [ ] Develop a disk-based storage system for embeddings to reduce API costs +- [ ] Add cache statistics reporting for optimization insights + +## Module-Specific Improvements ### Scraper 
Module -- [ ] Build and implement PDF extraction evaluations -- [ ] Add retry logic in API/scraper (possibly using tenacity) -- [ ] Revisit and improve date checking logic - - [ ] Develop comprehensive testing suite with dummy papers -- [ ] Parse out unnecessary content (e.g., references, LaTeX preambles) -- [ ] Add support for extracting and handling images from papers +- [ ] Enhance PDF extraction precision with specialized academic paper handling +- [ ] Add support for extracting and processing figures and tables +- [ ] Expand retry logic in API interactions using advanced backoff strategies +- [ ] Improve date-based paper filtering with precise version tracking ### Processor Module -- [ ] Refine and expand the normalization score system for papers +- [ ] Develop enhanced scoring algorithms for more accurate paper relevance +- [ ] Implement sliding window analysis for sequential context processing +- [ ] Create adaptive keyword weighting based on document section importance +- [ ] Add citation network analysis for evaluating paper significance ### Analyzer Module -- [ ] Conduct additional testing of LLM integration -- [ ] Implement rate limits for API calls -- [ ] Explore and potentially add support for a wider selection of models -- [ ] Refine and optimize summarization prompts +- [ ] Expand LLM provider support with a unified interface +- [ ] Implement streaming responses for long paper summarization +- [ ] Create domain-specific summarization templates for different fields +- [ ] Add comparative analysis between related papers ### Notifier Module -- [ ] Improve handling of scenarios where all papers are discarded -- [ ] Revisit and potentially expand the fields included in notifications (e.g., authors) -- [ ] Add more options for paper ordering and field selection in email notifications +- [ ] Develop a modular notification system supporting multiple channels +- [ ] Create customizable templates for notification formatting +- [ ] Implement digest mode for 
batched notifications +- [ ] Add interactive elements to notifications for user feedback + +## Strategic Directions -## Medium-term Goals +### Machine Learning Integration +- [ ] Replace keyword-based filtering with embedding similarity scoring +- [ ] Implement personalized paper recommendations based on user interests +- [ ] Develop citation impact prediction for emerging papers +- [ ] Create a feedback loop to improve future recommendations -- [ ] Replace current static keyword-based filtering with a machine learning recommendation engine - - [ ] Ensure interface compatibility is maintained -- [ ] Expand notification methods beyond email - - [ ] Investigate possibilities like desktop notifications or a desktop agent -- [ ] Rethink the notification system to make SMTP configuration less cumbersome for users +### Expanded Data Sources +- [ ] Add support for multiple academic repositories (PubMed, IEEE, etc.) +- [ ] Implement unified metadata schema across different sources +- [ ] Create source-specific optimizations for each repository +- [ ] Develop cross-repository deduplication -## Long-term Goals +### User Experience +- [ ] Create a simple web interface for configuration and monitoring +- [ ] Develop a local dashboard for visualizing paper recommendations +- [ ] Add personalized preference learning from user interactions +- [ ] Implement saved searches and automated monitoring -- [ ] Add support for additional academic paper sources beyond arXiv -- [ ] Implement machine learning-based paper recommendations -- [ ] Continuously improve and refine the LLM-based summarization feature +## Development Infrastructure -## Ongoing Tasks +### Testing & Quality +- [ ] Expand test coverage with more integration tests +- [ ] Develop performance regression testing +- [ ] Create automated benchmark suites for optimization +- [ ] Implement continuous profiling for memory and CPU usage -- [ ] Maintain and update documentation -- [ ] Address bugs and issues reported by users -- 
[ ] Optimize performance and resource usage +### Documentation +- [ ] Expand API documentation for extensibility +- [ ] Create visual architecture diagrams +- [ ] Develop advanced configuration guides for specific use cases +- [ ] Add code examples for common extension patterns -We welcome contributions and suggestions from the community. If you have ideas for new features or improvements, please open an issue on the [GitHub repository](https://github.com/seanbrar/paperweight/issues). +We welcome contributions and suggestions from the community. If you have ideas for features or improvements, please open an issue on the [GitHub repository](https://github.com/seanbrar/paperweight/issues). For information on how to contribute to paperweight, please see the [contributing guide](docs/CONTRIBUTING.md). \ No newline at end of file diff --git a/setup.py b/setup.py index e322c42..ff22c80 100644 --- a/setup.py +++ b/setup.py @@ -3,7 +3,7 @@ setup( name="paperweight", - version="0.1.1", + version="0.1.2", package_dir={"": "src"}, packages=find_packages(where="src"), install_requires=[ diff --git a/src/paperweight/analyzer.py b/src/paperweight/analyzer.py index c71b2ba..789f121 100644 --- a/src/paperweight/analyzer.py +++ b/src/paperweight/analyzer.py @@ -1,3 +1,10 @@ +"""Module for analyzing and summarizing academic papers. + +This module provides functionality for analyzing paper content using LLMs (Large Language Models) +and extracting relevant information. It supports different analysis types including abstract +extraction and paper summarization using various LLM providers. +""" + import logging from typing import Any, Dict @@ -11,29 +18,61 @@ logger = logging.getLogger(__name__) + def get_abstracts(processed_papers, config): - analysis_type = config.get('type', 'abstract') + """Extract abstracts or summaries from processed papers based on configuration. + + Args: + processed_papers: List of dictionaries containing paper data.
+ config: Configuration dictionary specifying analysis type and parameters. - if analysis_type == 'abstract': - return [paper['abstract'] for paper in processed_papers] - elif analysis_type == 'summary': + Returns: + List of strings containing either abstracts or summaries based on config type. + + Raises: + ValueError: If an unknown analysis type is specified in config. + """ + analysis_type = config.get("type", "abstract") + + if analysis_type == "abstract": + return [paper["abstract"] for paper in processed_papers] + elif analysis_type == "summary": return [summarize_paper(paper, config) for paper in processed_papers] else: raise ValueError(f"Unknown analysis type: {analysis_type}") + @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10)) def summarize_paper(paper: Dict[str, Any], config: Dict[str, Any]) -> str: - llm_provider = config.get('analyzer', {}).get('llm_provider', 'openai').lower() - api_key = config.get('analyzer', {}).get('api_key') + """Generate a summary of a paper using an LLM. - if llm_provider not in ['openai', 'gemini'] or not api_key: - logger.warning(f"No valid LLM provider or API key available for {llm_provider}. Falling back to abstract.") - return paper['abstract'] + Args: + paper: Dictionary containing paper data including content and metadata. + config: Configuration dictionary containing LLM settings. + + Returns: + A string containing the generated summary, or the paper's abstract if no + valid LLM provider or API key is available or if summarization fails. + """ + llm_provider = config.get("analyzer", {}).get("llm_provider", "openai").lower() + api_key = config.get("analyzer", {}).get("api_key") + + if llm_provider not in ["openai", "gemini"] or not api_key: + logger.warning( + f"No valid LLM provider or API key available for {llm_provider}. Falling back to abstract."
+ ) + return paper["abstract"] try: provider = LLMProvider[llm_provider.upper()] - model_name = 'gpt-4o-mini' if provider == LLMProvider.OPENAI else 'gemini-1.5-flash' - llm_instance = LLM.create(provider=provider, model_name=model_name, api_key=api_key) + model_name = ( + "gpt-4o-mini" if provider == LLMProvider.OPENAI else "gemini-1.5-flash" + ) + llm_instance = LLM.create( + provider=provider, model_name=model_name, api_key=api_key + ) prompt = f"Write a concise, accurate summary of the following paper's content in about 3-5 sentences:\n\n```{paper['content']}```" input_tokens = count_tokens(prompt) @@ -47,12 +86,29 @@ def summarize_paper(paper: Dict[str, Any], config: Dict[str, Any]) -> str: return response except Exception as e: logger.error(f"Error summarizing paper: {e}", exc_info=True) - return paper['abstract'] + return paper["abstract"] + def create_llm_instance(provider: str, api_key: str) -> LLM: - if provider == 'openai': - return LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o-mini", api_key=api_key) - elif provider == 'gemini': - return LLM.create(provider=LLMProvider.GEMINI, model_name="gemini-1.5-flash", api_key=api_key) + """Create an instance of the specified LLM provider. + + Args: + provider: The name of the LLM provider ('openai' or 'gemini'). + api_key: API key for the specified provider. + + Returns: + An initialized LLM instance. + + Raises: + ValueError: If an unsupported provider is specified. 
+ """ + if provider == "openai": + return LLM.create( + provider=LLMProvider.OPENAI, model_name="gpt-4o-mini", api_key=api_key + ) + elif provider == "gemini": + return LLM.create( + provider=LLMProvider.GEMINI, model_name="gemini-1.5-flash", api_key=api_key + ) else: raise ValueError(f"Unsupported LLM provider: {provider}") diff --git a/src/paperweight/logging_config.py b/src/paperweight/logging_config.py index 4ed82c2..79d9ee1 100644 --- a/src/paperweight/logging_config.py +++ b/src/paperweight/logging_config.py @@ -1,44 +1,63 @@ +"""Module for configuring logging in the paperweight application. + +This module provides functionality for setting up logging with both file and console +handlers, configurable log levels, and standardized formatting. It ensures log directories +exist and handles invalid logging level configurations gracefully. +""" + import logging import logging.config import os def setup_logging(logging_config): - valid_levels = {'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'} - logging_level = logging_config.get('level', 'INFO').upper() + """Set up logging configuration for the application. + + Args: + logging_config: Dictionary containing logging configuration parameters including + 'level' and 'file' settings. 
+ + The function configures both file and console handlers with the following features: + - Console handler with WARNING and above levels + - File handler with the configured level (defaults to INFO) + - Standard format: timestamp - logger_name - level - message + - Automatic creation of log directory if it doesn't exist + """ + valid_levels = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"} + logging_level = logging_config.get("level", "INFO").upper() if logging_level not in valid_levels: - logging_level = 'INFO' + logging_level = "INFO" - log_file = logging_config['file'] + log_file = logging_config["file"] log_dir = os.path.dirname(log_file) if log_dir and not os.path.exists(log_dir): os.makedirs(log_dir, exist_ok=True) logging_config = { - 'version': 1, - 'disable_existing_loggers': False, - 'formatters': { - 'standard': { - 'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s', - 'datefmt': '%Y-%m-%d %H:%M:%S' + "version": 1, + "disable_existing_loggers": False, + "formatters": { + "standard": { + "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s", + "datefmt": "%Y-%m-%d %H:%M:%S", }, }, - 'handlers': { - 'console': { - 'class': 'logging.StreamHandler', - 'formatter': 'standard', - 'level': 'WARNING', + "handlers": { + "console": { + "class": "logging.StreamHandler", + "formatter": "standard", + "level": "WARNING", }, - 'file': { - 'class': 'logging.FileHandler', - 'filename': log_file, - 'formatter': 'standard', - 'level': logging_level, + "file": { + "class": "logging.FileHandler", + "filename": log_file, + "formatter": "standard", + "level": logging_level, }, }, - 'root': { - 'handlers': ['console', 'file'], - 'level': logging_level, + "root": { + "handlers": ["console", "file"], + "level": logging_level, }, } logging.config.dictConfig(logging_config) diff --git a/src/paperweight/main.py b/src/paperweight/main.py index 9cc8970..1cc152b 100644 --- a/src/paperweight/main.py +++ b/src/paperweight/main.py @@ -1,3 +1,10 @@ +"""Main 
module for the paperweight application. + +This module serves as the entry point for the paperweight application, coordinating +the paper fetching, processing, analysis, and notification processes. It handles +configuration loading, logging setup, and the main execution flow of the application. +""" + import argparse import logging import traceback @@ -14,9 +21,20 @@ logger = logging.getLogger(__name__) + def setup_and_get_papers(force_refresh): + """Set up the application and fetch papers. + + Args: + force_refresh: Boolean indicating whether to ignore the last processed date + and fetch all papers within the configured time window. + + Returns: + Tuple of (papers, config) where papers is a list of paper dictionaries and + config is the loaded configuration dictionary. + """ config = load_config() - setup_logging(config['logging']) + setup_logging(config["logging"]) logger.info("Configuration loaded successfully") if force_refresh: @@ -25,27 +43,54 @@ def setup_and_get_papers(force_refresh): else: return get_recent_papers(), config + def process_and_summarize_papers(recent_papers, config): + """Process and analyze papers based on configured criteria. + + Args: + recent_papers: List of paper dictionaries to process. + config: Configuration dictionary containing processing parameters. + + Returns: + List of processed papers with relevance scores and summaries. + """ if not recent_papers: logger.info("No new papers to process. Exiting.") return None - processed_papers = process_papers(recent_papers, config['processor']) + processed_papers = process_papers(recent_papers, config["processor"]) logger.info(f"Processed {len(processed_papers)} papers") if not processed_papers: logger.info("No papers met the relevance criteria. 
Exiting.") return None - summaries = get_abstracts(processed_papers, config['analyzer']) + summaries = get_abstracts(processed_papers, config["analyzer"]) for paper, summary in zip(processed_papers, summaries): - paper['summary'] = summary if summary else paper.get('abstract', 'No summary available') + paper["summary"] = ( + summary if summary else paper.get("abstract", "No summary available") + ) return processed_papers + def main(): - parser = argparse.ArgumentParser(description="paperweight: Fetch and process arXiv papers") - parser.add_argument('--force-refresh', action='store_true', help='Force refresh papers regardless of last processed date') + """Main entry point for the paperweight application. + + This function parses command line arguments, coordinates the paper processing + pipeline, and handles any errors that occur during execution. + + Returns: + 0 on successful execution, 1 on error. + """ + parser = argparse.ArgumentParser( + description="paperweight: Fetch and process arXiv papers" + ) + parser.add_argument( + "--force-refresh", + action="store_true", + help="Force refresh papers regardless of last processed date", + ) args = parser.parse_args() try: @@ -53,7 +98,9 @@ def main(): processed_papers = process_and_summarize_papers(recent_papers, config) if processed_papers: - notification_sent = compile_and_send_notifications(processed_papers, config['notifier']) + notification_sent = compile_and_send_notifications( + processed_papers, config["notifier"] + ) if notification_sent: logger.info("Notifications compiled and sent successfully") else: @@ -69,6 +116,7 @@ def main(): except Exception as e: logger.error(f"An unexpected error occurred: {e}") + if __name__ == "__main__": try: main() diff --git a/src/paperweight/notifier.py b/src/paperweight/notifier.py index 30277fb..6db93b2 100644 --- a/src/paperweight/notifier.py +++ b/src/paperweight/notifier.py @@ -1,3 +1,10 @@ +"""Module for sending email notifications about processed papers. 
+ +This module handles the creation and sending of email notifications about relevant papers +that have been processed. It includes functionality for composing email content and +sending emails through SMTP servers. +""" + import logging import smtplib from email.mime.multipart import MIMEMultipart @@ -5,20 +12,31 @@ logger = logging.getLogger(__name__) + def send_email_notification(subject, body, config): - from_email = config['email']['from'] - from_password = config['email']['password'] - to_email = config['email']['to'] - smtp_server = config['email']['smtp_server'] - smtp_port = config['email']['smtp_port'] + """Send an email notification using the configured SMTP server. + + Args: + subject: The subject line of the email. + body: The body text of the email. + config: Configuration dictionary containing email settings. + + Raises: + smtplib.SMTPException: If there is an error sending the email. + """ + from_email = config["email"]["from"] + from_password = config["email"]["password"] + to_email = config["email"]["to"] + smtp_server = config["email"]["smtp_server"] + smtp_port = config["email"]["smtp_port"] # Create the email msg = MIMEMultipart() - msg['From'] = from_email - msg['To'] = to_email - msg['Subject'] = subject + msg["From"] = from_email + msg["To"] = to_email + msg["Subject"] = subject - msg.attach(MIMEText(body, 'plain')) + msg.attach(MIMEText(body, "plain")) # Send the email try: @@ -28,23 +46,32 @@ def send_email_notification(subject, body, config): text = msg.as_string() server.sendmail(from_email, to_email, text) server.quit() - logger.info("Email sent successfully") - return True + logger.info("Email notification sent successfully") except Exception as e: - logger.error(f"Failed to send email: {e}") - return False + logger.error(f"Failed to send email notification: {e}", exc_info=True) + raise + def compile_and_send_notifications(papers, config): + """Compile paper information and send email notifications. 
+ + Args: + papers: List of dictionaries containing paper data. + config: Configuration dictionary containing email and notification settings. + + Returns: + True if notifications were sent successfully; None when there are no + papers to send. + """ if not papers: logger.info("No papers to send notifications for.") return - sort_order = config.get('email', {}).get('sort_order', 'relevance') + sort_order = config.get("email", {}).get("sort_order", "relevance") - if sort_order == 'alphabetical': - papers = sorted(papers, key=lambda x: x['title'].lower()) - elif sort_order == 'publication_time': - papers = sorted(papers, key=lambda x: x['date'], reverse=True) + if sort_order == "alphabetical": + papers = sorted(papers, key=lambda x: x["title"].lower()) + elif sort_order == "publication_time": + papers = sorted(papers, key=lambda x: x["date"], reverse=True) # For 'relevance' or any other value, we keep the existing order (already sorted by relevance) subject = "New Papers from ArXiv" diff --git a/src/paperweight/processor.py b/src/paperweight/processor.py index 3a01ea0..8c698e7 100644 --- a/src/paperweight/processor.py +++ b/src/paperweight/processor.py @@ -1,3 +1,10 @@ +"""Module for processing and scoring academic papers. + +This module handles the processing of papers including scoring based on relevance criteria, +keyword matching, and importance weighting. It provides functionality for filtering papers +based on minimum score thresholds and normalizing scores across multiple papers. +""" + import logging import math import re @@ -6,80 +13,152 @@ logger = logging.getLogger(__name__) -def process_papers(papers: List[Dict[str, Any]], processor_config: Dict[str, Any]) -> List[Dict[str, Any]]: + +def process_papers( + papers: List[Dict[str, Any]], processor_config: Dict[str, Any] +) -> List[Dict[str, Any]]: + """Process and score a list of papers based on configured criteria. + + Args: + papers: List of dictionaries containing paper data.
+ processor_config: Configuration dictionary containing scoring parameters and thresholds. + + Returns: + List of processed papers with relevance scores, sorted by normalized score. + """ processed_papers = [] for paper in papers: score, score_breakdown = calculate_paper_score(paper, processor_config) logger.debug(f"Paper '{paper['title']}' scored {score}") - if score >= processor_config['min_score']: - paper['relevance_score'] = score - paper['score_breakdown'] = score_breakdown + if score >= processor_config["min_score"]: + paper["relevance_score"] = score + paper["score_breakdown"] = score_breakdown processed_papers.append(paper) else: - logger.debug(f"Paper '{paper['title']}' filtered out. Score {score} < min_score {processor_config['min_score']}") + logger.debug( + f"Paper '{paper['title']}' filtered out. Score {score} < min_score {processor_config['min_score']}" + ) logger.debug(f"Processed {len(processed_papers)} papers out of {len(papers)}") processed_papers = normalize_scores(processed_papers) - return sorted(processed_papers, key=lambda x: x['normalized_score'], reverse=True) + return sorted(processed_papers, key=lambda x: x["normalized_score"], reverse=True) + def normalize_scores(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]: + """Normalize relevance scores across all papers to a 0-1 scale. + + Args: + papers: List of dictionaries containing paper data with relevance scores. + + Returns: + List of papers with added normalized_score field. 
+ """ if not papers: return papers - max_score = max(paper['relevance_score'] for paper in papers) - min_score = min(paper['relevance_score'] for paper in papers) + max_score = max(paper["relevance_score"] for paper in papers) + min_score = min(paper["relevance_score"] for paper in papers) for paper in papers: if max_score != min_score: - paper['normalized_score'] = (paper['relevance_score'] - min_score) / (max_score - min_score) + paper["normalized_score"] = (paper["relevance_score"] - min_score) / ( + max_score - min_score + ) else: - paper['normalized_score'] = 1.0 + paper["normalized_score"] = 1.0 logger.debug("Normalized scores calculated") return papers + def calculate_paper_score(paper, config): + """Calculate a relevance score for a paper based on configured criteria. + + Args: + paper: Dictionary containing paper data including content and metadata. + config: Configuration dictionary containing scoring parameters. + + Returns: + Tuple of (total_score, score_breakdown) where score_breakdown is a dictionary + containing individual component scores. 
+ """ score = 0 score_breakdown = {} # Keyword matching - title_keywords = count_keywords(paper['title'], config['keywords']) - abstract_keywords = count_keywords(paper['abstract'], config['keywords']) - content_keywords = count_keywords(paper['content'], config['keywords']) + title_keywords = count_keywords(paper["title"], config["keywords"]) + abstract_keywords = count_keywords(paper["abstract"], config["keywords"]) + content_keywords = count_keywords(paper["content"], config["keywords"]) max_title_score = 50 max_abstract_score = 50 max_content_score = 25 - title_score = min(title_keywords * config['title_keyword_weight'], max_title_score) - abstract_score = min(abstract_keywords * config['abstract_keyword_weight'], max_abstract_score) - content_score = min(content_keywords * config['content_keyword_weight'], max_content_score) + title_score = min(title_keywords * config["title_keyword_weight"], max_title_score) + abstract_score = min( + abstract_keywords * config["abstract_keyword_weight"], max_abstract_score + ) + content_score = min( + content_keywords * config["content_keyword_weight"], max_content_score + ) score += title_score + abstract_score + content_score - score_breakdown['keyword_matching'] = { - 'title': round(title_score, 2), - 'abstract': round(abstract_score, 2), - 'content': round(content_score, 2) + score_breakdown["keyword_matching"] = { + "title": round(title_score, 2), + "abstract": round(abstract_score, 2), + "content": round(content_score, 2), } # Exclusion list - exclusion_count = count_keywords(paper['content'], config['exclusion_keywords']) - exclusion_score = min(exclusion_count * config['exclusion_keyword_penalty'], max_content_score) + exclusion_count = count_keywords(paper["content"], config["exclusion_keywords"]) + exclusion_score = min( + exclusion_count * config["exclusion_keyword_penalty"], max_content_score + ) score -= exclusion_score - score_breakdown['exclusion_penalty'] = -round(exclusion_score, 2) + 
score_breakdown["exclusion_penalty"] = -round(exclusion_score, 2) # Simple text analysis - important_word_count = count_important_words(paper['content'], config['important_words']) - important_word_score = min(important_word_count * config['important_words_weight'], max_content_score) + important_word_count = count_important_words( + paper["content"], config["important_words"] + ) + important_word_score = min( + important_word_count * config["important_words_weight"], max_content_score + ) score += important_word_score - score_breakdown['important_words'] = round(important_word_score, 2) + score_breakdown["important_words"] = round(important_word_score, 2) + + return max(score, 0), score_breakdown # Ensure score is not negative - return max(score, 0), score_breakdown # Ensure score is not negative def count_keywords(text, keywords): - return sum(math.log(text.lower().count(keyword.lower()) + 1) for keyword in keywords) + """Score keyword occurrences in text on a log scale. + + Args: + text: The text to search in. + keywords: List of keywords to count. + + Returns: + The sum of log-scaled occurrence counts across all keywords. + """ + return sum( + math.log(text.lower().count(keyword.lower()) + 1) for keyword in keywords + ) + def count_important_words(text, important_words): - words = re.findall(r'\w+', text.lower()) + """Score occurrences of important words in text on a log scale. + + Args: + text: The text to search in. + important_words: List of important words to count. + + Returns: + The sum of log-scaled occurrence counts for the important words found in the text.
+ """ + words = re.findall(r"\w+", text.lower()) word_counts = Counter(words) - return sum(math.log(word_counts[word.lower()] + 1) for word in important_words if word.lower() in word_counts) + return sum( + math.log(word_counts[word.lower()] + 1) + for word in important_words + if word.lower() in word_counts + ) diff --git a/src/paperweight/scraper.py b/src/paperweight/scraper.py index 6d69f0d..1e57dc1 100644 --- a/src/paperweight/scraper.py +++ b/src/paperweight/scraper.py @@ -1,3 +1,10 @@ +"""Module for fetching and processing arXiv papers. + +This module handles all interactions with the arXiv API, including fetching paper metadata, +downloading PDFs, and extracting text content. It includes retry mechanisms for robust +API interactions and various methods for processing paper content. +""" + import gzip import io import logging @@ -26,12 +33,29 @@ logger = logging.getLogger(__name__) + @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), - retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout)) + retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout)), ) -def fetch_arxiv_papers(category: str, start_date: date, max_results: Optional[int] = None) -> List[Dict[str, Any]]: +def fetch_arxiv_papers( + category: str, start_date: date, max_results: Optional[int] = None +) -> List[Dict[str, Any]]: + """Fetch papers from arXiv API for a specific category and date range. + + Args: + category: The arXiv category to fetch papers from (e.g., 'cs.AI'). + start_date: The date from which to start fetching papers. + max_results: Optional maximum number of results to return. + + Returns: + List of dictionaries containing paper metadata. + + Raises: + requests.ConnectionError: If connection to arXiv API fails. + requests.Timeout: If the request times out. + """ logger.debug(f"Fetching arXiv papers for category '{category}' since {start_date}") base_url = "http://export.arxiv.org/api/query?" 
query = f"cat:{category}" @@ -39,7 +63,7 @@ def fetch_arxiv_papers(category: str, start_date: date, max_results: Optional[in "search_query": query, "start": 0, "sortBy": "submittedDate", - "sortOrder": "descending" + "sortOrder": "descending", } if max_results is not None and max_results > 0: params["max_results"] = max_results @@ -49,8 +73,12 @@ def fetch_arxiv_papers(category: str, start_date: date, max_results: Optional[in response.raise_for_status() except HTTPError as http_err: if response.status_code == 400 and "Invalid field: cat" in response.text: - logger.error(f"Invalid arXiv category: {category}. Please check your configuration.") - raise ValueError(f"Invalid arXiv category: {category}. Please check your configuration.") from http_err + logger.error( + f"Invalid arXiv category: {category}. Please check your configuration." + ) + raise ValueError( + f"Invalid arXiv category: {category}. Please check your configuration." + ) from http_err else: logger.error(f"HTTP error occurred: {http_err}") raise @@ -58,13 +86,18 @@ def fetch_arxiv_papers(category: str, start_date: date, max_results: Optional[in root = ET.fromstring(response.content) papers = [] - for entry in root.findall('{http://www.w3.org/2005/Atom}entry'): - title_elem = entry.find('{http://www.w3.org/2005/Atom}title') - link_elem = entry.find('{http://www.w3.org/2005/Atom}id') - published_elem = entry.find('{http://www.w3.org/2005/Atom}published') - summary_elem = entry.find('{http://www.w3.org/2005/Atom}summary') - - if title_elem is None or link_elem is None or published_elem is None or summary_elem is None: + for entry in root.findall("{http://www.w3.org/2005/Atom}entry"): + title_elem = entry.find("{http://www.w3.org/2005/Atom}title") + link_elem = entry.find("{http://www.w3.org/2005/Atom}id") + published_elem = entry.find("{http://www.w3.org/2005/Atom}published") + summary_elem = entry.find("{http://www.w3.org/2005/Atom}summary") + + if ( + title_elem is None + or link_elem is None + or 
published_elem is None + or summary_elem is None + ): logger.warning("Skipping entry due to missing required elements") continue @@ -82,27 +115,37 @@ def fetch_arxiv_papers(category: str, start_date: date, max_results: Optional[in logger.debug(f"Paper '{title}' submitted on {submitted_date}") if submitted_date < start_date: - logger.debug(f"Stopping fetch: paper date {submitted_date} is before start date {start_date}") + logger.debug( + f"Stopping fetch: paper date {submitted_date} is before start date {start_date}" + ) break - papers.append({ - "title": title, - "link": link, - "date": submitted_date, - "abstract": abstract - }) + papers.append( + {"title": title, "link": link, "date": submitted_date, "abstract": abstract} + ) if max_results is not None and max_results > 0 and len(papers) >= max_results: logger.debug(f"Reached max_results limit of {max_results}") break - logger.info(f"Successfully fetched {len(papers)} papers for category '{category}' since {start_date}") + logger.info( + f"Successfully fetched {len(papers)} papers for category '{category}' since {start_date}" + ) return papers + def fetch_recent_papers(start_days=1): + """Fetch papers published within the last specified number of days. + + Args: + start_days: Number of days to look back for papers. + + Returns: + List of dictionaries containing paper metadata. 
+ """ config = load_config() - categories = config['arxiv']['categories'] - max_results = config['arxiv'].get('max_results', 0) # Default to 0 if not set + categories = config["arxiv"]["categories"] + max_results = config["arxiv"].get("max_results", 0) # Default to 0 if not set end_date = datetime.now().date() start_date = end_date - timedelta(days=start_days) @@ -114,9 +157,19 @@ def fetch_recent_papers(start_days=1): for category in categories: logger.info(f"Processing category: {category}") try: - papers = fetch_arxiv_papers(category, start_date, max_results=max_results if max_results > 0 else None) - new_papers = [paper for paper in papers if paper['link'].split('/abs/')[-1] not in processed_ids] - processed_ids.update(paper['link'].split('/abs/')[-1] for paper in new_papers) + papers = fetch_arxiv_papers( + category, + start_date, + max_results=max_results if max_results > 0 else None, + ) + new_papers = [ + paper + for paper in papers + if paper["link"].split("/abs/")[-1] not in processed_ids + ] + processed_ids.update( + paper["link"].split("/abs/")[-1] for paper in new_papers + ) if max_results > 0: new_papers = new_papers[:max_results] @@ -130,22 +183,38 @@ def fetch_recent_papers(start_days=1): logger.info(f"Fetched a total of {len(all_papers)} papers") return all_papers + @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), - retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout, requests.RequestException)) + retry=retry_if_exception_type( + (requests.ConnectionError, requests.Timeout, requests.RequestException) + ), ) def fetch_paper_content(paper_id): + """Fetch the content of a specific paper from arXiv. + + Args: + paper_id: The arXiv ID of the paper to fetch. + + Returns: + Tuple of (content, method) where method indicates the source type. + + Raises: + requests.ConnectionError: If connection to arXiv fails. + requests.Timeout: If the request times out. 
+ requests.RequestException: For other request-related errors. + """ logger.debug(f"Fetching content for paper ID: {paper_id}") - source_url = f'http://export.arxiv.org/e-print/{paper_id}' - pdf_url = f'https://export.arxiv.org/pdf/{paper_id}' + source_url = f"http://export.arxiv.org/e-print/{paper_id}" + pdf_url = f"https://export.arxiv.org/pdf/{paper_id}" try: # Try to fetch source first response = requests.get(source_url, timeout=30) response.raise_for_status() logger.debug(f"Successfully fetched source for paper ID: {paper_id}") - return response.content, 'source' + return response.content, "source" except requests.RequestException as e: logger.warning(f"Failed to fetch source for paper ID: {paper_id}. Error: {e}") @@ -154,14 +223,23 @@ def fetch_paper_content(paper_id): response = requests.get(pdf_url, timeout=30) response.raise_for_status() logger.debug(f"Successfully fetched PDF for paper ID: {paper_id}") - return response.content, 'pdf' + return response.content, "pdf" except requests.RequestException as e: logger.warning(f"Failed to fetch PDF for paper ID: {paper_id}. Error: {e}") logger.error(f"Failed to fetch content for paper ID: {paper_id}") return None, None + def extract_text_from_pdf(pdf_content): + """Extract text content from a PDF file. + + Args: + pdf_content: Binary content of the PDF file. + + Returns: + Extracted text as a string. + """ pdf_file = io.BytesIO(pdf_content) pdf_reader = PdfReader(pdf_file) text = "" @@ -169,11 +247,21 @@ def extract_text_from_pdf(pdf_content): text += page.extract_text() return text + def extract_text_from_source(content, method): - if method not in ['pdf', 'source']: + """Extract text from various source formats. + + Args: + content: The content to extract text from. + method: The method to use for extraction ('pdf' or 'source'). + + Returns: + Extracted text as a string. 
+ """ + if method not in ["pdf", "source"]: raise ValueError(f"Invalid source type: {method}") - if method == 'pdf': + if method == "pdf": return extract_text_from_pdf(content) # Try to decompress gzip content @@ -190,11 +278,11 @@ def extract_text_from_source(content, method): for member in tar.getmembers(): if member.isfile(): _, ext = os.path.splitext(member.name) - if ext.lower() in ['.tex', '.txt', '.log']: + if ext.lower() in [".tex", ".txt", ".log"]: f = tar.extractfile(member) if f: - text += f.read().decode('utf-8', errors='ignore') - elif ext.lower() in ['.png', '.jpg', '.jpeg']: + text += f.read().decode("utf-8", errors="ignore") + elif ext.lower() in [".png", ".jpg", ".jpeg"]: # Optionally log the presence of image files logger.debug(f"Skipping image file: {member.name}") else: @@ -202,9 +290,18 @@ def extract_text_from_source(content, method): return text else: # If it's not a tar file, assume it's a single file - return decompressed.decode('utf-8', errors='ignore') + return decompressed.decode("utf-8", errors="ignore") + def fetch_paper_contents(paper_ids): + """Fetch contents for multiple papers sequentially, pausing between requests to respect arXiv rate limits. + + Args: + paper_ids: List of arXiv paper IDs to fetch. + + Returns: + List of (content, method) tuples, one per paper ID. + """ contents = [] total_papers = len(paper_ids) logger.info(f"Fetching content for {total_papers} papers") @@ -218,7 +315,9 @@ def fetch_paper_contents(paper_ids): if (i + 1) % 4 == 0: time.sleep(1) - logger.debug(f"Processed {i + 1}/{total_papers} papers. Waiting 1 second...") + logger.debug( + f"Processed {i + 1}/{total_papers} papers. Waiting 1 second..." + ) if (i + 1) % 20 == 0: logger.info(f"Processed {i + 1}/{total_papers} papers") @@ -226,7 +325,16 @@ def fetch_paper_contents(paper_ids): logger.info(f"Finished fetching content for all {total_papers} papers") return contents + def get_recent_papers(force_refresh=False): + """Get recent papers, either from cache or by fetching new ones.
+ + Args: + force_refresh: If True, ignore cache and fetch new papers. + + Returns: + List of dictionaries containing paper metadata. + """ last_processed_date = get_last_processed_date() logger.info(f"Last processed date: {last_processed_date}") current_date = datetime.now().date() @@ -244,12 +352,14 @@ def get_recent_papers(force_refresh=False): elif days > 7: # If more than a week has passed, limit to 7 days to avoid overload days = 7 - logger.warning(f"More than a week since last run. Limiting fetch to last {days} days.") + logger.warning( + f"More than a week since last run. Limiting fetch to last {days} days." + ) logger.info(f"Fetching papers for the last {days} days") recent_papers = fetch_recent_papers(days) logger.info(f"Fetched {len(recent_papers)} recent papers") - paper_ids = [paper['link'].split('/abs/')[-1] for paper in recent_papers] + paper_ids = [paper["link"].split("/abs/")[-1] for paper in recent_papers] contents = fetch_paper_contents(paper_ids) @@ -259,22 +369,25 @@ def get_recent_papers(force_refresh=False): logger.debug(f"Extracting text for paper ID: {paper_id}") text = extract_text_from_source(content, method) - papers_with_content.append({ - "id": paper_id, - "title": paper['title'], - "link": paper['link'], - "date": paper['date'], - "abstract": paper['abstract'], - "content": text, - "content_type": method - }) + papers_with_content.append( + { + "id": paper_id, + "title": paper["title"], + "link": paper["link"], + "date": paper["date"], + "abstract": paper["abstract"], + "content": text, + "content_type": method, + } + ) if papers_with_content: save_last_processed_date(current_date) - logger.info(f"Processed {len(papers_with_content)} papers. Last processed date updated to {current_date}") + logger.info( + f"Processed {len(papers_with_content)} papers. 
Last processed date updated to {current_date}" + ) else: logger.info("No new papers found.") logger.info(f"Returning {len(papers_with_content)} papers with content") return papers_with_content - diff --git a/src/paperweight/utils.py b/src/paperweight/utils.py index f7728c9..9df91ea 100644 --- a/src/paperweight/utils.py +++ b/src/paperweight/utils.py @@ -1,3 +1,11 @@ +"""Utility functions for the paperweight application. + +This module provides various utility functions for configuration management, +environment variable handling, date tracking, and token counting. It includes +functions for loading and validating configuration, expanding environment variables, +and managing the last processed date for paper fetching. +""" + import logging import os import re @@ -11,7 +19,16 @@ logger = logging.getLogger(__name__) + def expand_env_vars(config): + """Recursively expand environment variables in configuration values. + + Args: + config: Configuration object (dict, list, or scalar value). + + Returns: + Configuration object with environment variables expanded. + """ if isinstance(config, dict): return {k: expand_env_vars(v) for k, v in config.items()} elif isinstance(config, list): @@ -21,8 +38,20 @@ def expand_env_vars(config): else: return config + def override_with_env(config): - env_prefix = 'PAPERWEIGHT_' + """Override configuration values with environment variables. + + Args: + config: Configuration dictionary to override. + + Returns: + Configuration dictionary with values overridden by environment variables. + + Environment variables should be prefixed with 'PAPERWEIGHT_' and use uppercase. + Nested configuration keys are joined with underscores. 
+ """ + env_prefix = "PAPERWEIGHT_" for key, value in config.items(): env_var = f"{env_prefix}{key.upper()}" if isinstance(value, dict): @@ -30,7 +59,7 @@ def override_with_env(config): elif env_var in os.environ: env_value = os.environ[env_var] if isinstance(value, bool): - config[key] = env_value.lower() in ('true', '1', 'yes') + config[key] = env_value.lower() in ("true", "1", "yes") elif isinstance(value, int): config[key] = int(env_value) elif isinstance(value, float): @@ -39,11 +68,25 @@ def override_with_env(config): config[key] = env_value return config -def load_config(config_path='config.yaml'): + +def load_config(config_path="config.yaml"): + """Load and validate the application configuration. + + Args: + config_path: Path to the YAML configuration file. + + Returns: + Dictionary containing the validated configuration. + + Raises: + FileNotFoundError: If the configuration file doesn't exist. + yaml.YAMLError: If the configuration file is invalid YAML. + ValueError: If the configuration is invalid. 
+ """ try: load_dotenv() - with open(config_path, 'r') as config_file: + with open(config_path, "r") as config_file: config = yaml.safe_load(config_file) if config is None: raise ValueError("Empty configuration file") @@ -52,23 +95,23 @@ config = override_with_env(config) # Handle API keys - if config['analyzer']['type'] == 'summary': - llm_provider = config['analyzer'].get('llm_provider') + if config["analyzer"]["type"] == "summary": + llm_provider = config["analyzer"].get("llm_provider") if not llm_provider: raise ValueError("LLM provider not specified for summary analyzer type") - api_key_from_config = config['analyzer'].get('api_key') - api_key_from_env = os.getenv(f'{llm_provider.upper()}_API_KEY') + api_key_from_config = config["analyzer"].get("api_key") + api_key_from_env = os.getenv(f"{llm_provider.upper()}_API_KEY") api_key = api_key_from_config or api_key_from_env if api_key: - config['analyzer']['api_key'] = api_key + config["analyzer"]["api_key"] = api_key else: raise ValueError(f"Missing API key for {llm_provider}") else: pass - if 'arxiv' in config and 'max_results' in config['arxiv']: - config['arxiv']['max_results'] = int(config['arxiv']['max_results']) + if "arxiv" in config and "max_results" in config["arxiv"]: + config["arxiv"]["max_results"] = int(config["arxiv"]["max_results"]) check_config(config) logger.info("Configuration loaded and validated successfully") @@ -89,84 +132,175 @@ logger.error(f"Exception in load_config: {str(e)}") raise + def check_config(config): + """Check if the configuration is valid. + + Args: + config: Configuration dictionary to validate. + + Raises: + ValueError: If any required configuration is missing or invalid.
+ """ if not isinstance(config, dict): raise ValueError("Configuration must be a dictionary") try: _check_required_sections(config) - _check_arxiv_section(config['arxiv']) - _check_analyzer_section(config['analyzer']) - _check_notifier_section(config['notifier']) - _check_logging_section(config['logging']) + _check_arxiv_section(config["arxiv"]) + _check_analyzer_section(config["analyzer"]) + _check_notifier_section(config["notifier"]) + _check_logging_section(config["logging"]) except KeyError as e: raise ValueError(f"Missing required section or key: {e}") + def _check_required_sections(config): - required_sections = ['arxiv', 'processor', 'analyzer', 'notifier', 'logging'] + """Check if all required configuration sections are present. + + Args: + config: Configuration dictionary to check. + + Raises: + ValueError: If any required section is missing. + """ + required_sections = ["arxiv", "processor", "analyzer", "notifier", "logging"] for section in required_sections: if section not in config: raise ValueError(f"Missing required section: '{section}'") + def _check_arxiv_section(arxiv): - if 'categories' not in arxiv: + """Validate the arXiv section of the configuration. + + Args: + arxiv: arXiv configuration dictionary. + + Raises: + ValueError: If arXiv configuration is invalid. 
+ """ + if "categories" not in arxiv: raise ValueError("Missing required subsection: 'categories' in 'arxiv'") - invalid_categories = [cat for cat in arxiv['categories'] if not is_valid_arxiv_category(cat)] + invalid_categories = [ + cat for cat in arxiv["categories"] if not is_valid_arxiv_category(cat) + ] if invalid_categories: raise ValueError(f"Invalid arXiv category: {', '.join(invalid_categories)}") - if 'max_results' in arxiv: + if "max_results" in arxiv: try: - max_results = int(arxiv['max_results']) + max_results = int(arxiv["max_results"]) except ValueError: raise ValueError("'max_results' in 'arxiv' section must be a valid integer") if max_results < 0: - raise ValueError("'max_results' in 'arxiv' section must be a non-negative integer") + raise ValueError( + "'max_results' in 'arxiv' section must be a non-negative integer" + ) + def _check_analyzer_section(analyzer): - valid_analyzer_types = ['abstract', 'summary'] - if analyzer.get('type') not in valid_analyzer_types: + """Validate the analyzer section of the configuration. + + Args: + analyzer: Analyzer configuration dictionary. + + Raises: + ValueError: If analyzer configuration is invalid. + """ + valid_analyzer_types = ["abstract", "summary"] + if analyzer.get("type") not in valid_analyzer_types: raise ValueError(f"Invalid analyzer type: '{analyzer.get('type')}'") - if analyzer.get('type') == 'summary': - valid_llm_providers = ['openai', 'gemini'] - if analyzer.get('llm_provider') not in valid_llm_providers: + if analyzer.get("type") == "summary": + valid_llm_providers = ["openai", "gemini"] + if analyzer.get("llm_provider") not in valid_llm_providers: raise ValueError(f"Invalid LLM provider: '{analyzer.get('llm_provider')}'") + def _check_notifier_section(notifier): - if 'email' not in notifier: + """Validate the notifier section of the configuration. + + Args: + notifier: Notifier configuration dictionary. + + Raises: + ValueError: If notifier configuration is invalid. 
+ """ + if "email" not in notifier: raise ValueError("Missing required subsection: 'email' in 'notifier'") - required_email_fields = ['to', 'from', 'password', 'smtp_server', 'smtp_port'] + required_email_fields = ["to", "from", "password", "smtp_server", "smtp_port"] for field in required_email_fields: - if field not in notifier['email']: + if field not in notifier["email"]: raise ValueError(f"Missing required email field: '{field}'") + def _check_logging_section(logging): - valid_logging_levels = ['DEBUG', 'INFO', 'WARNING', 'ERROR'] - if logging.get('level') not in valid_logging_levels: + """Validate the logging section of the configuration. + + Args: + logging: Logging configuration dictionary. + + Raises: + ValueError: If logging configuration is invalid. + """ + valid_logging_levels = ["DEBUG", "INFO", "WARNING", "ERROR"] + if logging.get("level") not in valid_logging_levels: raise ValueError(f"Invalid logging level: '{logging.get('level')}'") + def is_valid_arxiv_category(category): + """Check if an arXiv category string is valid. + + Args: + category: arXiv category string to validate. + + Returns: + bool: True if the category format is valid. + """ # A simple method to catch obviously invalid categories - pattern = r'^[a-z]+\.[A-Z]{2,}$' + pattern = r"^[a-z]+\.[A-Z]{2,}$" return bool(re.match(pattern, category)) + def get_last_processed_date(): + """Get the date when papers were last processed. + + Returns: + datetime.date: The last processed date if available, None otherwise. + """ try: if os.path.exists(LAST_PROCESSED_DATE_FILE): - with open(LAST_PROCESSED_DATE_FILE, 'r') as f: + with open(LAST_PROCESSED_DATE_FILE, "r") as f: date_str = f.read().strip() return datetime.strptime(date_str, "%Y-%m-%d").date() except (IOError, ValueError) as e: logger.error(f"Error reading last processed date: {e}") return None + def save_last_processed_date(date): + """Save the date when papers were last processed. + + Args: + date: datetime.date object to save. 
+ """ try: - with open(LAST_PROCESSED_DATE_FILE, 'w') as f: + with open(LAST_PROCESSED_DATE_FILE, "w") as f: f.write(date.strftime("%Y-%m-%d")) logger.info(f"Saved last processed date: {date}") except IOError as e: logger.error(f"Error saving last processed date: {e}") + def count_tokens(text): + """Count the number of tokens in a text string using tiktoken. + + Args: + text: String to count tokens in. + + Returns: + int: Number of tokens in the text. + """ encoding = tiktoken.encoding_for_model("gpt-3.5-turbo") - return len(encoding.encode(text, allowed_special={'<|endoftext|>'})) + return len(encoding.encode(text, allowed_special={"<|endoftext|>"}))