Merge pull request #83 from eli64s/refactor/repo-processor

refactor: Improve repository preprocessing design and metadata extraction.
eli64s · Jan 5, 2024 · 5096f17
2 parents 67261d6 + f7b8224

Showing 52 changed files with 2,618 additions and 1,586 deletions.
diff --git a/README.md b/README.md
@@ -281,9 +281,9 @@ See the <a href="#-configuration">Configuration</a> section below for the comple
       </tr>
       <tr>
         <td>3️⃣</td>
-        <td><a href="https://github.com/eli64s/readme-ai/blob/main/examples/markdown/readme-javascript.md">readme-javascript.md</a></td>
-        <td><a href="https://github.com/idosal/assistant-chat-gpt-javascript">(repository deleted)</a></td>
-        <td>JavaScript, React</td>
+        <td><a href="https://github.com/eli64s/readme-ai/blob/main/examples/markdown/readme-postgres.md">readme-postgres.md</a></td>
+        <td><a href="https://github.com/jwills/buenavista">postgres-proxy-server</a></td>
+        <td>Python, Postgres, Duckdb, Docker</td>
       </tr>
       <tr>
         <td>4️⃣</td>
@@ -351,19 +351,7 @@ A repository URL or local path to your codebase is required run readme-ai. The f
 
 **OpenAI API Key**
 
-An OpenAI API account and API key are needed to use *readme-ai*. The following steps outline the process.
-
-<details closed>
-  <summary>🔐 OpenAI API Account Setup</summary>
-  <ol>
-    <li>Go to the <a href="https://platform.openai.com/">OpenAI website</a>.</li>
-    <li>Click the "Sign up for free" button.</li>
-    <li>Fill out the registration form with your information and agree to the terms of service.</li>
-    <li>Once logged in, click on the "API" tab.</li>
-    <li>Follow the instructions to create a new API key.</li>
-    <li>Copy the API key and keep it in a secure place.</li>
-  </ol>
-</details>
+An OpenAI API account and API key are needed to use *readme-ai*. Get started by creating an account [here](https://platform.openai.com/docs/quickstart/account-setup). Once you have an account, you can create an API key on the [API settings page](https://platform.openai.com/api-keys).
 
 > [!WARNING]
 >
@@ -395,11 +383,16 @@ conda install -c conda-forge readmeai
 Alternatively, clone the readme-ai repository and build from source.
 
 ```sh
-git clone https://github.com/eli64s/readme-ai && \
+git clone https://github.com/eli64s/readme-ai
+```
+
+Change into the project directory.
+
+```sh
 cd readme-ai
 ```
 
-Then use one of the methods below to install the project's dependencies (Bash, Conda, Pipenv, or Poetry).
+Then install the project's dependencies using one of the methods below.
 
 Using `bash`
 ```sh
@@ -420,7 +413,7 @@ poetry shell
 
 ---
 
-### 👩‍💻 Running *README-AI*
+### 👩‍💻 Running *readme-ai*
 
 Before running the application, ensure you have an OpenAI API key and that it's set as an environment variable.
 
@@ -554,11 +547,6 @@ The readme-ai tool is designed with flexibility in mind, allowing users to confi
 <details closed><summary>🔠 Configuration Models</summary>
 <br>
 
-<!--
-# README-AI Configuration and Settings
-This documentation provides an overview of the configuration and settings for the README.ai CLI tool. It details various data models and functions that are used to configure the tool, making it adaptable for different environments and use cases.
--->
-
 ***GitService Enum***
 
 - **Purpose**: Defines Git service details.

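The configuration models section above documents a `GitService` enum whose purpose is to define Git service details. A minimal sketch of what such an enum could look like follows. It is an illustration under assumptions, not the project's actual definition: the member names, the host values, the `api_url` property, and the `from_url` helper are all hypothetical, although the API base URLs are the hosts' real public REST endpoints.

```python
from enum import Enum
from urllib.parse import urlparse


class GitService(Enum):
    """Hypothetical sketch of a Git service enum (not readme-ai's actual definition).

    Each member's value is the service's host name; the API base URLs below
    are the public REST endpoints of each host, chosen for illustration.
    """

    GITHUB = "github.com"
    GITLAB = "gitlab.com"
    BITBUCKET = "bitbucket.org"

    @property
    def api_url(self) -> str:
        """Return the public REST API base URL for this service."""
        return {
            "github.com": "https://api.github.com",
            "gitlab.com": "https://gitlab.com/api/v4",
            "bitbucket.org": "https://api.bitbucket.org/2.0",
        }[self.value]

    @classmethod
    def from_url(cls, repo_url: str) -> "GitService":
        """Return the service whose host matches the repository URL's host."""
        host = urlparse(repo_url).netloc.lower()
        for service in cls:
            # Accept the bare host or any subdomain of it (e.g. www.github.com).
            if host == service.value or host.endswith("." + service.value):
                return service
        raise ValueError(f"Unsupported Git host: {host!r}")
```

For example, `GitService.from_url("https://github.com/eli64s/readme-ai").api_url` returns `"https://api.github.com"`, and an unrecognized host raises `ValueError` so an invalid repository URL fails early rather than deep inside the pipeline.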
diff --git a/docs/architecture.md b/docs/architecture.md
diff --git a/docs/concepts.md b/docs/concepts.md
@@ -0,0 +1,106 @@
+# readme-ai Core Concepts
+
+readme-ai is a tool for auto-generating README files for code repositories using AI. Here are some of its key concepts:
+
+## Repository Analysis
+
+- Traverses the repository directory tree to build a code structure overview
+- Extracts metadata like dependencies and languages used
+- Analyzes characteristics to inform content generation
+
+## AI-Powered Content Creation
+
+- Uses GPT language models via the OpenAI API
+- Structured prompts injected with repository details
+- Generates sections like project overview and technical features
+- Summarizes code files in markdown tables
+
+## Customization
+
+- Flexible configuration system
+- CLI options to tweak badge icons, images, model settings
+- Supports different badge styles like flat, plastic, skills
+- Can provide custom images and set text alignment
+- Edit prompt templates to influence content
+
+## Modular Design
+
+- Components and parsers decoupled from core logic
+- Built using factory and strategy patterns
+- Easily extend functionality with new parsers
+- Abstracts services like file handling and git ops
+
+## Asynchronous Workflows
+
+- Leverages Python asyncio for non-blocking I/O
+- Concurrent networking, disk, and CPU-bound tasks
+- Manages OpenAI rate limits for optimal performance
+- Resource management via async context managers
+
+## Robustness
+
+- Exponential backoff retry logic for resilience
+- Caching frequently used responses
+- Handles Unicode encoding errors gracefully
+- Secure temporary directories to isolate the repository
+- Configurable logging for debuggability
+
+By leveraging these concepts and more, readme-ai aims to offer a flexible platform for auto-generating documentation to boost developer productivity.
+
+---
+
+
+
+# README-AI Core Concepts
+
+README-AI is a tool for auto-generating detailed README files for software projects using AI. It utilizes several core concepts and components to analyze codebases and produce high-quality documentation.
+
+## Codebase Analysis
+
+README-AI performs an in-depth analysis of the provided codebase to extract key information.
+
+- **File traversal**: The codebase directory is traversed recursively to identify all files. Special cases, such as ignored files and GitHub workflow files, are handled programmatically.
+
+- **Metadata extraction**: File metadata, such as name, path, content, language, and dependencies, is extracted and stored. Popular dependency manifest formats are parsed to detect dependencies.
+
+- **Content preprocessing**: File contents are tokenized so that content generation can be tailored to the complexity of the codebase.
+
+The output is a structured `FileData` object that encapsulates file details.
+
+## LLM API Integration
+
+Large language models (LLMs) such as GPT-3 are used to generate fluent documentation text.
+
+- **Modular design**: The LLM API client is abstracted into a separate `ModelHandler` class to allow swapping out different AI providers.
+
+- **Prompt engineering**: Carefully crafted prompt templates are populated with codebase metadata to produce accurate, relevant content.
+
+- **Batching & caching**: Requests are batched and responses are cached to improve performance and reduce costs. Failed requests are retried with exponential backoff.
+
+Generated text is inserted into Markdown templates to build a full-fledged README.
+
+## Configuration-Driven
+
+The tool relies extensively on configuration using Pydantic models.
+
+- **Settings**: A central settings file defines common constants and file paths, while helper configuration provides additional customization.
+
+- **Validation**: Settings such as the repository URL are validated up front to prevent errors.
+
+- **Extensibility**: Because the design is configuration-driven, adding new features requires minimal code changes.
+
+Overall, this promotes maintainability, testability and flexibility.
+
+## Customizable Output
+
+Users can customize the look and feel of the generated README by providing a range of CLI options.
+
+- **Appearance**: Choose badge styles, header images, alignment options and more for unique styling.
+
+- **Content**: Control language model behavior with parameters like temperature and max tokens. Toggle emojis in text.
+
+- **Templates** (WIP): Generate focused READMEs for domains such as machine learning and web development.
+
+In summary, README-AI aims to simplify documentation through intelligent automation, while keeping the user in control.
+
+---
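The Codebase Analysis steps described in `docs/concepts.md` above (file traversal, metadata extraction, and dependency parsing into a `FileData` object) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not readme-ai's implementation: the dataclass layout of `FileData`, the `LANGUAGES` and `IGNORED_DIRS` tables, and the `traverse_repository` and `parse_requirements` helpers are hypothetical, and only `requirements.txt` is handled as a dependency manifest.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical extension-to-language map; readme-ai's real mapping may differ.
LANGUAGES = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".sh": "shell",
}

# Hypothetical ignore list: directories skipped during traversal.
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


@dataclass
class FileData:
    """Per-file metadata: name, path, content, language, and dependencies."""

    name: str
    path: Path
    content: str
    language: str
    dependencies: list[str] = field(default_factory=list)


def parse_requirements(content: str) -> list[str]:
    """Extract package names from a requirements.txt-style file (simplified)."""
    deps = []
    for line in content.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        # Keep everything before the first version specifier, extras, or marker.
        deps.append(re.split(r"[<>=!~\[;@ ]", line, maxsplit=1)[0].lower())
    return deps


def traverse_repository(root: Path) -> list[FileData]:
    """Recursively collect FileData for each UTF-8 text file under root."""
    files: list[FileData] = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or any(part in IGNORED_DIRS for part in rel.parts):
            continue
        try:
            content = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files instead of aborting
        deps = parse_requirements(content) if path.name == "requirements.txt" else []
        files.append(
            FileData(
                name=path.name,
                path=rel,
                content=content,
                language=LANGUAGES.get(path.suffix.lower(), "unknown"),
                dependencies=deps,
            )
        )
    return files
```

Keeping manifest parsing separate from the traversal loop follows the "extend functionality with new parsers" idea from the Modular Design section: supporting another manifest format would mean adding one more parser function and dispatching on the file name, without changing how files are discovered.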