-
Notifications
You must be signed in to change notification settings - Fork 182
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #83 from eli64s/refactor/repo-processor
refactor: Improve repository preprocessing design and metadata extraction.
- Loading branch information
Showing
52 changed files
with
2,618 additions
and
1,586 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
# readme-ai Core Concepts | ||
|
||
readme-ai is a tool for auto-generating README files for code repositories using AI. Here are some of its key concepts: | ||
|
||
## Repository Analysis | ||
|
||
- Traverses the repository directory tree to build a code structure overview | ||
- Extracts metadata like dependencies and languages used | ||
- Analyzes characteristics to inform content generation | ||
|
||
## AI-Powered Content Creation | ||
|
||
- Uses GPT language models via the OpenAI API | ||
- Structured prompts injected with repository details | ||
- Generates sections like project overview and technical features | ||
- Summarizes code files in markdown tables | ||
|
||
## Customization | ||
|
||
- Flexible configuration system | ||
- CLI options to tweak badge icons, images, model settings | ||
- Supports different badge styles like flat, plastic, skills | ||
- Can provide custom images and set text alignment | ||
- Edit prompt templates to influence content | ||
|
||
## Modular Design | ||
|
||
- Components and parsers decoupled from core logic | ||
- Built using factory and strategy patterns | ||
- Easily extend functionality with new parsers | ||
- Abstracts services like file handling and git ops | ||
|
||
## Asynchronous Workflows | ||
|
||
- Leverages Python asyncio for non-blocking I/O | ||
- Concurrent networking, disk and CPU bound tasks | ||
- Manages OpenAI rate limits for optimal performance | ||
- Resource management via async context managers | ||
|
||
## Robustness | ||
|
||
- Exponential backoff retry logic for resilience | ||
- Caching frequently used responses | ||
- Handles Unicode encoding errors gracefully | ||
- Secure temp directories to isolate repository | ||
- Configurable logging for debuggability | ||
|
||
By leveraging these concepts and more, readme-ai aims to offer a flexible platform for auto-generating documentation to boost developer productivity. | ||
|
||
--- | ||
|
||
Here is a markdown document discussing some of the core concepts of the readme-ai project: | ||
|
||
# README-AI Core Concepts | ||
|
||
README-AI is a tool for auto-generating detailed README files for software projects using AI. It utilizes several core concepts and components to analyze codebases and produce high-quality documentation. | ||
|
||
## Codebase Analysis | ||
|
||
README-AI performs an in-depth analysis of the provided codebase to extract key information. | ||
|
||
- **File traversal**: Recursively traverse the codebase directory to identify all files. Special cases like ignoring certain files or handling GitHub workflows are handled programmatically. | ||
|
||
- **Metadata extraction**: File metadata like name, path, content, language, dependencies etc. are extracted and stored. Popular dependency manifest formats are parsed to detect dependencies. | ||
|
||
- **Content preprocessing**: File contents are tokenized to allow smarter content generation tailored to codebase complexity. | ||
|
||
The output is a structured `FileData` object that encapsulates file details. | ||
|
||
## LLM API Integration | ||
|
||
Language Models like GPT-3 are leveraged to generate fluent text for documentation. | ||
|
||
- **Modular design**: The LLM API client is abstracted into a separate `ModelHandler` class to allow swapping out different AI providers. | ||
|
||
- **Prompt engineering**: Carefully crafted prompt templates are populated with codebase metadata to produce accurate, relevant content. | ||
|
||
- **Batching & caching**: Requests are batched and caching used to optimize performance and costs. Exponential backoff retries handle errors. | ||
|
||
Generated text is inserted into Markdown templates to build a full-fledged README. | ||
|
||
## Configuration-driven | ||
|
||
The tool relies extensively on configuration using Pydantic models. | ||
|
||
- **Settings**: Central settings file with common constants and file paths. Helper configuration provides additional customization. | ||
|
||
- **Validation**: Rigorous validations are performed on settings like repository URL to prevent errors. | ||
|
||
- **Extensibility**: Adding new features or functionality requires minimal code changes due to config-driven design. | ||
|
||
Overall, this promotes maintainability, testability and flexibility. | ||
|
||
## Customizable Output | ||
|
||
Users can customize the look and feel of the generated README by providing a range of CLI options. | ||
|
||
- **Appearance**: Choose badge styles, header images, alignment options and more for unique styling. | ||
|
||
- **Content**: Control language model behavior with parameters like temperature and max tokens. Toggle emojis in text. | ||
|
||
- **Templates**: (WIP) Generate focused READMEs for domains like machine learning, webdev etc. | ||
|
||
In summary, README-AI aims to simplify documentation through intelligent automation, while keeping the user in control. | ||
|
||
--- |
Oops, something went wrong.