An intelligent document analysis system that leverages OpenAI's Assistant API to extract, analyze and summarize key information from Homeowners Association (HOA) documents.
- Automated processing of multiple document formats (PDF, Word, TXT, MD)
- Smart information extraction using an authority-based ranking system
- Vector store integration for efficient document search
- Comprehensive analysis across 20 key HOA topics
- Detailed answers and concise summaries for each topic
- JSON output for easy integration
- Document hierarchy enforcement based on authority ranking
- Intelligent handling of conflicting information
- Citation tracking and source documentation
- Token limit management through iterative processing
- Error handling and retry mechanisms
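The retry behaviour listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `with_retries`, the attempt count, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,          # assumed default, not taken from main.py
    base_delay: float = 1.0,        # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff when it raises."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; a caller would wrap whatever request tends to fail transiently in a zero-argument function or lambda.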
Key settings in main.py:
HOA_DOCS_DIR = "./input/hoa_documents" # Input documents location
MODEL_NAME = "gpt-4o-mini" # AI model selection
TEMPERATURE = 0.1 # Response determinism (lower = more focused)
OUTPUT_DIR = "./output" # Results location
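As an illustration of how the input directory setting might be used, the sketch below collects the supported files under `HOA_DOCS_DIR`. The function `find_documents` and the extension set are hypothetical; in particular, mapping "Word" to `.docx` only is an assumption.

```python
from __future__ import annotations

from pathlib import Path

HOA_DOCS_DIR = "./input/hoa_documents"  # same value as in main.py

# Assumed mapping of "PDF, Word, TXT, MD" to file extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def find_documents(root: str = HOA_DOCS_DIR) -> list[Path]:
    """Return supported files under `root` (searched recursively), sorted by path."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        path
        for path in base.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```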
- Place HOA documents in the input/hoa_documents directory
- Run the analysis:
python main.py
- Find results in the output directory:
  - {timestamp}-summary.json: Condensed findings by category
  - {timestamp}-answers.json: Detailed analysis with sources
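The two output files named above could be written along these lines. This is a sketch under stated assumptions: the helper `write_results` is hypothetical, and the timestamp format `%Y%m%d-%H%M%S` is a guess, since the README only shows the `{timestamp}` placeholder.

```python
import json
from datetime import datetime
from pathlib import Path


def write_results(summary, answers, output_dir="./output", now=None):
    """Write the summary and answers lists to timestamped JSON files."""
    # Assumed timestamp format; the real one is not documented here.
    timestamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    summary_path = out / f"{timestamp}-summary.json"
    answers_path = out / f"{timestamp}-answers.json"
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    answers_path.write_text(json.dumps(answers, indent=2), encoding="utf-8")
    return summary_path, answers_path
```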
Documents are prioritized in this order (1 = highest authority):
1. CC&R Amendments
2. CC&Rs
3. Bylaws
4. Articles of Incorporation
5. Operating Rules [...]
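One way to picture the authority ranking when documents disagree is the sketch below. It is illustrative only: `authority_rank` and `resolve_conflict` are hypothetical names, the list stops at "Operating Rules" because the README's list is truncated there, and treating unlisted document types as lowest authority is an assumption.

```python
# Document types in descending authority, as listed in this README.
# The README's list is truncated after "Operating Rules", so any
# further types are unknown and are not guessed at here.
AUTHORITY_ORDER = [
    "CC&R Amendments",
    "CC&Rs",
    "Bylaws",
    "Articles of Incorporation",
    "Operating Rules",
]


def authority_rank(doc_type):
    """Return 1 for the highest authority; unlisted types rank last (assumption)."""
    try:
        return AUTHORITY_ORDER.index(doc_type) + 1
    except ValueError:
        return len(AUTHORITY_ORDER) + 1


def resolve_conflict(findings):
    """Pick the (doc_type, statement) pair from the highest-authority document."""
    return min(findings, key=lambda finding: authority_rank(finding[0]))
```

For example, given made-up findings `[("Bylaws", "Pets are not allowed"), ("CC&R Amendments", "Pets are allowed")]`, `resolve_conflict` would return the CC&R Amendments entry, since it ranks above Bylaws.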
The system analyzes 20 key areas including:
- HOA Name and Official Details
- Financial Information (Dues, Increases, Health)
- Rules and Policies
- Management and Operations
- Community Features and Maintenance
- Legal and Insurance Matters
- Python 3.x
- OpenAI API access
- Required packages:
  - openai
  - python-docx
  - PyPDF2 (for PDF processing)
The system generates two JSON files:
- Summary Table:
[
  {
    "Category": "Category Name",
    "Findings": "Concise summary",
    "Source": "Source document path"
  }
]
- Detailed Answers:
[
  {
    "question": "Original question",
    "answer": "Detailed response",
    "summary": "Brief summary",
    "source": "Source documents",
    "source_ids": ["file_ids"]
  }
]
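When consuming these files from another program, a small check that every record carries the fields shown above can catch integration problems early. The sketch below is an assumption about how a consumer might do that; `missing_keys` is a hypothetical helper and not part of the project.

```python
# Field names taken from the two JSON shapes shown above.
SUMMARY_KEYS = {"Category", "Findings", "Source"}
ANSWER_KEYS = {"question", "answer", "summary", "source", "source_ids"}


def missing_keys(records, required):
    """Map record index -> sorted list of required keys that record lacks."""
    problems = {}
    for index, record in enumerate(records):
        missing = required - set(record)
        if missing:
            problems[index] = sorted(missing)
    return problems
```

A consumer could load a results file with `json.load` and call `missing_keys(records, ANSWER_KEYS)`; an empty dictionary means every record has the expected fields.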