This project is a Python application that extracts metadata from files within a zip archive and generates JSON files containing the extracted metadata. It uses Azure OpenAI's language model to analyze and extract relevant information from the file contents.
- Extracts metadata from various file types within a zip archive
- Generates JSON files for each processed file, containing structured metadata
- Handles zip file contents efficiently
- Uses Azure OpenAI's language model for intelligent metadata extraction
- Provides two user interface options:
  - Custom HTML UI (app.py)
  - Streamlit-based UI (test.py)
Prerequisites:
- Python 3.12
- pipenv
- Clone this repository and change into the project directory:
  git clone https://github.com/jc2409/jsonify.git
  cd jsonify
- Install dependencies using pipenv:
  pipenv install
- Set up your Azure OpenAI API credentials. Create a .env file in the project root and add your Azure OpenAI API key and API version:
  AZURE_OPENAI_API_KEY=your_api_key_here
  AZURE_OPENAI_API_VERSION=your_api_version_here
- Activate the virtual environment:
  pipenv shell
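Before starting either UI, it can help to confirm that the .env file actually defines both variables named above. The following is a minimal standard-library sketch, not code from this repository: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser only understands simple `KEY=value` lines.

```python
import os

# The two variable names come from the setup step above.
REQUIRED_KEYS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_API_VERSION")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = ".env"  # assumed to sit in the project root
    values = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            values = parse_env_text(handle.read())
    missing = missing_keys(values)
    print("Missing settings:", ", ".join(missing) if missing else "none")
```

Run it from the project root; any variable it reports as missing should be added to .env before launching the application.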
- Choose your preferred UI option:
  - Flask application (custom HTML UI): run
      python app.py
    then open http://localhost:5000 in your web browser.
  - Streamlit application: run
      streamlit run test.py
    Your default web browser should open the Streamlit app automatically. If it does not, open the URL printed in the terminal.
- Upload a zip file through the web interface.
- Process the zip file and view the extracted metadata.
- Download the generated JSON files containing the metadata.
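The upload, process, and download steps above amount to reading each archive member and producing one JSON document per file. The sketch below illustrates that flow with the standard library only. It is an assumption about the general shape, not the code in app.py or test.py: `describe_member` is a hypothetical stand-in for the Azure OpenAI call and returns only locally computed fields, and `zip_to_metadata` is likewise an illustrative name.

```python
import io
import json
import zipfile


def describe_member(name, data):
    """Hypothetical placeholder for the model call: basic local fields only."""
    return {"filename": name, "size_bytes": len(data)}


def zip_to_metadata(zip_bytes):
    """Map '<member name>.json' to JSON text for every file in the archive."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            data = archive.read(info)
            metadata = describe_member(info.filename, data)
            results[info.filename + ".json"] = json.dumps(metadata, indent=2)
    return results
```

Working on an in-memory `BytesIO` buffer avoids unpacking the upload to disk, which suits a web handler that receives the archive as bytes.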
- You can modify the FileMetadata class in the Python scripts to adjust the metadata fields extracted from each file.
- The extract_text function can be expanded to handle additional file types for text extraction.
- To customize the custom HTML UI, edit the HTML templates in the templates folder and update app.py accordingly.
- For Streamlit UI modifications, edit test.py directly.