MagicXML is a FastAPI-based service designed to fetch, process, and convert XML data into structured CSV files. It is optimized for handling large XML files by processing them in chunks asynchronously, making it suitable for heavy data processing tasks.
- Asynchronous processing: Efficiently fetches and processes XML data in chunks using asyncio and aiohttp.
- Customizable XML Parsing: Handles specific XML structures, extracting and cleaning data as required.
- CSV Export: Converts XML data into well-structured CSV files, accommodating various encoding standards.
- REST API Interface: Simple API endpoints to trigger the processing and retrieval of files.
- Error Handling: Robust error management ensures that issues during XML processing are captured and reported.
- Python 3.8+
- FastAPI: A modern, fast (high-performance), web framework for building APIs with Python 3.6+.
- aiohttp: An asynchronous HTTP client/server framework.
- aiofiles: A library for handling local file operations asynchronously.
- Jinja2: A templating engine for Python.
To use the API, you can send a POST
request to the /process_link
endpoint with the necessary parameters. Below is an example using curl
:
curl -X 'POST' \
'https://solarxml.replit.app//process_link' \
-H 'Content-Type: application/json' \
-d '{"link_url": "YOUR_XML_URL", "preset_id": "id=1234"}' \
-o process_response.json
Replace YOUR_XML_URL with the actual URL of the XML data you want to process. This request will save the response in a file named process_response.json.
git clone https://github.com/Solrikk/MagicXML.git
cd MagicXML
You can install the required dependencies using pip:
pip install -r requirements.txt
Renders the index page with instructions or UI for interacting with the service.
- Description: Renders the index page.
- Response: HTML page.
- Description: Processes the given link to fetch, parse, and save XML data into a CSV file.
- Request Body:
link_url
(str): The URL to fetch the XML data from.preset_id
(str, optional): An optional preset ID.
- Response: JSON containing the URL of the generated CSV file and the preset ID.
- Description: Downloads the specified CSV file.
- Path Parameter:
filename
(str): The name of the CSV file to download.
- Response: The requested CSV file.
main.py
: The main application file.templates/
: Directory containing HTML templates.static/
: Directory containing static files (CSS, JS, images).data_files/
: Directory where the generated CSV files are saved.
- Imports: Various libraries and modules for HTTP handling, asynchronous operations, file handling, and XML parsing.
- FastAPI Setup: Initializes the FastAPI app, sets up Jinja2 templates, and mounts the static files directory.
- Data Models: Defines the
LinkData
model using Pydantic for request validation. - Utility Functions: Includes functions for removing unwanted HTML tags from descriptions and fetching URL data in chunks.
- Processing Functions: Contains asynchronous functions to process XML data, parse it, and save it into CSV files.
- API Endpoints: Defines endpoints for the root page, processing links, and downloading CSV files.
remove_unwanted_tags(description)
: Removes HTML tags from a given description string.fetch_url_in_chunks(link_url, chunk_size=1024)
: Fetches data from a URL in chunks and yields the data.process_offer(offer_elem, build_category_path)
: Processes individual XML elements to extract offer data.process_link_stream(link_url, chunk_size=1024)
: Processes the XML data stream from the URL, parses it, and writes it to a CSV file.
- CORS is enabled for all origins (). This can be restricted as needed.*
- Ensure to handle any sensitive data appropriately and restrict access to certain endpoints if necessary.
MagicXML is maintained by Solrikk. If you have any questions or need further assistance, please feel free to reach out.