TOPO is an application designed to ingest and manipulate data from multiple formats, providing an interactive local visualization of the data and exposing it as REST API endpoints. The project follows Object-Oriented Programming (OOP) principles to ensure clean, maintainable, and modular code. Additionally, it incorporates rigorous testing to validate functionality and reliability.
The frontend is built using Next.js and Tailwind CSS, providing a fast and responsive user interface. The backend is implemented with FastAPI, ensuring high-performance API interactions and efficient data processing.
You can sort the data according to the headings in the table by double clicking the headings of the table then click the arrow for ascending or descending sorting
Key Features:
- Data ingestion and processing from multiple formats.
- Interactive data visualization.
- REST API for exposing processed data.
- Robust OOP-based design for maintainability.
- Comprehensive unit and integration testing.
Ensure you have the following installed:
- Node.js (for frontend)
- Python (for backend)
- Package managers: npm/yarn for frontend, pip for backend
- Additional Dependicies - environment variables, API keys, charting libraries such as Chart.js, Recharts
-
Clone the repository:
git clone https://github.com/Pratz2005/TOPO.git cd TOPO
-
Install dependencies:
For frontend:
cd frontend npm install
For backend:
cd backend python -m venv .venv # Create a Virtual Environment # Activate Virtual Environment .venv\Scripts\activate # Windows source .venv/bin/activate # MacOS # Install dependencies pip install -r requirements.txt
-
Start the development server: Start Backend
cd backend //if not in backend python data_ingestion.py //run data_ingestion.py uvicorn main:app --reload
You can test the backend endpoints at http://127.0.0.1:8000/api/data for unified data, http://127.0.0.1:8000/api/{filetype}(for csv,json,pdf,pptx) and http://127.0.0.1:8000/api/revenue for revenue distribution
-
Start Frontend
cd frontend npm run dev
-
Access the application at:
http://localhost:3000 //(or if 3000 is in use it may access 3001, I have done checks for the CORS policy uptil 3002
- http://127.0.0.1:8000/api/data
- http://127.0.0.1:8000/api/data/csv
- http://127.0.0.1:8000/api/data/json
- http://127.0.0.1:8000/api/data/pdf
- http://127.0.0.1:8000/api/data/pptx
- http://127.0.0.1:8000/api/data/revenue
- Ensure the application is running.
- Execute integration tests:
cd backend $env:PYTHONPATH = "backend" pytest
test_get_full_data()
- Tests/api/data
, ensures a list response with"Date"
column.test_get_file_specific_data()
- Tests/api/data/csv
, ensures"Membership_ID"
column exists.test_get_nonexistent_file()
- Tests/api/data/nonexistent
, expects404
error.test_get_revenue_distribution()
- Tests/api/data/revenue
, expects"Gym"
in response.test_sorting()
- Tests/api/data/csv?sort_by=Revenue
, ensures correct sorting order.
test_ingest_csv()
- Ingestsdataset2.csv
, ensures"Date"
column exists.test_ingest_json()
- Ingestsdataset1.json
, ensures"company_id"
and"employee_name"
columns exist.test_ingest_pdf()
- Ingestsdataset3.pdf
, ensures"Revenue (in $)"
column exists.test_ingest_pptx()
- Ingestsdataset4.pptx
, ensures"Gym"
exists in revenue breakdown.test_ingest_missing_file()
- Tests ingestion of a non-existent file, expectsFileNotFoundError
.
-
Consistent Data Format Across Files
- Data files (CSV, JSON, PDF, PPTX) follow a structured format.
- Columns like
Date
,Membership_ID
,Revenue
exist in all datasets.
-
Users Interact via the Frontend UI
- The backend serves API responses; there’s no separate admin dashboard.
- Extracting Structured Data from PDFs & PPTX
- ❌ Issue: PDFs and PowerPoint slides store data in unstructured formats.
- ✅ Solution: Used
pdfplumber
for PDFs and `python-ppt
- CORS & API Connection Issues
- ❌ Issue: Frontend requests were blocked by CORS.
- ✅ Solution: Configured CORS Middleware in FastAPI to allow requests.