This system is designed to parse XML data from the National Science Foundation (NSF) award data repository and integrate it into a MySQL database. The aim is to provide an efficient, user-friendly interface for accessing, analyzing, and visualizing research funding data. The project utilizes Python for XML parsing and Node.js for backend services, with an interface built using Express and Handlebars.
Before you begin the installation and setup, ensure your system meets the following requirements:
Python is used for writing the XML parsing scripts.
Node.js powers the backend server that interacts with the database and serves the frontend.
This database stores the parsed XML data.
Essential for version control and for cloning the repository.
Node package manager for managing backend dependencies.
These tools should be installed on a system with administrative access to ensure smooth setup and operation.
To get started, clone the repository to your local machine by running the following command in your terminal:
git clone https://github.com/sumanth3333/XML2SQL.git
cd XML2SQL
Python dependencies required for parsing XML and interacting with the MySQL database:
pip install mysql-connector-python pandas
Change to the backend directory and install the Node.js dependencies:
npm install express express-session express-handlebars http mysql
Set up your MySQL database:
mysql -u root -p
Enter your root password, then create and configure the NSF database:
CREATE DATABASE NSF;
USE NSF;
# Execute any additional database schema or seed data scripts here.
exit;
Run the Final.py
script to automatically configure the MySQL database with all necessary tables and settings:
python Final.py
This script handles the creation of database tables, configurations, and any other initial setup requirements, ensuring the system is ready for use without manual database setup.
After setting up the database, start the Node.js server using:
npm start
This command initiates the backend server which will be available at http://localhost:3306
. Please note, this port is typically used for MySQL connections, so if you are using it to serve web content instead, ensure that no conflicts occur, or adjust as necessary for your environment.
With the server running, navigate to http://localhost:3306
in your web browser to access the system interface. This setup allows you to view the entries in the MySQL database that were created from the parsed XML data.
Here are some common issues you might encounter and how to resolve them:
- Connection Errors: Check that MySQL is running on the expected port and accessible. Ensure that no other services are using port 3306.
- Environment Variables: Make sure all environment variables are set correctly in the
.env
file, particularly the database connection details. - Dependency Issues: Confirm that all necessary Node.js and Python packages are installed. Re-install any missing packages using npm for Node.js dependencies and pip for Python dependencies.
- Database Connection Failures: If you cannot connect to the MySQL database, check the credentials in the connection settings (host, user, password, database name). Ensure the MySQL service is active. On Linux, you can use:
If it's not running, start it with:
sudo systemctl status mysql.service
sudo systemctl start mysql.service
- Python Script Errors: If running
Final.py
results in errors, check the console for any error messages. Errors might be due to issues in parsing XML with unexpected formats or values. Ensure that the XML files adhere to the expected structure required by your script. - Node.js Server Not Starting: If the Node.js server fails to start, check for errors in the console output. Common issues might include missing package dependencies since you need to ensure all required node modules are installed. You can install any missing packages with:
npm install
- Port Conflicts: Since the system uses port 3306 for MySQL, make sure no other applications are using the same port. This port is crucial for database operations and should not be used for other services. Check if other instances of MySQL or services are running on the same port.
- XML Parsing Issues: If there are errors during XML parsing, verify the format and structure of the XML files. Ensure they match the expected schema as defined in your Python scripts. Errors often occur due to missing tags or incorrect data types in XML files.
- Session Management Errors: If issues arise related to session management in the Node.js application, verify that the
express-session
middleware is properly set up in your server configuration. Check if session handling is correctly implemented and that sessions are being correctly stored and retrieved. - Security Warnings: If you encounter security warnings from npm about vulnerabilities in packages, consider updating the affected packages by checking the latest versions available and updating your package.json file accordingly. Run:
to update the packages to their latest versions.
npm update