A simple Python-based GUI application for data preprocessing and visualization. Import your datasets, apply mathematical transformations, and visualize the results side by side in real time.
- Overview
- Features
- Installation
- Usage
- Transformations
- Statistics
- Project Structure
- Requirements
- Screenshots
- About the Author
This tool is designed for quick data exploration and preprocessing. Load a CSV or Excel file, select columns to visualize on dual graphs, and apply transformations to see how they affect your data distribution. Statistics are calculated and displayed automatically for both graphs.
Built with Tkinter for the GUI and matplotlib for visualization. The interface is straightforward: no complicated menus or settings. Just load your data and start transforming.
- Dual Graph Visualization - Compare original and transformed data side by side
- 6 Mathematical Transformations - Standardize, Min-Max, Log, Square Root, Inverse, Exponential
- Real-Time Statistics - Automatic calculation of 8 key statistics for each graph
- Import/Export - Support for both CSV and Excel formats
- DPI-Aware Rendering - Sharp text and graphics on high-resolution displays
- Column Selection - Choose any numeric columns for X and Y axes independently
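As a rough illustration of the dual-graph idea, the sketch below draws two matplotlib axes next to each other. It is not the tool's gui.py code: the helper name `plot_side_by_side`, the scatter-plot style, and the headless Agg backend are assumptions made so the example runs on its own. Inside a Tkinter app the figure would normally be embedded with matplotlib's `FigureCanvasTkAgg` instead.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this standalone sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_side_by_side(x, y_left, y_right, left_title="Original", right_title="Transformed"):
    """Draw two scatter plots in one row for visual comparison (hypothetical helper)."""
    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
    ax_left.scatter(x, y_left, s=10)
    ax_left.set_title(left_title)
    ax_right.scatter(x, y_right, s=10, color="tab:orange")
    ax_right.set_title(right_title)
    fig.tight_layout()
    return fig


# Example: a rapidly growing series next to its log transform
x = np.arange(1, 51)
y = np.exp(x / 10.0)
fig = plot_side_by_side(x, y, np.log(y), right_title="Log")
```

The returned figure can be saved with `fig.savefig(...)` or handed to a Tk canvas; keeping plotting separate from widget code makes it easy to redraw when a dropdown selection changes.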
- Clone the repository:
git clone https://github.com/dn-stef/data-transform-tool.git
cd data-transform-tool
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python main.py
- Import Data
  - Click File → Import Data
  - Select a CSV or Excel file (or use the files in sample-data/)
  - The tool automatically detects numeric columns
- Select Columns
  - Use the dropdown menus under "Left Graph" and "Right Graph"
  - Choose X and Y columns for each graph independently
  - Graphs update automatically when selections change
- Apply Transformations
  - Select a column from the "Transform Column" dropdown
  - Click any transformation button (Standardize, Min-Max, Log, etc.)
  - The transformation is applied to the selected column
  - The transformed column is added as a new option in the graph dropdowns
- Export Data
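The import, numeric-column detection, and export steps can be approximated with pandas. This is a sketch, not the contents of utils.py: the helper names (`load_dataset`, `numeric_columns`, `save_dataset`) are hypothetical, and only `.csv` and `.xlsx` are handled. pandas relies on an Excel engine such as openpyxl for `.xlsx` files.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV or Excel file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".xlsx":
        return pd.read_excel(path)  # needs an Excel engine such as openpyxl
    raise ValueError(f"Unsupported file type: {suffix!r}")


def numeric_columns(df):
    """List the columns that have a numeric dtype (candidates for the X/Y dropdowns)."""
    return df.select_dtypes(include="number").columns.tolist()


def save_dataset(df, path):
    """Write a DataFrame back out as CSV or Excel, without the index column."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, index=False)
    elif suffix == ".xlsx":
        df.to_excel(path, index=False)
    else:
        raise ValueError(f"Unsupported file type: {suffix!r}")
```

Dispatching on the file extension keeps a single "Import" and "Export" entry point for both formats, and filtering with `select_dtypes` is one simple way to hide text columns from the graph dropdowns.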
The tool supports 6 mathematical transformations:
Standardize - Scales data to have mean = 0 and standard deviation = 1.
Formula: (x - mean) / std
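A minimal NumPy sketch of the z-score formula. The README does not say whether the tool divides by the population or the sample standard deviation; this version assumes the population form (NumPy's default, ddof=0), and the zero-variance guard and the name `standardize` are additions for illustration.

```python
import numpy as np


def standardize(values):
    """Z-score: subtract the mean, then divide by the (population) standard deviation."""
    x = np.asarray(values, dtype=float)
    std = x.std()  # ddof=0; note that pandas' Series.std() defaults to ddof=1
    if std == 0:
        raise ValueError("Standard deviation is zero; a constant column cannot be standardized")
    return (x - x.mean()) / std


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]).tolist())
# → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```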
Min-Max - Scales data to the 0-1 range.
Formula: (x - min) / (max - min)
Note: Undefined when all values are identical (max = min)
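The same formula in NumPy, with the max = min case from the note handled explicitly. Raising an error is a choice made for this sketch (the name `min_max` is hypothetical); the tool may handle a constant column differently.

```python
import numpy as np


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # (max - min) would be zero, so the formula is undefined for a constant column
        raise ValueError("All values are identical; min-max scaling is undefined")
    return (x - lo) / (hi - lo)


print(min_max([10, 15, 20]).tolist())  # → [0.0, 0.5, 1.0]
```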
Log - Applies the natural logarithm. Useful for right-skewed distributions.
Formula: ln(x)
Note: Only works on positive values
Square Root - Takes the square root of each value. Reduces right skew.
Formula: √x
Note: Only works on non-negative values
Inverse - Takes the reciprocal of each value.
Formula: 1 / x
Note: Undefined for zero values
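Log, square root, and inverse share one pattern: check the input domain, then apply the function. The sketch below assumes out-of-range input should be rejected; the function names are hypothetical, and raising `ValueError` is a design choice of this example. Left alone, NumPy would instead return `nan`, `-inf`, or `inf` together with a `RuntimeWarning`.

```python
import numpy as np


def log_transform(values):
    """Natural log; defined only for strictly positive values."""
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Log transform requires strictly positive values")
    return np.log(x)


def sqrt_transform(values):
    """Square root; defined only for non-negative values."""
    x = np.asarray(values, dtype=float)
    if np.any(x < 0):
        raise ValueError("Square root requires non-negative values")
    return np.sqrt(x)


def reciprocal_transform(values):
    """1 / x; undefined wherever x is zero."""
    x = np.asarray(values, dtype=float)
    if np.any(x == 0):
        raise ValueError("Inverse is undefined for zero values")
    return 1.0 / x


print(sqrt_transform([0, 1, 4, 9]).tolist())  # → [0.0, 1.0, 2.0, 3.0]
```

Validating up front means a bad column produces one clear error message instead of a graph silently full of NaN or infinite points.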
Exponential - Applies the exponential function.
Formula: e^x
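The README lists no caveat for the exponential, but e^x grows very quickly: with 64-bit floats, `np.exp` overflows to infinity once x exceeds about 709.78. The guard below is an assumption added for illustration (as is the name `exp_transform`), not documented behavior of the tool.

```python
import numpy as np

# Inputs above ln(float64 max) ≈ 709.78 overflow to inf under np.exp
EXP_LIMIT = np.log(np.finfo(np.float64).max)


def exp_transform(values):
    """e^x, refusing inputs large enough to overflow a float64."""
    x = np.asarray(values, dtype=float)
    if np.any(x > EXP_LIMIT):
        raise OverflowError("Values too large: e^x would overflow to infinity")
    return np.exp(x)
```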
Each graph displays 8 statistics in a 3x3 grid below the graphs:
- Count - Number of data points
- Mean - Average value
- Median - Middle value when sorted
- Mode - Most frequently occurring value
- Std - Standard deviation (spread of data)
- Min - Minimum value
- Max - Maximum value
- Range - Difference between max and min
Statistics update automatically when columns change or transformations are applied.
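The eight statistics map directly onto pandas Series methods. This is a hedged sketch rather than the code in utils.py: the helper name `summary_stats`, dropping missing values first, taking the smallest value when several modes tie, and using pandas' default sample standard deviation (ddof=1) are all assumptions.

```python
import pandas as pd


def summary_stats(values):
    """Compute the eight statistics shown under each graph (hypothetical helper)."""
    s = pd.Series(values, dtype=float).dropna()
    modes = s.mode()  # sorted; may hold several values when counts tie
    return {
        "Count": int(s.count()),
        "Mean": s.mean(),
        "Median": s.median(),
        "Mode": modes.iloc[0] if not modes.empty else float("nan"),
        "Std": s.std(),  # sample standard deviation (ddof=1), pandas' default
        "Min": s.min(),
        "Max": s.max(),
        "Range": s.max() - s.min(),
    }


stats = summary_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["Median"], stats["Mode"])  # → 4.5 4.0
```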
data-transform-tool/
├── main.py # Entry point, launches the GUI
├── gui.py # Main GUI class and interface logic
├── utils.py # Data import/export and statistics functions
├── transformations.py # Mathematical transformation implementations
├── requirements.txt # Python dependencies
├── sample-data/ # Example datasets for testing
└── screenshots/ # GUI screenshots for README
- main.py - Initializes the application with DPI awareness and starts the GUI
- gui.py - Contains the DataProcessingGUI class with all UI components, controls, and event handlers
- utils.py - Handles file I/O operations and calculates statistics (mean, median, mode, etc.)
- transformations.py - Implements all 6 transformation functions with error handling
- sample-data/ - Contains example CSV/Excel files for testing the tool
- screenshots/ - Images of the GUI for documentation
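On Windows, DPI awareness like the kind main.py sets up is usually requested through the Win32 API before the Tk window is created. The sketch below shows one common way to do that with `ctypes`; it is an assumption about how main.py works, and the `DataProcessingGUI(root)` call guesses at a constructor signature the README does not document.

```python
import ctypes
import sys


def enable_dpi_awareness():
    """Ask Windows to render this process at native DPI; returns False elsewhere or on failure."""
    if sys.platform != "win32":
        return False
    try:
        # Windows 8.1+: 1 = PROCESS_SYSTEM_DPI_AWARE
        ctypes.windll.shcore.SetProcessDpiAwareness(1)
        return True
    except (AttributeError, OSError):
        try:
            # Fallback for older Windows versions without shcore.dll
            ctypes.windll.user32.SetProcessDPIAware()
            return True
        except (AttributeError, OSError):
            return False


def main():
    # Imported here so the DPI helper stays importable without a display or gui.py
    import tkinter as tk
    from gui import DataProcessingGUI  # constructor signature below is hypothetical

    enable_dpi_awareness()  # must run before the first Tk window is created
    root = tk.Tk()
    DataProcessingGUI(root)
    root.mainloop()
```

In an actual entry point, `main()` would be called under an `if __name__ == "__main__":` guard; it is omitted here so the helper can be imported and tested without opening a window.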
- Python 3.7+
- pandas - Data manipulation and CSV/Excel handling
- numpy - Numerical operations and array processing
- matplotlib - Graph plotting and visualization
- openpyxl - Excel file support
- tkinter - GUI framework (included with the standard Python installers; some Linux distributions package it separately, e.g. python3-tk)
Install all dependencies with:
pip install pandas numpy matplotlib openpyxl
I'm a physics graduate with a focus on data analysis and Python programming. This project was built to expand my skills in GUI development and data preprocessing while creating something practical and applicable to real-world workflows.
I work with Python, data visualization, and building tools that simplify complex tasks. This tool combines physics-oriented thinking with programming to make data exploration more accessible.
This GUI was built with assistance from GitHub Copilot, which helped accelerate development and implement features more efficiently.
Built with Python, Tkinter, and matplotlib.

