A Qt-based graphical tool for processing and exploring tabular datasets (CSV) in machine learning and data analysis projects.
Note that DataMole is still experimental and under development.
- Import and export CSV datasets
- Apply transformations using the graphical interface:
  - Fill missing values
  - Replace values
  - Discretize numeric columns or datetime attributes
  - Scale columns
  - One-hot encode
  - Add/remove columns
  - Join two tables
  - Convert types (e.g. numeric -> string)
- Extract time series information from longitudinal datasets
- Draw scatterplots and line charts for time series
- Get data statistics (histogram, mean, std, etc.)
- Create pipelines of transformations and execute them
- Import and export pipelines (in `pickle` format)
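
For readers unfamiliar with the transformations listed above, here is a minimal, library-free sketch of what three of them compute (fill missing values, scale columns, one-hot encode) and how chaining them forms a simple pipeline. It is only an illustration: the toy table, the column names and the helper functions `fill_missing`, `min_max_scale` and `one_hot` are all hypothetical and are not part of DataMole's code or API.

```python
from statistics import mean

# Hypothetical toy table: a list of rows, with None marking a missing value.
rows = [
    {"age": 22.0, "city": "Rome"},
    {"age": None, "city": "Milan"},
    {"age": 35.0, "city": "Rome"},
    {"age": 58.0, "city": "Turin"},
]


def fill_missing(rows, column):
    """Replace missing values in `column` with the mean of the known ones."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def min_max_scale(rows, column):
    """Rescale `column` linearly to the [0, 1] range."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero on constant columns
    return [{**r, column: (r[column] - low) / span} for r in rows]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    categories = sorted({r[column] for r in rows})
    result = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(r[column] == cat)
        result.append(new_row)
    return result


# A pipeline here is simply the transformations applied in order.
table = fill_missing(rows, "age")
table = min_max_scale(table, "age")
table = one_hot(table, "city")
print(table[0])
```

Each helper returns a new list of rows instead of modifying its input, so the original table stays available for comparison after the pipeline has run.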
- Install Python >= 3.8.0
- Open a terminal. On Windows 10 use Windows PowerShell
- Create a virtualenv (*):
  - Install virtualenv: `python -m pip install virtualenv`
  - Move into the main `dataMole` folder (the one with `main.py`)
  - Create a virtualenv: `python -m virtualenv venv`
  - Activate it: `source ./venv/bin/activate` (`.\venv\Scripts\Activate.ps1` on Windows)
- With the active virtualenv, install dependencies: `python -m pip install -r requirements.txt`
- Generate Qt resources: `make resources` (**)
- Start the software with `python main.py`
(*) You can use the global Python installation instead if you are ok with that, but this is not recommended.
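
If you are unsure which interpreter is in use, a small check like the one below may help. It is a sketch, not part of the project: the function names are hypothetical, the minimum version is taken from the installation steps, and the virtualenv test relies on the standard `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys

MIN_VERSION = (3, 8, 0)  # minimum version stated in the installation steps


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:3]) >= MIN_VERSION


def in_virtualenv():
    """Return True when running inside a virtualenv or venv."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    # Very old virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("Python version supported:", python_is_supported())
print("Virtual environment active:", in_virtualenv())
```

Run it with the same `python` command you use to start the software, so that it reports on the interpreter that will actually be used.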
(**) On Windows the `make` command does not work, so the command to run at step 5 (Generate Qt resources) is `pyside2-rcc dataMole/resources.qrc -o dataMole/qt_resources.py`. This generates a new file, `qt_resources.py`.
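
As an alternative that behaves the same on every platform, the command above can be wrapped in a short Python script. This is only a sketch under stated assumptions: the script and its function names are hypothetical and not shipped with the project, and it assumes `pyside2-rcc` is on the `PATH` (which should be the case once the dependencies are installed in the active virtualenv) and that it is run from the main `dataMole` folder.

```python
import shutil
import subprocess


def build_rcc_command(qrc="dataMole/resources.qrc", out="dataMole/qt_resources.py"):
    """Build the pyside2-rcc command line given in the note above."""
    return ["pyside2-rcc", qrc, "-o", out]


def generate_resources():
    """Run pyside2-rcc, failing early if the tool cannot be found."""
    cmd = build_rcc_command()
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("pyside2-rcc not found; is the virtualenv active?")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    generate_resources()
```

Passing the command as a list avoids shell quoting differences between PowerShell and Unix shells.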
Refer to the user manual in `docs/manuals`.
In addition to the packages listed in `requirements.txt`, you may want to install the ones listed in `requirements.dev.txt`.
Refer to the developer manual in `docs/manuals` for information about the software architecture and how to extend it.
The `.tex` source files of the manuals are in the `docs/manuals/source` folder.
- Move into the `docs` folder
- Generate stubs (rst) if required: `make stubs`
- Generate documentation:
  - `make html` for HTML documentation
  - `make pdf` to convert the documentation into a single PDF file
The output will be found in the `auto/build` directory.
Use `make clean` to remove the build directory.
Here is a list of third-party software used within this project:
- dsideb/nodegraph-pyqt: an interactive DAG visualization tool built in PySide/PyQt, licensed under GNU GPL-3.0;
- z3ntu/QtWaitingSpinner: configurable Qt widget for showing a waiting spinner, with MIT license.