A powerful desktop data cleaning and transformation tool, built with love as a gift.
- About
- Features
- Technology Stack
- Installation
- Usage
- Development
- Building
- Important Notes
- Contributing
- License
Data Eater (Le Glouton, French for "the glutton") is a desktop application designed to make data cleaning, validation, and transformation a breeze. Originally built as a thoughtful present for 'Mom', a data professional working with legacy systems, it specializes in:
- CSV/Excel Processing: Handle large datasets with ease
- AS400/IBM iSeries Compatibility: Prepare data for mainframe systems with proper encoding (Windows-1252) and format validation
- Power BI Ready: Export clean, structured data perfect for business intelligence tools
- Zero Data Loss: Your source files are never modified - all operations work on in-memory copies
This application runs entirely on your local machine, ensuring complete data privacy and security: no cloud uploads and no external dependencies for core functionality. (This does not apply to manual web-app deployments or to optional LLM-based assistance features.)
Data Eater was originally created as a gift for a data professional, with an initial focus on a specific company's workflow. Feel free to fork, adapt, and customize it for your own use case!
- Multi-format Import: CSV, XLSX files with automatic encoding detection
- Advanced Filtering: Build complex filters with SQL-like syntax
- Smart Search: Real-time search across all columns
- Column Operations: Rename, reorder, delete, and transform columns
- Sorting & Grouping: Multi-column sorting and grouping capabilities
- Split Columns: Divide columns by delimiter or pattern
- Magic Join: Intelligent data merging from multiple sources
- Pivot/Unpivot: Reshape data for different analytical needs
- Regex Extractor: Extract patterns using regular expressions
- Formula Engine: Create calculated columns with custom formulas
- Conditional Logic: Apply IF-THEN-ELSE rules to your data
- Deduplication: Remove duplicate rows intelligently
- Name Splitter: Parse full names into first/last components
- Phone Standardizer: Normalize phone numbers across formats
- Email Validator: Validate and clean email addresses
- Currency Normalizer: Handle multiple currency formats
- Unit Converter: Convert between measurement units
- Mojibake Fixer: Repair encoding issues and garbled text
- Date Intelligence: Parse and standardize various date formats
- Mainframe Compatibility: Special tools for AS400/IBM iSeries
- Fixed-Width Parser: Handle fixed-width format files (80/132 columns)
- Encoding Control: Export with Windows-1252 or other legacy encodings
- Column Name Validation: Ensure AS400 compliance (30 char limit, no special chars)
- Health Dashboard: Get instant data quality metrics
- Data Visualization: Built-in charts and graphs
- Geo Mapping: Visualize location data on interactive maps
- SQL Console: Run custom SQL queries with DuckDB
- DAX Support: Create Power BI-compatible DAX measures
- Date Dimension Generator: Create calendar tables for time intelligence
- Multiple Formats: Export to CSV, XLSX, or SQL
- Encoding Options: Choose from various text encodings
- Session Persistence: Resume work where you left off
- Backup System: Automatic session backup and recovery
Data Eater is built with modern, performant technologies:
- React 19 - UI framework
- TypeScript - Type-safe development
- Vite - Fast build tool and dev server
- Tailwind CSS 4 - Utility-first styling
- Framer Motion - Smooth animations
- Glide Data Grid - High-performance data grid
- DuckDB WASM - In-browser SQL analytics database
- ExcelJS - Excel file manipulation
- Zustand - State management
- Leaflet - Interactive mapping
- Recharts - Data visualization
- i18next - Internationalization support
- Luxon - Date/time handling
- Zod - Schema validation
- Node.js 18+ and npm (or yarn/pnpm)
- Rust 1.70+ and Cargo
- System Dependencies for Tauri (varies by OS)
Windows:

```bash
# Install Visual Studio Build Tools or Visual Studio with C++ development tools
# Download from: https://visualstudio.microsoft.com/downloads/
```

macOS:

```bash
# Install Xcode Command Line Tools
xcode-select --install
```

Linux (Ubuntu/Debian):

```bash
sudo apt update
sudo apt install libwebkit2gtk-4.1-dev \
  build-essential \
  curl \
  wget \
  file \
  libxdo-dev \
  libssl-dev \
  libayatana-appindicator3-dev \
  librsvg2-dev
```

- Clone the repository:

```bash
git clone https://github.com/YOUR-USERNAME/data_eater.git
cd data_eater
```

- Install dependencies:

```bash
npm install
```

- Run in development mode:

```bash
npm run tauri dev
```
- Launch Data Eater
  - Run the application from your desktop, or via `npm run tauri dev`
- Import Your Data
  - Drag and drop a CSV or XLSX file onto the main window
  - Or click "Sélectionner un Fichier" (Select a File) to browse
- Explore & Clean
  - Use the toolbox on the right to access transformation tools
  - Search and filter your data using the top toolbar
  - Select columns to view statistics and apply operations
- Export Results
  - Click "Exporter" (Export) or "Sauvegarder" (Save) in the header
  - Choose your format and encoding
  - Your original file remains untouched!
Preparing data for AS400/mainframe systems:

- Load your CSV/Excel file
- Use the "Mainframizer" tool to validate column names
- Fix any encoding issues with the "Mojibake Fixer"
- Export with Windows-1252 encoding
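The column-name validation step can be sketched as follows. The exact rules here (length limit of 30, letters/digits/underscore only, no leading digit) are assumptions based on the constraints described earlier, not the app's actual code:

```typescript
// Hypothetical AS400-style column name check: enforce a length limit
// and reject special characters that legacy systems cannot handle.
const MAX_AS400_NAME_LENGTH = 30;

function isValidAs400ColumnName(name: string): boolean {
  if (name.length === 0 || name.length > MAX_AS400_NAME_LENGTH) return false;
  // Letters, digits, and underscores only; must not start with a digit.
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(name);
}

console.log(isValidAs400ColumnName("CUSTOMER_ID")); // true
console.log(isValidAs400ColumnName("prix (€)"));    // false
```

A real Mainframizer would also suggest a compliant replacement name rather than just rejecting, but the validation predicate is the starting point.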
Preparing data for Power BI:

- Import your data source
- Use "Date Dimension" to create calendar tables
- Apply the "DAX" tool to create measures
- Export as XLSX or CSV
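A date-dimension (calendar) table is just one row per day with useful attributes broken out. A minimal sketch of what such a generator does, with hypothetical names and a deliberately small set of columns (real calendar tables add quarters, ISO weeks, fiscal periods, etc.):

```typescript
// One row per calendar day, computed in UTC to avoid DST surprises.
interface DateRow {
  date: string;    // ISO date, e.g. "2024-01-01"
  year: number;
  month: number;   // 1-12
  weekday: number; // 0 = Sunday ... 6 = Saturday
}

function generateDateDimension(startIso: string, days: number): DateRow[] {
  const rows: DateRow[] = [];
  const start = new Date(startIso + "T00:00:00Z");
  for (let i = 0; i < days; i++) {
    const d = new Date(start.getTime() + i * 86_400_000); // +i days
    rows.push({
      date: d.toISOString().slice(0, 10),
      year: d.getUTCFullYear(),
      month: d.getUTCMonth() + 1,
      weekday: d.getUTCDay(),
    });
  }
  return rows;
}
```

Exported as a table, this gives Power BI a dimension to relate all date columns against for time-intelligence measures.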
Cleaning contact data:

- Load customer/contact data
- Apply "Phone Standardizer" for phone numbers
- Use "Email Validator" to clean emails
- Use "Name Splitter" to separate full names
- Export the cleaned dataset
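As an illustration of what phone standardization involves, here is a hypothetical sketch for French numbers (an assumption based on the app's French UI, not its actual logic): strip punctuation, normalize the international prefix, then regroup digits in the conventional pairs:

```typescript
// Normalize a French phone number to "0X XX XX XX XX" form,
// or return null if the input is not a plausible number.
function standardizeFrenchPhone(raw: string): string | null {
  const digits = raw.replace(/\D/g, "");
  // Accept either national "0XXXXXXXXX" or international "33XXXXXXXXX".
  const national =
    digits.length === 11 && digits.startsWith("33")
      ? "0" + digits.slice(2)
      : digits;
  if (!/^0\d{9}$/.test(national)) return null;
  return national.match(/\d{2}/g)!.join(" ");
}

console.log(standardizeFrenchPhone("+33 1 23-45-67-89")); // "01 23 45 67 89"
console.log(standardizeFrenchPhone("12345"));             // null
```

A production standardizer would handle more countries and formats, but the pipeline (strip, normalize prefix, validate, reformat) is the same shape.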
```
data_eater/
├── src/                # React/TypeScript source code
│   ├── components/     # UI components (39 components)
│   ├── services/       # Business logic and data services
│   ├── stores/         # Zustand state management
│   ├── lib/            # Utilities and constants
│   └── assets/         # Images, fonts, etc.
├── src-tauri/          # Rust backend
│   ├── src/            # Tauri application code
│   ├── icons/          # Application icons
│   └── Cargo.toml      # Rust dependencies
├── public/             # Static assets
├── index.html          # HTML entry point
├── package.json        # Node dependencies
└── vite.config.ts      # Vite configuration
```
```bash
# Start development server (web only)
npm run dev

# Start Tauri development (desktop app)
npm run tauri dev

# Build for production
npm run build

# Build Tauri desktop app
npm run tauri build

# Type checking
npm run tsc

# Preview production build
npm run preview
```

To add a new tool:

- Create Component: Add to `src/components/`
- Add Service Logic: Create service in `src/services/`
- State Management: Use Zustand stores in `src/stores/`
- Register in App: Import and add to `App.tsx`
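The service-logic step above might look like this minimal sketch. Everything here is hypothetical (a "Trim Whitespace" tool and its `trimWhitespace` function are invented for illustration); the one real constraint it respects is the app's zero-data-loss principle of never mutating source rows:

```typescript
// Hypothetical service for a new "Trim Whitespace" tool, following the
// src/services/ pattern: pure functions that return new data.
interface Row {
  [column: string]: string;
}

function trimWhitespace(rows: Row[], columns: string[]): Row[] {
  // Build fresh row objects so the caller's data is left untouched.
  return rows.map((row) => {
    const next: Row = { ...row };
    for (const col of columns) {
      if (typeof next[col] === "string") next[col] = next[col].trim();
    }
    return next;
  });
}
```

The corresponding component in `src/components/` would collect the column selection from the user and hand it to this function via the Zustand store.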
- Frontend: Use browser DevTools (F12 in dev mode)
- Backend: Rust logs appear in the terminal during `npm run tauri dev`
- SQL Queries: Enable console logging in the DuckDB service
Soon you will be able to build production-ready installers for your platform:
```bash
# Build for your current platform
npm run tauri build

# Output locations:
# - Windows: src-tauri/target/release/bundle/msi/
# - macOS: src-tauri/target/release/bundle/dmg/
# - Linux: src-tauri/target/release/bundle/deb/ or appimage/
```

While primarily a desktop app, you can build the web interface:

```bash
npm run build
# Output: dist/ directory
```

Note: Some features (file system access, native menus) will not work in the web version.
This application was originally built as a gift for a specific company and contains branded content throughout the codebase and UI.
Before sharing or deploying for your organization, you should remove or adapt all references to:
- "Robertet" - Company name (found in 12+ files)
- "Grasse" - Location references (4+ files)
- Company-specific terminology in error messages and UI
- Branded assets and imagery
- French-language content specific to the original use case
See CUSTOMIZATION.md for a complete list of files containing branded content and instructions on how to rebrand the application.
Key files containing brand references (see CUSTOMIZATION.md for complete list):
- `index.html` - Page title
- `src-tauri/tauri.conf.json` - App identifier and title
- `src/lib/constants.ts` - Company name and configuration
- `src/lib/errors.json` - Error messages with company references
- `src/stores/mascotStore.ts` - Recipe messages
- `src/services/smartQueryService.ts` - AI prompts
- `src/services/healthService.ts` - Pattern comments
- `src/components/FAQPage.tsx` - Help text and descriptions
- `src/components/GeoMapModal.tsx` - Default map coordinates
- `src/components/FixedWidthModal.tsx` - Modal subtitles
- `src/components/EmailValidatorModal.tsx` - Example domains
- `src/components/ConditionalLogicModal.tsx` - Placeholder examples
- Multiple other component files with French text and branding
Search commands to find all references:
```bash
# Find all "Robertet" references
grep -ri "robertet" src/

# Find all "Grasse" references
grep -ri "grasse" src/
```

- All data processing happens locally on your machine
- No data is sent to external servers (except optional AI features if configured)
- Source files are never modified - operations work on in-memory copies
- Session data stored locally using LocalForage
Contributions are welcome! This project is open source and free to use, modify, and distribute.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow existing code style and conventions
- Add comments for complex logic
- Test your changes thoroughly
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
✅ Permitted:
- Commercial use
- Modification
- Distribution
- Private use
⚠️ Required Before Distribution:
- Remove all branded content (company names, specific terminology, branded assets)
- Remove or replace personal references
🎯 Why? This app was built as a personal gift with specific branding. While the code is open source, the branding and company-specific content are not intended for redistribution. Please customize it for your own use!
- Built with ❤️ as a gift for data professionals
- Inspired by powerful CLI tools like qsv and xan
- Powered by the amazing open-source community
If you encounter issues or have questions:
- Check the in-app FAQ (click the "?" icon)
- Review the CUSTOMIZATION.md guide
- Open an issue on GitHub
- Fork and adapt for your needs!
Happy Data Cleaning! 🍴
Remember: Always remove branded content before sharing!