Automated MS Access Data Processing with Intelligent ID Mapping
Professional Python tool for bulk processing MS Access database files, creating correspondence tables, and generating complex identifiers according to specified rules.
- Bulk Processing - Simultaneous processing of multiple Access files
- Intelligent Mapping - Automatic ID matching using correspondence tables
- ID Generation - Creating complex identifiers according to rules
- Duplicate Detection - Excluding duplicate records
- Detailed Logging - Tracking every processing step
- Python 3.6.8+ Compatibility - Works on older Python versions
- Clean Code - Fully documented and readable code
git clone https://github.com/palagina00/ms-access-data-processor.git
cd ms-access-data-processor
pip install -r requirements.txt
python tests/generate_test_data.py
python src/access_processor.py
cat data/output/result.csv
- Python: 3.6.8 or higher
- OS: Windows, Linux, macOS
- MS Access Driver: for working with .mdb files (optional)
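Whether an Access ODBC driver is installed can be checked from Python before trying to open .mdb files. The sketch below is only an illustration and not the project's code: `access_connection_string` and `find_access_drivers` are hypothetical helper names, and the default driver name is the one the Microsoft Access ODBC driver commonly registers on Windows, so it may differ on your system.

```python
def access_connection_string(db_path, driver="Microsoft Access Driver (*.mdb, *.accdb)"):
    """Build an ODBC connection string for an Access file (hypothetical helper)."""
    # Doubled braces produce literal { } around the driver name.
    return "DRIVER={{{0}}};DBQ={1};".format(driver, db_path)


def find_access_drivers():
    """Return installed ODBC driver names that mention Access.

    Returns an empty list when pyodbc cannot be imported.
    """
    try:
        import pyodbc  # optional dependency
    except ImportError:
        return []
    return [name for name in pyodbc.drivers() if "access" in name.lower()]


if __name__ == "__main__":
    print(access_connection_string(r"data\input\example.mdb"))
    print(find_access_drivers())
```

An empty list from `find_access_drivers()` suggests that either pyodbc or the driver is missing, in which case only CSV input would be usable.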
Windows:
# 1. Install Python 3.6.8+
# Download from https://www.python.org/downloads/
# 2. Create virtual environment
python -m venv venv
venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Install MS Access Driver (if needed)
# Download: https://www.microsoft.com/en-us/download/details.aspx?id=13255
Linux/macOS:
# 1. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
Detailed instructions: docs/INSTALLATION.md
from src.access_processor import AccessDataProcessor
# Create processor
processor = AccessDataProcessor(
input_dir='data/input',
correspondence_file='data/correspondence.csv',
codes_file='data/filename_codes.csv'
)
# Process all files
processor.process_all_files('data/output/result.csv')
Or run the script directly:
python src/access_processor.py
Input file (data/input/18%Ese21.csv):
RecordID;ID;SomeData
1;8d 7d 2c_Ah9h;Data_1
2;3f 2a 1b_Xk5l;Data_2
Correspondence table (correspondence.csv):
id;ID2
8d 7d 2c_Ah9h;8d 7d 2c_P000
3f 2a 1b_Xk5l;3f 2a 1b_P000
Filename codes (filename_codes.csv):
filename;code
18%Ese21.csv;AF21
Result (data/output/result.csv):
ID3;ID4
AF21_8d 7d 2c_P000;AF21_8d 7d 2c_Ah9h
AF21_3f 2a 1b_P000;AF21_3f 2a 1b_Xk5l
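The two lookup files shown above are semicolon-delimited CSVs with a header row, so each can be read into a dictionary. This is a minimal sketch using the standard `csv` module; `load_mapping` is a hypothetical helper, not a function of `AccessDataProcessor`, and in-memory strings stand in for the real files.

```python
import csv
import io


def load_mapping(csv_file, key_column, value_column, delimiter=";"):
    """Read a two-column lookup table into a dict (hypothetical helper)."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return {row[key_column]: row[value_column] for row in reader}


# In-memory stand-ins for data/correspondence.csv and data/filename_codes.csv
correspondence_csv = io.StringIO(
    "id;ID2\n"
    "8d 7d 2c_Ah9h;8d 7d 2c_P000\n"
    "3f 2a 1b_Xk5l;3f 2a 1b_P000\n"
)
codes_csv = io.StringIO("filename;code\n18%Ese21.csv;AF21\n")

id_to_id2 = load_mapping(correspondence_csv, "id", "ID2")
filename_to_code = load_mapping(codes_csv, "filename", "code")

print(id_to_id2["8d 7d 2c_Ah9h"])        # 8d 7d 2c_P000
print(filename_to_code["18%Ese21.csv"])  # AF21
```

With files on disk, the same helper could be called on `open(path, newline="", encoding="utf-8")`; the encoding is an assumption, since the source does not state it.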
More examples: docs/USAGE.md
ms-access-data-processor/
│
├── data/                       # Data files
│   ├── input/                  # Input files
│   ├── output/                 # Output files
│   ├── correspondence.csv      # ID → ID2 mapping table
│   └── filename_codes.csv      # Filename → Code mapping
│
├── src/                        # Source code
│   ├── __init__.py
│   └── access_processor.py     # Main processor
│
├── tests/                      # Tests and utilities
│   └── generate_test_data.py   # Test data generator
│
├── docs/                       # Documentation
│   ├── INSTALLATION.md         # Installation guide
│   └── USAGE.md                # User guide
│
├── requirements.txt            # Python dependencies
├── .gitignore                  # Git ignore rules
├── LICENSE                     # MIT License
└── README.md                   # This file
Step 1: Create output CSV file
Step 2: Read all input files
Step 3: For each ID → find corresponding ID2
Step 4: Generate ID3 = CODE + "_" + ID2
Step 5: Check for ID3 duplicates
Step 6: Generate ID4 = ID3[:14] + original_ID[-4:]
Step 7: Write to output file
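The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the code in `src/access_processor.py`: `process_records` and `write_results` are hypothetical names, rows without a mapping are skipped, and the first occurrence of a duplicate ID3 is kept. The source does not say how the real tool handles either case.

```python
import csv
import io


def process_records(records, id_to_id2, filename_to_code):
    """Yield (ID3, ID4) pairs for an iterable of (filename, original_id) tuples."""
    seen_id3 = set()
    for filename, original_id in records:
        code = filename_to_code.get(filename)
        id2 = id_to_id2.get(original_id)           # step 3: look up ID2
        if code is None or id2 is None:
            continue                               # assumption: skip unmapped rows
        id3 = code + "_" + id2                     # step 4
        if id3 in seen_id3:                        # step 5: drop duplicate ID3
            continue
        seen_id3.add(id3)
        id4 = id3[:14] + original_id[-4:]          # step 6
        yield id3, id4


def write_results(pairs, out_file):
    """Write the header and all pairs; return the number of data rows written."""
    writer = csv.writer(out_file, delimiter=";", lineterminator="\n")
    writer.writerow(["ID3", "ID4"])                # step 1: output with header
    count = 0
    for pair in pairs:
        writer.writerow(pair)                      # step 7
        count += 1
    return count


id_to_id2 = {"8d 7d 2c_Ah9h": "8d 7d 2c_P000", "3f 2a 1b_Xk5l": "3f 2a 1b_P000"}
filename_to_code = {"18%Ese21.csv": "AF21"}
records = [                                        # step 2: rows read from input files
    ("18%Ese21.csv", "8d 7d 2c_Ah9h"),
    ("18%Ese21.csv", "3f 2a 1b_Xk5l"),
    ("18%Ese21.csv", "8d 7d 2c_Ah9h"),             # repeated ID, duplicate ID3
]

output = io.StringIO()                             # stands in for data/output/result.csv
written = write_results(process_records(records, id_to_id2, filename_to_code), output)
print(written)                                     # 2
print(output.getvalue())
```

Because `process_records` is a generator, each row is written as soon as it is produced, and only the set of seen ID3 values is kept in memory.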
Input:
Filename: 18%Ese21.csv
ID: "8d 7d 2c_Ah9h"
Processing:
1. Code = "AF21" (from filename_codes.csv)
2. ID2 = "8d 7d 2c_P000" (from correspondence.csv)
3. ID3 = "AF21_8d 7d 2c_P000"
4. ID4 = "AF21_8d 7d 2c_" + "Ah9h" = "AF21_8d 7d 2c_Ah9h"
Output:
ID3;ID4
AF21_8d 7d 2c_P000;AF21_8d 7d 2c_Ah9h
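The slice `ID3[:14]` reproduces this example because the code has 4 characters and the part of the ID before its last underscore has 8, so 4 + 1 + 8 + 1 = 14. The sketch below checks that arithmetic and shows what happens with a shorter code. `make_id4_by_separator` is a hypothetical alternative that is not part of the documented rule; it is included only to show how the prefix could follow the separator instead of a fixed width.

```python
def make_id4_fixed(id3, original_id):
    """ID4 as documented: first 14 characters of ID3 plus last 4 of the original ID."""
    return id3[:14] + original_id[-4:]


def make_id4_by_separator(id3, original_id):
    """Hypothetical variant: keep ID3 up to and including its last underscore."""
    prefix, _sep, _suffix = id3.rpartition("_")
    return prefix + "_" + original_id[-4:]


code, id2, original_id = "AF21", "8d 7d 2c_P000", "8d 7d 2c_Ah9h"
id3 = code + "_" + id2

print(repr(id3[:14]))                            # 'AF21_8d 7d 2c_'
print(make_id4_fixed(id3, original_id))          # AF21_8d 7d 2c_Ah9h
print(make_id4_by_separator(id3, original_id))   # AF21_8d 7d 2c_Ah9h

# With a 3-character code, the fixed slice takes one character of the ID2 suffix.
short_id3 = "AF2" + "_" + id2
print(make_id4_fixed(short_id3, original_id))    # AF2_8d 7d 2c_PAh9h
```

The fixed-width rule is therefore safe only while every code and ID keeps the lengths used in the example.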
- Python 3.6.8+ - Main programming language
- pyodbc - Working with MS Access databases
- csv - CSV file processing
- logging - Detailed process logging
- pathlib - Modern path handling
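As an illustration of how the logging and path-handling modules listed above might fit together, the sketch below configures a logger and lists input files with `pathlib`. The logger name, the log format, and the helpers `setup_logger` and `list_input_files` are assumptions made for this example, not taken from the project.

```python
import logging
import tempfile
from pathlib import Path


def setup_logger(name="access_processor", level=logging.INFO):
    """Return a console logger, adding a handler only on the first call."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def list_input_files(input_dir, pattern="*.csv"):
    """Return matching files in input_dir, sorted for a stable processing order."""
    return sorted(Path(input_dir).glob(pattern))


logger = setup_logger()
with tempfile.TemporaryDirectory() as tmp:
    # A temporary directory stands in for data/input
    (Path(tmp) / "18%Ese21.csv").write_text("RecordID;ID;SomeData\n")
    for path in list_input_files(tmp):
        logger.info("Processing %s", path.name)
```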
- Speed: ~1000 records/sec
- Memory: Minimal usage (stream processing)
- Scalability: Support for files of any size
- Reliability: Complete error handling
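Stream processing here means reading one record at a time instead of loading a whole file. A minimal sketch of that idea, assuming the semicolon-delimited input layout from the usage example (`iter_ids` is a hypothetical name):

```python
import csv
import io


def iter_ids(csv_file, id_column="ID", delimiter=";"):
    """Lazily yield the ID column, one row at a time."""
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        yield row[id_column]


# In-memory stand-in for an input file
source = io.StringIO(
    "RecordID;ID;SomeData\n"
    "1;8d 7d 2c_Ah9h;Data_1\n"
    "2;3f 2a 1b_Xk5l;Data_2\n"
)
ids = iter_ids(source)
print(next(ids))  # 8d 7d 2c_Ah9h
print(next(ids))  # 3f 2a 1b_Xk5l
```

If duplicate detection keeps a set of ID3 values already seen, memory still grows with the number of distinct identifiers, even when rows are streamed.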
- Data migration between systems
- Creating lookup tables
- Generating unique identifiers
- Bulk database processing
- ETL processes (Extract, Transform, Load)
- Basic CSV file processing
- Intelligent ID mapping
- Process logging
- Test data generator
- Real .mdb file support via pyodbc
- GUI interface
- Excel export with formatting
- Parallel file processing
Pull requests are welcome! For major changes, please open an issue first to discuss.
This project is licensed under the MIT License.
Palagina Ekaterina
- Email: palagina00@gmail.com
- GitHub: @palagina00
- Portfolio: github.com/palagina00
If this project was helpful, please give it a ⭐ on GitHub!
Have questions or suggestions?
- Email: palagina00@gmail.com
- Report Bug: Issues
- Request Feature: Issues
Made with ❤️ by Palagina Ekaterina