This project is designed to process and analyze product data from multiple sources, map related manufacturers, and provide insights into manufacturer relationships.
- Setup
- Usage
- Main Components
- Data Flow
- Output
- Performance Improvements
- Validation Enhancements
- Logging
- Node.js (v14 or later recommended)
- npm (usually comes with Node.js)
- Clone the repository:
    git clone https://github.com/zahidhasann88/product-manufacturer-mapping.git
    cd product-manufacturer-mapping
- Install dependencies:
    npm install
- Place your CSV data files in the ./data directory.
- To compile the TypeScript code and run the application, use:
    npm run build-and-start
CSV Reader: Reads and parses CSV files containing product data and matches. Uses parallel processing to improve performance when handling multiple CSV files.
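As a rough illustration of reading several CSV files at once, the sketch below starts all loads together with `Promise.all`. The names `Row`, `parseCsv`, `readAllCsv`, and the injected `load` callback are hypothetical and not taken from the project; the parser is deliberately naive (no quoted fields), and `Promise.all` gives concurrent I/O rather than multi-core parallelism.

```typescript
// Hypothetical row shape: one record per CSV line, keyed by header name.
type Row = Record<string, string>;

// Naive parser for illustration only: splits on commas, ignores quoting rules.
function parseCsv(text: string): Row[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Row = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      row[header] = (cells[i] ?? "").trim();
    });
    return row;
  });
}

// `load` is injected (an assumption) so the sketch does not depend on a file API.
async function readAllCsv(
  paths: string[],
  load: (path: string) => Promise<string>
): Promise<Row[][]> {
  // Every load is started before any result is awaited.
  return Promise.all(paths.map(async (path) => parseCsv(await load(path))));
}
```

In a Node.js program, `load` could be `(p) => fs.promises.readFile(p, "utf8")`. Results come back in the same order as `paths`, regardless of which file finishes first.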
Manufacturer Mapper: Maps related manufacturers based on product data and matches. It also determines the relationship type (parent, child, or sibling) between manufacturers.
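The mapping step might look roughly like the sketch below. The README does not say how the relationship type is decided, so the prefix rule in `relationType` is purely an assumption made for illustration, as are the `Product`, `Match`, and `Relation` shapes and their field names.

```typescript
type RelationType = "parent" | "child" | "sibling";

// Hypothetical input and output shapes, not the project's actual types.
interface Product { id: string; manufacturer: string; }
interface Match { sourceId: string; targetId: string; }
interface Relation { manufacturer: string; related: string; type: RelationType; }

// Assumed heuristic: if one name is a whole-word prefix of the other
// ("Acme" vs "Acme Labs"), the shorter name is treated as the parent.
function relationType(a: string, b: string): RelationType {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  if (y.startsWith(x + " ")) return "parent"; // a is the parent of b
  if (x.startsWith(y + " ")) return "child";  // a is a child of b
  return "sibling";
}

// Two manufacturers are considered related when their products are matched.
function mapRelations(products: Product[], matches: Match[]): Relation[] {
  const byId = new Map(products.map((p) => [p.id, p.manufacturer] as const));
  const seen = new Set<string>();
  const relations: Relation[] = [];
  for (const { sourceId, targetId } of matches) {
    const a = byId.get(sourceId);
    const b = byId.get(targetId);
    if (a === undefined || b === undefined) continue; // unknown product id
    if (a.toLowerCase() === b.toLowerCase()) continue; // same manufacturer
    const key = `${a.toLowerCase()}|${b.toLowerCase()}`;
    if (seen.has(key)) continue; // skip duplicate pairs
    seen.add(key);
    relations.push({ manufacturer: a, related: b, type: relationType(a, b) });
  }
  return relations;
}
```

Deduplicating on a lowercased key keeps repeated matches between the same two manufacturers from producing repeated relations.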
Brand Assigner: Assigns a brand to a given product title based on the manufacturer relations. The brand is determined using a case-insensitive match against the product title.
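A minimal sketch of case-insensitive brand matching follows. The function names and the two extra rules (whole-word matching and preferring the longest brand name) are assumptions added for the example; the README only states that the match is case-insensitive.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the first brand found in the title, or undefined when none matches.
function assignBrand(title: string, brands: string[]): string | undefined {
  // Assumption: try longer names first so "Acme Labs" wins over "Acme".
  const candidates = brands.slice().sort((a, b) => b.length - a.length);
  for (const brand of candidates) {
    if (brand.trim() === "") continue; // ignore empty brand names
    // Assumption: the brand must appear as a whole word, not inside another word.
    const pattern = new RegExp(
      `(^|[^a-z0-9])${escapeRegExp(brand)}($|[^a-z0-9])`,
      "i" // case-insensitive, as described above
    );
    if (pattern.test(title)) return brand;
  }
  return undefined;
}
```

For example, `assignBrand("ACME LABS Vitamin C 500mg", ["Acme", "Acme Labs"])` returns `"Acme Labs"`, while a title such as `"Acmeology handbook"` yields no brand because `Acme` is not a separate word there.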
Database Manager: Handles all database operations, including initializing the database, saving manufacturer relations, and retrieving data. Complex operations run inside transactions to preserve data integrity.
Logging: Configures and manages application logging. Logs are written to both console and file, with configurable log levels and directory paths.
Validation Algorithm: Implements heuristics to flag potentially faulty manufacturer matches, improving data quality by detecting anomalies.
- CSV Reading: CSV files are read in parallel from the data/ directory using the CSV Reader.
- Manufacturer Mapping: Product data and matches are processed by the Manufacturer Mapper, which also determines the relationship type between manufacturers.
- Brand Assignment: Product titles are assigned a brand using the Brand Assigner.
- Database Operations: Manufacturer relations are generated and saved to the SQLite database via the Database Manager.
- Validation: The Validation Algorithm flags potentially problematic manufacturer matches.
- Logging: All operations and results are logged to both the console and log files.
The program produces the following outputs:
- Console Logs: Detailed process steps and results are output to the console.
- Log Files: Logs are stored in the logs/ directory for debugging and auditing.
- SQLite Database: A database file (manufacturer_relations.db) containing the manufacturer relations.
- Flagged Manufacturers: A list of manufacturers that may require manual review, based on the enhanced validation algorithm.
CSV files are read in parallel, which reduces total load time when there are multiple data files.
The validation algorithm includes additional heuristics such as string similarity checks and detection of number-only manufacturers, reducing the likelihood of false matches.
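The two heuristics named above could be sketched as follows. The README does not specify the similarity metric or how it feeds into flagging, so the normalized Levenshtein distance, the `minSimilarity` threshold of 0.3, and the rule that low similarity is suspicious are all assumptions; the function names are hypothetical.

```typescript
// Classic Levenshtein edit distance, computed with two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical after trimming and lowercasing.
function similarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// True for names made up only of digits, such as "12345".
function isNumberOnly(name: string): boolean {
  return /^\d+$/.test(name.trim());
}

// Assumed rule: flag a pair when either name is number-only or when
// the two names are less similar than the (hypothetical) threshold.
function flagReasons(name: string, related: string, minSimilarity = 0.3): string[] {
  const reasons: string[] = [];
  if (isNumberOnly(name) || isNumberOnly(related)) reasons.push("number-only name");
  if (similarity(name, related) < minSimilarity) reasons.push("low name similarity");
  return reasons;
}
```

Returning a list of reasons rather than a boolean keeps the flagged output reviewable, since a person can see which heuristic fired for each pair.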
Logging is configured via environment variables:
- LOG_DIR: Specifies the directory where log files are stored (default: ./logs).
- LOG_LEVEL: Specifies the log level (info, error, etc.), allowing fine-grained control over logging verbosity.
Log files:
- Error Logs: Stored in logs/error.log, capturing error-level messages.
- Combined Logs: Stored in logs/combined.log, containing all log messages at or above the configured log level.
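To make the environment-variable handling concrete, here is a small sketch that resolves the settings into file paths. The `./logs` default comes from the description above; the `info` fallback for `LOG_LEVEL`, the `warn` and `debug` level names, and the `resolveLogConfig` function itself are assumptions for illustration and may not match the project's logger setup.

```typescript
// Assumed set of levels; the README only names "info" and "error".
type LogLevel = "error" | "warn" | "info" | "debug";
const LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

interface LogConfig {
  dir: string;
  level: LogLevel;
  errorFile: string;
  combinedFile: string;
}

function isLogLevel(value: string | undefined): value is LogLevel {
  return value !== undefined && (LEVELS as readonly string[]).indexOf(value) !== -1;
}

// `env` is passed in so the function can be exercised without touching the real environment.
function resolveLogConfig(env: Record<string, string | undefined>): LogConfig {
  const rawDir = env.LOG_DIR?.trim() || "./logs"; // documented default
  // Drop trailing slashes so the joined paths do not contain "//".
  const dir = rawDir.length > 1 ? rawDir.replace(/\/+$/, "") : rawDir;
  const requested = env.LOG_LEVEL?.trim().toLowerCase();
  const level: LogLevel = isLogLevel(requested) ? requested : "info"; // assumed fallback
  return {
    dir,
    level,
    errorFile: `${dir}/error.log`,
    combinedFile: `${dir}/combined.log`,
  };
}
```

In the application this would typically be called as `resolveLogConfig(process.env)`. Unknown level names fall back to the assumed default instead of throwing, which keeps a typo in `LOG_LEVEL` from preventing startup.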