Match messages from Column A (platform messages) against Column B (internal messages) in an Excel file.
This script cleans and normalizes both columns and then checks:
Does each message in Column A exist somewhere in Column B (after cleaning)?
- If yes → Column C will contain the matching B value
- If no → Column C will say "no match"
The script outputs a new Excel file with the results.
- Reads an `.xlsx` file
- Cleans/normalizes the text in both columns
  - lowercasing
  - remove punctuation
  - remove emojis
  - remove weird characters
  - collapse extra spaces
- Creates a fast lookup of normalized Column B values
- For every row in Column A:
  - If normalized A matches normalized B → copy the original B into Column C
  - Otherwise → "no match"
- Saves a new `.xlsx` with the results
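As a rough illustration of the steps above, here is a minimal sketch of the cleaning and lookup logic using only the standard library. The names `normalize`, `build_lookup`, and `match_messages` are hypothetical, and the exact cleaning rules in `match_messages.py` may differ; this version assumes that "weird characters" means anything that is not a letter, digit, or whitespace.

```python
import unicodedata

NO_MATCH = "no match"


def normalize(text):
    """Lowercase, drop punctuation/emojis/other symbols, collapse spaces."""
    text = unicodedata.normalize("NFKC", str(text)).lower()
    # Assumption: keep only letters, digits and whitespace; everything
    # else (punctuation, emojis, symbols) is replaced by a space.
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    # split() + join() collapses runs of whitespace and trims the ends.
    return " ".join(text.split())


def build_lookup(b_values):
    """Map each normalized Column B message to its original text."""
    lookup = {}
    for original in b_values:
        key = normalize(original)
        if key:  # skip cells that are empty after cleaning
            lookup.setdefault(key, original)  # first occurrence wins
    return lookup


def match_messages(a_values, b_values):
    """Return the Column C values: the original B text or 'no match'."""
    lookup = build_lookup(b_values)
    return [lookup.get(normalize(a), NO_MATCH) for a in a_values]


if __name__ == "__main__":
    column_a = ["Please refund me!!!", "Driver stole my order"]
    column_b = ["please refund me", "Where is my order?"]
    print(match_messages(column_a, column_b))
```

The dictionary is what makes the lookup fast: each Column A message is checked with a single key lookup instead of being compared against every row of Column B.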
You need Python installed.
Install dependencies:
`pip install pandas openpyxl`
That’s it.
your-folder/
│
├── match_messages.py
├── input.xlsx
└── (results will be saved here as output.xlsx)
- Put your input Excel file in the same folder (example: `input.xlsx`).
- Open a terminal / command prompt in that folder.
- Run the script: `python match_messages.py input.xlsx output.xlsx`
  - A new file will appear: `output.xlsx`
- Open `output.xlsx` and look at Column C:
  - If a matching message was found → it shows the matching B message
  - If not → "no match"
Input columns:
| Column A (platform msg) | Column B (internal msg) |
|---|---|
| "Please refund me!!!" | "please refund me" |
| "Driver stole my order" | no similar message |
Output column:
| Column C |
|---|
| please refund me |
| no match |
The script assumes:
- Column A = first column in Excel
- Column B = second column
- It writes results to a new Column C
If your file uses a different layout, adjust the column positions in the code.
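For a file with a different layout, one possible approach is to select the columns by position with pandas. The sketch below is an assumption about how this could look, not the script's actual code: `add_match_column`, `a_pos`, `b_pos`, and `out_name` are hypothetical names, and the cleaning function is a simplified stand-in.

```python
import re

import pandas as pd

NO_MATCH = "no match"


def normalize(text):
    # Simplified cleaning for this sketch: lowercase, replace anything that
    # is not a word character or whitespace with a space, collapse spaces.
    text = re.sub(r"[^\w\s]", " ", str(text).lower())
    return " ".join(text.split())


def add_match_column(df, a_pos=0, b_pos=1, out_name="Column C"):
    """Return a copy of df with a match column added.

    a_pos and b_pos are 0-based column positions (0 = first column).
    """
    lookup = {}
    for original in df.iloc[:, b_pos].dropna():  # ignore empty B cells
        lookup.setdefault(normalize(original), original)

    result = df.copy()
    result[out_name] = [
        NO_MATCH if pd.isna(value) else lookup.get(normalize(value), NO_MATCH)
        for value in df.iloc[:, a_pos]
    ]
    return result


# Hypothetical usage: messages in the 3rd and 5th columns instead of the first two.
# df = pd.read_excel("input.xlsx")
# add_match_column(df, a_pos=2, b_pos=4).to_excel("output.xlsx", index=False)
```

Selecting by position with `iloc` keeps the sketch independent of the header names in the spreadsheet.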
- It cleans both A and B using the same rules
- It then compares the cleaned text, not the raw text
- If the cleaned A text equals cleaned B text → that’s a match
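- This is exact matching on the cleaned text, not fuzzy or semantic matching
Exact matching avoids false matches between messages that are only loosely similar, and the lookup keeps the script fast.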
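To make the "exact, not fuzzy" behavior concrete, the short sketch below compares a few platform messages against one internal message. The `clean` function is a simplified, hypothetical stand-in for the script's cleaning rules.

```python
import re


def clean(text):
    # Simplified stand-in: lowercase, drop punctuation, collapse spaces.
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


internal = "please refund me"

candidates = [
    "Please refund me!!!",     # differs only in case and punctuation
    "PLEASE   refund me.",     # differs only in case, spacing, punctuation
    "Please refund my order",  # different words
    "Plese refund me",         # typo
]

for platform in candidates:
    is_match = clean(platform) == clean(internal)
    print(f"{platform!r}: {'match' if is_match else 'no match'}")
```

Differences in case, punctuation, and spacing are absorbed by the cleaning step, while a typo or a different wording still produces "no match".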