Skip to content

Latest commit

 

History

History
202 lines (162 loc) · 6.15 KB

README.md

File metadata and controls

202 lines (162 loc) · 6.15 KB

Soft String Compare

C++ functions to judge the similarity of strings. Originally created to correct messy human input. Also compiles to WebAssembly.

Install

You need cmake and a C++ compiler (such as g++) installed.

# clone repo
git clone https://github.com/Infinitifall/Soft-String-Compare
cd Soft-String-Compare

# make
cd build_native
cmake ../ -DCMAKE_BUILD_TYPE=Release
# cmake ../ -DCMAKE_BUILD_TYPE=Debug
cmake --build ./

Use

The code in example.cpp will try to correct the messy product names in input_list.txt (see below) by matching them to a catalog of products.

iphne 13 pro maks
nkie ar jrdan 1
samsng 55" qled 4k smrt tv
dysin v11 absloote
instnt pot dou plus 6 qt
playstaton 5 digitl editon
fitbt versa 3 smrt wach
keurigk-cup coffe makr
bose quetcomfort 45 hdphnes
nutribullt pro - 13-pce hi-sped blndr
roomba i3+ evo self-emtying robt vacum
ninja foodie 10-in1 xl pro ar fyer & othr
lgo star wars milenim falcn

...

Run the example executable

# run
cd build_native
./example
✅ iPhone 13 Pro Max
✅ Nike Air Jordan 1
✅ Samsung 55" QLED 4K Smart TV
✅ Dyson V11 Absolute
✅ Instant Pot Duo Plus 6 Qt
✅ PlayStation 5 Digital Edition
✅ Fitbit Versa 3 Smartwatch
✅ Keurig K-Cup Coffee Maker
✅ Bose QuietComfort 45 Headphones
✅ NutriBullet Pro - 13-Piece High-Speed Blender
✅ iRobot Roomba i3+ EVO Self-Emptying Robot Vacuum
✅ Ninja Foodi 10-in-1 XL Pro Air Fryer & Other
✅ LEGO Star Wars Millennium Falcon
✅ Amazon Echo Dot (4th Gen) Smart Speaker
✅ VIZIO 5.1 Surround Sound Bar System
✅ Logitech MX Master 3S Wireless Mouse
✅ KitchenAid Artisan Series 5 Qt. Mixer
✅ GoPro HERO11 Black
✅ Apple Watch Series 7 GPS + Cellular
✅ Sonos One SL Wi-Fi Speaker
✅ Microsoft Surface Pro 8 Laptop
✅ LG 65" C1 Series OLED 4K UHD Smart TV
✅ Breville The Barista Express Espresso Machine
✅ Garmin Forerunner 945 GPS Running Watch
✅ Whirlpool 4.8 cu. ft. Front Load Washer
✅ Canon EOS R6 Mirrorless Camera
✅ Beats by Dr. Dre Studio3 Wireless Over-Ear Headphones
✅ Theragun Prime Deep Tissue Massage Gun
✅ Philips Norelco Multigroom All-in-One Trimmer
✅ Nespresso Vertuo Next Coffee & Espresso Maker
✅ Ring Video Doorbell Pro 2
✅ Brita Longlast Water Filter Pitcher
✅ Vitamix E310 Explorian Series Blender
✅ Traeger Pro 575 Wood Pellet Grill
✅ Oculus Quest 2 Advanced All-in-One Virtual Reality Headset
✅ Sunbeam Osmo 3 Reverse Osmosis Water Filter System
✅ Merax 10' Trampoline with Enclosure
✅ Klipsch HT-G700 3.1ch Dolby Atmos Soundbar
✅ YETI Tundra 45 Hard Cooler
✅ RTIC UltraLight 52 Qt Cooler
✅ HidrateSpark 3.0 32oz Insulated Water Bottle
❌ Bose QuietComfort 45 Headphones (lumin ultra-comfortble coper-infusd matres = Leesa Original Mattress)
✅ Anova Culinary Sous Vide Precision Cooker
✅ AeroGarden Harvest Elite Indoor Garden
✅ Waterpik Aquarius Water Flosser
✅ eufy RoboVac 11S (Slim) Robot Vacuum
✅ SKIL 1/4" Hex Electric Screwdriver
✅ SMOK Novo 4 Pod System Vape Kit
✅ Anker PowerCore 10000 Portable Charger

✅ count = 48
☑️  count = 0
❌ count = 1

🎯 ratio = 97.959 %

We see it is able to match severely misspelled product names to a very high degree (>90%).

We can also enter arbitrary strings to see how the system ranks items. For example, to see why lumin ultra-comfortble coper-infusd matres wasn't correctly matched:

Enter name: lumin ultra-comfortble coper-infusd matres
...

0.000 rating: Logitech MX Master 3S Wireless Mouse
0.000 rating: VIZIO 5.1 Surround Sound Bar System
0.360 rating: Vitamix E310 Explorian Series Blender
0.360 rating: NutriBullet Pro - 13-Piece High-Speed Blender
0.847 rating: Merax 10' Trampoline with Enclosure
0.847 rating: Philips Norelco Multigroom All-in-One Trimmer
2.357 rating: Anker PowerCore 10000 Portable Charger
2.659 rating: Nespresso Vertuo Next Coffee & Espresso Maker
3.701 rating: RTIC UltraLight 52 Qt Cooler
3.924 rating: Traeger Pro 575 Wood Pellet Grill
5.997 rating: Breville The Barista Express Espresso Machine
7.131 rating: Leesa Original Mattress
9.128 rating: Garmin Forerunner 945 GPS Running Watch
14.926 rating: Bose QuietComfort 45 Headphones

1: lumin ultra-comfortble coper-infusd matres
++ ____________comfort_______________________
++ ________________________________________es
++ _______________________co_________________
++ _____________________e ___________________
-- ___e __________________________
-- __________Co___________________
-- _____________________________es
-- __________Comfort______________
2: Bose QuietComfort 45 Headphones

We see that the correct choice Leesa Original Mattress is given the third highest rating. This is a particularly tricky example because it doesn't have much in common with the input string lumin ultra-comfortble coper-infusd matres.

Performance

In example.cpp: In function int main(int argc, char** argv): Comment out all lines except real_run_wrapper(argc, argv);. Compile with Release to enable optimization flags

# alternatively, run the `cmake release` task in VS Codium

cd build_native
cmake ../ -DCMAKE_BUILD_TYPE=Release
cmake --build ./
# using bash
time ./example ../data_dummy/all_list.txt ../data_dummy/input_list.txt ../data_dummy/output_list.txt ../data_dummy/all_list.txt
real    0m0.059s
user    0m0.052s
sys     0m0.003s

To correct 50 product names on a Dell XPS 15 (9510, 2021) equivalent.

Compile to WebAssembly

You need a Wasm compiler (like Emscripten) and a local development server (such as http-server)

# make
cd build_wasm
emcmake cmake ../ -DCMAKE_BUILD_TYPE=Release
# emcmake cmake ../ -DCMAKE_BUILD_TYPE=Debug
emmake make


# run a local development server
npx http-server ./ -o -p 9999
# visit http://localhost:9999 in your web browser

Useful guides on C/C++ to Wasm:

Programming in VS Codium

  • clangd extension for completions, references, errors and hints
  • CodeLLDB extension works great with the provided launch.json and tasks.json