Use Case I: Related KNN work elsewhere

dkakkar edited this page Jan 30, 2020 · 1 revision

Another way to compute KNN on a large dataset is to distribute scikit-learn’s k-NN search across a Spark cluster. The reasoning behind this approach:
- Spark’s MLlib has no built-in support for KNN, but scikit-learn does.
- On a single machine, scikit-learn’s k-NN kneighbors() method becomes a computational bottleneck for large datasets, so it needs to be parallelized.
- To parallelize it, the kneighbors() call is placed inside a Spark map function and run in a distributed environment.
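
The approach above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the related work. It assumes the reference dataset is small enough to fit a scikit-learn `NearestNeighbors` model on the driver and broadcast it to the executors, and that only the query points are partitioned. The helper names `make_partition_fn` and `distributed_kneighbors` are hypothetical, and `mapPartitions` is used here as one plausible form of the "Spark map function".

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def make_partition_fn(model_bc, k):
    """Build the function Spark applies to each partition of query points.

    model_bc is assumed to be a Spark broadcast variable whose .value is a
    fitted NearestNeighbors model (hypothetical setup for this sketch).
    """
    def knn_partition(rows):
        # rows: iterator of (query_id, feature_vector) pairs in one partition
        rows = list(rows)
        if not rows:
            return iter([])
        ids = [r[0] for r in rows]
        queries = np.array([r[1] for r in rows], dtype=float)
        # One batched kneighbors() call per partition, not one per row
        dist, idx = model_bc.value.kneighbors(queries, n_neighbors=k)
        return iter(zip(ids, idx.tolist(), dist.tolist()))
    return knn_partition


def distributed_kneighbors(sc, reference, queries, k=3, num_slices=4):
    """Run kneighbors() over query partitions on an existing SparkContext."""
    # Fit once on the driver (assumes the reference set fits in memory)
    model = NearestNeighbors(n_neighbors=k).fit(reference)
    model_bc = sc.broadcast(model)
    # Distribute the query points, keeping each point's original index
    rdd = sc.parallelize(list(enumerate(queries.tolist())), num_slices)
    return rdd.mapPartitions(make_partition_fn(model_bc, k)).collect()
```

With PySpark installed, `sc` could be obtained with `SparkContext.getOrCreate()`. Each returned tuple holds a query index, the indices of its k nearest reference points, and the corresponding distances. Batching per partition keeps the number of `kneighbors()` calls low, but the broadcast step is only practical while the fitted model fits in each executor's memory.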