Use Case I: Related KNN work elsewhere

dkakkar edited this page Jan 30, 2020 · 1 revision
  • Another way to compute KNN on a large dataset is to distribute scikit-learn’s k-NN computation across a Spark cluster:
    • Spark’s MLlib has no built-in support for KNN calculations, but scikit-learn does.
    • On large datasets, scikit-learn’s k-NN kneighbors() method becomes the computational bottleneck, so it needs to be parallelized.
    • To parallelize it, the kneighbors() call is placed inside a Spark map function and run in a distributed environment.
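
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code used by the work referenced here: the function names (`knn_for_chunk`, `knn_local`, `knn_spark`), the chunking scheme, and the choice to refit the model inside each task are all assumptions made for the example. `knn_local` runs the same per-chunk function with Python's built-in `map`, so the logic can be checked without a cluster; `knn_spark` expects an existing `SparkContext` passed in as `sc`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_for_chunk(train, query_chunk, k):
    """Return (distances, indices) of the k nearest training rows for each query row."""
    # Assumption: the model is refit per chunk for simplicity; this is wasteful
    # but keeps each task self-contained.
    model = NearestNeighbors(n_neighbors=k).fit(train)
    return model.kneighbors(query_chunk)


def split_queries(queries, n_chunks):
    """Split the query rows into at most n_chunks non-empty pieces, preserving order."""
    return [c for c in np.array_split(queries, n_chunks) if len(c) > 0]


def knn_local(train, queries, k, n_chunks=4):
    """Single-machine stand-in for the Spark version, using the built-in map."""
    results = map(lambda chunk: knn_for_chunk(train, chunk, k),
                  split_queries(queries, n_chunks))
    dists, idxs = zip(*results)
    return np.vstack(dists), np.vstack(idxs)


def knn_spark(sc, train, queries, k, n_chunks=4):
    """Same computation with kneighbors() called inside a Spark map function."""
    # Ship the training data to every executor once instead of once per task.
    train_bc = sc.broadcast(train)
    chunks = split_queries(queries, n_chunks)
    results = (sc.parallelize(chunks, len(chunks))
                 .map(lambda chunk: knn_for_chunk(train_bc.value, chunk, k))
                 .collect())
    dists, idxs = zip(*results)
    return np.vstack(dists), np.vstack(idxs)
```

In this sketch only the query points are partitioned; the full training set is broadcast to every worker, which assumes it fits in each executor's memory. If it does not, the training data would also have to be partitioned and the per-partition results merged, which is not shown here.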