K-NN

Investigating k-NN query performance in PostgreSQL and MongoDB on New York City's taxi dataset.

About Project

This project applies k-NN queries in two different databases, PostgreSQL and MongoDB. A Python class was created to facilitate the performance analysis between the two database management systems. The publicly available New York taxi data set is used. The accuracy of the results is compared against the Haversine and Vincenty formulas, which calculate the distance between two points on Earth.
A sample data set has also been uploaded as a GeoJSON file. It can be imported directly into MongoDB and queried through the Python class.
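
As a rough illustration of what a k-NN query can look like in each system, the sketch below builds a MongoDB filter with the `$nearSphere` operator and a PostGIS statement with the `<->` distance operator. It is not necessarily the query form used by the project's Python class. The helper names, the `location` field, the `trips` table and the `geom` column are hypothetical, and the MongoDB filter assumes a `2dsphere` index on the queried field.

```python
def mongo_knn_filter(lon, lat, field="location"):
    """Build a MongoDB filter that returns documents nearest-first.

    `field` is a hypothetical GeoJSON point field with a 2dsphere index.
    GeoJSON coordinates are ordered [longitude, latitude].
    """
    return {
        field: {
            "$nearSphere": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]}
            }
        }
    }


def postgis_knn_sql(table="trips", geom_column="geom"):
    """Build a parameterised PostGIS k-NN statement.

    `table` and `geom_column` are hypothetical names. On a geometry column
    the `<->` operator orders rows by planar distance in the column's
    coordinate units. The three %s placeholders are longitude, latitude, k.
    """
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY {geom_column} <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326) "
        "LIMIT %s"
    )
```

With a driver such as pymongo, the filter would be passed as `collection.find(mongo_knn_filter(lon, lat)).limit(k)`. With a PostgreSQL driver such as psycopg2, the statement would be run as `cursor.execute(postgis_knn_sql(), (lon, lat, k))`.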

The implementation and results of this project were submitted to the "International Workshop on Collaborative Crowdsourced Cloud Mapping and Geospatial Big Data".

Publication: Coşkun, İ. B., Sertok, S., and Anbaroğlu, B.: K-NEAREST NEIGHBOUR QUERY PERFORMANCE ANALYSES ON A LARGE SCALE TAXI DATASET: POSTGRESQL VS. MONGODB, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W13, 1531-1538, https://doi.org/10.5194/isprs-archives-XLII-2-W13-1531-2019, 2019.

Haversine and Vincenty

The Haversine formula treats the Earth as a sphere and calculates the great-circle distance between two points on it. The Vincenty formula treats the Earth as an ellipsoid, so its parameters depend on the chosen reference ellipsoid. In this project, the WGS84 ellipsoid parameters are used.
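
The two formulas can be sketched in plain Python as follows. This is a minimal illustration rather than the project's own implementation. The function names, the mean Earth radius of 6371000 m used for the sphere, and the iteration limit and tolerance are assumptions. The ellipsoid constants are the standard WGS84 values.

```python
import math

# WGS84 reference ellipsoid parameters.
WGS84_A = 6378137.0                # semi-major axis in metres
WGS84_F = 1 / 298.257223563        # flattening
WGS84_B = (1 - WGS84_F) * WGS84_A  # semi-minor axis in metres

# Assumed mean Earth radius in metres for the spherical model.
EARTH_RADIUS_M = 6371000.0


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def vincenty(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Ellipsoidal distance in metres (Vincenty inverse formula, WGS84)."""
    a, b, f = WGS84_A, WGS84_B, WGS84_F
    # Reduced latitudes of the two points.
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    big_l = math.radians(lon2 - lon1)

    lam = big_l  # longitude difference on the auxiliary sphere
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos_sq_alpha = 1 - sin_alpha ** 2
        # cos_sq_alpha is zero for lines along the equator.
        cos_2sm = (cos_sigma - 2 * sin_u1 * sin_u2 / cos_sq_alpha
                   if cos_sq_alpha else 0.0)
        c = f / 16 * cos_sq_alpha * (4 + f * (4 - 3 * cos_sq_alpha))
        lam_new = big_l + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        # Nearly antipodal points may fail to converge.
        raise ValueError("Vincenty iteration did not converge")

    u_sq = cos_sq_alpha * (a ** 2 - b ** 2) / b ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (
        cos_2sm + big_b / 4 * (
            cos_sigma * (-1 + 2 * cos_2sm ** 2)
            - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2)
            * (-3 + 4 * cos_2sm ** 2)))
    return b * big_a * (sigma - delta_sigma)
```

Both functions take latitude and longitude in degrees and return metres, so the same pair of points can be passed to each, for example `haversine(40.7580, -73.9855, 40.6413, -73.7781)` and `vincenty(40.7580, -73.9855, 40.6413, -73.7781)`, to see how far the spherical and ellipsoidal results differ.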