KNN algorithms
SQL based nearest algorithm
Select * from Places
where Latitude between X-D and X+D
and Longitude between Y-D and Y+D
-- Between queries will create large amounts of intermediate data and create high costs on servers.
-- GridID will greatly reduce the amount of intermediate data size.
select * from Places
where latitude between X-D and X+D
and longitude between Y-D and Y+D
and gridID in (gridIDx0,gridIDx1,gridIDx2,gridIDx3,gridIDx4,gridIDx5,gridIDx6,gridIDx7,gridIDx8);
Pros & Cons
Pros: Simple and straightforward
Cons: Some grid will be much denser than others. How to choose the optimal grid size.
Size estimation
SQL spatial datatypes
Use decimal to represent latitude/longtitude to avoid excessive precision (0.0001° is <11 m, 1′′ is <31 m)
To 6 decimal places should get you to around ~10cm of accuracy on a coordinate.

Implementation in SQL
Storage option 1: Store data as spatial data types

Storage option 2: PostgreSQL spatial query - KNN
PostgreSQL supports KNN search on top using distance operator <->
The above query takes about 20 minutes, using KNN specific index (called GiST / SP-GiST) to speed up
Dynamic grids - Squad tree
A squad tree is similar to a trie.

Pros & Cons
Pros:
It suit well for unevenly distributed users.
Cons:
Sometimes the tree could be too high and result in low performance.
Size estimation
Geohashes
Steps
For latitude range [-90, 90] and longtitude range [-180, 180], it will be divided into section: below the average value and bigger than average value.
For example, 42.60411 will be encoded as "101111001001" and -5.59041 will be encoded as "011111000000"

After getting two binary values, they are concatenated together as a 25 bit value "01101 11111 11000 00100 00010". This 32 bit value will be encoded as "ezs42". Typically the precision of a 25 bit value is 4.9KM.
And this 5 bit value will be divided into 5 sections.
Each section corresponds to 0-31 in decimal value.


Redis impl
Redis uses a 52 bit Geohash encode, and it has a precision value of 0.6m.
Redis will SkipList to store geoHashes for faster neighbor search.

Hilbert Curves
Last updated
Was this helpful?