KNN algorithms

SQL based nearest algorithm

Select * from Places 
where Latitude between X-D and X+D 
and Longitude between Y-D and Y+D 

-- Between queries will create large amounts of intermediate data and create high costs on servers.
-- GridID will greatly reduce the amount of intermediate data size.

select * from Places 
where latitude between X-D and X+D 
and longitude between Y-D and Y+D 
and gridID in (gridIDx0,gridIDx1,gridIDx2,gridIDx3,gridIDx4,gridIDx5,gridIDx6,gridIDx7,gridIDx8);
Squad tree

Pros & Cons

  • Pros: Simple and straightforward

  • Cons: Some grid will be much denser than others. How to choose the optimal grid size.

Size estimation

SQL spatial datatypes

  • Use decimal to represent latitude/longtitude to avoid excessive precision (0.0001° is <11 m, 1′′ is <31 m)

  • To 6 decimal places should get you to around ~10cm of accuracy on a coordinate.

Implementation in SQL

Storage option 1: Store data as spatial data types

Storage option 2: PostgreSQL spatial query - KNN

  1. PostgreSQL supports KNN search on top using distance operator <->

  1. The above query takes about 20 minutes, using KNN specific index (called GiST / SP-GiST) to speed up

Dynamic grids - Squad tree

  • A squad tree is similar to a trie.

Squad tree

Pros & Cons

  • Pros:

    • It suit well for unevenly distributed users.

  • Cons:

    • Sometimes the tree could be too high and result in low performance.

Size estimation

Geohashes

Steps

  1. For latitude range [-90, 90] and longtitude range [-180, 180], it will be divided into section: below the average value and bigger than average value.

  • For example, 42.60411 will be encoded as "101111001001" and -5.59041 will be encoded as "011111000000"

Squad tree
  1. After getting two binary values, they are concatenated together as a 25 bit value "01101 11111 11000 00100 00010". This 32 bit value will be encoded as "ezs42". Typically the precision of a 25 bit value is 4.9KM.

  • And this 5 bit value will be divided into 5 sections.

  • Each section corresponds to 0-31 in decimal value.

Squad tree
Squad tree

Redis impl

  • Redis uses a 52 bit Geohash encode, and it has a precision value of 0.6m.

  • Redis will SkipList to store geoHashes for faster neighbor search.

Squad tree

Hilbert Curves

Last updated

Was this helpful?