Fast GPU-based Locality Sensitive Hashing for K-Nearest Neighbor Computation
Jia Pan, and Dinesh Manocha
University of North Carolina at Chapel Hill




Abstract
We present an efficient GPU-based parallel LSH algorithm

to perform approximate k-nearest neighbor computation in

high-dimensional spaces. We use the Bi-level LSH algorithm,

which can compute k-nearest neighbors with higher

accuracy and is amenable to parallelization. During the first

level, we use the parallel RP-tree algorithm to partition

datasets into several groups so that items similar to each

other are clustered together. The second level involves computing

the Bi-Level LSH code for each item and constructing

a hierarchical hash table. The hash table is based on parallel

cuckoo hashing and Morton curves. In the query step, we

use GPU-based work queues to accelerate short-list search,

which is one of the main bottlenecks in LSH-based algorithms.

We demonstrate the results on large image datasets

with 200,000 images which are represented as 512 dimensional

vectors. In practice, our GPU implementation can

obtain more than 40x acceleration over a single-core CPU based

LSH implementation.


Papers
 
Bi-level Locality Sensitive Hashing for K-Nearest Neighbor Computation (PDF)

introducing Bi-level Algorithm, International Conference on Data Engineering (ICDE), 2012

 

Fast GPU-based Locality Sensitive Hashing for K-Nearest Neighbor Computation (PDF)

discussing the GPU Implementation, International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS), 2011

 

Software

 

Experimental code

 


Related Links

GAMMA Research Group