Sorting Using Graphics Processors


Sorting is a fundamental and universal problem in computer science. This page shows our research into using the GPU to speed up sorting, and/or use the GPU as a coprocessor to allow the CPU to do other computations. We present two unique and novel approaches to two different problems in sorting. These different approaches can be thought of as sorting using GPUs, and sorting on GPUs. Our first approach is to use the rasterizer of the GPU to perform comparison operations on polygons. We consider this sorting with the GPU, since it uses the GPU's rasterizer as a comparison operator. This solves the important problem of spatial ordering in 3 dimensions. Our second approach is to use the blending hardware of the GPU to compare values, and sort them along with keys. We consider this sorting with the GPU; using the GPU to sort values.

  1. Interactive Visibility Ordering of Geometric Primitives in Complex Environments
  2. A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations Using Graphics Processors

1. Interactive Visibility Ordering of Geometric Primitives in Complex Environments

Naga Govindaraju, Michael Henson, Ming C. Lin, Dinesh Manocha

Transparent Power Plant Opaque Power Plant

These images demonstrate the performance of our visibility ordering algorithm on a CAD model with 820K triangles and high depth complexity. The left image shows the original model rendered with opaque objects. The outer walls and structures (represented with 91K triangles) of the power plant are rendered with transparency in the right image. Our algorithm computes a back-to-front ordering of the transparent primitives at 7-10 frames on a 3.4 GHz PC with NVIDIA GeForce FX 6800 GPU and renders them with a vertex shader.


We present a novel algorithm for visibility ordering among geometric objects in complex and dynamic environments. Our algorithm rearranges the objects in a back-to-front or a front-to-back order from a given viewpoint. We describe a novel sorting algorithm that performs comparisons between the primitives by performing occlusion queries on the GPUs. In particular, our algorithm minimizes the cost of comparison operations and exhibits expected linear-time performance in environments with high temporal coherence. Our visibility ordering algorithm requires no preprocessing and is applicable to all kind of models, including polygon soups and deformable models. We have used our algorithm for order-independent transparency computations in high-depth complexity environments and performing N-body collision culling in dynamic environments. We have implemented our algorithm on a PC with a 3.4 GHz Pentium 4 CPU with an NVIDIA GeForce FX 6800 Ultra GPU and applied it to complex environments with tens or hundreds of thousands of polygons. Our algorithm can compute a visibility ordering among the objects and triangles at interactive frame rates.

Dynamic Scene

Dynamic Scene: This scene is composed of 18 deforming bunnies moving in a room. The scene consists of 25K triangles and has a high depth complexity of 8 - 10 in many view directions. One such view is shown in the left image where the scene is rendered with transparency using our visibility ordering algorithm (at 10 frames a second). In the right image, the same scene rendered with opaque objects is shown. In this dynamic environment, we are able to perform interactive collision computations at 25 frames per second.

2. A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations Using Graphics Processors

Comparison with CPU-based Algorithms Comparison with Other GPU-based Algorithms
Comparison with CPU-based Algorithms: This graph highlights the performance of our algorithm against optimized Quicksort implementations on CPUs. We have compared our algorithm against two implementations of Quicksort on a 3.4 GHz Pentium 4 CPU. Our results indicate a 6-25 times speedup over the Quicksort algorithm. Comparison with Other GPU-based Algorithms: We have compared the performance of our new algorithm with other GPU-based algorithms. Our results indicate a factor of 2 - 2.5 performance improvement over GPU-based PBSN algorithm and 20 - 30 times performance improvement over prior GPU-based bitonic sorting algorithms.


We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also present strategies to exploit the tile-based computational model of GPUs. Our new algorithm has a memory-efficient data access pattern and we describe an efficient instruction dispatch mechanism to improve the overall sorting performance. We have used our sorting algorithm to accelerate join-based queries and stream mining algorithms. Our results indicate up to an order of magnitude improvement over prior CPU-based and GPU-based sorting algorithms.

Related Links


CB #3175, Department of Computer Science
University of North Carolina
Chapel Hill, NC 27599-3175
(919) 962-1749