Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

Authors

Luc Buatois, Guillaume Caumon, and Bruno Lévy

Journal/Article/Conference

High Performance Computation Conference (HPCC-07), http://www.tlc2.uh.edu/hpcc07/

Abstract

A wide class of geometry processing and PDE resolution methods needs to solve a linear system whose non-zero pattern is dictated by the connectivity matrix of the mesh. The advent of GPUs, with their ever-growing amount of parallel horsepower, makes them a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse general-purpose linear solver that outperforms leading-edge CPU counterparts (Intel MKL / AMD ACML).
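The two storage ideas named above can be illustrated with a small sequential sketch. It assumes 2x2 blocks and uses a Jacobi-preconditioned conjugate gradient as the iterative method (a common choice for symmetric positive-definite systems; treat it as an assumption here, not a statement about the paper's exact solver). Block compressed row storage (BCSR) keeps only the non-zero 2x2 blocks, each with a single column index, and the matrix-vector product unrolls each block into local accumulators, which is the CPU analogue of register blocking. The names `BCSRMatrix`, `from_dense`, `matvec`, `diagonal`, and `jacobi_pcg` are hypothetical, and this is not the CNC's GPU code. On a GPU, the outer loop over block rows is the part that would be distributed across threads.

```python
# Sketch only: BCSR storage with fixed 2x2 blocks, plus a Jacobi-preconditioned CG.
# All names are hypothetical illustrations, not the Concurrent Number Cruncher API.

BS = 2  # block size; the unrolled code in matvec/diagonal assumes BS == 2


class BCSRMatrix:
    def __init__(self, n_block_rows, row_ptr, col_idx, blocks):
        self.n_block_rows = n_block_rows
        self.row_ptr = row_ptr  # block row bi owns entries row_ptr[bi]..row_ptr[bi+1]-1
        self.col_idx = col_idx  # block-column index of each stored block
        self.blocks = blocks    # each block: 4 values in row-major order

    @classmethod
    def from_dense(cls, a):
        n = len(a)
        assert n % BS == 0, "dimension must be a multiple of the block size"
        nbr = n // BS
        row_ptr, col_idx, blocks = [0], [], []
        for bi in range(nbr):
            for bj in range(nbr):
                vals = [a[bi * BS + r][bj * BS + c] for r in range(BS) for c in range(BS)]
                if any(v != 0 for v in vals):  # keep the whole block if any entry is non-zero
                    col_idx.append(bj)
                    blocks.append([float(v) for v in vals])
            row_ptr.append(len(col_idx))
        return cls(nbr, row_ptr, col_idx, blocks)

    def matvec(self, x):
        y = [0.0] * (self.n_block_rows * BS)
        for bi in range(self.n_block_rows):
            # Register blocking: both outputs of this block row live in locals,
            # and each 2x2 block product is written out by hand.
            y0 = y1 = 0.0
            for k in range(self.row_ptr[bi], self.row_ptr[bi + 1]):
                b = self.blocks[k]
                j = self.col_idx[k] * BS
                x0, x1 = x[j], x[j + 1]
                y0 += b[0] * x0 + b[1] * x1
                y1 += b[2] * x0 + b[3] * x1
            y[bi * BS] = y0
            y[bi * BS + 1] = y1
        return y

    def diagonal(self):
        d = [0.0] * (self.n_block_rows * BS)
        for bi in range(self.n_block_rows):
            for k in range(self.row_ptr[bi], self.row_ptr[bi + 1]):
                if self.col_idx[k] == bi:  # diagonal block of this block row
                    d[bi * BS] = self.blocks[k][0]
                    d[bi * BS + 1] = self.blocks[k][3]
        return d


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(A, b, diag, tol=1e-10, max_iter=1000):
    """Conjugate gradient preconditioned by diag(A)^-1; A must be symmetric positive-definite."""
    x = [0.0] * len(b)
    r = list(b)                               # residual b - A x with x = 0
    z = [ri / di for ri, di in zip(r, diag)]  # apply the Jacobi preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = A.matvec(p)                      # sparse product, usually the costliest step
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For example, building `A` with `BCSRMatrix.from_dense` from a small symmetric, diagonally dominant 4x4 matrix and calling `jacobi_pcg(A, b, A.diagonal())` recovers the solution to within the tolerance. Larger blocks trade some explicitly stored zeros for fewer column indices to read, which is the usual motivation for BCSR over plain compressed row storage.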

BibTeX Reference

@INPROCEEDINGS{Buatois07,
 author =   {Luc Buatois and Guillaume Caumon and Bruno Lévy},
 year =   {2007},
 title =   {Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU},
 editor =   {R. Perrott et al.},
 booktitle =   {High Performance Computation Conference (HPCC-07), http://www.tlc2.uh.edu/hpcc07/},
 publisher =   {Springer},
 series =   {Lecture Notes in Computer Science},
 volume =   {4782},
 pages =   {358--371},
 note =   {Texas Instruments Student Paper Award}
}