Kernel Tuner

Kernel Tuner greatly simplifies the development of highly optimized and auto-tuned CUDA, OpenCL, and C code, supporting many advanced use cases and optimization strategies that speed up the auto-tuning process.

36 mentions
14 contributors

Cite this software

DOI: 10.5281/zenodo.1220113

What Kernel Tuner can do for you

  • Allows developers to easily unit test and auto-tune GPU code
  • Generic auto-tuning of user-defined parameters for CUDA, OpenCL, and C kernels
  • Supports more than 20 different search optimization methods to speed up tuning
  • Successfully used in 10+ different eScience projects, across various disciplines

Kernel Tuner simplifies the development of efficient GPU programs, or kernels. It does so by making kernels written in C/C++, OpenCL, or CUDA accessible from Python, while taking care of the required synchronization between data kept in host memory and data kept in device memory.

This has a number of advantages. First, it simplifies auto-tuning of kernel parameters: Kernel Tuner ships with a variety of strategies for searching the parameter space efficiently, which can greatly improve the performance of the tuned kernels. Second, it allows for unit testing of GPU code from within Python.
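A minimal sketch of the two ideas above, written in plain Python so that it runs without a GPU: an exhaustive search over a small parameter space, in which every configuration is checked against a reference result before it is timed. The function `blocked_sum`, the parameter `block_size`, and the helper `tune` are hypothetical stand-ins for a real kernel, a tunable parameter, and a tuner; this is not Kernel Tuner's API.

```python
import itertools
import time


def blocked_sum(data, block_size):
    """Stand-in for a kernel: sum `data` in chunks of `block_size` elements."""
    total = 0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total


def tune(kernel, data, tune_params, reference, repeats=3):
    """Exhaustively try every configuration; verify it, then time it."""
    names = list(tune_params)
    results = []
    for values in itertools.product(*(tune_params[n] for n in names)):
        config = dict(zip(names, values))
        # Unit-test step: reject any configuration that gives a wrong answer.
        if kernel(data, **config) != reference:
            raise AssertionError(f"wrong result for {config}")
        # Benchmark step: keep the best of a few timed runs.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data, **config)
            best = min(best, time.perf_counter() - start)
        results.append({**config, "time": best})
    # Return the fastest configuration together with all measurements.
    return min(results, key=lambda r: r["time"]), results


data = list(range(10_000))
tune_params = {"block_size": [16, 64, 256, 1024]}  # hypothetical search space
best, results = tune(blocked_sum, data, tune_params, reference=sum(data))
print("fastest configuration:", best)
```

Exhaustive search is the simplest strategy and only scales to small spaces; a smarter search strategy would replace the `itertools.product` loop with one that evaluates a subset of the configurations.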

Kernel Tuner adds no dependencies to the kernel code and does not require extensive code changes. Kernels tuned by Kernel Tuner also need no changes after tuning to be production ready: tuned kernels can be used as-is from any host programming language.
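One way to see why no changes are needed after tuning: if the tunable parameters are ordinary preprocessor macros, the tuned values can be supplied at compile time by any build system. The sketch below assumes that convention. The kernel source, the parameter name `block_size_x`, the tuned value, and the helpers `with_defines` and `compiler_flags` are all hypothetical illustrations.

```python
def with_defines(kernel_source, config):
    """Prepend one #define per tuned parameter to otherwise unchanged source."""
    defines = "".join(f"#define {name} {value}\n" for name, value in config.items())
    return defines + kernel_source


def compiler_flags(config):
    """Equivalent -D flags, for builds that must not touch the source at all."""
    return [f"-D{name}={value}" for name, value in config.items()]


# Hypothetical CUDA kernel that reads its block size from a macro.
kernel_source = """__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) x[i] *= a;
}
"""

best_config = {"block_size_x": 128}  # assumed outcome of a tuning run
print(with_defines(kernel_source, best_config))
print(compiler_flags(best_config))
```

Either form leaves the kernel body itself untouched, so the same file can be compiled from C++, Python, or any other host language that can invoke the compiler.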

Participating organisations

ASTRON
CWI
Netherlands eScience Center

Mentions

Versioned documentation using only GitHub actions and GitHub pages

Author(s): Ben van Werkhoven
Published by Netherlands eScience Center in 2022

Kernel Tuner tutorial at Supercomputing 2021

Author(s): Ben van Werkhoven
Published in 2021

Writing Testable GPU Code

Author(s): Ben van Werkhoven
Published in 2018

Testimonials

With Kernel Tuner, we were able to accelerate our CUDA kernels by a factor of 10 in just a few weeks
Chiel van Heerwaarden, Wageningen University & Research

Contributors

Ben van Werkhoven (Lead developer, Netherlands eScience Center)
Alessio Sclocco
Felipe Zapata
Floris-Jan Willemsen
Inti Pelupessy
Jisk Attema
Johannes Hidding
Leon Oostrum
Nicolas Renaud
Patrick Bos
Stijn Heldens
Willem Jan Palenstijn (CWI)

Related projects

ConFu

Consolidating and Future-proofing Kernel Tuner by developing Software Engineering Best Practices

Running

RECRUIT

Reducing Energy Consumption in Radio-astronomical and Ultrasound Imaging Tools

Running

CORTEX

Self-learning machines hunt for explosions in the universe and speed up innovations in industry and...

Running

CHEOPS

Verified construction of correct and optimised parallel software

Running

ESiWACE2

For future exascale climate and weather predictions

Running

Retina COVID19

Real Time National Policy Adjustment and Evaluation on the Basis of a Computational Model for COVID19

Finished

A methodology and ecosystem for many-core programming

Boosting the performance of current and future programs

Finished

DIRAC

Distributed radio astronomical computing

Finished

Triple-A 2

Accelerating astronomical applications 2

Finished

Parallelisation of multi point-cloud registration

Studying subcellular structures and functions

Finished

3D Geospatial Data Exploration for Modern Risk Management Systems

The country below sea level

Finished

Real-time detection of neutrinos from the distant Universe

Observing processes that are inaccessible to optical telescopes

Finished

A Jungle Computing Approach to Large-Scale Online Forensic Analysis

Programming tools that simplify application development and deployment

Finished

Related tools

AMBER

A real-time pipeline to search for Fast Radio Bursts and other transient radio sources.

iScore

A framework and predictor based on support vector machine and random walk graph kernel for scoring protein-protein interfaces.

Lightning

Lightning: Fast data processing using GPUs on distributed platforms

PowerSensor3

PowerSensor is a low-cost, custom-built device that measures the instantaneous power consumption of GPUs and other devices at a high time resolution.