Kernel Float

Header-only C++ library for CUDA that simplifies working with vector types and reduced-precision floating-point arithmetic in GPU code.

2 contributors · 193 commits · Last commit ≈ 5 days ago · 11 stars · 2 forks

Description

CUDA natively offers several reduced-precision floating-point types (__half, __nv_bfloat16, __nv_fp8_e4m3, __nv_fp8_e5m2) and vector types (e.g., __half2, __nv_fp8x4_e4m3, float3). However, working with these types is cumbersome: mathematical operations require intrinsics (e.g., __hadd2 performs addition for __half2), type conversion is awkward (e.g., __nv_cvt_halfraw2_to_fp8x2 converts float16 to float8), and some functionality is missing (e.g., one cannot convert a __half to __nv_bfloat16).

Kernel Float resolves this by offering a single data type, kernel_float::vec<T, N>, that stores N elements of type T. Internally, the data is stored as a fixed-size array of elements. Operators (like +, *, &&) are overloaded so that the best-suited intrinsic for the available types is selected automatically. Many mathematical functions (like log, exp, sin) and common operations (such as sum, range, for_each) are also available.
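
The idea of a fixed-size vector type with overloaded operators can be sketched in plain host C++. This is only an illustration and not Kernel Float's implementation: the namespace `sketch`, its `vec`, `sum`, and the helper `dot3_example` are hypothetical stand-ins, and the operators use simple loops where a GPU library could dispatch to intrinsics such as __hadd2.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace, NOT kernel_float

// N elements of type T stored as a fixed-size array.
template <typename T, std::size_t N>
struct vec {
    T data[N];

    T& operator[](std::size_t i) {
        assert(i < N);  // debug-only bounds check
        return data[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < N);
        return data[i];
    }
};

// Element-wise addition. A GPU implementation could specialise this per
// type (for example, pairs of __half values); here it is a plain loop.
template <typename T, std::size_t N>
vec<T, N> operator+(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i] + b[i];
    return out;
}

// Element-wise multiplication, same pattern as addition.
template <typename T, std::size_t N>
vec<T, N> operator*(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i] * b[i];
    return out;
}

// Horizontal reduction: adds up all N elements.
template <typename T, std::size_t N>
T sum(const vec<T, N>& v) {
    T total = T(0);
    for (std::size_t i = 0; i < N; ++i) total += v[i];
    return total;
}

}  // namespace sketch

// Usage: a dot product written with ordinary operators.
inline float dot3_example() {
    sketch::vec<float, 3> a{{1.0f, 2.0f, 3.0f}};
    sketch::vec<float, 3> b{{4.0f, 5.0f, 6.0f}};
    return sketch::sum(a * b);  // 1*4 + 2*5 + 3*6
}
```

With such a type, user code reads like ordinary arithmetic, and the choice of how each operation is carried out stays inside the operator definitions.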

By using this library, developers can avoid the complexity of working with reduced-precision floating-point types in CUDA and focus on their applications.

Programming languages
  • C++ 89%
  • Cuda 9%
  • C 1%
  • Python 1%

Participating organisations

Netherlands eScience Center

Related projects

COMPAS

A Computational Answer to the Soaring MRI demand

Updated 21 hours ago
In progress

ESiWACE3

Centre of Excellence in Simulation of Weather and Climate in Europe

Updated 27 months ago
In progress

CORTEX

Self-learning machines hunt for explosions in the universe and speed up innovations in industry and...

Updated 33 months ago
In progress

Related software

Kernel Tuner

Kernel Tuner greatly simplifies the development of highly-optimized and auto-tuned CUDA, OpenCL, and C code, supporting many advanced use-cases and optimization strategies that speed up the auto-tuning process.

Updated 23 months ago