I'm trying to get half-precision intrinsics working on CUDA. The half type and the __float2half() and __half2float() functions compile and work as expected. However, I get a compilation error whenever I reference __hmul, __hneg, or similar. The exact error is:
identifier "__hmul" is undefined
My code is as follows:
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>

    __global__ void foo(float in, float multiplier, float& out)
    {
        half in_half = __float2half(in);
        half multiplier_half = __float2half(multiplier);
        half out_half = __hmul(in_half, multiplier_half);
        out = __half2float(out_half);
    }
I've included what I believe are the right headers. Am I missing a header, or something else?
I'm using Visual Studio 2015, compiling against cudart_static.lib, and targeting sm_52 and sm_61 (GTX 970 and above).
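One thing that may be worth checking: as far as I can tell, the half arithmetic intrinsics such as __hmul are only provided for compute capability 5.3 and higher, so the sm_52 compilation pass might be the one that fails. Below is a minimal sketch under that assumption. The helper name mul_half is hypothetical (not part of any CUDA header), and the kernel takes a float* instead of a float& so the result can be written to device memory.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Hypothetical helper (not from the original code): multiplies two halves,
// using the native intrinsic only where the target architecture provides it.
__device__ half mul_half(half a, half b)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
    // Assumption: native half arithmetic exists on compute capability 5.3+.
    return __hmul(a, b);
#else
    // Fallback for older targets such as sm_52: convert, multiply in float,
    // and convert back.
    return __float2half(__half2float(a) * __half2float(b));
#endif
}

__global__ void foo(float in, float multiplier, float* out)
{
    half in_half = __float2half(in);
    half multiplier_half = __float2half(multiplier);
    // out is assumed to point to device memory allocated by the caller.
    *out = __half2float(mul_half(in_half, multiplier_half));
}
```

If this compiles for both sm_52 and sm_61, that would suggest the error comes from the target architecture rather than from a missing header.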