I'm trying to get half-precision intrinsics working in CUDA. The half type and the __float2half() and __half2float() functions compile and work as expected. However, I get a compilation error whenever I reference __hmul, __hneg or similar intrinsics. The exact error is:

identifier "__hmul" is undefined

My code is as follows:

#include <cuda_runtime.h>
#include <cuda_fp16.h>

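// The result is written through a device pointer: a kernel cannot
// return a value through a reference to host memory.
__global__ void foo(float in, float multiplier, float* out)
{
    half in_half = __float2half(in);
    half multiplier_half = __float2half(multiplier);
    half out_half = __hmul(in_half, multiplier_half);

    *out = __half2float(out_half);
}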

I've included what I believe are the right headers. Am I missing a header, or something else?

I'm using Visual Studio 2015, compiling against cudart_static.lib, and targeting sm_52 and sm_61 (GTX 970 and above).

Chris Nolet

1 Answer

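Half-precision arithmetic intrinsics such as __hmul and __hneg are not supported on compute capability 5.2 (sm_52) and below, so the compiler reports them as undefined when any target architecture is lower than sm_53. The conversion functions __float2half() and __half2float() are available on earlier architectures, which is why they compiled. Every target needs to be at least sm_53.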

In Visual Studio, go to your project properties, open the CUDA C/C++ group and select the Device tab, then update the Code Generation setting so that every listed target architecture is sm_53 or later (for example, compute_61,sm_61).
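If dropping sm_52 is not an option, one possible alternative is to guard the intrinsic with the __CUDA_ARCH__ macro and fall back to float arithmetic on older targets. The following is only a minimal sketch under that assumption, not something from the original question: mul_half and multiply_on_device are hypothetical helper names, the kernel writes its result through a device pointer, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Example build line (an assumption, adjust the targets to your project):
//   nvcc -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 example.cu

// Hypothetical helper: multiplies two half values, using the native
// intrinsic only when the code is being compiled for sm_53 or later.
__device__ half mul_half(half a, half b)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
    return __hmul(a, b);  // native half-precision multiply
#else
    // Fallback for older targets: convert to float, multiply, convert back.
    return __float2half(__half2float(a) * __half2float(b));
#endif
}

__global__ void foo(float in, float multiplier, float* out)
{
    half result = mul_half(__float2half(in), __float2half(multiplier));
    *out = __half2float(result);
}

// Hypothetical host wrapper: launches the kernel with a single thread
// and copies the single result back to the host.
float multiply_on_device(float in, float multiplier)
{
    float* d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    foo<<<1, 1>>>(in, multiplier, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With a guard like this, the sm_52 compilation pass never sees __hmul, while sm_53 and later targets still use the native instruction. The fallback path rounds the float product back to half, so it has no speed advantage over plain float code on the older architecture.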

Chris Nolet
  • Please remember to come back and accept this answer so that the question falls off the unanswered queue for the CUDA tag. – talonmies May 29 '17 at 14:34
  • Done, thanks! Sorry – I was originally going to delete the question since the answer was trivial, but decided to self-answer and keep it up in the end. Thanks for pointing me in the right direction with your earlier comments, talonmies and @RobertCrovella. – Chris Nolet May 31 '17 at 18:32