
The program I am writing (Accelerator.cu) will not compile under NVCC 8.0.61 with nvcc -std=c++11 -o accelerator accelerator.cu. Other answers explain why __device__, __global__ and __shared__ fail in certain contexts, but none of them reveal the cause of this error in my own code. I am attempting to follow the guide https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-global-scope. When I compile the following code:

#include <cuda_runtime_api.h>
#include <cuda.h>
#include <vector>
using namespace std;

// CUDA Acceleration Adapter.
class Accelerator {
public:
    __device__ __managed__  float**  A;
    __device__ __managed__  float*  B;
    __device__ __managed__  int  N;
    __device__ __managed__  int  C;

    Accelerator () {}

    Accelerator (int N, int C) {
        // initialize variables (unified memory).
        this->N = N;

        // are "inputs"
        this->C = C;
    }

    void setData (vector<vector<float>>& A, vector<float>& B) {
        // convert vectors to arrays that the GPU can address.
    }

    void accelerate (vector<float>& results) {
        // run kernel.
        // convert results back to vector.
    }

    __global__ void evaluate (float **A, float *B, int N, int C) {
        // do stuff.
    }

};

int main () {
    Accelerator testAcc();
}

However, I receive the following errors for A:

accelerator.cu(8): error: attribute "device" does not apply here
accelerator.cu(8): error: attribute "managed" does not apply here

and similar errors for the remaining 3 member variables.

This is the first time I have attempted writing my own GPU-accelerated program. If someone knows what is going wrong, some help would be greatly appreciated.

Jake Long
1 Answer


You'll probably run into a number of issues. I'll focus mainly on something that attempts to get what you've shown to compile, without delving into every aspect of CUDA programming that you might touch on here.

The issue you cite (e.g. usage of __device__ on a class member variable) is explicitly forbidden.

The usage of __managed__ is similarly disallowed (since it is implicitly a __device__ scoped static allocation). In a scenario like this, you should use ordinary class member variables, and dynamically allocate for them as needed, perhaps in the constructor, perhaps using a dynamic managed allocator (cudaMallocManaged). The use of pointer variables as class member variables certainly suggests this approach.
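In sketch form, that allocate-in-the-constructor pattern might look like the following (illustrative only — the Buffers class name is made up, and real code should check the cudaError_t returned by each runtime call):

```
// Sketch: plain (undecorated) pointer members backed by managed allocations.
class Buffers {
public:
    float *A = nullptr;
    int N = 0;
    Buffers(int n) : N(n) {
        // cudaMallocManaged returns cudaError_t; check it in production code.
        cudaMallocManaged(&A, N * sizeof(float));
    }
    ~Buffers() { cudaFree(A); }  // release the managed allocation
};
```

Note that the in-class member initializers (= nullptr) require -std=c++11, which your compile command already passes.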

There are possibly other challenges with what you have outlined. For example, a __global__ function may not be a class member function.

You probably have some considerable learning to do in CUDA programming, but here is nevertheless a hacked up version of what you have shown which is not stepping on any obvious issues:

#include <vector>
using namespace std;
// CUDA Acceleration Adapter.
__global__ void evaluate (float *a, float *b, int n, int c) {
    for (int i = 0; i < n; i++) a[i]++;
    for (int i = 0; i < c; i++) b[i]++;
}

class Accelerator {
public:
    float*  A;
    float*  B;
    int  N;
    int  C;

    Accelerator () {}

    Accelerator (int N, int C) {
        // initialize variables (unified memory).
        this->N = N;

        // are "inputs"
        this->C = C;
        cudaMallocManaged(&A, N*sizeof(float));
        cudaMallocManaged(&B, C*sizeof(float));
    }

    void setData (vector<float>& A, vector<float>& B) {
        for (int i=0; i < N; i++) (this->A)[i] = A[i];
        for (int i=0; i < C; i++) (this->B)[i] = B[i];
    }

    void accelerate (vector<float>& results) {
        evaluate<<<1,1>>>(A, B, N, C);
        cudaDeviceSynchronize();
        for (int i = 0; i<N; i++) results[i] = A[i];
    }

};
int  main () {
    Accelerator testAcc(5,3);
}
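If you want to actually exercise the class, main could look something like this (hypothetical sizes; note that accelerate indexes results directly, so the vector must already be sized to N):

```
int main () {
    Accelerator testAcc(5, 3);
    vector<float> a(5, 1.0f), b(3, 2.0f), results(5);
    testAcc.setData(a, b);
    testAcc.accelerate(results);  // the kernel increments every element of A
    return 0;
}
```

In real code you would also want to check the return values of the kernel launch (via cudaGetLastError) and of cudaDeviceSynchronize.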
Robert Crovella