简单的JCuda使用示例，用于在GPU上计算两个向量的和

在上面的示例中，我们使用JCuda计算了两个向量的和。首先，在CPU上分配了两个向量和一个结果向量的内存，并将其初始化为相同的值。然后，我们使用JCuda将这些向量的数据复制到GPU上，并使用CUDA的Kernel函数计算向量的和。最后，我们使用JCuda将结果数据从GPU复制到CPU，并验证结果是否正确。此外，我们还使用JCudaDriver初始化CUDA和加载和运行CUDA的Kernel函数。

baeiou

1026人浏览 · 2023-04-27 09:10:56

baeiou · 2023-04-27 09:10:56 发布

import jcuda.*;
import jcuda.runtime.*;

public class VectorAddition {
    public static void main(String[] args) {
        // Set the size of the vectors
        int N = 1000000;

        // Allocate the memory on the CPU
        float hostInputA[] = new float[N];
        float hostInputB[] = new float[N];
        float hostOutput[] = new float[N];

        // Initialize the input vectors
        for (int i = 0; i < N; i++) {
            hostInputA[i] = i;
            hostInputB[i] = i;
        }

        // Allocate the memory on the GPU
        Pointer deviceInputA = new Pointer();
        Pointer deviceInputB = new Pointer();
        Pointer deviceOutput = new Pointer();
        JCuda.cudaMalloc(deviceInputA, N * Sizeof.FLOAT);
        JCuda.cudaMalloc(deviceInputB, N * Sizeof.FLOAT);
        JCuda.cudaMalloc(deviceOutput, N * Sizeof.FLOAT);

        // Copy the input vectors from the host to the GPU
        JCuda.cudaMemcpy(deviceInputA, Pointer.to(hostInputA), N * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyHostToDevice);
        JCuda.cudaMemcpy(deviceInputB, Pointer.to(hostInputB), N * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyHostToDevice);

        // Perform the vector addition on the GPU
        int blockSize = 256;
        int gridSize = (N + blockSize - 1) / blockSize;
        JCudaDriver.cuInit(0);
        CUdevice device = new CUdevice();
        JCudaDriver.cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        JCudaDriver.cuCtxCreate(context, 0, device);
        CUmodule module = new CUmodule();
        JCudaDriver.cuModuleLoad(module, "vectorAdd.ptx");
        CUfunction function = new CUfunction();
        JCudaDriver.cuModuleGetFunction(function, module, "vectorAdd");
        Pointer kernelParameters = Pointer.to(Pointer.to(deviceInputA), Pointer.to(deviceInputB), Pointer.to(deviceOutput), Pointer.to(new int[]{N}));
        JCudaDriver.cuLaunchKernel(function, gridSize, 1, 1, blockSize, 1, 1, 0, null, kernelParameters, null);

        // Copy the result from the GPU to the host
        JCuda.cudaMemcpy(Pointer.to(hostOutput), deviceOutput, N * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyDeviceToHost);

        // Verify the result
        for (int i = 0; i < N; i++) {
            if (Math.abs(hostOutput[i] - 2 * i) > 1e-5) {
                System.out.println("Result verification failed at element " + i);
                System.exit(-1);
            }
        }

        // Clean up
        JCuda.cudaFree(deviceInputA);
        JCuda.cudaFree(deviceInputB);
        JCuda.cudaFree(deviceOutput);
        JCudaDriver.cuModuleUnload(module);
        JCudaDriver.cuCtxDestroy(context);
    }
}

DAMO开发者矩阵

DAMO开发者矩阵，由阿里巴巴达摩院和中国互联网协会联合发起，致力于探讨最前沿的技术趋势与应用成果，搭建高质量的交流与分享平台，推动技术创新与产业应用链接，围绕“人工智能与新型计算”构建开放共享的开发者生态。

更多推荐

2026-01-29：统计镜子反射路径数目。用go语言，给定一个大小为 m × n 的二值网格 grid（0 表示空格，1 表示镜子）。机器人从左上角 (0,0) 出发，目标是到达右下角 (m−1,n

2026-01-29：统计镜子反射路径数目。用go语言，给定一个大小为 m × n 的二值网格 grid（0 表示空格，1 表示镜子）。机器人从左上角 (0,0) 出发，目标是到达右下角 (m−1,n−1)。机器人每步只能向右或向下移动，但如果准备进入的格子里有镜子，它不会直接进入，而是在进入前被“反射”改换方向并跳到镜子相应一格之外的位置：若机器人想向右进入一个镜子格子，它会被转向向下并移动到该