1) The first thing I tried: cudaDeviceSynchronize()
Some posts and blogs suggest calling cudaDeviceSynchronize() when printf() produces no output in CUDA [1][2]. Their point is that the host code* finishes before the printf() output from the device code** is delivered. Kernel launches are asynchronous and device-side printf() output is buffered, so calling cudaDeviceSynchronize() before the host code exits should make the host (CPU) wait for the device (GPU) and print the buffered messages.
* host code: the code that runs on the CPU
** device code: the code that runs on the GPU
However, this did not solve my problem.
Below is the code I used.
■ hello_world.cu
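#include <cstdio>
#include "cuda_runtime.h"

// Each thread prints one line from the device.
__global__ void kernel(void) {
    printf("Hello world\n");
}

int main(void) {
    // Launch 10 blocks of 1 thread each.
    kernel<<<10, 1>>>();
    // Wait for the kernel to finish so the device printf buffer is flushed.
    cudaDeviceSynchronize();
    return 0;
}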
- compile command: $ nvcc -o hello_world.out hello_world.cu
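When a kernel prints nothing even with cudaDeviceSynchronize(), checking the error codes returned by the runtime can help show why. The following is a minimal sketch, not the code from the original post: the check() helper is a hypothetical name introduced here for illustration, while cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are standard CUDA runtime calls. If the binary was not built for the GPU it runs on, I would expect the launch check to report an error instead of failing silently, although the exact message depends on the CUDA version.

```cuda
#include <cstdio>
#include <cstdlib>
#include "cuda_runtime.h"

// Hypothetical helper (not part of the original post):
// print a message and exit if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

__global__ void kernel(void) {
    printf("Hello world\n");
}

int main(void) {
    kernel<<<10, 1>>>();
    // Errors from the launch itself are reported here.
    check(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs are reported here;
    // synchronizing also flushes the device printf buffer.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    return 0;
}
```

It is compiled the same way as hello_world.cu. If both checks pass and the output still does not appear, the cause is probably somewhere other than the launch.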
2) The second thing I tried: -arch [3][4]
First, I checked the compute capability of each GPU. According to Wikipedia [5], the compute capabilities of the GPUs in the cluster are as follows:
• node1: GeForce GT 710 - compute capability 3.5
• node2: GeForce GTX 750 Ti - compute capability 5.0
Next, I checked which compute capabilities are supported by the CUDA toolkit installed on my cluster.
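Instead of looking the values up in a table, the compute capability can also be queried from the device itself. The sketch below is an illustration under my own assumptions, not part of the original procedure: arch_flag() is a hypothetical helper that formats the sm_xy name, while cudaGetDeviceCount() and cudaGetDeviceProperties() (with the major and minor fields of cudaDeviceProp) are standard CUDA runtime calls. It has to be run on each compute node, because it reports only the GPUs visible on the machine where it runs.

```cuda
#include <cstdio>
#include "cuda_runtime.h"

// Hypothetical helper: format "sm_<major><minor>" into buf, e.g. 3 and 5 -> "sm_35".
static void arch_flag(int major, int minor, char *buf, size_t n) {
    std::snprintf(buf, n, "sm_%d%d", major, minor);
}

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        char flag[16];
        arch_flag(prop.major, prop.minor, flag, sizeof flag);
        // Print the device name, its compute capability and the matching -arch value.
        std::printf("device %d: %s, compute capability %d.%d, -arch=%s\n",
                    dev, prop.name, prop.major, prop.minor, flag);
    }
    return 0;
}
```

Host-only queries like these do not launch a kernel, so they should work even when the binary was built for a different architecture than the GPU.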
cmt323@master:chapter3$ nvcc --list-gpu-code
sm_35
sm_37
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
Each sm_xy entry above means that this CUDA toolkit can generate code for GPUs with compute capability x.y. For instance, the list shows that the installed CUDA (version 11.1) supports my GeForce GT 710 (sm_35 → compute capability 3.5). When compiling CUDA code, we should specify an architecture that matches the target GPU; otherwise nvcc builds for its default architecture, which may be newer than the GPU and therefore unable to run on it.
So I compiled hello_world.cu with -arch=sm_35 for node1.
cmt323@master:chapter3$ nvcc -o hello_world.out -arch=sm_35 hello_world.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
With this option, the program printed the expected output.
cmt323@master:chapter3$ ./hello_world.out
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
References
[1] stackoverflow.com/questions/13320321/printf-in-my-cuda-kernel-doesnt-result-produce-any-output
[2] stackoverflow.com/questions/58531349/cuda-kernel-printf-produces-no-output-in-terminal-works-in-profiler
[3] arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
[4] stackoverflow.com/questions/21915619/why-printf-is-not-working-in-cuda
[5] en.wikipedia.org/wiki/CUDA