
[CUDA] printf not working

by UltraLowTemp-Physics 2021. 1. 5.

1) The first way I tried to resolve my problem: cudaDeviceSynchronize()

Some posts and blogs suggest calling cudaDeviceSynchronize() when printf() inside a CUDA kernel prints nothing [1][2]. Their point is that the host code* can finish before the printf() output from the device code** has been delivered. A kernel launch is asynchronous, so if the host calls cudaDeviceSynchronize() before it exits, it waits for the kernel to finish, and the buffered printf() output from the device (GPU) is printed by the host (CPU).

* host code: the code running on the CPU
** device code: the code running on the GPU

However, this did not solve my problem.

Below is the code I used.

■ hello_world.cu

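#include <cstdio>
#include "cuda_runtime.h"

// Each launched thread prints one line.
__global__ void kernel(void) {
    printf("Hello world\n");
}

int main(void) {
    kernel<<<10, 1>>>();      // launch 10 blocks of 1 thread each
    cudaDeviceSynchronize();  // wait for the kernel and flush the device printf buffer
    return 0;
}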

- compile command: $ nvcc -o hello_world.out hello_world.cu
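
One way to see why nothing is printed is to check the error codes returned by the CUDA runtime. The following is only a minimal sketch, not code from the posts cited above: cudaOk is a hypothetical helper name introduced here, while cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are CUDA runtime calls. If the kernel cannot be launched, the program should report the reason on stderr instead of exiting silently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original post): print a message when
// `err` is not cudaSuccess. Returns true when the call succeeded.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void kernel(void) {
    printf("Hello world\n");
}

int main(void) {
    kernel<<<10, 1>>>();

    // Errors that prevent the launch itself are reported by cudaGetLastError().
    if (!cudaOk(cudaGetLastError(), "kernel launch")) return 1;

    // Errors raised while the kernel runs are returned by the synchronize call,
    // which also waits for the kernel so its printf output can be shown.
    if (!cudaOk(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    return 0;
}
```

It can be compiled with the same nvcc command as hello_world.cu.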

 

2) The second way that I tried: -arch [3,4]

First, I checked the compute capability of my GPUs. According to Wikipedia [5], the compute capabilities of the GPUs in the cluster are as follows:

• node1: GeForce GT 710 - compute capability 3.5
• node2: GeForce GTX 750 Ti - compute capability 5.0
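
The compute capability can also be read from the GPU itself instead of a table. The sketch below assumes it is run on the node whose GPU you want to inspect; smNumber is a hypothetical helper introduced here, while cudaGetDeviceCount() and cudaGetDeviceProperties() are CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: combine (major, minor) into the number used in sm_XY.
static int smNumber(int major, int minor) {
    return major * 10 + minor;
}

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the name and compute capability of every visible GPU.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }
        std::printf("device %d: %s, compute capability %d.%d (sm_%d)\n",
                    dev, prop.name, prop.major, prop.minor,
                    smNumber(prop.major, prop.minor));
    }
    return 0;
}
```

The sm_ number it prints is the value that would be passed to nvcc with -arch.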

Next, I checked which compute capabilities are supported by the CUDA version installed on my cluster.

cmt323@master:chapter3$ nvcc --list-gpu-code 
sm_35 
sm_37 
sm_50 
sm_52 
sm_53 
sm_60 
sm_61 
sm_62 
sm_70 
sm_72 
sm_75 
sm_80 
sm_86

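Each sm_xy in the list above means that this CUDA toolkit can generate code for GPUs with compute capability x.y. For instance, according to the list, the installed CUDA (version 11.1) supports my GeForce GT 710 (sm_35 → compute capability 3.5). When compiling CUDA code, the target architecture should be specified so that it matches the GPU. Otherwise nvcc builds for its default architecture (sm_52 for CUDA 11.x, as far as I know), which can be newer than the GPU; in that case the kernel fails to launch and printf() prints nothing.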

So, I compiled hello_world.cu with -arch=sm_35 for node1.

cmt323@master:chapter3$ nvcc -o hello_world.out -arch=sm_35 hello_world.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

 

After that, the program printed the expected output.

cmt323@master:chapter3$ ./hello_world.out
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
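
To confirm which architecture a kernel was actually built for, the runtime can be asked directly. This is a rough sketch under a few assumptions: it inspects device 0 only, archName is a hypothetical helper introduced here, and the kernel is the same one as in hello_world.cu. cudaFuncGetAttributes() is a CUDA runtime call; its binaryVersion field holds major * 10 + minor of the compiled code.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

__global__ void kernel(void) {
    printf("Hello world\n");
}

// Hypothetical helper: format a version number such as 35 as "sm_35".
static std::string archName(int version) {
    return "sm_" + std::to_string(version);
}

int main(void) {
    // Assumption: the GPU of interest is device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime about the compiled kernel; this returns an error
    // if the runtime cannot provide the kernel for the current device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device : %s (%s)\n", prop.name,
                archName(prop.major * 10 + prop.minor).c_str());
    std::printf("kernel : built for %s\n", archName(attr.binaryVersion).c_str());
    return 0;
}
```

Compiling this once with and once without -arch=sm_35 should make the difference between the two builds visible on node1.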

Reference

[1] stackoverflow.com/questions/13320321/printf-in-my-cuda-kernel-doesnt-result-produce-any-output  
[2] stackoverflow.com/questions/58531349/cuda-kernel-printf-produces-no-output-in-terminal-works-in-profiler  
[3] arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/  
[4] stackoverflow.com/questions/21915619/why-printf-is-not-working-in-cuda  
[5] en.wikipedia.org/wiki/CUDA
