I am working on image processing algorithm using CUDA. In my algorithm i want to find sum of all pixels of image using CUDA kernel. so i made kernel method in cuda for measure sum of all pixels of 16 bit gray scale image, but i got wrong answer. So i make simple program in cuda for find sum of 1 to 100 numbers and my code is below. In my code i got not exact sum of that 1 to 100 numbers using GPU, but i got exact sum of that 1 to 100 numbers using CPU. So what i had done in that code ? I am working on image processing algorithm usin