Programming - CUDA API Introduction

<img src="https://upload.wikimedia.org/wikipedia/en/b/b9/Nvidia_CUDA_Logo.jpg">
<p>[<a href="https://en.wikipedia.org/wiki/File:Nvidia_CUDA_Logo.jpg">Image 1</a>]</p>
<h2>Introduction</h2>
<p>
Hey, it's a me again, <a href="https://peakd.com/@drifter1">drifter1</a>!
Today marks the start of a new <strong>Parallel Programming series</strong>.
After covering the basics of multi-threading and multi-process parallelization in <strong>Networking</strong>,
a little bit of MPI (Message Passing Interface) for Distributed Programming,
and the OpenMP (Open Multi-Processing) API for easier multi-threaded, shared-memory parallelism,
it's now time to get into more advanced topics.
By advanced I of course mean massively multi-threaded processing, which can be achieved by using GPUs, for example.
</p>
<p>
For GPU Computing, there are two major APIs out there:
<ul>
    <li>Nvidia's CUDA API, which only works with Nvidia Graphics Cards</li>
    <li>OpenCL, which works with any Graphics Card</li>
</ul>
Using APIs such as CUDA or OpenCL, it's possible to use GPUs for general-purpose parallel computing and programming!
</p>
<p>
Because the CUDA API is implemented specifically for Nvidia Graphics Cards, it's also much easier to begin with,
and thus this series will be about <strong>Nvidia's CUDA API</strong>!
</p>
<p>
Last, but not least, this series will be guided by <a href="https://docs.nvidia.com/cuda/index.html">Nvidia's Documentation on CUDA</a>, but also by my own knowledge and the skills that I gained from various projects.
</p>
<p>
So, without further ado, let's dive straight into it!
</p>
<hr>
<h2>GitHub Repository</h2>
The code of this series will be uploaded to a GitHub Repository, which is yet to be created!
<hr>
<h2>Requirements - Prerequisites</h2>
<ul>
<li>Knowledge of the Programming Language C, or even C++</li>
<li>Familiarity with Parallel Computing/Programming in general</li>
<li><a href="https://developer.nvidia.com/cuda-gpus">CUDA-Capable Nvidia GPU</a> (compute capability should not matter that much)</li>
<li><a href="https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html">CUDA Toolkit</a> installed</li>
</ul>
<hr>
<h2>Installation Guide</h2>
<p>
The Documentation of the API is fantastic, meaning that all possible installations should be covered.
</p>
<h3>Example for Pascal Architecture and Ubuntu OS</h3>
<p>
I personally have a <a href="https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-1080-ti/specifications">GeForce GTX 1080 Ti</a>, which is of the <strong>Pascal Architecture</strong>, and I'm using <strong>Ubuntu 20.04 LTS</strong> as my operating system.
</p>
<p>
To install the CUDA Toolkit on a GNU/Linux System like Ubuntu, there are basically two choices:
<ul>
    <li>Install from the Package Repository using the Package Manager (apt on Ubuntu)</li>
    <li>Manual Runfile Installation</li>
</ul>
Because Ubuntu's repository is mostly up-to-date, a manual runfile installation only makes sense if the latest features are a must.
Also note that a manual installation means manual updating!
</p>
<p>
So, after verifying that the GPU and Operating System are CUDA-Capable, as described in the <strong>Pre-Installation Actions</strong>, it's as simple as:
<ul>
    <li>Adding the CUDA repository meta-data (<code>sudo dpkg -i ...</code>)</li>
    <li>Installing the CUDA public GPG key (<code>sudo apt-key add ...</code> or <code>sudo apt-key adv ...</code>)</li>
    <li>Updating the Repository cache (<code>sudo apt-get update</code>)</li>
    <li>Installing CUDA (<code>sudo apt-get install cuda</code>)</li>
</ul>
</p>
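<p>
Putting those steps together, here is a rough sketch of the commands for Ubuntu 20.04 on x86_64. Note that the exact repository paths, pin file names and key URLs depend on the CUDA version, so always copy the up-to-date commands from the quick start guide linked above:
</p>
<pre><code># Register the repository pin file, so that apt prefers the CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Fetch and install the CUDA public GPG key
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub

# Add the CUDA repository, update the cache and install CUDA
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get install cuda

# Verify the installation
nvcc --version
nvidia-smi
</code></pre>
<p>
If <code>nvcc</code> is not found after installing, the CUDA binaries (usually under <code>/usr/local/cuda/bin</code>) may still have to be added to the <code>PATH</code>, as described in the Post-Installation Actions of the guide.
</p>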
<hr>
<h2>GPU Computing</h2>
<p>
So, why should you care? Why is general-purpose parallel computing using the GPU so popular?
</p>
<h3>Benefits of GPU Computing</h3>
<p>
<ul>
    <li>GPUs offer much higher instruction throughput and memory bandwidth than CPUs of the same price and power</li>
    <li>Lots of <a href="https://www.nvidia.com/en-us/gpu-accelerated-applications/">applications</a> run faster on the GPU than on the CPU</li>
    <li>FPGAs are more energy efficient, but offer less flexibility than GPUs</li>
</ul>
</p>
<h3>Why are GPUs so capable?</h3>
<p>
Well, it's simple: GPUs and CPUs are designed for different purposes:
<ul>
    <li>CPUs excel at executing sequences of operations quickly, with a few tens of threads running in parallel (high single-thread performance)</li>
    <li>GPUs excel at executing thousands of threads in parallel (with slower single-thread performance, but much higher total throughput)</li>
</ul>
<img src="https://i.ibb.co/jZ0W3Gs/cpu-gpu.jpg"><br>
[<a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html">Image 2</a>]<br><br>
GPUs are designed for highly parallel computation, devoting more transistors to data processing rather than to data caching and flow control.
Thus, instead of relying on large caches, GPUs hide memory access latency with computation.
</p>
<h3>Should applications only run on the GPU?</h3>
<p>
Most applications have to mix parallel and sequential parts, and so CPUs and GPUs are combined together in order to maximize the overall performance.
If the application benefits from a high degree of parallelism, then the massively parallel nature of the GPU will of course achieve higher performance than the CPU.
If the application is mostly sequential, then parallelism can even make things less efficient, which is of course also a problem in CPU multi-processing or multi-threading!
</p>
<hr>
<h2>CUDA API</h2>
<p>
So, after this brief Introduction to the world of GPU Computing, let's now head back to CUDA!
</p>
<p>
The Nvidia CUDA API is a general-purpose parallel computing platform and programming model that uses Nvidia GPUs in order to solve complex computational problems.
CUDA comes with a software environment that allows C/C++ to be used as a high-level programming language.
CUDA is also supported by other programming languages, APIs and directive-based approaches, which include, but are not limited to, Fortran, DirectCompute and OpenACC.
</p>
<h3>The Ease of Learning</h3>
<p>
CUDA has a low learning curve for programmers familiar with C/C++, as it's based on three key abstractions:
<ol>
    <li>Hierarchy of thread groups</li>
    <li>Shared Memories</li>
    <li>Barrier Synchronization</li>
</ol>
Those three elements are exposed as a minimal set of language extensions, making getting into CUDA quite easy!
</p>
<p>
Using these abstractions, CUDA provides fine-grained data and thread parallelism at its core.
Solving a problem using the GPU is as simple as partitioning the problem into sub-problems that can be solved independently in parallel by blocks of threads.
Each sub-problem is then split further into smaller pieces that can be solved cooperatively in parallel by all threads within a block, as the sketch below demonstrates.
</p>
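<p>
To give a first taste of these language extensions, here is a minimal sketch of a CUDA program that adds two vectors (the kernel name <code>vecAdd</code> and the sizes are just illustrative). The <code>__global__</code> qualifier marks code that runs on the GPU, the <code>&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;</code> syntax launches it, and the built-in index variables identify each thread; we will cover all of this properly next time:
</p>
<pre><code>#include &lt;stdio.h&gt;

// Kernel definition: __global__ marks a function that runs on the GPU
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Each thread computes exactly one element, based on its block and thread index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i &lt; n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    int n = 1024;

    // Unified (managed) memory is accessible from both the CPU and the GPU
    float *a, *b, *c;
    cudaMallocManaged(&amp;a, n * sizeof(float));
    cudaMallocManaged(&amp;b, n * sizeof(float));
    cudaMallocManaged(&amp;c, n * sizeof(float));
    for (int i = 0; i &lt; n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch the kernel: 4 independent blocks of 256 cooperating threads each
    vecAdd&lt;&lt;&lt;4, 256&gt;&gt;&gt;(a, b, c, n);

    // Wait for the GPU to finish before reading the results on the CPU
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // should print 3.000000

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
</code></pre>
<p>
Notice how the problem (adding 1024 numbers) is partitioned into 4 independent blocks of 256 threads each, which is exactly the decomposition described above.
</p>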
<h3>GPU Architecture</h3>
<p>
GPUs are built around an array of Streaming Multiprocessors (SMs).
</p>
<img src="https://docs.nvidia.com/cuda/cuda-c-programming-guide/graphics/automatic-scalability.png"><br><br>
[<a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html">Image 3</a>]
<p>
A multi-threaded program is partitioned into blocks of threads that execute independently from each other on the SMs, so a GPU with more multiprocessors automatically executes the program faster than a GPU with fewer multiprocessors.
Similarly, highly parallel programs that are split into more blocks, with more threads in each block, can take better advantage of this automatic scalability.
</p>
<p>
Nvidia GPUs contain a number of CUDA cores, which basically indicates how many instructions can be executed per cycle.
How many threads per block, and how many blocks in total, a program should use depends on the application.
CUDA also has some limits per block, dimension etc. that depend on the architecture and compute capability.
In the end, it's just trial-and-error with such parameters in order to get the best results.
There are of course some guidelines that should always be followed!
</p>
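<p>
These limits don't have to be memorized, as they can be queried at runtime. Here is a small sketch that uses the <code>cudaGetDeviceProperties</code> function of the CUDA Runtime API to print some of the relevant limits of the installed GPU (the file name used for compiling below is of course arbitrary):
</p>
<pre><code>#include &lt;stdio.h&gt;

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&amp;prop, 0);  // query the properties of device 0

    printf("Device name:           %s\n", prop.name);
    printf("Compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    return 0;
}
</code></pre>
<p>
Compiling with <code>nvcc query.cu -o query</code> and running the executable shows what the specific card supports.
</p>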
<p>
The thread and block hierarchy will be discussed in depth next time, when we will also write our first CUDA program!
</p>
<hr>
<h2>RESOURCES:</h2>
<h3>References</h3>
<ol>
    <li><a href="https://docs.nvidia.com/cuda/index.html">https://docs.nvidia.com/cuda/index.html</a></li>
</ol>
<h3>Images</h3>
<ol>
    <li><a href="https://en.wikipedia.org/wiki/File:Nvidia_CUDA_Logo.jpg">https://en.wikipedia.org/wiki/File:Nvidia_CUDA_Logo.jpg</a></li>
    <li><a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html</a></li>
    <li><a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html</a></li>
</ol>
<hr>
<h2>Previous articles about the CUDA API</h2>
No other articles yet!
<hr>
<h2>Final words | Next up</h2>
<p>And this is actually it for today's post!</p>
<p>Next time we will get into more details around the Thread Hierarchy in CUDA!</p>
See ya!
<p><img src="https://media.giphy.com/media/ybITzMzIyabIs/giphy.gif" width="500" height="333"/></p>
<p>Keep on drifting!</p>