PMPP Chapter 3: Scalable Parallel Execution
CUDA Thread Organization
All CUDA threads in a grid execute the same kernel function; they rely on coordinates to distinguish
PMPP Chapter 2: Data Parallel Computing
Data Parallelism
When modern software applications run slowly, the problem is usually that there is too much data to process.
* Image
PMPP Chapter 1: Introduction
Traditionally (before 2003), microprocessors were based on a single central processing unit (CPU), but due to energy consumption and heat dissipation
Programming Massively Parallel Processors
Programming Massively Parallel Processors is an excellent book on GPU programming. Here is a collection of my notes and related
Running NVIDIA and AMD GPUs in a Single K3S Cluster
Modern AI and HPC workloads often require different types of accelerators depending on the specific use case. Running a heterogeneous