A Streaming Multiprocessor (SM) is a fundamental component of NVIDIA GPUs. It consists of multiple Streaming Processors (CUDA cores) that execute instructions in parallel.
These are general-purpose processors with a low clock rate target and a small cache.
An SM executes several thread blocks in parallel. As soon as one of its thread blocks has completed execution, it takes up the next thread block in sequence.
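To make the "finish one block, pick up the next" behaviour concrete, here is a toy simulation. It is only a sketch under my own assumptions: the function name `simulate_block_dispatch`, the block durations, and the tie-breaking rule (lowest SM index first) are all hypothetical, and the real hardware block scheduler is not documented at this level of detail.

```python
import heapq

def simulate_block_dispatch(block_durations, num_sms, slots_per_sm):
    """Return (sm_index, start_time, end_time) for each block, in launch order.

    Toy model: every SM has `slots_per_sm` block slots. Whenever a slot
    becomes free, the next block in launch order is placed in it.
    """
    # Min-heap of (time the slot becomes free, SM index); all slots are free at t=0.
    free_slots = [(0, sm) for sm in range(num_sms) for _ in range(slots_per_sm)]
    heapq.heapify(free_slots)

    schedule = []
    for duration in block_durations:
        start, sm = heapq.heappop(free_slots)  # earliest-available slot
        end = start + duration
        schedule.append((sm, start, end))
        heapq.heappush(free_slots, (end, sm))  # slot is reusable once the block finishes
    return schedule

# Hypothetical workload: 6 blocks on 2 SMs with 2 block slots each.
schedule = simulate_block_dispatch([4, 1, 3, 2, 2, 1], num_sms=2, slots_per_sm=2)
for block_id, (sm, start, end) in enumerate(schedule):
    print(f"block {block_id}: SM {sm}, t={start}..{end}")
```

In this run the first four blocks fill all four slots at t=0, and block 4 only starts at t=1, when block 1 (the shortest one) finishes and frees its slot on SM 0.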
From Stephen Jones, I learned that each SM can manage 64 warps, for a total of 2048 threads. However, it actually processes only 4 warps at a time.
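The arithmetic behind these numbers can be checked in a few lines. This sketch assumes a warp size of 32 threads, which the notes above do not state explicitly but which is what makes 64 warps come out to 2048 threads. The helper names are mine.

```python
WARP_SIZE = 32  # threads per warp (assumption: not stated in the notes above)

def resident_threads(max_warps_per_sm, warp_size=WARP_SIZE):
    """Total threads an SM can manage, given its resident warp limit."""
    return max_warps_per_sm * warp_size

def resident_warps_per_active_warp(max_warps_per_sm, active_warps):
    """How many resident warps exist for each warp being processed at once."""
    return max_warps_per_sm // active_warps

print(resident_threads(64))                   # 64 * 32 = 2048
print(resident_warps_per_active_warp(64, 4))  # 64 // 4 = 16
```

So for every warp being processed at a given moment, up to 16 warps are resident on the SM, which gives it a pool of other warps to switch to.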
To achieve this purpose, an SM contains the following:
How many thread blocks can reside on an SM at the same time?
An SM may hold up to 8 thread blocks at once (the exact limit depends on the GPU architecture).
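Combining the 8-block limit with the 2048-thread limit from earlier shows how block size affects how full an SM can get. This is a rough sketch with hypothetical helper names. It deliberately ignores register and shared-memory limits, which can also reduce the number of resident blocks in practice.

```python
def resident_blocks(threads_per_block, max_blocks_per_sm=8, max_threads_per_sm=2048):
    """Blocks that fit on one SM, considering only the block and thread limits."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return min(max_blocks_per_sm, max_threads_per_sm // threads_per_block)

def thread_occupancy(threads_per_block, max_blocks_per_sm=8, max_threads_per_sm=2048):
    """Fraction of the SM's thread capacity that is filled at this block size."""
    blocks = resident_blocks(threads_per_block, max_blocks_per_sm, max_threads_per_sm)
    return blocks * threads_per_block / max_threads_per_sm

for size in (64, 256, 1024):
    print(size, resident_blocks(size), thread_occupancy(size))
```

With 64-thread blocks the 8-block limit is hit first, so only 512 of the 2048 thread slots are used (25%). With 256-thread blocks both limits are reached together, and with 1024-thread blocks the thread limit caps the SM at 2 blocks.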
In general, SMs support instruction-level parallelism but not branch prediction.
Each GPU architecture consists of several SMs.