There are as many as three types of specialized processor cores on a modern Nvidia graphics card. CUDA cores, which are the mainline programmable graphics cores. Next we have ray tracing cores, which are built to rapidly calculate the effects of light rays bouncing around a scene in real time. Offering the most photo-realistic graphics ever seen on real time graphics. Finally, there are the tensor cores. These don’t get that much attention from mainstream media and users, but they are actually an essential component to the entire card’s value proposition. Before we can understand why tensor cores are valuable, we have to talk about what a “tensor” is in the first place.
A tensor is a mathematical description of a group of values that are related to each other in some way. For example, a point in 3D space is described using three numbers, one for each axis. If you move the point in 3D space, all three numbers change in a fixed relationship to each other. Technically single numbers (scalars), vectors and matrices of numbers are actually just special case tensors. However, when people refer to tensors it’s usually for more complex sets of related numbers. For example, hydrostatic pressure, shear force and material stress are all easily expressed as tensors.
So far so good, but when you start to do calculations involving tensors, things get complicated quickly. When you have to multiply two tensors, there are numerous smaller calculations that have to happen. Even worse, while a CPU crunches away at those small calculations, many of the intermediate results have to be stored within the registers of the CPU. In other words, doing a relatively small number of mathematical operations on complex tensors can quickly clog up the inner workings of a CPU core!
While any CPU which can do all basic mathematical operations has the ability to process tensors, they don’t all do them at the same speed or efficiency. This is a problem that’s had various acceleration solutions over the years. You might remember, for example, the MMX extensions added to early Pentium CPUs.
MMX was mainly used to accelerate operations that are common in multimedia. For example, a video is a grid of pixels. Each pixel has a color value and a position in the grid. Now imagine you wanted to do something like adding a color-correction filter to the footage, or you want to scale it up or otherwise make a change that affects the value of every single pixel in the grid. You have one operation, but you have to apply it to millions of values. If you did them in a serial fashion, one after the other, it would take forever. MMX allowed the CPU to apply an operation to the entire matrix of values at once. MMX was later complemented by SSE and AVX. All instruction sets that accelerate the processing of matrices of data.
Which brings us to Tensor cores. Instead of a general-purpose CPU with special instructions to increase the speed of calculation for matrices of numbers, you have an entire CPU core that only does these types of calculation. That seems like a niche use case, but actually these types of processors have exploded in popularity for data centers and workstations. Neural nets and other related machine learning methods that are used to build algorithms need exactly this type of math muscle to work.
Tensor Cores in Day-to-Day Use
Outside of data centers, tensor cores can really supercharge individual workstations that need to do physical simulations. Remember we said that things like material stresses and shear forces are commonly expressed as tensors. Which is why tensor cores can speed up things like fluid and gas simulations, virtual car crash testing or any type of physical force that can be expressed as a tensor.
Consumer cards that come with tensor cores, essentially the Nvidia RTX line of cards, aren’t just carrying dead weight either. Nvidia has come up with DLSS (Deep Learning Super Sampling) which, as of DLSS 2.0, now uses those tensor cores to upscale lower-resolution images to higher resolution ones with stunning results. The end result is that the GPU doesn’t work as hard creating the low-res image and the tensor cores give it a sharp lick of paint before sending it to the monitor.
The relatively low-performance real-time ray tracing cores in RTX cards also put out a rather grainy, noisy picture. The tensor cores can rapidly denoise the image, to give you a pristine picture as well as a high-res one.
Other applications are sure to make good use of the tensor cores even for regular users. RTX Voice, for example, can remove nearly any background noise from a live audio feed. Non-RTX cards can also do this, but much, much less efficiently.
Should You Care?
We’d say yes. Even if you can’t think of any immediate use for tensor cores in your own use cases, it’s clear that Nvidia is in it for the long haul. RT and tensor cores are going nowhere. This means that software developers can begin making software to specifically take advantage of them. While tensor cores may seem a little niche now, in a year or two it wouldn’t surprise us if killer apps for the technology begin to appear.