Nvidia Drops the Volta Bomb

Over at GTC today, Nvidia announced the Tesla V100 processor, a GPU based on the new Volta architecture and fitted with dedicated Tensor Cores. The Tesla V100 will be manufactured on TSMC's 12nm FinFET process, pushing the limits of photolithography as this GPU is huge.

The Tesla V100 graphics processor has 5,120 CUDA / shader cores built from an incredible 21 billion transistors. It offers what Nvidia calls 120 Tensor TFLOPS of deep-learning performance, delivered by a new type of unit called the Tensor Core; for traditional gaming-style FP32 workloads it would perform in the 15 TFLOPS region. The R&D behind the chip took years and roughly $3 billion in investment, CEO Jen-Hsun Huang stated in his keynote.
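As a quick back-of-the-envelope check (our own arithmetic, not Nvidia's), both headline numbers follow from the core counts and the roughly 1.45 GHz boost clock, counting one fused multiply-add as two floating-point operations:

```cpp
// Back-of-envelope peak-throughput check for Tesla V100, using the figures
// quoted in this article (host-side C++; one FMA = 2 floating-point ops).
#include <cstdio>

int main() {
    const double cuda_cores    = 5120;   // FP32 shader cores
    const double boost_ghz     = 1.455;  // boost clock in GHz
    const double flops_per_clk = 2.0;    // one FMA per core per clock

    const double fp32_tflops = cuda_cores * boost_ghz * flops_per_clk / 1000.0;
    printf("Peak FP32: %.1f TFLOPS\n", fp32_tflops);     // ~14.9 -> the "15 TFLOPS" figure

    // 640 Tensor Cores, each performing a 4x4x4 matrix multiply-accumulate
    // per clock = 64 FMAs = 128 floating-point ops.
    const double tensor_cores  = 640;
    const double tensor_tflops = tensor_cores * 128.0 * boost_ghz / 1000.0;
    printf("Peak Tensor: %.0f TFLOPS\n", tensor_tflops); // ~119 -> the "120 TFLOPS" figure
    return 0;
}
```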

However, this GPU is not going into the GTX line; rather, it is destined for the server and deep-learning Tesla lineup. The new Tensor Core operates on 4×4 matrices and is fully optimized for deep learning; Nvidia stated that Pascal is fast, but not fast enough. The GPU is huge: at 815 mm² it would roughly fill the palm of your hand.
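For context, here is a minimal sketch of how Tensor Cores are exposed through CUDA's warp-level WMMA API (nvcuda::wmma, introduced with CUDA 9 for sm_70). The hardware works on 4×4 tiles internally, while the API presents 16×16×16 tiles that a warp processes cooperatively:

```cpp
// Minimal sketch: one warp computes D = A*B + C on a 16x16x16 tile via the
// CUDA WMMA API, which maps onto the Volta Tensor Cores. Inputs are FP16,
// the accumulator is FP32. Requires CUDA 9+ and compilation for sm_70.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void tensor_core_tile(const half *a, const half *b, float *d) {
    // Per-warp fragments for one 16x16x16 matrix multiply-accumulate.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);             // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);      // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc, a_frag, b_frag, acc);   // acc = A*B + acc on the Tensor Cores
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}
// Launch with at least one full warp, e.g. tensor_core_tile<<<1, 32>>>(a, b, d);
```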

  • Massive 815mm2 die size
  • 12nm FinFET (TSMC)
  • 21B Transistors
  • 15 FP32 TFLOPS / 7.5 FP64 TFLOPS
  • 120 Tensor TFLOPS
  • 16GB HBM2 at 900 GB/s
  • 5120 Shader processor cores


The Tesla V100 is capable of pushing 15 FP32 TFLOPS and, much like the Pascal-based GP100, is once again paired with HBM2 graphics memory over a 4096-bit interface (memory stacked on the GPU package). The card gets 16GB of it divided over four stacks (4GB per stack), fabbed by Samsung. The huge 815 mm² die itself is fabbed by TSMC on its 12nm FFN process. In Q3 you will see the first enterprise products based on Volta, starting at $69,000. For us gamers, when a GeForce GTX 1180 or 2080 will be released remains the topic of a long discussion. Below is a comparative specification list of the primary Tesla GPUs up to Volta; the V100 runs its 5120 shader processors at roughly 1.45 GHz boost and packs 320 texture units, sheesh.
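With four stacks sharing the 4096-bit interface, each stack contributes a 1024-bit channel good for roughly 225 GB/s of the 900 GB/s total. The theoretical figure can be derived on any CUDA device from the memory clock and bus width the runtime reports; a generic sketch, not V100-specific:

```cpp
// Derive theoretical peak memory bandwidth from the device properties the
// CUDA runtime reports (memory clock in kHz, bus width in bits, DDR = 2
// transfers per clock). On a Tesla V100 this lands near the quoted 900 GB/s.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    double gbs = 2.0 * prop.memoryClockRate * 1e3   // transfers per second
               * (prop.memoryBusWidth / 8.0)        // bytes per transfer
               / 1e9;                               // -> GB/s
    printf("%s: %d-bit bus, ~%.0f GB/s theoretical peak\n",
           prop.name, prop.memoryBusWidth, gbs);
    return 0;
}
```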


Nvidia Tesla Lineup

|                          | Tesla K40         | Tesla M40         | Tesla P100     | Tesla V100               |
|--------------------------|-------------------|-------------------|----------------|--------------------------|
| GPU                      | GK110 (Kepler)    | GM200 (Maxwell)   | GP100 (Pascal) | GV100 (Volta)            |
| SMs                      | 15                | 24                | 56             | 80                       |
| TPCs                     | 15                | 24                | 28             | 40                       |
| FP32 Cores / SM          | 192               | 128               | 64             | 64                       |
| FP32 Cores / GPU         | 2880              | 3072              | 3584           | 5120                     |
| FP64 Cores / SM          | 64                | 4                 | 32             | 32                       |
| FP64 Cores / GPU         | 960               | 96                | 1792           | 2560                     |
| Tensor Cores / SM        | n/a               | n/a               | n/a            | 8                        |
| Tensor Cores / GPU       | n/a               | n/a               | n/a            | 640                      |
| GPU Boost Clock          | 810/875 MHz       | 1114 MHz          | 1480 MHz       | 1455 MHz                 |
| Peak FP32 TFLOP/s        | 5.04              | 6.8               | 10.6           | 15                       |
| Peak FP64 TFLOP/s        | 1.68              | 2.1               | 5.3            | 7.5                      |
| Peak Tensor Core TFLOP/s | n/a               | n/a               | n/a            | 120                      |
| Texture Units            | 240               | 192               | 224            | 320                      |
| Memory Interface         | 384-bit GDDR5     | 384-bit GDDR5     | 4096-bit HBM2  | 4096-bit HBM2            |
| Memory Size              | Up to 12 GB       | Up to 24 GB       | 16 GB          | 16 GB                    |
| L2 Cache Size            | 1536 KB           | 3072 KB           | 4096 KB        | 6144 KB                  |
| Shared Memory Size / SM  | 16 KB/32 KB/48 KB | 96 KB             | 64 KB          | Configurable up to 96 KB |
| Register File Size / SM  | 256 KB            | 256 KB            | 256 KB         | 256 KB                   |
| Register File Size / GPU | 3840 KB           | 6144 KB           | 14336 KB       | 20480 KB                 |
| TDP                      | 235 Watts         | 250 Watts         | 300 Watts      | 300 Watts                |
| Transistors              | 7.1 billion       | 8 billion         | 15.3 billion   | 21.1 billion             |
| GPU Die Size             | 551 mm²           | 601 mm²           | 610 mm²        | 815 mm²                  |
| Manufacturing Process    | 28 nm             | 28 nm             | 16 nm FinFET+  | 12 nm FFN                |


Now for slightly more detail: a fully enabled GV100 GPU consists of six GPCs, 84 Volta SMs, 42 TPCs (each including two SMs), and eight 512-bit memory controllers (4096 bits total). Each SM has 64 FP32 cores, 64 INT32 cores, 32 FP64 cores, and 8 of the new Tensor Cores. Each SM also includes four texture units.

With 84 SMs, a full GV100 GPU has a total of 5376 FP32 cores, 5376 INT32 cores, 2688 FP64 cores, 672 Tensor Cores, and 336 texture units. Each memory controller is attached to 768 KB of L2 cache, and each HBM2 DRAM stack is controlled by a pair of memory controllers, giving the full GV100 a total of 6144 KB of L2 cache. Different products can use different configurations of GV100; the Tesla V100 accelerator uses 80 of the 84 SMs.
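Those totals follow directly from the per-SM counts; a quick sanity check of both the full 84-SM GV100 and the 80-SM Tesla V100 configuration:

```cpp
// Multiply the per-SM unit counts by the SM count to recover the totals
// quoted above for the full GV100 (84 SMs) and the Tesla V100 (80 SMs).
#include <cstdio>

int main() {
    const int fp32_per_sm = 64, int32_per_sm = 64, fp64_per_sm = 32;
    const int tensor_per_sm = 8, tex_per_sm = 4;
    const int configs[] = {84, 80};   // full GV100 vs. Tesla V100 accelerator

    for (int sms : configs) {
        printf("%2d SMs: %d FP32, %d INT32, %d FP64, %d Tensor Cores, %d texture units\n",
               sms, sms * fp32_per_sm, sms * int32_per_sm, sms * fp64_per_sm,
               sms * tensor_per_sm, sms * tex_per_sm);
    }
    return 0;
}
```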

Source: Guru3d
