At NVIDIA’s GPU Technology Conference (GTC) 2016 in San Jose, California, the company announced products based on its latest GPU architecture, code-named Pascal. The conference is traditionally attended by some of the leading researchers in GPU-accelerated computing and over the past few years has become increasingly focused on deep neural networks (DNNs). DNNs are the latest key to artificial intelligence (AI) and cognitive computing, and incredible strides have been made in AI over the last three years thanks to graphics processing units (GPUs). Companies like Google, Microsoft, IBM, Toyota, Baidu and others are looking to deep neural networks to help solve many of their complex analytical and data-rich problems, and NVIDIA is helping them harness the power of its GPUs to accelerate the deep learning these systems need to do. Thanks to NVIDIA’s early involvement in deep neural network research and its latest GPU hardware, the company is in the driver’s seat right now when it comes to delivering silicon that accelerates deep neural networks.
The GP100 is for Deep Neural Networks
The newly announced GPU, named GP100, is the first of NVIDIA’s Pascal family of GPUs, built on TSMC’s 16nm FinFET process and using the company’s latest GPU architecture. The GP100 is designed first and foremost for the datacenter, in an NVIDIA Tesla compute card format aimed at DNN, cloud, enterprise and other HPC workloads. I expect the GP100 will eventually find its way into the consumer market as a gaming card, with many changes, but its primary purpose is to serve as an enterprise acceleration processor. Because of Pascal’s performance, power and software capabilities, it will really start to challenge CPU-driven DNN. It also utilizes NVIDIA’s latest CUDA 8 toolkit; CUDA has become the de facto standard in GPU computing since it was introduced nearly a decade ago.
Significant compute cluster performance increase via brute force
As has been made quite clear by IBM, Google and Baidu’s adoption of GPUs for DNN workloads, GPUs are currently a better choice than FPGAs for training; FPGAs may still have a role, but likely more in production inference. The GP100 itself is a 15.3 billion transistor chip built on TSMC’s 16nm FinFET process. NVIDIA crams those 15.3 billion transistors into a 610mm^2 die, which is actually slightly larger than the previous generation’s even though the previous generation was built on a 28nm process. Pascal is effectively a full node shrink from the previous-generation Maxwell, which fit only 8 billion transistors into 601mm^2, essentially the same area. Pascal also increases the number of FP32 CUDA cores from 3072 to 3584, a sizable increase that helps deliver over 10 TFLOPS of single-precision performance.
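The figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below computes the transistor density gain from Maxwell to Pascal and the peak FP32 throughput; note that the ~1480 MHz boost clock and the 2 FLOPs per core per cycle (one fused multiply-add) are assumptions not stated in the article.

```python
# Back-of-the-envelope check of the GP100 figures quoted above.

def density_mtx_per_mm2(transistors_billions, die_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billions * 1000 / die_mm2

maxwell = density_mtx_per_mm2(8.0, 601)    # Maxwell flagship, 28nm
pascal = density_mtx_per_mm2(15.3, 610)    # GP100, 16nm FinFET
print(f"Maxwell density: {maxwell:.1f} M transistors/mm^2")
print(f"Pascal density:  {pascal:.1f} M transistors/mm^2")
print(f"Density gain:    {pascal / maxwell:.2f}x")

# Peak FP32 throughput = cores * FLOPs-per-cycle * clock.
cores = 3584                 # FP32 CUDA cores, up from 3072
flops_per_cycle = 2          # one fused multiply-add counts as 2 FLOPs
boost_clock_hz = 1.48e9      # assumed ~1480 MHz boost clock
tflops = cores * flops_per_cycle * boost_clock_hz / 1e12
print(f"Peak FP32:       {tflops:.1f} TFLOPS")
```

The density roughly doubles, which is what a full node shrink should deliver, and the throughput formula lands just above the 10 TFLOPS the article cites.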
Read More: http://www.forbes.com/sites/patrickmoorhead/2016/04/11/nvidia-extends-their-datacenter-performance-lead-in-neural-network-computing-at-gtc16/#70bd7c796818