In the post-Moore era, process technology is gradually approaching its physical limits and the pace of progress is slowing; the computing paradigm of semiconductor chips is also shifting from general-purpose to special-purpose.
Besieging the GPU: strong competitors with new architectures
The threat the GPU faces in the AI industry comes from strong competitors with new architectures. Among them are veteran giants such as Intel, which has been investing heavily in AI chips and new architectures.
Unicorns such as SambaNova and Untether AI both count Intel among their investors. Google, in addition to launching its own artificial intelligence chips, is deeply involved in SambaNova's funding. Alongside Western start-ups from North America and Europe such as Graphcore, Cerebras, Groq, and Tenstorrent, TensorChip, a new-architecture AI chip company from China, has never concealed its ambition to replace NVIDIA.
SambaNova, backed by both Google and Intel
“A fraction of our chip is better than your entire chip.” As soon as this claim was made, all of Silicon Valley turned its attention to SambaNova's CEO, Rodrigo Liang. A fast-growing unicorn, SambaNova has received heavy investment from Google and continuous follow-on investment from Intel; other participating institutions include SoftBank, Temasek, and Walden International. SambaNova raised $676 million in its Series D round, bringing its valuation to $5.1 billion. The bets placed by these top industrial investors have made the industry realize that the war to replace the GPU has already begun.
In an interview, Liang said he believes only a reconfigurable dataflow processor system can keep pace with the development of the entire industry. SRAM-based reconfigurable dataflow technology is continually pushing past the computing limits of AI hardware and software. SambaNova claims it can deliver better performance than the NVIDIA A100, long the leading product in data-center AI benchmarks.
Untether AI, backed by Intel in three consecutive rounds
The Canadian startup Untether AI has announced that it has received $125 million in funding since its founding in 2018 to develop its novel computing architecture and supply its customers with powerful computing support. Untether AI has developed a new chip architecture that it says moves data up to 1,000 times faster.
Untether's main product, the TsunAimi accelerator card, comprises four runAI200 chips fabricated in a 16 nm process and delivers 2,000 TOPS of computing power, roughly 16 times the performance of mainstream products. Compared with SambaNova, Untether AI focuses more on improving the computing power and energy efficiency of AI chips through in-memory computing. Based on digital in-memory computing (SRAM) technology, its computing energy efficiency reaches 8 TOPS/W.
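As a rough sanity check, the quoted throughput and efficiency figures imply a power budget and per-chip share. This sketch assumes the 2,000 TOPS and 8 TOPS/W numbers both describe the full four-chip card, which the article does not state explicitly.

```python
# Implied figures for the TsunAimi card (assumption: both quoted numbers
# refer to the whole four-chip card, not a single runAI200 chip).
card_tops = 2000   # total computing power, TOPS
efficiency = 8     # energy efficiency, TOPS/W
chips = 4          # runAI200 chips per card

card_watts = card_tops / efficiency   # implied card power draw (watts)
tops_per_chip = card_tops / chips     # implied per-chip throughput (TOPS)

print(card_watts)      # prints 250.0
print(tops_per_chip)   # prints 500.0
```

Under these assumptions the card would draw about 250 W, in the same range as a data-center GPU, with each chip contributing 500 TOPS.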
TensorChip, the Chinese builder of the RMU
Surprisingly, comparable advanced architectures also exist outside North America. A Chinese company named TensorChip has designed a new RMU architecture by combining reconfigurable technology with in-memory computing, an idea very close to SambaNova's RDU.
TensorChip achieves both high energy efficiency and large computing power, with an energy-efficiency ratio of 10 TOPS/W. It not only offers the reconfigurability championed by SambaNova but also surpasses Untether AI's energy-efficiency ratio. Whereas SambaNova focuses on large AI models, TensorChip's RMU targets a broader range of applications, covering both cloud and edge computing.
Groq, created by the former Google TPU team
Groq was founded in 2016 and has raised $362.3 million in total funding to date. Jonathan Ross, Groq's CEO, was involved in developing Google's Tensor Processing Unit (TPU), a custom chip for accelerating machine learning.
Groq's architecture reportedly focuses on low latency and single-threaded performance at a batch size of 1. When a GPU serves a machine-learning application and data arrives in small batches, gaps appear in the data stream and the GPU stalls, significantly reducing performance. The Groq processor, by contrast, is said to be 17.6 times faster than a GPU-based platform at batch size 1, and 2.5 times faster at large batch sizes.
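The stalling effect described above can be illustrated with a toy utilization model. The lane count below is an illustrative assumption, not a figure for any real GPU: if each input sample occupies one parallel lane, a batch smaller than the lane count leaves most of the hardware idle.

```python
# Toy model of why small batches underutilize a wide parallel processor.
# The 1,024-lane figure is an illustrative assumption, not vendor data.
def utilization(batch_size: int, parallel_lanes: int = 1024) -> float:
    """Fraction of lanes kept busy when each sample maps to one lane."""
    return min(batch_size, parallel_lanes) / parallel_lanes

for b in [1, 32, 1024]:
    print(b, utilization(b))  # at batch 1, under 0.1% of lanes are busy
```

A batch of 1 keeps only 1/1024 of such a device busy, which is why an architecture tuned for single-sample latency can beat a throughput-oriented one in that regime.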
Tenstorrent, where former AMD chief architect Jim Keller now works
Tenstorrent was founded in 2016 and has raised $200 million at a $1 billion valuation to build a sustainable product roadmap and keep challenging NVIDIA in the AI market. Its AI chip, Grayskull, has large on-chip memory (SRAM), whereas NVIDIA relies on fast off-chip GDDR or HBM. Grayskull performs 368 trillion operations per second while drawing only 75 watts, whereas NVIDIA consumes about 300 watts to achieve the same performance.
Who will be the overlord of AI computing after NVIDIA?
These leapfrog breakthroughs in chip architecture are milestones. They circumvent GPU patent barriers and open up new ideas and products in the computing field. The efforts of SambaNova, Untether AI, TensorChip, Groq, and Tenstorrent are reshaping the AI computing landscape. Perhaps one day the latest AI chips will no longer be baked in the kitchen of the man in the leather jacket (NVIDIA CEO Jensen Huang). After all, although the GPU performs well in AI computing, its main job is still graphics rendering and display.
These innovations are catching up with GPUs and pushing the world fully into the AI era. The waves of change will sweep over all of us and transform AI computing in ways that are hard to imagine.
Media Contact
Company Name: Science and Technology Daily
Contact Person: Sylvia Zhan
Country: China
Website: digitalpaper.stdaily.com