Tesla has upgraded its in-house AI supercomputer with 1,600 additional Nvidia A100 GPUs. That brings the count to 7,360 A100 GPUs, up from the 5,760 the system previously housed.
According to Tesla Engineering Manager Tim Zaman, the upgrade makes the company's machine one of the world's leading supercomputers by GPU count.
The Nvidia A100 is a powerful Ampere-architecture GPU for data centers. Yes, it uses the same GPU architecture as the GeForce RTX 30-series cards, which are some of the best graphics cards currently available, but the A100 isn't built for graphics. It is equipped with 80GB of memory, delivers 2TB/s of memory bandwidth, and draws up to 400W in its SXM form factor. The A100 is deployed to accelerate tasks in AI, data analytics, and high-performance computing.
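To put those per-GPU numbers in cluster-wide context, here is a back-of-envelope sketch using only the figures cited above; actual usable capacity would depend on topology, sharding, and overhead:

```python
# Back-of-envelope aggregates for Tesla's A100 cluster,
# using only the per-GPU figures cited in the article.

NUM_GPUS = 7_360          # A100 count after the upgrade
MEM_PER_GPU_GB = 80       # 80GB A100 variant
BW_PER_GPU_TBS = 2.0      # ~2 TB/s memory bandwidth per GPU

total_memory_tb = NUM_GPUS * MEM_PER_GPU_GB / 1_000
total_bandwidth_pbs = NUM_GPUS * BW_PER_GPU_TBS / 1_000

print(f"Aggregate HBM capacity:  {total_memory_tb:,.1f} TB")      # ~588.8 TB
print(f"Aggregate HBM bandwidth: {total_bandwidth_pbs:,.2f} PB/s")  # ~14.72 PB/s
```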
The first system Nvidia showed built around the A100 was the Nvidia DGX A100, which packed eight of the GPUs connected via six NVSwitches, with a combined 4.8 TB/s of bi-directional bandwidth, for up to 10 PetaOPS of INT8, 5 PFLOPS of FP16, 2.5 PFLOPS of TF32, and 156 TFLOPS of FP64 in a single node.
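Those node-level figures follow directly from Nvidia's published per-A100 Tensor Core peaks (the INT8, FP16, and TF32 numbers assume structured sparsity, per Nvidia's A100 datasheet); a quick sanity check:

```python
# Sanity-check the DGX A100 node totals against Nvidia's per-A100
# Tensor Core peaks (INT8/FP16/TF32 with structured sparsity).
A100_PEAKS = {
    "INT8 (TOPS)": 1_248,
    "FP16 (TFLOPS)": 624,
    "TF32 (TFLOPS)": 312,
    "FP64 Tensor (TFLOPS)": 19.5,
}

GPUS_PER_NODE = 8
for precision, per_gpu in A100_PEAKS.items():
    node_total = per_gpu * GPUS_PER_NODE
    print(f"{precision:22s} per GPU: {per_gpu:>7,}  per node: {node_total:>8,.0f}")
# -> 9,984 INT8 TOPS (~10 POPS), 4,992 FP16 TFLOPS (~5 PFLOPS),
#    2,496 TF32 TFLOPS (~2.5 PFLOPS), 156 FP64 TFLOPS per node.
```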
That was eight A100 GPUs; today's Tesla AI supercomputer has 7,360 of them. Tesla hasn't publicly benchmarked the system, but the similarly equipped NERSC Perlmutter, which has 6,144 Nvidia A100 GPUs, achieves 70.87 Linpack petaflops. Using data from other A100-based supercomputers as performance reference points, HPCwire estimates that the Tesla machine could reach about 100 Linpack petaflops.
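A crude way to reproduce that kind of estimate is to scale a reference system's Linpack score linearly by GPU count. This is only a rough proportional sketch; real Linpack results depend heavily on interconnect, node balance, and tuning. The Perlmutter figures are from the article; the Nvidia Selene figures (Nvidia's in-house A100 system, roughly 4,480 GPUs and 63.46 Linpack petaflops on the June 2022 Top500) are added here as a second, approximate reference point:

```python
# Rough Linpack estimate by linear scaling from A100-based reference
# systems. Purely proportional; ignores interconnect and tuning effects.

TESLA_GPUS = 7_360

# (gpu_count, linpack_petaflops) reference points.
REFERENCES = {
    "NERSC Perlmutter": (6_144, 70.87),  # cited in the article
    "Nvidia Selene":    (4_480, 63.46),  # June 2022 Top500 (approximate)
}

for name, (gpus, rmax) in REFERENCES.items():
    estimate = rmax * TESLA_GPUS / gpus
    print(f"Scaling from {name}: ~{estimate:.0f} petaflops")
# Perlmutter scaling gives ~85 PF; Selene scaling gives ~104 PF,
# bracketing HPCwire's ~100-petaflop estimate.
```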
Tesla does not intend to build the long-term future of its AI supercomputing on Nvidia GPUs, however. This world-leading machine is effectively a stepping stone toward the company's Dojo supercomputer, which Elon Musk first teased in 2020. Tesla's Dojo designers went on to develop their own chips to supplant Nvidia's GPUs, aiming to maximize performance, throughput, and bandwidth at every granularity.
The Tesla Dojo D1 is an application-specific integrated circuit (ASIC) built for AI training, and it is one of the first such chips in this area. The current D1 test chips are fabricated on TSMC's N7 node and pack about 50 billion transistors.
More information about the Dojo D1 and the Dojo system should be revealed during the Hot Chips symposium next week, where three Tesla talks are scheduled for Tuesday. The speakers will address Dojo's D1 chip architecture, ML training on Dojo, and enabling AI through system integration.