Over the past few years, NVIDIA has pushed deep into the AI space, and its GPUs have become the first choice not only for HPC but also for data centers, including AI and deep-learning ecosystems. In a newly published developer blog post, NVIDIA announced that it is using AI to design and develop GPUs, and that its latest Hopper GPU contains nearly 13,000 circuit instances created entirely by AI.
In the new post on the NVIDIA Developer blog, the company describes how it used its own AI capabilities to design its most powerful GPU to date, the Hopper H100. NVIDIA GPUs are primarily designed using state-of-the-art EDA (electronic design automation) tools, but with the help of an AI approach called PrefixRL, which uses deep reinforcement learning to optimize parallel prefix circuits, the company was able to design smaller, faster, and more energy-efficient circuits while delivering better performance.
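To get a feel for what "optimizing" a circuit means here, consider a scalarized objective that trades circuit area against delay. This is only a minimal sketch: the weight, the candidate structures, and their cost numbers are hypothetical, and NVIDIA's actual reward is defined over fully synthesized circuits, not toy proxies.

```python
# Hypothetical scalarized objective a PrefixRL-style agent might maximize.
# Area and delay are abstract cost proxies; w in [0, 1] sets the trade-off.

def reward(area, delay, w):
    # higher reward = smaller weighted combination of area and delay
    return -(w * area + (1.0 - w) * delay)

# two illustrative prefix structures: (area, delay) cost proxies
candidates = {"ripple": (15, 15), "kogge-stone": (49, 4)}

# sweeping w traces out a Pareto frontier of area/delay trade-offs
for w in (0.2, 0.8):
    best = max(candidates, key=lambda k: reward(*candidates[k], w))
    print(f"w={w}: prefer {best}")
```

With a low area weight the delay-optimal structure wins; with a high area weight the compact one does, which is why the paper reports a whole frontier of circuits rather than a single winner.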
Arithmetic circuits in computer chips are constructed from networks of logic gates (such as NAND, NOR, and XOR) and wires. The ideal circuit should have the following characteristics:
● Small: a smaller area, so that more circuits fit on the chip.
● Fast: lower latency, to improve chip performance.
● Low power: lower power consumption by the chip.
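The tension between these goals can be seen even in a toy model of parallel prefix circuits, where "area" is approximated by the number of prefix operators and "delay" by the logic depth. The two structures and cost proxies below are standard textbook examples, not NVIDIA's actual cost model.

```python
# Toy area/delay proxies for two classic parallel prefix structures.

def serial_prefix(n):
    """Ripple (serial) prefix over n inputs: minimal area, worst delay."""
    ops, depth = n - 1, n - 1
    return ops, depth

def kogge_stone(n):
    """Kogge-Stone prefix over n inputs (n a power of two):
    minimal depth, but many more operators."""
    ops, depth, span = 0, 0, 1
    while span < n:
        ops += n - span   # operators added at this level
        depth += 1
        span *= 2
    return ops, depth

for name, fn in [("serial", serial_prefix), ("kogge-stone", kogge_stone)]:
    ops, depth = fn(16)
    print(f"{name:12s} area~{ops:3d} ops, delay~{depth:2d} levels")
```

For 16 inputs the serial structure uses 15 operators at depth 15, while Kogge-Stone uses 49 operators at depth 4: neither dominates, which is exactly the kind of trade-off space PrefixRL explores.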
Using this approach, NVIDIA has designed nearly 13,000 AI-assisted circuits whose area is 25% smaller than that of equally fast, functionally equivalent circuits produced by EDA tools. PrefixRL is, however, a very computationally demanding task: physically simulating each GPU required 256 CPUs, and training took over 32,000 GPU hours. To remove this bottleneck, NVIDIA developed Raptor, an in-house distributed reinforcement learning platform that takes special advantage of NVIDIA hardware for this kind of industrial-scale reinforcement learning.
Raptor has several features that improve scalability and training speed, such as job scheduling, custom networking, and GPU-aware data structures. In the context of PrefixRL, Raptor enables mixed job allocation across CPU, GPU, and Spot instances.
The networking in this reinforcement learning application is diverse and benefits from the following:
● Raptor’s ability to switch to NCCL for point-to-point transfers, moving model parameters directly from the learner GPU to inference GPUs.
● Redis for asynchronous and smaller messages, such as rewards or statistics.
● A JIT-compiled RPC for handling high-volume and low-latency requests, such as uploading experience data.
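The key idea in this split is that bulky tensors travel over a fast peer-to-peer channel (NCCL in Raptor), while small asynchronous messages such as rewards and statistics go through a lightweight broker. The sketch below illustrates only the asynchronous small-message pattern, using Python's `queue.Queue` as a stand-in for a Redis channel; the actor/learner names and message shapes are hypothetical.

```python
# Illustrative stand-in for asynchronous small-message delivery
# (rewards, statistics) between actors and a learner.
import queue
import threading

reward_channel = queue.Queue()   # stand-in for a Redis list

def actor(actor_id, episodes):
    # actors push small reward messages without blocking on the learner
    for ep in range(episodes):
        reward_channel.put({"actor": actor_id, "episode": ep, "reward": 1.0})

threads = [threading.Thread(target=actor, args=(i, 3)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# the learner drains the channel asynchronously and aggregates statistics
messages = []
while not reward_channel.empty():
    messages.append(reward_channel.get())
print(f"collected {len(messages)} reward messages")  # 4 actors x 3 episodes
```

In a real deployment the queue would live in an external service such as Redis so that actors and learners on different machines can exchange these small messages without touching the high-bandwidth parameter path.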
NVIDIA concludes that applying AI to real-world circuit design problems could lead to better GPU designs in the future. The full paper is here, and you can visit the developer blog here for more information.