According to news on September 3, Triton is an open-source, Python-like programming language that enables researchers with no CUDA experience to write efficient GPU code. It can be thought of as a simplified version of CUDA, and it is said that even novices can write code comparable to what professionals produce, reaching maximum hardware performance with less effort. Initially, however, Triton supported only NVIDIA GPUs.
OpenAI claims that Triton can achieve comparable performance to cuBLAS on FP16 matrix multiplication with only 25 lines of code.
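To give a feel for why such a kernel can be so short, here is a minimal sketch of the block-tiled style that Triton kernels are written in, emulated in plain Python so it runs without a GPU. It is not Triton code: the names `matmul_program`, `launch`, and `cdiv`, and the tile size `BLOCK = 2`, are assumptions made only for this illustration. The point is that the author writes the work for one tile, guards the edges with bounds checks (masks), and a launcher runs that program once per tile.

```python
# Plain-Python emulation of a block-tiled matrix multiply.
# NOTE: this is NOT Triton code. All names here are hypothetical and
# chosen only to illustrate the "one program per tile" style.

BLOCK = 2  # tile edge length (assumed; real GPU kernels use far larger tiles)


def cdiv(x, y):
    """Ceiling division: number of tiles needed to cover x elements."""
    return (x + y - 1) // y


def matmul_program(pid_m, pid_n, a, b, c, M, N, K):
    """One program instance: computes a single BLOCK x BLOCK tile of C."""
    rows = [pid_m * BLOCK + i for i in range(BLOCK)]
    cols = [pid_n * BLOCK + j for j in range(BLOCK)]
    acc = [[0.0] * BLOCK for _ in range(BLOCK)]  # per-tile accumulator
    for k0 in range(0, K, BLOCK):
        # Mask along K: drop indices that fall past the matrix edge.
        ks = [k for k in range(k0, k0 + BLOCK) if k < K]
        for i, r in enumerate(rows):
            for j, col in enumerate(cols):
                if r < M and col < N:  # mask along M and N
                    acc[i][j] += sum(a[r][k] * b[k][col] for k in ks)
    # Masked store of the finished tile back into C.
    for i, r in enumerate(rows):
        for j, col in enumerate(cols):
            if r < M and col < N:
                c[r][col] = acc[i][j]


def launch(a, b):
    """Run one program instance per output tile (a 2D launch grid)."""
    M, K = len(a), len(a[0])
    N = len(b[0])
    c = [[0.0] * N for _ in range(M)]
    for pid_m in range(cdiv(M, BLOCK)):
        for pid_n in range(cdiv(N, BLOCK)):
            matmul_program(pid_m, pid_n, a, b, c, M, N, K)
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = launch(a, b)
print(result)
```

On a GPU the tiles would run in parallel rather than in a Python loop, and the compiler, not the author, would decide how each tile maps onto threads and memory.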
On GitHub, we can see that OpenAI has begun merging AMD ROCm-related branches into the latest version of Triton, which reveals a good deal. In other words, the latest Triton backend has been adapted to the AMD platform, which is highly significant.

According to the official statement, most of the unit tests in “test_core.py” pass, although some tests were skipped for various reasons.

OpenAI also announced that it will hold the Triton Developer Conference at the Microsoft Silicon Valley Campus in Mountain View, California, from 10:00 am to 4:00 pm on September 20. The schedule includes “Introducing Triton to AMD GPU” and “Triton’s Intel XPU”, so Triton is expected to soon break free of NVIDIA CUDA's long-standing monopoly.

It is worth mentioning that Triton is open source. Unlike closed-source CUDA, it allows other hardware accelerators to be integrated directly, which greatly reduces the time needed to build an AI compiler stack for new hardware.
In the previously released PyTorch 2.0, TorchInductor introduced support for OpenAI Triton. It can automatically generate fast code for multiple accelerators and backends, and it lets developers write code for the underlying hardware in Python rather than CUDA. In other words, Triton is already a key component of the PyTorch 2.0 backend compiler.
Until now, AMD ROCm has relied mainly on the Hipify tool to achieve CUDA compatibility. Now that AMD has begun providing ROCm support for RDNA 3 consumer graphics cards, more platforms are expected to adapt to AMD hardware in the future.
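For readers unfamiliar with Hipify, the idea is source-to-source translation: CUDA API names in a program are rewritten to their HIP equivalents so the code can be built for AMD GPUs. The toy sketch below imitates that idea with a handful of renames. It is only an illustration, not the real tool: the function `toy_hipify` and the tiny mapping table are assumptions for this example, and real-world translation covers far more of the API and many harder cases.

```python
import re

# Small, non-exhaustive subset of CUDA -> HIP renames (chosen for illustration).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, trying longer names first so that
# "cudaMemcpyHostToDevice" is never cut short as "cudaMemcpy".
_names = sorted(CUDA_TO_HIP, key=len, reverse=True)
_pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b")


def toy_hipify(source: str) -> str:
    """Hypothetical helper: rewrite known CUDA names to their HIP names."""
    return _pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d_x);
"""

hip_src = toy_hipify(cuda_src)
print(hip_src)
```

This also shows the limit of the approach: translation keeps projects tied to CUDA-style source, whereas a compiler with an AMD backend can target the hardware directly.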
