Tesla disclosed the new progress of its self-developed Dojo supercomputing project

Dojo is Tesla’s custom supercomputing platform, designed to support the company’s artificial intelligence (AI) and machine learning (ML) applications. In particular, it trains on video data captured by Tesla’s fleet of self-driving vehicles. The electric car giant already operates a large NVIDIA GPU-based supercomputer, but Dojo uses Tesla’s own chips and a fully custom infrastructure.

Electrek expects the custom supercomputer to enhance Tesla’s ability to train neural networks using video data, and the related computer vision technology could provide critical support for its autonomous driving efforts.

In fact, Tesla first said it was working on a Dojo supercomputer at last year’s AI Day event. Building on its chips and training blocks (tiles), the company is also working toward a complete Dojo cabinet (or Exapod cluster).

A year later, Tesla unveiled the latest progress on its Dojo project during Friday’s AI Day event.

The company said that a single Dojo block (tile) can now replace six GPU boxes, and claims that the tile costs less than a single GPU box.

More specifically, the company has successfully evolved from a set of “chips + training blocks” to the current “system trays” / complete cabinets.

Each tray contains six of these computing blocks and has the performance of 3 to 4 fully loaded supercomputing racks.

The company is currently integrating host interfaces directly into the system trays to form a complete host assembly, so that the trays can be installed into a Dojo cabinet.

However, the company still needs to conduct more R&D testing on several cabinets before they can be combined into the infrastructure required for the Dojo Exapod.

Bill Chang, Principal Systems Engineer for Dojo, added that, in response to the unprecedented heat and power density, the team had to revisit every aspect of the data center infrastructure and develop custom high-performance cooling and power systems.

Embarrassingly, infrastructure testing earlier this year also tripped the local power grid’s substation. The team pushed power consumption past 2 megawatts before the grid tripped, which earned them a “hello” call from the local government.

Finally, Tesla shared key specs for the Dojo Exapod: 1.1 EFLOPS of BF16/CFP8 performance, 1.3 TB of SRAM, and 13 TB of high-bandwidth DRAM.

If all goes well, the company plans to have its first full Exapod cluster in Q1 2023, with seven currently planned for Palo Alto. It is also using the event to recruit more talent.
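
For a rough sense of scale, the quoted per-Exapod figures can be multiplied out across the seven Exapods planned for Palo Alto. The Python sketch below is illustrative only: the `ExapodSpec` class and the `scale` helper are hypothetical names, and it assumes capacity scales linearly with the number of Exapods, which Tesla has not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExapodSpec:
    """Per-Exapod figures as quoted in the article."""
    eflops: float = 1.1    # quoted peak performance, in exaFLOPS
    sram_tb: float = 1.3   # quoted SRAM capacity, in terabytes
    dram_tb: float = 13.0  # quoted high-bandwidth DRAM capacity, in terabytes


def scale(spec: ExapodSpec, count: int) -> ExapodSpec:
    """Multiply the quoted figures by `count` (assumption: linear scaling)."""
    return ExapodSpec(
        eflops=round(spec.eflops * count, 2),
        sram_tb=round(spec.sram_tb * count, 2),
        dram_tb=round(spec.dram_tb * count, 2),
    )


PLANNED_EXAPODS = 7  # number planned for Palo Alto, per the article

single = ExapodSpec()
total = scale(single, PLANNED_EXAPODS)
dram_to_sram = single.dram_tb / single.sram_tb  # DRAM capacity relative to SRAM

print(f"{PLANNED_EXAPODS} Exapods: {total.eflops} EFLOPS, "
      f"{total.sram_tb} TB SRAM, {total.dram_tb} TB DRAM")
print(f"DRAM-to-SRAM capacity ratio per Exapod: about {dram_to_sram:.0f}x")
```

Under that linear-scaling assumption, seven Exapods would come to roughly 7.7 EFLOPS, 9.1 TB of SRAM, and 91 TB of high-bandwidth DRAM.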
