Tesla 2022 AI Day meeting minutes. This AI Day mainly showcased the humanoid robot Optimus, the Full Self-Driving (FSD) development process, and the Dojo supercomputer. The Tesla humanoid robot made its debut and performed a dance move. Afterward, the company showed video clips of the robot doing other tasks, such as picking up boxes and watering plants. Finally, Tesla showed the form of the robot intended for mass production, though it was not yet fully functional and was only briefly demonstrated.
Optimus will likely be produced in the millions. Because the robot shares vision algorithms, computing chips, and batteries with Tesla's car line, it is not built from scratch and can achieve better cost control than other humanoid robots. The future price may be less than $20,000.
The latest generation uses a 2.3 kWh battery pack with an integrated design operating at 52 volts.
The chip is Tesla's Full Self-Driving computer, with changes made for robot-specific software and hardware adaptations.
For the robot's various movements, the force and power consumption of the 28 body joints were simulated, and six distinct actuator designs serve as its muscles, achieving large power output at small size and weight.
The hand design achieves 11 degrees of freedom through six actuators, and the hand can lift objects heavier than 20 pounds and use tools.
Through AI computation, the robot combines the target path and related trajectory to generate an optimal plan; when something unexpected happens in its environment, it can adapt its actions to keep movement natural while maintaining stability. This is the indoor embodiment of Tesla's vision-based approach.
At present, the development of Optimus is still in its infancy, and there are still many points that need to be optimized.
In 2021, roughly 2,000 users were using the FSD Beta software developed by Tesla; by September 2022, 160,000 users were using it.
FSD Beta provides functions such as automatic parking, driving according to traffic lights and traffic signs, and making turns at intersections with other vehicles present.
These capabilities are built by collecting large amounts of data, generating corresponding images, and training powerful neural network models.
Tesla’s inference system can distribute the execution of a single neural network across two independent, interconnected computers in the Autopilot system.
The trajectory scoring system covers four areas: collision prediction, comfort analysis, intervention likelihood, and a human-likeness (anthropomorphic) discriminator.
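A minimal sketch of how such a multi-head scorer might combine its four signals. All head functions, feature names, and weights below are invented for illustration; the real heads are learned networks, not hand-written rules.

```python
def score_trajectory(traj, heads, weights):
    """Combine the per-head costs into one scalar; lower is better."""
    return sum(weights[name] * heads[name](traj) for name in heads)

def pick_best(candidates, heads, weights):
    """Return the candidate trajectory with the lowest combined cost."""
    return min(candidates, key=lambda t: score_trajectory(t, heads, weights))

# Toy heads: each maps a trajectory (here, a dict of features) to a cost.
heads = {
    "collision": lambda t: 100.0 if t["min_gap_m"] < 1.0 else 0.0,
    "comfort": lambda t: abs(t["max_jerk"]),
    "intervention": lambda t: t["intervention_prob"],
    "humanlike": lambda t: 1.0 - t["humanlike_score"],
}
weights = {"collision": 1.0, "comfort": 0.5, "intervention": 2.0, "humanlike": 1.0}

candidates = [
    {"min_gap_m": 0.5, "max_jerk": 0.1, "intervention_prob": 0.01, "humanlike_score": 0.9},
    {"min_gap_m": 2.0, "max_jerk": 0.3, "intervention_prob": 0.05, "humanlike_score": 0.8},
]
best = pick_best(candidates, heads, weights)  # the candidate without a collision risk wins
```

The key design point is that a hard-safety head (collision) is weighted so heavily that it dominates the softer comfort and human-likeness signals.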
Tesla’s Autopilot system uses full 3D occupancy prediction to estimate how likely the surroundings are to be occupied, identify cars, people, and so on, and recognize randomly moving objects on the road, running every 10 milliseconds.
Based on image processing, the vehicle can identify moving and stationary objects, predict the direction objects will move, recognize road conditions, and reduce speed on downhill sections.
Tesla built three supercomputers with 14,000 GPUs in total: 10,000 are used to train and run the system, and 4,000 are used for automatic labeling.
Training with the optimized video model improved training speed by 30% and storage IOPS performance by a factor of four.
FSD Lanes: The goal is to generate a comprehensive set of lanes and their connections to other roads through a neural network. The network consists of a vision component, a map component, and a language component. The vision component converts the data from the vehicle’s 8 cameras into visual feature representations via a transformer, which are passed to the next component. In the map component, the incoming data is enriched and given topology by the existing road navigation model. In the language component, these complex data sets are used to predict lane connectivity, encoding the information in a "language of lanes" specific to Tesla. With this system, all operations can be compiled into the training engine in a simple and complete way.
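The "language of lanes" idea above can be sketched as autoregressive decoding over a token vocabulary describing the lane graph. Everything here is invented for illustration: the vocabulary, the lookup-table "model" standing in for a trained transformer, and the greedy decoder.

```python
# Toy "model": a table of next-token scores instead of a trained transformer.
NEXT_SCORES = {
    "<start>": {"node": 1.0},
    "node": {"fork": 0.3, "continue": 0.6, "<end>": 0.1},
    "fork": {"node": 1.0},
    "merge": {"node": 1.0},
    "continue": {"node": 0.7, "<end>": 0.3},
}

def decode(max_len=10):
    """Greedy autoregressive decode: emit the highest-scoring next token
    until <end>, mirroring how lane connectivity is produced token by token."""
    seq = ["<start>"]
    while len(seq) < max_len and seq[-1] != "<end>":
        scores = NEXT_SCORES.get(seq[-1], {"<end>": 1.0})
        seq.append(max(scores, key=scores.get))
    return seq

lane_tokens = decode()
```

The point of the formulation is that each emitted token is conditioned on everything emitted so far, which is what makes the predicted lane graph globally coherent.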
The FSD lane neural network can process up to 75 million parameters with a latency of 9.6 milliseconds while keeping power consumption at around 8 W.
In total, the FSD neural networks process over 1 billion parameters, over 150,000 neural network layers, and over 37.5 million neural nodes simultaneously. To achieve this, every level of the stack needs to be optimized.
Tesla’s automatic labeling system has also become more efficient: it collects information about the environment, reconstructs a 3D model that can be used for training, and then completes automatic labeling on the model, with a small amount of manual correction for special cases. The current Tesla data labeling pipeline now runs end to end, and its efficiency has greatly improved over the previous one.
The data engine is the process of improving neural network performance through data. Using the data engine, the accuracy of vehicle actions has improved rapidly, from about 88% last September to more than 99% at present.
How the data engine works: in shadow mode, Tesla vehicles continuously collect data while driving. The vehicle’s own decisions are defined as correct if they are consistent with the driver’s decisions, and as incorrect otherwise; this data, together with its labels, goes into an evaluation set. The most meaningful data in the evaluation set is labeled and added to the training set, deep learning is used to train the online and offline models, and finally the updated models are deployed to the vehicles.
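The shadow-mode loop above can be sketched in a few lines. Function and field names are hypothetical; the real infrastructure is far more involved.

```python
def shadow_mode_step(model_decision, driver_decision, evaluation_set):
    """Label the model's decision by comparing it to the human driver's."""
    correct = (model_decision == driver_decision)
    evaluation_set.append({"decision": model_decision, "correct": correct})
    return correct

def mine_training_data(evaluation_set):
    """Disagreements are the most informative cases to label and train on."""
    return [e for e in evaluation_set if not e["correct"]]

evaluation_set = []
shadow_mode_step("yield", "yield", evaluation_set)  # agreement: model was right
shadow_mode_step("go", "yield", evaluation_set)     # disagreement: training candidate
hard_cases = mine_training_data(evaluation_set)
```

The design choice worth noting is that the driver acts as a free labeling oracle: no human annotator is needed to flag the interesting cases, only to correct them afterward.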
FSD Beta may open for a global rollout at the end of this year.
The Dojo is designed to be cheaper and more powerful than commercially available cloud computing.
Dojo was designed from the beginning at the hardware level for deep neural network training, so the entire Dojo system, from the chip to the tile to the transmission bandwidth of the server room, is very substantial. Tesla also applied the Occupancy Network to the Dojo system to better match AI hardware with AI software, and the results in terms of reduced latency and performance loss are impressive.
The voltage regulator module can deliver 1,000 A of current at ultra-high density, using multiple layers of vertical power delivery.
Tesla’s future goal is to reduce the coefficient of thermal expansion (CTE) by 54% and improve performance by a factor of 3. Increasing density is the core and cornerstone of improving system performance.
System tray parameters: 75 mm height, 54 PFLOPS (BF16/CFP8), 13.4 TB/s bisection bandwidth, 100+ kW power.
Dojo Interface Processor parameters: 32 GB high-bandwidth memory (HBM), 900 GB/s TTP bandwidth, 50 GB/s Ethernet bandwidth, 32 GB/s PCIe Gen 4 bandwidth.
Per-host interface parameters: 640 GB high-bandwidth memory, 1 TB/s Ethernet bandwidth, 18 TB/s aggregate bandwidth to the tiles.
[Dojo system build goal: solve constrained models that are hard to scale]
A single accelerator runs the forward and backward passes, then the optimizer; multiple accelerators then run multiple copies of the process, scaling linearly. But models with larger activations hit a problem in the forward pass: the batch size that fits on a single accelerator is often smaller than what batch norm needs. Synchronized batch norm set up across multiple accelerators introduces communication bottlenecks, and the models stop scaling linearly.
High-density integration is designed to accelerate the compute-bound and latency-bound parts of the model. A slice of the Dojo mesh can be split off to run the model (as long as the slice is large enough); fine-grained synchronization primitives with uniform low latency accelerate parallelism across integration boundaries. Tensors are stored sharded in RAM and replicated just in time for execution at each layer; part of the tensor-replication data transfer overlaps with computation, and the compiler can also recompute layers.
Stable Diffusion model example: the compiler maps the model with model parallelism. The communication phase starts with nodes computing a local mean and standard deviation, and coordination then continues in parallel; the expectation is that the 350 nodes on each die coordinate on the mean and standard-deviation values.
Compiler operation: the compiler extracts the communication tree, and on real hardware the nodes exchange intermediate reduction values with hardware acceleration. This operation takes only 5 microseconds on 25 Dojo dies, while the same operation takes 150 microseconds on 24 GPUs, an order-of-magnitude improvement over the GPU.
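The tree reduction described above can be sketched as follows: each node contributes partial (count, sum, sum-of-squares) statistics, and pairs combine until one aggregate remains, so the global mean and standard deviation emerge in O(log n) combining steps. This is a generic illustration of the technique, not Tesla's compiler output.

```python
import math

def partial_stats(values):
    """Per-node partial statistics: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Merging partials is just element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def tree_reduce(parts):
    """Pairwise combine until one aggregate remains (a binary reduction tree)."""
    while len(parts) > 1:
        parts = [combine(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    return parts[0]

# Four "nodes", each holding a local shard of activations.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
n, s, ss = tree_reduce([partial_stats(sh) for sh in shards])
mean = s / n
std = math.sqrt(ss / n - mean * mean)
```

Because the combine step is associative, the same result is obtained regardless of tree shape, which is what lets hardware pick the fastest communication pattern.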
Peak memory usage: Dojo is built to solve larger, more complex models. Two models currently run on the GPU cluster: an auto-labeling network (offline models for generating ground truth) and an occupancy network (a large model of high arithmetic intensity).
Test results: measurements of GPUs and Dojo on a multimodal system show that Dojo can already outperform A100 GPUs running with current hardware, doubling the throughput of an A100; key compiler optimizations achieve more than 3x the performance of the A100.
The result: one Dojo tile replaces the ML computers of six GPU boxes at a cost of less than one GPU box. A network that took over a month to train now takes less than a week.
Problem: there is so much computation that the data loaders running on the host computers simply cannot keep up with the ML hardware.
Solution: extend the transport protocol, build a Dojo network interface card, add data-loader hosts equipped with those cards, and reconnect the mesh via an Ethernet switch. This raised accelerator occupancy from 4% to 97%, and the team expects it to reach 100% soon.
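Why adding loader hosts helps can be illustrated with a double-buffered loader: the next batch is fetched while the accelerator consumes the current one, so the hardware rarely waits. This is a generic simulation of the idea, with all names invented.

```python
from collections import deque

def double_buffered(batches, prefetch=2):
    """Yield batches from a small prefetch queue, simulating overlap of
    host-side loading with device-side compute."""
    queue, it = deque(), iter(batches)
    # Fill the prefetch queue before compute starts.
    for _ in range(prefetch):
        try:
            queue.append(next(it))
        except StopIteration:
            break
    while queue:
        batch = queue.popleft()
        try:
            queue.append(next(it))  # refill "in the background"
        except StopIteration:
            pass
        yield batch

consumed = list(double_buffered([1, 2, 3, 4, 5]))
```

In a real system the refill step runs on separate loader hosts and the queue lives in device-reachable memory, which is exactly what the extra hosts and network cards provide.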
Building the high-arithmetic-intensity auto-labeling network: it is served as if by a single giant accelerator, and the PyTorch layer runs fully up to speed as expected. The throughput of the current high-performance dense GPU cluster is expected from only 4 Dojo cabinets; the plan is to build this in the first quarter of 2023.
Results: 6 chips densely integrated; 54 petaflops of compute; 640 gigabytes of high-bandwidth memory; new versions of cluster components under continuous improvement; the next generation will achieve a 10x improvement.
1: Why use tendons to drive the robot? We all feel that tendon drives have low reliability. Also, why use springs?
A: First of all, the reliability of a metal cable as the tendon is still very high. At the same time, the tendon solution has lower energy consumption, and a similar combination of tendons and springs can be found in the human hand. We use the tendon to contract and the spring to extend.
Elon Musk added: Because we want to mass-produce quickly, we won’t wait until all the problems are solved. We want to deploy the robot in the factory and see what it can do. Of course, this is our first version; there will definitely be a 2.0 later, and the 3.0 hand architecture will be upgraded.
2: Will the robot have a personality? Will it tell jokes with us and become our friend?
A: Of course. As long as the core artificial intelligence and key actuators are solved, people may dress the robots in a variety of clothes, and the future will be very interesting.
3: Will there be interventions between humans and robots, such as flagging cases where humans disagree with what is happening?
A: If a robot does something bad, we will be monitoring the robot remotely.
Elon Musk added: We want our robots to become more human-like than in science fiction. As AI develops, the robot can learn by imitation to be more human-like, and it can carry out simple commands, or even the actions you want: you can give it a high-level command, and it breaks it down into a series of actions and carries them out.
4: Earlier you said that robots would significantly increase socioeconomic output. At the very beginning, you said that Tesla’s mission was to accelerate the world’s transition to sustainable energy. Is that still the mission for robots? Will Tesla change its mission to "transforming the world to infinite productivity"?
A: The emergence of robots certainly furthers the world’s transition to sustainable energy. I’m also excited about what robots will be able to do in a few years; surely you are interested in finding out where the technology will be in a few years. I am too.
5: Will robots have the ability to talk later on? What is the ultimate goal for robots?
A: Of course it will have the ability to talk, and there will definitely be an interesting endgame behind the robot, perhaps similar to the movie The Terminator. But we will be very careful about robot safety; we will have a "stop button." There will be local ROM in the robot that cannot be updated over the network, which is very important for security. It will be fun, not boring.
6: What is the goal of the Dojo project? To rent out compute the way Amazon’s cloud does? I see it uses 7 nm, so the investment must be very large. What about commercialization?
A: Dojo is a large computer, and it makes sense to do the same as Amazon; that is the most efficient approach. The world is transitioning to Software 2.0 (replacing logic programming with neural networks), and future software will contain many neural networks. That requires Dojo.
7: Will robots understand our emotions, our art? How can robots subsequently serve our creativity?
A: As with DALL·E 2, robots can already create art. The future is very interesting.
Ashok added: Robots can create physical art, like dancing. Artificial intelligence painting is digital art.
8: Tesla’s Autopilot models are inspired by natural language processing models. Could you share the history of this, why it was done, and how much it has improved since adopting language models?
A: Two aspects. First, we previously trained roads with a dense network, and that model couldn’t handle dense data. Also, road prediction is a multimodal problem: sometimes we can’t know what’s on the other side of the road, and we want the model’s predictions to be coherent, which a language model can provide.
9: How does FSD’s neural network do unit testing?
A: Besides software testing, there is neural network testing. For a neural network, we throw all the historical data of previous errors at it to see whether it performs better. At the same time, we have shadow mode: we push the neural network quietly to users, and the users assist us in QA testing. We do 9 rounds of testing before pushing it to users, and our infrastructure keeps that cycle efficient.
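The regression test described in this answer can be sketched as replaying a suite of historical failure cases against a candidate model and promoting it only if it does at least as well as the incumbent. All names and the toy models are hypothetical.

```python
def run_suite(model, failure_cases):
    """Return the fraction of historical failure cases the model now handles."""
    passed = sum(1 for case in failure_cases if model(case["input"]) == case["expected"])
    return passed / len(failure_cases)

def should_promote(candidate, incumbent, failure_cases):
    """Promote only if the candidate is no worse on the historical error set."""
    return run_suite(candidate, failure_cases) >= run_suite(incumbent, failure_cases)

# Toy models: classify a signed distance as "stop" or "go".
incumbent = lambda x: "stop" if x < 0 else "go"
candidate = lambda x: "stop" if x <= 0 else "go"  # also handles the x == 0 edge case

failure_cases = [
    {"input": -1, "expected": "stop"},
    {"input": 0, "expected": "stop"},  # the incumbent fails this historical case
]
promote = should_promote(candidate, incumbent, failure_cases)
```

Growing the suite from every observed error gives a monotonically harder gate, which is what prevents regressions on previously fixed behavior.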
10: A question about the underlying models. We see that large models now perform better; for example, from GPT-3 to PaLM, we find that large models can do inference. Do you think we should increase the amount of data and the number of parameters to get a "teacher model" that can solve all the problems, and then distill a student model to serve as the base model on the road?
A: This is how we build our annotation system. Our cloud-based annotation model is very large, and we deploy a small part of it on the car side. Regarding the base model, our dataset is several petabytes, and the model performs very well on such a large dataset. People say we can’t do perception with cameras, but look how well we do it with big data. We tailor such a model, and what comes out at the end is what you see.
11: At the very beginning, Elon Musk said Tesla is working on artificial general intelligence (AGI). How does the company ensure safety?
A: I think there should be an AI regulatory body, managing things like humanoid robots and self-driving cars. We think there should be a referee, just as there is for drugs. When the robots hit the streets, they will collect data through their cameras, and this dataset will be the largest in the world. By then the company will make a large contribution to AGI with the help of that dataset and the trained models.
12: What is the difference between the perception of a Semi truck and that of a passenger car?
A: Human driving is two eyes plus one brain, and the brain’s reactions are still quite slow. The car can definitely do better with 8 cameras and a high-speed computer.
13: Can the robot be installed and deployed with different software and hardware?
A: Our neural network doesn’t support it, next question.
14: Right now FSD is available in the US and Canada; what are the bottlenecks in rolling it out to other countries? I also noticed that you want to fuse low-speed and high-speed scenarios into the same neural network; what is the progress on that?
A: Technically it can be rolled out by the end of the year, but different countries have regulatory requirements, and we are waiting for approval. We will push the new version in North America next month, and it’s big progress. Autopilot and FSD used to be very different, but now they are more and more alike; a few months ago we moved Autopilot and FSD onto the same vision stack. FSD uses a more complex model, and in terms of road detection Autopilot and FSD are still different.
Another colleague added: FSD with parking will be available by the end of the year, enabling parking-lot-to-parking-lot Autopilot.
Elon Musk added: The number of interventions per mile is an important metric, and it is visibly improving.
15: A question for each of you: if you went back to being 20 years old, what would you say to your 20-year-old self? Any advice?
A: Have him join Tesla, haha. Spend more time with smart people; read more books; don’t stress too much; cherish the moment, and it’s nice to stop and smell the roses along the way. When I was working on the Falcon rocket, the test site was by a very beautiful beach, and we never once had coffee by the beach; I should have.
16: Elon Musk, you are now in the same state with robotics that you were in 10 years ago with Autopilot, but Autopilot development seems to have been harder than expected. What’s your approach to making robotics and AGI come faster?
A: AGI is growing fast. AI now wins all the rule-based games, and it can draw and write. There is also a lot of AI talent, and AI’s capabilities keep growing. Tesla has strong actuator development capabilities, and compared with four-wheeled robots, bipedal robots are achievable if the actuators are done right.
17: Will the next Gigafactory be robot-only? When will we be able to order our own robots?
A: We will find some simple jobs in the factory for the robots, such as loading and unloading, and later expand the boundaries of the robots’ capabilities. As for when you can buy one, I don’t know: 3-5 years. In 3-5 years people should be able to receive one.
18: Will the robot software be open source?
A: We have to beware of people using robots to do bad things. There are security-related issues that need to be dealt with, so the odds are it will not be open source.
19: How much bandwidth do the robots need?
A: You need to figure out what you want the robot to do and translate that into questions like how high to lift the arm; then you can answer the bandwidth question.
20: What is the unique thing that made Tesla so great?
A: Tesla is big now, with many experts in all areas. We went from electric cars to self-driving electric cars. I think my role is to provide an environment where great engineers can develop. There are companies where employees’ capabilities are suppressed, and companies in Silicon Valley where employees don’t grow. Tesla is not one of them; employees who join Tesla develop capabilities they couldn’t at other companies.
21: How does the company balance crash risk against driving performance in FSD? Do you think FSD regulation should be more transparent?
A: First of all, the passive safety of Tesla cars is the strongest of any car. On active safety, we publish accident rates for cars without Autopilot, with Autopilot, and with FSD, and cars with FSD have the lowest accident rate. FSD also prevents accidents, but people who are saved are not aware of it, so we have to look at the overall accident rate. Deploying FSD is definitely safer than not deploying it.
22: Why do robots need to be left-right symmetric? Humans have handedness, and if robots are designed this way, won’t one side wear out faster? Also, humans sometimes have fanciful ideas, like having longer arms to easily reach things farther away.
A: Right now we want to produce a useful robot that helps people as soon as possible. Producing a useful robot is the hardest part; we will measure a thing’s utility, like how many people it helped today and how, which keeps us grounded in reality. It’s extraordinarily difficult for a company to mass-produce something people like and that is useful. Looking ahead, we might make a robot with 8 arms, or open an interface so that other companies can build plug-ins on top of the robot.