April 25 – An internal memo reveals that in late summer 2022, Meta CEO Mark Zuckerberg gathered a team of company executives for a five-hour analysis of the company’s computing capacity, focusing on its ability to handle cutting-edge artificial intelligence work.
The memo noted that despite Meta’s high-profile investments in AI research and its growing reliance on AI to support its growth, the social media giant had been slow to adopt expensive AI-optimized hardware and software systems for its main businesses, which hampered its ability to keep pace with innovation at scale. To support its AI efforts, the memo said, Meta needed to “fundamentally change our physical infrastructure design, software systems and approach to delivering a stable platform.”
According to the company, the restructuring has increased Meta’s capital expenditures by about $4 billion per quarter, almost twice as much as in 2021, and has caused it to suspend or cancel plans to build data centers in four locations.
Meta is also under serious financial pressure: the company has been carrying out layoffs on an unprecedented scale since last November.
Meanwhile, the debut of ChatGPT last November set off a race among tech giants to release generative AI products. Five sources said that generative AI requires a great deal of computing power, which has made Meta’s capacity expansion more urgent.
Meta’s slow adoption of GPUs for AI is one of the main issues, the sources said. GPUs are well suited to AI processing because they can perform a large number of tasks simultaneously, reducing the time needed to process billions of pieces of data. However, the sources said, GPUs are expensive, and chipmaker Nvidia controls 80 percent of the market and holds a leading position in the accompanying software.
Until last year, Meta ran its AI workloads mainly on large fleets of ordinary CPUs. CPUs are the workhorse chips of the computing world and have dominated data centers for decades, but they perform poorly on AI work.
As a result, competitors that use GPUs and have better AI software have outpaced Meta in AI, developing new AI products and services faster.
Meta also started using custom chips it designed in-house to train AI, but by 2021 this two-pronged approach was proving slower and less efficient than one built around GPUs, which were also more flexible than Meta’s chips at running different types of models, according to two sources.
Later, as Mark Zuckerberg pivoted the company toward the metaverse, the lack of computing power left it unable to respond to threats, including the rise of TikTok and Apple-led changes to ad privacy.
Those issues raised concerns for former Meta board member Peter Thiel, who resigned from his post in early 2022 without explaining why. At a board meeting before he left, Thiel pointed out that Mark Zuckerberg and his executives were too focused on growing the metaverse at the expense of Meta’s core social media business, leaving the company vulnerable to competitors such as TikTok, according to two people familiar with the matter.
Meta had planned to launch a custom chip in 2022 but abandoned the plan in favor of a multibillion-dollar order for Nvidia GPUs that same year. By then, Meta was already behind peers like Google, which began deploying its own custom AI chip, the TPU, in 2015.
Meta then began reorganizing its AI department, appointing two new engineers to lead it. During this time, dozens of executives left Meta and almost all of the AI infrastructure leadership was replaced.
Next, Meta began retrofitting its data centers to accommodate GPUs, which draw more power and generate more heat than CPUs and must be tightly clustered with dedicated network connections between them. The facilities needed significantly more network capacity and new liquid cooling systems to manage the heat of the clusters, so they had to be “completely redesigned”.
As the work progressed, Meta began internal planning for a new, more ambitious chip, similar to a GPU, capable of both training AI models and performing inference. The project is expected to be completed around 2025, two sources said.
Jon Carvill, a spokesman for Meta, declined to comment on the chip project.
While Meta scales up its GPU capacity, companies like Microsoft and Google are already promoting commercial generative AI products, an area where Meta has made little substantial progress.
Meta’s chief financial officer acknowledged in February that the company is not currently using most of its computing power for generative work. “Basically all of our AI capabilities are used for advertising, feeds and Reels,” she said, referring to Meta’s TikTok-like short videos, which are popular with younger users.
Meta did not start prioritizing generative AI products until after ChatGPT launched last November, according to four sources. While the company’s AI research department has been releasing prototypes of the technology since late 2021, it had not focused on turning them into products. As investor interest grew, however, Mark Zuckerberg announced a new high-level generative AI team in February that he said would “accelerate” the company’s work in this area.
Chief Technology Officer Andrew Bosworth also said this month that generative AI is the area where he and Mark Zuckerberg are spending most of their time, and predicted that Meta will launch new products this year.
Two people familiar with the new team said its work is in the early stages and is focused on building a foundation model, a core program that can later be fine-tuned and adapted to different products.
Meta spokesman Carvill said the company has been developing generative AI products on different teams for more than a year. He confirmed that the work accelerated in the months following the launch of ChatGPT.