
Meta turns to the red-hot AI race, but chips, productization and supporting infrastructure remain challenges


Meta is facing a thorny problem: despite spending huge sums on AI research, its product transformation has progressed slowly, and it only began to pay serious attention after ChatGPT took off. Rather than deploying more expensive chips such as GPUs for generative AI, Meta relied on CPUs. By the time it reversed course and began spending billions of dollars on Nvidia GPUs, it had already fallen behind giants like Microsoft and Google. Meta now plans to develop a new chip that, like a GPU, can both train AI models and run inference, targeting completion around 2025. In addition, the company's head of infrastructure has acknowledged that Meta's tools and processes for AI development clearly need improvement.

Meta's internal emails show that at the end of the summer of 2022, Meta CEO Mark Zuckerberg summoned his top lieutenants for a five-hour analysis and discussion of Meta's computing capacity, focusing on the company's ability to do advanced artificial intelligence (AI) work.


According to internal emails, company announcements, and people familiar with the matter, Meta is facing a thorny problem: despite massive investments in AI research, it has been slow to introduce AI-friendly hardware and software systems into its main business. With Meta increasingly relying on AI to underpin further growth, this has constrained the pace at which the company can drive sweeping innovation.

In the email, Santosh Janardhan, Meta's new head of infrastructure, wrote that when it comes to developing for AI, Meta clearly lags behind in tools, workflows and processes, and that addressing this would require significant investment. The email was posted on Meta's intranet in September and only came to light recently.

Supporting its AI ambitions would require Meta to fundamentally change how it designs its hardware infrastructure, builds its software systems, and provides a stable platform, the email said.

For more than a year, Meta has been engaged in a massive project to fill the gaps in its AI infrastructure. While Meta has publicly acknowledged being somewhat behind on AI hardware development, details of the effort, including the computing crunch, management changes and an abandoned AI chip project, had not previously been reported.

Asked about the email and the related reorganization, Meta spokesman Jon Carvill said the company has a proven track record of building and deploying state-of-the-art infrastructure at scale, combined with deep expertise in AI research and engineering.

“As we deliver new AI experiences for apps and consumer products, we’re confident we can continue to expand our infrastructure capabilities to meet short- and long-term needs,” he said.

He declined, however, to comment on reports that Meta had abandoned the AI chip project.

According to Meta's disclosures, the overhaul has pushed the company's capital expenditure up by roughly $4 billion per quarter, nearly double its 2021 level, and has led it to pause or cancel planned data center construction at four locations.

Meta is also under financial pressure. Since November, it has carried out mass layoffs on a scale not seen since the dot-com bust at the turn of the millennium.

Meanwhile, Microsoft-backed OpenAI released ChatGPT on November 30 last year. The AI chatbot quickly became the fastest-growing consumer application in history and set off an AI arms race, with big tech companies rushing out their own generative AI products. Beyond recognizing patterns in data, generative AI can produce human-like text and visual content in response to a prompt.

Generative AI consumes a lot of computing power, making Meta’s need to expand its computing infrastructure even more urgent, multiple sources said.

  1. Underinvestment in costly hardware

A big part of the problem, the sources said, was that Meta was late to bring GPUs into its AI work. GPUs are well suited to AI computing because they can run huge numbers of operations in parallel, sharply cutting the time needed to process massive datasets. They are also far more expensive than CPUs, and roughly 80% of the market is in Nvidia's hands.

Instead, Meta largely relied on CPUs for its AI workloads. CPUs, the workhorse chips of the computer industry, have filled data centers around the world for decades, but they are far less well suited to AI computation.
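The performance gap the sources describe is easy to illustrate. Below is a minimal, hypothetical sketch (not Meta's code) that uses PyTorch to time the same large matrix multiplication, the core operation in most neural networks, on a CPU and, if one is present, on an Nvidia GPU. The matrix size and any resulting speedup are purely illustrative and depend entirely on the hardware at hand.

```python
import time
import torch


def time_matmul(device: str, size: int = 4096) -> float:
    """Time a single size x size matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.matmul(a, b)            # warm-up so one-time setup costs are not measured
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    start = time.perf_counter()
    torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start


print(f"CPU:  {time_matmul('cpu'):.4f} s")
if torch.cuda.is_available():
    print(f"CUDA: {time_matmul('cuda'):.4f} s")
```

On most machines with a recent GPU, the second timing comes out dramatically lower, which is the kind of gap behind the shift described above.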

Meta also used a custom chip designed in-house for AI inference, according to two sources. But by 2021 it had become clear that the combination of CPUs and the custom chip was slower and less efficient than GPUs for AI work. GPUs are also more flexible than Meta's chips at running different types of AI models.

Meta declined to comment on the performance of its AI chips.

As Mark Zuckerberg pushed Meta's pivot to the Metaverse, the computing crunch hampered the company's ability to deploy AI against competitive threats, such as the rise of social media rival TikTok and Apple-led changes to ad privacy, the sources said.

The setbacks also caught the attention of former Meta board member Peter Thiel. He resigned from Meta’s board in early 2022 without explanation.

At a board meeting before his resignation, Thiel told Meta executives that they were too complacent about Meta’s core social media business and too obsessed with the Metaverse, people familiar with the matter said.

  2. Switching to GPUs, but falling behind

In 2022, after canceling plans for a large-scale deployment of the custom inference chip, Meta executives instead began ordering billions of dollars' worth of Nvidia GPUs, one source said. By that point, Meta had fallen significantly behind rivals like Google, which had begun deploying its own custom AI chips, known as TPUs, as early as 2015.

In the spring of 2022, Meta executives also set out to restructure Meta's AI division, appointing two new engineering leads, including Janardhan, the author of the September email. More than a dozen managers left Meta during the months-long upheaval, according to LinkedIn profiles and people familiar with the matter, and the management team for Meta's AI infrastructure was almost completely replaced.

Next, Meta began redesigning its data center infrastructure to accommodate the GPUs it was about to deploy. Compared with CPUs, GPUs draw more power and generate more heat, and large numbers of them must be linked through specially designed networks to form clusters.

According to Janardhan's email and the sources, such facilities require 24 to 32 times the network capacity, along with new liquid cooling systems to manage the heat from the chip clusters, so they had to be "completely redesigned."

As that work progressed, Meta laid out internal plans to develop a new chip of its own. The chip, which like a GPU could both train AI models and run inference, is currently slated for completion around 2025.

Some data center construction projects, currently on hold while transitioning to the new design, will restart later this year, Meta spokesman Carvill said. He declined to comment on the internal chip project.

  3. Slow progress on productization

Even as it scales up GPU computing power, Meta so far has little new product technology to show for it. By contrast, companies such as Microsoft and Google are already pushing generative AI products to the public (Bing Chat, Bard, and others).

In February, Meta CFO Susan Li acknowledged that little of the company's computing power was currently devoted to generative AI, saying essentially all of its AI capacity was going toward ads, its Feed, and Reels short videos.

According to the sources, Meta did not take building a generative AI product seriously until after ChatGPT launched last November. They said that although FAIR, Meta's AI research lab, has been publishing prototypes of the relevant technology since late 2021, that research was not turned into products.

That’s changing as investor interest picks up. In February, Mark Zuckerberg announced the creation of a top-tier generative AI team that would “significantly advance” the company’s work in the field.

Meta chief technology officer Andrew Bosworth also said this month that generative AI is the area where he and Mark Zuckerberg are currently spending the most time, and that he expects the company to release a related product this year.

Two people familiar with the new team said its work is at an early stage, focused on building a foundation model as the core, which can later be tuned for different product needs.

Many teams at Meta have been working on generative AI products for more than a year, spokesman Carvill said, and he confirmed that the work has accelerated in the months since ChatGPT arrived.
