Mark Zuckerberg flexes that Meta's cluster of Nvidia H100 chips is bigger than the competition's

Date: 2024-10-31
Mark Zuckerberg said Meta's Llama 4 AI models were training on a GPU cluster "bigger than anything that I've seen reported for what others are doing."
  • Mark Zuckerberg says Meta's Llama 4 AI models are training on the biggest GPU cluster in the industry.
  • During Meta's earnings call, he said the cluster was "bigger than 100,000 H100s."
  • Elon Musk has said xAI is using 100,000 of Nvidia's H100 GPUs to train its Grok chatbot.

Elon Musk has talked up his AI startup's huge inventory of in-demand Nvidia chips. Now it's Mark Zuckerberg's turn to flex.

Meta is putting more computing power into training its forthcoming Llama 4 artificial-intelligence models than any competitor has publicly reported, Zuckerberg said.

During Meta's third-quarter earnings call Wednesday, the Meta CEO said Llama 4 was "well into its development" and being trained on a cluster of graphics processing units bigger than that of its rivals.

"We're training the Llama 4 models on a cluster that is bigger than 100,000 H100s, or bigger than anything that I've seen reported for what others are doing," he said.

That 100,000 number may refer to Musk's AI startup, xAI, which launched its Colossus supercomputer in the summer. The Tesla CEO has called it the "most powerful AI training system in the world" and said xAI was using 100,000 of Nvidia's H100 GPUs to train its Grok chatbot.

Nvidia's H100 chip, built on the company's Hopper architecture, is highly sought after by tech giants and AI startups for the computing power needed to train large language models. Each chip costs an estimated $30,000 to $40,000.

The size of a company's H100 stockpile has become a factor in recruiting top AI talent. Perplexity CEO Aravind Srinivas said in a podcast interview that the topic came up when he tried to poach someone from Meta.

"I tried to hire a very senior researcher from Meta, and you know what they said? 'Come back to me when you have 10,000 H100 GPUs,'" Srinivas said in March.

Meta released its Llama 3 models in April and July. Zuckerberg added in the earnings call Wednesday that Meta's Llama 4 models would have "new modalities, capabilities, stronger reasoning" and be "much faster." The smaller models will likely be ready to launch in early 2025, he said.

Asked about Meta's big spending on AI, Zuckerberg said the company was building out its AI infrastructure faster than expected and that he was "happy the team is executing well on that," even if it meant higher costs, which was "maybe not what investors want to hear."

Meta expects its capital expenditures to continue to grow into next year as it scales up its AI infrastructure.

The Meta CEO didn't say exactly how big the company's H100 chip cluster was. Meanwhile, Musk said on X earlier this week that xAI would double its cluster size in the coming months to 200,000 H100 and H200 chips.

Read the original article on Business Insider
