
OpenAI keynote on building a scalable AI infrastructure

OpenAI Hot Chips 2024_Page_19

At Hot Chips 2024, OpenAI is giving an hour-long keynote on building scalable AI infrastructure. This makes a lot of sense, since OpenAI consumes an enormous amount of compute today and will likely consume even more in the coming years.

Please note that we are running these live at Hot Chips 2024 this week, so please excuse any typos.

OpenAI keynote on building a scalable AI infrastructure

Most of our readers are already familiar with ChatGPT, OpenAI, and how LLMs work, so we will simply show the next few slides without much commentary.

OpenAI Hot Chips 2024_Page_03
OpenAI Hot Chips 2024_Page_04
OpenAI Hot Chips 2024_Page_05

In terms of scale, the idea is that GPT-1 was cool in 2018. GPT-2 was more coherent. GPT-3 brought in-context learning. GPT-4 is actually useful. Future models with new capabilities are expected to be even more useful.

OpenAI Hot Chips 2024_Page_06

An important observation is that scaling creates better and more useful AI.

OpenAI Hot Chips 2024_Page_07

The question was how OpenAI would know whether training a larger model would actually produce a better one. OpenAI found that every time compute was doubled, the results predictably improved. The chart below spans four orders of magnitude of compute, and the scaling still holds.
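What OpenAI is describing is the familiar power-law scaling behavior, where loss falls predictably as compute grows and plots as a straight line on log-log axes. Here is a minimal Python sketch of what such a relationship looks like; the functional form is the standard one from the scaling-law literature, and the coefficients are made up for illustration, not taken from the talk:

```python
import numpy as np

# Hypothetical power-law scaling fit: loss L(C) = a * C**(-b).
# The coefficients are illustrative, not taken from OpenAI's slides.
a, b = 2.5, 0.05

def predicted_loss(compute_pf_days: np.ndarray) -> np.ndarray:
    """Predicted training loss as a power law in training compute."""
    return a * compute_pf_days ** (-b)

# Span four orders of magnitude of compute, as on the slide's x-axis.
compute = np.logspace(0, 4, num=5)  # 1 to 10,000 PF-days
for c, loss in zip(compute, predicted_loss(compute)):
    print(f"{c:>10,.0f} PF-days -> predicted loss {loss:.3f}")
```

On log-log axes, each doubling of compute shaves off a constant fraction of the loss, which is why the curve keeps delivering even after four orders of magnitude.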

OpenAI Hot Chips 2024_Page_08

OpenAI then looked at downstream tasks like coding and observed a similar pattern. Here, a mean log pass rate was used so that the aggregate score was not biased toward solving only the easy coding problems.
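As a hedged sketch of how such a metric might work (our reading of the slide, not OpenAI's exact formula): averaging the log of each problem's pass rate means a handful of near-perfect easy problems cannot mask failure on hard ones.

```python
import numpy as np

# Sketch of a mean-log-pass-rate metric (illustrative, not OpenAI's
# exact formula): averaging log(pass_rate) instead of pass_rate keeps
# easy problems from dominating the aggregate score.
def mean_log_pass_rate(pass_rates: np.ndarray, eps: float = 1e-4) -> float:
    """Average of per-problem log pass rates (higher is better)."""
    return float(np.mean(np.log(np.clip(pass_rates, eps, 1.0))))

easy_heavy = np.array([0.99, 0.98, 0.97, 0.01])  # mostly easy problems
balanced = np.array([0.80, 0.60, 0.40, 0.20])

print(mean_log_pass_rate(easy_heavy))  # punished by the one hard problem
print(mean_log_pass_rate(balanced))   # scores higher despite lower mean
```

A plain average would rank the easy-heavy set higher (0.74 vs. 0.50), but the log-mean flips the ranking because the single near-zero pass rate dominates.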

OpenAI Hot Chips 2024_Page_09

This is the MMLU benchmark, an attempt at a comprehensive, long-lasting machine learning benchmark. Yet because of this same scaling progression, GPT-4 already achieves roughly 90% on the test.

OpenAI Hot Chips 2024_Page_10

This is a representation of the compute used across the industry to train various frontier models. That figure has increased roughly fourfold annually since 2018.
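To put that growth rate in perspective, here is the compounding worked out in a few lines of Python; the 4x-per-year figure comes from the slide, while the year range is our own framing:

```python
# Quick arithmetic on the slide's claim (the 4x rate is from the talk;
# the compounding below is simply worked out from it).
base_year, growth = 2018, 4.0
for year in range(2019, 2025):
    factor = growth ** (year - base_year)
    print(f"{year}: ~{factor:,.0f}x the {base_year} training compute")
# By 2024 that is 4**6 = 4,096x, i.e. more than three orders of magnitude.
```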

OpenAI Hot Chips 2024_Page_13

GPT-1 was trained on a single box over a few weeks. Training has since been scaled out to take advantage of large GPU clusters.

OpenAI Hot Chips 2024_Page_14

Around 2018, training compute went from growing 6-7x per year to growing 4x per year. The idea is that by 2018, much of the low-hanging fruit had already been picked. Going forward, factors like cost and power consumption will be bigger challenges.

OpenAI Hot Chips 2024_Page_15

On the inference side, demand is driven by intelligence: the majority of inference compute goes to the top-end models, while smaller models require much less. Demand for inference GPUs is growing significantly.

OpenAI Hot Chips 2024_Page_16

Here are the three big arguments for investing in AI compute.

OpenAI Hot Chips 2024_Page_17

The idea behind these is that the world needs more AI infrastructure than it is currently planning to build.

OpenAI Hot Chips 2024_Page_18

Here, actual solar demand is shown in black alongside expert forecasts of that demand. Even as the actual line kept climbing, the expert forecasts repeatedly undershot it.

OpenAI Hot Chips 2024_Page_19

Likewise, Moore's Law kept going for roughly 50 years, longer than many thought possible.

OpenAI Hot Chips 2024_Page_20

OpenAI therefore believes massive investment in AI is needed, since increases in compute spanning more than eight orders of magnitude have already delivered benefits.

OpenAI says we need to design for massive deployments. One example is RAS (reliability, availability, and serviceability). Clusters get so large that both hard and soft failures occur constantly. Silent data corruption happens, and it is sometimes not reproducible even when you can isolate the offending GPU. Cluster failures also have a large blast radius.
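To illustrate why silent data corruption is so nasty: a failing GPU can return a wrong answer with no error raised at all. One common class of mitigation is redundant recomputation. The sketch below is our own illustration with NumPy standing in for a GPU kernel, not OpenAI's actual fleet tooling:

```python
import numpy as np

# Hedged sketch of silent-data-corruption (SDC) screening, not OpenAI's
# actual tooling: run the same deterministic kernel twice and compare.
# A healthy device returns identical results; a mismatch with no raised
# error is exactly what makes SDC "silent".
def run_kernel(x: np.ndarray) -> np.ndarray:
    """Stand-in for a deterministic accelerator computation (a matmul)."""
    return x @ x.T

def sdc_suspected(x: np.ndarray) -> bool:
    """Return True if two runs of the same kernel disagree."""
    return not np.array_equal(run_kernel(x), run_kernel(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 512)).astype(np.float32)
if sdc_suspected(x):
    print("Possible silent data corruption: flag this device for triage")
else:
    print("Kernel runs match; no SDC detected by this probe")
```

Even a probe like this only catches intermittent corruption some of the time, which is why OpenAI notes that failures are sometimes not reproducible at all.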

OpenAI Hot Chips 2024_Page_22

OpenAI says that repair costs need to come down, and the blast radius needs to shrink so that when one component fails, it takes fewer others down with it.

OpenAI Hot Chips 2024_Page_23

One idea is to use graceful degradation, where failures are handled automatically rather than consuming engineer time. This is very similar to what we do at STH in our hosting clusters. Validation is also important at scale.
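Here is a minimal sketch of the graceful-degradation pattern (our illustration, not OpenAI's or STH's actual system): instead of failing an entire job when one worker goes bad, cordon the worker and continue at reduced capacity, with no human in the loop.

```python
import random

# Hedged sketch of graceful degradation: cordon failed workers
# automatically and keep the job running at reduced capacity.
class Cluster:
    def __init__(self, num_workers: int) -> None:
        self.healthy = set(range(num_workers))

    def health_check(self) -> None:
        """Automatically cordon any worker that fails its probe."""
        for worker in list(self.healthy):
            if random.random() < 0.05:  # stand-in for a real health probe
                self.healthy.discard(worker)
                print(f"worker {worker} cordoned; job continues degraded")

    def step(self, batch: int) -> None:
        if not self.healthy:
            raise RuntimeError("no healthy workers left; page a human")
        # Work is resharded across surviving workers each step.
        print(f"batch {batch}: running on {len(self.healthy)} workers")

random.seed(42)
cluster = Cluster(num_workers=8)
for batch in range(5):
    cluster.health_check()
    cluster.step(batch)
```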

OpenAI Hot Chips 2024_Page_24

Power delivery will be a major challenge, as the amount of electricity available worldwide is limited. In synchronous training, GPUs all ramp power up and down at the same time, which creates large load swings that stress data center infrastructure.
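To see why synchronized ramps matter, a quick back-of-the-envelope helps; every number below is our own rough assumption, not from the talk:

```python
# Back-of-the-envelope on synchronized power ramps. The per-GPU figures
# and cluster size are rough assumptions, not from OpenAI's slides.
num_gpus = 100_000          # hypothetical large training cluster
peak_w, idle_w = 700, 100   # rough peak vs. idle draw for a modern GPU

swing_mw = num_gpus * (peak_w - idle_w) / 1e6
print(f"Simultaneous ramp swing: ~{swing_mw:.0f} MW")  # ~60 MW
```

A tens-of-megawatts swing arriving in milliseconds is the kind of load transient utility grids and facility power systems were never designed for.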

OpenAI Hot Chips 2024_Page_25

Much like our own key lessons learned pieces, OpenAI closed with its key takeaways. We will leave these for you to read:

OpenAI Hot Chips 2024_Page_26

Interestingly, performance is only one of the four points, even though so much of the industry's attention is focused on performance alone.

Closing words

The challenges of scaling at the cluster level are enormous. Looking at the Top500, today's large AI clusters are roughly equivalent to the top 3-4 systems on that list combined. It was great to hear a large customer talk about how it sees the need for AI hardware.
